Hello,
We have a problem with slow indexing speed on an ES cluster of 3 nodes (version 5.6.14) with the following roles:
- 70G RAM, SATA HDD, "mi" role
- 48G RAM, SSD, "mdi" role
- 8G RAM, SATA HDD, "mi" role
All ES nodes have a heap size of half the available RAM, indices.memory.index_buffer_size: 60%, and are tuned for maximum performance according to the recommendations in the ES documentation.
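For clarity, the relevant per-node settings look roughly like this (node2 shown as an example; the heap value simply follows the half-of-RAM rule mentioned above):

# jvm.options on node2 (48G RAM, so half of it for heap):
-Xms24g
-Xmx24g

# elasticsearch.yml on node2 ("mdi" role) plus the larger index buffer:
node.master: true
node.data: true
node.ingest: true
indices.memory.index_buffer_size: 60%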
Also, we have an index template for hourly indices that store fairly heavy nginx logs, with the settings below, and we keep only the last hour's index:
"settings": {
"index": {
"number_of_shards": "2",
"number_of_replicas": "0",
"refresh_interval": "5s"
}
}
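These settings are applied through an index template matching the hourly indices, roughly like this (the template name and index pattern here are only illustrative, ours differ):

PUT _template/nginx_hourly
{
  "template": "nginx-*",
  "settings": {
    "index": {
      "number_of_shards": "2",
      "number_of_replicas": "0",
      "refresh_interval": "5s"
    }
  }
}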
Nginx is installed on node1 and node2 themselves and collects the logs there. Bulk requests (5-10 MB) are sent with the rsyslog-omelasticsearch output module, installed on node1 and node2, through node2. At moments of high nginx traffic, indexing can't go above 16-17k documents per second, so the rsyslog sending queue keeps growing and gets stuck. We noticed that the load average on data node2 is quite high at these moments (20-22 with 11 cores) and is caused by the ES Java process. We also tried to redistribute the bulk requests across node1 and node2, but it did not help.
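For reference, the rsyslog output on node1 and node2 is configured roughly like this (a simplified sketch: the index pattern, template names, queue size and maxbytes cap are illustrative and depend on the rsyslog version):

# Load the Elasticsearch output module
module(load="omelasticsearch")

# Index name for the hourly indices (pattern is illustrative)
template(name="nginx-hourly" type="string"
         string="nginx-%$YEAR%.%$MONTH%.%$DAY%.%$HOUR%")

# Simplified JSON document template
template(name="nginx-json" type="list") {
  constant(value="{\"host\":\"")
  property(name="hostname")
  constant(value="\",\"message\":\"")
  property(name="msg" format="json")
  constant(value="\"}")
}

# Send bulk requests through node2; the async queue grows when ES can't keep up
action(type="omelasticsearch"
       server="node2" serverport="9200"
       template="nginx-json"
       searchIndex="nginx-hourly" dynSearchIndex="on"
       bulkmode="on" maxbytes="10m"
       queue.type="linkedlist" queue.size="500000"
       action.resumeretrycount="-1")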
Could you please help us investigate the bottlenecks and speed up indexing?