Search performance dramatically decreases (+15s) when the indexing process is running

Hello,

We have an Elasticsearch cluster of 7 servers on Amazon: two client nodes, three master nodes (which also hold data), and two data nodes.

All master and data nodes are m4.2xlarge instances with 32GB of RAM, 8 cores, and SSD storage. The 2 client nodes are m4.xlarge with 4 cores and 16GB of RAM. The Elasticsearch heap is set to 15GB (half of the machine's memory) and swap is disabled.

We have three indices; the most important one holds 100GB of data. Every index has 5 shards distributed across the 5 data-holding nodes (masters included), and everything is replicated correctly across all nodes.
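
For reference, roughly how one of these indices would be defined with the Python client (the host and index name are made up; 4 replicas matches our setup):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://es-client-1:9200"])  # hypothetical client node

# 5 primary shards; with 4 replicas every data-holding node ends up
# carrying a full copy of the index.
es.indices.create(index="products", body={
    "settings": {
        "number_of_shards": 5,
        "number_of_replicas": 4,
    }
})
```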

Our frontend uses the cluster through one of the client nodes, and we get search times of around 2-3 seconds, which is acceptable for us. The client seems to distribute our queries correctly; we are using the paramedic plugin (https://github.com/karmi/elasticsearch-paramedic) to watch the balancing.

The problem:

The problem begins when we start the indexing process (we index using bulk requests of 2000 documents each): search times then climb to around 17 seconds. At the same time, every cluster node shows a load average of at most 0.5.

What we tried:

  • Increasing the index refresh interval (currently set to 30s).
  • Checking the thread pools, but in the output of _nodes/stats/thread_pool?pretty I can't see anything problematic. (A rough sketch of both checks follows this list.)
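
Roughly what we are doing for both, sketched with the Python client (host and index names are made up):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://es-client-1:9200"])  # hypothetical client node

# Publish new segments less often while bulk indexing runs.
es.indices.put_settings(
    index="products",
    body={"index": {"refresh_interval": "30s"}},
)

# Same data as GET _nodes/stats/thread_pool?pretty
stats = es.nodes.stats(metric="thread_pool")
for node in stats["nodes"].values():
    pools = node["thread_pool"]
    print(node["name"],
          "search rejected:", pools["search"]["rejected"],
          "bulk rejected:", pools["bulk"]["rejected"])
```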

Does anyone have any idea of what could be happening, or what we can check/change to address this issue?
Is there any setting that could limit search performance while we are indexing at the same time?
Is our current cluster properly sized and configured?

Thanks in advance

As both indexing and querying compete for the same resources, primarily CPU and disk IO, you may need to throttle your indexing so that it has less impact on querying. Have you tried reducing the bulk size and/or reducing the number of indexing threads to see what impact that has?
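
For example, something along these lines with the Python bulk helper (host, index, and field names are illustrative):

```python
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch(["http://es-client-1:9200"])  # hypothetical client node

documents = [{"id": 1, "name": "example"}]  # placeholder for the real feed

def actions(docs):
    # Wrap plain documents as bulk index actions (index/type names are made up).
    for doc in docs:
        yield {
            "_index": "products",
            "_type": "product",
            "_id": doc["id"],
            "_source": doc,
        }

# Smaller chunks than 2000 (e.g. 500) spread the indexing load over time and
# leave more headroom for concurrent searches.
helpers.bulk(es, actions(documents), chunk_size=500)
```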

We tried reducing the indexing threads and we still have the same problem.

We've seen other people complaining about the same problem, and one suggestion seems to be to switch the client to the transport client (https://www.elastic.co/guide/en/elasticsearch/guide/current/_transport_client_versus_node_client.html)

However, if we use just 1 thread for the indexing process, does it really make sense to use the transport client instead of the node client?

Thanks

What ES version?

Do you write documents into the same index you are searching?

To get optimal performance, use this workload pattern (a rough sketch follows the list):

  1. Create new index
  2. Bulk index documents
  3. Refresh/optimize
  4. Search
  5. Create another index
  6. Bulk index into the other index
  7. Refresh/optimize the other index
  8. Switch from old to new index or set index alias
  9. Search on new (or on both) indices
  10. Remove unused indices
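
A rough sketch of that pattern using the Python client (all host, index, and alias names are made up):

```python
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch(["http://es-client-1:9200"])  # hypothetical client node

# 1. Create the new index (refresh disabled while the bulk run is in progress)
es.indices.create(index="products_v2", body={
    "settings": {"number_of_shards": 5, "refresh_interval": "-1"},
})

# 2. Bulk index documents into it
docs = [{"id": 1, "name": "example"}]  # placeholder for the real data feed
helpers.bulk(es, ({"_index": "products_v2", "_type": "product",
                   "_id": d["id"], "_source": d} for d in docs))

# 3. Re-enable refresh and refresh once the bulk run is done
es.indices.put_settings(index="products_v2",
                        body={"index": {"refresh_interval": "30s"}})
es.indices.refresh(index="products_v2")

# 8. Atomically switch the search alias from the old index to the new one
es.indices.update_aliases(body={"actions": [
    {"remove": {"index": "products_v1", "alias": "products"}},
    {"add": {"index": "products_v2", "alias": "products"}},
]})

# 9. Searches only ever go through the alias
es.search(index="products", body={"query": {"match_all": {}}})

# 10. Remove the unused index
es.indices.delete(index="products_v1")
```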

It is quite likely that the increased latencies you are experiencing when indexing are due to resource contention in the cluster, so I do not think it will matter which client you use. How many replicas do you have configured for the indices? How many queries per second is the cluster serving?

Hi

@jprante We currently use Elasticsearch 1.7.2, and we write to the same index we search. We do incremental indexing because the index is big (more than 100GB) and we have a lot of updates. We cannot reindex everything into a new index and then swap them.

What do people do with big indices that receive continuous updates?
We haven't run an optimize on this index, but I don't think that is the origin of the problem.

@christian_dahlqvist We have 4 replicas for the index. The cluster is in a testing environment, so right now there are only a few queries.

We have 2 client nodes; we tried indexing through one client node and searching through the other, but we got the same results so far.

Thanks

You do not need to reindex everything. Just create a new index for the incremental data and add it to the existing index alias.
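
For example, roughly like this with the Python client (host, index, and alias names are made up):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://es-client-1:9200"])  # hypothetical client node

# New index that receives only the incremental writes (name is made up)
es.indices.create(index="products_2015_10")

# Add it to the alias the frontend already searches against
es.indices.update_aliases(body={"actions": [
    {"add": {"index": "products_2015_10", "alias": "products"}},
]})

# Searches keep hitting the alias and now cover both indices
es.search(index="products", body={"query": {"match_all": {}}})
```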

Your issue is that indexing produces a lot of moving parts (segments) that invalidate your searches, especially when you use filters and aggregations. It is expected to kill performance when you index into an index that is being searched at the same time.

For optimal search performance, you need a small number of segments. After a massive bulk indexing run, the segment count may be too high.
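
A rough sketch of checking the segment count and merging segments down after a bulk run, assuming the Python client and a made-up index name (on 1.x the API is _optimize; newer versions renamed it to _forcemerge):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://es-client-1:9200"])  # hypothetical client node

# Inspect how many searchable segments each shard currently has
seg = es.indices.segments(index="products")
for index_info in seg["indices"].values():
    for shard_copies in index_info["shards"].values():
        for shard in shard_copies:
            print("segments:", shard["num_search_segments"])

# Merge segments down once the bulk run is finished. This is expensive,
# so run it off-peak.
es.indices.optimize(index="products", max_num_segments=1)
```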

BTW, you might find this post by Alan Woodward at Flax on improving Elasticsearch indexing performance enlightening.

Agreed with what @jprante said! Indexing creates new segments (and triggers segment merges, which create further new segments)... this definitely impacts the caches and hence the queries.

If you have a lot of data and it is properly partitioned, you can try balancing the indexing requests so that no single shard/index is loaded at any point in time.

@softwaredoug Thanks, we'll check the link, it could help us to index faster.

@jprante Then, if I understood you correctly, you suggest indexing the new documents into another index and using both indices for searching through an alias.

At some point the new index holding the incremental data will grow big as well; is the standard solution then to index into a third index? Later into a fourth? Is that what you suggest?

Our incremental data contains not only new documents but also updates to existing documents. That means that with your approach we would first need to run a delete query against the first index before indexing the updated document into the second one.

What is the point of having so many replicas if you have only 5 data nodes?

@Igor_Berman We wanted to increase search performance. Before starting the indexing process, the cluster was able to handle a big volume of queries; once the indexing process started, it was not.

Are you suggesting that having that many replicas decreases search performance during indexing? We can try decreasing the number of replicas and see how the cluster performs.

Avoid "delete by query" at all cost. This is a very expensive operation and will also affect your searches in a bad way. Organize your data so you can keep them in separate incremental steps, or introduce special filter tags, so you can filter out old documents at search time.

Reindexing everything is better and simpler in the vast majority of cases where the overall data volume is limited.
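
A rough sketch of the filter-tag idea with the Python client, using the 1.x filtered query syntax (the host, index, and the "superseded" field are made up):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://es-client-1:9200"])  # hypothetical client node

# Instead of deleting the old version of a document, mark it with a flag
# (the "superseded" field is made up) and filter it out at search time.
query = {
    "query": {
        "filtered": {
            "query": {"match": {"name": "example"}},
            "filter": {"not": {"term": {"superseded": True}}},
        }
    }
}
es.search(index="products", body=query)
```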


Albert, I'm not sure, but replication isn't free, especially while you are indexing. I have two thoughts:

a) ES has to copy each piece of data from one node to 4 others in your case, which takes resources. Usually when rebuilding an index (which is not your case, but still) the advice is to turn replication off and turn it back on at the end.
b) With only 5 data-holding nodes and 4 replicas, you make almost every node hold the full index volume, which IMHO cancels out the partitioning of the data (yes, it probably improves search and gives you fault tolerance).
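
A rough sketch of (a) with the Python client (host and index names are made up):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://es-client-1:9200"])  # hypothetical client node

# Drop replicas while the heavy bulk indexing runs...
es.indices.put_settings(index="products",
                        body={"index": {"number_of_replicas": 0}})

# ... run the bulk indexing here ...

# ...and restore them afterwards, so the cluster copies the new segments once.
es.indices.put_settings(index="products",
                        body={"index": {"number_of_replicas": 4}})
```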

Have you tried reducing the number of replicas in order to see how this affects the cluster?

Not yet. We'll try it and see how it impacts the overall performance.

Thanks