Very poor performance on relatively small index


#1

My index is not very large: 1 main index, 3M documents, 1 node, 1 shard, 0 replicas, 5GB disk size. Disk is spinning.
Documents have indexed fields: about 12 integer fields, 3 short text fields, 4 date fields, plus some more unindexed fields.

My usage pattern is like this: every day, about 10k-15k documents are added to the main index by a background job that runs for about 4 hours. All queries are run against this index, around the clock.

I am having many slow queries: about 30% of the queries are above 800ms and 7% above 1000ms.

The queries have filters on several integer fields, and aggregations to count documents on several integer fields as well, with occasional text search in one short text field of about 80-120 chars. The query has 3 nested aggregations, with facets on three integer fields.
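As a rough illustration of the query shape (field names and values here are made up, not my actual mapping), it looks something like this:

```json
{
  "size": 20,
  "query": {
    "bool": {
      "filter": [
        { "term":  { "status": 1 } },
        { "range": { "created": { "gte": "2016-01-01" } } }
      ],
      "must": [
        { "match": { "title": "some short text" } }
      ]
    }
  },
  "aggs": {
    "by_category": {
      "terms": { "field": "category_id" },
      "aggs": {
        "by_status": {
          "terms": { "field": "status" },
          "aggs": {
            "by_source": { "terms": { "field": "source_id" } }
          }
        }
      }
    }
  }
}
```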

I have set up all the recommended settings for production and spinning disks, and the refresh interval to 30s, as it is not critical to have new documents immediately available for search.
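For reference, this is how I set the refresh interval (index name is illustrative):

```json
PUT /main_index/_settings
{
  "index": {
    "refresh_interval": "30s"
  }
}
```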

What I do see is that the number of segments is quite high.
"segments" : {
  "count" : 27,
  "memory_in_bytes" : 5027011,
  "terms_memory_in_bytes" : 3123095,
  "stored_fields_memory_in_bytes" : 1815368,
  "term_vectors_memory_in_bytes" : 0,
  "norms_memory_in_bytes" : 14528,
  "doc_values_memory_in_bytes" : 74020,
  "index_writer_memory_in_bytes" : 541878,
  "index_writer_max_memory_in_bytes" : 124688793,
  "version_map_memory_in_bytes" : 2086,
  "fixed_bit_set_memory_in_bytes" : 0
}

The machine has 16GB of memory, of which 7GB is assigned to the ES heap. CPU load is very low: less than 10%.
Disk I/O does not seem to be a problem: iostat typically reports under 10% utilization, with occasional peaks of 20%.

ES version is 2.3.3.
OS is Ubuntu 14.04.

Any tips? How can I detect if there's a problem?


(Mark Walkom) #2

Not really relevant; if you had lots of shards then it might be.

That version doesn't exist, can you check again?


(Jörg Prante) #3

Can you please show your query?

Also, the field mapping configuration would be interesting.


#4

Here it is.

Query:

Index mapping:

Thanks for responding!


#5

Fixed: 2.3.3


(Jörg Prante) #6

This is an example of a very bad query.

Try to reconsider your data structure and your query to the following principles:

  • Avoid `missing` filters like the plague: they are very slow. Most use cases can be reworked; prefer to index special filler terms and filter on them instead.

  • Order the filters in the `and` clause so they are as efficient as possible: the first filter should exclude the highest number of docs, the next one the second-highest, and so on. `missing` filters are much slower than `term` filters. Also, understand filter optimization: think about caching filters instead of forcing ES to recompute them in every query.

  • Depending on the field cardinality, doc values might be a good solution for aggregations in your case. You could try something like this on the fields to be aggregated:

"aFieldToAggregate" : {
  "type": "long",
  "store": "yes",
  "index": "no",
  "doc_values": true
},

Also, the field type byte is a bit slower than long, since bytes have to be converted internally.

  • Of course, a three-level aggregation is slow.

  • And last: do not sort. Sorting is very slow. Use relevance scoring wherever you can.
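To sketch the filler-term idea (the field name here is hypothetical): at write time, index a sentinel value such as -1 where the field would otherwise be absent, then replace the `missing` filter with a cheap `term` filter:

```json
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "category_id": -1 } }
      ]
    }
  }
}
```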


#7

Thanks for the tips.
I'm going to reindex with filler terms to replace the missing filters, and will apply the filter ordering as you suggest.
I'll see what I can do about the rest.


#8

I have been able to remove the missing filters.
However, I don't see a real improvement.
Removing sorting is not an option in my case.
I also tried ordering the filters so the most selective one comes first, but didn't see any improvement either.

Is it possible to see where the time is spent for a given query?
I would like to know whether it's worth the effort of rewriting queries for aggregation caching.


(Jörg Prante) #9

Yes, try the Profile API: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-profile.html
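Usage is just a flag on the search body, something like the following (index and field names are illustrative; note the Profile API was still experimental in the 2.x line):

```json
GET /main_index/_search
{
  "profile": true,
  "query": {
    "term": { "status": 1 }
  }
}
```

The response includes a per-shard breakdown of how long each query component took to execute.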


#10

Aggregations were the culprit behind the low performance.
I was even thinking about adding two more aggs, but that wasn't an option with performance this poor.

I ended up splitting queries in two: one for the docs, and one for the aggregations with no docs. The latter can be cached.
This, together with custom application-level caching of the most costly and frequent aggs, has improved performance significantly: only 10% of queries take more than 0.6s now.
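The aggregation-only query is just a `size: 0` search, which (if I understand the 2.x docs correctly) is what makes it eligible for the shard request cache. A sketch, with illustrative index and field names:

```json
GET /main_index/_search?request_cache=true
{
  "size": 0,
  "aggs": {
    "by_category": {
      "terms": { "field": "category_id" }
    }
  }
}
```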

Thanks!


(Luke Nezda) #11

@jogaco you might like Hits + query_cache=true + aggs in 1 round trip: _msearch?
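Something along these lines (NDJSON body: one header line and one body line per search; index and field names are illustrative):

```json
GET /_msearch
{ "index": "main_index" }
{ "size": 20, "query": { "match": { "title": "foo" } } }
{ "index": "main_index" }
{ "size": 0, "aggs": { "by_category": { "terms": { "field": "category_id" } } } }
```

Both responses come back together in a single `responses` array, so you keep the cacheable aggs-only search while avoiding a second round trip.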


(system) #12