Search is slower in Elasticsearch 2.0.0 than in 1.7.3 when the document count is high (> 2.5 million)

Hi,

While testing search performance, I found that search is considerably faster in 1.7.3 than in 2.0.0 when a single-shard index holds more than 2.5 million documents.

There is a 200-300 ms difference in the search times between the versions on quite a few search requests.

I also notice that the query cache is being used in spite of the query NOT being executed in a filter context.

Here is the dump of the query cache that I see:

"query_cache" : {
  "memory_size_in_bytes" : 145528,
  "total_count" : 212888,
  "hit_count" : 89651,
  "miss_count" : 123237,
  "cache_size" : 22,
  "cache_count" : 22,
  "evictions" : 0
}
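
For reference, the hit ratio implied by these counters can be worked out directly. The snippet below is only a sketch in Python: the values are copied from the dump above, and the variable names are chosen for illustration.

```python
# Counters copied from the query_cache dump above.
query_cache = {
    "memory_size_in_bytes": 145528,
    "total_count": 212888,
    "hit_count": 89651,
    "miss_count": 123237,
    "cache_size": 22,
    "cache_count": 22,
    "evictions": 0,
}

# Hits and misses together should account for every cache lookup.
lookups = query_cache["hit_count"] + query_cache["miss_count"]
hit_ratio = query_cache["hit_count"] / lookups

print(f"lookups={lookups}, hit ratio={hit_ratio:.1%}")
```

Hits plus misses add up to total_count, and roughly 42% of the lookups were served from the cache.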

Here is the sample query that is sent:
{
  "query": {
    "bool": {
      "must": [
        {
          "bool": {
            "should": [
              {
                "bool": {}
              },
              {
                "term": {
                  "filename": {
                    "value": "file_363121",
                    "boost": 5
                  }
                }
              },
              {
                "term": {
                  "filename": {
                    "value": "file_408663",
                    "boost": 5
                  }
                }
              },
              {
                "term": {
                  "filename": {
                    "value": "file_384012",
                    "boost": 5
                  }
                }
              },
              {
                "term": {
                  "filename": {
                    "value": "file_279082",
                    "boost": 5
                  }
                }
              },
              {
                "term": {
                  "filename": {
                    "value": "file_103020",
                    "boost": 5
                  }
                }
              },
              {
                "term": {
                  "filename": {
                    "value": "file_279445",
                    "boost": 5
                  }
                }
              },
              {
                "term": {
                  "filename": {
                    "value": "file_453014",
                    "boost": 5
                  }
                }
              },
              {
                "term": {
                  "filename": {
                    "value": "file_221093",
                    "boost": 5
                  }
                }
              },
              {
                "term": {
                  "filename": {
                    "value": "file_488775",
                    "boost": 5
                  }
                }
              }
            ]
          }
        }
      ]
    }
  }
}
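
For anyone who wants to reproduce this, a request body of the same shape can be generated from a list of file names. This is a minimal sketch in Python, not the tool that was actually used in the test; the function name build_filename_query is hypothetical, and only three of the sample values are shown.

```python
import json


def build_filename_query(filenames, boost=5):
    """Return a request body shaped like the sample query above."""
    # The sample query also carries this empty bool clause, so it is kept here.
    should = [{"bool": {}}]
    for name in filenames:
        should.append({"term": {"filename": {"value": name, "boost": boost}}})
    return {"query": {"bool": {"must": [{"bool": {"should": should}}]}}}


body = build_filename_query(["file_363121", "file_408663", "file_384012"])
print(json.dumps(body, indent=2))
```

Sending the identical generated body to both versions keeps the comparison limited to the server side.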

Any help on tweaks to make search faster would be much appreciated.

Thanks,
Vignesh

Another observation -

While new files are being actively indexed, the average search response time increases by almost a second, and searches then take ~2 seconds to return results.

Here are some additional details:

1.7.3

Average search time without active indexing: ~300-500 ms
Average search time with active indexing: ~600-800 ms

2.0.0

Average search time without active indexing: ~700-800 ms
Average search time with active indexing: ~1.5-2 sec
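
To put a rough number on the gap, the midpoints of the ranges above can be compared. This is only a back-of-the-envelope sketch in Python: taking the midpoint of each reported range is an assumption made for illustration, not a measured average.

```python
# Reported ranges in milliseconds, copied from the figures above.
timings_ms = {
    "1.7.3": {"idle": (300, 500), "indexing": (600, 800)},
    "2.0.0": {"idle": (700, 800), "indexing": (1500, 2000)},
}


def midpoint(bounds):
    """Midpoint of a (low, high) range; a crude stand-in for the average."""
    low, high = bounds
    return (low + high) / 2


for scenario in ("idle", "indexing"):
    old = midpoint(timings_ms["1.7.3"][scenario])
    new = midpoint(timings_ms["2.0.0"][scenario])
    print(f"{scenario}: {old:.0f} ms -> {new:.0f} ms ({new / old:.2f}x)")
```

On these midpoints, 2.0.0 is a little under 2x slower without active indexing and about 2.5x slower with it.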

Questions:

  1. Why is the query cache being used in spite of the query not being in a filter context?
  2. Why does the search time shoot up in Elasticsearch 2.0.0 compared to 1.7.3 when files are being actively indexed?

Any suggestions on these questions will be much appreciated.

Thanks,
Vignesh

There is nothing to cache in this query indeed. Some questions:

  • do you search on an index directly or on a type or a filtered alias?
  • does your sample request include everything you send? (including sort, aggs, etc.)
  • did you run exactly the same query in 1.7 and did it return exactly the same number of matches?
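
One way to check the last point is to send the identical request body to both versions and compare the took and hits.total fields of the two responses. A minimal sketch in Python, assuming the responses have already been parsed into dicts; the function name compare_responses and the sample values are hypothetical.

```python
def compare_responses(resp_old, resp_new):
    """Summarise two search responses returned for the same request body."""
    total_old = resp_old["hits"]["total"]
    total_new = resp_new["hits"]["total"]
    return {
        "same_hit_count": total_old == total_new,
        "hits": (total_old, total_new),
        # "took" is the server-side search time in milliseconds.
        "took_delta_ms": resp_new["took"] - resp_old["took"],
    }


# Made-up responses, only to show the shape of the comparison.
example_old = {"took": 350, "hits": {"total": 9}}
example_new = {"took": 720, "hits": {"total": 9}}
print(compare_responses(example_old, example_new))
```

If the hit counts differ, the two runs are not doing the same work and the timings are not directly comparable.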

Query slow logs download link: https://drive.google.com/file/d/0B8P3p4ZGj_MpY0ZRZGllLUp1RmM/view?usp=sharing

  • do you search on an index directly or on a type or a filtered alias?
    Vignesh: The search was executed directly on an index. No filtered aliases were involved.

  • does your sample request include everything you send? (including sort, aggs, etc.)
    Vignesh: Yes, the sample request includes everything that is sent. I have attached the slow logs for both 1.7.3 and 2.0.0 to give you a better picture and to make sure I am not missing anything.

  • did you run exactly the same query in 1.7 and did it return exactly the same number of matches?
    Vignesh: Unfortunately, I can't quantify this. However, I can definitely say that the structure of the queries executed in both cases is exactly the same. The document structure is also exactly the same. Document generation and search were done by exactly the same set of tools on both versions, with no changes. There may be minor differences in the query terms used for search.

Are there any leads on this? Any suggestions would be much appreciated.

Thanks,
Vignesh

I just upgraded and see similar issues. Was there a resolution?