I'll try to generalize my special case in a few words.
I have an Elasticsearch index "sample" with two types: type1 and type2.
The index holds about 1,000,000 (one million) entries, with parent-child relations between entities of type1 and type2: each type1 entity has zero or more type2 children.
I'm analyzing the whole index to build groups of similar entities, and I use fields from both type1 and type2 in my searches. For example, I have a match if type1_instance.name = type2_instance.name, or if type1_instance.name = type1_instance.name (of another instance), and so forth on other fields. So the searches cross types.
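To make the cross-type part concrete, here is a minimal sketch of one such clause (field names are illustrative, not my real mapping), using a has_child query so that a type1 parent matches when one of its type2 children has the same name, as supported in ES 1.x:

```json
{
  "query": {
    "bool": {
      "should": [
        { "match": { "name": "some value" } },
        {
          "has_child": {
            "type": "type2",
            "query": { "match": { "name": "some value" } }
          }
        }
      ]
    }
  }
}
```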
Queries look like this:
"bool" : {
  "should" : [
    {
      "bool" : {
        /* ... */
        "_name" : "query one"
      }
    },
    {
      "bool" : {
        /* ... */
        "_name" : "query two"
      }
    },
    {
      "bool" : {
        /* ... */
        "_name" : "query three"
      }
    }
  ]
}
My problem is:
If my index holds 1k (1,000) entities, a search takes 1 unit of time; at 10k it takes 7 units; at 50k, 53 units; BUT at about 500k it takes 4,000 units! And performance degrades further with 1,000,000 entities in the index.
Scaling up my machine (more memory, more CPUs) has helped, but it cannot be the only solution; I already have a 12-core machine with 16GB of RAM (8GB of which is dedicated to ES).
The problem is that, beyond a certain size, the time needed for a search grows exponentially rather than linearly.
And my questions:
Are there any general best practices to avoid such an awful performance decay?
Is an exponential decay of search performance (even for a single search) normal as the index grows?
Is the parent-child mapping in some way deleterious for an ES index?
Environment:
Linux machine, ES 1.7.4
Many thanks, and sorry if I haven't been clear.