I'll try to generalize my special case in a few words.
I have an Elasticsearch index "sample" with two types: type1 and type2.
The index holds about 1,000,000 (one million) entries, with parent-child relations between entities of type1 and type2: each type1 entity has zero or more type2 children.
I'm analyzing the whole index to build groups of similar entities, and I use fields from both type1 and type2 in my searches. For example, I have a match if type1_instance.name = type2_instance.name, or if type1_instance.name = type1_instance.name (of another instance), and so forth on other fields. So the searches cross types.
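To make the cross-type part concrete, here is a minimal sketch of one such clause (field names are illustrative, not my real mapping), using a has_child query so that a type1 parent matches when one of its type2 children has the same name, as supported in ES 1.x:

```json
{
  "query": {
    "bool": {
      "should": [
        { "match": { "name": "some value" } },
        {
          "has_child": {
            "type": "type2",
            "query": { "match": { "name": "some value" } }
          }
        }
      ]
    }
  }
}
```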
Queries look like this:
"bool" : {
  "should" : [
    {
      "bool" : {
        /* ... */
        "_name" : "query one"
      }
    },
    {
      "bool" : {
        /* ... */
        "_name" : "query two"
      }
    },
    {
      "bool" : {
        /* ... */
        "_name" : "query three"
      }
    }
  ]
}
My problem is:
If my index holds 1k (1,000) entities, a search takes 1 unit of time; at 10k it takes 7 units; at 50k, 53 units; BUT at about 500k it takes 4,000 units! And performance degrades further with 1,000,000 entities in the index.
Scaling up my machine (more memory, more CPUs) has helped, but it cannot be the only solution; I already have a 12-core machine with 16GB of RAM (8GB of which is dedicated to ES).
The problem is that, beyond a certain size, the time needed for a search grows exponentially rather than linearly.
And my questions:
Are there any general best practices to avoid such an awful performance decay?
Is an exponential decay of search performance (even for a single search) normal as the index grows?
Is the parent-child mapping in some way deleterious for an ES index?
Environment:
Linux machine, ES 1.7.4
Many thanks, and sorry if I haven't been clear.