Hi,
I have the following environment:
10 ES data nodes, each with 8 cores, 30 GB of RAM and really good hard
drives, -Xms18000m -Xmx18000m, default thread pools (in this case 24
threads for search operations)
2 ES dedicated master nodes: 8 cores, 30 GB of RAM and really good hard
drives (though the drives are not relevant for these nodes), -Xms18000m
-Xmx18000m, default thread pools (in this case 24 threads for search
operations)
4 Tomcat 7 instances, each running a webapp with a node client that
connects to the ES cluster to send queries: 14 GB of RAM, 4 cores, 250
threads for Tomcat, -Xms7000m -Xmx7000m
1 HAProxy instance acting as a load balancer in front of the 4 Tomcat
instances.
I have indexed ~1 billion documents, distributed across 10 shards with 0
replicas, at an insane rate: from 70,000 to 100,000 docs/s. I increased
the number of replicas to 2 afterwards (within 1 hour, 1 replica had been
added). The documents are quite small:
{
  "field1": "13446",                                  // 5 digits
  "date1": "24/10/2013 03:22 AM",                     // date
  "field2": "3502",                                   // 4 digits
  "field3": "5310",                                   // 4 digits
  "date2": "02/04/2012 01:21 AM",                     // date
  "field4": "4f3dce61-1d6c-418f-877b-5419a043bd42",   // UUID
  "field5": "2890",                                   // 4 digits
  "obj": {
    "objfield1": "761532940881576",                   // 15 digits
    "objfield2": "231806579463504",                   // 15 digits
    "objfield3": "879",                               // 3 digits
    "objfield4": "416",                               // 3 digits
    "objfield5": "14"                                 // 2 digits
  }
}
In the mapping, all the fields are strings, apart from the two dates,
even though they are numbers in real life.
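For reference, the relevant part of the mapping looks roughly like this (a sketch, abbreviated to a few fields; the type name "doc" and the Joda date format are my assumptions, everything else is as described above):

```json
{
  "mappings": {
    "doc": {
      "properties": {
        "field1": { "type": "string" },
        "date1":  { "type": "date", "format": "dd/MM/yyyy hh:mm a" },
        "field4": { "type": "string" },
        "obj": {
          "properties": {
            "objfield4": { "type": "string" },
            "objfield5": { "type": "string" }
          }
        }
      }
    }
  }
}
```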
I ran queries using 800 concurrent threads from 4 different JMeter
machines, 200 threads per machine (these machines are also really
powerful).
The queries are built by the webapps using the Java API and look like the
one below (I use filters and try to take advantage of the cache). Each
query is a different combination of at most 3 of the fields, plus ranges
over the 2 dates.
GET index/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "fquery": {
          "query": {
            "bool": {
              "must": [
                {
                  "constant_score": {
                    "filter": {
                      "fquery": {
                        "query": {
                          "query_string": {
                            "query": "obj.objfield4:416"
                          }
                        },
                        "_cache": true
                      }
                    }
                  }
                },
                {
                  "constant_score": {
                    "filter": {
                      "fquery": {
                        "query": {
                          "query_string": {
                            "query": "obj.objfield5:29"
                          }
                        },
                        "_cache": true
                      }
                    }
                  }
                }
              ]
            }
          },
          "_cache": true
        }
      }
    }
  }
}
The response times are outrageous, between 20 and even 100 seconds, and
the 30 shards (10 primaries plus 2 replicas each) are evenly distributed
across the nodes.
What am I doing wrong here? I would expect results below 3 seconds.
Should I map the fields as numbers instead of strings? Should I remove
the query_string and use a term filter there?
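The term-filter variant I have in mind would look roughly like this (a sketch, not tested; same field names and values as in the query above, with the query_string/fquery wrappers replaced by cached term filters inside a bool filter):

```json
GET index/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "bool": {
          "must": [
            { "term": { "obj.objfield4": "416", "_cache": true } },
            { "term": { "obj.objfield5": "29",  "_cache": true } }
          ]
        }
      }
    }
  }
}
```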