Huge response time for simple queries in an uber environment

Hi,

I have the following environment:
10 ES data nodes, each with 8 cores, 30 GB of RAM and really good hard
drives, -Xms18000m -Xmx18000m, default thread pools (in this case 24
threads for search operations)
2 ES dedicated master nodes: 8 cores, 30 GB of RAM and really good hard
drives (though the drives are not relevant for these nodes), -Xms18000m
-Xmx18000m, default thread pools (in this case 24 threads for search
operations)
4 Tomcat 7 instances, each running a webapp with a node client that
connects to the ES cluster to send queries: 14 GB of RAM, 4 cores, 250
threads for Tomcat, -Xms7000m -Xmx7000m
1 HAProxy instance acting as a load balancer in front of the 4 Tomcat
instances.
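
The node roles are split the usual way; roughly this in elasticsearch.yml
(a sketch, everything else left at the 1.x defaults):

# data nodes
node.master: false
node.data: true

# dedicated master nodes
node.master: true
node.data: false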

I have indexed ~1 billion documents, distributed across 10 shards with 0
replicas, at an insane rate of 70,000 to 100,000 docs/s. Afterwards I
increased the replica count to 2 (after 1h, 1 replica had been added).
The documents are quite small:

{
  "field1": "13446",                                // 5 digits
  "date1": "24/10/2013 03:22 AM",                   // date
  "field2": "3502",                                 // 4 digits
  "field3": "5310",                                 // 4 digits
  "date2": "02/04/2012 01:21 AM",                   // date
  "field4": "4f3dce61-1d6c-418f-877b-5419a043bd42", // UUID
  "field5": "2890",                                 // 4 digits
  "obj": {
    "objfield1": "761532940881576",                 // 15 digits
    "objfield2": "231806579463504",                 // 15 digits
    "objfield3": "879",                             // 3 digits
    "objfield4": "416",                             // 3 digits
    "objfield5": "14"                               // 2 digits
  }
}

In the mapping, all the fields are strings (plus the 2 dates), even
though they are numbers in real life.
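
For reference, the mapping is roughly this shape (a sketch with most
fields omitted; the type name "doc" and the exact date format are
assumptions on my part):

PUT index
{
  "mappings": {
    "doc": {
      "properties": {
        "field1": { "type": "string" },
        "date1":  { "type": "date", "format": "dd/MM/yyyy hh:mm a" },
        "obj": {
          "properties": {
            "objfield4": { "type": "string" },
            "objfield5": { "type": "string" }
          }
        }
      }
    }
  }
}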

I ran queries using 800 concurrent threads from 4 different JMeter
machines, 200 threads per machine (these machines are also really
powerful).

The queries, built by the webapps using the Java API, look like this (I
use filters and try to take advantage of the filter cache). The queries
are different combinations of at most 3 of the fields, plus ranges over
the 2 dates.

GET index/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "fquery": {
          "query": {
            "bool": {
              "must": [
                {
                  "constant_score": {
                    "filter": {
                      "fquery": {
                        "query": {
                          "query_string": {
                            "query": "obj.objfield4:416"
                          }
                        },
                        "_cache": true
                      }
                    }
                  }
                },
                {
                  "constant_score": {
                    "filter": {
                      "fquery": {
                        "query": {
                          "query_string": {
                            "query": "obj.objfield5:29"
                          }
                        },
                        "_cache": true
                      }
                    }
                  }
                }
              ]
            }
          },
          "_cache": true
        }
      }
    }
  }
}
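
The date-range clauses (not shown above) follow the same pattern; here is
a sketch of a single clause using the 1.x cached range filter, with
illustrative dates:

{
  "constant_score": {
    "filter": {
      "range": {
        "date1": {
          "gte": "01/01/2013 12:00 AM",
          "lte": "31/12/2013 11:59 PM"
        },
        "_cache": true
      }
    }
  }
}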

The response times are outrageous, between 20 and even 100 seconds, and I
have 30 shards evenly distributed across the nodes.

What am I doing wrong here? I would expect response times below 3 seconds.

Should I map the fields as numbers instead of strings? Should I remove
the query_string and use a term there?


Did you monitor the memory usage?

If memory is running low, try to avoid caching (unless you always query
for "416" and "29"), check whether you can replace "query_string" with
"term", and drop the boolean clauses that wrap a single term, since they
can be simplified away.

Also note that you only need one "constant_score" per query, because
there is no such thing as multiple scores within a single query.

A simplification would look like this:

GET index/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "bool": {
          "must": [
            { "term": { "obj.objfield4": "416" } },
            { "term": { "obj.objfield5": "29" } }
          ]
        }
      }
    }
  }
}
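
Since the webapp builds its queries with the Java API, the 1.x builder
equivalent would be roughly this (a sketch: it assumes an existing Client
instance named "client", and the term values are strings because that is
how the fields are mapped):

import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.index.query.FilterBuilders;
import org.elasticsearch.index.query.QueryBuilders;

// constant_score wrapping a bool filter of two cheap term filters
SearchResponse response = client.prepareSearch("index")
    .setQuery(QueryBuilders.constantScoreQuery(
        FilterBuilders.boolFilter()
            .must(FilterBuilders.termFilter("obj.objfield4", "416"))
            .must(FilterBuilders.termFilter("obj.objfield5", "29"))))
    .execute().actionGet();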

Jörg


Marvel tells me that JVM heap usage is around 75% on all the data nodes
(about 12 GB) and the CPU is constantly at 90-95%. Are you saying that,
because my queries are all quite different, I am not really benefiting
from the cache, and ES will cache the filter results but evict them very
quickly? Something like: the results get stored in the cache, but for
nothing?
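
If it helps, the filter cache size and eviction counts can be inspected
with the 1.x stats APIs (a sketch; "index" stands in for the real index
name):

GET /_nodes/stats/indices
GET /index/_stats/filter_cache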
