ES Query bottleneck

Hi,

I'm trying to figure out the bottleneck in query performance. I'm running ES
0.90 on an 8 GB, 4-core machine. The index size is 38 GB with 10m documents in
4 shards.

On running a query, I'm getting 2k results in around 21 seconds. On firing
the same query again after clearing the ES cache, I'm getting the results in
around 16 seconds. Disk IO is not the bottleneck, as iostat shows little
activity. I've also tried with the index on an SSD, which gives results
in around 18 seconds. The maximum CPU utilization is also only around 40% (of 4
cores).

What might be the reason for this?

Thanks,
Anand

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

You're saying you are getting 2k results from ES. Can you explain why you
need so many? If you only request 10 documents, what's the response time?

simon
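
One way to run the comparison asked for above is to time the same search body at different `size` values. This is a minimal sketch in Python, assuming a hypothetical `send` callable that posts a body to the index's `_search` endpoint and returns once the response arrives. Neither `send`, `with_size` nor `time_query` comes from the thread.

```python
import copy
import time


def with_size(body, size):
    """Return a copy of a search body with the top-level "size" set."""
    sized = copy.deepcopy(body)
    sized["size"] = size
    return sized


def time_query(send, body, sizes=(10, 2000)):
    """Call `send` once per requested size; returns {size: seconds}."""
    timings = {}
    for size in sizes:
        request = with_size(body, size)
        start = time.perf_counter()
        send(request)  # hypothetical: POSTs `request` to /<index>/_search
        timings[size] = time.perf_counter() - start
    return timings


if __name__ == "__main__":
    # Stand-in for a real HTTP call so the sketch runs without a cluster.
    def fake_send(request):
        return {"hits": {"hits": [None] * request["size"]}}

    print(time_query(fake_send, {"query": {"match_all": {}}}))
```

If the timings for 10 and 2000 hits are close, fetching the documents is probably not where the time goes.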

Even for 10 results I'm getting a similar response time. My query is:

{
  "query" : {
    "has_child" : {
      "query" : {
        "match" : {
          "content" : {
            "query" : "microsoft",
            "type" : "phrase"
          }
        }
      },
      "child_type" : "content",
      "score_type" : "none"
    }
  },
  "filter" : {
    "and" : {
      "filters" : [ {
        "term" : {
          "c76" : "http"
        }
      }, {
        "terms" : {
          "c150" : [ "13" ]
        }
      } ]
    }
  }
}

If you are using parent/child queries, they load an id cache on the first
query. It is recommended that you use warmers to load this cache before
running the actual query. Because you are clearing the cache during
testing, the slower response times are expected: each run pays for loading
the cache again. On top of that, you should switch from the and filter to
a bool filter.

Thanks,
Matt Weber
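
Both suggestions above can be sketched in Python. The first function rewrites an `and` filter into a `bool` filter with `must` clauses. The second builds a warmer registration request but does not send it. The `PUT /<index>/_warmer/<name>` path is the 0.90-era form as best recalled here and should be checked against the documentation for the version in use. The index name `docs`, the warmer name `has_child_warmer`, and both function names are assumptions, not something from the thread.

```python
import json


def and_to_bool(filter_clause):
    """Rewrite {"and": ...} as {"bool": {"must": [...]}}; leave others alone."""
    if not isinstance(filter_clause, dict) or "and" not in filter_clause:
        return filter_clause
    inner = filter_clause["and"]
    # The "and" filter is written either as a bare list or as {"filters": [...]}.
    clauses = inner["filters"] if isinstance(inner, dict) else inner
    return {"bool": {"must": [and_to_bool(c) for c in clauses]}}


def warmer_request(index, name, search_body):
    """Build (method, path, body) for registering a warmer; nothing is sent."""
    return "PUT", "/%s/_warmer/%s" % (index, name), json.dumps(search_body)


# The search from the thread, with only the "and" filter rewritten.
search = {
    "query": {
        "has_child": {
            "query": {
                "match": {"content": {"query": "microsoft", "type": "phrase"}}
            },
            "child_type": "content",
            "score_type": "none",
        }
    },
    "filter": and_to_bool(
        {
            "and": {
                "filters": [
                    {"term": {"c76": "http"}},
                    {"terms": {"c150": ["13"]}},
                ]
            }
        }
    ),
}

if __name__ == "__main__":
    # Hypothetical index and warmer names, for illustration only.
    method, path, body = warmer_request("docs", "has_child_warmer", search)
    print(method, path)
    print(json.dumps(search["filter"], indent=2))
```

Registering the parent/child search itself as the warmer is one option; whether that is enough to keep the id cache warm on a given setup would need to be measured.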

On top of Matt's recommendations, I also recommend upgrading to ES
0.90.3, which includes three changes that improve performance and
reduce memory usage:

  1. optimize has_child query when matching parent count is low (elastic/elasticsearch issue #3190)
  2. Parent-Child: Improve memory usage id cache (elastic/elasticsearch issue #3028)
  3. Parent-Child: Improve has_parent & has_child filter execution (elastic/elasticsearch issue #3034)

Martijn

I upgraded to 0.90.3 and that alone improved the performance by around 45%.

Thanks, Martijn.

Hi Matt,

I tried different lengths of document ids. My initial reading of 21s
was with 32-character hex strings as ids. When I used 16-character hex
strings, I got results in around 15s, which is almost a 30% improvement.

I was wondering if I can use a long value as the id. I know that I can map
_id to some long value in the document, but will it store the id as a long or
not? All the Java APIs expose the id as a String, so will a long value like
9223372036854775807L be stored as an 8-byte long or as a 19-character string?

Thanks,
Anand
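
The numbers in this post can be checked with a few lines of Python. The sketch below only computes the relative improvement and the raw sizes of the two id representations being compared. It says nothing about how Elasticsearch itself stores `_id`. The helper name `improvement` is made up for illustration.

```python
import struct


def improvement(before, after):
    """Relative reduction in response time, as a fraction of `before`."""
    return (before - after) / before


# 21 s with 32-character hex ids vs. 15 s with 16-character hex ids.
print(round(improvement(21, 15) * 100, 1))  # 28.6

max_long = 9223372036854775807  # Long.MAX_VALUE in Java, i.e. 2**63 - 1

as_text = str(max_long).encode("ascii")  # decimal digits, one byte each
as_long = struct.pack(">q", max_long)    # signed 64-bit big-endian integer

print(len(as_text))  # 19
print(len(as_long))  # 8
```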
