Just wanted to close the loop on this in case anyone stumbles upon the same
issue.
After upgrading to version 1.3.2, which includes the performance increase
from pull request #5846 (Make use of global ordinals in parent/child queries,
by martijnvg, elastic/elasticsearch), we saw a dramatic decrease in
parent/child query latency. Our queries now execute in under 150ms, which is
manageable for now, and we will be eagerly awaiting further improvements from
the work Clinton highlighted in issue #7394 (Add option to eagerly build
global ordinals for parent-child ID cache, elastic/elasticsearch).
Along the way we got a little confused because we attempted to do our
troubleshooting on 1 data node in order to keep things simple. This led to
some misplaced assumptions about the performance increases that came from
the work released in 1.2.0. In our testing on a single node, we did not
observe any latency decrease when going from 1.1.2 to 1.3.2. However, when
we changed our test cluster to use two data nodes, we saw a huge
improvement. So my earlier assertion that we were not seeing those
improvements in version 1.3.2 was incorrect, although I'm still confused as
to why a single-node configuration did not benefit.
In any case, I wanted to thank the ES developers for being generous with
their time in helping us track this issue down. Now that I realize the
incredible pace at which ES versions are released, we'll be much more
vigilant about keeping up.
Thanks again!
On Monday, August 25, 2014 11:32:38 AM UTC-4, Mark Greene wrote:
Hey Clinton,
Thanks for the heads up on what's on the horizon. That definitely sounds
like a drastic improvement. That being said, my fear here is that even with
that improvement, this data model (parent/child) doesn't seem to be that
performant with a moderate number of documents. In order for us to really
adopt this methodology of using parent/child, we'd expect to see sub 100ms
performance so long as we were feeding ES with enough RAM.
My hunch here is there must be some code path that is hit when running on
more than 1 data node that either doesn't write to the cache or skips it on
the read and hits the disk. We don't have a ton of load on our data nodes,
CPU is well under 30% and IOWait is usually under 0.30.
Just to reiterate: when we run the parent/child query on one data node, it
runs in less than 100ms; when it runs across two data nodes, it takes >10s.
We are seeing this on both version 1.1.2 and 1.3.2.
On Monday, August 25, 2014 10:55:15 AM UTC-4, Clinton Gormley wrote:
Something else to note: parent-child now uses global ordinals to make
queries 3x faster than they were previously, but global ordinals need to be
rebuilt after the index has refreshed (assuming some data has changed).
Currently there is no way to refresh p/c global ordinals "eagerly" (i.e.
during the refresh phase), so it happens on the first query after a
refresh. 1.3.3 and 1.4.0 will include an option to allow eager building of
global ordinals, which should remove this latency spike: issue #7394 (Add
option to eagerly build global ordinals for parent-child ID cache,
elastic/elasticsearch).
You may want to consider increasing the refresh_interval so that global
ordinals remain valid for longer.
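
As a rough sketch of the refresh_interval suggestion above: the update-settings request could be built along these lines. The index name "my_index" and the value "30s" are placeholders chosen for illustration, not values from this thread, and the helper only builds the request rather than sending it.

```python
import json


def refresh_interval_request(index, interval):
    """Return (method, path, body) for an index settings update.

    `index` and `interval` are illustrative placeholders; pick values
    that match your own index and indexing rate.
    """
    body = {"index": {"refresh_interval": interval}}
    return "PUT", "/{}/_settings".format(index), json.dumps(body, sort_keys=True)


method, path, body = refresh_interval_request("my_index", "30s")
print(method, path, body)
```

A longer interval means newly indexed documents take longer to become searchable, so this trades freshness for fewer global ordinal rebuilds.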
On 25 August 2014 16:48, Mark Greene ma...@evertrue.com wrote:
Hi Adrien,
Thanks for reaching out.
We actually were excited to see the performance improvements stated in
the 1.2.0 release notes, so we upgraded to 1.3.2. We saw some performance
improvement, but it wasn't orders of magnitude, and queries are still running
very slowly.
We also tried your suggestion of using the 'preference=_local' query
param but we didn't see any difference there. Additionally, running the
query 10 times, we saw no improvement in speed.
Currently, the only major performance increase we've seen with
parent/child queries comes from dropping down to 1 data node, at which point
we see queries executing well under the 100ms mark.
On Friday, August 22, 2014 6:42:27 PM UTC-4, Adrien Grand wrote:
Hi Mark,
Given that you had 1 replica in your first setup, it could take several
queries to warm up the field data cache completely. Does the query still
take 16 seconds to run if you run it e.g. 10 times? (3 should be enough, but
just to be sure.)
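
One way to check the warm-up behaviour described above is to time the same query over several runs. This is only a sketch: `fake_query` is a hypothetical stand-in for whatever client call actually sends the search, and the run count of 10 simply mirrors the number mentioned above.

```python
import time


def time_runs(run_query, runs=10):
    """Call run_query `runs` times and return each call's latency in ms."""
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        run_query()
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def fake_query():
    # Hypothetical stand-in: replace with the real search request.
    time.sleep(0.001)


for i, ms in enumerate(time_runs(fake_query, runs=10), start=1):
    print("run {:2d}: {:.1f} ms".format(i, ms))
```

If the later runs are no faster than the first, cache warm-up is probably not the explanation.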
Does it change anything if you query elasticsearch with
preference=_local? This should be equivalent to your single-node setup, so
it would be interesting to see if that changes something.
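
For illustration, the preference value goes on the search URL as a query-string parameter. The host "localhost:9200" and index name "my_index" below are assumptions, not details from this thread, and the helper only builds the URL.

```python
from urllib.parse import urlencode


def search_url(host, index, preference=None):
    """Build a _search URL, optionally adding the preference parameter."""
    params = {}
    if preference is not None:
        params["preference"] = preference
    query = "?" + urlencode(params) if params else ""
    return "http://{}/{}/_search{}".format(host, index, query)


print(search_url("localhost:9200", "my_index", preference="_local"))
```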
As a side note, you might want to try out a more recent version of
Elasticsearch, since parent/child performance improved quite significantly
in 1.2.0 because of pull request #5846 (elastic/elasticsearch, pull/5846).
On Fri, Aug 22, 2014 at 11:15 PM, Mark Greene ma...@evertrue.com
wrote:
I wanted to update the list with an interesting piece of information.
We found that when we took one of our two data nodes out of the cluster,
leaving just one data node with no replicas, the query performance
increased dramatically. The queries are now returning in <100ms on
subsequent executions which is what we'd expect to see as a result of the
data being stored in the field data cache.
Is it possible that there is some kind of inefficient code path when a
query is spread across primary and replica shards?
On Thursday, August 21, 2014 3:53:40 PM UTC-4, Mark Greene wrote:
We are experiencing slow parent/child queries even when we run the
query a second time, and I wanted to know if this is just the limit of this
feature within Elasticsearch. According to the ES docs
(parent-child-performance.html), parent/child queries can be
5-10x slower and consume a lot of memory.
My impression has been that as long as we give ES enough memory via
the field data cache, subsequent executions of a query would be quicker than
the first. We are seeing the following query take ~16 seconds to
complete every time.
{
  "from": 0,
  "size": 100,
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "bool": {
          "must": [
            {
              "term": {
                "oid": 61
              }
            },
            {
              "has_child": {
                "type": "social",
                "query": {
                  "bool": {
                    "should": [
                      {
                        "term": {
                          "engagement.type": "like"
                        }
                      },
                      {
                        "term": {
                          "content.remote_id": "20697868961_10152270678178962"
                        }
                      }
                    ]
                  }
                }
              }
            }
          ]
        }
      }
    }
  },
  "fields": "id",
  "sort": [
    {
      "_score": {}
    },
    {
      "id": {
        "order": "asc"
      }
    }
  ]
}
The index we are testing this on (5 shards, each with 1 replica) has
2.2 million parent documents and 1.1 million child documents.
We are running our two data nodes on r3.2xlarge instances, which have 8 CPUs,
60GB of RAM, and SSD storage.
Our ES data nodes have 30GB of heap; the field data cache is only
consuming around 3GB right now and there are no cache evictions. The field
data cache is also allowed to grow to 75% of the available heap.
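
To put those numbers side by side, here is a small back-of-the-envelope calculation. It assumes the 75% limit applies to the full 30GB heap, which is an interpretation of the figures above rather than something measured.

```python
# Figures quoted above; the limit is assumed to apply to the whole heap.
heap_gb = 30.0
fielddata_limit_fraction = 0.75
fielddata_used_gb = 3.0

limit_gb = heap_gb * fielddata_limit_fraction
used_share = fielddata_used_gb / limit_gb

print("field data limit: {:.1f} GB".format(limit_gb))
print("share of limit in use: {:.0%}".format(used_share))
```

Under that assumption the cache is using only a small fraction of what it is allowed, which fits with seeing no evictions.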
I'm looking to understand whether this is a limitation of parent/child,
or whether there is additional configuration beyond the defaults
that would help speed these queries up.
--
You received this message because you are subscribed to the Google
Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send
an email to elasticsearc...@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/a6442545-edc0-4e21-9696-925aae517762%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
--
Adrien Grand