Hi ES community,
Searching and retrieving inner hits (https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-inner-hits.html) is extremely slow!
Any tips to expedite the search?
Thanks!
Parul
Is this happening for parent-child or nested searches?
@NerdSec This is a nested search.
Can you add some more details, because this is a very generic question:
Elasticsearch version
curl -XGET 'localhost:9200'
{
  "status" : 200,
  "name" : "Gaea",
  "cluster_name" : "v1-cluster",
  "version" : {
    "number" : "1.7.6",
    "build_hash" : "c730b59357f8ebc555286794dcd90b3411f517c9",
    "build_timestamp" : "2016-11-18T15:21:16Z",
    "build_snapshot" : false,
    "lucene_version" : "4.10.4"
  },
  "tagline" : "You Know, for Search"
}
Specific query you are running, maybe even a (simplified) mapping
ES query
peak_results = es.search(body=query,
                         index=chromosome,
                         doc_type=assembly,
                         size=99999)
About the function that builds the peak query:
query = get_query(start, end, within_region=region_inside_status)
def get_query(start, end, with_inner_hits=True, within_region=True):
    """
    Return the peak query.

    Note: within_region is accepted but not used in the code as posted.
    """
    query = {
        'query': {
            'filtered': {
                'filter': {
                    'nested': {
                        'path': 'location',
                        'filter': {
                            'bool': {
                                'should': []
                            }
                        }
                    }
                },
                '_cache': True,
            }
        },
        '_source': False,
    }
    # Four (start, end) pairs; each one becomes a separate should clause.
    search_ranges = {
        'inside_range': {'start': start, 'end': end},
        'range_inside': {'start': end, 'end': start},
        'overlap_start_range': {'start': start, 'end': start},
        'overlap_end_range': {'start': end, 'end': end},
    }
    should = query['query']['filtered']['filter']['nested']['filter']['bool']['should']
    for value in search_ranges.values():
        # get_bool_query() is defined elsewhere; it is not shown in this thread.
        should.append(get_bool_query(value['start'], value['end']))
    if with_inner_hits:
        # Return up to 99999 matching nested location entries per top-level hit.
        query['query']['filtered']['filter']['nested']['inner_hits'] = {'size': 99999}
    return query
What slow means specifically (in general, in comparison to other queries,...)
To explain the slowness in our use case, here is an overview of the mapping. The query is executed against the nested object location:
'location': {
    'type': 'nested',
    'properties': {
        'start': {'type': 'long'},
        'end': {'type': 'long'},
        'state': {'type': 'string'},
        'val': {'type': 'string'}
    }
}
The wider the range between the start and end parameters, the longer the search takes. For example, start = 1 and end = 100 takes 1 second to return the location information, whereas start = 1 and end = 10000 takes 60 seconds.
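One way to pin down where those 60 seconds go is to compare, for a narrow and a wide range, the server-side took value with how many nested inner hits come back and how large the response is. This is a minimal sketch, assuming the response dict has the 1.x shape returned by es.search (took, hits.hits, and an inner_hits entry per hit keyed by the nested path location); the helper name summarize_response is hypothetical.

```python
import json


def summarize_response(resp, inner_hits_name="location"):
    """Summarize a search response (hypothetical helper; 1.x response shape assumed)."""
    hits = resp.get("hits", {}).get("hits", [])
    # Count the nested documents returned as inner hits across all top-level hits.
    inner_count = 0
    for hit in hits:
        inner = hit.get("inner_hits", {}).get(inner_hits_name, {})
        inner_count += len(inner.get("hits", {}).get("hits", []))
    return {
        # Time reported by Elasticsearch itself, in milliseconds.
        "took_ms": resp.get("took"),
        "top_level_hits": len(hits),
        "inner_hits_returned": inner_count,
        # Rough payload size: byte length of the re-serialized JSON (UTF-8).
        "response_bytes": len(json.dumps(resp).encode("utf-8")),
    }
```

Running it on the responses for start = 1, end = 100 and start = 1, end = 10000 should show roughly whether the extra time is spent inside Elasticsearch (a large took) or in transferring and parsing a very large response on the client side (a small took but a huge response_bytes and inner_hits_returned).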
Let me know what you think.
Best Regards,
Parul
The bad news is that your Elasticsearch version is ancient: it hasn't been supported for quite a while, and it lacks some very helpful tools for finding performance issues, such as the Profile API.
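For reference, once on a version that has it (the Profile API arrived in Elasticsearch 5.0 and is not available on 1.7), profiling a search only requires adding "profile": true to the request body. This is a minimal sketch; the helper names with_profile and slowest_queries are hypothetical, and it assumes the documented response layout profile.shards[].searches[].query[], where each node carries type, description, time_in_nanos and optional children.

```python
import copy


def with_profile(body):
    """Return a copy of a search body with profiling enabled (Elasticsearch 5.0+)."""
    profiled = copy.deepcopy(body)
    profiled["profile"] = True
    return profiled


def slowest_queries(resp, top=5):
    """Flatten the per-shard query trees of a profiled response, slowest first."""
    rows = []

    def walk(node, shard_id):
        # Record this node, then recurse into its sub-queries.
        rows.append((node.get("time_in_nanos", 0), node.get("type"),
                     node.get("description"), shard_id))
        for child in node.get("children", []):
            walk(child, shard_id)

    for shard in resp.get("profile", {}).get("shards", []):
        for search in shard.get("searches", []):
            for root in search.get("query", []):
                walk(root, shard.get("id"))
    rows.sort(key=lambda row: row[0], reverse=True)
    return rows[:top]
```

Usage would look like slowest_queries(es.search(index=..., body=with_profile(query))), which lists the most expensive query components (for a nested query, typically the block-join query and its children) together with the shard they ran on.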
The good news is that inner hits should have improved quite a bit in more recent versions. We run a lot of performance benchmarks, and the one you'll be most interested in is probably the nested benchmark across major versions, especially the inner hits results at the end:
You'll have to update sooner or later, but this will probably require quite a lot of work: rewriting queries and (remote) reindexing the data.
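To give a sense of what rewriting the query involves: the filtered query was removed in 5.0, and a nested query takes a query rather than a filter, so in current versions the search is built from nested and bool/filter clauses instead. Inner hits size is also capped by the index setting index.max_inner_result_window (100 by default), so size 99999 would be rejected unless that setting is raised. This is a minimal sketch, not a drop-in replacement: it assumes the intent of the original four should clauses is "return location entries that overlap [start, end]" (get_bool_query is not shown, so this is a guess), and get_overlap_query is a hypothetical name.

```python
def get_overlap_query(start, end, inner_hits_size=100):
    """Hypothetical rewrite for Elasticsearch 5.0+ (no 'filtered' query)."""
    return {
        "_source": False,
        "query": {
            "nested": {
                "path": "location",
                "query": {
                    "bool": {
                        # filter context: no scoring, and the clauses are cacheable.
                        "filter": [
                            # A stored interval overlaps [start, end] iff it starts
                            # no later than end and ends no earlier than start.
                            {"range": {"location.start": {"lte": end}}},
                            {"range": {"location.end": {"gte": start}}},
                        ]
                    }
                },
                # Must stay within index.max_inner_result_window (default 100).
                "inner_hits": {"size": inner_hits_size},
            }
        },
    }
```

With mapping types deprecated in 7.x and removed in 8.x, the call would drop doc_type, e.g. es.search(index=chromosome, body=get_overlap_query(1, 10000)).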
Not sure about quick wins. The for loop that builds the should criteria looks dangerous to me. Also, {'size': 99999} for inner hits could be an issue: do you really need that much data? What's the total size of the response document? But even tweaks here won't save you from the update in the long run.
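On 1.7 itself, one possible quick win along those lines is to replace the loop that builds four should clauses with a single overlap condition, and to request far fewer inner hits. This is a minimal sketch, assuming the four should clauses together are meant to match any location interval that overlaps [start, end] (get_bool_query isn't shown in the thread, so that is an assumption); overlaps and get_overlap_filter_query are hypothetical names, and the structure (filtered query, nested filter carrying inner_hits) mirrors the original get_query.

```python
def overlaps(loc_start, loc_end, start, end):
    """True if the closed interval [loc_start, loc_end] intersects [start, end]."""
    return loc_start <= end and loc_end >= start


def get_overlap_filter_query(start, end, inner_hits_size=100):
    """Hypothetical ES 1.x query: same shape as get_query, one overlap condition."""
    # Normalize so the condition still works if the caller passes end < start.
    lo, hi = min(start, end), max(start, end)
    return {
        "query": {
            "filtered": {
                "filter": {
                    "nested": {
                        "path": "location",
                        "filter": {
                            "bool": {
                                # Both range filters must match: the same test as overlaps().
                                "must": [
                                    {"range": {"location.start": {"lte": hi}}},
                                    {"range": {"location.end": {"gte": lo}}},
                                ]
                            }
                        },
                        "inner_hits": {"size": inner_hits_size},
                    }
                }
            }
        },
        "_source": False,
    }
```

Two range filters combined with must cover the "inside", "containing" and both partial-overlap cases at once, so there is no need for one clause per case. Picking a realistic inner_hits_size (and a realistic size on the outer es.search call instead of 99999) keeps the response from growing with the width of the range.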
@xeraa Thanks a lot! We will keep the community posted on our implementation and any speed improvements.
© 2020. All Rights Reserved - Elasticsearch