How to make Elasticsearch querying faster

Hi, I need to query a lot of data in a given Elasticsearch index. I cannot set fielddata: true on the required fields, because it would increase the size of the in-memory cache. Currently the query takes approximately 10 minutes for a particular application, because I have to split it into partitions: running it as a single query gives an OutOfMemory error. I want to reduce the time it takes to query the indices. Any suggestions?

What is the query? What does the data look like? How many indices and shards are you querying? How much data do these hold?

The query is:

{
  "from": 0,
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "bool": {
            "must": [
              {
                "match_phrase": {
                  "app_id": {
                    "query": "APPID"
                  }
                }
              },
              {
                "range": {
                  "collector_tstamp": {
                    "from": "FROMDATE",
                    "to": "TODATE"
                  }
                }
              }
            ]
          }
        }
      ]
    }
  },
  "aggregations": {
    "page_urlpath": {
      "terms": {
        "field": "page_urlpath.keyword",
        "size": 2147483647,
        "include": {
          "partition": "PARTITION_NUMBER",
          "num_partitions": "TOTAL_PARTITIONS"
        }
      },
      "aggregations": {
        "visitors": {
          "cardinality": {
            "field": "domain_userid.keyword",
            "precision_threshold": 40000
          }
        },
        "visits": {
          "cardinality": {
            "field": "domain_sessionid.keyword",
            "precision_threshold": 40000
          }
        },
        "number_of_events": {
          "value_count": {
            "field": "_index"
          }
        }
      }
    }
  }
}
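A minimal Python sketch of how the templated query above might be generated once per partition, assuming client code fills in the APPID, FROMDATE, TODATE, PARTITION_NUMBER and TOTAL_PARTITIONS placeholders. The function name build_partition_query, the app id "my-app" and the dates are hypothetical. The sketch passes partition numbers as integers, uses the documented gte/lte range bounds instead of the older from/to (both inclusive by default), flattens the nested bool into a single filter list (equivalent in filter context), bounds the terms size, and leaves out the value_count on _index, because every terms bucket already reports a doc_count.

```python
import json


def build_partition_query(app_id, from_date, to_date, partition, num_partitions, terms_size):
    """Build the search body for one partition (hypothetical helper)."""
    return {
        "size": 0,  # only aggregations are needed, no hits
        "query": {
            "bool": {
                "filter": [
                    {"match_phrase": {"app_id": {"query": app_id}}},
                    # gte/lte are inclusive, like the default from/to bounds
                    {"range": {"collector_tstamp": {"gte": from_date, "lte": to_date}}},
                ]
            }
        },
        "aggregations": {
            "page_urlpath": {
                "terms": {
                    "field": "page_urlpath.keyword",
                    "size": terms_size,  # bounded instead of 2147483647
                    "include": {"partition": partition, "num_partitions": num_partitions},
                },
                "aggregations": {
                    "visitors": {"cardinality": {"field": "domain_userid.keyword", "precision_threshold": 40000}},
                    "visits": {"cardinality": {"field": "domain_sessionid.keyword", "precision_threshold": 40000}},
                },
            }
        },
    }


# Hypothetical values: 10 partitions, at most 400 terms per partition.
bodies = [build_partition_query("my-app", "2020-01-01", "2020-01-31", p, 10, 400) for p in range(10)]
print(json.dumps(bodies[0], indent=2))
```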

It queries the entire month at once. The month has 6392551 documents, and there are 3800 distinct buckets for the page_urlpath field. The cluster has 2 nodes, with 1013 shards per node.
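A quick calculation puts these numbers in perspective. It assumes the month's 6392551 documents are spread evenly over all 2 × 1013 shards, which the post does not state, so it only indicates the order of magnitude; docs_per_shard is a hypothetical helper.

```python
def docs_per_shard(total_docs, nodes, shards_per_node):
    """Total shard count and average documents per shard, assuming an even spread."""
    total_shards = nodes * shards_per_node
    return total_shards, total_docs / total_shards


total_shards, avg = docs_per_shard(6_392_551, 2, 1013)
print(total_shards, round(avg))
```

Around 3,000 documents per shard is very little, so a query that touches every shard likely spends much of its time on per-shard overhead rather than on the documents themselves.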

You should never set the size parameter to unnecessarily large values as it will use a lot of heap. See this old blog post for a discussion on this.
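Instead of 2147483647, the terms size can be derived from the number of distinct values and the number of partitions. A minimal sketch, assuming the roughly 3800 distinct page_urlpath values mentioned above and a hypothetical cap of 500 terms per request. The 1.25 headroom factor is also an assumption: terms are assigned to partitions by hashing, so partitions are not exactly equal in size. A sum_other_doc_count above 0 in a partition's response means some terms did not fit into the returned buckets. Both helper names are hypothetical.

```python
import math


def partition_plan(distinct_terms, max_terms_per_request, headroom=1.25):
    """Pick num_partitions and a bounded terms size (hypothetical helper)."""
    num_partitions = math.ceil(distinct_terms / max_terms_per_request)
    # Partitions are filled by hashing terms, so allow some room above the average.
    size = math.ceil(distinct_terms / num_partitions * headroom)
    return num_partitions, size


def size_was_too_small(response):
    """True if a partition's terms aggregation left some terms out of its buckets."""
    return response["aggregations"]["page_urlpath"]["sum_other_doc_count"] > 0


num_partitions, size = partition_plan(3800, 500)
print(num_partitions, size)
print(size_was_too_small({"aggregations": {"page_urlpath": {"sum_other_doc_count": 0, "buckets": []}}}))
```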


Well, I changed that, and it's definitely a good link to read, but performance is still the same. Would creating multiple threads affect the heap?
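On the threads question: running partitions in parallel can cut wall-clock time, but each in-flight request builds its own aggregation on the cluster, so heap usage grows with the number of partitions running at once. A minimal sketch of bounding that with a fixed-size thread pool; run_partition is a hypothetical stub standing in for the real search call, and max_workers=2 is an arbitrary starting value, not a recommendation.

```python
from concurrent.futures import ThreadPoolExecutor


def run_partition(partition):
    # Hypothetical stub: a real version would send this partition's search
    # request to the cluster and return its buckets.
    return {"partition": partition, "buckets": []}


def run_all_partitions(num_partitions, max_workers=2):
    # max_workers caps how many partition requests are in flight at once,
    # and therefore how many aggregations the cluster holds concurrently.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in partition order.
        return list(pool.map(run_partition, range(num_partitions)))


results = run_all_partitions(8, max_workers=2)
print([r["partition"] for r in results])
```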

It sounds like you have far, far too many shards given the amount of data you have. Please read this blog post and then try to dramatically reduce the number of shards in the cluster. I would expect that querying only a few shards would give much better performance.
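The sizing advice can be turned into rough numbers. This sketch assumes the commonly cited guideline of at most about 20 shards per GB of heap, plus a hypothetical target shard size of 30 GB. The 15 GB data volume is made up, since the thread does not give the index size, and both helper names are hypothetical.

```python
import math


def shards_needed(total_data_gb, target_shard_gb=30):
    """Primary shards needed for a data volume at an assumed target shard size."""
    return max(1, math.ceil(total_data_gb / target_shard_gb))


def heap_gb_for_shards(shards_per_node, shards_per_gb_heap=20):
    """Heap a node would need to stay within the shards-per-GB-of-heap guideline."""
    return shards_per_node / shards_per_gb_heap


print(shards_needed(15))         # hypothetical 15 GB of data
print(heap_gb_for_shards(1013))  # shards per node from the thread
```

With 1013 shards per node, the guideline implies roughly 50 GB of heap per node, above the roughly 30 GB heap ceiling Elastic generally recommends. This supports consolidating the data into far fewer shards.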