Performance of has_child query on large amounts of data


#1

Hi everyone,
I've run into a serious problem recently: a has_child query takes a few minutes to complete.
Here are the details:

Elasticsearch configuration:
ES version: 5.2.2
Total documents: 100 billion
Total storage used: 30 TB
Number of indices: 20 (searched via an alias)
Number of shards: 1700
Number of nodes: 15 (each with a 30 GB heap)
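
For a sense of scale, here is a rough back-of-the-envelope calculation from the figures above. It is only a sketch: I am assuming 1 TB = 1000 GB and that the document, storage, and shard counts are cluster-wide totals spread evenly, which may not hold exactly.

```python
# Figures copied from the configuration list above.
total_docs = 100_000_000_000   # 100 billion documents
total_storage_gb = 30 * 1000   # 30 TB, assuming 1 TB = 1000 GB
shards = 1700
nodes = 15

# Averages only; real shards are unlikely to be perfectly even.
docs_per_shard = total_docs / shards
gb_per_shard = total_storage_gb / shards
shards_per_node = shards / nodes

print(f"{docs_per_shard:,.0f} documents per shard")  # about 58.8 million
print(f"{gb_per_shard:.1f} GB per shard")            # about 17.6 GB
print(f"{shards_per_node:.1f} shards per node")      # about 113.3
```

So each node holds on the order of a hundred shards, each with tens of millions of documents on average.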

Mappings:

**I have one parent type with one million documents**, and each of its fields is mapped like this:

"field_name": {
  "type": "text",
  "fields": {
    "keyword": {
      "type": "keyword",
      "ignore_above": 256
    }
  }
}

I also have about 20 child types, whose fields are all mapped the same way. The number of child documents is extremely large (100 billion minus the one million parent documents).
So when I query the parent type like this, the query takes a few minutes:

{
  "query": {
    "constant_score": {
      "filter": {
        "has_child": {
          "child_type": "app_signature",
          "query": {
            "constant_score": {
              "filter": {
                "term": {
                  "app_signature_notafter.keyword": "Tue Dec 03 16:52:52 CST 2041"
                }
              }
            }
          }
        }
      }
    }
  }
}
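
In case it helps anyone reproduce this against the other child types, the same request body can be assembled with a small helper. This is only a sketch: `build_has_child_query` is a hypothetical name of my own, not part of any client library, and it does nothing more than build the JSON shown above.

```python
import json


def build_has_child_query(child_type, field, value):
    """Build the has_child request body used above (hypothetical helper)."""
    # Inner query: a non-scoring term filter on the child documents.
    child_query = {"constant_score": {"filter": {"term": {field: value}}}}
    # Outer query: match parents that have at least one matching child.
    return {
        "query": {
            "constant_score": {
                "filter": {
                    "has_child": {
                        "child_type": child_type,
                        "query": child_query,
                    }
                }
            }
        }
    }


body = build_has_child_query(
    "app_signature",
    "app_signature_notafter.keyword",
    "Tue Dec 03 16:52:52 CST 2041",
)
print(json.dumps(body, indent=2))
```

The printed body can be sent to the search endpoint of the alias as-is.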

How can I improve query performance? Should I add nodes or shards? And is there a way to reduce the query time to less than 10 seconds?
Thank you very much!

