Optimizing a bool query with many should match-phrase clauses


(Nguyễn Hải Đăng) #1

Hello all, we are querying Elasticsearch to retrieve exactly the documents whose ID matches one of the IDs in an array. It's similar to the following SQL query:

SELECT * FROM myindex WHERE transaction_id IN (id1, id2, id3)

Translated into an Elasticsearch query:

{
	"from": 0,
	"size": 10000,
	"query": {
		"bool": {
			"must": {
				"bool": {
					"should": [
						{
							"match": {
								"transaction_id": {
									"query": "a",
									"type": "phrase"
								}
							}
						},
						{
							"match": {
								"transaction_id": {
									"query": "b",
									"type": "phrase"
								}
							}
						},
						{
							"match": {
								"transaction_id": {
									"query": "c",
									"type": "phrase"
								}
							}
						}
					]
				}
			}
		}
	}
}
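For reference, a body like the one above is usually generated from the ID array rather than written by hand. A minimal Python sketch, assuming the IDs arrive as a list of strings (the helper name build_phrase_should_query is hypothetical), shows why the query grows linearly with the array: one should clause per ID, so roughly 5000 clauses for a 5000-element array.

```python
from typing import Any, Dict, List


def build_phrase_should_query(ids: List[str], size: int = 10000) -> Dict[str, Any]:
    """Hypothetical helper: rebuild the bool/should body from the post, one clause per ID."""
    should = [
        # Same legacy "match" + "type": "phrase" syntax as the original query
        {"match": {"transaction_id": {"query": tx_id, "type": "phrase"}}}
        for tx_id in ids
    ]
    return {
        "from": 0,
        "size": size,
        "query": {"bool": {"must": {"bool": {"should": should}}}},
    }


body = build_phrase_should_query(["a", "b", "c"])
print(len(body["query"]["bool"]["must"]["bool"]["should"]))  # 3
```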

However, with ~136 million documents (and still growing) and an ID array of ~5000 entries, this query is extremely slow.
Any suggestions for optimizing it?


(Nik Everett) #2

The terms query (see its docs) is designed to match any one of many terms. Have a look at it. It isn't analyzed and doesn't support phrase queries, only single terms, but it might help here if you can use it.
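A minimal Python sketch of the equivalent terms query, assuming transaction_id is indexed as an exact-value field (keyword, or a not_analyzed string in older versions) so that each ID is a single term. It is placed in a bool filter because an IN (...) lookup needs no relevance scoring. The helper name build_terms_filter_query is hypothetical; the resulting body would be sent with whichever client you already use.

```python
from typing import Any, Dict, List


def build_terms_filter_query(ids: List[str], size: int = 10000) -> Dict[str, Any]:
    """Hypothetical helper: match documents whose transaction_id equals any of `ids`."""
    return {
        "from": 0,
        "size": size,
        "query": {
            "bool": {
                # Filter context: exact-term matching, no scoring
                "filter": {"terms": {"transaction_id": list(ids)}}
            }
        },
    }


body = build_terms_filter_query(["a", "b", "c"])
print(body["query"]["bool"]["filter"])  # {'terms': {'transaction_id': ['a', 'b', 'c']}}
```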


(Nguyễn Hải Đăng) #3

I profiled both queries using the Profile API, and they seem to be identical. Now I'm confused :frowning:
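To reproduce this kind of comparison, the Profile API is enabled by adding "profile": true to the search body; the response then reports, per shard, the Lucene query each clause was rewritten into along with timings. A small Python sketch (the helper name with_profiling is hypothetical) that switches the flag on without mutating the caller's body:

```python
import copy
from typing import Any, Dict


def with_profiling(body: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: return a copy of a search body with profiling enabled."""
    profiled = copy.deepcopy(body)  # leave the original body untouched
    profiled["profile"] = True
    return profiled


original = {"query": {"terms": {"transaction_id": ["a", "b"]}}}
profiled = with_profiling(original)
print("profile" in original, profiled["profile"])  # False True
```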


(Nik Everett) #4

The terms query rewrites to a bool query with a bunch of should clauses if
there are fewer than a certain number of terms, iirc.


(Nguyễn Hải Đăng) #5

Thank you :smiley: So what happens if there are many terms? Can you point me to some resources about this?


(Nguyễn Hải Đăng) #6

@nik9000 I tested with 1056 different terms in one query, and the Profile API shows that it still rewrites to many should clauses.


(何之真) #7

How about using multi_match?
{
  "query": {
    "multi_match": {
      "query": q,
      "fields": ["transaction_id"],
      "type": "cross_fields"
    }
  }
}
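A sketch of how q might be built for this suggestion, assuming the IDs are joined into one space-separated string and that the field's analyzer splits on whitespace (for example, the standard analyzer on a text field). multi_match defaults to the "or" operator, so a single matching ID is enough. Because this matches analyzed tokens rather than exact phrases, an ID that the analyzer splits into several tokens could over-match. The helper name build_multi_match_query is hypothetical.

```python
from typing import Any, Dict, List


def build_multi_match_query(ids: List[str]) -> Dict[str, Any]:
    """Hypothetical helper: put all IDs into a single multi_match query string."""
    q = " ".join(ids)  # e.g. "a b c"; relies on the analyzer splitting on whitespace
    return {
        "query": {
            "multi_match": {
                "query": q,
                "fields": ["transaction_id"],
                "type": "cross_fields",
                "operator": "or",  # the default, spelled out: any one ID may match
            }
        }
    }


body = build_multi_match_query(["a", "b", "c"])
print(body["query"]["multi_match"]["query"])  # a b c
```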


(system) #8

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.