Multiple bool-should-match phrase query optimization

Hello all, we are querying Elasticsearch to fetch exactly those documents whose ID matches one of the IDs in an array. It's similar to the following SQL query:

SELECT * FROM myindex WHERE transaction_id IN (id1, id2, id3)

Translated to an Elasticsearch query:

{
	"from": 0,
	"size": 10000,
	"query": {
		"bool": {
			"must": {
				"bool": {
					"should": [
						{
							"match": {
								"transaction_id": {
									"query": "a",
									"type": "phrase"
								}
							}
						},
						{
							"match": {
								"transaction_id": {
									"query": "b",
									"type": "phrase"
								}
							}
						},
						{
							"match": {
								"transaction_id": {
									"query": "c",
									"type": "phrase"
								}
							}
						}
					]
				}
			}
		}
	}
}

However, with ~136 million documents (and growing) and an ID array of ~5000 entries, this query becomes extremely slow.
Any suggestions for optimizing it?

The terms query (docs) is designed to match one of many terms. Have a look at that one. It isn't analyzed and doesn't support phrase queries, only single terms, but it might help here if you can use it.
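As a rough sketch of what that could look like for the query above: the request below replaces the three `match` phrase clauses with a single `terms` clause. It assumes Elasticsearch 2.0 or later (where `bool` accepts a `filter` clause) and that `transaction_id` is indexed as a single un-analyzed term, so each ID matches exactly; neither assumption is confirmed by the original post. The IDs `a`, `b`, `c` are the placeholders from the question.

```json
{
	"from": 0,
	"size": 10000,
	"query": {
		"bool": {
			"filter": {
				"terms": {
					"transaction_id": ["a", "b", "c"]
				}
			}
		}
	}
}
```

Putting the clause under `filter` skips relevance scoring, which an `IN (...)`-style lookup does not need, and lets Elasticsearch cache the filter where it chooses to.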


I profiled both queries using the Profile API and they appear to be identical. Confused now :frowning:
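For readers who want to repeat that comparison, a minimal sketch: setting `"profile": true` at the top level of a search request body makes the response include a `profile` section showing the Lucene queries the request was rewritten into. This assumes a version that ships the Profile API (2.2 or later, as far as I know); the field name and placeholder IDs are taken from the question.

```json
{
	"profile": true,
	"query": {
		"terms": {
			"transaction_id": ["a", "b", "c"]
		}
	}
}
```

Running the same request with the bool/should version of the query and comparing the two `profile` sections shows whether both end up as the same set of clauses.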

The terms query rewrites to a bool query with a bunch of should clauses if there are fewer than a certain number of terms, iirc.

Thank you :smiley: So what happens if there are many terms? Can you guide me to some resources about this?

@nik9000 I tested with 1056 different terms in one query, and from the Profile API I can see that it is still rewritten into many should clauses.

How about using multi_match? (Here q stands for your query string.)
{
	"query": {
		"multi_match": {
			"query": q,
			"fields": ["transaction_id"],
			"type": "cross_fields"
		}
	}
}
