I have a large index (10M+ documents with large bodies of text) and I'm trying to speed up match queries that take large bodies of text as input.
I know in advance which documents the "text" is likely to be in (i.e. I know each document's likely "title"), so ideally the match query would run only over the documents that have one of the specified titles, which should be much faster than matching against the whole index.
Is this a viable approach?
I've already tried must/should and must/must combinations of a terms and a match query, as well as a filter over the title and over an ids query, but the match query's run time is unchanged. It appears that filtering is applied to the results of the match rather than before it.
My expectation was that the terms query/filter, which theoretically has the lowest cost, would be evaluated first, so that only the documents it matches would be scanned by the match query. At least that was my understanding of the docs ("the goal of filtering is to reduce the number of documents that have to be examined").
Mappings:
{
  "myIndex": {
    "mappings": {
      "page": {
        "_all": {
          "enabled": false
        },
        "properties": {
          "text": {
            "type": "text",
            "analyzer": "myAnalyzer"
          },
          "title": {
            "type": "keyword"
          }
        }
      }
    }
  }
}
Queries tried so far:
{
  "query": {
    "bool": {
      "must": {
        "terms": {
          "title": ["t1", "t2", "t3"]
        }
      },
      "should": { // also tried a "must" query here
        "match": {
          "text": "large body of text here"
        }
      }
    }
  }
}
{
  "query": {
    "bool": {
      "filter": {
        "terms": {
          "title": ["t1", "t2", "t3"]
        }
      },
      "should": {
        "match": {
          "text": "large body of text here"
        }
      }
    }
  }
}
{
  "query": {
    "bool": {
      "filter": {
        "ids": {
          "values": ["33934108", "1196927", "2235504"]
        }
      },
      "should": {
        "match": {
          "text": "large body of text here"
        }
      }
    }
  }
}
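
For reference, the shape I would have expected to be efficient is the one below: the terms query in the non-scoring filter context, and the match in a must clause so it still contributes to scoring. This is just a sketch against the mapping above (I see no difference in run time with this variant either):

{
  "query": {
    "bool": {
      "filter": {
        "terms": {
          "title": ["t1", "t2", "t3"]
        }
      },
      "must": {
        "match": {
          "text": "large body of text here"
        }
      }
    }
  }
}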