Query fields in multi_match, highlight them, but exclude them from results


(Hacker 21) #1

Hi,

I have a field called "pages" which is an array which can store up to 1,000 plain-text web pages.

This is useful at query time for identifying relevant documents. However, the 1,000-page array is never displayed to the end user, and sending it from Elasticsearch to the server worker significantly slows down the query response time.

It can take 30 seconds or more for Elasticsearch to send the data (including the pages array for each document), compared with just a few seconds without the pages array.

Here's my current query:
"query": {
    "bool": {
        "must": {
            "multi_match": {
                "query": query,
                "type": "most_fields",
                "fields": [
                    "pages",
                    "homepage_title",
                    "site_description"
                ]
            }
        },
        "filter": {
            "bool": {
                "must": [
                    { "term": { "postal_codes.state": state } }
                ]
            }
        }
    }
},
"highlight": {
    "pre_tags": ["<b>"],
    "post_tags": ["</b>"],
    "fields": {
        "homepage_title": { "number_of_fragments": 0 },
        "site_description": { "number_of_fragments": 0 },
        "pages": { "number_of_fragments": 15, "fragment_size": 60 }
    }
}

At the moment, Elasticsearch sends the full results (including the pages array) to the worker, and I delete the pages array manually before the results are sent to the results page UI.

Is there a way I can still use the pages array as part of my multi_match query, and even highlight fragments from it, but avoid receiving it from ES in the first place? It would save time on query responses.

I've tried using the _source option, but it doesn't make the pages array available for multi_match queries.

Any help would be greatly appreciated! Thank you,


(Hacker 21) #2

Any ideas? I would appreciate any help.


(swarmee.net) #3

Short answer: yes. Excluding a field from the returned _source only changes what is sent back in the response; it does not change which fields are searched, so I am not sure why you would expect it to affect your query. See the example below. Paste it into Dev Tools in Kibana and run it.

You can exclude fields from the returned _source either with a request parameter (first search below) or in the request body (second search below).

POST /test/1/1
{
  "pages": [
    "one two three four five",
    "one two nine ten"
  ],
  "homepage_title": "test.html",
  "site_description": "Description",
  "postal_codes": {
    "state": "CA"
  }
}



GET /test/_search?_source_excludes=pages
{
  "query": {
    "bool": {
      "must": {
        "multi_match": {
          "query": "ten",
          "type": "most_fields",
          "fields": [
            "pages",
            "homepage_title",
            "site_description"
          ]
        }
      },
      "filter": {
        "bool": {
          "must": [
            {
              "match": {
                "postal_codes.state": "CA"
              }
            }
          ]
        }
      }
    }
  },
  "highlight": {
    "pre_tags": [
      "<b>"
    ],
    "post_tags": [
      "</b>"
    ],
    "fields": {
      "homepage_title": {
        "number_of_fragments": 0
      },
      "site_description": {
        "number_of_fragments": 0
      },
      "pages": {
        "number_of_fragments": 15,
        "fragment_size": 60
      }
    }
  }
}


GET /test/_search
{
  "query": {
    "bool": {
      "must": {
        "multi_match": {
          "query": "ten",
          "type": "most_fields",
          "fields": [
            "pages",
            "homepage_title",
            "site_description"
          ]
        }
      },
      "filter": {
        "bool": {
          "must": [
            {
              "match": {
                "postal_codes.state": "CA"
              }
            }
          ]
        }
      }
    }
  },
  "_source": {
    "excludes": [
      "pages"
    ]
  },
  "highlight": {
    "pre_tags": [
      "<b>"
    ],
    "post_tags": [
      "</b>"
    ],
    "fields": {
      "homepage_title": {
        "number_of_fragments": 0
      },
      "site_description": {
        "number_of_fragments": 0
      },
      "pages": {
        "number_of_fragments": 15,
        "fragment_size": 60
      }
    }
  }
}
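
If the worker builds the request in Python (an assumption based on the `query` and `state` variables in the original post), the body-level exclusion could be sketched roughly as follows. The helper name `build_search_body` and its `excluded_fields` parameter are hypothetical and used only for illustration; the snippet only constructs the request body and leaves sending it to whatever client you already use.

```python
import json


def build_search_body(query, state, excluded_fields=("pages",)):
    """Build a search body that still queries and highlights `pages`
    but asks Elasticsearch to leave it out of each hit's _source.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    return {
        "query": {
            "bool": {
                "must": {
                    "multi_match": {
                        "query": query,
                        "type": "most_fields",
                        # `pages` stays in the searched fields.
                        "fields": ["pages", "homepage_title", "site_description"],
                    }
                },
                "filter": {"term": {"postal_codes.state": state}},
            }
        },
        # Source filtering: only affects what is returned, not what is searched.
        "_source": {"excludes": list(excluded_fields)},
        "highlight": {
            "pre_tags": ["<b>"],
            "post_tags": ["</b>"],
            "fields": {
                "homepage_title": {"number_of_fragments": 0},
                "site_description": {"number_of_fragments": 0},
                # Fragments from `pages` are still requested in the highlight section.
                "pages": {"number_of_fragments": 15, "fragment_size": 60},
            },
        },
    }


body = build_search_body("ten", "CA")
print(json.dumps(body, indent=2))
```

With this body, the highlight fragments for pages should still come back under each hit's highlight key, while the large pages array itself is left out of _source.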
