Suggestions from long text fields

hjklo · January 24, 2020, 10:16pm

We have an ES index that contains a large and constantly growing number of documents (PDFs, word processor files, etc.). The index stores several details about each document, including its complete contents as plaintext in a single field. What we need is a completion suggestion scheme that suggests the most common words matching the given query string, occurring anywhere in these plaintext fields.

The best solution so far, in terms of both results and performance, has been to use the "search_as_you_type" field and let ES return only the relevant documents. The Python code that receives these results then scans the returned documents and finds the matching substrings. (ES highlighting either doesn't work or is too slow.)
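For illustration, the client-side step described above might look roughly like the following. This is only a minimal sketch, not our actual code: the function name `suggest_words` and the `hits` list are hypothetical, and it assumes the "plaintext" values of the returned documents have already been pulled out of the ES response into a list of strings.

```python
import re
from collections import Counter


def suggest_words(texts, prefix, limit=5):
    """Return the most common words in `texts` that start with `prefix`.

    Hypothetical helper: `texts` is assumed to hold the "plaintext" values
    of the documents ES returned for the query.
    """
    # \b anchors the match at a word boundary, \w* consumes the rest of the word.
    pattern = re.compile(r"\b" + re.escape(prefix) + r"\w*", re.IGNORECASE)
    counts = Counter()
    for text in texts:
        # Lowercase so that "Elastic" and "elastic" are counted together.
        counts.update(match.lower() for match in pattern.findall(text))
    return [word for word, _ in counts.most_common(limit)]


# Made-up stand-ins for the plaintext of two returned documents.
hits = [
    "intro text elasticsearch more text",
    "intro text elastic search more text elastic",
]
print(suggest_words(hits, "ela"))
```

The counts only cover the documents that were actually fetched, so the ranking reflects that sample and not the whole index.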

So we have a mapping that is something like:

  "mappings": {
    "properties": {
      "plaintext": {
        "type": "search_as_you_type"
      }
    }
  }

And then we have several documents of the form:

{
  "plaintext": ".... <thousands of words> ... elasticsearch .... <thousands of words..."
}
{
  "plaintext": ".... <thousands of words> ... elastic search .... <thousands of words..."
}

And when the query is something like

  "query": {
    "match_phrase_prefix": {
      "plaintext": "ela"
    }
  }

then ES should return the aforementioned documents with the highlights "elasticsearch" and "elastic". The "search_as_you_type" field with highlighting works OK, but the process is much faster if the highlighting is dropped and the results are handled with Python and regular expressions. And when the query string is something like "elastic se", the returned highlight should be "elastic search", which only seems to sort of work if the highlight has a separate query for each whitespace-separated substring. And then things get really slow.
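To make the multi-word case concrete, here is a rough sketch of how the regular-expression side could handle a query like "elastic se": every token except the last must match a whole word, and the last token is treated as a prefix. The names `phrase_prefix_pattern` and `suggest_phrases` are hypothetical, and the choice to treat any run of non-word characters as a token separator is an assumption made for this example.

```python
import re
from collections import Counter


def phrase_prefix_pattern(query):
    """Build a regex where the last token of `query` is a word prefix."""
    tokens = query.split()
    if not tokens:
        raise ValueError("query must contain at least one token")
    # Assumption: tokens may be separated by any run of non-word characters
    # (spaces, newlines, punctuation) in the document text.
    body = r"\W+".join(re.escape(token) for token in tokens)
    # \b keeps the first token at a word start; \w* completes the last word.
    return re.compile(r"\b" + body + r"\w*", re.IGNORECASE)


def suggest_phrases(texts, query, limit=5):
    """Return the most common phrase completions of `query` found in `texts`."""
    pattern = phrase_prefix_pattern(query)
    counts = Counter()
    for text in texts:
        for match in pattern.findall(text):
            # Normalize case and separators so variants are counted together.
            counts[" ".join(re.findall(r"\w+", match.lower()))] += 1
    return [phrase for phrase, _ in counts.most_common(limit)]


# Made-up stand-ins for the plaintext of returned documents.
docs = [
    "foo elastic search bar",
    "Elastic  Search, elastic searching",
    "elasticsearch se",
]
print(suggest_phrases(docs, "elastic se"))
```

A single-token query such as "ela" goes through the same code path, so one compiled pattern per query is enough and no separate pass per whitespace-separated substring is needed on the Python side.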

On the other hand, the problem with the Python method is that we cannot be sure we've grabbed the most common words. So is there a "pure" Elasticsearch way of doing what is described above, or should we stick to the current solution?

