Obtaining matching tokens with _search

fivo · March 30, 2019, 6:12pm

Suppose I have the following index:

PUT test_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "code_tokenizer"
        }
      },
      "tokenizer": {
        "code_tokenizer": {
          "type": "ngram",
          "min_gram": 2,
          "max_gram": 10,
          "token_chars": []
        }
      }
    }
  },
  "mappings": {
    "codeline": {
      "properties" : {
        "line" : {
          "type": "text",
          "fields": {
            "ngram" : {
              "type" : "text",
              "analyzer" : "my_analyzer"
            }
          }
        }
      }
    }
  }
}

and have indexed the following document:

POST test_index/codeline/1
{
  "line" : "coding is fun"
}

When searching the index, is it possible to get the tokens that matched the search query? For example, the query

GET test_index/codeline/_search
{
  "query": {
    "multi_match": {
      "query": "coding fun",
      "fields": [
        "line",
        "line.ngram"
      ],
      "type": "most_fields"
    }
  }
}

would return the document added above plus something along the lines of

{"tokens":[{"token":"coding","start_offset":0,"end_offset":6,"type":"word","position":0}, {"token":"fun","start_offset":10,"end_offset":13,"type":"word","position":0}]}

It does not have to be the longest matching tokens (although that would be nice); it could also just be the tokens that contributed to the score of the query.

dadoonet · March 30, 2019, 6:46pm

Would the highlighting feature work for you?
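
For readers following along, here is a minimal Python sketch of what the highlighting route could look like. It builds the `multi_match` request from the question with a `highlight` section added, then turns a returned fragment into a token list shaped like the one asked for. The helper names `build_highlight_search` and `highlighted_tokens` are made up for this illustration, and the parser assumes the default `<em>`/`</em>` highlight tags with no nesting. The offsets are relative to the fragment with the tags removed, which may differ from offsets in the stored field.

```python
import re


def build_highlight_search(query_text):
    """Return a _search body: the multi_match from the question plus highlighting.

    Hypothetical helper for illustration; send the body with any HTTP client.
    """
    return {
        "query": {
            "multi_match": {
                "query": query_text,
                "fields": ["line", "line.ngram"],
                "type": "most_fields",
            }
        },
        # An empty object per field requests highlighting with default settings.
        "highlight": {"fields": {"line": {}, "line.ngram": {}}},
    }


# Assumption: default highlight tags, and highlighted spans are not nested.
_EM = re.compile(r"<em>(.*?)</em>")
_TAG_CHARS = len("<em>") + len("</em>")


def highlighted_tokens(fragment):
    """List the highlighted pieces of one fragment with tag-free offsets."""
    tokens = []
    removed = 0  # tag characters that precede the current match
    for match in _EM.finditer(fragment):
        start = match.start() - removed
        text = match.group(1)
        tokens.append(
            {"token": text, "start_offset": start, "end_offset": start + len(text)}
        )
        removed += _TAG_CHARS
    return tokens


if __name__ == "__main__":
    print(build_highlight_search("coding fun"))
    print(highlighted_tokens("<em>coding</em> is <em>fun</em>"))
```

For the fragment `<em>coding</em> is <em>fun</em>`, the second function yields `coding` at offsets 0 to 6 and `fun` at offsets 10 to 13. Each hit in the search response carries its fragments under a `highlight` key, as one list of strings per field.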

fivo · April 2, 2019, 7:14am

Yes, that kind of works. One more question, though: is it possible to keep the spaces at the beginning of the line? If the original document is something like:

POST test_index/codeline/1
{
  "line" : "           coding is fun"
}

then the highlighting feature will return something like:

"<em>coding</em> is <em>fun</em>"

where I would prefer something like:

"           <em>coding</em> is <em>fun</em>"

Thanks

dadoonet · April 2, 2019, 7:50am

I don't know.
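
The thread leaves this open. One possible client-side workaround, sketched below in Python, is to copy the indentation back from the hit's `_source`. This is not a confirmed Elasticsearch setting, only post-processing under stated assumptions: the function name `restore_leading_whitespace` is hypothetical, the default `<em>`/`</em>` tags are assumed, and the fragment is assumed to start at the first non-blank character of the stored line.

```python
def restore_leading_whitespace(source_line, fragment,
                               pre_tag="<em>", post_tag="</em>"):
    """Re-attach the indentation of source_line to a highlighted fragment.

    Hypothetical helper: the fragment is returned unchanged unless its
    tag-free text matches the start of the stripped source line.
    """
    # Remove the highlight tags to recover the plain text of the fragment.
    plain = fragment.replace(pre_tag, "").replace(post_tag, "")
    stripped = source_line.lstrip()
    # Everything before the first non-whitespace character of the source.
    indent = source_line[: len(source_line) - len(stripped)]
    if plain and stripped.startswith(plain):
        return indent + fragment
    return fragment


if __name__ == "__main__":
    source = " " * 11 + "coding is fun"
    print(repr(restore_leading_whitespace(source, "<em>coding</em> is <em>fun</em>")))
```

A fragment that starts in the middle of the line, or that already carries its own leading spaces, comes back unchanged, so the function is safe to apply to every fragment of a hit.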

