Obtaining matching tokens with _search

Suppose I have the following index:

PUT test_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "code_tokenizer"
        }
      },
      "tokenizer": {
        "code_tokenizer": {
          "type": "ngram",
          "min_gram": 2,
          "max_gram": 10,
          "token_chars": []
        }
      }
    }
  },
  "mappings": {
    "codeline": {
      "properties" : {
        "line" : {
          "type": "text",
          "fields": {
            "ngram" : {
              "type" : "text",
              "analyzer" : "my_analyzer"
            }
          }
        }
      }
    }
  }
}

and I have indexed the following document:

POST test_index/codeline/1
{
  "line" : "coding is fun"
}

When searching the index now, is it possible to get the tokens that matched the search query? For example:

GET test_index/codeline/_search
{
  "query": {
    "multi_match": {
      "query": "coding fun",
      "fields": [
        "line",
        "line.ngram"
      ],
      "type": "most_fields"
    }
  }
}

would return the document added above plus something along the lines of

{"tokens":[{"token":"coding","start_offset":0,"end_offset":6,"type":"word","position":0}, {"token":"fun","start_offset":10,"end_offset":13,"type":"word","position":0}]}

It does not necessarily need to be the largest matching tokens (although that would be nice), but could also just be the tokens that made up the score of the query.

Would the highlighting feature work for you?
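
A minimal sketch of what I mean, reusing your query and adding a highlight section for both fields. I have not run this against your exact setup, and the fragments returned depend on the Elasticsearch version and the highlighter in use:

GET test_index/codeline/_search
{
  "query": {
    "multi_match": {
      "query": "coding fun",
      "fields": [
        "line",
        "line.ngram"
      ],
      "type": "most_fields"
    }
  },
  "highlight": {
    "fields": {
      "line": {},
      "line.ngram": {}
    }
  }
}

Each hit should then carry a "highlight" section per field, with the matched terms wrapped in <em> tags by default.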

Yes, that kind of works. One more question, though: is it possible to keep the spaces at the beginning of the line? If the original document is something like:

POST test_index/codeline/1
{
  "line" : "           coding is fun"
}

then the highlighting feature will return something like:

"<em>coding</em> is <em>fun</em>" 

where I would prefer something like:

"           <em>coding</em> is <em>fun</em>" 

Thanks

I don't know.
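
One thing that may be worth trying, though I have not verified that it preserves leading whitespace: setting number_of_fragments to 0, which makes the highlighter return the entire field content highlighted instead of extracted fragments. The single-field match query below is only a simplified stand-in for your multi_match:

GET test_index/codeline/_search
{
  "query": {
    "match": {
      "line": "coding fun"
    }
  },
  "highlight": {
    "fields": {
      "line": {
        "number_of_fragments": 0
      }
    }
  }
}

If the spaces are still trimmed, the original text remains available unchanged in the _source of each hit.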
