Positions as the result instead of highlighting

I'm trying to get match positions instead of highlighted text as the result of an Elasticsearch query.

Create the index:

PUT /test/
{
  "mappings": {
    "article": {
      "properties": {
        "text": {
          "type": "text",
          "analyzer": "english"
        },
        "author": {
          "type": "text"
        }
      }
    }
  }
}

Put a document:

PUT /test/article/1
{
  "author": "Just Me",
  "text": "This is just a simple test to demonstrate the audience the purpose of the question!"
}

Search the document:

GET /test/article/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match_phrase": {
            "text": {
              "query": "simple test",
              "_name": "must"
            }
          }
        }
      ],
      "should": [
        {
          "match_phrase": {
            "text": {
              "query": "need help",
              "_name": "first",
              "slop": 2
            }
          }
        },
        {
          "match_phrase": {
            "text": {
              "query": "purpose question",
              "_name": "second",
              "slop": 3
            }
          }
        },
        {
          "match_phrase": {
            "text": {
              "query": "don't know anything",
              "_name": "third"
            }
          }
        }
      ],
      "minimum_should_match": 1
    }
  },
  "highlight": {
    "fields": {
      "text": {}
    }
  }
}
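The `_name` on each clause lets Elasticsearch report which named queries matched a hit: each hit in the response carries a `matched_queries` array. That tells you which "search button" matched, but not where in the text, so it only covers half of the requirement. Below is a minimal Python sketch that reads it. The response it runs on is a hand-written fragment for illustration, not real output from this index.

```python
from typing import Any


def matched_names(search_response: dict[str, Any]) -> dict[str, list[str]]:
    """Map each hit's _id to the sorted names (_name) of the queries that matched it."""
    result: dict[str, list[str]] = {}
    for hit in search_response.get("hits", {}).get("hits", []):
        # "matched_queries" is absent when no named query matched this hit.
        result[hit["_id"]] = sorted(hit.get("matched_queries", []))
    return result


# Hand-written response fragment, shaped like a search response; not captured output.
example_response = {
    "hits": {
        "hits": [
            {"_id": "1", "matched_queries": ["must", "second"]},
        ]
    }
}

print(matched_names(example_response))  # {'1': ['must', 'second']}
```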

When I run this search, I get a highlight result like this (with the matched terms wrapped in em tags):
This is just a simple test to demonstrate the audience the purpose of the question!

I'm not interested in getting the results surrounded by em tags. Instead, I want to get the positions of all the matches, like this:

"hits": [
   { "start_offset": 30, "end_offset": 40 },
   { "start_offset": 74, "end_offset": 81 }
]

We want to search documents for words and phrases: the user clicks on "search buttons", and if something is found, the results are highlighted in the complete text. Some parts of the text can contain multiple results as well, so the highlight result returned by ES doesn't work for me. Right now, I search the complete text for the highlight result (after removing the em tags) using a regexp. This works, but it would be much faster and simpler if ES returned the positions of the matches directly.
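One possible way to avoid the regexp search, sketched in Python under some assumptions. If the highlight field is requested with `"number_of_fragments": 0` (for example `"highlight": { "fields": { "text": { "number_of_fragments": 0 } } }`), Elasticsearch returns the whole field content highlighted instead of fragments. The offsets can then be computed by walking the tags. The sketch assumes the default `<em>`/`</em>` tags, the default (non-HTML-escaping) encoder, and that the stored text never contains those tags itself. The highlighted string is hand-written for illustration. Because the highlighter wraps each matched term separately, `merge_adjacent=True` optionally joins spans separated only by whitespace.

```python
import re

# Assumed default highlight tags; adjust if "pre_tags"/"post_tags" are customized.
PRE_TAG = "<em>"
POST_TAG = "</em>"
TAG_RE = re.compile(re.escape(PRE_TAG) + "|" + re.escape(POST_TAG))


def highlight_offsets(highlighted: str, merge_adjacent: bool = False) -> list[dict[str, int]]:
    """Return start/end offsets (end exclusive) of tagged spans, measured in the untagged text."""
    spans: list[dict[str, int]] = []
    plain_parts: list[str] = []
    plain_len = 0  # length of the untagged text seen so far
    last = 0       # index in `highlighted` right after the previous tag
    start = None   # untagged offset where the currently open span began
    for match in TAG_RE.finditer(highlighted):
        chunk = highlighted[last:match.start()]
        plain_parts.append(chunk)
        plain_len += len(chunk)
        last = match.end()
        if match.group() == PRE_TAG:
            start = plain_len
        elif start is not None:  # a closing tag ends the open span
            spans.append({"start_offset": start, "end_offset": plain_len})
            start = None
    plain_parts.append(highlighted[last:])
    plain = "".join(plain_parts)

    if not merge_adjacent:
        return spans
    # Join spans separated only by whitespace, e.g. the two terms of "simple test".
    merged: list[dict[str, int]] = []
    for span in spans:
        if merged and not plain[merged[-1]["end_offset"]:span["start_offset"]].strip():
            merged[-1]["end_offset"] = span["end_offset"]
        else:
            merged.append(dict(span))
    return merged


# Hand-written example in the shape of a whole-field highlight, not captured ES output.
example_highlight = (
    "This is just a <em>simple</em> <em>test</em> to demonstrate the audience "
    "the <em>purpose</em> of the <em>question</em>!"
)
print(highlight_offsets(example_highlight))
# [{'start_offset': 15, 'end_offset': 21}, {'start_offset': 22, 'end_offset': 26},
#  {'start_offset': 59, 'end_offset': 66}, {'start_offset': 74, 'end_offset': 82}]
print(highlight_offsets(example_highlight, merge_adjacent=True))
# [{'start_offset': 15, 'end_offset': 26}, {'start_offset': 59, 'end_offset': 66},
#  {'start_offset': 74, 'end_offset': 82}]
```

The offsets are end-exclusive, so `text[start_offset:end_offset]` slices the matched span out of the original field value.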

On Stack Overflow, someone mentioned term_vectors, but I'm not sure whether I'm missing the idea or whether it simply doesn't meet my requirements.
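A rough sketch of how term vectors could be used, with several assumptions. A request such as `GET /test/article/1/_termvectors` with the body `{"fields": ["text"], "positions": true, "offsets": true}` returns, for every analyzed term, a list of tokens with `position`, `start_offset` and `end_offset`. The terms are analyzed (stemmed, stopwords removed), so the phrase has to be analyzed the same way first, e.g. with the `_analyze` API and the `english` analyzer. The Python sketch below takes a response-shaped dict plus already-analyzed phrase terms and returns offsets of in-order matches whose extra position gap is at most `slop`. This is a simplification: Lucene's sloppy phrase matching also allows reordered terms, which this does not handle. The input dict is hand-written. The stems (`simpl`, `purpos`) and positions are what I would expect from the `english` analyzer, but treat them as assumptions to check against `_analyze`.

```python
from itertools import product
from typing import Any


def phrase_offsets(term_vectors: dict[str, Any], field: str,
                   phrase_terms: list[str], slop: int = 0) -> list[dict[str, int]]:
    """Return offsets of in-order occurrences of already-analyzed phrase terms.

    Simplified slop check: the extra distance beyond a perfectly adjacent
    phrase must be <= slop; reordered matches are not considered.
    """
    terms = term_vectors["term_vectors"][field]["terms"]
    # For each phrase term, every token (position + offsets) where it occurs.
    occurrences = []
    for term in phrase_terms:
        if term not in terms:
            return []  # a missing term means the phrase cannot match
        occurrences.append(terms[term]["tokens"])
    hits = []
    # Brute force over all combinations; fine for a sketch, not for huge fields.
    for combo in product(*occurrences):
        positions = [tok["position"] for tok in combo]
        in_order = all(a < b for a, b in zip(positions, positions[1:]))
        extra = positions[-1] - positions[0] - (len(positions) - 1)
        if in_order and extra <= slop:
            hits.append({"start_offset": combo[0]["start_offset"],
                         "end_offset": combo[-1]["end_offset"]})
    return sorted(hits, key=lambda h: (h["start_offset"], h["end_offset"]))


# Hand-written, term-vectors-shaped input; stems and positions are assumptions.
example_term_vectors = {
    "term_vectors": {
        "text": {
            "terms": {
                "simpl": {"tokens": [{"position": 4, "start_offset": 15, "end_offset": 21}]},
                "test": {"tokens": [{"position": 5, "start_offset": 22, "end_offset": 26}]},
                "purpos": {"tokens": [{"position": 11, "start_offset": 59, "end_offset": 66}]},
                "question": {"tokens": [{"position": 14, "start_offset": 74, "end_offset": 82}]},
            }
        }
    }
}

print(phrase_offsets(example_term_vectors, "text", ["simpl", "test"]))
# [{'start_offset': 15, 'end_offset': 26}]
print(phrase_offsets(example_term_vectors, "text", ["purpos", "question"], slop=3))
# [{'start_offset': 59, 'end_offset': 82}]
print(phrase_offsets(example_term_vectors, "text", ["need", "help"], slop=2))
# []
```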

Hope you get my idea!
