I'm using the elasticsearch-py client from Python, so hopefully the following is enough to show the problem.
Mapping:
"mappings": {
    "tweet": {
        "properties": {
            "text": {
                "type": "string",
                "term_vector": "with_positions_offsets",
                "store": True,
                "index_analyzer": "fulltext_analyzer"
            }
        }
    }
}
Example document indexed:
es.index(index = "boston", doc_type = "tweet", body = {'text': 'the quick brown fox'})
Example query for term start offsets:
qterms = {
    "query": {
        "query_string": {
            "default_field": "text",
            "query": "brown AND fox",
            "analyzer": "fulltext_analyzer"
        }
    },
    "script_fields": {
        "term_positions": {
            "script": "def tv = [:]; for (word in doc[field].values){ termInfo = _index[field].get(word,_POSITIONS | _OFFSETS); def pos = []; termInfo.each{ pos.add(it.startOffset); }; tv.put(word,pos); }; return tv;",
            "params": {
                "field": "text"
            }
        }
    }
}
es.search(index = "boston", body=qterms)
Output (note the -1s):
{u'_shards': {u'failed': 0, u'successful': 1, u'total': 1},
 u'hits': {u'hits': [{u'_id': u'AU_Nz2AXPRDwyxJBcoG7',
                      u'_index': u'boston',
                      u'_score': 5.207162,
                      u'_type': u'tweet',
                      u'fields': {u'term_positions': [{u'brown': [-1],
                                                       u'fox': [-1],
                                                       u'quick': [-1]}]}}],
           u'max_score': 5.207162,
           u'total': 1},
 u'timed_out': False,
 u'took': 3}
In my qterms query above, changing "it.startOffset" to "it.position" correctly returns the positions of the terms:
{u'_shards': {u'failed': 0, u'successful': 1, u'total': 1},
 u'hits': {u'hits': [{u'_id': u'AU_Nz2AXPRDwyxJBcoG7',
                      u'_index': u'boston',
                      u'_score': 5.207162,
                      u'_type': u'tweet',
                      u'fields': {u'term_positions': [{u'brown': [2],
                                                       u'fox': [3],
                                                       u'quick': [1]}]}}],
           u'max_score': 5.207162,
           u'total': 1},
 u'timed_out': False,
 u'took': 3}
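For comparison, here is a rough sketch of the start offsets I would expect instead of the -1s. It assumes plain whitespace tokenization plus lowercasing, which only approximates whatever fulltext_analyzer actually does, and the helper name expected_start_offsets is made up for illustration. It does not touch Elasticsearch at all.

```python
import re


def expected_start_offsets(text):
    """Map each whitespace-delimited, lowercased token to its character start offsets.

    Assumption: this mimics a simple whitespace + lowercase analyzer, not
    the real fulltext_analyzer from the mapping.
    """
    offsets = {}
    for match in re.finditer(r"\S+", text):
        offsets.setdefault(match.group().lower(), []).append(match.start())
    return offsets


# Same text as the indexed example document.
print(expected_start_offsets("the quick brown fox"))
```

Under those assumptions, "quick", "brown" and "fox" start at characters 4, 10 and 16, so those are roughly the values I was hoping to see in term_positions. The sketch also reports "the" at 0, which the script output does not include.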