Term offsets in scripting returning "-1"


(Patrick Lam) #1

I'm trying to get the term offsets and positions for a field in my index via scripting, following the examples here:

https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-advanced-scripting.html

Returning the term positions works correctly, but when I try to return the offsets using

_index['FIELD'].get('TERM', _OFFSETS)
POS_OBJECT.startOffset
POS_OBJECT.endOffset

with the placeholders filled in, I get -1 for both the start and end offsets of every term. The field is mapped correctly with positions and offsets.

Does anybody know why this is happening?


(Mark Harwood) #2

Can you provide a cut-down example of curl commands that define:

  1. the mapping
  2. example doc
  3. example query that fails

Thanks


(Patrick Lam) #3

I'm using the elasticsearch-py client from Python, so hopefully the following is enough to reproduce the problem.

Mapping:

"mappings": {
  "tweet": {
    "properties": {
      "text": {
        "type": "string",
        "term_vector": "with_positions_offsets",
        "store": True,
        "index_analyzer": "fulltext_analyzer"
      }
    }
  }
}
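One thing that may be worth ruling out (an assumption on my part, not a confirmed diagnosis): `term_vector` configures the separate per-document term vectors, whereas `_index` in scripts reads from the inverted index postings. Offsets are only written to the postings when the field's `index_options` is set to `offsets`. A sketch of the same mapping with that option added, as the Python dict elasticsearch-py would send:

```python
import json

# Sketch only: the mapping above plus "index_options": "offsets".
# Assumption: _index reads offsets from the postings, not from term vectors.
mapping = {
    "mappings": {
        "tweet": {
            "properties": {
                "text": {
                    "type": "string",
                    "term_vector": "with_positions_offsets",
                    "index_options": "offsets",  # record offsets in the postings
                    "store": True,
                    "index_analyzer": "fulltext_analyzer",
                }
            }
        }
    }
}

print(json.dumps(mapping, indent=2))
```

Offsets are recorded at index time, so the document would have to be indexed again (most likely into a freshly created index) after a change like this. `fulltext_analyzer` still needs its own definition in the index settings, which the thread doesn't show.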

Example document indexed:

es.index(index="boston", doc_type="tweet", body={"text": "the quick brown fox"})

Example query for term start offsets:

qterms = {
    "query": {
        "query_string": {
            "default_field": "text",
            "query": "brown AND fox",
            "analyzer": "fulltext_analyzer"
        }
    },
    "script_fields": {
        "term_positions": {
            "script": (
                "def tv = [:]; "
                "for (word in doc[field].values){ "
                "termInfo = _index[field].get(word,_POSITIONS | _OFFSETS); "
                "def pos = []; "
                "termInfo.each{ pos.add(it.startOffset); }; "
                "tv.put(word,pos); "
                "}; "
                "return tv;"
            ),
            "params": {
                "field": "text"
            }
        }
    }
}

es.search(index="boston", body=qterms)
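Since the only difference between the runs is the attribute read inside the script, a small hypothetical builder makes it easy to switch between `position`, `startOffset` and `endOffset`. `build_term_script_query` and `ALLOWED_ATTRS` are illustrative names, not part of elasticsearch-py; the script logic is the same as in `qterms`, only declaring `termInfo` with `def`.

```python
ALLOWED_ATTRS = {"position", "startOffset", "endOffset"}


def build_term_script_query(attr="startOffset", field="text", query="brown AND fox"):
    """Return a search body like `qterms`, reading `attr` from each term occurrence."""
    if attr not in ALLOWED_ATTRS:
        # Guard against splicing arbitrary text into the Groovy script.
        raise ValueError("unsupported attribute: %r" % attr)
    script = (
        "def tv = [:]; "
        "for (word in doc[field].values) { "
        "def termInfo = _index[field].get(word, _POSITIONS | _OFFSETS); "
        "def vals = []; "
        "termInfo.each { vals.add(it." + attr + ") }; "
        "tv.put(word, vals) "
        "}; "
        "return tv;"
    )
    return {
        "query": {
            "query_string": {
                "default_field": field,
                "query": query,
                "analyzer": "fulltext_analyzer",
            }
        },
        "script_fields": {
            "term_positions": {"script": script, "params": {"field": field}}
        },
    }


offsets_body = build_term_script_query("startOffset")
positions_body = build_term_script_query("position")
```

With a client in hand, `es.search(index="boston", body=offsets_body)` would run the same query as above, and `positions_body` the variant that returns positions.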

Output (note the -1s):

{u'_shards': {u'failed': 0, u'successful': 1, u'total': 1},
 u'hits': {u'hits': [{u'_id': u'AU_Nz2AXPRDwyxJBcoG7',
    u'_index': u'boston',
    u'_score': 5.207162,
    u'_type': u'tweet',
    u'fields': {u'term_positions': [{u'brown': [-1],
       u'fox': [-1],
       u'quick': [-1]}]}}],
  u'max_score': 5.207162,
  u'total': 1},
 u'timed_out': False,
 u'took': 3}
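A value of -1 here is a sentinel rather than a real offset: Lucene's postings enums document `startOffset()` and `endOffset()` as returning -1 when offsets were not indexed. A small hypothetical helper (the name `terms_with_missing_offsets` is illustrative, not elasticsearch-py API) that scans a response shaped like the one above and lists, per hit, the terms whose every reported value is -1:

```python
def terms_with_missing_offsets(response, field_name="term_positions"):
    """Map each hit's _id to the sorted terms whose reported values are all -1.

    Assumes the script field comes back as a list holding {term: [values]} maps,
    as in the output above.
    """
    result = {}
    for hit in response["hits"]["hits"]:
        missing = set()
        for term_map in hit.get("fields", {}).get(field_name, []):
            for term, values in term_map.items():
                # -1 is the "no offset recorded" sentinel
                if values and all(v == -1 for v in values):
                    missing.add(term)
        result[hit["_id"]] = sorted(missing)
    return result


# Trimmed copy of the response shown above, with Python 3 strings.
sample = {
    "hits": {
        "hits": [
            {
                "_id": "AU_Nz2AXPRDwyxJBcoG7",
                "fields": {
                    "term_positions": [{"brown": [-1], "fox": [-1], "quick": [-1]}]
                },
            }
        ],
        "total": 1,
    }
}

print(terms_with_missing_offsets(sample))
# {'AU_Nz2AXPRDwyxJBcoG7': ['brown', 'fox', 'quick']}
```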

In my qterms query above, changing "it.startOffset" to "it.position" correctly returns the positions of the terms:

{u'_shards': {u'failed': 0, u'successful': 1, u'total': 1},
 u'hits': {u'hits': [{u'_id': u'AU_Nz2AXPRDwyxJBcoG7',
    u'_index': u'boston',
    u'_score': 5.207162,
    u'_type': u'tweet',
    u'fields': {u'term_positions': [{u'brown': [2],
       u'fox': [3],
       u'quick': [1]}]}}],
  u'max_score': 5.207162,
  u'total': 1},
 u'timed_out': False,
 u'took': 3}
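For comparison, here is roughly what real offsets would look like for this document. This is a sketch under assumptions: `fulltext_analyzer` is taken to split on whitespace and to drop "the" as a stopword while keeping its position slot, which is consistent with the positions above starting at 1 for `quick`, but the actual analyzer definition isn't shown in the thread. Offsets are character positions, with the end offset one past the last character.

```python
import re

STOPWORDS = {"the"}  # assumption: fulltext_analyzer removes "the"


def expected_term_info(text, stopwords=STOPWORDS):
    """Whitespace-tokenize `text` and return {term: [(position, start, end)]}.

    Stopwords are skipped but still consume a position, mimicking a stop
    filter with position increments enabled.
    """
    info = {}
    for position, match in enumerate(re.finditer(r"\S+", text)):
        term = match.group().lower()
        if term in stopwords:
            continue
        info.setdefault(term, []).append((position, match.start(), match.end()))
    return info


print(expected_term_info("the quick brown fox"))
# {'quick': [(1, 4, 9)], 'brown': [(2, 10, 15)], 'fox': [(3, 16, 19)]}
```

If offsets were being read correctly, `it.startOffset` would presumably return 4, 10 and 16 for `quick`, `brown` and `fox` instead of -1.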
