I have an ES index with a mapping like this:
{'texts-temp': {'aliases': {},
  'mappings': {'properties': {'data': {'properties':
    (...)
    'user_text': {'type': 'text',
      'fields': {'word_tokenized': {'type': 'text',
        'analyzer': 'text_analyzer'}}},
    (...)
  },
  'settings': {'index': {'number_of_shards': '1',
    'provided_name': 'texts-temp',
    'creation_date': (...),
    'analysis': {'filter': {'shingen_filter': {'max_shingle_size': '3',
          'min_shingle_size': '2',
          'type': 'shingle'}},
      'analyzer': {'text_analyzer': {'filter': ['lowercase',
            'stop',
            'asciifolding',
            'apostrophe',
            'stemmer',
            'shingen_filter'],
          'type': 'custom',
          'tokenizer': 'standard'}}},
When I do the following search:
from elasticsearch_dsl import Search, Q

search = Search(index='texts-temp')
q = Q("terms", data=list_urls_text)
search = search.query(q)
search = search.extra(track_total_hits=True)  # ask for the exact total hit count
reply = search.execute()
[x.data.user_text.word_tokenized for x in reply]
I get an empty list. However, if I look at x.data.user_text for each hit, I do get the texts, but they are not tokenized.
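If I read the documentation correctly, a hit only carries the original _source, so a sub-field such as word_tokenized may simply never show up there as a separate key. Here is a minimal sketch of how I think the analyzed tokens could be inspected instead, assuming the index-level _analyze API accepts a `field` and a `text`. The helper name `build_analyze_body` and the sample sentence are hypothetical, and the block only builds the request body without contacting a cluster:

```python
import json


def build_analyze_body(field, text):
    """Build a request body for the index-level _analyze API (hypothetical helper)."""
    return {"field": field, "text": text}


# The sample sentence is made up; the field name is the sub-field from my mapping.
body = build_analyze_body("data.user_text.word_tokenized", "Some sample user text")

# This body would be sent to POST /texts-temp/_analyze, for example through the
# Python client's indices.analyze call. Nothing is sent here.
print(json.dumps(body, indent=2))
```

The response should list the tokens that text_analyzer produces for the sample sentence, which is what I expected to see in the hits.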
What am I doing wrong? Why does the index not seem to have the field data.user_text.word_tokenized?
Is this why my significant text aggregation returns empty?
{'query': {'terms': {'data': list_urls_text}},
 'aggs': {'sample': {'sampler': {'shard_size': 200},
   'aggs': {'keywords': {'significant_text': {'field': 'data.user_text.word_tokenized', 'size': 10, 'filter_duplicate_text': True}}...}
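To rule out syntax problems in the request itself, here is a minimal sketch of the same body built as a complete Python dict. The helper name `build_significant_text_body` and the placeholder URL are hypothetical. The query field, aggregation names and parameters mirror the ones in my request, and the sketch assumes nothing else sits in the elided part:

```python
def build_significant_text_body(urls, field="data.user_text.word_tokenized",
                                shard_size=200, size=10):
    """Build the search body: a terms query plus a sampled significant_text aggregation."""
    return {
        # Same terms query as in my search, on the same field name.
        "query": {"terms": {"data": urls}},
        "aggs": {
            "sample": {
                # The sampler limits the aggregation to the top documents per shard.
                "sampler": {"shard_size": shard_size},
                "aggs": {
                    "keywords": {
                        "significant_text": {
                            "field": field,
                            "size": size,
                            "filter_duplicate_text": True,
                        }
                    }
                },
            }
        },
    }


# Placeholder input; in my case this would be list_urls_text.
body = build_significant_text_body(["http://example.com/a"])
print(body["aggs"]["sample"]["aggs"]["keywords"])
```

As far as I know, a dict like this can be turned into a search object with Search.from_dict(body) in elasticsearch_dsl, but I have not verified that this changes the result.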