In my index I have indexed the same text twice: the first field, called keywordtext, holds the entire text as a keyword datatype; the second, called sentences, is a nested datatype in which the text is split into sentences, each with its own start time and end time. I would like to search my dataset for a term and then run a significant terms aggregation to retrieve the words correlated with that term. In ES 2.x I could do this by mapping the entire text as a string datatype with term vectors enabled. In ES 5.x I tried running a full-text query against the nested field followed by a significant terms aggregation, but it does not seem to work, even though no errors are returned:
Funny you should mention that. I'm currently working on adding exactly that in the form of a new significant_text aggregation.
Unlike significant_terms, it does not rely on fielddata and can strip out noisy repeated text that otherwise skews stats.
The bad news for you, I suspect, is that it will not work on nested docs.
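For illustration only, here is a rough sketch of what a request using such an aggregation might look like, written as a Python dict so it can be passed to any client. It assumes the parameter names from the version of significant_text that was later released (`field`, `filter_duplicate_text`) and wraps it in a `sampler` aggregation to limit how many hits are re-analyzed. The field name `content` is hypothetical and stands for a top-level (non-nested) text field, the helper name `significant_text_request` is made up for this example, and the sample size is arbitrary.

```python
import json


def significant_text_request(term, text_field="content", sample_size=200):
    """Build a search body: match query + sampler + significant_text.

    `text_field` is a hypothetical top-level text field (not nested),
    since the aggregation is not expected to work on nested docs.
    """
    return {
        # Full-text query that defines the foreground set of documents.
        "query": {"match": {text_field: term}},
        # Only the aggregation result is needed, not the hits themselves.
        "size": 0,
        "aggs": {
            "sample": {
                # Analyze only the top-matching docs per shard.
                "sampler": {"shard_size": sample_size},
                "aggs": {
                    "related_words": {
                        "significant_text": {
                            "field": text_field,
                            # Strip out noisy repeated text that skews stats.
                            "filter_duplicate_text": True,
                        }
                    }
                },
            }
        },
    }


body = significant_text_request("elasticsearch")
print(json.dumps(body, indent=2))
```

The resulting body would be sent to the index's `_search` endpoint; the correlated words would then come back under `aggregations.sample.related_words.buckets`.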
Thanks Mark, that's good news! Your current work is very relevant to my job, and last week's representation of DBpedia entities in a graph was also very useful. Do you have a blog where I can follow you?
Good to hear. Please add comments to the GitHub issue for significant_text if you have any suggestions.
I don't have a blog, but I have various demos on a YouTube channel.
I hope to add a video on the new significant_text agg once it's merged into the master branch.