I'm trying to use Elasticsearch to locate important text, and then run a second process that extracts information from the highlighted fragment.
The problem is that the text I need to extract often comes after the word I'm searching for, and the highlighted fragment doesn't include it. E.g. in the document "Numbers: 1 2 3 Colours: red green blue", I search for "colours". The result I see is "3 Colours".
It seems that the fragment Elasticsearch returns often ends right after "colours".
I have tried modifying the fragment size and it only increases the text BEFORE my search term and none after it.
Is there a way to make sure that "colours" ends up in the middle of the fragment?
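For reference, the kind of request I'm describing looks roughly like the sketch below. The field name `content` and the helper `build_highlight_query` are made up for illustration and are not from my real mapping; `fragment_size` and `number_of_fragments` are the standard highlighter options I have been adjusting.

```python
import json


def build_highlight_query(term, field="content", fragment_size=100):
    """Build a search body that matches `term` and highlights it in `field`.

    `field` is a hypothetical field name; replace it with the real one.
    """
    return {
        "query": {"match": {field: term}},
        "highlight": {
            "fields": {
                field: {
                    # Approximate size of each returned fragment, in characters.
                    "fragment_size": fragment_size,
                    # Return only the single best fragment.
                    "number_of_fragments": 1,
                }
            }
        },
    }


body = build_highlight_query("colours")
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a `_search` request. Raising `fragment_size` here is the change that, in my case, only adds text before the match.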
Could you provide a full recreation script, as described in "About the Elasticsearch category"? It will help us better understand what you are doing. Please try to keep the example as simple as possible.
A full reproduction script will help readers to understand, reproduce and if needed fix your problem. It will also most likely help to get a faster answer.
The idea is to use Elasticsearch to locate the relevant data around the label "Paint Colours", and then process the returned fragment to extract the data that follows the label.
For example, if there is a document like:
...some text...
Word1
Word2
Paint Colours: Red, Green
...more text...
I would like to extract [Red, Green] out of this document.
The problem I am facing is that Elasticsearch returns highlighted fragments which look like this:
Word2 Paint Colours
And if I increase the fragment size, it looks something like this:
Word1
Word2 Paint Colours
But what I'm really looking for is to have the search term somewhere in the middle of the fragment so that there is text AFTER the search term as well. Like this:
Word2 Paint Colours: Red, Green
This would help extract required text that is next to the search term.
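To make the second step concrete, here is a minimal sketch of the extraction I'd run on a fragment once it contains the text after the label. It assumes the default `<em>` highlight tags and a "Label: value, value" layout on a single line; the function name `extract_after_label` is just illustrative.

```python
import re


def extract_after_label(fragment, label):
    """Return the comma-separated values that follow `label:` on the same line."""
    # Remove the highlighter's <em>...</em> markers (assumed default tags).
    plain = re.sub(r"</?em>", "", fragment)
    # Match the label, a colon, then capture the rest of that line.
    pattern = re.compile(re.escape(label) + r"[ \t]*:[ \t]*([^\r\n]*)", re.IGNORECASE)
    match = pattern.search(plain)
    if match is None:
        return []
    return [item.strip() for item in match.group(1).split(",") if item.strip()]


fragment = "Word2 <em>Paint</em> <em>Colours</em>: Red, Green"
print(extract_after_label(fragment, "Paint Colours"))  # ['Red', 'Green']
```

With the fragments I currently get ("Word2 Paint Colours"), the same call returns an empty list, which is why I need the text after the match to be included.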