Hi Team,
I am using Elasticsearch and its Java API, version 1.7.1. I have a highlighting problem with boundary characters.
Here is how I set the source content when indexing:
XContentBuilder source = jsonBuilder().startObject();
source.field(PROPERTY_BOOK_ID, bookId)
.field(PROPERTY_CONTENT, parsedContent)
.field("term_vector", "with_positions_offsets")
.field(PROPERTY_FILENAME, file.getName())
.field(PROPERTY_ATTACHMENT, Base64.encodeBase64String(FileUtils.readFileToByteArray(file)));
As per the documentation, boundary characters work with "term_vector" set to "with_positions_offsets".
But when I query Elasticsearch with boundary characters, it gives me the wrong response. Here is my search query, with the search term "poem":
QueryBuilder query = boolQuery().must(QueryBuilders.textPhraseQuery(PROPERTY_BOOK_ID, bookId))
.must(QueryBuilders.queryStringQuery("*"+searchTerm+"*"));
Map<String, Object> highlighterOptions = new HashMap<>();
highlighterOptions.put("boundary_chars", "s.,!?\\t\\n\b");
final SearchResponse response = searchClientService.getClient()
.prepareSearch(INDEX_NAME).setTypes(INDEX_TYPE)
.setHighlighterQuery(query)
.addHighlightedField(PROPERTY_CONTENT)
.setHighlighterOptions(highlighterOptions)
.setExplain(true)
.setSize(5000)
.setFrom(0)
.setHighlighterBoundaryMaxScan(10)
.setHighlighterFragmentSize(50)
.setHighlighterNumOfFragments(5000)
.execute().actionGet();
Result:
0)English Literature poetry book, with poems from leading
1)you will
enjoy these poems during your GCSE
2)course and later in life.
Many of the poems deal
3). There are poems that will reflect your own ideas
4)you make the most of the poems and of your GCSE. It
5)in writing about and comparing poems for GCSE
6)you today.
Poems past and present – the AQA
Expected:
0)English Literature poetry book, with poems from leading
1)enjoy these poems during your GCSE
2)Many of the poems deal
3)There are poems that will reflect your own ideas
4)you make the most of the poems and of your GCSE.
5)in writing about and comparing poems for GCSE
6)Poems past and present – the AQA
Did I miss something in the query or while indexing the document? Or did I misunderstand the boundary characters concept, given that the excerpts returned by Elasticsearch differ from the expected result above?
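To make my understanding of the concept explicit, here is a simplified, self-contained sketch of how I assume the boundary scan behaves: starting from a fragment's initial offset, the highlighter looks backwards for at most boundary_max_scan characters and, if it finds one of the boundary_chars, starts the fragment right after it. The names BoundaryScanSketch, findStartOffset and toSet are made up for illustration. This is not the actual Elasticsearch or Lucene code, only my reading of the documentation.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration of a backward boundary scan; not the real highlighter code.
class BoundaryScanSketch {

    // Walks backwards from `start` for at most `maxScan` characters and returns
    // the offset just after the nearest boundary character. If no boundary is
    // found within the scan window, the original `start` is returned unchanged.
    static int findStartOffset(String text, int start, int maxScan, Set<Character> boundaryChars) {
        if (start < 1 || start > text.length()) {
            return start; // nothing sensible to scan
        }
        int offset = start;
        for (int scanned = 0; scanned < maxScan && offset > 0; scanned++) {
            if (boundaryChars.contains(text.charAt(offset - 1))) {
                return offset;
            }
            offset--;
        }
        // Reaching the very beginning of the text is treated as a boundary too.
        return offset == 0 ? 0 : start;
    }

    // Converts a string such as ".\n" into a set of its characters.
    static Set<Character> toSet(String chars) {
        Set<Character> set = new HashSet<>();
        for (char c : chars.toCharArray()) {
            set.add(c);
        }
        return set;
    }

    public static void main(String[] args) {
        String text = "in life.\nMany of the poems deal";
        Set<Character> boundaries = toSet(".\n");

        // Offset 14 points at "of"; the newline before "Many" is within 10 characters.
        int wide = findStartOffset(text, 14, 10, boundaries);
        System.out.println(wide + " -> " + text.substring(wide));

        // With a scan window of 3 no boundary is reached, so the offset stays at 14.
        int narrow = findStartOffset(text, 14, 3, boundaries);
        System.out.println(narrow + " -> " + text.substring(narrow));
    }
}
```

If this reading is right, a fragment only snaps to a sentence start when a boundary character lies within boundary_max_scan characters of the original offset, which might explain why some of my excerpts still begin mid-sentence.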
Thanks in advance