Hi!
The highlighter generates fragments that begin with stopwords. How can I get fragments that do not begin with stopwords?
For instance, I currently get: " and the Caribbean during the spring and summer of 1806"
but I want: "South Atlantic and the Caribbean during the spring and summer of 1806".
Which highlighter are you using? There is no rule that implies a fragment should start with a stopword. Fragment creation differs between highlighters: for instance, the fvh, unified and plain highlighters use fragment_length to split the input into fragments. You can try the unified highlighter and set fragment_length to a large integer to make sure that sentences are not split. The unified highlighter first splits on sentence boundaries and then applies the maximum fragment length (fragment_length) to each sentence.
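For illustration, here is a minimal sketch of a search request body that follows the suggestion above, built as a plain Python dict so it can be sent with any client. The field name "content", the query text and the value 10000 are assumptions chosen for the example, not values from this thread.

```python
import json


def build_highlight_query(field, text, fragment_length=10000):
    """Build a search body that highlights `field` with the unified highlighter."""
    return {
        "query": {"match": {field: text}},
        "highlight": {
            "fields": {
                field: {
                    "type": "unified",
                    # A large value so that whole sentences are unlikely to be split.
                    "fragment_length": fragment_length,
                }
            }
        },
    }


# "content" is a hypothetical field name used only for this example.
body = build_highlight_query("content", "Caribbean")
print(json.dumps(body, indent=2))
```

With a large enough fragment_length, each fragment should correspond to a full sentence, so it cannot start in the middle of one.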
Hi!
I'm using fvh. The problem is that I do want sentences to be split, but the split should not occur at a stopword.
OK, sorry for the misunderstanding. There is currently no option to achieve this in the highlighter. The only check done when splitting a sentence is that the split falls on a word boundary.
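Since the highlighter has no such option, one possible workaround is to post-process the fragments on the client side. The sketch below is not a highlighter feature. It assumes you have the original field text, that the fragment appears in it verbatim (highlight tags already removed), and it uses a small hypothetical stopword list rather than the one from your analyzer. It grows the fragment to the left until its first word is not a stopword.

```python
import re

# Illustrative stopword list (an assumption, not the analyzer's actual list).
STOPWORDS = {"a", "an", "and", "of", "or", "the"}


def extend_fragment_start(source, fragment, stopwords=STOPWORDS):
    """Grow `fragment` leftwards inside `source` until it starts with a non-stopword."""
    plain = fragment.strip()
    start = source.find(plain)
    if start == -1:
        return plain  # fragment not found verbatim; return it unchanged
    end = start + len(plain)
    while True:
        first = re.match(r"\w+", source[start:end])
        if first is None or first.group(0).lower() not in stopwords:
            break
        # Locate the word that immediately precedes the current start.
        previous = re.search(r"(\w+)\W*$", source[:start])
        if previous is None:
            break  # reached the beginning of the source text
        start = previous.start(1)
    return source[start:end]


# Hypothetical source sentence built around the fragment from this thread.
source = ("The fleet crossed the South Atlantic and the Caribbean "
          "during the spring and summer of 1806.")
result = extend_fragment_start(
    source, " and the Caribbean during the spring and summer of 1806")
print(result)
```

Note that this only pulls in the nearest preceding non-stopword, so the fragment gets slightly longer than fragment_length, and it does not guarantee a particular phrase boundary.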