The highlighter generates fragments that begin with stopwords. How can I make it produce fragments that do not begin with stopwords?
For instance, I currently get: " and the Caribbean during the spring and summer of 1806"
but I want: "South Atlantic and the Caribbean during the spring and summer of 1806".
Which highlighter are you using? There is no rule that makes a fragment start with a stopword. Fragment creation differs between highlighters. For instance, the plain highlighter uses fragment_size to split the input into fragments. You can try the unified highlighter and set fragment_size to a large integer to make sure that sentences are not split. The unified highlighter first splits on sentence boundaries and then applies the fragment_size limit to each sentence.
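As a rough illustration of that suggestion, here is a minimal sketch of a search request body that selects the unified highlighter with a large fragment size. The size limit is passed as fragment_size, the option name in the Elasticsearch highlight API. The field name content, the query text, the value 10000, and the helper build_highlight_request are placeholders invented for this example, not something from the thread.

```python
import json


def build_highlight_request(field, query_text, fragment_size=10000):
    """Build a search body that highlights `field` with the unified highlighter.

    `field`, `query_text` and the default size are illustrative assumptions.
    """
    return {
        "query": {"match": {field: query_text}},
        "highlight": {
            "fields": {
                field: {
                    # Unified highlighter: breaks on sentence boundaries first.
                    "type": "unified",
                    # Large limit so that most sentences fit in one fragment.
                    "fragment_size": fragment_size,
                }
            }
        },
    }


# Print the JSON body that would be sent to the _search endpoint.
print(json.dumps(build_highlight_request("content", "Caribbean"), indent=2))
```

Sentences longer than the chosen limit would still be cut, so the value has to be sized for the longest sentence you expect.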
I'm using fvh. The problem is that I do want sentences to be split, but a split should not occur at a stopword.
OK, sorry for the misunderstanding. There is currently no option to achieve this in the highlighter. The only check done when splitting a sentence is that the split does not fall in the middle of a word.
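Since the highlighter has no such option, one possible workaround is to post-process the returned fragments on the client. The sketch below is only an assumption about how that could look: it drops leading stopwords from a fragment, so it shortens the fragment instead of extending it back to "South Atlantic". The stopword set and the function name trim_leading_stopwords are hypothetical and should be matched to your own analyzer's stopword list.

```python
# Hypothetical stopword list; replace it with the one your analyzer uses.
STOPWORDS = frozenset({"a", "an", "and", "the", "of", "or", "in", "to"})


def trim_leading_stopwords(fragment, stopwords=STOPWORDS):
    """Return the fragment without its leading stopwords.

    Words are split on whitespace, so runs of whitespace collapse to single
    spaces. A word wrapped in highlight tags (e.g. <em>the</em>) is not in
    the stopword set, so trimming stops there and highlighted terms are kept.
    """
    words = fragment.split()
    start = 0
    while start < len(words) and words[start].lower() in stopwords:
        start += 1
    return " ".join(words[start:])


print(trim_leading_stopwords(" and the Caribbean during the spring and summer of 1806"))
```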