(Benjamin) #1

Hi!
The highlighter generates fragments that begin with stopwords. How can I make it produce fragments that do not begin with stopwords?
For instance, I currently get: " and the Caribbean during the spring and summer of 1806"
but I want: "South Atlantic and the Caribbean during the spring and summer of 1806".

(Jimferenczi) #2

Which highlighter are you using? There is no rule that says a fragment should start with a stopword. Fragment creation differs between highlighters; for instance, the fvh, unified and plain highlighters use fragment_length to split the input into fragments. You can try the unified highlighter and set fragment_length to a large integer to make sure that sentences are not split. The unified highlighter first splits on sentence boundaries and then applies max_fragment_length to each sentence.
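
A minimal sketch of such a request body, built as a plain Python dict so it can be inspected without a running cluster. It assumes the fragment length setting is exposed as `fragment_size` in the highlight section of the search request (to the best of my knowledge that is the option name); the field name `content`, the query text, and the value 10000 are placeholders, not values from this thread.

```python
import json


def build_highlight_request(field, query_text, fragment_size=10000):
    """Build a search body that asks for the unified highlighter.

    `field`, `query_text` and the default `fragment_size` are illustrative
    placeholders; adjust them to your own mapping and data.
    """
    return {
        "query": {"match": {field: query_text}},
        "highlight": {
            "fields": {
                field: {
                    # Unified highlighter: splits on sentence boundaries first.
                    "type": "unified",
                    # Large value so that a sentence is unlikely to be cut.
                    "fragment_size": fragment_size,
                }
            }
        },
    }


body = build_highlight_request("content", "Caribbean")
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a `_search` request. With a large enough value, each fragment should be a whole sentence, which sidesteps fragments that start mid-sentence on a stopword, at the cost of longer fragments.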

(Benjamin) #3

Hi!
I'm using fvh. The problem is that I do want sentences to be split, but the split should not occur on stopwords.

(Jimferenczi) #4

OK, sorry for the misunderstanding. There is currently no option in the highlighter to achieve this. The only check done when splitting a sentence is that the split does not cross a word boundary.
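
Since the highlighter has no such option, one possible workaround is to adjust fragments on the client side. The sketch below is not a highlighter feature, only an illustration: it assumes you have the original field text available, that the fragment has already had its highlight tags stripped, and that the stopword set shown here (a hypothetical, tiny list) matches your analyzer. It moves the fragment start back one word at a time until the fragment no longer begins with a stopword.

```python
import re

# Illustrative stopword list; replace it with the one your analyzer uses.
STOPWORDS = {"a", "an", "and", "the", "of", "or", "in", "to"}


def extend_past_stopwords(source, fragment, stopwords=STOPWORDS):
    """Extend `fragment` to the left within `source` until its first word
    is not a stopword. Returns the fragment unchanged if it is not found."""
    frag = fragment.strip()
    start = source.find(frag)
    if start == -1:
        return fragment
    end = start + len(frag)
    while True:
        first = re.match(r"\w+", source[start:end])
        if first is None or first.group(0).lower() not in stopwords:
            break
        # Locate the word immediately before the current start.
        prev = re.search(r"(\w+)\W*$", source[:start])
        if prev is None:
            break  # Reached the beginning of the text.
        start = prev.start(1)
    return source[start:end]


text = "Ships cruised the South Atlantic and the Caribbean during the spring and summer of 1806."
fragment = " and the Caribbean during the spring and summer of 1806"
print(extend_past_stopwords(text, fragment))
# prints "Atlantic and the Caribbean during the spring and summer of 1806"
```

Note that this only guarantees the fragment starts on a non-stopword; it stops at "Atlantic" rather than "South Atlantic", and the result can exceed the configured fragment length.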
