Optimize corpus before training

Hi everyone on the Elastic team.

I am building a pipeline to optimize a corpus before submitting it to OpenNLP for training. The pipeline has two steps:

  1. Remove stopwords
  2. Apply a lemmatizer or stemmer
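
To make the two steps concrete, here is a rough sketch in plain Python of the kind of output I am after. The stopword list, the `naive_stem` suffix rule, and the `preprocess` name are all placeholders for illustration. They are not a real stemmer or lemmatizer, and not anything provided by Elastic or OpenNLP.

```python
import re

# Placeholder stopword list for illustration only; a real list would be much longer.
STOPWORDS = {"a", "an", "and", "the", "is", "are", "of", "to", "in"}


def naive_stem(token: str) -> str:
    """Strip a few common English suffixes (a stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "s"):
        # Keep at least three characters so short words are left untouched.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text: str) -> str:
    """Lowercase, tokenize, drop stopwords, then stem each remaining token."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    kept = [t for t in tokens if t not in STOPWORDS]
    return " ".join(naive_stem(t) for t in kept)


if __name__ == "__main__":
    print(preprocess("The dogs are barking in the yard"))
```

The result of `preprocess` is the cleaned text I would then write to the OpenNLP training file.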

Does Elastic have functionality where I can pass in a text and get back the same text with stopwords removed and a lemmatizer or stemmer applied?

Best regards and Happy New Year!
