Stop words and Keyword tokenizer

What would be the use case for such a process (removing stop words without
tokenization)?

This may be a good read btw:

--

Itamar Syn-Hershko
http://code972.com | @synhershko https://twitter.com/synhershko
Freelance Developer & Consultant
Author of RavenDB in Action http://manning.com/synhershko/

On Thu, Aug 28, 2014 at 9:48 PM, German Carrillo carrillo.german@gmail.com
wrote:

Hi all,

I'm looking for a way to remove stop words from tokens returned by a
keyword tokenizer, i.e., I'd like to obtain the original text without stop
words after the analysis process.

Sample data looks like:
"El corregimiento de Mulaló, jurisdicción del municipio de Yumbo (Valle del Cauca)"
After the lowercase token filter:
"el corregimiento de mulaló, jurisdicción del municipio de yumbo (valle del cauca)"
After the ascii folding token filter:
"el corregimiento de mulalo, jurisdiccion del municipio de yumbo (valle del cauca)"
After removing stop words:
"corregimiento mulalo, municipio yumbo (valle cauca)"

The stop words (currently) are: ["la", "el", "de", "del", "los",
"las", "jurisdiccion"]
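
To make the expected behaviour concrete, here is a minimal Python sketch that imitates the three steps above outside of Elasticsearch. It is only an approximation: the function names (fold_ascii, remove_stop_words, analyze) are made up for illustration, and the NFKD-based folding is a stand-in that is not guaranteed to match the asciifolding token filter for every character.

```python
import re
import unicodedata

STOP_WORDS = ["la", "el", "de", "del", "los", "las", "jurisdiccion"]


def fold_ascii(text):
    # Decompose accented characters, then drop the combining marks.
    # Rough stand-in for ASCII folding, not an exact equivalent.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def remove_stop_words(text, stop_words):
    # Build one alternation from the list; \b keeps "de" from matching
    # inside "del", and the trailing \s* swallows the following whitespace.
    ordered = sorted(stop_words, key=len, reverse=True)
    alternation = "|".join(re.escape(word) for word in ordered)
    pattern = re.compile(r"\b(?:" + alternation + r")\b\s*")
    return pattern.sub("", text).strip()


def analyze(text):
    # lowercase -> fold accents -> strip stop words, on the whole string
    return remove_stop_words(fold_ascii(text.lower()), STOP_WORDS)


sample = ("El corregimiento de Mulaló, jurisdicción del municipio "
          "de Yumbo (Valle del Cauca)")
print(analyze(sample))
# corregimiento mulalo, municipio yumbo (valle cauca)
```

The stop word list stays the single source of truth here; the regular expression is derived from it rather than written by hand.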

Is the pattern replace token filter the only (or best) way to go for such
a task?

I'd really prefer to specify a stop words list rather than write custom
regular expressions, since I know a list works perfectly fine with other
tokenizers.
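
If the pattern replace token filter does turn out to be the way to go, one possible way to avoid hand-written regular expressions is to generate the pattern from the stop words list. The Python sketch below builds index analysis settings along those lines. The names stop_pattern, keyword_no_stops and build_analysis_settings are hypothetical, the sketch assumes every stop word is a plain alphanumeric word (so no regex escaping is needed), and it has not been verified against a running cluster.

```python
import json

STOP_WORDS = ["la", "el", "de", "del", "los", "las", "jurisdiccion"]


def build_analysis_settings(stop_words):
    # Accept only plain alphanumeric words so the generated pattern
    # needs no escaping for the regex engine on the server side.
    for word in stop_words:
        if not word.isalnum():
            raise ValueError("unsupported stop word: %r" % word)
    alternation = "|".join(sorted(stop_words, key=len, reverse=True))
    pattern = r"\b(?:" + alternation + r")\b\s*"
    return {
        "analysis": {
            "filter": {
                # hypothetical filter name; pattern_replace is the filter type
                "stop_pattern": {
                    "type": "pattern_replace",
                    "pattern": pattern,
                    "replacement": "",
                }
            },
            "analyzer": {
                # hypothetical analyzer name
                "keyword_no_stops": {
                    "type": "custom",
                    "tokenizer": "keyword",
                    "filter": ["lowercase", "asciifolding",
                               "stop_pattern", "trim"],
                }
            },
        }
    }


print(json.dumps(build_analysis_settings(STOP_WORDS), indent=2))
```

The trim filter at the end is there to drop whitespace left behind when the last word of the text is a stop word.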

Regards,

Germán

--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/038ff037-ccf3-4aca-b0c0-bb421531c495%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.