Elasticsearch English words analyzer

Hi there,
I found that Elasticsearch doesn't split English words that have no
whitespace or dash between them. For example, I would like "reversestring"
to be split into "reverse" and "string". I think this is because my
configuration is wrong. I have tried several analyzers listed on the
Elasticsearch plugins page, but none of them worked. My current settings
are:

{
  "settings": {
    "analysis": {
      "filter": {
        "my_stopwords": {
          "type": "stop",
          "stopwords": [
            "a", "about", "above", "after", "again", "against", "all",
            "am", "an", "and", "any", "are", "aren't", "as", "at", "be",
            "because", "been", "before", "being", "below", "between",
            "both", "but", "by", "can't", "cannot", "could", "couldn't",
            "did", "didn't", "do", "does", "doesn't", "doing", "don't",
            "down", "during", "each", "few", "for", "from", "further",
            "had", "hadn't", "has", "hasn't", "have", "haven't", "having",
            "he", "he'd", "he'll", "he's", "her", "here", "here's", "hers",
            "herself", "him", "himself", "his", "how", "how's", "i", "i'd",
            "i'll", "i'm", "i've", "if", "in", "into", "is", "isn't", "it",
            "it's", "its", "itself", "let's", "me", "more", "most",
            "mustn't", "my", "myself", "no", "nor", "not", "of", "off",
            "on", "once", "only", "or", "other", "ought", "our", "ours",
            "ourselves", "out", "over", "own", "same", "shan't", "she",
            "she'd", "she'll", "she's", "should", "shouldn't", "so",
            "some", "such", "than", "that", "that's", "the", "their",
            "theirs", "them", "themselves", "then", "there", "there's",
            "these", "they", "they'd", "they'll", "they're", "they've",
            "this", "those", "through", "to", "too", "under", "until",
            "up", "very", "was", "wasn't", "we", "we'd", "we'll", "we're",
            "we've", "were", "weren't", "what", "what's", "when", "when's",
            "where", "where's", "which", "while", "who", "who's", "whom",
            "why", "why's", "with", "won't", "would", "wouldn't", "you",
            "you'd", "you'll", "you're", "you've", "your", "yours",
            "yourself", "yourselves", "c#"
          ]
        }
      },
      "analyzer": {
        "custom_analyzer": {
          "type": "custom",
          "tokenizer": "classic",
          "filter": ["lowercase", "my_stopwords"]
        }
      }
    }
  }
}

Thank you so much!

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/36447fb8-f483-4f89-b210-3f2b3de6915c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Is "reversestring" an English word?

But yes, this is the expected behavior.

--
David ;)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

(Reply to the original message from Peiyong Lin linpyong@gmail.com, sent 28 Sept. 2014 at 03:59.)

"reversestring" is a compound word, so splitting it requires an analyzer
that performs decompounding.

For German words, I have a solution.

Jörg
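
To make the decompounding idea concrete for the English case, here is a minimal sketch of the settings using Elasticsearch's built-in `dictionary_decompounder` token filter. The filter name `english_decompounder` and the two-entry `word_list` are made up for illustration and are not from this thread. A real setup would need a much larger dictionary, which can also be loaded from a file with `word_list_path`.

```json
{
  "settings": {
    "analysis": {
      "filter": {
        "english_decompounder": {
          "type": "dictionary_decompounder",
          "word_list": ["reverse", "string"]
        }
      },
      "analyzer": {
        "custom_analyzer": {
          "type": "custom",
          "tokenizer": "classic",
          "filter": ["lowercase", "english_decompounder"]
        }
      }
    }
  }
}
```

As far as I know, this filter keeps the original token and adds every dictionary word it finds inside it, so "reversestring" should be indexed as "reversestring", "reverse" and "string". The stop filter from the original settings can be added to the same `filter` chain if it is still needed.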

(Reply to David Pilato david@pilato.fr, sent Sun, Sep 28, 2014 at 7:46 AM.)
