Search issue with snowball stemmer

Hello everyone,

I have the following index mapping:

curl -XPUT 'http://localhost:9200/some_content/' -d ' { "settings":{ "query_string":{ "default_con":"content", "default_operator":"AND" }, "index":{ "analysis":{ "analyzer":{ "en_analyser":{ "filter":[ "snowBallFilter" ], "type":"custom", "tokenizer":"standard" } }, "filter":{ "en_stopFilter":{ "type":"stop", "stopwords_path":"lang/stopwords_en.txt" }, "snowBallFilter":{ "type":"snowball", "language":"English" }, "wordDelimiterFilter":{ "catenate_all":false, "catenate_words":true, "catenate_numbers":true, "generate_word_parts":true, "generate_number_parts":true, "preserve_original":true, "type":"word_delimiter", "split_on_case_change":true }, "en_synonymFilter":{ "synonyms_path":"lang/synonyms_en.txt", "ignore_case":true, "type":"synonym", "expand":false }, "lengthFilter":{ "max":250, "type":"length", "min":3 } } } } }, "mappings":{ "docs":{ "_source":{ "enabled":false }, "analyzer":"en_analyser", "properties":{ "content":{ "type":"string", "index":"analyzed", "term_vector":"with_positions_offsets", "omit_norms":"true" } } } } }'

and I indexed the following document:

curl -XPOST http://localhost:9200/some_content/docs/ -d '
{
"content" : "Some sampling text formatted for text data"
}'

When I send this request:
http://epbyvitw0052:9200/some_content/docs/_search?q=sampling

I get a result:

{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.095891505,
"hits": [
{
"_index": "some_content",
"_type": "docs",
"_id": "saLfx6PYR82YR69je0JbAA",
"_score": 0.095891505
}
]
}
}

But when I send the same request without the type:
http://epbyvitw0052:9200/some_content/_search?q=sampling

I get nothing:

{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 0,
"max_score": null,
"hits": []
}
}

However, when I search for the stemmed term instead:
http://epbyvitw0052:9200/some_content/_search?q=sampl

the document is found:

{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.095891505,
"hits": [
{
"_index": "some_content",
"_type": "docs",
"_id": "saLfx6PYR82YR69je0JbAA",
"_score": 0.095891505
}
]
}
}

This issue appears when I add the snowball filter to the analyzer.
Could you explain why the system behaves this way?
Maybe I am doing something wrong.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/9b919926-3384-4d72-845a-c73790d05281%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

You should use the Analyze API to ensure that the tokens you are producing
are correct:
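
A minimal sketch of such a check, assuming the host (localhost:9200), index name (some_content), and analyzer name (en_analyser) from the original post, and the Elasticsearch 1.x-era form of the Analyze API in which the text to analyze is sent as the plain request body. The helper names analyze_url and analyze are hypothetical, and later Elasticsearch versions changed this request format, so treat it as an illustration rather than a definitive command:

```shell
#!/bin/sh
# Assumed values, copied from the curl commands in the original post.
ES_HOST="${ES_HOST:-http://localhost:9200}"
INDEX="some_content"
ANALYZER="en_analyser"

# Build the Analyze API URL for the custom analyzer defined in the index settings.
analyze_url() {
  printf '%s/%s/_analyze?analyzer=%s' "$ES_HOST" "$INDEX" "$ANALYZER"
}

# Send the text as the request body and print the tokens the analyzer produces.
# If no node is reachable, report that instead of failing silently.
analyze() {
  curl -s --max-time 5 -XGET "$(analyze_url)" -d "$1" \
    || echo "request failed: is Elasticsearch running at $ES_HOST?"
}

# Analyze the search term and the indexed sentence with the same analyzer.
analyze 'sampling'
analyze 'Some sampling text formatted for text data'
```

If the snowball filter is applied as intended, the token returned for "sampling" should be the stemmed form rather than the original word, which would be consistent with q=sampl matching in the original post.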

--
Ivan

At first I thought the cause was in the analyzer settings, but I see the
same issue when I use other stemmers. If I don't use any stemmer, the
request http://epbyvitw0052:9200/some_content/_search?q=sampling
returns the expected document. I think the problem is in how stemmer
filters are applied.

On Thursday, May 29, 2014 at 19:45:12 UTC+3, Ivan Brusic wrote:

You should use the Analyze API to ensure that the tokens you are producing
are correct:

--
Ivan
