Russian search does not work for me

Please let me know what I'm doing wrong or where to look/debug.

  1. I git cloned https://github.com/asyncee/elasticsearch-russian-config/
  2. Downloaded elasticsearch-1.4.2 and copied bin and lib into the same dir.
  3. Installed https://github.com/imotov/elasticsearch-analysis-morphology 1.2

Then I ran it.

It now looks like I have a Russian analyzer; at least this test gives correct
tokens:

curl -XGET "http://localhost:9200/_analyze?analyzer=russian&text=Веселые%20истории%20про%20котят"

...But then I create an index:

curl -XPOST "http://localhost:9200/blog2" -d'
{
  "settings": {
    "analysis": {
      "filter": {
        "ru_stop": {
          "type": "stop",
          "stopwords": "russian"
        },
        "ru_stemmer": {
          "type": "stemmer",
          "language": "russian"
        }
      },
      "analyzer": {
        "default": {
          "char_filter": [
            "html_strip"
          ],
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "ru_stop",
            "ru_stemmer"
          ]
        }
      }
    }
  },
  "mappings": {
    "post": {
      "properties": {
        "content": {
          "type": "string"
        },
        "published_at": {
          "type": "date",
          "format": "dateOptionalTime"
        },
        "tags": {
          "type": "string",
          "index": "not_analyzed"
        },
        "title": {
          "type": "string"
        }
      }
    }
  }
}'

And insert the record:
curl -XPUT "http://localhost:9200/blog2/post/2" -d'
{
  "title": "Веселые щенки",
  "content": "Смешная история про щенков",
  "tags": [
    "щенки",
    "смешная история"
  ],
  "published_at": "2014-08-12T20:44:42+00:00"
}'

Now I can find it with
-> POST http://localhost:9200/blog2/post/_search
{
  "query": {
    "match": {
      "title": "щенки"
    }
  }
}

But not if I search for the singular "щенок" ("puppy") instead of the plural "щенки" ("puppies").

So, basically, the morphology doesn't work.
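
One place to look, as a sketch: the `_analyze` test above uses the node-wide `russian` analyzer, whereas a search on `blog2` goes through the index's own `default` analyzer. Calling `_analyze` against the index for both word forms shows whether they reduce to the same token. The helper name `analyze_url` is made up for illustration; it only builds URLs and sends nothing.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def analyze_url(base, text, index=None, analyzer=None):
    """Build an _analyze URL.

    With an index and no analyzer, Elasticsearch applies that index's
    default analyzer; without an index, a named analyzer is needed.
    """
    path = f"/{index}/_analyze" if index else "/_analyze"
    params = {"text": text}
    if analyzer:
        params["analyzer"] = analyzer
    # urlencode percent-encodes the Cyrillic text as UTF-8
    return f"{base}{path}?{urlencode(params)}"


# Print one URL per word form; fetch each with curl -XGET and compare tokens.
for word in ("щенки", "щенок"):
    print(analyze_url("http://localhost:9200", word, index="blog2"))
```

If the two calls return different tokens, the analyzer configured on the index is not normalizing the forms, regardless of what the node-wide `russian` analyzer does.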

P.S. I'm kind of new to elasticsearch.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/8368b6a9-5b21-4f5a-bf8e-c3ad6336f937%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

I think you need "type": "custom" inside "analyzer": {"default":{}}.
On Dec 21, 2014 5:08 PM, "Ilya Kantor" iliakan@gmail.com wrote:



Hi Nikolas,

That didn't help.

For brevity, here's the log of what I'm doing:

curl -XDELETE "http://localhost:9200/blog"

curl -XPOST "http://localhost:9200/blog" -d'
{
  "settings": {
    "analysis": {
      "filter": {
        "ru_stop": {
          "type": "stop",
          "stopwords": "russian"
        },
        "ru_stemmer": {
          "type": "stemmer",
          "language": "russian"
        }
      },
      "analyzer": {
        "default": {
          "type": "custom",
          "char_filter": [
            "html_strip"
          ],
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "ru_stop",
            "ru_stemmer"
          ]
        }
      }
    }
  },
  "mappings": {
    "post": {
      "properties": {
        "content": {
          "type": "string"
        },
        "published_at": {
          "type": "date",
          "format": "dateOptionalTime"
        },
        "title": {
          "type": "string"
        }
      }
    }
  }
}'

curl -XPUT "http://localhost:9200/blog/post/1" -d'
{
  "title": "Веселые щенки",
  "content": "Смешная история про щенков",
  "published_at": "2014-08-12T20:44:42+00:00"
}'

curl -XPOST "http://localhost:9200/blog/post/_search" -d'
{
  "query": {
    "match": {
      "title": "щенок"
    }
  }
}'

Returns nothing (morphology doesn't work).

Config: https://github.com/asyncee/elasticsearch-russian-config/blob/master/config/elasticsearch.yml
Elasticsearch: 1.4.2.

P.S. I created the index by following a tutorial, so I'm not sure the
settings are correct.
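
For comparison, here is a sketch of the same `default` analyzer with the plugin's token filter in place of the Snowball `ru_stemmer`. It assumes the installed elasticsearch-analysis-morphology plugin registers a token filter named `russian_morphology` (as its README describes; not verified against version 1.2 here). `build_settings` is a hypothetical helper that only prints the request body, and the `ru_stop` definition is copied from the post unchanged.

```python
import json


def build_settings(morph_filter="russian_morphology"):
    """Return index settings whose default analyzer ends with a morphology filter."""
    return {
        "settings": {
            "analysis": {
                "filter": {
                    # copied as-is from the index definition in the post
                    "ru_stop": {"type": "stop", "stopwords": "russian"},
                },
                "analyzer": {
                    "default": {
                        "type": "custom",
                        "char_filter": ["html_strip"],
                        "tokenizer": "standard",
                        # morph_filter is an assumed plugin-provided filter name
                        "filter": ["lowercase", "ru_stop", morph_filter],
                    }
                },
            }
        }
    }


# Print the JSON body that would replace the "settings" part of the request.
print(json.dumps(build_settings(), ensure_ascii=False, indent=2))
```

The printed body would go into the same `curl -XPOST "http://localhost:9200/blog"` call after deleting the old index; documents have to be indexed again for a changed analyzer to apply to them.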

On Monday, December 22, 2014 at 1:21:02 UTC+3, Nikolas Everett wrote:

I think you need "type": "custom" inside "analyzer": {"default":{}}.
