Can I mix English and Chinese in a search with the elasticsearch Chinese analysis plugin?


(hau) #1

I am using tire to index JSON representations of objects from my Rails
application on an elasticsearch server. An article object's JSON looks like:
{
"title": "article subject line",
"tags": ["chinese", "food"],
"content": "article content"
}

where the field values can contain Chinese. Another example, with Chinese
content, could be:
{
"title": "中华料理Chinese Cuisine",
"tags": ["中国", "食物", "Chinese", "food"],
"content": "中国菜很好吃。"
}

I set up elasticsearch with the Chinese analysis plugin.

When I searched for "中华料理" in the subject using a query_string query,
it correctly returned the document with that string in the subject. But when
I searched for "Chinese Cuisine" in the subject using a query_string
query, it did not find anything.
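
For reference, here is a minimal sketch of the kind of request body such a query_string search sends. The helper name `subject_query` is hypothetical, and using `title` as the default field is an assumption about which field holds the subject; neither comes from tire itself.

```ruby
require 'json'

# Hypothetical helper (not part of tire): builds the body of a
# query_string search. The default field "title" is an assumption
# about where the subject line is stored.
def subject_query(text, field = 'title')
  { query: { query_string: { default_field: field, query: text } } }
end

# Print the JSON body for the English search that found nothing.
puts JSON.pretty_generate(subject_query('Chinese Cuisine'))
```

The same body with '中华料理' as the text corresponds to the Chinese search that did return the document.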

Is it possible to do both English and Chinese searches on the same instance
of elasticsearch with the Chinese analysis plugin? If so, how should I set
that up? Thanks for helping.


(hau) #2

I noticed this behavior too. If the content has English and Chinese
separated by a space, then searches in both Chinese and English return the
correct result. That is, if the JSON document is
{
"title": "中华料理 Chinese Cuisine",
"tags": ["中国", "食物", "Chinese", "food"],
"content": "中国菜很好吃。"
}

then a search with query string "中华料理" or a search with query string
"Chinese Cuisine" both result in a match for the document.

If there is no space, only the search with the Chinese query string works.

How can I make both Chinese and English query strings work if there's no
space between the Chinese and the English in the indexed document?
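
Given the observation above that a separating space makes both searches match, one possible workaround is to insert that space before the text is indexed. The following is only a sketch under that assumption: `separate_scripts` is a hypothetical helper, not something provided by tire or the analysis plugin, and it does not change how the plugin itself tokenizes text.

```ruby
# Assumes Ruby 2.0+, where source files default to UTF-8.

# Hypothetical helper: insert a space wherever a Han character is
# directly adjacent to an ASCII letter or digit, in either order.
def separate_scripts(text)
  text
    .gsub(/(\p{Han})([A-Za-z0-9])/, '\1 \2')  # Han followed by Latin
    .gsub(/([A-Za-z0-9])(\p{Han})/, '\1 \2')  # Latin followed by Han
end

puts separate_scripts("中华料理Chinese Cuisine")
# prints "中华料理 Chinese Cuisine"
```

Text that already contains the space is left unchanged. Note that this alters the indexed string, so it may be preferable to apply it to a copy of the field that is used only for searching.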

