Despite reading your tips, I don't think I'm making progress. My original
example was not complete. Let me elaborate.
I want to search a class of documents that are JSON representations of
Article objects. An Article could look like
{
  "title": "article subject line",
  "tags": ["chinese", "food"],
  "content": "article content"
}
where I want to do:
- substring match on title
- exact match on tags
- fulltext search on content
The articles are mostly in Chinese, but some mix Chinese and English.
When I defined the mapping, I did
settings analysis: {
  filter: {
    smartcn_word: {
      'type' => 'smartcn_word'
    }
  },
  analyzer: {
    smart_chinese: {
      'tokenizer' => 'smartcn_sentence',
      'filter' => ['smartcn_word'],
      'type' => 'smart_chinese'
    }
  }
} do
  mapping do
    indexes :title, type: 'string', analyzer: 'keyword'
    indexes :content, type: 'string', analyzer: 'smart_chinese'
    indexes :tags, type: 'string', analyzer: 'keyword'
  end
end
which means I used the Chinese analyzer for content and keyword analyzer
for title and tags.
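To make sure I understand what that mapping implies, here is a toy model in plain Ruby. It is not Elasticsearch or Tire code: keyword_analyze and term_match? are names I made up, and the only assumptions are that the keyword analyzer keeps the whole field value as a single token and that a term query compares its text to indexed tokens without analyzing it.

```ruby
# Toy model, NOT Elasticsearch code: the keyword analyzer is assumed to
# emit the entire field value as one token.
def keyword_analyze(value)
  [value]
end

# A term query is assumed to match only when the query text is exactly
# equal to one of the indexed tokens (no analysis of the query text).
def term_match?(tokens, text)
  tokens.include?(text)
end

title_tokens = keyword_analyze("article subject line")
tag_tokens   = ["chinese", "food"].flat_map { |tag| keyword_analyze(tag) }

puts term_match?(title_tokens, "subject")               # => false
puts term_match?(title_tokens, "article subject line")  # => true
puts term_match?(tag_tokens, "food")                    # => true

# What I actually want for title is closer to a substring test:
puts title_tokens.any? { |token| token.include?("subject") }  # => true
```

If this model is right, a term query on title can only hit when the query equals the entire title, which would be consistent with the empty results I describe below.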
When I performed the search, the query JSON looked like this (I got this
from the logs):
'{"query":{"bool":{"should":[
  {"term":{"title":"search words"}},
  {"query_string":{"query":"search words","default_field":"content","default_operator":"AND"}},
  {"terms":{"tag_list":["search", "words"]}}
],"minimum_number_should_match":1}}}'
Fulltext search on the content worked fine, but I had problems getting the
substring match on title and the exact match on tags to work. Both returned
zero results. I only got results when I queried against _all, i.e. when
the query JSON was:
'{"query":{"query_string":{"query":"search words","default_operator":"AND"}}}'
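For easier reading, both logged queries can be rebuilt as Ruby hashes. This is only a sketch: bool_query and all_query are hypothetical helper names, not part of Tire, and the hashes simply mirror the JSON from my logs.

```ruby
require 'json'

# Hypothetical helper: mirrors the bool query found in the logs.
def bool_query(text)
  {
    query: {
      bool: {
        should: [
          { term: { title: text } },
          { query_string: { query: text,
                            default_field: 'content',
                            default_operator: 'AND' } },
          # The logged query targets "tag_list", not "tags".
          { terms: { tag_list: text.split } }
        ],
        minimum_number_should_match: 1
      }
    }
  }
end

# Hypothetical helper: mirrors the query against _all that did return results.
def all_query(text)
  { query: { query_string: { query: text, default_operator: 'AND' } } }
end

puts JSON.pretty_generate(bool_query('search words'))
puts JSON.generate(all_query('search words'))
```

Laid out this way, one thing stands out: the terms clause queries tag_list, while the mapping declares tags. I don't know yet whether that matters, since it depends on how the model serializes its tags.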
I guess my questions are:
- Am I correct in using "term" search and keyword analyzer for substring
match in the title field?
- Am I correct in using the "terms" search and keyword analyzer for exact
match for tags?
- Am I correct in using bool with three "should" queries in it?
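To make the first two questions concrete, here is a variant I could compare against, written as a Ruby hash. It is an untested sketch: candidate_query is a hypothetical helper name, the wildcard query is swapped in for the term query on title as one possible way to get substring matching on a keyword-analyzed field, and the terms clause uses the tags field from the mapping rather than tag_list.

```ruby
require 'json'

# Hypothetical helper, untested against a real index.
# Assumption: title is indexed as a single token (keyword analyzer), so a
# wildcard pattern wrapped in "*" behaves like a substring match on it.
# Note: "*" or "?" inside the search text are not escaped here.
def candidate_query(text)
  {
    query: {
      bool: {
        should: [
          { wildcard: { title: "*#{text}*" } },
          { query_string: { query: text,
                            default_field: 'content',
                            default_operator: 'AND' } },
          { terms: { tags: text.split } }
        ],
        minimum_number_should_match: 1
      }
    }
  }
end

puts JSON.generate(candidate_query('search words'))
```

Leading wildcards can be slow on large indexes, so this is meant only as a way to check whether the term query is what makes the title clause come back empty.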
I am seeing other issues when I mix Chinese and English, but I think that
should go in another thread.
Please pardon me if my questions look dumb. I am new to elasticsearch (and
lucene), but I did read through its Guide and Tutorials. I'd appreciate it
if someone with more experience could help me figure this out.
On January 17, 2012 at 12:17 PM, Matt matt.chu@gmail.com wrote:
I'm not sure exactly what Article.search does: does it just call the
endpoint, or can you use it to build a query, and in particular pick the
type of query?
I've found that it's easier to first stick to the straight ES HTTP API, so
you can figure out exactly how the query DSL works (it's non-trivial, and
can be a little confusing), before trying to use one of the client
libraries. I've been working through this same situation, but with the
Python pyes library, also in a multilingual context, and the client can
mask away some of the details of the API.
To be honest, I'm surprised that "恭喜发财" is parsed as a single term; I
thought it would be split into "恭喜" and "发财", but I don't really know
Chinese (haha). Going back to the first point, are you doing a term query?
--
-hoki