Questions about analyzer


(Wing) #1

I have some questions about analyzer:

  1. How can we check which analyzer is used for each
    field during indexing?

  2. by checking the QueryStringQueryBuilder, the default analyzer used
    is "smart search analyzer" and so I have this problem:

the query string contains some words that are not stripped away by the
"smart search analyzer" but are stripped away by the "default" analyzer
used to index and analyze the fields, so searching for those words
yields no result

and I tried setting the QueryStringQueryBuilder analyzer to "standard"
(I just guessed that the default analyzer is standard, which is why I
asked question 1 above) and then the search returns results

so, is there any problem with changing from the "smart" to the "standard" analyzer?

or is there any documentation that can help to understand the logic of
different analyzers?

Thanks,
Wing


(simonw-2) #2

hey,

On Friday, July 20, 2012 10:03:04 AM UTC+2, Yiu Wing TSANG wrote:

I have some questions about analyzer:

  1. How can we check which analyzer is used for each
    field during indexing?

check your field mapping. if you didn't provide any mapping for the field,
standard should be used for both indexing and searching.

you can explicitly define a search / index analyzer in the mapping per
field:

"index_analyzer" : "standard",
"search_analyzer" : "your_super_smart_analyzer"

see this:
http://www.elasticsearch.org/guide/reference/mapping/root-object-type.html

  2. by checking the QueryStringQueryBuilder, the default analyzer used
    is "smart search analyzer" and so I have this problem:

the query string contains some words that are not stripped away by the
"smart search analyzer" but are stripped away by the "default" analyzer
used to index and analyze the fields, so searching for those words
yields no result

just use your smart analyzer at index and search time or the other way
around?

and I tried setting the QueryStringQueryBuilder analyzer to "standard"
(I just guessed that the default analyzer is standard, which is why I
asked question 1 above) and then the search returns results

so, is there any problem with changing from the "smart" to the "standard" analyzer?

if you change the query side analyzer to be the same as the index time
analyzer you don't have any problems i'd say. if you have different
analyzers at search and index time you just need to make sure they are
compatible, i.e. you can use synonyms at query time OR at index time and
keep the rest of the analyzer the same. does that make sense?
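
To illustrate the compatibility point above, here is a minimal toy sketch in plain Java. It does not use Lucene or Elasticsearch classes; the class name, the stop word list and the "every query token must match" rule are all assumptions made for the illustration. It shows how a query fails when only the index-time analyzer strips stop words, and succeeds once both sides analyze the text the same way.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy model only: these are NOT Lucene or Elasticsearch classes.
class AnalyzerMismatchDemo {

    // Hypothetical stop word list, chosen only for this illustration.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("the", "a", "of"));

    // Lowercases the text, splits on non-alphanumeric characters and
    // optionally drops stop words.
    static List<String> analyze(String text, boolean removeStopWords) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (raw.isEmpty()) {
                continue; // split() can yield a leading empty string
            }
            if (removeStopWords && STOP_WORDS.contains(raw)) {
                continue;
            }
            tokens.add(raw);
        }
        return tokens;
    }

    // Assumed matching rule: every query token must be among the indexed tokens.
    static boolean matches(List<String> indexedTokens, List<String> queryTokens) {
        return !queryTokens.isEmpty() && indexedTokens.containsAll(queryTokens);
    }

    public static void main(String[] args) {
        // Index time: stop words are stripped, so only [quick, fox] is stored.
        List<String> indexed = analyze("The quick fox", true);

        // Search analyzer keeps "the", which was never indexed.
        System.out.println(matches(indexed, analyze("the quick", false))); // false

        // Same analysis on both sides: the query becomes [quick].
        System.out.println(matches(indexed, analyze("the quick", true))); // true
    }
}
```

The real analyzers do much more than this, but the failure mode is the same: a token that survives at search time and was removed at index time can never match.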

or is there any documentation that can help to understand the logic of
different analyzers?

you can check the lucene documentation for analyzers, i.e. java books, or read
lucene in action (good reading anyway). there is no tutorial-like
documentation for it around (at least that I know of).

simon



(Wing) #3

Thanks simon,

I can find the analyzer mapping configuration used by elasticsearch here:

https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/indices/analysis/IndicesAnalysisService.java

So now I can read the details of each analyzer from lucene doc.

But I still cannot find the default "smart" search analyzer used by
elasticsearch (ref: QueryStringQueryBuilder#analyzer).

Can you give me a pointer to the source code of the default "smart"
search analyzer so that I can read the details myself?

Thanks,
Wing



(system) #4