Query Time Analysis: Are field values also analyzed?


(Karan Verma) #1

Hi

Let's say I have indexed a field person_name as a string, with a custom
analyzer. person_name is stored in the index in one of the documents as:
"Harry Greenberg"

I make a match query on the field: "harry g"

I have a custom edge n-gram tokenizer which breaks the query down as follows:

    {
      "tokens": [
        { "token": "h", "start_offset": 0, "end_offset": 1, "type": "word", "position": 1 },
        { "token": "ha", "start_offset": 0, "end_offset": 2, "type": "word", "position": 2 },
        { "token": "har", "start_offset": 0, "end_offset": 3, "type": "word", "position": 3 },
        { "token": "harr", "start_offset": 0, "end_offset": 4, "type": "word", "position": 4 },
        { "token": "harry", "start_offset": 0, "end_offset": 5, "type": "word", "position": 5 },
        { "token": "g", "start_offset": 6, "end_offset": 7, "type": "word", "position": 6 }
      ]
    }

Will all of these tokens be matched against the raw string "Harry Greenberg",
or will person_name also be broken down as defined by my custom analyzer?

If not, how can I make it so that the field value is also broken down? Will
that make the search significantly slower?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/af5354e7-5f7b-4b6e-96e6-f5e81df825db%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(jsbonline2006) #2

Hi Karan,

Can you please tell us what mapping you have applied?

If you are applying EdgeNGram in your query-time analyzer, then your search
query "harry g" will be tokenized as per your custom analyzer.

Regards,
Jayesh Bhoyar

On Wednesday, January 29, 2014 9:51:08 AM UTC+5:30, Karan Verma wrote:



(Binh Ly) #3

Karan,

If you set person_name's analyzer to your custom one, analysis will
generally be done at both query time and index time. You can also set
different analyzers for index time and search time, in which case the query
string and the indexed value are tokenized differently. See this for more
details:

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-root-object-type.html#_index_search_analyzers
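
To illustrate why using the same analyzer on both sides makes the match work, here is a minimal Python sketch. It is a simulation, not Elasticsearch code: `edge_ngrams` and `analyze` are hypothetical helpers, and they assume the custom analyzer lowercases the text, splits it on whitespace, and emits edge n-grams starting at length 1 (the `max_gram` default of 20 is also an assumption).

```python
def edge_ngrams(word, min_gram=1, max_gram=20):
    """Return the prefixes of `word` from min_gram up to max_gram characters."""
    upper = min(len(word), max_gram)
    return [word[:n] for n in range(min_gram, upper + 1)]


def analyze(text):
    """Hypothetical stand-in for person_name_analyzer (assumed behavior):
    lowercase, split on whitespace, then edge n-gram each word."""
    tokens = []
    for word in text.lower().split():
        tokens.extend(edge_ngrams(word))
    return tokens


# Index time: the field value is analyzed and its tokens are what get indexed.
indexed_terms = set(analyze("Harry Greenberg"))

# Query time: the query string goes through the same analyzer.
query_terms = analyze("harry g")

print(query_terms)
# Each query token is looked up among the indexed tokens, not the raw string.
print([term in indexed_terms for term in query_terms])
```

Under these assumptions the query tokens are the same six tokens shown in the original post, and each of them is also produced when "Harry Greenberg" is analyzed, so the lookup succeeds token by token.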



(Karan Verma) #4

Thanks for your answer, Binh.

My mapping is:

    "person_name" : {
      "type" : "string",
      "analyzer" : "person_name_analyzer"
    }

From your explanation, it looks like ES will analyze both the query string
and the stored value in the document. That is exactly what I want. Is there a
way to test this? I was having problems with a much more complex query, where
I thought that the tokens were being matched against the full string value of
the person_name stored in the document.

On Wed, Jan 29, 2014 at 4:55 AM, Binh Ly binh@hibalo.com wrote:


--
Best,
Karan




(Binh Ly) #5

Karan,

It should work without a problem. A query like this should match
"Harry Greenberg":

    {
      "query": {
        "match": {
          "person_name": {
            "query": "harry g",
            "operator": "AND"
          }
        }
      }
    }
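
As a rough illustration of what "operator": "AND" changes, here is a small Python sketch. Again this is a simulation with hypothetical helper names (`analyze`, `matches`), assuming a lowercase + whitespace + edge n-gram analyzer on both the indexed value and the query. It ignores scoring entirely: with AND a document matches only if every query token is among its indexed tokens, while with OR (the default for a match query) one shared token is enough.

```python
def analyze(text):
    """Assumed analyzer: lowercase, split on whitespace, edge n-gram each word."""
    tokens = []
    for word in text.lower().split():
        tokens.extend(word[:n] for n in range(1, len(word) + 1))
    return tokens


def matches(query, field_value, operator="OR"):
    """Return True if the analyzed query matches the analyzed field value."""
    indexed = set(analyze(field_value))
    hits = [token in indexed for token in analyze(query)]
    # AND requires every query token to be present; OR requires at least one.
    return all(hits) if operator == "AND" else any(hits)


print(matches("harry g", "Harry Greenberg", operator="AND"))
print(matches("harry g", "Henry Smith", operator="AND"))
print(matches("harry g", "Henry Smith", operator="OR"))
```

In this sketch "Henry Smith" still matches under OR because it shares the single-letter token "h" with the query, which is one reason AND is the safer choice when the query itself is edge n-grammed.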


