Query string not working with keyword tokenizer

Andrei · January 14, 2011, 9:48pm

I have a custom analyzer that looks like this:

eulang_tags:
    type: custom
    tokenizer: keyword
    filter: [lowercase, asciifolding]

Then in my mapping, I set the "tags" field to use it like this:

"tags" : {"type" : "string", "index_name" : "tag", "boost" : 1.5,
"analyzer" : "eulang_tags"}

When I try to do a query_string type query against the tags field, it
doesn't seem to work. For example, I have a document that contains a
tag "coney island". If I issue this query:

{
    "query_string" : {
        "fields" : ["tags"],
        "query" : "coney island",
        "use_dis_max" : true
    }
}

The document is not returned. However, if I switch to a "term" query:

{
    "term" : {
        "tags": "coney island"
    }
}

The document is found successfully. Why is this happening?

Adriano_Ferreira · January 14, 2011, 10:40pm

On Fri, Jan 14, 2011 at 7:48 PM, Andrei andrei@zmievski.org wrote:

I have a custom analyzer that looks like this:

eulang_tags:
type: custom
tokenizer: keyword
filter: [lowercase, asciifolding]

Then in my mapping, I set the "tags" field to use it like this:

"tags" : {"type" : "string", "index_name" : "tag", "boost" : 1.5,
"analyzer" : "eulang_tags"}

When I try to do a query_string type query against the tags field, it
doesn't seem to work. For example, I have a document that contains a
tag "coney island". If I issue this query:

{
"query_string" : {
"fields" : ["tags"],
"query" : "coney island",
"use_dis_max" : true
}
}

This is using the "standard" analyzer, which breaks the "coney island" into
terms "coney" and "island". In your doc which uses a "keyword" analyzer, the
term in there is "coney island" itself and not the words. So use the "term"
query like you did below, or explicitly tell the "query_string" which
analyzer you want to use (but I am not certain "keyword" and "query_string"
parse will work well together).
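
The mismatch described here can be sketched in a few lines of Python. This is only an illustration, not Elasticsearch code: `keyword_like_analyzer` and `standard_like_analyzer` are hypothetical stand-ins that roughly imitate the "eulang_tags" analyzer and a word-splitting analyzer.

```python
import re
import unicodedata


def ascii_fold(text):
    """Rough stand-in for the asciifolding filter: drop combining accents."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def keyword_like_analyzer(text):
    """Imitates eulang_tags: the whole input becomes one lowercased, folded token."""
    return [ascii_fold(text.lower())]


def standard_like_analyzer(text):
    """Very rough approximation of an analyzer that splits text into words."""
    return [token.lower() for token in re.findall(r"\w+", text)]


# What ends up in the index for the tag, versus what the query is split into.
indexed_terms = set(keyword_like_analyzer("Coney Island"))
query_terms = standard_like_analyzer("coney island")

print(indexed_terms)
print(query_terms)
# Neither "coney" nor "island" equals the single indexed term "coney island".
print([term in indexed_terms for term in query_terms])
```

A term query for "coney island" succeeds because it is compared against the single indexed term as a whole.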

The document is not returned. However, if I switch to "term" query:

{
"term" : {
"tags": "coney island"
}
}

The document is found successfully. Why is this happening?

Andrei · January 15, 2011, 1:28am

I am not sure where you see the "standard" analyzer. I created the
"eulang_tags" custom analyzer as shown above and set the "tags" field
to use it.

-Andrei


Adriano_Ferreira · January 15, 2011, 4:09pm

On Fri, Jan 14, 2011 at 11:28 PM, Andrei andrei@zmievski.org wrote:

I am not sure where you see the "standard" analyzer. I created the
"eulang_tags" custom analyzer as shown above and set the "tags" field
to use it.

The "standard" analyzer should be the default analyzer for "query_string"
unless you change the defaults or explicitly request another analyzer. This
is independent of the fact that you have applied a custom analyzer to some
of your fields in a certain type and index.


Clinton_Gormley · January 15, 2011, 4:30pm

    I am not sure where you see the "standard" analyzer. I created the
    "eulang_tags" custom analyzer as shown above and set the "tags" field
    to use it.

The "standard" analyzer should be the default analyzer for
"query_string" unless you change the defaults or explicitly request
another analyzer. This is independent of the fact that you have applied
a custom analyzer to some of your fields in a certain type and index.

Actually, the ES docs for mapping mention:

index_analyzer: used when indexing a field
search_analyzer: used when analyzing a field that is part of
a query string
analyzer: sets both index_analyzer and search_analyzer

See http://www.elasticsearch.com/docs/elasticsearch/mapping/core_types/

So, my reading of this is that it should work - Andrei is, after all,
searching on "fields": ["tags"]

This may be a bug.

Andrei, what happens if you search for:

"query_string": "tags:coney\ island"
"query_string": "tags:coney\ island"
"query_string": "tags:(coney island)"

I'm suggesting a few possibilities, because I'm not sure how to
represent an embedded space in the query string.

clint
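
One practical detail when trying the backslash variant: strict JSON does not allow a backslash followed by a space as an escape sequence, so a literal backslash has to be written as two backslashes in the request body. The sketch below assumes the request body is built in Python and serialized with the standard `json` module; the body shape simply mirrors the queries shown in this thread.

```python
import json

# Python literal for the query text: tags:coney\ island
query_text = "tags:coney\\ island"

body = {"query_string": {"query": query_text}}

# json.dumps writes the single backslash as two backslashes, which is
# what a strict JSON parser expects for a literal backslash.
encoded = json.dumps(body)
print(encoded)

# Decoding gives back the original text with one backslash.
decoded_text = json.loads(encoded)["query_string"]["query"]
print(decoded_text)
```

Whether the Elasticsearch version discussed here tolerates an unescaped backslash in the body is not something this thread confirms, so letting a JSON library do the escaping is the safer route.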

Andrei · January 15, 2011, 10:54pm

Clint,

"query_string": "tags:coney\ island" seemed to work. Though now I'm
not sure whether I should escape just the space this way or other
non-alphanumeric characters as well. Maybe Shay can shed some light on
this.

-Andrei


kimchy · January 16, 2011, 9:55am

The query parser splits on whitespace as well. So, without the escaping, what gets passed to the analyzer and then used to construct the query is "coney" and then "island". Not ideal, but that's how it works...
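
The behaviour described above can be imitated with a small Python sketch. It is a simplified model under stated assumptions, not the real parser: `split_like_query_parser` and `keyword_like_analyzer` are hypothetical helpers, and the only rule modelled is that unescaped whitespace separates chunks before any analyzer runs.

```python
import re


def split_like_query_parser(query):
    """Split on whitespace not preceded by a backslash, then drop the escapes."""
    chunks = re.split(r"(?<!\\)\s+", query.strip())
    return [re.sub(r"\\(.)", r"\1", chunk) for chunk in chunks if chunk]


def keyword_like_analyzer(text):
    """Imitates a keyword tokenizer plus lowercase filter: one token per input."""
    return [text.lower()]


def terms_for(query):
    """Analyze each chunk separately, as the parser hands them over one by one."""
    terms = []
    for chunk in split_like_query_parser(query):
        terms.extend(keyword_like_analyzer(chunk))
    return terms


print(terms_for("coney island"))     # two chunks reach the analyzer
print(terms_for("coney\\ island"))   # escaped space: one chunk reaches the analyzer
```

In this model the keyword analyzer never sees "coney island" as one string unless the space is escaped, which matches what Andrei observed.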

Andrei · January 16, 2011, 7:18pm

My current query (before this change) actually does this:

{"query_string": {
    "fields": ["title", "notes", "tags"],
    "query": "coney island",
    "default_operator": "AND",
    "use_dis_max": true
}}

All the fields had the "standard" analyzer. I wanted to change it so
that the tags are matched completely, without breaking them up into
words, while maintaining the current behavior with regard to title and
notes. What is the best way of achieving this?

-Andrei


Andrei · January 28, 2011, 8:35pm

So, do I need to convert this into a dis_max query? Simply escaping
the whitespace in the query string will not work for title or notes,
because it basically forces the words to be treated as a phrase.
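
One possible shape for such a query, which this thread does not confirm as the recommended approach, is a dis_max query that combines the existing query_string over title and notes with a term query over tags. The Python sketch below only builds the request body; `build_query` is a hypothetical helper, and lowercasing the input on the client is an assumption made to imitate the lowercase filter, since a term query is not analyzed (asciifolding is not reproduced here).

```python
import json


def build_query(user_input):
    """Build a dis_max body: analyzed match on title/notes, exact match on tags."""
    analyzed_part = {
        "query_string": {
            "fields": ["title", "notes"],
            "query": user_input,
            "default_operator": "AND",
            "use_dis_max": True,
        }
    }
    # The term query bypasses analysis, so normalize the text by hand.
    exact_tag_part = {"term": {"tags": user_input.strip().lower()}}
    return {"dis_max": {"queries": [analyzed_part, exact_tag_part]}}


query_body = build_query("Coney Island")
print(json.dumps(query_body, indent=2))
```

With this layout the user's text is left unescaped for title and notes, while the tags clause compares the whole string against the single indexed term.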

