Lowercase searches not working

Chris_Kinsalb · May 23, 2013, 7:20am

Hello group,

Here's what I am trying to achieve:

For testing purposes I created a default template, which is inserted into
Elasticsearch using the _template endpoint. The template should be applied
to all indexes starting with the string "docs" (for brevity I omitted some
of the other fields):

{
  "template": "docs*",
  "settings": {
    "query.default_field": "@message",
    "action.auto_create_index": "+docs*",
    "number_of_shards": 1,
    "number_of_replicas": 1,
    "routing.allocation.total_shards_per_node": 1,
    "auto_expand_replicas": false,
    "index": {
      "analysis": {
        "analyzer": {
          "my_doc_analyser": {
            "type": "custom",
            "tokenizer": "whitespace",
            "filter": ["standard", "lowercase"]
          }
        }
      }
    }
  },
  "mappings": {
    "default": {
      "_all": {
        "enabled": true
      },
      "_source": {
        "compress": false
      },
      "properties": {
        "@time": {
          "type": "date",
          "index": "not_analyzed"
        },
        "@message": {
          "type": "string",
          "index": "analyzed",
          "analyzer": "my_doc_analyser"
        }
      }
    }
  }
}
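
For anyone wanting to reproduce this, here is a minimal Python sketch of how such a template could be sent to the _template endpoint. The template name "docs_template" and the helper function are made up for illustration (the post does not say which name was used), and the template body is trimmed to the analysis settings only. The request is built but not sent, so nothing here depends on a running node.

```python
import json
import urllib.request

ES_URL = "http://127.0.0.1:9200"   # host and port as used in this thread
TEMPLATE_NAME = "docs_template"    # hypothetical name, not from the post


def build_put_template_request(base_url, name, template):
    """Build (but do not send) a PUT request for the _template endpoint."""
    body = json.dumps(template).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/_template/{name}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Trimmed-down version of the template above (analysis settings only).
template = {
    "template": "docs*",
    "settings": {
        "index": {
            "analysis": {
                "analyzer": {
                    "my_doc_analyser": {
                        "type": "custom",
                        "tokenizer": "whitespace",
                        "filter": ["standard", "lowercase"],
                    }
                }
            }
        }
    },
}

request = build_put_template_request(ES_URL, TEMPLATE_NAME, template)

# To actually send it against a running node, uncomment:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

Keeping the request construction separate from the sending step makes it easy to print and compare exactly what is submitted, which helps when two people get different results from what should be the same template.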

Now, I insert a document into my Elasticsearch instance (the index is
created automatically), for example one with the text:

"Hello, how are you doing Donny"

I then issue a query_string search to this endpoint:

http://127.0.0.1:9200/docs-testicus/_search

... using this query in the body:

{
  "query": {
    "query_string": {
      "query": "Donny"
    }
  }
}

.. I then get the (1) result.

Now, I EXPECTED to also be able to issue searches with lowercased inputs,
such as:

{
  "query": {
    "query_string": {
      "query": "donny"
    }
  }
}

However, this gives me 0 hits.

Can anyone explain WHY I cannot do lowercase searches when a lowercase
filter is part of the analyzer?

When I tested my custom analyzer using the _analyze endpoint, I could see
that the input tokens are indeed turned into lowercase.
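
To make the expected tokens concrete, here is a small Python simulation of what an analyzer built from a whitespace tokenizer and a lowercase filter should produce for the sample text. This is a toy stand-in, not Elasticsearch's implementation: the function names are made up, and the "standard" filter is left out on the assumption that it does not change these particular tokens.

```python
def whitespace_tokenize(text):
    """Split on runs of whitespace, as a whitespace tokenizer would."""
    return text.split()


def lowercase_filter(tokens):
    """Lowercase every token, as a lowercase token filter would."""
    return [token.lower() for token in tokens]


def analyze(text):
    """Toy stand-in for my_doc_analyser: whitespace tokenizer + lowercase."""
    return lowercase_filter(whitespace_tokenize(text))


tokens = analyze("Hello, how are you doing Donny")
print(tokens)
# ['hello,', 'how', 'are', 'you', 'doing', 'donny']
```

Note that the comma stays attached to "hello," because the text is only split on whitespace. The token for the name is "donny", so if this analyzer were really applied at index time, a lowercase search should find it.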

Cheers,

Chris

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Franck_GIRODON · May 23, 2013, 7:45am

Hello, that's strange... I created the same index and ran the same search,
and I get a result for both "Donny" and "donny".

Chris_Kinsalb · May 23, 2013, 8:09am

Hello,

I just upgraded my test instance to 0.90.0 (used 0.20.6 previously) and am
still getting the same effect.

Did you do anything differently? For example, did you create the document
type mapping for the index separately (i.e. by explicitly creating the
index)?

Cheerio,

Chris

Clinton_Gormley · May 23, 2013, 10:49am

Chris - check the mapping and settings of your index.

I don't see anything that is obviously wrong, but you may have a typo
somewhere which is producing results other than what you expect.
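
One way to follow this advice is to fetch what the live index actually holds and compare it with the template. The Python sketch below builds the URLs for the _mapping and _settings endpoints of the index from this thread; the helper names are made up for illustration, and the fetch itself is commented out so the snippet runs without a node.

```python
import json
import urllib.request

ES_URL = "http://127.0.0.1:9200"   # host and port as used in this thread
INDEX = "docs-testicus"            # index name from the original post


def inspection_urls(base_url, index):
    """Return the URLs for the index's mapping and settings."""
    return {
        "mapping": f"{base_url}/{index}/_mapping",
        "settings": f"{base_url}/{index}/_settings",
    }


def fetch_json(url):
    """GET a URL and decode the JSON response (needs a running node)."""
    with urllib.request.urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


urls = inspection_urls(ES_URL, INDEX)

# Against a running node, uncomment:
# for name, url in urls.items():
#     print(name, json.dumps(fetch_json(url), indent=2))
```

The things worth checking in the output are whether the @message field really lists the custom analyzer and whether the analysis section from the template shows up in the index settings at all.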

clint

Chris_Kinsalb · May 23, 2013, 1:10pm

Hi Clint,
thanks for the check.

Now I actually found the solution, and it did not really make me laugh
(especially considering the time that I spent figuring out why this
happened) ;-).

In my first post I changed the config a bit; here's the "REAL"
configuration for the template I used (there's only a SLIGHT difference
anyway).
By the way, eventually we will use this solution for storing log messages.

{
  "template": "logs*",
  "settings": {
    "query.default_field": "@message",
    "action.auto_create_index": "+logs*",
    "number_of_shards": 1,
    "number_of_replicas": 1,
    "routing.allocation.total_shards_per_node": 1,
    "auto_expand_replicas": false,
    "index": {
      "analysis": {
        "analyzer": {
          "my_log_analyser": {
            "tokenizer": "whitespace",
            "filter": [
              "standard",
              "lowercase"
            ],
            "type": "custom"
          }
        }
      }
    }
  },
  "mappings": {
    "default": {
      "_all": {
        "enabled": false
      },
      "_source": {
        "compress": false
      },
      "dynamic_templates": [
        {
          "fields_template": {
            "mapping": {
              "type": "string",
              "index": "not_analyzed"
            },
            "path_match": "@fields.*"
          }
        }
      ],
      "properties": {
        "@time": {
          "type": "date",
          "index": "not_analyzed"
        },
        "@message": {
          "type": "string",
          "index": "analyzed",
          "analyzer": "my_log_analyser"
        },
        "@fields": {
          "type": "object",
          "dynamic": true,
          "path": "full"
        }
      }
    }
  }
}

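Looking at the template you will spot that the most visible differences are
the index prefix name to which this template is applied and the name of
the log analyser (the real template also disables _all and adds a dynamic
template and an object mapping for @fields).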

To cut a long story short, the answer is that IFF (as in "if and only if")
the index prefix name for a template starts with either
logs*
or
logs-*

Elasticsearch gives me the finger and only does half of the analysis
(apparently).
Renaming the prefix to anything else, e.g. "narf" or "foo" or "docs" (as
posted in my original message) will do the trick and everything works as
expected.
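
The symptom itself ("Donny" hits, "donny" misses) can be reasoned about with a toy model of exact term matching. The Python sketch below tries all four combinations of lowercasing at index time and at query time. It is only an illustration with made-up helper names, it assumes plain whitespace splitting and exact term lookup, and it says nothing about why the template prefix would matter.

```python
from itertools import product


def analyze(text, lowercase):
    """Split on whitespace and optionally lowercase the tokens."""
    tokens = text.split()
    return [t.lower() for t in tokens] if lowercase else tokens


def matches(document, query, index_lowercase, query_lowercase):
    """Exact term lookup: does any query term equal an indexed term?"""
    indexed_terms = set(analyze(document, index_lowercase))
    query_terms = analyze(query, query_lowercase)
    return any(term in indexed_terms for term in query_terms)


DOCUMENT = "Hello, how are you doing Donny"

# Map (index lowercased?, query lowercased?) -> ('Donny' hit, 'donny' hit)
results = {}
for index_lc, query_lc in product([True, False], repeat=2):
    results[(index_lc, query_lc)] = (
        matches(DOCUMENT, "Donny", index_lc, query_lc),
        matches(DOCUMENT, "donny", index_lc, query_lc),
    )

for (index_lc, query_lc), (upper_hit, lower_hit) in results.items():
    print(f"index lowercase={index_lc!s:5} query lowercase={query_lc!s:5} "
          f"'Donny' hit={upper_hit!s:5} 'donny' hit={lower_hit}")
```

In this toy model, the only combination that reproduces the reported behaviour is the one where lowercasing happens on neither side. That would be consistent with the lowercase filter not being applied to the field at all for the affected indexes, which the index's actual mapping and settings could confirm or rule out.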

This seems to have the word "bug" written all over it, so I will open an
issue on GitHub for this.

Anyway, thanks a lot for your support!
