Lowercase searches not working

Hello group,

Here's what I am trying to achieve:

For testing purposes I created a default template, which I insert into
Elasticsearch using the _template endpoint. The template should be applied
to all indexes whose names start with the string "docs" (for brevity I
omitted some of the other fields):

{
  "template": "docs*",
  "settings": {
    "query.default_field": "@message",
    "action.auto_create_index": "+docs*",
    "number_of_shards": 1,
    "number_of_replicas": 1,
    "routing.allocation.total_shards_per_node": 1,
    "auto_expand_replicas": false,
    "index": {
      "analysis": {
        "analyzer": {
          "my_doc_analyser": {
            "type": "custom",
            "tokenizer": "whitespace",
            "filter": ["standard", "lowercase"]
          }
        }
      }
    }
  },
  "mappings": {
    "default": {
      "_all": {
        "enabled": true
      },
      "_source": {
        "compress": false
      },
      "properties": {
        "@time": {
          "type": "date",
          "index": "not_analyzed"
        },
        "@message": {
          "type": "string",
          "index": "analyzed",
          "analyzer": "my_doc_analyser"
        }
      }
    }
  }
}
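
For readers who want to reproduce this, registering such a template might look roughly like the sketch below. It only builds the HTTP request and does not send it. The template name `docs_template` is a made-up placeholder (the post does not name the template), and the body is an abbreviated copy of the JSON above.

```python
import json
import urllib.request

# Host taken from the thread; the template name is hypothetical.
ES_URL = "http://127.0.0.1:9200"
TEMPLATE_NAME = "docs_template"


def build_template_request(template, name=TEMPLATE_NAME, base_url=ES_URL):
    """Build (but do not send) a PUT request for the _template endpoint."""
    body = json.dumps(template).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/_template/{name}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Abbreviated version of the template above (analysis settings only).
template = {
    "template": "docs*",
    "settings": {
        "index": {
            "analysis": {
                "analyzer": {
                    "my_doc_analyser": {
                        "type": "custom",
                        "tokenizer": "whitespace",
                        "filter": ["standard", "lowercase"],
                    }
                }
            }
        }
    },
}

request = build_template_request(template)
# To actually send it against a running node:
# urllib.request.urlopen(request)
```

Keeping request construction separate from sending makes it easy to inspect exactly what would be submitted before touching a cluster.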

Now I insert a document into my Elasticsearch instance (the index is
created automatically), for instance one with the text:

"Hello, how are you doing Donny"

I then issue a query_string search to this endpoint:

http://127.0.0.1:9200/docs-testicus/_search

... using this query in the body:

{
  "query": {
    "query_string": {
      "query": "Donny"
    }
  }
}

... and I get the one expected result.

Now, I EXPECTED to also be able to issue searches with lowercased inputs,
such as:

{
  "query": {
    "query_string": {
      "query": "donny"
    }
  }
}

However, this gives me 0 hits.

Can anyone explain WHY I cannot do lower-case searches when a lowercase
filter is part of the analyzer?

When I test my custom analyzer using the _analyze endpoint, I can see that
the input tokens are indeed turned into lowercase.
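
The expectation described above can be illustrated with a rough pure-Python approximation of the analyzer. This is only a sketch: it is not Elasticsearch code, it ignores the "standard" filter, and the helper names `approximate_analyse` and `matches` are invented for illustration. It assumes that both the indexed text and the query text pass through the same analyzer.

```python
from typing import List


def approximate_analyse(text: str) -> List[str]:
    """Rough stand-in for my_doc_analyser: whitespace tokenizer + lowercase.

    Only the two steps that matter for case handling are mimicked.
    """
    tokens = text.split()               # split on whitespace only
    return [t.lower() for t in tokens]  # lowercase every token


def matches(query: str, terms: List[str]) -> bool:
    """True if every analysed query token is among the indexed terms."""
    return all(tok in terms for tok in approximate_analyse(query))


# Punctuation stays attached to a token ("hello,") because splitting on
# whitespace does not strip it.
indexed_terms = approximate_analyse("Hello, how are you doing Donny")

print(indexed_terms)
print(matches("Donny", indexed_terms), matches("donny", indexed_terms))
```

Under these assumptions both "Donny" and "donny" reduce to the same term, which is why a search in either case would be expected to hit.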

Cheers,

  • Chris

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Hello, it's strange... I created the same index and ran the same search,
and I get a result for both "Donny" and "donny".

(In reply to Chris Kinsalb's message of Thursday, 23 May 2013, 09:20:36 UTC+2.)

Hello,

I just upgraded my test instance to 0.90.0 (used 0.20.6 previously) and am
still getting the same effect.

Did you do anything differently? For example, did you create the document
type mapping separately, i.e. by explicitly creating the index?

Cheerio,

  • Chris

(In reply to Franck GIRODON's message of Thursday, May 23, 2013, 9:45:04 AM UTC+2.)

Chris - check the mapping and settings of your index.

I don't see anything that is obviously wrong, but you may have a typo
somewhere which is producing results other than what you expect.

clint
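
A check along the lines suggested above could be sketched as follows. The host and index name come from the thread; the helper names `fetch_json` and `find_analyzers` are invented for illustration. The exact layout of the `_mapping` response differs between Elasticsearch versions, so the sketch walks the whole structure rather than assuming one.

```python
import json
import urllib.request

ES_URL = "http://127.0.0.1:9200"   # host from the thread
INDEX = "docs-testicus"            # index name from the thread


def fetch_json(path, base_url=ES_URL):
    """GET a JSON document from Elasticsearch (needs a running node)."""
    with urllib.request.urlopen(f"{base_url}{path}") as response:
        return json.loads(response.read().decode("utf-8"))


def find_analyzers(node, path=""):
    """Recursively collect every string-valued "analyzer" entry.

    Returns a dict mapping the dotted path of the enclosing object
    to the analyzer name configured there.
    """
    found = {}
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else key
            if key == "analyzer" and isinstance(value, str):
                found[path] = value
            else:
                found.update(find_analyzers(value, child))
    elif isinstance(node, list):
        for i, item in enumerate(node):
            found.update(find_analyzers(item, f"{path}[{i}]"))
    return found


# Usage against a live node (not executed here):
# mapping = fetch_json(f"/{INDEX}/_mapping")
# settings = fetch_json(f"/{INDEX}/_settings")
# print(find_analyzers(mapping))
```

If the field the query actually targets does not show up with the expected analyzer, the template was probably not applied the way it was intended.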

(In reply to Chris Kinsalb's message of 23 May 2013, 10:09.)

Hi Clint,
thanks for the check.

Now I actually found the solution, and it did not really make me laugh
(especially considering the time that I spent figuring out why this
happened) ;-).

In my first post I changed the config a bit, here's the "REAL"
configuration for the template I used (there's only a SLIGHT difference
anyway).
Btw, eventually we will use this solution for storing log messages.

{
  "template": "logs*",
  "settings": {
    "query.default_field": "@message",
    "action.auto_create_index": "+logs*",
    "number_of_shards": 1,
    "number_of_replicas": 1,
    "routing.allocation.total_shards_per_node": 1,
    "auto_expand_replicas": false,
    "index": {
      "analysis": {
        "analyzer": {
          "my_log_analyser": {
            "tokenizer": "whitespace",
            "filter": [
              "standard",
              "lowercase"
            ],
            "type": "custom"
          }
        }
      }
    }
  },
  "mappings": {
    "default": {
      "_all": {
        "enabled": false
      },
      "_source": {
        "compress": false
      },
      "dynamic_templates": [
        {
          "fields_template": {
            "mapping": {
              "type": "string",
              "index": "not_analyzed"
            },
            "path_match": "@fields.*"
          }
        }
      ],
      "properties": {
        "@time": {
          "type": "date",
          "index": "not_analyzed"
        },
        "@message": {
          "type": "string",
          "index": "analyzed",
          "analyzer": "my_log_analyser"
        },
        "@fields": {
          "type": "object",
          "dynamic": true,
          "path": "full"
        }
      }
    }
  }
}

Looking at the template you will spot that the only differences actually
are the index prefix name to which this template is applied and the name of
the log analyser.

To cut a long story short, the answer is that IFF (as in "if and only if")
the index prefix name for a template starts with either
logs*
or
logs-*

Elasticsearch gives me the finger and only does half of the analysis
(apparently).
Renaming the prefix to anything else, e.g. "narf" or "foo" or "docs" (as
posted in my original message) will do the trick and everything works as
expected.
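
The prefix matching described above can be sketched as follows, for example to see which of several template patterns would apply to a given index name. Python's `fnmatch` is used here as a stand-in; Elasticsearch's own wildcard matching may differ in details. The template names and the index name `logs-2013.05.23` are hypothetical, and the sketch says nothing about why the analysis behaved differently for the "logs" prefix.

```python
from fnmatch import fnmatchcase


def matching_templates(index_name, templates):
    """Return the names of all templates whose pattern matches index_name.

    `templates` maps a template name to its "template" pattern string.
    """
    return sorted(
        name
        for name, pattern in templates.items()
        if fnmatchcase(index_name, pattern)
    )


# Hypothetical set of registered templates, keyed by made-up names.
templates = {
    "docs_template": "docs*",
    "logs_template": "logs*",
    "logs_dash_template": "logs-*",
}

print(matching_templates("docs-testicus", templates))
print(matching_templates("logs-2013.05.23", templates))
```

Listing every matching pattern, rather than just the first, makes it visible when more than one template would be applied to the same index.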

This seems to have the word "bug" written all over it, so I will open an
issue on GitHub for it.

Anyway, thanks a lot for your support!
