When I use normalizer in field, it can't work in group by of SQL statement?

Angela_Hsieh · November 1, 2018, 3:06am

I want the query can be insensitive, so set up the mapping has normalizer
when i use group by , it can't work...
Here is index:
{
"settings": {
"analysis": {
"normalizer": {
"my_normalizer": {
"type": "custom",
"char_filter": ,
"filter": ["lowercase", "asciifolding"]
}
}
}
}
}
mapping:
{"properties": { "user": { "type": "keyword","normalizer": "my_normalizer" } } } '

query:
{
"query":"select user from test where user like '%A%' group by user"
}

{"error":{"root_cause":[{"type":"query_shard_exception","reason":
"failed to find field [user.user] and [missing_bucket] is not set" }

I translated to DSL and modify the field of aggregations, it is fine!

I confused of the keyword in normalizer can't use group by and equal

is there any ideas ?

Andrei_Stefan · November 1, 2018, 8:15am

@Angela_Hsieh, thank you for trying out SQL.

Your post is not clear. You posted an error saying {"error":{"root_cause":[{"type":"query_shard_exception","reason": "failed to find field [user.user] and [missing_bucket] is not set" } which indicates that your user field should have a user sub-field. But your mapping of user doesn't show that.

What version of Elasticsearch did you try this on?
Please, provide the complete mapping of your test index by providing the output of GET /test command.

Angela_Hsieh · November 1, 2018, 8:58am

thank you for your reply and sorry about the information is not clear.

Elasticsearch version:6.4.2

**curl -XPUT http://localhost:9200/test/ -H 'Content-Type: application/json' -d '
{
"index": {
"analysis": {
"normalizer": {
"my_normalizer": {
"type": "custom",
"char_filter": ,
"filter": [
"lowercase",
"asciifolding"
]
}
}
}
}
}'

and i'm not add any sub-field in user , when i used _xpack to translate SQL

curl -XPOST "http://localhost:9200/_xpack/sql?format=txt" -H "Content-Type: application/json" -d "{
{
"query":"select user from test where user like '%A%' group by user"
}

it showed errors, so i use _xpack/sql/translate to look at the DSL.
Then i found DSL generates user.user by itself. If i take the .user away , it works.

Q1: what is the rule in SQL statement convert to DSL?
Q2: if the field type is [ keyword], when i add normalizer to it, is it effect something?

Andrei_Stefan · November 2, 2018, 12:12pm

@Angela_Hsieh I've reproduced the issue on 6.4.2.
Few things:

using normalized fields in SQL is not allowed and we document this limitation: Data Types | Elasticsearch Guide [6.5] | Elastic

As SQL requires exact values, when encountering a text field Elasticsearch SQL will search for an exact multi-field that it can use for comparisons, sorting and aggregations. To do that, it will search for the first keyword that it can find that is not normalized and use that as the original field exact value.

the fact that throws that error, it's a bug. It should still issue an error message, but a different one. I created this bug for the 6.4 version.
in 6.5 and 7.0 the error is not there anymore and the current snapshot versions work with normalized fields. But, they shouldn't be allowed in sorting, aggregations and comparisons. I filed a bug for 6.5 and 7.x here.

As a workaround for your scenario, you need to use a keyword field without a normalizer. I know it's changing the use case, but for the reason mentioned above (and in the documentation), it has to be like this.

system · November 30, 2018, 12:13pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.