Keyword analyzer not working as expected


(shoebalig) #1

Hi there,

I have the index mapping given below.

{
  "es_test": {
    "mappings": {
      "es_test_type": {
        "_all": {
          "enabled": true
        },
        "properties": {
          "Field1": {
            "type": "string",
            "store": true,
            "analyzer": "keyword"
          },
          "Field2": {
            "type": "string",
            "store": true,
            "analyzer": "standard"
          },
          "Field3": {
            "type": "string",
            "store": true,
            "analyzer": "standard"
          },
          "RecordID": {
            "type": "string",
            "store": true,
            "analyzer": "standard"
          },
          "Status": {
            "type": "string",
            "index": "not_analyzed",
            "analyzer": "standard"
          },
          "Status.Code": {
            "type": "string",
            "index": "not_analyzed",
            "analyzer": "standard"
          },
          "Status.Description": {
            "type": "string",
            "index": "not_analyzed",
            "analyzer": "standard"
          }
        }
      }
    }
  }
}

Input data
Field1,Field2,Field3
GAURAV GUPTA,SAURABH,MANI
GAURAV KUMAR GUPTA,SAURABH SHARMA,MANI RANA
GAURAV,SAURABH,MANI
gaurav gupta,saurabh sharma,mani

Query:
POST /es_test/es_test_type/_search
{
  "query": {
    "simple_query_string": {
      "query": "GAURAV GUPTA",
      "fields": [
        "Field2",
        "Field3",
        "Field1"
      ],
      "analyzer": "keyword",
      "default_operator": "or"
    }
  }
}

Output:

{
  "took": 11,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.25604635,
    "hits": [
      {
        "_index": "es_test",
        "_type": "es_test_type",
        "_id": "3",
        "_score": 0.25604635,
        "_source": {
          "RecordID": "3",
          "Status": null,
          "Status.Code": null,
          "Status.Description": null,
          "Field1": "GAURAV",
          "Field2": "SAURABH",
          "Field3": "MANI"
        }
      }
    ]
  }
}


The output is not what I expected. It should return the first document:
GAURAV GUPTA,SAURABH,MANI

However, it is returning the third document:
GAURAV,SAURABH,MANI

Please let me know if I am doing something wrong.


(David Kemp) #2

Despite declaring your query analyser to be keyword, your simple query string query still gets parsed into separate query terms. So your query was equivalent to ("GAURAV" OR "GUPTA").
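
As a rough illustration of why that returns the third document, here is a toy model in Python. It is not Elasticsearch's actual parser: it only assumes that an unquoted query is split on whitespace and that a keyword-analysed field holds each value as one unmodified token. The function names are hypothetical, only Field1 is modelled, and the document IDs other than "3" are assumed from the row order of the input data.

```python
# Toy model only. Field1 uses the keyword analyzer, so each value is treated
# as a single, unmodified token. IDs other than "3" are assumed from row order.
FIELD1_BY_ID = {
    "1": "GAURAV GUPTA",
    "2": "GAURAV KUMAR GUPTA",
    "3": "GAURAV",
    "4": "gaurav gupta",
}


def split_terms(query):
    """Split an unquoted query string into terms on whitespace."""
    return query.split()


def match_any_term(query):
    """Return IDs whose whole Field1 value equals any query term (OR semantics)."""
    terms = set(split_terms(query))
    return [doc_id for doc_id, value in FIELD1_BY_ID.items() if value in terms]


print(split_terms("GAURAV GUPTA"))
print(match_any_term("GAURAV GUPTA"))
```

In this model no document has a Field1 value equal to "GUPTA" alone, so only the document whose Field1 is exactly "GAURAV" matches.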

To get the behaviour that you are expecting, you need to enclose your query term in escaped double quotes. I.e.

"query": "\"GAURAV GUPTA\""

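If the request body is built in code rather than typed by hand, a JSON serializer can take care of the escaping. The following is a minimal Python sketch using only the standard json module; the variable names are illustrative, and the field list and options are copied from the query in the first post.

```python
import json

# Wrap the phrase in literal double quotes so that simple_query_string
# treats it as one phrase rather than two separate terms.
phrase = "GAURAV GUPTA"
body = {
    "query": {
        "simple_query_string": {
            "query": f'"{phrase}"',
            "fields": ["Field2", "Field3", "Field1"],
            "analyzer": "keyword",
            "default_operator": "or",
        }
    }
}

# json.dumps escapes the inner quotes as \" in the serialized request body.
payload = json.dumps(body, indent=2)
print(payload)
```

The printed payload can then be sent as the body of the POST to the _search endpoint.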

(shoebalig) #3

That's really helpful, but I have another doubt about the query given below.
POST /es_standalone_keyword/es_standalone_keyword_type/_search
{
  "query": {
    "simple_query_string": {
      "query": "\"GAURAV\"",
      "fields": ["Field2", "Field3", "Field1"],
      "analyzer": "keyword",
      "default_operator": "and"
    }
  },
  "fields": ["Field1", "Field2", "Field3"]
}
Output is:
{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 4.8796144,
    "hits": [
      {
        "_index": "es_standalone_keyword",
        "_type": "es_standalone_keyword_type",
        "_id": "1",
        "_score": 4.8796144,
        "fields": {
          "Field1": [
            "GAURAV"
          ],
          "Field2": [
            "GAURAV"
          ],
          "Field3": [
            "GAURAV"
          ]
        }
      },
      {
        "_index": "es_standalone_keyword",
        "_type": "es_standalone_keyword_type",
        "_id": "11",
        "_score": 1.322617,
        "fields": {
          "Field1": [
            "GAURAV"
          ],
          "Field2": [
            "SAURABH"
          ],
          "Field3": [
            "MANI"
          ]
        }
      }
    ]
  }
}
My understanding is that it should return only the first record, not the second.
Any input is much appreciated. Thanks in advance.


(David Kemp) #4

Looks to me like your index has a document with GAURAV in the first, second, and third fields. That's in addition to the documents you state are part of your "input".


(Christoph) #5

I assume your question relates to the default_operator you are using. Note that the operator only applies to the way the query terms are combined; the fields are always combined with an OR (or, in Lucene speak, as boolean should-clauses).

See the difference for these two queries, the output shows the Lucene query that gets executed:

GET /test/_validate/query?explain
{
  "query": {
    "simple_query_string": {
      "query": "t1",
      "fields": ["field1","field2","field3" ],
      "default_operator": "AND"
    }
  }
}

=> "explanation": "field1:t1 field3:t1 field2:t1"  

so here t1 can appear in any of the three fields, whereas

GET /test/_validate/query?explain
{
  "query": {
    "simple_query_string": {
      "query": "t1 t2",
      "fields": ["field1","field2","field3" ],
      "default_operator": "AND"
    }
  }
}
=> "explanation": "+(field1:t1 field3:t1 field2:t1) +(field1:t2 field3:t2 field2:t2)"

Here t1 and t2 both need to appear (because of the AND operator) but it doesn't matter in which field.
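
To tie this back to the two hits in the earlier output, here is a small Python sketch of the same rule. It is a toy model, not how Lucene evaluates the query: it assumes every field holds one exact keyword token, and the matches function is a hypothetical helper. The two documents are the hits with IDs 1 and 11 from the output above.

```python
FIELDS = ["Field1", "Field2", "Field3"]

# The two hits from the output above (IDs 1 and 11).
DOC_1 = {"Field1": "GAURAV", "Field2": "GAURAV", "Field3": "GAURAV"}
DOC_11 = {"Field1": "GAURAV", "Field2": "SAURABH", "Field3": "MANI"}


def matches(doc, terms, fields, operator="OR"):
    """Fields are always OR-ed per term; the operator only combines the terms."""
    per_term = [any(doc.get(field) == term for field in fields) for term in terms]
    if operator.upper() == "AND":
        return all(per_term)
    return any(per_term)


# One term with AND: a match in any single field is enough, so both documents hit.
print(matches(DOC_1, ["GAURAV"], FIELDS, "AND"))
print(matches(DOC_11, ["GAURAV"], FIELDS, "AND"))

# Two terms with AND: both must appear, but they may sit in different fields.
print(matches(DOC_11, ["GAURAV", "MANI"], FIELDS, "AND"))
print(matches(DOC_1, ["GAURAV", "MANI"], FIELDS, "AND"))
```

With a single term, the AND operator has nothing to combine, which is why the document with GAURAV only in Field1 is returned as well.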


(shoebalig) #6

Thanks, that's really helpful.

