Match query behaves differently for a few inputs

Hi,
I have the following mappings:

PUT pdlfull/pdlfull_type/_mapping
{
"pdlfull_type": {
"properties": {
"GenericDescriptionId": {
"type": "string",
"index": "not_analyzed"
},
"GroupNumber": {
"type": "string",
"index": "not_analyzed"
},
"catalogDescriptions": {
"properties": {
"catalogDescriptionId": {
"type": "long"
}
}
},
"description": {
"type": "string",
"analyzer": "analyzer_startswith",
"fields": {
"sort_field": {
"type": "string",
"analyzer": "keyword_analyzer"
}
}
}
}
}
}

These are the relevant analyzers used in the mappings:

"analysis": {
"analyzer": {
"keyword_analyzer": {
"filter": "lowercase",
"tokenizer": "keyword"
},
"analyzer_startswith": {
"filter": [
"lowercase"
],
"tokenizer": "whitespace"
},
"whitespace_analyzer": {
"type": "custom",
"filter": [
"lowercase",
"asciifolding"
],
"tokenizer": "whitespace"
},
"wordAnalyzer": {
"type": "custom",
"filter": [
"lowercase",
"asciifolding",
"nGram_filter"
],
"tokenizer": "whitespace"
}
},
"filter": {
"nGram_filter": {
"max_gram": "20",
"min_gram": "1",
"type": "nGram",
"token_chars": [
"letter",
"digit",
"punctuation",
"symbol"
]
}
}
}

These are the settings I have used.
Here is my sample data for the description field (I am searching only in the description field):

"A Frame & Trunnion Kat"
"A/C Bypass pulley"
"A/C Bypass pulley(Belts)"
"A/T Fluid Capacity"
"A/T overdrive Button kit"
"A/T shift bezel"
"Abc accumulator"
"ABS Anti-skid Relay"
"ABS Anti-skid switch"

My Query is:

GET pdlfull/pdlfull_type/_search
{
"query": {
"match_phrase_prefix": {
"description":"a/c b"

    }
},

"sort":
{
"description.sort_field":
{
"order":"asc"
}
}
}

The above query does not return any records.

When I use the queries "a/t s", "abc a", "a/t f", "a/t o", or "A F", I get records.

May I know the reason? I get no records for a few prefixes, and I cannot work out where the problem is or why the query behaves differently for different inputs.

You may find it useful to use the Explain API on both the queries that work and the ones that don't to see what the difference is and determine what's causing this behaviour.

It may also be useful to use the Analyze API to test that each of the examples in the description field data you listed are creating the terms you expect.

I used the following query with Explain:

GET pdlfull/pdlfull_type/AU6F0k0L3NytkeZBUPfI/_explain
{
"query": {
"match": {
"description":
{
"query": "abs a",
"operator": "and",
"type": "phrase_prefix",
"prefix_length": 1
}

    }
}

}

This gives the result:

{
"_index": "pdlfull",
"_type": "pdlfull_type",
"_id": "AU6F0k0L3NytkeZBUPfI",
"matched": false,
"explanation": {
"value": 0,
"description": "no matching term"
}
}

When I used this query:

GET pdlfull/pdlfull_type/AU6F0l6b3NytkeZBURzM/_explain
{
"query": {
"match": {
"description":
{
"query": "a/t f",
"operator": "and",
"type": "phrase_prefix",
"prefix_length": 1
}

    }
}

}

I got the result:

"_index": "pdlfull",
"_type": "pdlfull_type",
"_id": "AU6F0l6b3NytkeZBURzM",
"matched": true,
"explanation": {
"value": 203.01616,
"description": "weight(description:"a/t (filler fro forward front fa flow flange flasher fuel flywheel/parts frame frequency fan four flange, fluid f fork/parts flares fuel/water flex final fusible float fog fender follower fuse fit facing filter flasher, flare flng(emissions) fitting, full flywheel freeze fork floor fitting frnt fast formed feedback filter, foot fn fairing front/rear)" in 449) [PerFieldSimilarity], result of:",
"details": [
{
"value": 203.01616,
"description": "fieldWeight in 449, product of:",
"details": [
{
"value": 1,
"description": "tf(freq=1.0), with freq of:",
"details": [
{
"value": 1,
"description": "phraseFreq=1.0"
}
]
},
{
"value": 406.03232,
"description": "idf(), sum of:",
"details": [
{
"value": 8.127894,
"description": "idf(docFreq=3, maxDocs=4985)"
},
{
"value": 7.029282,
"description": "idf(docFreq=11, maxDocs=4985)"
},
{
.
.
.
.
}

Something like this.
May I know why the explanation shows a 'false' match for 'abs a', even though there are records with that prefix, and a 'true' match for 'a/t f', when I am using the same prefix query for both?

Could you try running the following?

curl -XGET 'localhost:9200/pdlfull/_analyze?analyzer=analyzer_startswith' -d 'ABS Anti-skid Relay'

and

curl -XGET 'localhost:9200/pdlfull/_analyze?analyzer=analyzer_startswith' -d 'A/T Fluid Capacity'

This will show the tokens the analyzer produces in the index for that field.

May I know how to check the analyzer output?
I used this query:

GET pdlfull/_analyze?analyzer=analyzer_startswith a/c b

It shows:

{
"error": "ElasticsearchIllegalArgumentException[text is missing]",
"status": 400
}

So how can I write the GET statement without using curl?

What are you using to send requests to Elasticsearch?

You need to put 'a/c b' in the body of the request. This will be different depending on what tool you are using to send requests. I would look at the documentation for the tool.
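
As one illustration of the point above, here is a minimal Python sketch that passes the text as a `text` query parameter instead of appending it after the URL. The helper name `analyze_url` is hypothetical, and the host is assumed to be the same `localhost:9200` used in the curl examples. It only builds the URL; it does not send the request.

```python
from urllib.parse import urlencode


def analyze_url(host, index, analyzer, text):
    """Build an _analyze URL with the text sent as a query parameter.

    analyze_url is a hypothetical helper for illustration only.
    """
    # urlencode escapes the slash and the space in the text.
    query = urlencode({"analyzer": analyzer, "text": text})
    return f"{host}/{index}/_analyze?{query}"


# Assumed host; index and analyzer names are the ones from this thread.
url = analyze_url("http://localhost:9200", "pdlfull", "analyzer_startswith", "a/c b")
print(url)
```

The "text is missing" error above is what you get when neither a `text` parameter nor a request body carries the text to analyze.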

I am using the Sense plugin for that.

Try the following instead:

GET /pdlfull/_analyze?analyzer=analyzer_startswith&text=ABS Anti-skid Relay

and

GET /pdlfull/_analyze?analyzer=analyzer_startswith&text=A/T Fluid Capacity

and

GET /pdlfull/_analyze?analyzer=analyzer_startswith&text=abs a

and

GET /pdlfull/_analyze?analyzer=analyzer_startswith&text=a/t f

GET /pdlfull/_analyze?analyzer=analyzer_startswith&text=ABS Anti-skid Relay

Result:
{
"token": "abs",
"start_offset": 0,
"end_offset": 3,
"type": "word",
"position": 1
},
{
"token": "anti-skid",
"start_offset": 4,
"end_offset": 13,
"type": "word",
"position": 2
},
{
"token": "relay",
"start_offset": 14,
"end_offset": 19,
"type": "word",
"position": 3
}

For the below text:
GET /pdlfull/_analyze?analyzer=analyzer_startswith&text=abs a

Result:
{
"token": "abs",
"start_offset": 0,
"end_offset": 3,
"type": "word",
"position": 1
},
{
"token": "a",
"start_offset": 4,
"end_offset": 5,
"type": "word",
"position": 2
}

In the same way, I get tokens for the other two text inputs.

I get tokens for all four inputs, so where can I find the difference?
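
To reason about these token lists, here is a rough Python sketch of how a phrase-prefix match works on them. It is only an approximation of `analyzer_startswith` (whitespace split plus lowercase) and of `match_phrase_prefix`, not the actual Lucene implementation; the function names are made up for illustration.

```python
def analyze(text):
    """Approximate analyzer_startswith: split on whitespace, then lowercase."""
    return [token.lower() for token in text.split()]


def phrase_prefix_match(query, field_value):
    """Return True if the query tokens appear consecutively in the field,
    with the last query token treated as a prefix of the token there."""
    q, d = analyze(query), analyze(field_value)
    if not q:
        return False
    for start in range(len(d) - len(q) + 1):
        window = d[start:start + len(q)]
        if window[:-1] == q[:-1] and window[-1].startswith(q[-1]):
            return True
    return False


docs = ["A/C Bypass pulley", "A/T Fluid Capacity", "ABS Anti-skid Relay"]
for query in ["a/c b", "a/t f", "abs a"]:
    print(query, [doc for doc in docs if phrase_prefix_match(query, doc)])
```

With tokens like the ones shown above, each of the three queries should find its document in this model. So if the real index returns nothing, the terms stored in the index are probably not the ones the Analyze API reports now.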

This is a bit strange. I was expecting some differences in those outputs that would show the problem.

What version of Elasticsearch are you running?

It's 1.5.2.

Could it be because of the mappings for the 'description' field? I am new to that analyzer and the multi-field type, so I don't know whether they could be the cause of this issue.

I have tried to reproduce your issue using the sense script in this gist. Maybe you could try this on your environment to see if you can reproduce it on your cluster? If not, maybe you can try to modify the script so that it reproduces?

Hi Colin,
Sorry, I didn't understand what you meant.

I tried to re-create the problem you are seeing on my own cluster but I was not able to. On my cluster the queries return correctly. The requests I sent to my cluster to try to recreate the issue are listed at the link I posted.

It would be a good idea for you to try to run the same steps that I did, by pasting the sense commands from that link into your sense client and running the commands (this will create a new index called testIdx). Can you let me know if you find the same issue with this new index?

Hi Colin,
The queries return the correct data with the new index in my cluster too. So why do they not return it for the old index?

Hi Colin,
I have created the index again with the same settings and mappings and loaded the data. Now I get results for any type of text input.
Earlier, I had updated the keyword analyzer and the mappings with the multi-field type. Since the mappings were updated without any conflicts, I assumed there was no need to re-index the data. But without re-indexing, a few text inputs returned no results.
So do I always need to re-index the data, even when the settings and mappings were updated successfully?

Yes, you will need to re-index all your data if you change the mappings or settings. You are able to add new multi-fields to an index without a conflict in the mappings, but this will only add the new multi-fields for new index requests and will not add them for documents which are already indexed. The best approach is just to re-index any time you change your mappings.
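
A toy Python model of the point above: terms are written when a document is indexed, so changing the analyzer afterwards does not rewrite them. Everything here is hypothetical. `ToyIndex` is not an Elasticsearch API, and `letters_only` stands in for some earlier analyzer; the thread does not say which analysis the old documents were actually indexed with.

```python
def whitespace_lowercase(text):
    # Stand-in for analyzer_startswith: whitespace tokenizer + lowercase filter.
    return text.lower().split()


def letters_only(text):
    # HYPOTHETICAL earlier analyzer (an assumption, not from the thread):
    # splits on any non-alphanumeric character, so "A/C" becomes "a", "c".
    cleaned = "".join(ch if ch.isalnum() else " " for ch in text.lower())
    return cleaned.split()


class ToyIndex:
    def __init__(self, analyzer):
        self.analyzer = analyzer
        self.sources = {}  # doc id -> original text
        self.terms = {}    # doc id -> tokens stored at index time

    def index(self, doc_id, text):
        self.sources[doc_id] = text
        self.terms[doc_id] = self.analyzer(text)

    def search(self, query):
        # The query is analyzed with the CURRENT analyzer and compared
        # against whatever terms were stored when each document was indexed.
        wanted = self.analyzer(query)
        return [doc_id for doc_id, stored in self.terms.items()
                if all(term in stored for term in wanted)]

    def reindex(self):
        # Re-run the current analyzer over every stored source document.
        for doc_id, text in list(self.sources.items()):
            self.index(doc_id, text)


idx = ToyIndex(letters_only)
idx.index(1, "A/C Bypass pulley")
idx.analyzer = whitespace_lowercase   # settings changed, data not re-indexed
before = idx.search("a/c bypass")     # stored terms are still the old ones
idx.reindex()
after = idx.search("a/c bypass")
print(before, after)
```

In this model the search only finds the document after `reindex()`, which mirrors what happened once the index was re-created and the data reloaded.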

Okay Colin. Thank you very much, this discussion gave me a good experience. :smiley:
