Match query is working differently for few inputs

anusha6 · July 13, 2015, 9:42am

Hi,
I am using the following mappings:

PUT pdlfull/pdlfull_type/_mapping
{
  "pdlfull_type": {
    "properties": {
      "GenericDescriptionId": {
        "type": "string",
        "index": "not_analyzed"
      },
      "GroupNumber": {
        "type": "string",
        "index": "not_analyzed"
      },
      "catalogDescriptions": {
        "properties": {
          "catalogDescriptionId": {
            "type": "long"
          }
        }
      },
      "description": {
        "type": "string",
        "analyzer": "analyzer_startswith",
        "fields": {
          "sort_field": {
            "type": "string",
            "analyzer": "keyword_analyzer"
          }
        }
      }
    }
  }
}

The analyzers referenced in the mappings are defined as follows:

"analysis": {
  "analyzer": {
    "keyword_analyzer": {
      "filter": "lowercase",
      "tokenizer": "keyword"
    },
    "analyzer_startswith": {
      "filter": [
        "lowercase"
      ],
      "tokenizer": "whitespace"
    },
    "whitespace_analyzer": {
      "type": "custom",
      "filter": [
        "lowercase",
        "asciifolding"
      ],
      "tokenizer": "whitespace"
    },
    "wordAnalyzer": {
      "type": "custom",
      "filter": [
        "lowercase",
        "asciifolding",
        "nGram_filter"
      ],
      "tokenizer": "whitespace"
    }
  },
  "filter": {
    "nGram_filter": {
      "max_gram": "20",
      "min_gram": "1",
      "type": "nGram",
      "token_chars": [
        "letter",
        "digit",
        "punctuation",
        "symbol"
      ]
    }
  }
}

These are the settings I have used.
Here is some sample data from the description field (the only field I am searching):

"A Frame & Trunnion Kat"
"A/C Bypass pulley"
"A/C Bypass pulley(Belts)"
"A/T Fluid Capacity"
"A/T overdrive Button kit"
"A/T shift bezel"
"Abc accumulator"
"ABS Anti-skid Relay"
"ABS Anti-skid switch"

My query is:

GET pdlfull/pdlfull_type/_search
{
  "query": {
    "match_phrase_prefix": {
      "description": "a/c b"
    }
  },
  "sort": {
    "description.sort_field": {
      "order": "asc"
    }
  }
}

The above query is not returning any records.

When I use the queries "a/t s", "abc a", "a/t f", "a/t o", or "A F", I do get records.

Why does this happen? Some prefixes return no records, and I cannot work out where the problem is or why the query behaves differently for different inputs.

colings86 · July 13, 2015, 9:57am

You may find it useful to use the Explain API on both the queries that work and the ones that don't to see what the difference is and determine what's causing this behaviour.

It may also be useful to use the Analyze API to test that each of the examples in the description field data you listed are creating the terms you expect.
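As a rough local cross-check of what the Analyze API should return for this field, the sketch below mimics `analyzer_startswith` (whitespace tokenizer followed by a lowercase filter) in plain Python. It is only an approximation under the assumption that the text is plain ASCII; the function name `analyze_startswith` is made up for illustration and is not part of Elasticsearch.

```python
def analyze_startswith(text):
    """Approximate analyzer_startswith: split on whitespace, then lowercase.

    This is a local stand-in for comparison only; the real tokens come from
    the Analyze API on the cluster.
    """
    return [token.lower() for token in text.split()]


# Sample values taken from the description field listed in the question.
samples = ["A/C Bypass pulley", "ABS Anti-skid Relay", "A/T Fluid Capacity"]
for sample in samples:
    print(sample, "->", analyze_startswith(sample))
```

If the tokens reported by the cluster differ from this approximation, the index is probably not using the analyzer you think it is.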

anusha6 · July 13, 2015, 10:15am

I used the following query with Explain:

GET pdlfull/pdlfull_type/AU6F0k0L3NytkeZBUPfI/_explain
{
  "query": {
    "match": {
      "description": {
        "query": "abs a",
        "operator": "and",
        "type": "phrase_prefix",
        "prefix_length": 1
      }
    }
  }
}

This gives the following result:

{
  "_index": "pdlfull",
  "_type": "pdlfull_type",
  "_id": "AU6F0k0L3NytkeZBUPfI",
  "matched": false,
  "explanation": {
    "value": 0,
    "description": "no matching term"
  }
}

When I used this query:

GET pdlfull/pdlfull_type/AU6F0l6b3NytkeZBURzM/_explain
{
  "query": {
    "match": {
      "description": {
        "query": "a/t f",
        "operator": "and",
        "type": "phrase_prefix",
        "prefix_length": 1
      }
    }
  }
}

I got this result:

"_index": "pdlfull",
"_type": "pdlfull_type",
"_id": "AU6F0l6b3NytkeZBURzM",
"matched": true,
"explanation": {
"value": 203.01616,
"description": "weight(description:"a/t (filler fro forward front fa flow flange flasher fuel flywheel/parts frame frequency fan four flange, fluid f fork/parts flares fuel/water flex final fusible float fog fender follower fuse fit facing filter flasher, flare flng(emissions) fitting, full flywheel freeze fork floor fitting frnt fast formed feedback filter, foot fn fairing front/rear)" in 449) [PerFieldSimilarity], result of:",
"details": [
{
"value": 203.01616,
"description": "fieldWeight in 449, product of:",
"details": [
{
"value": 1,
"description": "tf(freq=1.0), with freq of:",
"details": [
{
"value": 1,
"description": "phraseFreq=1.0"
}
]
},
{
"value": 406.03232,
"description": "idf(), sum of:",
"details": [
{
"value": 8.127894,
"description": "idf(docFreq=3, maxDocs=4985)"
},
{
"value": 7.029282,
"description": "idf(docFreq=11, maxDocs=4985)"
},
{
.
.
.
.
}

The output continues in the same way.
Why does Explain report "matched": false for 'abs a', even though there are records with that prefix, but "matched": true for 'a/t f', when I am using the same prefix query for both?

colings86 · July 13, 2015, 10:21am

Could you try running the following?

curl -XGET 'localhost:9200/pdlfull/_analyze?analyzer=analyzer_startswith' -d 'ABS Anti-skid Relay'

and

curl -XGET 'localhost:9200/pdlfull/_analyze?analyzer=analyzer_startswith' -d 'A/T Fluid Capacity'

This will show the tokens the analyzer produces in the index for that field.

anusha6 · July 13, 2015, 10:21am

How do I check the analyzer output?
I used this request:

GET pdlfull/_analyze?analyzer=analyzer_startswith a/c b

It returns:

{
"error": "ElasticsearchIllegalArgumentException[text is missing]",
"status": 400
}

So how can I write the GET request without using curl?

colings86 · July 13, 2015, 10:27am

What are you using to send requests to Elasticsearch?

You need to put 'a/c b' in the body of the request. This will be different depending on what tool you are using to send requests. I would look at the documentation for the tool.

anusha6 · July 13, 2015, 10:29am

I am using the Sense plugin for that.

colings86 · July 13, 2015, 10:35am

Try the following instead:

GET /pdlfull/_analyze?analyzer=analyzer_startswith&text=ABS Anti-skid Relay

and

GET /pdlfull/_analyze?analyzer=analyzer_startswith&text=A/T Fluid Capacity

and

GET /pdlfull/_analyze?analyzer=analyzer_startswith&text=abs a

and

GET /pdlfull/_analyze?analyzer=analyzer_startswith&text=a/t f
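If you send these requests from a script rather than from Sense, the text has to be URL-encoded, because it contains spaces and slashes. A minimal Python sketch of building such a URL follows; the helper name `analyze_url` is an assumption for illustration, and the default host simply mirrors the `localhost:9200` used in the curl commands above.

```python
from urllib.parse import quote, urlencode


def analyze_url(index, analyzer, text, host="http://localhost:9200"):
    """Build an _analyze URL with the analyzer and text as query parameters.

    quote_via=quote encodes a space as %20 and a slash as %2F, so the
    text survives as a single query-string value.
    """
    query = urlencode({"analyzer": analyzer, "text": text}, quote_via=quote)
    return f"{host}/{index}/_analyze?{query}"


print(analyze_url("pdlfull", "analyzer_startswith", "a/c b"))
```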

anusha6 · July 13, 2015, 10:40am

GET /pdlfull/_analyze?analyzer=analyzer_startswith&text=ABS Anti-skid Relay

Result:
{
"token": "abs",
"start_offset": 0,
"end_offset": 3,
"type": "word",
"position": 1
},
{
"token": "anti-skid",
"start_offset": 4,
"end_offset": 13,
"type": "word",
"position": 2
},
{
"token": "relay",
"start_offset": 14,
"end_offset": 19,
"type": "word",
"position": 3
}

For the below text:
GET /pdlfull/_analyze?analyzer=analyzer_startswith&text=abs a

Result:
{
"token": "abs",
"start_offset": 0,
"end_offset": 3,
"type": "word",
"position": 1
},
{
"token": "a",
"start_offset": 4,
"end_offset": 5,
"type": "word",
"position": 2
}

I get tokens in the same way for the other two text inputs.

anusha6 · July 13, 2015, 10:41am

I am getting tokens for all four inputs. So where can I find the difference?

colings86 · July 13, 2015, 10:48am

This is a bit strange, I was expecting there to be some differences in those outputs that would show the problem.
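To spell out why this is unexpected: with those tokens, a phrase-prefix match on 'abs a' should succeed against 'ABS Anti-skid Relay'. The Python sketch below is a simplified model of that matching rule, not Elasticsearch's implementation. It assumes the whitespace-plus-lowercase analysis shown above and ignores details such as `max_expansions` and scoring; the function names are made up for illustration.

```python
def analyze(text):
    """Approximate analyzer_startswith: whitespace split plus lowercase."""
    return [token.lower() for token in text.split()]


def phrase_prefix_matches(query, document):
    """Return True if the query tokens occur consecutively in the document,
    with every token matching exactly except the last, which only needs to
    be a prefix of the document token in that position."""
    query_tokens = analyze(query)
    doc_tokens = analyze(document)
    if not query_tokens:
        return False
    *exact, prefix = query_tokens
    size = len(query_tokens)
    for start in range(len(doc_tokens) - size + 1):
        window = doc_tokens[start:start + size]
        if window[:-1] == exact and window[-1].startswith(prefix):
            return True
    return False


print(phrase_prefix_matches("abs a", "ABS Anti-skid Relay"))
print(phrase_prefix_matches("a/c b", "A/C Bypass pulley"))
```

Under this model both failing queries should match, which suggests the problem lies in what is actually stored in the index rather than in the query text.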

What version of Elasticsearch are you running?

anusha6 · July 13, 2015, 10:57am

It's 1.5.2.

anusha6 · July 13, 2015, 11:02am

Could it be because of the mappings I defined for the 'description' field? That analyzer and the multi-field type are new to me, so I don't know whether they could be the cause of this issue.

colings86 · July 13, 2015, 11:11am

I have tried to reproduce your issue using the sense script in this gist. Maybe you could try this on your environment to see if you can reproduce it on your cluster? If not, maybe you can try to modify the script so that it reproduces?

anusha6 · July 13, 2015, 11:17am

Hi Colin,
Sorry, I didn't understand what you meant.

colings86 · July 13, 2015, 11:21am

I tried to re-create the problem you are seeing on my own cluster but I was not able to. On my cluster the queries return correctly. The requests I sent to my cluster to try to recreate the issue are listed at the link I posted.

It would be a good idea for you to try to run the same steps that I did, by pasting the sense commands from that link into your sense client and running the commands (this will create a new index called testIdx). Can you let me know if you find the same issue with this new index?

anusha6 · July 13, 2015, 11:26am

Hi Colin,
The queries return the correct data with the new index on my cluster too. So why do they not return results for the old index?

anusha6 · July 13, 2015, 12:13pm

Hi Colin,
I have re-created the index with the same settings and mappings and reloaded the data. Now I get results for any text input.
Earlier, I had updated the keyword analyzer and added the multi-field type to the mappings. Since the mappings were updated without any conflicts, I assumed there was no need to re-index the data. But without re-indexing, some text inputs returned no results.
So do I always need to re-index the data, even when the settings and mappings are updated successfully?

colings86 · July 13, 2015, 12:16pm

Yes, you will need to re-index all your data if you change the mappings or settings. You can add new multi-fields to an index without a conflict in the mappings, but this only adds the new multi-fields for new index requests; it does not add them to documents that are already indexed. The best approach is to re-index any time you change your mappings.
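One way to carry out such a re-index is to read the existing documents back (for example with a scan/scroll search) and send them to a freshly created index through the Bulk API. The Python sketch below only builds the newline-delimited bulk request body and does not talk to a cluster. The helper name `build_bulk_body`, the target index name `pdlfull_v2`, and the sample document are assumptions for illustration.

```python
import json


def build_bulk_body(docs, index, doc_type):
    """Build a Bulk API body: one action line plus one source line per document.

    docs is an iterable of (doc_id, source_dict) pairs. The body must end
    with a newline, as the Bulk API expects.
    """
    lines = []
    for doc_id, source in docs:
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"


# Hypothetical document read back from the old index.
docs = [("AU6F0l6b3NytkeZBURzM", {"description": "A/T Fluid Capacity"})]
print(build_bulk_body(docs, "pdlfull_v2", "pdlfull_type"))
```

The resulting string would be sent as the body of a POST to the `_bulk` endpoint, after the new index has been created with the updated settings and mappings.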

anusha6 · July 13, 2015, 12:20pm

Okay Colin. Thank you very much, this discussion gave me a good experience.
