Fuzziness changing result of match query


(anusha) #1

Hi all,
I have data indexed with different analyzers, as shown in the following mapping:

PUT ymme/ymme_type/_mappings
{
  "ymme_type": {
    "_all": {
      "auto_boost": true,
      "index_analyzer": "wordAnalyzer",
      "search_analyzer": "whitespace_analyzer"
    },
    "properties": {
      "Engine": {
        "type": "string",
        "index": "not_analyzed"
      },
      "EngineCode": {
        "type": "string",
        "include_in_all": false
      },
      "Make": {
        "type": "string",
        "boost": 3,
        "index": "not_analyzed",
        "norms": {
          "enabled": true
        }
      },
      "MakeCode": {
        "type": "string",
        "include_in_all": false
      },
      "Model": {
        "type": "string",
        "boost": 2,
        "index": "not_analyzed",
        "norms": {
          "enabled": true
        }
      },
      "ModelCode": {
        "type": "string",
        "include_in_all": false
      },
      "ShortYear": {
        "type": "string",
        "boost": 4,
        "index": "not_analyzed",
        "norms": {
          "enabled": true
        }
      },
      "Year": {
        "type": "string",
        "boost": 5,
        "index": "not_analyzed",
        "norms": {
          "enabled": true
        },
        "fields": {
          "raw": {
            "type": "string",
            "index": "not_analyzed"
          }
        }
      },
      "YearCode": {
        "type": "string",
        "include_in_all": false
      }
    }
  }
}
In the ShortYear field I keep the last two digits of the Year, so that when a user types only the last two digits we can show the records for that year.
I use _all here because I need to search across multiple fields, and those fields may contain special characters. Since QueryString does not support special characters, I use a match query instead and set the analyzers on the _all field.
I set the boost values so that a match on a particular field counts for more than a match on the other fields.

My analyzers are defined in the settings as follows:

"analysis": {
  "analyzer": {
    "analyzer_startswith": {
      "type": "custom",
      "filter": "lowercase",
      "tokenizer": "keyword"
    },
    "whitespace_analyzer": {
      "type": "custom",
      "filter": [
        "lowercase",
        "asciifolding"
      ],
      "tokenizer": "whitespace"
    },
    "wordAnalyzer": {
      "type": "custom",
      "filter": [
        "lowercase",
        "asciifolding",
        "nGram_filter"
      ],
      "tokenizer": "whitespace"
    }
  },
  "filter": {
    "nGram_filter": {
      "max_gram": "20",
      "min_gram": "1",
      "type": "nGram",
      "token_chars": [
        "letter",
        "punctuation",
        "symbol",
        "digit"
      ]
    }
  }
}

I am using an n-gram filter here because the search terms need not come in a fixed order; they may match any of the fields, in any order.
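
As a rough illustration of what an n-gram filter with min_gram 1 and max_gram 20 puts into the index, here is a minimal Python sketch. It only approximates the wordAnalyzer chain (whitespace tokenizer, lowercase, asciifolding, n-grams); the function names are made up for this example and the accent stripping is a simplified stand-in for the real asciifolding filter.

```python
import unicodedata


def ascii_fold(text):
    # Simplified stand-in for asciifolding: drop combining accent marks.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def ngrams(token, min_gram=1, max_gram=20):
    # Every substring of length min_gram..max_gram, at every offset.
    grams = set()
    for size in range(min_gram, min(max_gram, len(token)) + 1):
        for start in range(len(token) - size + 1):
            grams.add(token[start:start + size])
    return grams


def word_analyzer(text):
    # Approximation of: whitespace tokenizer -> lowercase -> asciifolding -> n-grams.
    terms = set()
    for token in text.split():
        terms |= ngrams(ascii_fold(token.lower()))
    return terms


terms = word_analyzer("2012 CHEVROLET CAMARO")
print(sorted(t for t in terms if t.startswith("2")))  # ['2', '20', '201', '2012']
print("am" in terms)  # True, because "am" is a substring of "camaro"
```

So every substring of every word becomes a searchable term, not only the prefixes of each word.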

The query I am using is shown below:

GET testymme/ymme_type/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "_all": {
              "query": "2012",
              "operator": "and"
            }
          }
        }
      ]
    }
  }
}

I am also sorting the results using the Java API. My problem is that I am not getting results that are relevant for the search-as-you-type concept. And when I add fuzziness to the query, the boosting no longer works as I intended and I get different data.

My sample data is:

"2012", "AM GENERAL-VPG", "MV-1 V8-281", "4.6L SOHC"
"2012", "CHEVROLET", "CAMARO", "All Engine"
"2012", "CHEVROLET", "CAMARO", "v6-3564 3.6L", "DOHC"
"2012", "CHEVROLET", "CAMARO", "V8-376", "4.6L"
"2012", "LAMBORGHINI", "AVENTADOR", "12-654 6.5L", "DOHC"
"2012", "LAMBORGHINI", "GALLARDO", "10-520", "DOHC"

Each record is a combination of Year, Make, Model and Engine. I have years from 1962 to 2015, with different makes for each year, models for those makes, and engines for those models.

When I search with the above query I get the 2012 data as expected, but when I use the following query:

GET testymme/ymme_type/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "_all": {
              "query": "2012",
              "operator": "and",
              "fuzziness": 1,
              "prefix_length": 1
            }
          }
        }
      ]
    }
  }
}

I get documents whose year is not 2012; documents with different years are returned. I don't know what is happening. Can anyone help me resolve this issue?
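
One possible explanation, sketched in Python: with fuzziness 1 the query term "2012" may also match any indexed term within one edit of it, and because _all is indexed with n-grams, terms such as "201" and "2013" exist for other years. This is only a sketch under assumptions: it uses plain Levenshtein distance (no transpositions), treats prefix_length as "the first character must be identical", and the helper names are made up. It ignores scoring and any limit on the number of expanded terms.

```python
def edit_distance(a, b):
    # Classic Levenshtein distance via dynamic programming, row by row.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def ngrams(token, min_gram=1, max_gram=20):
    # Every substring of length min_gram..max_gram, at every offset.
    return {token[start:start + size]
            for size in range(min_gram, min(max_gram, len(token)) + 1)
            for start in range(len(token) - size + 1)}


def fuzzy_hits(query, indexed_terms, fuzziness=1, prefix_length=1):
    # Indexed terms that share the required prefix and are within the edit budget.
    prefix = query[:prefix_length]
    return {term for term in indexed_terms
            if term.startswith(prefix) and edit_distance(query, term) <= fuzziness}


for year in ("2012", "2013", "2005", "1998"):
    print(year, sorted(fuzzy_hits("2012", ngrams(year))))
# 2012 ['201', '2012']
# 2013 ['201', '2013']
# 2005 []
# 1998 []
```

Under these assumptions a 2013 record matches through the terms "201" and "2013", which would explain why other years show up once fuzziness is added.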


(anusha) #2

There is also a problem with the search-as-you-type concept I mentioned:

My sample data, as mentioned above, is made up of "Year", "Make", "Model", "Engine".

I use the following query to return the records for the year "2012" in which a model starts with "am":

GET testymme/ymme_type/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "_all": {
              "query": "2012 am",
              "operator": "and"
            }
          }
        }
      ]
    }
  }
}

The response I need should have "2012", "AM GENERAL-VPG", "MV-1 V8-281", "4.6L SOHC" as the first record, but other documents come first. I know this is because of the filters and tokenizers I have used, and because the response is ordered by score.

How can I match words that start with the typed text, where the match may be in any field and in any order, without having to follow the order Year, Make, Model, Engine?

And is there any way to get the response without it being based on score?
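
To show the difference between "contains" and "starts with" matching, here is a small Python sketch comparing full n-grams with edge n-grams (prefixes only) on three of the sample records. Swapping in an edge n-gram filter is an assumption that has not been tried against this index, and the helper names and the short record labels are made up for the example. Lowercasing and whitespace splitting stand in for the real analyzers, and scoring is ignored.

```python
def all_ngrams(token, min_gram=1, max_gram=20):
    # Every substring: what a plain n-gram filter produces.
    return {token[start:start + size]
            for size in range(min_gram, min(max_gram, len(token)) + 1)
            for start in range(len(token) - size + 1)}


def edge_ngrams(token, min_gram=1, max_gram=20):
    # Only prefixes: what an edge n-gram filter produces.
    return {token[:size] for size in range(min_gram, min(max_gram, len(token)) + 1)}


def index_terms(fields, gram_fn):
    # Lowercase, split on whitespace, then expand each word with gram_fn.
    terms = set()
    for field in fields:
        for token in field.lower().split():
            terms |= gram_fn(token)
    return terms


# Three of the sample records, keyed by a made-up label.
docs = {
    "am general": ["2012", "AM GENERAL-VPG", "MV-1 V8-281", "4.6L SOHC"],
    "camaro": ["2012", "CHEVROLET", "CAMARO", "All Engine"],
    "gallardo": ["2012", "LAMBORGHINI", "GALLARDO", "10-520", "DOHC"],
}


def matching(query, gram_fn):
    # "and" operator: every query word must be an indexed term of the record.
    wanted = query.lower().split()
    return [name for name, fields in docs.items()
            if all(word in index_terms(fields, gram_fn) for word in wanted)]


print(matching("2012 am", all_ngrams))   # ['am general', 'camaro', 'gallardo']
print(matching("2012 am", edge_ngrams))  # ['am general']
```

With full n-grams, "am" also matches inside "camaro" and "lamborghini"; with prefixes only, just the record containing a word that starts with "am" remains.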

