Analyzer not working the same in v2.4 as it does in v1.6


#1

v2.4 returns many more (unwanted) results than v1.6. The mapping is the same in both versions, and so are the settings (analyzers, tokenizers, etc.). The analyzer is fairly complex and combines custom tokenizing with synonyms.

Here is a greatly simplified query, run through the validate API with explain:
GET /partsearch/epicorpart/_validate/query?explain
{
  "query": {
    "match": {
      "partDescription": {
        "query": "qp-09-04",
        "operator": "and"
      }
    }
  }
}

The explain result for v2.4:
"+(partDescription:qp-09-04 partDescription:qp partDescription:quality-plan partDescription:09 partDescription:04 partDescription:qp partDescription:quality-plan partDescription:-09-04 partDescription:09 partDescription:04) #ConstantScore(+ConstantScore(_type:epicorpart))"

The explain result for v1.6:
"filtered(+partDescription:qp-09-04 +(partDescription:qp partDescription:quality-plan partDescription:09 partDescription:04) +(partDescription:qp partDescription:quality-plan partDescription:-09-04 partDescription:09 partDescription:04))->cache(_type:epicorpart)"

UPDATE: I found a difference between my indexes in v1.6 and v2.4. Calling _analyze?analyzer=myAnalyzer&text=QP-01 on each version returns different output, shown below, even though the same analyzer is used for both. In v2.4 every token has position 0, whereas in v1.6 the tokens have positions 1 through 3. The position numbers appear to determine how the terms are grouped in the explain results above: v1.6 produces one required clause per position, while v2.4 puts every term into a single group in which any one term is enough to match.

v2.4
"tokens": [
{
"token": "qp-01",
"start_offset": 0,
"end_offset": 5,
"type": "word",
"position": 0
},
{
"token": "qp",
"start_offset": 0,
"end_offset": 5,
"type": "word",
"position": 0
},
{
"token": "quality-plan",
"start_offset": 0,
"end_offset": 5,
"type": "SYNONYM",
"position": 0
},
{
"token": "01",
"start_offset": 0,
"end_offset": 5,
"type": "word",
"position": 0
},
{
"token": "qp",
"start_offset": 0,
"end_offset": 5,
"type": "word",
"position": 0
},
{
"token": "quality-plan",
"start_offset": 0,
"end_offset": 5,
"type": "SYNONYM",
"position": 0
},
{
"token": "-01",
"start_offset": 0,
"end_offset": 5,
"type": "word",
"position": 0
},
{
"token": "01",
"start_offset": 0,
"end_offset": 5,
"type": "word",
"position": 0
}
]

v1.6
"tokens": [
{
"token": "qp-01",
"start_offset": 0,
"end_offset": 5,
"type": "word",
"position": 1
},
{
"token": "qp",
"start_offset": 0,
"end_offset": 5,
"type": "SYNONYM",
"position": 2
},
{
"token": "quality-plan",
"start_offset": 0,
"end_offset": 5,
"type": "SYNONYM",
"position": 2
},
{
"token": "01",
"start_offset": 0,
"end_offset": 5,
"type": "word",
"position": 2
},
{
"token": "qp",
"start_offset": 0,
"end_offset": 5,
"type": "SYNONYM",
"position": 3
},
{
"token": "quality-plan",
"start_offset": 0,
"end_offset": 5,
"type": "SYNONYM",
"position": 3
},
{
"token": "-01",
"start_offset": 0,
"end_offset": 5,
"type": "word",
"position": 3
},
{
"token": "01",
"start_offset": 0,
"end_offset": 5,
"type": "word",
"position": 3
}
]
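
To make the link between positions and the explain output concrete, here is a rough Python sketch. It is not Elasticsearch or Lucene code, just a simplified model built on one assumption: with "operator": "and", tokens that share a position are treated as alternatives, and each distinct position becomes a required clause. The function name sketch_and_query and the (token, position) tuples are made up for illustration; the token data is copied from the two _analyze outputs above.

```python
from collections import OrderedDict

# (token, position) pairs copied from the _analyze output for "QP-01".
V16_TOKENS = [
    ("qp-01", 1),
    ("qp", 2), ("quality-plan", 2), ("01", 2),
    ("qp", 3), ("quality-plan", 3), ("-01", 3), ("01", 3),
]
# In v2.4 the same tokens all report position 0.
V24_TOKENS = [(token, 0) for token, _ in V16_TOKENS]


def sketch_and_query(field, tokens):
    """Approximate the clause structure of a match query with operator "and".

    Assumption (not taken from the Elasticsearch source): tokens at the same
    position are alternatives, and every position group is required.
    """
    groups = OrderedDict()
    for token, position in tokens:
        # Keep tokens in first-seen order within each position.
        groups.setdefault(position, []).append(token)

    clauses = []
    for terms in groups.values():
        body = " ".join(f"{field}:{term}" for term in terms)
        # A single term is written bare; several terms get parentheses.
        clauses.append(f"+{body}" if len(terms) == 1 else f"+({body})")
    return " ".join(clauses)


if __name__ == "__main__":
    print("v1.6:", sketch_and_query("partDescription", V16_TOKENS))
    print("v2.4:", sketch_and_query("partDescription", V24_TOKENS))
```

Under that assumption, the v1.6 positions yield three required clauses, while the v2.4 positions collapse into one parenthesized group. That matches the shape of the two explain strings above and would explain why v2.4 matches far more documents.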

Suggestions?

