SpanNearQuery bug?


#1

I have two fields and would like to match a term in one field only if a given term in the other field occurs at the same position, i.e. a span_near query with zero slop. However, the span_near query matches when it shouldn't, both according to my own design and as verified by looking at the term vectors.

Here is an example of a query that returns hits it shouldn't (I've abstracted the names to "field1/2" and "value1/2"):
{
  "query": {
    "span_near": {
      "clauses": [
        { "span_term": { "text.field1": "value1" } },
        {
          "field_masking_span": {
            "query": { "span_term": { "text.field2": "value2" } },
            "field": "text.field1"
          }
        }
      ],
      "slop": 0,
      "in_order": false
    }
  }
}

However, this isn't working as expected and seems buggy. Digging into one of the erroneous hits with the _termvectors API shows that the values are NOT in the same position!

They are next to each other: value1 is at position 165 in field1 and value2 is at position 164 in field2.

The term vectors returned by Elasticsearch for the two values, requested with:
curl -X GET "localhost:9200/test/_doc/1/_termvectors" -H 'Content-Type: application/json' -d'
{
"fields" : ["text.field1", "text.field2"],
"offsets" : true,
"positions" : true,
"term_statistics" : true,
"field_statistics" : true
}
'

are:
"value1": {
"doc_freq": 61,
"ttf": 87,
"term_freq": 2,
"tokens": [
{
"position": 165,
"start_offset": 954,
"end_offset": 962
},
{
"position": 431,
"start_offset": 2535,
"end_offset": 2543
}
]
},

"value2": {
"doc_freq": 3029,
"ttf": 72118,
"term_freq": 14,
"tokens": [
{
"position": 33,
"start_offset": 184,
"end_offset": 187
},
{
"position": 68,
"start_offset": 382,
"end_offset": 385
},
{
"position": 69,
"start_offset": 386,
"end_offset": 389
},
{
"position": 163,
"start_offset": 946,
"end_offset": 949
},
{
"position": 164,
"start_offset": 950,
"end_offset": 953
},
{
"position": 227,
"start_offset": 1354,
"end_offset": 1357
},
{
"position": 228,
"start_offset": 1358,
"end_offset": 1361
},
{
"position": 261,
"start_offset": 1522,
"end_offset": 1525
},
{
"position": 262,
"start_offset": 1526,
"end_offset": 1529
},
{
"position": 334,
"start_offset": 1958,
"end_offset": 1961
},
{
"position": 335,
"start_offset": 1962,
"end_offset": 1965
},
{
"position": 382,
"start_offset": 2257,
"end_offset": 2260
},
{
"position": 383,
"start_offset": 2261,
"end_offset": 2264
},
{
"position": 445,
"start_offset": 2629,
"end_offset": 2632
}
]
},
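
To double-check the positions by hand, here is a small Python sketch that compares the two position lists copied from the term vectors above. The helper names (same_position, adjacent_pairs) are my own and purely illustrative; this only confirms what the term vectors show and does not explain why span_near returns the hit.

```python
# Positions copied from the term vectors output above.
value1_positions = [165, 431]
value2_positions = [33, 68, 69, 163, 164, 227, 228, 261, 262,
                    334, 335, 382, 383, 445]


def same_position(a, b):
    """Return the positions that occur in both lists."""
    return sorted(set(a) & set(b))


def adjacent_pairs(a, b):
    """Return (p, q) pairs where q from b sits directly before or after p from a."""
    b_set = set(b)
    return [(p, q) for p in a for q in (p - 1, p + 1) if q in b_set]


print("same position:", same_position(value1_positions, value2_positions))
print("adjacent:", adjacent_pairs(value1_positions, value2_positions))
# same position: []
# adjacent: [(165, 164)]
```

So in this document the two values never share a position; the only near-miss is the adjacent pair 165/164 mentioned above.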


#2

Sorry, I should have said that I'm using Elasticsearch 6.3, as this is the highest version supported in our production environment (AWS Elasticsearch Service).

