Different results based on the ordering of the search terms


(Jesper Skovgård Nielsen) #1

Hi All!

I'm facing a problem where Elasticsearch returns different results depending on the order of the terms in the query. I have a document whose keywords include "newkeyword". If I search for "newkeyword stnahoeu", the document is not found, but "stnahoeu newkeyword" does return it as expected. You can see the actual queries here:

nomatchlast-pretty.json: https://gist.github.com/nulpunkt/ab928ae0f28e86662e8819e70cc96334
nomatchfirst-pretty.json: https://gist.github.com/nulpunkt/3dd19b8c187efbc962ee3f0eef5b491d

These are the specific results I'm getting:

curl localhost:9200/skyfish/document/_search -d@/tmp/nomatchlast-pretty.json
{"took":11,"timed_out":false,"_shards":{"total":8,"successful":8,"failed":0},"hits":{"total":0,"max_score":null,"hits":[]}}

curl localhost:9200/skyfish/document/_search -d@/tmp/nomatchfirst-pretty.json
{"took":10,"timed_out":false,"_shards":{"total":8,"successful":8,"failed":0},"hits":{"total":1,"max_score":null,"hits":[{"_index":"skyfish-v0","_type":"document","_id":"15818820gid576878","_score":null,"_timestamp":1477656482706,"_source":{"id":"15818820gid576878","media_id":16188086,"unique_media_id":15818820,"group_id":576878,"duration":null,"orientation":"vertical","resolution":"other","width":0,"height":0,"copyright":"","byline":"","company_id":210144,"dimension":"X","status":"released","created":"2016-10-28T11:40:32Z","camera_created":null,"media_type":2,"bucket_id":9,"filename":"3.txt","title":null,"keywords":["font","tattoo","vector","numbers","handwritten","alphabet","type","text","set","ink","read","symbol","elements","character","graphic","black","abc","pack","collection","design","letters","art","style","old school","capital","cartoon","hipster","stone","newkeyword"],"uploaded_by":"Jesper Skyfish","file_mimetype":"text\/plain; charset=us-ascii","file_disksize":12},"sort":["3.txt",15818820]}]}}

Does anyone have a clue as to what I'm doing wrong?

I should note that this is a single-node test cluster, so all shards are available on the node I'm searching on.

Regarding versions:

curl localhost:9200
{
  "name" : "Test Cluster",
  "cluster_name" : "trumanelastic",
  "version" : {
    "number" : "2.2.0",
    "build_hash" : "8ff36d139e16f8720f2947ef62c8167a888992fe",
    "build_timestamp" : "2016-01-27T13:32:39Z",
    "build_snapshot" : false,
    "lucene_version" : "5.4.1"
  },
  "tagline" : "You Know, for Search"
}

Best Regards.


(Xavier Facq) #2

Hi,

Which field contains "newkeyword"? Watch out for queries like "match_phrase_prefix"; that may be your problem.

You should also consider MultiMatchQueries in order to simplify your queries :wink:
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-multi-match-query.html
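
To make the multi_match suggestion concrete, here is a minimal sketch of what such a query body could look like. The field list ("keywords", "title", "filename") is an assumption taken from the field names in the document source shown in post #1, since the original queries are only linked, and build_multi_match is a hypothetical helper name, not part of any Elasticsearch client.

```python
import json


def build_multi_match(text, fields, operator="and"):
    """Return a multi_match clause that searches one query string in several fields.

    `fields` is an assumption for illustration; use whatever fields the
    real query is meant to cover.
    """
    return {
        "multi_match": {
            "query": text,            # a single string, not a list of terms
            "fields": list(fields),
            "operator": operator,
        }
    }


body = {"query": build_multi_match("stnahoeu newkeyword",
                                   ["keywords", "title", "filename"])}
print(json.dumps(body, indent=4))
```

With the default multi_match type, "operator": "and" asks for all terms to match within a single field, so check that this is the behaviour you want before replacing several separate match clauses with it.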

Bye,
Xavier


(Jesper Skovgård Nielsen) #3

Hi,

It is the field "keywords", which is a list of strings, so the match_phrase_prefix should not be a problem.

Yes, we should totally use MultiMatchQueries :smile:

Best Regards.


(Xavier Facq) #4

I'm not sure you can pass an array as the query in a match query:

"match":{  
    "keywords":{  
        "operator":"and",
        "query":[  
            "stnahoeu",
            "newkeyword"
        ]
    }
}

It should be:

"match":{  
    "keywords":{  
        "operator":"and",
        "query": "stnahoeu newkeyword"
    }
}

No?
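
If the query body is generated by application code, one way to avoid sending an array is to join the terms before building the match clause. This is only a sketch under that assumption: build_keywords_match is a hypothetical helper, and the thread does not say how the original queries were produced. Why the array form is order-sensitive is also not confirmed here; the results in post #1 would be consistent with only the last array element being used, but that is a guess.

```python
import json


def build_keywords_match(terms, operator="and"):
    """Build a match clause on "keywords" with the terms joined into one string.

    Accepts either a single string or an iterable of strings, so callers
    cannot accidentally pass a list through as the "query" value.
    """
    if isinstance(terms, str):
        terms = [terms]
    # Drop empty entries and join the rest with single spaces.
    query_text = " ".join(t.strip() for t in terms if t.strip())
    return {
        "match": {
            "keywords": {
                "operator": operator,
                "query": query_text,
            }
        }
    }


body = {"query": build_keywords_match(["stnahoeu", "newkeyword"])}
print(json.dumps(body, indent=4))
```

Because the terms end up in one analyzed string with "operator": "and", every term has to match, whatever order they were supplied in.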


(Jesper Skovgård Nielsen) #5

That seems to solve the problem, thank you very much! :slight_smile:


(Xavier Facq) #6

Good! So you can mark this post as resolved :wink:

