Elasticsearch exact term query returns inconsistent, unreliable results


(Aguspagnoni) #1

Hi there, thanks for reading :slight_smile:

I'm new to Elasticsearch queries. I've tried reading the docs but can't figure something out, either because I don't know how ES works under the hood or because I'm using the tool wrongly.

I'm doing unit tests that do the following:

1 - Create index (default shards and replicas)
2 - Put 10 documents
3 - Query for exact matching values
4 - Delete the index

The problem is that I cannot explain the ES results.
The queries are of this kind:
query -> constant_score -> filter -> bool -> must -> term -> provider -> A
The curious thing is that the test fails the first time I run the query, sometimes the second time too, but from the third run onward it works fine.
When it fails, it fails BIG time: a query for provider A returns documents for provider Z (provider is a string field with the default analyzer, if any).
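
For reference, the query shape described above can be sketched as a Ruby hash. This is only an illustration: the helper name `provider_term_query` is hypothetical, and the structure is copied from the logged request body further down.

```ruby
require 'json'

# Hypothetical helper: builds the search body used in the tests, a
# constant_score query wrapping a bool filter with one term clause
# on the provider field.
def provider_term_query(provider)
  {
    query: {
      constant_score: {
        filter: {
          bool: {
            must: [{ term: { provider: provider } }],
            must_not: []
          }
        }
      }
    }
  }
end

# Print the JSON body that would be sent to the _search endpoint.
puts JSON.generate(provider_term_query('favacard'))
```

Passing the resulting hash to `Event.search` corresponds to the byebug call shown in the logs.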

The queries and responses are below, followed by the mapping and index data.

2016-12-06 13:30:59 -0300: GET http://localhost:9200/scrapper-development-event/event/_search [status:200, request:0.002s, query:0.002s]
2016-12-06 13:30:59 -0300: > {query:{constant_score:{filter:{bool:{must:[{term:{provider:favacard}}],must_not:[]}}}}}
2016-12-06 13:30:59 -0300: < {took:2,timed_out:false,_shards:{total:5,successful:5,failed:0},hits:{total:1,max_score:1.6931472,hits:[{_index:scrapper-development-event,_type:event,_id:AVjQF-p8NUmEYZd_pvLX,_score:1.6931472,_source:{created_at:null,updated_at:2016-12-05T17:46:18.105Z,provider:amex, ..}}]}}
2016-12-06 13:30:59 -0300: GET http://localhost:9200/scrapper-development-event/event/_search [status:200, request:0.002s, query:0.001s]
2016-12-06 13:30:59 -0300: > {query:{constant_score:{filter:{bool:{must:[{term:{provider:favacard}}],must_not:[]}}}}}
2016-12-06 13:30:59 -0300: < {took:1,timed_out:false,_shards:{total:5,successful:5,failed:0},hits:{total:1,max_score:1.0,hits:[{_index:scrapper-development-event,_type:event,_id:AVjQF-qnNUmEYZd_pvLa,_score:1.0,_source:{..,provider:mastercard,..}}]}}

(byebug) asd = Event.search({query:{constant_score:{filter:{bool:{must:[{term:{provider:favacard}}],must_not:[]}}}}})
2016-12-06 13:31:25 -0300: GET http://localhost:9200/scrapper-development-event/event/_search [status:200, request:0.007s, query:0.001s]
2016-12-06 13:31:25 -0300: > {query:{constant_score:{filter:{bool:{must:[{term:{provider:favacard}}],must_not:[]}}}}}
2016-12-06 13:31:25 -0300: < {took:1,timed_out:false,_shards:{total:5,successful:5,failed:0},hits:{total:3,max_score:1.0,hits:[{_index:scrapper-development-event,_type:event,_id:AVjQF-ofNUmEYZd_pvLS,_score:1.0,_source:{..,provider:favacard,..}},{_index:scrapper-development-event,_type:event,_id:AVjQF-qGNUmEYZd_pvLY,_score:1.0,_source:{..,provider:favacard,..}},{_index:scrapper-development-event,_type:event,_id:AVjQF-oxNUmEYZd_pvLT,_score:1.0,_source:{..,provider:favacard,..}}]}}

Sorry for the output; it's as clear as I can get it with this editor without getting fancy. (I took out data that wasn't relevant from the hit documents, to make the difference clearer.)

Data about mapping and index:

{mappings=>
{event=>
{properties=>
{activity=>{type=>string},
created_at=>{type=>date, format=>strict_date_optional_time||epoch_millis},
credential_id=>{type=>string},
description=>{type=>string},
document_type=>{type=>string},
login=>{type=>string, fields=>{login=>{type=>string}, raw=>{type=>string, analyzer=>snowball}}},
provider=>{type=>string},
scrapping_date=>{type=>date, format=>strict_date_optional_time||epoch_millis},
stablishment_number=>{type=>string},
status=>{type=>string},
updated_at=>{type=>date, format=>strict_date_optional_time||epoch_millis}}}}},

I hope I explained myself; if not, ping me for more info.

Thanks in advance


(Alexander Reelsen) #2

Hey,

can you try executing the refresh API right before running your queries, to make sure all documents are visible for search?
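
A minimal sketch of what that could look like from a Ruby test, using only the standard library. The helper names are hypothetical, and the host, port, and index name are assumptions taken from the logs above. The refresh endpoint itself is `POST /<index>/_refresh`.

```ruby
require 'net/http'

# Hypothetical helper: builds (but does not send) a request to the
# index refresh endpoint, POST /<index>/_refresh.
def build_refresh_request(index)
  Net::HTTP::Post.new("/#{index}/_refresh")
end

# Hypothetical helper: sends the refresh request. The default host and
# port are assumptions based on the localhost:9200 URLs in the logs.
def refresh_index(index, host: 'localhost', port: 9200)
  Net::HTTP.start(host, port) do |http|
    http.request(build_refresh_request(index))
  end
end

# In a test: index the documents first, then call
#   refresh_index('scrapper-development-event')
# and only then run the search.
```

If the `Event` model is backed by the elasticsearch-model gem (an assumption based on `Event.search`), calling `Event.__elasticsearch__.refresh_index!` should have the same effect.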

--Alex


(Aguspagnoni) #3

Alexander, thanks for the interest.

Tried refreshing it right before the query, now it works :smile:

Just for the sake of knowing more (:nerd:): how expensive is this refresh? The docs say it's done periodically, but I couldn't find how often.

Anyway, this clears up my doubts about inconsistent searches, so that's a really good point.

Thanks!
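
On the "how often" question: the periodic refresh is controlled by the `refresh_interval` index setting, which defaults to `1s`. That would explain why the same query started working a moment later. Each refresh has a cost, so forcing one per request is usually reserved for tests rather than production indexing. Below is a rough sketch of changing the interval through the index settings endpoint; the helper name is hypothetical and the index name is taken from the logs.

```ruby
require 'json'
require 'net/http'

# Hypothetical helper: builds (but does not send) a settings update that
# changes how often the index is refreshed, PUT /<index>/_settings.
def build_refresh_interval_request(index, interval)
  request = Net::HTTP::Put.new("/#{index}/_settings",
                               'Content-Type' => 'application/json')
  request.body = JSON.generate({ index: { refresh_interval: interval } })
  request
end

# Inspect the request that would be sent; nothing goes over the network here.
request = build_refresh_interval_request('scrapper-development-event', '1s')
puts "#{request.method} #{request.path} #{request.body}"
```

Sending it works like any other `Net::HTTP` request, for example inside `Net::HTTP.start('localhost', 9200) { |http| http.request(request) }`.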

