Filtered by IdsQuery vs by TermsQuery


(Xavier Facq) #1

Hi all,

Suppose we have a lot of documents whose _id (a String) is set by us, plus a field that holds the same value as the _id, stored as a Long.

I want to run a filtered query on a list of Long values, so I have two options:

Option 1:

Add a must clause with an IdsQuery, with my list of ids converted to Strings.

Option 2:

Add a must clause with a TermsQuery, with my list of ids as Longs.
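
For illustration, the two options could be written as query DSL request bodies roughly like this. It is a minimal sketch, not code from the thread: the field name `id_long` is a hypothetical stand-in for the Long field, and wrapping the clause in `bool` / `filter` is an assumption (a `must` clause would have the same shape).

```python
import json


def ids_query_body(ids):
    """Option 1: ids query against _id, with every id sent as a string."""
    return {
        "query": {
            "bool": {
                "filter": [{"ids": {"values": [str(i) for i in ids]}}]
            }
        }
    }


def terms_query_body(ids, field="id_long"):
    """Option 2: terms query against the numeric field.

    `field` defaults to "id_long", a hypothetical name for the Long field.
    """
    return {
        "query": {
            "bool": {
                "filter": [{"terms": {field: [int(i) for i in ids]}}]
            }
        }
    }


if __name__ == "__main__":
    wanted = [101, 202, 303]
    print(json.dumps(ids_query_body(wanted), indent=2))
    print(json.dumps(terms_query_body(wanted), indent=2))
```

Either body can be sent to the search endpoint of the index; only the clause inside `filter` differs between the two options.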

Question: which one will be faster, or which do you recommend?

Thanks!


(Christoph) #2

In a filter context they are both rewritten to a Lucene TermsQuery, so it shouldn't make any difference.


(Xavier Facq) #3

OK, but since the Long value is converted to a String for the _id field, might it be slower? Or does Lucene have a super-fast map of document ids? :slight_smile:

In both cases I'm fairly sure it will make no difference, but I'd prefer to have your expert point of view!


(Christoph) #4

Hi,

If you are using Elasticsearch 2.x, this should not make any difference, since both data types are backed by the same data structure, the inverted index. In ES 5.0 (which uses Lucene 6) this might have changed, since Lucene 6 introduced the "dimensional points" feature to represent numeric values. I don't know whether that has a big impact on your use case, though, so I'd suggest doing some measurements of your own. It would be nice if you shared your results here.
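
A rough way to do such a measurement is sketched below. It is only an illustration under stated assumptions: `run_query` is a hypothetical zero-argument callable that you would supply yourself (for example, one that sends the ids query or the terms query through your client), and the stub in the demo performs no real search.

```python
import statistics
import time


def median_latency_ms(run_query, repeats=20, warmup=3):
    """Return the median wall-clock time of run_query() in milliseconds.

    run_query is a hypothetical zero-argument callable supplied by the
    caller. Warm-up calls are executed first and are not timed, so that
    cold caches distort the samples less.
    """
    for _ in range(warmup):
        run_query()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


if __name__ == "__main__":
    # Stand-in workload; replace it with a real search call to compare
    # the two query types on your own data.
    def stub():
        return sum(range(1000))

    print("median ms:", median_latency_ms(stub))
```

Running it once per query type, on the same index and with the same id list, gives two medians to compare. Client-side timings include network overhead, so they are only a rough guide.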


(Xavier Facq) #5

In the end, I converted all idsQuery calls to termsQuery, for two reasons:

1°/ I think that one day we'll probably switch to auto-generated _id values for documents.

This is recommended in the documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/5.0/tune-for-indexing-speed.html#_use_auto_generated_ids

2°/ To my mind, an index over Long values should be faster than one over String values...

I'll try to run benchmarks if possible.

