Hi,
I have a need for a kind of "join" with immediate indexing and so I want to
keep the end of the relationship which is not searchable in my RDBMS and
query first the RDBMS get the list of ids for that entity which are
applicable and then use the "terms" search as one of the parameter when
querying my heavy-load document (which of course has an otherEntity_id
field on it).
My question is what is a rough upper bound of the number of values I can
put in the "terms" search ("in" clause) for a specific string field?
There isn't really a limit, but the more you have, the slower it will be.
The way the terms filter work is by iterating over all the terms and
building an bitset of the docs matching. The result is cached so if you use
the same set of ids again, you will get fast response.
One way to improve that is to actually build a bool filter composed of
single term filter (which are cached automatically individually), so each
id will have its own cached result. This can become quite expressive though
in terms of building the filter, thats why there is an improvement in
master to allow for "execution" mode in terms filter, see more here: Query DSL: Add different execution models for terms filter · Issue #1568 · elastic/elasticsearch · GitHub.
Hi,
I have a need for a kind of "join" with immediate indexing and so I want
to keep the end of the relationship which is not searchable in my RDBMS and
query first the RDBMS get the list of ids for that entity which are
applicable and then use the "terms" search as one of the parameter when
querying my heavy-load document (which of course has an otherEntity_id
field on it).
My question is what is a rough upper bound of the number of values I can
put in the "terms" search ("in" clause) for a specific string field?
Apache, Apache Lucene, Apache Hadoop, Hadoop, HDFS and the yellow elephant
logo are trademarks of the
Apache Software Foundation
in the United States and/or other countries.