Hello all,
I would like to know what some good options are for querying an index in which each doc is structured like this:
field1, keyword
field2, long
field3, object, enabled: false (mostly below 1 MB, but sometimes up to 50 MB)
What I want to achieve is:
1. Query by field1 and field2 to retrieve field3. This retrieves only 1 doc.
2. Query by field1 and return only the field2 value. This returns multiple docs.
When I tried the second option, it took too much time even when I excluded field3 from _source in the query.
So, would using docvalue_fields be a solution for the second option? Would it still take too much heap memory because of field3?
What do you suggest for option 1?
Welcome to the community! How much time is too much time for your first query?
Can you share the queries you are running? It might also be useful to run the queries through the Search Profiler or the Profiler API to see how long each query stage is taking.
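As a minimal sketch of the Profile API suggestion: a search request body can carry "profile": true, and the response then includes timing details per shard. The helper name with_profile and the field values below are illustrative assumptions, and the snippet only builds the request body; it does not contact a cluster.

```python
import json


def with_profile(body: dict) -> dict:
    """Return a copy of a search request body with profiling enabled."""
    profiled = dict(body)       # shallow copy so the caller's body is left untouched
    profiled["profile"] = True  # asks Elasticsearch to include query timing details
    return profiled


# Hypothetical body resembling the "query by field1, return field2" case.
search_body = {
    "size": 10000,
    "query": {"term": {"field1": "some-keyword"}},
    "_source": ["field2"],
}

profiled_body = with_profile(search_body)
print(json.dumps(profiled_body, indent=2))
```

The printed JSON can be pasted into the Search Profiler in Kibana or sent to the _search endpoint of the index.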
Hello and thank you for your reply!
My first query is:
size: 1
query:
bool: must:
[
term: field1: any keyword value,
match: field2: any long value
]
Second query:
size: 10000
query:
term: field2: any keyword value
_source: [field1]
There is also a third query, which I forgot to mention.
In this one, we also retrieve field4, which is a keyword field.
size: 10000
query: bool: must
[
range: field1: gte a long value and lte a long value,
term: field2: a keyword value
]
_source: [field4, field2]
The first query actually runs pretty fast (200 to 300 ms), so how long it takes is not a problem. What concerns me is the memory usage and the errors that memory usage may cause.
The second and third queries take more than 30 seconds, and sometimes a node disconnected exception occurs. So I tried setting _source to false and using docvalue_fields for these two queries, and then they take about 200 ms. But what concerns me, again, is memory usage.
Does using docvalue_fields solve the node disconnected errors in this case? Is it much lighter than using _source?
Sorry, I am sending this from my mobile, so the queries are shortened and there may be errors in my text.
Thanks for confirming @m4kkur0. If memory usage is your main concern I would recommend using docvalue_fields. There is a bit more detail in this section of the documentation that discusses the loading of the document when using the _source attribute.
With _source usage the entire Lucene document will be loaded:
A document’s _source is stored as a single field in Lucene. This structure means that the whole _source object must be loaded and parsed even if you’re only requesting part of it.
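A rough sketch of what that looks like in a request body, assuming the field names from this thread: _source is switched off and the values are read from doc values instead. The helper name docvalue_search is made up for illustration, and the snippet only builds the JSON body. Note that docvalue_fields only works for fields that have doc values, such as keyword and long fields; the disabled object field3 cannot be fetched this way.

```python
import json


def docvalue_search(term_field, term_value, return_fields, size=10000):
    """Build a search body that skips _source and reads values from doc values."""
    return {
        "size": size,
        "_source": False,                        # do not load or parse the stored _source
        "docvalue_fields": list(return_fields),  # fields to read from doc values
        "query": {"term": {term_field: term_value}},
    }


# Hypothetical values: query by field1 (keyword) and return only field2 (long).
body = docvalue_search("field1", "some-keyword", ["field2"])
print(json.dumps(body, indent=2))
```

With a body like this, each hit carries the requested values under "fields" (as arrays) rather than under "_source".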