Route query so that data for a shard is localized


(ElasticSearch Users mailing list) #1

Hi,

I have a fairly large data set and an ES cluster. Can I use some shard knowledge
to execute queries so that only data relevant to a particular shard is
fetched for that shard/node? I want to make sure that if I have a filter,
then the values in the TermFilter only hold records that are relevant to
the shard it will act upon. Is this a known problem? If so, how is it
solved?

Is there any performance implication in using the tree-like data mapping in
ES? I am evaluating it now, and I wanted to know if it is feasible to
maintain a tree-like structure in ES, or whether I should split it into
multiple records or multiple indices.

Thanks,
Sandeep

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/aef299d8-65f2-4b34-a2ae-8c9abeb9a7b2%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Jörg Prante) #2

Have you consulted the docs

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-terms-filter.html#_terms_lookup_mechanism

about the optimizations of term lookup for TermFilter?

There are caches in use, and for term lookup, you can also use routing to
select a particular shard.
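
As a rough illustration of the point above, here is a minimal Python sketch that builds a terms filter using the lookup mechanism from the linked docs, with the optional routing value set on the lookup. The index, type, id, and field names ("users", "user", "2", "followers") are placeholders, and the helper function names are hypothetical. The body uses the 1.x-era "filtered" query syntax that matches the linked docs; nothing is sent to a cluster.

```python
import json


def terms_lookup_filter(field, index, doc_type, doc_id, path, routing=None):
    """Build a terms filter whose values are looked up from another document.

    The keys (index, type, id, path, routing) follow the terms lookup
    section of the linked docs; the argument values are placeholders.
    """
    lookup = {"index": index, "type": doc_type, "id": doc_id, "path": path}
    if routing is not None:
        # Routing value used when fetching the lookup document, so the
        # fetch goes to the shard that holds it.
        lookup["routing"] = routing
    return {"terms": {field: lookup}}


def filtered_search_body(terms_filter):
    """Wrap the filter in a 1.x-style filtered query (assumed ES version)."""
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": terms_filter,
            }
        }
    }


body = filtered_search_body(
    terms_lookup_filter("user", "users", "user", "2", "followers", routing="2")
)
print(json.dumps(body, indent=2))
```

The printed JSON is what would be sent as the search request body. Whether the search itself should also be restricted to one shard with a routing parameter on the request depends on how the documents were indexed.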

Regarding the "tree-like data mapping": ES rolls the tree notation into a
flat format to make use of the Lucene API for fields in documents. There is
no performance implication with this. If you decide to use an extraordinarily
high number of fields (>>1000), you will notice that each field consumes a bit
of RAM, but this is not related to a "tree-like data mapping".
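
To make the flattening idea concrete, here is a small Python sketch of how a nested document can be rolled into dotted field names. It only illustrates the concept and is not the code ES itself runs; the function name and the sample document are made up.

```python
def flatten(doc, prefix=""):
    """Roll a nested dict into a flat dict keyed by dotted field paths."""
    flat = {}
    for key, value in doc.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Descend into the sub-object, carrying the path so far.
            flat.update(flatten(value, name))
        else:
            # Leaf values (including lists) stay under a single field name.
            flat[name] = value
    return flat


# Hypothetical sample document with a tree-like structure.
doc = {
    "product": {
        "name": "widget",
        "dimensions": {"width": 10, "height": 4},
    },
    "tags": ["a", "b"],
}

for field, value in sorted(flatten(doc).items()):
    print(field, "=", value)
```

Each leaf ends up as one flat field, which is why the depth of the tree matters less than the total number of distinct fields.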

Jörg



(ElasticSearch Users mailing list) #3

Hi Jörg,

Thanks, really appreciate the response and the link. I will do a small PoC
with the approach given therein.

Since we are pulling data from an index, I assume the first lookup will be
limited by disk speed.

If the cached field is later updated (for example, a value is added to or
removed from the multi-valued field), will the cache be purged and
repopulated automatically?

I also assume that I will need to enable the _source field for updates to
work. Is that correct?

Is there any value in storing these fields as doc_values in this case? I
read that doc_values cannot be used for filtering purposes, though.
Please confirm.

Please let me know your comments.

Thanks again,
Sandeep


