Route query so that data for a shard is localized

ElasticSearch_Users_ · August 3, 2014, 6:37pm

Hi,

I have fairly large data and an ES cluster. Can I use some shard knowledge
to execute queries so that only data relevant to a particular shard is
fetched for that shard/node? I want to make sure that if I have a filter,
then the values in the TermFilter only hold records that are relevant to
the shard it will act upon. Is this a known problem? If so, how is it
solved?

Is there any performance implication in using the tree-like data mapping in
ES? I am evaluating it now, and I wanted to know if it is feasible to
maintain a tree-like structure in ES, or whether I should split it into
multiple records or multiple indices?

Thanks,
Sandeep

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/aef299d8-65f2-4b34-a2ae-8c9abeb9a7b2%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

jprante · August 3, 2014, 7:17pm

Have you consulted the docs about the optimizations of term lookup for
TermFilter?

There are caches in use, and for term lookup, you can also use routing to
select a particular shard.
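To make the routing idea concrete, here is a minimal sketch in Python that only builds the request, it does not send anything. It assumes the 1.x-era terms lookup syntax (a `terms` filter that names an `index`, `type`, `id`, `path` and optional `routing` for the document holding the terms) and the `routing` URL parameter on `_search`. The index names `tweets` and `users`, the field `user`, the path `followers` and the routing value `user_2` are made up for illustration; please check the docs for your ES version before relying on the exact keys.

```python
import json
from urllib.parse import urlencode


def build_routed_terms_lookup(field, lookup_index, lookup_type, lookup_id, path, routing):
    """Build a filtered query whose terms filter reads its values from another document.

    The 'routing' key tells ES which shard holds the lookup document
    (assumed 1.x terms lookup syntax, see the note above).
    """
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {
                    "terms": {
                        field: {
                            "index": lookup_index,
                            "type": lookup_type,
                            "id": lookup_id,
                            "path": path,
                            "routing": routing,
                        }
                    }
                },
            }
        }
    }


def build_search_url(base_url, index, routing):
    """Build a _search URL that restricts the search to the shard(s) for 'routing'."""
    return "{}/{}/_search?{}".format(base_url, index, urlencode({"routing": routing}))


if __name__ == "__main__":
    # Hypothetical names, only for illustration.
    body = build_routed_terms_lookup("user", "users", "user", "2", "followers", "user_2")
    print(build_search_url("http://localhost:9200", "tweets", "user_2"))
    print(json.dumps(body, indent=2))
```

Note that search-time routing only helps if the documents were indexed with the same routing value, otherwise the shard that gets queried will not hold them.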

Regarding the "tree-like data mapping": ES rolls the tree notation into a
flat format to make use of the Lucene API for fields in documents. There is
no performance implication with this. If you decide to use an extraordinarily
high number of fields (>>1000), you will notice each field consumes a bit
of RAM, but this is not related to a "tree-like data mapping".
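As a rough illustration of what "rolling the tree into a flat format" means, the following Python sketch turns a nested document into dotted field names. This is not the ES implementation, just a toy model of the idea; the function name `flatten` and the sample document are made up, and arrays of objects are deliberately left alone here.

```python
def flatten(doc, prefix=""):
    """Flatten nested dicts into a single dict keyed by dotted field paths."""
    flat = {}
    for key, value in doc.items():
        name = prefix + "." + key if prefix else key
        if isinstance(value, dict):
            # Descend into the sub-object and carry the path along.
            flat.update(flatten(value, name))
        else:
            # Leaf values (including lists) are kept as-is under the full path.
            flat[name] = value
    return flat


if __name__ == "__main__":
    # Hypothetical sample document.
    doc = {"user": {"name": {"first": "Sandeep"}, "age": 30}, "tags": ["a", "b"]}
    for field, value in sorted(flatten(doc).items()):
        print(field, "=", value)
```

Each distinct dotted path ends up as one field, which is why the number of fields, not the depth of the tree, is what matters for memory.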

Jörg

ElasticSearch_Users_ · August 6, 2014, 6:34am

Hi Jörg,

Thanks, really appreciate the response and the link. I will do a small PoC
with the approach given therein.

Since we are pulling data from an index, I am assuming we will be limited
the first time by disk speed.

In the cache, if the data for the field that is cached has some updates
(like a new value being added in the multi-valued field or removed), will
the purge and re-cache automatically happen?

I also think that I will need to enable the _source field for updates to
work?

Is there any value in making these fields doc_values in this case? I read
that doc_values cannot be used for filtering purposes, though. Please
confirm.

Please let me know your comments. Thanks again,

Thanks,
Sandeep
