If you mostly search within a single bucket and the buckets are
approximately the same size, it makes sense either to create an index per
bucket (if you have a relatively small number of buckets) or to use bucket
as the routing field. Either way, your queries will be limited to only a
portion of your cluster.
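
To make the two options concrete, here is a rough sketch of what the request paths could look like. It only builds URL paths and sends nothing. The "bucket-<name>" index naming scheme, the helper names, and the index name "docs" are assumptions for illustration; the `routing` query parameter is the standard one accepted by the index and search APIs. The `/{index}/{type}/{id}` path layout matches versions that still have types, so adjust it for newer releases.

```python
from urllib.parse import quote, urlencode


def index_per_bucket_search_path(bucket: str) -> str:
    """Option 1: one index per bucket.

    The "bucket-<name>" naming scheme is hypothetical. Index names must be
    lowercase, so the bucket name is lowercased here.
    """
    return f"/bucket-{quote(bucket.lower(), safe='')}/_search"


def routed_index_path(index: str, doc_type: str, doc_id: str, bucket: str) -> str:
    """Option 2, indexing: one shared index, bucket used as the routing value."""
    path = "/".join(quote(part, safe="") for part in (index, doc_type, doc_id))
    return f"/{path}?" + urlencode({"routing": bucket})


def routed_search_path(index: str, bucket: str) -> str:
    """Option 2, searching: only the shard that the routing value maps to is queried."""
    return f"/{quote(index, safe='')}/_search?" + urlencode({"routing": bucket})


if __name__ == "__main__":
    print(index_per_bucket_search_path("B"))
    print(routed_index_path("docs", "TypeA", "1", "B"))
    print(routed_search_path("docs", "B"))
```

Note that with routing, several buckets can still land on the same shard, so the query itself should keep its bucket=B condition. Routing only narrows which shards are searched.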
On Monday, October 29, 2012 11:23:38 AM UTC-4, Stephan Seidt wrote:
Hi,
I'm trying to figure out whether I'm modeling and querying my fairly large
data set correctly.
Given these conditions:
- One index
- Thousands of es-types which share some common fields
- One of these shared fields is "bucket" (effectively partitions
  everything like an S3 bucket)
- Millions of documents evenly distributed across types and buckets (note:
  buckets don't share any types)
Now, the most common query is of this nature:
Find documents where bucket=B and ((type=TypeA and field1=value1 and
field2>value2) or (type=TypeB and field3=value1 and field4>value2))
The characteristics are basically: Search is always bound to a single
bucket but it happens across a certain number of types within the index.
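
For illustration, the query described above could be written as a request body along these lines. This is only a sketch: it uses the `bool` query with `filter` and `should` clauses as found in more recent Elasticsearch versions (older releases would express the same thing with a filtered query and bool/and/or filters), the `bucket_query` helper and its arguments are hypothetical, and filtering on `_type` assumes a version that still has types.

```python
import json


def bucket_query(bucket, clauses, type_field="_type"):
    """Build: bucket=B AND (clause_1 OR clause_2 OR ...).

    Each clause is (type_name, eq_field, eq_value, gt_field, gt_value),
    meaning: type=type_name AND eq_field=eq_value AND gt_field>gt_value.
    """
    should = []
    for type_name, eq_field, eq_value, gt_field, gt_value in clauses:
        should.append({
            "bool": {
                "filter": [
                    {"term": {type_field: type_name}},
                    {"term": {eq_field: eq_value}},
                    {"range": {gt_field: {"gt": gt_value}}},
                ]
            }
        })
    return {
        "query": {
            "bool": {
                "filter": [
                    # The search is always bound to a single bucket.
                    {"term": {"bucket": bucket}},
                    # At least one of the per-type alternatives must match.
                    {"bool": {"should": should, "minimum_should_match": 1}},
                ]
            }
        }
    }


if __name__ == "__main__":
    body = bucket_query("B", [
        ("TypeA", "field1", "value1", "field2", 10),
        ("TypeB", "field3", "value1", "field4", 10),
    ])
    print(json.dumps(body, indent=2))
```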