Boolean queries across multiple types in an index

Stephan_Seidt · October 29, 2012, 3:23pm

Hi,

I'm trying to figure out whether I'm modeling and querying my fairly large
data set correctly.

Given these conditions:

- One index
- Thousands of es-types which share some common fields
- One of these shared fields is "bucket" (effectively partitions everything like an S3 bucket)
- Millions of documents evenly distributed across types and buckets (note: buckets don't share any types)

Now, the most common query is of this nature:

Find documents where bucket=B and ((type=TypeA and field1=value1 and
field2>value2) or (type=TypeB and field3=value1 and field4>value2))

In short: a search is always bound to a single bucket, but it spans
several types within the index.

The way I translated this into the Query DSL:

{
  "filtered": {
    "filter": {
      "and": [
        {"term": {"bucket": "B"}},
        {"or": [
          {"and": [{"type": {"value": "TypeA"}},
                   {"term": {"field1": "value1"}},
                   {"range": {"field2": {"gt": value2}}}]},
          {"and": [{"type": {"value": "TypeB"}},
                   {"term": {"field3": "value1"}},
                   {"range": {"field4": {"gt": value2}}}]}
        ]}
      ]
    }
  }
}
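
Since the number of type clauses varies per search, the filter above could be assembled programmatically. The following is only a minimal sketch in Python that reproduces the structure from this post; the helper names `type_clause` and `bucket_query` are hypothetical, and the numeric bound `10` merely stands in for the `value2` placeholder.

```python
import json


def type_clause(es_type, term_field, term_value, range_field, lower_bound):
    """One per-type branch: type = es_type AND term_field = term_value
    AND range_field > lower_bound (hypothetical helper)."""
    return {
        "and": [
            {"type": {"value": es_type}},
            {"term": {term_field: term_value}},
            {"range": {range_field: {"gt": lower_bound}}},
        ]
    }


def bucket_query(bucket, clauses):
    """Restrict to a single bucket, then OR the per-type branches together.
    Mirrors the "filtered" / "and" / "or" layout used in the post."""
    return {
        "filtered": {
            "filter": {
                "and": [
                    {"term": {"bucket": bucket}},
                    {"or": list(clauses)},
                ]
            }
        }
    }


# 10 is an assumed stand-in for the value2 placeholder.
query = bucket_query("B", [
    type_clause("TypeA", "field1", "value1", "field2", 10),
    type_clause("TypeB", "field3", "value1", "field4", 10),
])

print(json.dumps(query, indent=2))
```

Serializing through `json.dumps` also guards against hand-written syntax slips such as a missing comma between clauses.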

Questions:

1. Is this already too much wishful thinking on my side, as in, does this
   kind of query perform reasonably well with the amount of data I mentioned?
2. Is there a better way to formulate the query?
3. Do I have to use "filtered", or can I put the "and" at the top level of
   the query?
Regards
Stephan

--

Igor_Motov · October 29, 2012, 6:39pm

If you are mostly searching within a single bucket and the buckets are
approximately the same size, it makes sense either to create an index per
bucket (if you have a relatively small number of buckets) or to use the
bucket as the routing value. Either way, your queries will be limited to
only a portion of your cluster.
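
To make the two options concrete, here is a rough Python sketch of the request paths each one implies. It is illustrative only and sends nothing over the network. The helper names, the index name `myindex`, and the `bucket-` index prefix are assumptions, not anything from this thread; `routing` is the request parameter Elasticsearch accepts on index and search requests, and the typed `/index/type/id` path reflects the API as it stood at the time of this post.

```python
from urllib.parse import quote, urlencode


def routed_index_path(index, es_type, doc_id, bucket):
    """Path for indexing a document so it lands on the shard chosen
    by hashing the bucket name (routing option)."""
    return "/{}/{}/{}?{}".format(
        quote(index, safe=""),
        quote(es_type, safe=""),
        quote(doc_id, safe=""),
        urlencode({"routing": bucket}),
    )


def routed_search_path(index, bucket):
    """Path for a search that only visits the shard for this bucket."""
    return "/{}/_search?{}".format(
        quote(index, safe=""), urlencode({"routing": bucket})
    )


def per_bucket_search_path(bucket, prefix="bucket-"):
    """Path for a search against a dedicated index per bucket.
    Index names must be lowercase, hence the lower() call; the
    prefix is a made-up naming convention."""
    return "/{}/_search".format(quote(prefix + bucket.lower(), safe=""))


print(routed_index_path("myindex", "TypeA", "1", "B"))
print(routed_search_path("myindex", "B"))
print(per_bucket_search_path("B"))
```

With routing, several buckets can still hash to the same shard, so the `bucket` term filter has to stay in the query. With one index per bucket, the index name itself does the partitioning.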


--
