Boolean queries across multiple types in an index

Hi,

I'm trying to figure out whether I'm modeling and querying my fairly large
data set correctly.

Given these conditions:

  • One index
  • Thousands of es-types which share some common fields
  • One of these shared fields is "bucket" (effectively partitions everything
    like an S3 bucket)
  • Millions of documents evenly distributed across types and buckets (note:
    buckets don't share any types)

Now, the most common query is of this nature:

Find documents where bucket=B and ((type=TypeA and field1=value1 and
field2>value2) or (type=TypeB and field3=value1 and field4>value2))

In short: a search is always bound to a single bucket, but it spans a
certain number of types within the index.

The way I translated this into the Query DSL:

{
  "filtered": {
    "filter": {
      "and": [
        {"term": {"bucket": "B"}},
        {"or": [
          {"and": [
            {"type": {"value": "TypeA"}},
            {"term": {"field1": "value1"}},
            {"range": {"field2": {"gt": value2}}}
          ]},
          {"and": [
            {"type": {"value": "TypeB"}},
            {"term": {"field3": "value1"}},
            {"range": {"field4": {"gt": value2}}}
          ]}
        ]}
      ]
    }
  }
}

Questions:

  • Is this wishful thinking on my part, or does this kind of query perform
    reasonably well with the amount of data I mentioned?
  • Is there a better way to formulate the query?
  • Do I have to use filtered, or can I put the and filter at the query's
    top level?

Regards
Stephan

--

If you are mostly searching within a single bucket and the buckets are of
approximately the same size, it makes sense either to create an index per
bucket (if you have a relatively small number of buckets) or to use the
bucket as a routing field. That way your queries are limited to only a
portion of your cluster.
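
For example, with routing (a rough sketch; the index name, type names and
values here are placeholders), you pass the bucket value as the routing key
at index time and again at search time, so the search only hits the shard
that holds that bucket:

# Index a document, routing it by its bucket value:
curl -XPUT 'localhost:9200/myindex/TypeA/1?routing=B' -d '{
  "bucket": "B",
  "field1": "value1",
  "field2": 42
}'

# Search with the same routing value, so only the shard(s)
# holding bucket B are queried:
curl -XPOST 'localhost:9200/myindex/_search?routing=B' -d '{
  "query": {
    "filtered": {
      "filter": {
        "term": {"bucket": "B"}
      }
    }
  }
}'

Note that you should keep the term filter on bucket in the query as well,
since documents from other buckets can hash to the same shard.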
