Hi, I seem to be stuck formulating a certain query on a denormalized matrix.
The original dataset looks like this:
id fieldA fieldB fieldC
"0001" 0.1 0.2 0.3
"0002" 0.4 0.5 0.6
...
In such a structure I can easily find all ids for which multiple fields fulfill a certain condition by using multiple filters in my query and then just returning the id
field of matching documents. Something like this (untested):
"_source": ["id"],
"query": {
"bool": {
"filter": [
{"range": {"fieldA": {"gt": 0.5}}},
{"range": {"fieldB": {"gt": 0.3}}}
]
}
}
Unfortunately, this dataset has far too many ids and fields to be stored in this way, so we denormalized it:
id field value
"0001" fieldA 0.1
"0001" fieldB 0.4
"0002" fieldA 0.2
"0002" fieldB 0.5
...
This structure works great for aggregations, but filtering on multiple conditions like above seems impossible. I can formulate a query that does (feature == fieldA AND value > 0.5) OR (feature == fieldB AND value > 0.3)
, where I get all IDs for which at least one of the field conditions holds true. Using a "AND" instead of "OR" will also not work, because a document cannot contain a the field with value fieldA and fieldB at the same time.
I thought about maybe using the OR query from above, then terms aggregate and check which of the buckets is full (=> all conditions matched), but that seems awkward.
Do you have any suggestions?