Hi, I am currently developing an application that uses Elasticsearch for various types of queries.
At the moment we have only one mapping in one index. This mapping represents attributes, which are basically single pieces of data: a string, a number, a date, etc.
These attributes belong to another object in the application called a "Resource", so a resource has many attributes. All our Elasticsearch documents are attributes, and each one carries information such as the id of the resource it belongs to, its own id and its value.
My question is: how can I run a query such as "get all resources that have attribute X with value VALUE1 AND/OR attribute Y with value VALUE2"?
"get all resources"
That would be a terms aggregation on the resource_id field, which returns the set of unique resource ids.
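A minimal sketch of that request body, written as a Python dict that mirrors the JSON sent to the _search endpoint. It assumes resource_id is mapped as a keyword field; the aggregation name "resources" and the bucket size are illustrative choices, not part of the original setup.

```python
def unique_resources_body(bucket_size: int = 10_000) -> dict:
    """Build a _search request body that lists distinct resource ids.

    Assumes resource_id is mapped as a keyword field. The aggregation
    name "resources" is an arbitrary, illustrative label.
    """
    return {
        # Skip returning the individual attribute hits; only buckets matter.
        "size": 0,
        "aggs": {
            "resources": {
                # One bucket per distinct resource_id value.
                "terms": {"field": "resource_id", "size": bucket_size}
            }
        },
    }


body = unique_resources_body()
```

Each bucket key in the response is one distinct resource id. A terms aggregation only returns the top `size` buckets, so with very many resources a composite aggregation, which pages through buckets, may be a better fit.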
"have attribute X with value VALUE1"
Use a match query to test simple "field has value" expressions.
"AND/OR"
Express Boolean logic like this by putting your match clauses inside a bool query. Roughly speaking, OR-type expressions go in the bool query's should array, while AND expressions are grouped in its must array.
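A rough sketch of those clauses as Python dicts following the query DSL. The field names attribute_id and value are hypothetical, since the thread does not name the attribute fields. Note that the AND form requires a single document to satisfy every clause.

```python
def attribute_clause(attribute_id: str, value: str) -> dict:
    # "attribute X has VALUE1": both match clauses must hit the SAME doc.
    # attribute_id / value are hypothetical field names.
    return {
        "bool": {
            "must": [
                {"match": {"attribute_id": attribute_id}},
                {"match": {"value": value}},
            ]
        }
    }


def combine(clauses: list, operator: str) -> dict:
    """AND -> bool.must, OR -> bool.should (at least one clause must match)."""
    if operator == "AND":
        return {"bool": {"must": clauses}}
    if operator == "OR":
        return {"bool": {"should": clauses, "minimum_should_match": 1}}
    raise ValueError(f"unsupported operator: {operator}")


query = combine(
    [attribute_clause("X", "VALUE1"), attribute_clause("Y", "VALUE2")],
    "OR",
)
```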
First of all, thank you very much for your response!
I'm afraid I have already tried this solution (unless I misunderstood it), and it cannot work because the AND query will always return 0 results. The OR query would be fine, but with AND we are basically asking: "give me all resources that have attribute X (which corresponds to one Elasticsearch document) with value VALUE1 AND attribute Y (another Elasticsearch document, whose only connection to the first one is that it has the same resource id) with value VALUE2".
Basically, the whole problem is the different granularity between the documents (attributes) and the container we are sometimes interested in (resources).
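The granularity mismatch can be illustrated without Elasticsearch at all. The following in-memory sketch, using hypothetical documents and field names, contrasts a per-document AND (what a bool must over flat attribute documents evaluates) with the per-resource AND the question actually needs.

```python
from collections import defaultdict

# Hypothetical flat attribute documents: one doc per attribute, linked
# to its resource only through resource_id.
DOCS = [
    {"resource_id": "r1", "attribute_id": "X", "value": "VALUE1"},
    {"resource_id": "r1", "attribute_id": "Y", "value": "VALUE2"},
    {"resource_id": "r2", "attribute_id": "X", "value": "VALUE1"},
    {"resource_id": "r3", "attribute_id": "Y", "value": "VALUE2"},
]


def matches(doc, attribute_id, value):
    return doc["attribute_id"] == attribute_id and doc["value"] == value


def per_document_and(docs, cond_a, cond_b):
    # Both conditions must hold on the SAME document. A doc has a single
    # attribute_id, so X and Y can never both match: the result is empty.
    return {
        d["resource_id"]
        for d in docs
        if matches(d, *cond_a) and matches(d, *cond_b)
    }


def per_resource_and(docs, cond_a, cond_b):
    # Group attributes by resource first, then require each condition to
    # be met by SOME attribute of that resource.
    by_resource = defaultdict(list)
    for d in docs:
        by_resource[d["resource_id"]].append(d)
    return {
        rid
        for rid, attrs in by_resource.items()
        if any(matches(a, *cond_a) for a in attrs)
        and any(matches(a, *cond_b) for a in attrs)
    }
```

Here `per_document_and(DOCS, ("X", "VALUE1"), ("Y", "VALUE2"))` is empty, while `per_resource_and` with the same arguments returns `{"r1"}`.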
Ah, my bad. I see now: the attributes you are testing are in different docs.
To solve this you'll need to bring the related data closer together (assuming the worst-case scenario, where the related data is spread across multiple shards). By "closer together" I mean either nested documents or parent/child documents.
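With nested documents, each resource becomes one document that carries its attributes as nested objects, and the AND is expressed as two nested clauses. This is a sketch assuming a hypothetical `attributes` nested field. For simplicity it maps every value as keyword, even though real attribute values may be strings, numbers or dates.

```python
# Hypothetical index of resource documents, one per resource, with the
# attributes embedded as nested objects.
RESOURCE_MAPPING = {
    "mappings": {
        "properties": {
            "resource_id": {"type": "keyword"},
            "attributes": {
                "type": "nested",
                "properties": {
                    "attribute_id": {"type": "keyword"},
                    "value": {"type": "keyword"},  # assumption: all values as keyword
                },
            },
        }
    }
}


def has_attribute(attribute_id: str, value: str) -> dict:
    # Each nested clause is evaluated against ONE nested attribute object,
    # so attribute_id and value must match on the same attribute.
    return {
        "nested": {
            "path": "attributes",
            "query": {
                "bool": {
                    "must": [
                        {"term": {"attributes.attribute_id": attribute_id}},
                        {"term": {"attributes.value": value}},
                    ]
                }
            },
        }
    }


# AND across attributes: two separate nested clauses, each of which may be
# satisfied by a different attribute of the same resource document.
QUERY = {
    "query": {
        "bool": {
            "must": [
                has_attribute("X", "VALUE1"),
                has_attribute("Y", "VALUE2"),
            ]
        }
    }
}
```

The trade-off is that updating a single attribute in this layout means reindexing the whole resource document.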
Wait, sorry, I must have explained myself badly. I don't see what shards have to do with querying. All the attributes are separate JSON documents because that is the unit we index: we index millions of documents, one for each of the millions of attributes in the system.
I am not sure parent/child indexing is what we are looking for, because a resource by itself is nothing; it is just a container of attributes. But if you think introducing a resource mapping type as the parent of the attributes is the right way to support this type of query, I'll give it a go. How would I set up the query above if resources were parents and attributes were children?
In a word, distribution. Shards are designed for distribution. We can join documents that are in the same low-level Lucene segment (see nested) or on the same shard, and therefore the same machine (see parent/child), but we will not attempt joins that span network boundaries because of the cost involved.
Parent/child relationships are queried with the parent/child queries (has_child and has_parent), and the link I shared describes how to set this up.
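For the parent/child route, Elasticsearch 6.0 and later model the relationship with a join field in a single index. Versions 5.x and earlier used a `_parent` mapping on the child type instead. This is a sketch assuming a hypothetical `relation` join field, with resource as the parent and attribute as the child. The other field names are also hypothetical.

```python
# Hypothetical single-index mapping with a join field: "resource" docs
# are parents of "attribute" docs.
JOIN_MAPPING = {
    "mappings": {
        "properties": {
            "resource_id": {"type": "keyword"},
            "attribute_id": {"type": "keyword"},
            "value": {"type": "keyword"},
            "relation": {"type": "join", "relations": {"resource": "attribute"}},
        }
    }
}


def resource_doc(resource_id: str) -> dict:
    # The parent is almost empty: it exists only to anchor its children.
    return {"resource_id": resource_id, "relation": "resource"}


def attribute_doc(resource_id: str, attribute_id: str, value: str) -> dict:
    # Children must be indexed with routing=<parent id> so they land on
    # the same shard as their parent resource.
    return {
        "resource_id": resource_id,
        "attribute_id": attribute_id,
        "value": value,
        "relation": {"name": "attribute", "parent": resource_id},
    }


def has_attribute(attribute_id: str, value: str) -> dict:
    # Matches parents that have at least one child satisfying both terms.
    return {
        "has_child": {
            "type": "attribute",
            "query": {
                "bool": {
                    "must": [
                        {"term": {"attribute_id": attribute_id}},
                        {"term": {"value": value}},
                    ]
                }
            },
        }
    }


# AND across attributes: the parent needs a matching child for EACH clause.
QUERY = {
    "query": {
        "bool": {
            "must": [
                has_attribute("X", "VALUE1"),
                has_attribute("Y", "VALUE2"),
            ]
        }
    }
}
```

Because has_child returns the parent documents, each hit is a resource whose children include both attributes.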