Hello,
I am having trouble finding out whether, and how, the following use case can be handled with Elasticsearch.
I have an Elasticsearch index of real estate properties. Some of these documents describe the same property, but originate from different sources.
For example, take the following 3 properties. The first and the last are the same property (as can be seen from the shared BundleID), but come from different sources.
Note that sources have different reliability scores.
{
"BundleID": "abcdef",
"Address": "Streetname 18",
"Price": 140000,
"Source": "source_a",
"SourceReliability": 100
},
{
"BundleID": "qwert",
"Address": "Other street 5",
"Price": 300000,
"Source": "source_a",
"SourceReliability": 100
},
{
"BundleID": "abcdef",
"Address": "Streetname 18",
"Price": 160000,
"Source": "source_b",
"SourceReliability": 20
}
Now comes the issue:
Within each bundle, I only want to return the property with the highest reliability score.
Not all users are allowed to search all sources. So, imagine 3 users perform the same search request, searching for all available properties.
User 1 (may search in source_a AND source_b): the first and the second property should be returned. The last should not be returned, because the first property is in the same bundle and has a higher reliability score.
User 2 (may search in source_a): the first two properties should be returned.
User 3 (may search in source_b): only the last property should be returned.
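To make the expected behaviour concrete, here is a minimal Python sketch of the rule described above, applied in memory to the three sample documents. It is not an Elasticsearch query; the function name `visible_properties` is made up for illustration, and sorting by address is included only because that is the ordering I would like to keep.

```python
# Sample documents from the post.
PROPERTIES = [
    {"BundleID": "abcdef", "Address": "Streetname 18", "Price": 140000,
     "Source": "source_a", "SourceReliability": 100},
    {"BundleID": "qwert", "Address": "Other street 5", "Price": 300000,
     "Source": "source_a", "SourceReliability": 100},
    {"BundleID": "abcdef", "Address": "Streetname 18", "Price": 160000,
     "Source": "source_b", "SourceReliability": 20},
]


def visible_properties(properties, allowed_sources):
    """Keep, per BundleID, the most reliable property the user may see.

    Hypothetical helper: it only illustrates the desired result set.
    """
    best = {}
    for prop in properties:
        if prop["Source"] not in allowed_sources:
            continue  # the user may not search this source
        current = best.get(prop["BundleID"])
        if current is None or prop["SourceReliability"] > current["SourceReliability"]:
            best[prop["BundleID"]] = prop
    # The surviving documents should still be sortable by a field such as Address.
    return sorted(best.values(), key=lambda p: p["Address"])


user_1 = visible_properties(PROPERTIES, {"source_a", "source_b"})
user_2 = visible_properties(PROPERTIES, {"source_a"})
user_3 = visible_properties(PROPERTIES, {"source_b"})
```

The source filter has to be applied before the "best per bundle" step; otherwise User 3 would lose bundle abcdef entirely instead of seeing the source_b version.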
So, as you may notice, the most difficult part of this use case is bundling together the documents that share the same BundleID.
I have tried a terms aggregation on the BundleID, returning the first document of each bucket with top_hits. But with this technique I am not able to sort the returned buckets by a document field. If I want my results to be returned sorted by address, this is not possible (as far as I know).
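For reference, the aggregation attempt looks roughly like the request body below, written here as a Python dict. This is a sketch under assumptions: `BundleID` and `Source` are assumed to be mapped as exact-value (keyword) fields, and the helper name `bundle_top_hit_request`, the aggregation names, and the `max_bundles` parameter are made up for illustration.

```python
import json


def bundle_top_hit_request(allowed_sources, max_bundles=100):
    """Build a search body: one bucket per BundleID, best document per bucket.

    Hypothetical helper; assumes BundleID and Source are keyword fields.
    """
    return {
        "size": 0,  # only the aggregation result is of interest
        "query": {
            "bool": {
                # Restrict the search to the sources this user may see.
                "filter": [{"terms": {"Source": sorted(allowed_sources)}}]
            }
        },
        "aggs": {
            "bundles": {
                "terms": {"field": "BundleID", "size": max_bundles},
                "aggs": {
                    "most_reliable": {
                        # Keep only the most reliable document in each bundle.
                        "top_hits": {
                            "size": 1,
                            "sort": [{"SourceReliability": {"order": "desc"}}],
                        }
                    }
                },
            }
        },
    }


body = bundle_top_hit_request({"source_a", "source_b"})
print(json.dumps(body, indent=2))
```

The buckets of a terms aggregation come back ordered by document count by default, which is where the sorting problem shows up: the order of the bundles is decided by the aggregation, not by a field such as Address on the selected document.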
Another thing I have tried is using the following objects:
{
"BundleID": "abcdef",
"Address": "Streetname 18",
"properties": [
{
"Price": 140000,
"Source": "source_a",
"SourceReliability": 100
},
{
"Price": 160000,
"Source": "source_b",
"SourceReliability": 20
}
]
},
{
"BundleID": "qwert",
"Address": "Other street 5",
"properties": [
{
"Price": 300000,
"Source": "source_a",
"SourceReliability": 100
}
]
}
Here the properties are already bundled in the index, so I don't have to group the results in the query. But this leads to the following issue:
User 2 (who may only search in source_a) is not allowed to search the values that come from source_b. Yet when this user searches for all properties with a price higher than 150000, both bundles are returned.
This is not allowed: the first bundle is returned only because a value from source_b matches.
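The small Python simulation below shows what I believe is happening, under the assumption that `properties` is indexed as a plain array of objects. As far as I understand, Elasticsearch flattens such arrays, so each condition can be satisfied by a different array element; the `nested` field type is documented to match conditions per element instead. The function names are hypothetical and this is not real query code.

```python
# Bundled documents from the post.
BUNDLES = [
    {"BundleID": "abcdef", "Address": "Streetname 18", "properties": [
        {"Price": 140000, "Source": "source_a", "SourceReliability": 100},
        {"Price": 160000, "Source": "source_b", "SourceReliability": 20},
    ]},
    {"BundleID": "qwert", "Address": "Other street 5", "properties": [
        {"Price": 300000, "Source": "source_a", "SourceReliability": 100},
    ]},
]


def match_flattened(bundle, allowed_sources, min_price):
    """Each condition may be satisfied by a different array element."""
    props = bundle["properties"]
    has_source = any(p["Source"] in allowed_sources for p in props)
    has_price = any(p["Price"] > min_price for p in props)
    return has_source and has_price


def match_per_element(bundle, allowed_sources, min_price):
    """Both conditions must hold on the same array element."""
    return any(
        p["Source"] in allowed_sources and p["Price"] > min_price
        for p in bundle["properties"]
    )


def search(matcher, allowed_sources, min_price):
    """Return the BundleIDs of all bundles accepted by the given matcher."""
    return [b["BundleID"] for b in BUNDLES if matcher(b, allowed_sources, min_price)]


# User 2 may only search source_a and asks for prices above 150000.
flattened = search(match_flattened, {"source_a"}, 150000)
per_element = search(match_per_element, {"source_a"}, 150000)
```

With the flattened matching, bundle abcdef is accepted because it contains a source_a entry and, separately, a price above 150000 that belongs to source_b. The per-element matching gives the result I actually want for User 2.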
Can anybody help me with this issue? Is there a query that can handle my use case? Or should I change my document structure instead?
Thanks in advance.
With kind regards,
Martin