Hi everyone,
Disclosure:
- I'm new to ES (one week) so feel free to correct me anywhere.
- I am running a proof of concept and have done zero performance optimization. I want to check that I'm on the right track before I start performance tuning.
My use case is as follows:
- The data set can be categorized as master-detail. There are around 4K masters with 10M details, so each master has on average 2,500 details.
- I want to create a "search" page displaying the possible facets to the user. For each facet value (refinement), I want to display the number of master elements that match it.
- I may have between 20-30 facets
- I want it fast
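As a rough illustration of the counts described above, here is a plain-Python sketch (not ES code). The sample records and the `color` field are made up for the example; `master_field_id` is the master identifier used in the query further down.

```python
from collections import defaultdict

def facet_master_counts(records, facet_field, master_id_field="master_field_id"):
    """For each value of facet_field, count the distinct masters that have it."""
    masters_by_value = defaultdict(set)
    for record in records:
        masters_by_value[record[facet_field]].add(record[master_id_field])
    return {value: len(ids) for value, ids in masters_by_value.items()}

# Hypothetical detail records: several details can belong to the same master.
records = [
    {"master_field_id": 1, "color": "red"},
    {"master_field_id": 1, "color": "red"},
    {"master_field_id": 2, "color": "red"},
    {"master_field_id": 2, "color": "blue"},
]

counts = facet_master_counts(records, "color")
```

Note that master 1 is counted only once for `red` even though two of its details match, which is the difference between counting details and counting masters.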
I toyed around with ES and started out plain and simple: I create one document in ES for each detail (so 10M documents in total). Each document includes both the master and the detail information (so yes, I repeat a lot of information 2,500 times).
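A minimal sketch of that denormalized layout, assuming the master and detail records are available as plain dictionaries. The helper name `denormalize`, the `detail_id` field, and the split of the filter fields between master and detail are assumptions for illustration only.

```python
def denormalize(master, details):
    """Yield one flat document per detail, repeating the master fields in each."""
    for detail in details:
        doc = dict(master)   # master fields are copied into every document
        doc.update(detail)   # detail-specific fields are added on top
        yield doc

# Hypothetical sample data.
master = {"master_field_id": 1, "first_filter_field": "red"}
details = [
    {"detail_id": 10, "second_filter_field": "small"},
    {"detail_id": 11, "second_filter_field": "large"},
]

docs = list(denormalize(master, details))
```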
With this structure, I can create a query like this:
    GET /{index}/{type}/_search
    {
      "size": { ... },
      "sort": { ... },
      "query": { ... },
      "aggs": {
        # This structure is repeated 20-30 times, once for each facet I want to allow filtering by
        "first_filter": {
          "terms": {
            "field": "first_filter_field"
          },
          "aggs": {
            "distinct_master_field_id": {
              "cardinality": {
                "field": "master_field_id"
              }
            }
          }
        },
        "second_filter": {
          "terms": {
            "field": "second_filter_field"
          },
          "aggs": {
            "distinct_master_field_id": {
              "cardinality": {
                "field": "master_field_id"
              }
            }
          }
        },
        ....
      }
    }
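Since the same terms-plus-cardinality block is repeated for every facet, the `aggs` section could be generated rather than written by hand. The following is only a sketch: `build_facet_aggs` is a hypothetical helper, and it produces just the request body as a dictionary without sending anything to ES.

```python
import json

def build_facet_aggs(facets, master_id_field="master_field_id"):
    """Build one terms aggregation per facet, each with a cardinality
    sub-aggregation that counts distinct masters per bucket."""
    return {
        name: {
            "terms": {"field": field},
            "aggs": {
                "distinct_master_field_id": {
                    "cardinality": {"field": master_id_field}
                }
            },
        }
        for name, field in facets.items()
    }

# Aggregation name -> field name, taken from the query above.
facets = {
    "first_filter": "first_filter_field",
    "second_filter": "second_filter_field",
}

body = {"aggs": build_facet_aggs(facets)}
print(json.dumps(body, indent=2))
```

This does not change what ES has to compute; it only keeps the 20-30 repeated blocks consistent.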
The thing is, I'm far from the performance I would like to have: the query takes seconds and I was aiming for milliseconds. My question is: is this the best (or at least a good) way to organize and query this information for such a use case?
Considerations:
- [Parent-child relationships][1] - This data structure seems to match my scenario, but I am not worried about modifying the index and am really concerned about performance. According to the documentation, performance is worse with parent-child structures.
- Am I abusing cardinality sub-aggregations? I'm always applying the same cardinality sub-aggregation... maybe there's a better way to organize the query.
- The option to scale horizontally is always there, but I don't want to go there from the very beginning (although I may have to...).
Thank you all for your time. Any feedback will be appreciated!
Xavi
[1]: https://www.elastic.co/guide/en/elasticsearch/guide/current/parent-child.html