Hey, everyone,
We are currently building a search engine for our internal data, and we are using Elasticsearch for it. Our data looks like this:
- Object A: {attribute_a_1, attribute_a_2} - about 100M instances;
- Object B: {attribute_b_1, attribute_b_2} - about 200M instances;
- Object C: {attribute_c_1, attribute_c_2} - about 50M instances.
Now the relationships are:
- Each object A can have zero, one, or many objects B;
- Each object A can have zero, one, or many objects C.
My goal is to have an index that can be searched really fast for:
- Objects A - by their own attributes and by attributes of the related objects;
- Objects B - by their own attributes and by attributes of the related objects.
Example:
Find all objects B where attribute_a_1="something", attribute_b_2="something", and attribute_c_1="something".
Currently we are using nested structures:
PUT object_a
{
  "mappings": {
    "properties": {
      "attribute_a_1": {...},
      "attribute_a_2": {...},
      "object_b": {
        "type": "nested",
        "properties": {
          "attribute_b_1": {...},
          "attribute_b_2": {...}
        }
      },
      "object_c": {
        "type": "nested",
        "properties": {
          "attribute_c_1": {...},
          "attribute_c_2": {...}
        }
      }
    }
  }
}
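For reference, with this mapping the example search from above looks roughly like the following (a sketch, assuming the attributes are exact-match keyword fields; `inner_hits` is what pulls back the matching objects B from inside the parent A document):

```json
GET object_a/_search
{
  "query": {
    "bool": {
      "must": [
        { "term": { "attribute_a_1": "something" } },
        {
          "nested": {
            "path": "object_b",
            "query": { "term": { "object_b.attribute_b_2": "something" } },
            "inner_hits": {}
          }
        },
        {
          "nested": {
            "path": "object_c",
            "query": { "term": { "object_c.attribute_c_1": "something" } }
          }
        }
      ]
    }
  }
}
```

So every search for objects B has to go through the A documents and their nested blocks, which is where the cost shows up for the big parents.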
Performance isn't bad overall, but some objects A contain hundreds of thousands of objects B and C, and querying those documents is really slow. We know this design makes updates expensive, but we don't update often, so that is not a problem for us. Any suggestions on how we can improve our index design? Maybe we are thinking about it all wrong. I would really appreciate any ideas. Thank you in advance!