I'm using Elasticsearch 7.10 and have an index containing about 3M documents.
I am going to add a nested field (I call it items) which has the following fields:
- store_id (long)
- big_category_id (integer / there are only 5 categories now)
- timestamp (date)
Each document has about 5 items, and we have to update the items data every day.
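To make the field list concrete, here is a minimal sketch of the mapping I have in mind, written as a Python dict that could be sent as the body of PUT /<index>/_mapping. The name ITEMS_MAPPING is just for illustration, and the date format is my assumption based on timestamps such as "2021-10-01 00:00:00", which the default date format would not parse.

```python
import json

# Sketch of the nested mapping for the three fields described in the question.
# Assumption: timestamps look like "2021-10-01 00:00:00", so an explicit
# date format is declared instead of relying on the default one.
ITEMS_MAPPING = {
    "properties": {
        "items": {
            "type": "nested",
            "properties": {
                "store_id": {"type": "long"},
                "big_category_id": {"type": "integer"},
                "timestamp": {"type": "date", "format": "yyyy-MM-dd HH:mm:ss"},
            },
        }
    }
}

if __name__ == "__main__":
    # Print the JSON body that would be sent to PUT /<index>/_mapping.
    print(json.dumps(ITEMS_MAPPING, indent=2))
```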
I have two plans.
- Add a single nested field and put all items into it, each carrying its big_category_id, as below.
We have to specify big_category_id in the query.
"items": [
{
"store_id": 1,
"big_category_id": 1,
"timestamp": "2021-10-01 00:00:00"
},
{
"store_id": 1,
"big_category_id": 2,
"timestamp": "2021-10-01 00:00:00"
},
...
]
- Add a separate nested field for each big_category_id, as below.
We don't need big_category_id in the query.
"items_of_category_1": [
{
"store_id": 1,
"timestamp": "2021-10-01 00:00:00"
},
{
"store_id": 2,
"timestamp": "2021-10-01 00:00:00"
},
...
],
"items_of_category_2": [
{
"store_id": 1,
"timestamp": "2021-10-01 00:00:00"
}
],
...
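To show how the two plans would differ at query time, here is a rough sketch of the request bodies I would expect to send, built as Python dicts. The helper names plan1_query and plan2_query are hypothetical, and searching by store_id within one category is only an assumed example of the kind of query we run.

```python
import json


def plan1_query(store_id: int, big_category_id: int) -> dict:
    """Plan 1: one nested field; both conditions must hold on the same item."""
    return {
        "query": {
            "nested": {
                "path": "items",
                "query": {
                    "bool": {
                        "filter": [
                            {"term": {"items.store_id": store_id}},
                            {"term": {"items.big_category_id": big_category_id}},
                        ]
                    }
                },
            }
        }
    }


def plan2_query(store_id: int, big_category_id: int) -> dict:
    """Plan 2: the category selects the nested field, so no category filter."""
    path = f"items_of_category_{big_category_id}"
    return {
        "query": {
            "nested": {
                "path": path,
                "query": {"term": {f"{path}.store_id": store_id}},
            }
        }
    }


if __name__ == "__main__":
    # Print both bodies side by side for comparison.
    print(json.dumps(plan1_query(1, 2), indent=2))
    print(json.dumps(plan2_query(1, 2), indent=2))
```

In plan 1 the category is an extra term filter inside the nested query; in plan 2 it is encoded in the field name, so the client has to build the path from the category id.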
I think the first plan is better because of its simplicity, but I don't know which is more efficient (especially when querying and fetching data) in terms of the internal structure of Elasticsearch.
Which do you think is better?