Which nested field structure is more efficient?

I'm using Elasticsearch 7.10 and have an index containing about 3M documents.
I am going to add a nested field (I call it items) with the following fields:

  • store_id (long)
  • big_category_id (integer; there are currently only 5 categories)
  • timestamp (date)
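For reference, a mapping for the items field as described might look like the following, sent as the body of `PUT my-index/_mapping` (the index name `my-index` is hypothetical). One detail worth checking: Elasticsearch's default date format (`strict_date_optional_time||epoch_millis`) does not accept a space between the date and the time, so this sketch assumes an explicit `format` matching values such as "2021-10-01 00:00:00".

```json
{
  "properties": {
    "items": {
      "type": "nested",
      "properties": {
        "store_id": { "type": "long" },
        "big_category_id": { "type": "integer" },
        "timestamp": { "type": "date", "format": "yyyy-MM-dd HH:mm:ss" }
      }
    }
  }
}
```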

Each document has about 5 items, and the items data has to be updated every day.
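A daily update of one document's items could be sketched as a partial update, sent as the body of `POST my-index/_update/<doc_id>` (index name, document ID, and item values are hypothetical). In a partial update, an array under `doc` replaces the existing array rather than being merged with it. Because Lucene segments are immutable, Elasticsearch reindexes the whole document, including all of its nested documents, on every update, whichever of the two layouts is used; for many documents the same request would typically go through the `_bulk` API.

```json
{
  "doc": {
    "items": [
      { "store_id": 1, "big_category_id": 1, "timestamp": "2021-10-02 00:00:00" },
      { "store_id": 3, "big_category_id": 2, "timestamp": "2021-10-02 00:00:00" }
    ]
  }
}
```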

I have two plans.

  1. Add a single nested field and put all items in it, each carrying its big_category_id, as below.
    Queries then have to specify big_category_id.
```json
"items": [
  {
    "store_id": 1,
    "big_category_id": 1,
    "timestamp": "2021-10-01 00:00:00"
  },
  {
    "store_id": 1,
    "big_category_id": 2,
    "timestamp": "2021-10-01 00:00:00"
  },
  ...
]
```
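With the first plan, a search for documents that have an item in category 1 at store 1 might look like this sketch, sent as the body of `GET my-index/_search` (index name and values are hypothetical). Both conditions sit in the same nested query, so they must match within the same item; putting them under `bool.filter` skips scoring, and the empty `inner_hits` asks Elasticsearch to return only the matching items alongside each hit.

```json
{
  "query": {
    "nested": {
      "path": "items",
      "query": {
        "bool": {
          "filter": [
            { "term": { "items.big_category_id": 1 } },
            { "term": { "items.store_id": 1 } }
          ]
        }
      },
      "inner_hits": {}
    }
  }
}
```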
  2. Add a separate nested field for each big_category_id, as below.
    Queries then don't need to specify big_category_id.
```json
"items_of_category_1": [
  {
    "store_id": 1,
    "timestamp": "2021-10-01 00:00:00"
  },
  {
    "store_id": 2,
    "timestamp": "2021-10-01 00:00:00"
  },
  ...
],
"items_of_category_2": [
  {
    "store_id": 1,
    "timestamp": "2021-10-01 00:00:00"
  }
],
...
```
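For comparison, the second plan needs one nested mapping per category. The sketch below shows categories 1 and 2 only; the remaining categories would presumably repeat the same pattern, and the explicit date format is again an assumption so that the example timestamps parse. It would be sent as the body of `PUT my-index/_mapping` (hypothetical index name).

```json
{
  "properties": {
    "items_of_category_1": {
      "type": "nested",
      "properties": {
        "store_id": { "type": "long" },
        "timestamp": { "type": "date", "format": "yyyy-MM-dd HH:mm:ss" }
      }
    },
    "items_of_category_2": {
      "type": "nested",
      "properties": {
        "store_id": { "type": "long" },
        "timestamp": { "type": "date", "format": "yyyy-MM-dd HH:mm:ss" }
      }
    }
  }
}
```

The equivalent search for store 1 in category 1 then selects the category through the nested `path` instead of a term filter (body of `GET my-index/_search`, values hypothetical):

```json
{
  "query": {
    "nested": {
      "path": "items_of_category_1",
      "query": {
        "term": { "items_of_category_1.store_id": 1 }
      },
      "inner_hits": {}
    }
  }
}
```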

I think the first plan is better because it is simpler, but I don't know which one is more efficient, especially for querying and fetching data, given how Elasticsearch stores nested fields internally.
Which do you think is better?
