We have an index that is growing by around 1.4 TB per day.
Each daily index holds almost 4 million documents.
Each daily index has 6 primary shards and no replicas.
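To put those figures in perspective, here is a rough back-of-the-envelope calculation of the primary shard size and average document size for one daily index. It is only a sketch: it assumes 1 TB = 1000 GB, treats "almost 4 million" as exactly 4 million, and assumes data is spread evenly across the shards. The function names are made up for illustration.

```python
# Rough sizing for one daily index (figures taken from the description above).
# Assumptions: 1 TB = 1000 GB, exactly 4,000,000 docs, even shard distribution.
DAILY_SIZE_GB = 1.4 * 1000
DAILY_DOCS = 4_000_000
PRIMARY_SHARDS = 6


def per_shard_size_gb(index_size_gb: float, shards: int) -> float:
    """Size of one primary shard if data is spread evenly."""
    return index_size_gb / shards


def avg_doc_size_kb(index_size_gb: float, docs: int) -> float:
    """Average on-disk size per document in KB (1 GB = 1,000,000 KB)."""
    return index_size_gb * 1_000_000 / docs


shard_gb = per_shard_size_gb(DAILY_SIZE_GB, PRIMARY_SHARDS)
doc_kb = avg_doc_size_kb(DAILY_SIZE_GB, DAILY_DOCS)

print(f"per-shard size: {shard_gb:.1f} GB")
print(f"average document size: {doc_kb:.0f} KB")
```

Under these assumptions each primary shard of a daily index ends up at roughly 233 GB.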
We have 2 physical servers (Ubuntu 20.04.2 LTS):
Server A: 3 master-only nodes and 3 data nodes.
Server B: 2 master-only nodes, 2 data nodes, and 1 data node that is also master-eligible.
We use Elasticsearch 7.12.0, and the nodes run in Docker. We settled on a Java heap of 31 GB for each node.
About the use case: it is write-heavy. Indexing runs all the time and is currently pretty fast.
It is a platform that will be used by at least 100 members, though not necessarily concurrently. Search queries cover a period of one month.
Documents are generally immutable and are not updated.
Some of the settings for the nodes are:
- "cluster.info.update.interval=2m"
- "index.merge.scheduler.max_thread_count=1"
- "http.max_content_length=1536mb"
- "index.refresh_interval=60s"
- "bootstrap.memory_lock=true"
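
Of these, `index.refresh_interval` and `index.merge.scheduler.max_thread_count` are index-level settings, so one way to apply them to every new index is a composable index template. The sketch below only builds the request body and prints it; it does not talk to a cluster. The template name `daily-logs` and the index pattern `logs-*` are hypothetical placeholders, not our real names.

```python
import json


def build_index_template(pattern: str, shards: int, replicas: int) -> dict:
    """Build a composable index template body carrying the index-level settings."""
    return {
        "index_patterns": [pattern],
        "template": {
            "settings": {
                "index": {
                    "number_of_shards": shards,
                    "number_of_replicas": replicas,
                    "refresh_interval": "60s",
                    "merge": {"scheduler": {"max_thread_count": 1}},
                }
            }
        },
    }


# "logs-*" and "daily-logs" are hypothetical names used only for illustration.
body = build_index_template("logs-*", shards=6, replicas=0)

print("PUT _index_template/daily-logs")
print(json.dumps(body, indent=2))
```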
What we really want to find out is whether hourly indices would be a better option than a single huge daily index.
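
To make the comparison concrete, here is a rough sketch of how many indices and shards a one-month search would touch under each layout, and how large each shard would be. It assumes a 30-day month, 1.4 TB = 1400 GB per day, ingest spread evenly over the day, and that hourly indices would keep the same 6 primary shards (that last point is only an assumption; it could be lowered). The `layout` helper is made up for illustration.

```python
def layout(indices_per_day: int, daily_size_gb: float,
           shards_per_index: int, window_days: int) -> dict:
    """Index count, shard count and per-shard size for a search window."""
    indices = indices_per_day * window_days
    shards = indices * shards_per_index
    shard_gb = daily_size_gb / indices_per_day / shards_per_index
    return {"indices": indices, "shards": shards, "shard_gb": shard_gb}


# Assumptions: 30-day window, 1400 GB/day, 6 primaries per index in both cases.
daily = layout(indices_per_day=1, daily_size_gb=1400,
               shards_per_index=6, window_days=30)
hourly = layout(indices_per_day=24, daily_size_gb=1400,
                shards_per_index=6, window_days=30)

for name, result in (("daily", daily), ("hourly", hourly)):
    print(f"{name}: {result['indices']} indices, {result['shards']} shards, "
          f"{result['shard_gb']:.1f} GB per shard")
```

With these assumptions, a one-month query hits 180 shards of about 233 GB each with daily indices, versus 4320 shards of about 9.7 GB each with hourly indices.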
I want to thank you in advance.