I have just uploaded 4 years of data, about 4 GB in total, into an AWS Elasticsearch cluster.
I indexed the documents by day, one index per day.
Since I stuck with the default of 5 primary shards per index, the cluster has now grown to more than 7300 primary shards.
I also stuck with ES's dynamic mapping, but I want to define my own index template now.
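For the template, this is roughly what I have in mind, sketched with the Python elasticsearch client. The endpoint, index pattern, and field names are placeholders, and since the shard default was still 5 I'm assuming a pre-7.x cluster (on 5.x the `index_patterns` key would be `template`):

```python
from elasticsearch import Elasticsearch

# Placeholder endpoint for the AWS cluster.
es = Elasticsearch("https://my-cluster.es.amazonaws.com:443")

# Template applied to every new index matching the pattern:
# 1 primary shard instead of the old default of 5, plus explicit mappings.
es.indices.put_template(
    name="logs-template",
    body={
        "index_patterns": ["logs-*"],  # placeholder index pattern
        "settings": {
            "number_of_shards": 1,
            "number_of_replicas": 1,
        },
        "mappings": {
            "doc": {  # mapping type; drop this level on 7.x+
                "properties": {
                    "@timestamp": {"type": "date"},
                    "message": {"type": "text"},  # placeholder fields
                }
            }
        },
    },
)
```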
I have thought of a few options to reduce the shard count:
- Reindexing the data one index at a time
  - I need to write a script to reindex each index (see the first sketch after this list).
  - I can't change the mapping of an existing index in place, so each index would have to be reindexed into a new one that picks up the new template.
  - Still, I have to reindex them one by one.
- Bulk indexing
  - I don't think it would be a good idea to prepend an action/metadata line to each document record by hand (there are roughly 2 million documents), though the bulk helper sketch below generates those lines automatically.
- Uploading the data again
  - Using Logstash with Elasticsearch as both input and output (see the pipeline sketch below)
  - Uploading everything again from scratch
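For the first option, a minimal sketch of the reindex script, again with the Python client. I'm assuming the daily indices are named like `logs-2017-03-14` and collapsing them into monthly indices, so roughly 1460 daily indices / 7300 shards would become about 48 monthly indices, each with 1 shard from the new template:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("https://my-cluster.es.amazonaws.com:443")  # placeholder endpoint

# Every existing daily index matching the pattern, oldest first.
for daily in sorted(es.indices.get(index="logs-*")):
    # "logs-2017-03-14" -> "logs-2017-03"; hypothetical naming scheme,
    # adjust the slice to the real index names.
    monthly = daily[:len("logs-YYYY-MM")]
    # Server-side reindex; the destination index is created on first write
    # and picks up the 1-shard template and explicit mappings.
    es.reindex(
        body={"source": {"index": daily}, "dest": {"index": monthly}},
        wait_for_completion=True,
        request_timeout=3600,  # a day's worth of data can take a while
    )
    print(f"reindexed {daily} -> {monthly}")
```

After verifying a month's data, the old daily indices can be deleted with `es.indices.delete(index=daily)`, which is what actually frees their shards.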
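For the bulk option, the Python client's bulk helper builds the small per-document action line itself, so nothing has to be added to the stored records; settings and mappings come from the template, not from the bulk payload. A sketch, with the hardcoded list standing in for however the ~2 million source records are read:

```python
from elasticsearch import Elasticsearch
from elasticsearch.helpers import streaming_bulk

es = Elasticsearch("https://my-cluster.es.amazonaws.com:443")  # placeholder endpoint

def gen_actions(docs):
    for doc in docs:
        # Only a tiny action header per document: target index plus source.
        # No settings or mappings here; the template supplies those.
        yield {
            "_index": "logs-2017-03",  # placeholder target index
            "_type": "doc",            # needed on pre-7.x clusters
            "_source": doc,
        }

# Placeholder for the real document stream.
docs = [{"@timestamp": "2017-03-14T12:00:00", "message": "example"}]

for ok, item in streaming_bulk(es, gen_actions(docs), chunk_size=1000):
    if not ok:
        print("failed:", item)
```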
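For the Logstash option, the pipeline would look roughly like this (hosts and index names are placeholders, and it assumes each event carries a usable `@timestamp`): the elasticsearch input reads the old daily indices and the output writes monthly ones, which again pick up the new template:

```
input {
  elasticsearch {
    hosts   => ["https://my-cluster.es.amazonaws.com:443"]  # placeholder
    index   => "logs-*"    # read all the old daily indices
    docinfo => true
  }
}
output {
  elasticsearch {
    hosts => ["https://my-cluster.es.amazonaws.com:443"]
    index => "logs-%{+YYYY.MM}"  # one monthly index per event date
  }
}
```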