We have a use case where we store data in month-wise indexes. For example, if I create a few blog posts in April 2017, they are stored in an index named 'abc_2017.04'. So even if I write 100 blog posts in April, they all go into 'abc_2017.04', and posts from other months go into the corresponding month-wise indexes.
Another approach we are considering is to create six-month indexes, so blog posts written in the first six months of the year would go into, for example, 'abc_2017.01-06'. But that index would then hold a lot of data.
Which approach is better: creating month-wise indexes, or creating a few larger indexes and storing all the data in them?
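For clarity, here is a minimal sketch of how a post could be routed to either naming scheme. The field names, the `requests` usage, the local cluster address, and the ES 5.x-style `/{index}/{type}` endpoint are illustrative assumptions, not our actual code:

```python
# Sketch only: derive the index name from a post's creation date for either
# naming scheme, then index the document over the REST API.
# Assumes a local Elasticsearch node on port 9200 (ES 5.x-style type in the URL).
from datetime import date
import requests

def monthly_index(prefix: str, d: date) -> str:
    # e.g. abc_2017.04 for April 2017
    return f"{prefix}_{d.year}.{d.month:02d}"

def half_year_index(prefix: str, d: date) -> str:
    # e.g. abc_2017.01-06 for the first half of 2017
    return f"{prefix}_{d.year}.01-06" if d.month <= 6 else f"{prefix}_{d.year}.07-12"

post = {"title": "My first post", "created": "2017-04-15", "body": "..."}
index_name = monthly_index("abc", date(2017, 4, 15))  # or half_year_index("abc", ...)
requests.post(f"http://localhost:9200/{index_name}/blog", json=post)
```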
How much data do you have? 100 or even 10,000 blog posts per month doesn't sound like much. At that rate you could probably put a whole year or more in a single index (with a replica for availability) and still not have any issues.
The volume of data is pretty important for answering such questions.
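As a rough illustration of what a single yearly index with a replica could look like (the index name and shard/replica counts are only assumptions, not a sizing recommendation):

```python
# Sketch only: create one index per year with a single replica for availability.
# Tune shard and replica counts to your own cluster.
import requests

settings = {
    "settings": {
        "number_of_shards": 1,    # small data volume, so one primary shard may be enough
        "number_of_replicas": 1   # one replica keeps the data available if a node fails
    }
}
requests.put("http://localhost:9200/abc_2017", json=settings)
```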
As of now we have around 10,000 to 30,000 blog posts per month. So if I create a six-month index and assume the maximum based on current data, I will have about 180,000 posts in that index.
Ideally, what should the limit on the number of documents in a single index be for this use case?
Of course it depends on how big those blog posts are. Basically, you should avoid lots of small indexes (<1 GB) as well as very big indexes (>20 GB). If I were in your situation I would probably aim for somewhere around 5 GB per index, monitor the system's performance, and adjust later if needed. Keep in mind that you can reindex your data later if necessary.
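If the monthly indexes do turn out too small, the _reindex API can merge them into a bigger one. A hedged sketch, reusing the index names from this thread and omitting error handling:

```python
# Sketch only: copy six monthly indexes into one consolidated index with the
# _reindex API, then check the new index's size via _cat/indices.
import requests

body = {
    "source": {"index": ["abc_2017.01", "abc_2017.02", "abc_2017.03",
                         "abc_2017.04", "abc_2017.05", "abc_2017.06"]},
    "dest": {"index": "abc_2017.01-06"}
}
resp = requests.post("http://localhost:9200/_reindex", json=body)
print(resp.json())  # reports how many documents were copied

# Size of the consolidated index, in GB:
print(requests.get("http://localhost:9200/_cat/indices/abc_2017.01-06?v&bytes=gb").text)

# Once verified, the old monthly indexes can be deleted, e.g.:
# requests.delete("http://localhost:9200/abc_2017.01")
```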