Hello, I have a use case where people upload their datasets (Excel files, CSVs, and other formats) with Python. Their data will have different types, and I want them to be able to search across all of it. Right now I create one index per upload, but that means thousands of indices will be created. In the index pattern I use `*` so that it combines them all.
Is it better to have one large index or thousands of small ones?
I will create a cluster in production:
2 VMs, each with:
CPU: 16 cores
Memory: 32 GB RAM
Storage: 1 TB SSD with a minimum of 3k dedicated IOPS
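For context, the current per-upload flow looks roughly like this. It is only a simplified sketch: the connection URL and index naming are placeholders, using pandas and the elasticsearch Python client.

```python
import pandas as pd
from elasticsearch import Elasticsearch, helpers

# Placeholder connection; in production this would point at the 2-node cluster.
es = Elasticsearch("http://localhost:9200")

def index_upload(file_path: str, upload_id: str) -> None:
    """Read an uploaded CSV and index it into its own index (one index per upload)."""
    df = pd.read_csv(file_path)  # pd.read_excel() for Excel uploads
    index_name = f"upload-{upload_id}"  # this is how thousands of indices pile up

    actions = (
        {"_index": index_name, "_source": row}
        for row in df.to_dict(orient="records")
    )
    helpers.bulk(es, actions)

# A Kibana index pattern of "upload-*" (or "*") then searches across all of them.
```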
There is ILM to take care of the index size; you can ask it to roll over automatically when 50 GB is exceeded.
As much as possible, if you can normalize the types, that would be better.
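For example, a rollover policy like the one below would do it. This is only a rough sketch sent through the plain REST API; the policy name and URL are placeholders:

```python
import requests

ES = "http://localhost:9200"  # placeholder

# ILM policy that rolls the write index over once its primaries exceed 50 GB.
policy = {
    "policy": {
        "phases": {
            "hot": {
                "actions": {
                    "rollover": {"max_size": "50gb"}
                }
            }
        }
    }
}
requests.put(f"{ES}/_ilm/policy/uploads-policy", json=policy).raise_for_status()
```

For rollover to work on plain indices, the write index also needs an alias, which can be set in the index template via `index.lifecycle.rollover_alias`.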
I read about ILM, but doesn't this mean I have to define an index template in advance? I don't know the fields of my indexed documents; these are generated dynamically when people add documents.
Do I understand that correctly?
The template mapping does not have to be restrictive about new fields.
You can define some fields and leave the rest to dynamic mapping, etc., or you can have a template without any mapping at all, just to apply the ILM policy.
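A minimal sketch of such a template, assuming composable index templates; all names here are placeholders, only a couple of fields are mapped up front, and everything else stays dynamic:

```python
import requests

ES = "http://localhost:9200"  # placeholder

# Composable index template: attach the ILM policy through settings, map a few
# known fields explicitly, and let dynamic mapping handle the rest.
template = {
    "index_patterns": ["uploads-*"],
    "template": {
        "settings": {
            "index.lifecycle.name": "uploads-policy",
            "index.lifecycle.rollover_alias": "uploads",
        },
        "mappings": {
            "dynamic": True,  # unknown fields are still accepted and mapped on the fly
            "properties": {
                "upload_id": {"type": "keyword"},
                "uploaded_at": {"type": "date"},
            },
        },
    },
}
requests.put(f"{ES}/_index_template/uploads-template", json=template).raise_for_status()
```

Dropping the mappings block entirely gives you the "template without mapping, just to apply the ILM policy" option.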
Of course, I will suggest that fields be normalized, so the same kind of data always uses the same field name instead of creating new ones. Will there be performance issues on search if we reach more than 1000 fields? Can I use the _source field and put everything in there besides some id fields? Would that help? My index pattern says I already have 108 fields, and I am still at the beginning.
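The normalization I have in mind would happen in the Python upload code, before indexing, roughly like this (the alias table below is purely hypothetical):

```python
import pandas as pd

# Hypothetical mapping from column names people actually upload to a shared,
# canonical field name, so the same kind of data always lands in the same field.
CANONICAL_FIELDS = {
    "e-mail": "email",
    "mail": "email",
    "Email Address": "email",
    "created": "created_at",
    "creation_date": "created_at",
}

def normalize_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Rename known column aliases to their canonical field name before indexing."""
    renamed = df.rename(columns=lambda c: CANONICAL_FIELDS.get(c.strip(), c.strip()))
    # Lower-case whatever is left so "Country" and "country" don't become two fields.
    return renamed.rename(columns=str.lower)
```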