Index management

Hello, I have a case where people will upload their datasets (Excel files, CSVs, and others) with Python. Their data will have different types, and I want them to be able to search across all of the data. I know I could create one index per upload, but then thousands would be created. In the index patterns I use * to combine them all.
Is it better to have one large index or thousands of small ones?

I will create a cluster in production:
2 VMs with CPU: 16 CPU cores
Memory: 32 GB RAM
Storage: 1 TB SSD with a minimum of 3k dedicated IOPS

Will that be enough?

It all depends on how much data will be loaded on a daily basis and on the retention period.
Also, it's not recommended that a shard exceed 50 GB in size.

Let's suppose that 5 small Excel files are uploaded each day, so 5 indices will be created every day, and I will not delete them for a few years.

If I use one index per file, named upload-date*,
I will have thousands of indices.
If I use one big index, it will grow beyond 50 GB.
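
To put rough numbers on the one-index-per-file option, here is a small back-of-the-envelope sketch. The 3-year retention and the 1 primary + 1 replica shard per index are assumptions for illustration, not figures from this thread.

```python
# Rough estimate of how many indices and shards one-index-per-file produces.
# Assumptions (not from the thread): 3 years of retention, and each index
# having 1 primary shard plus 1 replica.
FILES_PER_DAY = 5
RETENTION_YEARS = 3      # assumed value for "a few years"
SHARDS_PER_INDEX = 2     # assumed: 1 primary + 1 replica


def index_count(files_per_day: int, years: int) -> int:
    """Number of indices kept if every uploaded file gets its own index."""
    return files_per_day * 365 * years


indices = index_count(FILES_PER_DAY, RETENTION_YEARS)
shards = indices * SHARDS_PER_INDEX
print(f"{indices} indices, {shards} shards")
```

Under these assumptions that is 5475 indices and 10950 shards. As far as I know, the default cluster.max_shards_per_node limit is 1000 per data node, so a two-node cluster would hit that limit long before the retention period ends.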

What is the best approach?

Another issue is the types: there will be too many of them. Should I tell people to upload files with the same columns, which would mean the same types?


There is ILM to take care of the index size; you can have it roll over automatically when 50 GB is exceeded.
If you can normalize the types as much as possible, that would be better.
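
As a minimal sketch of such a policy, the body below could be sent to PUT _ilm/policy/<name>. The function name is hypothetical, and max_primary_shard_size is the rollover condition in recent versions (7.13 and later, as far as I know); older releases would use max_size instead, which measures the total size of the primary shards.

```python
import json


def rollover_policy(max_shard_size: str = "50gb") -> dict:
    """Build an ILM policy body that rolls over in the hot phase.

    Hypothetical helper: it only builds the request body and does not
    talk to a cluster.
    """
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        # Roll over once a primary shard reaches this size.
                        "rollover": {"max_primary_shard_size": max_shard_size}
                    }
                }
            }
        }
    }


policy = rollover_policy()
print(json.dumps(policy, indent=2))
```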

Thank you for your quick response.

I read about ILM, but this means I have to define an index template in advance, and I don't know the fields of my indexed documents; they will be generated dynamically as people add documents.
Do I understand that correctly?

The template mapping does not have to be restrictive about new fields.
You can define some fields and leave the rest to dynamic mapping, or you can have a template without any mapping just to enforce the ILM policy.
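
A sketch of what such a template body might look like, for PUT _index_template/<name>. The names uploads, uploads-policy, upload_id, and uploaded_at are hypothetical placeholders; only the two known fields are mapped, and everything else is left to dynamic mapping.

```python
def upload_template(policy_name: str = "uploads-policy",
                    alias: str = "uploads") -> dict:
    """Build a composable index template body (hypothetical names).

    The settings attach the ILM policy; the mappings declare a couple of
    known fields and leave all other fields to dynamic mapping.
    """
    return {
        "index_patterns": [f"{alias}-*"],
        "template": {
            "settings": {
                "index.lifecycle.name": policy_name,
                "index.lifecycle.rollover_alias": alias,
            },
            "mappings": {
                "dynamic": True,  # unknown fields are still accepted
                "properties": {
                    "upload_id": {"type": "keyword"},
                    "uploaded_at": {"type": "date"},
                },
            },
        },
    }


template = upload_template()
```

Dropping the "mappings" key entirely gives the "template without mapping" variant, where the template exists only to attach the ILM policy.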

Of course, I will suggest normalizing the fields so that the same name is used for the same kind of data and no extra fields are created. Will there be performance issues on search if we reach more than 1000 fields? Can I put everything in the _source field, apart from some ID fields? Will that help? My index pattern says I already have 108 fields, and I am still at the beginning.

I learned about flattened fields, but how do I map my dynamically created fields into a flattened field?
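
One possible approach, sketched under assumptions: keep a few fixed top-level fields and nest every uploaded column under a single object mapped as flattened, so the whole object counts as one field in the mapping no matter how many columns arrive. The field names upload_id and data and the helper to_document are hypothetical.

```python
# Mapping with one fixed ID field and one flattened field that holds
# all user-supplied columns (field names are hypothetical).
MAPPING = {
    "properties": {
        "upload_id": {"type": "keyword"},
        "data": {"type": "flattened"},
    }
}


def to_document(upload_id: str, row: dict) -> dict:
    """Wrap one uploaded row so its columns live under the flattened field."""
    return {
        "upload_id": upload_id,
        # Column names become keys inside "data", not new mapped fields.
        "data": {str(column): value for column, value in row.items()},
    }


doc = to_document("upload-001", {"city": "Athens", "population": 650000})
print(doc)
```

The trade-off is that values inside a flattened field are indexed as keywords, so they support exact-match style queries but not full-text analysis or true numeric range semantics. A column such as data.city would then be queried by its dotted path.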

