Hello, I have a use case where people upload their datasets (Excel files, CSVs, and other formats) with Python. Their data will have different types, and I want them to be able to search across all of it. Right now I create one index per upload, but that means thousands of indices will be created. In the index pattern I use `*` so that it combines them all.
Is it better to have one large index or thousands of small ones?
I will create a cluster in production:
2 VMs, each with:
CPU: 16 cores
Memory: 32 GB RAM
Storage: 1 TB SSD with a minimum of 3k dedicated IOPS
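For context, the current per-upload flow looks roughly like this. It is only a simplified sketch: the connection URL and index naming are placeholders, using pandas and the elasticsearch Python client.

```python
import pandas as pd
from elasticsearch import Elasticsearch, helpers

# Placeholder connection; in production this would point at the 2-node cluster.
es = Elasticsearch("http://localhost:9200")

def index_upload(file_path: str, upload_id: str) -> None:
    """Read an uploaded CSV and index it into its own index (one index per upload)."""
    df = pd.read_csv(file_path)  # pd.read_excel() for Excel uploads
    index_name = f"upload-{upload_id}"  # this is how thousands of indices pile up

    actions = (
        {"_index": index_name, "_source": row}
        for row in df.to_dict(orient="records")
    )
    helpers.bulk(es, actions)

# A Kibana index pattern of "upload-*" (or "*") then searches across all of them.
```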
There is ILM to take care of the index size; you can ask it to roll over automatically when 50 GB is exceeded.
As much as possible, if you can normalize the types, that would be better.
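For example, a rollover policy like the one below would do it. This is only a rough sketch sent through the plain REST API; the policy name and URL are placeholders:

```python
import requests

ES = "http://localhost:9200"  # placeholder

# ILM policy that rolls the write index over once its primaries exceed 50 GB.
policy = {
    "policy": {
        "phases": {
            "hot": {
                "actions": {
                    "rollover": {"max_size": "50gb"}
                }
            }
        }
    }
}
requests.put(f"{ES}/_ilm/policy/uploads-policy", json=policy).raise_for_status()
```

For rollover to work on plain indices, the write index also needs an alias, which can be set in the index template via `index.lifecycle.rollover_alias`.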
I read about ILM, but doesn't this mean I have to define an index template in advance? I don't know the fields of my indexed documents; these are generated dynamically when people add documents.
Do I understand that correctly?
The template mapping does not have to be restrictive about new fields.
You can define some fields and leave the rest to dynamic mapping, etc., or you can have a template without any mapping at all, just to apply the ILM policy.
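A minimal sketch of such a template, assuming composable index templates; all names here are placeholders, only a couple of fields are mapped up front, and everything else stays dynamic:

```python
import requests

ES = "http://localhost:9200"  # placeholder

# Composable index template: attach the ILM policy through settings, map a few
# known fields explicitly, and let dynamic mapping handle the rest.
template = {
    "index_patterns": ["uploads-*"],
    "template": {
        "settings": {
            "index.lifecycle.name": "uploads-policy",
            "index.lifecycle.rollover_alias": "uploads",
        },
        "mappings": {
            "dynamic": True,  # unknown fields are still accepted and mapped on the fly
            "properties": {
                "upload_id": {"type": "keyword"},
                "uploaded_at": {"type": "date"},
            },
        },
    },
}
requests.put(f"{ES}/_index_template/uploads-template", json=template).raise_for_status()
```

Dropping the mappings block entirely gives you the "template without mapping, just to apply the ILM policy" option.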
Of course, I will suggest that fields be normalized, so the same kind of data always uses the same field name instead of creating new ones. Will there be performance issues on search if we reach more than 1000 fields? Can I use the _source field and put everything in there besides some id fields? Would that help? My index pattern says I already have 108 fields, and I am still at the beginning.
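The normalization I have in mind would happen in the Python upload code, before indexing, roughly like this (the alias table below is purely hypothetical):

```python
import pandas as pd

# Hypothetical mapping from column names people actually upload to a shared,
# canonical field name, so the same kind of data always lands in the same field.
CANONICAL_FIELDS = {
    "e-mail": "email",
    "mail": "email",
    "Email Address": "email",
    "created": "created_at",
    "creation_date": "created_at",
}

def normalize_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Rename known column aliases to their canonical field name before indexing."""
    renamed = df.rename(columns=lambda c: CANONICAL_FIELDS.get(c.strip(), c.strip()))
    # Lower-case whatever is left so "Country" and "country" don't become two fields.
    return renamed.rename(columns=str.lower)
```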