Hi all,
We have a lot of CSV files with varying numbers of columns. Most of them have only a few columns, but some have more than 1,000. At first we decided to map each column to its own field, but some files would then violate index.mapping.total_fields.limit. We can increase this setting, of course, but that turns into something like the Sorites paradox: if we raise the limit to 1001, what about a new file with 1002 columns? In general, whatever N we pick, a new file may arrive with N+1 columns, and so on. And that's apart from the performance issues that a very large number of fields can potentially cause.
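For concreteness, raising the limit per index looks like this (the index name and value are just placeholders), which illustrates why chasing ever-larger values feels like a losing game:

```
PUT wide_csv_index/_settings
{
  "index.mapping.total_fields.limit": 2000
}
```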
I know we could use the flattened field type instead, but I assume performance would be worse than with the column-per-field approach. So we're caught halfway between the two options! I'm curious whether there is a third option, and in general, what is the best practice for such cases in Elasticsearch?
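To be explicit, the flattened alternative we're considering would put all the variable columns of a row under a single mapped field, something like this (index and field names are just placeholders):

```
PUT wide_csv_index
{
  "mappings": {
    "properties": {
      "file_name": { "type": "keyword" },
      "columns":   { "type": "flattened" }
    }
  }
}
```

With this, every column key lives inside the one flattened field, so total_fields.limit stops being a concern, but as far as we understand all the leaf values get indexed as keywords, which is part of why we expect to lose some query flexibility and performance.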