Hey, I have a vast number of CSV files that I want to index. My first idea for a mapping was a simple array where each element is a list of n strings, one per column, so each array element represents one record of the CSV.
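Roughly, this first idea looks like the following Python sketch (the file and column names are just invented for illustration):

```python
# Approach 1: each record is a plain list of strings, in schema order.
# The column names live in a separate per-schema header.
invoice_header = ["id", "customer", "amount"]  # hypothetical schema
invoice_records = [
    ["1", "ACME", "99.90"],
    ["2", "Globex", "12.50"],
]

# Looking up a column means knowing its position in that schema's header.
col = invoice_header.index("customer")
print([r[col] for r in invoice_records])  # → ['ACME', 'Globex']
```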
Unfortunately, my CSVs use a lot of different schemas, which would mean one mapping per schema. I'm not sure whether this approach is bad for performance, but it is definitely bad for usability in my use case, where we filter a lot based on the document type (and we don't need a filter so fine-grained that it distinguishes between the different schemas).
So another approach I was thinking of is a nested array. The outer array contains the records, as in the first approach, but instead of a list of strings, each element is itself an array whose elements are pairs of strings: column_name and column_content. In this case, however, a query on the column names becomes much more complicated.
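As a sketch, what I mean is something like this (again with invented names), where records from differently-shaped CSVs can live in one structure, but every column lookup turns into a scan over the pairs:

```python
# Approach 2: each record is a list of (column_name, column_content) pairs,
# so records from different schemas fit into one array.
records = [
    [("id", "1"), ("customer", "ACME"), ("amount", "99.90")],          # schema A
    [("id", "7"), ("vendor", "Initech"), ("due_date", "2020-01-31")],  # schema B
]

def get(record, name):
    """Querying a column now requires scanning the pairs of each record."""
    return next((value for key, value in record if key == name), None)

print(get(records[0], "customer"))  # → ACME
print(get(records[1], "customer"))  # → None
```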
Actually, I also thought of a third approach: one array for the records, where each record contains ALL columns from ALL CSV files, each filled either with the real value if the CSV has that column or with NULL if it doesn't. But the number of distinct columns is just too big (around 200, whereas a typical CSV has fewer than 10).
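In sketch form (column names invented), this third idea pads every record out to the union of all columns, which is why the ~200 distinct columns make it so wasteful when a typical CSV only fills a handful of them:

```python
# Approach 3: one union of ALL columns across ALL CSVs; every record carries
# every column, with None (NULL) where the source CSV lacks it.
all_columns = ["id", "customer", "amount", "vendor", "due_date"]  # ~200 in reality

def widen(row):
    """Pad a sparse row (dict of the columns this CSV actually has) to the full set."""
    return {c: row.get(c) for c in all_columns}

rec = widen({"id": "7", "vendor": "Initech"})
print(rec["vendor"])    # → Initech
print(rec["customer"])  # → None (column absent in this CSV's schema)
```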
So, do you have any ideas which mapping would be useful for me?