I have many fields (~800) in my CSV file. Most are integers, while a few are strings.
I do have a mapping file where I list the types, but it is not feasible to list every field and its type since the file keeps getting updated. If I miss a field, it gets indexed as a string. Is there any way to treat every field as an integer by default, except for the ones I specify as strings?
That way I would only need to specify the few string fields in my mapping file.
Hi, any response to this seemingly simple query?
Why not send numbers instead of strings in the first place?
If I'm not mistaken, that indeed becomes the question.
This would mean converting each and every field to an integer in the Logstash filter. I have ~800 fields, of which 8-10 are strings and the rest are all integers. I don't want to have to define those 790-odd fields as integers.
Hope that clears it up. Thanks.
Whatever script you want to apply to detect which fields are numeric or not, I'd recommend running it in Logstash.
You can also use an ingest pipeline to convert all the fields to numbers and then, if a conversion fails, catch the error and do nothing, leaving that field as a string.
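An untested sketch of that idea, using a `script` processor with Painless to attempt the conversion on every string field and swallow failures (the pipeline name is just an example):

```json
PUT _ingest/pipeline/ints-by-default
{
  "processors": [
    {
      "script": {
        "description": "Try to convert every string field to an integer; leave it as a string on failure",
        "source": """
          for (key in ctx.keySet().toArray()) {
            if (key.startsWith("_")) continue;   // skip metadata fields like _index
            def v = ctx[key];
            if (v instanceof String) {
              try {
                ctx[key] = Integer.parseInt(v);
              } catch (NumberFormatException e) {
                // not numeric, keep as string
              }
            }
          }
        """
      }
    }
  ]
}
```

You'd then point the Elasticsearch output (or the index settings) at this pipeline so it runs on every document before indexing.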
That might work.
Thanks for your responses.
Currently I am using a CSV filter to read the data in logstash.
Are you suggesting to use some ingest pipeline AFTER this step?
The CSV filter gives me an option to convert field types, but I don't want to do that for every field.
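For context, this is the csv filter `convert` option I'm trying to avoid scaling to ~790 fields (field names are made up):

```
filter {
  csv {
    separator => ","
    columns   => ["field1", "field2", "field3"]   # ... ~800 columns
    convert   => {
      "field1" => "integer"
      "field2" => "integer"
      # ... one entry per numeric field, which is exactly what I want to avoid
    }
  }
}
```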
Then maybe a ruby script as a filter in Logstash?
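Something along these lines is what I have in mind: an untested ruby filter sketch that walks every field on the event, tries to parse it as an integer, and leaves it alone if parsing fails:

```
filter {
  ruby {
    code => '
      event.to_hash.each do |field, value|
        next if field.start_with?("@")     # skip @timestamp, @version
        next unless value.is_a?(String)
        begin
          event.set(field, Integer(value, 10))
        rescue ArgumentError
          # not an integer, keep it as a string
        end
      end
    '
  }
}
```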
I'm moving your question to #logstash as experts there might have better ideas...
This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.