Recently I started migrating some data into Elasticsearch using the Python client.
The data is already structured in JSON files and needs some processing, such as:

- lowercasing all the values
- handling inconsistent values in the country field (e.g. "United States of America" vs. "USA")
- adding some extra fields
In this case, is it still worth using Logstash (from what I have seen in the documentation and tutorials, it is mainly used for parsing log files with grok), or is it better to do all the processing in Python and then index the data into Elasticsearch?
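For what it's worth, the three steps above are easy to express in plain Python before indexing. A minimal sketch, assuming flat documents and an illustrative alias table and extra field (all names here are made up for the example):

```python
# Illustrative mapping for inconsistent country names (extend as needed).
COUNTRY_ALIASES = {
    "united states of america": "usa",
    "u.s.a.": "usa",
}

def preprocess(doc):
    """Lowercase string values, normalize the country field, add an extra field."""
    out = {k: v.lower() if isinstance(v, str) else v for k, v in doc.items()}
    if "country" in out:
        out["country"] = COUNTRY_ALIASES.get(out["country"], out["country"])
    out["ingested_by"] = "python-migration"  # example extra field
    return out

# Bulk indexing with the official client would then look roughly like this
# (untested sketch; requires the `elasticsearch` package):
#
#   import json
#   from elasticsearch import Elasticsearch, helpers
#   es = Elasticsearch("http://localhost:9200")
#   with open("data.json") as f:
#       docs = json.load(f)
#   helpers.bulk(es, ({"_index": "mydata", "_source": preprocess(d)} for d in docs))
```

Doing the transformation in a pure function like this also makes it trivial to unit-test before you point it at a cluster.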
Maybe we can help each other :). I am good with Logstash but know zero Python.
But yes, you can do all three bullet points you listed in Logstash. In short, something like:

```
filter {
  mutate { lowercase => [ "fieldname" ] }
  if [country] == "united states of america" {
    mutate { replace => { "country" => "usa" } }
  }
  mutate { add_field => { "extra_field" => "some value" } }
}
```