We're trying to parse a multiline JSON file from an S3 bucket, which results in a "_jsonparsefailure" tag.
Logstash reads the file line by line. The codec we're using is json_lines.
The json_lines codec expects each line to be a complete JSON object. If your JSON object is pretty-printed across multiple lines, you will need to use a multiline codec to combine those lines into a single event before parsing it as JSON.
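To illustrate why line-by-line parsing fails here, a minimal Python sketch (this is not Logstash code; the sample document and the helper name `parse_per_line` are made up for the example) parses each line on its own, the way a line-oriented JSON reader would:

```python
import json

# Hypothetical pretty-printed record, similar in shape to a multiline JSON file.
pretty = '{\n  "id": 1,\n  "tags": ["a", "b"]\n}\n'

def parse_per_line(text):
    """Try to parse every line as a standalone JSON document."""
    ok, failed = [], []
    for line in text.splitlines():
        try:
            ok.append(json.loads(line))
        except json.JSONDecodeError:
            failed.append(line)
    return ok, failed

# Every line of the pretty-printed form is only a fragment, so all of them fail.
ok, failed = parse_per_line(pretty)

# Re-serialized onto a single line, the same record parses cleanly.
compact = json.dumps(json.loads(pretty))
ok_compact, failed_compact = parse_per_line(compact)
```

With the pretty-printed input, none of the four lines is valid JSON by itself, which is the situation that produces a parse failure per line. Once the object sits on one line, the per-line approach works.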
@Badger Thanks for your reply. With large files, wouldn't the multiline codec reach a limit? I remember it splitting the file into pieces because it was too large, and the resulting JSON fragments then no longer made sense.
The multiline codec has options to set the limit on the number of lines and bytes that can be combined into one event (max_lines and max_bytes). The defaults are 500 lines and 10 MiB. You are free to increase them if you need to.
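The effect of a line limit can be sketched with a toy model in Python. This is only an illustration of the idea, not the codec's actual implementation: the function `group_lines`, its start pattern, and the sample record are assumptions made for the example, and the real codec's behaviour when a limit is reached may differ in detail.

```python
import json
import re

def group_lines(lines, start_pattern=r"^\{", max_lines=500):
    """Toy model: join lines into events, starting a new event at each line
    matching start_pattern or once the buffer already holds max_lines lines."""
    start = re.compile(start_pattern)
    events, buffer = [], []
    for line in lines:
        if buffer and (start.search(line) or len(buffer) >= max_lines):
            events.append("\n".join(buffer))
            buffer = []
        buffer.append(line)
    if buffer:
        events.append("\n".join(buffer))
    return events

def is_json(text):
    """Return True if text parses as a complete JSON document."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

# Hypothetical pretty-printed record spanning 15 lines.
record = json.dumps({"id": 1, "values": list(range(10))}, indent=2)
lines = record.splitlines()

# A cap below the record length cuts it into fragments that are not valid JSON.
small_cap = group_lines(lines, max_lines=5)

# A cap above the record length keeps the object intact.
large_cap = group_lines(lines, max_lines=500)
```

In this model the 15-line record is split into three unparseable pieces when the cap is 5, and comes through as one valid object when the cap is larger than the record. That matches the symptom described above and is why raising the limits to exceed the largest object in the file is the usual remedy.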