When I don’t set the delimiter option, I have other problems, but at least events are separated correctly. However, when I set delimiter => "\n" (which is the default value, so it shouldn’t change the output), the whole input is parsed as one big event.
This leads me to think that there might be a problem with the way that the "delimiter" field for the line codec handles \n and/or \r. Is there something I might have overlooked?
Note that your data and your delimiters have opposite endianness. So if you do get lines, all of your 16-bit characters have their bytes swapped. I believe it would be possible to fix that in a ruby filter.
Looking at the code, the tokenizer knows nothing about the charset. So you need your delimiter to be '\r\0\n\0'.
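To illustrate what a charset-unaware split does to UTF-16LE data, here is a minimal sketch in plain Ruby. It does not use Logstash at all, and the sample text is made up; it only assumes the input is UTF-16LE with CR LF line endings.

```ruby
# Made-up sample: two short lines with CR LF endings, as raw UTF-16LE bytes.
raw = "ab\r\ncd\r\n".encode("UTF-16LE").b

# Splitting on the single byte "\n" cuts every 2-byte newline in half.
naive = raw.split("\n".b)

# The second chunk now starts with the orphaned 0x00 byte, so every
# following 16-bit unit is misaligned and reads as byte-swapped.
puts naive[1].bytes.first(2).inspect

# Splitting on the full 4-byte sequence keeps the 16-bit units intact,
# so each chunk can be decoded as UTF-16LE.
delimiter = "\r\0\n\0".b
lines = raw.split(delimiter).map do |chunk|
  chunk.force_encoding("UTF-16LE").encode("UTF-8")
end
puts lines.inspect
```

With the one-byte delimiter, the leftover NUL byte ends up at the start of the next chunk, which is why the characters look byte-swapped.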
There is a PR open to add support for \0, but at the moment I think you are out of luck. I do not know if @yaauie plans to merge this in a future version.
You could get rid of the codec and tokenize it yourself in a ruby filter.
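As a rough sketch of what tokenizing it yourself could look like, the class below buffers raw bytes and emits complete lines. It is plain Ruby, not Logstash code: `Utf16leLineTokenizer` and `extract` are hypothetical names, the input is assumed to be UTF-16LE with CR LF line endings, and wiring it into a ruby filter is left out.

```ruby
# Hypothetical helper, not part of Logstash. Assumes UTF-16LE input
# with CR LF line endings and a buffer that starts on a 2-byte boundary.
class Utf16leLineTokenizer
  DELIMITER = "\r\0\n\0".b

  def initialize
    @buffer = String.new(encoding: Encoding::BINARY)
  end

  # Append a chunk of raw bytes; return the complete lines as UTF-8 strings.
  # Any trailing partial line stays in the buffer for the next call.
  def extract(chunk)
    @buffer << chunk.b
    lines = []
    search_from = 0
    while (idx = @buffer.index(DELIMITER, search_from))
      if idx.odd?
        # The match straddles two 16-bit units, so it is not a line ending.
        search_from = idx + 1
        next
      end
      line = @buffer.slice!(0, idx + DELIMITER.bytesize)
      lines << line[0, idx].force_encoding(Encoding::UTF_16LE).encode(Encoding::UTF_8)
      search_from = 0
    end
    lines
  end
end

# Usage: feed the bytes in two pieces, the first ending in the middle of a character.
tokenizer = Utf16leLineTokenizer.new
data = "hello\r\nworld\r\npart".encode("UTF-16LE").b
first = tokenizer.extract(data[0, 9])
rest = tokenizer.extract(data[9..-1])
puts first.inspect
puts rest.inspect
```

The buffer is kept as binary so that chunk boundaries falling in the middle of a character do no harm; decoding only happens once a whole line is available.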