Is Logstash expected to handle Unicode (UTF-8) in its configuration files?
I hit an issue today where a pipeline (actually, an Elasticsearch mapping template file) that looked perfectly fine failed to load, because some editing step had left it formatted with non-ASCII whitespace: specifically the non-breaking space U+00A0, which is the byte pair C2 A0 in UTF-8.
This is of course an easy fix (and "don't do that"), but it's not obvious to diagnose, and it seems to me that an NBSP should be a perfectly valid whitespace character.
Well, I think I can answer my own question, for the benefit of others. According to the JSON spec (ECMA-404), only a specific list of whitespace characters is accepted:
"Whitespace is any sequence of one or more of the following code points: character tabulation (U+0009), line feed (U+000A), carriage return (U+000D), and space (U+0020)."
I imagine some implementations are more lenient than the standard, but a parser is within its rights to reject any other character, NBSP included.
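
In case it helps anyone diagnose the same thing, here is a rough Python sketch for locating the offending characters. It assumes the template is a UTF-8 JSON file that has already been read into a string; the function name `find_suspect_whitespace` and the sample template are made up for illustration and are not part of Logstash or Elasticsearch.

```python
import json
import unicodedata

# The only four whitespace code points the JSON grammar allows between tokens.
JSON_WHITESPACE = {"\t", "\n", "\r", " "}


def find_suspect_whitespace(text):
    """Return (line, column, code point, name) for each character that looks
    like whitespace but is not one of the four JSON whitespace code points.

    Hypothetical helper for illustration only. Note that it also flags such
    characters inside string values, where they are perfectly legal JSON.
    """
    hits = []
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col_no, ch in enumerate(line, start=1):
            if ch in JSON_WHITESPACE:
                continue
            # Zs/Zl/Zp are the Unicode separator categories (U+00A0 is Zs).
            # U+FEFF (byte order mark) is category Cf, so check it explicitly.
            if unicodedata.category(ch) in ("Zs", "Zl", "Zp") or ch == "\ufeff":
                hits.append(
                    (line_no, col_no, f"U+{ord(ch):04X}",
                     unicodedata.name(ch, "UNKNOWN"))
                )
    return hits


if __name__ == "__main__":
    # A made-up template indented with two non-breaking spaces.
    sample = '{\n\u00a0\u00a0"template": "logs-*"\n}'

    for hit in find_suspect_whitespace(sample):
        print(hit)

    # Python's own JSON parser is strict about whitespace as well.
    try:
        json.loads(sample)
    except json.JSONDecodeError as err:
        print("rejected:", err)
```

To check a real file, read it with `open(path, encoding="utf-8").read()` and pass the result to the function. The reported line and column numbers should make the bad indentation easy to find in an editor.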