Checking our dependencies, IBM865 (CP 865) is defined, but it's not an official HTML5 encoding, which is why it's not included in the tables.
Many other IBM encodings are missing as well. Please open a GitHub issue about the missing codecs in filebeat.
Potential workaround. This is quite hacky and might actually not work out (I'd prefer to fix the bug in filebeat):
As filebeat already reads the contents and tries to convert them to UTF-8, we'd have to use some 8-bit code map that is ASCII-compatible for all values up to 0x7f. Reading with the utf-8 codec might combine 2 consecutive bytes into one character, giving you something that cannot be reconstructed (plus, it might insert replacement characters for invalid sequences). You can configure iso-8859-1 (looking at the library, it's actually windows-1252, but this should be no problem). The official code map of iso-8859-1 does not specify all required mappings, but it seems some extra code points are defined in the decoder source code.

You will now have some wrong characters. Next, one can use the translate or ruby filter to fix the wrong code points. For example, in CP865 the ø character has the code map ID 0x9B (155) and the Unicode code point U+00F8. In the windows-1252 mapping, code map ID 0x9B has the Unicode code point U+203A. That is, we can add a mapping of U+203A => U+00F8 to our translation table, and this way create the correct UTF-8 encoded text.
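To illustrate how the full translation table could be derived, here is a small Python sketch (not a Logstash config). It compares, for every byte 0x80-0xFF, the character windows-1252 produces with the character CP865 intends, and keeps the pairs that differ. Two assumptions are baked in: that filebeat really decodes as windows-1252, and that the five bytes undefined in windows-1252 (0x81, 0x8D, 0x8F, 0x90, 0x9D) come through as the C1 control character of the same value, as in the WHATWG windows-1252 index. The helper names are made up for this example.

```python
# Sketch of deriving the CP865 fix-up table for the iso-8859-1/windows-1252
# workaround. ASSUMPTIONS: filebeat decodes raw CP865 bytes as windows-1252,
# and bytes undefined in windows-1252 pass through as U+0080..U+009F
# (as in the WHATWG index). Helper names are hypothetical.


def as_windows1252(b: int) -> str:
    """Return the character filebeat is assumed to emit for raw byte b."""
    try:
        return bytes([b]).decode("cp1252")
    except UnicodeDecodeError:
        # Python's cp1252 leaves 0x81, 0x8D, 0x8F, 0x90, 0x9D undefined;
        # assume the decoder maps them to the C1 control of the same value.
        return chr(b)


def build_cp865_fix_table() -> dict[int, str]:
    """Map each wrongly decoded code point to the intended CP865 character."""
    table = {}
    # 0x00-0x7F is plain ASCII in both code maps, so only 0x80-0xFF matter.
    for b in range(0x80, 0x100):
        wrong = as_windows1252(b)
        right = bytes([b]).decode("cp865")
        if wrong != right:
            table[ord(wrong)] = right
    return table


FIX_TABLE = build_cp865_fix_table()


def fix_cp865(text: str) -> str:
    """Repair text that was CP865 bytes decoded as windows-1252 (single pass)."""
    return text.translate(FIX_TABLE)


if __name__ == "__main__":
    raw = "blåbærsyltetøy".encode("cp865")  # original file content
    garbled = "".join(as_windows1252(b) for b in raw)  # what filebeat would ship
    print(hex(ord(as_windows1252(0x9B))))  # the U+203A from the ø example
    print(fix_cp865(garbled))
```

Running it prints `0x203a` and then `blåbærsyltetøy`. `str.translate` does a single pass, so a repaired character is never translated a second time; the same pairs could be carried over into whichever Logstash filter ends up doing the substitution.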