"Incompatible encoding" when using Logstash to ship JSON files to Elasticsearch

Hello,

We have been successfully using Logstash to parse our JSON log data and
import it into Elasticsearch, but recently we started seeing failures on
some machines. Here's the error Logstash displays:

Registering file input {:path=>["D:/Octopus/Applications/prod-ndoa/Bridge.Web/logs/BridgeSoap..txt"], :level=>:info}
Pipeline started {:level=>:info}
A plugin had an unrecoverable error. Will restart this plugin.
Plugin: <LogStash::Inputs::File add_field=>{"_environment"=>"prod-ndoa", "_application"=>"bridge_rest"}, path=>["D:/Octopus/Applications/prod-ndoa/Bridge.Rest.Host/logs/BridgeRest..txt"], sincedb_path=>"D:/Octopus/Applications/prod-ndoa/Bridge.Rest.Host/logs/sincedb", tags=>["bridge_rest"], start_position=>"end">
Error: incompatible encodings: Windows-1252 and UTF-8 {:level=>:error}

The input file is a set of JSON documents in UTF-8 encoding (with BOM). If
I edit the file and remove the BOM characters, the import works fine.
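
As a stopgap, the BOM bytes can be stripped from the log files before Logstash picks them up. The following is only a minimal sketch in Python, not anything Logstash provides: the helper names `strip_utf8_bom` and `strip_file_in_place` are hypothetical, and it assumes a BOM may appear at the start of the file and at the start of any later line (for example, if the writer emits one on every append). It should only be run on files the application is not currently writing to.

```python
import codecs

BOM = codecs.BOM_UTF8  # the three bytes EF BB BF


def strip_utf8_bom(data: bytes) -> bytes:
    """Remove a UTF-8 BOM found at the start of the data or of any line.

    Assumption: BOMs only need removing at line starts; a BOM in the
    middle of a line is left untouched.
    """
    lines = data.splitlines(keepends=True)  # keeps the original line endings
    cleaned = [line[len(BOM):] if line.startswith(BOM) else line
               for line in lines]
    return b"".join(cleaned)


def strip_file_in_place(path: str) -> bool:
    """Rewrite the file without BOMs; return True if the file was changed."""
    with open(path, "rb") as f:
        original = f.read()
    cleaned = strip_utf8_bom(original)
    if cleaned == original:
        return False
    with open(path, "wb") as f:
        f.write(cleaned)
    return True
```

Removing bytes shifts every later offset in the file, so rewriting a file that Logstash has already partly read may confuse the read position it recorded in its sincedb.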

And here's the input file configuration:

input {
  file {
    path => "D:/Octopus/Applications/prod-ndoa/Bridge.Web/logs/BridgeSoap.*.txt"
    sincedb_path => "D:/Octopus/Applications/prod-ndoa/Bridge.Web/logs/sincedb"
    codec => json
    start_position => "end"
  }
}

If I remove codec "json", it doesn't fail but the output of course is wrong
because it treats JSON documents as plain text.

The strangest thing is that on other machines it works properly (same
1.4.2 version of Logstash).

Does anyone have an idea why this might happen?

Thanks in advance

Vagif Abilov

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/f5865299-c16a-4cb7-b185-811d3b4c8ec0%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Thank you for bringing this to our attention. Can you please create an
issue at https://github.com/logstash-plugins/logstash-codec-json ?

Thanks!

(In reply to Vagif Abilov, Wednesday, December 10, 2014 7:13:16 AM UTC-8.)

Thank you Aaron, done. I've created an issue. But I'd like to find out if
there's a workaround for this problem. What's really strange is that the same
Logstash installation works with similar JSON files on other machines.

Vagif

We use the HTTP protocol to send from Logstash to Elasticsearch, and
therefore we have never had this issue.

There is a version of ES bundled with Logstash, and if it doesn't match the
version of ES you are using to store the logs, then you may see problems if
you don't use the HTTP protocol.

Brian

(In reply to Vagif Abilov, Wednesday, December 10, 2014 3:53:30 PM UTC-5.)

Actually this has nothing to do with the Elasticsearch output plugin being
http vs. node.

Jordan has already confirmed the issue with BOM:
https://github.com/logstash-plugins/logstash-codec-json/issues/1#issuecomment-66532688

(In reply to Brian, Wednesday, December 10, 2014 4:05:38 PM UTC-8.)

Wow, that was quick! Now I need to find out what piece of our code puts
BOMs into a UTF-8 file.
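
One way to narrow that down is to look at where the BOMs sit in the affected files. The sketch below is a hypothetical diagnostic (the function names `find_bom_offsets` and `report` are made up for illustration): a single BOM at offset 0 would suggest it is written when the file is created, while BOMs scattered through the file would suggest something emits one on each write. That interpretation is an assumption, not something confirmed in this thread.

```python
import codecs
import glob
from typing import Dict, List

BOM = codecs.BOM_UTF8  # the three bytes EF BB BF


def find_bom_offsets(data: bytes) -> List[int]:
    """Return the byte offset of every UTF-8 BOM sequence in the data."""
    offsets = []
    pos = data.find(BOM)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(BOM, pos + len(BOM))
    return offsets


def report(pattern: str) -> Dict[str, List[int]]:
    """Map each file matching the glob pattern to its BOM offsets.

    Files without any BOM are left out of the result.
    """
    result = {}
    for path in glob.glob(pattern):
        with open(path, "rb") as f:
            offsets = find_bom_offsets(f.read())
        if offsets:
            result[path] = offsets
    return result
```

For example, `report("D:/Octopus/Applications/prod-ndoa/Bridge.Web/logs/BridgeSoap.*.txt")` would list the affected files from the configuration above. Comparing the result between a failing machine and a working one may show whether the files themselves differ.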

Thanks a million!

Vagif

(In reply to Aaron Mildenstein, Dec 11, 2014 1:16 AM.)