Bug parsing json

Hi everybody

After searching a lot about this problem, I have come to the conclusion that this is a bug in Logstash. I have already asked in this forum and unfortunately didn't get any answer, so I decided to describe, step by step, how to reproduce the bug. If this is not the right place to report a bug, please tell me where it should go.

In summary, I hit the bug when I try to import data from a PostgreSQL table that has a column of type json into an Elasticsearch index using Logstash. My aim is to end up with a nested field in the Elasticsearch index, using the json column of the Postgres table as the source.

To reproduce the error, please follow these steps:

  1. Create a table in PostgreSQL:
    CREATE TABLE test_jsonfield (
    customer integer NOT NULL,
    categories_json json
    );

  2. Insert two records into the table:
    INSERT INTO test_jsonfield VALUES (1, '[{"first_level":297,"second_level":null}]');
    INSERT INTO test_jsonfield VALUES (2, '[{"first_level":585,"second_level":[1559,2445]},{"first_level":987,"second_level":[2]}]');

  3. Create the Logstash configuration file:
    input {
      jdbc {
        jdbc_connection_string => "jdbc:postgresql://mydomain:5432/mydatabase"
        jdbc_user => "postgres"
        jdbc_password => "mypassword"
        jdbc_paging_enabled => true
        jdbc_page_size => "50000"
        jdbc_validate_connection => true
        jdbc_driver_library => "/usr/share/elasticsearch/lib/postgresql-9.4.1208.jar"
        jdbc_driver_class => "org.postgresql.Driver"
        statement => "SELECT * FROM test_jsonfield"
      }
    }

    filter {
      json {
        source => "categories_json"
        target => "categories"
        remove_field => ["categories_json"]
      }
    }

    output {
      elasticsearch {
        document_id => "%{customer}"
        index => "test_jsonfield_nested"
        document_type => "test"
      }
    }
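
As a side check (not part of the repro steps), the stored values can be read back from Postgres to confirm they contain the expected JSON text:

    -- optional sanity check, not part of the original steps:
    -- return the raw JSON text stored for each customer
    SELECT customer, categories_json::text FROM test_jsonfield;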

  4. Create the mapping for the index "test_jsonfield_nested":
    POST test_jsonfield_nested/
    {
      "mappings": {
        "test": {
          "properties": {
            "customer": {
              "type": "string"
            },
            "categories": {
              "type": "nested",
              "properties": {
                "first_level": {
                  "type": "integer"
                },
                "second_level": {
                  "type": "integer"
                }
              }
            }
          }
        }
      }
    }

  5. Check the mapping:
    {
      "test_jsonfield_nested": {
        "mappings": {
          "test": {
            "properties": {
              "categories": {
                "type": "nested",
                "properties": {
                  "first_level": {
                    "type": "integer"
                  },
                  "second_level": {
                    "type": "integer"
                  }
                }
              },
              "customer": {
                "type": "string"
              }
            }
          }
        }
      }
    }

  6. Run Logstash:
    sh logstash -f test_jsonfield.conf

At this point I get the following errors:
Settings: Default pipeline workers: 3
Pipeline main started
Error parsing json {:source=>"categories_json", :raw=>#<Java::OrgPostgresqlUtil::PGobject:0x62f1e143>, :exception=>java.lang.ClassCastException: org.jruby.java.proxies.ConcreteJavaProxy cannot be cast to org.jruby.RubyIO, :level=>:warn}
Error parsing json {:source=>"categories_json", :raw=>#<Java::OrgPostgresqlUtil::PGobject:0x66c5829a>, :exception=>java.lang.ClassCastException: org.jruby.java.proxies.ConcreteJavaProxy cannot be cast to org.jruby.RubyIO, :level=>:warn}
Pipeline main has been shutdown
stopping pipeline {:id=>"main"}

The data is imported but not as a nested field.

I expected a field named "categories" to be indexed as a nested field, according to the configuration of the "json" filter used in the Logstash config file "test_jsonfield.conf" and the mapping of the index.
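
For customer 2, that would look roughly like this (based on the inserted row and the filter's target):

"_source": {
  "customer": 2,
  "categories": [
    { "first_level": 585, "second_level": [1559, 2445] },
    { "first_level": 987, "second_level": [2] }
  ]
}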

Instead, I get a document with a field called "categories_json" (the same field name as in the Postgres table), something like this:
"_source": {
"customer": 2,
"categories_json": {
"type": "json",
"value": "[{"first_level":585,"second_level":[1559,2445]},{"first_level":987,"second_level":[2]}]"
},

Thanks in advance for your support.

Regards

Jorge von Rudno

Oh, so the top-level content of the JSON string is an array and not an object? That could definitely be a problem.
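
That is, the value stored for customer 1 is

[{"first_level":297,"second_level":null}]

which starts with [ and ends with ], while a top-level object would look more like

{"categories":[{"first_level":297,"second_level":null}]}

(the "categories" wrapper here is only an illustration, not something that exists in the table).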

Hi Magnusbaeck,

Thanks for your reply!

I have tested changing the filter to use the field "[categories_json][value]" as the source:
filter {
  json {
    source => "[categories_json][value]"
    target => "categories"
    remove_field => "categories_json"
  }
}

And now I get an error:
Exception in pipelineworker, the pipeline stopped processing new events, please check your filter configuration and restart Logstash. {"exception"=>#<NoMethodError: undefined method `[]' for #<Java::OrgPostgresqlUtil::PGobject:0x37ff123>>, "backtrace"=>["/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-event-2.3.2-java/lib/logstash/util/accessors.rb:56:in `get'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-event-2.3.2-java/lib/logstash/event.rb:122:in `[]'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-filter-json-2.0.6/lib/logstash/filters/json.rb:69:in `filter'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-2.3.2-java/lib/logstash/filters/base.rb:151:in `multi_filter'", "org/jruby/RubyArray.java:1613:in `each'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-2.3.2-java/lib/logstash/filters/base.rb:148:in `multi_filter'", "(eval):41:in `filter_func'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-2.3.2-java/lib/logstash/pipeline.rb:267:in `filter_batch'", "org/jruby/RubyArray.java:1613:in `each'", "org/jruby/RubyEnumerable.java:852:in `inject'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-2.3.2-java/lib/logstash/pipeline.rb:265:in `filter_batch'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-2.3.2-java/lib/logstash/pipeline.rb:223:in `worker_loop'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-2.3.2-java/lib/logstash/pipeline.rb:201:in `start_workers'"], :level=>:error}
NoMethodError: undefined method `[]' for #<Java::OrgPostgresqlUtil::PGobject:0x37ff123>
get at /opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-event-2.3.2-java/lib/logstash/util/accessors.rb:56
[] at /opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-event-2.3.2-java/lib/logstash/event.rb:122
filter at /opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-filter-json-2.0.6/lib/logstash/filters/json.rb:69
multi_filter at /opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-2.3.2-java/lib/logstash/filters/base.rb:151
each at org/jruby/RubyArray.java:1613
multi_filter at /opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-2.3.2-java/lib/logstash/filters/base.rb:148
filter_func at (eval):41
filter_batch at /opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-2.3.2-java/lib/logstash/pipeline.rb:267
each at org/jruby/RubyArray.java:1613
inject at org/jruby/RubyEnumerable.java:852
filter_batch at /opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-2.3.2-java/lib/logstash/pipeline.rb:265
worker_loop at /opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-2.3.2-java/lib/logstash/pipeline.rb:223
start_workers at /opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-2.3.2-java/lib/logstash/pipeline.rb:201

Regards

Jorge von Rudno

Hi Magnusbaeck,

May I ask for an estimate of how long it might take to solve this issue, and whether you perhaps have an alternative to work around the problem until it is solved?

Regards

Jorge

@JvonRudno, as @magnusbaeck said, you should check your JSON; it is not a valid one. On the other hand, we're here to help, so please don't ask for estimates.

To move forward, I would recommend testing your JSON values with a pipeline that is basically stdin -> filters -> stdout (codec rubydebug). When that works, you can bring the database back in.
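
For example, a minimal test configuration along those lines might look like this (just a sketch; paste one JSON value per line on stdin):

input {
  stdin { }
}

filter {
  json {
    # the stdin input puts each line into the "message" field
    source => "message"
    target => "categories"
  }
}

output {
  stdout { codec => rubydebug }
}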

Hope this helps.