Unable to index nested XML REST API data

Hi All,

We are trying to integrate ScienceLogic performance data into Elasticsearch using Logstash. Below is my config:

input {
  http_poller {
    urls => {
      url => "https://poc.sciencelogic.com/api/data_performance_raw/device/dynamic_app?endstamp=1553188800&duration=1h&presentation_objects=6398"
    }
    cacert => "/opt/sciencelogic/pocsciencelogiccom.pem"
    truststore => "/opt/sciencelogic/sciencelogic.jks"
    truststore_password => "******"
    request_timeout => 60
    user => "admin"
    password => "*******"
    schedule => { cron => "* * * * *"}
    codec => "json"
    metadata_target => "http_poller_metadata"
  }
}
filter {
  split { field => "result_set" }
}
output {
  elasticsearch {
    hosts => ["1.1.1.6:9200"]
    user => "elastic"
    password => "******"
    index => "xpath"
  }
  #stdout { codec => rubydebug }
}

Here is a sample of my XML:

<data_performance>
  <searchspec>...</searchspec>
  <total_matched elemtype="null"/>
  <total_returned>3</total_returned>
  <result_set elemtype="list">
    <dataset>
      <device>/api/device/695</device>
      <index>0</index>
      <index_label elemtype="null"/>
      <presentation>6397</presentation>
      <field_names elemtype="list">
        <v>collection_time</v>
        <v>data</v>
      </field_names>
      <data elemtype="list">
        <data elemtype="list">
          <v>1554286500</v>
          <v>0.388</v>
        </data>
        <data elemtype="list">
          <v>1554286800</v>
          <v>0.388</v>
        </data>
        <data elemtype="list">
          <v>1554287100</v>
          <v>0.388</v>
        </data>
        <data elemtype="list">
          <v>1554287400</v>
          <v>0.388</v>
        </data>
        <data elemtype="list">
          <v>1554287700</v>
          <v>0.388</v>
        </data>
        <data elemtype="list">
          <v>1554288000</v>
          <v>0.388</v>
        </data>
      </data>
    </dataset>
  </result_set>
</data_performance>

The split is not happening properly, and I'm not sure what mistake I'm making.
Is this really possible with the split filter, or do we have to use xpath for this type of data?

Please advise.

Thanks
Gauti

Does the URL return JSON or XML? I'm not sure why you would have a json codec on the input if it returns XML. If it does return XML, then you would need an xml filter before the split.
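
As a minimal sketch, assuming the endpoint returns the whole XML document in one response body, the input could set the codec to plain so the raw XML lands in the message field. As far as I know http_poller defaults to the json codec, so plain has to be set explicitly rather than just omitted. The key name sciencelogic under urls is only illustrative, and the TLS and auth options from your config are left out for brevity:

```
input {
  http_poller {
    urls => {
      # "sciencelogic" is an arbitrary label for this request
      sciencelogic => "https://poc.sciencelogic.com/api/data_performance_raw/device/dynamic_app?endstamp=1553188800&duration=1h&presentation_objects=6398"
    }
    schedule => { cron => "* * * * *" }
    request_timeout => 60
    # plain keeps the response body as a single string in [message]
    codec => "plain"
    metadata_target => "http_poller_metadata"
  }
}
```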

@Badger It returns XML.

Which settings should I use along with the xml filter? An example would help me understand it properly.

Thanks
Gauti

The documentation has an example:

filter {
  xml {
    source => "message"
  }
}

@Badger I have made some changes to the input and filter sections, but I am not getting any output, either through rubydebug or in Elasticsearch.

Here is my new config:

input {
  http_poller {
    urls => {
      url => "https://poc.sciencelogic.com/api/data_performance_raw/device/dynamic_app?endstamp=1553188800&duration=1h&presentation_objects=6398"
    }
    cacert => "/opt/sciencelogic/pocsciencelogiccom.pem"
    truststore => "/opt/sciencelogic/sciencelogic.jks"
    truststore_password => "*****"
    request_timeout => 60
    user => "admin"
    password => "*****"
    schedule => { cron => "* * * * *"}
    #codec => multiline {
    #     pattern => "<dataset>"
    #     negate => "true"
    #     what => "previous"
    #     }
    metadata_target => "http_poller_metadata"
  }
}
filter {
  xml {
    store_xml => false
    source => "message"
  }
  split { field => "result_set" }
  #split { field => "data" }
}

Am I missing something? Please advise.

Thanks
Gauti

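That's not going to work. With store_xml => false and no xpath option, the xml filter adds no fields to the event, so there is no result_set field to split. You want something more like: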

xml { store_xml => true source => "message" target => "theXML" }
split { field => "[theXML][result_set]" }
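
To spell that out a little, here is a rough sketch of how the nested lists might be unrolled into one event per sample. It assumes the xml filter keeps its default force_array => true and drops the root data_performance element, so every level under theXML is an array. I have not run this against your data, so check the field paths against the rubydebug output before relying on them:

```
filter {
  xml {
    source => "message"
    target => "theXML"
    store_xml => true
  }
  # Each split turns one array level into separate events.
  split { field => "[theXML][result_set]" }
  split { field => "[theXML][result_set][dataset]" }
  # First the outer <data> wrapper, then the inner <data> rows.
  split { field => "[theXML][result_set][dataset][data]" }
  split { field => "[theXML][result_set][dataset][data][data]" }
}
```

If those paths are right, each resulting event should carry one collection_time and value pair in [theXML][result_set][dataset][data][data][v].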

@Badger I have made the changes, but I am still not getting any output.

Config File:

    input {
      http_poller {
        urls => {
          url => "https://poc.sciencelogic.com/api/data_performance_raw/device/dynamic_app?endstamp=1553188800&duration=1h&presentation_objects=6398"
    #       url => "https://poc.sciencelogic.com/api/device?limit=3"
        }
        cacert => "/opt/sciencelogic/pocsciencelogiccom.pem"
        truststore => "/opt/sciencelogic/sciencelogic.jks"
        truststore_password => "*******"
        request_timeout => 60
        user => "*******"
        password => "*****"
        schedule => { cron => "* * * * *"}
    #codec => multiline {
    #     pattern => "<XMLdata>"
    #     negate => "true"
    #     what => "previous"
    #     }
        metadata_target => "http_poller_metadata"
         }
    }
    filter {
      xml {
        store_xml => true
        target => "theXML"
        source => "message"
      }
    #         split { field => "[theXML][result_set]" }
    #         split { field => "data" }
    }

Thanks
Gauti

Are you getting the XML in the rubydebug output on stdout? If so, what does it look like? You may need:

split { field => "[theXML][data_performance][result_set]" }
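
One way to see the actual structure without waiting on the poller is to feed a trimmed copy of the sample through the same xml filter and print the result. This is only a sketch: the generator input and the shortened XML line are stand-ins for the real response, and the printed field layout is what tells you which path to split on:

```
input {
  generator {
    # Emit the test document once, then stop generating.
    count => 1
    lines => [
      '<data_performance><total_returned>1</total_returned><result_set elemtype="list"><dataset><device>/api/device/695</device><data elemtype="list"><data elemtype="list"><v>1554286500</v><v>0.388</v></data><data elemtype="list"><v>1554286800</v><v>0.388</v></data></data></dataset></result_set></data_performance>'
    ]
  }
}
filter {
  xml {
    source => "message"
    target => "theXML"
    store_xml => true
  }
}
output {
  # Inspect the nesting under theXML before adding any split filters.
  stdout { codec => rubydebug }
}
```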

I'm not getting any rubydebug output either. It just says Logstash started successfully, and after that I'm not getting any information on screen.

[2019-04-03T19:24:14,139][INFO ][logstash.outputs.elasticsearch] Attempting to install template {:manage_template=>{"template"=>"logstash-*", "version"=>60001, "settings"=>{"index.refresh_interval"=>"5s"}, "mappings"=>{"_default_"=>{"dynamic_templates"=>[{"message_field"=>{"path_match"=>"message", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false}}}, {"string_fields"=>{"match"=>"*", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false, "fields"=>{"keyword"=>{"type"=>"keyword", "ignore_above"=>256}}}}}], "properties"=>{"@timestamp"=>{"type"=>"date"}, "@version"=>{"type"=>"keyword"}, "geoip"=>{"dynamic"=>true, "properties"=>{"ip"=>{"type"=>"ip"}, "location"=>{"type"=>"geo_point"}, "latitude"=>{"type"=>"half_float"}, "longitude"=>{"type"=>"half_float"}}}}}}}}
[2019-04-03T19:24:15,197][INFO ][logstash.inputs.http_poller] Registering http_poller Input {:type=>nil, :schedule=>{"cron"=>"* * * * *"}, :timeout=>nil}
[2019-04-03T19:24:15,229][INFO ][logstash.pipeline        ] Pipeline started successfully {:pipeline_id=>"main", :thread=>"#<Thread:0x2544efbb run>"}
[2019-04-03T19:24:15,317][INFO ][logstash.agent           ] Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
[2019-04-03T19:24:15,688][INFO ][logstash.agent           ] Successfully started Logstash API endpoint {:port=>9600}

I'm not getting any data on the screen after this line.

Thanks
Gauti

Enable --log.level debug. It will then log a message every time it requests the URL. Verify that this is happening once a minute.
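
While debugging, it may also help to make sure something actually prints to the console. A sketch, assuming the stdout line is still commented out as in your first config, and assuming http_poller tags failed requests with its default _http_request_failure tag:

```
output {
  # Print every event, including failed polls, while debugging.
  stdout { codec => rubydebug }
  # Only index events that came from a successful request.
  if "_http_request_failure" not in [tags] {
    elasticsearch {
      hosts => ["1.1.1.6:9200"]
      user => "elastic"
      password => "******"
      index => "xpath"
    }
  }
}
```

If the request itself is failing, the failure event should then show up on stdout with that tag, which narrows the problem down to the input rather than the filters.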

@Badger I have enabled debug mode but am still not able to get any useful information about what is going on, and there is no sign of any output on stdout.

Thanks
Gauti
