Parsing XML files using LogStash

I have the following logstash conf file:

input
{
    file
    {
        path => "C:\Dashboard\Elmah\*.xml"
        start_position => "beginning"
        type => "error"
        codec => multiline
        {
            pattern => "^<\?error .*\>"
            negate => true
            what => "previous"
        }
        sincedb_path => "C:\Dashboard\Elmah"
    }
}

filter 
{
    xml 
    {
        source => "error"
        xpath => 
        [
            "/error/@errorId", "ErrorId",
            "/error/@type", "Type",
            "/error/@message", "Message",
            "/error/@time", "Time",
            "/error/@user", "User"
        ]
        store_xml => true
    }
}

output 
{
    elasticsearch 
    { 
        action => "index"
        host => "localhost"
        index => "stock"
        workers => 1
    }
    stdout 
    {
        codec => rubydebug
    }
}

When I run bin/logstash -f agent.conf I do not get an error, but no data gets inserted into Elasticsearch. An example input file is here: https://www.dropbox.com/s/6oni2zhorsdtz6p/error-2015-06-26203423Z-3026bd43-07d6-44d6-a6cf-6d27b28a607e.xml?dl=0

How do I get Logstash to read in a collection of external xml files?

LogStash Output:

io/console not supported; tty will not be manipulated
Jul 11, 2015 12:34:09 AM org.elasticsearch.node.internal.InternalNode <init>
INFO: [logstash-AGOEL2-LT-6584-13462] version[1.5.1], pid[6584], build[5e38401/2015-04-09T13:41:35Z]
Jul 11, 2015 12:34:09 AM org.elasticsearch.node.internal.InternalNode <init>
INFO: [logstash-AGOEL2-LT-6584-13462] initializing ...
Jul 11, 2015 12:34:09 AM org.elasticsearch.plugins.PluginsService <init>
INFO: [logstash-AGOEL2-LT-6584-13462] loaded [], sites []
Jul 11, 2015 12:34:11 AM org.elasticsearch.node.internal.InternalNode <init>
INFO: [logstash-AGOEL2-LT-6584-13462] initialized
Jul 11, 2015 12:34:11 AM org.elasticsearch.node.internal.InternalNode start
INFO: [logstash-AGOEL2-LT-6584-13462] starting ...
Jul 11, 2015 12:34:11 AM org.elasticsearch.transport.TransportService doStart
INFO: [logstash-AGOEL2-LT-6584-13462] bound_address {inet[/0:0:0:0:0:0:0:0:9301]}, publish_address {inet[/192.168.1.67:9301]}
Jul 11, 2015 12:34:11 AM org.elasticsearch.discovery.DiscoveryService doStart
INFO: [logstash-AGOEL2-LT-6584-13462] elasticsearch/Xg4w5J-yRmiy1aoisMheZw
Jul 11, 2015 12:34:15 AM org.elasticsearch.cluster.service.InternalClusterService$UpdateTask run
INFO: [logstash-AGOEL2-LT-6584-13462] detected_master [Achilles][wM8JEr9GSg67qfNd-8lvuQ][AGOEL2-LT][inet[/192.168.1.67:9300]], added {[Achilles][wM8JEr9GSg67qfNd-8lvuQ][AGOEL2-LT][inet[/192.168.1.67:9300]],}, reason: zen-disco-receive(from master [[Achilles][wM8JEr9GSg67qfNd-8lvuQ][AGOEL2-LT][inet[/192.168.1.67:9300]]])
Jul 11, 2015 12:34:16 AM org.elasticsearch.node.internal.InternalNode start
INFO: [logstash-AGOEL2-LT-6584-13462] started
Logstash startup completed

ElasticSearch Output:

[2015-07-13 18:30:21,656][WARN ][bootstrap                ] jvm uses the client vm, make sure to run `java` with the server vm for best performance by adding `-server` to the command line
[2015-07-13 18:30:22,379][INFO ][node                     ] [Battering Ram] version[1.6.0], pid[4228], build[cdd3ac4/2015-06-09T13:36:34Z]
[2015-07-13 18:30:22,379][INFO ][node                     ] [Battering Ram] initializing ...
[2015-07-13 18:30:22,441][INFO ][plugins                  ] [Battering Ram] loaded [], sites [head]
[2015-07-13 18:30:22,754][INFO ][env                      ] [Battering Ram] using [1] data paths, mounts [[Default (C:)]], net usable_space [73.6gb], net total_space [297.7gb], types [NTFS]
[2015-07-13 18:30:32,937][INFO ][node                     ] [Battering Ram] initialized
[2015-07-13 18:30:32,938][INFO ][node                     ] [Battering Ram] starting ...
[2015-07-13 18:30:34,146][INFO ][transport                ] [Battering Ram] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/192.168.1.67:9300]}
[2015-07-13 18:30:35,578][INFO ][discovery                ] [Battering Ram] elasticsearch/zQlVTprlR2C23Kmi4yHXsQ
[2015-07-13 18:30:39,449][INFO ][cluster.service          ] [Battering Ram] new_master [Battering Ram][zQlVTprlR2C23Kmi4yHXsQ][AGOEL2-LT][inet[/192.168.1.67:9300]], reason: zen-disco-join (elected_as_master)
[2015-07-13 18:30:39,593][INFO ][gateway                  ] [Battering Ram] recovered [0] indices into cluster_state
[2015-07-13 18:30:40,213][INFO ][http                     ] [Battering Ram] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/192.168.1.67:9200]}
[2015-07-13 18:30:40,214][INFO ][node                     ] [Battering Ram] started
[2015-07-13 18:32:28,782][INFO ][cluster.service          ] [Battering Ram] added {[logstash-AGOEL2-LT-7384-13462][kWlvJTERTqWRxojNZsrTRQ][AGOEL2-LT][inet[/192.168.1.67:9301]]{client=true, data=false},}, reason: zen-disco-receive(join from node[[logstash-AGOEL2-LT-7384-13462][kWlvJTERTqWRxojNZsrTRQ][AGOEL2-LT][inet[/192.168.1.67:9301]]{client=true, data=false}])

If you have already run LS on these files before then it's likely to be a sincedb issue.
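The file input remembers how far it has read in each file via a sincedb file. With Logstash 1.x the default location is under the user's home directory (something like $HOME/.sincedb_*; the exact file name is a hash, so treat this path as an assumption about your setup). To force a full re-read while testing, stop Logstash and delete those files, e.g. on Windows:

del "%USERPROFILE%\.sincedb_*"

Also note that sincedb_path expects a file path, not a directory.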

Thanks, Mark.
I am an Elasticsearch newbie, so please forgive me if my questions are very basic.
I have added sincedb_path => "C:\Dashboard\Elmah" to the file plugin, but I still do not see any documents in Elasticsearch. How do we debug these kinds of issues?

Have you tried to process these files with LS before?

I have not. This is my first attempt to parse XML files with Logstash so they can be stored in Elasticsearch. My understanding is that this should be a common use case. Would you have any additional pointers on what could be wrong?
Before this, I was able to successfully parse and load files into Elasticsearch using ElasticSearch.net.

Thanks,
Ajit Goel

I suspect the multiline codec isn't emitting the first message because it's waiting for the start of the second message, but that second message never comes since the file only contains a single message.

The file/multiline combination just isn't very good at slurping a whole file into a single message, and this isn't the first time someone has had issues doing it.
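Two things worth checking. First, your sample file's root element starts with <error ...>, not <?error ...>, so the pattern "^<\?error .*\>" may never match at all, in which case every line just keeps getting appended to the pending message. Second, newer versions of the multiline codec have an auto_flush_interval option that flushes a pending message after a period of inactivity, which sidesteps the "waiting for the next message" problem. A sketch, assuming your codec version supports the option:

codec => multiline
{
    pattern => "^<error "
    negate => true
    what => "previous"
    auto_flush_interval => 2
}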

Thanks, Magnus. In this case, what would you suggest?
If nothing else works, I will write a .Net client to insert the data into Elasticsearch directly. I was trying to avoid writing a program that processes millions of XML files and also has to keep track of which files and folders have already been parsed.

The main problem here is the multiline filter, so if you could just get the contents of your XML files on a single line you'll be okay. That should be easy as long as the files aren't too big (not sure if there are any problematic line length limitations). You could e.g. write a small piece of code that reads the input XML files, joins the lines into a single line, and appends to a file that Logstash monitors via the file input.

That way you also have full control over the lifecycle of the XML files, so if you want to delete the source file once it's been taken care of you can do that. Otherwise you might end up asking how to get Logstash to delete files it has processed, which it can't do since the file input has no concept of "this file is done".
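A minimal sketch of such a pre-processor in Python (the paths, file names, and the delete-after-processing step are illustrative assumptions, not part of your current setup):

# flatten_elmah.py - collapse each ELMAH XML file onto a single line and
# append it to a file that the Logstash file input monitors.
import glob
import os

SRC_DIR = r"C:\Dashboard\Elmah"            # where ELMAH drops its *.xml files (assumed)
OUT_FILE = r"C:\Dashboard\flattened.log"   # the file Logstash tails (assumed)

with open(OUT_FILE, "a", encoding="utf-8") as out:
    for path in glob.glob(os.path.join(SRC_DIR, "*.xml")):
        with open(path, encoding="utf-8") as src:
            # join all lines of the document into a single line
            one_line = " ".join(line.strip() for line in src)
        out.write(one_line + "\n")
        os.remove(path)  # optional: delete the source file once it has been written out

Run it periodically (e.g. from the Task Scheduler) and point the file input at the flattened file instead of the XML files themselves.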

Magnus Bäck: For debugging purposes, I have kept just one file in the input C:\Dashboard\Elmah folder. The data in that file is on just one line. Logstash still does not parse and insert the data into Elasticsearch. Would you have any further input on what could be wrong?

Sample file:

<error errorId="18f62f4e-e9ab-49e0-b135-cdb15434316c" host="RPIDALWEB344" type="Realpage.Crossfire.Exception.InvalidModelStateException" message="Error when saving occupants information" detail="Realpage.Crossfire.Exception.InvalidModelStateException: Error when saving occupants information" user="orchardsapp1" time="2015-07-11T02:01:22.7052586Z"></error>

Make sure you have start_position => beginning, and keep in mind that this only matters for unseen files. You may have to delete the associated sincedb file or create a new file that's covered by the input's filename pattern.

Also, keep ES out of the picture for now. Make sure Logstash is reading the file(s) and producing well-formed messages to stdout.
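For example, a stripped-down config along these lines (a sketch: sincedb_path => "NUL" points at the Windows null device so read positions are never persisted and every run starts from scratch; forward slashes in the glob tend to be safer on Windows):

input
{
    file
    {
        path => "C:/Dashboard/Elmah/*.xml"
        start_position => "beginning"
        sincedb_path => "NUL"
    }
}
output
{
    stdout
    {
        codec => rubydebug
    }
}

Once events show up correctly on stdout, add the xml filter back, and only then the elasticsearch output.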

It seems that Logstash logs a "Failed to install template: Connection refused: connect" error when it tries to process the XML files. Please see the output log from running the Logstash command with the --debug parameter:
https://www.dropbox.com/s/g7g1154uvf9fr1f/outputlog2.txt?dl=0

Any ideas on what could be wrong?

You have configured Logstash to connect to Elasticsearch at 192.168.1.67:9200, but either nothing is listening on that host:port combination or a firewall is blocking access.
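A quick way to verify is to request the HTTP port directly from the machine running Logstash:

curl http://192.168.1.67:9200

If Elasticsearch is up and reachable you'll get a small JSON status document back; a "connection refused" here confirms a network or firewall problem rather than a Logstash configuration problem.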

Hi Magnus,
I want to import an XML file into Elasticsearch through Logstash.
I created a config file as follows:
input
{
    file
    {
        path => "D:/logstash-5.0.0/logstash-5.0.0/bin/data.xml"
        start_position => "beginning"
        sincedb_path => "/dev/null"
        exclude => "*.gz"
    }
}
filter
{
    xml
    {
        source => "message"
    }
}
output
{
    elasticsearch
    {
        codec => json
        hosts => "localhost"
        index => "xmldata"
    }
    stdout
    {
        codec => rubydebug
    }
}

I have created a small data.xml file, and the XML record is on one line only.
But I am getting the following error.

D:\logstash-5.0.0\logstash-5.0.0\bin>logstash -f D:\logstash-5.0.0\logstash-5.0.0\bin\logstash-xmlConfig.conf
Sending Logstash logs to D:/logstash-5.0.0/logstash-5.0.0/logs which is now configured via log4j2.properties.
[2016-12-01T19:59:21,474][INFO ][logstash.outputs.elasticsearch] Elasticsearch pool URLs updated {:changes=>{:removed=>[], :added=>["http://localhost:9200"]}}
[2016-12-01T19:59:21,477][INFO ][logstash.outputs.elasticsearch] Using mapping template from {:path=>nil}
[2016-12-01T19:59:21,631][INFO ][logstash.outputs.elasticsearch] Attempting to install template {:manage_template=>{"template"=>"logstash-*", "version"=>50001, "settings"=>{"index.refresh_interval"=>"5s"}, "mappings"=>{"_default_"=>{"_all"=>{"enabled"=>true, "norms"=>false}, "dynamic_templates"=>[{"message_field"=>{"path_match"=>"message", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false}}}, {"string_fields"=>{"match"=>"*", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false, "fields"=>{"keyword"=>{"type"=>"keyword"}}}}}], "properties"=>{"@timestamp"=>{"type"=>"date", "include_in_all"=>false}, "@version"=>{"type"=>"keyword", "include_in_all"=>false}, "geoip"=>{"dynamic"=>true, "properties"=>{"ip"=>{"type"=>"ip"}, "location"=>{"type"=>"geo_point"}, "latitude"=>{"type"=>"half_float"}, "longitude"=>{"type"=>"half_float"}}}}}}}}
[2016-12-01T19:59:21,638][INFO ][logstash.outputs.elasticsearch] New Elasticsearch output {:class=>"LogStash::Outputs::ElasticSearch", :hosts=>["localhost"]}
[2016-12-01T19:59:23,329][ERROR][logstash.agent ] Pipeline aborted due to error {:exception=>#<LogStash::ConfigurationError: translation missing: en.logstash.agent.configuration.invalid_plugin_register>, :backtrace=>["D:/logstash-5.0.0/logstash-5.0.0/vendor/bundle/jruby/1.9/gems/logstash-filter-xml-4.0.1/lib/logstash/filters/xml.rb:106:in `register'", "D:/logstash-5.0.0/logstash-5.0.0/logstash-core/lib/logstash/pipeline.rb:197:in `start_workers'", "org/jruby/RubyArray.java:1613:in `each'", "D:/logstash-5.0.0/logstash-5.0.0/logstash-core/lib/logstash/pipeline.rb:197:in `start_workers'", "D:/logstash-5.0.0/logstash-5.0.0/logstash-core/lib/logstash/pipeline.rb:153:in `run'", "D:/logstash-5.0.0/logstash-5.0.0/logstash-core/lib/logstash/agent.rb:250:in `start_pipeline'"]}
[2016-12-01T19:59:23,455][INFO ][logstash.agent ] Successfully started Logstash API endpoint {:port=>9600}
[2016-12-01T19:59:26,354][WARN ][logstash.agent ] stopping pipeline {:id=>"main"}

I am using the 5.x versions of Elasticsearch and Logstash.

Thanks in advance.
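For what it's worth, the backtrace above points at the xml filter's register method, and with logstash-filter-xml 4.x that registration failure typically means store_xml is enabled (the 4.x default) while no target is set. A sketch of a filter block that should register, assuming you want the parsed document stored under a doc field:

filter
{
    xml
    {
        source => "message"
        target => "doc"
    }
}

Alternatively, set store_xml => false and extract individual fields with xpath, as in the original question.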

Please start your own thread, this one is a year old!
