XML load with Logstash into Elasticsearch

Hi,

I am trying to build a process that loads an XML file into Elasticsearch with Logstash.

I use ELK 7.9.0 on Windows.

This is my config file:
input {
  file {
    path => "C:/Talend/workspace/data/giata/geography/geography.xml"
    start_position => "beginning"
    sincedb_path => "nul"
    exclude => "*.gz"
    type => "xml"
    codec => multiline {
      pattern => "^<?countries.*>"
      negate => "true"
      what => "previous"
      auto_flush_interval => 1
      max_lines => 3000
    }
  }
}

filter {
  xml {
    source => "message"
    target => "parsed"
    store_xml => false
    xpath => [
      "/countries/country/countryCode", "countryCode"
    ]
  }
}

output {
  elasticsearch {
    hosts => "localhost:9200"
    index => "cities"
    user => elastic
    password => hibahiba
  }
  stdout {}
}

And this is the result:

[2021-02-09T17:13:52,302][INFO ][org.reflections.Reflections] Reflections took 42 ms to scan 1 urls, producing 22 keys and 45 values
[2021-02-09T17:13:54,946][INFO ][logstash.outputs.elasticsearch][main] Elasticsearch pool URLs updated {:changes=>{:removed=>[], :added=>[http://elastic:xxxxxx@localhost:9200/]}}
[2021-02-09T17:13:55,193][WARN ][logstash.outputs.elasticsearch][main] Restored connection to ES instance {:url=>"http://elastic:xxxxxx@localhost:9200/"}
[2021-02-09T17:13:55,248][INFO ][logstash.outputs.elasticsearch][main] ES Output version determined {:es_version=>7}
[2021-02-09T17:13:55,253][WARN ][logstash.outputs.elasticsearch][main] Detected a 6.x and above cluster: the type event field won't be used to determine the document _type {:es_version=>7}
[2021-02-09T17:13:55,301][INFO ][logstash.outputs.elasticsearch][main] New Elasticsearch output {:class=>"LogStash::Outputs::ElasticSearch", :hosts=>["//localhost:9200"]}
[2021-02-09T17:13:55,353][INFO ][logstash.outputs.elasticsearch][main] Using a default mapping template {:es_version=>7, :ecs_compatibility=>:disabled}
[2021-02-09T17:13:55,422][INFO ][logstash.outputs.elasticsearch][main] Attempting to install template {:manage_template=>{"index_patterns"=>"logstash-*", "version"=>60001, "settings"=>{"index.refresh_interval"=>"5s", "number_of_shards"=>1}, "mappings"=>{"dynamic_templates"=>[{"message_field"=>{"path_match"=>"message", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false}}}, {"string_fields"=>{"match"=>"*", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false, "fields"=>{"keyword"=>{"type"=>"keyword", "ignore_above"=>256}}}}}], "properties"=>{"@timestamp"=>{"type"=>"date"}, "@version"=>{"type"=>"keyword"}, "geoip"=>{"dynamic"=>true, "properties"=>{"ip"=>{"type"=>"ip"}, "location"=>{"type"=>"geo_point"}, "latitude"=>{"type"=>"half_float"}, "longitude"=>{"type"=>"half_float"}}}}}}}
[2021-02-09T17:13:57,095][INFO ][logstash.javapipeline ][main] Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>8, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50, "pipeline.max_inflight"=>1000, "pipeline.sources"=>["C:/ELK/ELK.7.9.0/logstash-7.9.0/logstash-7.9.0/bin/cities.conf"], :thread=>"#<Thread:0x53bd177 run>"}
[2021-02-09T17:13:57,951][INFO ][logstash.javapipeline ][main] Pipeline Java execution initialization time {"seconds"=>0.85}
[2021-02-09T17:13:58,455][INFO ][logstash.javapipeline ][main] Pipeline started {"pipeline.id"=>"main"}
[2021-02-09T17:13:58,524][INFO ][filewatch.observingtail ][main][5cc77fc600c7d47f00b9f6b636904fa0759cb6ac5e7fd4af5ffd4689848973ab] START, creating Discoverer, Watch with file and sincedb collections
[2021-02-09T17:13:58,528][INFO ][logstash.agent ] Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
[2021-02-09T17:13:58,978][INFO ][logstash.agent ] Successfully started Logstash API endpoint {:port=>9600}
{
"message" => "<?xml version=\"1.0\" encoding=\"UTF-8\"?>",
"@timestamp" => 2021-02-09T16:14:00.611Z,
"@version" => "1",
"type" => "xml",
"host" => "DE4",
"path" => "C:/Talend/workspace/data/giata/geography/geography.xml"
}
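The event printed above contains only the XML declaration line, so there was no countries element for the XPath to match. A commonly used way to get a whole XML file into a single event is to let only the declaration line start a new event and append every other line to it. This is a minimal sketch, assuming the file begins with <?xml and ends with a newline (the file input only emits complete lines). The max_lines and max_bytes values are illustrative placeholders, not tuned numbers.

```
input {
  file {
    path => "C:/Talend/workspace/data/giata/geography/geography.xml"
    start_position => "beginning"
    sincedb_path => "nul"
    codec => multiline {
      # Only the XML declaration starts a new event; with negate => true and
      # what => "previous", every other line is appended to it.
      pattern => "^<\?xml"
      negate => true
      what => "previous"
      # No later line will start a new event, so flush the buffer after 1 second.
      auto_flush_interval => 1
      # Illustrative limits (assumption): raise them if the file is larger.
      max_lines => 100000
      max_bytes => "50 MiB"
    }
  }
}
```

With the whole document in one message field, the xml filter can parse it as a single well-formed document.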

My XML file looks like this:

.....................................

I want to load all the data.

Any help, please!

With XPath you have to specify whether you want the text value of an XML node or one of its attributes. If your data is:

<countries>
  <country>
    <countryCode>US</countryCode>
  </country>
</countries>

Then you need to point the XPath at the value: /countries/country/countryCode/text()
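A minimal sketch of the value case in the xml filter, assuming the countries sample above is the whole document in the message field. The destination field name countryCode is an illustrative choice.

```
filter {
  xml {
    source => "message"
    store_xml => false
    # Pair: XPath expression, then the event field that receives the matches.
    # text() selects the text content of each countryCode element.
    xpath => [ "/countries/country/countryCode/text()", "countryCode" ]
  }
}
```

For the sample, countryCode should end up as an array holding "US", because the filter collects XPath matches into an array.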

However, if it looks like:

<countries>
  <country countryCode="US" />
</countries>

Then you need to point the XPath at the attribute: /countries/country/@countryCode
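The attribute case only changes the XPath expression. This sketch again assumes the sample above is the whole document and uses countryCode as an illustrative destination field.

```
filter {
  xml {
    source => "message"
    store_xml => false
    # @countryCode selects the value of the countryCode attribute on each country element.
    xpath => [ "/countries/country/@countryCode", "countryCode" ]
  }
}
```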

Hi @wwalker,

Thanks for the reply.

This is my XML:

How can I load this XML into Elasticsearch?

You need to start from the root element, so for the countryCode attribute of the country node it would be /geography/countries/country/@countryCode
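If the whole file arrives as one event, this path should return every country code at once in a single array field. If you want one Elasticsearch document per country (an assumption about the goal, not something stated in this thread), one option is to follow the xml filter with a split filter on that array. A sketch, with countryCode as an illustrative field name:

```
filter {
  xml {
    source => "message"
    store_xml => false
    # Pair: XPath expression starting at the root, then the destination field.
    xpath => [
      "/geography/countries/country/@countryCode", "countryCode"
    ]
  }
  # Emit one event per element of the countryCode array.
  split {
    field => "countryCode"
  }
}
```

Each resulting event carries a single code. Any other per-country values would need their own extraction.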

I got this error:

Sending Logstash logs to C:/ELK/ELK.7.9.0/logstash-7.9.0/logstash-7.9.0/logs which is now configured via log4j2.properties
[2021-02-12T16:27:19,497][INFO ][logstash.runner ] Starting Logstash {"logstash.version"=>"7.9.0", "jruby.version"=>"jruby 9.2.12.0 (2.5.7) 2020-07-01 db01a49ba6 Java HotSpot(TM) 64-Bit Server VM 25.211-b12 on 1.8.0_211-b12 +indy +jit [mswin32-x86_64]"}
[2021-02-12T16:27:19,692][WARN ][logstash.config.source.multilocal] Ignoring the 'pipelines.yml' file because modules or command line options are specified
[2021-02-12T16:27:21,370][INFO ][org.reflections.Reflections] Reflections took 33 ms to scan 1 urls, producing 22 keys and 45 values
[2021-02-12T16:27:22,422][ERROR][logstash.filters.xml ] Invalid setting for xml filter plugin:

filter {
xml {
# This setting must be a hash
# This field must contain an even number of items, got 1
xpath => ["/geography/countries/country/@countryCode"]
...
}
}
[2021-02-12T16:27:22,430][ERROR][logstash.agent ] Failed to execute action {:action=>LogStash::PipelineAction::Create/pipeline_id:main, :exception=>"Java::JavaLang::IllegalStateException", :message=>"Unable to configure plugins: (ConfigurationError) Something is wrong with your configuration.", :backtrace=>["org.logstash.config.ir.CompiledPipeline.<init>(CompiledPipeline.java:119)", "org.logstash.execution.JavaBasePipelineExt.initialize(JavaBasePipelineExt.java:82)", "org.logstash.execution.JavaBasePipelineExt$INVOKER$i$1$0$initialize.call(JavaBasePipelineExt$INVOKER$i$1$0$initialize.gen)", "org.jruby.internal.runtime.methods.JavaMethod$JavaMethodN.call(JavaMethod.java:837)", "org.jruby.ir.runtime.IRRuntimeHelpers.instanceSuper(IRRuntimeHelpers.java:1169)", "org.jruby.ir.runtime.IRRuntimeHelpers.instanceSuperSplatArgs(IRRuntimeHelpers.java:1156)", "org.jruby.ir.targets.InstanceSuperInvokeSite.invoke(InstanceSuperInvokeSite.java:39)", "C_3a_.ELK.ELK_dot_7_dot_9_dot_0.logstash_minus_7_dot_9_dot_0.logstash_minus_7_dot_9_dot_0.logstash_minus_core.lib.logstash.java_pipeline.RUBY$method$initialize$0(C:/ELK/ELK.7.9.0/logstash-7.9.0/logstash-7.9.0/logstash-core/lib/logstash/java_pipeline.rb:44)", "org.jruby.internal.runtime.methods.CompiledIRMethod.call(CompiledIRMethod.java:82)", "org.jruby.internal.runtime.methods.MixedModeIRMethod.call(MixedModeIRMethod.java:70)", "org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:332)", "org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:86)", "org.jruby.RubyClass.newInstance(RubyClass.java:939)", "org.jruby.RubyClass$INVOKER$i$newInstance.call(RubyClass$INVOKER$i$newInstance.gen)", "org.jruby.ir.targets.InvokeSite.invoke(InvokeSite.java:207)", 
"C_3a_.ELK.ELK_dot_7_dot_9_dot_0.logstash_minus_7_dot_9_dot_0.logstash_minus_7_dot_9_dot_0.logstash_minus_core.lib.logstash.pipeline_action.create.RUBY$method$execute$0(C:/ELK/ELK.7.9.0/logstash-7.9.0/logstash-7.9.0/logstash-core/lib/logstash/pipeline_action/create.rb:52)", "C_3a_.ELK.ELK_dot_7_dot_9_dot_0.logstash_minus_7_dot_9_dot_0.logstash_minus_7_dot_9_dot_0.logstash_minus_core.lib.logstash.pipeline_action.create.RUBY$method$execute$0$VARARGS(C:/ELK/ELK.7.9.0/logstash-7.9.0/logstash-7.9.0/logstash-core/lib/logstash/pipeline_action/create.rb)", "org.jruby.internal.runtime.methods.CompiledIRMethod.call(CompiledIRMethod.java:82)", "org.jruby.internal.runtime.methods.MixedModeIRMethod.call(MixedModeIRMethod.java:70)", "org.jruby.ir.targets.InvokeSite.invoke(InvokeSite.java:207)", "C_3a_.ELK.ELK_dot_7_dot_9_dot_0.logstash_minus_7_dot_9_dot_0.logstash_minus_7_dot_9_dot_0.logstash_minus_core.lib.logstash.agent.RUBY$block$converge_state$2(C:/ELK/ELK.7.9.0/logstash-7.9.0/logstash-7.9.0/logstash-core/lib/logstash/agent.rb:357)", "org.jruby.runtime.CompiledIRBlockBody.callDirect(CompiledIRBlockBody.java:138)", "org.jruby.runtime.IRBlockBody.call(IRBlockBody.java:58)", "org.jruby.runtime.IRBlockBody.call(IRBlockBody.java:52)", "org.jruby.runtime.Block.call(Block.java:139)", "org.jruby.RubyProc.call(RubyProc.java:318)", "org.jruby.internal.runtime.RubyRunnable.run(RubyRunnable.java:105)", "java.lang.Thread.run(Unknown Source)"]}
warning: thread "Converge PipelineAction::Create" terminated with exception (report_on_exception is true):
LogStash::Error: Don't know how to handle Java::JavaLang::IllegalStateException for PipelineAction::Create<main>
create at org/logstash/execution/ConvergeResultExt.java:129
add at org/logstash/execution/ConvergeResultExt.java:57
converge_state at C:/ELK/ELK.7.9.0/logstash-7.9.0/logstash-7.9.0/logstash-core/lib/logstash/agent.rb:370
[2021-02-12T16:27:22,441][ERROR][logstash.agent ] An exception happened when converging configuration {:exception=>LogStash::Error, :message=>"Don't know how to handle Java::JavaLang::IllegalStateException for PipelineAction::Create<main>"}
[2021-02-12T16:27:22,490][FATAL][logstash.runner ] An unexpected error occurred! {:error=>#<LogStash::Error: Don't know how to handle Java::JavaLang::IllegalStateException for PipelineAction::Create<main>>, :backtrace=>["org/logstash/execution/ConvergeResultExt.java:129:in `create'", "org/logstash/execution/ConvergeResultExt.java:57:in `add'", "C:/ELK/ELK.7.9.0/logstash-7.9.0/logstash-7.9.0/logstash-core/lib/logstash/agent.rb:370:in `block in converge_state'"]}
[2021-02-12T16:27:22,510][ERROR][org.logstash.Logstash ] java.lang.IllegalStateException: Logstash stopped processing because of an error: (SystemExit) exit
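The error message points at the cause: xpath must be a hash, meaning each XPath expression needs a destination field. The config shown in the log passes a single string, which is one item. This sketch writes the same setting as a hash, with countryCode as an assumed destination field name.

```
filter {
  xml {
    source => "message"
    store_xml => false
    xpath => {
      # XPath expression => event field that receives the matches
      "/geography/countries/country/@countryCode" => "countryCode"
    }
  }
}
```

A two-element array such as [ "/geography/countries/country/@countryCode", "countryCode" ] should also be accepted. Logstash turns an array with an even number of items into a hash, which is why the error message counts items.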