Hi,
I am trying to load an XML file into Elasticsearch with Logstash.
I'm using ELK 7.9.0 on Windows.
This is my config file:
```
input {
  file {
    path => "C:/Talend/workspace/data/giata/geography/geography.xml"
    start_position => beginning
    sincedb_path => "nul"
    exclude => "*.gz"
    type => "xml"
    codec => multiline {
      pattern => "^<?countries.*>"
      negate => "true"
      what => "previous"
      auto_flush_interval => 1
      max_lines => 3000
    }
  }
}
filter {
  xml {
    source => "message"
    target => "parsed"
    store_xml => false
    xpath => [
      "/countries/country/countryCode", "countryCode"
    ]
  }
}
output {
  elasticsearch {
    hosts => "localhost:9200"
    index => "cities"
    user => elastic
    password => hibahiba
  }
  stdout {}
}
```
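For reference, a commonly suggested way to turn a whole XML file into a single event is a multiline pattern that matches only the first line, so that every other line is appended to it. This is a minimal sketch, assuming the file starts with an `<?xml ...?>` declaration and holds exactly one document; the `max_lines` and `max_bytes` values are placeholders to size to the real file:

```
input {
  file {
    path => "C:/Talend/workspace/data/giata/geography/geography.xml"
    start_position => "beginning"
    sincedb_path => "nul"          # Windows null device: the read position is not remembered
    codec => multiline {
      # Every line that does NOT start with the XML declaration is appended
      # to the previous one, so the whole document ends up in one event.
      pattern => "^<\?xml"
      negate => true
      what => "previous"
      # No later "<?xml" line will ever close the event, so flush it
      # after 1 second without new lines.
      auto_flush_interval => 1
      # Placeholder limits (assumption): raise them if the file is bigger.
      max_lines => 100000
      max_bytes => "50 MiB"
    }
  }
}
```

The file input only emits a line once it has read the line terminator, so if the closing root tag is the last line of the file with no trailing newline, it may never be read.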
And this is the result:
```
[2021-02-09T17:13:52,302][INFO ][org.reflections.Reflections] Reflections took 42 ms to scan 1 urls, producing 22 keys and 45 values
[2021-02-09T17:13:54,946][INFO ][logstash.outputs.elasticsearch][main] Elasticsearch pool URLs updated {:changes=>{:removed=>[], :added=>[http://elastic:xxxxxx@localhost:9200/]}}
[2021-02-09T17:13:55,193][WARN ][logstash.outputs.elasticsearch][main] Restored connection to ES instance {:url=>"http://elastic:xxxxxx@localhost:9200/"}
[2021-02-09T17:13:55,248][INFO ][logstash.outputs.elasticsearch][main] ES Output version determined {:es_version=>7}
[2021-02-09T17:13:55,253][WARN ][logstash.outputs.elasticsearch][main] Detected a 6.x and above cluster: the `type` event field won't be used to determine the document _type {:es_version=>7}
[2021-02-09T17:13:55,301][INFO ][logstash.outputs.elasticsearch][main] New Elasticsearch output {:class=>"LogStash::Outputs::ElasticSearch", :hosts=>["//localhost:9200"]}
[2021-02-09T17:13:55,353][INFO ][logstash.outputs.elasticsearch][main] Using a default mapping template {:es_version=>7, :ecs_compatibility=>:disabled}
[2021-02-09T17:13:55,422][INFO ][logstash.outputs.elasticsearch][main] Attempting to install template {:manage_template=>{"index_patterns"=>"logstash-*", "version"=>60001, "settings"=>{"index.refresh_interval"=>"5s", "number_of_shards"=>1}, "mappings"=>{"dynamic_templates"=>[{"message_field"=>{"path_match"=>"message", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false}}}, {"string_fields"=>{"match"=>"*", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false, "fields"=>{"keyword"=>{"type"=>"keyword", "ignore_above"=>256}}}}}], "properties"=>{"@timestamp"=>{"type"=>"date"}, "@version"=>{"type"=>"keyword"}, "geoip"=>{"dynamic"=>true, "properties"=>{"ip"=>{"type"=>"ip"}, "location"=>{"type"=>"geo_point"}, "latitude"=>{"type"=>"half_float"}, "longitude"=>{"type"=>"half_float"}}}}}}}
[2021-02-09T17:13:57,095][INFO ][logstash.javapipeline ][main] Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>8, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50, "pipeline.max_inflight"=>1000, "pipeline.sources"=>["C:/ELK/ELK.7.9.0/logstash-7.9.0/logstash-7.9.0/bin/cities.conf"], :thread=>"#<Thread:0x53bd177 run>"}
[2021-02-09T17:13:57,951][INFO ][logstash.javapipeline ][main] Pipeline Java execution initialization time {"seconds"=>0.85}
[2021-02-09T17:13:58,455][INFO ][logstash.javapipeline ][main] Pipeline started {"pipeline.id"=>"main"}
[2021-02-09T17:13:58,524][INFO ][filewatch.observingtail ][main][5cc77fc600c7d47f00b9f6b636904fa0759cb6ac5e7fd4af5ffd4689848973ab] START, creating Discoverer, Watch with file and sincedb collections
[2021-02-09T17:13:58,528][INFO ][logstash.agent ] Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
[2021-02-09T17:13:58,978][INFO ][logstash.agent ] Successfully started Logstash API endpoint {:port=>9600}
{
       "message" => "<?xml version=\"1.0\" encoding=\"UTF-8\"?>",
    "@timestamp" => 2021-02-09T16:14:00.611Z,
      "@version" => "1",
          "type" => "xml",
          "host" => "DE4",
          "path" => "C:/Talend/workspace/data/giata/geography/geography.xml"
}
```
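The only event printed above is the XML declaration line, and it has no `countryCode` field. To see whether the rest of the document reaches the filter and fails to parse, here is a sketch of an output section that prints events carrying `_xmlparsefailure` (the tag the xml filter adds when it cannot parse its source) instead of indexing them. The credentials are left out and would be the same as in the config above:

```
output {
  if "_xmlparsefailure" in [tags] {
    # Events the xml filter could not parse, e.g. an incomplete fragment
    stdout { codec => rubydebug }
  } else {
    elasticsearch {
      hosts => "localhost:9200"
      index => "cities"
      # user / password as in the original output section
    }
  }
}
```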
My XML file looks like this:
.....................................
I want to load all the data.
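To load all the data as one document per country, one possible sketch, assuming the root element is `<countries>` with repeated `<country>` children (as the xpath in my config implies) and that the whole file arrives as a single event in `message`: extract each `<country>` element as an XML string, split that array into separate events, then parse each fragment. `country_xml` and `country` are illustrative field names, not anything required by the plugins:

```
filter {
  # 1) Extract every <country> element of the whole-document event as an XML string.
  xml {
    source    => "message"
    store_xml => false
    xpath     => [ "/countries/country", "country_xml" ]
  }
  # 2) Emit one event per element of the country_xml array.
  split {
    field => "country_xml"
  }
  # 3) Parse the single <country> fragment into a structured field.
  xml {
    source      => "country_xml"
    target      => "country"
    store_xml   => true
    force_array => false
  }
  # Drop the raw XML once it has been parsed.
  mutate {
    remove_field => [ "message", "country_xml" ]
  }
}
```

As a side note, `target` is only used when `store_xml => true`, and an xpath such as `/countries/country/countryCode/text()` returns the text values rather than the serialized `<countryCode>` elements.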
Any help, please!