Indexing data in elasticsearch through logstash using xml files


(Andrey Querejeta) #1

Hello, everybody:
I am new to ELK, although I have managed to index data into Elasticsearch through Logstash using the CSV filter.
For several days I have tried to use the XML filter without results. I have read several posts but have not found a solution to the problem.
The index is not created in Elasticsearch, and nothing shows up in the log.
I would appreciate any help, because I don't know what else to do. I've really tried everything, but I haven't been able to index anything into ES using this filter.

XML file


1 1 25 2012 1 2012/01/20 15:37:35 2012/01/01 00:00:00 1 TICK 2012/01/20 15:37:35 3 26 1000 2012 1 2012/02/07 15:19:34 2012/01/01 00:00:00 3 TICK 2012/02/07 15:19:34

logstash.conf


input {
  file {
    path => "C:/Bitnami/elk/logstash/data/tabla.xml"
    start_position => "beginning"
    sincedb_path => "C:/Bitnami/elk/logstash/data/dev/nullo"
    codec => multiline {
      pattern => "^<?main .*>"
      negate => true
      what => "previous"
    }
  }
}

filter {
  xml {
    source => "message"
    force_array => false
    remove_namespaces => true
    store_xml => true
    target => "doc"
    xpath => [
      "//main/DATA_RECORD/IDRANGODOCTRAZA/text()", "myid",
      "//main/DATA_RECORD/CODIGOTIPODOC/text()", "codigo"
    ]
  }
  mutate {
    remove_field => ["message", "@metadata"]
  }
}

output {
  elasticsearch {
    index => "table"
    hosts => ["127.0.0.1:9200"]
  }
  stdout { codec => rubydebug }
}
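
One thing that may be worth checking is whether any line of the file matches the multiline pattern at all. The sketch below is only a rough check: it uses Python's `re` module as a stand-in for Logstash's regex engine (an assumption, although for a pattern this simple the two should behave the same), `sample_lines` is an abbreviated copy of the posted XML, and `starts_new_event` is a hypothetical helper name.

```python
import re

# Pattern copied verbatim from the multiline codec in the config above.
PATTERN = re.compile(r"^<?main .*>")

# Abbreviated copy of the posted XML file, one entry per line.
sample_lines = [
    "<main>",
    "<DATA_RECORD>",
    "<IDRANGODOCTRAZA>1</IDRANGODOCTRAZA>",
    "<CODIGOTIPODOC>TICK</CODIGOTIPODOC>",
    "</DATA_RECORD>",
    "</main>",
]


def starts_new_event(line: str) -> bool:
    """Return True if the line matches the pattern.

    With negate => true and what => "previous", a matching line starts a
    new event and every non-matching line is appended to the previous one.
    """
    return PATTERN.search(line) is not None


# Lines that would start a new event under this pattern.
matches = [line for line in sample_lines if starts_new_event(line)]
print(matches)
```

In this sketch the list comes back empty, because the pattern requires a space after `main` and the file contains `<main>` without one. If the same holds in Logstash, every line would be appended to a single pending event. Trying a pattern that matches a line actually present in the file, or the codec's `auto_flush_interval` option, might help; this is a guess and is not confirmed anywhere in this thread.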


(Andrey Querejeta) #2

Here is my XML again:
<main>
<DATA_RECORD>
<IDRANGODOCTRAZA>1</IDRANGODOCTRAZA>
<DESDE>1</DESDE>
<HASTA>25</HASTA>
<ANNO>2012</ANNO>
<IDTIPOOPERMOVDOC>1</IDTIPOOPERMOVDOC>
<FECHAMOV>2012/01/20 15:37:35</FECHAMOV>
<FECHAENTREGA>2012/01/01 00:00:00</FECHAENTREGA>
<ID_USUA>1</ID_USUA>
<CODIGOTIPODOC>TICK</CODIGOTIPODOC>
<FECHACREACION>2012/01/20 15:37:35</FECHACREACION>
</DATA_RECORD>
<DATA_RECORD>
<IDRANGODOCTRAZA>3</IDRANGODOCTRAZA>
<DESDE>26</DESDE>
<HASTA>1000</HASTA>
<ANNO>2012</ANNO>
<IDTIPOOPERMOVDOC>1</IDTIPOOPERMOVDOC>
<FECHAMOV>2012/02/07 15:19:34</FECHAMOV>
<FECHAENTREGA>2012/01/01 00:00:00</FECHAENTREGA>
<ID_USUA>3</ID_USUA>
<CODIGOTIPODOC>TICK</CODIGOTIPODOC>
<FECHACREACION>2012/02/07 15:19:34</FECHACREACION>
</DATA_RECORD>
<DATA_RECORD>
<IDRANGODOCTRAZA>4</IDRANGODOCTRAZA>
<DESDE>26</DESDE>
<HASTA>50</HASTA>
<ANNO>2012</ANNO>
<IDTIPOOPERMOVDOC>3</IDTIPOOPERMOVDOC>
<FECHAMOV>2012/02/07 15:27:29</FECHAMOV>
<FECHAENTREGA>2012/01/01 00:00:00</FECHAENTREGA>
<ID_VEND>8</ID_VEND>
<ID_USUA>3</ID_USUA>
<CODIGOTIPODOC>TICK</CODIGOTIPODOC>
<FECHACREACION>2012/02/07 15:27:29</FECHACREACION>
</DATA_RECORD>
<DATA_RECORD>
<IDRANGODOCTRAZA>5</IDRANGODOCTRAZA>
<DESDE>51</DESDE>
<HASTA>1000</HASTA>
<ANNO>2012</ANNO>
<IDTIPOOPERMOVDOC>3</IDTIPOOPERMOVDOC>
<FECHAMOV>2012/03/23 10:51:02</FECHAMOV>
<FECHAENTREGA>2012/01/01 00:00:00</FECHAENTREGA>
<ID_VEND>3</ID_VEND>
<ID_USUA>5</ID_USUA>
<CODIGOTIPODOC>TICK</CODIGOTIPODOC>
<FECHACREACION>2012/03/23 10:51:02</FECHACREACION>
</DATA_RECORD>
<DATA_RECORD>
<IDRANGODOCTRAZA>6</IDRANGODOCTRAZA>
<DESDE>51</DESDE>
<HASTA>1000</HASTA>
<ANNO>2012</ANNO>
<IDTIPOOPERMOVDOC>4</IDTIPOOPERMOVDOC>
<FECHAMOV>2012/03/23 11:03:19</FECHAMOV>
<FECHAENTREGA>2012/01/01 00:00:00</FECHAENTREGA>
<ID_VEND>3</ID_VEND>
<ID_USUA>5</ID_USUA>
<CODIGOTIPODOC>TICK</CODIGOTIPODOC>
<FECHACREACION>2012/03/23 11:03:19</FECHACREACION>
</DATA_RECORD>
</main>
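
As a rough sanity check that the document is well formed and that the two paths used in the filter point at real elements, here is a small sketch using Python's standard `xml.etree.ElementTree`. It is not the Logstash xml filter: ElementTree supports only a subset of XPath (no `text()`, and paths are relative to the root element), so the expressions are adapted, and `xml_text` is an abbreviated copy of the file above.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the posted file: two records, two fields each.
xml_text = """<main>
<DATA_RECORD>
<IDRANGODOCTRAZA>1</IDRANGODOCTRAZA>
<CODIGOTIPODOC>TICK</CODIGOTIPODOC>
</DATA_RECORD>
<DATA_RECORD>
<IDRANGODOCTRAZA>3</IDRANGODOCTRAZA>
<CODIGOTIPODOC>TICK</CODIGOTIPODOC>
</DATA_RECORD>
</main>"""

# fromstring raises ParseError if the document is not well formed.
root = ET.fromstring(xml_text)

# root is the <main> element, so the paths start below it.
# ElementTree has no text() step, so the .text attribute is read instead.
myid = [e.text for e in root.findall("./DATA_RECORD/IDRANGODOCTRAZA")]
codigo = [e.text for e in root.findall("./DATA_RECORD/CODIGOTIPODOC")]

print(myid)
print(codigo)
```

Each path returns one value per `DATA_RECORD`. So if the whole file reaches the filter as a single event, `myid` and `codigo` would presumably each hold a list of values rather than one value per document.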


(Andrey Querejeta) #4

Thank you for answering. I tried your suggestion, but it didn't work either. I think there's something wrong with my ELK installation, but I don't know what it could be. Here is the Logstash log:
[2018-11-27T08:35:56,309][WARN ][logstash.config.source.multilocal] Ignoring the 'pipelines.yml' file because modules or command line options are specified
[2018-11-27T08:35:56,400][INFO ][logstash.runner ] Starting Logstash {"logstash.version"=>"6.4.2"}
[2018-11-27T08:35:57,935][INFO ][logstash.pipeline ] Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>8, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50}
[2018-11-27T08:35:58,105][INFO ][logstash.outputs.elasticsearch] Elasticsearch pool URLs updated {:changes=>{:removed=>[], :added=>[http://127.0.0.1:9200/]}}
[2018-11-27T08:35:58,106][INFO ][logstash.outputs.elasticsearch] Running health check to see if an Elasticsearch connection is working {:healthcheck_url=>http://127.0.0.1:9200/, :path=>"/"}
[2018-11-27T08:35:58,177][WARN ][logstash.outputs.elasticsearch] Restored connection to ES instance {:url=>"http://127.0.0.1:9200/"}
[2018-11-27T08:35:58,214][INFO ][logstash.outputs.elasticsearch] ES Output version determined {:es_version=>6}
[2018-11-27T08:35:58,214][WARN ][logstash.outputs.elasticsearch] Detected a 6.x and above cluster: the type event field won't be used to determine the document _type {:es_version=>6}
[2018-11-27T08:35:58,216][INFO ][logstash.outputs.elasticsearch] New Elasticsearch output {:class=>"LogStash::Outputs::ElasticSearch", :hosts=>["//127.0.0.1:9200"]}
[2018-11-27T08:35:58,217][INFO ][logstash.outputs.elasticsearch] Using mapping template from {:path=>nil}
[2018-11-27T08:35:58,221][INFO ][logstash.outputs.elasticsearch] Elasticsearch pool URLs updated {:changes=>{:removed=>[], :added=>[http://127.0.0.1:9200/]}}
[2018-11-27T08:35:58,222][INFO ][logstash.outputs.elasticsearch] Running health check to see if an Elasticsearch connection is working {:healthcheck_url=>http://127.0.0.1:9200/, :path=>"/"}
[2018-11-27T08:35:58,222][INFO ][logstash.outputs.elasticsearch] Attempting to install template {:manage_template=>{"template"=>"logstash-*", "version"=>60001, "settings"=>{"index.refresh_interval"=>"5s"}, "mappings"=>{"_default_"=>{"dynamic_templates"=>[{"message_field"=>{"path_match"=>"message", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false}}}, {"string_fields"=>{"match"=>"*", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false, "fields"=>{"keyword"=>{"type"=>"keyword", "ignore_above"=>256}}}}}], "properties"=>{"@timestamp"=>{"type"=>"date"}, "@version"=>{"type"=>"keyword"}, "geoip"=>{"dynamic"=>true, "properties"=>{"ip"=>{"type"=>"ip"}, "location"=>{"type"=>"geo_point"}, "latitude"=>{"type"=>"half_float"}, "longitude"=>{"type"=>"half_float"}}}}}}}}
[2018-11-27T08:35:58,228][WARN ][logstash.outputs.elasticsearch] Restored connection to ES instance {:url=>"http://127.0.0.1:9200/"}
[2018-11-27T08:35:58,245][INFO ][logstash.outputs.elasticsearch] ES Output version determined {:es_version=>6}
[2018-11-27T08:35:58,245][WARN ][logstash.outputs.elasticsearch] Detected a 6.x and above cluster: the type event field won't be used to determine the document _type {:es_version=>6}
[2018-11-27T08:35:58,247][INFO ][logstash.outputs.elasticsearch] New Elasticsearch output {:class=>"LogStash::Outputs::ElasticSearch", :hosts=>["//127.0.0.1:9200"]}
[2018-11-27T08:35:58,247][INFO ][logstash.outputs.elasticsearch] Using mapping template from {:path=>nil}
[2018-11-27T08:35:58,249][INFO ][logstash.outputs.elasticsearch] Attempting to install template {:manage_template=>{"template"=>"logstash-*", "version"=>60001, "settings"=>{"index.refresh_interval"=>"5s"}, "mappings"=>{"_default_"=>{"dynamic_templates"=>[{"message_field"=>{"path_match"=>"message", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false}}}, {"string_fields"=>{"match"=>"*", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false, "fields"=>{"keyword"=>{"type"=>"keyword", "ignore_above"=>256}}}}}], "properties"=>{"@timestamp"=>{"type"=>"date"}, "@version"=>{"type"=>"keyword"}, "geoip"=>{"dynamic"=>true, "properties"=>{"ip"=>{"type"=>"ip"}, "location"=>{"type"=>"geo_point"}, "latitude"=>{"type"=>"half_float"}, "longitude"=>{"type"=>"half_float"}}}}}}}}
[2018-11-27T08:35:59,759][INFO ][filewatch.observingtail ] START, creating Discoverer, Watch with file and sincedb collections
[2018-11-27T08:35:59,759][INFO ][filewatch.observingtail ] START, creating Discoverer, Watch with file and sincedb collections
[2018-11-27T08:35:59,760][INFO ][logstash.pipeline ] Pipeline started successfully {:pipeline_id=>"main", :thread=>"#<Thread:0x546e06b8@C:/Bitnami/elk/logstash/logstash-core/lib/logstash/pipeline.rb:157 sleep>"}
[2018-11-27T08:35:59,774][INFO ][logstash.agent ] Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
[2018-11-27T08:35:59,869][INFO ][logstash.agent ] Successfully started Logstash API endpoint {:port=>9600}


(Lewis Barclay) #5

Sorry, I just realised my mistake; your original config was correct. There is no error in your log, however, and Logstash is starting correctly?


(Andrey Querejeta) #6

Apparently ELK is working. In fact, I have tested other functionality in Kibana with other indices and everything works well; the problem is that it does not index my XML data. Can you give me an example of a logstash.conf and an XML file that work for you, so that I can test them on another computer? I don't know what else I can do. I posted this topic yesterday because I had already tried several examples of this filter without results. Thanks in advance.