Elasticsearch xml file

Hana_Ne · June 4, 2018, 5:37pm

Hi, I need to index an XML file into Elasticsearch.
My XML file looks like this:

<Talk Speaker = "Alastair Parvin" Title= " Architecture for the people by the people" >
	<Segment id ="1" >
		<Time-slot>00:00:12,884 --> 00:00:16,053</Time-slot>
		<Original_text lang="en"></Original_text>
		<Translation lang="ar"></Translation>
		<Translation lang="fr"></Translation>
	</Segment>
</Talk>
</MulTed>

I need help please

dadoonet · June 4, 2018, 5:56pm

You need to transform it into a JSON document first.

You can use Logstash if needed, or do it yourself, depending on the real source of this content.

Hana_Ne · June 4, 2018, 6:04pm

OK, I'll try using Logstash, thanks.

Hana_Ne · June 4, 2018, 6:16pm

I tried using Logstash but I get an error:

input {
  file {
    path => "C:/Users/Dev/Desktop/file1.xml"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    type => "xml"
    codec => multiline {
      pattern => "^<\?Multed .*\>"
      negate => "true"
      what => "previous"
    }
  }
}
filter {
  xml {
    source => "message"
    target => "Multed"
    xpath => [
      "/Multed/Talk/Segment/@id", "id",
      "/Multed/Talk/Segment/Original_text/text()", "original_text"
    ]
  }
  mutate {
    remove_field => [ "message" ]
    add_field => ["IDIndexed", "%{id}"]
    add_field => ["Original_text", "%{original_text}"]
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "indexXml"
  }
  stdout {
    codec => rubydebug
  }
}

The error is in %{id} and %{original_text}.

dadoonet · June 4, 2018, 6:38pm

Please don't post images of text, as they are hard to read and not searchable.

Instead, paste the text and format it with the </> icon. Check the preview window.

I moved your question to #logstash.

Hana_Ne · June 4, 2018, 6:43pm

Ok
    {
        "type" => "xml",
        "IDIndexed" => "%{id}",
        "@timestamp" => 2018-06-04T18:16:49.466Z,
        "host" => "Dev-PC",
        "path" => "C:/Users/Dev/Desktop/file1.xml",
        "@version" => "1",
        "Original_text" => "%{original_text}",
        "tags" => [
            [0] "multiline",
            [1] "multiline_codec_max_lines_reached",
            [2] "_xmlparsefailure"
        ]
    }

magnusbaeck · June 4, 2018, 8:17pm

The multiline codec is incorrectly configured. Which line from the XML file is ^<\?Multed .*\> supposed to match?

Hana_Ne · June 5, 2018, 12:38am

`^<\?Multed .*\>` is meant to match the root of the document:

        <Multed>
        <Talk Speaker = "Alastair Parvin" Title= " Architecture for the people by the people" >
        	<Segment id ="1" >
        		<Time-slot>00:00:12,884 --> 00:00:16,053</Time-slot>
        		<Original_text lang="en"></Original_text>
        		<Translation lang="ar"></Translation>
        		<Translation lang="fr"></Translation>
        	</Segment>
        </Talk>
        </MulTed>

magnusbaeck · June 5, 2018, 10:47am

The regular expression <\?Multed .*\> does not match any of the lines in your example document.
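
To make this concrete, here is a minimal Python sketch (not Logstash itself) that tests the pattern against each line of the example document. It assumes that Python's `re` module treats this simple pattern the same way as the Ruby regex engine Logstash uses; the names `PATTERN`, `SAMPLE_LINES` and `matching_lines` are made up for illustration.

```python
import re

# Pattern copied from the multiline codec configuration in this thread.
# Assumption: for a pattern this simple, Python's `re` behaves like the Ruby
# regex engine that Logstash uses (`\?` and `\>` match a literal `?` and `>`).
PATTERN = re.compile(r"^<\?Multed .*\>")

# The example document from the thread, one string per line.
SAMPLE_LINES = [
    '<Multed>',
    '<Talk Speaker = "Alastair Parvin" Title= " Architecture for the people by the people" >',
    '    <Segment id ="1" >',
    '        <Time-slot>00:00:12,884 --> 00:00:16,053</Time-slot>',
    '        <Original_text lang="en"></Original_text>',
    '        <Translation lang="ar"></Translation>',
    '        <Translation lang="fr"></Translation>',
    '    </Segment>',
    '</Talk>',
    '</MulTed>',
]


def matching_lines(pattern, lines):
    """Return the lines that the pattern matches, tested one line at a time."""
    return [line for line in lines if pattern.search(line)]


if __name__ == "__main__":
    # No line starts with the literal text "<?Multed ", so nothing matches.
    print(matching_lines(PATTERN, SAMPLE_LINES))  # prints []
```

The `\?` in the pattern demands a literal question mark after `<`, and the space after `Multed` demands at least one attribute or trailing space, so a plain `<Multed>` line can never match.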

Hana_Ne · June 5, 2018, 4:28pm

Which regular expression is correct?

Badger · June 5, 2018, 5:18pm

You could try

        codec => multiline {
            pattern =>  "^<MulTed>"
            negate => "true"
            what => "previous"
            auto_flush_interval => 2
        }

Hana_Ne · June 5, 2018, 5:37pm

The same problem

Badger · June 5, 2018, 5:47pm

If the XML is indented then get rid of the start-of-line anchor and use

pattern =>  "<MulTed>"
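
As a rough illustration of why the anchor matters, the Python sketch below models how `negate => "true"` with `what => "previous"` groups lines: a line that matches the pattern starts a new event, and every other line is appended to the event before it. This is a simplified stand-in, not the codec's actual implementation (it ignores `max_lines` and `auto_flush_interval`), and `group_lines` and the sample lines are hypothetical.

```python
import re


def group_lines(lines, pattern):
    """Simplified model of multiline with negate => "true", what => "previous".

    A line matching `pattern` starts a new event; any other line is appended
    to the previous event. The very first line always starts an event.
    """
    regex = re.compile(pattern)
    events = []
    for line in lines:
        if regex.search(line) or not events:
            events.append([line])
        else:
            events[-1].append(line)
    return ["\n".join(event) for event in events]


# Hypothetical input: two indented documents, one after the other.
DOCUMENT = [
    "    <MulTed>",
    "      <Talk>",
    "      </Talk>",
    "    </MulTed>",
]
LINES = DOCUMENT + DOCUMENT

if __name__ == "__main__":
    # With the anchor, the indented "<MulTed>" lines never match, so both
    # documents end up merged into a single event.
    print(len(group_lines(LINES, "^<MulTed>")))  # prints 1
    # Without the anchor, each "<MulTed>" line starts a new event.
    print(len(group_lines(LINES, "<MulTed>")))   # prints 2
```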

Hana_Ne · June 5, 2018, 5:56pm

I still get the same wrong values:

        input {
            file {
                path => "C:/Users/Dev/Desktop/file1.xml"
                start_position => "beginning"
                sincedb_path => "/dev/null"
                type => "xml"
                codec => multiline {
                    pattern => "<MulTed>"
                    negate => "true"
                    what => "previous"
                    auto_flush_interval => 2
                }
            }
        }
        filter {
            xml {
                source => "Talk"
                target => "MulTed"
                xpath => [
                    "MulTed/Talk/Segment/@id", "id",
                    "MulTed/Talk/Segment/Original_text/text()", "original_text"
                ]
            }
            mutate {
                remove_field => [ "message" ]
                add_field => ["IDIndexed", "%{id}"]
                add_field => ["Original_text", "%{original_text}"]
            }
        }
        output {
            elasticsearch {
                hosts => ["localhost:9200"]
                index => "senind"
            }
            stdout {
                codec => rubydebug
            }
        }

    {
        "tags" => [
            [0] "multiline"
        ],
        "Original_text" => "%{original_text}",
        "@version" => "1",
        "path" => "C:/Users/Dev/Desktop/file1.xml",
        "type" => "xml",
        "host" => "Dev-PC",
        "@timestamp" => 2018-06-05T17:54:47.115Z,
        "IDIndexed" => "%{id}"
    }

Badger · June 5, 2018, 6:27pm

There is no error there. The multiline codec worked.

Hana_Ne · June 5, 2018, 6:29pm

But the values of %{id} and %{original_text} are not inserted.

Badger · June 5, 2018, 6:33pm

That's because your xpath expressions are wrong. They refer to Multed, but the XML has MulTed. Or perhaps the other way around. Either way, it is case sensitive. Also, Original_text/text() is empty.

Note also that xpath always returns arrays, so you might want to

if [id] { mutate { replace => { "id" => "%{[id][0]}" } } }
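
The three points above (case sensitivity, the empty text node, and array results) can be reproduced outside Logstash. The sketch below uses Python's `xml.etree.ElementTree`, which supports only a subset of XPath and is not the engine the xml filter uses, so treat it as an analogy. It assumes a well-formed document whose root is spelled `MulTed` in both tags; the `select` helper is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed input: the thread's example with the root spelled "MulTed" in both
# the opening and the closing tag, so that it parses as well-formed XML.
XML = """<MulTed>
<Talk Speaker="Alastair Parvin" Title="Architecture for the people by the people">
  <Segment id="1">
    <Time-slot>00:00:12,884 --> 00:00:16,053</Time-slot>
    <Original_text lang="en"></Original_text>
    <Translation lang="ar"></Translation>
    <Translation lang="fr"></Translation>
  </Segment>
</Talk>
</MulTed>"""

ROOT = ET.fromstring(XML)


def select(root, path):
    """Evaluate a simple path such as '/MulTed/Talk/Segment' from the root.

    Always returns a list, and compares element names case-sensitively.
    """
    first, _, rest = path.strip("/").partition("/")
    if first != root.tag:  # "Multed" is not the same name as "MulTed"
        return []
    return root.findall(rest) if rest else [root]


if __name__ == "__main__":
    # Correct case: one Segment found; the result is a list, not a scalar.
    ids = [seg.get("id") for seg in select(ROOT, "/MulTed/Talk/Segment")]
    print(ids)     # prints ['1']
    print(ids[0])  # prints 1 (take the first element explicitly)

    # Wrong case on the root element: nothing is selected.
    print(select(ROOT, "/Multed/Talk/Segment"))  # prints []

    # Original_text exists but has no text content.
    texts = [el.text for el in select(ROOT, "/MulTed/Talk/Segment/Original_text")]
    print(texts)   # prints [None]
```

Taking `ids[0]` plays the same role as `%{[id][0]}` in the mutate filter above: the lookup yields a list even when only one node matches.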

Hana_Ne · June 5, 2018, 6:44pm

OK, thanks, I'll try it.

Hana_Ne · June 5, 2018, 7:08pm

Still the same error. Thanks.

Badger · June 5, 2018, 9:51pm

OK, so comment out 'remove_field => [ "message" ]' and show us what an event looks like, either using stdout { codec => rubydebug }, or copy and paste from the JSON event in Kibana.
