Hana_Ne (Hana Ne), June 4, 2018, 5:37pm, post 1
Hi, I need to index an XML file into Elasticsearch.
My XML file looks like this:
<Talk Speaker = "Alastair Parvin" Title= " Architecture for the people by the people" >
	<Segment id ="1" >
		<Time-slot>00:00:12,884 --> 00:00:16,053</Time-slot>
		<Original_text lang="en"></Original_text>
		<Translation lang="ar"></Translation>
		<Translation lang="fr"></Translation>
	</Segment>
</Talk>
</MulTed>
 
I need help, please.
             
            
               
               
               
            
            
           
          
            
              
dadoonet (David Pilato), June 4, 2018, 5:56pm, post 2
You need to transform it into a JSON document first.
You can use Logstash if needed, or do it yourself, depending on what the real source of this content is.
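
For the "do it yourself" route, the transformation could look roughly like the sketch below. It is only an illustration in Python using the standard library: the field names (`speaker`, `segment_id`, and so on) are made up for the example, and it assumes the root element is spelled `<MulTed>` in both the opening and closing tag.

```python
import json
import xml.etree.ElementTree as ET

# Sample based on the XML in the question; the root tag spelling is an assumption.
SAMPLE = """<MulTed>
<Talk Speaker="Alastair Parvin" Title=" Architecture for the people by the people">
  <Segment id="1">
    <Time-slot>00:00:12,884 --> 00:00:16,053</Time-slot>
    <Original_text lang="en"></Original_text>
    <Translation lang="ar"></Translation>
    <Translation lang="fr"></Translation>
  </Segment>
</Talk>
</MulTed>"""


def segments_to_docs(xml_text):
    """Return one flat dict per <Segment>, ready to be serialized as JSON."""
    root = ET.fromstring(xml_text)
    docs = []
    for talk in root.iter("Talk"):
        for seg in talk.iter("Segment"):
            docs.append({
                # Hypothetical field names, chosen only for this example.
                "speaker": talk.get("Speaker"),
                "title": (talk.get("Title") or "").strip(),
                "segment_id": seg.get("id"),
                "time_slot": seg.findtext("Time-slot"),
                "original_text": seg.findtext("Original_text") or "",
                "translations": {
                    t.get("lang"): (t.text or "")
                    for t in seg.findall("Translation")
                },
            })
    return docs


docs = segments_to_docs(SAMPLE)
for doc in docs:
    # Each line printed here is one JSON document that could be indexed.
    print(json.dumps(doc, ensure_ascii=False))
```

Each segment becomes its own document here, which is one possible design; keeping a whole talk as a single nested document would be equally valid, depending on how the data will be searched.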
             
            
               
               
            
            
           
          
            
              
Hana_Ne (Hana Ne), June 4, 2018, 6:04pm, post 3
OK, I will try using Logstash, thanks.
             
            
               
               
               
            
            
           
          
            
              
Hana_Ne (Hana Ne), June 4, 2018, 6:16pm, post 4
I tried using Logstash, but I get an error:
input {
    file {
        path => "C:/Users/Dev/Desktop/file1.xml"
        start_position => "beginning"
        sincedb_path => "/dev/null"
        type => "xml"
        codec => multiline {
            pattern => "^<\?Multed .*\>"
            negate => "true"
            what => "previous"
        }
    }
}
filter {
    xml {
        source => "message"
        target => "Multed"
        xpath => [
            "/Multed/Talk/Segment/@id", "id",
            "/Multed/Talk/Segment/Original_text/text()", "original_text"
        ]
    }
    mutate {
        remove_field => [ "message" ]
        add_field => ["IDIndexed", "%{id}"]
        add_field => ["Original_text", "%{original_text}"]
    }
}
output {
    elasticsearch {
        hosts => ["localhost:9200"]
        index => "indexXml"
    }
    stdout {
        codec => rubydebug
    }
}
 
 
The error is in %{id} and %{original_text}: they are not replaced with values.
             
            
               
               
               
            
            
           
          
            
              
dadoonet (David Pilato), June 4, 2018, 6:38pm, post 5
Please don't post images of text, as they are hard to read and not searchable.
Instead, paste the text and format it with the </> icon. Check the preview window.
I moved your question to #logstash.
             
            
               
               
               
            
            
           
          
            
              
Hana_Ne (Hana Ne), June 4, 2018, 6:43pm, post 6
              Ok 
{ 
"type" => "xml", 
"IDIndexed" => "%{id}", 
"@timestamp " => 2018-06-04T18:16:49.466Z, 
"host" => "Dev-PC", 
"path" => "C:/Users/Dev/Desktop/file1.xml", 
"@version " => "1", 
"Original_text" => "%{original_text}", 
"tags" => [ 
[0] "multiline", 
[1] "multiline_codec_max_lines_reached", 
[2] "_xmlparsefailure" 
] 
}
             
            
               
               
               
            
            
           
          
            
            
              The multiline codec is incorrectly configured. Which line from the XML file is ^<\?Multed .*\> supposed to match?
             
            
               
               
               
            
            
           
          
            
              
Hana_Ne (Hana Ne), June 5, 2018, 12:38am, post 8
`^<\?Multed .*\>` is meant to match the root of the document:
        <Multed>
        <Talk Speaker = "Alastair Parvin" Title= " Architecture for the people by the people" >
        	<Segment id ="1" >
        		<Time-slot>00:00:12,884 --> 00:00:16,053</Time-slot>
        		<Original_text lang="en"></Original_text>
        		<Translation lang="ar"></Translation>
        		<Translation lang="fr"></Translation>
        	</Segment>
        </Talk>
        </MulTed> 
             
            
               
               
               
            
            
           
          
            
            
              The regular expression <\?Multed .*\> does not match any of the lines in your example document.
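
A quick way to see this is to run the pattern over the sample lines. The sketch below uses Python's `re` module as a stand-in for the regex engine Logstash uses, which is an assumption, though both treat `\?` as a literal question mark. The `WOULD_MATCH` line is a made-up example of what the pattern actually expects.

```python
import re

# The pattern from the multiline codec in the question.
PATTERN = re.compile(r"^<\?Multed .*\>")

# Lines of the sample document, with indentation stripped.
LINES = [
    "<Multed>",
    '<Talk Speaker = "Alastair Parvin" Title= " Architecture for the people by the people" >',
    '<Segment id ="1" >',
    "<Time-slot>00:00:12,884 --> 00:00:16,053</Time-slot>",
    '<Original_text lang="en"></Original_text>',
    '<Translation lang="ar"></Translation>',
    '<Translation lang="fr"></Translation>',
    "</Segment>",
    "</Talk>",
    "</MulTed>",
]

# No line starts with the literal text "<?Multed " so nothing matches.
matching = [line for line in LINES if PATTERN.search(line)]
print(matching)

# Hypothetical line that the pattern would accept: note the "<?" and the space.
WOULD_MATCH = "<?Multed something>"
print(bool(PATTERN.search(WOULD_MATCH)))
```

The escaped `\?` requires a question mark right after `<`, as in an XML declaration such as `<?xml ...?>`, and the pattern also requires a space after `Multed`. The root tag in the sample has neither.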
             
            
               
               
               
            
            
           
          
            
              
Hana_Ne (Hana Ne), June 5, 2018, 4:28pm, post 10
Which regular expression is correct?
             
            
               
               
               
            
            
           
          
            
              
Badger, June 5, 2018, 5:18pm, post 11
              You could try
        codec => multiline {
            pattern =>  "^<MulTed>"
            negate => "true"
            what => "previous"
            auto_flush_interval => 2
        }
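
To illustrate what `negate => "true"` with `what => "previous"` does, here is a simplified model in Python. It is not Logstash code and ignores limits such as the maximum number of lines per event. It only shows the grouping rule: any line that does not match the pattern is appended to the previous line, so each matching line starts a new event. The final flush of the last buffered event plays the role of `auto_flush_interval`.

```python
import re


def group_multiline(lines, pattern):
    """Group lines into events: a line matching `pattern` starts a new event,
    every other line is appended to the event before it."""
    regex = re.compile(pattern)
    events = []
    current = []
    for line in lines:
        if regex.search(line) and current:
            # A new root tag was seen, so the buffered event is complete.
            events.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        # Without a later matching line the last event must be flushed explicitly.
        events.append("\n".join(current))
    return events


# Two small documents back to back (contents shortened for the example).
lines = [
    "<MulTed>", "<Talk>", "</Talk>", "</MulTed>",
    "<MulTed>", "<Talk/>", "</MulTed>",
]
events = group_multiline(lines, r"^<MulTed>")
print(len(events))
print(events[0])
```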
 
             
            
               
               
               
            
            
           
          
            
              
Badger, June 5, 2018, 5:47pm, post 13
              If the XML is indented then get rid of the start-of-line anchor and use
pattern =>  "<MulTed>" 
             
            
               
               
               
            
            
           
          
            
              
Hana_Ne (Hana Ne), June 5, 2018, 5:56pm, post 14
I get the same error in the values:
input {
    file {
        path => "C:/Users/Dev/Desktop/file1.xml"
        start_position => "beginning"
        sincedb_path => "/dev/null"
        type => "xml"
        codec => multiline {
            pattern => "<MulTed>"
            negate => "true"
            what => "previous"
            auto_flush_interval => 2
        }
    }
}
filter {
    xml {
        source => "Talk"
        target => "MulTed"
        xpath => [
            "MulTed/Talk/Segment/@id", "id",
            "MulTed/Talk/Segment/Original_text/text()", "original_text"
        ]
    }
    mutate {
        remove_field => [ "message" ]
        add_field => ["IDIndexed", "%{id}"]
        add_field => ["Original_text", "%{original_text}"]
    }
}
output {
    elasticsearch {
        hosts => ["localhost:9200"]
        index => "senind"
    }
    stdout {
        codec => rubydebug
    }
}
 
{ 
"tags" => [ 
[0] "multiline" 
], 
"Original_text" => "%{original_text}", 
"@version " => "1", 
"path" => "C:/Users/Dev/Desktop/file1.xml", 
"type" => "xml", 
"host" => "Dev-PC", 
"@timestamp " => 2018-06-05T17:54:47.115Z, 
"IDIndexed" => "%{id}" 
}
             
            
               
               
               
            
            
           
          
            
              
Badger, June 5, 2018, 6:27pm, post 15
              There is no error there. The multiline codec worked.
             
            
               
               
               
            
            
           
          
            
              
Hana_Ne (Hana Ne), June 5, 2018, 6:29pm, post 16
But the values of %{id} and %{original_text} are not inserted.
             
            
               
               
               
            
            
           
          
            
              
Badger, June 5, 2018, 6:33pm, post 17
              That's because your xpath expressions are wrong. They refer to Multed, but the XML has MulTed. Or perhaps the other way around. Either way, it is case sensitive. Also, Original_text/text() is empty.
Note also that xpath always returns arrays, so you might want to
if [id] { mutate { replace => { "id" => "%{[id][0]}" } } } 
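
These points can be checked outside Logstash. The sketch below uses Python's `xml.etree.ElementTree`, which supports only a subset of XPath, so it is an illustration of the behaviour rather than the xml filter itself. The trimmed document and the helper `is_well_formed` are made up for the example.

```python
import xml.etree.ElementTree as ET

# Trimmed version of the sample, with a consistently spelled root tag.
DOC = """<MulTed>
  <Talk Speaker="Alastair Parvin">
    <Segment id="1">
      <Original_text lang="en"></Original_text>
    </Segment>
  </Talk>
</MulTed>"""

root = ET.fromstring(DOC)

# Element names are case sensitive: "Talk" matches, "talk" does not.
right_case = root.findall("Talk/Segment")
wrong_case = root.findall("talk/Segment")

# Lookups return lists even for a single hit, so take element 0 explicitly.
ids = [seg.get("id") for seg in right_case]
first_id = ids[0] if ids else None

# An empty element has no text node, so there is nothing to extract.
original_text = right_case[0].find("Original_text").text


def is_well_formed(xml_text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


print(len(right_case), len(wrong_case), first_id, original_text)
# An opening <Multed> closed by </MulTed> is a tag mismatch.
print(is_well_formed("<Multed><Talk/></MulTed>"))
```

The last check matters for the sample posted earlier in the thread, where the opening root tag is `<Multed>` and the closing one is `</MulTed>`: a strict parser rejects that document before any XPath expression is evaluated.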
             
            
               
               
            
            
           
          
            
              
Badger, June 5, 2018, 9:51pm, post 20
              OK, so comment out 'remove_field => [ "message" ]' and show us what an event looks like, either using stdout { codec => rubydebug }, or copy and paste from the JSON event in Kibana.