Ingest XML file using Logstash

Hi,

I am very new to Elasticsearch and Logstash, and I am trying to ingest an XML file using Logstash.
My XML file looks something like this:

<control version='1'
<cust_details must_id='101'
cust_name="Stacy"
cust_city="Chicago"
cust_state="illinois </>
</control>

<prod_details prod_id="Prod123"
prod_name="Chips"
prod_aisle="B1"
prod_available="yes"
</prod_details>

<response trasaction_status="yes"
total amount="$1010"
</repsonse>

This is what I have done so far:

input {
beats{
port=>5044
}
fie{
path=>"file path.xml"
start_position=>"beginning"
type="xml"
codex=>multiline{
pattern=>"^<?control.*>"
negate="true"
what="previous"
auto_flush_interval=>1
max_lines=>3000
}
}
}
output{
Elasticsearch{
hosts=>["localhost:9200"]
index=>"Logstash"
}
}

I don't know how to process it from here.

Can anyone help?

Please edit your post, select the XML, and click on </> in the toolbar above the edit panel. That will change the display from

<cust_details must_id="101"
cust_name="Stacy"
cust_city="Chicago"
cust_state="illinois </>

to

<cust_details must_id="101"
cust_name="Stacy"
cust_city="Chicago"
cust_state="illinois </>

then do the same for the logstash configuration.

The codex option should be codec.

Your pattern appears to be wrong. If you want to combine all the lines that follow the <control version="1" line up until the next occurrence of that pattern then use

pattern => "^<control"

The values for negate and what look OK.

Note that your prod_details element is not valid XML. If it really looks like that you will get parse failures.

Hi,

Can you give me a general idea of how to do it? I can't paste the actual XML file because its contents are sensitive, but it is pretty much like the example I gave. It has different tags with various values in them.

Any leads would be helpful. I have not found any videos or links that show how to properly parse XML data into Elasticsearch, so I'll take any leads you can give me.

If you want to consume the entire file as a single event then you can do something like this.
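
A minimal sketch of one common way to do that, not necessarily the exact configuration meant above: give the multiline codec a pattern that never matches any line, so that with negate set to true every line is appended to the first one. The path, the sincedb_path setting and the pattern text are assumptions for illustration only.

```
input {
  file {
    # Hypothetical path; replace with the real location of the XML file.
    path => "/path/to/file.xml"
    start_position => "beginning"
    # Assumption for repeated testing on Linux: do not remember the read position.
    sincedb_path => "/dev/null"
    codec => multiline {
      # Assumed never to occur at the start of a line in the file,
      # so every line is joined to the previous one.
      pattern => "^PatternThatNeverMatches"
      negate => true
      what => "previous"
      # Flush the accumulated event after 1 second without new lines.
      auto_flush_interval => 1
      # Raise this if the file has more lines than this limit.
      max_lines => 3000
    }
  }
}
```

The auto_flush_interval matters here because no later line ever matches the pattern, so nothing else would close the event.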

If your file contains multiple XML documents you must consume them separately. If they all start with <control then your multiline codec should be

codec => multiline {
    pattern => "^<control"
    negate => "true"
    what => "previous"
    auto_flush_interval => 1
    max_lines => 3000
}

If you want to parse the entire message then just use

xml { source => "message" store_xml => true target => "theXML" }

If you need specific elements from the document you can use the xpath option instead of setting store_xml.
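
As a rough sketch of the xpath approach, assuming the document is well-formed and that cust_details is nested inside control as in the sample from the question, something like this would pull the three customer attributes into fields of their own. The destination field names on the right are just illustrative choices.

```
filter {
  xml {
    source => "message"
    # Do not keep the whole parsed document, only the xpath results.
    store_xml => false
    # Each entry maps an XPath expression to a destination field.
    xpath => {
      "/control/cust_details/@cust_name"  => "cust_name"
      "/control/cust_details/@cust_city"  => "cust_city"
      "/control/cust_details/@cust_state" => "cust_state"
    }
  }
}
```

The xml filter stores xpath results as arrays, so expect each of these fields to hold a list even when there is only one match.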

The xml filter typically just works. The hard part is tweaking the multiline codec so that each event contains a complete XML document.
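
Putting the pieces from this thread together, a complete pipeline might look roughly like the sketch below. The file path and the index name are placeholders, and it assumes every document in the file starts with <control at the beginning of a line.

```
input {
  file {
    # Hypothetical path; replace with the real file.
    path => "/path/to/file.xml"
    start_position => "beginning"
    codec => multiline {
      # Lines that do not start with <control are appended to the previous event.
      pattern => "^<control"
      negate => true
      what => "previous"
      auto_flush_interval => 1
      max_lines => 3000
    }
  }
}

filter {
  xml {
    source => "message"
    store_xml => true
    target => "theXML"
  }
}

output {
  # Print each event to the console to check what the codec and filter produced.
  stdout { codec => rubydebug }
  elasticsearch {
    hosts => ["localhost:9200"]
    # Placeholder name; Elasticsearch index names must be lowercase.
    index => "logstash-xml"
  }
}
```

While tuning the codec it helps to watch the stdout output first: an event tagged _xmlparsefailure usually means the message did not contain one complete, well-formed document.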

Is there a video or a document I can follow, just to understand the different field names and the values we can give them?

Considering my previous XML file, I would like to get the different fields within each tag.
Say, for example, we have cust_details. I want to extract the different fields available in it: cust_name, cust_city, cust_state.
How do I achieve this?

Use

xml { source => "message" store_xml => true target => "theXML" }

If you then need to move all the fields to the top level use a ruby filter, like this.
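
A minimal sketch of such a ruby filter, assuming the parsed document was stored under theXML as in the xml filter above. It copies each top-level key of theXML to the root of the event and then removes theXML itself.

```
filter {
  ruby {
    code => '
      parsed = event.get("theXML")
      # Only act when the xml filter actually produced a hash.
      if parsed.is_a?(Hash)
        # Copy every top-level key of the parsed XML to the event root.
        parsed.each { |key, value| event.set(key, value) }
        event.remove("theXML")
      end
    '
  }
}
```

Note that a copied key overwrites any existing top-level field with the same name, so check for collisions with fields such as message or host before using this.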

What is "target" in the above syntax?

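The documentation for the xml filter covers that. In short, target is the name of the field in which the parsed XML is stored, so with the filter above the parsed document ends up under theXML.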

In the Logstash configuration I can see beats as an input.
You need to install and configure Filebeat on your server to send the XML to Logstash.
