Logstash configuration not creating index in Elasticsearch

I am trying to create a quarterly index in Elasticsearch via Logstash, with each document unique. The data comes from multiple XML files. The configuration file reports no errors and the pipeline starts, but I cannot see any index created in Elasticsearch.

ELK version is 7.2
Java is 1.8
OS: Windows 10

My Logstash configuration is below:

input {
  file {
    path => "D:\xml*.xml"
    start_position => "beginning"
    sincedb_path => "NUL"
    exclude => "*.gz"
    type => "xml"
    codec => multiline {
      pattern => "<?xml"
      negate => "true"
      what => "previous"
    }
  }
}

filter {
  fingerprint {
    method => "SHA256"
    source => ["abc", "def", "ghi"]
    concatenate_sources => true
    target => "documentId"
  }

  xml {
    source => "message"
    store_xml => false
    xpath => ["/Metadata/alphabet/BILL", "bill"]
  }

  split {
    field => "bill"
  }

  xml {
    source => "message"
    store_xml => false
    xpath => [
      "/bill/abc/text()", "abc",
      "/bill/def/text()", "def",
      "/bill/ghi/text()", "ghi",
      "/bill/jkl/text()", "jkl"
    ]
  }

  ruby {
    code => "event.set('quarter', 'q' + (Time.now.month / 3.0).ceil.to_s + '-' + Time.now.year.to_s)"
  }
}

output {
  elasticsearch {
    codec => json
    hosts => "localhost:9200"
    index => "customerdoc-%{quarter}"
    document_id => "%{documentId}"
  }
  stdout {
    codec => rubydebug
  }
}

Do not use backslashes in the path option of a file input; use forward slashes.
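As a minimal sketch of that change, the file input could look like the following. The directory D:/xml/ is a hypothetical placeholder, since the exact location of the files is not clear from the original path; the other options are copied from the posted configuration.

```conf
input {
  file {
    # Hypothetical directory: substitute the real location of the XML files.
    # Forward slashes are used even though the OS is Windows.
    path => "D:/xml/*.xml"
    start_position => "beginning"
    # "NUL" is the Windows null device, so no read position is persisted.
    sincedb_path => "NUL"
  }
}
```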

Thanks @Badger, that's correct.

The index is now created, but the XML is not split into multiple documents. Instead, a single document is created and the entire XML is copied into the message field.

Unless you can show us what the XML looks like it will be hard to help you.

<?xml version="1.0"?>
<Metadata xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation='XML.xsd'>
  <Total-Bills-Processed>165</Total-Bills-Processed>
  <BILL-DATA>
    <BILL>
      <abc>191560004687</abc>
      <def>08012018</def>
      <ghi>906202553</ghi>
      <jkl>191560004687.08012018.PROD.pdf</jkl>
    </BILL>
    <BILL>
      <abc>191560002058</abc>
      <def>06012019</def>
      <ghi>909168139</ghi>
      <jkl>191560002058.06012019.PROD.pdf</jkl>
    </BILL>
  </BILL-DATA>
</Metadata>

The required output in Elasticsearch is:

doc1
{
abc:191560004687
def:08012018
ghi:906202553
jkl:191560004687.08012018.PROD.pdf
}

doc2
{
abc:191560002058
def:06012019
ghi:909168139
jkl:191560002058.06012019.PROD.pdf
}

The first problem is your multiline codec. It produces a single event that only contains

   "message" => "<?xml version=\"1.0\"?>",

There is no "<?xml..." line at the end of the file to trigger the rest of the file to be flushed to the pipeline. If you add auto_flush_interval => 1 then it will get flushed after a 1 second timeout.
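A rough sketch of that change, applied to the codec from the original configuration, might look like this. The escaped and anchored pattern is an optional adjustment that goes beyond the suggestion above: an unescaped ? is a regex quantifier, so "<?xml" also matches any line that merely contains the text xml.

```conf
codec => multiline {
  # Escaped and anchored so that only a line starting with the literal
  # text "<?xml" begins a new event (optional adjustment, see above).
  pattern => "^<\?xml"
  negate => "true"
  what => "previous"
  # Flush the buffered lines if no new line arrives within 1 second.
  auto_flush_interval => 1
}
```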

Personally I would ingest the entire file as a single event by using a pattern that does not match plus a timeout.

codec => multiline {
    pattern => "^Spalanzani"
    negate => true
    what => previous
    auto_flush_interval => 1
}

Once you have an event to work on, this bit works OK:

    xml {
        source => "message"
        store_xml => false
        xpath => ["/Metadata/BILL-DATA/BILL", "bill"]
    }
    split {
        field => "bill"
    }

Although I would add 'remove_field => [ "message" ]' to the xml filter so that message is removed if it is successfully parsed.
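For illustration, the same xml filter with that option added would read as follows. This is only a sketch; remove_field is applied when the filter succeeds, so message is kept if parsing fails.

```conf
xml {
  source => "message"
  store_xml => false
  xpath => ["/Metadata/BILL-DATA/BILL", "bill"]
  # Drop the raw XML once it has been parsed successfully.
  remove_field => ["message"]
}
```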

At that point I would have chosen to do

    xml { source => "bill" target => "[@metadata][theXML]" force_array => false remove_field => ["bill"] }
    ruby { code => 'event.get("[@metadata][theXML]").each { |k, v| event.set(k, v) }' }

(which is really a matter of taste) which will get you

{
          "tags" => [
        [0] "multiline"
    ],
           "jkl" => "191560002058.06012019.PROD.pdf",
           "abc" => "191560002058",
    "@timestamp" => 2019-08-02T18:31:17.751Z,
           "def" => "06012019",
           "ghi" => "909168139"
}

Thanks @Badger. The output looks fine in the Logstash console, but when I view the index in Kibana, only one document is visible: the last document of the last file in the list. Could you help with that?

Also, can you tell me why the part of the configuration below does not work?

xml {
  source => "bill"
  store_xml => false
  xpath => [
    "/bill/abc/text()", "abc",
    "/bill/def/text()", "def",
    "/bill/ghi/text()", "ghi",
    "/bill/jkl/text()", "jkl"
  ]
  remove_field => ["bill"]
}

It does not work because element names are case sensitive. bill cannot be used to refer to BILL, which is what your document has. Also, you probably want force_array => false on that.
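A sketch of the filter with those two points applied, assuming the field names from the sample XML, might look like this. The force_array setting is included only because it is suggested above; its effect on the xpath results should be checked against the rubydebug output.

```conf
xml {
  source => "bill"
  store_xml => false
  # Included per the suggestion above.
  force_array => false
  # Element names are case sensitive, so the root must be BILL, not bill.
  xpath => [
    "/BILL/abc/text()", "abc",
    "/BILL/def/text()", "def",
    "/BILL/ghi/text()", "ghi",
    "/BILL/jkl/text()", "jkl"
  ]
  remove_field => ["bill"]
}
```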

Thanks @Badger
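The Kibana question above (only the last document being visible) is not answered in this thread. One plausible cause, stated here as an assumption rather than a confirmed diagnosis, is that the fingerprint filter in the original configuration runs before abc, def, and ghi exist on the event. Every event would then get the same documentId, and each write to Elasticsearch would overwrite the previous one. A sketch of a filter section with the fingerprint moved after the fields are extracted:

```conf
filter {
  # Pull every BILL element out of the file-level event, one event per BILL.
  xml {
    source => "message"
    store_xml => false
    xpath => ["/Metadata/BILL-DATA/BILL", "bill"]
  }
  split {
    field => "bill"
  }

  # Extract the individual fields from each BILL.
  xml {
    source => "bill"
    store_xml => false
    xpath => [
      "/BILL/abc/text()", "abc",
      "/BILL/def/text()", "def",
      "/BILL/ghi/text()", "ghi",
      "/BILL/jkl/text()", "jkl"
    ]
  }

  # Compute the ID only after its source fields exist on the event.
  fingerprint {
    method => "SHA256"
    source => ["abc", "def", "ghi"]
    concatenate_sources => true
    target => "documentId"
  }
}
```

If this is the cause, the documentId values printed by the stdout output should differ from one event to the next after the reordering.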
