Logstash configuration not creating index in Elasticsearch

I am trying to create an index per quarter in Elasticsearch via Logstash, with each document unique. The data comes from multiple XML files. There seems to be no error in the configuration file and the pipeline starts, but I cannot see the index being created in Elasticsearch.

ELK version: 7.2
Java: 1.8
OS: Windows 10

Please find my Logstash configuration below:

input {
    file {
        path => "D:\xml*.xml"
        start_position => "beginning"
        sincedb_path => "NUL"
        exclude => "*.gz"
        type => "xml"
        codec => multiline {
            pattern => "<?xml"
            negate => "true"
            what => "previous"
        }
    }
}

filter {
    fingerprint {
        method => "SHA256"
        source => ["abc", "def", "ghi"]
        concatenate_sources => true
        target => "documentId"
    }
    xml {
        source => "message"
        store_xml => false
        xpath => ["/Metadata/alphabet/BILL", "bill"]
    }
    split {
        field => "bill"
    }
    xml {
        source => "message"
        store_xml => false
        xpath => [
            "/bill/abc/text()", "abc",
            "/bill/def/text()", "def",
            "/bill/ghi/text()", "ghi",
            "/bill/jkl/text()", "jkl" ]
    }
    ruby {
        code => "event.set('quarter', 'q' + ((Time.now.month/3.0).ceil).to_s + '-' + Time.now.year.to_s)"
    }
}

output {
    elasticsearch {
        codec => json
        hosts => "localhost:9200"
        index => "customerdoc-%{quarter}"
        document_id => "%{documentId}"
    }
    stdout {
        codec => rubydebug
    }
}
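For reference, the quarter logic inside the ruby filter can be checked outside Logstash. A minimal sketch (quarter_for is a hypothetical helper name, not part of Logstash; a fixed date is used instead of Time.now so the result is deterministic):

```ruby
# Standalone sketch of the quarter string built by the ruby filter:
# 'q' + ceil(month / 3) + '-' + year
def quarter_for(time)
  "q" + (time.month / 3.0).ceil.to_s + "-" + time.year.to_s
end

puts quarter_for(Time.new(2019, 8, 2))   # August is in q3, so "q3-2019"
```

With this, an event processed in August 2019 would be routed to the index customerdoc-q3-2019.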

Do not use backslashes in the path option of a file input; use forward slashes.
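Applied to the config above, that would be (same glob as the original, only the slash direction changed — assuming the files sit directly under D:\ with names starting with xml):

```
path => "D:/xml*.xml"
```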

Thanks @Badger, that's correct.

Now the index is created, but the XML is not split into multiple documents; a single document is created and the entire XML is copied into the message field.

Unless you can show us what the XML looks like it will be hard to help you.

<?xml version="1.0"?>
<Metadata xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation='XML.xsd'>
  <Total-Bills-Processed>165</Total-Bills-Processed>
  <BILL-DATA>
    <BILL>
      <abc>191560004687</abc>
      <def>08012018</def>
      <ghi>906202553</ghi>
      <jkl>191560004687.08012018.PROD.pdf</jkl>
    </BILL>
    <BILL>
      <abc>191560002058</abc>
      <def>06012019</def>
      <ghi>909168139</ghi>
      <jkl>191560002058.06012019.PROD.pdf</jkl>
    </BILL>
  </BILL-DATA>
</Metadata>

The required output in Elasticsearch should be:

doc1
{
abc:191560004687
def:08012018
ghi:906202553
jkl:191560004687.08012018.PROD.pdf
}

doc2
{
abc:191560002058
def:06012019
ghi:909168139
jkl:191560002058.06012019.PROD.pdf
}

The first problem is your multiline codec. It produces a single event that only contains

   "message" => "<?xml version=\"1.0\"?>",

There is no "<?xml..." line at the end of the file to trigger the rest of the file to be flushed to the pipeline. If you add auto_flush_interval => 1 then it will get flushed after a 1-second timeout.

Personally I would ingest the entire file as a single event by using a pattern that does not match plus a timeout.

codec => multiline {
    pattern => "^Spalanzani"
    negate => true
    what => previous
    auto_flush_interval => 1
}

Once you have an event to work on, this bit works OK:

    xml {
        source => "message"
        store_xml => false
        xpath => ["/Metadata/BILL-DATA/BILL", "bill"]
    }
    split {
        field => "bill"
    }

Although I would add 'remove_field => [ "message" ]' to the xml filter so that message is removed if it is successfully parsed.

At that point I would have chosen to do

    xml { source => "bill" target => "[@metadata][theXML]" force_array => false remove_field => ["bill"] }
    ruby { code => 'event.get("[@metadata][theXML]").each { |k, v| event.set(k, v) }' }

(which is really a matter of taste) which will get you

{
          "tags" => [
        [0] "multiline"
    ],
           "jkl" => "191560002058.06012019.PROD.pdf",
           "abc" => "191560002058",
    "@timestamp" => 2019-08-02T18:31:17.751Z,
           "def" => "06012019",
           "ghi" => "909168139"
}
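Outside Logstash, the copy step in that ruby filter boils down to iterating a hash and promoting its keys to top-level fields. A minimal sketch, modeling the event as a plain Hash (an assumption for illustration — a real Logstash event exposes event.get/event.set rather than []):

```ruby
# Model the Logstash event as a plain Hash (illustration only; real
# events use get/set instead of Hash indexing).
event = {
  "[@metadata][theXML]" => {
    "abc" => "191560002058",
    "def" => "06012019",
    "ghi" => "909168139",
    "jkl" => "191560002058.06012019.PROD.pdf"
  }
}

# Promote every key of the parsed XML hash to a top-level field,
# mirroring: event.get("[@metadata][theXML]").each { |k, v| event.set(k, v) }
event["[@metadata][theXML]"].each { |k, v| event[k] = v }

puts event["abc"]   # → 191560002058
```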

Thanks @Badger, but in the Logstash console the output looks fine; when I view the index in Kibana, only the last document (1 document) of the last file among the list of files is visible. Could you help with that?

Also, can you tell me why the part of the code below does not work:

xml {
    source => "bill"
    store_xml => false
    xpath => [
        "/bill/abc/text()", "abc",
        "/bill/def/text()", "def",
        "/bill/ghi/text()", "ghi",
        "/bill/jkl/text()", "jkl" ]
    remove_field => ["bill"]
}

It does not work because element names are case sensitive: bill cannot be used to refer to BILL, which is what your document has. Also, you probably want force_array => false on that.
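Putting that advice together, a corrected version of the filter would look something like this (a sketch: the XPath expressions use the uppercase BILL element the document actually contains, and force_array => false is added as suggested above):

```
xml {
    source => "bill"
    store_xml => false
    force_array => false
    xpath => [
        "/BILL/abc/text()", "abc",
        "/BILL/def/text()", "def",
        "/BILL/ghi/text()", "ghi",
        "/BILL/jkl/text()", "jkl" ]
    remove_field => ["bill"]
}
```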

Thanks @Badger.