Log file parsing and ingesting into ES

I have a log file with a single XML root, as shown below. I created a Logstash parser and tested it in manual mode, and it seems to work fine, but I get an error while ingesting into ES.

Input file:

<?xml version="1.0" encoding="utf-8"?>
<trouble_shooter_log xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:noNamespaceSchemaLocation="TroubleShooterLog.xsd"
	version="1.0" >
<event date="2020-07-21" time="03:17:36" line="814" text="Diameter peer connection up"/>
<event date="2020-07-21" time="03:17:39" line="301" text="tcpproxy;"/>
<event date="2020-07-21" time="03:23:53" line="253" text="Http Client with 7 worker threads started"/>
</trouble_shooter_log>

Logstash configuration:

input {
    file {
        path => "/root/h.log"
        start_position => "beginning"
        codec => multiline {
            pattern => "^<event "
            negate => "true"
            what => previous
            auto_flush_interval => 1
        }
    }
}

filter {
    if "<event " in [message] {
        xml {
            namespaces => {
                "xsi" => "http://www.w3.org/2001/XMLSchema-instance"
            }
            store_xml => true
            source => "message"
            target => "parsed"
        }
    }
}

output {
    stdout {}
}

Error when trying to ingest into ES:

[ERROR] 2020-09-07 14:38:58.833 [Converge PipelineAction::Create<main>] agent - Failed to execute action {:id=>:main, :action_type=>LogStash::ConvergeResult::FailedAction, :message=>"Could not execute action: PipelineAction::Create<main>, action_result: false", :backtrace=>nil}

Running without ingesting into ES, the stdout output shows a multiline tag for the event built from the first xml lines:

{
          "host" => "elk01.novalocal",
    "@timestamp" => 2020-09-07T19:41:42.413Z,
      "@version" => "1",
       "message" => "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n<trouble_shooter_log xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\"\n\txsi:noNamespaceSchemaLocation=\"TroubleShooterLog.xsd\"\n\tversion=\"1.0\" >",
          "tags" => [
        [0] "multiline"
    ],
          "path" => "/root/h.log"
}
{
          "host" => "elk01.novalocal",
        "parsed" => {
        "line" => "814",
        "time" => "03:17:36",
        "date" => "2020-07-21",
        "text" => "Diameter peer connection up"
    },
    "@timestamp" => 2020-09-07T19:41:42.454Z,
      "@version" => "1",
       "message" => "<event date=\"2020-07-21\" time=\"03:17:36\" line=\"814\" text=\"Diameter peer connection up\"/>",
          "path" => "/root/h.log"
}

What else is in the log file? I would expect there to be another ERROR message.

Your multiline configuration looks wrong to me. It will consume the first 4 lines as one event, then the first two event elements will be flushed as events, then the last two lines. How about

pattern => "</trouble_shooter_log>"
negate => true
what => next
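In context, that codec might look like this (a sketch assuming the same /root/h.log path; the auto_flush_interval value is illustrative):

```
input {
  file {
    path => "/root/h.log"
    start_position => "beginning"
    codec => multiline {
      # join lines that do NOT contain the closing root tag onto the
      # following line, so the whole document becomes one event
      pattern => "</trouble_shooter_log>"
      negate => true
      what => "next"
      auto_flush_interval => 1
    }
  }
}
```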

@Badger
This is what I get when I run:

[root@elk01 ~]# sudo /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/sample.conf
WARNING: Could not find logstash.yml which is typically located in $LS_HOME/config or /etc/logstash. You can specify the path using --path.settings. Continuing using the defaults
Could not find log4j2 configuration at path /usr/share/logstash/config/log4j2.properties. Using default config which logs errors to the console
[INFO ] 2020-09-07 15:02:23.210 [main] runner - Starting Logstash {"logstash.version"=>"7.9.1", "jruby.version"=>"jruby 9.2.13.0 (2.5.7) 2020-08-03 9a89c94bcc OpenJDK 64-Bit Server VM 25.141-b16 on 1.8.0_141-b16 +indy +jit [linux-x86_64]"}
[WARN ] 2020-09-07 15:02:23.684 [LogStash::Runner] multilocal - Ignoring the 'pipelines.yml' file because modules or command line options are specified
[INFO ] 2020-09-07 15:02:27.310 [Converge PipelineAction::Create] Reflections - Reflections took 52 ms to scan 1 urls, producing 22 keys and 45 values
[INFO ] 2020-09-07 15:02:30.570 [[main]-pipeline-manager] elasticsearch - Elasticsearch pool URLs updated {:changes=>{:removed=>, :added=>[http://10.111.13.209:9200/:9200]}}
[WARN ] 2020-09-07 15:02:30.888 [[main]-pipeline-manager] elasticsearch - Restored connection to ES instance {:url=>"http://10.111.13.209:9200/:9200"}
[ERROR] 2020-09-07 15:02:31.256 [Converge PipelineAction::Create] agent - Failed to execute action {:id=>:main, :action_type=>LogStash::ConvergeResult::FailedAction, :message=>"Could not execute action: PipelineAction::Create, action_result: false", :backtrace=>nil}
[INFO ] 2020-09-07 15:02:31.800 [Api Webserver] agent - Successfully started Logstash API endpoint {:port=>9600}
[INFO ] 2020-09-07 15:02:36.491 [LogStash::Runner] runner - Logstash shut down.

@Badger: Also, I get "tags" => [ [0] "multiline", [1] "_xmlparsefailure" ] when I use pattern => "</trouble_shooter_log>". I guess that is because my xml is a bit different: there is no closing tag for every event.

I do not know why you would get that error with no additional information about why it failed.

@Badger

With the input file below, what would the logstash config be? I guess the xml header line is getting an xml parser failure, which is causing the ES ingestion error.

<?xml version="1.0" encoding="utf-8"?>
<trouble_shooter_log xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:noNamespaceSchemaLocation="TroubleShooterLog.xsd"
	version="1.0" >
<event date="2020-07-21" time="03:17:36" line="814" text="Diameter peer connection up"/>
<event date="2020-07-21" time="03:17:39" line="301" text="tcpproxy;"/>
<event date="2020-07-21" time="03:23:53" line="253" text="Http Client with 7 worker threads started"/>
</trouble_shooter_log>

I am getting the output below, with an _xmlparsefailure tag, when writing to stdout.

{
    "@timestamp" => 2020-09-07T20:48:10.353Z,
       "message" => "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n<trouble_shooter_log xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\"\n\txsi:noNamespaceSchemaLocation=\"TroubleShooterLog.xsd\"\n\tversion=\"1.0\" >\n<event date=\"2020-07-21\" time=\"03:17:36\" line=\"814\" text=\"Diameter peer connection up\"/>",
      "@version" => "1",
          "tags" => [
        [0] "multiline",
        [1] "_xmlparsefailure"
    ],
          "path" => "/root/p.log",
          "host" => "elk01.novalocal"
}

If you want to ingest the entire file as one event then use a pattern that does not match.

codec => multiline { pattern => "^Spalanzani" negate => true what => "previous" auto_flush_interval => 2 }

If you have a variety of xml objects in a file you could try

pattern => "^<\?xml"
negate => true
what => previous

@Badger: I just want the date, time, line, and text of every event line to be parsed and ingested into elasticsearch as fields.

Then do not use a multiline codec at all. Each of those is valid XML.
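Something along these lines might work (a sketch only; the drop condition, sincedb_path, and the "parsed" target name are illustrative, not tested against your data):

```
input {
  file {
    path => "/root/h.log"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
filter {
  # keep only the self-closing <event .../> lines, which are each valid XML
  if [message] !~ "<event " {
    drop { }
  }
  xml {
    source => "message"
    store_xml => true
    force_array => false
    target => "parsed"
  }
}
output {
  stdout {}
}
```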

@Badger: I did that as well, but I am getting "tags" => [ [0] "multiline", [1] "_xmlparsefailure" ] for the xml header and the trouble_shooter_log element.

You are saying you are getting a multiline tag even when you remove the multiline codec?!

I tried without the multiline codec, using the config below, and get the error with _xmlparsefailure.

Error parsing xml with XmlSimple {:source=>"message", :value=>"\txsi:noNamespaceSchemaLocation="TroubleShooterLog.xsd"", :exception=>#<ArgumentError: File does not exist: xsi:noNamespaceSchemaLocation="TroubleShooterLog.xsd".>, :backtrace=>["/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/xml-simple-1.1.5/lib/xmlsimple.rb:996:in find_xml_file'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/xml-simple-1.1.5/lib/xmlsimple.rb:168:in xml_in'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/xml-simple-1.1.5/lib/xmlsimple.rb:203:in xml_in'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-filter-xml-4.1.1/lib/logstash/filters/xml.rb:195:in filter'", "/usr/share/logstash/logstash-core/lib/logstash/filters/base.rb:159:in do_filter'", "/usr/share/logstash/logstash-core/lib/logstash/filters/base.rb:178:in block in multi_filter'", "org/jruby/RubyArray.java:1809:in each'", "/usr/share/logstash/logstash-core/lib/logstash/filters/base.rb:175:in multi_filter'", "org/logstash/config/ir/compiler/AbstractFilterDelegatorExt.java:134:in multi_filter'", "/usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:293:in block in start_workers'"]}

input {
   file {
      path => "/root/sample7.xml"
      start_position => "beginning"
      type => "xml"
   }
}
filter {
   if [type] == "xml" {
      xml {
         namespaces => {
            "xsi" => "http://www.w3.org/2001/XMLSchema-instance"
         }
         source => "message"
         store_xml => true
         target => "parsed"
      }
   }
}
output {
   stdout{}
}

Then I tried to filter out the xml header (since XmlSimple complained that a file was not found) by adding the lines below to the filter, and it worked fine.

   if ([message] !~ "<event ") {
      drop { }
   }

Then, when I tried writing the same to ES, I got this error:
[Converge PipelineAction::Create] agent - Failed to execute action {:id=>:main, :action_type=>LogStash::ConvergeResult::FailedAction, :message=>"Could not execute action: PipelineAction::Create, action_result: false", :backtrace=>nil}

@Badger: I need to use multiline after all, as I found my input also has multiline data (with \n) without closing tags, as shown below.

<?xml version="1.0" encoding="utf-8"?>
<trouble_shooter_log xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:noNamespaceSchemaLocation="TroubleShooterLog.xsd"
	version="1.0" >
<event date="2019-08-28" time="09:33:05.637996261" object="PAL" type="note" process_name="nodesupervisor" process_identity="7301" thread_name="Script Executor" thread_identity="7342" file="PalVioConfigData.h" line="496" text="Invalid IP address: #networkname#ipv6list#Internet# in prsTcpClientSourceIpV4/6 parameter."/>
<event date="2019-08-28" time="09:33:05.638065493" object="VioConfigData" type="note" process_name="nodesupervisor" process_identity="7301" thread_name="Script Executor" thread_identity="7342" file="PalVioConfigData.cpp" line="934" text="In networknames.json:
bond0 -- OAM, 
bond3 -- Internal, 
vlan3823 -- Radius, 

Network: Internet, Interface: 
"/>
</trouble_shooter_log>

I used the config file below and got a multiline tag for the second event. Is that fine? Will the multiline tag cause a problem when ingesting into ES?

input {
   file {
      path => "/root/aa8.xml"
      start_position => "beginning"
      sincedb_path => "/dev/null"

      codec => multiline {
         pattern => "^\<event.*"
         negate => true
         what => previous
         auto_flush_interval => 1
      }
   }
}
filter {
   if ([message] !~ "<event ") {
      drop { }
   }
   xml {
      source => "message"
      force_array => false
      store_xml => true
      target => "dest"
   }
}
output {
   stdout{}
}

But writing to ES still isn't working. Is it because of the multiline tag on the second xml event? How do I fix this? I ran with debug and this is what I get.

[DEBUG] 2020-09-08 11:24:50.555 [[main]-pipeline-manager] javapipeline - Pipeline terminated by worker error {:pipeline_id=>"main", :exception=>#<NoMethodError: undefined method []' for nil:NilClass>, :backtrace=>["/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-output-elasticsearch-10.6.2-java/lib/logstash/outputs/elasticsearch/http_client/pool.rb:485:in get_es_version'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-output-elasticsearch-10.6.2-java/lib/logstash/outputs/elasticsearch/http_client/pool.rb:274:in block in healthcheck!'", "org/jruby/RubyHash.java:1415:in each'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-output-elasticsearch-10.6.2-java/lib/logstash/outputs/elasticsearch/http_client/pool.rb:266:in healthcheck!'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-output-elasticsearch-10.6.2-java/lib/logstash/outputs/elasticsearch/http_client/pool.rb:382:in update_urls'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-output-elasticsearch-10.6.2-java/lib/logstash/outputs/elasticsearch/http_client/pool.rb:82:in update_initial_urls'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-output-elasticsearch-10.6.2-java/lib/logstash/outputs/elasticsearch/http_client/pool.rb:76:in start'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-output-elasticsearch-10.6.2-java/lib/logstash/outputs/elasticsearch/http_client.rb:302:in build_pool'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-output-elasticsearch-10.6.2-java/lib/logstash/outputs/elasticsearch/http_client.rb:64:in initialize'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-output-elasticsearch-10.6.2-java/lib/logstash/outputs/elasticsearch/http_client_builder.rb:105:in create_http_client'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-output-elasticsearch-10.6.2-java/lib/logstash/outputs/elasticsearch/http_client_builder.rb:101:in build'", 
"/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-output-elasticsearch-10.6.2-java/lib/logstash/outputs/elasticsearch.rb:307:in build_client'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-output-elasticsearch-10.6.2-java/lib/logstash/outputs/elasticsearch/common.rb:23:in register'", "org/logstash/config/ir/compiler/OutputStrategyExt.java:126:in register'", "org/logstash/config/ir/compiler/AbstractOutputDelegatorExt.java:68:in register'", "/usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:226:in block in register_plugins'", "org/jruby/RubyArray.java:1809:in each'", "/usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:225:in register_plugins'", "/usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:559:in maybe_setup_out_plugins'", "/usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:238:in start_workers'", "/usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:183:in run'", "/usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:134:in `block in start'"], "pipeline.sources"=>["/etc/logstash/conf.d/sample.conf"], :thread=>"#<Thread:0x17046405 run>"}
[ERROR] 2020-09-08 11:24:50.586 [Converge PipelineAction::Create] agent - Failed to execute action {:id=>:main, :action_type=>LogStash::ConvergeResult::FailedAction, :message=>"Could not execute action: PipelineAction::Create, action_result: false", :backtrace=>nil}

At the point where it gets that error it is still creating the pipeline, so it has nothing to do with the events, since none of them exist at that point. It is trying to connect to elasticsearch and check the version number. It is making assumptions about the response being JSON and having a [number] field inside the [version] field which are not valid and result in an exception.

Do you have an elasticsearch output configured? If so, are you sure the host/port you are pointing it to are correct?

That was a silly mistake. I had a trailing / in the hosts option of the elasticsearch output, which was causing the error. Thanks for your help!
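For reference, the fix is just removing the trailing slash from the hosts entry (a sketch using the IP from the logs above; the index name is a made-up example):

```
output {
  elasticsearch {
    # wrong: "http://10.111.13.209:9200/" - the trailing slash produced the
    # mangled pool URL "http://10.111.13.209:9200/:9200" seen in the logs
    hosts => ["http://10.111.13.209:9200"]
    index => "troubleshooter-%{+YYYY.MM.dd}"
  }
}
```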