Not able to parse large nmap XML scan with logstash-codec-nmap

Hi,

I've got some serious issues parsing a relatively large nmap scan with the logstash-codec-nmap. In my situation I have a ~1.8MB XML file with the results of the nmap scan which I want to import, but I run into problems when using either stdin or HTTP as input.

    input {
            stdin {
                    codec => nmap {
                            emit_hosts => true
                            emit_ports => true
                            emit_traceroute_links => true
                            emit_scan_metadata => true
                    }
            }
    }

    output {
            elasticsearch {
                    hosts => [ "127.0.0.1:9200" ]
                    index => "nmapdebug"
                    codec => "json"
            }
    }

When using the above configuration, Logstash fails with the following error:

    [INFO ] 2018-08-09 16:35:41.861 [Api Webserver] agent - Successfully started Logstash API endpoint {:port=>9603}
    [ERROR] 2018-08-09 16:35:42.093 [[main]<stdin] pipeline - A plugin had an unrecoverable error. Will restart this plugin.
      Pipeline_id:main
      Plugin: <LogStash::Inputs::Stdin codec=><LogStash::Codecs::Nmap emit_hosts=>true, emit_ports=>true, emit_traceroute_links=>true, emit_scan_metadata=>true, id=>"466103d9-1a18-4549-b34e-f016f96cdc41", enable_metric=>true>, id=>"10bfae584e3d4f5f26d5926b2455d9f9fa5068a0b3bad04ff4cd88d41af5d737", enable_metric=>true>
      Error: undefined method `attributes' for nil:NilClass
      Exception: NoMethodError
      Stack: /usr/share/logstash/vendor/bundle/jruby/2.3.0/gems/logstash-codec-nmap-0.0.21/lib/logstash/codecs/nmap.rb:45:in `decode'
    /usr/share/logstash/vendor/bundle/jruby/2.3.0/gems/logstash-input-stdin-3.2.6/lib/logstash/inputs/stdin.rb:38:in `run'
    /usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:512:in `inputworker'
    /usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:505:in `block in start_input'

After some debugging, I found out that the XML file is somehow split into large segments, which causes the Ruby nmap gem to fail because each segment is not valid XML. This in turn results in the error above. Even putting the entire XML file on a single line does not resolve the issue. With a smaller nmap scan and a smaller XML file the codec works perfectly; the issue only occurs with larger files.

Due to this issue I've switched to the HTTP plugin as input with the following configuration:

    input {
            http {
                    host => "127.0.0.1"
                    port => 34568
                    codec => nmap {
                            emit_hosts => true
                            emit_ports => true
                            emit_traceroute_links => true
                            emit_scan_metadata => true
                    }
            }
    }

    output {
            elasticsearch {
                    hosts => [ "127.0.0.1:9200" ]
                    index => "nmapdebug"
                    codec => "json"
            }
    }

I then used curl to upload the XML file. This works with smaller files, but with larger files curl times out before the entire file is processed, resulting in only part of the results being written to ES.

    $ cat foobar.xml | curl -H "x-nmap-target: foobar" http://localhost:34568 --data-binary @-
    curl: (52) Empty reply from server

I'm basically looking for a way to parse an entire nmap XML file using Logstash and logstash-codec-nmap, but so far I haven't been able to get it to work.

Any help is highly appreciated.

Yes, a stdin input works in 16 KB chunks, and always emits an event at the end of a chunk.

A file input might work: use a multiline codec with a pattern that will never match and an auto_flush_interval longer than it takes to read the file.

    file {
            path => "/path/to/file.xml"
            codec => multiline {
                    pattern => "^Spalanzani"
                    negate => true
                    what => "previous"
                    auto_flush_interval => 5
            }
            start_position => "beginning"
            sincedb_path => "/dev/null"
    }

Thanks for the input. However, I'm already using the nmap codec to parse the XML file, and I'm not aware of any way to combine multiple codecs for a single input.

So is the only way to resolve this to use the multiline codec instead of the nmap codec? Because unfortunately that is not an option for me.

To have two codecs you need two inputs, so run multiple pipelines: one with a file input using the multiline codec, writing to a tcp output bound to localhost, and a second with a tcp input using the nmap codec. Does that work?
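
Something like this, untested (port 5000 is arbitrary, and the line codec is just one way to ship the raw XML text instead of the tcp output's default json). In the first pipeline, alongside the file input above:

    output {
            tcp {
                    host => "127.0.0.1"
                    # arbitrary local port
                    port => 5000
                    # send the raw multiline event text rather than a JSON document
                    codec => line { format => "%{message}" }
            }
    }

And in the second pipeline:

    input {
            tcp {
                    host => "127.0.0.1"
                    port => 5000
                    codec => nmap {
                            emit_hosts => true
                            emit_ports => true
                            emit_traceroute_links => true
                            emit_scan_metadata => true
                    }
            }
    }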

So I've tried that setup: basically one Logstash instance using a file input with a multiline codec and writing to a tcp output, and another Logstash instance reading from a tcp input and using the nmap codec.

However, although I verified with rubydebug that the first instance works as expected and the entire XML ends up in a single event, I still hit the same issue on the second instance: the data coming in on the tcp input is segmented into multiple parts, which causes the nmap codec to fail.

So unfortunately your proposed solution did not work; any other suggestions are welcome.

If you are on a current version, the beta pipeline-to-pipeline communication might work. The other matched input/output pairs are UDP (extremely unlikely), HTTP, kafka, and rabbitMQ. It is possible one of them would handle this better than TCP.

To be honest ingesting entire files as a single event is not what logstash was designed for, and it works right up to the point where it stops working, which you may have reached.

Thanks for the info. I'll try HTTP as a final attempt.

I'm inclined to agree with you that I'm taking Logstash as far as it will go and that this is basically the end. Which is fine, but I'll contact the developer of the nmap codec so that some info can be added to, for example, the GitHub page, stating that the codec works fine but that Logstash and the nmap codec are not meant for 'larger' nmap scans.

Okay, so using the HTTP output / input does seem to work, using the following configuration for the receiving instance. The HTTP input receives the complete file, however instead of the nmap codec parsing the data, the result in ES is just the multiline event rather than multiple events from the nmap codec.

    input {
            http {
                    host => "127.0.0.1"
                    port => 34569
                    codec => nmap {
                            emit_hosts => true
                            emit_ports => true
                            emit_traceroute_links => true
                            emit_scan_metadata => true
                    }
            }
    }

    output {
            elasticsearch {
                    hosts => [ "127.0.0.1:9200" ]
                    index => "nmap"
                    codec => "json"
            }

            stdout { codec => rubydebug }
    }

So any advice on how to take the data from the multiline event and ensure that it is parsed by the nmap codec?

Yeah, it looks like an http input defaults to the plain codec. However, if you can set a content-type option on the output, you can use the additional_codecs option on the input to map that content type to nmap.
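
Untested, but something along these lines might do it. The application/xml content type is arbitrary, and as far as I know additional_codecs only takes a codec name, so the nmap codec would run with its default settings. On the instance doing the file + multiline input:

    output {
            http {
                    url => "http://127.0.0.1:34569"
                    http_method => "post"
                    # send the raw multiline event text rather than a JSON document
                    format => "message"
                    message => "%{message}"
                    content_type => "application/xml"
            }
    }

And on the receiving instance:

    input {
            http {
                    host => "127.0.0.1"
                    port => 34569
                    # decode anything posted as application/xml with the nmap codec
                    additional_codecs => { "application/xml" => "nmap" }
            }
    }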
