Logstash enrichment: URLhaus

Hello,

I would like to enrich my SIEM with information from URLhaus.
I configured my Logstash pipeline like this:

input {
  exec {
    command => 'curl https://urlhaus.abuse.ch/downloads/csv/'
    interval => 86400
    type => 'iphaus'
  }
}
filter {
  if [type] == "iphaus" {
    csv {
      columns => ["id","dateadded","url","url_status","threat","tags","urlhaus_link","reporter"]
      separator => ","
    }
  }
}

output {
  elasticsearch {
    hosts => ["https://X.X.X.X:9200"]
    index => "malware-%{+YYYY.MM.dd}"
    cacert => 'ca.crt'
    user => "elastic"
    password => "password"
  }
}

And when I run it with the command:

sudo bin/logstash -f /etc/logstash/conf.d/main.conf

I get output like the following. At the end there is an exception, CSV::MalformedCSVError: Illegal quoting in line 1:

\\xFAB\\xED!\\x8C~\\x85L\\xC1\\x882\\xF9\\u0018\\xC8P灳\\xC2)eV1l\\x8B\\xFA%\\x94\\xAC\\u0011\\xC9X\\u001D\\xE5\\f\\xAA\\u0011\\xAE\\xA9\\xBC+\\xCD\\f\\x95?\\a\\xAEtr\\u0010+\\x98\\xD6\\xE4\\xA2%\\a\\tO9+\\xD4\\xD5\\u0016%\\xDFP\\xF8\\xEB\\v\\xB6\\u001E`\\xFA\\xE2\\xEC\\fd\\xA4\\xFAƚ=\\xDC\\xC8\\xFFH\\xD4|ߩ v\\xB5Dh\\x84\\xC2\\u0018a\\x80\\x81a\\xB0\\x81\\x9C\\u0015\\xF9\\xCEl\\u0013/\\x92$\\x8B'\\xF0}\\xED\\u0011\\xF6+V\\xA6`\\u001A^\\x88\\xE0\\xAA\\xF5\\u0016\\xE6\\xFD\\x89I\\u0006_\\xA6\\xFB\\xB6i\\u0013\\xB2\\xB3L\\\"\\xC2\\xFC\\xBB`\\f\\x94\\xDE<k\\x84\\xFC\\x98\\xF6(d*s\\xF0,\\xAE\\x89eX\\x98x\\xE7cX\\v3Z`E!\\x83JR֎\\xFC\\xC0\\x82(j*\\f\\u00185\\\"!\\xAC\\xB3\\\\\\x89\\xF20\\xDF\\xCE\\eC\\xA8G\\xE95Yѣ,S\\xDCW\\teO\\xB6y\\xA9\\x98\\xC66\\u0006JWy\\xEE\\x81\\xDFMDΜu߭Y5Y\\xD7\\xD3}\\xBE\\xC9\\xF64\\xB1\\xCDWC<\\u000E\\u0013\\e$^K\\xA3x-d\\xCA/\\x83\\xE76\\xE7\\xE2e\\be\\u0010\\xE2\\x9C\\u0006\\xF3\\x99\\xEBOJ\\xC1l\\x94\\xAE1?BG\\xA6\\u0018\\xF8\\xFE\\f\\xED\\xF0\\xDB0P5Q\\x93\\xFE\\n\\xD8\\u001D-\\xEC\\x8F=\\xE1)\\xD7Qhg\\xBDل\\u0000,ՙ\\u0005\\u0012\\xE2;Q\\xBD\\xFFh\\u007F\\x94\\r\\xCDQ`h\\xDA*2\\xF8\\u001CX\\x95\\x963\\bW\\u0014s7y\\xF8+\\xFDn\\u007FR\\fu\\u0015\\u0011.\\b\\u0001}=\\x92j\\x9A>\\xA1\\xA4\\x99\\xB0\\u0004\\xF2\\x8B\\xF7\\xF7\\xB7\\u001F\\xA7\\x9F\\u07BE\\xBB\\x9D\\xFB\\x8B\\x99\\xFFx\\xFB\\xC7\\xCD\\xE7{%\\xA4\\xE6z\\x8E\\xFA\\xA0\\x85\\e\\xCC]w.\\xFD\\xF8\\x8F\\xC7\\u001Fw\\xCE\\xDB?\\xDE_\\xE4..k5\\xA2\\u0012\\xF7(|](\\xB4\\xE6\\r\\x91\\t\\u007FY\\x93J\\x9D\\xEB\\xC7\\xE8\\u001D\\x9F^rs/\\xEB\\x9D 
}\\xA2\\xE0g\\x88\\u0004\\xEC\\u0006\\x95\\xCF\\xF9X39E\\xD3\\xF6\\xCB%(p\\u001E\\u0005\\xA7\\x99\\xE1\\x9C\\u0016\\u0005W_\\u001C\\u0019\\xEC\\u0001\\xB7x\\xC8FԼ\\x85s\\xA3\\x8E\\x91\\xC9H\\xE0\\u0018\\xBFx!(\\xE3\\xCD\\xED\\xC7\\xFC\\xC8X\\x8F!\\xD7k\\xF3+J\\xBA\\xB6\\xA7\\xDBCj\\xA7\\xF7F\\xD7\\xD3\\u0015tF\\xDD\\xE8:\\x9D\\xD1s\\xC7\\xF2?PK\\u0001\\u0002\\u001E\\u0003\\u0014\\u0000\\u0002\\u0000\\b\\u0000\\\\N\\x9CQ\\xE69G\\xEF\\xA0\\xF9G\\u0001XZ\\x85\\t\\a\\u0000\\u0018\\u0000\\u0000\\u0000\\u0000\\u0000\\u0001\\u0000\\u0000\\u0000\\xB4\\x81\\u0000\\u0000\\u0000\\u0000csv.txtUT\\u0005\\u0000\\u0003\\x80\\xAA\\xE9_ux\\v\\u0000\\u0001\\u0004\\xE8\\u0003\\u0000\\u0000\\u0004\\xE8\\u0003\\u0000\\u0000PK\\u0005\\u0006\\u0000\\u0000\\u0000\\u0000\\u0001\\u0000\\u0001\\u0000M\\u0000\\u0000\\u0000\\xE1\\xF9G\\u0001\\u0000\\u0000", :**exception=>#<`CSV::MalformedCSVError: Illegal quoting in line 1`.**>}

When I try to visualise the data in Kibana, Kibana keeps searching, freezes, and sometimes crashes (everything goes back to normal once I delete the malware index that was created).

Could you please help me configure my pipeline?

Thanks :)
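
Note: one plausible reason for the Kibana trouble is that an event which fails CSV parsing is still sent to Elasticsearch, with the whole binary blob in its message field. The csv filter tags events it cannot parse with _csvparsefailure by default, so a guard can discard them before the output stage. The following is a minimal sketch that reuses the filter from the post; the drop block is an addition for illustration, not part of the original pipeline:

```
filter {
  if [type] == "iphaus" {
    csv {
      columns => ["id","dateadded","url","url_status","threat","tags","urlhaus_link","reporter"]
      separator => ","
    }
    # The csv filter adds the "_csvparsefailure" tag when a line cannot be parsed.
    # Dropping those events keeps unparsed binary data out of the index.
    if "_csvparsefailure" in [tags] {
      drop { }
    }
  }
}
```

This only hides the symptom; the input still has to deliver plain CSV lines for anything useful to be indexed.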

Try this:

input {
  exec {
    command => 'curl https://urlhaus.abuse.ch/downloads/csv/'
    interval => 86400
    type => 'iphaus'
    codec => gzip_lines
  }
}

Thanks for your answer @fadjar340

I get an error when I try that:

[ERROR] 2020-12-28 15:33:53.922 [Converge PipelineAction::Create<main>] registry - Tried to load a plugin's code, but failed. {:exception=>#<LoadError: no such file to load -- logstash/codecs/gzip_lines>, :path=>"logstash/codecs/gzip_lines", :type=>"codec", :name=>"gzip_lines"}

I solved the gzip_lines problem by installing the plugin with the command:

bin/logstash-plugin install logstash-codec-gzip_lines

Now, when I run Logstash with the command

sudo bin/logstash -f /etc/logstash/conf.d/main.conf

I get errors saying that the input is not in gzip format:

 [ERROR] 2020-12-28 15:48:05.130 [[main]<exec] javapipeline - A plugin had an unrecoverable error. Will restart this plugin.
  Pipeline_id:main
  Plugin: <LogStash::Inputs::Exec codec=><LogStash::Codecs::GzipLines id=>"gzip_lines_485b20a1-8af7-4dcf-b00d-4cff8bd33965", enable_metric=>true, charset=>"UTF-8">, interval=>86400, id=>"9a18761f4d770b81af3c998b16d198c4ca283f03a282d8020ffe00d22b87604d", type=>"iphaus", command=>"curl https://urlhaus.abuse.ch/downloads/csv/", enable_metric=>true>
  Error: not in gzip format
  Exception: Zlib::GzipFile::Error
  Stack: org/jruby/ext/zlib/JZlibRubyGzipReader.java:148:in `initialize'
org/jruby/ext/zlib/JZlibRubyGzipReader.java:92:in `new'
org/jruby/ext/zlib/JZlibRubyGzipReader.java:83:in `new'
/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-codec-gzip_lines-3.0.4/lib/logstash/codecs/gzip_lines.rb:35:in `decode'
/usr/share/logstash/logstash-core/lib/logstash/codecs/delegator.rb:62:in `block in decode'
org/logstash/instrument/metrics/AbstractSimpleMetricExt.java:65:in `time'
org/logstash/instrument/metrics/AbstractNamespacedMetricExt.java:64:in `time'
/usr/share/logstash/logstash-core/lib/logstash/codecs/delegator.rb:61:in `decode'
/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-input-exec-3.3.3/lib/logstash/inputs/exec.rb:82:in `execute'
/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-input-exec-3.3.3/lib/logstash/inputs/exec.rb:52:in `run'
/usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:405:in `inputworker'
/usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:396:in `block in start_input'

Perhaps you need to look at this for the zip format.
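
The trailing bytes of the first output (the PK markers and the member name csv.txt) suggest that the download is a ZIP archive rather than a gzip stream, which would explain the "not in gzip format" error. A minimal sketch of an input that unpacks the archive on the fly is shown below. It assumes that funzip (shipped with the Info-ZIP unzip package) is installed on the Logstash host and that the archive contains a single CSV member:

```
input {
  exec {
    # funzip reads a ZIP archive from stdin and writes its first member to stdout,
    # so no temporary file is needed. curl -s suppresses the progress meter.
    command => 'curl -s https://urlhaus.abuse.ch/downloads/csv/ | funzip'
    interval => 86400
    type => 'iphaus'
    # Split the command output into one event per line.
    codec => line
  }
}
```

If funzip is not available, downloading to a file and extracting it with unzip is an equivalent approach.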


Thanks for your help @fadjar340

So now it's working perfectly, and my configuration looks like this:

input {
  exec {
    command => 'curl https://urlhaus.abuse.ch/downloads/csv/ --output text.zip && unzip -c text.zip'
    interval => 86400
    type => 'iphaus'
    codec => line
  }
}
filter {
  if [type] == "iphaus" {
    csv {
      columns => ["id","dateadded","url","url_status","threat","tags","urlhaus_link","reporter"]
      separator => ","
    }
    mutate {
      remove_field => ["message"]
    }
  }
}

output {
  elasticsearch {
    hosts => ["https://X.X.X.X:9200"]
    index => "malware-%{+YYYY.MM.dd}"
    cacert => 'ca.crt'
    user => "elastic"
    password => "password"
  }
}
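
Two details may be worth checking in a configuration like this one. First, unzip -c prints the archive name and the name of each extracted file along with the data, whereas unzip -p writes only the file contents, so -p avoids feeding those banner lines to the csv filter. Second, the feed appears to begin with comment lines prefixed by # (an assumption; inspect the first lines of the extracted csv.txt to confirm). A hedged sketch of a filter that skips such lines before parsing:

```
filter {
  if [type] == "iphaus" {
    # Assumption: comment and header lines in the feed start with "#".
    # Drop them so they are not parsed as data rows.
    if [message] =~ /^#/ {
      drop { }
    }
    csv {
      columns => ["id","dateadded","url","url_status","threat","tags","urlhaus_link","reporter"]
      separator => ","
    }
    mutate {
      remove_field => ["message"]
    }
  }
}
```

The column list is copied from the configuration above; if the feed's header comment lists different columns, the list should be adjusted to match.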
