Python json array ... minor problems

Elastic Stack 7.3
I did a bulk import of some info from Shodan in JSON form. This is one of the hosts...

{
  "host": {
    "data": [
      {
        "banner": "421 4.3.2 Service not available\r\n",
        "port": 25,
        "product": "Microsoft Exchange 2010 smtpd",
        "shodan_module": "smtp"
      }
    ],
    "hostname": "smtp21",
    "ip": "ip address",
    "org": "organization",
    "os": "None given",
    "updated": "2019-08-07T11:07:00.616447",
    "vulns": [
      "CVE-2018-8581"
    ]
  }
}

But I get this in Kibana (screenshot omitted)...

It seemed to go into ES OK and created the index with the right fields.

I know this used to be a Kibana problem. I found a plugin for it, but it only goes up to 6.4.2, and what I guess is a related issue, but it seems to be around a year old.

This is the bulk Python code I used, which does seem to get the data in there -

import json
import uuid

from elasticsearch import Elasticsearch, helpers

user = 'username'
password = 'password'
# ES = "http://ipaddress:9200"
es = Elasticsearch([{'host': 'ipaddress', 'port': 9200}], http_auth=(user, password))

data = r'/mnt/c/Users/money/Documents/Python/modules/data.json'
with open(data, "r") as json_file:
    nodes = json.load(json_file)

actions = [
    {
        "_index": "shodan",
        "_id": str(uuid.uuid4()),  # cast to str so the id serializes cleanly
        "_source": node,
    }
    for node in nodes['hosts']
]

try:
    # Mapping types are removed in 7.x, so the old _type/doc_type arguments are dropped.
    response = helpers.bulk(es, actions, index="shodan")
    print("\nRESPONSE:", response)
except Exception as e:
    print("\nERROR:", e)
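One way to sidestep the array-within-an-array problem before it ever reaches ES is to flatten each host.data entry into its own top-level document in Python. A minimal sketch, assuming the file really is shaped like {"hosts": [{"host": {...}}, ...]} as above (flatten_hosts is a hypothetical helper, not part of the original script):

```python
def flatten_hosts(hosts):
    """Yield one flat dict per service entry in host["data"],
    merging the per-service keys with the host-level keys."""
    for wrapper in hosts:
        host = wrapper["host"]
        # Host-level fields (hostname, ip, org, vulns, ...) minus the nested array.
        base = {k: v for k, v in host.items() if k != "data"}
        for service in host.get("data", []):
            doc = dict(base)
            doc.update(service)  # banner, port, product, shodan_module, ...
            yield doc
```

Feeding `flatten_hosts(nodes['hosts'])` into the `actions` comprehension instead of the raw `nodes['hosts']` would index one flat document per host/port pair, which Kibana can search without nested-field support.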

Is there no way to handle what amounts to an array within an array?
Sorry for such a long topic; I can't believe no one has come up with a good solution yet.

Is the easiest solution just to use filebeat?

Hey @mathurin68, using Filebeat to ingest this data without relying upon the nested datatype is the clearest path forward. Kibana lacks nested field support and despite how frequently the request comes up, it's a rather complex undertaking.

Thanks @Brandon_Kobel! Still working on this and playing around with it -- my Python gives me this JSON to get into ES -

[
  {
    "host": {
      "data": [
        {
          "banner": "HTTP/1.1 404 Not Found\r\nContent-Type: text/html; charset=us-ascii\r\nServer: Microsoft-HTTPAPI/2.0\r\nDate: Fri, 06 Sep 2019 21:18:05 GMT\r\nConnection: close\r\nContent-Length: 315\r\n\r\n",
          "http": {
            "server": "Microsoft-HTTPAPI/2.0"
          },
          "port": 443,
          "product": "Microsoft HTTPAPI httpd",
          "shodan_module": "https",
          "version": "2.0"
        }
      ],
      "hostname": "hostname1",
      "ip": "1.1.1.1",
      "org": "OIC",
      "os": "None given",
      "updated": "2019-09-06T21:17:54.362743",
      "vulns": "None given"
    }
  },
  {
    "host": {
      "data": [
        {
          "banner": "HTTP/1.1 404 Not Found\r\nContent-Type: text/html; charset=us-ascii\r\nServer: Microsoft-HTTPAPI/2.0\r\nDate: Sun, 18 Aug 2019 17:09:36 GMT\r\nConnection: close\r\nContent-Length: 315\r\n\r\n",
          "http": {
            "server": "Microsoft-HTTPAPI/2.0"
          },
          "port": 443,
          "product": "Microsoft HTTPAPI httpd",
          "shodan_module": "https",
          "version": "2.0"
        }
      ],
      "hostname": "hostname2",
      "ip": "2.2.2.2",
      "org": "OIC",
      "os": "None given",
      "updated": "2019-08-18T17:09:36.774406",
      "vulns": "None given"
    }
  }
]

.... all I want to dump into ES is what's under each of the host records

"host": {

So I modified my filebeat config -

- type: log
  enabled: true
  paths:
    - /opt/filebeat/*.json
  fields_under_root: true
  multiline.pattern: '^"host": {'
  multiline.negate: true
  multiline.match: after
  fields:
    tags: ['shodan']

... and added this to get the arrays decoded -

processors:
  - decode_json_fields:
      fields: ["data","vulns"]
      process_array: true
      max_depth: 3
      #target: ""
      overwrite_keys: false

Nothing I'm doing is working. I've tried various versions of the above: I can't get the different JSON fields into ES fields, and I can't get it to walk through the arrays.

The help is much appreciated!

You can try adding this to your processors:

  - extract_array:
      field: host.data
      mappings:
        host.data: 0

to extract host.data element 0 back into host.data. Since element 0 is a map, I think it might just get rid of the array wrapper and expose those keys as ordinary nested fields.
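For what it's worth, the effect of that mapping can be sketched in plain Python (this `extract_array` is a stand-in written for illustration, not filebeat's actual implementation):

```python
def extract_array(event, field, index=0):
    """Mimic the extract_array mapping `host.data: 0`: copy element
    `index` of the list field back over the field itself, dropping
    the array wrapper."""
    parts = field.split(".")
    parent = event
    for part in parts[:-1]:
        parent = parent[part]
    parent[parts[-1]] = parent[parts[-1]][index]
    return event

event = {"host": {"data": [{"port": 443, "shodan_module": "https"}]}}
extract_array(event, "host.data")
# event["host"]["data"] is now a single map rather than a one-element list
```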

Thanks @slavo , I'll have to try this when I get home.

Problems -

  1. Sending the JSON directly to ES broke the fields out for me, but not the arrays.
  2. Filebeat > ES isn't breaking anything into fields; it all comes in either as one big field or as one line at a time, and neither works.

I thought I didn't have to add the codec to logstash

`codec => "json"`

but I'll try that tonight or on Monday.

Almost every security tool we have outputs information in JSON, and most of them emit nested JSON, which apparently doesn't work when input directly into ES. Plus I will probably need to manipulate something in Logstash as well.

Still learning all this stuff... I'll post it here once it's figured out.

You don't have to use logstash for everything. Did you look at ingest pipelines? https://www.elastic.co/guide/en/elasticsearch/reference/current/pipeline.html
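As an example, the host.data unwrapping discussed earlier could be done server-side with an ingest pipeline and a small Painless script, so no Logstash or Python pre-processing is needed. A sketch only — the pipeline name "shodan-flatten" is made up, and the ES address is a placeholder:

```python
# Define the pipeline body as a plain dict; no ES connection is needed for this part.
pipeline = {
    "description": "Unwrap the single-element host.data array from Shodan docs",
    "processors": [
        {
            "script": {
                "source": (
                    "if (ctx.host != null && ctx.host.data instanceof List "
                    "&& !ctx.host.data.isEmpty()) { ctx.host.data = ctx.host.data[0]; }"
                )
            }
        }
    ],
}

if __name__ == "__main__":
    from elasticsearch import Elasticsearch, helpers  # requires the elasticsearch package
    es = Elasticsearch([{'host': 'ipaddress', 'port': 9200}])
    es.ingest.put_pipeline(id="shodan-flatten", body=pipeline)
    # Then index through the pipeline instead of pre-processing in Python:
    # helpers.bulk(es, actions, index="shodan", pipeline="shodan-flatten")
```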

Briefly, the problem is time and learning this stuff. I'm already somewhat comfortable with logstash, and I'd like to keep everything going into sort of the same pipeline.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.