Help parsing nested JSON from VirusTotal

kolmai · August 30, 2017, 5:36pm

Hello everyone,
I'm trying to get started with the Elastic Stack. My first attempt, indexing a complex JSON document into ES with the following config, has failed.

input {
  beats {
    port => 5044
    tags => "beats"
  }
}

filter {
  json {
    source => "message"
  }
}

output {
  elasticsearch { hosts => ["localhost:9200"] }
  stdout { codec => rubydebug }
}

Perhaps some extra tailoring is needed to make it work?
I've noticed that my problem is possibly due to the data structure, in particular these two objects:

"imports": {"data": ["data"]}
"sections": [ [ ] ]

Any help here would be much appreciated!

This is an example of JSON I'd like to store:

{
  "vhash": "04505666234sfdfs2nz25z17z",
  "submission_names": [
    "a0b0fe57a5c6ff0f3359d8d21519f136615a7843"
  ],
  "scan_date": "2017-08-23 19:13:05",
  "first_seen": "2017-08-23 19:13:05",
  "total": 65,
  "additional_info": {
    "magic": "PE32 executable for MS Windows (console) Intel 80386 32-bit",
    "sigcheck": {
      "link date": "5:12 AM 8/18/2017"
    },
    "exiftool": {
      "MIMEType": "application/octet-stream",
      "Subsystem": "Windows command line",
      "MachineType": "Intel 386 or later, and compatibles",
      "TimeStamp": "2017:08:18 05:12:57+01:00",
      "FileType": "Win32 EXE",
      "PEType": "PE32",
      "CodeSize": "12288",
      "LinkerVersion": "8.0",
      "FileTypeExtension": "exe",
      "InitializedDataSize": "0",
      "SubsystemVersion": "5.0",
      "EntryPoint": "0x1840",
      "OSVersion": "4.0",
      "ImageVersion": "0.0",
      "UninitializedDataSize": "0"
    },
    "trid": "Win32 Dynamic Link Library (generic) (43.5%)\nWin32 Executable (generic) (29.8%)\nGeneric Win/DOS Executable (13.2%)\nDOS Executable Generic (13.2%)",
    "pe-imphash": "f77945ec4c575514afd3ce14a41d99e0",
    "pe-timestamp": 1503029577,
    "imports": {
      "KERNEL32.dll": [
        "FreeLibrary",
        "GetLastError",
        "RaiseException",
        "GetModuleFileNameA",
        "CreateThread",
        "GetProcAddress",
        "LocalAlloc",
        "LocalFree",
        "InterlockedExchange",
        "GetNumberOfConsoleInputEvents",
        "ExitProcess",
        "SetFileApisToANSI",
        "GetOEMCP",
        "GetCurrentThreadId",
        "LoadLibraryA",
        "SetConsoleOutputCP",
        "GetModuleHandleW",
        "GetBinaryTypeA"
      ],
      "WS2_32.dll": [
        "send"
      ],
      "USER32.dll": [
        "GetAsyncKeyState",
        "SetProcessDefaultLayout"
      ]
    },
    "pe-entry-point": 6208,
    "sections": [
      [
        ".text",
        4096,
        3928,
        4096,
        "6.19",
        "b99ec9dd44c6ad6a9647424e0cc36914"
      ],
      [
        ".code",
        8192,
        6024,
        8192,
        "4.93",
        "84992d7c1bef4a95c3dad93946a8e8c8"
      ],
      [
        ".rdata",
        16384,
        2528,
        4096,
        "1.84",
        "3ec7cc43afd3f444ac97ffdca3f19ec1"
      ],
      [
        ".data",
        20480,
        423344,
        417792,
        "7.99",
        "38d3a7dbccb081e7dadcd116102f310c"
      ],
      [
        ".reloc",
        446464,
        1480,
        4096,
        "1.10",
        "56a544f15432e1b792cf42f425e166bf"
      ]
    ],
    "pe-machine-type": 332
  },
  "size": 442368,
  "scan_id": "c06e7ad4ae7749678c213ceb734cb0a64f2d47e464198351c76ceca3363522b6-1503515585",
  "times_submitted": 1,
  "harmless_votes": 0,
  "verbose_msg": "Scan finished, information embedded",
  "sha256": "c06e7ad4ae7749678c213ceb734cb0a64f2d47e464198351c76ceca3363522b6",
  "type": "Win32 EXE",
  "scans": {
    "Bkav": {
      "detected": true,
      "version": "1.3.0.9282",
      "result": "HW32.Packed.F89F",
      "update": "20170823"
    }
  },
  "tags": [
    "peexe"
  ],
  "authentihash": "1492aee71ea44f0969f6ef91b4c854b692630d15735a71b5b3206e1b87890d1c",
  "unique_sources": 1,
  "positives": 30,
  "ssdeep": "12288:ZA2Gi/n0uNIj5icepynKmUuj2cq6kfRTiA:ZA2Gisz5iHZ9nXJT",
  "md5": "0067b99af76ce96087ef17d73e773f5b",
  "permalink": "https://www.virustotal.com/file/c06e7ad4ae7749678c213ceb734cb0a64f2d47e464198351c76ceca3363522b6/analysis/1503515585/",
  "sha1": "a0b0fe57a5c6ff0f3359d8d21519f136615a7843",
  "resource": "0067b99af76ce96087ef17d73e773f5b",
  "response_code": 1,
  "community_reputation": 0,
  "malicious_votes": 0,
  "ITW_urls": [
    
  ],
  "last_seen": "2017-08-23 19:13:05"
}

magnusbaeck · August 30, 2017, 7:52pm

And in what way is it failing?

kolmai · August 30, 2017, 8:18pm

Thanks for the fast response!
I think my data is able to reach ES; however, that's not the case with Kibana.
For example, when I omit the aforementioned elements ("imports": {"data": ["data"]} and "sections": [ [ ] ]), my data reaches Kibana successfully and I can create the index for it.

magnusbaeck · August 30, 2017, 8:33pm

I don't see why Kibana would have any issues with such a document. Have you verified that the document is stored in ES? Use the ES APIs, not Kibana.

kolmai · August 30, 2017, 8:51pm

I just used the API to find my document, and it seems that I'm getting 0 hits.
So I guess that this sort of JSON isn't getting into ES in the first place.
What would you recommend doing in this case?

magnusbaeck · August 30, 2017, 8:57pm

Is Logstash getting the event in the first place? Replace the elasticsearch output with a stdout { codec => rubydebug } output to find out. Have you looked in the Logstash log for clues?

kolmai · August 30, 2017, 9:13pm

I did it. The entire document was printed beautifully on the screen, no errors, then when I query ES I get 0 hits.
Oh here is the error from the logs:

:exception=>#<LogStash::Json::ParserError: Unexpected character ('K' (code 75)): was expecting comma to separate OBJECT entries

kolmai · August 31, 2017, 7:59pm

Hello.
I looked carefully through the logs to understand what happened. It seems that the information I provided in my previous reply was irrelevant to our conversation. There are actually no exceptions in the Logstash logs at all; the problem I think I'm facing is with ES. This time I looked at the logs in the right place and found that it may be due to a content length limitation. Here is the relevant log:

[2017-08-31T22:07:55,794][DEBUG][o.e.a.b.TransportShardBulkAction] [_pSNtfS] [logstash-2017.08.31][2] failed to execute bulk item (index) BulkShardRequest [[logstash-2017.08.31][2]] containing [index {[logstash-2017.08.31][Win32 EXE][AV45sVHbEo4x7KEmXLzZ], source[n/a, actual length: [19.4kb], max length: 2kb]}]
java.lang.IllegalArgumentException: mapper [additional_info.sections] of different type, current_type [long], merged_type [text]
	at org.elasticsearch.index.mapper.FieldMapper.doMerge(FieldMapper.java:347) ~[elasticsearch-5.5.1.jar:5.5.1]
	at org.elasticsearch.index.mapper.NumberFieldMapper.doMerge(NumberFieldMapper.java:1097) ~[elasticsearch-5.5.1.jar:5.5.1]
	at org.elasticsearch.index.mapper.FieldMapper.merge(FieldMapper.java:333) ~[elasticsearch-5.5.1.jar:5.5.1]
	at org.elasticsearch.index.mapper.FieldMapper.merge(FieldMapper.java:49) ~[elasticsearch-5.5.1.jar:5.5.1]
	at org.elasticsearch.index.mapper.DocumentParser.createDynamicUpdate(DocumentParser.java:210) ~[elasticsearch-5.5.1.jar:5.5.1]
	at org.elasticsearch.index.mapper.DocumentParser.parseDocument(DocumentParser.java:78) ~[elasticsearch-5.5.1.jar:5.5.1]
	at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:277) ~[elasticsearch-5.5.1.jar:5.5.1]
	at org.elasticsearch.index.shard.IndexShard.prepareIndex(IndexShard.java:529) ~[elasticsearch-5.5.1.jar:5.5.1]
	at org.elasticsearch.index.shard.IndexShard.prepareIndexOnPrimary(IndexShard.java:506) ~[elasticsearch-5.5.1.jar:5.5.1]
	at org.elasticsearch.action.bulk.TransportShardBulkAction.prepareIndexOperationOnPrimary(TransportShardBulkAction.java:450) ~[elasticsearch-5.5.1.jar:5.5.1]
	at org.elasticsearch.action.bulk.TransportShardBulkAction.executeIndexRequestOnPrimary(TransportShardBulkAction.java:458) ~[elasticsearch-5.5.1.jar:5.5.1]
	at org.elasticsearch.action.bulk.TransportShardBulkAction.executeBulkItemRequest(TransportShardBulkAction.java:143) [elasticsearch-5.5.1.jar:5.5.1]
	at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:113) [elasticsearch-5.5.1.jar:5.5.1]
	at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:69) [elasticsearch-5.5.1.jar:5.5.1]
	at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryShardReference.perform(TransportReplicationAction.java:939) [elasticsearch-5.5.1.jar:5.5.1]
	at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryShardReference.perform(TransportReplicationAction.java:908) [elasticsearch-5.5.1.jar:5.5.1]
	at org.elasticsearch.action.support.replication.ReplicationOperation.execute(ReplicationOperation.java:113) [elasticsearch-5.5.1.jar:5.5.1]
	at org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.onResponse(TransportReplicationAction.java:322) [elasticsearch-5.5.1.jar:5.5.1]
	at org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.onResponse(TransportReplicationAction.java:264) [elasticsearch-5.5.1.jar:5.5.1]
	at org.elasticsearch.action.support.replication.TransportReplicationAction$1.onResponse(TransportReplicationAction.java:888) [elasticsearch-5.5.1.jar:5.5.1]
	at org.elasticsearch.action.support.replication.TransportReplicationAction$1.onResponse(TransportReplicationAction.java:885) [elasticsearch-5.5.1.jar:5.5.1]
	at org.elasticsearch.index.shard.IndexShardOperationsLock.acquire(IndexShardOperationsLock.java:147) [elasticsearch-5.5.1.jar:5.5.1]
	at org.elasticsearch.index.shard.IndexShard.acquirePrimaryOperationLock(IndexShard.java:1657) [elasticsearch-5.5.1.jar:5.5.1]
	at org.elasticsearch.action.support.replication.TransportReplicationAction.acquirePrimaryShardReference(TransportReplicationAction.java:897) [elasticsearch-5.5.1.jar:5.5.1]
	at org.elasticsearch.action.support.replication.TransportReplicationAction.access$400(TransportReplicationAction.java:93) [elasticsearch-5.5.1.jar:5.5.1]
	at org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.doRun(TransportReplicationAction.java:281) [elasticsearch-5.5.1.jar:5.5.1]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-5.5.1.jar:5.5.1]
	at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryOperationTransportHandler.messageReceived(TransportReplicationAction.java:260) [elasticsearch-5.5.1.jar:5.5.1]
	at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryOperationTransportHandler.messageReceived(TransportReplicationAction.java:252) [elasticsearch-5.5.1.jar:5.5.1]
	at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:69) [elasticsearch-5.5.1.jar:5.5.1]
	at org.elasticsearch.transport.TransportService$7.doRun(TransportService.java:644) [elasticsearch-5.5.1.jar:5.5.1]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:638) [elasticsearch-5.5.1.jar:5.5.1]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-5.5.1.jar:5.5.1]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_131]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_131]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]

I also found a related post which was never resolved:
https://discuss.elastic.co/t/logstash-not-importing-data-into-elasticsearch/88197
Since it was asked on the other thread, here is the output of my _cat/indices:

yellow open logstash-2017.08.31 6vam6SxUQJyc7aZEv_zNKA 5 1 0 0  955b  955b
yellow open .kibana             7yQFy8piRLm84MZb4rwYiw 1 1 2 1 9.3kb 9.3kb

What would be the right way to deal with it?

magnusbaeck · September 1, 2017, 5:11am

You need to decide the desired type of the [additional_info][sections] and make sure that the mappings and the values you send are consistent. Right now the field has been mapped as long but you're sending a string value.
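The type conflict can be checked on the raw document before it is shipped. The following is a minimal Python sketch, not part of Logstash or Elasticsearch: the helper name scalar_types is hypothetical, and the type labels only loosely mirror what dynamic mapping would pick. It walks the nested "sections" arrays from the sample report and lists the scalar types it finds.

```python
import json


def scalar_types(value):
    """Collect the scalar type names found anywhere inside nested lists.

    Hypothetical helper for illustration; the labels ("long", "text", ...)
    are a rough stand-in for dynamic mapping types.
    """
    if isinstance(value, list):
        found = set()
        for item in value:
            found |= scalar_types(item)
        return found
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return {"boolean"}
    if isinstance(value, int):
        return {"long"}
    if isinstance(value, float):
        return {"float"}
    if isinstance(value, str):
        return {"text"}
    return set()  # objects and nulls are ignored in this sketch


# Two of the section rows from the sample report in this thread.
doc = json.loads("""
{"additional_info": {"sections": [
    [".text", 4096, 3928, 4096, "6.19", "b99ec9dd44c6ad6a9647424e0cc36914"],
    [".code", 8192, 6024, 8192, "4.93", "84992d7c1bef4a95c3dad93946a8e8c8"]
]}}
""")

types = scalar_types(doc["additional_info"]["sections"])
print(sorted(types))
```

If more than one type comes back for a field, that field carries mixed values (here strings and integers), which matches the long/text conflict reported in the ES log.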

kolmai · September 8, 2017, 9:15pm

Thanks, I can see my problem now.
Would sending a mapping update with the following request be the correct resolution?

PUT my_reports
{
    "mappings": {
        "additional_info": {
            "properties": {
                "sections" : [{
                  "properties": [
                    {"type" : "text"}
                      ]}
                ]
            }
        }
    }
}

magnusbaeck · September 11, 2017, 6:08am

The mapping of an existing field can't be changed, so you have to reindex or create a new index. Otherwise you're probably fine, except that there shouldn't be any arrays in the mapping definition. See the docs for examples.
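As a rough illustration of a mapping body without arrays, here is a Python sketch that builds the request body for a new index and prints it as JSON. It assumes the Elasticsearch 5.x layout, where properties sit under a mapping type name. The type name "doc" is a hypothetical placeholder that would have to match the document type actually being indexed, and my_reports is the index name from the question above.

```python
import json


def sections_mapping(doc_type="doc"):
    """Build an index-creation body that maps additional_info.sections as text.

    doc_type is a placeholder mapping type name (an assumption, not taken
    from the thread).
    """
    return {
        "mappings": {
            doc_type: {
                "properties": {
                    "additional_info": {
                        "properties": {
                            # One field definition, no arrays: a field that
                            # holds arrays of values is still mapped once.
                            "sections": {"type": "text"}
                        }
                    }
                }
            }
        }
    }


def contains_list(value):
    """Return True if any list appears anywhere in a nested dict."""
    if isinstance(value, list):
        return True
    if isinstance(value, dict):
        return any(contains_list(v) for v in value.values())
    return False


body = sections_mapping()
print("PUT my_reports")
print(json.dumps(body, indent=2))
```

The values sent for the field still need to be consistent with the mapping, so this only helps together with data that can be indexed as text.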

kolmai · September 24, 2017, 12:52pm

I have managed to solve my issue by using a mutate filter in the Logstash conf:

filter {
  mutate {
    convert => {
      "[additional_info][sections]" => "string"
    }
  }
}
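The same idea, making every value in the field one consistent type, can also be applied before the report is shipped. The Python sketch below is only an illustration of that normalization, not a description of how the mutate filter implements convert. The function name stringify is hypothetical.

```python
def stringify(value):
    """Recursively turn every scalar inside nested lists into a string."""
    if isinstance(value, list):
        return [stringify(item) for item in value]
    if isinstance(value, bool):
        # Use JSON-style booleans rather than Python's "True"/"False".
        return "true" if value else "false"
    return str(value)


# One section row from the sample report in this thread.
sections = [
    [".text", 4096, 3928, 4096, "6.19", "b99ec9dd44c6ad6a9647424e0cc36914"],
]

print(stringify(sections))
```

After this step every element is a string, so the field no longer mixes numeric and text values.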

system · October 22, 2017, 12:52pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
