Help parsing nested JSON from VirusTotal

(Kolmai) #1

Hello everyone,
I'm trying to get started with the Elastic Stack, and my first attempt, indexing a complex JSON document into ES with the following config, has failed.

input {
  beats {
    port => 5044
    tags => "beats"
  }
}

filter {
  json {
    source => "message"
  }
}

output {
  elasticsearch { hosts => ["localhost:9200"] }
  stdout { codec => rubydebug }
}

Perhaps some extra tailoring is needed to make it work?
I've noticed that my problem is possibly caused by the data structure, in particular two objects:

  • "imports": {"data": ["data"]}
  • "sections": [ [ ] ]

Any help here would be much appreciated!

This is an example of JSON I'd like to store:

  {
    "vhash": "04505666234sfdfs2nz25z17z",
    "submission_names": [ ... ],
    "scan_date": "2017-08-23 19:13:05",
    "first_seen": "2017-08-23 19:13:05",
    "total": 65,
    "additional_info": {
      "magic": "PE32 executable for MS Windows (console) Intel 80386 32-bit",
      "sigcheck": {
        "link date": "5:12 AM 8/18/2017"
      },
      "exiftool": {
        "MIMEType": "application/octet-stream",
        "Subsystem": "Windows command line",
        "MachineType": "Intel 386 or later, and compatibles",
        "TimeStamp": "2017:08:18 05:12:57+01:00",
        "FileType": "Win32 EXE",
        "PEType": "PE32",
        "CodeSize": "12288",
        "LinkerVersion": "8.0",
        "FileTypeExtension": "exe",
        "InitializedDataSize": "0",
        "SubsystemVersion": "5.0",
        "EntryPoint": "0x1840",
        "OSVersion": "4.0",
        "ImageVersion": "0.0",
        "UninitializedDataSize": "0"
      },
      "trid": "Win32 Dynamic Link Library (generic) (43.5%)\nWin32 Executable (generic) (29.8%)\nGeneric Win/DOS Executable (13.2%)\nDOS Executable Generic (13.2%)",
      "pe-imphash": "f77945ec4c575514afd3ce14a41d99e0",
      "pe-timestamp": 1503029577,
      "imports": {
        "KERNEL32.dll": [ ... ],
        "WS2_32.dll": [ ... ],
        "USER32.dll": [ ... ]
      },
      "pe-entry-point": 6208,
      "sections": [ ... ],
      "pe-machine-type": 332
    },
    "size": 442368,
    "scan_id": "c06e7ad4ae7749678c213ceb734cb0a64f2d47e464198351c76ceca3363522b6-1503515585",
    "times_submitted": 1,
    "harmless_votes": 0,
    "verbose_msg": "Scan finished, information embedded",
    "sha256": "c06e7ad4ae7749678c213ceb734cb0a64f2d47e464198351c76ceca3363522b6",
    "type": "Win32 EXE",
    "scans": {
      "Bkav": {
        "detected": true,
        "version": "",
        "result": "HW32.Packed.F89F",
        "update": "20170823"
      },
      ...
    },
    "tags": [ ... ],
    "authentihash": "1492aee71ea44f0969f6ef91b4c854b692630d15735a71b5b3206e1b87890d1c",
    "unique_sources": 1,
    "positives": 30,
    "ssdeep": "12288:ZA2Gi/n0uNIj5icepynKmUuj2cq6kfRTiA:ZA2Gisz5iHZ9nXJT",
    "md5": "0067b99af76ce96087ef17d73e773f5b",
    "permalink": "",
    "sha1": "a0b0fe57a5c6ff0f3359d8d21519f136615a7843",
    "resource": "0067b99af76ce96087ef17d73e773f5b",
    "response_code": 1,
    "community_reputation": 0,
    "malicious_votes": 0,
    "ITW_urls": [ ... ],
    "last_seen": "2017-08-23 19:13:05"
  }

(Magnus Bäck) #2

And in what way is it failing?

(Kolmai) #3

Thanks for the fast response!
I think my data is able to reach ES, but that's not the case with Kibana.
For example, when I omit the aforementioned elements ("imports": {"data": ["data"]}
and "sections": [ [ ] ]), my data reaches Kibana successfully and I can create the index for it.

(Magnus Bäck) #4

I don't see why Kibana would have any issues with such a document. Have you verified that the document is stored in ES? Use the ES APIs, not Kibana.
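For example (assuming the default daily Logstash index name; adjust the date to match), something like:

```
GET logstash-2017.08.31/_search
{
  "query": { "match_all": {} }
}
```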

(Kolmai) #5

I just used the API to find my document, and it seems that I'm getting 0 hits.
So I guess documents like these aren't getting into ES in the first place.
What would you recommend to do in this case?

(Magnus Bäck) #6

Is Logstash getting the event in the first place? Replace the elasticsearch output with a stdout { codec => rubydebug } output to find out. Have you looked in the Logstash log for clues?
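That is, a minimal debugging output section along these lines (the elasticsearch output removed for the test):

```
output {
  stdout { codec => rubydebug }
}
```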

(Kolmai) #7

I did that. The entire document was printed beautifully on the screen with no errors, yet when I query ES I still get 0 hits.
Oh, here is the error from the logs:

:exception=>#<LogStash::Json::ParserError: Unexpected character ('K' (code 75)): was expecting comma to separate OBJECT entries

(Kolmai) #8

I was looking carefully through the logs to understand what happened, and it seems I provided irrelevant information in my previous reply. There are actually no exceptions in the Logstash logs at all; the problem I'm facing appears to be with ES. So this time I looked at the logs in the right place and found that it may be due to a content length limitation. Here is the relevant log:

[2017-08-31T22:07:55,794][DEBUG][o.e.a.b.TransportShardBulkAction] [_pSNtfS] [logstash-2017.08.31][2] failed to execute bulk item (index) BulkShardRequest [[logstash-2017.08.31][2]] containing [index {[logstash-2017.08.31][Win32 EXE][AV45sVHbEo4x7KEmXLzZ], source[n/a, actual length: [19.4kb], max length: 2kb]}]
java.lang.IllegalArgumentException: mapper [additional_info.sections] of different type, current_type [long], merged_type [text]
	at org.elasticsearch.index.mapper.FieldMapper.doMerge( ~[elasticsearch-5.5.1.jar:5.5.1]
	at org.elasticsearch.index.mapper.NumberFieldMapper.doMerge( ~[elasticsearch-5.5.1.jar:5.5.1]
	at org.elasticsearch.index.mapper.FieldMapper.merge( ~[elasticsearch-5.5.1.jar:5.5.1]
	at org.elasticsearch.index.mapper.FieldMapper.merge( ~[elasticsearch-5.5.1.jar:5.5.1]
	at org.elasticsearch.index.mapper.DocumentParser.createDynamicUpdate( ~[elasticsearch-5.5.1.jar:5.5.1]
	at org.elasticsearch.index.mapper.DocumentParser.parseDocument( ~[elasticsearch-5.5.1.jar:5.5.1]
	at org.elasticsearch.index.mapper.DocumentMapper.parse( ~[elasticsearch-5.5.1.jar:5.5.1]
	at org.elasticsearch.index.shard.IndexShard.prepareIndex( ~[elasticsearch-5.5.1.jar:5.5.1]
	at org.elasticsearch.index.shard.IndexShard.prepareIndexOnPrimary( ~[elasticsearch-5.5.1.jar:5.5.1]
	at org.elasticsearch.action.bulk.TransportShardBulkAction.prepareIndexOperationOnPrimary( ~[elasticsearch-5.5.1.jar:5.5.1]
	at org.elasticsearch.action.bulk.TransportShardBulkAction.executeIndexRequestOnPrimary( ~[elasticsearch-5.5.1.jar:5.5.1]
	at org.elasticsearch.action.bulk.TransportShardBulkAction.executeBulkItemRequest( [elasticsearch-5.5.1.jar:5.5.1]
	at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary( [elasticsearch-5.5.1.jar:5.5.1]
	at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary( [elasticsearch-5.5.1.jar:5.5.1]
	at$PrimaryShardReference.perform( [elasticsearch-5.5.1.jar:5.5.1]
	at$PrimaryShardReference.perform( [elasticsearch-5.5.1.jar:5.5.1]
	at [elasticsearch-5.5.1.jar:5.5.1]
	at$AsyncPrimaryAction.onResponse( [elasticsearch-5.5.1.jar:5.5.1]
	at$AsyncPrimaryAction.onResponse( [elasticsearch-5.5.1.jar:5.5.1]
	at$1.onResponse( [elasticsearch-5.5.1.jar:5.5.1]
	at$1.onResponse( [elasticsearch-5.5.1.jar:5.5.1]
	at org.elasticsearch.index.shard.IndexShardOperationsLock.acquire( [elasticsearch-5.5.1.jar:5.5.1]
	at org.elasticsearch.index.shard.IndexShard.acquirePrimaryOperationLock( [elasticsearch-5.5.1.jar:5.5.1]
	at [elasticsearch-5.5.1.jar:5.5.1]
	at$400( [elasticsearch-5.5.1.jar:5.5.1]
	at$AsyncPrimaryAction.doRun( [elasticsearch-5.5.1.jar:5.5.1]
	at [elasticsearch-5.5.1.jar:5.5.1]
	at$PrimaryOperationTransportHandler.messageReceived( [elasticsearch-5.5.1.jar:5.5.1]
	at$PrimaryOperationTransportHandler.messageReceived( [elasticsearch-5.5.1.jar:5.5.1]
	at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived( [elasticsearch-5.5.1.jar:5.5.1]
	at org.elasticsearch.transport.TransportService$7.doRun( [elasticsearch-5.5.1.jar:5.5.1]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun( [elasticsearch-5.5.1.jar:5.5.1]
	at [elasticsearch-5.5.1.jar:5.5.1]
	at java.util.concurrent.ThreadPoolExecutor.runWorker( [?:1.8.0_131]
	at java.util.concurrent.ThreadPoolExecutor$ [?:1.8.0_131]
	at [?:1.8.0_131]

I also found a related post which was never resolved:
Since it was asked on the other thread, here is the output of my _cat/indices:

yellow open logstash-2017.08.31 6vam6SxUQJyc7aZEv_zNKA 5 1 0 0  955b  955b
yellow open .kibana             7yQFy8piRLm84MZb4rwYiw 1 1 2 1 9.3kb 9.3kb

What would be the right way to deal with it?

(Magnus Bäck) #9

You need to decide the desired type of the [additional_info][sections] and make sure that the mappings and the values you send are consistent. Right now the field has been mapped as long but you're sending a string value.
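To illustrate the conflict (a hypothetical minimal example, not the actual documents; the index and type names are made up): once the first document has dynamically mapped the field as long, a later document whose value can't be read as that type is rejected with a mapping conflict:

```
PUT conflict-demo/doc/1
{ "additional_info": { "sections": 42 } }

PUT conflict-demo/doc/2
{ "additional_info": { "sections": "not a number" } }
```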

(Kolmai) #10

Thanks, I can see my problem now.
Would it be the correct resolution to send a mapping update with the following request?

PUT my_reports
{
    "mappings": {
        "additional_info": {
            "properties": {
                "sections" : [{
                  "properties": [
                    {"type" : "text"}
                  ]
                }]
            }
        }
    }
}

(Magnus Bäck) #11

Mappings can't be updated so you have to reindex or create a new index. Otherwise you're probably fine except that there shouldn't be any arrays in the mapping definition. See the docs for examples.
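For reference, a corrected request might look like the sketch below. It has to target a fresh index, field definitions must be objects rather than arrays, and on ES 5.x the mapping sits under a document type (the type name doc here is an assumption, not taken from the thread):

```
PUT my_reports
{
  "mappings": {
    "doc": {
      "properties": {
        "additional_info": {
          "properties": {
            "sections": { "type": "text" }
          }
        }
      }
    }
  }
}
```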

(Kolmai) #12

I have managed to solve my issue by adding a mutate filter to the Logstash config:

filter {
  mutate {
    convert => {
      "[additional_info][sections]" => "string"
    }
  }
}
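For completeness, a sketch of the whole filter block with the json filter from the original config: the mutate has to come after the json filter, so the [additional_info][sections] field already exists when it is converted.

```
filter {
  json {
    source => "message"
  }
  mutate {
    convert => {
      "[additional_info][sections]" => "string"
    }
  }
}
```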

(system) #13

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.