Help parsing nested JSON from VirusTotal

(Kolmai) #1

Hello everyone,
I'm trying to get started with the Elastic Stack, and my first attempt, indexing a complex JSON document into ES with the following config, has failed.

input {
  beats {
    port => 5044
    tags => "beats"
  }
}

filter {
  json {
    source => "message"
  }
}

output {
  elasticsearch { hosts => ["localhost:9200"] }
  stdout { codec => rubydebug }
}

Perhaps some extra tailoring is needed to make it work?
I've noticed that my problem is possibly caused by the data structure, in particular two objects:

  • "imports": {"data": ["data"]}
  • "sections": [ [ ] ]

Any help here would be much appreciated!

This is an example of JSON I'd like to store:

  {
    "vhash": "04505666234sfdfs2nz25z17z",
    "submission_names": [ ... ],
    "scan_date": "2017-08-23 19:13:05",
    "first_seen": "2017-08-23 19:13:05",
    "total": 65,
    "additional_info": {
      "magic": "PE32 executable for MS Windows (console) Intel 80386 32-bit",
      "sigcheck": {
        "link date": "5:12 AM 8/18/2017"
      },
      "exiftool": {
        "MIMEType": "application/octet-stream",
        "Subsystem": "Windows command line",
        "MachineType": "Intel 386 or later, and compatibles",
        "TimeStamp": "2017:08:18 05:12:57+01:00",
        "FileType": "Win32 EXE",
        "PEType": "PE32",
        "CodeSize": "12288",
        "LinkerVersion": "8.0",
        "FileTypeExtension": "exe",
        "InitializedDataSize": "0",
        "SubsystemVersion": "5.0",
        "EntryPoint": "0x1840",
        "OSVersion": "4.0",
        "ImageVersion": "0.0",
        "UninitializedDataSize": "0"
      },
      "trid": "Win32 Dynamic Link Library (generic) (43.5%)\nWin32 Executable (generic) (29.8%)\nGeneric Win/DOS Executable (13.2%)\nDOS Executable Generic (13.2%)",
      "pe-imphash": "f77945ec4c575514afd3ce14a41d99e0",
      "pe-timestamp": 1503029577,
      "imports": {
        "KERNEL32.dll": [ ... ],
        "WS2_32.dll": [ ... ],
        "USER32.dll": [ ... ]
      },
      "pe-entry-point": 6208,
      "sections": [ ... ],
      "pe-machine-type": 332
    },
    "size": 442368,
    "scan_id": "c06e7ad4ae7749678c213ceb734cb0a64f2d47e464198351c76ceca3363522b6-1503515585",
    "times_submitted": 1,
    "harmless_votes": 0,
    "verbose_msg": "Scan finished, information embedded",
    "sha256": "c06e7ad4ae7749678c213ceb734cb0a64f2d47e464198351c76ceca3363522b6",
    "type": "Win32 EXE",
    "scans": {
      "Bkav": {
        "detected": true,
        "version": "",
        "result": "HW32.Packed.F89F",
        "update": "20170823"
      },
      ...
    },
    "tags": [ ... ],
    "authentihash": "1492aee71ea44f0969f6ef91b4c854b692630d15735a71b5b3206e1b87890d1c",
    "unique_sources": 1,
    "positives": 30,
    "ssdeep": "12288:ZA2Gi/n0uNIj5icepynKmUuj2cq6kfRTiA:ZA2Gisz5iHZ9nXJT",
    "md5": "0067b99af76ce96087ef17d73e773f5b",
    "permalink": "",
    "sha1": "a0b0fe57a5c6ff0f3359d8d21519f136615a7843",
    "resource": "0067b99af76ce96087ef17d73e773f5b",
    "response_code": 1,
    "community_reputation": 0,
    "malicious_votes": 0,
    "ITW_urls": [ ... ],
    "last_seen": "2017-08-23 19:13:05"
  }

(Magnus Bäck) #2

And in what way is it failing?

(Kolmai) #3

Thanks for the fast response!
I think my data is able to reach ES, but that's not the case with Kibana.
For example, when I omit the aforementioned elements ("imports": {"data": ["data"]}
and "sections": [ [ ] ]), my data reaches Kibana successfully and I can create the index for it.

(Magnus Bäck) #4

I don't see why Kibana would have any issues with such a document. Have you verified that the document is stored in ES? Use the ES APIs, not Kibana.
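For example (assuming the default daily Logstash index name; adjust the date to match), something like:

```
GET logstash-2017.08.31/_search
{
  "query": { "match_all": {} }
}
```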

(Kolmai) #5

I just used the API to find my document, and it seems that I'm getting 0 hits.
So I guess documents like these aren't getting into ES in the first place.
What would you recommend to do in this case?

(Magnus Bäck) #6

Is Logstash getting the event in the first place? Replace the elasticsearch output with a stdout { codec => rubydebug } output to find out. Have you looked in the Logstash log for clues?
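That is, a minimal debugging output section along these lines (the elasticsearch output removed for the test):

```
output {
  stdout { codec => rubydebug }
}
```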

(Kolmai) #7

I did that. The entire document was printed beautifully on the screen with no errors, yet when I query ES I still get 0 hits.
Oh, here is the error from the logs:

:exception=>#<LogStash::Json::ParserError: Unexpected character ('K' (code 75)): was expecting comma to separate OBJECT entries

(Kolmai) #8

I was looking carefully through the logs to understand what happened, and it seems I provided irrelevant information in my previous reply. There are actually no exceptions in the Logstash logs at all; the problem I'm facing appears to be with ES. So this time I looked at the logs in the right place and found that it may be due to a content length limitation. Here is the relevant log:

[2017-08-31T22:07:55,794][DEBUG][o.e.a.b.TransportShardBulkAction] [_pSNtfS] [logstash-2017.08.31][2] failed to execute bulk item (index) BulkShardRequest [[logstash-2017.08.31][2]] containing [index {[logstash-2017.08.31][Win32 EXE][AV45sVHbEo4x7KEmXLzZ], source[n/a, actual length: [19.4kb], max length: 2kb]}]
java.lang.IllegalArgumentException: mapper [additional_info.sections] of different type, current_type [long], merged_type [text]
	at org.elasticsearch.index.mapper.FieldMapper.doMerge( ~[elasticsearch-5.5.1.jar:5.5.1]
	at org.elasticsearch.index.mapper.NumberFieldMapper.doMerge( ~[elasticsearch-5.5.1.jar:5.5.1]
	at org.elasticsearch.index.mapper.FieldMapper.merge( ~[elasticsearch-5.5.1.jar:5.5.1]
	at org.elasticsearch.index.mapper.FieldMapper.merge( ~[elasticsearch-5.5.1.jar:5.5.1]
	at org.elasticsearch.index.mapper.DocumentParser.createDynamicUpdate( ~[elasticsearch-5.5.1.jar:5.5.1]
	at org.elasticsearch.index.mapper.DocumentParser.parseDocument( ~[elasticsearch-5.5.1.jar:5.5.1]
	at org.elasticsearch.index.mapper.DocumentMapper.parse( ~[elasticsearch-5.5.1.jar:5.5.1]
	at org.elasticsearch.index.shard.IndexShard.prepareIndex( ~[elasticsearch-5.5.1.jar:5.5.1]
	at org.elasticsearch.index.shard.IndexShard.prepareIndexOnPrimary( ~[elasticsearch-5.5.1.jar:5.5.1]
	at org.elasticsearch.action.bulk.TransportShardBulkAction.prepareIndexOperationOnPrimary( ~[elasticsearch-5.5.1.jar:5.5.1]
	at org.elasticsearch.action.bulk.TransportShardBulkAction.executeIndexRequestOnPrimary( ~[elasticsearch-5.5.1.jar:5.5.1]
	at org.elasticsearch.action.bulk.TransportShardBulkAction.executeBulkItemRequest( [elasticsearch-5.5.1.jar:5.5.1]
	at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary( [elasticsearch-5.5.1.jar:5.5.1]
	at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary( [elasticsearch-5.5.1.jar:5.5.1]
	at$PrimaryShardReference.perform( [elasticsearch-5.5.1.jar:5.5.1]
	at$PrimaryShardReference.perform( [elasticsearch-5.5.1.jar:5.5.1]
	at [elasticsearch-5.5.1.jar:5.5.1]
	at$AsyncPrimaryAction.onResponse( [elasticsearch-5.5.1.jar:5.5.1]
	at$AsyncPrimaryAction.onResponse( [elasticsearch-5.5.1.jar:5.5.1]
	at$1.onResponse( [elasticsearch-5.5.1.jar:5.5.1]
	at$1.onResponse( [elasticsearch-5.5.1.jar:5.5.1]
	at org.elasticsearch.index.shard.IndexShardOperationsLock.acquire( [elasticsearch-5.5.1.jar:5.5.1]
	at org.elasticsearch.index.shard.IndexShard.acquirePrimaryOperationLock( [elasticsearch-5.5.1.jar:5.5.1]
	at [elasticsearch-5.5.1.jar:5.5.1]
	at$400( [elasticsearch-5.5.1.jar:5.5.1]
	at$AsyncPrimaryAction.doRun( [elasticsearch-5.5.1.jar:5.5.1]
	at [elasticsearch-5.5.1.jar:5.5.1]
	at$PrimaryOperationTransportHandler.messageReceived( [elasticsearch-5.5.1.jar:5.5.1]
	at$PrimaryOperationTransportHandler.messageReceived( [elasticsearch-5.5.1.jar:5.5.1]
	at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived( [elasticsearch-5.5.1.jar:5.5.1]
	at org.elasticsearch.transport.TransportService$7.doRun( [elasticsearch-5.5.1.jar:5.5.1]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun( [elasticsearch-5.5.1.jar:5.5.1]
	at [elasticsearch-5.5.1.jar:5.5.1]
	at java.util.concurrent.ThreadPoolExecutor.runWorker( [?:1.8.0_131]
	at java.util.concurrent.ThreadPoolExecutor$ [?:1.8.0_131]
	at [?:1.8.0_131]

I also found a related post which was never resolved:
Since it was asked on the other thread, here is the output of my _cat/indices:

yellow open logstash-2017.08.31 6vam6SxUQJyc7aZEv_zNKA 5 1 0 0  955b  955b
yellow open .kibana             7yQFy8piRLm84MZb4rwYiw 1 1 2 1 9.3kb 9.3kb

What would be the right way to deal with it?

(Magnus Bäck) #9

You need to decide the desired type of the [additional_info][sections] and make sure that the mappings and the values you send are consistent. Right now the field has been mapped as long but you're sending a string value.
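To illustrate the conflict (a hypothetical minimal example, not the actual documents; the index and type names are made up): once the first document has dynamically mapped the field as long, a later document whose value can't be read as that type is rejected with a mapping conflict:

```
PUT conflict-demo/doc/1
{ "additional_info": { "sections": 42 } }

PUT conflict-demo/doc/2
{ "additional_info": { "sections": "not a number" } }
```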

(Kolmai) #10

Thanks, I can see my problem now.
Would it be the correct resolution to send a mapping update with the following request?

PUT my_reports
{
    "mappings": {
        "additional_info": {
            "properties": {
                "sections" : [{
                  "properties": [
                    {"type" : "text"}
                  ]
                }]
            }
        }
    }
}

(Magnus Bäck) #11

Mappings can't be updated so you have to reindex or create a new index. Otherwise you're probably fine except that there shouldn't be any arrays in the mapping definition. See the docs for examples.
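For reference, a corrected request might look like the sketch below. It has to target a fresh index, field definitions must be objects rather than arrays, and on ES 5.x the mapping sits under a document type (the type name doc here is an assumption, not taken from the thread):

```
PUT my_reports
{
  "mappings": {
    "doc": {
      "properties": {
        "additional_info": {
          "properties": {
            "sections": { "type": "text" }
          }
        }
      }
    }
  }
}
```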

(Kolmai) #12

I have managed to solve my issue by adding a mutate filter to the Logstash config:

filter {
  mutate {
    convert => {
      "[additional_info][sections]" => "string"
    }
  }
}
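For completeness, a sketch of the whole filter block with the json filter from the original config: the mutate has to come after the json filter, so the [additional_info][sections] field already exists when it is converted.

```
filter {
  json {
    source => "message"
  }
  mutate {
    convert => {
      "[additional_info][sections]" => "string"
    }
  }
}
```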

(system) #13

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.