Elasticsearch Bulk API problems

Hello,
I have some JSON files that I would like to send to Elasticsearch so that they can be viewed in Kibana.

To do this I'm using a curl command based on the Bulk API documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html

curl -s -H "Content-Type: application/x-ndjson" -XPOST localhost:9200/test/_bulk\n --data-binary "@test.json"

But unfortunately I keep getting this error:

{"error":{"root_cause":[{"type":"mapper_parsing_exception","reason":"failed to parse"}],"type":"mapper_parsing_exception","reason":"failed to parse","caused_by":{"type":"illegal_argument_exception","reason":"Malformed content, found extra data after parsing: START_OBJECT"}},"status":400} 

When searching on the error, it seems to have something to do with my JSON being badly formed. Which seems odd, as it comes from using tshark to convert a pcap file to JSON, where the command I'm using is:

tshark -r capture.pcap -T ek > packets.json 

My tshark version is:

tshark -version
TShark (Wireshark) 3.2.1 (v3.2.1-0-gbf38a67724d0)

Here is an excerpt from one of the JSON files generated by tshark:

{"index":{"_index":"packets-2020-04-07","_type":"doc"}}
{"timestamp":"1586268390258","layers":{"frame":{"frame_frame_encap_type":"1","frame_frame_time":"2020-04-07T14:06:30.258778000Z","frame_frame_offset_shift":"0.000000000","frame_frame_time_epoch":"1586268390.258778000","frame_frame_time_delta":"0.000000000","frame_frame_time_delta_displayed":"0.000000000","frame_frame_time_relative":"0.000000000","frame_frame_number":"1","frame_frame_len":"60","frame_frame_cap_len":"60","frame_frame_marked":false,"frame_frame_ignored":false,"frame_frame_protocols":"eth:ethertype:ip:tcp"},"eth":{"eth_eth_dst":"50:7b:9d:63:55:a9","eth_eth_dst_resolved":"LCFCHeFe_63:55:a9","eth_eth_dst_oui":"5274525","eth_eth_dst_oui_resolved":"LCFC(HeFei) Electronics Technology co., ltd","eth_eth_addr":"50:7b:9d:63:55:a9","eth_eth_addr_resolved":"LCFCHeFe_63:55:a9","eth_eth_addr_oui":"5274525","eth_eth_addr_oui_resolved":"LCFC(HeFei) Electronics Technology co., ltd","eth_eth_dst_lg":false,"eth_eth_lg":false,"eth_eth_dst_ig":false,"eth_eth_ig":false,"eth_eth_src":"a0:8e:78:68:f2:91","eth_eth_src_resolved":"Sagemcom_68:f2:91","eth_eth_src_oui":"10522232","eth_eth_src_oui_resolved":"Sagemcom Broadband SAS","eth_eth_addr":"a0:8e:78:68:f2:91","eth_eth_addr_resolved":"Sagemcom_68:f2:91","eth_eth_addr_oui":"10522232","eth_eth_addr_oui_resolved":"Sagemcom Broadband SAS","eth_eth_src_lg":false,"eth_eth_lg":false,"eth_eth_src_ig":false,"eth_eth_ig":false,"eth_eth_type":"0x00000800","eth_eth_padding":"00:00:00:00:00:00"},"ip":{"ip_ip_version":"4","ip_ip_hdr_len":"20","ip_ip_dsfield":"0x00000000","ip_ip_dsfield_dscp":"0","ip_ip_dsfield_ecn":"0","ip_ip_len":"40","ip_ip_id":"0x00003895","ip_ip_flags":"0x00000000","ip_ip_flags_rb":false,"ip_ip_flags_df":false,"ip_ip_flags_mf":false,"ip_ip_frag_offset":"0","ip_ip_ttl":"123","ip_ip_proto":"6","ip_ip_checksum":"0x00000580","ip_ip_checksum_status":"2","ip_ip_src":"83.255.237.16","ip_ip_addr":["83.255.237.16","192.168.0.3"],"ip_ip_src_host":"83.255.237.16","ip_ip_host":["83.255.237.16","192.168.0.3"],"ip_ip_dst":"192.168.0.3","ip_ip_dst_host":"192.168.0.3"},"tcp":{"tcp_tcp_srcport":"443","tcp_tcp_dstport":"52030","tcp_tcp_port":["443","52030"],"tcp_tcp_stream":"0","tcp_tcp_len":"0","tcp_tcp_seq":"1","tcp_tcp_seq_raw":"3775193797","tcp_tcp_nxtseq":"2","tcp_tcp_ack":"1","tcp_tcp_ack_raw":"3625584346","tcp_tcp_hdr_len":"20","tcp_tcp_flags":"0x00000011","tcp_tcp_flags_res":false,"tcp_tcp_flags_ns":false,"tcp_tcp_flags_cwr":false,"tcp_tcp_flags_ecn":false,"tcp_tcp_flags_urg":false,"tcp_tcp_flags_ack":true,"tcp_tcp_flags_push":false,"tcp_tcp_flags_reset":false,"tcp_tcp_flags_syn":false,"tcp_tcp_flags_fin":true,"_ws_expert":{"tcp_tcp_connection_fin":null,"_ws_expert__ws_expert_message":"Connection finish (FIN)","_ws_expert__ws_expert_severity":"2097152","_ws_expert__ws_expert_group":"33554432"},"tcp_tcp_flags_str":"·······A···F","tcp_tcp_window_size_value":"306","tcp_tcp_window_size":"306","tcp_tcp_window_size_scalefactor":"-1","tcp_tcp_checksum":"0x00003d2d","tcp_tcp_checksum_status":"2","tcp_tcp_urgent_pointer":"0","text":"Timestamps","tcp_tcp_time_relative":"0.000000000","tcp_tcp_time_delta":"0.000000000"}}}
{"index":{"_index":"packets-2020-04-07","_type":"doc"}}
{"timestamp":"1586268390258","layers":{"frame":{"frame_frame_encap_type":"1","frame_frame_time":"2020-04-07T14:06:30.258831000Z","frame_frame_offset_shift":"0.000000000","frame_frame_time_epoch":"1586268390.258831000","frame_frame_time_delta":"0.000053000","frame_frame_time_delta_displayed":"0.000053000","frame_frame_time_relative":"0.000053000","frame_frame_number":"2","frame_frame_len":"54","frame_frame_cap_len":"54","frame_frame_marked":false,"frame_frame_ignored":false,"frame_frame_protocols":"eth:ethertype:ip:tcp"},"eth":{"eth_eth_dst":"a0:8e:78:68:f2:91","eth_eth_dst_resolved":"Sagemcom_68:f2:91","eth_eth_dst_oui":"10522232","eth_eth_dst_oui_resolved":"Sagemcom Broadband SAS","eth_eth_addr":"a0:8e:78:68:f2:91","eth_eth_addr_resolved":"Sagemcom_68:f2:91","eth_eth_addr_oui":"10522232","eth_eth_addr_oui_resolved":"Sagemcom Broadband SAS","eth_eth_dst_lg":false,"eth_eth_lg":false,"eth_eth_dst_ig":false,"eth_eth_ig":false,"eth_eth_src":"50:7b:9d:63:55:a9","eth_eth_src_resolved":"LCFCHeFe_63:55:a9","eth_eth_src_oui":"5274525","eth_eth_src_oui_resolved":"LCFC(HeFei) Electronics Technology co., ltd","eth_eth_addr":"50:7b:9d:63:55:a9","eth_eth_addr_resolved":"LCFCHeFe_63:55:a9","eth_eth_addr_oui":"5274525","eth_eth_addr_oui_resolved":"LCFC(HeFei) Electronics Technology co., ltd","eth_eth_src_lg":false,"eth_eth_lg":false,"eth_eth_src_ig":false,"eth_eth_ig":false,"eth_eth_type":"0x00000800"},"ip":{"ip_ip_version":"4","ip_ip_hdr_len":"20","ip_ip_dsfield":"0x00000000","ip_ip_dsfield_dscp":"0","ip_ip_dsfield_ecn":"0","ip_ip_len":"40","ip_ip_id":"0x0000eca7","ip_ip_flags":"0x00004000","ip_ip_flags_rb":false,"ip_ip_flags_df":true,"ip_ip_flags_mf":false,"ip_ip_frag_offset":"0","ip_ip_ttl":"128","ip_ip_proto":"6","ip_ip_checksum":"0x00000000","ip_ip_checksum_status":"2","ip_ip_src":"192.168.0.3","ip_ip_addr":["192.168.0.3","83.255.237.16"],"ip_ip_src_host":"192.168.0.3","ip_ip_host":["192.168.0.3","83.255.237.16"],"ip_ip_dst":"83.255.237.16","ip_ip_dst_host":"83.255.237.16"},"tcp":{"tcp_tcp_srcport":"52030","tcp_tcp_dstport":"443","tcp_tcp_port":["52030","443"],"tcp_tcp_stream":"0","tcp_tcp_len":"0","tcp_tcp_seq":"1","tcp_tcp_seq_raw":"3625584346","tcp_tcp_nxtseq":"1","tcp_tcp_ack":"2","tcp_tcp_ack_raw":"3775193798","tcp_tcp_hdr_len":"20","tcp_tcp_flags":"0x00000010","tcp_tcp_flags_res":false,"tcp_tcp_flags_ns":false,"tcp_tcp_flags_cwr":false,"tcp_tcp_flags_ecn":false,"tcp_tcp_flags_urg":false,"tcp_tcp_flags_ack":true,"tcp_tcp_flags_push":false,"tcp_tcp_flags_reset":false,"tcp_tcp_flags_syn":false,"tcp_tcp_flags_fin":false,"tcp_tcp_flags_str":"·······A····","tcp_tcp_window_size_value":"6178","tcp_tcp_window_size":"6178","tcp_tcp_window_size_scalefactor":"-1","tcp_tcp_checksum":"0x000001d6","tcp_tcp_checksum_status":"2","tcp_tcp_urgent_pointer":"0","tcp_tcp_analysis":null,"tcp_tcp_analysis_acks_frame":"1","tcp_tcp_analysis_ack_rtt":"0.000053000","text":"Timestamps","tcp_tcp_time_relative":"0.000053000","tcp_tcp_time_delta":"0.000053000"}}}

I have also tried to convert different pcap files using the same method and still get the same result.
At this point I do not know if the fault lies in the pcap file, the JSON file, or the curl command.

The error indicates your JSON file is malformed indeed. Can you share the whole file?

Is this a typo? The URL should be localhost:9200/test/_bulk, no trailing \n.

Of course.
How would you like me to go about sharing the whole file, as it is very large?

I tried inserting the excerpt you shared and got a different error about malformed JSON:

$ curl -s -H "Content-Type: application/x-ndjson" -XPOST localhost:9200/test/_bulk --data-binary "@test.json" | jq .
{
  "took": 13,
  "errors": true,
  "items": [
    {
      "index": {
        "_index": "packets-2020-04-07",
        "_type": "doc",
        "_id": "xH7PWHEBkp5pgMeq5I9D",
        "status": 400,
        "error": {
          "type": "mapper_parsing_exception",
          "reason": "failed to parse",
          "caused_by": {
            "type": "json_parse_exception",
            "reason": "Duplicate field 'eth_eth_addr'\n at [Source: org.elasticsearch.common.bytes.AbstractBytesReference$MarkSupportingStreamInputWrapper@19a15b5; line: 1, column: 1150]"
          }
        }
      }
    },
    {
      "index": {
        "_index": "packets-2020-04-07",
        "_type": "doc",
        "_id": "xX7PWHEBkp5pgMeq5I9D",
        "status": 400,
        "error": {
          "type": "mapper_parsing_exception",
          "reason": "failed to parse",
          "caused_by": {
            "type": "json_parse_exception",
            "reason": "Duplicate field 'eth_eth_addr'\n at [Source: org.elasticsearch.common.bytes.AbstractBytesReference$MarkSupportingStreamInputWrapper@40a71f8e; line: 1, column: 1130]"
          }
        }
      }
    }
  ]
}

It's not a typo; it's just that when I ran the command:

curl -s -H "Content-Type: application/x-ndjson" -XPOST localhost:9200/test/_bulk --data-binary "@test.json"

I got an error saying that the _bulk endpoint needs to be followed by a \n.

But now, when I want it to happen, it no longer does.

So now my command looks like this:

curl -s -H "Content-Type: application/x-ndjson" -XPOST localhost:9200/test/_bulk --data-binary "@test.json"

And I get this error:

{"took":3,"errors":true,"items":[{"index":{"_index":"packets-2020-04-07","_type":"doc","_id":"34rSWHEBv6GDe8EVwAl8","status":400,"error":{"type":"mapper_parsing_exception","reason":"failed to parse","caused_by":{"type":"json_parse_exception","reason":"Duplicate field 'eth_eth_addr'\n at [Source: org.elasticsearch.common.bytes.AbstractBytesReference$MarkSupportingStreamInputWrapper@31b5ffb4; line: 1, column: 1150]"}}}},{"index":{"_index":"packets-2020-04-07","_type":"doc","_id":"4IrSWHEBv6GDe8EVwAl8","status":400,"error":{"type":"mapper_parsing_exception","reason":"failed to parse","caused_by":{"type":"json_parse_exception","reason":"Duplicate field 'eth_eth_addr'\n at [Source: org.elasticsearch.common.bytes.AbstractBytesReference$MarkSupportingStreamInputWrapper@c6bdc44; line: 1, column: 1130]"}}}}]}

I truly have no idea what happened, considering that as far as I know I repeated all of the steps.

When I use your command:

curl -s -H "Content-Type: application/x-ndjson" -XPOST localhost:9200/test/_bulk --data-binary "@test.json" | jq .

I get this as the response:

{
  "took": 3,
  "errors": true,
  "items": [
    {
      "index": {
        "_index": "packets-2020-04-07",
        "_type": "doc",
        "_id": "4YrWWHEBv6GDe8EVEwkp",
        "status": 400,
        "error": {
          "type": "mapper_parsing_exception",
          "reason": "failed to parse",
          "caused_by": {
            "type": "json_parse_exception",
            "reason": "Duplicate field 'eth_eth_addr'\n at [Source: org.elasticsearch.common.bytes.AbstractBytesReference$MarkSupportingStreamInputWrapper@587b250c; line: 1, column: 1150]"
          }
        }
      }
    },
    {
      "index": {
        "_index": "packets-2020-04-07",
        "_type": "doc",
        "_id": "4orWWHEBv6GDe8EVEwkp",
        "status": 400,
        "error": {
          "type": "mapper_parsing_exception",
          "reason": "failed to parse",
          "caused_by": {
            "type": "json_parse_exception",
            "reason": "Duplicate field 'eth_eth_addr'\n at [Source: org.elasticsearch.common.bytes.AbstractBytesReference$MarkSupportingStreamInputWrapper@4f25e307; line: 1, column: 1130]"
          }
        }
      }
    }
  ]
}

This error message looks correct to me: the JSON does indeed contain a duplicate field called eth_eth_addr.
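
If you want to confirm that yourself, here is a minimal sketch (assuming Python 3; the test.json file name and the script itself are just placeholders, not something from the Elasticsearch tooling) that parses each NDJSON line and reports any object containing a repeated key:

import json
import sys
from collections import Counter

def find_duplicates(pairs):
    # Called for every JSON object, including nested ones such as "eth";
    # complain if any key appears more than once in that object.
    counts = Counter(key for key, _ in pairs)
    dupes = [key for key, n in counts.items() if n > 1]
    if dupes:
        raise ValueError("duplicate keys: " + ", ".join(dupes))
    return dict(pairs)

with open(sys.argv[1] if len(sys.argv) > 1 else "test.json") as f:
    for lineno, line in enumerate(f, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            json.loads(line, object_pairs_hook=find_duplicates)
        except ValueError as err:
            print("line", lineno, ":", err)

Running it over your excerpt should flag eth_eth_addr on both document lines, matching the column numbers in the Elasticsearch error.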

OK, that's what I thought caused the problem as well. Thank you very much for the assistance.

Do you have any idea what causes this? This is just normal Ethernet traffic I collected through Wireshark and then converted to JSON using tshark.

I'm asking because I want/need to handle multiple different large JSON files, and who knows which events have duplicate fields and which ones do not.
That makes it very unattractive, from a future-proofing perspective, to write a Python script that handles each identified problem field, since new ones might appear further down the line, and a script that checks every single field for duplicates would be very slow.

I would also like to be able to handle real-time traffic with as little delay as possible, which also makes the Python script idea a poor choice.

Or will I have to create my own tshark conversion template to make sure these duplicate fields are handled during the conversion process?

I think you'll need to take this up with the Wireshark people. tshark definitely used to have problems with duplicate fields that were apparently fixed, but assuming you're using the latest version it looks like another case has crept in.
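
In the meantime, if you need a stopgap while you wait for a fix in tshark, something along these lines might do (a rough sketch, assuming Python 3; dedupe.py is just an example name). Python's json module keeps only the last value when a key is repeated, so a plain parse-and-reserialize round trip removes the duplicates that Elasticsearch's stricter parser rejects:

import json
import sys

# Read tshark's NDJSON on stdin and write cleaned NDJSON on stdout, e.g.:
#   tshark -r capture.pcap -T ek | python3 dedupe.py > packets.json
for line in sys.stdin:
    line = line.strip()
    if not line:
        continue
    # json.loads keeps the last occurrence of a duplicate key, so the
    # re-serialized document no longer contains eth_eth_addr twice.
    doc = json.loads(line)
    sys.stdout.write(json.dumps(doc) + "\n")

Because it works one line at a time it should also be able to keep up with a live capture piped straight out of tshark, though I haven't tested that.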

Thank you
