JSON import error in Elasticsearch with curl command

Hi, I have a problem loading a JSON file into Elasticsearch with a curl command.
The JSON file is here:
https://drive.google.com/file/d/13nCXdIY1n096SSWcL36TEtqkGTVhn28o/view?usp=sharing

When I launch this command:
curl -H 'Content-Type:application/json' -XPOST "localhost:9200/pacchetti3/doc/_bulk?pretty" --data-binary @C:\Users\Thebe\Desktop\singolopacchetto.json

I get this error:
{
  "error" : {
    "root_cause" : [
      {
        "type" : "json_e_o_f_exception",
        "reason" : "Unexpected end-of-input: expected close marker for Object (start marker at [Source: org.elasticsearch.transport.netty4.ByteBufStreamInput@3c861e6a; line: 1, column: 1])\n at [Source: org.elasticsearch.transport.netty4.ByteBufStreamInput@3c861e6a; line: 2, column: 3]"
      }
    ],
    "type" : "json_e_o_f_exception",
    "reason" : "Unexpected end-of-input: expected close marker for Object (start marker at [Source: org.elasticsearch.transport.netty4.ByteBufStreamInput@3c861e6a; line: 1, column: 1])\n at [Source: org.elasticsearch.transport.netty4.ByteBufStreamInput@3c861e6a; line: 2, column: 3]"
  },
  "status" : 500
}

I checked whether the JSON is well formed using https://jsonformatter.curiousconcept.com/ and it validates.

The version of Elasticsearch and Kibana I'm using is 5.6.9.
What is the problem, and how can I solve it?

As you are using the bulk API, have you formatted the file according to the requirements of this API?

Yes, I tried to format it according to the bulk API (at least I think so). The result is this:

{"index":{"_index":"pacchetti3"}}
{
  "_type": "pcap_file",
  "_score": null,
  "_source": {
    "layers": {
      "frame": {
        "frame.interface_id": "0",
        "frame.interface_id_tree": {
          "frame.interface_name": "any"
        },
        "frame.encap_type": "25",
             ....
             ....
        }
      }
    }
  }
}

This time the error is:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "illegal_argument_exception",
        "reason" : "Malformed action/metadata line [3], expected START_OBJECT but found [VALUE_STRING]"
      }
    ],
    "type" : "illegal_argument_exception",
    "reason" : "Malformed action/metadata line [3], expected START_OBJECT but found [VALUE_STRING]"
  },
  "status" : 400
}

What am I doing wrong?

Each header and document must be on a single line and the document should not contain _type, _score or _source fields. You may also run into problems as you have dots in your field names.
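If it helps, here is a minimal Python sketch of that conversion (an illustration only, not tested against your data: it assumes the export is a JSON array of packet objects shaped like the one you posted, and both file names are placeholders):

import json

# Load the Wireshark-style export (file name taken from the first post).
with open("singolopacchetto.json", encoding="utf-8") as f:
    packets = json.load(f)  # assumed: a JSON array of packet objects
if isinstance(packets, dict):
    packets = [packets]  # tolerate a single top-level object

with open("bulk.json", "w", encoding="utf-8") as out:
    for packet in packets:
        # Keep only the payload; drop the _type/_score/_source wrapper.
        doc = packet.get("_source", packet)
        out.write('{"index":{"_index":"pacchetti3"}}\n')
        # json.dumps emits a single line, as the bulk API expects.
        out.write(json.dumps(doc) + "\n")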

I followed the advice and formatted the JSON file as follows:
{"index":{"_index":"pacchetti3"}} {"layers":{"frame":{"frame.interface_id":"0","frame.interface_id_tree":{ ... }}}}

However the problem persists with a new error:
{
  "error" : {
    "root_cause" : [
      {
        "type" : "action_request_validation_exception",
        "reason" : "Validation Failed: 1: no requests added;"
      }
    ],
    "type" : "action_request_validation_exception",
    "reason" : "Validation Failed: 1: no requests added;"
  },
  "status" : 400
}

That does not seem to be the format specified in the documentation. The file should look something like this, with a newline after each line:

{ "index" : { "_index" : "pacchetti3", "_type" : "doc" } }
{ "field1" : "value1" }
{ "index" : { "_index" : "pacchetti3", "_type" : "doc" } }
{ "field1" : "value2" }

Thanks!
I did a test by formatting only a couple of lines of my JSON:
{"index":{"_index":"pacchetti4", "_type": "doc"}}
{"frame.interface_id": "0", "frame.interface_name": "any", "frame.encap_type": "25", "frame.time": "Apr 20, 2018 15:30:52.669797277 ora legale Europa occidentale", "frame.number": "1", "frame.len": "649", "frame.cap_len": "649", "frame.marked": "0", "frame.ignored": "0", "frame.protocols": "sll:ethertype:ip:tcp:http:json", "frame.coloring_rule.name": "HTTP", "frame.coloring_rule.string": "http || tcp.port == 80 || http2"}
{"index":{"_index":"pacchetti4", "_type": "doc"}}
{"sll.pkttype": "0", "sll.hatype": "772", "sll.halen": "6"}
{"index":{"_index":"pacchetti4", "_type": "doc"}}
{"ip.version": "4", "ip.hdr_len": "20", "ip.dsfield": "0x00000000", "ip.dsfield.dscp": "0", "ip.dsfield.ecn": "0", "ip.len": "633", "ip.id": "0x0000b60a", "ip.flags": "0x00000002", "ip.frag_offset": "0", "ip.ttl": "64", "ip.proto": "6", "ip.checksum": "0x00008472", "ip.checksum.status": "2", "ip.src": "127.0.0.1", "ip.addr": "127.0.0.1", "ip.src_host": "127.0.0.1", "ip.host": "127.0.0.1", "ip.dst": "127.0.0.1", "ip.dst_host": "127.0.0.1", "Source GeoIP: Unknown": "", "Destination GeoIP: Unknown": ""}

It finally loaded. Now that this problem is solved, I would like to know whether there is a simple way to quickly format a much larger JSON file (at the beginning of the topic I only showed one packet, but in reality there are about 400,000).

How can I do this?

If you have the data formatted as a JSON object per line, you can use Logstash or one of the language clients to script the ingestion. You generally want to limit the size of each bulk request to around 5 MB, and then send multiple requests to Elasticsearch in parallel.
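As a rough sketch of the scripted route (assumptions: the export is one big JSON array, the index and type names follow the test above, the file name is a placeholder, and it uses the third-party requests library):

import json
import requests  # third-party HTTP client: pip install requests

ES_URL = "http://localhost:9200/_bulk"  # local node, as in this thread
MAX_BYTES = 5 * 1024 * 1024             # keep each bulk body around 5 MB

def send(lines):
    # One bulk request; note that an HTTP 200 response can still carry
    # per-item failures, so check resp.json()["errors"] in real use.
    resp = requests.post(ES_URL, data="".join(lines).encode("utf-8"),
                         headers={"Content-Type": "application/json"})
    resp.raise_for_status()

with open("packets.json", encoding="utf-8") as f:  # placeholder file name
    packets = json.load(f)  # assumed: a JSON array of ~400,000 packets

buf, size = [], 0
for packet in packets:
    doc = packet.get("_source", packet)  # unwrap, as discussed above
    pair = ('{"index":{"_index":"pacchetti4","_type":"doc"}}\n'
            + json.dumps(doc) + "\n")
    buf.append(pair)
    size += len(pair)  # rough byte count (exact only for ASCII)
    if size >= MAX_BYTES:
        send(buf)
        buf, size = [], 0
if buf:
    send(buf)  # flush the final partial batch

From there you could parallelize the send() calls with a thread pool, but the sequential version is the simplest starting point.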
