When I run this command: curl -H 'Content-Type:application/json' -XPOST "localhost:9200/pacchetti3/doc/_bulk?pretty" --data-binary @C:\Users\Thebe\Desktop\singolopacchetto.json
I receive this error:
{
"error" : {
"root_cause" : [
{
"type" : "json_e_o_f_exception",
"reason" : "Unexpected end-of-input: expected close marker for Object (start marker at [Source: org.elasticsearch.transport.netty4.ByteBufStreamInput@3c861e6a; line: 1, column: 1])\n at [Source: org.elasticsearch.transport.netty4.ByteBufStreamInput@3c861e6a; line: 2, column: 3]"
}
],
"type" : "json_e_o_f_exception",
"reason" : "Unexpected end-of-input: expected close marker for Object (start marker at [Source: org.elasticsearch.transport.netty4.ByteBufStreamInput@3c861e6a; line: 1, column: 1])\n at [Source: org.elasticsearch.transport.netty4.ByteBufStreamInput@3c861e6a; line: 2, column: 3]"
},
"status" : 500
}
Each action header and each document must be on its own single line, the request body must end with a newline character, and the document should not contain _type, _score or _source fields. You may also run into problems because you have dots in your field names.
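As a small sketch of what that means (Python, with a made-up stand-in document since the real packet is much larger): each entry is an action header line followed by a document line, and the whole payload ends with a newline.

```python
import json

def bulk_entry(index, doc):
    """Build one bulk entry: action header line + document line, each ending in \n."""
    header = json.dumps({"index": {"_index": index}})
    body = json.dumps(doc)  # json.dumps emits no raw newlines, so the doc stays on one line
    return header + "\n" + body + "\n"

# Hypothetical document standing in for the real packet data
doc = {"layers": {"frame": {"frame_interface_id": "0"}}}
payload = bulk_entry("pacchetti3", doc)
# Without the final newline, Elasticsearch rejects the request
```

The payload can then be written to a file and sent with --data-binary exactly as in the curl command above.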
I followed the advice and formatted the JSON file as follows: {"index":{"_index":"pacchetti3"}} {"layers":{"frame":{"frame.interface_id":"0","frame.interface_id_tree":{ ... }}}}
However, the problem persists with a new error:
{
"error" : {
"root_cause" : [
{
"type" : "action_request_validation_exception",
"reason" : "Validation Failed: 1: no requests added;"
}
],
"type" : "action_request_validation_exception",
"reason" : "Validation Failed: 1: no requests added;"
},
"status" : 400
}
Thanks!
I did a test by formatting only a couple of lines of my JSON:
{"index":{"_index":"pacchetti4", "_type": "doc"}}
{"frame.interface_id": "0", "frame.interface_name": "any", "frame.encap_type": "25", "frame.time": "Apr 20, 2018 15:30:52.669797277 ora legale Europa occidentale", "frame.number": "1", "frame.len": "649", "frame.cap_len": "649", "frame.marked": "0", "frame.ignored": "0", "frame.protocols": "sll:ethertype:ip:tcp:http:json", "frame.coloring_rule.name": "HTTP", "frame.coloring_rule.string": "http || tcp.port == 80 || http2"}
{"index":{"_index":"pacchetti4", "_type": "doc"}}
{"sll.pkttype": "0", "sll.hatype": "772", "sll.halen": "6"}
{"index":{"_index":"pacchetti4", "_type": "doc"}}
{"ip.version": "4", "ip.hdr_len": "20", "ip.dsfield": "0x00000000", "ip.dsfield.dscp": "0", "ip.dsfield.ecn": "0", "ip.len": "633", "ip.id": "0x0000b60a", "ip.flags": "0x00000002", "ip.frag_offset": "0", "ip.ttl": "64", "ip.proto": "6", "ip.checksum": "0x00008472", "ip.checksum.status": "2", "ip.src": "127.0.0.1", "ip.addr": "127.0.0.1", "ip.src_host": "127.0.0.1", "ip.host": "127.0.0.1", "ip.dst": "127.0.0.1", "ip.dst_host": "127.0.0.1", "Source GeoIP: Unknown": "", "Destination GeoIP: Unknown": ""}
That finally loaded. Now that this problem is solved, I would like to know if there is a simple way to quickly format a much larger set of JSON documents (at the beginning of the topic I showed only one, but in reality there are about 400,000).
If you have the data formatted as one JSON object per line, you can use Logstash or one of the language clients to script the ingestion. You generally want to limit the size of each bulk request to around 5 MB or so, and then send multiple requests to Elasticsearch in parallel.
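A rough sketch of the batching part (Python, standard library only; the index name and batch limit are illustrative, and actually sending each batch to the _bulk endpoint — e.g. with the official client or curl, using Content-Type: application/x-ndjson — is left out):

```python
import json

MAX_BATCH_BYTES = 5 * 1024 * 1024  # ~5 MB per bulk request, per the advice above

def bulk_batches(docs, index, max_bytes=MAX_BATCH_BYTES):
    """Yield bulk request bodies, each at most max_bytes, from an iterable of dicts."""
    batch, size = [], 0
    for doc in docs:
        entry = json.dumps({"index": {"_index": index}}) + "\n" + json.dumps(doc) + "\n"
        entry_bytes = len(entry.encode("utf-8"))
        if batch and size + entry_bytes > max_bytes:
            # Current batch would overflow: emit it and start a fresh one
            yield "".join(batch)
            batch, size = [], 0
        batch.append(entry)
        size += entry_bytes
    if batch:
        yield "".join(batch)

# Each yielded body is a complete, newline-terminated _bulk payload; with ~400,000
# documents you would POST these bodies from several workers in parallel.
```

Each batch is already a valid NDJSON body, so the only remaining work is the HTTP POST per batch.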