Is there a quicker way to import data to Elastic?

I have exported Elasticsearch indices using Logstash with the following pipeline configuration:

    - pipeline.id: export-process
      pipeline.workers: 4
      config.string: |
        input {
          elasticsearch {
            hosts => "http://elastic:80/elasticsearch/"
            user => "elastic"
            password => ""
            ssl => false
            index => "metricbeat-*"
            docinfo => true
            query => '{
                "query": {
                  "bool": {
                    "filter": {
                      "range": {
                          "@timestamp": {
                          "gte": "now-35m",
                          "lte": "now",
                          "format": "strict_date_optional_time||epoch_millis"
                          }
                      }
                    }
                  }
              }
            }'
          }
        }
        output {
          file {
            gzip => true
            path => "/usr/share/logstash/export/export_%{[@metadata][_index]}.json.gz"
          }
        }

Now I am trying to import it back into another instance. I have gunzipped the exported .json.gz file, and I am going over each line in the file and running:

    curl -s -XPOST http://1.2.3.4:9000/metricbeat/_doc/ -H "Content-Type: application/json" -d "$1"

where $1 is a single line from the JSON file. This method is very slow: I started the import of one index, which is 1.7 GB, and it is still running after 90 minutes. Is there a better way of doing this?

Hi John,

Are you able to use the _bulk API instead?
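
For reference, _bulk expects newline-delimited JSON where every document is preceded by an action line, so the payload would look something like this (index name and document fields here are placeholders):

    { "index": { "_index": "metricbeat" } }
    { "@timestamp": "2022-08-01T00:00:00Z", "some": "field" }
    { "index": { "_index": "metricbeat" } }
    { "@timestamp": "2022-08-01T00:00:30Z", "another": "field" }

and would be sent with something like:

    curl -s -XPOST http://1.2.3.4:9000/_bulk \
      -H "Content-Type: application/x-ndjson" \
      --data-binary @bulk.ndjson

--data-binary matters here, since -d strips the newlines the API relies on, and the file must end with a newline.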

Why not use Logstash with a file input and an elasticsearch input? Or even Filebeat?
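
On the import side that would be a file input feeding an elasticsearch output. A minimal sketch, with placeholder paths, host, and index name:

    input {
      file {
        # "read" mode treats each file as a finite batch rather than tailing it
        path => "/usr/share/logstash/export/*.json"
        mode => "read"
        codec => "json"
        sincedb_path => "/dev/null"
        # in read mode the default completed action deletes the file, so log instead
        file_completed_action => "log"
        file_completed_log_path => "/tmp/imported_files.log"
      }
    }
    output {
      elasticsearch {
        hosts => "http://1.2.3.4:9000"
        index => "metricbeat"
      }
    }

As far as I know, read mode can also process the gzipped export files directly, so you may not even need to unzip them first.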

Also, if your instances can communicate with each other, you could try a remote reindex, or maybe create a snapshot in a cloud repository and restore from the snapshot.
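
A remote reindex would be a single call on the destination cluster, roughly like this (hosts and credentials are placeholders, and the source host has to be allowed via reindex.remote.whitelist in the destination's elasticsearch.yml):

    curl -s -XPOST http://1.2.3.4:9000/_reindex \
      -H "Content-Type: application/json" -d '{
        "source": {
          "remote": {
            "host": "http://elastic:80",
            "username": "elastic",
            "password": "..."
          },
          "index": "metricbeat-*"
        },
        "dest": {
          "index": "metricbeat"
        }
      }'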


Hi @carly.richmond, when I try the bulk import I get this in the response:

    < Warning: 299 Elasticsearch-8.3.3-801fed82df74dbe537f89b71b098ccaff88d2c56 "Unsupported action: [stream]. Supported values are [create], [delete], [index], and [update]. Unsupported actions are currently accepted but will be rejected in a future version."
    < content-type: application/json;charset=utf-8
    < content-length: 329
    * HTTP error before end of send, stop sending
    <
    * Closing connection 0
    {"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"Malformed action/metadata line [1], expected START_OBJECT or END_OBJECT but found [VALUE_STRING]"}],"type":"illegal_argument_exception","reason":"Malformed action/metadata line [1], expected START_OBJECT or END_OBJECT but found [VALUE_STRING]"},"status":400}

So it looks like the format of the output is not what the bulk API expects. I also had to increase http.max_content_length, as the default of 100mb was too small.
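
I could probably massage the export into the expected format by adding an action line before each document and chunking the file to stay under the size limit. Something like this untested sketch, with the file and index names hardcoded for illustration:

    # prefix every document with a bulk action line
    awk '{ print "{\"index\":{\"_index\":\"metricbeat\"}}"; print }' export.json > bulk.ndjson
    # split into chunks; use an even line count so action/document pairs stay aligned
    split -l 10000 bulk.ndjson bulk_part_
    for f in bulk_part_*; do
      curl -s -XPOST http://1.2.3.4:9000/_bulk \
        -H "Content-Type: application/x-ndjson" \
        --data-binary @"$f"
    done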
Thanks for the link!

The instances I am exporting from are ephemeral, hence the need to harvest the data from them now so it can be imported at a later date. I take it you mean an elasticsearch output rather than an input? That might be an option; I will try it out.

Yes, as you've clarified, I think @leandrojmp's great suggestion is to use the file input and elasticsearch output plugins. I would recommend trying that approach rather than bulk, given the error above.

Let us know how you get on!
