Export a huge dataset (10+ million entries) from Kibana / Logstash?

First of all, I'm fairly new to the Elastic Stack!
I use Elastic APM to monitor an application.
I have to export the collected dataset, about 10+ million entries.

I already tried setting Kibana's xpack.reporting.csv.maxSizeBytes as well as Elasticsearch's http.max_content_length to their limits, but I still can't export everything.

So clearly there must be a better way to export such a huge amount of data. I'm trying to get my head around Elastic APM and how it stores data on the Elasticsearch server.
Is there a way to retrieve the APM data directly from the Elasticsearch server (maybe using Logstash) so that it is in the same form as in Kibana's Discover section?

Hi and welcome to the forum :wave:

You could try using https://github.com/elasticsearch-dump/elasticsearch-dump


Thanks for your fast reply.
Sure, I could use the tool you mentioned, but it only allows exporting/importing between Elasticsearch instances.

What I am trying to accomplish is to export 10+ million entries from Kibana, ideally as CSV. I already raised Kibana's maximum CSV export size to its limit, but now the limiting factor is the Elasticsearch server's HTTP body size, which is about 7 GB. The export is bigger than 7 GB.

Is there a way to export the data collected by APM and aggregate it in the form the Kibana UI shows in the Discover section?
I guess I have to introduce Logstash to read the entries one by one, since getting them all in one request does not work. But I have no clue how to merge the different indices (APM creates several indices on the Elasticsearch server, or am I wrong?). Do you have any idea? Maybe there is an easier way to export the data without introducing Logstash?

You could also use the scroll API and write your CSV programmatically.
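A minimal sketch of the scroll approach in Python, using only the standard library. It assumes an unauthenticated cluster at http://localhost:9200 and an index pattern such as `apm-*`; APM index names vary by version (newer versions write to data streams such as `traces-apm*`), so check what your cluster actually has. Because a wildcard or comma-separated list in the URL searches several indices in one request, this also takes care of "merging" the APM indices. The column list in the `__main__` part is hypothetical.

```python
import csv
import json
import urllib.request

# Assumption: unauthenticated cluster at this URL (add auth headers if needed)
ES_URL = "http://localhost:9200"


def _request(method, path, body):
    """Send a JSON body to Elasticsearch and return the decoded JSON reply."""
    req = urllib.request.Request(
        ES_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def scroll_hits(index, page_size=1000, keep_alive="5m"):
    """Yield every hit in `index` (a name, comma-separated list or wildcard
    pattern) one page at a time via the scroll API."""
    page = _request("POST", f"/{index}/_search?scroll={keep_alive}",
                    {"size": page_size, "sort": ["_doc"]})
    scroll_id = page.get("_scroll_id")
    try:
        while page["hits"]["hits"]:
            yield from page["hits"]["hits"]
            page = _request("POST", "/_search/scroll",
                            {"scroll": keep_alive, "scroll_id": scroll_id})
            scroll_id = page.get("_scroll_id", scroll_id)
    finally:
        # Release the server-side scroll context instead of waiting for it to expire
        if scroll_id:
            _request("DELETE", "/_search/scroll", {"scroll_id": scroll_id})


def flatten(source, prefix=""):
    """Turn nested _source objects into dotted keys: {"a": {"b": 1}} -> {"a.b": 1}."""
    flat = {}
    for key, value in source.items():
        if isinstance(value, dict):
            flat.update(flatten(value, f"{prefix}{key}."))
        else:
            flat[f"{prefix}{key}"] = value
    return flat


def write_csv(hits, columns, out):
    """Stream hits into `out` as CSV; fields missing from a document stay empty."""
    writer = csv.DictWriter(out, fieldnames=columns, restval="", extrasaction="ignore")
    writer.writeheader()
    count = 0
    for hit in hits:
        row = flatten(hit.get("_source", {}))
        row["_index"] = hit.get("_index", "")
        row["_id"] = hit.get("_id", "")
        writer.writerow(row)
        count += 1
    return count


if __name__ == "__main__":
    # Hypothetical columns and index pattern; list the fields you need
    columns = ["_index", "_id", "@timestamp", "service.name", "transaction.name"]
    with open("apm-export.csv", "w", newline="", encoding="utf-8") as out:
        total = write_csv(scroll_hits("apm-*"), columns, out)
    print(f"wrote {total} rows")
```

Rows are written as each page arrives, so memory use stays flat regardless of the total size. On recent Elasticsearch versions the docs recommend `search_after` with a point in time (PIT) over scroll for deep pagination; the overall loop would look the same.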

Alternatively, export to JSON with elasticsearch-dump and convert that to CSV.
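If you go the elasticsearch-dump route, a file dump (for example with `--type=data --output=dump.json`) is, as far as I know, written as one JSON document per line with `_index`, `_id` and `_source` keys; verify that against your own dump. The sketch below assumes that shape and converts it to CSV in two streaming passes: the first discovers every flattened field name, the second writes the rows, so a 10M-document file is never held in memory. File names are hypothetical.

```python
import csv
import json


def flatten(source, prefix=""):
    """Turn nested _source objects into dotted keys: {"a": {"b": 1}} -> {"a.b": 1}."""
    flat = {}
    for key, value in source.items():
        if isinstance(value, dict):
            flat.update(flatten(value, f"{prefix}{key}."))
        else:
            flat[f"{prefix}{key}"] = value
    return flat


def _docs(path):
    """Yield one parsed document per non-empty line of an NDJSON dump."""
    with open(path, encoding="utf-8") as src:
        for line in src:
            if line.strip():
                yield json.loads(line)


def discover_columns(path):
    """Pass 1: collect every flattened _source field seen anywhere in the dump."""
    fields = set()
    for doc in _docs(path):
        fields.update(flatten(doc.get("_source", {})))
    return ["_index", "_id"] + sorted(fields - {"_index", "_id"})


def ndjson_to_csv(in_path, out_path):
    """Pass 2: write one CSV row per document; absent fields become empty cells."""
    columns = discover_columns(in_path)
    rows = 0
    with open(out_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=columns, restval="")
        writer.writeheader()
        for doc in _docs(in_path):
            row = flatten(doc.get("_source", {}))
            row["_index"] = doc.get("_index", "")
            row["_id"] = doc.get("_id", "")
            writer.writerow(row)
            rows += 1
    return rows


if __name__ == "__main__":
    # Hypothetical file names
    print(ndjson_to_csv("dump.json", "dump.csv"), "rows written")
```

The extra pass costs one more read of the file but guarantees a single consistent header even when documents from different APM indices carry different fields.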

Why do you want the end result to be in CSV anyway? What are you planning to do with the exported data?

You can also use Logstash to export large data sets to CSV.

Here is a sample Logstash config. Change the index to the index(es) you want and provide the proper column headings.

This uses the sample kibana_sample_data_flights index.

```
input {
  elasticsearch {
    hosts => "http://localhost:9200"
    # user => "elastic"
    # password => "changeme"
    index => "kibana_sample_data_flights"
    # query => '{ "query": { "query_string": { "query": "*" } } }'
    size => 1000
    scroll => "5m"
    # docinfo stores _index/_type/_id under [@metadata], not as top-level
    # fields, so the "_id", "_index", "_type" and "_score" columns below may
    # come out empty; the commented stdout output shows where they end up
    docinfo => true
  }
}

output {
  csv {
    path => "/Users/sbrown/Downloads/output.csv"
    fields => [
      "AvgTicketPrice", "Cancelled", "Carrier", "Dest", "DestAirportID",
      "DestCityName", "DestCountry", "[DestLocation][lat]", "[DestLocation][lon]",
      "DestRegion", "DestWeather", "DistanceKilometers", "DistanceMiles",
      "FlightDelay", "FlightDelayMin", "FlightDelayType", "FlightNum",
      "FlightTimeHour", "FlightTimeMin", "Origin", "OriginAirportID",
      "OriginCityName", "OriginCountry", "[OriginLocation][lat]", "[OriginLocation][lon]",
      "OriginRegion", "OriginWeather", "_id", "_index", "_score", "_type",
      "dayOfWeek", "hour_of_day", "timestamp"
    ]
  }

  # stdout {
  #   codec => rubydebug { metadata => true }
  # }
}
```
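Typing out the `fields` list by hand gets tedious for APM documents, which have many nested fields. A small hypothetical helper (not part of Logstash) can generate it from one sample document, using bare names for top-level keys and bracket syntax such as `[DestLocation][lat]` for nested ones, as in the config above. It assumes an unauthenticated cluster at http://localhost:9200. A single sample only lists the fields that document happens to have, and metadata columns such as `_id` are not in `_source`, so treat the output as a starting point.

```python
import json
import urllib.request

# Assumption: unauthenticated local cluster; adjust URL and auth as needed
ES_URL = "http://localhost:9200"


def field_refs(source, path=()):
    """List Logstash field references for every leaf of a _source document.

    Top-level keys stay bare ("AvgTicketPrice"); nested keys use bracket
    syntax ("[DestLocation][lat]").
    """
    refs = []
    for key, value in source.items():
        parts = path + (key,)
        if isinstance(value, dict):
            refs.extend(field_refs(value, parts))
        elif len(parts) == 1:
            refs.append(key)
        else:
            refs.append("".join(f"[{p}]" for p in parts))
    return refs


def fields_setting(source):
    """Render a `fields => [...]` line for the csv output plugin."""
    return "fields => " + json.dumps(field_refs(source))


if __name__ == "__main__":
    index = "kibana_sample_data_flights"  # change to your index or pattern
    with urllib.request.urlopen(f"{ES_URL}/{index}/_search?size=1") as resp:
        sample = json.load(resp)["hits"]["hits"][0]["_source"]
    print(fields_setting(sample))
```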

Also, I agree with @felixbarny: what do you plan to do with the data? If you plan to import it into another tool, you may be able to do that directly with Logstash or another tool.
