I'm trying to import some very large JSON files (up to 80 GB per file) into Elasticsearch and have tried a couple of different approaches, but neither gives me an efficient working solution:
- Using the Bulk API - I received heap memory errors (even with export ES_HEAP_SIZE=4g), so I decided to write a bash script to break the JSON file up before sending it to Elasticsearch; see the marked solution here for more info. This works, but it is extremely slow for files this large. To continue with this approach I'd need to improve the bash script or write a program that splits the files more efficiently (a rough sketch of the chunking logic I have in mind is at the end of this post).
- Using Logstash - I have also tried indexing/updating via Logstash, with export LS_HEAP_SIZE=4g. It works for my medium-sized files (~4 GB) but not for the larger (80 GB) ones. When I try to send the largest files I get this error:
HTTP content length exceeded 104857600 bytes
My .conf is simply:
input {
  stdin {
    type => "stdin-type"
  }
  file {
    path => ["C:/path/file.json"]
    start_position => "beginning"
  }
}
filter {
  json {
    source => "message"
  }
}
output {
  elasticsearch {
    hosts => "localhost"
    index => "indexname"
    document_type => "subject"
    document_id => "%{id}"
    action => "update"
  }
}
Prior to running this .conf for the large files, I run a .conf that indexes every subject's id using action => "index". For all files I've aggregated the documents under their appropriate id; for the largest files there can be thousands of documents under a given id.
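To make that concrete, here is roughly what one id's entry would look like if the updates were written as bulk actions instead. This is a schematic Python sketch only: the field name "events" and the shape of the aggregated documents are made up for illustration, while the index, type, and id settings mirror the .conf above.

# Schematic sketch only: emit Bulk API update actions for documents that
# have been aggregated under a subject id. The field name "events" and the
# shape of the aggregated data are invented purely for illustration.
import json

def bulk_update_lines(aggregated, index="indexname", doc_type="subject"):
    # aggregated: dict mapping a subject id -> list of its documents
    for subject_id, docs in aggregated.items():
        # action/metadata line, mirroring document_id => "%{id}" and
        # action => "update" in the Logstash output above
        yield json.dumps({"update": {"_index": index, "_type": doc_type,
                                     "_id": subject_id}}) + "\n"
        # partial document to merge into the existing subject
        yield json.dumps({"doc": {"events": docs}}) + "\n"

# made-up usage example
sample = {"123": [{"value": 1}, {"value": 2}]}
for line in bulk_update_lines(sample):
    print(line, end="")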
From what I'm seeing online, it seems unrealistic to try to import a single ~80 GB file in one go. Can I use the filters in Logstash to break the file up into smaller chunks prior to indexing/updating? If not, could you suggest a way to improve my bash script, or a program to use for breaking these files up efficiently? One last note: I'm using Windows.
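For reference, the chunking logic I have in mind for the Bulk API route (first bullet above) looks roughly like this. It's a minimal Python sketch rather than my actual bash script, and it assumes the file is already in newline-delimited bulk format with every action line followed by a document line; the path, host, and chunk size are placeholders.

# Minimal sketch: stream a bulk-format (NDJSON) file and POST it to the
# Bulk API in chunks that stay under the size limit from the error above
# (104857600 bytes). Assumes action/metadata and document lines are already
# paired in the file; the path, host, and chunk size are placeholders.
import requests

BULK_URL = "http://localhost:9200/_bulk"
MAX_CHUNK_BYTES = 50 * 1024 * 1024  # stay well under ~100 MB per request
HEADERS = {"Content-Type": "application/x-ndjson"}

def send_chunk(lines):
    if not lines:
        return
    body = "".join(lines)  # each line already ends with "\n"
    resp = requests.post(BULK_URL, data=body.encode("utf-8"), headers=HEADERS)
    resp.raise_for_status()

def bulk_upload(path):
    chunk, size = [], 0
    with open(path, "r", encoding="utf-8") as f:
        for action_line in f:
            doc_line = next(f, "")  # assumes every action has a document line
            pair_size = len(action_line.encode("utf-8")) + len(doc_line.encode("utf-8"))
            if chunk and size + pair_size > MAX_CHUNK_BYTES:
                send_chunk(chunk)
                chunk, size = [], 0
            chunk.extend([action_line, doc_line])
            size += pair_size
    send_chunk(chunk)  # flush the final partial chunk

bulk_upload("C:/path/file_bulk.json")

The byte cap (rather than a fixed line count) is meant to keep each request under the limit even when individual documents vary a lot in size.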