I am planning to index 150 GB of JSON objects (200 million rows of
index+JSON object) using Elasticsearch.
The box I have is 250 GB with 8 GB of RAM.
I am planning to use the bulk endpoint.
I am also planning to increase the memory available to Elasticsearch by doing
set JAVA_OPTS=-Xmx2g -Xms2g
then split the files into chunks using Unix split and do the following:
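#!/usr/bin/perl
use strict;
use warnings;
# Post every chunk in input/ to the bulk endpoint, one file at a time.
my @files = glob("input/*");
foreach my $f (@files) {
    # Keep the command on one line; \@ stops Perl from interpolating an array.
    my $cmd = "curl -s -XPOST 'http://brigho.com:9200/_bulk' --data-binary \@$f";
    print "$cmd\n";
    # Run the command and warn if curl itself exits with a non-zero status.
    system($cmd) == 0 or warn "curl failed for $f (exit status $?)\n";
}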
Has anyone tried doing this? Will this work? Are the system configs
enough? Any pointers would be helpful.
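
On the splitting step: the bulk format pairs each action line with the source line that follows it (the "index+JSON object" rows mentioned above), so a chunk boundary that falls between the two would break the request. A minimal sketch of one way to avoid that, assuming every document takes exactly two lines: give split an even line count. The file name bulk.json, the temporary directory, the sample documents, and the tiny chunk size are all made up for illustration; a real run would use a far larger DOCS_PER_CHUNK.

```shell
#!/bin/sh
# Sketch: split a bulk file so action/source line pairs stay together.
# bulk.json, the temp directory and the chunk size are illustrative assumptions.
set -eu

DOCS_PER_CHUNK=2                          # tiny value so the demo is easy to inspect
LINES_PER_CHUNK=$((DOCS_PER_CHUNK * 2))   # two lines per document

workdir=$(mktemp -d)
mkdir "$workdir/input"

# Build a small sample bulk file: one action line, then one source line.
i=1
while [ "$i" -le 5 ]; do
    printf '{"index":{"_index":"demo","_id":"%s"}}\n' "$i"
    printf '{"value":%s}\n' "$i"
    i=$((i + 1))
done > "$workdir/bulk.json"

# An even line count per chunk never separates an action from its source.
split -l "$LINES_PER_CHUNK" "$workdir/bulk.json" "$workdir/input/chunk_"

chunk_count=$(ls "$workdir/input" | wc -l | tr -d ' ')
first_chunk_lines=$(wc -l < "$workdir/input/chunk_aa" | tr -d ' ')
echo "chunks: $chunk_count, lines in first chunk: $first_chunk_lines"
```

With five sample documents (ten lines) and four lines per chunk, this produces three chunk files, the last one holding the remaining two lines.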
First: ES is a distributed search engine, so it may be a good idea to set up
more than one node/machine if performance is not sufficient.
Then: 8 GB of RAM is not bad, but make sure you are on a 64-bit operating system
and 64-bit Java so that the full memory can be addressed.
To use your 8 GB of memory, do the following (leaving 2 GB to the operating system):
set JAVA_OPTS=%JAVA_OPTS% -Xmx6g -Xms6g
To get your data into ES you may want to consider the UDP bulk endpoint,
or use your Perl script (I am not into Perl, so I cannot say anything about
it). For Perl there is also an ES client available.
On Friday, 11 October 2013 01:31:30 UTC+2, gautam singhania wrote: