Indexing 150 GB of data


(gautam singhania) #1

Hi

I am planning to index 150 GB of JSON objects (200 million rows of
index + JSON object) using Elasticsearch.

The box I have is 250 GB with 8 GB of RAM.
I am planning to use the bulk endpoint.

I am also planning to increase the memory available to Elasticsearch by setting

set JAVA_OPTS=-Xmx2g -Xms2g

Then I will split the files into chunks using Unix split and do the following:

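#!/usr/bin/perl
use strict;
use warnings;

# POST every chunk in input/ to the bulk endpoint, one file per request.
my @files = glob("input/*");
foreach my $f (@files) {
    # List form avoids shell quoting problems; "@<file>" tells curl to read the body from the file.
    my @cmd = ('curl', '-s', '-XPOST', 'http://brigho.com:9200/_bulk',
               '--data-binary', "\@$f");
    print "@cmd\n";
    # Run curl and stop at the first failed request.
    system(@cmd) == 0 or die "curl failed for $f: $?\n";
}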

Has anyone tried doing this? Will this work? Are the system configs
enough? Any pointers would be helpful.
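
A minimal sketch of the splitting step, assuming the source file is already in bulk format, where each document takes two lines (an action line followed by the JSON source line), so every chunk has to contain an even number of lines. The function name split_bulk, the chunk_ prefix and the 5-letter suffix are illustrative choices, not something from the post.

```shell
#!/bin/sh
# split_bulk SRC OUT_DIR LINES_PER_CHUNK
# Splits a bulk-format file into chunks without separating an action
# line from the source line that follows it.
split_bulk() {
    src=$1
    out_dir=$2
    lines=$3

    # An odd line count would cut an action/source pair in half.
    if [ $((lines % 2)) -ne 0 ]; then
        echo "lines per chunk must be even, got $lines" >&2
        return 1
    fi

    mkdir -p "$out_dir" || return 1
    # -a 5 widens the suffix (aaaaa, aaaab, ...) so that split does not
    # run out of names on large inputs; the default of 2 allows 676 files.
    split -a 5 -l "$lines" "$src" "$out_dir/chunk_"
}
```

For example, `split_bulk all_docs.json input 10000` (all_docs.json being a placeholder name) would write input/chunk_aaaaa, input/chunk_aaaab and so on, which the glob("input/*") loop can then pick up.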

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Hendrik) #2

It depends.

First: ES is a distributed search engine, so it may be a good idea to set up
more than one node/machine if performance is not sufficient.
Then: 8 GB of RAM is not bad, but make sure you are on a 64-bit operating system
and 64-bit Java so that the full memory can be addressed.

To use your 8 GB of memory, set the following (leaving 2 GB to the operating system):
set JAVA_OPTS=%JAVA_OPTS% -Xmx6g -Xms6g
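
The 64-bit requirement can be checked from a Unix shell before raising the heap. This is only a sketch, assuming a HotSpot or OpenJDK JVM whose `java -version` banner contains the string "64-Bit"; other JVMs may word it differently, and the helper name is_64bit_jvm is made up for illustration.

```shell
# is_64bit_jvm: reads a `java -version` banner on stdin and succeeds
# only if the banner reports a 64-bit VM.
is_64bit_jvm() {
    grep -q '64-Bit'
}

# Typical use (java -version writes to stderr, hence 2>&1):
#   java -version 2>&1 | is_64bit_jvm || echo "JVM does not report 64-bit" >&2
#   uname -m   # prints e.g. x86_64 on a 64-bit Linux kernel
```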

To get your data into ES, you may want to consider the UDP bulk endpoint
(http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-bulk-udp.html)
or use your Perl scripts (I am not into Perl, so I cannot say anything about
them). For Perl there is also an ES client available:
http://www.elasticsearch.org/guide/en/elasticsearch/client/perl-api/current/index.html

On Friday, 11 October 2013 at 01:31:30 UTC+2, gautam singhania wrote (see #1):



(system) #3