After working with Elasticsearch for 3 months, I have gained some knowledge of it,
but I still have some questions. First I will describe my system environment,
and then I will list my questions. First: my working environment
We have two Elasticsearch servers [one is master and the other is a client]
running on our server machines.
Elasticsearch version: 0.20.5
JVM parameters for all servers: -Xms5g -Xmx5g
Java version: 1.6.0
We have several million records [each record has more than 50 fields]
in fixed-length delimited files, and we have implemented some Spring 3 based
applications that load the data into Elasticsearch.
We are now also implementing applications for searching.
We have tested the applications on our servers to measure the performance of
the index operations [tested with 6 to 8 shards and 0 replicas], and we have
some statistics about them. For example, loading 1 million records with
8 fields took our application between 9 and 13 minutes.
We index asynchronously, in bulks of 10,000 records at a time:
bulkRequest.execute(new ActionListener<BulkResponse>() {
    @Override
    public void onResponse(BulkResponse bulkResponse) {
        // display the time taken for the index operation
    }

    @Override
    public void onFailure(Throwable e) {
        // display a message if the index operation failed
    }
});
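For context, a minimal sketch of how such a 10,000-record bulk could be assembled with the 0.20.x Java client. The index name "myindex", the type "mytype", and the `records` iterable are placeholders, not from the original post:

```java
// Assumes an org.elasticsearch.client.Client "client", an
// ActionListener<BulkResponse> "listener" like the one above,
// and "records" holding one JSON source string per record.
BulkRequestBuilder bulkRequest = client.prepareBulk();
int count = 0;
for (String json : records) {
    // queue one index request per record
    bulkRequest.add(client.prepareIndex("myindex", "mytype").setSource(json));
    if (++count % 10000 == 0) {
        bulkRequest.execute(listener);      // send this batch asynchronously
        bulkRequest = client.prepareBulk(); // start a fresh batch
    }
}
if (count % 10000 != 0) {
    bulkRequest.execute(listener);          // flush the final partial batch
}
```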
Two: Now my questions are:
What is the average time Elasticsearch needs to index 1 million records?
From a performance point of view, what bulk size should we use for
indexing such a huge volume of data?
What is the maximum capacity of one shard while loading data at the same speed?
As good practice, how many shards should we use in a single index,
and how many nodes per server?
Any suggestions for boosting the performance of bulk loads?
This depends on your records and your ES config. You can connect your
client to a number of powerful ES nodes with plenty of network bandwidth,
CPU, and fast disks, and ES will index faster than you can send docs.
You have to find the "sweet spot" of your current system. Start with
1000 docs per bulk and with the number of concurrent bulks equal to the
number of CPU cores in your system, and run the bulk indexing for at least
35 or 40 minutes (so that large segment merging starts). Repeat this from
scratch, but with a higher number of docs per bulk request. Increase it to
the point where the docs-per-second rate no longer improves. The typical
range I have observed on a single-node ES cluster with current PC server
hardware is anything between 1000 and 10000 docs per second, out of the box.
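For comparison, the numbers above translate into a docs-per-second rate: 1 million docs in 9 to 13 minutes is roughly 1280 to 1850 docs per second, i.e. at the low end of that range. A tiny helper for tracking the rate while searching for the sweet spot (a sketch, not from the original post):

```java
public class ThroughputCheck {
    // docs-per-second rate of a bulk run, using integer arithmetic
    static long docsPerSecond(long totalDocs, long elapsedMillis) {
        return totalDocs * 1000L / elapsedMillis;
    }

    public static void main(String[] args) {
        // the original poster's runs: 1 million docs in 9 and in 13 minutes
        System.out.println(docsPerSecond(1000000L, 9L * 60 * 1000));  // 1851 docs/s
        System.out.println(docsPerSecond(1000000L, 13L * 60 * 1000)); // 1282 docs/s
    }
}
```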
Always set up one ES node per server. The number of shards per node
should roughly not exceed the number of CPU cores.
Use a fast disk subsystem (e.g. SSD RAID0). Use the latest Java 7.
Disable refresh, and disable replicas.
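Disabling refresh and replicas for the duration of the load can be done through the update-settings API; a sketch with the 0.20.x Java client, where "myindex" is a placeholder and the values after the load are whatever defaults you want to restore:

```java
// Assumes an org.elasticsearch.client.Client "client".
// Before the bulk load: no periodic refresh, no replicas.
client.admin().indices().prepareUpdateSettings("myindex")
        .setSettings(ImmutableSettings.settingsBuilder()
                .put("index.refresh_interval", "-1")
                .put("index.number_of_replicas", 0))
        .execute().actionGet();

// After the bulk load: restore refresh and replicas.
client.admin().indices().prepareUpdateSettings("myindex")
        .setSettings(ImmutableSettings.settingsBuilder()
                .put("index.refresh_interval", "1s")
                .put("index.number_of_replicas", 1))
        .execute().actionGet();
```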
Jörg
On 09.05.13 10:09, rafi wrote: