We have an application that generates around 7000-10000 JSON messages
per second. Each message is around 2.6 KB. What best practices should
be followed at the Java API level so that both my application and
Elasticsearch scale well?
Right now my application and Elasticsearch reside on the same box. I
intend to use the Java Elasticsearch client via a node of type client, as
suggested in the documentation here: http://www.elasticsearch.org/guide/reference/java-api/client.html.
Since my application is multithreaded, I will share the client among the threads.
Is that OK?
For heavy writes to Elasticsearch, is the Bulk API the better choice?
Please suggest any other best practices I can include in my
implementation. I would like to scale to 13 nodes in a cluster soon.
Bulk is good indeed. -Xmx and the other JVM settings matter. If the workload is
write-heavy, relatively speaking, look at the index merging parameters.
The refresh interval can and should be high unless you really need near-real-time (NRT) search.
It may be best to wait until you hit issues, if you do; then you can provide
concrete info about what you are doing and others can give feedback.
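To make the bulk suggestion concrete, here is a minimal sketch of the batching side only. It is not the Elasticsearch API: `BulkBatcher`, its thresholds, and the `sink` callback are hypothetical names used for illustration, and the sink stands in for whatever code actually sends one bulk request.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Collects JSON documents and hands them to a sink in batches.
 * Hypothetical helper: the sink is a placeholder for the code that
 * would send one bulk request to Elasticsearch.
 */
final class BulkBatcher {
    private final int maxDocs;      // flush after this many documents
    private final long maxBytes;    // or after this many buffered bytes
    private final Consumer<List<String>> sink;
    private final List<String> buffer = new ArrayList<>();
    private long bufferedBytes = 0;

    BulkBatcher(int maxDocs, long maxBytes, Consumer<List<String>> sink) {
        if (maxDocs < 1 || maxBytes < 1) {
            throw new IllegalArgumentException("thresholds must be positive");
        }
        this.maxDocs = maxDocs;
        this.maxBytes = maxBytes;
        this.sink = sink;
    }

    /**
     * Adds one document and flushes when either threshold is reached.
     * Synchronized so several producer threads can share one batcher;
     * note that producers wait while the sink runs.
     */
    synchronized void add(String json) {
        buffer.add(json);
        bufferedBytes += json.getBytes(StandardCharsets.UTF_8).length;
        if (buffer.size() >= maxDocs || bufferedBytes >= maxBytes) {
            flush();
        }
    }

    /** Sends whatever is buffered; call on shutdown so the tail is not lost. */
    synchronized void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        List<String> batch = new ArrayList<>(buffer);
        buffer.clear();
        bufferedBytes = 0;
        sink.accept(batch);
    }
}
```

As a rough sizing example, at 10000 messages per second and about 2.6 KB each, a batch of 1000 documents is about 2.6 MB and would be sent roughly ten times per second. Those numbers are only a starting point to measure against, not a recommendation.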
On Tuesday, December 18, 2012 10:43:12 PM UTC-5, Meetu Maltiar wrote:
Hi,
We have an application that generates around 7000-10000 JSON messages
per second. Each message is around 2.6 KB. What best practices should
be followed at the Java API level so that both my application and
Elasticsearch scale well?
Right now my application and Elasticsearch reside on the same box. I
intend to use the Java Elasticsearch client via a node of type client, as
suggested in the documentation here: http://www.elasticsearch.org/guide/reference/java-api/client.html.
Since my application is multithreaded, I will share the client among the threads.
Is that OK?
For heavy writes to Elasticsearch, is the Bulk API the better choice?
Please suggest any other best practices I can include in my
implementation. I would like to scale to 13 nodes in a cluster soon.
I am going with your suggestion of using the Bulk API. I will look at the
a) JVM settings, b) index merging parameters, and c) refresh interval.
Right now I have a singleton node with a "node-client" that is
shared by all threads. Is this fine? I am trying to avoid creating a
client on each call; otherwise I would have to create a client for each
document to be indexed.
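The "create once, share across threads" idea can be sketched without any Elasticsearch classes. `SharedClient` and its `factory` argument are hypothetical names; in a real application the factory would be whatever code builds the node client, which is assumed here rather than shown.

```java
import java.util.function.Supplier;

/**
 * Lazily creates one instance of T and returns the same instance to
 * every caller. Hypothetical helper: T stands in for the client type.
 */
final class SharedClient<T> {
    private final Supplier<T> factory;
    private volatile T instance;   // volatile so all threads see the published instance

    SharedClient(Supplier<T> factory) {
        this.factory = factory;
    }

    /** Returns the shared instance, creating it on first use only. */
    T get() {
        T local = instance;
        if (local == null) {
            synchronized (this) {
                local = instance;
                if (local == null) {
                    local = factory.get();   // runs at most once
                    instance = local;
                }
            }
        }
        return local;
    }
}
```

Each worker thread would call `get()` instead of building its own client, so the cost of client creation is paid once rather than per document.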
Nice, I will share the client across threads. BTW, I am using Scala as the language,
with the Elasticsearch Java API, and Akka for parallelizing things.
I am not using Spring at the moment, though I may do so after some time.
Thanks for the GitHub link; it looks great to use.