High volume Indexing of Documents

Meetu_Maltiar · December 19, 2012, 3:43am

Hi,

We have an application that generates around 7,000-10,000 JSON messages
per second. Each message is around 2.6 KB. What best practices should be
followed at the Java API level so that both my application and
Elasticsearch scale well?
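
To put those numbers in perspective, here is a back-of-the-envelope sketch of the raw ingest rate they imply. It only multiplies the figures quoted above; the class and method names are made up for illustration, 1 KB is assumed to mean 1024 bytes, and index overhead and replication are not accounted for.

```java
// Rough ingest-rate estimate from message rate and average message size.
// Illustrative only: names are hypothetical and 1 KB is taken as 1024 bytes.
final class IngestEstimate {

    // Raw payload bytes produced per second.
    static double bytesPerSecond(int messagesPerSecond, double messageSizeKb) {
        return messagesPerSecond * messageSizeKb * 1024.0;
    }

    // Converts bytes to mebibytes (1 MiB = 1024 * 1024 bytes).
    static double toMiB(double bytes) {
        return bytes / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        double low = toMiB(bytesPerSecond(7000, 2.6));
        double high = toMiB(bytesPerSecond(10000, 2.6));
        System.out.printf("low rate:  %.1f MiB/s%n", low);
        System.out.printf("high rate: %.1f MiB/s%n", high);
        // Per-hour volume of raw JSON, before any indexing overhead.
        System.out.printf("per hour:  %.0f to %.0f GiB%n",
                low * 3600 / 1024, high * 3600 / 1024);
    }
}
```

That works out to roughly 18 to 25 MiB of raw JSON per second, which is why batching the writes is worth considering at this volume.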

Right now my application and Elasticsearch reside on the same box. I
intend to use the Java Elasticsearch client via a node of type client, as
suggested in the documentation here: http://www.elasticsearch.org/guide/reference/java-api/client.html.
Since my application is multithreaded, I will share the client among the
threads. Is that OK?

For heavy write loads in Elasticsearch, is using the Bulk API better?

Please suggest any other best practices I can include in my
implementation. I would like to scale to 13 nodes in a cluster soon.

Regards,
Meetu Maltiar

--

otisg · December 19, 2012, 5:17am

Hi,

Bulk is good indeed. -Xmx and other JVM settings matter. If this is
write-heavy, relatively speaking, any index merging parameters should be
looked at. The refresh interval can/should be high unless you really need
NRT (near-real-time) search.
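
As a rough illustration of the batching idea behind bulk indexing, the sketch below collects JSON documents and hands them off in fixed-size batches. It deliberately avoids the Elasticsearch API: `BatchingBuffer` and its `sink` callback are hypothetical names, and the sink merely stands in for wherever the application would send one bulk request. The batch size of 1000 is an arbitrary assumption, not a recommendation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Collects documents and passes them on in batches instead of one at a time.
// Hypothetical sketch: the sink stands in for sending a single bulk request.
final class BatchingBuffer {
    private final int maxBatchSize;
    private final Consumer<List<String>> sink;
    private List<String> pending = new ArrayList<>();

    BatchingBuffer(int maxBatchSize, Consumer<List<String>> sink) {
        if (maxBatchSize <= 0) {
            throw new IllegalArgumentException("maxBatchSize must be positive");
        }
        this.maxBatchSize = maxBatchSize;
        this.sink = sink;
    }

    // Adds one JSON document; a full batch is handed to the sink immediately.
    synchronized void add(String jsonDocument) {
        pending.add(jsonDocument);
        if (pending.size() >= maxBatchSize) {
            flush();
        }
    }

    // Sends whatever is pending, e.g. on shutdown, so nothing is left behind.
    synchronized void flush() {
        if (pending.isEmpty()) {
            return;
        }
        List<String> batch = pending;
        pending = new ArrayList<>();
        sink.accept(batch);
    }

    public static void main(String[] args) {
        List<Integer> batchSizes = new ArrayList<>();
        BatchingBuffer buffer =
                new BatchingBuffer(1000, batch -> batchSizes.add(batch.size()));
        for (int i = 0; i < 2500; i++) {
            buffer.add("{\"id\":" + i + "}");
        }
        buffer.flush(); // send the final partial batch
        System.out.println(batchSizes); // prints [1000, 1000, 500]
    }
}
```

In this sketch the sink runs while the buffer's lock is held, so a slow sink blocks other producers. That keeps the example short; a real application would likely hand batches to a separate sender thread and pick the batch size by measurement.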

It may be best to wait until/if you hit issues; then you can provide
concrete info about what you are doing and others can provide feedback.

Otis

ELASTICSEARCH Performance Monitoring - Sematext Monitoring | Infrastructure Monitoring Service

--

Meetu_Maltiar · December 19, 2012, 6:15am

Thanks a lot, Otis,

I am going with your suggestion of using the Bulk API. I will look at the
a) JVM settings, b) index merging parameters, c) refresh interval.

Right now I have a singleton node and a "node-client" that is
shared by all threads. Is this fine? I am trying to avoid creating a
client on each call; otherwise I would have to create a client for each
document to be indexed.

Regards,
Meetu Maltiar

--

dadoonet · December 19, 2012, 6:32am

Yes. Share the same client across all threads.
BTW, if you are using Spring, you can have a look at this: GitHub - dadoonet/spring-elasticsearch: Spring factories for elasticsearch
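
A minimal sketch of what sharing one client across threads can look like, using only the Java standard library. `FakeClient` is a hypothetical stand-in for the real client (it just counts documents), and the sketch assumes, as stated above, that the real client is safe to share between threads.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

final class SharedClientDemo {

    // Hypothetical stand-in for an expensive, thread-safe client object.
    static final class FakeClient {
        static final AtomicInteger INSTANCES = new AtomicInteger();
        private final AtomicLong indexed = new AtomicLong();

        FakeClient() {
            INSTANCES.incrementAndGet();
        }

        void index(String jsonDocument) {
            indexed.incrementAndGet();
        }

        long indexedCount() {
            return indexed.get();
        }
    }

    // Holder idiom: the JVM creates the client once, lazily and thread-safely.
    private static final class Holder {
        static final FakeClient CLIENT = new FakeClient();
    }

    static FakeClient client() {
        return Holder.CLIENT;
    }

    // Runs several workers that all use the same client; returns its total count.
    static long runWorkers(int threads, int docsPerThread) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < docsPerThread; i++) {
                    client().index("{\"n\":" + i + "}");
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return client().indexedCount();
    }

    public static void main(String[] args) {
        System.out.println("documents indexed: " + runWorkers(8, 1000));
        System.out.println("client instances: " + FakeClient.INSTANCES.get());
    }
}
```

Every worker goes through `client()`, so only one instance is ever constructed no matter how many threads index documents.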

--
David
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

--

Meetu_Maltiar · December 19, 2012, 7:37am

Thanks, David,

Nice, I will share the client across threads. BTW, I am using Scala as
the language, with the Elasticsearch Java API and Akka for parallelizing
things. I am not using Spring at the moment, though I may do so after
some time. Thanks for the GitHub link; it looks great to use.

Meetu Maltiar
Twitter: @meetumaltiar

--
