River puts too much load on only one node

I'm using Elasticsearch 1.5, which uses rivers to ingest data. I have 3 nodes in my cluster. The river that ingests a large amount of data puts too much load on one specific node. I want to divert this utilization to another node. Is there any way to balance this load, or to redirect it entirely onto another node in my cluster?

This is one of the reasons why rivers were deprecated and largely replaced by Logstash and other external processes. You should really look to upgrade as 1.5 is very, very old.

Yes, you are right. But for now I don't have the option to upgrade the whole cluster and set up a new one. Can you advise any other approach?

By the way, thanks for the quick reply.

I have not touched rivers in years, so I am not sure, but I do not think there is any way to balance the load, as it is always a single node that does the processing at any given time.

Can you predict the behaviour if I simply set `node.data: false` in the config file of my desired node?
Will this solve my problem, or might it ruin my cluster, or cause some other issue?
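For reference, the setting I mean would go in `elasticsearch.yml` like this (just a sketch of what I'm considering):

```yaml
# elasticsearch.yml on the node I want to relieve (sketch only).
# With node.data set to false, the node stops holding any shards.
node.data: false
```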

I have no idea. Which river are you using? If it is the JDBC river I believe there is a stand-alone version that you could run outside the cluster in order to get better balance. You might also consider Logstash.

Yeah, I'm using the JDBC river.

Have a look at the jdbc input plugin of Logstash, then.
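As a rough sketch, a pipeline using the jdbc input could look like the following. The driver path, connection string, credentials, and query are placeholders you would adapt to your database:

```
# Sketch of a Logstash pipeline: poll a SQL table and index it.
# All connection details below are hypothetical placeholders.
input {
  jdbc {
    jdbc_driver_library => "/path/to/mysql-connector-java.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/mydb"
    jdbc_user => "user"
    jdbc_password => "password"
    schedule => "* * * * *"   # run the query every minute
    statement => "SELECT * FROM products WHERE updated_at > :sql_last_value"
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "products"
  }
}
```

Because this runs as a process outside the cluster, the ingest work no longer lands on any one Elasticsearch node.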


I've used it earlier. It skips some of my records, giving an unresolved error. In any case, this is not the solution to my problem. I need to stay with rivers and just need to balance or totally divert the load.

Not being able to distribute load is an inherent problem with rivers, so as long as you continue using it I do not think you can solve your problem. I believe later versions of the JDBC river allowed execution as a separate process, and this could allow you to move the processing off the ES nodes. Logstash is another option, and I am not aware of any issues like the one you describe so it may be a matter of incorrect configuration.

It's the issue with Logstash I posted when I was using it a couple of months ago, and it went unsolved. However, I can't move back to Logstash now.

What is the use case, in the end?
You want to be able to use Elasticsearch in your application, which is using a SQL data store?

I shared most of my thoughts there: http://david.pilato.fr/blog/2015/05/09/advanced-search-for-your-legacy-application/

Basically, I'd recommend modifying the application layer if possible and sending data to elasticsearch in the same "transaction" as you are sending your data to the database.
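As a minimal sketch of that pattern (the `save_product` function, the model fields, and the client setup here are hypothetical, using the elasticsearch-py client):

```python
# Sketch of indexing in the same "transaction" as the SQL write.
# The product model, session, and field names are hypothetical.
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])

def save_product(db_session, product):
    # Persist to the SQL store first.
    db_session.add(product)
    db_session.commit()
    # Then index the same record into Elasticsearch, so search
    # stays in sync without a separate batch/river process.
    es.index(index="products", id=product.id, body={
        "name": product.name,
        "price": product.price,
    })
```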

That issue is due to the default mapping; you need to provide an index template that maps that field as float/double.
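Such a template could look like this sketch (the index pattern and the `price` field are placeholders for whichever field was mis-mapped; the syntax shown is for a recent 7.x stack):

```
PUT _template/my_app_template
{
  "index_patterns": ["products-*"],
  "mappings": {
    "properties": {
      "price": { "type": "double" }
    }
  }
}
```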

BTW I suspect you were using an old version of the stack, right?

Logstash is super stable nowadays. Try version 7.0.1.
