Elasticsearch Architecture Problem

Hi guys. The picture below shows the existing data pipeline of a certain client.

Their MAIN PROBLEMS are:

  1. Not all of the data is being ingested into Node 2 and Node 3
  2. Slow retrieval of data

Do you have any idea what might be the cause of the problem?

  1. Is it hardware related? If they need to upgrade, what are the ideal specs per node?
  2. Will adding another node help solve the problem?
  3. Would implementing the architecture in the 2nd or 3rd picture below solve the problem?
  4. Also, they want to apply machine learning to their data. What can you suggest for them?

Other Details:

Hey @josephmanalo

Please note that fortunately we are not all guys here :slight_smile:

Do you have any Elasticsearch monitoring activated so you can perhaps better understand the cause of this?

It sounds like you are using HDDs on some nodes and SSDs on others. That's an issue IMO.

I also wonder what you are using Logstash for.

Maybe you also have too many shards per node.
What is the output of:

```
GET /_cat/health?v
GET /_cat/indices?v
```

May I suggest you look at the following resources about sizing:

https://www.elastic.co/elasticon/conf/2016/sf/quantitative-cluster-sizing

And https://www.elastic.co/webinars/using-rally-to-get-your-elasticsearch-cluster-size-right

Elasticsearch assumes all nodes in the cluster are equal, so having different types of hardware can cause a problem. Are you sure your cluster has formed properly and that you have set discovery.zen.minimum_master_nodes correctly according to these guidelines?
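For reference, here is a minimal `elasticsearch.yml` sketch of that setting, assuming a cluster with three master-eligible nodes (the node count is an assumption; adjust the quorum to your own cluster):

```yaml
# elasticsearch.yml on each master-eligible node
# Quorum = (number of master-eligible nodes / 2) + 1
# With 3 master-eligible nodes, the quorum is 2
discovery.zen.minimum_master_nodes: 2
```

Setting this below the quorum risks a split brain; setting it above can prevent the cluster from electing a master at all.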

Hi! Thanks for the reply.
I'll check out what you've sent.

Here's the result of the following:

We are using Logstash to append a new column to the data and push it to its corresponding node.
We tried solving the problem by dividing the data: some modules are stored in Node 2 and other modules in Node 3.
But that still doesn't solve the problem.

Hi. Thanks for the suggestion. I'll look into this and update you. Thanks!

You seem to have a lot of shards being generated daily given the small data volumes. I would recommend reducing this significantly, e.g. by changing to a single primary shard per index, switching to weekly or monthly indices, or simply consolidating the data into fewer indices.
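A single primary shard can be set via an index template. A minimal sketch (the template name and index pattern here are assumptions; substitute your own):

```
PUT _template/logs_single_shard
{
  "index_patterns": ["logs-*"],
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  }
}
```

This only affects newly created indices; existing indices would need to be reindexed or consolidated to reduce their shard count.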

Please don't post images of text as they are hardly readable and not searchable.

Instead, paste the text and format it with the </> icon. Check the preview window.

I'm not sure I understood. Anyway, maybe look at the ingest node feature, which might be enough to replace your Logstash pipeline.
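For example, adding a field at index time can be done with an ingest pipeline using the `set` processor. A minimal sketch (the pipeline name, field name, and index name are assumptions, standing in for whatever the Logstash step currently appends):

```
PUT _ingest/pipeline/add_module_field
{
  "description": "Append a module field, similar to the Logstash step",
  "processors": [
    { "set": { "field": "module", "value": "module_a" } }
  ]
}

PUT my-index/_doc/1?pipeline=add_module_field
{
  "message": "sample document"
}
```

The indexed document would then carry the extra `module` field without Logstash in the path.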

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.