What do you think is more critical: allocating more resources or changing the architecture?

I have multiple issues with Elasticsearch right now.

  • I'm running out of disk
  • I'm running out of memory
  • The architecture may need to change, since I have more than 600 shards per node. In any case, I will add a new node to the cluster

I'm not sure what the priority should be. I also suspect that some of these three issues may have caused the others.

GET /

{
  "name" : "a",
  "cluster_name" : "cluster_name",
  "cluster_uuid" : "uuid",
  "version" : {
    "number" : "7.1.0",
    "build_flavor" : "default",
    "build_type" : "docker",
    "build_hash" : "606a173",
    "build_date" : "2019-05-16T00:43:15.323135Z",
    "build_snapshot" : false,
    "lucene_version" : "8.0.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}

GET /_cat/allocation?v

shards disk.indices disk.used disk.avail disk.total disk.percent host         ip           node
   605       16.4gb   257.9gb     37.1gb    295.1gb           87 ip-address-b ip-address-b name-node-b
   642       24.8gb   242.8gb     52.3gb    295.1gb           82 ip-address-a ip-address-a name-node-a
    39                                                                                     UNASSIGNED

GET /_cat/nodes?v

ip           heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
ip-address-a           63          94   9    0.62    0.90     0.95 mdi       *      name-node-a
ip-address-b           48          98   2    0.63    0.35     0.29 mdi       -      name-node-b

GET /_cat/health

epoch      timestamp cluster   status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1607443543 16:05:43  name    yellow          2         2   1247 643    0    0       39             0                  -                 97.0%

GET /_cat/indices?v

Please don't post images of text as they are hard to read, may not display correctly for everyone, and are not searchable.

Instead, paste the text and format it with the </> icon or pairs of triple backticks (```), and check the preview window to make sure it's properly formatted before posting. This makes it more likely that your question will receive a useful answer.

It would be great if you could update your post to solve this.

What is the output of:

GET /
GET /_cat/nodes?v
GET /_cat/health?v
GET /_cat/indices?v

If some outputs are too big, please share them on gist.github.com and link them here.

You might want to add nodes and scale horizontally; honestly, you can always reuse different hardware and add more nodes. Elasticsearch is meant to be scalable.

Maybe provide some hardware info?

How many nodes?

Could you share the missing:

GET /
GET /_cat/health?v
GET /_cat/indices?v

Please don't forget the ?v


You seem oversharded; it looks like most of your indices are ~3GB. Look at using _shrink to reduce the shard count.
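For example, something along these lines, with a hypothetical index name my-index (the real index names aren't shown here). The index first has to be made read-only with a copy of every shard on a single node, and the target shard count must be a factor of the original:

PUT /my-index/_settings
{
  "settings": {
    "index.routing.allocation.require._name": "name-node-a",
    "index.blocks.write": true
  }
}

POST /my-index/_shrink/my-index-shrunk
{
  "settings": {
    "index.number_of_shards": 1,
    "index.number_of_replicas": 1,
    "index.routing.allocation.require._name": null,
    "index.blocks.write": null
  }
}

Once the shrunk index is green and verified, the original can be deleted.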

I've updated the post with all indices.

Can you explain your indexing strategy? You have tonnes of tiny indices.

I'm afraid I can't answer your question about the indexing strategy, as I've inherited this project. Indices are created dynamically by the source code.

Ok. Well it's pretty wasteful, so it's definitely something you should dig into and try to optimise.

OK, but given the situation, do you think increasing the disk and/or memory would resolve these issues?
I'm also planning to add a third node to the cluster.

It would, yes. It's ultimately a short-term fix, though: if your indexing volume keeps growing, you will still reach the same point in the future.

What about the long term?

I'm planning to add a third node to the cluster. I believe this would reduce the number of shards per index, which I think is critical.
But will this be enough? Do I need to review the whole indexing strategy? I would be happy to avoid that, because the code is a mess.

PS:
By the way, what do you mean by indexing strategy? Can you point me to a link or resource where I can study it?

Per node, yes. Not per index.

It will be enough, but for how long I can't say.

I mean: what's in these indices? Why are you creating so many empty indices, or indices with few or no documents? What sort of data is it?
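If it turns out that many of these small indices share the same mappings, one option is to consolidate them into a single index with the Reindex API and delete the originals afterwards. A rough sketch, with made-up index names:

PUT /consolidated-index
{
  "settings": {
    "index.number_of_shards": 1,
    "index.number_of_replicas": 1
  }
}

POST /_reindex
{
  "source": { "index": "tiny-index-*" },
  "dest": { "index": "consolidated-index" }
}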
