Which RAID config for 2 TB × 12 disks

Hello,
There are 6 servers, each with 64 GB of RAM and 12 spinning disks (2 TB per disk, so 24 TB in total).

When I give 32 GB of RAM to ES, it seems that the maximum index size a single node can hold is around 4 TB.
So if I set up RAID 0, 20 TB of disk space will go unused.
And I cannot increase replicas due to the ES heap limit.

How about making 6 RAID1 arrays? I mean (RAID1 = 2 disks) × 6 ==> 6 paths, 12 TB of usable disk space.
One RAID1 array goes to the OS as /data0,
and the other five are configured as path.data=/data1,/data2,/data3,/data4,/data5.
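On the new warm nodes, the data-path part of elasticsearch.yml would then look roughly like this (a sketch using the mount points named above):

path.data:
  - /data1
  - /data2
  - /data3
  - /data4
  - /data5

Note that Elasticsearch allocates each shard entirely to a single data path, so losing one RAID1 array would only affect the shards stored on that path rather than the whole node.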

How does that sound?

This will generally depend on your data and use-case as well as how well you optimize heap usage.

  • What type of data are you indexing?
  • Have you gone through and optimised your mappings?
  • What is the average shard size in your cluster? Having lots of small shards and indices can be inefficient and drive up heap usage.
  • Are you using any coordinating-only nodes?

@Christian_Dahlqvist

OK,
I store user click-stream logs in ES, one index per day, with 5 shards and no replicas.
About 2 billion logs are indexed per day, and the index size is about 750 GB.

I think these click-stream logs are not mission critical, so replicas are not strictly required.
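Concretely, each daily index is created with settings along these lines (a sketch; the index name follows the daily pattern shown below):

PUT /jpl_raw_20180724
{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 0
  }
}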

Actually, there are 6 hot nodes, 2 warm nodes, and 3 dedicated master nodes.

What I described in the first post is about NEW warm nodes,
because I want to increase the retention period for warm-node data.

Everything below is about the current warm nodes.

Kibana monitoring

[screenshot: single node]

[screenshot: single index]

Here is the mapping of the index.

{
  "jpl_raw_20180724": {
    "mappings": {
      "_default_": {
        "dynamic": "false",
        "_all": {
          "enabled": false
        },
        "_source": {
          "excludes": [
            "ac_hash",
            "event_hash"
          ]
        },
        "properties": {
          "action_id": {
            "type": "keyword"
          },
          "app_ver": {
            "type": "keyword",
            "ignore_above": 30
          },
          "classifier": {
            "type": "keyword"
          },
          "client_ip": {
            "type": "keyword"
          },
          "country": {
            "type": "keyword"
          },
          "deliver_delay_time": {
            "type": "long"
          },
          "device_id": {
            "type": "keyword",
            "ignore_above": 200
          },
          "event_time": {
            "type": "date"
          },
          "ingest_host": {
            "type": "keyword",
            "ignore_above": 50
          },
          "ingest_time": {
            "type": "date",
            "format": "epoch_millis"
          },
          "language": {
            "type": "keyword",
            "index": false
          },
          "os_name": {
            "type": "keyword",
            "ignore_above": 30
          },
          "os_ver": {
            "type": "keyword",
            "ignore_above": 30
          },
          "p0value": {
            "type": "keyword",
            "ignore_above": 300
          },
          "p1value": {
            "type": "keyword",
            "ignore_above": 300
          },
          "p2value": {
            "type": "keyword",
            "ignore_above": 300
          },
          "p3value": {
            "type": "keyword",
            "ignore_above": 300
          },
          "p4value": {
            "type": "keyword",
            "ignore_above": 300
          },
          "p5value": {
            "type": "keyword",
            "ignore_above": 300
          },
          "p6value": {
            "type": "keyword",
            "ignore_above": 300
          },
          "p7value": {
            "type": "keyword",
            "ignore_above": 300
          },
          "p8value": {
            "type": "keyword",
            "ignore_above": 300
          },
          "p9value": {
            "type": "keyword",
            "ignore_above": 300
          },
          "product": {
            "type": "keyword"
          },
          "scene_id": {
            "type": "keyword",
            "ignore_above": 50
          },
          "service_id": {
            "type": "keyword"
          },
          "user_key": {
            "type": "keyword"
          }
        }
      },
      "default": {
        "dynamic": "false",
        "_all": {
          "enabled": false
        },
        "_source": {
          "excludes": [
            "ac_hash",
            "event_hash"
          ]
        },
        "properties": {
          "action_id": {
            "type": "keyword"
          },
          "app_ver": {
            "type": "keyword",
            "ignore_above": 30
          },
          "classifier": {
            "type": "keyword"
          },
          "client_ip": {
            "type": "keyword"
          },
          "country": {
            "type": "keyword"
          },
          "deliver_delay_time": {
            "type": "long"
          },
          "device_id": {
            "type": "keyword",
            "ignore_above": 200
          },
          "event_time": {
            "type": "date"
          },
          "ingest_host": {
            "type": "keyword",
            "ignore_above": 50
          },
          "ingest_time": {
            "type": "date",
            "format": "epoch_millis"
          },
          "language": {
            "type": "keyword",
            "index": false
          },
          "os_name": {
            "type": "keyword",
            "ignore_above": 30
          },
          "os_ver": {
            "type": "keyword",
            "ignore_above": 30
          },
          "p0value": {
            "type": "keyword",
            "ignore_above": 300
          },
          "p1value": {
            "type": "keyword",
            "ignore_above": 300
          },
          "p2value": {
            "type": "keyword",
            "ignore_above": 300
          },
          "p3value": {
            "type": "keyword",
            "ignore_above": 300
          },
          "p4value": {
            "type": "keyword",
            "ignore_above": 300
          },
          "p5value": {
            "type": "keyword",
            "ignore_above": 300
          },
          "p6value": {
            "type": "keyword",
            "ignore_above": 300
          },
          "p7value": {
            "type": "keyword",
            "ignore_above": 300
          },
          "p8value": {
            "type": "keyword",
            "ignore_above": 300
          },
          "p9value": {
            "type": "keyword",
            "ignore_above": 300
          },
          "product": {
            "type": "keyword"
          },
          "scene_id": {
            "type": "keyword",
            "ignore_above": 50
          },
          "service_id": {
            "type": "keyword"
          },
          "user_key": {
            "type": "keyword"
          }
        }
      }
    }
  }
}

Here are the stats of a single index: [screenshot]

Here is the info of a single node: [screenshot]

Additionally,
when I force-merged an index there was no remarkable heap reduction (just 5%?), as I remember...
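(The force merge was a call along these lines; merging down to one segment per shard is an assumption, as I don't remember the exact max_num_segments I used:)

POST /jpl_raw_20180724/_forcemerge?max_num_segments=1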

Mappings look fine and properly optimised, so no problem there. You do however have quite large shards and the terms heap usage is high. I would recommend trying to reduce the average shard size to closer to 50GB to see if this makes a difference, e.g. by increasing the number of primary shards to between 15 and 18.


I'll try that.
BTW, can having more shards of a smaller size reduce the total heap usage?

It may, which is why I am asking you to test it and then look at the index stats.

Reindexing with 15 shards is now ongoing.
It might take 6 hours...
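The reindex itself is roughly this (a sketch; jpl_raw_20180724_15s is a hypothetical target name):

PUT /jpl_raw_20180724_15s
{
  "settings": {
    "number_of_shards": 15,
    "number_of_replicas": 0
  }
}

POST /_reindex
{
  "source": { "index": "jpl_raw_20180724" },
  "dest": { "index": "jpl_raw_20180724_15s" }
}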

It seems that even though 15 shards reduce heap usage by 2-30% compared to 5 shards, I still cannot have a replica. If heap were reduced by 50%, I could have 1 replica.

Anyway, I will check out the 15-shard index's stats.

The RAID configuration I asked about first seems somewhat independent of this 15-shard test, because the disk space is quite large. What do you think about setting up RAID regardless of the heap optimization, if multiple RAID1 arrays make sense?

Multiple RAID1 makes sense, and if you manage to reduce heap usage you will be able to use a larger portion of that storage.

Of course!
Thanks for your help! :D

If a single daily index is nearly 750GB, I would consider writing hourly indices rather than daily. That will reduce them to a far more manageable ~30GB/index.
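For example, with an hour suffix in the index names (the naming below is only an illustration), a full day can still be searched with a single wildcard pattern:

# one index per hour of the day
PUT /jpl_raw_20180724_00
PUT /jpl_raw_20180724_01
# ... up to jpl_raw_20180724_23

# a whole day can still be queried with a wildcard
GET /jpl_raw_20180724_*/_search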


That seems like a good idea!

@Christian_Dahlqvist
The 15-shard index does not show a notable heap reduction.

[screenshot: heap usage with 15 shards]

[screenshot: heap usage with 5 shards]

I did not reindex; instead, I applied 15 shards to a new day's index.
The document counts of both indices are almost the same.

One difference is that:

  • the 20180803 index (15 shards) has a segment count of around 650
  • but the 20180808 index (5 shards) has a segment count of around 290
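For reference, the segment counts and segment memory per index can be read via calls like these (index names as above):

# segment-level detail per shard
GET /_cat/segments/jpl_raw_20180803?v
GET /_cat/segments/jpl_raw_20180808?v

# aggregated segment memory per index
GET /jpl_raw_20180803/_stats/segments
GET /jpl_raw_20180808/_stats/segments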

It is a shame it did not help, but I have to admit it was a long shot. As the mappings look good, I am not sure I have any other suggestions apart from tweaking the circuit breaker thresholds a bit, but this is unlikely to give any massive improvement and could cause instability if pushed too far.

There is one thing I forgot to ask earlier: Do you allow Elasticsearch to automatically assign document IDs or do you set them yourself? If you set them yourself, what do the IDs look like and how are they generated?

@Christian_Dahlqvist
Could you tell me the technical background of 'more shards to reduce terms heap'? For the next time I meet a similar situation and need to optimize heap. :)

Do you allow Elasticsearch to automatically assign document IDs or do you set them yourself?

The document _id is automatically generated.

This was based on something I saw in a test of large shards (small sample size), but it seems there is no such direct correlation to shard size.
