Approach to Big Data

Hi All,

We are looking into using ELK to index roughly 35 TB of generated business events. The data set may grow even bigger, to around 70-80 TB, in the future.

The use case is to allow real-time searches through Kibana to see failing/successful interactions per event type.

Because the data is huge, my thought was to use two indices: a first index for the last month of data, and a second for the whole data set (70 TB).

Would it be possible to create Kibana dashboards from the second index?

Another approach is to keep ELK for the first index and use a Hadoop environment over the whole data set to create batch reports.

Thanks,
CK

Some people are indexing billions of docs in their Elasticsearch clusters.

I'd use aliases, with one index per timeframe of your choice. It could be monthly if you want.

So let's say you build one index per month:

  • 2016-06
  • 2016-07
  • 2016-08
  • 2016-09
  • 2016-10
  • 2016-11
  • 2016-12
  • 2017-01
  • 2017-02
  • 2017-03
  • 2017-04
  • 2017-05

You can add an alias named current-month which points to 2017-05 and 2017-04, and another alias named current-year which points to the latest 12 months.

Then a month later, create 2017-06 and update the aliases.
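
To make that rollover concrete, here is a minimal sketch using the elasticsearch-py client. The index and alias names are the ones from the example above; the cluster URL is an assumption, and it presumes both aliases already exist from the previous month:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumed cluster address

new_index = "2017-06"

# Create the new monthly index (settings and mappings omitted for brevity).
es.indices.create(index=new_index)

# The 12 months ending with the new one, for the current-year alias.
year = [f"2016-{m:02d}" for m in range(7, 13)] + [f"2017-{m:02d}" for m in range(1, 7)]

# Repoint both aliases in a single update_aliases call, which Elasticsearch
# applies atomically, so searches never see a half-updated view. The
# wildcard removes assume the aliases already exist on some 201x index.
actions = (
    [{"remove": {"index": "201*", "alias": "current-month"}},
     {"remove": {"index": "201*", "alias": "current-year"}},
     {"add": {"index": "2017-05", "alias": "current-month"}},
     {"add": {"index": new_index, "alias": "current-month"}}]
    + [{"add": {"index": i, "alias": "current-year"}} for i in year]
)
es.indices.update_aliases(body={"actions": actions})
```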

Kibana should then use the aliases rather than the concrete index names.

Hope it helps

Depending on the nature of your data, it may be worthwhile looking into time-based indices. This is a very common and useful approach, especially if your business events are immutable.
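
If you go that route, a common companion is an index template, so every new time-based index picks up the same settings and mappings automatically. Here is a sketch with invented field names, using the legacy template API of that era (the `template` key; newer Elasticsearch versions use `index_patterns` and typeless mappings):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumed cluster address

# Legacy-style index template: any index whose name matches "201*"
# (2016-06, 2017-05, ...) is created with these settings and mappings,
# so each new month needs no manual setup. Field names are illustrative.
es.indices.put_template(
    name="business-events",
    body={
        "template": "201*",
        "settings": {"number_of_shards": 5, "number_of_replicas": 1},
        "mappings": {
            "event": {
                "properties": {
                    "event_type": {"type": "keyword"},
                    "status": {"type": "keyword"},
                    "@timestamp": {"type": "date"},
                }
            }
        },
    },
)
```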

Thx for the info