Approach to Big Data

Hi All,

We are looking into using ELK to index roughly 35 TB of generated business events. The data set may grow even bigger, to around 70-80 TB, in the future.

The use case is to allow real-time searches through Kibana to see failing/successful interactions per event type.

Because the data is huge, my thought was to use two indices: a first index for the last month of data, and a second for the whole data set (70 TB).

Would it be possible to create Kibana dashboards from the second index?

Another approach is to keep ELK for the first index and use a Hadoop environment over the whole data set to create batch reports.

Thanks,
CK

Some people are indexing billions of docs in their Elasticsearch clusters.

I'd use aliases, with one index per timeframe of your choice. It could be monthly if you want.

So let's say you build one index per month:

  • 2016-06
  • 2016-07
  • 2016-08
  • 2016-09
  • 2016-10
  • 2016-11
  • 2016-12
  • 2017-01
  • 2017-02
  • 2017-03
  • 2017-04
  • 2017-05

You can add an alias named current-month which points to 2017-05 and 2017-04, and another alias named current-year which points to the latest 12 months.

Then a month later, create 2017-06 and update the aliases.
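
To make that rollover concrete, here is a minimal sketch using the elasticsearch-py client. The index and alias names are the ones from the example above; the cluster URL is an assumption, and it presumes both aliases already exist from the previous month:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumed cluster address

new_index = "2017-06"

# Create the new monthly index (settings and mappings omitted for brevity).
es.indices.create(index=new_index)

# The 12 months ending with the new one, for the current-year alias.
year = [f"2016-{m:02d}" for m in range(7, 13)] + [f"2017-{m:02d}" for m in range(1, 7)]

# Repoint both aliases in a single update_aliases call, which Elasticsearch
# applies atomically, so searches never see a half-updated view. The
# wildcard removes assume the aliases already exist on some 201x index.
actions = (
    [{"remove": {"index": "201*", "alias": "current-month"}},
     {"remove": {"index": "201*", "alias": "current-year"}},
     {"add": {"index": "2017-05", "alias": "current-month"}},
     {"add": {"index": new_index, "alias": "current-month"}}]
    + [{"add": {"index": i, "alias": "current-year"}} for i in year]
)
es.indices.update_aliases(body={"actions": actions})
```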

Kibana should then use the aliases rather than the concrete index names.

Hope it helps

Depending on the nature of your data, it may be worthwhile looking into time-based indices. This is a very common and useful approach, especially if your business events are immutable.
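
If you go that route, a common companion is an index template, so every new time-based index picks up the same settings and mappings automatically. Here is a sketch with invented field names, using the legacy template API of that era (the `template` key; newer Elasticsearch versions use `index_patterns` and typeless mappings):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumed cluster address

# Legacy-style index template: any index whose name matches "201*"
# (2016-06, 2017-05, ...) is created with these settings and mappings,
# so each new month needs no manual setup. Field names are illustrative.
es.indices.put_template(
    name="business-events",
    body={
        "template": "201*",
        "settings": {"number_of_shards": 5, "number_of_replicas": 1},
        "mappings": {
            "event": {
                "properties": {
                    "event_type": {"type": "keyword"},
                    "status": {"type": "keyword"},
                    "@timestamp": {"type": "date"},
                }
            }
        },
    },
)
```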

Thx for the info