Indexes and time to keep information


(Javier L) #1

Hi

I am storing a lot of logs from a cloud application (architecture logs, application logs and syslogs) in Elasticsearch.

My idea was to keep at least the last 30 days of logs in Elasticsearch so that we can analyse application performance with Kibana dashboards. I am also using a single index for all the data.

A few days ago, an engineer told me that if I keep that amount of data in just one index, performance will be poor: according to him, if I have 1TB of data in one index, Elasticsearch will try to get 1TB of RAM to load that index. That sounds a little bit weird to me, so I came here to ask more experienced people.

Could somebody tell me what the best practice is? Should I split my Elasticsearch indexes by rolling them by date, or keep just one index for the whole month? Is it true that a single index like that will eat all the RAM?
Also, is it OK to store data in Elasticsearch for historical analysis, or should I export it to a big data DB?

Thanks in advance
J


(Jymit Singh Khondhu) #2

Hi,

Is the last 30 days of data in a single index, or do you have one index per day, spanning a total of 30 days?
What is your shard sizing per index?
How much RAM is allotted to the (physical/virtual) server?


(Javier L) #3

Hi

Thanks for your answer.

Regarding the index, well, that's my question. Right now we have just one index for all the days, but I am asking which is the best practice (one index for the whole month or one index per day).

Regarding the sizing, I do not have that value at the moment.

We have 8GB of RAM on the server.

Thanks in advance.
J


(Jymit Singh Khondhu) #4

Typically, what queries are you making to your current indices? Are they the type of query that looks at a single day's (cloud application) log data, or a couple of days?

Are you seeing decent request and response times currently?

How much disk space can you allot to Elasticsearch data, out of the server's overall disk space?


(Christian Dahlqvist) #5

Explicitly deleting documents from an index, e.g. using delete-by-query, can be quite expensive. It is often a lot cheaper to manage retention by using time-based indices and simply deleting a whole index once all the data in it has exceeded the retention period. This can be automated using Curator.
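
To make the retention idea concrete, here is a minimal Python sketch of the selection step only: given a list of daily index names, it works out which ones fall entirely outside a 30-day window. The `logs-YYYY.MM.DD` naming pattern, the `logs-` prefix and both function names are assumptions made for illustration; they are not taken from this thread or from Curator.

```python
from datetime import date, datetime, timedelta


def daily_index_name(day, prefix="logs-"):
    """Build the (assumed) daily index name for a date, e.g. logs-2024.03.31."""
    return f"{prefix}{day:%Y.%m.%d}"


def expired_indices(index_names, today, retention_days=30, prefix="logs-"):
    """Return the daily indices whose date is older than the retention window."""
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for name in index_names:
        if not name.startswith(prefix):
            continue  # not one of our time-based indices
        try:
            day = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        except ValueError:
            continue  # suffix is not a date, so leave the index alone
        if day < cutoff:
            expired.append(name)
    return sorted(expired)


if __name__ == "__main__":
    today = date(2024, 3, 31)
    names = [daily_index_name(today - timedelta(days=n)) for n in range(0, 35)]
    for name in expired_indices(names, today):
        print(name)
```

Each name returned could then be removed with a single delete-index request (`DELETE /<index-name>`), which drops the whole index at once instead of deleting documents one by one. Curator automates this kind of selection and deletion.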

