Size of index is increasing abnormally?

shivam_singh · April 21, 2016, 10:43am

Hi,

The size of the indexes in Elasticsearch is increasing abnormally. What could be the issue? As you can see in the uploaded snapshot, it increased from 1 GB to 25 GB.

Number of events that came through, as shown in the Kibana UI:
28,404,187 Today
794,774 yesterday
831,019 Day before yesterday

Mark_Harwood · April 21, 2016, 11:30am

Do you have any indication of the data source's sizes for these days?
Is there anything to suggest the Elasticsearch index sizes are not commensurate with the upstream data sources for these days?

Cheers
Mark

shivam_singh · April 21, 2016, 12:03pm

I have some numbers for the sizes of the log files for roughly the last 6-8 days.
The log files that I am parsing with Logstash and storing in Elasticsearch add up to approximately 600 MB in total.

Mark_Harwood · April 21, 2016, 12:05pm

And individually do these log files' sizes tally with what you see in index sizes? Is there a constant ratio in the size comparisons?

shivam_singh · April 21, 2016, 12:20pm

Logs are coming from some 15 different servers, thus the log files are distributed. I don't know how to compare the individual file size with respect to the corresponding indexes.

Anyway, even if we take 600 MB over 6 days, that is an average of around 100 MB per day, which is nowhere near 20 GB (the index size). Also, the index from 2 days back is only 1 GB. Since this environment has had no update or change since Monday, there is no way logging can have increased to such an extent.
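The arithmetic above can be sketched as a quick sanity check. This is only a rough illustration using the figures quoted in this thread; the function names are hypothetical, binary units (1 GB = 1024 MB) are an assumption, and the comparison ignores replicas and indexing overhead, which can make an index differ in size from the raw logs.

```python
MB_PER_GB = 1024  # assumption: sizes were reported in binary units


def daily_average_mb(total_mb: float, days: int) -> float:
    """Average raw log volume per day, in MB."""
    if days <= 0:
        raise ValueError("days must be positive")
    return total_mb / days


def expansion_ratio(index_gb: float, raw_mb: float) -> float:
    """How many times larger the index is than the raw logs behind it."""
    return (index_gb * MB_PER_GB) / raw_mb


avg_mb = daily_average_mb(600, 6)    # ~600 MB of log files over ~6 days
ratio = expansion_ratio(20, avg_mb)  # ~20 GB index for a single day
print(f"{avg_mb:.0f} MB/day of raw logs; index is roughly {ratio:.0f}x larger")
```

A ratio in the hundreds suggests that far more data is arriving than the log-file estimate accounts for, rather than ordinary indexing overhead.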

Christian_Dahlqvist · April 21, 2016, 12:31pm

The number of documents in the indices has increased dramatically as well, and this increase appears roughly proportional to the increase in index size. If you are logging which host the data is coming from, run an aggregation, e.g. using Kibana, comparing the distribution of the new data to the old to see if anything stands out.
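The per-host comparison suggested above could be sketched roughly as follows, outside Kibana. This is a minimal sketch, not a definitive implementation: the field name `host`, the `@timestamp` date field, and the helper names are all assumptions (with the default Logstash mapping a not-analyzed subfield such as `host.raw` may be needed instead), and the counts in the usage example are invented purely for illustration.

```python
def host_distribution_query(start, end, host_field="host", top_n=50):
    """Build a search body counting documents per host in [start, end)."""
    return {
        "size": 0,  # only the aggregation is needed, not the hits
        "query": {"range": {"@timestamp": {"gte": start, "lt": end}}},
        "aggs": {"by_host": {"terms": {"field": host_field, "size": top_n}}},
    }


def buckets_to_counts(response):
    """Turn a terms-aggregation response into a {host: doc_count} dict."""
    buckets = response["aggregations"]["by_host"]["buckets"]
    return {b["key"]: b["doc_count"] for b in buckets}


def compare_distributions(old, new):
    """Return (host, old_count, new_count, growth) rows, largest growth first."""
    rows = []
    for host in set(old) | set(new):
        before, after = old.get(host, 0), new.get(host, 0)
        growth = after / before if before else float("inf")
        rows.append((host, before, after, growth))
    return sorted(rows, key=lambda r: (-r[3], r[0]))


# Hypothetical counts, standing in for two real query responses.
old_counts = {"web-01": 1000, "web-02": 1200}
new_counts = {"web-01": 1100, "web-02": 60000}
for row in compare_distributions(old_counts, new_counts):
    print(row)
```

The body returned by `host_distribution_query` would be sent to the `_search` endpoint once for the old date range and once for the new one; a host whose growth factor dwarfs the others is the first place to look.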

shivam_singh · April 25, 2016, 8:30am

Hey,

I am attaching a few snapshots; please go through them for a better understanding.

As you can see from the snapshots, more events came through on the 22nd than on the 21st, yet the index for the 21st is far larger than the index for the 22nd (35 GB vs 3.13 GB). Even if we treat the event counts on both days as approximately equal, the index sizes are in no way comparable.

Snapshot from 21st 00.00 - 22nd 00.00

Snapshot from 22nd 00.00 - 23rd 00.00

Kopf index size

Thanks

Mark_Harwood · April 25, 2016, 8:58am

The barchart axes aren't labelled so it's hard to tell what the bars represent (I'm assuming Y axis is number of docs?) Also I assume the number of bars shown is only the first 10. It is quite possible there is a long tail of millions of other smaller bars that together add up to a big number. I don't know what field is represented on the X-axis - a multi-valued field would mean each doc may be accounted for by more than one bar.

Another point of confusion is how the number of docs on the bar chart (at least ~90m docs?) compares with the number reported by Kopf (4m?).

shivam_singh · April 25, 2016, 10:08am

Sorry, the y-axis is the number of events and the x-axis is the hosts the logs are coming from.

To give you a clearer picture, here is the sum of all events that came through across all the hosts the logs are coming from (26 different hosts/servers).
21st -
22nd -

To your question about the number of docs on the bar chart (at least ~90m?) versus the number reported by Kopf (4m?):
It's not the count of docs; it's the total events (the sum of all log events in the given date ranges).

Mark_Harwood · April 26, 2016, 7:25am

Is an event the same thing as a doc? Need to understand relationship between these two.

Christian_Dahlqvist · April 27, 2016, 6:59am

Something is strange here. In Kopf it looks like your logstash-2016.04.21 index has 20925958 documents taking up 33.54 GB on disk. In your Kibana bar charts, however, you are reporting over 30 million hits for the same period. This does not add up. It is even further off for the logstash-2016.04.22 index, which according to Kopf has only a bit over 4 million records, yet shows more than 30 million hits in Kibana. Have you selected the time period covered in Kibana correctly? How did you create the Kibana visualisations?
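The mismatch described above can be put into numbers with a small cross-check. This is only a sketch built on the figures quoted in this thread: 30 million and 4 million are rounded stand-ins for "over 30 million" and "a bit over 4 million", the function names are hypothetical, and it assumes Kopf reports sizes in binary gigabytes.

```python
BYTES_PER_GB = 1024 ** 3  # assumption: Kopf sizes are binary gigabytes


def avg_bytes_per_doc(store_gb: float, doc_count: int) -> float:
    """Average on-disk bytes per document for one index."""
    return store_gb * BYTES_PER_GB / doc_count


def hits_to_docs_ratio(kibana_hits: int, index_docs: int) -> float:
    """Kibana hits divided by the index doc count; about 1.0 if they agree."""
    return kibana_hits / index_docs


# Figures quoted in the thread; the rounded ones are approximations.
docs_21, size_21_gb, hits_21 = 20_925_958, 33.54, 30_000_000
docs_22, hits_22 = 4_000_000, 30_000_000

print(f"21st: {avg_bytes_per_doc(size_21_gb, docs_21):.0f} bytes/doc, "
      f"hits/docs = {hits_to_docs_ratio(hits_21, docs_21):.2f}")
print(f"22nd: hits/docs = {hits_to_docs_ratio(hits_22, docs_22):.2f}")
```

A hits-to-docs ratio well above 1 for a single daily index suggests the Kibana time range or visualisation covers more than that one index.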

shivam_singh · April 28, 2016, 8:22am

Hey Guys,

Thanks for the support. I really appreciate the help.

I think I have found the problem. The root cause is not Elasticsearch; it is some new logging that was introduced in our software (I was not informed about this change), which is generating an abnormal volume of logs, and so the index size is increasing abnormally.

Mark_Harwood · April 28, 2016, 1:46pm

Thanks for putting an end to the mystery
