Size of index is increasing abnormally?

Hi,

The size of the indexes in Elasticsearch is abnormally increasing. What can be the issue? As you can see in the uploaded snapshot, it increased from 1 GB to 25 GB.

Number of events that came through, as shown in the Kibana UI:
28,404,187 today
794,774 yesterday
831,019 the day before yesterday

Do you have any indication of the data sources' sizes for these days?
Is there anything to suggest the Elasticsearch index sizes are not commensurate with the upstream data sources for these days?

Cheers
Mark

I have some numbers for the sizes of the log files for around the last 6-8 days.
The sizes of all the log files that I am parsing with Logstash and storing into Elasticsearch add up to approximately 600 MB.

And individually, do these log files' sizes tally with what you see in the index sizes? Is there a constant ratio in the size comparisons?
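
If it helps, here is a minimal sketch (assuming Elasticsearch is reachable on localhost:9200 and the default logstash-YYYY.MM.DD index naming, so adjust to your setup) that prints each daily index's doc count and on-disk size, which you could line up against the raw log sizes per day:

```python
# Minimal sketch: print per-index doc counts and on-disk sizes via the
# _cat/indices API so they can be compared against the raw log file sizes.
# The host/port and the logstash-* index pattern are assumptions.
import requests

resp = requests.get(
    "http://localhost:9200/_cat/indices/logstash-*",
    params={"format": "json", "bytes": "b", "h": "index,docs.count,store.size"},
)
for row in sorted(resp.json(), key=lambda r: r["index"]):
    size_mb = int(row["store.size"]) / (1024 * 1024)
    print(f'{row["index"]}: {row["docs.count"]} docs, {size_mb:.1f} MB on disk')
```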

Logs are coming from some 15 different servers, so the log files are distributed. I don't know how to compare the individual file sizes with the corresponding indexes.

Anyway, even if we take an average of 600 MB over 6 days, that is around 100 MB per day, which is nowhere near 20 GB (the index size). Also, if we compare it with the index size from 2 days back, it was only 1 GB. Since this environment has had no update/change since Monday, there is no way logging could increase to such an extent.

The number of documents in the indices has increased dramatically as well, but this increase appears roughly proportional to the increase in index size. If you are logging which host the data is coming from, run an aggregation, e.g. using Kibana, comparing the distribution of the new data to the old to see if there is anything that stands out.
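
Outside Kibana, a terms aggregation per daily index would give the same breakdown. A rough sketch below (the "host" field name, the example day pair, and the index names are assumptions based on a default Logstash setup, so adjust them to match yours):

```python
# Rough sketch: compare the per-host distribution of documents between a
# "normal" day and the day the index blew up. Assumes a "host" field on the
# events (Logstash usually adds one) and default logstash-YYYY.MM.DD names.
import requests

def hosts_breakdown(index):
    body = {
        "size": 0,
        "aggs": {"hosts": {"terms": {"field": "host", "size": 30}}},
    }
    resp = requests.post(f"http://localhost:9200/{index}/_search", json=body)
    return resp.json()["aggregations"]["hosts"]["buckets"]

for index in ("logstash-2016.04.20", "logstash-2016.04.21"):
    print(index)
    for bucket in hosts_breakdown(index):
        print(f'  {bucket["key"]}: {bucket["doc_count"]} docs')
```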

Hey,

I am attaching a few snapshots; please go through them for a better understanding.

As you can see from the snapshots, the total events that came through on the 22nd were more than the total events that came through on the 21st, and yet the size of the index for the 21st is greater than the size of the index for the 22nd (35 GB vs 3.13 GB). Even if we consider the events on both days to be approximately the same, the index sizes are in no way comparable.

Snapshot from 21st 00.00 - 22nd 00.00

Snapshot from 22nd 00.00 - 23rd 00.00

Kopf index size

Thanks

The bar chart axes aren't labelled, so it's hard to tell what the bars represent (I'm assuming the Y-axis is the number of docs?). Also, I assume the number of bars shown is only the first 10. It is quite possible there is a long tail of millions of other smaller bars that together add up to a big number. I don't know what field is represented on the X-axis - a multi-valued field would mean each doc may be accounted for by more than one bar.

Another point of confusion is how the number of docs on the bar chart (at least ~90m docs?) compares with those reported by Kopf (4m?).
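
One way to check the long-tail theory directly (a sketch only; the "host" field and the index name are guesses on my part) is to ask the terms aggregation how many docs fall outside the top 10 buckets via sum_other_doc_count:

```python
# Sketch: does the top-10 bar chart account for all documents, or is there a
# long tail? sum_other_doc_count reports how many docs fall outside the
# returned buckets. Field and index names are assumptions.
import requests

body = {
    "size": 0,
    "aggs": {"hosts": {"terms": {"field": "host", "size": 10}}},
}
resp = requests.post("http://localhost:9200/logstash-2016.04.21/_search", json=body)
hosts = resp.json()["aggregations"]["hosts"]
in_top_10 = sum(b["doc_count"] for b in hosts["buckets"])
print("docs in the top 10 buckets:", in_top_10)
print("docs outside the top 10 buckets:", hosts["sum_other_doc_count"])
```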

Sorry, the Y-axis is the number of events and the X-axis is the hosts the logs are coming from.

To give you a clearer picture, here is the sum of all events that came through, including all the hosts the logs are coming from (26 different hosts/servers).
21st -
22nd -

To your question "Another point of confusion is how the number of docs on the bar chart (at least ~90m docs?) compares with those reported by Kopf (4m?)":
It's not the count of docs, it's the total events (the sum of all log events over the particular date ranges).

Is an event the same thing as a doc? We need to understand the relationship between these two.

Something is strange here. In Kopf it looks like your logstash-2016.04.21 index has 20,925,958 documents taking up 33.54 GB on disk. In your Kibana bar charts, however, you are reporting over 30 million hits for the same period. This does not add up. It is even more off for the logstash-2016.04.22 index, which according to Kopf only has a bit over 4 million records, yet shows more than 30 million hits in Kibana. Have you selected the time period covered in Kibana correctly? How did you create the Kibana visualisations?
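
To take Kibana out of the equation, something like the following sketch compares the index's total doc count with a count restricted to that day's time window (the @timestamp field and the index name are assumptions from a default Logstash setup):

```python
# Sketch: cross-check the per-index doc count against a count explicitly
# constrained to the day's time window, to see whether the Kibana time-range
# selection is what is inflating the numbers. @timestamp and the index name
# are assumptions.
import requests

def count_in_window(index, gte, lt):
    body = {"query": {"range": {"@timestamp": {"gte": gte, "lt": lt}}}}
    resp = requests.post(f"http://localhost:9200/{index}/_count", json=body)
    return resp.json()["count"]

total = requests.get("http://localhost:9200/logstash-2016.04.22/_count").json()["count"]
in_window = count_in_window("logstash-2016.04.22",
                            "2016-04-22T00:00:00", "2016-04-23T00:00:00")
print("docs in the index:", total)
print("docs with @timestamp on the 22nd:", in_window)
```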

Hey Guys,

Thanks for the support. I really appreciate the help.

I think I have found the problem. The root cause is not Elasticsearch; it is some new logging that was introduced in our software (I was not informed about this change) and is creating logs abnormally, which is why the index size is increasing abnormally.

Thanks for putting an end to the mystery :slight_smile: