Weird storage change

I'm trying to index a log file of size 1.4GB. The compressed version is 0.3GB.

The log lines have 77 fields, separated by whitespace. I'm only indexing 5 of them, without storing them.

If I do not store the original log lines, the index data will be 0.3GB.

If I store the original log lines, the index data will be 2.1GB.

Why is the difference (1.8GB) so huge, even larger than the uncompressed log file (1.4GB)?

Has anyone encountered a similar problem before? It looks to me as if ES is not storing things efficiently, and storage could become a huge issue.

Check your mapping. If there are that many spaces in that field, and it's mapped as text, it's being analyzed and tokenized for plain-text search. That could account for some of it.
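
As a rough illustration of why a text mapping inflates the index, the sketch below approximates tokenization with a plain whitespace split. This is not the actual Elasticsearch analyzer, and the sample line is made up; it only shows that a single 77-field line turns into 77 separate terms that have to be indexed.

```python
# Hypothetical log line with 77 whitespace-separated fields (made-up data).
sample_line = " ".join(f"field{i}" for i in range(77))


def approximate_tokens(line: str) -> list[str]:
    """Crude stand-in for a text analyzer: lowercase, then split on whitespace."""
    return line.lower().split()


tokens = approximate_tokens(sample_line)
print(len(tokens))  # one term per whitespace-separated field
```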

Actually, I have set the raw log line field to be a "keyword". I also set { "index": False }. Storage is still huge.
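
For reference, a mapping along these lines might look like the sketch below, written as a Python dict since the setting above uses Python's False. The field name raw_line is hypothetical. One thing that may be worth checking: keyword fields have doc_values enabled by default, and "index": False does not turn them off, so the raw line can still be written to disk in a column-oriented form in addition to _source. Whether that explains the extra space in this case is an assumption, not something measured.

```python
import json

# Hypothetical field name; substitute the real one from your mapping.
RAW_FIELD = "raw_line"


def raw_line_mapping(keep_doc_values: bool = False) -> dict:
    """Build an index body for a keyword field that is kept but not searchable."""
    return {
        "mappings": {
            "properties": {
                RAW_FIELD: {
                    "type": "keyword",
                    "index": False,  # no inverted index for this field
                    # Elasticsearch defaults doc_values to true for keyword
                    # fields; this sketch switches it off unless asked not to.
                    "doc_values": keep_doc_values,
                }
            }
        }
    }


body = raw_line_mapping()
print(json.dumps(body, indent=2))
```

The resulting body would be passed when creating the index. With doc_values off the field can no longer be used for sorting or aggregations, but the original line stays retrievable from _source as long as _source is enabled.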
