Field data type sizes

In the documentation I found a lot of field data types supported by Elasticsearch, but there's no information about how much storage each type consumes.

For example, which data type (keyword or ip) is better for storing IP addresses in an index when we don't search them by CIDR?
The difference matters when we want to store hundreds of thousands of documents.

Where can I find how much storage each data type takes?

That's because it depends on a few things, like sparsity and the size of the actual data.

Testing it on your own data is the best option.

So if I choose integer as the data type, and in one document I set this field to 12 and in another document to 1234, the sizes will be different? Interesting...

I know documents in indices are compressed, so sparsity can affect the size on disk, but I'm not trying to calculate storage requirements. I would like to know which data type to select for certain fields so that I don't waste storage on a feature I don't really need (like searching by CIDR).

I agree this is the best option if you're doing this at home, but not for a customer whose data will be stored for months or even years.

Dunno how you got that from my comment; it's not what I am saying.

An integer in Elasticsearch is fixed at 32 bits max, but a keyword can be any size. If you have a document with a 4000-word keyword field, it's going to make an impact.
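To make the fixed-width versus variable-width point concrete, here is a small sketch that only counts the *raw* bytes of a single value before any compression, dictionary encoding, or index structures are applied, so it says nothing definitive about final size on disk. The helper names are illustrative. It assumes that an `ip` field widens IPv4 addresses to the 16-byte IPv4-mapped IPv6 form, which is how Lucene's `InetAddressPoint` encodes them.

```python
import ipaddress
import struct


def raw_integer_bytes(value: int) -> int:
    # A 32-bit signed integer always packs into 4 bytes, whatever its magnitude.
    return len(struct.pack(">i", value))


def raw_keyword_bytes(value: str) -> int:
    # A keyword is stored as UTF-8 text, so its raw size grows with its length.
    return len(value.encode("utf-8"))


def raw_ip_bytes(value: str) -> int:
    # Assumption: IPv4 addresses are widened to 16-byte IPv4-mapped IPv6 form
    # (::ffff:a.b.c.d), as Lucene's InetAddressPoint does; IPv6 is already 16 bytes.
    addr = ipaddress.ip_address(value)
    if addr.version == 4:
        addr = ipaddress.IPv6Address("::ffff:" + str(addr))
    return len(addr.packed)


for number in (12, 1234):
    print(number, "integer:", raw_integer_bytes(number))
for ip in ("10.0.0.1", "192.168.100.200"):
    print(ip, "keyword:", raw_keyword_bytes(ip), "ip:", raw_ip_bytes(ip))
```

Under these assumptions 12 and 1234 both take 4 raw bytes, the keyword strings take 8 and 15 bytes, and the `ip` values take 16 bytes each. What ends up on disk can still differ, because doc values are compressed and keyword terms are dictionary-encoded rather than stored once per document.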

As an example, I asked about the difference between the ip and keyword data types, and you said it depends on the size of the actual data, so I deduced that.

Let's start again. If I choose ip as the data type and only ever store IPv4 addresses in this field, is that a better choice than the keyword data type?

It is not quite that simple. When you index a keyword, the string is not stored once for every occurrence. The size on disk and in memory therefore depends on the cardinality of the field as well as the number of segments in the shard, because a dictionary mapping keywords to shorter identifiers (ordinals) is kept per segment, and global ordinals spanning all segments are built in memory on top of those.
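A toy model of that dictionary encoding shows why cardinality and segment count matter. This is a deliberately simplified sketch, not Lucene's actual codec: it ignores prefix compression of the terms dictionary, block structures, and per-file overhead. It simply stores each distinct term once per segment and gives every document a fixed-width bit-packed ordinal. All function names and the sample data are hypothetical.

```python
from typing import List, Tuple


def dictionary_encode(values: List[str]) -> Tuple[List[str], List[int]]:
    # Store each distinct term once (sorted, as in a terms dictionary) and give
    # every document only the ordinal (index) of its term.
    terms = sorted(set(values))
    ordinal_of = {term: i for i, term in enumerate(terms)}
    return terms, [ordinal_of[v] for v in values]


def estimate_segment_bytes(values: List[str]) -> int:
    # Simplified model: raw bytes of the unique terms plus fixed-width
    # bit-packed ordinals, one per document.
    terms, ordinals = dictionary_encode(values)
    if not terms:
        return 0
    dictionary_bytes = sum(len(t.encode("utf-8")) for t in terms)
    bits_per_ordinal = max(1, (len(terms) - 1).bit_length())
    ordinal_bytes = (len(ordinals) * bits_per_ordinal + 7) // 8
    return dictionary_bytes + ordinal_bytes


def estimate_shard_bytes(segments: List[List[str]]) -> int:
    # Each segment carries its own dictionary, so repeated terms are paid for once per segment.
    return sum(estimate_segment_bytes(seg) for seg in segments)


low_cardinality = [f"10.0.0.{i % 10}" for i in range(100_000)]
high_cardinality = [f"10.{i // 65536}.{(i // 256) % 256}.{i % 256}" for i in range(100_000)]

print("10 distinct IPs, 1 segment:  ", estimate_shard_bytes([low_cardinality]))
print("10 distinct IPs, 2 segments: ",
      estimate_shard_bytes([low_cardinality[:50_000], low_cardinality[50_000:]]))
print("100k distinct IPs, 1 segment:", estimate_shard_bytes([high_cardinality]))
```

In this model, 100,000 documents holding only 10 distinct addresses cost 50,080 bytes in one segment and 50,160 bytes split across two, while 100,000 distinct addresses cost far more because every term lands in the dictionary and each ordinal needs 17 bits. The real numbers will differ; the point is only that the same field type can be cheap or expensive depending on the data.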

The easiest way to find out is therefore to test with data that is as realistic as you can get.
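One way to run such a test is to index the same sample of real addresses into two otherwise identical indices, one mapping the field as `ip` and one as `keyword`, then compare their store sizes. The sketch below uses only the Python standard library against the standard `_bulk`, `_forcemerge`, and `_stats/store` REST endpoints. It assumes a local, unsecured test cluster at `http://localhost:9200`; the field name `client_ip` and all helper names are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict, List, Optional

# Assumption: a local, unsecured test cluster; adjust the URL and add auth as needed.
ES_URL = "http://localhost:9200"


def mapping_for(field_type: str) -> Dict[str, Any]:
    # Hypothetical single-field index; "client_ip" is an illustrative name.
    return {"mappings": {"properties": {"client_ip": {"type": field_type}}}}


def bulk_body(ips: List[str]) -> str:
    # The _bulk API takes newline-delimited JSON: an action line, then the document.
    lines = []
    for ip in ips:
        lines.append(json.dumps({"index": {}}))
        lines.append(json.dumps({"client_ip": ip}))
    return "\n".join(lines) + "\n"


def es_request(method: str, path: str, body: Optional[Any] = None) -> Dict[str, Any]:
    # Strings are sent as NDJSON (for _bulk); dicts are sent as regular JSON.
    data = None
    headers = {}
    if isinstance(body, str):
        data = body.encode("utf-8")
        headers["Content-Type"] = "application/x-ndjson"
    elif body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    req = urllib.request.Request(ES_URL + path, data=data, method=method, headers=headers)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


def store_size_bytes(index: str, field_type: str, ips: List[str]) -> int:
    # Create the index, load the sample, merge to one segment so the segment
    # count doesn't skew the comparison, then read the primary store size.
    es_request("PUT", f"/{index}", mapping_for(field_type))
    es_request("POST", f"/{index}/_bulk?refresh=true", bulk_body(ips))
    es_request("POST", f"/{index}/_forcemerge?max_num_segments=1")
    stats = es_request("GET", f"/{index}/_stats/store")
    return stats["indices"][index]["primaries"]["store"]["size_in_bytes"]
```

With a realistic sample loaded into `ips`, you would compare `store_size_bytes("test-ip", "ip", ips)` with `store_size_bytes("test-keyword", "keyword", ips)`. Both indices keep the same string in `_source`, so the difference reflects the index structures and doc values for the field. Recent Elasticsearch versions also offer an analyze index disk usage API (`_disk_usage`) that breaks the size down per field, if your version provides it.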
