Elasticsearch 6.0 _id and size_in_bytes

I upgraded to 6.0 because of sequential ids. I expected that switching to a sequential _id would save memory.
I have one index per day with lots of events, each with a small number of fields (like ip_src, ip_dst, port_dst). I noticed that _id consumes a lot of memory. Is it possible to optimize the _id field or somehow disable it? If the id is sequential, I would expect that some sort of memory offset could be calculated per search request, so that it would not need to be stored in memory.

This is part of the statistics:
{
  "description" : "field '_id' [BlockTreeTerms(seg=_32b terms=531218720,postings=531218720,positions=-1,docs=531218720)]",
  "size_in_bytes" : 78714997,
  "children" : [
    {
      "description" : "term index [FST(input=BYTE1,output=ByteSequenceOutputs]",
      "size_in_bytes" : 78714837
    }
  ]
},
{
  "description" : "field 'ip_dst' [BlockTreeTerms(seg=_32b terms=255005,postings=519813549,positions=-1,docs=519813549)]",
  "size_in_bytes" : 68845,
  "children" : [
    {
      "description" : "term index [FST(input=BYTE1,output=ByteSequenceOutputs]",
      "size_in_bytes" : 68685
    }
  ]
},
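
To put the numbers above in perspective, here is a rough sketch that divides each term index size by the number of unique terms in the field. The figures are copied from the statistics above; the dictionary layout and the helper name bytes_per_term are only illustrative.

```python
# Figures copied from the statistics above (segment _32b).
fields = {
    "_id": {"terms": 531_218_720, "term_index_bytes": 78_714_837},
    "ip_dst": {"terms": 255_005, "term_index_bytes": 68_685},
}


def bytes_per_term(stats):
    """Average in-memory term index (FST) bytes per unique term."""
    return stats["term_index_bytes"] / stats["terms"]


for name, stats in fields.items():
    mib = stats["term_index_bytes"] / (1024 * 1024)
    print(f"{name}: {mib:.2f} MiB total, {bytes_per_term(stats):.3f} bytes per unique term")
```

On these numbers the _id term index costs well under one byte per unique term, which is no more than ip_dst costs per term. The total looks large mainly because every document contributes its own unique _id term, while ip_dst has only about 255 thousand distinct values.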

In Elasticsearch 6.0 all operations get a sequence id, which can help speed up recovery. The logic for generating document ids is not affected by this (they are not sequential).

Do you think memory consumption would be lower if _id were sequential? And what is the reason _id values are not numeric?

Trying to assign sequential numeric ids automatically generally does not scale or perform well in a distributed, highly concurrent system. If you want to, however, you can assign your own ids at the application layer.
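
As a minimal sketch of what assigning ids at the application layer could look like, the snippet below builds a bulk request body in which each document carries an explicit _id taken from a local counter. The index name, the "doc" type, the sample events and the helper bulk_lines are all hypothetical; nothing here contacts a cluster, and it is not established in this thread that such ids would reduce the memory used by _id.

```python
import json
from itertools import count


def bulk_lines(events, index, start_id=0, doc_type="doc"):
    """Yield bulk API lines (action, then source) with application-assigned ids.

    Assumption: a single writer owns the counter. Several concurrent writers
    would need non-overlapping id ranges, otherwise documents that share an
    id would overwrite each other.
    """
    ids = count(start_id)
    for event in events:
        action = {"index": {"_index": index, "_type": doc_type, "_id": str(next(ids))}}
        yield json.dumps(action)
        yield json.dumps(event)


# Hypothetical sample events using the field names mentioned in the question.
events = [
    {"ip_src": "10.0.0.1", "ip_dst": "10.0.0.2", "port_dst": 443},
    {"ip_src": "10.0.0.3", "ip_dst": "10.0.0.2", "port_dst": 80},
]

# The bulk API expects newline-delimited JSON that ends with a newline.
body = "\n".join(bulk_lines(events, "events-example")) + "\n"
print(body)
```

The resulting body could then be sent to the _bulk endpoint with whatever HTTP client or Elasticsearch client the application already uses.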
