Tracking input document raw size in bytes

otisg · January 21, 2013, 3:04am

Hi,

What are my options if I want to keep track of the size (in bytes) of the
original input documents? That is, the sum of sizes, in bytes, of all
their fields?

I see http://www.elasticsearch.org/guide/reference/mapping/size-field.html,
but that _size counts the JSON structure as well, not just the field values.

Thanks,
Otis

ELASTICSEARCH Performance Monitoring - http://sematext.com/spm/index.html

--

drewr · January 21, 2013, 6:21pm

Otis Gospodnetic wrote:

What are my options if I want to keep track of the size (in bytes) of
the original input documents? That is, the sum of sizes, in bytes, of
all their fields?

The best one IMO is to index a field along with your doc that has the
bytes counted in whatever way interests you.

With this solution you can add as many different sizes as you need.

-Drew
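
For illustration, a minimal Java sketch of that approach, assuming the
document is available as a Map before it is serialized. The class name, the
raw_size_bytes field name, and the choice to measure non-string values by the
UTF-8 length of their toString() form are all assumptions made for this
example, not anything Elasticsearch prescribes:

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

final class RawSizeField {
    // Hypothetical field name; Elasticsearch does not reserve or require it.
    static final String SIZE_FIELD = "raw_size_bytes";

    // Sums the UTF-8 byte length of every leaf value. Field names and JSON
    // punctuation are not counted. Nested maps and lists are walked recursively.
    static long valueBytes(Object value) {
        if (value == null) {
            return 0;
        }
        if (value instanceof Map<?, ?>) {
            long total = 0;
            for (Object v : ((Map<?, ?>) value).values()) {
                total += valueBytes(v);
            }
            return total;
        }
        if (value instanceof Iterable<?>) {
            long total = 0;
            for (Object v : (Iterable<?>) value) {
                total += valueBytes(v);
            }
            return total;
        }
        // Assumption: numbers, booleans, etc. are measured by their string form.
        return value.toString().getBytes(StandardCharsets.UTF_8).length;
    }

    // Returns a copy of the document with the computed size stored alongside
    // the original fields, ready to be indexed.
    static Map<String, Object> withSize(Map<String, Object> doc) {
        Map<String, Object> copy = new LinkedHashMap<>(doc);
        copy.put(SIZE_FIELD, valueBytes(doc));
        return copy;
    }
}
```

Other definitions of "size" (for example counting field names too) could be
stored in additional fields in the same way.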

--

jprante · January 21, 2013, 6:54pm

In org.elasticsearch.action.bulk.BulkRequest, there is a method
estimatedSizeInBytes()

https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/action/bulk/BulkRequest.java

Because you know how many bulk items there are, it is easy in the Java API
to track the average size of indexed docs when using bulk indexing.

Best regards,

Jörg
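
A rough sketch of that bookkeeping, kept free of Elasticsearch imports so it
stands alone. The caller is assumed to pass in the values returned by
BulkRequest.estimatedSizeInBytes() and BulkRequest.numberOfActions() for each
bulk it sends. BulkSizeTracker and its method names are hypothetical:

```java
final class BulkSizeTracker {
    private long totalBytes;
    private long totalDocs;

    // Call once per bulk request, e.g.
    //   tracker.record(bulk.estimatedSizeInBytes(), bulk.numberOfActions());
    // Not thread-safe; synchronize externally if bulks are sent concurrently.
    void record(long estimatedBytes, int actions) {
        if (estimatedBytes < 0 || actions < 0) {
            throw new IllegalArgumentException("sizes and counts must be non-negative");
        }
        totalBytes += estimatedBytes;
        totalDocs += actions;
    }

    long totalBytes() {
        return totalBytes;
    }

    // Running average over everything recorded so far; 0.0 before any docs.
    double averageBytesPerDoc() {
        return totalDocs == 0 ? 0.0 : (double) totalBytes / totalDocs;
    }
}
```

Keep in mind that the figure is an estimate of the request size, so it will
most likely still include JSON structure rather than only the field values.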

--

otisg · January 22, 2013, 6:54pm

Hi,

On Monday, January 21, 2013 1:21:34 PM UTC-5, Drew Raines wrote:

Otis Gospodnetic wrote:

What are my options if I want to keep track of the size (in bytes) of
the original input documents? That is, the sum of sizes, in bytes, of
all their fields?

The best one IMO is to index a field along with your doc that has the
bytes counted in whatever way interests you.

The client doesn't have this information... or at least I can't rely on it
to exist and be accurate.
So I need to figure out the size on the server side somewhere.
But I would like to avoid unwrapping the JSON just to sum up the values of
all of a document's fields.

The estimatedSizeInBytes() method in
https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/action/bulk/BulkRequest.java
that Jörg pointed out sounds promising.

Thanks,
Otis

--
