Tracking input document raw size in bytes

Hi,

What are my options if I want to keep track of the size (in bytes) of the
original input documents? That is, the sum of sizes, in bytes, of all
their fields?

I see http://www.elasticsearch.org/guide/reference/mapping/size-field.html,
but that _size includes the JSON as well.

Thanks,
Otis

ELASTICSEARCH Performance Monitoring - http://sematext.com/spm/index.html

--

Otis Gospodnetic wrote:

What are my options if I want to keep track of the size (in bytes) of
the original input documents? That is, the sum of sizes, in bytes, of
all their fields?

The best one IMO is to index a field along with your doc that has the
bytes counted in whatever way interests you.

With this solution you can add as many different sizes as you need.
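
A minimal sketch of that idea, assuming a flat document held as a Map on the client and "size in bytes" meaning the UTF-8 length of each field value. The class RawSizeCounter and the field name raw_size are made up for illustration; neither is part of Elasticsearch.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Elasticsearch.
final class RawSizeCounter {
    private RawSizeCounter() {}

    // Sums the UTF-8 byte lengths of all field values.
    // Field names and JSON syntax are not counted; null values count as 0.
    // Assumption: values are scalars (nested objects would be measured
    // by their toString() form, which is probably not what you want).
    static long rawSizeInBytes(Map<String, ?> fields) {
        long total = 0;
        for (Object value : fields.values()) {
            if (value != null) {
                total += String.valueOf(value).getBytes(StandardCharsets.UTF_8).length;
            }
        }
        return total;
    }

    // Returns a copy of the document with the computed size stored under
    // "raw_size" (a hypothetical field name), ready to be indexed.
    static Map<String, Object> withRawSize(Map<String, ?> fields) {
        Map<String, Object> doc = new LinkedHashMap<>(fields);
        doc.put("raw_size", rawSizeInBytes(fields));
        return doc;
    }
}
```

A second size (say, counting field names too) would just be another method and another field on the same document.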

-Drew

--

In org.elasticsearch.action.bulk.BulkRequest, there is a method
estimatedSizeInBytes()

https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/action/bulk/BulkRequest.java

Because you know how many items the bulk request contains, it is easy in the
Java API to track the average size of indexed docs when using bulk indexing.
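
A rough sketch of that bookkeeping, kept free of Elasticsearch dependencies so it runs on its own. BulkSizeTracker is a made-up class; the assumption is that you call record() once per bulk request with the values from BulkRequest.estimatedSizeInBytes() and BulkRequest.numberOfActions(). Note that the result is an average of an estimate of the request size, not an exact sum of field values.

```java
// Hypothetical helper, not part of Elasticsearch.
final class BulkSizeTracker {
    private long totalBytes;
    private long totalDocs;

    // Record one bulk request. Assumed inputs:
    //   estimatedBytes  <- bulkRequest.estimatedSizeInBytes()
    //   numberOfActions <- bulkRequest.numberOfActions()
    void record(long estimatedBytes, int numberOfActions) {
        if (estimatedBytes < 0 || numberOfActions < 0) {
            throw new IllegalArgumentException("sizes and counts must be non-negative");
        }
        totalBytes += estimatedBytes;
        totalDocs += numberOfActions;
    }

    long totalBytes() {
        return totalBytes;
    }

    long totalDocs() {
        return totalDocs;
    }

    // Average estimated bytes per bulk item; 0.0 if nothing was recorded yet.
    double averageBytesPerDoc() {
        return totalDocs == 0 ? 0.0 : (double) totalBytes / totalDocs;
    }
}
```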

Best regards,

Jörg

--

Hi,

On Monday, January 21, 2013 1:21:34 PM UTC-5, Drew Raines wrote:

Otis Gospodnetic wrote:

What are my options if I want to keep track of the size (in bytes) of
the original input documents? That is, the sum of sizes, in bytes, of
all their fields?

The best one IMO is to index a field along with your doc that has the
bytes counted in whatever way interests you.

The client doesn't have this information... or at least I can't rely on it
to exist and be accurate.
So I need to figure out the size on the server side somewhere.
But I would like to avoid unwrapping the JSON just to sum up the sizes of
all of a document's field values.

The estimatedSizeInBytes() method in
https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/action/bulk/BulkRequest.java
that Jörg pointed out sounds promising.

Thanks,
Otis



--