Attachment streaming


(Kosta) #1

Is there any way to obtain an InputStream when running a get query
using the Java API? I ask because loading documents with large
attachments fully into memory before forwarding the attachments to a
third-party client (i.e. a client downloading them via HTTP) seems
inefficient.

What I would like to achieve, ideally, is to obtain a stream to a document
attachment, pass that stream to the web server so that it forwards it
to the client, and base64 decode it on the fly. Intuition tells me
that this should be much more efficient than fetching the entire
document from ES, base64 decoding it, and then passing that in-memory
byte array to the web server as a stream.
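
To make the idea concrete, here is a minimal sketch of on-the-fly decoding in plain Java. It assumes some source of base64 text is already available as an InputStream; that source is hypothetical and is not something the Elasticsearch Java API is claimed to provide. The class and method names (Base64StreamCopy, copyDecoded) are made up for illustration, and java.util.Base64 requires Java 8 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Base64StreamCopy {

    // Reads base64 text from 'encoded' and writes the raw bytes to 'out',
    // decoding on the fly. Only one 8 KB buffer of decoded data is held
    // in memory at a time.
    public static long copyDecoded(InputStream encoded, OutputStream out) throws IOException {
        InputStream decoded = Base64.getDecoder().wrap(encoded);
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        while ((n = decoded.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for an attachment stream (hypothetical data, not fetched from ES).
        byte[] base64Text = Base64.getEncoder()
                .encode("example attachment body".getBytes(StandardCharsets.UTF_8));

        // Stand-in for the web server's response stream.
        ByteArrayOutputStream response = new ByteArrayOutputStream();
        long written = copyDecoded(new ByteArrayInputStream(base64Text), response);
        System.out.println(written + " bytes written");
    }
}
```

In a real web application the ByteArrayOutputStream would be replaced by the response's output stream, so the decoded attachment never has to exist as a single byte array.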

Please correct me if I am missing something obvious. Thanks in advance!


(Shay Banon) #2

That's how it works currently. There are no plans to change it, as it
would be quite a hefty change (think about streaming across several nodes
in the cluster) and would require a different store for attachments.


(Kosta) #3

Thanks Shay! Just to clarify, when you say "that's how it works
currently", which part do you mean? Automatic base64 decoding as the
attachment is being streamed to the client?


(Shay Banon) #4

Yes, the full attachment is loaded into memory (as part of _source). It
does not go through another round of base64 encoding when you ask for
_source, since it is stored as is (already in base64).
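
Since the base64 text arrives fully in memory, one thing the client can still do is avoid building a second, full-size decoded byte array. The following is a minimal sketch under stated assumptions: the base64 value has already been read out of _source as a String (how that is done is outside this sketch), it contains no line breaks, and any padding appears only at the very end. The names ChunkedBase64Writer and writeDecoded are hypothetical, and java.util.Base64 requires Java 8 or later.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class ChunkedBase64Writer {

    // Slice size in characters. It is a multiple of 4, so every slice except
    // possibly the last is a whole number of base64 quanta and can be decoded
    // independently of its neighbours.
    private static final int CHUNK_CHARS = 8192;

    // Decodes 'base64' slice by slice and writes the raw bytes to 'out'.
    // Returns the number of decoded bytes written.
    public static long writeDecoded(String base64, OutputStream out) throws IOException {
        Base64.Decoder decoder = Base64.getDecoder();
        long total = 0;
        for (int start = 0; start < base64.length(); start += CHUNK_CHARS) {
            int end = Math.min(start + CHUNK_CHARS, base64.length());
            byte[] raw = decoder.decode(base64.substring(start, end));
            out.write(raw);
            total += raw.length;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the base64 field value taken from _source (hypothetical data).
        String field = Base64.getEncoder()
                .encodeToString("example attachment body".getBytes(StandardCharsets.UTF_8));

        // Stand-in for the web server's response stream.
        ByteArrayOutputStream response = new ByteArrayOutputStream();
        long written = writeDecoded(field, response);
        System.out.println(written + " bytes written");
    }
}
```

This does not reduce what Elasticsearch or the get response hold in memory; it only keeps the decoded copy down to one small slice at a time while writing to the client.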

