Use InputStream to write content when using attachment type

Hi,

The attachment type plugin is very handy, but a question came to mind
today. I want to store relatively large (~600 MB) individual documents in ES.
As far as I can tell, the only way to write the content through the Java API
is as a byte array, which means the entire content must be held in memory.
That is a serious problem for large documents. Is there a way to use an
InputStream to write the content instead?

No, there isn't a way to do it.
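
For illustration of the memory concern in the question: the attachment type takes the file content as a base64-encoded field in the JSON source, so the simplest client code loads the whole file into a byte array first. The sketch below, assuming Java 8+ and JDK classes only, shows how the encoding step alone can run in fixed-size chunks. The names AttachmentEncoding and encodeStreaming are hypothetical and not part of any Elasticsearch API. It does not change the answer above, since the indexing request itself is still handled in memory.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Base64;

// Hypothetical helper, not part of Elasticsearch or the attachment plugin.
public class AttachmentEncoding {

    // Copies `in` to `out` as base64 through a fixed 8 KB buffer, so heap use
    // does not grow with the size of the document. Returns the raw byte count.
    public static long encodeStreaming(InputStream in, OutputStream out) throws IOException {
        long total = 0;
        // wrap() returns a stream that base64-encodes everything written to it.
        // Closing it writes the final padding and also closes `out`.
        try (OutputStream b64 = Base64.getEncoder().wrap(out)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                b64.write(buf, 0, n);
                total += n;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Small in-memory stand-in for a large file stream.
        byte[] sample = "hello attachment".getBytes("UTF-8");
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long n = encodeStreaming(new ByteArrayInputStream(sample), sink);
        System.out.println(n + " bytes in, " + sink.size() + " base64 chars out");
    }
}
```

In real use the sink would be a file or socket stream instead of a ByteArrayOutputStream, which is used here only to keep the example self-contained.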

On Mon, Mar 19, 2012 at 1:05 PM, canavar fehmican.saglam@gmail.com wrote:

Hi,

The attachment type plugin is very handy but one question raised in my
head today. I want to store relatively large(~600MB) individual documents
in ES. I think the only way to write the content using the Java API is
using a byte array. If a byte array is used then all the content must be
available in the memory. This is a serious problem for large documents. Is
there a way to use an InputStream to write the content?

Could it be made possible? Should I file an issue and try to implement it?
Or do you think it is unnecessary or too hard to implement?

On 20 Mar 2012 at 12:04, "Shay Banon" kimchy@gmail.com wrote:

No, there isn't a way to do it.

It's actually quite complicated, since it spans multiple layers. First, the
data would have to be streamed to the shard that indexes it (and probably
written to a temp file or something similar, since there is no way to stream
data into the index). Then that temp file would need to be indexed into
elasticsearch. And of course, if you use Tika, I'm not sure it supports
stream parsing.
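
The temp-file step described above could look roughly like the following. This is only a sketch using JDK classes, not elasticsearch code, and the names TempFileSpool and spool are hypothetical. It shows an incoming stream being written to disk without ever holding the whole document in memory, then reopened as a stream for whatever would parse it next.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of Elasticsearch.
public class TempFileSpool {

    // Writes the incoming stream to a temp file. Files.copy moves the data
    // through an internal buffer, so the document is never held in memory
    // as a whole. REPLACE_EXISTING is needed because createTempFile has
    // already created the (empty) file.
    public static Path spool(InputStream in) throws IOException {
        Path tmp = Files.createTempFile("attachment-", ".bin");
        Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        return tmp;
    }

    public static void main(String[] args) throws IOException {
        byte[] sample = new byte[1 << 20]; // 1 MB stand-in for a large document
        Path tmp = spool(new ByteArrayInputStream(sample));
        try (InputStream again = Files.newInputStream(tmp)) {
            // A stream-capable parser would read from `again` at this point.
            System.out.println("spooled " + Files.size(tmp) + " bytes to " + tmp);
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

The caller is responsible for deleting the temp file once indexing is done, as main does here.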

On Tue, Mar 20, 2012 at 10:30 PM, canavar fehmican.saglam@gmail.com wrote:

May it be possible? Should I file an issue and try to implement it? Or do
you think it is needless or hard to implement?

Tika does support streams, and the ES attachment type should support them
too. There are a number of reasons, but the main one is optimising memory
usage. I would like to see this happen.