Use InputStream to write content when using attachment type

canavar · March 19, 2012, 11:05am

Hi,

The attachment type plugin is very handy, but a question came up today.
I want to store relatively large (~600 MB) individual documents in ES.
As far as I can tell, the only way to write the content using the Java API
is with a byte array. If a byte array is used, the entire content must be
held in memory, which is a serious problem for large documents. Is there a
way to use an InputStream to write the content instead?
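
To put a rough number on the memory concern above, here is a minimal sketch. It assumes the attachment content is sent as base64 text (as the attachment type expects) and that the raw bytes and the encoded form are both on the heap at the same time; the class and method names are made up for illustration.

```java
// Rough estimate of the heap needed to send a file as a base64 attachment field.
// Assumption: raw bytes and their base64 encoding are both held in memory at once.
class AttachmentMemoryEstimate {

    /** Length in bytes of the padded base64 encoding of rawBytes bytes. */
    static long base64Length(long rawBytes) {
        // Every group of up to 3 input bytes becomes 4 output characters.
        return 4 * ((rawBytes + 2) / 3);
    }

    public static void main(String[] args) {
        long raw = 600L * 1024 * 1024; // a ~600 MB document
        long encoded = base64Length(raw);
        System.out.println("raw bytes:     " + raw);
        System.out.println("base64 bytes:  " + encoded);
        System.out.println("held together: " + (raw + encoded));
    }
}
```

Base64 grows the payload by about a third, so the encoded form of a ~600 MB document is larger than the document itself, before counting any copies made while building the request.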

kimchy · March 20, 2012, 10:03am

No, there isn't a way to do it.


canavar · March 20, 2012, 8:30pm

Could it be made possible? Should I file an issue and try to implement it?
Or do you think it is unnecessary or too hard to implement?

kimchy · March 25, 2012, 10:05am

It's actually quite complicated, since it spans multiple layers. The first
is being able to somehow stream the data into the shard that will index it
(probably writing it to a temp file or something similar, since there is no
way to stream index data). Then we need to index that temp file into
elasticsearch, and of course, if you use Tika, I'm not sure whether it
supports stream parsing.
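
The "write it to a temp file" step mentioned above can be sketched with the plain JDK. This is only an illustration of spooling a stream to disk without buffering it all in memory; it is not Elasticsearch code, and `TempFileSpool` and `spoolToTempFile` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: copies an incoming stream to a temp file so that
// later stages can read it from disk instead of from a byte array.
class TempFileSpool {

    static Path spoolToTempFile(InputStream in) throws IOException {
        Path tmp = Files.createTempFile("attachment-", ".bin");
        try {
            // Files.copy reads the stream in small internal chunks;
            // REPLACE_EXISTING is needed because createTempFile already created the file.
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            return tmp;
        } catch (IOException e) {
            Files.deleteIfExists(tmp); // do not leave partial files behind
            throw e;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] sample = new byte[1 << 20]; // stand-in for a large upload
        try (InputStream in = new ByteArrayInputStream(sample)) {
            Path tmp = spoolToTempFile(in);
            System.out.println(tmp + " holds " + Files.size(tmp) + " bytes");
            Files.delete(tmp);
        }
    }
}
```

This covers only the spooling layer. Getting the stream to the shard and indexing from the temp file are the other layers described above, and they are not addressed here.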


sasa · August 15, 2013, 9:10am

Tika does support streams, and the ES attachment type should support them too. There are a number of reasons, but the main one is reducing memory usage. I would like to see this happen.
