On Tuesday, February 14, 2012 at 3:15:41 PM UTC, Vineeth Mohan wrote:
Hi,
While indexing, I have a field whose value is quite large.
This value is stored in a file as text.
I would prefer not to load the entire file into main memory to read its content.
Instead, I would like the file to be streamed directly to ES without
consuming much RAM.
On Wednesday, February 15, 2012 at 10:33 AM, Vineeth Mohan wrote:
It's not on the search side.
Mine is a multithreaded application, and it tries to index tons of files into ES at the same time.
So I am trying to use as little main memory as possible.
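Because Elasticsearch does not stream a single document (as the replies in this thread point out), one client-side way to keep a multithreaded indexer's memory in check is to cap how many bytes of documents are in flight at once, instead of capping threads. This is not something proposed in the thread; it is a minimal sketch using only JDK concurrency classes, with a hypothetical `MemoryBudget` class, assuming each file's size is known before it is read (for example from `Files.size`).

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper: caps the total size (in KiB) of the documents that
// worker threads hold in memory at the same time.
class MemoryBudget {
    private final Semaphore permits;
    private final int maxKib;

    MemoryBudget(int maxKib) {
        this.maxKib = maxKib;
        // Fair ordering, so a request for many permits is not overtaken
        // indefinitely by smaller ones.
        this.permits = new Semaphore(maxKib, true);
    }

    // One permit per KiB, capped at the whole budget so that a file larger
    // than the budget still runs (alone) instead of blocking forever.
    int permitsFor(long bytes) {
        long kib = Math.max(1, (bytes + 1023) / 1024);
        return (int) Math.min(kib, maxKib);
    }

    // Blocks until enough budget is free, runs the task, then gives it back.
    void run(long bytes, Runnable task) throws InterruptedException {
        int n = permitsFor(bytes);
        permits.acquire(n);
        try {
            task.run();
        } finally {
            permits.release(n);
        }
    }

    // Simulates `files` uploads of `fileBytes` each on 8 threads and returns
    // the peak number of KiB that were "in flight" at once.
    static long simulate(int budgetKib, int files, long fileBytes) throws InterruptedException {
        MemoryBudget budget = new MemoryBudget(budgetKib);
        long kib = budget.permitsFor(fileBytes);
        AtomicLong inUse = new AtomicLong();
        AtomicLong peak = new AtomicLong();
        ExecutorService pool = Executors.newFixedThreadPool(8);
        for (int i = 0; i < files; i++) {
            pool.submit(() -> {
                budget.run(fileBytes, () -> {
                    peak.accumulateAndGet(inUse.addAndGet(kib), Math::max);
                    // ... read the file and send it to Elasticsearch here ...
                    inUse.addAndGet(-kib);
                });
                return null;
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return peak.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // 4 MiB budget, 32 files of 1 MiB each: at most 4 are held at once.
        System.out.println("peak KiB in flight: " + simulate(4096, 32, 1 << 20));
    }
}
```

Bounding total bytes rather than thread count lets many small files proceed in parallel while only a few large ones are held at once; the 4 MiB budget in `main` is an arbitrary example value.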
On Wed, Feb 15, 2012 at 4:09 PM, Shay Banon kimchy@gmail.com wrote:
There is no support for streaming.
On Friday, February 17, 2012 8:48:46 PM UTC+3, kimchy wrote:
Not really, there is no support for streaming single doc large data.
On Friday, February 17, 2012 at 2:03 PM, Vineeth Mohan wrote:
Hello Shay,
Can you give me clues or hints on how to implement this on top of the existing API?
I was hoping to accomplish this by overriding a couple of methods.
Any pointers in this direction would be appreciated.
Thanks,
Vineeth
On Wed, Feb 15, 2012 at 2:25 AM, ppearcy <ppe...@gmail.com> wrote:
I don't believe this is possible. It would need some sort of JSON
streaming support.
How big are we talking? If you're not storing it as its own stored
field or as part of the _source JSON, it should probably be OK on the search side.
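On the client side, at least, the JSON body does not have to be built in memory: the file can be escaped and written into the HTTP request while it is being read. A minimal sketch, assuming the JDK's `HttpURLConnection` with chunked transfer encoding, a hypothetical field name `content`, and a hypothetical URL such as `http://localhost:9200/files/_doc/1` (the exact document endpoint depends on the Elasticsearch version). As noted in this thread, Elasticsearch itself still buffers the whole document on the server, so this only saves memory in the indexing client.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class StreamingJsonBody {

    // Copies text from `in` to `out` as the inside of a JSON string literal,
    // escaping as it goes. Only an 8K-char buffer is held in memory.
    static void writeEscaped(Reader in, Writer out) throws IOException {
        char[] buf = new char[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            for (int i = 0; i < n; i++) {
                char c = buf[i];
                switch (c) {
                    case '"':  out.write("\\\""); break;
                    case '\\': out.write("\\\\"); break;
                    case '\n': out.write("\\n"); break;
                    case '\r': out.write("\\r"); break;
                    case '\t': out.write("\\t"); break;
                    default:
                        if (c < 0x20) {
                            // Remaining control characters use the 4-hex-digit escape.
                            out.write(String.format("\\u%04x", (int) c));
                        } else {
                            out.write(c);
                        }
                }
            }
        }
    }

    // Writes {"<field>":"<escaped text>"}; the field name is assumed to need no escaping.
    static void writeDoc(String field, Reader in, Writer out) throws IOException {
        out.write("{\"" + field + "\":\"");
        writeEscaped(in, out);
        out.write("\"}");
        out.flush();
    }

    // PUTs the file as one document using chunked transfer encoding, so the
    // client never builds the whole request body in memory.
    static int index(Path file, String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("PUT");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);
        conn.setChunkedStreamingMode(64 * 1024);
        try (Reader in = Files.newBufferedReader(file, StandardCharsets.UTF_8);
             Writer out = new BufferedWriter(
                     new OutputStreamWriter(conn.getOutputStream(), StandardCharsets.UTF_8))) {
            writeDoc("content", in, out);
        }
        return conn.getResponseCode();
    }
}
```

A call would look like `StreamingJsonBody.index(Paths.get("big.txt"), "http://localhost:9200/files/_doc/1")`, with both the file name and the URL standing in for your own.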
There are several different aspects when it comes to streaming files to
Elasticsearch.
The first is that the input format is JSON, so the indexer has to create
large JSON docs, even though only a tiny part is displayed later (highlighting).
The second is that you want to store binary files inside Lucene, unindexed
(for whatever reason). The main challenge is how to handle this with
JSON (it requires something like base64 encoding in combination with
compression).
And the third is that you want Elasticsearch to do smart Lucene index codec
processing to create documents from streams "on the fly".
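For the second aspect, the base64-plus-compression step can itself be done as a stream, so neither the raw file nor the encoded text has to sit in memory. A sketch assuming only JDK classes (`GZIPOutputStream` wrapped around `Base64.getEncoder().wrap`, and Java 9+ for `readAllBytes`); `CompressedBase64` is a hypothetical name. Base64 turns every 3 bytes into 4 characters, so the encoded text is about a third larger than the compressed bytes. Elasticsearch's `binary` field type expects base64 input, but the server still receives the value as one JSON string.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class CompressedBase64 {

    // Streams `in` through gzip and then base64 into `out`. The result is
    // plain ASCII that can be placed inside a JSON string; only small
    // buffers are held in memory. Closing the gzip stream also closes `out`.
    static void encode(InputStream in, OutputStream out) throws IOException {
        try (GZIPOutputStream gz = new GZIPOutputStream(Base64.getEncoder().wrap(out))) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                gz.write(buf, 0, n);
            }
        } // close() writes the gzip trailer, then the final base64 padding
    }

    // Reverse direction, for reading a stored value back.
    static byte[] decode(String base64) throws IOException {
        byte[] compressed = Base64.getDecoder().decode(base64);
        try (InputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return gz.readAllBytes();
        }
    }
}
```

The encoder output can be fed into a streamed JSON request body in place of raw text.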
-> 1. I think there is no advantage in streaming for this case, since
Lucene needs the whole document in memory at some point to compute the
inverted index statistics. If your input documents are large, just
add enough heap memory to get them processed. I'm not sure, but most
search engines out there, including Google, have a hard limit on how much of
a document is analyzed for highlighting or term indexing (the first 10k
characters maybe?). The reason is better performance. So there may be
little sense in forcing these features onto Elasticsearch with large docs.
-> 2. You should think twice about this; it may be better to store
these files outside Elasticsearch. Retrieving large stored docs may hurt
performance, and relocating shards will also be slow.
-> 3. You can implement a custom codec for Lucene 4 that may handle
your content streams gracefully. Unfortunately, such a codec is
domain-specific, since it depends on the nature of the stream elements.
For example, such a codec could do complex event stream processing (CEP)
like Esper http://esper.codehaus.org/
My 2c.
Jörg
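Points 1 and 2 above can be combined: keep the large file outside Elasticsearch and index only a small document with its location, its size and a capped prefix for searching and highlighting. A sketch under those assumptions; the class name and the field names `path`, `size_bytes` and `preview` are hypothetical, and `maxChars` is a parameter because the 10k-character figure is only a guess, not a documented limit. If the whole text must be sent anyway, Elasticsearch's `limit` token filter (`max_token_count`) can cap how many tokens are indexed on the server side.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class ReferenceDoc {

    // Reads at most maxChars characters; the rest of the file is never loaded.
    static String readPrefix(Reader in, int maxChars) throws IOException {
        char[] buf = new char[maxChars];
        int total = 0;
        while (total < maxChars) {
            int n = in.read(buf, total, maxChars - total);
            if (n == -1) break;
            total += n;
        }
        // Do not cut a surrogate pair in half.
        if (total > 0 && Character.isHighSurrogate(buf[total - 1])) total--;
        return new String(buf, 0, total);
    }

    // Minimal JSON string escaping (quotes, backslashes, control chars).
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Small document: where the file lives, how big it is, and a capped preview.
    static String toJson(String path, long sizeBytes, String preview) {
        return "{\"path\":\"" + escape(path) + "\","
                + "\"size_bytes\":" + sizeBytes + ","
                + "\"preview\":\"" + escape(preview) + "\"}";
    }

    static String forFile(Path file, int maxChars) throws IOException {
        try (Reader in = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            return toJson(file.toAbsolutePath().toString(), Files.size(file), readPrefix(in, maxChars));
        }
    }
}
```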
On 06.05.13 21:34, Yermakovich Siarhei wrote:
Has anything changed since then? Maybe there are third-party plugins that
allow streaming large files to Elasticsearch?
I believe Elasticsearch loads the whole indexed document into RAM before
indexing. It certainly loads the whole document into RAM for things like
source filtering. Lucene doesn't require this, but Elasticsearch does it
because it's fine for the typical use case.
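As the message above says, Lucene itself does not need the whole text at once: its `TextField` has a constructor that takes a `java.io.Reader`, and such a field is tokenized incrementally (a Reader-valued field cannot be stored). The sketch below is not Lucene code; it is a plain-JDK analogy, with a hypothetical `StreamingTermCounter`, that counts terms while reading small chunks, to show why analysis does not inherently require the whole document in memory. Memory grows with the number of distinct terms, not with the file size.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.HashMap;
import java.util.Map;

class StreamingTermCounter {

    // Splits the text into lower-cased terms (runs of letters or digits) and
    // counts them, reading only 4K characters at a time. A term that spans
    // two reads is kept in `term` until it ends.
    static Map<String, Integer> countTerms(Reader in) throws IOException {
        Map<String, Integer> counts = new HashMap<>();
        StringBuilder term = new StringBuilder();
        char[] buf = new char[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            for (int i = 0; i < n; i++) {
                char c = buf[i];
                if (Character.isLetterOrDigit(c)) {
                    term.append(Character.toLowerCase(c));
                } else if (term.length() > 0) {
                    counts.merge(term.toString(), 1, Integer::sum); // term ended
                    term.setLength(0);
                }
            }
        }
        if (term.length() > 0) {
            counts.merge(term.toString(), 1, Integer::sum); // last term at end of input
        }
        return counts;
    }
}
```

For a real file, pass a `Files.newBufferedReader(path, StandardCharsets.UTF_8)` opened in a try-with-resources block.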
On Mar 27, 2015 2:59 PM, "Hao" hao.qian.career@gmail.com wrote:
Hi Vineeth,
I am running into the same issue here (I am using the .NET REST API, NEST).
Have you found a solution for indexing large files? Thank you.
Best Regards,
Hao