I also noticed this statement in the attachment mapper plugin, because I
was interested in handling binary content with ES.
How do you want the binary content to be exposed? At shard level? Or via
node transport?
The REST API uses JSON (via XContentBuilder), which is the reason for the
base64 encoding.
With WebSockets, I can imagine exposing binary streams directly to a Java
client, but there is still a lot of work in transporting huge data from
the shard level to the requesting node without blowing up the heap (e.g.
by using chunked streams).
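To illustrate the chunked-stream idea on the client side, here is a minimal sketch, assuming the base64 value is available as an InputStream and using the JDK's java.util.Base64 (Java 8 or later). The class name ChunkedBase64 and the method decodeTo are hypothetical and not part of ES or the plugin. It decodes through a small fixed-size buffer, so the decoded content never has to sit in the heap as a whole.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper, not part of Elasticsearch or the attachment plugin.
final class ChunkedBase64 {

    // Decodes base64 data read from 'encoded' and writes the raw bytes to
    // 'out', holding at most 'chunkSize' decoded bytes in memory at a time.
    // Returns the number of decoded bytes written.
    // Note: closing the decoding wrapper also closes 'encoded'.
    static long decodeTo(InputStream encoded, OutputStream out, int chunkSize)
            throws IOException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        byte[] buffer = new byte[chunkSize];
        long total = 0;
        try (InputStream decoded = Base64.getDecoder().wrap(encoded)) {
            int n;
            while ((n = decoded.read(buffer)) != -1) {
                out.write(buffer, 0, n);
                total += n;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a base64 field value; a real one would come from a stream.
        byte[] original = "binary attachment payload".getBytes(StandardCharsets.UTF_8);
        byte[] base64 = Base64.getEncoder().encode(original);

        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long written = decodeTo(new ByteArrayInputStream(base64), sink, 8);
        System.out.println(written + " bytes decoded");
    }
}
```

This only covers the decoding step. Moving the data from the shard to the requesting node in chunks would need support inside ES itself.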
Jörg
On Friday, November 2, 2012 1:51:00 PM UTC+1, David Pilato wrote:
I also tried to play with it some time ago, but did not succeed with
MIME type autodetection.
I posted something about it on the Google Group, without getting an answer.
So if someone answers Alexander, it would be nice if they also had a look
at my old post. :-/
--
David
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
On Nov 2, 2012, at 13:14, Alexander Reelsen <a...@spinscale.de> wrote:
Hi there,
I just played around with the attachment mapper plugin and wondered if
there is any way to access the parsedContent (as in
AttachmentMapper.java:309), which contains the Tika-parsed content of the
document. When doing a simple GET on the document, I only see the
base64-encoded value that I pushed.
I'd like to do some special text extraction on my documents (like
searching for dates in them) after indexing. Alternatively, I could
call the Tika code a second time in my own application, but that seems
a bit dirty.
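For the date search itself, here is a minimal sketch, assuming the plain text has already been extracted (for example by running Tika in the application) and that the dates of interest are written in ISO form (yyyy-MM-dd). Both points are assumptions, and the class name DateFinder is hypothetical. It uses only the JDK (java.util.regex and java.time) and skips matches that are not valid calendar dates.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; works on text that was extracted beforehand.
final class DateFinder {

    // Candidate ISO-style dates such as 2012-11-02 (assumed format).
    private static final Pattern ISO_DATE =
            Pattern.compile("\\b\\d{4}-\\d{2}-\\d{2}\\b");

    // Returns all valid calendar dates found in 'text', in order of appearance.
    static List<LocalDate> findDates(String text) {
        List<LocalDate> dates = new ArrayList<>();
        Matcher m = ISO_DATE.matcher(text);
        while (m.find()) {
            try {
                // LocalDate.parse is strict, so 2012-13-45 is rejected.
                dates.add(LocalDate.parse(m.group()));
            } catch (DateTimeParseException e) {
                // Looks like a date but is not one; skip it.
            }
        }
        return dates;
    }

    public static void main(String[] args) {
        String text = "Invoice issued 2012-11-02, due 2012-12-01; bogus 2012-13-45.";
        System.out.println(findDates(text));
    }
}
```

Other date formats would need additional patterns or a dedicated date-parsing library.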
Any hints appreciated. Possibly I just overlooked something when
skimming through the source and it is totally easy.
--Alexander