I'm considering hacking on an extension to support binary attachments.
Essentially, three operations:
PUT /index/type/id/attachment-id
GET /index/type/id/attachment-id
DELETE /index/type/id/attachment-id
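To make the intended semantics of those three calls concrete, here is a toy in-memory sketch. It is not an Elasticsearch API: the class and function names (`InMemoryAttachmentStore`, `parse_path`) are hypothetical, and the only thing taken from the proposal is the `/index/type/id/attachment-id` path shape with put, get and delete.

```python
from typing import Dict, Optional, Tuple

# Hypothetical key mirroring the proposed path /index/type/id/attachment-id.
AttachmentKey = Tuple[str, str, str, str]


def parse_path(path: str) -> AttachmentKey:
    """Split '/index/type/id/attachment-id' into its four components."""
    parts = path.strip("/").split("/")
    if len(parts) != 4 or not all(parts):
        raise ValueError(f"expected /index/type/id/attachment-id, got {path!r}")
    index, doc_type, doc_id, attachment_id = parts
    return index, doc_type, doc_id, attachment_id


class InMemoryAttachmentStore:
    """Toy stand-in for the proposed put/get/delete attachment operations."""

    def __init__(self) -> None:
        self._blobs: Dict[AttachmentKey, bytes] = {}

    def put(self, path: str, data: bytes) -> None:
        # Overwrites any existing attachment at the same path.
        self._blobs[parse_path(path)] = data

    def get(self, path: str) -> Optional[bytes]:
        # Returns None when no attachment exists at this path.
        return self._blobs.get(parse_path(path))

    def delete(self, path: str) -> bool:
        # True if an attachment was removed, False if nothing was there.
        return self._blobs.pop(parse_path(path), None) is not None
```

For example, `store.put("/videos/clip/42/part-0", data)` followed by `store.get("/videos/clip/42/part-0")` returns the same bytes; a real implementation would of course replicate the blob across nodes rather than keep it in a dict.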
Each attachment would be stored as replicated files on the nodes.
(This is for big attachments, e.g. video; small attachments can just
be binhexed.) Attachments would have some arbitrary size limit, e.g. 5
GB, with the expectation that the client would string pieces together
to make longer files as necessary. Attachments would be hashed
separately, so they would not necessarily live on the same node as the
document they are attached to.
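A rough sketch of the two client-side ideas in that paragraph: cutting a long file into pieces under the size limit, and routing each piece by its own key instead of the parent document's. The 5 GB constant comes from the post; `iter_pieces`, `node_for`, and the SHA-1-modulo routing are illustrative assumptions, not how Elasticsearch actually routes shards.

```python
import hashlib
from typing import BinaryIO, Iterator, List, Tuple

# Example per-attachment limit from the proposal (e.g. 5 GB).
MAX_ATTACHMENT_BYTES = 5 * 1024**3


def iter_pieces(stream: BinaryIO, max_piece: int = MAX_ATTACHMENT_BYTES) -> Iterator[Tuple[int, bytes]]:
    """Yield (offset, piece) pairs, each piece at most max_piece bytes."""
    if max_piece <= 0:
        raise ValueError("max_piece must be positive")
    offset = 0
    while True:
        piece = stream.read(max_piece)
        if not piece:
            break
        yield offset, piece
        offset += len(piece)  # track actual bytes read, not the requested size


def node_for(key: str, nodes: List[str]) -> str:
    """Pick a node by hashing a key (a stand-in for real shard routing)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]
```

Because a document would be routed by something like `"videos/clip/42"` while its attachment is routed by `"videos/clip/42/part-0"`, the two can land on different nodes, which is the point of hashing attachments separately. A real client would also stream each multi-gigabyte piece to the server in smaller buffers rather than holding it in memory.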
Initially the client would be responsible for managing uploading,
downloading, and deletion, but conventions may emerge that would
eventually be easier to implement on the server.
For example, a document field such as
{
  "_attachments": {
    "file1": [ "attachment-id1", offset, "attachment-id2", offset ]
  }
}
might eventually make the ES server do something different with these
blobs, but initially this would all be up to client convention.
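One way a client might interpret that `_attachments` convention, sketched under an assumption the post does not spell out: the list alternates attachment id and offset, and each offset is the byte position at which that piece starts in the logical file. `parse_manifest` and `reassemble` are hypothetical helper names, and `fetch` stands in for whatever GET call retrieves a piece.

```python
from typing import Callable, List, Tuple


def parse_manifest(doc: dict, name: str) -> List[Tuple[str, int]]:
    """Turn the flat [id, offset, id, offset, ...] list into (id, offset) pairs sorted by offset."""
    flat = doc["_attachments"][name]
    if len(flat) % 2:
        raise ValueError("expected alternating attachment-id / offset entries")
    pairs = [(str(flat[i]), int(flat[i + 1])) for i in range(0, len(flat), 2)]
    return sorted(pairs, key=lambda p: p[1])


def reassemble(pairs: List[Tuple[str, int]], fetch: Callable[[str], bytes]) -> bytes:
    """Concatenate pieces in offset order, rejecting gaps or overlaps."""
    out = bytearray()
    for attachment_id, offset in pairs:
        if offset != len(out):
            raise ValueError(f"piece {attachment_id!r} starts at {offset}, expected {len(out)}")
        out += fetch(attachment_id)
    return bytes(out)
```

Checking each offset against the bytes assembled so far catches a missing or duplicated piece; that is exactly the kind of bookkeeping that could later move to the server if the convention sticks.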
I could use a completely different piece of software (e.g. S3 if I'm
already on AWS, OpenStack Swift, etc.) to handle blobs. This just
feels unnecessary. ES already does 95% of what's necessary to manage
simple blobs, in a way few blob managers can match.
Many blob managers are unreasonably slow because they carry
considerable management overhead tracking the blobs, whereas I would be
finding them using ES anyway. For example, the OpenStack object store has
a serious bottleneck of 70-100 puts/second/bucket because it has to open,
commit, and close an SQLite database for each blob put.
Who thinks this is an inevitable piece of what ES should do, and who
thinks it is a useless complication, given that it can clearly be done
with other software?