Integration with NoSQL

Hi Shay,

Thanks for your response.
My thoughts below ...

> Storing the actual source of course causes more overhead when indexing a
> document (basically, when a Lucene segment is written, more data needs to be
> written to disk), and when segments are merged (again, more IO operations,
> but very trivial ones). This is a very minimal cost compared to all the
> other things that happen when indexing a document, so I am surprised by
> your experience, Sergio. I think there might have been something else in
> play here...

There might have been something else; I'm in no way an expert in Lucene
internals. I'm just reporting my experience.

> On the other hand, not storing the source document means several things.
> First, you need to fetch the relevant data from your "other storage",
> possibly a NoSQL store. Now, let's compare the cost of this: when executing a
> search, the "fetch" phase is already running on the relevant node, so
> all that is added is accessing the shard storage (let's say the file system)
> and reading the source from it.

Agreed, reading everything from ES will certainly be faster, but I was
talking about write performance degradation.

> That's why, by default, elasticsearch does store the _source, and I think
> that it's a good "out of the box" solution,

I'm not saying it isn't a good "out of the box" solution: it is indeed,
so I agree with you there :)

> and this is what I would recommend using most of the time (if not almost
> always).

Here's where I disagree, but it's just my opinion.

Cheers,

Sergio B.

--
Sergio Bossa
http://www.linkedin.com/in/sergiob