I just wanted to check whether this scenario is a good fit for the
parent/child mapping feature.
We currently index only document metadata (dozens of fields), but we also
want to index file contents, since that's sometimes useful for our
customers (in our use case, metadata is the primary search mechanism). With
hundreds of millions of document records and 100 TB+ of files, it's a
non-trivial exercise we've managed to put off for a while.
Since any reindex requires indexing both metadata and file content, which
we keep separately in a DB and on a file server respectively, I didn't want
every metadata update to also require a read from the file store to get the
text content for indexing, especially since (for us) a file's text content
never changes. I was hoping to find a way to keep metadata and text content
separate in the index, so that metadata updates can be applied independently.
I was thinking of a parent/child relationship between the metadata (parent)
and the full text (child), so the parent can be updated frequently while the
child is left alone once its text has been extracted and indexed. Text
extraction for newly uploaded files could then be done asynchronously, with
a new child record added to ES independently of registering the document's
metadata record.
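To make the proposed layout concrete, here is a minimal sketch that only builds the request bodies (it doesn't talk to a cluster). It assumes the legacy per-type `_parent` mapping on the child type (the parent/child mechanism, as opposed to the `nested` type) and the `?parent=` URL parameter used when indexing a child. The index name `docs`, the types `meta` and `fulltext`, the field `content`, and reusing the parent id as the child id are all hypothetical choices for illustration.

```python
import json

# Hypothetical names chosen for illustration only.
INDEX = "docs"
PARENT_TYPE = "meta"       # frequently updated metadata (parent)
CHILD_TYPE = "fulltext"    # extracted text, written once (child)


def child_mapping():
    """Mapping body that declares CHILD_TYPE a child of PARENT_TYPE.

    Assumes the legacy per-type `_parent` mapping; the "string" field
    type is likewise pre-5.x syntax.
    """
    return {
        CHILD_TYPE: {
            "_parent": {"type": PARENT_TYPE},
            "properties": {"content": {"type": "string"}},
        }
    }


def metadata_request(doc_id, metadata):
    """(method, path, body) that (re)indexes only the parent metadata doc."""
    return ("PUT", f"/{INDEX}/{PARENT_TYPE}/{doc_id}", json.dumps(metadata))


def fulltext_request(doc_id, text):
    """(method, path, body) that indexes extracted text as a child of doc_id.

    The `parent` URL parameter links the child to its parent and routes it
    to the parent's shard. Reusing doc_id as the child id is an assumption.
    """
    path = f"/{INDEX}/{CHILD_TYPE}/{doc_id}?parent={doc_id}"
    return ("PUT", path, json.dumps({"content": text}))


if __name__ == "__main__":
    # Applied once, e.g. via PUT /docs/fulltext/_mapping
    print(json.dumps(child_mapping(), indent=2))
    # A metadata change sends only the parent; the child is not re-sent.
    print(metadata_request("42", {"title": "Q3 report", "status": "final"}))
    # Sent later, asynchronously, once text extraction has finished.
    print(fulltext_request("42", "extracted text goes here"))
```

The point of the split is visible in the call pattern: a metadata edit only ever produces a `metadata_request`, so the file store is never read, while `fulltext_request` runs once per file when extraction completes.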
Does this sound like the right use case for parent/child in ES?
As I understand it, if we needed to reindex (say, for new fields or changed
values), we'd also have to reindex the children, but we could run the two
reindex operations separately and mark the full-text part 'offline' until
it completes, allowing the metadata to become searchable much earlier.
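The two-pass reindex and the 'offline' flag could be sketched roughly as below. This is an assumption-laden illustration: `reindex_parent` and `reindex_child` are hypothetical caller-supplied callables, the `state` dict stands in for wherever the application keeps its 'offline' flag, and the query uses a `has_child` clause on a hypothetical `fulltext` child type with a `content` field, dropping that clause while the text pass is still running.

```python
# Hypothetical child type name for the extracted-text documents.
CHILD_TYPE = "fulltext"


def build_query(metadata_query, text_query=None, fulltext_online=True):
    """Combine a metadata query with an optional has_child text clause.

    While the full text is 'offline' (children still being reindexed),
    the text clause is dropped and only metadata is searched.
    """
    must = [metadata_query]
    if text_query is not None and fulltext_online:
        must.append({
            "has_child": {
                "type": CHILD_TYPE,
                "query": {"match": {"content": text_query}},
            }
        })
    return {"query": {"bool": {"must": must}}}


def two_phase_reindex(doc_ids, reindex_parent, reindex_child, state):
    """Reindex all parents, then all children, as two separate passes.

    reindex_parent / reindex_child are hypothetical callables supplied by
    the caller; state["fulltext_online"] is what build_query should read.
    """
    state["metadata_online"] = False
    state["fulltext_online"] = False
    for doc_id in doc_ids:
        reindex_parent(doc_id)      # metadata pass: needs only the DB
    state["metadata_online"] = True  # metadata searchable from here on
    for doc_id in doc_ids:
        reindex_child(doc_id)       # text pass: reads from the file store
    state["fulltext_online"] = True


if __name__ == "__main__":
    state = {}
    two_phase_reindex(["1", "2"], print, print, state)
    print(build_query({"term": {"status": "final"}}, "invoice",
                      fulltext_online=state["fulltext_online"]))
```

Silently dropping the text clause keeps metadata searches working during the second pass; a real system might instead tell the user that full-text results are temporarily unavailable.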
Shay, it would also help if the 'parent/child' feature, which is referred
to frequently, were easier to find in the docs. I'm presuming it's the
'nested' mapping type? I see references to parent/child in the forums etc.,
and it took me a while to come across it in the docs when I went looking.
I could be blind, though!
thanks,
Paul