Writing parents and children docs


(phill) #1

I am converting a table of one type of records into two ES doc types.
The Parent doc has a 1:n relationship to the children.
Given a sourceRecord, my sequence is:
String esDocId = findESParent(sourceRecord);
if (esDocId == null) {
    esDocId = insertESParent(sourceRecord);
}
insertChild(sourceRecord, esDocId);

When inserting, do I have to do a refresh before this sequence to make
sure that the parent docs are findable, so I don't create two copies of
the same parent?
If I knew I had recently inserted the parent, could I insert the child
without worrying whether a get/search would find it?
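The refresh question can be illustrated with a toy in-memory model. This is a hypothetical sketch, not the Elasticsearch API: `ToyIndex`, `insertParent`, `searchParent` and the auto-generated `auto-N` IDs are invented for illustration, and the only behavior modelled is that search sees documents as of the last refresh. With auto-generated parent IDs and a search-based lookup, skipping the refresh lets the same family end up with two parents.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (hypothetical, NOT the Elasticsearch API): documents are written
// immediately, but search only sees the snapshot taken at the last refresh().
public class NrtSearchDemo {
    record Doc(String id, String naturalKey) {}

    static class ToyIndex {
        private final List<Doc> written = new ArrayList<>();    // every indexed doc
        private final List<Doc> searchable = new ArrayList<>(); // snapshot as of last refresh
        private int nextId = 1;

        // Parent gets an auto-generated _id (assumption for this demo).
        String insertParent(String naturalKey) {
            String id = "auto-" + nextId++;
            written.add(new Doc(id, naturalKey));
            return id;
        }

        // Search-based lookup: only refreshed docs are visible.
        String searchParent(String naturalKey) {
            for (Doc d : searchable) {
                if (d.naturalKey().equals(naturalKey)) return d.id();
            }
            return null;
        }

        void refresh() {
            searchable.clear();
            searchable.addAll(written);
        }

        long countParents(String naturalKey) {
            return written.stream().filter(d -> d.naturalKey().equals(naturalKey)).count();
        }
    }

    // findESParent / insertESParent from the post, with an optional refresh first.
    static String findOrInsert(ToyIndex index, String naturalKey, boolean refreshFirst) {
        if (refreshFirst) index.refresh();
        String id = index.searchParent(naturalKey);
        if (id == null) id = index.insertParent(naturalKey);
        return id;
    }

    public static void main(String[] args) {
        ToyIndex noRefresh = new ToyIndex();
        findOrInsert(noRefresh, "family-7", false);
        findOrInsert(noRefresh, "family-7", false);
        System.out.println("without refresh: parents=" + noRefresh.countParents("family-7")); // parents=2

        ToyIndex withRefresh = new ToyIndex();
        findOrInsert(withRefresh, "family-7", true);
        findOrInsert(withRefresh, "family-7", true);
        System.out.println("with refresh: parents=" + withRefresh.countParents("family-7")); // parents=1
    }
}
```

Refreshing before every record avoids the duplicate only in this single-threaded model; concurrent writers could still race between the lookup and the insert.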

My feeling is that many 'families' of parents and children often (but not
always) appear together in my input stream, so if I recorded the last n
parent IDs I was working with in a ring buffer, I could save many
refreshes whenever I already knew of an existing parent.
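The "last n parent IDs" idea could be sketched with `java.util.LinkedHashMap` in access order, which gives least-recently-used eviction rather than the strict FIFO of a ring buffer (constructing it with `accessOrder = false` would give FIFO instead). The `RecentParents` class and keying it by a natural key taken from the source record are assumptions for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical bounded cache of recently seen parent IDs, keyed by the source
// record's natural key. When full, the least recently used entry is evicted.
public class RecentParents {
    private final Map<String, String> cache;

    public RecentParents(int capacity) {
        // accessOrder = true: iteration order runs from least to most recently accessed
        this.cache = new LinkedHashMap<>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > capacity; // drop the eldest once we exceed capacity
            }
        };
    }

    public String get(String naturalKey) { return cache.get(naturalKey); } // null on miss

    public void put(String naturalKey, String esDocId) { cache.put(naturalKey, esDocId); }

    public int size() { return cache.size(); }

    public static void main(String[] args) {
        RecentParents recent = new RecentParents(2);
        recent.put("a", "id-a");
        recent.put("b", "id-b");
        recent.get("a");         // touch "a" so "b" becomes least recently used
        recent.put("c", "id-c"); // evicts "b"
        System.out.println(recent.get("a") + " " + recent.get("b") + " " + recent.get("c")); // id-a null id-c
    }
}
```

LRU suits the "families appear together" pattern slightly better than FIFO, since a parent that keeps getting hits stays cached even if it was first seen long ago.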

String esDocId = findESParentInCache(sourceRecord); // try the cache that remembers 'recently' seen parents
if (esDocId == null) {
    // ...refresh the index here...
    esDocId = getESParentByID(sourceRecord); // hit ES to see if it is an existing parent
    if (esDocId == null) {
        esDocId = insertESParent(sourceRecord); // create a new parent
    }
    addToCache(esDocId); // remember it in case it comes up again soon
}
insertChild(sourceRecord, esDocId);

Comments?
-Paul

--


(Lukáš Vlček) #2

I have not tested it myself, but I think you should be able to use realtime
get for parents regardless of refresh, and having the parent ID is all you
need to index child documents. It would be interesting to know whether this
strategy works.
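Why the parent ID alone is enough: as I understand Elasticsearch's parent/child support, a child indexed with a `parent` value is routed by that parent ID, so it lands on the same shard as its parent. The general rule is `shard = hash(routing) % number_of_primary_shards`; the sketch below uses `String.hashCode()` as a stand-in for Elasticsearch's own hash function, and the method names are hypothetical.

```java
// Simplified, hypothetical model of shard routing:
//   shard = hash(routing) mod number_of_primary_shards
// String.hashCode() is only a stand-in for the real hash function.
public class ParentRouting {
    static int shardFor(String routing, int numPrimaryShards) {
        // floorMod keeps the result non-negative even when hashCode() is negative
        return Math.floorMod(routing.hashCode(), numPrimaryShards);
    }

    // By default a document (here, the parent) is routed by its own _id.
    static int parentShard(String parentId, int shards) {
        return shardFor(parentId, shards);
    }

    // A child indexed with parent=<parentId> is routed by the parent's id;
    // the child's own _id is deliberately ignored for routing.
    static int childShard(String childId, String parentId, int shards) {
        return shardFor(parentId, shards);
    }

    public static void main(String[] args) {
        int shards = 5;
        String parentId = "order-42";
        // true: parent and child always end up on the same shard
        System.out.println(parentShard(parentId, shards) == childShard("line-1", parentId, shards));
    }
}
```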

Lukas

On Thursday, September 27, 2012, P. Hill wrote:

--


(phill) #3

On 9/27/2012 10:56 PM, Lukáš Vlček wrote:

I have not tested it myself, but I think you should be able to use realtime
get for parents regardless of refresh, and having the parent ID is all you
need to index child documents. It would be interesting to know whether
this strategy works.

Oh, I see that the docs say "By default, the get API is realtime, and is
not affected by the refresh rate of the index (when data will become
visible for search)." So that matches your statement.
Get is different from search, so I need not figure out when to refresh
as long as my findESParent(sourceRecord) can use a get to check the
index. Luckily, the sourceRecord does contain my choice of _id.
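The sequence with `findESParent` done as a realtime get could be modelled like this. Again a hypothetical toy, not the client API: `get` reads the latest write immediately, while `searchById` only sees what existed at the last `refresh()`. Because the `_id` comes from the source record, re-indexing an existing parent would overwrite it rather than duplicate it; the get mainly saves redundant parent writes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model (hypothetical, NOT the Elasticsearch client API): get-by-id sees the
// latest write at once, while search only sees what existed at the last refresh().
public class RealtimeGetDemo {
    static class ToyIndex {
        final Map<String, String> parents = new HashMap<>();   // _id -> source; realtime for get
        final Map<String, String> refreshed = new HashMap<>(); // what search would see
        final List<String[]> children = new ArrayList<>();     // {childId, parentId}
        int parentInserts = 0;

        String get(String id) { return parents.get(id); }
        boolean searchById(String id) { return refreshed.containsKey(id); }
        void refresh() { refreshed.putAll(parents); }

        void indexParent(String id, String source) {
            parents.put(id, source); // same _id again would overwrite, not duplicate
            parentInserts++;
        }

        void indexChild(String childId, String parentId) {
            children.add(new String[] {childId, parentId});
        }
    }

    // findESParent done as a realtime get on the _id the source record already
    // carries; no refresh anywhere.
    static void process(ToyIndex index, String parentId, String childId) {
        if (index.get(parentId) == null) {
            index.indexParent(parentId, "parent-source");
        }
        index.indexChild(childId, parentId);
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        process(index, "family-7", "c1");
        process(index, "family-7", "c2");
        process(index, "family-7", "c3");
        System.out.println("parent inserts: " + index.parentInserts);             // 1
        System.out.println("visible to search: " + index.searchById("family-7")); // false
    }
}
```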

Given that I already know the ID, I was thinking of just asynchronously
throwing both the parent (when needed) and the child at the index
without waiting for either, and then cleaning up after myself only if an
insert (i.e., its future) fails.
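The fire-and-forget idea with cleanup on failure might look like the following, using `CompletableFuture`. Elasticsearch has no multi-document transactions, so "rollback" here means compensating deletes. `indexAsync` and `deleteAsync` are hypothetical stand-ins for asynchronous client calls, simulated with an in-memory set.

```java
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: send parent (if new) and child without waiting, and undo the successful
// writes only if one of them fails. indexAsync/deleteAsync are HYPOTHETICAL
// stand-ins for async client calls, simulated with an in-memory set.
public class AsyncInsertDemo {
    static final Set<String> stored = ConcurrentHashMap.newKeySet();

    static CompletableFuture<String> indexAsync(String id, boolean simulateFailure) {
        return CompletableFuture.supplyAsync(() -> {
            if (simulateFailure) throw new IllegalStateException("index failed: " + id);
            stored.add(id);
            return id;
        });
    }

    static CompletableFuture<Void> deleteAsync(String id) {
        return CompletableFuture.runAsync(() -> stored.remove(id));
    }

    // Completes with true if both writes succeeded, false after compensating deletes.
    static CompletableFuture<Boolean> insertFamily(String parentId, boolean newParent,
                                                   String childId, boolean childFails) {
        CompletableFuture<String> parent = newParent
                ? indexAsync(parentId, false)
                : CompletableFuture.completedFuture(parentId); // existing parent: nothing to send
        CompletableFuture<String> child = indexAsync(childId, childFails);

        // allOf completes only after both futures are done, even if one failed.
        return CompletableFuture.allOf(parent, child)
                .handle((ignored, error) -> error)
                .thenCompose(error -> {
                    if (error == null) return CompletableFuture.completedFuture(true);
                    // No multi-document transaction: "roll back" by deleting what succeeded.
                    CompletableFuture<Void> undoParent = (newParent && !parent.isCompletedExceptionally())
                            ? deleteAsync(parentId)
                            : CompletableFuture.<Void>completedFuture(null);
                    CompletableFuture<Void> undoChild = !child.isCompletedExceptionally()
                            ? deleteAsync(childId)
                            : CompletableFuture.<Void>completedFuture(null);
                    return CompletableFuture.allOf(undoParent, undoChild).thenApply(v -> false);
                });
    }

    public static void main(String[] args) {
        boolean ok = insertFamily("parent-1", true, "child-1", false).join();
        boolean failed = insertFamily("parent-2", true, "child-2", true).join();
        System.out.println(ok + " " + failed + " " + stored);
    }
}
```

One caution with this design: deleting a freshly created parent because one child failed could orphan other children of the same family that are still in flight, so the cleanup may need to be tracked per family rather than per record.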

I wonder what a river does when inserting one or two documents and one
particular insert fails. Does anyone have any comments about "rollBack"
of a set of inserts?

-Paul


--

