Parent-Child and elastic 6


(Jamie Hodge) #1

Considering https://github.com/elastic/elasticsearch/pull/24317 and https://github.com/elastic/elasticsearch/issues/20257, is parent-child possible/encouraged going forward? Is a parent-child relation now possible across indices? Please advise.

Jamie


(Mark Walkom) #2

Parent-child is still supported. As long as you understand the tradeoffs, nothing changes about using it.

No, a parent-child relation across indices is not possible, and it never was.


(Yehosef) #3

I just posted a comment to the github issue trying to clarify this. Maybe this is a better way.

Currently questions and answers would be types in an index. If there are no types, wouldn't questions and answers be in different indices? Or if they are in the same index, aren't they types? I'm confused.

TIA!


(Mark Walkom) #4

Check out https://www.elastic.co/blog/index-type-parent-child-join-now-future-in-elasticsearch
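
As a rough illustration of the question above (how questions and answers can share one index without being separate types), here is a minimal sketch that assumes the `join` field type documented for Elasticsearch 6. The field name `my_join_field`, the helper functions, and the sample text are hypothetical, and nothing is sent to a cluster; the dicts only show the shape of the mapping and of the documents.

```python
import json

# Hypothetical field name; any name other than "type" avoids clashing
# with a user-defined "type" field in the document.
JOIN_FIELD = "my_join_field"


def join_mapping(parent, child):
    """Mapping properties declaring a parent -> child relation in one index."""
    return {
        "properties": {
            JOIN_FIELD: {"type": "join", "relations": {parent: child}}
        }
    }


def parent_doc(relation, body):
    """A parent document only names its own side of the relation."""
    return {**body, JOIN_FIELD: {"name": relation}}


def child_doc(relation, parent_id, body):
    """A child document names its relation and the id of its parent."""
    return {**body, JOIN_FIELD: {"name": relation, "parent": parent_id}}


mapping = join_mapping("question", "answer")
question = parent_doc("question", {"text": "example question"})
answer = child_doc("answer", "1", {"text": "example answer"})

for label, body in (("mapping", mapping), ("question", question), ("answer", answer)):
    print(label, json.dumps(body, sort_keys=True))
```

Both documents live in the same index, and the child has to be indexed with its routing value set to the parent id so that it lands on the same shard as its parent. That same-shard requirement is also why the relation cannot span indices.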


(Yehosef) #5

Thanks @warkolm - I had read it before, and after reading it a few more times I think I'm starting to get it. What the change will look like is very subtle: you basically have to read and understand the script at the bottom. The rest of the article is mostly just explaining why you're getting rid of types (a rationale I hear, though I'm not sure I totally agree with it; I can go into that another time).

A few issues/concerns/thoughts:

I hope the field will not really be called "type", as that's something I often use in my documents. If you want it to be something different, perhaps call it "_join" or something that is less likely to conflict with the document. E.g. I worked on something where we had a "type" of "post", but in reality there were "sub-types" like "project", "update", "blog_post", etc. Each of these was more or less the same thing, just used differently for filtering. In one index we had "_type", which represented very different items (posts, comments, photos, boards), and we used a "type" field to distinguish from there.

The implementation that's being suggested in #20257 seems a bit confusing to me, as I expressed there (and more so if I understand what you're suggesting). E.g. they don't talk there about having a "type" field, but about indexing a "question" or "answer" field in the doc (see https://github.com/elastic/elasticsearch/issues/20257#issuecomment-244024191 - there are several examples, some using a query string, which is even more confusing to me).

One of the problems I see in that thread is that indexing the parent needs to include a reference saying that there are children, which seems fundamentally wrong to me.

Just to note, this approach will not help in the sparsity or score problems mentioned in the blog post. It helps the problem of user expectations by making the "type==table" problem go away - but it just seems to make the use of the parent/child documents (which is a great feature when used appropriately) more confusing, IMO.

I don't really know what's involved in the change technically, but it doesn't seem small. For people properly using types and parent-child documents, I see no benefit, only drawbacks/complications.

Let's revisit the blog post. There are three reasons to get rid of types (which are currently a fundamental part of parent-child relationships):

  1. misconceptions, miseducation and bad practices when people think of types like tables
  2. sparsity - which will be less of an issue for Lucene 7, by the time this is required
  3. doc scoring - I'm curious about the extent to which this causes real problems, and whether the switch to BM25 changes it at all. If you have any articles/bugs, etc. talking about this problem, I'd be interested to hear.

So it seems like the biggest problem is the first one - people misusing it. Perhaps this can be dealt with by removing types from the normal discussions of types/mappings, etc. and putting the explanation of how to use them away in the discussion of parent/child. You could even change the name: we don't have "_types" anymore, now there are "_join_types", and if you want to use them you need to specify them at index and query time. Otherwise they would work the same way as types do now, but would be optional.

I'll add a comment to the github issue and see what they think there.
UPDATE: https://github.com/elastic/elasticsearch/issues/20257#issuecomment-303715022


(Mark Walkom) #6

Really appreciate the feedback there @yehosef


(Yehosef) #7

Hi - I've been having some conversation in the thread there in the github issue. I'll just summarize my feelings here since I think it's more appropriate for the forum than the github comments. Please feel free to pass it up.

I have a hard time supporting the changes that are being planned. Removing types is a very deep change, as you can see from all the discussion on github (you can see from https://github.com/elastic/elasticsearch/issues/15613#issuecomment-303727982 that the repercussions of this change have to be dealt with in v6, 7, 8 and 9). But that change has been underway for some time and a lot of work has been done, so there is a lot of inertia pushing the movement, for better or worse.

Unless there is some secret plan by which this is going to give elasticsearch some super-powers, the entire process seems to just try to avoid some annoyances (misconceptions about types, mapping restrictions on types, etc.). It seems to me that Elastic could accomplish much or all of the same goals by just creating a facade over the API that prevents creating multiple types, except in the one place where types actually do something useful (parent/child), without changing the underlying code.

My main concern is just that this will delay other more important development or releases and that it makes the parent/child more awkward to work with (ie, the join fields are a proper part of the document, which has complications which have to be dealt with - see the comments in the GH issue.) It's hard to have such a deep change and for it not to have deep repercussions. Hopefully it's all being tested very well so we won't really feel it.

I guess we'll see how it plays out and what the public reaction is when 6 comes out.

