Thanks @warkolm - I had read it before, and after reading it a few more times I think I'm starting to get it. What the change will look like is very subtle: you basically have to read and understand the script at the bottom. The rest of the article mostly just explains why you're getting rid of types (which, while I hear the reasoning, I'm not sure I totally agree with - I can go into that another time).
A few issues/concerns/thoughts:
I hope the field will not really be called "type", as that's a name I often use in my documents. If you want it to be something different, perhaps call it "_join" or something else that is less likely to conflict with the document. E.g. I worked on something where we had a "type" of "post", but in reality there were "sub-types" like "project", "update", "blog_post" etc. Each of these was more or less the same thing, just used differently for filtering. In one index we had "_type", which represented very different items (posts, comments, photos, boards), and we used a "type" field to distinguish from there.
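To make the naming concern concrete, here is a rough sketch in plain Python (no Elasticsearch client involved). The documents, titles and helper names are made up for illustration; the only assumption taken from the discussion is that the new join field might be called "type".

```python
# Hypothetical documents: "_type" is the mapping type, "type" is my own
# application-level sub-type field used for filtering.
docs = [
    {"_type": "post", "type": "project", "title": "Launch plan"},
    {"_type": "post", "type": "blog_post", "title": "Release notes"},
    {"_type": "comment", "type": "reply", "body": "Nice work"},
]


def filter_docs(docs, meta_type, sub_type):
    """Return docs matching both the mapping type and my own sub-type field."""
    return [d for d in docs if d["_type"] == meta_type and d["type"] == sub_type]


def find_collisions(doc, reserved=("type",)):
    """List the fields in a doc that would clash with a reserved field name.

    The reserved name "type" is an assumption, not a confirmed design.
    """
    return [field for field in reserved if field in doc]


projects = filter_docs(docs, "post", "project")
clashes = find_collisions(docs[0])
```

With a reserved name like "_join" instead, `find_collisions(docs[0], reserved=("_join",))` would come back empty, which is the point of picking a name that documents are unlikely to use.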
The implementation that's being suggested in #20257 seems a bit confusing to me, as I expressed there (and more so, if I understand what you're suggesting). E.g. they don't talk there about having a "type" field, but about indexing a "question" or "answer" field in the doc (see https://github.com/elastic/elasticsearch/issues/20257#issuecomment-244024191 - there are several examples, some using a query string, which is even more confusing to me).
One of the problems I see in that thread is that indexing the parent needs to include a reference to the fact that there are children - which seems fundamentally wrong to me.
Just to note, this approach will not help with the sparsity or scoring problems mentioned in the blog post. It helps with the problem of user expectations by making the "type==table" problem go away - but it just seems to make the use of parent/child documents (which is a great feature when used appropriately) more confusing, IMO.
I don't really know what's involved in the change technically, but it doesn't seem small. For people properly using types and parent-child documents, I see no benefit, only drawbacks/complications.
Let's revisit the blog post. There are three reasons to get rid of types (which are currently a fundamental part of parent-child relationships):
- misconceptions, miseducation and bad practices when people think of types like tables
- sparsity - which will be less of an issue for Lucene 7, by the time this is required
- doc scoring - I'm curious about the extent to which this causes real problems and whether the switch to BM25 changes it at all. If you have any articles/bugs, etc. talking about this problem, I'd be interested to see them.
So it seems like the biggest problem is the first one - people misusing types. Perhaps this can be dealt with by removing them from the normal discussions of types/mappings, etc. and tucking the explanation of how to use types away in the discussion of parent/child. You could even change the name - we don't have "_types" anymore. Now there will be "_join_types", and if you want to use them you need to specify them at index and query time. Otherwise they would work the same way types do now, but would be optional.
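Here's a rough sketch of what I mean by optional "_join_types", in plain Python. None of this is a real Elasticsearch setting or API: the "_join_types" and "_join_type" keys, the "_parent" key as used here, and the `validate_doc` helper are all hypothetical, just to show the opt-in behaviour.

```python
def validate_doc(index_settings, doc_meta):
    """Check a document's metadata against hypothetical opt-in join types.

    index_settings: dict that may contain "_join_types", mapping each join
        type name to {"parent": <parent join type or None>}.
    doc_meta: dict that may contain "_join_type" and "_parent".
    """
    join_type = doc_meta.get("_join_type")
    if join_type is None:
        # Plain documents never need to know about join types.
        return True

    declared = index_settings.get("_join_types", {})
    if join_type not in declared:
        raise ValueError(f"join type {join_type!r} is not declared on the index")

    # Only the child points at its parent; the parent carries no
    # reference to its children.
    parent_type = declared[join_type].get("parent")
    if parent_type is not None and "_parent" not in doc_meta:
        raise ValueError(f"{join_type!r} documents must specify a parent id")
    return True


# Hypothetical index that opts in to a question/answer relationship.
settings = {
    "_join_types": {
        "question": {"parent": None},
        "answer": {"parent": "question"},
    }
}

validate_doc({}, {})                                         # no join types at all
validate_doc(settings, {"_join_type": "question"})           # a parent
validate_doc(settings, {"_join_type": "answer", "_parent": "q1"})  # a child
```

An index that never declares "_join_types" behaves as if the feature did not exist, so people who don't need parent/child never run into it.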
I'll add a comment to the GitHub issue and see what they think.