Understanding and using relationships without mapping types (6.0)

RE: Removal of mapping types https://www.elastic.co/guide/en/elasticsearch/reference/current/removal-of-types.html

It looks like the proposed solution is to use a "common key" (my term). Such a common key can tie a user to a tweet, as in the example, but let's get serious and talk about more complex and, I suspect, more common and realistic use cases, shall we?

Let's say I have a FACTORY, and I make PRODUCT. I have a supply chain of numerous different VENDORS, who sell me various PARTS, INGREDIENTS, PACKAGES, and SYSTEMS that I use at various STAGES in my manufacturing PROCESSES. That doesn't even begin to get into distributors, shippers, customers, retail outlets, regulators, product managers, and so on. How do I track those relationships?

Parent-child could tie a PRODUCT to all the relevant PARTS, but not further to the relevant VENDORS. That's a 3 generation parent child, which is no longer recommended. So then what, a "common key" for all 3 fields? Make up a custom field, which is essentially the uuid of the PRODUCT, and then make that the common field for every other document that is related to it? Because if I understand this correctly, all those other documents would be in a different index and therefore, by definition, of a different type?
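
To make the "common key" idea concrete, here is a minimal Python sketch of how I understand it. All names (`product_id`, `kind`, `vendor`, the sample records) are hypothetical, and the in-memory filter only stands in for what a term query on the shared field would do in Elasticsearch, assuming `product_id` is mapped as a keyword.

```python
# Hypothetical records: every field name and value here is an assumption
# made up for illustration, not something taken from the Elastic docs.
product = {"product_id": "prod-001", "kind": "product", "name": "widget"}
parts = [
    {"product_id": "prod-001", "kind": "part", "name": "bolt", "vendor": "Acme"},
    {"product_id": "prod-001", "kind": "part", "name": "casing", "vendor": "Globex"},
    {"product_id": "prod-002", "kind": "part", "name": "label", "vendor": "Acme"},
]


def related_to(product_id, docs):
    """Return every document that carries the given common key."""
    return [d for d in docs if d["product_id"] == product_id]


def term_filter(product_id):
    """Request body for a term query on the common key field."""
    return {"query": {"term": {"product_id": product_id}}}


# The vendor name is denormalized onto each part, so one lookup by the
# common key reaches both the PARTS and their VENDORS.
related = related_to(product["product_id"], parts)
vendors = sorted({d["vendor"] for d in related})
```

In this sketch the VENDOR name is copied onto each PART document, so the PRODUCT to PART to VENDOR chain becomes a single filter rather than a three-generation parent-child.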

It seems nesting does not solve this problem and can't, because it follows that you can't nest documents of a different type, right?

This sounds like a nightmare for those that need relationships. How do you facet without relations? How do you do a recommendation engine? How do you do any analytics if you can't require relations?

I looked at https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-significantterms-aggregation.html, but that seems to assume even more denormalization, so that, for example, a full text search would pick up common terms (like the name of the same VENDOR) in a query result.

Hopefully, I am too new at this to know there is a simple solution used by those with similar data. I hope to learn that in your response.

How do I track those relationships?

Do you need to?

I mean that the main questions to ask yourself are about the What; only then think about the How.

  • So what are my users going to search for?
  • What kind of information do they need to search for those entities?

Those are IMHO the main questions to ask before thinking about the implementation. I often say at conferences that forgetting everything you learned at school about relational databases is one of the hardest things to do when you want to work with document-oriented systems. It's really not that easy to change your mindset, especially when you have thought that way for years. At least, that was my experience when I introduced Elasticsearch and CouchDB at $job-1.

I tried to explain my thoughts there:

I hope this can help.


Just to keep from being confused by terminology, I will use the word
'Paper' instead of 'Document'.

Sure, they can easily find any paper. That's the easy part. But in my
schema, there are People associated with every Paper: 1) The people that
wrote it, and 2) the people affected by it. In my current schema, those are
two different models. Since a Person is not a Paper, these are different
types. And my users are very much going to want to search for the People
connected to any Paper that affects them. But it gets more complex than
that. Every Paper has a Topic. In fact, it might have more than one Topic.
Paper 19 might refer to Paper 12. But it might not say anything about Paper
8, even though both 8 and 19 are on the same Topic! So the best, maybe
only, way to find 8 from 19 is by Topic, unless they both come up in the
same term query - which will depend entirely on the effectiveness of the
synonym associations. This simple example has already generated 4 types,
which can't be in the same Document, because they are different types, and
can't be 3 generation because that's no longer recommended and not really
an accurate model of their relationship anyway.

This might be a deal breaker for me, and that is terribly disappointing. I
was really excited about the speed and potential of Elasticsearch, but now
I don't see how I can make it work - especially since my most central - if
not most important - table, is a recursive many to many!

“None of you has faith until he loves for his brother or his neighbor what
he loves for himself.”

Let me try to sum up what you said in terms of what a user can search for. Let's start simple and go step by step to make sure I understand the use case.

So you have a paper, which is your top-level object, and you first want to be able to search for a paper that has been written by someone.
Let's say we store papers as something like:

PUT papers/doc/1
{
  "title": "foo",
  "topic": "a very good topic",
  "authors": [ {
    "name": "me"
  }, {
    "name": "you"
  }]
}

Is that the right first step, one that will answer some (most?) of your users' queries?
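
As a rough sketch of what searching such a document could look like, the Python below builds two search request bodies for the `papers` index: one matching on `authors.name`, one finding every paper that shares a topic. It assumes the default dynamic mapping, where `topic` also gets a `topic.keyword` sub-field; the helper names `papers_by_author` and `papers_on_topic` are made up for illustration.

```python
import json

# The example paper from above, as a plain Python dict.
paper = {
    "title": "foo",
    "topic": "a very good topic",
    "authors": [{"name": "me"}, {"name": "you"}],
}


def papers_by_author(name):
    """Body for GET papers/_search: full-text match on the author name."""
    return {"query": {"match": {"authors.name": name}}}


def papers_on_topic(topic):
    """Body for GET papers/_search: exact match on the topic.

    Assumes the default dynamic mapping, which adds a `keyword`
    sub-field to string fields.
    """
    return {"query": {"term": {"topic.keyword": topic}}}


# Print both requests in Console form so they can be pasted into Kibana.
for body in (papers_by_author("me"), papers_on_topic(paper["topic"])):
    print("GET papers/_search")
    print(json.dumps(body, indent=2))
```

With the topic stored on each paper, two papers on the same topic can be found from one another even if neither one references the other.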

Then I think you may be looking for something close to recommendations. If so, I believe the Graph feature we have in X-Pack might help you connect the documents. Maybe @Mark_Harwood can say more.

Ok, thanks. I will take a look at the Graph feature and come back to you and/or Mark with more questions.

@Mark_Harwood I looked at the video on graph that you did just before it came out. My disappointment has been salved and my excitement has returned. Now I have a specific use case to ask about. From what I gather, the graph depends first and foremost on the significant terms already being indexed, like in a field, or by parsing them out of user queries. My question, then, is what about significant term discovery? For example, going back to my Paper example, let's say I have a term, 'green shoe laces', in the text of some of my Papers. It's an old term that has been around a long time, but it is not high frequency and not separated out as any kind of type or category. Suddenly, a new use for green shoe laces is discovered, and it explodes in frequency in my Papers. So, the question:

How do I discover what are significant terms in the first place? Particularly if it is in raw text in a large set of text documents (Papers)? I might know what most of them are in general, but there are so many of them, maybe thousands, that I don't want to create categories for each of them by hand, or I don't have the manpower to do so. Is there a way to discover these terms, so that I can then make them significant, and follow the relationships that would thereafter be generated, among users, or types of users, or time, (timelion?) or frequency, relation to other significant terms, well known or newly discovered in the data?

I know that you talked about the Enron use case. Have there been improvements or new workarounds in finding these things in distributed systems? Thanks!
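
One possible direction for the discovery question above, offered as a sketch and not as an answer confirmed anywhere in this thread: compare how often a term appears in recent Papers against how often it appears in all Papers. The Python below does this on a toy corpus with a simplified score (absolute change times relative change) and also builds a request body using the `significant_text` aggregation, which exists as of 6.0. The field names `text` and `published`, the 30-day window, and the helper names are all assumptions.

```python
from collections import Counter


def term_counts(docs):
    """Count, for each term, how many documents contain it."""
    counts = Counter()
    for text in docs:
        counts.update(set(text.lower().split()))
    return counts


def rank_terms(recent_docs, all_docs, min_recent=2):
    """Rank terms that are unusually frequent in the recent documents.

    `recent_docs` must be a subset of `all_docs`. The score is a
    simplified stand-in, not necessarily what Elasticsearch computes.
    """
    fg, bg = term_counts(recent_docs), term_counts(all_docs)
    scores = {}
    for term, n in fg.items():
        if n < min_recent:
            continue  # skip terms too rare to judge
        fg_pct = n / len(recent_docs)
        bg_pct = bg[term] / len(all_docs)
        scores[term] = (fg_pct - bg_pct) * (fg_pct / bg_pct)
    return sorted(scores, key=scores.get, reverse=True)


def discovery_request(text_field="text", date_field="published"):
    """Search body: recent Papers as foreground, sampled for speed."""
    return {
        "query": {"range": {date_field: {"gte": "now-30d/d"}}},
        "aggregations": {
            "sample": {
                "sampler": {"shard_size": 200},
                "aggregations": {
                    "keywords": {"significant_text": {"field": text_field}}
                },
            }
        },
    }


# Toy corpus: "laces" is rare overall but present in every recent Paper.
older = ["the budget report", "the annual report",
         "the meeting notes", "the green budget"]
recent = ["the green laces trend", "the green laces report"]
ranking = rank_terms(recent, older + recent)
```

The point of the design is that no category has to be created by hand: a term surfaces because its share of recent Papers is out of line with its share of the whole corpus, while a word that appears everywhere scores zero.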
