Thanks Jörg, I do understand your point. As for the clarity of my question,
let me explain with an example.
Let us say we have an index called "document" and two types "contrib" and
"access".
Here are the inserts into these types:
PUT document/contrib/1
{
  "url": "http://en.wikipedia.org/wiki/Semantic_web",
  "contributors": [
    {
      "user": "Cutting"
    },
    {
      "user": "Lee"
    }
  ]
}
PUT document/contrib/2
{
  "url": "http://en.wikipedia.org/wiki/Information_retrieval",
  "contributors": [
    {
      "user": "Cutting"
    },
    {
      "user": "Raghavan"
    }
  ]
}
PUT document/access/1
{
  "url": "http://en.wikipedia.org/wiki/Semantic_web",
  "accessors": [
    {
      "user": "Mahesh"
    },
    {
      "user": "Suresh"
    }
  ]
}
PUT document/access/2
{
  "url": "http://en.wikipedia.org/wiki/Information_retrieval",
  "accessors": [
    {
      "user": "Banon"
    },
    {
      "user": "Raghavan"
    }
  ]
}
These two types are stored separately because:
Contributions are rare, so the contrib type is updated infrequently.
Accesses happen constantly, and those updates need to be searchable in real time.
Now, if we search for all documents where "Cutting" is an actor (either
contributor or accessor), the results are:
"hits": [
  {
    "_index": "document",
    "_type": "contrib",
    "_id": "2",
    "_score": 0.375,
    "_source": {
      "url": "http://en.wikipedia.org/wiki/Information_retrieval",
      "contributors": [
        {
          "user": "Cutting"
        },
        {
          "user": "Raghavan"
        }
      ]
    }
  },
  {
    "_index": "document",
    "_type": "contrib",
    "_id": "1",
    "_score": 0.375,
    "_source": {
      "url": "http://en.wikipedia.org/wiki/Semantic_web",
      "contributors": [
        {
          "user": "Cutting"
        },
        {
          "user": "Lee"
        }
      ]
    }
  }
]
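The search request itself is not shown above. A minimal sketch of a body that could produce hits like these follows, assuming the default mapping for the contributors.user and accessors.user fields and that the body is sent to document/_search with no type in the path, so both contrib and access are searched. The helper name actor_query is hypothetical.

```python
import json

def actor_query(user):
    """Hypothetical search body: match `user` as a contributor OR an accessor.

    In a bool query that has only should clauses, a document must match at
    least one of them, so hits can come from either type.
    """
    return {
        "query": {
            "bool": {
                "should": [
                    {"match": {"contributors.user": user}},
                    {"match": {"accessors.user": user}},
                ]
            }
        }
    }

# Body to send to document/_search (index only, so all types are searched).
print(json.dumps(actor_query("Cutting"), indent=2))
```

Each matching document still comes back as its own hit, one per type, which is exactly why the same URL can appear twice in the results.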
Running the same search for "Raghavan" gives:
"hits": [
  {
    "_index": "document",
    "_type": "contrib",
    "_id": "2",
    "_score": 0.22295055,
    "_source": {
      "url": "http://en.wikipedia.org/wiki/Information_retrieval",
      "contributors": [
        {
          "user": "Cutting"
        },
        {
          "user": "Raghavan"
        }
      ]
    }
  },
  {
    "_index": "document",
    "_type": "access",
    "_id": "2",
    "_score": 0.22295055,
    "_source": {
      "url": "http://en.wikipedia.org/wiki/Information_retrieval",
      "accessors": [
        {
          "user": "Banon"
        },
        {
          "user": "Raghavan"
        }
      ]
    }
  }
]
Ultimately, I would like to show the results in terms of documents,
i.e., URLs. Note that Cutting was in fact acting on two different
documents, whereas Raghavan plays two roles with respect to the same
document. I would like to merge those two hits into one result, and have
the ranking reflect the fact that he played both roles. So if Raghavan had
only contributed to, or only accessed, some other document, that document
should rank lower than this one, because here he appears twice.
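A minimal sketch of the grouping described above, done in the application after the search returns. It assumes the combined score is simply the sum of the per-hit scores; that rule is an illustrative choice, not something Elasticsearch computes. The function name group_hits_by_url and the third sample hit (another document Raghavan only accessed) are hypothetical.

```python
from collections import defaultdict

def group_hits_by_url(hits):
    """Collapse hits that share a URL into one ranked result per document.

    Assumption: combined score = sum of per-hit scores, so a user matched in
    both roles on one URL outranks a URL where they matched in only one role.
    """
    groups = defaultdict(lambda: {"score": 0.0, "roles": set()})
    for hit in hits:
        group = groups[hit["_source"]["url"]]
        group["score"] += hit["_score"]
        group["roles"].add(hit["_type"])  # "contrib" and/or "access"
    # Highest combined score first; ties broken by URL for a stable order.
    ranked = sorted(groups.items(), key=lambda item: (-item[1]["score"], item[0]))
    return [
        {"url": url, "score": g["score"], "roles": sorted(g["roles"])}
        for url, g in ranked
    ]

# The two Raghavan hits from above, plus one HYPOTHETICAL hit on another
# document where he played only a single role.
raghavan_hits = [
    {"_type": "contrib", "_score": 0.22295055,
     "_source": {"url": "http://en.wikipedia.org/wiki/Information_retrieval"}},
    {"_type": "access", "_score": 0.22295055,
     "_source": {"url": "http://en.wikipedia.org/wiki/Information_retrieval"}},
    {"_type": "access", "_score": 0.22295055,
     "_source": {"url": "http://example.org/hypothetical-other-document"}},
]

for result in group_hits_by_url(raghavan_hits):
    print(result)
```

Summing is only one option. Sorting first by the number of distinct roles and then by score would enforce "appearing twice ranks higher" even when the individual scores differ a lot.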
Does this make the scenario a little clearer? I hope I have also explained
why it is difficult to merge these two types into one, given the huge
difference in how often they are updated. Also, I am not saying parent/child
would be the best solution in this case; it was the best I could think
of, although I agree it is not ideal.
Regards,
VP.
On Tuesday, October 15, 2013 10:22:22 PM UTC+5:30, Jörg Prante wrote:
I do not fully understand your challenge or the role you want ES to play.
If you operate with IDs, you can iterate through the multiget response,
visit contributors, accessors, etc., and select a unique list of members in
your app simply by looking at the doc ID.
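A minimal sketch of that suggestion, assuming the multiget response shape {"docs": [{"_id": ..., "_source": {...}}, ...]} and that the same _id is used for the same URL in both types (true in the example, where contrib/2 and access/2 are both the Information retrieval page). The function name members_by_doc_id is hypothetical.

```python
from collections import defaultdict

def members_by_doc_id(mget_response):
    """Build a unique, sorted list of users per document ID."""
    members = defaultdict(set)
    for doc in mget_response["docs"]:
        # Depending on the Elasticsearch version, a returned document is
        # flagged with "exists" (older releases) or "found" (1.0 and later).
        if not doc.get("found", doc.get("exists", False)):
            continue
        source = doc["_source"]
        for field in ("contributors", "accessors"):
            for entry in source.get(field, []):
                members[doc["_id"]].add(entry["user"])  # set drops repeats
    return {doc_id: sorted(users) for doc_id, users in members.items()}

# Sample response shaped like a multiget of contrib/2 and access/2.
sample = {
    "docs": [
        {"_index": "document", "_type": "contrib", "_id": "2", "found": True,
         "_source": {"url": "http://en.wikipedia.org/wiki/Information_retrieval",
                     "contributors": [{"user": "Cutting"}, {"user": "Raghavan"}]}},
        {"_index": "document", "_type": "access", "_id": "2", "found": True,
         "_source": {"url": "http://en.wikipedia.org/wiki/Information_retrieval",
                     "accessors": [{"user": "Banon"}, {"user": "Raghavan"}]}},
    ]
}

print(members_by_doc_id(sample))
```

Raghavan is listed once for ID "2" even though he appears in both types, which is the deduplication by doc ID described above.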
It is not very elegant to assign more than one ID to the same entity,
because then an ID is no longer unique. You would then have to address the
problem of entity identification or entity matching to obtain unique IDs,
which is outside the scope of ES; it belongs in the domain of the app.
Parent/child is for special queries on parent/child relationships
(has_parent/has_child) and for updating parent and child docs
independently, where child docs link to a unique parent ID. So I'm not
sure how parent/child would solve your challenge.
For more inspiration about relationships, I recommend this overview by
Zachary Tong:
Elasticsearch Platform — Find real-time answers at scale | Elastic
Jörg