How to model this relationship in ES?

Hi all

I am trying to model a one-to-many relationship in Elasticsearch:

  • One user has many articles.
  • The user and article data will always be returned together (e.g. a
    list of articles with corresponding user information for each article)
  • Possible to filter by user fields, article fields or a combination of
    the two.
  • The user data should only exist in one place to allow easy updating,
    not be denormalised into the article.
  • Elasticsearch will be used as the primary datastore.

What is the best way to achieve this in Elasticsearch? Possible options I
am considering are parent/child and nested documents, or perhaps there is a
better way?

Any advice much appreciated.

Thanks
Greg

--

Hi Greg,

I am new to Elasticsearch, but I have already mapped a "complex" domain model,
with attachments, into Elasticsearch.
I used Grails 2.1.1 + the Elasticsearch plugin + Elasticsearch for the task.

You didn't mention a platform or language, so I will describe it for Grails
in a few words:

  • One user has many articles. -> Very easy to model in Grails. Look at
    "gorm" in the Grails documentation.
  • The user and article data will always be returned together (e.g. a
    list of articles with corresponding user information for each article) ->
    Very easy: use the "root false/true" and "component:true/false" settings
    of the grails-elasticsearch-plugin.
  • Possible to filter by user fields, article fields or a combination of
    the two. -> Depends on the model. Could the fields be used as facets? For
    that I would look into the elasticSearchService in the
    grails-elasticsearch-plugin. Should be easy.
  • The user data should only exist in one place to allow easy updating,
    not be denormalised into the article -> Define both as "root true", but use
    "component:true". Limit "types". Define article as the "type to return".
  • Elasticsearch will be used as the primary datastore. -> Gateway, for
    long-term persistence? For that I would override the CRUD methods of the
    domain classes, so that they are only indexed into / deleted from the index.

I have only described one way, which could be realised in about 2 hours. Is
it the best way? I don't think so, but it works well enough.

Greetings

Christian Th.

exensio: http://www.exensio.de

--

Hi Christian

Thanks for your reply.

I should have explained, I am looking to achieve this using the
Elasticsearch REST API.

I have checked how Grails is doing this and it appears to perform two
separate queries: one to retrieve the list of articles, and a second to
query the matching users. I appreciate that Elasticsearch doesn't support
joining queries, but is there a way that I could structure the data so that
articles and users could be retrieved at the same time?
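
For reference, the two-query pattern described above can be sketched in plain Python. This is only an illustration: the in-memory lists stand in for the hits of two separate search requests, and the field names (user_id, title, name) and the helper join_articles_with_users are hypothetical, not part of any Elasticsearch or Grails API.

```python
# Sketch of an application-side join. The data below is made up and stands in
# for the hits returned by two separate search requests.

def join_articles_with_users(articles, users):
    """Attach the matching user document to each article."""
    users_by_id = {user["id"]: user for user in users}
    joined = []
    for article in articles:
        # .get() leaves "user" as None if the second query did not return it
        joined.append(dict(article, user=users_by_id.get(article["user_id"])))
    return joined

# Result of query 1: the articles
articles = [
    {"id": 10, "title": "Modelling relations", "user_id": 1},
    {"id": 11, "title": "Nested documents", "user_id": 2},
    {"id": 12, "title": "Parent/child", "user_id": 1},
]
# Result of query 2: the users whose ids appear in the articles above
users = [{"id": 1, "name": "Greg"}, {"id": 2, "name": "Christian"}]

for row in join_articles_with_users(articles, users):
    print(row["title"], "by", row["user"]["name"])
```

The user data lives in one place, at the cost of a second round trip and a merge step in the client.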

Thanks
Greg

In reply to Christian Th.'s message of Friday, 28 September 2012 11:48:32 UTC+1.

--

I have a similar schema and, to the best of my knowledge, an array (or a list in my Python/pyes case) is the way to model the 1-to-many relationship.

mappings: {
  user: {
    properties: {
      comments: {
        properties: {
          id: {
            include_in_all: false
            store: yes
            type: long
          }
          body: {
            include_in_all: true
            omit_norms: true
            store: yes
            term_vector: with_positions_offsets
            type: string
          }
        }
      }
      etc

In my client code, I populate the 'comments' list for each user, where each comment is a Python dict (keys: id, body).

Example query: curl 'localhost:9200/users/user/_search?q=comments.body:facebook'

--

Interesting, thanks. I hadn't thought of using an array.

Is it possible to add a new comment to the document without the need to
reindex the existing user and comments?

In reply to es_learner's message of Friday, 28 September 2012 17:52:49 UTC+1.
--
View this message in context:
http://elasticsearch-users.115913.n3.nabble.com/How-to-model-this-relationship-in-ES-tp4023306p4023332.html
Sent from the Elasticsearch Users mailing list archive at Nabble.com.

--

Not that I know of, and I'm using 0.19.2.

In fact, I have to do this:

  doc = connection.get(index_name, doc_type, id)    # retrieve the old content
  # ... update the comments list in doc ...
  connection.delete(index_name, doc_type, id)       # delete before reindexing
  connection.index(doc, index_name, doc_type, id)   # actually I use bulk writes, not single-doc writes

The above get/delete/index is done client-side, which is very inefficient IMO.

I'm still waiting for an ES server-side update feature (similar to Mongo's) where we send only the deltas. I don't believe this feature is there yet. I would love to be corrected here :)

EDIT: I read up more on the 0.19 features and found the Update API: http://www.elasticsearch.org/guide/reference/api/update.html
I have posted a separate question about the pyes bindings. So I take back what I said above about server-side update. But the entire doc will still be reindexed (which is expected).
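
As a rough sketch of what such a partial update could look like over the REST API, the snippet below only builds the URL and JSON body for an _update request that appends one comment. It does not send anything. The index and type names (users/user), the document id, the host and the helper name are assumptions for illustration, and the script syntax follows the 0.19-era Update API page linked above, so check it against the version you run.

```python
import json

def build_append_comment_request(index, doc_type, doc_id, comment,
                                 host="http://localhost:9200"):
    """Return (url, body) for an Update API call that appends one comment."""
    url = "%s/%s/%s/%s/_update" % (host, index, doc_type, doc_id)
    body = {
        # The script runs on the server against the stored _source
        "script": "ctx._source.comments += comment",
        "params": {"comment": comment},
    }
    return url, json.dumps(body, sort_keys=True)

url, body = build_append_comment_request(
    "users", "user", 1, {"id": 42, "body": "posted from facebook"})
print("POST", url)
print(body)
```

Only the new comment travels over the wire, but as noted above the server still reindexes the whole document.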

--

Be careful when using a repeated array of objects:
"One of the problems when indexing inner objects that occur several
times in a doc is that “cross object” search match will occur, for example:"
see

Taking the user-with-comments example, the "several times" in the warning
above would be the array of comments.
If you want to find any user who has any comment with some terms in the
body, no problem. But if you want to find any user who has a comment of
type = 2 whose body contains some terms, you might get false positives
when one comment is of type 2 and another comment contains the terms,
because the two conditions need not match the same comment.
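
The false positive can be reproduced without a cluster. The sketch below is a simplified model, not Elasticsearch code: it flattens an array of comment objects into one bag of values per field, roughly the way plain inner objects end up indexed, and compares that with checking each comment on its own, as a nested mapping would. The type field and the sample data are made up for illustration.

```python
def flatten(user):
    """Collapse the comments array into one bag of values per field."""
    flat = {"comments.type": set(), "comments.body": set()}
    for comment in user["comments"]:
        flat["comments.type"].add(comment["type"])
        flat["comments.body"].update(comment["body"].lower().split())
    return flat

def matches_flat(user, wanted_type, term):
    """Inner-object style: each condition is checked against the whole array."""
    flat = flatten(user)
    return wanted_type in flat["comments.type"] and term in flat["comments.body"]

def matches_nested(user, wanted_type, term):
    """Nested style: both conditions must hold within the same comment."""
    return any(
        c["type"] == wanted_type and term in c["body"].lower().split()
        for c in user["comments"]
    )

user = {
    "name": "greg",
    "comments": [
        {"id": 1, "type": 2, "body": "nothing relevant here"},
        {"id": 2, "type": 1, "body": "seen on facebook"},
    ],
}

print(matches_flat(user, 2, "facebook"))    # True: a false positive
print(matches_nested(user, 2, "facebook"))  # False: no single comment matches both
```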

If you want to filter on comments.type and search on comments.body,
you'd want to use an inner object of type nested, or get more complex
and use child docs.
If the comments come along after the user and have a different
life-cycle, child comments might be just the thing, but you can't obtain
child values when you obtain user values, so you'd have to go back for
another search. Searching documents is not like doing SQL in a database.
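
A sketch of the nested variant, written as Python dicts that could be serialised to JSON for the REST API. It reuses the comments fields from the mapping earlier in the thread. The type field on a comment and the search term are assumptions, and the exact query syntax should be checked against the Elasticsearch version in use.

```python
import json

# Mapping: declaring comments as "nested" makes each comment a separate
# internal document, so conditions can be tied to a single comment.
mapping = {
    "user": {
        "properties": {
            "comments": {
                "type": "nested",
                "properties": {
                    "id": {"type": "long"},
                    "type": {"type": "long"},  # assumed field, not in the original mapping
                    "body": {"type": "string"},
                },
            }
        }
    }
}

# Query: users having one comment that is of type 2 AND mentions "facebook".
query = {
    "query": {
        "nested": {
            "path": "comments",
            "query": {
                "bool": {
                    "must": [
                        {"term": {"comments.type": 2}},
                        {"term": {"comments.body": "facebook"}},
                    ]
                }
            },
        }
    }
}

print(json.dumps(mapping, indent=2))
print(json.dumps(query, indent=2))
```

The comments still live inside the user document, so adding a comment reindexes that user. Child docs avoid this, at the price of the extra search mentioned above.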

Good luck,
-Paul

--