Nested objects with arrays, best practice


(samuel merlet) #1

Hi,

I have a fairly complex model to index as a document.
Here is the mapping:

{
  "id" : {"type" : "integer"},
  "public_name" : {"type" : "string", "store" : "no", "index" : "no"},
  "description" : {"type" : "string", "store" : "no", "index" : "analyzed"},
  "website_url" : {"type" : "string", "store" : "no", "index" : "no"},
  "overall_rating" : {"type" : "float"},
  "plan_id" : {"type" : "integer"},
  "city" : {"type" : "string", "store" : "yes", "index" : "analyzed"},
  "zipcode" : {"type" : "string", "store" : "yes", "index" : "analyzed"},
  "state" : {"type" : "string", "store" : "yes", "index" : "analyzed"},
  "country" : {"type" : "string", "store" : "yes", "index" : "analyzed"},
  "location" : {"type" : "geo_point", "lat_lon" : true},
  "created" : {"type" : "date", "format" : "YYYY-MM-dd HH:mm:ss"},

  "profile_type_3" : {
    "type" : "object",
    "properties" : {
      "profile_roles" : {
        "properties" : {
          "role_id" : {"type" : "integer", "store" : "yes", "index" : "analyzed"},
          "industry_id" : {"type" : "integer"},
          "description" : {"type" : "string", "store" : "yes", "index" : "analyzed"},
          "skills" : {
            "properties" : {
              "skill_id" : {"type" : "integer", "store" : "yes"},
              "skill_name" : {"type" : "string", "store" : "yes", "index" : "analyzed"},
              "experience_level_id" : {"type" : "integer", "store" : "yes"}
            }
          },
          "terms" : {
            "properties" : {
              "term_id" : {"type" : "integer", "store" : "yes"},
              "term_name" : {"type" : "string", "store" : "yes"}
            }
          }
        }
      }
    }
  },

  "profile_type_2" : {
    "type" : "object",
    "properties" : {
      "profile_roles" : {
        "properties" : {
          "role_id" : {"type" : "integer", "store" : "yes"},
          "role_name" : {"type" : "string", "store" : "yes"},
          "industry_id" : {"type" : "integer"},
          "description" : {"type" : "string", "store" : "yes", "index" : "analyzed"},
          "skills" : {
            "properties" : {
              "skill_id" : {"type" : "integer", "store" : "yes", "index" : "analyzed"},
              "skill_name" : {"type" : "string", "store" : "yes", "index" : "analyzed"},
              "experience_level_id" : {"type" : "integer", "store" : "yes", "index" : "analyzed"},
              "experience_level_name" : {"type" : "string"}
            }
          }
        }
      }
    }
  }
}

Each profile_type_* contains many profile_roles (as an array), and each
profile_role contains many skills (as an array) and terms (as an array).

So the data could look like this:

.....
"profile_type_3" : {
  "profile_roles" : [
    {
      "role_id" : 1,
      "industry_id" : 1,
      "description" : "some text",
      "skills" : [
        { "skill_id" : 10, "skill_name" : "PHP", "experience_level_id" : 1 },
        { "skill_id" : 12, "skill_name" : "PYTHON", "experience_level_id" : 3 }
      ],
      "terms" : [
        { "term_id" : 1, "term_name" : "some text" },
        { "term_id" : 2, "term_name" : "some text" }
      ]
    },
    {
      "role_id" : 2,
      "industry_id" : 13,
      "description" : "some text",
      "skills" : [
        { "skill_id" : 10, "skill_name" : "PHP", "experience_level_id" : 1 },
        { "skill_id" : 12, "skill_name" : "PYTHON", "experience_level_id" : 3 }
      ],
      "terms" : [
        { "term_id" : 1, "term_name" : "some text" },
        { "term_id" : 2, "term_name" : "some text" }
      ]
    }
  ]
}

So a profile_type_* can have many profile_roles, and a profile_role can
have many skills and terms.

Now, if I need to get the documents that have a profile_role with role_id = 1
and skill_id = 12 (within role_id = 1) and experience level >= 2,
how can I achieve this? I read about nested objects, but here I have nested
objects inside nested objects: profile -> profile_roles -> skills.

For skills, I thought about creating some kind of tuple field, for example
skills = [ "10:1", "12:3" ], where the first part is the skill id and the
second the experience level id. But how would I do it for profile_roles as
well? By creating some kind of hash like this: roles = [ "1:10:1", "1:12:13" ]?
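
As a rough illustration of the tuple idea, the sketch below builds "role_id:skill_id:experience_level_id" tokens in Python. The helper names and the `max_level=5` upper bound are assumptions made for the example, not part of the model above. Since such tokens only support exact matching, a condition like "experience level >= 2" has to be expanded into every acceptable token.

```python
def role_skill_tokens(profile_roles):
    """Build one "role_id:skill_id:experience_level_id" token per skill of each role."""
    tokens = []
    for role in profile_roles:
        for skill in role.get("skills", []):
            tokens.append(
                f'{role["role_id"]}:{skill["skill_id"]}:{skill["experience_level_id"]}'
            )
    return tokens


def tokens_for_min_level(role_id, skill_id, min_level, max_level):
    """Enumerate the exact tokens that satisfy "experience level >= min_level"."""
    return [f"{role_id}:{skill_id}:{level}" for level in range(min_level, max_level + 1)]


# Hypothetical sample data, shaped like one profile_role from the post.
roles = [
    {
        "role_id": 1,
        "skills": [
            {"skill_id": 10, "experience_level_id": 1},
            {"skill_id": 12, "experience_level_id": 3},
        ],
    },
]

tokens = role_skill_tokens(roles)
# max_level=5 is an assumed upper bound on experience levels.
wanted = tokens_for_min_level(role_id=1, skill_id=12, min_level=2, max_level=5)
matches = bool(set(tokens) & set(wanted))
```

The range expansion is the main cost of this encoding: every range condition becomes a list of exact terms.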

If someone could help me understand the best way to achieve this, that
would be great. Thanks.

--


(es_learner) #2

I have a similar application.

I would make profile_roles an array of profile_role objects, and in each profile_role object, skills and terms are arrays of skill and term objects respectively.

Using Python:
profile_roles = [] and each profile_role is a dict that looks like {"role_id": 1, "industry_id": 1, "description": "some text", "skills": [], "terms": []}

skill = {"skill_id": 10, "skill_name": "PHP", "experience_level_id": 1}
term = {"term_id": 1, "term_name": "some text"}

Fields can then be addressed by dotted path, like so: profile_roles.role_id, profile_roles.skills.experience_level_id
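
Put together, the structure described above could be built like this. It is only a sketch: the `make_*` helper names are made up for the example, and the values come from the sample data in the first post.

```python
import json


def make_skill(skill_id, skill_name, experience_level_id):
    """One skill object, as stored in a profile_role's "skills" array."""
    return {
        "skill_id": skill_id,
        "skill_name": skill_name,
        "experience_level_id": experience_level_id,
    }


def make_term(term_id, term_name):
    """One term object, as stored in a profile_role's "terms" array."""
    return {"term_id": term_id, "term_name": term_name}


def make_profile_role(role_id, industry_id, description, skills=None, terms=None):
    """One profile_role object holding arrays of skills and terms."""
    return {
        "role_id": role_id,
        "industry_id": industry_id,
        "description": description,
        "skills": list(skills or []),
        "terms": list(terms or []),
    }


document = {
    "profile_type_3": {
        "profile_roles": [
            make_profile_role(
                1, 1, "some text",
                skills=[make_skill(10, "PHP", 1), make_skill(12, "PYTHON", 3)],
                terms=[make_term(1, "some text"), make_term(2, "some text")],
            ),
        ]
    }
}

# JSON body that would be sent to the index request.
body = json.dumps(document)
```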

Hope this helps.


(samuel merlet) #3

Hi,

Thanks, that helps a lot!
But I get false positives this way.

If I have 2 roles with different skills and search for a combination of
role and skill, I get a result even if the skill belongs to another role:

"roles" : [
  {
    "role_id" : 2,
    "skills" : [ { "skill_id" : 5 } ]
  },
  {
    "role_id" : 3,
    "skills" : [ { "skill_id" : 4 } ]
  }
]

Now if I filter the query with role id = 2 and skill = 4
( profile_roles.role_id = 2 AND profile_roles.skills.skill_id = 4 ), I get
a result, but it's a false one because role 2 doesn't have skill 4.

Maybe I need to use nested documents? Any help with this would be
much appreciated.

Thanks

--


(Clinton Gormley) #5

Hiya

> Thanks, that helps a lot!
> But I get false positives this way.
>
> If I have 2 roles with different skills and search for a combination
> of role and skill, I get a result even if the skill belongs to another
> role.
>
> Now if I filter the query with role id = 2 and skill = 4
> ( profile_roles.role_id = 2 AND profile_roles.skills.skill_id = 4 ), I
> get a result, but it's a false one because role 2 doesn't have skill 4.
>
> Maybe I need to use nested documents? Any help with this would be
> much appreciated.

Yes, you need to use nested documents.

To explain why, take this document as an example:
{
  "stuff": [
    { "foo": 1, "bar": 1 },
    { "foo": 2, "bar": 2 }
  ]
}

That actually gets flattened to:

{
  "stuff.foo": [ 1, 2 ],
  "stuff.bar": [ 1, 2 ]
}
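
The flattening above can be imitated in a few lines of Python to show where the false positive comes from. This is only a model of the behaviour described here, not Elasticsearch code, and `flatten` is a name made up for the example.

```python
from collections import defaultdict


def flatten(doc):
    """Collect every leaf value under its dotted path, losing object boundaries."""
    flat = defaultdict(list)

    def walk(value, path):
        if isinstance(value, dict):
            for key, child in value.items():
                walk(child, f"{path}.{key}" if path else key)
        elif isinstance(value, list):
            for item in value:
                walk(item, path)
        else:
            flat[path].append(value)

    walk(doc, "")
    return dict(flat)


doc = {"stuff": [{"foo": 1, "bar": 1}, {"foo": 2, "bar": 2}]}
flat = flatten(doc)

# On the flattened form, "foo = 1 AND bar = 2" matches,
# although no single object has both values.
flat_match = 1 in flat["stuff.foo"] and 2 in flat["stuff.bar"]

# Checking each object separately, which is what nested documents
# preserve, correctly finds no match.
nested_match = any(obj["foo"] == 1 and obj["bar"] == 2 for obj in doc["stuff"])
```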

Nested documents keep the nested objects as separate documents
internally. They are not visible as separate documents at the user
level, but that is how they are stored.

Have a look at:
http://www.elasticsearch.org/guide/reference/mapping/nested-type.html
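
For the mapping in this thread, that would mean declaring profile_roles and skills with "type": "nested". Below is a trimmed sketch, written as a Python dict so it can be serialized with `json.dumps`. The function name is made up, only a few of the fields are shown, and the exact options available depend on the Elasticsearch version.

```python
def nested_roles_mapping():
    """Mapping fragment for profile_type_3 with roles and skills as nested types."""
    return {
        "profile_type_3": {
            "type": "object",
            "properties": {
                "profile_roles": {
                    # Each role is stored as its own hidden document.
                    "type": "nested",
                    "properties": {
                        "role_id": {"type": "integer"},
                        "industry_id": {"type": "integer"},
                        "skills": {
                            # Each skill is nested again inside its role.
                            "type": "nested",
                            "properties": {
                                "skill_id": {"type": "integer"},
                                "experience_level_id": {"type": "integer"},
                            },
                        },
                    },
                },
            },
        }
    }


mapping = nested_roles_mapping()
```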

For querying, look at:
http://www.elasticsearch.org/guide/reference/query-dsl/nested-query.html
http://www.elasticsearch.org/guide/reference/query-dsl/nested-filter.html

The process is: run a query on the nested docs, then run the rest of the
query on their parent or root docs.
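
For the original question (role_id = 1 with skill_id = 12 at experience level >= 2 inside that same role), a two-level nested query might look roughly like the sketch below. It assumes that both profile_roles and skills are mapped as nested types; the function name is made up, and the exact query syntax may differ between Elasticsearch versions, so check it against the nested query docs linked above.

```python
ROLES = "profile_type_3.profile_roles"
SKILLS = ROLES + ".skills"


def role_with_skill_query(role_id, skill_id, min_level):
    """Match documents where one role has the given id AND, inside that same
    role, one skill has the given id with at least the given experience level."""
    skill_clause = {
        "nested": {
            "path": SKILLS,
            "query": {
                "bool": {
                    "must": [
                        {"term": {SKILLS + ".skill_id": skill_id}},
                        {"range": {SKILLS + ".experience_level_id": {"gte": min_level}}},
                    ]
                }
            },
        }
    }
    return {
        "nested": {
            "path": ROLES,
            "query": {
                "bool": {
                    "must": [
                        {"term": {ROLES + ".role_id": role_id}},
                        skill_clause,
                    ]
                }
            },
        }
    }


query = role_with_skill_query(role_id=1, skill_id=12, min_level=2)
```

The outer nested clause pins the match to a single role, and the inner one pins skill_id and experience_level_id to a single skill of that role.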

If you need data from the nested docs to also be visible in the parent
or root (i.e. topmost parent) doc, then you need to include that data in
the levels above, e.g. using include_in_root or include_in_parent (see the
nested type docs above), but be aware that the data in the root/parent
will be flattened as per my example above.

hth

clint

--


(samuel merlet) #6

Thanks, it works great.

Can we have nested "nested objects"?
Also, I don't really understand what you mean by
"If you need data from the nested docs to also be visible in the parent or
root".

When I get my results, I already have access to the nested docs when
reading the source.

Thanks again


--


(Clinton Gormley) #7

Hiya

> Can we have nested "nested objects"?

Yes.

> Also, I don't really understand what you mean by "If you need data from
> the nested docs to also be visible in the parent or root".
> When I get my results, I already have access to the nested docs when
> reading the source.

That's fine.

What I mean is that sometimes you want to run queries on the root doc
which involve data stored in the nested docs. Right now I can't think
of a good example :)

So, don't worry about it for the moment. Just bear in mind that such an
option exists in case you find yourself unable to do what you need
because the data isn't visible to you at the right level.
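
One plausible case, offered only as an assumption and not something confirmed in this thread: a root-level query that just needs to know whether any role has a given skill, regardless of which role it belongs to. With include_in_root set on the nested skills, a plain (non-nested) query can see the flattened copy. The fragment below is a sketch with a trimmed field list.

```python
# Nested mapping for skills that also copies the fields into the root document.
skills_mapping = {
    "type": "nested",
    "include_in_root": True,
    "properties": {
        "skill_id": {"type": "integer"},
        "experience_level_id": {"type": "integer"},
    },
}

# A plain query against the flattened copy in the root document.
# It cannot tell which role or skill object a value came from.
any_role_has_skill_12 = {
    "term": {"profile_type_3.profile_roles.skills.skill_id": 12}
}
```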

clint

--


(samuel merlet) #8

OK, thanks.

From your experience, is it good practice to do it this way? Or are there
better solutions, like denormalizing the data, hashing the possible
combinations, or something else?

--

