Document Structure Design

In replacing the backend of a relational DB with ES, we're faced with a
choice between two ways of structuring documents:

(1)

attribute : [{
    adapterInstance : "ying",
    attributeName : "attribute1",
    attributeType : "java.lang.Integer",
    attributeValue : 24
}]

or (2)

attributes : {
    adapterInstance : "ying",
    attribute : [{
        attributeName : "attribute1",
        attributeType : "java.lang.Integer",
        attributeValue : 24
    }]
}

(1) is how the data is typically received, one attribute at a time; (2)
reflects aggregate business logic that associates the adapterInstance with
an array of attributes. We can code to support either document structure,
but would (2) give faster search results, given that the attribute array
in either structure can be indexed as a nested type?
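
For anyone comparing the two shapes, here is a minimal Python sketch of the aggregation step that turns records of shape (1) into documents of shape (2). The function name group_by_adapter and the second record ("attribute2") are made up for illustration; nothing here is ES-specific.

```python
from collections import defaultdict


def group_by_adapter(records):
    """Group flat shape-(1) records into one shape-(2) document per adapterInstance."""
    grouped = defaultdict(list)
    for rec in records:
        # Keep every field except adapterInstance in the per-adapter attribute list.
        attr = {k: v for k, v in rec.items() if k != "adapterInstance"}
        grouped[rec["adapterInstance"]].append(attr)
    return [
        {"attributes": {"adapterInstance": name, "attribute": attrs}}
        for name, attrs in grouped.items()
    ]


records = [
    {"adapterInstance": "ying", "attributeName": "attribute1",
     "attributeType": "java.lang.Integer", "attributeValue": 24},
    # Hypothetical second record, only here to show the grouping.
    {"adapterInstance": "ying", "attributeName": "attribute2",
     "attributeType": "java.lang.Integer", "attributeValue": 7},
]

docs = group_by_adapter(records)
print(docs)
```

Going the other way (shape (2) back to shape (1)) is just the reverse loop, so the choice between the two can be driven by the queries rather than by how the data arrives.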

--

Hello,

I think it's best to structure your data in the way that best suits what
you search for.

So what does your typical query look like? What do you most often search for?

The rule of thumb is that flat documents are easier to handle than nested
documents. Each nested object is actually indexed as an additional,
separate document internally.

Best regards,
Radu

http://sematext.com/ -- ElasticSearch -- Solr -- Lucene

On Wed, Dec 12, 2012 at 9:48 PM, TeeTim tim.sheridan@me.com wrote:

[...]

--

Picking up this thread, I think I can pass on a comment or two.
Tim did not say which of these he was going to use:
(1) a hierarchical structure, which is not technically "nested" in ES but
is called "inner objects" in the documentation;
(2) nested documents, which as Radu says would be separate documents;
(3) parent/child documents, which in Tim's case would maybe be adapter as
parent and attribute as child.

But if you ever need to match two or more values within the same
attribute, e.g. attribute.name:"length" AND attribute.value:10,
then you would have to use either a nested or a parent/child arrangement
in order to avoid "cross object matches".
For example, given two docs:
"adapter": { "instance":"ying", "attribute": [ { "name":"height",
"value":"10" }, { "name":"width", "value":"5" } ] }
"adapter": { "instance":"yang", "attribute": [ { "name":"height",
"value":"5" }, { "name":"width", "value":"10" } ] }

And a query like
+attribute.name:"height" +attribute.value:10

Using inner objects, you would match both ying and yang (inner objects are
flattened, so the link between each name and its value is lost).
Using nested objects, you would match only ying.
Using child objects and the appropriate has_child or has_parent query as
needed, you would match only ying.
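
To make the cross-object match concrete, here is a small Python simulation of the two matching behaviours on the ying/yang docs. It is only a sketch of the semantics, not how ES implements them; the function names match_inner and match_nested are mine.

```python
docs = [
    {"instance": "ying", "attribute": [{"name": "height", "value": "10"},
                                       {"name": "width", "value": "5"}]},
    {"instance": "yang", "attribute": [{"name": "height", "value": "5"},
                                       {"name": "width", "value": "10"}]},
]


def match_inner(doc, name, value):
    # Inner objects are flattened: all names land in one multi-valued field and
    # all values in another, so the pairing between them is lost.
    names = {a["name"] for a in doc["attribute"]}
    values = {a["value"] for a in doc["attribute"]}
    return name in names and value in values


def match_nested(doc, name, value):
    # Nested objects are matched one object at a time, so both conditions
    # have to hold inside the same attribute.
    return any(a["name"] == name and a["value"] == value for a in doc["attribute"])


inner_hits = [d["instance"] for d in docs if match_inner(d, "height", "10")]
nested_hits = [d["instance"] for d in docs if match_nested(d, "height", "10")]

print(inner_hits)   # ['ying', 'yang']
print(nested_hits)  # ['ying']
```

yang shows up in the first list only because its width happens to be 10, which is exactly the kind of false hit you want to rule out.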

As Radu said, you want to build the structure which supports your queries.
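
If you go the nested route, the mapping and query would look roughly like the request bodies below, written as Python dicts. This is a sketch under several assumptions: a recent ES version (typeless mappings, "keyword" and "integer" field types), "instance" and "attribute" as top-level fields, and numeric attribute values. Adjust the field names and types to your own data and check the syntax against the docs for your version.

```python
import json

# Assumed index mapping: "attribute" is declared as a nested field so each
# name/value pair is indexed as its own hidden document.
mapping = {
    "mappings": {
        "properties": {
            "instance": {"type": "keyword"},
            "attribute": {
                "type": "nested",
                "properties": {
                    "name": {"type": "keyword"},
                    "value": {"type": "integer"},
                },
            },
        }
    }
}


def nested_attribute_query(name, value):
    """Build a nested query requiring name and value to match in the same attribute."""
    return {
        "query": {
            "nested": {
                "path": "attribute",
                "query": {
                    "bool": {
                        "must": [
                            {"term": {"attribute.name": name}},
                            {"term": {"attribute.value": value}},
                        ]
                    }
                },
            }
        }
    }


query = nested_attribute_query("height", 10)
print(json.dumps(query, indent=2))
```

The dicts can be sent as JSON bodies with whatever client you use; I have left the client calls out since they differ between versions.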

I would also design types and field names which don't repeat "attribute"
everywhere (as I have done above).

-Paul

On 12/15/2012 6:17 AM, Radu Gheorghe wrote:

[...]

--

Thank you both so much. Your comments helped clarify my understanding.

Tim

On Wednesday, January 2, 2013 8:13:30 PM UTC-5, P Hill wrote:

[...]

--