Does a document database mean denormalizing?


(eunever32) #1

Hey,

What is the best way to design indexes in elasticsearch?

I mean in terms of normalization vs denormalization

So am I right in thinking that because elasticsearch is a document database
we don't worry about having a denormalized model?

So let's say I'm in the "fruit" domain and I have a set of apple varieties
say:
Bailey
Baldwin
Ballyfatten
Beacon
Beauty of Bath
Belle de Boskoop agm
Ben Davis
Beverly Hills
Bismarck
Blenheim Orange agm
Bloody Ploughman
Bottle Greening
Braeburn
Bramley (Bramley's Seedling) agm
Bravo de Esmolfe
Breedon Pippin
Brina
Byfleet Seedling

I know the name will always be less than say 70 characters so if I have an
index say

stock {
apple_variety,
count,
shop,
expiry_date
}

with for example :
stock = {
"Bailey", 50, "store1", "01/01/2015",
"Baldwin", 150, "store1", "01/01/2015",
"Ballyfatten", 250, "store1", "01/01/2015",
"Beacon", 50, "store1", "01/01/2015",
"Beauty of Bath", 250, "store2", "01/01/2015"
}

is that better than
variety {
variety_id,
variety_name
}

with {
1, "Bailey",
2, "Baldwin",
3, "Ballyfatten",
4, "Beacon",
5, "Beauty of Bath",
6, "Belle de Boskoop agm",
7, "Ben Davis",
8, "Beverly Hills",
9, "Bismarck",
10, "Blenheim Orange agm",
11, "Bloody Ploughman",
12, "Bottle Greening",
13, "Braeburn",
}
and
stock = {
1, 50, "store1", "01/01/2015",
2, 150, "store1", "01/01/2015",
3, 250, "store1", "01/01/2015",
4, 50, "store1", "01/01/2015",
5, 250, "store1", "01/01/2015"
}

I am thinking that in production there will be advantages to having
denormalised data, particularly in the event of a schema change: the data
can remain in place and new fields are simply added.

If, on the other hand, a new apple variety is inserted or removed, then the
meaning of variety_id : 1 might accidentally change and cause grief?
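
To make the comparison concrete, here is a minimal Python sketch of the two layouts using plain dicts in place of real Elasticsearch documents. The `resolve` helper and the `renumbered` lookup are hypothetical, introduced only to illustrate the id-reuse concern raised above.

```python
# Denormalized: every stock document carries the variety name itself.
denormalized_stock = [
    {"apple_variety": "Bailey", "count": 50, "shop": "store1", "expiry_date": "01/01/2015"},
    {"apple_variety": "Baldwin", "count": 150, "shop": "store1", "expiry_date": "01/01/2015"},
]

# Normalized: stock documents hold only an id that must be resolved elsewhere.
varieties = {1: "Bailey", 2: "Baldwin"}
normalized_stock = [
    {"variety_id": 1, "count": 50, "shop": "store1", "expiry_date": "01/01/2015"},
    {"variety_id": 2, "count": 150, "shop": "store1", "expiry_date": "01/01/2015"},
]


def resolve(stock, lookup):
    """Application-side 'join': attach the variety name to each stock row."""
    return [{**row, "apple_variety": lookup[row["variety_id"]]} for row in stock]


# Hypothetical renumbering after "Bailey" is removed: the ids are reused,
# so existing stock rows now silently point at a different variety.
renumbered = {1: "Baldwin", 2: "Ballyfatten"}

print(resolve(normalized_stock, varieties)[0]["apple_variety"])
print(resolve(normalized_stock, renumbered)[0]["apple_variety"])
```

The denormalized rows are unaffected by the renumbering, because they never depended on the lookup in the first place.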

Thoughts ?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/95dfa1c9-8e26-4c1b-8009-506870ca4a24%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(eunever32) #2

What I am asking is:

Do different design decisions apply in elasticsearch compared to a
relational database?

Is denormalized better for elasticsearch?



(Jilles van Gurp) #3

Yes, definitely think in terms of denormalizing. Joins are hard/expensive
in elasticsearch, so you avoid the need to join by pre-joining the data when
you index it. But you have other options as well, see
http://www.elasticsearch.org/blog/managing-relations-inside-elasticsearch/

So, say you had a person table and an address table in a database, where you
have a 1:1 relation, that's a no-brainer: shove the address in the person
index along with the rest of the person data.
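
As a rough sketch of that 1:1 case, assuming two hypothetical rows read from the database (the field names here are made up for illustration), the person document simply absorbs the address:

```python
import json

# Hypothetical rows as they might come out of the two relational tables.
person_row = {"id": 1, "name": "Alice", "address_id": 10}
address_row = {"id": 10, "street": "1 Main St", "city": "Utrecht"}


def embed_address(person, address):
    """Build one person document with the address embedded as a sub-object."""
    doc = {k: v for k, v in person.items() if k != "address_id"}
    doc["address"] = {k: v for k, v in address.items() if k != "id"}
    return doc


person_doc = embed_address(person_row, address_row)
print(json.dumps(person_doc, indent=2))
```

The foreign key disappears, because there is nothing left to join against.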

If you had another table called company with a 1:n relation to person, it
gets more tricky. Now you have options.

Option 1: put the company data in the person index. Sure you are copying
data all over the place but storage is cheap and it is not like you are
going to have a trillion companies or persons. Your main worry is not space
but consistency. What happens if you need to change the company details?
Option 2: put the person objects in an array in the company objects. Fine
as long as you don't need to query for the persons separately.
Option 3: store just the company id in the person index or the person id in
the company index (array). Now you will end up in situations where you may
need to join and you'll have to fire many queries and manipulate search
results to do it, which is slow, tedious to program, and somewhat error
prone. But for simple use cases you might get away with it.
Option 4: use nested documents to put persons in companies. Now you can use
nested queries and aggregations, which give you join like benefits. Don't
use this for massive amounts of nested documents on a single parent.
Option 5: use parent child documents to give persons a company parent. More
flexible than nested and gives you some performance benefits since parent
and child reside on the same shard. So same as option 3 but faster.
Option 6: compromise: denormalize some but not all of the fields and keep
things in a separate index as well.
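
A few of the options above can be sketched with plain Python dicts. This is only an illustration under assumptions: the sample data, the function names, and `fetch_company` (a stand-in for a second query against a company index) are all hypothetical. The one Elasticsearch-specific piece is the `nested` field type used for option 4; the mapping syntax around it depends on your Elasticsearch version.

```python
companies = {"c1": {"name": "Acme", "city": "Berlin"}}
persons = [
    {"id": "p1", "name": "Alice", "company_id": "c1"},
    {"id": "p2", "name": "Bob", "company_id": "c1"},
]


def person_with_company(person):
    """Option 1: copy the company data into each person document."""
    doc = {k: v for k, v in person.items() if k != "company_id"}
    doc["company"] = dict(companies[person["company_id"]])
    return doc


def company_with_persons(company_id):
    """Option 2: put the persons in an array inside the company document."""
    doc = dict(companies[company_id])
    doc["persons"] = [
        {"id": p["id"], "name": p["name"]}
        for p in persons
        if p["company_id"] == company_id
    ]
    return doc


def join_in_application(person_hits, fetch_company):
    """Option 3: store only ids and join in application code.

    fetch_company is called once per distinct company id; with a real
    cluster each call would be an extra round trip.
    """
    cache = {}
    joined = []
    for hit in person_hits:
        cid = hit["company_id"]
        if cid not in cache:
            cache[cid] = fetch_company(cid)
        joined.append({**hit, "company": cache[cid]})
    return joined


# Option 4: same document shape as option 2, but the persons array is
# mapped as nested so each person is matched as a unit by nested queries.
company_mapping_fragment = {"properties": {"persons": {"type": "nested"}}}
```

With option 1, changing a company's details means rewriting every person document that copied them, which is the consistency worry mentioned above.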

With n:m style relations it gets a bit harder. Probably you don't want to
index the cartesian product, so you'll need to compromise. Any of the
options above could work. All depends on how many relations you are really
managing.
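
For the n:m case, one possible compromise is to keep an array of ids on one side instead of one document per pair. A small sketch with made-up person and company ids:

```python
from collections import defaultdict

# Hypothetical n:m relation: a person can belong to several companies.
memberships = [("p1", "c1"), ("p1", "c2"), ("p2", "c1")]

# One document per relation: grows with the number of pairs.
pair_docs = [{"person_id": p, "company_id": c} for p, c in memberships]

# Compromise: one document per person holding an array of company ids.
companies_by_person = defaultdict(list)
for person_id, company_id in memberships:
    companies_by_person[person_id].append(company_id)

person_docs = [
    {"id": person_id, "company_ids": company_ids}
    for person_id, company_ids in companies_by_person.items()
]

print(len(pair_docs), len(person_docs))
```

Whether the array stays manageable depends on how many relations each document really carries.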

We've actually gotten rid of our database entirely. Once you get used to
it, thinking in terms of documents is much more natural than thinking in
terms of rows, tables, and relations. You have much less of an impedance
mismatch that you need to pretend does not exist with some object
relational library. It's more like here's an object, serialize it, store
it, query for it.
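
A toy version of that object, serialize, store, query cycle, with an in-memory dict standing in for the index. The `Person` class, `save`, and `query` are hypothetical names for illustration, not an Elasticsearch client API.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Person:
    id: str
    name: str
    city: str


store = {}  # hypothetical stand-in for an index: document id -> JSON string


def save(person):
    """Serialize the object to JSON and store it under its id."""
    store[person.id] = json.dumps(asdict(person))


def query(**criteria):
    """Return the stored persons whose fields equal all given criteria."""
    hits = []
    for raw in store.values():
        doc = json.loads(raw)
        if all(doc.get(field) == value for field, value in criteria.items()):
            hits.append(Person(**doc))
    return hits


save(Person("p1", "Alice", "Berlin"))
save(Person("p2", "Bob", "Utrecht"))
print(query(city="Berlin"))
```

There is no mapping layer between the object and what gets stored; the JSON document is the record.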

Jilles

On Friday, June 13, 2014 9:48:37 AM UTC+2, eune...@gmail.com wrote:

What I am asking is

Do different design decisions apply in elasticsearch compared to
relational

Is denormalized better for elasticsearch



(eunever32) #4

Great answer! Thanks


