Any suggestions for avoiding duplicate data?

Here is the mapping of my index: https://gist.github.com/4394060

I have used a parent/child mapping to normalize the data, but as far as I
understand there is no way to get any fields from the _parent document. Now
I'm trying to find the best way to store artist_name and release_name in the
song type. I won't query these fields, but I need to get them back when I
query other fields such as name. There will be millions of songs and I don't
have much memory, so I suspect these duplicated values may cause out-of-memory
errors. For now, the artist_name and release_name fields have the
"index": "no" property, and I turned on compression for the _source field. Do
you have any efficient suggestion for avoiding duplicate values, such as
running multiple queries or some hacky way to get fields from the _parent
document, or is denormalized data the only way to handle this kind of problem?
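For reference, the setup described above (display-only fields with "index": "no" plus a compressed _source) might look roughly like the sketch below. It is written as a Python dict using Elasticsearch 0.20-era mapping syntax; the parent type "release" is an assumption, since the actual mapping lives in the gist.

```python
import json

# Hypothetical "song" mapping; the real one is in the linked gist.
# Assumes Elasticsearch 0.20-era syntax and "release" as the parent type.
song_mapping = {
    "song": {
        "_parent": {"type": "release"},   # assumed parent type
        "_source": {"compress": True},    # compress the stored _source
        "properties": {
            "name": {"type": "string"},   # the field that is actually searched
            # Duplicated, display-only fields: kept in _source, not indexed
            "artist_name": {"type": "string", "index": "no"},
            "release_name": {"type": "string", "index": "no"},
        },
    }
}

# Serialize to the JSON body you would send to the put-mapping API
body = json.dumps(song_mapping, indent=2)
print(body)
```

Because artist_name and release_name are not indexed, they add to _source size on disk but create no inverted-index terms for them.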

--

Are you talking about memory or disk space? Storing additional fields
without indexing them shouldn't affect memory in any way, unless you
are retrieving all of these songs in a single request. If you are really
concerned about denormalization, you can always use mget or search to
retrieve fields from the parents.
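The mget approach is a second round trip: run the song search, collect the distinct parent ids from the hits, fetch those parents in one multi-get, and join the names back in on the client. A minimal sketch of the client-side part, assuming each hit exposes its parent id under fields._parent (i.e. "_parent" was requested in the search's fields). The index and type names are hypothetical, and the HTTP calls themselves are left out.

```python
def build_parent_mget(hits, index, parent_type):
    """Collect distinct parent ids from child hits and build an mget body."""
    parent_ids = []
    for hit in hits:
        # Assumes the search requested "_parent" so it appears under "fields"
        parent_id = hit.get("fields", {}).get("_parent")
        if parent_id is not None and parent_id not in parent_ids:
            parent_ids.append(parent_id)
    return {"docs": [{"_index": index, "_type": parent_type, "_id": pid}
                     for pid in parent_ids]}


def attach_parent_field(hits, mget_response, field):
    """Copy `field` from each found parent's _source onto its child hits."""
    values = {}
    for doc in mget_response["docs"]:
        # Older versions report "exists", newer ones "found"
        if doc.get("exists", doc.get("found")) and "_source" in doc:
            values[doc["_id"]] = doc["_source"].get(field)
    rows = []
    for hit in hits:
        parent_id = hit.get("fields", {}).get("_parent")
        rows.append({**hit["_source"], field: values.get(parent_id)})
    return rows
```

Each distinct parent is fetched once per page of results, so a release shared by many songs costs a single lookup instead of being stored in every song document.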

On Thursday, December 27, 2012 9:56:56 PM UTC-5, Burak Emre Kabakcı wrote:

--