Any suggestion for duplicate data?

Burak_Emre_Kabakci · December 28, 2012, 2:56am

Here is the mapping of my index: https://gist.github.com/4394060

I have used parent & child mapping to normalize data, but as far as I
understand there is no way to get any fields from the _parent document. Now
I'm trying to find the best way of storing artist_name and release_name in
the song type. I won't query these fields, but I should be able to get them
when I query other fields like name. There will be millions of songs and I
don't have much memory, so I suspect that these duplicate values may cause
out-of-memory errors. For now, the artist_name and release_name fields have
the "index": "no" property and I turned on compression for the _source field.
Do you have any efficient suggestion for avoiding duplicate values, such as
running multiple queries or a hacky way to get fields from the _parent
document, or is denormalized data the only way to handle this kind of problem?

--

Igor_Motov · December 31, 2012, 1:20pm

Are you talking about memory or disk space? Storing additional fields
without indexing shouldn't affect memory in any way, unless you
are retrieving all these songs in a single request. If you are really
concerned about denormalization you can always use mget or search to
retrieve fields from parents.
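
A minimal sketch of the mget route, for illustration only. It assumes each song hit exposes its parent id under fields._parent (for example, because _parent was requested in the search), and that the parent documents keep their _source. The index name "music", the parent type "release", and the helper names are hypothetical and are not taken from the mapping in the gist.

```python
def parent_id(hit):
    """Return the parent id of a search hit, or None if it is absent.

    Assumption: the id is under hit["fields"]["_parent"], either as a
    scalar or as a one-element list, depending on the server version.
    """
    value = hit.get("fields", {}).get("_parent")
    if isinstance(value, list):
        value = value[0] if value else None
    return value


def build_mget_body(hits, index, parent_type):
    """Build an mget request body with one entry per distinct parent."""
    ids = []
    for hit in hits:
        pid = parent_id(hit)
        if pid is not None and pid not in ids:
            ids.append(pid)
    return {
        "docs": [
            {"_index": index, "_type": parent_type, "_id": pid}
            for pid in ids
        ]
    }


def attach_parent_fields(hits, mget_response, wanted):
    """Copy selected parent fields onto each song, client side.

    wanted maps the output field name to the field name in the parent,
    e.g. {"release_name": "name"} (hypothetical field names).
    """
    parents = {}
    for doc in mget_response.get("docs", []):
        # Older responses flag hits with "exists", newer ones with "found".
        if doc.get("exists") or doc.get("found"):
            parents[doc["_id"]] = doc.get("_source", {})

    merged = []
    for hit in hits:
        parent_source = parents.get(parent_id(hit), {})
        row = dict(hit.get("_source", {}))
        for out_field, parent_field in wanted.items():
            row[out_field] = parent_source.get(parent_field)
        merged.append(row)
    return merged
```

The body returned by build_mget_body would be sent to the _mget endpoint with whatever HTTP client is in use, and the response passed to attach_parent_fields. Each page of results then costs one extra round trip, and each parent is fetched once per page no matter how many songs share it.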

--
