Hi, I am storing duplicate fields in my documents (a solution to an interesting use case). Using "dynamic": false in the mapping, I index only one set of fields, not the duplicates. This helps keep my index small, but _source still stores the duplicate data. My question is, what is the storage cost of this approach? If I'm concerned about disk usage, am I saving a lot by not indexing the duplicate fields? Is storing the duplicate fields a problem in any way (aside from the questionable use itself)?
What I originally wanted to do was set the indexed field name to the duplicate field name, but keep the original field name in _source. In the example below, Text_1, Num_1, and TextLong_1 in the index hold the values of name, age, and description from _source, respectively, but the field names differ between the index and _source. The only way to do this, it seems, is with a field alias, which we've investigated, and it does not fit what we're trying to do.
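To make the duplication step concrete, here is a minimal Python sketch of how a document like the one below could be built before indexing. FIELD_MAP and add_index_copies are hypothetical names used only for illustration; they are not part of Elasticsearch or of any client library.

```python
# Hypothetical mapping from original field names to the index-side names.
FIELD_MAP = {"name": "Text_1", "age": "Num_1", "description": "TextLong_1"}


def add_index_copies(doc, field_map=FIELD_MAP):
    """Return a copy of doc with each mapped field duplicated under its index name."""
    out = dict(doc)  # leave the caller's document untouched
    for original, indexed in field_map.items():
        if original in doc:
            out[indexed] = doc[original]
    return out


doc = add_index_copies({
    "DocType": "Doc1",
    "name": "my name goes here",
    "age": 21,
    "description": "Doc 1 description, age is 21",
})
```

With "dynamic": false, only the copies that appear in the mapping (Text_1, Num_1, TextLong_1) are indexed; the original fields are kept in _source but are not searchable.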
Example
Index:
{
"settings" : {
"number_of_shards" : 1
},
"mappings" : {
"dynamic": false,
"properties" : {
"Text_1" : { "type" : "text" },
"TextLong_1" : { "type" : "text" },
"Num_1" : { "type" : "integer" }
}
}
}
Index Document in Elasticsearch
{
"DocType": "Doc1",
"name" : "my name goes here",
"Text_1" : "my name goes here",
"age": 21,
"Num_1": 21,
"description": "Doc 1 description, age is 21",
"TextLong_1": "Doc 1 description, age is 21"
}
Retrieve Document from Index
{
"_index":"fieldtestdynamic",
"_type":"_doc",
"_id":"1",
"_version":1,
"_seq_no":0,
"_primary_term":2,
"found":true,
"_source":{
"Text_1":"my name goes here", **# Indexed**
"TextLong_1":"Doc 1 description, age is 21", **# Indexed**
"Num_1":21, **# Indexed**
"DocType":"Doc1", **# Indexed**
"name":"my name goes here",
"description":"Doc 1 description, age is 21",
"age":21
}
}
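For a rough sense of the _source overhead, the sketch below compares the raw JSON size of the example document with and without the duplicate fields. It is only an estimate under stated assumptions: Elasticsearch compresses stored fields on disk, so actual disk usage will differ, and the names DUPLICATES and raw_size are hypothetical helpers for this illustration.

```python
import json

# The _source from the example above.
source = {
    "Text_1": "my name goes here",
    "TextLong_1": "Doc 1 description, age is 21",
    "Num_1": 21,
    "DocType": "Doc1",
    "name": "my name goes here",
    "description": "Doc 1 description, age is 21",
    "age": 21,
}

# Hypothetical list of the duplicated, index-side field names.
DUPLICATES = ("Text_1", "TextLong_1", "Num_1")


def raw_size(doc):
    """Size in bytes of the compact UTF-8 JSON encoding of doc."""
    return len(json.dumps(doc, separators=(",", ":")).encode("utf-8"))


without_duplicates = {k: v for k, v in source.items() if k not in DUPLICATES}

full_bytes = raw_size(source)
base_bytes = raw_size(without_duplicates)
overhead = full_bytes - base_bytes

print(f"with duplicates:    {full_bytes} bytes")
print(f"without duplicates: {base_bytes} bytes")
print(f"raw overhead:       {overhead} bytes ({overhead / full_bytes:.0%} of _source)")
```

Running the same comparison over a sample of real documents gives a better picture than this single example, since the ratio depends on how large the duplicated values are relative to the rest of each document.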