Cost of storing duplicate fields, but not indexing the duplicated field via "dynamic": false

harsimranb · November 14, 2019, 12:05pm

Hi, I am storing duplicate fields in my document (a solution to an interesting use case). Using "dynamic": false in the mapping, I am indexing only one set of fields, not the duplicates. This helps keep my index small, but _source still stores the duplicate data. My question is: what is the storage cost of this approach? If I'm concerned about disk usage, am I saving a lot by not indexing the duplicate fields? Is storing the duplicate fields a problem in any way (aside from the questionable use itself)?

What I originally wanted to do was set the index field name to the duplicate field name, but keep the original field name in _source. In the example below, the indexed fields Text_1, Num_1, and TextLong_1 hold the values of _source.name, _source.age, and _source.description, respectively, so the field names differ between the index and _source. The only way to do this, it seems, is with a field alias, which we've investigated, and it does not fit what we're trying to do.

Example

Index:

{
    "settings" : {
        "number_of_shards" : 1
    },
    "mappings" : {
        "dynamic": false,
        "properties" : {
            "Text_1" : { "type" : "text" },
            "TextLong_1" : { "type" : "text" },
            "Num_1" : { "type" : "integer" }
        }
    }
}

Index Document in Elasticsearch

{
  "DocType": "Doc1",
  "name" : "my name goes here",
  "Text_1" : "my name goes here",
  "age": 21,
  "Num_1": 21,
  "description": "Doc 1 description, age is 21",
  "TextLong_1": "Doc 1 description, age is 21"
}
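A rough way to get a feel for the _source overhead of the duplicated fields is to compare serialized sizes with and without them. The sketch below is only an approximation under stated assumptions: `make_doc` and `raw_and_compressed_bytes` are hypothetical helpers, the varying values are invented for illustration, and `zlib` is a stand-in for the stored-fields compression Elasticsearch applies (LZ4 by default, DEFLATE with `index.codec: best_compression`), so the absolute numbers will not match a real index.

```python
import json
import zlib


def make_doc(i, with_duplicates):
    """Build a document shaped like the example above; values vary with i."""
    name = f"my name goes here {i}"
    age = 21 + i % 50
    description = f"Doc {i} description, age is {age}"
    # The mapped (indexed) set of fields plus DocType.
    doc = {"DocType": "Doc1", "Text_1": name, "Num_1": age, "TextLong_1": description}
    if with_duplicates:
        # The duplicated, unmapped fields that live only in _source.
        doc.update({"name": name, "age": age, "description": description})
    return doc


def raw_and_compressed_bytes(docs):
    """Return (raw, zlib-compressed) byte counts for the serialized documents."""
    blob = "\n".join(json.dumps(d, separators=(",", ":")) for d in docs).encode("utf-8")
    return len(blob), len(zlib.compress(blob, 9))


n = 1000  # assumed sample size, purely illustrative
raw_dup, zip_dup = raw_and_compressed_bytes([make_doc(i, True) for i in range(n)])
raw_one, zip_one = raw_and_compressed_bytes([make_doc(i, False) for i in range(n)])

print(f"raw bytes:        {raw_dup} with duplicates, {raw_one} without")
print(f"compressed bytes: {zip_dup} with duplicates, {zip_one} without")
```

The point of comparing both rows is that a duplicated value sits next to its original inside the same document, which is the kind of repetition a general-purpose compressor handles well. For real figures, it is safer to index a sample both ways and compare the store size reported by `GET _cat/indices?v`.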

Retrieve Document from Index

{ 
   "_index":"fieldtestdynamic",
   "_type":"_doc",
   "_id":"1",
   "_version":1,
   "_seq_no":0,
   "_primary_term":2,
   "found":true,
   "_source":{ 
      "Text_1":"my name goes here", # Indexed
      "TextLong_1":"Doc 1 description, age is 21", # Indexed
      "Num_1":21, # Indexed
      "DocType":"Doc1", # Not indexed (not in the mapping)
      "name":"my name goes here",
      "description":"Doc 1 description, age is 21",
      "age":21
   }
}
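To make explicit which fields of the retrieved document are actually searchable, here is a minimal sketch of how "dynamic": false behaves for top-level fields: anything not listed under `properties` is kept in _source but is not indexed. The helper `split_fields` is hypothetical and deliberately ignores nested objects and multi-fields.

```python
# Mapping and _source copied from the example above.
mapping = {
    "dynamic": False,
    "properties": {
        "Text_1": {"type": "text"},
        "TextLong_1": {"type": "text"},
        "Num_1": {"type": "integer"},
    },
}

source = {
    "Text_1": "my name goes here",
    "TextLong_1": "Doc 1 description, age is 21",
    "Num_1": 21,
    "DocType": "Doc1",
    "name": "my name goes here",
    "description": "Doc 1 description, age is 21",
    "age": 21,
}


def split_fields(mapping, source):
    """Split top-level _source fields into (indexed, source_only).

    Simplified model: with "dynamic": false, a field is indexed only
    if it is explicitly declared under mapping["properties"].
    """
    mapped = mapping["properties"]
    indexed = [field for field in source if field in mapped]
    source_only = [field for field in source if field not in mapped]
    return indexed, source_only


indexed, source_only = split_fields(mapping, source)
print("indexed:    ", indexed)      # ['Text_1', 'TextLong_1', 'Num_1']
print("source only:", source_only)  # ['DocType', 'name', 'description', 'age']
```

Under this model DocType ends up in the source-only group along with name, description, and age, because the mapping does not declare it.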

