Index size on disk

Hi All,

I have a small query about raw data size versus size on disk. I am using ES 5.7, and my observation is below.
Can someone please help me understand this size factor?

Here the raw data is 1.5 KB, but when I call the ES indices API it shows 10 KB.
So my question is: will data on disk grow at a 1:10 ratio? In the mapping I used only the keyword and date
data types.

Raw data:

  "_source": {
    "service_key": "eaa05d04-6171-11e7-8292-005056944c2b",
    "api_req_cmplt_time": "2018-03-15T06:16:56.441Z",
    "handler_req_time": "2018-03-15T06:16:59.609Z",
    "channel_req_cmplt_time": "2018-03-15T06:17:00.276Z",
    "secured_key": "eaa05d04-6171-11e7-8292-005056944c2b",
    "trans_id": "ba867688-3e02-4e68-bbcb-2b7ca88ac643",
    "type": "IMICONNECT_TRANS_TIME_TAKEN",
    "source_tid": "ba867688-3e02-4e68-bbcb-2b7ca88ac643",
    "request_source": "1",
    "api_req_time": "2018-03-15T06:16:56.440Z",
    "datetime": "2018-03-15T06:17:06.000Z",
    "@timestamp": "2018-03-15T06:17:13.154Z",
    "filename": "/logdata/mlogserverj/encrypted_files/__302007___24025990174359981.log_1.done",
    "handler_req_cmplt_time": "2018-03-15T06:16:59.773Z",
    "channel_req_time": "2018-03-15T06:16:59.849Z",
    "rule_action_tid": "null",
    "total_elapsed_time": 3836,
    "channel_id": "1"
  }

Regards,
Chhavi

You meant 5.6.7, I suppose.

Never mind; could you share your mapping as well?

FYI, in the 5.x series we still generate the _all field, which might not be useful for you. You may want to disable it.
We also store the _source JSON field.

So technically we generate more data at index time than the raw JSON document contains.
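
As a minimal sketch of the suggestion above, the following Python builds an index template body that disables `_all` for the `_default_` type, which the 5.x mapping format allows. The helper name `build_template` and the index pattern are illustrative assumptions and not part of the original post; actually sending the body to the cluster is left out.

```python
import json


def build_template(pattern="*"):
    """Return a 5.x-style index template body with the _all field disabled.

    `pattern` is a hypothetical index pattern; adjust it to your own indices.
    """
    return {
        "template": pattern,
        "mappings": {
            "_default_": {
                # Skip building the catch-all _all field at index time.
                "_all": {"enabled": False},
            }
        },
    }


if __name__ == "__main__":
    # Print the JSON body that could be sent when creating the template.
    print(json.dumps(build_template(), indent=2))
```

Note that a template only affects indices created after it is stored, so an existing index would keep its current `_all` setting.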

Hi Dadoonet,

Please find the mapping below. I would also like to know what the _all field is.
{

    "template" : "*",
    "settings" : {
      "index" : {
        "number_of_shards":1
      }
    },
    "order" : 0,
    "mappings" : {
      "_default_" : {
        "dynamic_templates" : [ {
          "message_field" : {
            "mapping" : {
              "index" : "not_analyzed",
              "norms" : true,
              "fielddata" : {
                "format" : "disabled"
              },
              "type" : "keyword"
            },
            "match_mapping_type" : "string",
            "match" : "message"
          }
        }, {
          "string_fields" : {
            "match" : "*",
            "match_mapping_type" : "string",
            "mapping" : {
              "type" : "keyword", "index" : "not_analyzed", "norms" : true,
              "fielddata" : { "format" : "disabled" }
            }
          }
        }, {
          "long_fields" : {
            "mapping" : {
              "doc_values" : true,
              "type" : "long"
            },
            "match_mapping_type" : "long",
            "match" : "*"
          }
        }, {
          "date_fields" : {
            "mapping" : {
              "doc_values" : true,
              "type" : "date"
            },
            "match_mapping_type" : "date",
            "match" : "*"
          }
        }],
        "properties" : {
          "request" : { "type" : "text" },
          "response" : { "type" : "text" },
          "description" : { "type" : "text" },
          "message" : { "type" : "text" },
          "filename" : { "type" : "text" },
          "extraparams" : { "type" : "text" },
          "@timestamp" : {
            "doc_values" : true,
            "type" : "date"
          }
        }
      }
    }
}

Regards,
Chhavi

You shared the template, not the mapping.
Could you share the mapping please?

The _all field doc: https://www.elastic.co/guide/en/elasticsearch/reference/5.6/mapping-all-field.html

Hi dadoonet,

Sorry about that. Please find the mapping below.

{
  "test_imiconnect_trans_time_taken2018-03-15" : {
    "mappings" : {
      "_default_" : {
        "_all" : {
          "enabled" : false
        },
        "dynamic_templates" : [
          {
            "message_field" : {
              "match" : "message",
              "match_mapping_type" : "string",
              "mapping" : {
                "fielddata" : {
                  "format" : "disabled"
                },
                "index" : "not_analyzed",
                "norms" : false,
                "type" : "keyword"
              }
            }
          },
          {
            "string_fields" : {
              "match" : "*",
              "match_mapping_type" : "string",
              "mapping" : {
                "fielddata" : {
                  "format" : "disabled"
                },
                "index" : "not_analyzed",
                "norms" : false,
                "type" : "keyword"
              }
            }
          },
          {
            "long_fields" : {
              "match" : "*",
              "match_mapping_type" : "long",
              "mapping" : {
                "doc_values" : true,
                "type" : "long"
              }
            }
          },
          {
            "date_fields" : {
              "match" : "*",
              "match_mapping_type" : "date",
              "mapping" : {
                "doc_values" : true,
                "type" : "date"
              }
            }
          }
        ],
        "properties" : {
          "@timestamp" : {
            "type" : "date"
          },
          "description" : {
            "type" : "text"
          },
          "extraparams" : {
            "type" : "text"
          },
          "filename" : {
            "type" : "text"
          },
          "message" : {
            "type" : "text"
          },
          "request" : {
            "type" : "text"
          },
          "response" : {
            "type" : "text"
          }
        }
      }
    }
  }
}

Regards,
Chhavi

I can see a lot of text fields in your mapping.
They use the default analyzer, which might emit a lot of tokens.

That could explain it.
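
One way to check this is the `_analyze` API, which returns the tokens an analyzer emits for a given text. The sketch below only builds the request and counts tokens in a parsed response; the helper names are assumptions for illustration, and the HTTP call itself is omitted. Note that `filename` is mapped as `text` and also appears in the sample document shared earlier, so it is a natural value to try.

```python
import json


def analyze_request(text, analyzer="standard"):
    """Build (method, path, body) for an _analyze call on the given text."""
    body = json.dumps({"analyzer": analyzer, "text": text})
    return "POST", "/_analyze", body


def token_count(analyze_response):
    """Count the tokens listed in a parsed _analyze response (a dict)."""
    return len(analyze_response.get("tokens", []))


if __name__ == "__main__":
    # Value taken from the "filename" field of the sample document.
    filename = "/logdata/mlogserverj/encrypted_files/__302007___24025990174359981.log_1.done"
    method, path, body = analyze_request(filename)
    print(method, path)
    print(body)
```

Passing the parsed JSON response to `token_count` gives the number of terms indexed for that one value.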

Hi Dadoonet,

The fields defined as text are not present in the raw data; I am only using keyword and datetime fields.
So my question is: is there any way to confirm that keyword and datetime fields alone take this much space in Elasticsearch?

Regards,
Chhavi

Maybe this: https://www.elastic.co/guide/en/elasticsearch/reference/6.2/cluster-nodes-stats.html#node-indices-stats
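
As a rough sketch of how such stats could be turned into a size factor, the Python below reads the primary store size and document count from a parsed stats response and divides by the raw document size. It assumes the per-index stats API (`GET /<index>/_stats`) rather than the node stats linked above, and the function name and sample numbers are hypothetical. With only a handful of documents, fixed per-segment overhead can dominate, so the ratio may look quite different on a larger sample.

```python
def disk_ratio(stats_response, index_name, raw_bytes_per_doc):
    """Return (bytes on disk per doc, disk-to-raw ratio) for one index.

    `stats_response` is the parsed JSON of an index stats call;
    `raw_bytes_per_doc` is the measured size of one raw JSON document.
    """
    primaries = stats_response["indices"][index_name]["primaries"]
    store_bytes = primaries["store"]["size_in_bytes"]
    doc_count = primaries["docs"]["count"]
    if doc_count == 0 or raw_bytes_per_doc <= 0:
        raise ValueError("need at least one document and a positive raw size")
    bytes_per_doc = store_bytes / doc_count
    return bytes_per_doc, bytes_per_doc / raw_bytes_per_doc


if __name__ == "__main__":
    # Hypothetical response: one document occupying 10 KB on disk.
    sample = {
        "indices": {
            "my_index": {
                "primaries": {
                    "docs": {"count": 1},
                    "store": {"size_in_bytes": 10240},
                }
            }
        }
    }
    per_doc, ratio = disk_ratio(sample, "my_index", raw_bytes_per_doc=1536)
    print(per_doc, round(ratio, 2))
```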
