Storage gains from removing redundant message fields


(Sandeepkanabar) #1

Consider the following document (no logstash is involved here):

{
  "_index": "day-wise-2018.11.25",
  "_type": "type1",
  "_id": "11111111111111111111111111111111111111111111111111111111",
  "_score": 1,
  "_source": {
    "id": "012345678910111213",
    "LOAD_AVG_MIN": 1.000001,
    "ts": "2018-11-25T01:41:04.045Z",
    "u": "Load",
    "foo_field": "abcdef-ghi-jk-lmnopq-rstu-v-release_181010_270",
    "@id": "11111111111111111111111111111111111111111111111111111111",
    "@timestamp": "2018-11-25T01:41:04.221Z",
    "@message": "{\"id\":\"012345678910111213\",\"LOAD_AVG_MIN\":1.000001,\"ts\":\"2018-11-25T01:41:04.045Z\",\"u\":\"Load\",\"foo_field\":\"abcdef-ghi-jk-lmnopq-rstu-v-release_181010_270\"}",
    "@owner": "foo_owner",
    "@log_group": "type1"
  }
}

As you can see, the @message field appears to be redundant here, since its contents are already indexed as individual key-value pairs. For example, id, LOAD_AVG_MIN, ts, etc. are all indexed as separate fields (and their mappings are defined too).

  1. Will dropping the @message field reduce the index size considerably? In the example above its contents are fairly small, but they can be large as well.
  2. I'm also thinking of dropping the redundant @id and @log_group fields, which have the same values as _id and _type respectively. Would that yield any benefit?
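
To put a rough number on the questions above, here is a minimal Python sketch that compares the raw JSON size of the sample _source with and without the three candidate fields. The helper names (json_bytes, raw_savings) are made up for illustration. Raw JSON bytes are only a proxy: the actual on-disk saving also depends on _source compression and on how each field is mapped and indexed, so treat the result as an estimate and not as a measurement.

```python
import json


def json_bytes(doc):
    """UTF-8 byte length of the compact JSON serialization of doc."""
    text = json.dumps(doc, separators=(",", ":"), ensure_ascii=False)
    return len(text.encode("utf-8"))


def raw_savings(source, fields):
    """Return (bytes_before, bytes_after) when `fields` are dropped from source."""
    trimmed = {k: v for k, v in source.items() if k not in fields}
    return json_bytes(source), json_bytes(trimmed)


# Parsed fields, copied from the sample document above.
parsed = {
    "id": "012345678910111213",
    "LOAD_AVG_MIN": 1.000001,
    "ts": "2018-11-25T01:41:04.045Z",
    "u": "Load",
    "foo_field": "abcdef-ghi-jk-lmnopq-rstu-v-release_181010_270",
}

# Full _source: the parsed fields plus the @-prefixed metadata fields.
source = dict(parsed)
source.update({
    "@id": "11111111111111111111111111111111111111111111111111111111",
    "@timestamp": "2018-11-25T01:41:04.221Z",
    # @message is the parsed fields serialized again as one JSON string.
    "@message": json.dumps(parsed, separators=(",", ":")),
    "@owner": "foo_owner",
    "@log_group": "type1",
})

before, after = raw_savings(source, ["@message", "@id", "@log_group"])
print(f"raw _source: {before} -> {after} bytes "
      f"({100 * (before - after) / before:.0f}% smaller)")
```

Running the same comparison over a sample of real documents, where @message can be much larger, would give a better picture than this single small example.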

(Sandeepkanabar) #2

@warkolm - any suggestions here?