When can a keyword field with doc values enabled consume fielddata cache?

ES version - 6.2.3

Sometimes, after running an aggregation query on a keyword field that has doc_values enabled, the fielddata cache is still consumed, which I did not expect. When can this happen?

For example, here is the mapping of my test index:

curl localhost:9200/data?pretty
{
  "data" : {
    "aliases" : { },
    "mappings" : {
      "type1" : {
        "properties" : {
          "age" : {
            "type" : "integer"
          },
          "city" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "city_dv" : {
            "type" : "keyword"
          },
          "city_nondv" : {
            "type" : "keyword",
            "doc_values" : false
          },
          "city_text" : {
            "type" : "text"
          },
          "comments" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "month" : {
            "type" : "keyword"
          },
          "name" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "salary" : {
            "type" : "float"
          }
        }
      }
    },
    "settings" : {
      "index" : {
        "creation_date" : "1523053143368",
        "number_of_shards" : "1",
        "number_of_replicas" : "0",
        "uuid" : "qkzge3m8T0-r2pEcAm3reg",
        "version" : {
          "created" : "6020399"
        },
        "provided_name" : "data"
      }
    }
  }
}

Here is the query:

curl -H "Content-Type: application/json" http://localhost:9200/data/_search?pretty -d '{
    "aggs" : {
        "city_breakup" : { "terms" : { "field" : "city_dv" } }
    }
  }'

Here is the cache consumption after the query. It does reset to 0 after a while, but is it expected to see the field 'city_dv' take up fielddata cache at all?

curl "localhost:9200/_stats/fielddata?fields=*&pretty"
{
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "_all" : {
    "primaries" : {
      "fielddata" : {
        "memory_size_in_bytes" : 45128624,
        "evictions" : 0,
        "fields" : {
          "city_dv" : {
            "memory_size_in_bytes" : 45128624
          }
        }
      }
    },
    "total" : {
      "fielddata" : {
        "memory_size_in_bytes" : 45128624,
        "evictions" : 0,
        "fields" : {
          "city_dv" : {
            "memory_size_in_bytes" : 45128624
          }
        }
      }
    }
  },
  "indices" : {
    "data" : {
      "primaries" : {
        "fielddata" : {
          "memory_size_in_bytes" : 45128624,
          "evictions" : 0,
          "fields" : {
            "city_dv" : {
              "memory_size_in_bytes" : 45128624
            }
          }
        }
      },
      "total" : {
        "fielddata" : {
          "memory_size_in_bytes" : 45128624,
          "evictions" : 0,
          "fields" : {
            "city_dv" : {
              "memory_size_in_bytes" : 45128624
            }
          }
        }
      }
    }
  }
}

I encounter the same problem.

The fielddata setting you choose for a field in the mapping is only one of several contributors to the overall fielddata accounting that these stats report.

Even if you do not enable fielddata in your mapping, certain actions can cause other field-related structures to be loaded into memory. For example, a terms aggregation may build a global ordinals map under the covers to service the request. These costs fall under the umbrella of fielddata-related stats.
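One way to check whether global ordinals explain the number is to run the same terms aggregation with each execution hint and compare the fielddata stats in between. The following is only a rough sketch: it assumes a local node at localhost:9200 and the index and field names from the question, and the helper function names are made up for illustration. The "map" hint aggregates on the term values directly, so it should not need a global ordinals map, though it can be slower on high-cardinality fields.

```shell
#!/bin/sh
# Assumption: a local node, plus the index ("data") and field ("city_dv")
# from the question. Function names are illustrative only.
ES_URL="${ES_URL:-http://localhost:9200}"

# Print a terms aggregation body on city_dv with the given execution hint.
# "global_ordinals" is the usual default for keyword fields; "map" works on
# the term values directly instead of building global ordinals.
terms_agg_body() {
  printf '{"size":0,"aggs":{"city_breakup":{"terms":{"field":"city_dv","execution_hint":"%s"}}}}' "$1"
}

# Run the aggregation with the given execution hint.
run_terms_agg() {
  curl -s -H "Content-Type: application/json" \
    "$ES_URL/data/_search?pretty" -d "$(terms_agg_body "$1")"
}

# Show the fielddata accounting for city_dv on each node.
show_fielddata() {
  curl -s "$ES_URL/_cat/fielddata?v&fields=city_dv"
}

# Clear the fielddata cache for the index so each measurement starts clean.
clear_fielddata() {
  curl -s -X POST "$ES_URL/data/_cache/clear?fielddata=true&pretty"
}
```

To compare, run `clear_fielddata; run_terms_agg map; show_fielddata` and then `clear_fielddata; run_terms_agg global_ordinals; show_fielddata`. If the size reported for city_dv only grows in the second case, the usage is most likely the global ordinals map rather than classic fielddata.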
