Why do I see fielddata in aggregations when doc_values is enabled?


#1

According to the Elastic documentation, every field type except text (an analyzed string) supports doc_values. I assumed that, when doc_values are available, aggregations would avoid fielddata completely.

However, this is not the case for me. Whenever I run a terms aggregation on a keyword or ip field, I see it loaded as fielddata, although this does not happen for other types (e.g. session_id, which is a long in this case).

Is this the correct behavior? If so, how can I prevent fielddata from being created?

I'm using Elasticsearch 6.5, and these are my index settings and mapping:

{
  "settings": {
    "index": {
      "number_of_shards": 2,
      "number_of_replicas": 0,
      "codec": "best_compression"
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "time": {
          "type": "date",
          "format": "epoch_millis"
        },
        "session_token": {
          "type": "keyword"
        },
        "session_ref": {
          "type": "keyword"
        },
        "session_id": {
          "type": "long"
        },
        "src": {
          "type": "ip"
        },
        "version": {
          "type": "byte"
        }
      }
    }
  }
}

This is a sample aggregation that causes fielddata to be loaded:

GET test_ind/_search?size=0
{
  "aggs": {
    "by_token": {
      "terms": {
        "field": "session_token",
        "size": 100
      }
    }
  }
}

And here are the fielddata stats after the aggregation:

"test_ind" : {
  "uuid" : "DiB6d7EgSXm7jeiSgoo-mQ",
  "primaries" : {
    "fielddata" : {
      "memory_size_in_bytes" : 1564696,
      "evictions" : 0,
      "fields" : {
        "session_ref" : {
          "memory_size_in_bytes" : 0
        },
        "session_token" : {
          "memory_size_in_bytes" : 1564696
        }
      }
    }
  },
  "total" : {
    "fielddata" : {
      "memory_size_in_bytes" : 1564696,
      "evictions" : 0,
      "fields" : {
        "session_ref" : {
          "memory_size_in_bytes" : 0
        },
        "session_token" : {
          "memory_size_in_bytes" : 1564696
        }
      }
    }
  }
}
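To read the stats above, it helps to check how the total splits across fields. The following is a minimal Python sketch, assuming the stats were fetched with something like GET test_ind/_stats/fielddata?fields=* and pasted in as a dict; per_field_share is a hypothetical helper, not part of any Elasticsearch client. It only reuses the numbers shown above.

```python
# Fielddata stats copied from the "total" section of the output above.
stats = {
    "memory_size_in_bytes": 1564696,
    "evictions": 0,
    "fields": {
        "session_ref": {"memory_size_in_bytes": 0},
        "session_token": {"memory_size_in_bytes": 1564696},
    },
}


def per_field_share(fielddata):
    """Hypothetical helper: return {field: fraction of total fielddata memory}."""
    total = fielddata["memory_size_in_bytes"]
    return {
        name: (info["memory_size_in_bytes"] / total if total else 0.0)
        for name, info in fielddata["fields"].items()
    }


shares = per_field_share(stats)
for name, share in sorted(shares.items()):
    print(f"{name}: {share:.0%}")
```

All of the reported fielddata memory belongs to session_token, the field that was just aggregated on, while session_ref has an entry but no memory.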

And here are the segment stats:

"test_ind" : {
  "uuid" : "DiB6d7EgSXm7jeiSgoo-mQ",
  "primaries" : {
    "segments" : {
      "count" : 8,
      "memory_in_bytes" : 472939,
      "terms_memory_in_bytes" : 423365,
      "stored_fields_memory_in_bytes" : 3504,
      "term_vectors_memory_in_bytes" : 0,
      "norms_memory_in_bytes" : 0,
      "points_memory_in_bytes" : 41598,
      "doc_values_memory_in_bytes" : 4472,
      "index_writer_memory_in_bytes" : 0,
      "version_map_memory_in_bytes" : 0,
      "fixed_bit_set_memory_in_bytes" : 0,
      "max_unsafe_auto_id_timestamp" : -1,
      "file_sizes" : { }
    }
  },
  "total" : {
    "segments" : {
      "count" : 8,
      "memory_in_bytes" : 472939,
      "terms_memory_in_bytes" : 423365,
      "stored_fields_memory_in_bytes" : 3504,
      "term_vectors_memory_in_bytes" : 0,
      "norms_memory_in_bytes" : 0,
      "points_memory_in_bytes" : 41598,
      "doc_values_memory_in_bytes" : 4472,
      "index_writer_memory_in_bytes" : 0,
      "version_map_memory_in_bytes" : 0,
      "fixed_bit_set_memory_in_bytes" : 0,
      "max_unsafe_auto_id_timestamp" : -1,
      "file_sizes" : { }
    }
  }
}
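A quick comparison of the two outputs may also help. The sketch below divides the fielddata total by the segment-level heap figures; the numbers are copied from the stats above. The interpretation is an assumption on my part: as far as I understand, doc_values_memory_in_bytes only counts the small on-heap part of doc values (the data itself is read from disk through the file system cache), so an on-heap structure hundreds of times larger is unlikely to be the doc values data itself.

```python
# Values copied from the "total" sections of the two stats outputs above.
fielddata_bytes = 1564696   # total.fielddata.memory_size_in_bytes
doc_values_bytes = 4472     # total.segments.doc_values_memory_in_bytes
segments_bytes = 472939     # total.segments.memory_in_bytes

# How many times larger the fielddata entry is than each segment-level figure.
ratio_dv = fielddata_bytes / doc_values_bytes
ratio_seg = fielddata_bytes / segments_bytes

print(f"fielddata vs doc values heap: {ratio_dv:.0f}x")
print(f"fielddata vs all segment heap: {ratio_seg:.1f}x")
```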

(Mark Harwood) #2

See the discussion on global ordinals, "Global ordinals performance and size on-heap". From there:

"They are monitored in the fielddata stats, we could have a dedicated section but if you don't have any text field that loads fielddata then the memory reported can be attributed to global ordinals entirely."
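To make the global ordinals idea concrete, here is a simplified Python model, not Elasticsearch's or Lucene's actual implementation. Doc values for keyword fields store, per segment, a sorted term dictionary, and documents refer to terms by a per-segment ordinal. To merge terms-aggregation buckets across segments, a shard-level mapping from each segment's local ordinals to global ordinals is built on heap, and that mapping is what shows up under fielddata. A long field has no term ordinals, which is consistent with session_id not appearing. The function name build_global_ordinals and the sample terms are hypothetical; the real structure Lucene builds is heavily compressed.

```python
from typing import Dict, List


def build_global_ordinals(segment_terms: List[List[str]]) -> List[Dict[int, int]]:
    """Map each segment's local ordinals to global ordinals (simplified model).

    Local ordinal i in a segment means "the i-th smallest distinct term in that
    segment"; a global ordinal is the rank of the term across all segments.
    """
    # Union of all segment term dictionaries, sorted: position = global ordinal.
    global_terms = sorted({t for terms in segment_terms for t in terms})
    global_ord = {term: i for i, term in enumerate(global_terms)}

    mappings = []
    for terms in segment_terms:
        local_sorted = sorted(set(terms))  # this segment's term dictionary
        mappings.append({local: global_ord[term] for local, term in enumerate(local_sorted)})
    return mappings


# Two hypothetical segments holding session_token values.
segments = [["tok-b", "tok-d"], ["tok-a", "tok-b", "tok-c"]]
mapping = build_global_ordinals(segments)
print(mapping)
```

The memory of this mapping grows with the number of distinct terms and segments, which fits a high-cardinality field such as session_token dominating the fielddata stats.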


#3

Thanks for the clarification.