Data size on disk increases 15 times when moved from Hive to Elasticsearch

I cannot log on to the node. Why does the request not work in Postman/Insomnia? It does not report a timeout there. Is there anything else I can do to check the status?

I have not used the analyze index disk usage API, but would expect it to need to read a lot of data from disk. What kind of storage does your node have? Is it local SSD? What do disk I/O and CPU usage look like when you run the API? Are there any messages about long or frequent GC in the logs?

Yes, there are multiple GC entries in the logs. We use mount points for data storage. I have added a new node to this cluster, but that still does not resolve the data size multiplying by at least 15 times.

There is a bit of a catch here: the _disk_usage API would be the best way to find out what is causing the disk space usage, but because the request takes longer to run than the proxy's timeout, this method won't work.
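For reference, the request being discussed can be sketched as below. This is a minimal sketch that only builds the request and does not send it. The host `http://localhost:9200` and the index name `my-index` are placeholders (assumptions, not values from this thread), and `build_disk_usage_request` is a hypothetical helper name.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request


def build_disk_usage_request(base_url: str, index: str) -> Request:
    """Build (but do not send) a POST request for the analyze index disk usage API."""
    # The API only runs when run_expensive_tasks is explicitly set to true.
    query = urlencode({"run_expensive_tasks": "true"})
    url = f"{base_url.rstrip('/')}/{quote(index, safe='')}/_disk_usage?{query}"
    return Request(url, method="POST")


# Placeholder host and index name, for illustration only.
req = build_disk_usage_request("http://localhost:9200", "my-index")
print(req.get_method(), req.full_url)
```

If the request were sent with `urllib.request.urlopen(req, timeout=...)`, that `timeout` would only control the client-side socket wait. A proxy sitting in between can still drop the connection earlier, which is the limitation described above.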

I think there are 2 possible solutions for finding out what is causing the disk usage.

  1. See if the proxy timeout can be increased
    • This is the recommended solution, as _disk_usage will not be the only long-running API call in Elasticsearch, and fixing this could save you other headaches in the future. Elasticsearch has its own timeout system, so I generally find it much better to let Elasticsearch handle timeouts rather than a proxy.
  2. If changing the proxy timeout is not an option, the other option I see is starting from the ground up with the data and seeing what causes the disk usage.
    • What I mean by this is, take your Hive data and first put it into an Elasticsearch index with no mappings (this won't be very useful from a data perspective, but will at least give you a base understanding of what the data size would be without any mappings).

    • Once you have a base understanding of the data size in Elasticsearch without any mappings, slowly start creating new indices with more and more mappings and adding your data to them. As you add more mappings, you'll obviously see disk usage increase, but you will at least understand it better: if you add mappings A, B, and C, disk usage increases by X. This will allow you to do at least some level of disk usage tuning without the _disk_usage API.

    • Also, as @Christian_Dahlqvist mentioned:

      I would also recommend you go through this guide if you have not already.

      The guide should help provide an understanding of some of the fundamentals of Elasticsearch disk usage tuning.
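The second option above could be sketched roughly as follows. This is only an illustration under stated assumptions: the field names (`field_a`, `field_b`), the index labels, and the byte sizes are all made up, and `mapping_body` / `size_deltas` are hypothetical helper names. It also assumes one reading of "no mappings", namely an index created with `"dynamic": false` and no properties, so that documents are kept in `_source` but no fields are indexed.

```python
from typing import Dict, List, Tuple


def mapping_body(fields: Dict[str, dict]) -> dict:
    """Build a create-index body that maps only the given fields.

    With "dynamic": False, fields that are not listed stay in _source
    but are not indexed, so each step adds only the fields under test.
    """
    return {"mappings": {"dynamic": False, "properties": dict(fields)}}


def size_deltas(measurements: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Return the growth in bytes of each step over the previous step."""
    deltas = []
    for (_, prev_size), (label, size) in zip(measurements, measurements[1:]):
        deltas.append((label, size - prev_size))
    return deltas


# Step 0: baseline body with no mapped fields.
baseline = mapping_body({})
# Step 1: add one hypothetical field; later steps would add more.
step_a = mapping_body({"field_a": {"type": "keyword"}})

# Hypothetical store sizes (bytes) read from _stats after loading the
# same sample of data into each index; these numbers are invented.
measured = [("baseline", 1_000), ("+field_a", 1_300), ("+field_a+field_b", 1_900)]
for label, delta in size_deltas(measured):
    print(f"{label}: +{delta} bytes")
```

Loading the same sample of documents into each index keeps the comparison fair, since the only thing that changes between steps is the mapping.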

Also, another question: could you provide the output of _stats for the index in question? (This API should be faster and provide a bit more context about the index being discussed.)


Apologies for the delayed response. I was working on this, trying to change some mappings. Please find the stats below:

{"_shards":{"total":1,"successful":1,"failed":0},"_all":{"primaries":{"docs":{"count":204344332,"deleted":17},"shard_stats":{"total_count":1},"store":{"size_in_bytes":1729960022402,"total_data_set_size_in_bytes":1729960022402,"reserved_in_bytes":0},"indexing":{"index_total":0,"index_time_in_millis":0,"index_current":0,"index_failed":0,"delete_total":0,"delete_time_in_millis":0,"delete_current":0,"noop_update_total":0,"is_throttled":false,"throttle_time_in_millis":0},"get":{"total":0,"time_in_millis":0,"exists_total":0,"exists_time_in_millis":0,"missing_total":0,"missing_time_in_millis":0,"current":0},"search":{"open_contexts":0,"query_total":21924,"query_time_in_millis":7051525,"query_current":0,"fetch_total":21924,"fetch_time_in_millis":471,"fetch_current":0,"scroll_total":0,"scroll_time_in_millis":0,"scroll_current":0,"suggest_total":0,"suggest_time_in_millis":0,"suggest_current":0},"merges":{"current":0,"current_docs":0,"current_size_in_bytes":0,"total":0,"total_time_in_millis":0,"total_docs":0,"total_size_in_bytes":0,"total_stopped_time_in_millis":0,"total_throttled_time_in_millis":0,"total_auto_throttle_in_bytes":20971520},"refresh":{"total":2,"total_time_in_millis":0,"external_total":2,"external_total_time_in_millis":2,"listeners":0},"flush":{"total":1,"periodic":1,"total_time_in_millis":0},"warmer":{"current":0,"total":1,"total_time_in_millis":1},"query_cache":{"memory_size_in_bytes":0,"total_count":0,"hit_count":0,"miss_count":0,"cache_size":0,"cache_count":0,"evictions":0},"fielddata":{"memory_size_in_bytes":31232,"evictions":0},"completion":{"size_in_bytes":0},"segments":{"count":421,"memory_in_bytes":0,"terms_memory_in_bytes":0,"stored_fields_memory_in_bytes":0,"term_vectors_memory_in_bytes":0,"norms_memory_in_bytes":0,"points_memory_in_bytes":0,"doc_values_memory_in_bytes":0,"index_writer_memory_in_bytes":0,"version_map_memory_in_bytes":0,"fixed_bit_set_memory_in_bytes":0,"max_unsafe_auto_id_timestamp":-1,"file_sizes":{}},"translog":{"operations":0,"size
_in_bytes":55,"uncommitted_operations":0,"uncommitted_size_in_bytes":55,"earliest_last_modified_age":522483726},"request_cache":{"memory_size_in_bytes":1107928,"evictions":0,"hit_count":19696,"miss_count":1805},"recovery":{"current_as_source":0,"current_as_target":0,"throttle_time_in_millis":0},"bulk":{"total_operations":0,"total_time_in_millis":0,"total_size_in_bytes":0,"avg_time_in_millis":0,"avg_size_in_bytes":0}},"total":{"docs":{"count":204344332,"deleted":17},"shard_stats":{"total_count":1},"store":{"size_in_bytes":1729960022402,"total_data_set_size_in_bytes":1729960022402,"reserved_in_bytes":0},"indexing":{"index_total":0,"index_time_in_millis":0,"index_current":0,"index_failed":0,"delete_total":0,"delete_time_in_millis":0,"delete_current":0,"noop_update_total":0,"is_throttled":false,"throttle_time_in_millis":0},"get":{"total":0,"time_in_millis":0,"exists_total":0,"exists_time_in_millis":0,"missing_total":0,"missing_time_in_millis":0,"current":0},"search":{"open_contexts":0,"query_total":21924,"query_time_in_millis":7051525,"query_current":0,"fetch_total":21924,"fetch_time_in_millis":471,"fetch_current":0,"scroll_total":0,"scroll_time_in_millis":0,"scroll_current":0,"suggest_total":0,"suggest_time_in_millis":0,"suggest_current":0},"merges":{"current":0,"current_docs":0,"current_size_in_bytes":0,"total":0,"total_time_in_millis":0,"total_docs":0,"total_size_in_bytes":0,"total_stopped_time_in_millis":0,"total_throttled_time_in_millis":0,"total_auto_throttle_in_bytes":20971520},"refresh":{"total":2,"total_time_in_millis":0,"external_total":2,"external_total_time_in_millis":2,"listeners":0},"flush":{"total":1,"periodic":1,"total_time_in_millis":0},"warmer":{"current":0,"total":1,"total_time_in_millis":1},"query_cache":{"memory_size_in_bytes":0,"total_count":0,"hit_count":0,"miss_count":0,"cache_size":0,"cache_count":0,"evictions":0},"fielddata":{"memory_size_in_bytes":31232,"evictions":0},"completion":{"size_in_bytes":0},"segments":{"count":421,"memory_in_bytes":0
,"terms_memory_in_bytes":0,"stored_fields_memory_in_bytes":0,"term_vectors_memory_in_bytes":0,"norms_memory_in_bytes":0,"points_memory_in_bytes":0,"doc_values_memory_in_bytes":0,"index_writer_memory_in_bytes":0,"version_map_memory_in_bytes":0,"fixed_bit_set_memory_in_bytes":0,"max_unsafe_auto_id_timestamp":-1,"file_sizes":{}},"translog":{"operations":0,"size_in_bytes":55,"uncommitted_operations":0,"uncommitted_size_in_bytes":55,"earliest_last_modified_age":522483726},"request_cache":{"memory_size_in_bytes":1107928,"evictions":0,"hit_count":19696,"miss_count":1805},"recovery":{"current_as_source":0,"current_as_target":0,"throttle_time_in_millis":0},"bulk":{"total_operations":0,"total_time_in_millis":0,"total_size_in_bytes":0,"avg_time_in_millis":0,"avg_size_in_bytes":0}}},"indices":{"dev_sample":{"uuid":"WacLhoypSYWAsJH8_Qtsag","primaries":{"docs":{"count":204344332,"deleted":17},"shard_stats":{"total_count":1},"store":{"size_in_bytes":1729960022402,"total_data_set_size_in_bytes":1729960022402,"reserved_in_bytes":0},"indexing":{"index_total":0,"index_time_in_millis":0,"index_current":0,"index_failed":0,"delete_total":0,"delete_time_in_millis":0,"delete_current":0,"noop_update_total":0,"is_throttled":false,"throttle_time_in_millis":0},"get":{"total":0,"time_in_millis":0,"exists_total":0,"exists_time_in_millis":0,"missing_total":0,"missing_time_in_millis":0,"current":0},"search":{"open_contexts":0,"query_total":21924,"query_time_in_millis":7051525,"query_current":0,"fetch_total":21924,"fetch_time_in_millis":471,"fetch_current":0,"scroll_total":0,"scroll_time_in_millis":0,"scroll_current":0,"suggest_total":0,"suggest_time_in_millis":0,"suggest_current":0},"merges":{"current":0,"current_docs":0,"current_size_in_bytes":0,"total":0,"total_time_in_millis":0,"total_docs":0,"total_size_in_bytes":0,"total_stopped_time_in_millis":0,"total_throttled_time_in_millis":0,"total_auto_throttle_in_bytes":20971520},"refresh":{"total":2,"total_time_in_millis":0,"external_total":2,"extern
al_total_time_in_millis":2,"listeners":0},"flush":{"total":1,"periodic":1,"total_time_in_millis":0},"warmer":{"current":0,"total":1,"total_time_in_millis":1},"query_cache":{"memory_size_in_bytes":0,"total_count":0,"hit_count":0,"miss_count":0,"cache_size":0,"cache_count":0,"evictions":0},"fielddata":{"memory_size_in_bytes":31232,"evictions":0},"completion":{"size_in_bytes":0},"segments":{"count":421,"memory_in_bytes":0,"terms_memory_in_bytes":0,"stored_fields_memory_in_bytes":0,"term_vectors_memory_in_bytes":0,"norms_memory_in_bytes":0,"points_memory_in_bytes":0,"doc_values_memory_in_bytes":0,"index_writer_memory_in_bytes":0,"version_map_memory_in_bytes":0,"fixed_bit_set_memory_in_bytes":0,"max_unsafe_auto_id_timestamp":-1,"file_sizes":{}},"translog":{"operations":0,"size_in_bytes":55,"uncommitted_operations":0,"uncommitted_size_in_bytes":55,"earliest_last_modified_age":522483726},"request_cache":{"memory_size_in_bytes":1107928,"evictions":0,"hit_count":19696,"miss_count":1805},"recovery":{"current_as_source":0,"current_as_target":0,"throttle_time_in_millis":0},"bulk":{"total_operations":0,"total_time_in_millis":0,"total_size_in_bytes":0,"avg_time_in_millis":0,"avg_size_in_bytes":0}},"total":{"docs":{"count":204344332,"deleted":17},"shard_stats":{"total_count":1},"store":{"size_in_bytes":1729960022402,"total_data_set_size_in_bytes":1729960022402,"reserved_in_bytes":0},"indexing":{"index_total":0,"index_time_in_millis":0,"index_current":0,"index_failed":0,"delete_total":0,"delete_time_in_millis":0,"delete_current":0,"noop_update_total":0,"is_throttled":false,"throttle_time_in_millis":0},"get":{"total":0,"time_in_millis":0,"exists_total":0,"exists_time_in_millis":0,"missing_total":0,"missing_time_in_millis":0,"current":0},"search":{"open_contexts":0,"query_total":21924,"query_time_in_millis":7051525,"query_current":0,"fetch_total":21924,"fetch_time_in_millis":471,"fetch_current":0,"scroll_total":0,"scroll_time_in_millis":0,"scroll_current":0,"suggest_total":0,"suggest
_time_in_millis":0,"suggest_current":0},"merges":{"current":0,"current_docs":0,"current_size_in_bytes":0,"total":0,"total_time_in_millis":0,"total_docs":0,"total_size_in_bytes":0,"total_stopped_time_in_millis":0,"total_throttled_time_in_millis":0,"total_auto_throttle_in_bytes":20971520},"refresh":{"total":2,"total_time_in_millis":0,"external_total":2,"external_total_time_in_millis":2,"listeners":0},"flush":{"total":1,"periodic":1,"total_time_in_millis":0},"warmer":{"current":0,"total":1,"total_time_in_millis":1},"query_cache":{"memory_size_in_bytes":0,"total_count":0,"hit_count":0,"miss_count":0,"cache_size":0,"cache_count":0,"evictions":0},"fielddata":{"memory_size_in_bytes":31232,"evictions":0},"completion":{"size_in_bytes":0},"segments":{"count":421,"memory_in_bytes":0,"terms_memory_in_bytes":0,"stored_fields_memory_in_bytes":0,"term_vectors_memory_in_bytes":0,"norms_memory_in_bytes":0,"points_memory_in_bytes":0,"doc_values_memory_in_bytes":0,"index_writer_memory_in_bytes":0,"version_map_memory_in_bytes":0,"fixed_bit_set_memory_in_bytes":0,"max_unsafe_auto_id_timestamp":-1,"file_sizes":{}},"translog":{"operations":0,"size_in_bytes":55,"uncommitted_operations":0,"uncommitted_size_in_bytes":55,"earliest_last_modified_age":522483726},"request_cache":{"memory_size_in_bytes":1107928,"evictions":0,"hit_count":19696,"miss_count":1805},"recovery":{"current_as_source":0,"current_as_target":0,"throttle_time_in_millis":0},"bulk":{"total_operations":0,"total_time_in_millis":0,"total_size_in_bytes":0,"avg_time_in_millis":0,"avg_size_in_bytes":0}}}}}
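For a quick reading of the figures above, here is a small sketch that derives the average on-disk size per document. The three values are copied from the `primaries` section of the stats; the factor of 15 is taken from the thread title and is only an assumption used to estimate what the same data might occupy per row in Hive.

```python
# Values copied from the _stats output above (index dev_sample, primaries).
stats = {
    "docs": {"count": 204344332, "deleted": 17},
    "store": {"size_in_bytes": 1729960022402},
    "segments": {"count": 421},
}

# Assumption: the "15 times" growth reported in the thread title.
ASSUMED_GROWTH_FACTOR = 15


def bytes_per_doc(index_stats: dict) -> float:
    """Average store size per live document, in bytes."""
    return index_stats["store"]["size_in_bytes"] / index_stats["docs"]["count"]


per_doc = bytes_per_doc(stats)
size_tib = stats["store"]["size_in_bytes"] / 2**40
implied_source_per_doc = per_doc / ASSUMED_GROWTH_FACTOR

print(f"store size: {size_tib:.2f} TiB")
print(f"average per document: {per_doc:.0f} bytes")
print(f"implied source size per row (assuming 15x): {implied_source_per_doc:.0f} bytes")
```

This works out to roughly 8,466 bytes per document in a store of about 1.73 TB (about 1.57 TiB), all held in a single shard with 421 segments.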

Also, the problem I am facing because of this amount of disk usage is the parent circuit breaker exception (data size too large).

Do you think you could reproduce this with similar mappings and data that you can share publicly?
