Querying and Dealing with Extremely Large Documents

We are in a situation where extremely large documents were indexed (into text fields), and our Elasticsearch instance has recently been going down with out-of-disk and out-of-memory errors. We've also run into HTTP response errors when simply viewing the documents.

As I understand it, there is no way to limit a text field in the mapping, only keyword fields. Is that right?

How can we limit text fields to, say, 10,000 bytes?

How can we query for large documents above a certain size?

What does the query you are sending to Elasticsearch look like?

Any query which returns these large documents causes problems.

I am now trying a _reindex with a truncating script, but nothing is working in Painless today.

{
  "source": {
    "index": "logs-{{system}}-000071"
  },
  "dest": {
    "index": "logs-{{system}}-000071-bak"
  },
  "script": {
    "source": "if (ctx._source.raw != null && ctx._source.raw.length > 10000) ctx._source.raw = ctx._source.raw.substring(0, 10000);",
    "lang": "painless"
  }
}

Gives:

if (ctx._source.raw != null && ctx._source.raw.length > 10000) 
                                              ^---- HERE

Trying:

if (ctx._source.raw != null) ctx._source.raw = ctx._source.raw.substring(0, 10000);

Gives:

java.base/java.lang.String.checkBoundsBeginEnd(String.java:3756)
java.base/java.lang.String.substring(String.java:1902)
ctx._source.raw = ctx._source.raw.substring(0, 10000);
                                 ^---- HERE

Ah, it needs to be .length() with parentheses. This is Painless, not JS!

if (ctx._source.raw != null && ctx._source.raw.length() > 10000) ctx._source.raw = ctx._source.raw.substring(0, 10000);
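For anyone landing here later, this is a rough sketch of how the full request could be assembled around the corrected script. The function names and the optional batch_size argument are my own additions, not something from the request above. Note that .length() counts characters, not bytes, so a field cut to 10000 characters can still be larger than 10000 bytes when it contains non-ASCII text. Setting "size" under "source" makes reindex pull smaller batches, which may (an assumption, not tested here) ease memory pressure when the documents are huge.

```python
import json

MAX_CHARS = 10000  # limit used in the Painless script above (characters, not bytes)


def build_truncating_reindex(source_index, dest_index, field="raw",
                             max_chars=MAX_CHARS, batch_size=None):
    """Build a _reindex request body that truncates `field` to `max_chars`."""
    script = (
        f"if (ctx._source.{field} != null && ctx._source.{field}.length() > {max_chars}) "
        f"ctx._source.{field} = ctx._source.{field}.substring(0, {max_chars});"
    )
    source = {"index": source_index}
    if batch_size is not None:
        # Optional: number of documents fetched per batch (hypothetical tuning knob here).
        source["size"] = batch_size
    return {
        "source": source,
        "dest": {"index": dest_index},
        "script": {"source": script, "lang": "painless"},
    }


def truncate(value, max_chars=MAX_CHARS):
    """Local mirror of the Painless guard, handy for checking the logic.

    Python counts code points while Java counts UTF-16 units; the two agree
    for text without characters outside the Basic Multilingual Plane.
    """
    if value is not None and len(value) > max_chars:
        return value[:max_chars]
    return value


def utf8_bytes(value):
    """Size of a string once encoded as UTF-8."""
    return len(value.encode("utf-8"))


if __name__ == "__main__":
    body = build_truncating_reindex(
        "logs-{{system}}-000071", "logs-{{system}}-000071-bak", batch_size=100
    )
    print(json.dumps(body, indent=2))
```

The printed JSON is what would go in the body of POST /_reindex.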

The reindex is taking forever, and the task is reporting:

{
  "error": {
    "bytes_limit": 8160437862,
    "bytes_wanted": 8183228176,
    "durability": "TRANSIENT",
    "reason": "[parent] Data too large, data for [] would be [8183228176/7.6gb], which is larger than the limit of [8160437862/7.5gb], real usage: [8183228176/7.6gb], new bytes reserved: [0/0b], usages [request=0/0b, fielddata=54818544/52.2mb, in_flight_requests=245930012/234.5mb, accounting=48907372/46.6mb]",
    "root_cause": [
      {
        "bytes_limit": 8160437862,
        "bytes_wanted": 8183228176,
        "durability": "TRANSIENT",
        "reason": "[parent] Data too large, data for [] would be [8183228176/7.6gb], which is larger than the limit of [8160437862/7.5gb], real usage: [8183228176/7.6gb], new bytes reserved: [0/0b], usages [request=0/0b, fielddata=54818544/52.2mb, in_flight_requests=245930012/234.5mb, accounting=48907372/46.6mb]",
        "type": "circuit_breaking_exception"
      }
    ],
    "type": "circuit_breaking_exception"
  },
  "status": 429
}
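To make sense of the numbers in that error, here is a small arithmetic sketch. It assumes the 8gb heap_max_in_bytes value from the JVM stats posted further down the thread; the variable and function names are mine. The limit works out to 95% of the heap, which matches the default parent breaker limit when real memory tracking is in use (the "real usage" wording in the message suggests it is).

```python
HEAP_MAX_BYTES = 8589934592  # heap_max_in_bytes reported by the node (8gb)
BYTES_LIMIT = 8160437862     # bytes_limit from the circuit_breaking_exception
BYTES_WANTED = 8183228176    # bytes_wanted from the circuit_breaking_exception


def breaker_summary(heap_max, limit, wanted):
    """Relate the parent breaker limit to the heap and show the overshoot."""
    overshoot = wanted - limit
    return {
        "limit_fraction_of_heap": round(limit / heap_max, 4),
        "overshoot_bytes": overshoot,
        "overshoot_mb": round(overshoot / 1024 ** 2, 1),
    }


if __name__ == "__main__":
    print(breaker_summary(HEAP_MAX_BYTES, BYTES_LIMIT, BYTES_WANTED))
    # {'limit_fraction_of_heap': 0.95, 'overshoot_bytes': 22790314, 'overshoot_mb': 21.7}
```

In other words, the heap was already about 22 MB over the limit before the request reserved anything (new bytes reserved is 0), so the node is short on heap in general rather than one request being too big.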

Can you get the output from the _cluster/stats?pretty&human API?
What do hot threads and the slow log show on the node?

jvm:

 "mem": {
              "heap_used": "3.3gb",
              "heap_used_in_bytes": 3645778976,
              "heap_max": "8gb",
              "heap_max_in_bytes": 8589934592
          },
          "threads": 209

Slow log is empty.

We have a real issue now, as one index is inaccessible. I can't even reindex it, trimming the fields as I go, because it's unallocated.

POST /_reindex
{
    "error": {
        "root_cause": [],
        "type": "search_phase_execution_exception",
        "reason": "",
        "phase": "query",
        "grouped": true,
        "failed_shards": [],
        "caused_by": {
            "type": "search_phase_execution_exception",
            "reason": "Search rejected due to missing shards [[logs-prod-000072][0]]. Consider using `allow_partial_search_results` setting to bypass this error.",
            "phase": "query",
            "grouped": true,
            "failed_shards": []
        }
    },
    "status": 503
}

The full output would be useful, please.

{
    "_nodes": {
        "total": 1,
        "successful": 1,
        "failed": 0
    },
    "cluster_name": "elasticsearch",
    "cluster_uuid": "D0Wrr_RiTAerzQ_jqVeKfw",
    "timestamp": 1632290871722,
    "status": "red",
    "indices": {
        "count": 443,
        "shards": {
            "total": 443,
            "primaries": 443,
            "replication": 0.0,
            "index": {
                "shards": {
                    "min": 1,
                    "max": 1,
                    "avg": 1.0
                },
                "primaries": {
                    "min": 1,
                    "max": 1,
                    "avg": 1.0
                },
                "replication": {
                    "min": 0.0,
                    "max": 0.0,
                    "avg": 0.0
                }
            }
        },
        "docs": {
            "count": 1903528844,
            "deleted": 12094356
        },
        "store": {
            "size": "417.3gb",
            "size_in_bytes": 448150240168
        },
        "fielddata": {
            "memory_size": "72.8mb",
            "memory_size_in_bytes": 76429840,
            "evictions": 0
        },
        "query_cache": {
            "memory_size": "88.7mb",
            "memory_size_in_bytes": 93066813,
            "total_count": 416337,
            "hit_count": 77843,
            "miss_count": 338494,
            "cache_size": 6832,
            "cache_count": 8383,
            "evictions": 1551
        },
        "completion": {
            "size": "0b",
            "size_in_bytes": 0
        },
        "segments": {
            "count": 6751,
            "memory": "46.9mb",
            "memory_in_bytes": 49226116,
            "terms_memory": "27.7mb",
            "terms_memory_in_bytes": 29056864,
            "stored_fields_memory": "11.3mb",
            "stored_fields_memory_in_bytes": 11926544,
            "term_vectors_memory": "0b",
            "term_vectors_memory_in_bytes": 0,
            "norms_memory": "988.8kb",
            "norms_memory_in_bytes": 1012544,
            "points_memory": "0b",
            "points_memory_in_bytes": 0,
            "doc_values_memory": "6.8mb",
            "doc_values_memory_in_bytes": 7230164,
            "index_writer_memory": "163.7mb",
            "index_writer_memory_in_bytes": 171663768,
            "version_map_memory": "6.2mb",
            "version_map_memory_in_bytes": 6536624,
            "fixed_bit_set": "3.2mb",
            "fixed_bit_set_memory_in_bytes": 3402232,
            "max_unsafe_auto_id_timestamp": 1632210708253,
            "file_sizes": {}
        },
        "mappings": {
            "field_types": [
                {
                    "name": "binary",
                    "count": 7,
                    "index_count": 3
                },
                {
                    "name": "boolean",
                    "count": 123,
                    "index_count": 27
                },
                {
                    "name": "byte",
                    "count": 1,
                    "index_count": 1
                },
                {
                    "name": "date",
                    "count": 590,
                    "index_count": 440
                },
                {
                    "name": "double",
                    "count": 3,
                    "index_count": 1
                },
                {
                    "name": "flattened",
                    "count": 2,
                    "index_count": 2
                },
                {
                    "name": "float",
                    "count": 82,
                    "index_count": 12
                },
                {
                    "name": "geo_point",
                    "count": 124,
                    "index_count": 124
                },
                {
                    "name": "geo_shape",
                    "count": 4,
                    "index_count": 4
                },
                {
                    "name": "half_float",
                    "count": 56,
                    "index_count": 14
                },
                {
                    "name": "integer",
                    "count": 1027,
                    "index_count": 335
                },
                {
                    "name": "ip",
                    "count": 328,
                    "index_count": 279
                },
                {
                    "name": "keyword",
                    "count": 8323,
                    "index_count": 442
                },
                {
                    "name": "long",
                    "count": 2340,
                    "index_count": 438
                },
                {
                    "name": "nested",
                    "count": 38,
                    "index_count": 11
                },
                {
                    "name": "object",
                    "count": 1426,
                    "index_count": 162
                },
                {
                    "name": "scaled_float",
                    "count": 1,
                    "index_count": 1
                },
                {
                    "name": "short",
                    "count": 127,
                    "index_count": 126
                },
                {
                    "name": "text",
                    "count": 1647,
                    "index_count": 426
                }
            ]
        },
        "analysis": {
            "char_filter_types": [],
            "tokenizer_types": [],
            "filter_types": [],
            "analyzer_types": [],
            "built_in_char_filters": [],
            "built_in_tokenizers": [],
            "built_in_filters": [],
            "built_in_analyzers": []
        }
    },
    "nodes": {
        "count": {
            "total": 1,
            "coordinating_only": 0,
            "data": 1,
            "ingest": 1,
            "master": 1,
            "ml": 1,
            "remote_cluster_client": 1,
            "transform": 1,
            "voting_only": 0
        },
        "versions": [
            "7.7.0"
        ],
        "os": {
            "available_processors": 8,
            "allocated_processors": 8,
            "names": [
                {
                    "name": "Windows Server 2012 R2",
                    "count": 1
                }
            ],
            "pretty_names": [
                {
                    "pretty_name": "Windows Server 2012 R2",
                    "count": 1
                }
            ],
            "mem": {
                "total": "31.9gb",
                "total_in_bytes": 34359136256,
                "free": "4.2gb",
                "free_in_bytes": 4591697920,
                "used": "27.7gb",
                "used_in_bytes": 29767438336,
                "free_percent": 13,
                "used_percent": 87
            }
        },
        "process": {
            "cpu": {
                "percent": 2
            },
            "open_file_descriptors": {
                "min": -1,
                "max": -1,
                "avg": 0
            }
        },
        "jvm": {
            "max_uptime": "6.6d",
            "max_uptime_in_millis": 572732911,
            "versions": [
                {
                    "version": "14",
                    "vm_name": "OpenJDK 64-Bit Server VM",
                    "vm_version": "14+36",
                    "vm_vendor": "AdoptOpenJDK",
                    "bundled_jdk": true,
                    "using_bundled_jdk": true,
                    "count": 1
                }
            ],
            "mem": {
                "heap_used": "6.5gb",
                "heap_used_in_bytes": 7009845720,
                "heap_max": "8gb",
                "heap_max_in_bytes": 8589934592
            },
            "threads": 211
        },
        "fs": {
            "total": "549.9gb",
            "total_in_bytes": 590554853376,
            "free": "75.3gb",
            "free_in_bytes": 80886726656,
            "available": "75.3gb",
            "available_in_bytes": 80886726656
        },
        "plugins": [],
        "network_types": {
            "transport_types": {
                "security4": 1
            },
            "http_types": {
                "security4": 1
            }
        },
        "discovery_types": {
            "zen": 1
        },
        "packaging_types": [
            {
                "flavor": "default",
                "type": "zip",
                "count": 1
            }
        ],
        "ingest": {
            "number_of_pipelines": 18,
            "processor_stats": {
                "conditional": {
                    "count": 0,
                    "failed": 0,
                    "current": 0,
                    "time": "0s",
                    "time_in_millis": 0
                },
                "convert": {
                    "count": 0,
                    "failed": 0,
                    "current": 0,
                    "time": "0s",
                    "time_in_millis": 0
                },
                "date": {
                    "count": 0,
                    "failed": 0,
                    "current": 0,
                    "time": "0s",
                    "time_in_millis": 0
                },
                "dot_expander": {
                    "count": 0,
                    "failed": 0,
                    "current": 0,
                    "time": "0s",
                    "time_in_millis": 0
                },
                "geoip": {
                    "count": 0,
                    "failed": 0,
                    "current": 0,
                    "time": "0s",
                    "time_in_millis": 0
                },
                "grok": {
                    "count": 0,
                    "failed": 0,
                    "current": 0,
                    "time": "0s",
                    "time_in_millis": 0
                },
                "gsub": {
                    "count": 0,
                    "failed": 0,
                    "current": 0,
                    "time": "0s",
                    "time_in_millis": 0
                },
                "json": {
                    "count": 0,
                    "failed": 0,
                    "current": 0,
                    "time": "0s",
                    "time_in_millis": 0
                },
                "lowercase": {
                    "count": 0,
                    "failed": 0,
                    "current": 0,
                    "time": "0s",
                    "time_in_millis": 0
                },
                "pipeline": {
                    "count": 0,
                    "failed": 0,
                    "current": 0,
                    "time": "0s",
                    "time_in_millis": 0
                },
                "remove": {
                    "count": 0,
                    "failed": 0,
                    "current": 0,
                    "time": "0s",
                    "time_in_millis": 0
                },
                "rename": {
                    "count": 0,
                    "failed": 0,
                    "current": 0,
                    "time": "0s",
                    "time_in_millis": 0
                },
                "script": {
                    "count": 0,
                    "failed": 0,
                    "current": 0,
                    "time": "0s",
                    "time_in_millis": 0
                },
                "set": {
                    "count": 0,
                    "failed": 0,
                    "current": 0,
                    "time": "0s",
                    "time_in_millis": 0
                },
                "split": {
                    "count": 0,
                    "failed": 0,
                    "current": 0,
                    "time": "0s",
                    "time_in_millis": 0
                },
                "user_agent": {
                    "count": 0,
                    "failed": 0,
                    "current": 0,
                    "time": "0s",
                    "time_in_millis": 0
                }
            }
        }
    }
}

That's a lot of shards for 8gb of heap. In general, we recommend no more than 20 shards per gb of heap.
So that means no more than 160 shards in your case. You have 443 shards here.

I'd suggest first increasing the heap size to 16gb, as you have 32gb of RAM on your machine.

That could help.
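The rule of thumb is easy to put into numbers. This is just the arithmetic from this post written out; the function names are made up for the example.

```python
SHARDS_PER_GB_HEAP = 20  # rule of thumb quoted above
CURRENT_SHARDS = 443     # shards.total from the _cluster/stats output


def shard_budget(heap_gb, shards_per_gb=SHARDS_PER_GB_HEAP):
    """Recommended maximum number of shards for a given heap size in GB."""
    return heap_gb * shards_per_gb


def shards_over_budget(current_shards, heap_gb):
    """How many shards exceed the recommendation (0 if within budget)."""
    return max(0, current_shards - shard_budget(heap_gb))


if __name__ == "__main__":
    for heap_gb in (8, 16):
        print(heap_gb, shard_budget(heap_gb), shards_over_budget(CURRENT_SHARDS, heap_gb))
    # 8 160 283
    # 16 320 123
```

So even with a 16gb heap the node would still be above the guideline, which suggests reducing the shard count (deleting, closing or consolidating old indices) alongside the heap increase.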

Can I unallocate some indices temporarily to get this one large index back online for a cleanup?

Use the _close API.
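As a sketch, the close and open calls are plain POST requests on the index name. The helper names and the index names below are placeholders, not indices from this cluster. Keep in mind that a closed index stays on disk and cannot be searched or written to until it is opened again.

```python
from urllib.parse import quote


def close_request(index):
    """Method and path for closing an index."""
    return ("POST", f"/{quote(index, safe='')}/_close")


def open_request(index):
    """Method and path for reopening a previously closed index."""
    return ("POST", f"/{quote(index, safe='')}/_open")


if __name__ == "__main__":
    # Hypothetical index names, for illustration only.
    for name in ("logs-example-000001", "logs-example-000002"):
        method, path = close_request(name)
        print(method, path)
```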

OK, will it also help to delete unwanted documents from indices? My feeling is ES doesn't really delete (docs still appear as docs.deleted in _cat/indices).

Also, how do I know when I've deleted/_closed enough? Will my index automatically attach?

Not at all.

If you are removing documents (and not whole indices), then you need to force merge the index if you want the disk space to be reclaimed.

POST /INDEXNAME/_forcemerge?max_num_segments=1

Or

POST /INDEXNAME/_forcemerge?only_expunge_deletes=true

See the Force merge API page in the Elasticsearch Guide [7.14].
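Here is a rough sketch of building either force merge call and of checking how large the share of deleted documents actually is. The helper names are mine. The row format assumes the output of GET _cat/indices?format=json&h=index,docs.count,docs.deleted, where the counts come back as strings. One thing to keep in mind: only_expunge_deletes only rewrites segments whose percentage of deleted documents is above index.merge.policy.expunge_deletes_allowed (10 by default), so it can leave segments with few deletes untouched.

```python
from urllib.parse import urlencode


def forcemerge_request(index, **params):
    """Method and path for a force merge; booleans are lower-cased for the URL."""
    cleaned = {
        key: str(value).lower() if isinstance(value, bool) else value
        for key, value in params.items()
    }
    path = f"/{index}/_forcemerge"
    query = urlencode(cleaned)
    return ("POST", f"{path}?{query}" if query else path)


def deleted_ratio(row):
    """Share of deleted docs among all docs (live + deleted) for one _cat row."""
    live = int(row["docs.count"])
    deleted = int(row["docs.deleted"])
    total = live + deleted
    return deleted / total if total else 0.0


if __name__ == "__main__":
    print(forcemerge_request("INDEXNAME", max_num_segments=1))
    print(forcemerge_request("INDEXNAME", only_expunge_deletes=True))
    # Cluster-wide doc counts from the _cluster/stats output earlier in the
    # thread, used here only as sample input (a real row is per index).
    sample = {"index": "all", "docs.count": "1903528844", "docs.deleted": "12094356"}
    print(round(deleted_ratio(sample), 4))
```

With the cluster-wide counts posted earlier, the deleted share is well under 1% overall, although individual indices and segments can of course be much higher or lower.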

OK, _forcemerge?only_expunge_deletes=true does nothing at all to decrease the docs.deleted count, but _forcemerge?max_num_segments=1 certainly does.
Thanks