Using the Scroll API causes Out of Memory errors

I am trying to use scroll to retrieve over 41 million documents (156 GB), but it fails with an out-of-memory error after fetching about 410,000 documents.

I am using a batch size of 5,000 documents and a 30s scroll keep-alive (I have also tried changing the batch size).

It'd help if you shared a bit more info, please:

What does the scroll request look like?
What is the actual error from the Elasticsearch log?
What is the output from the _cluster/stats?pretty&human API?

I am using the query below:

    client.ConnectionSettings.QueryStringParameters["scroll"] = "30s";

    var searchResults = client.Search<Call>(s => s
        .Size(5000)
        .Index("***")
        .Aggregations(a => a
            .Terms("xxx", t => t
                .Field(f => f.xxx)
                .Include(".*xxx.*")
                .MinimumDocumentCount(2)
            )
        )
    );

and looping in code using the scroll ID returned by each response; a simplified sketch of the loop is below.
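For context, the loop follows the standard NEST scroll pattern. The sketch below is simplified (ProcessBatch is a placeholder for our per-batch processing, and it assumes using System.Linq; and using Nest; are in scope):

    // Continue from the initial search above: fetch each following page with
    // the scroll ID returned by the previous response.
    var response = searchResults;
    while (response.Documents.Any())
    {
        ProcessBatch(response.Documents);   // placeholder per-batch processing

        // Request the next 5,000 documents, keeping the context alive for 30s.
        response = client.Scroll<Call>("30s", response.ScrollId);
    }

    // Release the scroll context on the server once finished.
    client.ClearScroll(c => c.ScrollId(response.ScrollId));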
I get the following error:

Elasticsearch.Net.UnexpectedElasticsearchClientException
  HResult=0x80131500
  Message=Exception of type 'System.OutOfMemoryException' was thrown.
  Source=Elasticsearch.Net
  StackTrace:
   at Elasticsearch.Net.Transport`1.Request[TResponse](HttpMethod method, String path, PostData data, IRequestParameters requestParameters)
   at Elasticsearch.Net.ElasticLowLevelClient.DoRequest[TResponse](HttpMethod method, String path, PostData data, IRequestParameters requestParameters)
   at Nest.ElasticClient.DoRequest[TRequest,TResponse](TRequest p, IRequestParameters 


Cluster statistics:

{
  "_nodes" : {
    "total" : 3,
    "successful" : 3,
    "failed" : 0
  },
  "cluster_name" : xxx
  "cluster_uuid" : xxx
  "timestamp" : 1630897173450,
  "status" : "green",
  "indices" : {
    "count" : 39,
    "shards" : {
      "total" : 78,
      "primaries" : 39,
      "replication" : 1.0,
      "index" : {
        "shards" : {
          "min" : 2,
          "max" : 2,
          "avg" : 2.0
        },
        "primaries" : {
          "min" : 1,
          "max" : 1,
          "avg" : 1.0
        },
        "replication" : {
          "min" : 1.0,
          "max" : 1.0,
          "avg" : 1.0
        }
      }
    },
    "docs" : {
      "count" : 101874138,
      "deleted" : 1149082
    },
    "store" : {
      "size" : "735.6gb",
      "size_in_bytes" : 789849023402,
      "reserved" : "0b",
      "reserved_in_bytes" : 0
    },
    "fielddata" : {
      "memory_size" : "3.9mb",
      "memory_size_in_bytes" : 4179768,
      "evictions" : 0
    },
    "query_cache" : {
      "memory_size" : "1.6mb",
      "memory_size_in_bytes" : 1712328,
      "total_count" : 19544173,
      "hit_count" : 689508,
      "miss_count" : 18854665,
      "cache_size" : 1967,
      "cache_count" : 13638,
      "evictions" : 11671
    },
    "completion" : {
      "size" : "0b",
      "size_in_bytes" : 0
    },
    "segments" : {
      "count" : 853,
      "memory" : "29mb",
      "memory_in_bytes" : 30510924,
      "terms_memory" : "21.4mb",
      "terms_memory_in_bytes" : 22518880,
      "stored_fields_memory" : "551.8kb",
      "stored_fields_memory_in_bytes" : 565048,
      "term_vectors_memory" : "0b",
      "term_vectors_memory_in_bytes" : 0,
      "norms_memory" : "3mb",
      "norms_memory_in_bytes" : 3171264,
      "points_memory" : "0b",
      "points_memory_in_bytes" : 0,
      "doc_values_memory" : "4mb",
      "doc_values_memory_in_bytes" : 4255732,
      "index_writer_memory" : "0b",
      "index_writer_memory_in_bytes" : 0,
      "version_map_memory" : "0b",
      "version_map_memory_in_bytes" : 0,
      "fixed_bit_set" : "25.3kb",
      "fixed_bit_set_memory_in_bytes" : 25968,
      "max_unsafe_auto_id_timestamp" : 1629308868402,
      "file_sizes" : { }
    },
    "versions" : [
      {
        "version" : "7.6.1",
        "index_count" : 5,
        "primary_shard_count" : 5,
        "total_primary_size" : "209.3kb",
        "total_primary_bytes" : 214376
      },
      {
        "version" : "7.11.2",
        "index_count" : 34,
        "primary_shard_count" : 34,
        "total_primary_size" : "367.7gb",
        "total_primary_bytes" : 394840071562
      }
    ]
  },
  "nodes" : {
    "count" : {
      "total" : 3,
      "coordinating_only" : 0,
      "data" : 3,
      "data_cold" : 3,
      "data_content" : 3,
      "data_hot" : 3,
      "data_warm" : 3,
      "ingest" : 3,
      "master" : 3,
      "ml" : 3,
      "remote_cluster_client" : 3,
      "transform" : 3,
      "voting_only" : 0
    },
    "versions" : [
      "7.11.2"
    ],
    "os" : {
      "available_processors" : 24,
      "allocated_processors" : 24,
      "names" : [
        {
          "name" : "Linux",
          "count" : 3
        }
      ],
      "pretty_names" : [
        {
          "pretty_name" : "Red Hat Enterprise Linux Server 7.9 (Maipo)",
          "count" : 3
        }
      ],
      "mem" : {
        "total" : "188.2gb",
        "total_in_bytes" : 202105995264,
        "free" : "1.2gb",
        "free_in_bytes" : 1388679168,
        "used" : "186.9gb",
        "used_in_bytes" : 200717316096,
        "free_percent" : 1,
        "used_percent" : 99
      }
    },
    "process" : {
      "cpu" : {
        "percent" : 0
      },
      "open_file_descriptors" : {
        "min" : 500,
        "max" : 591,
        "avg" : 538
      }
    },
    "jvm" : {
      "max_uptime" : "18.4d",
      "max_uptime_in_millis" : 1590359798,
      "versions" : [
        {
          "version" : "15.0.1",
          "vm_name" : "OpenJDK 64-Bit Server VM",
          "vm_version" : "15.0.1+9",
          "vm_vendor" : "AdoptOpenJDK",
          "bundled_jdk" : true,
          "using_bundled_jdk" : true,
          "count" : 3
        }
      ],
      "mem" : {
        "heap_used" : "13.9gb",
        "heap_used_in_bytes" : 15015866480,
        "heap_max" : "48gb",
        "heap_max_in_bytes" : 51539607552
      },
      "threads" : 306
    },
    "fs" : {
      "total" : "2.6tb",
      "total_in_bytes" : 2897687347200,
      "free" : "1.9tb",
      "free_in_bytes" : 2107608080384,
      "available" : "1.9tb",
      "available_in_bytes" : 2107608080384
    },
    "plugins" : [
      {
        "name" : "analysis-phonetic",
        "version" : "7.11.2",
        "elasticsearch_version" : "7.11.2",
        "java_version" : "1.8",
        "description" : "The Phonetic Analysis plugin integrates phonetic token filter analysis with elasticsearch.",
        "classname" : "org.elasticsearch.plugin.analysis.AnalysisPhoneticPlugin",
        "extended_plugins" : [ ],
        "has_native_controller" : false,
        "licensed" : false,
        "type" : "isolated"
      },
      {
        "name" : "analysis-icu",
        "version" : "7.11.2",
        "elasticsearch_version" : "7.11.2",
        "java_version" : "1.8",
        "description" : "The ICU Analysis plugin integrates the Lucene ICU module into Elasticsearch, adding ICU-related analysis components.",
        "classname" : "org.elasticsearch.plugin.analysis.icu.AnalysisICUPlugin",
        "extended_plugins" : [ ],
        "has_native_controller" : false,
        "licensed" : false,
        "type" : "isolated"
      }
    ],
    "network_types" : {
      "transport_types" : {
        "security4" : 3
      },
      "http_types" : {
        "security4" : 3
      }
    },
    "discovery_types" : {
      "zen" : 3
    },
    "packaging_types" : [
      {
        "flavor" : "default",
        "type" : "rpm",
        "count" : 3
      }
    ],
    "ingest" : {
      "number_of_pipelines" : 2,
      "processor_stats" : {
        "gsub" : {
          "count" : 0,
          "failed" : 0,
          "current" : 0,
          "time" : "0s",
          "time_in_millis" : 0
        },
        "script" : {
          "count" : 0,
          "failed" : 0,
          "current" : 0,
          "time" : "0s",
          "time_in_millis" : 0
        }
      }
    }
  }
}

Please don't post pictures of text or code. They are difficult to read, impossible to search and replicate (if it's code), and some people may not even be able to see them 🙂

I have removed the error screenshot and added the error details as text.

Thanks, is there an OOM error in the Elasticsearch logs as well?

I will update with the logs soon; I need to set that up first.

I am trying to read all the data using the NEST API for data processing.

Hi,

I can see this is a client-side memory exception. You are dealing with a huge amount of data here. The NEST library itself will cause allocations for each search, but those should be short-lived in most cases and collected during the next gen-0 GC. At scale though, with a tight search loop, it's possible those are not being collected quickly enough to release the memory. What .NET runtime are you using, how much memory does your system have available, what OS are you using and how are you running the app (i.e. is it in a constrained environment such as Docker)?

How complex is your Call type? Each result will allocate an instance of that. Also, after you receive the 5,000 results, are you holding onto those beyond each search (in a List for example) or processing them before performing the next scrolled search?
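Purely as a hypothetical illustration of that distinction (this is not taken from your code): the first loop below keeps every deserialized Call instance reachable until the end, so memory grows with the full 41 million documents, while the second only ever keeps one 5,000-document batch alive.

    // Hypothetical anti-pattern: all pages accumulate in a list, so memory
    // grows with the total result set (~41M Call instances stay reachable).
    var all = new List<Call>();
    while (response.Documents.Any())
    {
        all.AddRange(response.Documents);
        response = client.Scroll<Call>("30s", response.ScrollId);
    }

    // Processing each page before the next Scroll call keeps only ~5,000
    // Call instances reachable at any one time.
    while (response.Documents.Any())
    {
        ProcessBatch(response.Documents);   // placeholder processing step
        response = client.Scroll<Call>("30s", response.ScrollId);
    }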

Which version of NEST are you using? We made some changes to memory pooling behaviour in recent releases.

To narrow down the cause, I'd recommend you perform some memory profiling to determine which objects are consuming the most memory and living the longest. If you are able to collect a memory dump from the crashed application, that would be most useful. This will help us identify what is contributing to the OOM exception.
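As a crude first check before a full profiling session (illustrative only, and no substitute for a profiler or a memory dump), you could log the managed heap size between scroll batches and see whether it climbs steadily from page to page:

    // Approximate managed heap size; passing false avoids forcing a collection.
    long managedBytes = GC.GetTotalMemory(forceFullCollection: false);
    Console.WriteLine($"After batch {batchNumber}: {managedBytes / (1024 * 1024)} MB managed heap");
    // batchNumber here is a hypothetical counter incremented once per scroll page.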

Cheers,
Steve


This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.