Using the Scroll API causes Out of Memory errors

I am trying to use scroll to retrieve over 41 million documents (156 GB), but it fails with an out-of-memory error after fetching about 410,000 documents.

I am using a batch size of 5,000 documents and a 30s scroll keep-alive (I have also tried changing the batch size).

It'd help if you shared a bit more info, please:

What does the scroll request look like?
What is the actual error from the Elasticsearch log?
What is the output from the _cluster/stats?pretty&human API?

I am using the query below:

    client.ConnectionSettings.QueryStringParameters["scroll"] = "30s";

    var searchResults = client.Search<Call>(s => s
        .Size(5000)
        .Index("***")
        .Aggregations(a => a
            .Terms("xxx", t => t
                .Field(f => f.xxx)
                .Include(".*xxx.*")
                .MinimumDocumentCount(2)
            )
        )
    );

and looping in code using the scroll ID returned by each response; a simplified sketch of the loop is below.
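For context, the loop follows the standard NEST scroll pattern. The sketch below is simplified (ProcessBatch is a placeholder for our per-batch processing, and it assumes using System.Linq; and using Nest; are in scope):

    // Continue from the initial search above: fetch each following page with
    // the scroll ID returned by the previous response.
    var response = searchResults;
    while (response.Documents.Any())
    {
        ProcessBatch(response.Documents);   // placeholder per-batch processing

        // Request the next 5,000 documents, keeping the context alive for 30s.
        response = client.Scroll<Call>("30s", response.ScrollId);
    }

    // Release the scroll context on the server once finished.
    client.ClearScroll(c => c.ScrollId(response.ScrollId));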
I get the following error:

Elasticsearch.Net.UnexpectedElasticsearchClientException
  HResult=0x80131500
  Message=Exception of type 'System.OutOfMemoryException' was thrown.
  Source=Elasticsearch.Net
  StackTrace:
   at Elasticsearch.Net.Transport`1.Request[TResponse](HttpMethod method, String path, PostData data, IRequestParameters requestParameters)
   at Elasticsearch.Net.ElasticLowLevelClient.DoRequest[TResponse](HttpMethod method, String path, PostData data, IRequestParameters requestParameters)
   at Nest.ElasticClient.DoRequest[TRequest,TResponse](TRequest p, IRequestParameters 


Cluster statistics:

{
  "_nodes" : {
    "total" : 3,
    "successful" : 3,
    "failed" : 0
  },
  "cluster_name" : xxx
  "cluster_uuid" : xxx
  "timestamp" : 1630897173450,
  "status" : "green",
  "indices" : {
    "count" : 39,
    "shards" : {
      "total" : 78,
      "primaries" : 39,
      "replication" : 1.0,
      "index" : {
        "shards" : {
          "min" : 2,
          "max" : 2,
          "avg" : 2.0
        },
        "primaries" : {
          "min" : 1,
          "max" : 1,
          "avg" : 1.0
        },
        "replication" : {
          "min" : 1.0,
          "max" : 1.0,
          "avg" : 1.0
        }
      }
    },
    "docs" : {
      "count" : 101874138,
      "deleted" : 1149082
    },
    "store" : {
      "size" : "735.6gb",
      "size_in_bytes" : 789849023402,
      "reserved" : "0b",
      "reserved_in_bytes" : 0
    },
    "fielddata" : {
      "memory_size" : "3.9mb",
      "memory_size_in_bytes" : 4179768,
      "evictions" : 0
    },
    "query_cache" : {
      "memory_size" : "1.6mb",
      "memory_size_in_bytes" : 1712328,
      "total_count" : 19544173,
      "hit_count" : 689508,
      "miss_count" : 18854665,
      "cache_size" : 1967,
      "cache_count" : 13638,
      "evictions" : 11671
    },
    "completion" : {
      "size" : "0b",
      "size_in_bytes" : 0
    },
    "segments" : {
      "count" : 853,
      "memory" : "29mb",
      "memory_in_bytes" : 30510924,
      "terms_memory" : "21.4mb",
      "terms_memory_in_bytes" : 22518880,
      "stored_fields_memory" : "551.8kb",
      "stored_fields_memory_in_bytes" : 565048,
      "term_vectors_memory" : "0b",
      "term_vectors_memory_in_bytes" : 0,
      "norms_memory" : "3mb",
      "norms_memory_in_bytes" : 3171264,
      "points_memory" : "0b",
      "points_memory_in_bytes" : 0,
      "doc_values_memory" : "4mb",
      "doc_values_memory_in_bytes" : 4255732,
      "index_writer_memory" : "0b",
      "index_writer_memory_in_bytes" : 0,
      "version_map_memory" : "0b",
      "version_map_memory_in_bytes" : 0,
      "fixed_bit_set" : "25.3kb",
      "fixed_bit_set_memory_in_bytes" : 25968,
      "max_unsafe_auto_id_timestamp" : 1629308868402,
      "file_sizes" : { }
    },
    "versions" : [
      {
        "version" : "7.6.1",
        "index_count" : 5,
        "primary_shard_count" : 5,
        "total_primary_size" : "209.3kb",
        "total_primary_bytes" : 214376
      },
      {
        "version" : "7.11.2",
        "index_count" : 34,
        "primary_shard_count" : 34,
        "total_primary_size" : "367.7gb",
        "total_primary_bytes" : 394840071562
      }
    ]
  },
  "nodes" : {
    "count" : {
      "total" : 3,
      "coordinating_only" : 0,
      "data" : 3,
      "data_cold" : 3,
      "data_content" : 3,
      "data_hot" : 3,
      "data_warm" : 3,
      "ingest" : 3,
      "master" : 3,
      "ml" : 3,
      "remote_cluster_client" : 3,
      "transform" : 3,
      "voting_only" : 0
    },
    "versions" : [
      "7.11.2"
    ],
    "os" : {
      "available_processors" : 24,
      "allocated_processors" : 24,
      "names" : [
        {
          "name" : "Linux",
          "count" : 3
        }
      ],
      "pretty_names" : [
        {
          "pretty_name" : "Red Hat Enterprise Linux Server 7.9 (Maipo)",
          "count" : 3
        }
      ],
      "mem" : {
        "total" : "188.2gb",
        "total_in_bytes" : 202105995264,
        "free" : "1.2gb",
        "free_in_bytes" : 1388679168,
        "used" : "186.9gb",
        "used_in_bytes" : 200717316096,
        "free_percent" : 1,
        "used_percent" : 99
      }
    },
    "process" : {
      "cpu" : {
        "percent" : 0
      },
      "open_file_descriptors" : {
        "min" : 500,
        "max" : 591,
        "avg" : 538
      }
    },
    "jvm" : {
      "max_uptime" : "18.4d",
      "max_uptime_in_millis" : 1590359798,
      "versions" : [
        {
          "version" : "15.0.1",
          "vm_name" : "OpenJDK 64-Bit Server VM",
          "vm_version" : "15.0.1+9",
          "vm_vendor" : "AdoptOpenJDK",
          "bundled_jdk" : true,
          "using_bundled_jdk" : true,
          "count" : 3
        }
      ],
      "mem" : {
        "heap_used" : "13.9gb",
        "heap_used_in_bytes" : 15015866480,
        "heap_max" : "48gb",
        "heap_max_in_bytes" : 51539607552
      },
      "threads" : 306
    },
    "fs" : {
      "total" : "2.6tb",
      "total_in_bytes" : 2897687347200,
      "free" : "1.9tb",
      "free_in_bytes" : 2107608080384,
      "available" : "1.9tb",
      "available_in_bytes" : 2107608080384
    },
    "plugins" : [
      {
        "name" : "analysis-phonetic",
        "version" : "7.11.2",
        "elasticsearch_version" : "7.11.2",
        "java_version" : "1.8",
        "description" : "The Phonetic Analysis plugin integrates phonetic token filter analysis with elasticsearch.",
        "classname" : "org.elasticsearch.plugin.analysis.AnalysisPhoneticPlugin",
        "extended_plugins" : [ ],
        "has_native_controller" : false,
        "licensed" : false,
        "type" : "isolated"
      },
      {
        "name" : "analysis-icu",
        "version" : "7.11.2",
        "elasticsearch_version" : "7.11.2",
        "java_version" : "1.8",
        "description" : "The ICU Analysis plugin integrates the Lucene ICU module into Elasticsearch, adding ICU-related analysis components.",
        "classname" : "org.elasticsearch.plugin.analysis.icu.AnalysisICUPlugin",
        "extended_plugins" : [ ],
        "has_native_controller" : false,
        "licensed" : false,
        "type" : "isolated"
      }
    ],
    "network_types" : {
      "transport_types" : {
        "security4" : 3
      },
      "http_types" : {
        "security4" : 3
      }
    },
    "discovery_types" : {
      "zen" : 3
    },
    "packaging_types" : [
      {
        "flavor" : "default",
        "type" : "rpm",
        "count" : 3
      }
    ],
    "ingest" : {
      "number_of_pipelines" : 2,
      "processor_stats" : {
        "gsub" : {
          "count" : 0,
          "failed" : 0,
          "current" : 0,
          "time" : "0s",
          "time_in_millis" : 0
        },
        "script" : {
          "count" : 0,
          "failed" : 0,
          "current" : 0,
          "time" : "0s",
          "time_in_millis" : 0
        }
      }
    }
  }
}

Please don't post pictures of text or code. They are difficult to read, impossible to search and replicate (if it's code), and some people may not even be able to see them 🙂

I have removed the error screenshot and added the error details as text.

Thanks, is there an OOM error in the Elasticsearch logs as well?

I will update with the logs soon; I need to set that up first.

I am trying to read all the data using the NEST API for data processing.

Hi,

I can see this is a client-side memory exception. You are dealing with a huge amount of data here. The NEST library itself will cause allocations for each search, but those should be short-lived in most cases and collected during the next gen-0 GC. At scale though, with a tight search loop, it's possible those are not being collected quickly enough to release the memory. What .NET runtime are you using, how much memory does your system have available, what OS are you using and how are you running the app (i.e. is it in a constrained environment such as Docker)?

How complex is your Call type? Each result will allocate an instance of that. Also, after you receive the 5,000 results, are you holding onto those beyond each search (in a List for example) or processing them before performing the next scrolled search?
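Purely as a hypothetical illustration of that distinction (this is not taken from your code): the first loop below keeps every deserialized Call instance reachable until the end, so memory grows with the full 41 million documents, while the second only ever keeps one 5,000-document batch alive.

    // Hypothetical anti-pattern: all pages accumulate in a list, so memory
    // grows with the total result set (~41M Call instances stay reachable).
    var all = new List<Call>();
    while (response.Documents.Any())
    {
        all.AddRange(response.Documents);
        response = client.Scroll<Call>("30s", response.ScrollId);
    }

    // Processing each page before the next Scroll call keeps only ~5,000
    // Call instances reachable at any one time.
    while (response.Documents.Any())
    {
        ProcessBatch(response.Documents);   // placeholder processing step
        response = client.Scroll<Call>("30s", response.ScrollId);
    }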

Which version of NEST are you using? We made some changes to memory pooling behaviour in recent releases.

To narrow down the cause, I'd recommend you perform some memory profiling to determine which objects are consuming the most memory and living the longest. If you are able to collect a memory dump from the crashed application, that would be most useful. This will help us identify what is contributing to the OOM exception.
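As a crude first check before a full profiling session (illustrative only, and no substitute for a profiler or a memory dump), you could log the managed heap size between scroll batches and see whether it climbs steadily from page to page:

    // Approximate managed heap size; passing false avoids forcing a collection.
    long managedBytes = GC.GetTotalMemory(forceFullCollection: false);
    Console.WriteLine($"After batch {batchNumber}: {managedBytes / (1024 * 1024)} MB managed heap");
    // batchNumber here is a hypothetical counter incremented once per scroll page.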

Cheers,
Steve


This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.