High OS memory usage and low heap usage during indexing

Hi,

I am trying to index 500 million records. The heap size is 16 GB and total memory is 32 GB.

I am sending bulk requests of 50,000 records in a loop, over a single transport client connection in Java, until all 500 million records are indexed.

After some time, I noticed that overall memory usage on the Elasticsearch server climbed to nearly 100% and indexing failed, even though heap usage stayed low.

Here are the node_stats:

"os": {
            "timestamp": 1440515157491,
            "uptime_in_millis": 18871,
            "cpu": {
               "sys": 4,
               "user": 50,
               "idle": 45,
               "usage": 54,
               "stolen": 0
            },
            "mem": {
               "free_in_bytes": 133742592,
               "used_in_bytes": 34225528832,
               "free_percent": 0,
               "used_percent": 99,
               "actual_free_in_bytes": 221503488,
               "actual_used_in_bytes": 34137767936
            },
            "swap": {
               "used_in_bytes": 18765332480,
               "free_in_bytes": 49951268864
            }
         },
         "process": {
            "timestamp": 1440515158006,
            "open_file_descriptors": 1358,
            "cpu": {
               "percent": 10,
               "sys_in_millis": 333343,
               "user_in_millis": 279399,
               "total_in_millis": 612742
            },
            "mem": {
               "resident_in_bytes": 3216711680,
               "share_in_bytes": -1,
               "total_virtual_in_bytes": 3339325440
            }
         },
         "jvm": {
            "timestamp": 1440515158272,
            "uptime_in_millis": 17181206,
            "mem": {
               "heap_used_in_bytes": 2777733000,
               "heap_used_percent": 16,
               "heap_committed_in_bytes": 17145004032,
               "heap_max_in_bytes": 17145004032,
               "non_heap_used_in_bytes": 69697944,
               "non_heap_committed_in_bytes": 71225344,
               "pools": {
                  "young": {
                     "used_in_bytes": 130136048,
                     "max_in_bytes": 279183360,
                     "peak_used_in_bytes": 279183360,
                     "peak_max_in_bytes": 279183360
                  },
                  "survivor": {
                     "used_in_bytes": 34865152,
                     "max_in_bytes": 34865152,
                     "peak_used_in_bytes": 34865152,
                     "peak_max_in_bytes": 34865152
                  },
                  "old": {
                     "used_in_bytes": 2612731800,
                     "max_in_bytes": 16830955520,
                     "peak_used_in_bytes": 12738557344,
                     "peak_max_in_bytes": 16830955520
                  }
               }
            },
            "threads": {
               "count": 75,
               "peak_count": 82
            },
            "gc": {
               "collectors": {
                  "young": {
                     "collection_count": 21391,
                     "collection_time_in_millis": 1190698
                  },
                  "old": {
                     "collection_count": 11,
                     "collection_time_in_millis": 2329
                  }
               }
            },
            "buffer_pools": {
               "direct": {
                  "count": 111,
                  "used_in_bytes": 11379522,
                  "total_capacity_in_bytes": 11379522
               },
               "mapped": {
                  "count": 287,
                  "used_in_bytes": 35753712656,
                  "total_capacity_in_bytes": 35753712656
               }
            }
         },

Please let me know what is going wrong here. Thanks

Try reducing the number of records you are sending at one time. You're probably getting rejections because you're overflowing the bulk queue: on your setup, Elasticsearch can't keep up. Start with something like 500 records per bulk request and increase it until you start seeing rejections. Once you've found an acceptable number for your environment, you then need to retry any rejections you get:

https://www.elastic.co/guide/en/elasticsearch/guide/current/_monitoring_individual_nodes.html#_threadpool_section
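
For reference, here is a minimal sketch of that back-off-and-retry idea using the transport client's BulkProcessor. This assumes a 2.x-style client (the backoff-policy setter arrived in 2.0; on 1.x you would catch EsRejectedExecutionException and re-submit the failed items yourself), and the numbers are starting points to tune, not recommendations:

    import org.elasticsearch.action.bulk.BackoffPolicy;
    import org.elasticsearch.action.bulk.BulkProcessor;
    import org.elasticsearch.action.bulk.BulkRequest;
    import org.elasticsearch.action.bulk.BulkResponse;
    import org.elasticsearch.client.Client;
    import org.elasticsearch.common.unit.ByteSizeUnit;
    import org.elasticsearch.common.unit.ByteSizeValue;
    import org.elasticsearch.common.unit.TimeValue;

    public class BulkIndexing {
        public static BulkProcessor buildProcessor(Client client) {
            return BulkProcessor.builder(client, new BulkProcessor.Listener() {
                @Override
                public void beforeBulk(long executionId, BulkRequest request) {
                    // nothing to do before each bulk request
                }

                @Override
                public void afterBulk(long executionId, BulkRequest request, BulkResponse response) {
                    // individual items can still fail; log them and re-queue if needed
                    if (response.hasFailures()) {
                        System.err.println(response.buildFailureMessage());
                    }
                }

                @Override
                public void afterBulk(long executionId, BulkRequest request, Throwable failure) {
                    // the whole request failed (node gone, etc.); log and handle it
                    failure.printStackTrace();
                }
            })
            .setBulkActions(500)      // start small, tune upward until rejections appear
            .setBulkSize(new ByteSizeValue(5, ByteSizeUnit.MB))
            .setConcurrentRequests(1) // one in-flight bulk keeps queue pressure low
            // retry rejected bulks with exponential backoff: 100 ms, 200 ms, 400 ms
            .setBackoffPolicy(BackoffPolicy.exponentialBackoff(
                    TimeValue.timeValueMillis(100), 3))
            .build();
        }
    }

You then feed documents through the processor instead of assembling 50,000-item bulks by hand, e.g. bulkProcessor.add(client.prepareIndex("myindex", "mytype").setSource(json).request()), and it flushes, throttles, and retries for you.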

If you look at the mapped section in buffer_pools, you have about 33 GB of data mapped into RAM.
That has used up essentially all of your RAM, and the OS could be trying to swap it to disk. How many shards do you have?
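
For reference, you can read the shard counts off the cluster health API with the same Java client; a quick sketch:

    import org.elasticsearch.action.admin.cluster.health.ClusterHealthResponse;

    // report shard counts using the existing transport client
    ClusterHealthResponse health = client.admin().cluster().prepareHealth().get();
    System.out.println("active primary shards: " + health.getActivePrimaryShards()
            + ", active shards total: " + health.getActiveShards());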

Your system is swapping. That's not good. Use a smaller bulk size plus a back-off-and-retry strategy in your code, as in the BulkProcessor sketch above.

Turn off swapping. Turn on memlock.
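
On the Elasticsearch side, memlock is one setting in elasticsearch.yml; a minimal sketch (the setting is bootstrap.mlockall on 1.x/2.x and was renamed bootstrap.memory_lock in 5.0; on Windows this relies on the service account having the "Lock pages in memory" right):

    # elasticsearch.yml: lock the JVM heap so the OS cannot swap it out
    bootstrap.mlockall: true

After a restart you can confirm it took effect by checking for "mlockall": true in the nodes info API output.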

Thanks everyone.
I have turned off swapping and turned on memlock. I am now indexing 750 records per bulk request, with a 4-second wait between bulk calls. With this, overall memory usage on the Elasticsearch server seems stable, but the indexing rate looks very low to me.
More details:
I am indexing onto an HDD, and the OS is Windows Server 2008 R2.

My questions are:

  1. If I change from HDD to SSD, will that have an impact on indexing, and how much? Also, are there any other ways I can improve the indexing rate?
  2. Which OS is most common/recommended for Elasticsearch?

Puneet, there are some basic posts on the website. Also, check https://www.elastic.co/guide/en/elasticsearch/guide/current/indexing-performance.html?q=indexing%20performance
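
One concrete tip from that guide: for a one-off bulk load you can disable refresh and replicas up front and restore them afterwards. A sketch with the same transport client (2.x-style Settings builder; on 1.x it is ImmutableSettings.settingsBuilder(), and the index name "records" is a placeholder):

    import org.elasticsearch.client.Client;
    import org.elasticsearch.common.settings.Settings;

    // before the load: no periodic refresh, no replicas to keep in sync
    client.admin().indices().prepareUpdateSettings("records")
            .setSettings(Settings.settingsBuilder()
                    .put("index.refresh_interval", "-1")
                    .put("index.number_of_replicas", 0)
                    .build())
            .get();

    // ... run the bulk load ...

    // afterwards: restore near-real-time search and replication
    client.admin().indices().prepareUpdateSettings("records")
            .setSettings(Settings.settingsBuilder()
                    .put("index.refresh_interval", "1s")
                    .put("index.number_of_replicas", 1)
                    .build())
            .get();

Disabling refresh trades away near-real-time search during the load, which is usually acceptable for a backfill. The same guide also answers the SSD question: disks are usually the bottleneck for heavy indexing, and it recommends SSDs over spinning media.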