ElasticSearch Windows Service 100% CPU

I also asked a similar question on Stack Overflow.

Using Elasticsearch 5.4.0 as a Windows service, on Windows Server 2016 with 16 GB RAM.

Installed via command: elasticsearch-service.bat manager
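For reference, a sketch of the usual elasticsearch-service.bat subcommands (install, manager and start are documented subcommands; the exact sequence shown here is only illustrative):

REM register Elasticsearch as a Windows service
elasticsearch-service.bat install

REM open the GUI to adjust JVM options, memory pools and startup type
elasticsearch-service.bat manager

REM start the service
elasticsearch-service.bat start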
Java options:

-Xms8g
-Xmx8g
-XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly
-XX:+DisableExplicitGC
-XX:+AlwaysPreTouch
-Xss1m
-Djava.awt.headless=true
-Dfile.encoding=UTF-8
-Djna.nosys=true
-Djdk.io.permissionsUseCanonicalPath=true
-Dio.netty.noUnsafe=true
-Dio.netty.noKeySetOptimization=true
-Dio.netty.recycler.maxCapacityPerThread=0
-Dlog4j.shutdownHookEnabled=false
-Dlog4j2.disable.jmx=true
-Dlog4j.skipJansi=true
-Delasticsearch
-Des.path.home=C:\elasticsearch
-Des.default.path.logs=C:\elasticsearch\logs
-Des.default.path.data=C:\elasticsearch\data
-Des.default.path.conf=C:\elasticsearch\config

Initial memory pool: 8192 MB
Maximum memory pool: 8192 MB
Thread stack size: 1024 KB.

The index was filled with data once, right after creation, using the Norconex crawler. In total over 12k documents are stored and the index size is 1.5 GB. Since the crawl finished, the process sits at almost 100% CPU in Task Manager all the time. It sometimes drops to 0% or 25%, but goes back to 95% after a minute.

The ElasticSearch.log file (over 150 MB after a few hours) is filled with records of this type:

[2017-09-25T16:23:13,166][INFO ][o.e.m.j.JvmGcMonitorService] [Zugx4zq] [gc][9415] overhead, spent [364ms] collecting in the last [1s]
[2017-09-25T16:23:14,534][INFO ][o.e.m.j.JvmGcMonitorService] [Zugx4zq] [gc][9416] overhead, spent [647ms] collecting in the last [1.3s]
[2017-09-25T16:23:15,550][INFO ][o.e.m.j.JvmGcMonitorService] [Zugx4zq] [gc][9417] overhead, spent [397ms] collecting in the last [1s]
[2017-09-25T16:23:16,708][WARN ][o.e.m.j.JvmGcMonitorService] [Zugx4zq] [gc][9418] overhead, spent [657ms] collecting in the last [1.1s]
[2017-09-25T16:23:17,745][INFO ][o.e.m.j.JvmGcMonitorService] [Zugx4zq] [gc][9419] overhead, spent [494ms] collecting in the last [1s]

Requests take too long to execute or fail with exceptions:

org.elasticsearch.transport.RemoteTransportException: [Zugx4zq][127.0.0.1:9300][indices:data/read/search[phase/query]]
Caused by: org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution of org.elasticsearch.transport.TransportService$7@25194f on EsThreadPoolExecutor[search, queue capacity = 1000, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@48fa409f[Running, pool size = 7, active threads = 7, queued tasks = 1000, completed tasks = 93134]]

org.elasticsearch.transport.RemoteTransportException: [Zugx4zq][127.0.0.1:9300][indices:data/read/search[phase/query]]
Caused by: org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution of org.elasticsearch.transport.TransportService$7@34fbd561 on EsThreadPoolExecutor[search, queue capacity = 1000, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@48fa409f[Running, pool size = 7, active threads = 7, queued tasks = 1000, completed tasks = 118501]]

I don't understand what is generating this kind of load on the service. Where do all these tasks come from (queued tasks = 1000)? How can I monitor the load on the service? What kind of fine tuning have I missed? I'm not a DevOps engineer or a professional system administrator, I'm a .NET programmer, so all these Java apps are a dark forest to me.


That looks like the search queue backing up. What have you got hitting the cluster with searches?
Do you see any entries in the elasticsearch_index_search_slowlog.log log file that give a clue as to long-running searches?
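You can also get a quick snapshot of the search thread pool with the cat thread pool API; the column selection here is just one option:

GET _cat/thread_pool/search?v&h=node_name,active,queue,rejected,completed

A consistently full queue and a growing rejected counter would confirm the queue is backing up.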

There is no direct access from the Internet to the server hosting the Elasticsearch service, only from the backend through the local network. Task Manager shows very little traffic, 15-30 KB/s. Where can I find the elasticsearch_index_search_slowlog.log file? In the logs subfolder? Should I enable something in the settings if there is no such file?

Yes, the elasticsearch_index_search_slowlog.log log file will be in the logs directory configured at C:\elasticsearch\logs
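Note that nothing is written to the slowlogs until thresholds are set on the index; a sketch of enabling them (the index name and threshold values here are only examples):

PUT /my_index/_settings
{
  "index.search.slowlog.threshold.query.warn": "5s",
  "index.search.slowlog.threshold.query.info": "1s",
  "index.search.slowlog.threshold.fetch.warn": "1s"
}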

Both elasticsearch_index_indexing_slowlog and elasticsearch_index_search_slowlog are empty.

Given that no single search is reported as slow, an overwhelming volume of concurrent search requests is possibly to blame.
Can you say more about the volume and type of searches being executed?

Querying a large number of shards can also fill up the queue quickly. How many indices and shards do you have in the cluster? You can use the cat indices API to get this.
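For example (the v parameter just adds column headers):

GET _cat/indices?v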

I increased the heap size to 16 GB (-Xms16g -Xmx16g); everything is the same.

What info about the volume is needed?
The index was created with this mapping:

{
  "mappings": {
    "pages": {
      "properties": {
        "content": {
          "type": "text",
          "analyzer": "russian",
          "term_vector": "with_positions_offsets",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            },
            "raw": {
              "type": "string",
              "index": "analyzed"
            }
          }
        },
        "title": {
          "type": "text",
          "analyzer": "russian",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            },
            "raw": {
              "type": "string",
              "index": "analyzed"
            }
          }
        },
        "DateShort": {
          "type": "date",
          "format": "yyyy-MM-dd"
        }
      }
    }
  }
}

The search queries all look like this (they are all the same, only the "query" text changes):

{
  "from" : 0,
  "size" : 100,
  "query" : {
    "bool" : {
      "must" : [
        {
          "match" : {
            "content.raw" : {
              "query" : "fees",
              "type" : "phrase",
              "operator" : "OR",
              "prefix_length" : 0,
              "max_expansions" : 50,
              "fuzzy_transpositions" : true,
              "lenient" : false,
              "zero_terms_query" : "NONE",
              "boost" : 1.0
            }
          }
        }
      ],
      "filter" : [
        {
          "term" : {
            "isDocument" : {
              "value" : "1",
              "boost" : 1.0
            }
          }
        }
      ],
      "disable_coord" : false,
      "adjust_pure_negative" : true,
      "boost" : 1.0
    }
  },
  "_source" : {
    "includes" : [
      "DateShort",
      "document_reference",
      "title",
      "materialType"
    ],
    "excludes" : [ ]
  },
  "sort" : [
    {
      "DateShort" : {
        "order" : "desc"
      }
    },
    {
      "_score" : {
        "order" : "desc"
      }
    }
  ],
  "highlight" : {
    "fields" : {
      "content.raw" : {
        "pre_tags" : [
          "<b>"
        ],
        "post_tags" : [
          "</b>"
        ],
        "fragment_size" : 150,
        "number_of_fragments" : 3
      }
    }
  }
}

localhost:9200/_cat/indices
returns:
green open one_shard hOj2QdAvSj6I6ob5bQyVYA 1 0 17880 0 1.1gb 1.1gb

Some observations:

  • If your docs are typical news or web pages, content.keyword is not likely to be useful (and perhaps also title.keyword) if these are long items of free text.
  • I don't think your fuzzy settings will take effect unless you set "fuzziness" to something other than the default of zero and remove type:phrase.
  • A prefix_length of zero can be expensive - see [1].
  • Highlighting many long docs can be expensive. Experiment with different highlighters or reduce the number of results (a trimmed example follows below the reference).

[1] https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-fuzzy-query.html#_parameters_6
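For example, a trimmed query along these lines would return fewer hits and could highlight the content field with the fast vector highlighter, since content has term_vector: with_positions_offsets in the posted mapping (the size, field and fragment settings here are only illustrative):

{
  "from": 0,
  "size": 20,
  "query": {
    "match_phrase": {
      "content": "fees"
    }
  },
  "highlight": {
    "fields": {
      "content": {
        "type": "fvh",
        "fragment_size": 150,
        "number_of_fragments": 1
      }
    }
  }
}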

I'm using NEST to communicate with the Elasticsearch service from the backend, so maybe those extra params are added automatically.
It generates a request body like this:

{
  "from": 0,
  "size": 100,
  "sort": [
    {
      "DateShort": {
        "order": "desc"
      }
    },
    {
      "_score": {
        "order": "desc"
      }
    }
  ],
  "highlight": {
    "fields": {
      "content.raw": {
        "pre_tags": [
          "<b>"
        ],
        "post_tags": [
          "</b>"
        ],
        "fragment_size": 150,
        "number_of_fragments": 3
      }
    }
  },
  "_source": {
    "includes": [
      "DateShort",
      "document_reference",
      "title",
      "materialType"
    ]
  },
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "content.raw": {
              "query": "биржа",
              "type": "phrase"
            }
          }
        }
      ],
      "filter": [
        {
          "term": {
            "isDocument": {
              "value": "0"
            }
          }
        }
      ]
    }
  }
}

The previous request body was taken from the elasticsearch-service-x64-stdout log file.

Query rates, e.g. "an average of 10 queries per second being issued by each of 5 webservers".
Some indication of the types of queries, e.g. free-text searches vs aggregations etc.

I posted a common query example, generated by NEST, above.
There are 2 web server nodes which send requests to the Elasticsearch service, but I have no access to them and can't get stats. Is there a way to get load statistics from the Elasticsearch service itself?

Monitoring queue sizes over time is generally a good way to gauge if you're pushing the servers too hard.
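Even without extra tooling, a per-node snapshot of thread pool queues and rejection counters is available from the node stats API; polling it periodically gives a rough picture over time:

GET _nodes/stats/thread_pool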

The free basic x-pack module includes the monitoring feature that can plot these stats over time and much more: Monitor and Manage Elasticsearch | Elastic

I installed x-pack. Max heap size is set to 24 GB. No CPU overloading. Here are screens from x-pack.


