I am running Wazuh 2.0 (https://documentation.wazuh.com/2.0/getting-started/index.html) in an agent-server setup. I have 31 agents installed, and the server also runs an ELK installation that handles the logs sent by the agents.
For the past few weeks I have been getting the following error in Kibana:
Error: Request Timeout after 30000ms
And also this error:
Error: in cell #1: [illegal_argument_exception] Trying to query 1401 shards, which is over the limit of 1000. This limit exists because querying many shards at the same time can make the job of the coordinating node very CPU and/or memory intensive. It is usually a better idea to have a smaller number of larger shards. Update [action.search.shard_count.limit] to a greater value if you really want to query that many shards at the same time.
at throwWithCell (/usr/share/kibana/src/core_plugins/timelion/server/handlers/chain_runner.js:30:11)
at /usr/share/kibana/src/core_plugins/timelion/server/handlers/chain_runner.js:160:13
at arrayEach (/usr/share/kibana/node_modules/lodash/index.js:1289:13)
at Function.<anonymous> (/usr/share/kibana/node_modules/lodash/index.js:3345:13)
at /usr/share/kibana/src/core_plugins/timelion/server/handlers/chain_runner.js:152:9
at bound (domain.js:280:14)
at runBound (domain.js:293:12)
at tryCatcher (/usr/share/kibana/node_modules/bluebird/js/main/util.js:26:23)
at Promise._settlePromiseFromHandler (/usr/share/kibana/node_modules/bluebird/js/main/promise.js:503:31)
at Promise._settlePromiseAt (/usr/share/kibana/node_modules/bluebird/js/main/promise.js:577:18)
at Promise._settlePromises (/usr/share/kibana/node_modules/bluebird/js/main/promise.js:693:14)
at Async._drainQueue (/usr/share/kibana/node_modules/bluebird/js/main/async.js:123:16)
at Async._drainQueues (/usr/share/kibana/node_modules/bluebird/js/main/async.js:133:10)
at Immediate.Async.drainQueues (/usr/share/kibana/node_modules/bluebird/js/main/async.js:15:14)
at runCallback (timers.js:666:20)
at tryOnImmediate (timers.js:639:5)
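The second message itself points at the `action.search.shard_count.limit` setting. Below is a minimal sketch of how that limit could be raised through the cluster settings API. It assumes the setting can be updated dynamically on this Elasticsearch version and that the node listens on `localhost:9200`; the value 1500 and the helper names are only illustrative. Raising the limit would not reduce the load the message warns about.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: single local node


def build_limit_request(limit, base_url=ES_URL):
    """Build (without sending) a PUT /_cluster/settings request for the shard limit."""
    body = {"transient": {"action.search.shard_count.limit": limit}}
    return urllib.request.Request(
        url=base_url + "/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def apply_limit(limit, base_url=ES_URL):
    """Send the request and return the raw response body as text."""
    req = build_limit_request(limit, base_url)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read().decode("utf-8")
```

Calling `apply_limit(1500)` would send the request. A `transient` setting is used here so the change does not survive a full cluster restart.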
********* Some server details:
[root@xxxxxxx ~]# df -h
S.ficheros Tamaño Usados Disp Uso% Montado en
/dev/mapper/cl-root 97G 40G 58G 42% /
devtmpfs 3,9G 0 3,9G 0% /dev
tmpfs 3,9G 0 3,9G 0% /dev/shm
tmpfs 3,9G 17M 3,8G 1% /run
tmpfs 3,9G 0 3,9G 0% /sys/fs/cgroup
/dev/sda1 1014M 230M 785M 23% /boot
tmpfs 782M 0 782M 0% /run/user/0
[root@xxxxxxxx ~]# free
total used free shared buff/cache available
Mem: 8002828 2283280 132632 16928 5586916 5312056
Swap: 2097148 0 2097148
************ Versions in use
CentOS 7
Elasticsearch 5.5.0
Logstash 5.5.0
Kibana 5.5.0
GET Health Status ---------------------------
[root@xxxxxxx ~]# curl 'localhost:9200/_cat/health?v'
epoch timestamp cluster status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1517923628 07:27:08 wazuh yellow 1 1 1411 1411 0 0 10 0 - 99.3%
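The 99.3% in the last column follows from the shard counts in the same row. Here is a quick check of that arithmetic, assuming the percentage is simply active shards over active plus unassigned shards (initializing and relocating are both 0 here); the function name is just illustrative:

```python
def active_shards_percent(active, unassigned):
    """Share of active shards among all shards the cluster knows about, in percent."""
    total = active + unassigned
    return 100.0 * active / total


pct = active_shards_percent(1411, 10)
print(round(pct, 1))  # 99.3, the figure shown by _cat/health
```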
GET Nodes -----------------------------------
[root@xxxxxxx ~]# curl 'localhost:9200/_cat/nodes?v'
ip heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
94 98 23 0.55 0.85 0.92 mdi * node-1
GET cluster health --------------------------
[root@xxxxxxx ~]# curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'
{
  "cluster_name" : "wazuh",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 1411,
  "active_shards" : 1411,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 10,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 99.29627023223082
}
GET cluster health indices --------------------------
[root@xxxxxx ~]# curl -XGET 'http://localhost:9200/_cluster/health?level=indices&pretty=true'
GET cluster health shards --------------------------
[root@xxxxxxxx ~]# curl -XGET 'http://localhost:9200/_cluster/health?level=shards&pretty=true'
From these results I can see that I have a very large number of shards for a single node.
How could I reduce the number of shards, to see whether that fixes the problem?
Or, alternatively, should I create another node on the same server to improve performance?
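To see where the 1411 shards come from, the output of `_cat/shards` could be grouped by index. This is a rough sketch, assuming the text was fetched with `curl 'localhost:9200/_cat/shards?h=index,shard,prirep,state'`; the index names in the sample are hypothetical placeholders, not taken from this cluster:

```python
from collections import Counter


def shards_per_index(cat_shards_text):
    """Count shard copies per index from `_cat/shards?h=index,shard,prirep,state` output."""
    counts = Counter()
    for line in cat_shards_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        counts[fields[0]] += 1
    return counts


# Hypothetical sample: two daily indices, one of them with an unassigned replica.
sample = """\
example-index-2018.02.05 0 p STARTED
example-index-2018.02.05 1 p STARTED
example-index-2018.02.05 0 r UNASSIGNED
example-index-2018.02.06 0 p STARTED
"""

# Print indices from most to fewest shard copies.
for index, n in shards_per_index(sample).most_common():
    print(index, n)
```

Sorting by count would show whether a few indices hold most of the shards or whether many small daily indices each add a handful.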
Any help would be appreciated.