Hello,
I am using Wazuh 2.0 (https://documentation.wazuh.com/2.0/getting-started/index.html) in an agent-server deployment: I have 31 agents installed, and the server runs an ELK stack to handle the logs the agents send.
For the last few weeks I have been getting the following error in Kibana:
Error: Request Timeout after 30000ms
ErrorAbstract@http://xxxxxxxx:5601/bundles/kibana.bundle.js?v=14849:12:24939
StatusCodeError@http://xxxxxxxx:5601/bundles/kibana.bundle.js?v=14849:12:28395
Transport.prototype.request/requestTimeoutId<@http://xxxxxxxxx:5601/bundles/kibana.bundle.js?v=14849:13:4431
Transport.prototype._timeout/id<@http://xxxxxxxxx:5601/bundles/kibana.bundle.js?v=14849:13:4852
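(As far as I can tell, that 30-second limit matches Kibana's elasticsearch.requestTimeout setting, whose default is 30000 ms in kibana.yml (I believe /etc/kibana/kibana.yml on this install). As a temporary workaround I suppose I could raise it, for example:

elasticsearch.requestTimeout: 60000

where 60000 is just an example value, and then restart Kibana with systemctl restart kibana, although I understand that would only hide the underlying shard problem.)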
And I am also getting this error:
Error:  in cell #1: [illegal_argument_exception] Trying to query 1401 shards, which is over the limit of 1000. This limit exists because querying many shards at the same time can make the job of the coordinating node very CPU and/or memory intensive. It is usually a better idea to have a smaller number of larger shards. Update [action.search.shard_count.limit] to a greater value if you really want to query that many shards at the same time.
at throwWithCell (/usr/share/kibana/src/core_plugins/timelion/server/handlers/chain_runner.js:30:11)
at /usr/share/kibana/src/core_plugins/timelion/server/handlers/chain_runner.js:160:13
at arrayEach (/usr/share/kibana/node_modules/lodash/index.js:1289:13)
at Function. (/usr/share/kibana/node_modules/lodash/index.js:3345:13)
at /usr/share/kibana/src/core_plugins/timelion/server/handlers/chain_runner.js:152:9
at bound (domain.js:280:14)
at runBound (domain.js:293:12)
at tryCatcher (/usr/share/kibana/node_modules/bluebird/js/main/util.js:26:23)
at Promise._settlePromiseFromHandler (/usr/share/kibana/node_modules/bluebird/js/main/promise.js:503:31)
at Promise._settlePromiseAt (/usr/share/kibana/node_modules/bluebird/js/main/promise.js:577:18)
at Promise._settlePromises (/usr/share/kibana/node_modules/bluebird/js/main/promise.js:693:14)
at Async._drainQueue (/usr/share/kibana/node_modules/bluebird/js/main/async.js:123:16)
at Async._drainQueues (/usr/share/kibana/node_modules/bluebird/js/main/async.js:133:10)
at Immediate.Async.drainQueues (/usr/share/kibana/node_modules/bluebird/js/main/async.js:15:14)
at runCallback (timers.js:666:20)
at tryOnImmediate (timers.js:639:5)
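(Following what the message itself suggests, I suppose I could temporarily raise the limit with something like this, where 2000 is only an example value:

curl -XPUT 'localhost:9200/_cluster/settings?pretty' -d'
{
  "transient" : {
    "action.search.shard_count.limit" : 2000
  }
}'

but I understand that would only mask the real issue, which is the number of shards.)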
*********Some details about the server:
[root@xxxxxxx ~]# df -h
Filesystem           Size   Used  Avail Use% Mounted on
/dev/mapper/cl-root    97G    40G   58G  42% /
devtmpfs              3,9G      0  3,9G   0% /dev
tmpfs                 3,9G      0  3,9G   0% /dev/shm
tmpfs                 3,9G    17M  3,8G   1% /run
tmpfs                 3,9G      0  3,9G   0% /sys/fs/cgroup
/dev/sda1            1014M   230M  785M  23% /boot
tmpfs                 782M      0  782M   0% /run/user/0
[root@xxxxxxxx ~]# free
total        used        free      shared  buff/cache   available
Mem:        8002828     2283280      132632       16928     5586916     5312056
Swap:       2097148           0     2097148
************Versions used
CentOS 7
Elasticsearch 5.5.0
Logstash 5.5.0
Kibana 5.5.0
******Elasticsearch:
GET Health Status ---------------------------
[root@xxxxxxx ~]# curl 'localhost:9200/_cat/health?v'
epoch      timestamp cluster status node.total node.data shards  pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1517923628 07:27:08  wazuh   yellow          1         1   1411 1411    0    0       10             0                  -                 99.3%
GET Nodes -----------------------------------
[root@xxxxxxx ~]# curl 'localhost:9200/_cat/nodes?v'
ip        heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
127.0.0.1           94          98  23    0.55    0.85     0.92 mdi       *      node-1
GET cluster health --------------------------
[root@xxxxxxx~]# curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'
{
"cluster_name" : "wazuh",
"status" : "yellow",
"timed_out" : false,
"number_of_nodes" : 1,
"number_of_data_nodes" : 1,
"active_primary_shards" : 1411,
"active_shards" : 1411,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 10,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 0,
"active_shards_percent_as_number" : 99.29627023223082
GET cluster health indices --------------------------
[root@xxxxxx ~]# curl -XGET 'http://localhost:9200/_cluster/health?level=indices?pretty=true'
{
"cluster_name":"wazuh",
"status":"yellow",
"timed_out":false,
"number_of_nodes":1,
"number_of_data_nodes":1,
"active_primary_shards":1411,
"active_shards":1411,
"relocating_shards":0,
"initializing_shards":0,
"unassigned_shards":10,
"delayed_unassigned_shards":0,
"number_of_pending_tasks":0,
"number_of_in_flight_fetch":0,
"task_max_waiting_in_queue_millis":0,
"active_shards_percent_as_number":9
GET cluster health shards --------------------------
[root@xxxxxxxx ~]# curl -XGET 'http://localhost:9200/_cluster/health?level=shards?pretty=true'
{
"cluster_name":"wazuh",
"status":"yellow",
"timed_out":false,
"number_of_nodes":1,
"number_of_data_nodes":1,
"active_primary_shards":1411,
"active_shards":1411,
"relocating_shards":0,
"initializing_shards":0,
"unassigned_shards":10,
"delayed_unassigned_shards":0,
"number_of_pending_tasks":0,
"number_of_in_flight_fetch":0,
"task_max_waiting_in_queue_millis":0,
"active_shards_percent_as_number":99.2962702322
From those results I can see that I have a very large number of shards for a single node.
How could I reduce that number of shards, to see whether it fixes the problem? For example, would something along the lines of the sketch below be reasonable?
Or, alternatively, should I create another node on the same server to improve performance?
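(This is only a guess on my part: if each daily Wazuh index is created with the Elasticsearch default of 5 primary shards plus replicas, the shard count grows very quickly on a single node. Assuming the indices follow the wazuh-alerts-* naming pattern, I was thinking of an index template so that new indices are created with 1 primary shard and 0 replicas, and of dropping replicas on the existing indices since there is only one node; all names and values here are just examples:

curl -XPUT 'localhost:9200/_template/wazuh-single-shard?pretty' -d'
{
  "template" : "wazuh-alerts-*",
  "order" : 1,
  "settings" : {
    "number_of_shards" : 1,
    "number_of_replicas" : 0
  }
}'

curl -XPUT 'localhost:9200/_all/_settings?pretty' -d'
{
  "index" : { "number_of_replicas" : 0 }
}'

After that I would delete or close the oldest wazuh-alerts-* indices, for example with Elasticsearch Curator.)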
I would appreciate any help you can give me.
Regards