I am facing a problem with Logstash on a single node. I need to ingest a ~30 MB XML file with the following format:
<ROOT>
  <field1>...</field1>
  <row>
    <fieldA>...</fieldA>
    <fieldB>...</fieldB>
  </row>
  <row>
    <fieldA>...</fieldA>
    <fieldB>...</fieldB>
  </row>
  <!-- ...more <row> elements... -->
</ROOT>
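For context, the pipeline follows the usual multiline-read + xml + split approach; the sketch below is a simplified stand-in (the path, field names and index name are placeholders, not my exact config):

input {
  file {
    path => "/path/to/data.xml"          # placeholder path
    start_position => "beginning"
    sincedb_path => "/dev/null"          # re-read the file on every run (testing only)
    codec => multiline {
      # pattern that never matches, so the whole file is collected into one event
      pattern => "^<never_matches>"
      negate => true
      what => "previous"
      auto_flush_interval => 5
      max_lines => 200000                # raised so a ~30 MB file fits in a single event
      max_bytes => "64 MiB"
    }
  }
}

filter {
  xml {
    source => "message"
    target => "doc"
    store_xml => true
    remove_field => ["message"]          # drop the raw 30 MB payload once parsed
  }
  split {
    field => "[doc][row]"                # one event per <row> element
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "track"                     # placeholder index name
  }
}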
There are no problems if I shrink the file a bit: with an 8 MB version Logstash works correctly. However, if I try to ingest the whole file I start seeing these messages in the Elasticsearch log:
[2018-07-03T17:53:51,870][INFO ][o.e.m.j.JvmGcMonitorService] [RurP6jQ] [gc][125] overhead, spent [262ms] collecting in the last [1s]
[2018-07-03T17:53:54,576][INFO ][o.e.c.m.MetaDataIndexTemplateService] [RurP6jQ] adding template [logstash] for index patterns [track*, planned*]
[2018-07-03T17:54:32,252][INFO ][o.e.m.j.JvmGcMonitorService] [RurP6jQ] [gc][165] overhead, spent [266ms] collecting in the last [1s]
[2018-07-03T17:56:08,744][INFO ][o.e.c.r.a.AllocationService] [RurP6jQ] Cluster health status changed from [RED] to [YELLOW] (reason: [shards started [[track][0]] ...]).
[2018-07-03T17:58:06,301][INFO ][o.e.m.j.JvmGcMonitorService] [RurP6jQ] [gc][378] overhead, spent [294ms] collecting in the last [1s]
[2018-07-03T17:59:22,795][INFO ][o.e.m.j.JvmGcMonitorService] [RurP6jQ] [gc][454] overhead, spent [279ms] collecting in the last [1s]
Meanwhile, Logstash outputs:
[WARN ] 2018-07-03 17:56:01.570 [Ruby-0-Thread-7@[main]>worker2: :1] elasticsearch - Marking url as dead. Last error: [LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError] Elasticsearch Unreachable: [http://localhost:9200/][Manticore::SocketTimeout] Read timed out {:url=>http://localhost:9200/, :error_message=>"Elasticsearch Unreachable: [http://localhost:9200/][Manticore::SocketTimeout] Read timed out", :error_class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError"}
[ERROR] 2018-07-03 17:56:01.606 [Ruby-0-Thread-7@[main]>worker2: :1] elasticsearch - Attempted to send a bulk request to elasticsearch' but Elasticsearch appears to be unreachable or down! {:error_message=>"Elasticsearch Unreachable: [http://localhost:9200/][Manticore::SocketTimeout] Read timed out", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError", :will_retry_in_seconds=>2}
[WARN ] 2018-07-03 17:56:04.169 [Ruby-0-Thread-7@[main]>worker2: :1] elasticsearch - UNEXPECTED POOL ERROR {:e=>#<LogStash::Outputs::ElasticSearch::HttpClient::Pool::NoConnectionAvailableError: No Available connections>}
[ERROR] 2018-07-03 17:56:04.177 [Ruby-0-Thread-7@[main]>worker2: :1] elasticsearch - Attempted to send a bulk request to elasticsearch, but no there are no living connections in the connection pool. Perhaps Elasticsearch is unreachable or down? {:error_message=>"No Available connections", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::NoConnectionAvailableError", :will_retry_in_seconds=>4}
[INFO ] 2018-07-03 17:56:05.030 [Ruby-0-Thread-4: :1] elasticsearch - Running health check to see if an Elasticsearch connection is working {:healthcheck_url=>http://localhost:9200/, :path=>"/"}
[WARN ] 2018-07-03 17:56:05.064 [Ruby-0-Thread-4: :1] elasticsearch - Restored connection to ES instance {:url=>"http://localhost:9200/"}
and so on.
If I don't kill Logstash, it keeps ingesting data. The strange thing is that my file only contains about 30k rows, yet the ingestion doesn't stop (I even found 400k+ indexed documents when I let it run).
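My current guess (an assumption on my part, not something I have verified): when a bulk request times out, Logstash retries it, and because the events carry no explicit document id, every retry indexes fresh copies, which would explain ending up with far more documents than rows. The sketch below shows what I mean by an explicit id, using the fingerprint filter; the field names are just examples:

filter {
  fingerprint {
    source => ["fieldA", "fieldB"]            # example fields; use whatever uniquely identifies a row
    target => "[@metadata][row_id]"
    method => "SHA256"
    concatenate_sources => true
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "track"                          # placeholder index name
    document_id => "%{[@metadata][row_id]}"   # retried bulks would overwrite instead of duplicating
  }
}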
Has anyone ever experienced this?