Hello all,
Some background:
My Elasticsearch cluster is in Azure. I used the Azure template ( https://www.elastic.co/blog/deploying-elasticsearch-on-microsoft-azure ) to initially create the cluster with a Kibana VM, a Logstash VM, 3 Master VMs, and 3 Data Node VMs. After configuration everything worked great and I was taking in thousands of logs/day.
I recently re-ran the template to create another 6 Data Node VMs. The idea was to use the new 6 as part of a hot-warm-cold architecture. I read somewhere that re-running the template would not replace what already exists. This was mostly true.
The script created 6 more nodes, but it also replaced the elasticsearch.yml on each of my previously existing nodes. The script did first back up my original elasticsearch.yml (I also had my own backups), so I was able to quickly get back to where I was.
However, after starting Logstash back up, I now get a bunch of 400 errors:
[2019-10-01T21:05:34,671][ERROR][logstash.outputs.elasticsearch] Encountered a retryable error. Will Retry with exponential backoff {:code=>400, :url=>"http://10.15.1.108:9200/_bulk"}
Scrolling through, I also see this error:
Read timed out {:url=>http://elastic:xxxxxx@10.15.1.108:9200/, :error_message=>"Elasticsearch Unreachable: [http://elastic:xxxxxx@10.15.1.108:9200/][Manticore::SocketTimeout] Read timed out", :error_class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError"}
10.15.1.108 is my load balancer, and it worked before all of these changes. I have since removed all the new VMs from behind the load balancer, but I still get the above error. I've even removed all my Logstash config files save one, and I still get this error.
Hell, I've even tried this:
sudo /usr/share/logstash/bin/logstash -e 'input { stdin { } } output { elasticsearch { hosts=>["10.15.1.108:9200"] user=> "elastic" password=>"**" } }'
And I still get the same error. I have also tried outputting directly to one of the Elasticsearch data nodes, but I get the same error.
Some interesting finds: doing a curl on each of the nodes, I see they all have different UUIDs. Should they be the same?
I also noticed one of the 3 nodes has a UUID of "na".
Could this be the culprit?
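For reference, here is roughly how I compared the UUIDs. This is a sketch with sample responses inlined; the real values come from running `curl -s -u elastic "http://<node-ip>:9200/"` against each node, which returns the node's `cluster_uuid`. (I believe Elasticsearch reports "_na_" when a node has no cluster state, i.e. it never joined a cluster.)

```shell
# Sample GET / responses, one per node (in reality, one curl per node IP):
responses='{"cluster_uuid":"q1w2e3"}
{"cluster_uuid":"q1w2e3"}
{"cluster_uuid":"_na_"}'

# Extract just the cluster_uuid values and list the distinct ones.
# Nodes belonging to one cluster should all report the same value,
# so a healthy cluster prints a single line here.
echo "$responses" | sed 's/.*"cluster_uuid":"\([^"]*\)".*/\1/' | sort -u
```

In my case the nodes print different values, which is what made me suspicious.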
Further information: from my Kibana, if I set the date range to display data from when everything was working, I do get the expected results.
Any help whatsoever would be greatly appreciated!