Logstash 400

Hello all,

Some background:

My Elasticsearch cluster is in Azure. I used the Azure template ( https://www.elastic.co/blog/deploying-elasticsearch-on-microsoft-azure ) to initially create the cluster with a Kibana VM, a Logstash VM, 3 Master VMs, and 3 Data Node VMs. After configuration everything worked great and I was taking in thousands of logs per day.

I recently re-ran the template to create another 6 Data Node VMs. The idea was to use the six new nodes as part of a hot-warm-cold architecture. I had read somewhere that re-running the template would not replace what already exists. This was mostly true.

The script created 6 more nodes, but it also replaced the elasticsearch.yml on each of my previously existing nodes. The script did first back up my original elasticsearch.yml (I also had my own backups), so I was able to quickly get back to where I was.

However, after starting Logstash back up I now get a bunch of 400 error codes.

[2019-10-01T21:05:34,671][ERROR][logstash.outputs.elasticsearch] Encountered a retryable error. Will Retry with exponential backoff {:code=>400, :url=>"http://10.15.1.108:9200/_bulk"}

Scrolling through, I also see this error:

Read timed out {:url=>http://elastic:xxxxxx@10.15.1.108:9200/, :error_message=>"Elasticsearch Unreachable: [http://elastic:xxxxxx@10.15.1.108:9200/][Manticore::SocketTimeout] Read timed out", :error_class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError"}

10.15.1.108 is my load balancer, and it worked before all of these changes. I have since removed all the new VMs from behind the load balancer, but I still get the above error. I've even removed all my Logstash config files save one and still get this error.
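
In case it helps, this is roughly the kind of check I've been running against the load balancer to see cluster health and which nodes it can actually see (the IP and the elastic user are from my setup; adjust as needed):

curl -u elastic "http://10.15.1.108:9200/_cluster/health?pretty"
curl -u elastic "http://10.15.1.108:9200/_cat/nodes?v"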

Hell, I've even tried this:
sudo /usr/share/logstash/bin/logstash -e 'input { stdin { } } output { elasticsearch { hosts=>["10.15.1.108:9200"] user=> "elastic" password=>"**" } }'

And still get the same error. I have also tried directly outputting to one of the elasticsearch data nodes, but still the same error.

Some interesting finds: doing a curl on each of the nodes, I see they all have different UUIDs. Should they be the same?

I also noticed one of the 3 nodes has a UUID of "na".

Could this be the culprit?
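
To be clear about what I'm comparing: I'm hitting the root endpoint of each node, something like this (placeholder IP, same elastic user as above):

curl -u elastic "http://<data-node-ip>:9200/"

and looking at the cluster_uuid field in the JSON that comes back (I'm assuming that's the field that matters here).
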
Further information: from Kibana, if I set the date range to display data from when everything was working, I do get the expected results.

Any help whatsoever would be greatly appreciated!

I also have no idea why Logstash is doing a _bulk request. I assume this is because I have many nodes?

Logstash always uses the bulk interface because it is more efficient (it can load 125 events with a single call).
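
That 125 comes from Logstash's default batch size, which you can tune in logstash.yml (or with -b on the command line) if you ever need to. These are the defaults:

# logstash.yml -- pipeline batching defaults
pipeline.batch.size: 125
pipeline.batch.delay: 50

It has nothing to do with how many nodes you have; Logstash batches events from its queue and sends each batch as one _bulk request.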

I can't help with the 400 errors.

Some progress! After much checking of things (including finding that a data node is no longer part of the cluster, which is still not fixed), I determined that functionality resumes once I tell Logstash to output to a new/different index.

Previously:
index => "logstash-%{+YYYY.MM.dd}"

Now:
index => "wincollect-%{+YYYY.MM.dd}"

Just that small change in a Logstash config file and I'm back up and running (I have the bad data node turned off now).

Can someone explain to me what's happened here?

If moving to a different index stopped the 400 errors, that suggests to me that the data was incompatible with the index template. Were you getting mapping errors in the Elasticsearch logs?
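
If you want to dig into it, something like this (adjust the host and credentials to your setup) would show the template that logstash-* indices pick up and the mapping the existing index ended up with, so you can look for a field whose type no longer matches the incoming data:

curl -u elastic "http://10.15.1.108:9200/_template/logstash*?pretty"
curl -u elastic "http://10.15.1.108:9200/logstash-*/_mapping?pretty"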

I checked elasticsearch.log on the running nodes and did not find any errors.

Could it be that one of the main (master) nodes was down and that caused the issue?

Thanks for the quick replies!
