Filebeat eventually stops sending to ES, bulk index 400

JSkier · June 12, 2017, 9:29pm

Hello,

I'm running Elasticsearch, Kibana, and Beats version 5.4.1 on Arch Linux (64-bit), with nginx-mainline 1.13.1 on all nodes, and I'm having trouble getting the Beats to send data to ES consistently.

I'm relatively new to this whole project, so if I left something out or you need a configuration or log file, please let me know.

I have Elasticsearch on a server and a Raspberry Pi with Filebeat on it. The ES server is set up to receive data through an nginx proxy over HTTPS with basic auth. It starts out okay, but after about 10-40 minutes the Filebeat log on the Pi says this:

2017-06-12T16:10:10-05:00 ERR Failed to perform any bulk index operations: 400 Bad Request
2017-06-12T16:10:14-05:00 ERR Connecting error publishing events (retrying): Failed to parse JSON response: invalid character '<' looking for beginning of value
2017-06-12T16:10:20-05:00 ERR Connecting ....

The last error just keeps repeating until I restart Filebeat on the Pi (and then it's usually good for a little bit). The file is valid JSON (one line).
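To rule out the input side, here is a minimal sketch that checks each line of the monitored file parses as one standalone JSON object, which is what `json.keys_under_root` relies on. The helper name `invalid_json_lines` is hypothetical; it just uses Python's standard `json` module.

```python
import json


def invalid_json_lines(lines):
    """Return (line_number, error) pairs for lines that are not JSON objects.

    Hypothetical helper: with json.keys_under_root, each non-empty line of the
    monitored file is expected to hold one complete JSON object.
    """
    problems = []
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # this check ignores blank lines
        try:
            value = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((number, str(exc)))
            continue
        if not isinstance(value, dict):
            problems.append((number, "not a JSON object"))
    return problems
```

On the Pi, passing the opened `/tmp/somefile.json` file object to `invalid_json_lines` and getting `[]` back would confirm the file itself is fine, which points the investigation at the HTTP response side instead.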

Here is the beat configuration:

filebeat.prospectors:
  - input_type: log
    tags: ["json"]
    json.keys_under_root: true
    json.add_error_key: false
    #json.message_key: log
    close_inactive: 12m
    paths:
      - /tmp/somefile.json
output.elasticsearch:
  # Array of hosts to connect to.
  hosts: ["https://eshosthere.com:443"]

  # Optional protocol and basic auth credentials.
  protocol: "https"
  username: "blah"
  password: "blah"
  index: "blah-%{+yyyy.MM.dd}"
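The `index` setting above creates one index per day. A small sketch of how such a pattern might expand, assuming the date is taken from the event timestamp converted to UTC (an assumption, not confirmed in this thread). With the -05:00 offset seen in the Pi's logs, evening events would then land in the next day's index. `daily_index` and `LOCAL` are hypothetical names.

```python
from datetime import datetime, timedelta, timezone

# Offset seen in the Pi's log timestamps (-05:00).
LOCAL = timezone(timedelta(hours=-5))


def daily_index(prefix, timestamp):
    """Approximate the expansion of "<prefix>-%{+yyyy.MM.dd}" for one event.

    Assumption: the date is taken from the event timestamp converted to UTC.
    """
    ts = timestamp.astimezone(timezone.utc)
    return f"{prefix}-{ts:%Y.%m.%d}"


# An event at 16:10 local time and one at 20:30 local time on June 12, 2017.
print(daily_index("blah", datetime(2017, 6, 12, 16, 10, tzinfo=LOCAL)))  # blah-2017.06.12
print(daily_index("blah", datetime(2017, 6, 12, 20, 30, tzinfo=LOCAL)))  # blah-2017.06.13
```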

Here are the nginx-mainline location settings (SSL on port 443), based on a guide I found. I've tried a simpler configuration before, but this seems to work best:

location / {
    rewrite ^/(.*) /$1 break;
    proxy_ignore_client_abort on;
    proxy_pass http://localhost:9200;
    proxy_redirect http://localhost:9200 http://search.somehost.com/;
    proxy_set_header  X-Real-IP  $remote_addr;
    proxy_set_header  X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header  Host $http_host;
    access_log off;
}

andrewkroh · June 12, 2017, 10:24pm

Have you checked the Elasticsearch logs for errors or warnings? How about the nginx proxy?

JSkier · June 12, 2017, 11:29pm

Yes. There's nothing in the Elasticsearch logs, but nginx shows some issues, although the timing doesn't really fit. A few of these showed up hours ago for the bulk index URI (but nothing recently or around the time the output stops):
upstream server temporarily disabled while connecting to upstream

I set close_inactive to 0; that didn't change anything.

Basically, this is just one line of JSON that changes every 5 minutes. Using stdin with Filebeat is too buggy, otherwise I'd use that.

andrewkroh · June 13, 2017, 2:08pm

The error message is related to the response coming back from Elasticsearch. The close_inactive setting influences how files are read, so it will have no impact.

Filebeat expects a specific JSON response to bulk requests. So any sort of error from nginx could cause this issue. I would probably debug this by disabling HTTPS and then doing a packet capture of the HTTP traffic with tcpdump until the error occurs. Then take a look at the packet capture in Wireshark to see the exact HTTP request and response that caused the issue. And go from there.

You could also enable debug logging on the Filebeat side, but I don't think the full bulk requests or responses are logged.
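Once the capture shows the raw response body, a rough classifier like the one below can help tell a real bulk response from a proxy page. It assumes a genuine bulk response is a JSON object containing `errors` and `items`; a body starting with `<` is most likely an HTML error page, which matches Go's `encoding/json` message "invalid character '<' looking for beginning of value". The function name and category labels are hypothetical.

```python
import json


def diagnose_bulk_response(body):
    """Classify a raw _bulk response body (hypothetical helper)."""
    text = body.lstrip()
    if text.startswith("<"):
        return "html"  # e.g. an nginx 400/502 error page, not Elasticsearch
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return "not-json"
    if isinstance(data, dict) and "items" in data and "errors" in data:
        # A real bulk response; "errors" is true if any single item failed.
        return "bulk-errors" if data["errors"] else "bulk-ok"
    return "unexpected-json"
```

If the response paired with the failing request classifies as `html`, the 400 came from the proxy rather than from Elasticsearch.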

JSkier · June 13, 2017, 2:46pm

Thanks, it makes sense to try HTTP and see what shows up.

I'm also having the same issues with Metricbeat, so perhaps it's a connection issue.

EDIT:
I tried changing a few things and settled on TLS on port 9200, and scrapped the subdomains (I think there may be a DNS issue, so I'm working around it for now). This worked for a bit, but now I'm seeing a new error in the Filebeat log after some time:

2017-06-13T13:41:43-05:00 ERR Failed to perform any bulk index operations: Post https://somehost.com:9200/_bulk: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)

Any insight on this (perhaps increasing the timeout somewhere?) is greatly appreciated.

I'm trying port 80 by IP address now and will let that sit for a bit. I'm not sure I can get a SAN for my IP with Let's Encrypt, but if this works, I'll try that next.

steffens · June 14, 2017, 9:19am

This error message indicates the HTTP request publishing events timed out (no response from either ES or some other device). You can adjust the timeout by increasing output.elasticsearch.timeout (default is 90 seconds). It makes me wonder whether the batch request contains too many or too large events, whether ES is not properly sized, or whether some other component is introducing a timeout. When increasing the timeout, also check that your proxies don't have too low a timeout on HTTP requests.
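The Go error "Client.Timeout exceeded while awaiting headers" means the client gave up before any response headers arrived. The self-contained Python sketch below reproduces that situation locally. A stand-in server (the hypothetical `SlowBulkHandler`) waits 1 second before answering a `_bulk` POST, and a client with a shorter timeout gives up first. It only illustrates the mechanism and is not how Filebeat is implemented.

```python
import http.server
import socket
import threading
import time
import urllib.error
import urllib.request


class SlowBulkHandler(http.server.BaseHTTPRequestHandler):
    """Stand-in for a slow Elasticsearch/proxy: answers _bulk after a delay."""

    delay = 1.0  # seconds before any response headers are sent

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)  # consume the bulk body
        time.sleep(self.delay)
        body = b'{"took":1,"errors":false,"items":[]}'
        try:
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        except OSError:
            pass  # the client already gave up and closed the connection

    def log_message(self, *args):
        pass  # keep the demo quiet


def post_bulk(url, timeout):
    """POST a one-document bulk body; return the HTTP status or "timeout"."""
    data = b'{"index":{}}\n{"temp":21.5}\n'
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/x-ndjson"}
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except socket.timeout:
        return "timeout"  # no response headers within `timeout` seconds
    except urllib.error.URLError as exc:
        if isinstance(exc.reason, socket.timeout):
            return "timeout"
        raise


def run_demo(client_timeout):
    """Start the slow server on a free local port and send one request."""
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), SlowBulkHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        url = f"http://127.0.0.1:{server.server_address[1]}/_bulk"
        return post_bulk(url, client_timeout)
    finally:
        server.shutdown()
        server.server_close()
```

`run_demo(0.2)` returns `"timeout"`, while `run_demo(3.0)` returns `200`. The same reasoning applies to output.elasticsearch.timeout: it has to exceed the slowest expected bulk response, and every proxy in between needs a timeout at least as long.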

JSkier · June 14, 2017, 7:48pm

Thanks for the tip, I'll give that a shot.

Anyway, I think I've narrowed this down. It appears Filebeat really wants to use port 9200 and no subdomain, which I can work with, but that's strange. With HTTPS, a subdomain, and port 443, it thinks the data is invalid (I'm guessing the '<' is the start of an HTML error page, so nginx is doing its job).

With port 9200 on the subdomain, it times out occasionally without the timeout setting but eventually catches up. With no subdomain and port 9200, it works pretty well, with only occasional timeouts (it also catches up). I'll try the setting now.

Side note: I'm beefing up the virtual server to see if that helps with the timeouts. Also, the client node is a Pi (version 1), so perhaps there are some shortcomings there. It's basically running a Python script that takes DHT22 temperature data and writes it into a JSON file that Filebeat monitors.
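A minimal sketch of such a script, assuming one JSON object is appended per reading rather than rewriting a single line in place. Because Filebeat tracks a read offset into each file, append-only growth is the pattern its log input is built around. `read_dht22` is a hypothetical stub standing in for the real sensor call, and the field names are illustrative.

```python
import json
import time
from datetime import datetime, timezone


def read_dht22():
    """Hypothetical stand-in for the real sensor read."""
    return 21.5, 40.0  # (temperature in °C, relative humidity in %)


def reading_line(temperature, humidity, when=None):
    """Serialize one reading as a single JSON line (no embedded newlines)."""
    when = when or datetime.now(timezone.utc)
    record = {
        "timestamp": when.isoformat(),
        "temperature": temperature,
        "humidity": humidity,
    }
    return json.dumps(record, separators=(",", ":")) + "\n"


def append_reading(path, line):
    """Append the line so the file only grows past Filebeat's stored offset."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(line)
        f.flush()


def main(path="/tmp/somefile.json", interval=300):
    """Take a reading every `interval` seconds (5 minutes, as in this thread)."""
    while True:
        temperature, humidity = read_dht22()
        append_reading(path, reading_line(temperature, humidity))
        time.sleep(interval)
```

Calling `main()` on the Pi would keep appending readings. Old lines can be rotated away separately instead of truncating the file Filebeat is reading.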

steffens · June 15, 2017, 10:14am

You can configure whichever port number you want. If you don't configure a port number, the default Elasticsearch port (9200) is used. When using port 443, also make sure you set the scheme to HTTPS. As long as the certificate authority for a valid certificate is available, it should work (you can also disable certificate verification in the output).
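The port rule described above is easy to trip over: `https://host` without a port still means port 9200, not 443. A rough sketch of that resolution logic, assuming a missing scheme falls back to the `protocol` setting and a missing port falls back to 9200. `resolve_host` is a hypothetical helper, not part of Beats.

```python
from urllib.parse import urlsplit


def resolve_host(host, protocol="http", default_port=9200):
    """Approximate how an Elasticsearch output host entry is resolved.

    Assumption based on this thread: a missing scheme falls back to
    `protocol`, and a missing port falls back to 9200, even for https.
    """
    if "://" not in host:
        host = f"{protocol}://{host}"
    parts = urlsplit(host)
    port = parts.port or default_port
    return f"{parts.scheme}://{parts.hostname}:{port}"
```

For example, `resolve_host("eshosthere.com", "https")` gives `https://eshosthere.com:9200`, so reaching nginx on 443 requires writing the port explicitly.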

JSkier · June 15, 2017, 5:16pm

Seems to be all good now, going to 9200 with or without a subdomain.

I can test the subdomain and port 443 more, do you have the syntax for that?

I previously set the output hostname to subdomain.somedomain.com:443 and the protocol to https (using basic auth). I think that, because of how the package is set up, Filebeat runs as its own user. Perhaps it's some strange privileged-port issue, although I don't see why sending out over 443 would be an issue for a *nix user.

steffens · June 16, 2017, 10:26am

Any port is OK, but the response must come from Elasticsearch itself. Just configure your host as https://<domain>:<port> and you should be fine. This sounds more like a problem with the proxy.
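One way to confirm that `https://<domain>:<port>` reaches Elasticsearch itself rather than an nginx page is to fetch `/` with the same credentials and inspect the body. Elasticsearch's root endpoint returns JSON that includes `cluster_name` and a `version` object with a `number`. The heuristic below relies on that; `looks_like_elasticsearch` is a hypothetical helper.

```python
import json


def looks_like_elasticsearch(body):
    """Heuristic: does a GET / response body look like Elasticsearch's?

    Elasticsearch answers GET / with JSON containing "cluster_name" and a
    "version" object with a "number"; an HTML page suggests a proxy answered.
    """
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict):
        return False
    version = data.get("version")
    return "cluster_name" in data and isinstance(version, dict) and "number" in version
```

If this returns `False` for the URL configured in `hosts`, Filebeat's bulk requests are being answered by something other than Elasticsearch, which matches the '<' parse error earlier in the thread.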

system · July 14, 2017, 10:26am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
