Http output to logstash


#1

Hi,

Is there any plan to implement output to logstash over http(s)? I am working on a project that will require all traffic to be over https, so lumberjack is not an option.

At this point I believe my best option will be to run logstash on the clients with a file input and https output, rather than using filebeat, but I am wondering if there is a better way.

Any ideas are appreciated!

Thanks,
Eric


(ruflin) #2

@teampants What about setting the port of the beats input and logstash output to 80 or 443?


(Steffen Siering) #3

Well, it doesn't change the fact it is not HTTP.


(Clay Gorman) #4

http input?

If you want to use Beats, though, you would need to run multiple Logstash instances or build your own custom middleware.


#5

Thanks for the help so far, all.

It does need to be the HTTPS protocol; the port doesn't matter. I was considering using the Logstash http output on the clients and the Logstash http input on the central Logstash cluster, with the central Logstash services writing to Elasticsearch.

Now it occurs to me that if I need to run logstash on the clients anyway I can install my filters there and just have those logstash instances talk to the central elasticsearch cluster directly over https. Does anyone have experience with this kind of set-up?

I would still prefer to use something like filebeat on the clients, on the assumption that filebeat is lighter-weight in terms of process resource requirements than logstash. I don't actually have any data to support that assumption at this point though.

Thanks again,
Eric


(ruflin) #6

@teampants Filebeat is definitely more lightweight than Logstash, but Logstash is much more powerful. The setup you suggest makes sense from my point of view.


(Al) #7

@ruflin I know this is a slightly older thread, but I do have a use case where I'd really like to see an HTTP output for libbeat. With our current multi-tenant setup, all servers ship logs with logstash-forwarder to central Logstash receiving servers (multiple, behind a load balancer), which then output to a Redis list for processing at a later time.

The issue we're having is that logstash-forwarder opens persistent TCP connections, which ends up being a pain when you're connecting through a load balancer, even if you tweak the parameters accordingly. I understand that the lumberjack protocol is potentially more efficient and well built for shipping events, but it doesn't scale as easily or as well as simple HTTP would. If libbeat had the option to send batches of events over HTTP, these requests would be shorter lived and easier to balance evenly across multiple Logstash servers using the HTTP input, or some other custom lightweight HTTP server that could send the data off to Logstash. The other thing to note is that HTTP is quite standard, so it becomes easy for any application to send an event to a Logstash endpoint, in some cases without even requiring the Filebeat shipper. Now you might be wondering why we don't just install Logstash on each machine. That's because we have many servers with very limited resources, and we'd rather have a small lightweight process with a small memory footprint than a large Java process with unnecessary complexity.

I am aware that there would be a cost to using HTTP, such as continuously opening new connections, HTTP protocol overhead (larger message size), etc., but in my situation (and I'm sure others would agree), the benefits of an HTTP output still outweigh the cons. Considering all of this, would the Beats team consider adding an HTTP output module?
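
To make the "batches of events over HTTP" idea above concrete, here is a minimal Python sketch of what a shipper could do. It is not how libbeat works; the function names, the example URL, the batch size, and the choice of newline-delimited JSON as the body format are all assumptions for illustration, and the receiving side would have to be configured to decode whatever format is chosen.

```python
import json
import urllib.request
from typing import Iterator, List


def chunked(events: List[dict], batch_size: int) -> Iterator[List[dict]]:
    """Yield consecutive slices of at most batch_size events."""
    for start in range(0, len(events), batch_size):
        yield events[start:start + batch_size]


def build_batch_requests(events: List[dict], url: str,
                         batch_size: int = 50) -> List[urllib.request.Request]:
    """Build one POST request per batch. Nothing is sent here."""
    requests = []
    for batch in chunked(events, batch_size):
        # One JSON document per line (hypothetical body format).
        body = "\n".join(json.dumps(event) for event in batch).encode("utf-8")
        requests.append(urllib.request.Request(
            url,
            data=body,
            method="POST",
            headers={"Content-Type": "application/x-ndjson"},
        ))
    return requests
```

Each request is independent and short lived, so a load balancer can hand every batch to a different backend. Sending is left to the caller, for example with `urllib.request.urlopen(req, timeout=5)`.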


(Al) #8

Hi @ruflin, I was wondering if there are any updates on this topic? We're about to start receiving logs from thousands more servers simultaneously, so persistent lumberjack connections just aren't realistic at this point. We definitely need an HTTP output for our client shippers so that we can efficiently balance connections across all of our Logstash instances. I'd rather stick with Filebeat if possible, instead of switching to FluentD or Heka.

Thanks again.


(ruflin) #9

Best person to give an update here is @steffens as he was working on lots of improvements.


(Al) #10

@steffens could you provide some feedback on this discussion? Would be great to know if the beats team will implement an HTTP output.


(Sergej Popov) #11

+1 I would really like to see a generic HTTP output implemented. At the moment I have to use Logstash just for forwarding. Any update on this?


(Steffen Siering) #12

An HTTP output, either generic or for Logstash, is not on the roadmap, but it is definitely being discussed internally. The thing is, the input side in Logstash must be backward compatible in order to guarantee some upgrade path. Plus, we encountered some performance problems in the Logstash input that we want to fix first (experiments with HTTP-based transfer have been much worse). Related tickets: 45 and 92. Plus, we're thinking about additional features for Beats that could have an impact on outputs.

The Elasticsearch output uses HTTP. Unfortunately, there is no plugin in Logstash that correctly handles the Elasticsearch bulk API (the es_bulk codec plugin only handles decoding the request, but cannot generate a correct response).
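
For readers unfamiliar with what a "correct response" means here: a bulk request gets back a JSON body with a `took` value, an `errors` flag, and one entry in `items` per submitted document. The Python sketch below builds a minimal body of that shape. The function names are hypothetical, real Elasticsearch responses carry more fields per item, and which fields a given Beats version actually inspects is not stated in this thread, so treat this as an illustration of the shape only.

```python
import json


def count_bulk_items(bulk_body: str) -> int:
    """Count items in a bulk request body.

    Assumes every item is an action line followed by one document line,
    which holds for index/create actions but not for delete.
    """
    lines = [line for line in bulk_body.splitlines() if line.strip()]
    return len(lines) // 2


def build_bulk_response(item_count: int, took_ms: int = 0) -> str:
    """Return a minimal bulk-style response reporting success for every item."""
    items = [{"index": {"status": 201}} for _ in range(item_count)]
    return json.dumps({"took": took_ms, "errors": False, "items": items})
```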


(Halfcrazy) #13

In my case, I found that Filebeat's built-in elasticsearch output works over http/https, so I set the elasticsearch host to my Logstash forwarding host and the port to my logstash-input-http port. That works, and Logstash collected the logs from Filebeat.

Additionally, my Logstash instances are behind an nginx load balancer.

A brief configuration is shown below

filebeat.yml

output:

  ### Logstash as output
  elasticsearch:
    hosts: ["nginx_host:443"]

    protocol: "https"

    # Optional HTTP Path
    path: "/nginx_log"

    tls:
      # List of root certificates for HTTPS server verifications
      certificate_authorities: ["/path/to/nginx.crt"]

      # Certificate for TLS client authentication
      certificate: "/path/to/nginx.crt"

      # Client Certificate Key
      certificate_key: "/path/to/nginx.key"

logstash.yml

input {
    http {
        host => "0.0.0.0"
        port => 9876
    }
}
filter {
    split {}
    ruby {
        init => "@counter = 0"
        code => "event.cancel if @counter % 2 == 0
                 @counter += 1
        "
    }
    json {
        source => "message"
    }
    grok {
        patterns_dir => "./patterns"
        match => {
            "message" => [
                '%{DATESTAMP:time} %{LOGLEVEL:loglevel} %{ANY:tb} client: %{IP:client}, server:%{ANY:server}, request: \"%{ANY:request}\", host: \"%{ANY:hostname}\"',
                '%{DATESTAMP:time} %{LOGLEVEL:loglevel} %{ANY:tb}'
            ]}
    }
}
output {
    elasticsearch {
        hosts => "elasticsearch:9200"
        manage_template => false
        workers => 4
        index => "nginx_log-%{+YYYY.MM.dd}"
    }
    #stdout {
    #    codec => rubydebug
    #}
}

nginx.conf

worker_processes 1;

pid logs/nginx.pid;
error_log logs/error.log notice;

events {
    worker_connections 4096;
}

http {
    # Optimization for ssl
    # http://nginx.org/en/docs/http/configuring_https_servers.html#optimization
    ssl_session_cache   shared:SSL:10m;
    ssl_session_timeout 10m;

    lua_code_cache off;
    lua_package_path "$prefix/lua/?.lua;;";
    access_log logs/access.log combined;

    # Available for over 7000 IPs,
    # while each ip costs 128(value) + 4 ~ 16(key) bytes
    limit_req_zone $binary_remote_addr zone=perip:1m rate=10r/s;

    upstream logstash  {
        server logstash-1:9876;
        server logstash-2:9876;
        server logstash-3:9876;
    }

    server {
        listen 443 ssl;
        keepalive_timeout   70;

        ssl_certificate nginx_test.crt;
        ssl_certificate_key nginx_test.key;

        # Use /nginx_log as filebeat's http path
        # HEAD /
        # Return 200 OK to tell filebeat that I'm listening
        location = /nginx_log {
            return 200;
        }

        # POST /_bulk
        # See https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html
        location = /nginx_log/_bulk {
            if ($request_method != POST ) {
                return 405;
            }
            limit_req zone=perip burst=40;

            proxy_pass http://logstash/;
            proxy_http_version 1.1;
            proxy_redirect off;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header Host $http_host;
            proxy_set_header Connection "Keep-Alive";
            proxy_set_header Proxy-Connection "Keep-Alive";
        }
    }
}
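
The "over 7000 IPs" figure in the `limit_req_zone` comment can be checked with simple arithmetic. This Python sketch uses only the numbers quoted in that comment (128 bytes per value, 4 to 16 bytes per key, a 1m zone) and ignores any allocator overhead inside nginx, so it is a rough estimate rather than an exact capacity.

```python
ZONE_BYTES = 1 * 1024 * 1024  # zone=perip:1m
VALUE_BYTES = 128             # per-entry cost quoted in the config comment


def zone_capacity(key_bytes: int) -> int:
    """Estimate how many client addresses fit in the zone for a given key size."""
    return ZONE_BYTES // (VALUE_BYTES + key_bytes)


ipv4_estimate = zone_capacity(4)   # 4-byte binary IPv4 key
ipv6_estimate = zone_capacity(16)  # 16-byte binary IPv6 key
```

With these inputs the estimate is 7943 entries for 4-byte keys and 7281 for 16-byte keys, which matches the "over 7000" in the comment.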

Note:

  • Filebeat sends two kinds of lines to the output: the first is the action, and the second is the actual data. So in Logstash I drop the first line of each pair.

  • From debugging Filebeat, I found that when Logstash stands in for Elasticsearch, Filebeat expects to receive some data back from Elasticsearch. Because we are using Logstash, we do not actually return the data Filebeat expects. So far I have not found that this hurts anything, but it may cause problems later. Be aware of this potential problem, and if you find a solution, please let me know.
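
The first note describes the bulk body layout: lines come in pairs, an action line followed by the document itself. As a sketch of the same filtering outside Logstash, this Python function pairs the lines up and keeps only the documents. The function name is hypothetical, and it assumes the payload contains only index or create actions, each followed by exactly one document line.

```python
import json
from typing import List


def extract_documents(bulk_body: str) -> List[dict]:
    """Return the document half of each action/document pair."""
    lines = [line for line in bulk_body.splitlines() if line.strip()]
    documents = []
    # Even positions hold action lines, odd positions hold the documents.
    for action_line, source_line in zip(lines[0::2], lines[1::2]):
        action = json.loads(action_line)
        if "index" in action or "create" in action:
            documents.append(json.loads(source_line))
    return documents
```

Checking the action line explicitly, instead of relying on a running counter, means a pair is only kept when its first line really is an index or create action.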


(Halfcrazy) #14

@teampants Hope my solution can help.


(Al) #15

@halfcrazy Thanks for the suggestion. I was aware that you could ship with Filebeat via the elasticsearch output, which is http(s), although I was hoping they'd implement just a basic HTTP output. I decided to do something similar to you, with some slight modifications.

Our Filebeat instances now ship to a load-balanced pool of Nginx nodes, with SSL termination handled by HAProxy. The Nginx nodes run Lua code that extracts the individual events from the bulk payload and decodes the JSON data so we can add additional fields. From there we insert the events directly into a Redis list, which then gets picked up by a Logstash node.

I have yet to do load tests in our production environment, as we're only running in our pre-prod environment for now, but I'm quite confident we'll have some good numbers in terms of throughput and CPU. This should simplify our goal of accepting events from many thousands of hosts and efficiently balancing the connections across these Nginx servers.

@Sergej-Popov, @teampants hopefully you find this solution useful also.
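
The pipeline described in this post (pull events out of the bulk payload, add fields, push each event onto a Redis list) can be sketched in a few lines. The original uses Lua inside Nginx; the Python below is only an illustration of the same steps, not that code. The function name and the `extra_fields` argument are hypothetical, and `push` is an injected callable standing in for the Redis write so the sketch has no external dependencies.

```python
import json
from typing import Callable, Dict


def enrich_and_queue(bulk_body: str,
                     extra_fields: Dict[str, str],
                     push: Callable[[str], None]) -> int:
    """Add extra_fields to every document in a bulk body and hand it to push.

    Assumes the body alternates action line / document line.
    Returns the number of events queued.
    """
    lines = [line for line in bulk_body.splitlines() if line.strip()]
    queued = 0
    for source_line in lines[1::2]:  # odd positions hold the documents
        event = json.loads(source_line)
        event.update(extra_fields)
        push(json.dumps(event))
        queued += 1
    return queued
```

With the redis-py client, `push` could be something like `lambda item: client.rpush("logs", item)`, where the `logs` key name is made up for this example.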


(Al) #16

There's one more thing I failed to mention in my first comment on this topic regarding HAProxy in this scenario. One of the complications is that if you have any new config modifications, you can't simply reload the active process while there are active persistent connections. You end up with an old HAProxy process running indefinitely until the current persistent connections either fail or time out for some reason.

We no longer have this issue when reloading Nginx as the new incoming requests are gradually sent to the new master process until the old running process is eventually shut down once all the current requests have been completed.

