How to collect data periodically from multiple HTTP endpoints and index the results in Elasticsearch

m.sereda · July 25, 2018, 7:00am

Hi,
Could anyone explain to me how to collect data periodically from multiple HTTP endpoints and index the results in Elasticsearch?

guyboertje · July 25, 2018, 9:27am

Without knowing any more details, all I can say is: read up on the http poller input.
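For orientation, here is a minimal Python sketch of what such a poller does. It is not the http_poller implementation. It queries a set of named URLs once and turns the answers into an Elasticsearch `_bulk` request body. The function names, the `endpoint`/`response`/`error` field names and the index name are assumptions made for illustration.

```python
import json
import urllib.request
from datetime import datetime, timezone


def fetch_json(url, timeout=10):
    """GET a URL and decode the JSON response body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def poll_once(urls, fetch=fetch_json):
    """Query every named endpoint once and tag each result with name and time."""
    now = datetime.now(timezone.utc).isoformat()
    docs = []
    for name, url in urls.items():
        try:
            docs.append({"@timestamp": now, "endpoint": name, "response": fetch(url)})
        except (OSError, ValueError) as exc:  # network failure or invalid JSON
            docs.append({"@timestamp": now, "endpoint": name, "error": str(exc)})
    return docs


def to_bulk_body(index, docs):
    """Build a newline-delimited JSON body for the Elasticsearch _bulk API."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))  # action line
        lines.append(json.dumps(doc))                           # document line
    return "\n".join(lines) + "\n"  # the body has to end with a newline
```

Calling `poll_once` in a loop with `time.sleep(period)` between rounds, and POSTing the body to `http://localhost:9200/_bulk` with `Content-Type: application/x-ndjson`, would complete the picture. Older Elasticsearch releases may also expect a `_type` in the action line.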

m.sereda · July 25, 2018, 9:29am

My metricbeat.yml:

###################### Metricbeat Configuration Example #######################

# This file is an example configuration file highlighting only the most common
# options. The metricbeat.reference.yml file from the same directory contains all the
# supported options with more comments. You can use it as a reference.
#
# You can find the full configuration reference here:
# https://www.elastic.co/guide/en/beats/metricbeat/index.html

#==========================  Modules configuration ============================

metricbeat.modules:
- module: http
  metricsets: ["json"]
  enabled: true
  period: 10s
  hosts: ["localhost:8080"]
  namespace: "metrics"
  path: "/api/session/login"
 # body: ""
  method: POST
  username: "admin"
  password: "pass"
  request.enabled: true
  response.enabled: true
  #json.is_array: false
  #dedot.enabled: false
  #
- module: http
  metricsets: ["json"]
  period: 10s
  hosts: ["localhost:8080"]
  namespace: "metrics"
  path: "/api/dashboard/backup-destination-stats"
 # body: ""
  method: "GET"
  username: "admin"
  password: "vPr0tect"

#metricbeat.modules:
#- module: system
#  metricsets:
#    - cpu
#    - filesystem
#    - memory
#    - network
#    - process
#  enabled: true
#  period: 30s
#  processes: ['.*']
#  cpu_ticks: false
#- module: apache
#  metricsets: ["status"]
#  enabled: true
#  period: 30s
#  hosts: ["http://127.0.0.1:8080"]

  # Glob pattern for configuration loading
  #path: ${path.config}/modules.d/*.yml

  # Set to true to enable config reloading
  #reload.enabled: false

  # Period on which files under path should be checked for changes
  #reload.period: 10s

#==================== Elasticsearch template setting ==========================

setup.template.settings:
  index.number_of_shards: 1
  index.codec: best_compression
  #_source.enabled: false

#================================ General =====================================

# The name of the shipper that publishes the network data. It can be used to group
# all the transactions sent by a single shipper in the web interface.
#name:

# The tags of the shipper are included in their own field with each
# transaction published.
#tags: ["service-X", "web-tier"]

# Optional fields that you can specify to add additional information to the
# output.
#fields:
#  env: staging


#============================== Dashboards =====================================
# These settings control loading the sample dashboards to the Kibana index. Loading
# the dashboards is disabled by default and can be enabled either by setting the
# options here, or by using the `-setup` CLI flag or the `setup` command.
#setup.dashboards.enabled: false

# The URL from where to download the dashboards archive. By default this URL
# has a value which is computed based on the Beat name and version. For released
# versions, this URL points to the dashboard archive on the artifacts.elastic.co
# website.
#setup.dashboards.url:

#============================== Kibana =====================================

# Starting with Beats version 6.0.0, the dashboards are loaded via the Kibana API.
# This requires a Kibana endpoint configuration.
setup.kibana:

  # Kibana Host
  # Scheme and port can be left out and will be set to the default (http and 5601)
  # In case you specify and additional path, the scheme is required: http://localhost:5601/path
  # IPv6 addresses should always be defined as: https://[2001:db8::1]:5601
  host: "localhost:5601"

#============================= Elastic Cloud ==================================

# These settings simplify using metricbeat with the Elastic Cloud (https://cloud.elastic.co/).

# The cloud.id setting overwrites the `output.elasticsearch.hosts` and
# `setup.kibana.host` options.
# You can find the `cloud.id` in the Elastic Cloud web UI.
#cloud.id:

# The cloud.auth setting overwrites the `output.elasticsearch.username` and
# `output.elasticsearch.password` settings. The format is `<user>:<pass>`.
#cloud.auth:

#================================ Outputs =====================================

# Configure what output to use when sending the data collected by the beat.

#-------------------------- Elasticsearch output ------------------------------
#output.elasticsearch:
  # Array of hosts to connect to.
 # hosts: ["localhost:9200"]

  # Optional protocol and basic auth credentials.
  #protocol: "https"
 # username: "elastic"
 # password: "root"

#----------------------------- Logstash output --------------------------------
output.logstash:
  # The Logstash hosts
  hosts: ["localhost:5044"]

  # Optional SSL. By default is off.
  # List of root certificates for HTTPS server verifications
  #ssl.certificate_authorities: ["/etc/pki/root/ca.pem"]

  # Certificate for SSL client authentication
  #ssl.certificate: "/etc/pki/client/cert.pem"

  # Client Certificate Key
  #ssl.key: "/etc/pki/client/cert.key"

#================================ Logging =====================================

# Sets log level. The default log level is info.
# Available log levels are: error, warning, info, debug
#logging.level: debug

# At debug level, you can selectively enable logging only for some components.
# To enable all selectors use ["*"]. Examples of other selectors are "beat",
# "publish", "service".
#logging.selectors: ["*"]

#============================== Xpack Monitoring ===============================
# metricbeat can export internal metrics to a central Elasticsearch monitoring
# cluster.  This requires xpack monitoring to be enabled in Elasticsearch.  The
# reporting is disabled by default.

# Set to true to enable the monitoring reporter.
#xpack.monitoring.enabled: false

# Uncomment to send the metrics to Elasticsearch. Most settings from the
# Elasticsearch output are accepted here as well. Any setting that is not set is
# automatically inherited from the Elasticsearch output configuration, so if you
# have the Elasticsearch output configured, you can simply uncomment the
# following line.
#xpack.monitoring.elasticsearch:

m.sereda · July 25, 2018, 9:30am

My /etc/logstash/conf.d/logstash.conf:

input {
  beats {
    port => "5044"
  }
}

filter {
  if [type] == "syslog" {
    grok {
      match => { "message" => "%{SYSLOGLINE}" }
    }

    date {
      match => [ "timestamp", "MMM  d HH:mm:ss", "MMM dd HH:mm:ss" ]
    }
  }
}

output {
  elasticsearch {
    hosts => "localhost:9200"
    index => "logstash-%{+YYYY.MM.dd}"
    user => "admin"
    password => "vPr0tect"
  }
  stdout {
    codec => rubydebug
  }
}

m.sereda · July 25, 2018, 9:32am

/etc/logstash/logstash.yml

# Settings file in YAML
#
# Settings can be specified either in hierarchical form, e.g.:
#
#   pipeline:
#     batch:
#       size: 125
#       delay: 5
#
# Or as flat keys:
#
#   pipeline.batch.size: 125
#   pipeline.batch.delay: 5
#
# ------------  Node identity ------------
#
# Use a descriptive name for the node:
#
# node.name: test
#
# If omitted the node name will default to the machine's host name
#
# ------------ Data path ------------------
#
# Which directory should be used by logstash and its plugins
# for any persistent needs. Defaults to LOGSTASH_HOME/data
#
path.data: /var/lib/logstash
#
# ------------ Pipeline Settings --------------
#
# The ID of the pipeline.
#
# pipeline.id: main
#
# Set the number of workers that will, in parallel, execute the filters+outputs
# stage of the pipeline.
#
# This defaults to the number of the host's CPU cores.
#
# pipeline.workers: 2
#
# How many events to retrieve from inputs before sending to filters+workers
#
# pipeline.batch.size: 125
#
# How long to wait in milliseconds while polling for the next event
# before dispatching an undersized batch to filters+outputs
#
# pipeline.batch.delay: 50
#
# Force Logstash to exit during shutdown even if there are still inflight
# events in memory. By default, logstash will refuse to quit until all
# received events have been pushed to the outputs.
#
# WARNING: enabling this can lead to data loss during shutdown
#
# pipeline.unsafe_shutdown: false
#
# ------------ Pipeline Configuration Settings --------------
#
# Where to fetch the pipeline configuration for the main pipeline
#
# path.config:
#
# Pipeline configuration string for the main pipeline
#
# config.string:
#
# At startup, test if the configuration is valid and exit (dry run)
#
# config.test_and_exit: false
#
# Periodically check if the configuration has changed and reload the pipeline
# This can also be triggered manually through the SIGHUP signal
#
# config.reload.automatic: false
#
# How often to check if the pipeline configuration has changed (in seconds)
#
# config.reload.interval: 3s
#
# Show fully compiled configuration as debug log message
# NOTE: --log.level must be 'debug'
#
# config.debug: false
#
# When enabled, process escaped characters such as \n and \" in strings in the
# pipeline configuration files.
#
# config.support_escapes: false
#
# ------------ Module Settings ---------------
# Define modules here.  Modules definitions must be defined as an array.
# The simple way to see this is to prepend each `name` with a `-`, and keep
# all associated variables under the `name` they are associated with, and 
# above the next, like this:
#
# modules:
#   - name: MODULE_NAME
#     var.PLUGINTYPE1.PLUGINNAME1.KEY1: VALUE
#     var.PLUGINTYPE1.PLUGINNAME1.KEY2: VALUE
#     var.PLUGINTYPE2.PLUGINNAME1.KEY1: VALUE
#     var.PLUGINTYPE3.PLUGINNAME3.KEY1: VALUE
#
# Module variable names must be in the format of 
#
# var.PLUGIN_TYPE.PLUGIN_NAME.KEY
#
# modules:
#
# ------------ Cloud Settings ---------------
# Define Elastic Cloud settings here.
# Format of cloud.id is a base64 value e.g. dXMtZWFzdC0xLmF3cy5mb3VuZC5pbyRub3RhcmVhbCRpZGVudGlmaWVy
# and it may have an label prefix e.g. staging:dXMtZ...
# This will overwrite 'var.elasticsearch.hosts' and 'var.kibana.host'
# cloud.id: <identifier>
#
# Format of cloud.auth is: <user>:<pass>
# This is optional
# If supplied this will overwrite 'var.elasticsearch.username' and 'var.elasticsearch.password'
# If supplied this will overwrite 'var.kibana.username' and 'var.kibana.password'
# cloud.auth: elastic:<password>
#
# ------------ Queuing Settings --------------
#
# Internal queuing model, "memory" for legacy in-memory based queuing and
# "persisted" for disk-based acked queueing. Defaults is memory
#
# queue.type: memory
#
# If using queue.type: persisted, the directory path where the data files will be stored.
# Default is path.data/queue
#
# path.queue:
#
# If using queue.type: persisted, the page data files size. The queue data consists of
# append-only data files separated into pages. Default is 64mb
#
# queue.page_capacity: 64mb
#
# If using queue.type: persisted, the maximum number of unread events in the queue.
# Default is 0 (unlimited)
#
# queue.max_events: 0
#
# If using queue.type: persisted, the total capacity of the queue in number of bytes.
# If you would like more unacked events to be buffered in Logstash, you can increase the
# capacity using this setting. Please make sure your disk drive has capacity greater than
# the size specified here. If both max_bytes and max_events are specified, Logstash will pick
# whichever criteria is reached first
# Default is 1024mb or 1gb
#
# queue.max_bytes: 1024mb
#
# If using queue.type: persisted, the maximum number of acked events before forcing a checkpoint
# Default is 1024, 0 for unlimited
#
# queue.checkpoint.acks: 1024
#
# If using queue.type: persisted, the maximum number of written events before forcing a checkpoint
# Default is 1024, 0 for unlimited
#
# queue.checkpoint.writes: 1024
#
# If using queue.type: persisted, the interval in milliseconds when a checkpoint is forced on the head page
# Default is 1000, 0 for no periodic checkpoint.
#
# queue.checkpoint.interval: 1000

# ------------ Metrics Settings --------------
#
# Bind address for the metrics REST endpoint
#
# http.host: "127.0.0.1"
#
# Bind port for the metrics REST endpoint, this option also accept a range
# (9600-9700) and logstash will pick up the first available ports.
#
# http.port: 9600-9700
#
# ------------ Debugging Settings --------------
#
# Options for log.level:
#   * fatal
#   * error
#   * warn
#   * info (default)
#   * debug
#   * trace
#
# log.level: info
path.logs: /var/log/logstash

m.sereda · July 25, 2018, 9:36am

I am trying to get data from the endpoint http://localhost:8080/api/dashboard/backup-destination-stats, but to do that I first need to log in (I guess) at http://localhost:8080/api/session/login with user "admin" and password "pass".

If you need more details, please let me know.

guyboertje · July 25, 2018, 9:45am

This is more of a question for metricbeat.

Unless you want to switch to using http_poller.

But generally, this kind of automated query does not work well when it needs an authorised session to be created beforehand. Usually one needs an API key that is provided in each call - Amazon and Github are examples of using API keys.

m.sereda · July 25, 2018, 9:48am

Do you know how to make it easier?
I also tried http_poller, but the result was bad as well. I wrote this in /etc/logstash/conf.d/logstash.conf:

input {
  http_poller {
    urls => {
      test1 => "http://localhost:8080"
      test2 => {
        # Supports all options supported by ruby's Manticore HTTP client
        method => get
        user => "admin"
        password => "pass"
        url => "http://localhost:8080/api/session/login"
        headers => {
          Accept => "application/json"
        }
     }
    }
    request_timeout => 60
    # Supports "cron", "every", "at" and "in" schedules by rufus scheduler
    schedule => { cron => "* * * * * UTC"}
    codec => "json"
    # A hash of request metadata info (timing, response headers, etc.) will be sent here
    metadata_target => "http_poller_metadata"
  }
}

output {
  stdout {
    codec => rubydebug
  }
}

guyboertje · July 25, 2018, 9:54am

I have no idea of your level of knowledge or expertise so, without trying to be offensive, do you know what this means?

Usually one needs an API key that is provided in each call - Amazon and Github are examples of using API keys.

m.sereda · July 25, 2018, 9:55am

To tell the truth, no.

m.sereda · July 25, 2018, 9:58am

I changed my logstash.conf file and now have the following:

input {
  http_poller {
    urls => {
      test1 => "http://localhost:8080"
      test2 => {
        # Supports all options supported by ruby's Manticore HTTP client
        method => get
        user => "admin"
        password => "vPr0tect"
        url => "http://localhost:8080/api/dashboard/backup-destination-stats"
        headers => {
          Accept => "application/json"
        }
     }
    }
    request_timeout => 60
    # Supports "cron", "every", "at" and "in" schedules by rufus scheduler
    schedule => { cron => "* * * * * UTC"}
    codec => "json"
    # A hash of request metadata info (timing, response headers, etc.) will be sent here
    metadata_target => "http_poller_metadata"
  }
}

output {
  stdout {
    codec => rubydebug
  }
}

guyboertje · July 25, 2018, 10:01am

Fair enough.

Take Github.
I have an account. I log in and go to my profile, to the Manage API keys page. I click on the new key link/button, the Github site generates a key and stores it in my profile. When I want an application like Logstash, Beats or Travis CI to access my git repositories, I make sure that the query parameters supply that key and then Github gives that app access without a login session.

Your app on localhost:8080 will have to provide the same facility.
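To make the idea concrete, here is a minimal Python sketch of a request that carries a key in its query parameters. The parameter name `api_key` and the helper name are hypothetical; a real service documents its own parameter or header name, and nothing in this thread says which one the app on localhost:8080 supports, if any.

```python
import urllib.parse
import urllib.request


def build_keyed_request(base_url, api_key, param="api_key"):
    """Return a GET request that carries the key as a query parameter."""
    parts = urllib.parse.urlsplit(base_url)
    query = urllib.parse.parse_qsl(parts.query)  # keep existing parameters
    query.append((param, api_key))               # hypothetical parameter name
    url = urllib.parse.urlunsplit(
        parts._replace(query=urllib.parse.urlencode(query))
    )
    return urllib.request.Request(url, headers={"Accept": "application/json"})
```

Because every call is self-contained, a scheduler can fire such requests at any time without a login step first.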

m.sereda · July 25, 2018, 10:11am

OK thank you for explaining.

Is this the only way to do that? I mean, could I put a login and a password in some configuration file?

guyboertje · July 25, 2018, 10:23am

Using dynamic sessions will only work if metricbeat and http_poller allow another setting, say, auth_url, that establishes a logged-in session beforehand and remembers the cookie to supply it when querying the main urls. I looked at the docs and they don't.

In metricbeat, the bearer_token_file setting may help, but I have no experience of using it.

I will get someone from the Beats team to look at this discussion.
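As a rough illustration of the token-file idea, the Python sketch below reads a token from a file and sends it in an `Authorization: Bearer` header. This is an assumption about how such a setting is typically used, not a description of Metricbeat internals, and whether the application on port 8080 accepts bearer tokens is not established in this thread. The helper name is hypothetical.

```python
import urllib.request
from pathlib import Path


def request_with_bearer_token(url, token_file):
    """Build a GET request authorised by a token stored in a file."""
    # Strip the trailing newline that editors usually leave in the file.
    token = Path(token_file).read_text(encoding="utf-8").strip()
    return urllib.request.Request(
        url,
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/json",
        },
    )
```

Keeping the token in a file rather than in the main configuration lets it be rotated or permission-restricted separately.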

m.sereda · July 25, 2018, 10:27am

I thought that these two settings would handle the login:

user
Value type is string
There is no default value for this setting.
Username to use with HTTP authentication for ALL requests. Note that you can also set this per-URL. If you set this you must also set the password option.

password
Value type is password
There is no default value for this setting.
Password to be used in conjunction with the username for HTTP authentication.

From here https://www.elastic.co/guide/en/logstash/current/plugins-inputs-http_poller.html#plugins-inputs-http_poller-password
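The options quoted above configure HTTP authentication. Assuming the Basic scheme, that means the credentials travel in an `Authorization` header on every request; no login endpoint is called and no session cookie is kept. A minimal Python sketch of how that header value is built (the function name is hypothetical):

```python
import base64


def basic_auth_header(user, password):
    """Return the Authorization header value for HTTP Basic authentication."""
    credentials = f"{user}:{password}".encode("utf-8")
    # Basic auth is just "user:password", base64-encoded; it is not encrypted.
    return "Basic " + base64.b64encode(credentials).decode("ascii")
```

If the server only accepts a session created through its login endpoint, a header like this will not be enough on its own.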

Thank you anyway

guyboertje · July 25, 2018, 10:29am

What is the application running on port 8080?

m.sereda · July 25, 2018, 10:31am

vprotect-server. It is the project I am working on and am supposed to test.

guyboertje · July 25, 2018, 10:41am

You will have to check with them to see how to use BASIC AUTH or an API key/token to get stats.

m.sereda · July 25, 2018, 10:45am

As my mentor said: "First you need to log in with user/password via http://localhost:8080/api/session/login, and after that you can look at the endpoints."

When I do this in Postman, it works.

Now I have the following output

After some changes:

input {
  http_poller {
    urls => {
      test1 => {
        # Supports all options supported by ruby's Manticore HTTP client
        method => "post"
        auth => {
          user => "admin"
          password => "vPr0tect"
        }
        url => "http://localhost:8080/api/session/login"
        headers => {
          Accept => "application/json"
        }
      }

      test2 => {
        # Supports all options supported by ruby's Manticore HTTP client
        method => "get"
        auth => {
          user => "admin"
          password => "vPr0tect"
        }
        url => "http://localhost:8080/api/dashboard/backup-destination-stats"
        headers => {
          Accept => "application/json"
        }
      }
    }
    # Remaining settings assumed unchanged from the earlier configuration
    request_timeout => 60
    schedule => { cron => "* * * * * UTC"}
    codec => "json"
    metadata_target => "http_poller_metadata"
  }
}

output {
  stdout {
    codec => rubydebug
  }
}

m.sereda · July 25, 2018, 2:33pm

One more question: do you know how to set up an agent that uses cookies when it makes requests to APIs?
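Outside Logstash and Beats, a cookie-aware client can be sketched in a few lines of Python with the standard library. This is a minimal sketch under assumptions: the JSON field names `login` and `password` in the login body are hypothetical, since the thread does not show what the login endpoint expects, and the function names are made up for illustration.

```python
import http.cookiejar
import json
import urllib.request


def make_session():
    """Return an opener that stores cookies from responses and resends them."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_login_request(login_url, user, password):
    """Build the POST that opens the session (body field names are assumed)."""
    body = json.dumps({"login": user, "password": password}).encode("utf-8")
    return urllib.request.Request(
        login_url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def fetch_with_session(opener, login_request, data_url, timeout=10):
    """Log in first, then query the data endpoint with the same opener."""
    with opener.open(login_request, timeout=timeout):
        pass  # any Set-Cookie header in the reply is now stored in the jar
    with opener.open(data_url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

A script like this could run from cron and write its result somewhere Logstash or a Beat can pick it up, for example a file. The important part is that the login and the data request share one opener, so the session cookie is carried over.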
