Two Logstash http_poller (API query) configs are outputting documents to each other's indices

I have an odd issue in Logstash where my data is not being routed to the correct index. I have two conf.d config files that each use http_poller as the input and gather data from separate APIs.

Here are my config files:

User API - 02-users-api.conf

input {
  http_poller {
    urls => {
      test => {
        # Supports all options supported by ruby's Manticore HTTP client
        method => get
        url => "https://jsonplaceholder.typicode.com/users"
        headers => {
          Accept => "application/json"
        }
      }
    }
    request_timeout => 60
    # Supports "cron", "every", "at" and "in" schedules by rufus scheduler
    # schedule => { cron => "* * * * * UTC"}
    schedule => { every => "1m" }
    codec => "json"
    # A hash of request metadata info (timing, response headers, etc.) will be sent here
    metadata_target => "http_poller_metadata"
  }
}
filter {
  mutate {
    remove_field => ["http_poller_metadata"]
  }
}
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]  # Replace with your Elasticsearch host and port
    index => "users-01"  # Replace with your desired index name
  }
  # stdout { codec => rubydebug }
}

Here is my second conf.d config file - 01-weather-api.conf

input {
  http_poller {
    urls => {
      test3 => {
        # Supports all options supported by ruby's Manticore HTTP client
        method => get
        url => "https://api.openweathermap.org/data/2.5/weather"
        headers => {
          Accept => "application/json"
        }
        query => {
          lat => "49"
          lon => "81"
          appid => "<API KEY HERE>"
        }
      }
    }
    request_timeout => 60
    # Supports "cron", "every", "at" and "in" schedules by rufus scheduler
    # schedule => { cron => "* * * * * UTC"}
    schedule => { every => "1m" }
    codec => "json"
    # A hash of request metadata info (timing, response headers, etc.) will be sent here
    metadata_target => "http_poller_metadata"
  }
}
filter {
  mutate {
    remove_field => ["http_poller_metadata"]
  }
}
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]  # Replace with your Elasticsearch host and port
    index => "weather-01"  # Replace with your desired index name
  }
  # stdout { codec => rubydebug }
}

I am getting data from each API, but the data from both sources is going to both indices.

I don't understand how this is happening, because the output in each conf.d config file specifies the unique index name that should be associated with that API.

When I test each conf.d file one at a time, it works as expected, but when both are active I get this strange duplication and mixing of data in both indices.

This is the default behaviour: all files inside conf.d are merged into one pipeline configuration, and every filter and output receives events from all inputs.

If you want each file to work as an independent pipeline, you need to configure Logstash to run multiple pipelines using pipelines.yml.

By default, pipelines.yml runs a single pipeline named main that merges all the files inside /etc/logstash/conf.d, so you need to change it.
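For reference, the stock pipelines.yml shipped with the RPM/DEB packages looks roughly like this (exact comments and paths may vary by version and install):

```yaml
# /etc/logstash/pipelines.yml (packaged default, approximately)
- pipeline.id: main
  path.config: "/etc/logstash/conf.d/*.conf"
```

That glob is why both of your .conf files end up concatenated into the single main pipeline.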

Basically you need something like this:

- pipeline.id: pipeline-01
  path.config: "/etc/logstash/conf.d/pipeline-01.conf"
- pipeline.id: pipeline-02
  path.config: "/etc/logstash/conf.d/pipeline-02.conf"
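Adapted to the file names in your question, it would be something like this (pipeline IDs are arbitrary labels; pick whatever you like):

```yaml
# /etc/logstash/pipelines.yml
- pipeline.id: users-api
  path.config: "/etc/logstash/conf.d/02-users-api.conf"
- pipeline.id: weather-api
  path.config: "/etc/logstash/conf.d/01-weather-api.conf"
```

With this in place, each config file gets its own input, filter, and output sections, so events can no longer cross over between the two outputs.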

You are running Logstash as a service, right?


Thanks so much for the detailed explanation! That really helps.

Yes, I am running Logstash as a service on RHEL.

You just need to configure pipelines.yml and restart the service then.
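On RHEL with the packaged service, that would typically look like the following (paths assume the default RPM layout; adjust if your install differs):

```shell
# Optionally validate the configuration first
sudo -u logstash /usr/share/logstash/bin/logstash --config.test_and_exit --path.settings /etc/logstash

# Restart the service so the new pipelines.yml takes effect
sudo systemctl restart logstash
```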