Help/Advice needed setting up geo-ip filters in an on-prem Logstash to SIEM in Elastic Cloud instance

ronmer · May 20, 2021, 11:51pm

I need help confirming the exact steps required to get geo-ip information to map to the SIEM network map in Kibana.

My environment:
Windows and Linux hosts running Auditbeat, Packetbeat or Winlogbeat
All traffic from hosts is piped to an on-prem Logstash server
The current pipeline.conf file is a single conf file, meaning a single pipeline. (Baby steps)......

What am I trying to accomplish:
I want to add the correct filter bits within my single pipeline that will convert any IP information received to geo-ip information, so that when the detail is output to my Elastic Cloud, that information will be mapped to the SIEM network map without affecting any other log data that is received by that pipeline.

So, Is there an IP address in the log data received into my on-prem Logstash server?

Yes! Then via the filter section, create the geo-ip detail and output that data along with any other included log data to Elastic to be mapped to its respective Kibana dashboard and SIEM network map.

No! then bypass this filter and just send the log data received out to Elastic to the respective Kibana dashboard.

There is ample documentation, however my issue is that a lot of the documentation does not have relevant outside context. You either already understand the context and if you don't, oh well....

I have yet to find within either the Elastic documentation or even a good real world example of a "HOW TO " document on how to set this up from scratch. Specifically if you are running a hybrid installation like mine where you are on-prem logstash piping data out to an Elastic cloud instance.

Any help/advice would be most appreciated.

cknz · May 21, 2021, 4:15am

Hi ronmer, welcome aboard the Logstash train!

Indeed, there are many ways in which to proceed with Logstash and ELK; indeed, Logstash is not even necessary for Beats, Elasticsearch and Kibana these days, but it's one of my favourite parts.

To get started, it's best to start with the grok plugin. I would also recommend you get started using a Docker container, which can simplify things a little.

Pro tip: a nice place to get started with Grok is grokdebug.herokuapp.com (there is also a similar tool within the Kibana developer app).

Most of the time your grok expressions will look much more detailed and be targeted at particular log formats, but you said you just wanted to find any IP....

Now you should be able to follow along with Parsing Logs with Logstash | Logstash Reference [7.12] | Elastic (e.g. the section 'Enhancing Your Data with the Geoip Filter Plugin'), but you'll need a different elasticsearch output configuration to talk to an Elasticsearch cloud instance.

When you're ready to take a bigger next step: Chances are you will have a lot of internal IP addresses you may want to do some enrichment on; you'll perhaps want some geoip-like functionality for that; perhaps know which part of your internal network it came from. In which case, you might like to investigate a custom geoip-like solution that I've made: GitHub - cameronkerrnz/logstash-filter-mmdb: Logstash filter similar to logstash-filter-geoip, but more generalised and using your own bespoke MMDB database

ronmer · May 21, 2021, 9:52pm

Thanks Cameron. This is not what I am looking for. It's interesting and as I learn more about ELK I may very well come back to this.

So let me rephrase the question.

I am looking for a sample logstash configuration file that allows me to take in any beats traffic and be able to filter for the IP address both source and destination so that when I am in my Elastic Cloud instance and I go to the SIEM dashboard I will find on the Network map the GEO locations of where my traffic is coming from and going to.

I only have one pipeline right now first-pipeline.conf. Ultimately I would like to be able to use multiple pipelines so that I can filter out just the packetbeat traffic and map just packetbeat to the Network map in the SIEM dashboard. But for now I will take whatever advice/samples I can get.

The Elastic documentation makes a lot of assumptions about the knowledge level of the user, and I'm not a DEV. All I want is to get the network traffic that shows up in my Beats agents to map to the Network map in the SIEM application. Nothing more, nothing less.

Thanks in advance.

stephenb · May 21, 2021, 11:47pm

Hi @ronmer Welcome to the community.

Before I answer anything I'm going to ask a few more questions.

Let's focus on example Packetbeat

A) First, are you aware that if you just use Packetbeat, correctly configured and set up, and ingest directly into Elasticsearch in Elastic Cloud, it will automatically perform GEO IP processing on all public IPs? The GEO IP processing is based on an open source Geo IP database that is built into Elasticsearch. These Beats use modules or ingest pipelines to call that Geo IP processing.

The Packetbeat data will automatically show up in the SIEM application as well as on the network map by just using the minimal default configurations.

Logstash is not required to do this. This architecture looks like.

Packetbeat -> Elasticsearch in Elastic Cloud

B) That is not to say you cannot use Logstash. Many people do, for an architecture that uses a central collect-and-forward point (Logstash) between an internal network and Elastic Cloud.

As in architecture A), if configured correctly, which is pretty simple, all the Packetbeat data will show up in the SIEM and on the network map with minimal configuration. This configuration would still leverage the internal GEO IP database and processing in Elasticsearch.

None of the GEO IP processing would happen in Logstash.

Packetbeat -> Logstash -> Elasticsearch in Elastic Cloud

So first which architecture are you after A or B.

Second, the open source GEO IP databases (like other publicly available ones) only process public IPs. If you want to do internal IPs, that is a more complicated process; perhaps we can talk about that in a next phase.

Once you pick architecture A or B perhaps I can help you a bit with getting Packetbeat set up and then you can repeat with the other beats.

One thing I often see with people who are new to the Elastic Stack is that they try to customize everything first, instead of using the defaults, getting things up and running, and then learning how to customize. This can be a source of frustration. I highly recommend using the defaults, which is what I will show you first.

Let me know if you want a little help.

What I would highly recommend is to use the Packetbeat quick start guide first and get that running with architecture A; then I'll show you how to do B if you are interested.

This quick start guide is pretty straightforward.

ronmer · May 25, 2021, 7:47pm

Hey Stephen,

Thanks for adding to this thread. Let me answer some of your questions to get you up to speed on where I am at with the ELK thing.

We are Logstash on-prem, and all Beats traffic for the foreseeable future needs to go through our on-prem Logstash server. At this time I have it more or less working, with the exception of this GEO-ip thing.
I did do the trial of Elastic with Beats traffic going directly to the cloud. It was great, and I only wish Logstash on-prem had or would provide the same experience.
With regard to GEO IP and private IPs: I am aware that these will not be mapped. What I am expecting to be mapped is any IP that is a public IP.
Regarding GEO IP: I created a test pipeline for GeoIP using stdin on my Logstash server, and when done I entered 8.8.8.8 at the command line and received a valid response back. Here is the filter section from the test.conf pipeline file:

filter {
  grok { match => ['message', '%{IP:ip}'] }
  geoip {
    source => "ip"
  }
}

when I enter 8.8.8.8 at the command line I received this response back

{
    "geoip" => {
        "latitude" => 37.751,
        "country_name" => "United States",
        "country_code3" => "US",
        "location" => {
            "lon" => -97.822,
            "lat" => 37.751
        },
        "timezone" => "America/Chicago",
        "continent_code" => "NA",
        "ip" => "8.8.8.8",
        "longitude" => -97.822,
        "country_code2" => "US"
    },
    "message" => "8.8.8.8",
    "host" => "edpr-inaplog01",
    "@version" => "1",
    "@timestamp" => 2021-05-21T23:09:01.217Z,
    "ip" => "8.8.8.8"
}

When I enter a private IP I get this expected response

snip <<
[0] "_geoip_lookup_failure"
]
}

So from a Logstash perspective I know I can do the GEO-ip lookup. The question now is how do I get this filter to work correctly from a Packetbeat packet traffic perspective. I am guessing that the grok that I used in my test filter is wholly inadequate for deriving the correct IP information from a Packetbeat packet.

grok { match => ['message', '%{IP:ip}' ] }
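
For context on the question above: Beats events generally do not carry their addresses inside `message`. Packetbeat events normally expose them as ECS fields such as `source.ip` and `destination.ip` (the same fields the geoip-info ingest pipeline later in this thread reads). A minimal sketch of a Logstash-side lookup, assuming those fields are present and written for the Logstash 7.x default (non-ECS-compatibility) behaviour of the geoip filter, might look like the following. The subfield names the Logstash filter writes can differ from those the Elasticsearch geoip processor writes, and newer plugin versions with ECS compatibility enabled choose their own target, so treat this as a starting point rather than a drop-in equivalent.

```
filter {
  # Assumption: the event carries an ECS source.ip field, as Packetbeat events normally do.
  if [source][ip] {
    geoip {
      source => "[source][ip]"
      target => "[source][geo]"
    }
  }
  # Same lookup for the destination side of the flow.
  if [destination][ip] {
    geoip {
      source => "[destination][ip]"
      target => "[destination][geo]"
    }
  }
}
```

The conditionals keep events without those fields (for example, logs from other Beats) from being touched by the lookup at all.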

Also,

When I do confirm that my filter will work with digesting a Packetbeat packet correctly, what do I need to do on the Elastic Cloud side to ensure Elastic knows how to map this detail to my Kibana SIEM Network map dashboard?

You mentioned that Elastic has the GEO-ip ingest plugin in Elastic Cloud enabled automatically? Or did I misinterpret that statement? If I need to enable this plugin, I'm not even sure where I would do this, as a large swath of the documentation that I have been referring to assumes you are running an on-prem Elastic instance. As such, for any Elasticsearch command line references that I want to follow, I do not even know where I would go in the Elastic Cloud instance to make those command line entries. So I am certainly stuck when it comes to understanding where to make changes whenever Elasticsearch is referenced or mentioned.

Any questions and I'm sure you will have a few, just ask.

Thanks.

-Ron

ronmer · May 25, 2021, 7:48pm

@stephenb

Apologies for not adding your name as an @. Please see my comments on this thread in reply to your questions.

stephenb · May 26, 2021, 12:03am

Hi @ronmer

Thanks for the detailed explanation and questions.

Quick Recap:

So you want architecture B. Cool, that is very common.
Packetbeat -> Logstash -> Elasticsearch in Elastic Cloud
With this architecture you do not need to do the GEO IP processing in Logstash if you are using Packetbeat. (I will show you below.)
With Logstash on-prem you can have the exact same experience as you had on cloud by following the simple instructions below, with very little work.
Good, you understand private IPs.
You do not need to enable the GEO IP processor on Elastic Cloud, as it is loaded and enabled by default.

Solution:
So actually there is very little you need to do to make this all work. We will use Logstash as a passthrough and let the Packetbeat module do its work: ECS formatting, templates, GEO IP, Index Lifecycle Management; it will all be taken care of.

Here is what I recommend. Try to resist the urge to make this more complex than it needs to be.

On a single host, perform Steps 1 - 5 on the Packetbeat Quickstart page for Elasticsearch Service.

This will set up Packetbeat and all the associated assets in Elasticsearch and Kibana.
Note: setup only needs to run once, whether you are setting up on 1 host or 1000 hosts; it just loads all the needed artifacts. If you already did all this and you still have the cluster, you don't even need to do it again.
Now in packetbeat.yml, comment out cloud.id and cloud.auth and the output.elasticsearch: section, then configure the output section of Packetbeat to point to Logstash. Now Packetbeat is pointed at your on-prem Logstash.

EDIT : CORRECTED WITH CORRECT PIPELINE SEE BELOW.

    output.logstash:
      # The Logstash hosts
      hosts: ["localhost:5044"]
      pipeline: geoip-info
      ...

Set up Logstash. Below is the logstash-beats-es.conf that will support all the Beats functionality. Logstash simply acts as a passthrough; Packetbeat functionality will magically get passed through.
Start Logstash, then start Packetbeat, and take a look. Data should start to flow exactly as it did when it was Packetbeat to Elastic Cloud direct.
Deploy Packetbeat on other hosts and configure them to point at this Logstash.

Logstash config for Beats passthrough:

################################################
# beats->logstash->es default config.
################################################
input {
  beats {
    port => 5044
  }
}

output {
  if [@metadata][pipeline] {
    elasticsearch {
      cloud_auth => "elastic:password"
      cloud_id => "mycloud:dXMtZWFzdC0xLmF3cy5mb3VuZC5pbyRj......"

      manage_template => false
      index => "%{[@metadata][beat]}-%{[@metadata][version]}"
      pipeline => "%{[@metadata][pipeline]}" 
    }
  } else {
    elasticsearch {
      cloud_auth => "elastic:password"
      cloud_id => "mycloud:dXMtZWFzdC0xLmF3cy5mb3VuZC5pbyRj......"
      manage_template => false
      index => "%{[@metadata][beat]}-%{[@metadata][version]}"
    }
  }
}
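
To check what the conditional above actually sees, one option is a temporary second output that prints each event together with its `@metadata`. This is only a debugging sketch, not part of the original answer; `metadata => true` is an option of the rubydebug codec, and the block would be removed once `[@metadata][beat]`, `[@metadata][version]` and (if it is set) `[@metadata][pipeline]` have been confirmed.

```
output {
  # Temporary: print every event, including @metadata, to Logstash's stdout.
  stdout {
    codec => rubydebug { metadata => true }
  }
}
```

Because `@metadata` is not sent to Elasticsearch, this is one of the few places where those values can be inspected directly.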

ronmer · May 26, 2021, 12:30am

@stephenb

Thanks for this, Stephen. Before I get going on this I do need to clarify something with regard to the .conf file you have included. I am assuming at this time that I can use this to replace my first-pipeline.conf. It would seem that way by looking at the configuration. I only say this because, again, my Logstash configuration is very basic. I have used the logstash.yml to define where my pipeline is located. I do want to eventually use multiple pipelines, but for now I will start with just the one. This is why I need to know that the .conf file you included in this thread can essentially replace the first-pipeline.conf file. I believe the answer is 'Yes' but I would like confirmation on that.

# ------------ Pipeline Settings --------------
#
# The ID of the pipeline.
#
pipeline.id: main
#
# Set the number of workers that will, in parallel, execute the filters+outputs
# stage of the pipeline.
#
# This defaults to the number of the host's CPU cores.
#
pipeline.workers: 2
#
# How many events to retrieve from inputs before sending to filters+workers
#
pipeline.batch.size: 125
#
# How long to wait in milliseconds while polling for the next event
# before dispatching an undersized batch to filters+outputs
#
pipeline.batch.delay: 50
#
# Force Logstash to exit during shutdown even if there are still inflight
# events in memory. By default, logstash will refuse to quit until all
# received events have been pushed to the outputs.
#
# WARNING: enabling this can lead to data loss during shutdown
#
pipeline.unsafe_shutdown: false
#
# Set the pipeline event ordering. Options are "auto" (the default), "true" or "false".
# "auto" will automatically enable ordering if the 'pipeline.workers' setting
# is also set to '1'.
# "true" will enforce ordering on the pipeline and prevent logstash from starting
# if there are multiple workers.
# "false" will disable any extra processing necessary for preserving ordering.
#
pipeline.ordered: auto
#
# ------------ Pipeline Configuration Settings --------------
#
# Where to fetch the pipeline configuration for the main pipeline
#
path.config: "/etc/logstash/conf.d/*.conf"

stephenb · May 26, 2021, 12:35am

Yes, replace your current conf with what I provided.

Be careful when you start to use multiple pipelines. Come back and ask some questions; they may not work the way you expect at first.
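
Until multiple pipelines are in place, a single pipeline can still treat Packetbeat events differently by branching on the `[@metadata][beat]` value that Beats sends along (the same field used to build the index name in the config above). A minimal sketch, in which the tag name is purely illustrative:

```
filter {
  if [@metadata][beat] == "packetbeat" {
    # Hypothetical marker tag; replace with whatever Packetbeat-only processing is needed.
    mutate {
      add_tag => [ "from_packetbeat" ]
    }
  }
}
```

Events from other Beats skip the block entirely, which gives much of the separation of a dedicated pipeline without changing how Logstash is started.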

ronmer · May 26, 2021, 9:36pm

@stephenb

Made said changes and confirmed that I am receiving traffic via Discover.

I'm still not seeing anything in my maps. But before I say "it's not working", I do need to put packetbeat onto one of my external proxy servers.

I did one other thing, and that was to pipe my Palo Alto traffic to a syslog server, using Filebeat and the panw module to move that traffic to Logstash. I wanted to make sure I was getting external IP traffic.

The configuration you supplied is certainly omni directional, if I can use that term. So I would expect my Filebeat traffic with IP detail to produce results.

Not seeing anything in maps and I know that I am passing in traffic from the FW.

When in Discover I am selecting the tags field in both Filebeat and Packetbeat. When a private IP is passed I see this tag

beats_input_codec_plain_applied, _grokparsefailure, _geoip_lookup_failure

When I have a public IP passed the tag looks like this

beats_input_codec_plain_applied

I also added this filter to my first-pipeline.conf file along with your detail

filter {
  grok { match => ['message', '%{IP:ip}'] }
  geoip {
    source => "ip"
  }
}

This does populate the GEO-ip fields in the raw log that you see in Discover for Beats traffic sourced from Filebeat. I have not seen anything in Packetbeat yet, but I'm still testing things with regard to Packetbeat.
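
The `_grokparsefailure` and `_geoip_lookup_failure` tags seen in Discover are what this filter adds when `message` contains no IP or when the address cannot be looked up, as with private addresses. A hedged sketch of one way to avoid tagging those events: suppress the grok failure tag, run geoip only when an `ip` field was extracted, and use the cidr filter to skip the RFC 1918 and loopback ranges. The `private_ip` tag name is an assumption made for illustration.

```
filter {
  grok {
    match => { "message" => "%{IP:ip}" }
    # An empty list means no tag is added when the message contains no IP.
    tag_on_failure => []
  }
  if [ip] {
    cidr {
      address => [ "%{ip}" ]
      network => [ "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16", "127.0.0.0/8" ]
      # Illustrative tag name, not a Logstash default.
      add_tag => [ "private_ip" ]
    }
    # Only public addresses reach the geoip lookup.
    if "private_ip" not in [tags] {
      geoip {
        source => "ip"
      }
    }
  }
}
```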

stephenb · May 27, 2021, 5:04am

Yup, you're right... darned Packetbeat.

I left a step out. You are right, you need to load the geoip-info pipeline yourself. (Other Beats have this built in; not sure why Packetbeat does not. I work a lot with nginx, apache, etc., and the geoip pipeline is built into the module.) The geoip-info pipeline calls the GEOIP processors.

So follow the instructions here.

PUT the ingest pipeline as shown below in Kibana -> Dev Tools; that will install it in Elasticsearch.

PUT _ingest/pipeline/geoip-info
{
  "description": "Add geoip info",
  "processors": [
    {
      "geoip": {
        "field": "client.ip",
        "target_field": "client.geo",
        "ignore_missing": true
      }
    },
    {
      "geoip": {
        "field": "source.ip",
        "target_field": "source.geo",
        "ignore_missing": true
      }
    },
    {
      "geoip": {
        "field": "destination.ip",
        "target_field": "destination.geo",
        "ignore_missing": true
      }
    },
    {
      "geoip": {
        "field": "server.ip",
        "target_field": "server.geo",
        "ignore_missing": true
      }
    },
    {
      "geoip": {
        "field": "host.ip",
        "target_field": "host.geo",
        "ignore_missing": true
      }
    }
  ]
}

Then add the pipeline to the packetbeat.yml output.logstash section

output.logstash:
  # The Logstash hosts
  hosts: ["localhost:5044"]
  pipeline: geoip-info

Then the logstash conf I gave you should pass through the ingest pipeline geoip-info to elasticsearch and you should be in geoip business

With respect to the PA Firewall you will need to do something similar.

stephenb · May 27, 2021, 5:26am

I ran for a while with

packetbeat -> logstash -> elasticsearch in elastic cloud

and the data is showing up in the SIEM App

I did notice that the map on the Packetbeat dashboard, when you look at it in detail, is looking for client.geo.location:

Data source: Clusters and grids

Index pattern: packetbeat-*

Geospatial field: client.geo.location

That field comes from client.ip, which Packetbeat on my Mac does not fill, but I just went to the map and added destination.geo.location, and the data showed up on the map.

system · June 24, 2021, 5:27am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
