How to Parse Log to Get IP Address

Hello
I've been stuck on a problem for 3 days now. I am trying to alert on new IP addresses, but the format of the log file is not helping me.


so I've used a pipeline to parse the log with grok

%{NUMBER:bytes} %{IP:client} %{URIPATH:itstheip} %{GREEDYDATA:le-reste}

and then I've configured filebeat.yml :

output.elasticsearch:
  hosts: ["localhost:9200"]
  pipeline: my_pipeline_id

but now I don't know what to do next, or where I can find the newly parsed logs.

Hi @Dhia_Saibi

Please provide several lines of your raw logs as a text sample; we cannot debug from a screenshot, only from text.

Also please provide your entire ingest pipeline using

GET _ingest/pipeline/my_pipeline_id

IMPORTANT: Also please provide your entire filebeat.yml configuration

What version of the stack?

PUT _ingest/pipeline/disscus-ip
{
  "description": "",
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": [
          "%{NUMBER:response_code} %{IP:client} %{URIPATH:url_path} %{GREEDYDATA:le-reste}"
        ]
      }
    }
  ]
}
  
POST _ingest/pipeline/disscus-ip/_simulate
{
  "docs": [
    {
      "_source": {
        "message": "200 10.15.49.9 /decor 29 1233455 - "
      }
    }
  ]
}

Results

{
  "docs" : [
    {
      "doc" : {
        "_index" : "_index",
        "_id" : "_id",
        "_source" : {
          "response_code" : "200",
          "client" : "10.15.49.9",
          "le-reste" : "29 1233455 - ",
          "message" : "200 10.15.49.9 /decor 29 1233455 - ",
          "url_path" : "/decor"
        },
        "_ingest" : {
          "timestamp" : "2022-05-13T20:19:31.214861Z"
        }
      }
    }
  ]
}

I do not see a problem with the parsing.

You will need to create a mapping if you want the IP field to be of type ip.
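For example, a minimal sketch of such an explicit mapping (the index name my-weblogs is just a placeholder; the field names match the grok pattern above):

```
PUT my-weblogs
{
  "mappings": {
    "properties": {
      "client":        { "type": "ip" },
      "response_code": { "type": "keyword" },
      "url_path":      { "type": "keyword" }
    }
  }
}
```

With client mapped as ip, you can run range queries and CIDR searches (e.g. client: "10.15.44.0/24") instead of plain string matching.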

Hi @stephenb

Thanks for your response

Here are several lines from the log file:

#Webserver Logfile mx10-15-44-177, created 2018-05-10 20:31:38
#HTTP_STATUS CLIENT_IP URL COUNT LAST_ACCESS USER
200 10.15.44.9 /record/current.jpg 1258 1525977098 -
200 10.15.44.9 /admin 3 1525977098 admin
200 10.15.44.9 /decor 29 1525977033 -
200 10.15.44.9 / 6 1525977026 -
200 10.15.44.9 /control 1 1525977024 -
200 10.0.2.9 / 4 1525976905 -
200 10.0.2.9 /decor 18 1525976905 -
200 10.0.2.9 /record/current.jpg 16 1525976905 -
200 10.0.2.9 /control 1 1525976903 -
200 10.15.44.200 /record/current.jpg 128 1525014714 -
200 10.15.44.200 /record/current.jpg 15737 1524933854 -
200 10.15.44.200 / 9 1524932795 -
200 10.15.44.200 /decor 44 1524932795 -
200 10.15.44.200 /control 2 1524932793 -
200 10.15.44.200 /control/control 1 1524932791 -
200 10.15.44.200 /record/current.jpg 878 1524928478 -
200 10.15.44.200 /decor 62 1524928470 -
200 10.15.44.200 / 12 1524928469 -
200 10.15.44.200 /control 3 1524928468 -
200 10.15.44.200 /record/current.jpg 35856 1524766036 -
200 10.15.44.100 /record/current.jpg 50514 1524763478 -
200 10.15.44.100 /decor 223 1524763204 -
200 10.15.44.100 / 48 1524762082 -
200 10.15.44.100 /control 3 1524762081 -
200 10.15.44.100 /control/control 3 1524762078 -
200 10.15.44.200 /decor 19 1524762048 -
200 10.15.44.200 / 3 1524762047 -
200 10.15.44.200 /control 1 1524762046 -
200 10.15.44.200 /control/control 1 1524762043 -
200 10.15.44.100 /admin 8 1524760871 admin
200 10.15.44.100 /control 4 1524757739 -
200 10.15.44.100 /control 1 1524757421 admin
200 10.15.44.100 /cgi-bin 1 1524757400 -
200 10.15.44.100 /control/control 4 1524757399 -
200 10.15.44.100 /control/click.cgi 4 1524757301 -
200 10.15.44.103 /record/current.jpg 532 1524421048 -
200 10.15.44.103 / 4 1524420901 -
200 10.15.44.103 /decor 18 1524420901 -
200 10.15.44.103 /control 1 1524420900 -
200 10.15.44.100 /record/current.jpg 44991 1524420792 -
200 10.15.44.100 / 78 1524420747 -
200 10.15.44.100 /decor 339 1524420747 -
200 10.15.44.100 /control 93 1524420746 -
200 10.15.44.100 /control/control 24 1524420743 -
200 10.15.44.100 /server 75 1524420663 -
200 10.15.44.100 /control/event.jpg 13 1524420660 -
200 10.15.44.100 /control/click.cgi 50 1524420643 -
200 10.15.44.100 /control 1 1524420580 admin
200 10.15.44.100 /control/rcontrol 2 1524420497 -
200 10.15.44.100 /cgi-bin 5 1524420487 -
200 127.0.0.1 /cgi-bin/image.jpg 19 1524420471 -
200 127.0.0.1 /control/event.jpg 9 1524420469 -
200 127.0.0.1 /cgi-bin/rinfo 1 1524420399 -
200 10.15.44.100 /help 15 1524420322 -
200 10.15.44.100 /admin 6 1524419988 admin
200 10.15.44.10 /record/current.jpg 18833 1523305382 -
200 10.15.44.10 /control 31 1523303283 -
200 127.0.0.1 /cgi-bin/image.jpg 11 1523303283 -
200 127.0.0.1 /control/event.jpg 7 1523303274 -
200 10.15.44.10 /decor 174 1523303233 -
200 127.0.0.1 /cgi-bin/rinfo 1 1523303233 -
200 10.15.44.10 / 40 1523303232 -
200 10.15.44.10 /cgi-bin 3 1523303232 -
200 10.15.44.10 /control/event.jpg 2 1523303228 -
200 10.15.44.10 /control/control 7 1523303173 -
200 10.15.44.10 /control/click.cgi 5 1523303072 -
200 10.15.44.10 /admin 3 1523301103 admin
200 192.168.1.39 /admin/remoteconfig 1 1441446977 admin
200 192.168.1.39 /control/click.cgi 8 1441446976 -
200 192.168.1.39 /control/control 6 1441446975 -
200 192.168.1.39 /admin/control 2 1441446975 admin
200 192.168.1.39 /control/event.jpg 43 1441446974 -
200 192.168.1.39 /cgi-bin/rinfo 11 1441446974 -
200 192.168.1.39 /control/faststream.jpg 6 1441446972 -
200 192.168.1.39 /server 155 1441446762 -
200 192.168.1.39 /control 5 1441446761 -
200 192.168.1.39 /control/rotorcgi 1 1441446112 -
200 192.168.1.39 /admin/rcontrol 4 1441446099 admin
200 192.168.1.39 /admin 1 1441446099 admin
200 192.168.1.39 /control/rotorcgi 14 1441383750 -
200 192.168.1.39 /admin/rcontrol 22 1441383745 admin
200 192.168.1.39 /control/click.cgi 61 1441383735 -
200 192.168.1.39 /control/control 50 1441383715 -
200 192.168.1.39 /control/event.jpg 128 1441380830 -
200 192.168.1.39 /cgi-bin/rinfo 31 1441380827 -
200 192.168.1.39 /control/faststream.jpg 19 1441380779 -
200 192.168.1.39 /server 162 1441380705 -
200 192.168.1.39 /control 9 1441380700 -
200 192.168.1.39 /admin/remoteconfig 9 1441380453 admin
200 192.168.1.39 /admin 3 1441377633 admin
200 192.168.1.39 /admin/control 4 1441377564 admin
200 10.15.44.176 /record/current.jpg 19796 1441374288 -
200 10.15.44.176 /admin/rcontrol 1 1441374200 admin
200 10.15.44.176 /control/faststream.jpg 1 1441374198 -
200 10.15.44.176 /cgi-bin/rinfo 1 1441374198 -
200 10.15.44.176 /record/current.jpg 80062 1441362628 -
200 10.15.44.176 / 34 1441360650 -
200 10.15.44.176 /decor 294 1441360650 -
200 10.15.44.176 /control 107 1441360649 -
200 10.15.44.176 /control/control 45 1441360647 -
200 10.15.44.176 /control/click.cgi 46 1441360599 -
200 10.15.44.176 /server 135 1441360540 -
200 10.15.44.176 /control/event.jpg 99 1441360505 -
200 10.15.44.176 /admin/saveconfig 1 1441359589 admin
200 10.15.44.176 /control 4 1441359563 admin

Here is the entire ingest pipeline:

{
  "lol" : {
    "processors" : [
      {
        "grok" : {
          "field" : "message",
          "patterns" : [
            "%{NUMBER:bytes} %{IP:client} %{URIPATH:itstheip} %{GREEDYDATA:le-reste}"
          ],
          "ignore_missing" : true
        }
      }
    ]
  }
}

and here is the entire filebeat.yml configuration:

###################### Filebeat Configuration Example #########################

# This file is an example configuration file highlighting only the most common
# options. The filebeat.reference.yml file from the same directory contains all the
# supported options with more comments. You can use it as a reference.
#
# You can find the full configuration reference here:
# https://www.elastic.co/guide/en/beats/filebeat/index.html

# For more available modules and options, please see the filebeat.reference.yml sample
# configuration file.

# ============================== Filebeat inputs ===============================

filebeat.inputs:

# Each - is an input. Most options can be set at the input level, so
# you can use different inputs for various configurations.
# Below are the input specific configurations.

# filestream is an input for collecting log messages from files.
- type: filestream

  # Change to true to enable this input configuration.
  enabled: true

  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    - /var/log/webserver.log
    #- c:\programdata\elasticsearch\logs\*

  # Exclude lines. A list of regular expressions to match. It drops the lines that are
  # matching any regular expression from the list.
  #exclude_lines: ['^DBG']

  # Include lines. A list of regular expressions to match. It exports the lines that are
  # matching any regular expression from the list.
  #include_lines: ['^ERR', '^WARN']

  # Exclude files. A list of regular expressions to match. Filebeat drops the files that
  # are matching any regular expression from the list. By default, no files are dropped.
  #prospector.scanner.exclude_files: ['.gz$']

  # Optional additional fields. These fields can be freely picked
  # to add additional information to the crawled log files for filtering
  #fields:
  #  level: debug
  #  review: 1

# ============================== Filebeat modules ==============================

filebeat.config.modules:
  # Glob pattern for configuration loading
  path: ${path.config}/modules.d/*.yml

  # Set to true to enable config reloading
  reload.enabled: true

  # Period on which files under path should be checked for changes
  reload.period: 10s

# ======================= Elasticsearch template setting =======================

setup.template.settings:
  index.number_of_shards: 1
  #index.codec: best_compression
  #_source.enabled: false


# ================================== General ===================================

# The name of the shipper that publishes the network data. It can be used to group
# all the transactions sent by a single shipper in the web interface.
#name:

# The tags of the shipper are included in their own field with each
# transaction published.
tags: ["elk"]

# Optional fields that you can specify to add additional information to the
# output.
#fields:
#  env: staging

# ================================= Dashboards =================================
# These settings control loading the sample dashboards to the Kibana index. Loading
# the dashboards is disabled by default and can be enabled either by setting the
# options here or by using the setup command.
setup.dashboards.enabled: true

# The URL from where to download the dashboards archive. By default this URL
# has a value which is computed based on the Beat name and version. For released
# versions, this URL points to the dashboard archive on the artifacts.elastic.co
# website.
#setup.dashboards.url:

# =================================== Kibana ===================================

# Starting with Beats version 6.0.0, the dashboards are loaded via the Kibana API.
# This requires a Kibana endpoint configuration.
setup.kibana:

  # Kibana Host
  # Scheme and port can be left out and will be set to the default (http and 5601)
  # In case you specify and additional path, the scheme is required: http://localhost:5601/path
  # IPv6 addresses should always be defined as: https://[2001:db8::1]:5601
  host: "localhost:5601"

  # Kibana Space ID
  # ID of the Kibana Space into which the dashboards should be loaded. By default,
  # the Default Space will be used.
  #space.id:

# =============================== Elastic Cloud ================================

# These settings simplify using Filebeat with the Elastic Cloud (https://cloud.elastic.co/).

# The cloud.id setting overwrites the output.elasticsearch.hosts and
# setup.kibana.host options.
# You can find the cloud.id in the Elastic Cloud web UI.
#cloud.id:

# The cloud.auth setting overwrites the output.elasticsearch.username and
# output.elasticsearch.password settings. The format is <user>:<pass>.
#cloud.auth:

# ================================== Outputs ===================================

# Configure what output to use when sending the data collected by the beat.

# ---------------------------- Elasticsearch Output ----------------------------
output.elasticsearch:
  # Array of hosts to connect to.
  hosts: ["localhost:9200"]

  # Protocol - either http (default) or https.
  protocol: "http"

  # Authentication credentials - either API key or username/password.
  #api_key: "id:api_key"
  username: "elastic"
  password: "oRBFVPXn0vSLMzWD9hJB"

# ------------------------------ Logstash Output -------------------------------
#output.logstash:
  # The Logstash hosts
  #hosts: ["localhost:5044"]

  # Optional SSL. By default is off.
  # List of root certificates for HTTPS server verifications
  #ssl.certificate_authorities: ["/etc/pki/root/ca.pem"]

  # Certificate for SSL client authentication
  #ssl.certificate: "/etc/pki/client/cert.pem"

  # Client Certificate Key
  #ssl.key: "/etc/pki/client/cert.key"

# ================================= Processors =================================
processors:
  - add_host_metadata:
      when.not.contains.tags: forwarded
  - add_cloud_metadata: ~
  - add_docker_metadata: ~
  - add_kubernetes_metadata: ~

# ================================== Logging ===================================

# Sets log level. The default log level is info.
# Available log levels are: error, warning, info, debug
logging.level: debug

# At debug level, you can selectively enable logging only for some components.
# To enable all selectors use ["*"]. Examples of other selectors are "beat",
# "publisher", "service".
logging.selectors: ["*"]

logging.files:
  path: /var/log/filebeat
  name: filebeat
  keepfiles: 7
  permissions: 0644

# ============================= X-Pack Monitoring ==============================
# Filebeat can export internal metrics to a central Elasticsearch monitoring
# cluster.  This requires xpack monitoring to be enabled in Elasticsearch.  The
# reporting is disabled by default.

# Set to true to enable the monitoring reporter.
#monitoring.enabled: false

# Sets the UUID of the Elasticsearch cluster under which monitoring data for this
# Filebeat instance will appear in the Stack Monitoring UI. If output.elasticsearch
# is enabled, the UUID is derived from the Elasticsearch cluster referenced by output.elasticsearch.
#monitoring.cluster_uuid:

# Uncomment to send the metrics to Elasticsearch. Most settings from the
# Elasticsearch output are accepted here as well.
# Note that the settings should point to your Elasticsearch monitoring cluster.
# Any setting that is not set is automatically inherited from the Elasticsearch
# output configuration, so if you have the Elasticsearch output configured such
# that it is pointing to your Elasticsearch monitoring cluster, you can simply
# uncomment the following line.
#monitoring.elasticsearch:

# ============================== Instrumentation ===============================

# Instrumentation support for the filebeat.
#instrumentation:
    # Set to true to enable instrumentation of filebeat.
    #enabled: false

    # Environment in which filebeat is running on (eg: staging, production, etc.)
    #environment: ""

    # APM Server hosts to report instrumentation results to.
    #hosts:
    #  - http://localhost:8200

    # API Key for the APM Server(s).
    # If api_key is set then secret_token will be ignored.
    #api_key:

    # Secret token for the APM Server(s).
    #secret_token:


# ================================= Migration ==================================

# This allows to enable 6.7 migration aliases
#migration.6_to_7.enabled: true

http.enabled: true
http.port: 5067

Exactly which version of the Elastic Stack?

This is with 8.1.2; it will work with any 8.x.

Proper ingest pipeline
Note: if you use client.ip for the IP address, it will automatically be mapped as an ip data type.

PUT _ingest/pipeline/disscus-ip
{
  "description": "",
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": [
          "%{NUMBER:http_status} %{IP:client.ip} %{URIPATH:url_path} %{NUMBER:count} %{NUMBER:last_access} %{USER:user.name}"
        ]
      }
    }
  ]
}
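To sanity-check the pattern before wiring it into Filebeat, you can run it through the same _simulate API as above, using one of the sample log lines:

```
POST _ingest/pipeline/disscus-ip/_simulate
{
  "docs": [
    { "_source": { "message": "200 10.15.44.9 /admin 3 1525977098 admin" } }
  ]
}
```

This should come back with http_status, client.ip, url_path, count, last_access, and user.name populated. Note that the trailing - on lines with no user also matches %{USER}, so those lines should parse with user.name set to -.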

filebeat.yml
Note where I put the pipeline setting: on the input.

###################### Filebeat Configuration Example #########################

# This file is an example configuration file highlighting only the most common
# options. The filebeat.reference.yml file from the same directory contains all the
# supported options with more comments. You can use it as a reference.
#
# You can find the full configuration reference here:
# https://www.elastic.co/guide/en/beats/filebeat/index.html

# For more available modules and options, please see the filebeat.reference.yml sample
# configuration file.

# ============================== Filebeat inputs ===============================

filebeat.inputs:

# Each - is an input. Most options can be set at the input level, so
# you can use different inputs for various configurations.
# Below are the input specific configurations.

# filestream is an input for collecting log messages from files.
- type: filestream

  # Change to true to enable this input configuration.
  enabled: true

  ##
  ## Set Pipeline on the input  
  ##
  pipeline: disscus-ip

  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    - /Users/sbrown/workspace/sample-data/discuss/simple-ip/test.log
    # - /Users/sbrown/workspace/sample-data/discuss/filebeat-multiline/java-except.log
    # - /Users/sbrown/workspace/sample-data/spring-boot-log/spring-short.log
    # - /Users/sbrown/workspace/sample-data/nginx/nginx2020.log
    #- /var/log/*.log
    #- c:\programdata\elasticsearch\logs\*

  # parsers:
  #   - multiline:
  #       type: pattern
  #       pattern: '^\['
  #       negate: true
  #       match: after
  
  # Exclude lines. A list of regular expressions to match. It drops the lines that are
  # matching any regular expression from the list.
  #exclude_lines: ['^DBG']

  # Include lines. A list of regular expressions to match. It exports the lines that are
  # matching any regular expression from the list.
  #include_lines: ['^ERR', '^WARN']

  # Exclude files. A list of regular expressions to match. Filebeat drops the files that
  # are matching any regular expression from the list. By default, no files are dropped.
  #prospector.scanner.exclude_files: ['.gz$']

  # Optional additional fields. These fields can be freely picked
  # to add additional information to the crawled log files for filtering
  #fields:
  #  level: debug
  #  review: 1

# ============================== Filebeat modules ==============================

filebeat.config.modules:
  # Glob pattern for configuration loading
  path: ${path.config}/modules.d/*.yml

  # Set to true to enable config reloading
  reload.enabled: false

  # Period on which files under path should be checked for changes
  #reload.period: 10s

# ======================= Elasticsearch template setting =======================

setup.template.settings:
  index.number_of_shards: 1
  #index.codec: best_compression
  #_source.enabled: false


# ================================== General ===================================

# The name of the shipper that publishes the network data. It can be used to group
# all the transactions sent by a single shipper in the web interface.
#name:

# The tags of the shipper are included in their own field with each
# transaction published.
#tags: ["service-X", "web-tier"]

# Optional fields that you can specify to add additional information to the
# output.
#fields:
#  env: staging

# ================================= Dashboards =================================
# These settings control loading the sample dashboards to the Kibana index. Loading
# the dashboards is disabled by default and can be enabled either by setting the
# options here or by using the `setup` command.
#setup.dashboards.enabled: false

# The URL from where to download the dashboards archive. By default this URL
# has a value which is computed based on the Beat name and version. For released
# versions, this URL points to the dashboard archive on the artifacts.elastic.co
# website.
#setup.dashboards.url:

# =================================== Kibana ===================================

# Starting with Beats version 6.0.0, the dashboards are loaded via the Kibana API.
# This requires a Kibana endpoint configuration.
setup.kibana:

  # Kibana Host
  # Scheme and port can be left out and will be set to the default (http and 5601)
  # In case you specify and additional path, the scheme is required: http://localhost:5601/path
  # IPv6 addresses should always be defined as: https://[2001:db8::1]:5601
  host: "http://localhost:5601"

  # Kibana Space ID
  # ID of the Kibana Space into which the dashboards should be loaded. By default,
  # the Default Space will be used.
  #space.id:

# =============================== Elastic Cloud ================================

# These settings simplify using Filebeat with the Elastic Cloud (https://cloud.elastic.co/).

# The cloud.id setting overwrites the `output.elasticsearch.hosts` and
# `setup.kibana.host` options.
# You can find the `cloud.id` in the Elastic Cloud web UI.
#cloud.id:

# The cloud.auth setting overwrites the `output.elasticsearch.username` and
# `output.elasticsearch.password` settings. The format is `<user>:<pass>`.
#cloud.auth:

# ================================== Outputs ===================================

# Configure what output to use when sending the data collected by the beat.

# ---------------------------- Elasticsearch Output ----------------------------
output.elasticsearch:
  # Array of hosts to connect to.
  hosts: ["localhost:9200"]

  # Protocol - either `http` (default) or `https`.
  protocol: "https"

  # Authentication credentials - either API key or username/password.
  #api_key: "id:api_key"
  username: "elastic"
  password: "asldkfjasd;lfkjasd"
  ssl.verification_mode : "none"

# output.console:
#   pretty: true

# ------------------------------ Logstash Output -------------------------------
# output.logstash:
#   # The Logstash hosts
#   hosts: ["localhost:5044"]

  # Optional SSL. By default is off.
  # List of root certificates for HTTPS server verifications
  #ssl.certificate_authorities: ["/etc/pki/root/ca.pem"]

  # Certificate for SSL client authentication
  #ssl.certificate: "/etc/pki/client/cert.pem"

  # Client Certificate Key
  #ssl.key: "/etc/pki/client/cert.key"

# ================================= Processors =================================
processors:
  - add_host_metadata:
      when.not.contains.tags: forwarded
  - add_cloud_metadata: ~
  - add_docker_metadata: ~
  - add_kubernetes_metadata: ~

Discover

client.ip is of type ip

The version of the Elastic Stack is 7.17.3.

Unfortunately this didn't work for me with 7.17.3.
I don't know why; I couldn't find the new fields in the Discover window even though I followed all the instructions.

What do the filebeat logs show?

Did you delete the filebeat data stream and start over?

Kibana - Stack Management -Index Management - Data Streams

If you already loaded the file, Filebeat will not load it again, because it keeps track of what it has read.

To reload the file, remove the data directory inside the Filebeat directory:

rm -fr ./data

Then try again

How are you starting Filebeat?

I ran all of mine on 8.2; I will double-check that all of this runs exactly the same in 7.17.3.

I've moved this setting:

pipeline: disscus-ip

from this:

###################### Filebeat Configuration Example #########################

# This file is an example configuration file highlighting only the most common
# options. The filebeat.reference.yml file from the same directory contains all the
# supported options with more comments. You can use it as a reference.
#
# You can find the full configuration reference here:
# https://www.elastic.co/guide/en/beats/filebeat/index.html

# For more available modules and options, please see the filebeat.reference.yml sample
# configuration file.

# ============================== Filebeat inputs ===============================

filebeat.inputs:

# Each - is an input. Most options can be set at the input level, so
# you can use different inputs for various configurations.
# Below are the input specific configurations.

# filestream is an input for collecting log messages from files.
- type: filestream

  # Change to true to enable this input configuration.
  enabled: true

  ##
  ## Set Pipeline on the input  
  ##
  pipeline: disscus-ip

to this:

output.elasticsearch:
  # Array of hosts to connect to.
  hosts: ["localhost:9200"]
  pipeline: disscus-ip

  # Protocol - either `http` (default) or `https`.
  protocol: "https"

and it finally worked


Glad you got it working

Not sure why this did not work for you; I just tried this in 7.17.3 and it worked as well.

filebeat.inputs:

# Each - is an input. Most options can be set at the input level, so
# you can use different inputs for various configurations.
# Below are the input specific configurations.

# filestream is an input for collecting log messages from files.
- type: filestream

  # Change to true to enable this input configuration.
  enabled: true

  pipeline: disscus-ip

  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    - /Users/sbrown/workspace/sample-data/discuss/simple-ip/test.log

The only difference is: if you put the pipeline in the inputs section, each input can have its own pipeline.

If you put the pipeline in the output, it becomes the default pipeline for any input that does not define one itself.

Either way you are good.
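A minimal sketch of the per-input form (the paths, ids, and the second pipeline name are illustrative):

```
filebeat.inputs:
  # Each input can point at its own ingest pipeline.
  - type: filestream
    id: webserver-logs              # ids are illustrative
    enabled: true
    pipeline: disscus-ip
    paths:
      - /var/log/webserver.log
  - type: filestream
    id: other-logs
    enabled: true
    pipeline: some-other-pipeline   # hypothetical second pipeline
    paths:
      - /var/log/other.log

output.elasticsearch:
  hosts: ["localhost:9200"]
  # A pipeline set here would only apply to inputs
  # that do not set one themselves.
```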