Ingest data to DataStream Through Filebeat

Debasis_Mallick · August 7, 2024, 5:41am

Hi Team,

Could you please advise how we can ingest data into data streams using Filebeat?

Thanks,
Debasis

stephenb · August 7, 2024, 5:54am

Hi @Debasis_Mallick

Which versions of Filebeat and Elasticsearch are you running?

There is an example here

BTW, by default, new versions of Filebeat automatically use data streams.

So what version are you on?

Debasis_Mallick · August 8, 2024, 6:34am

@stephenb In my case both Filebeat and Elasticsearch are version 8.9.2. I am using the filebeat.yml below but am not able to ingest data.

[root@cb-4 filebeat]# cat filebeat.yml

# This file is an example configuration file highlighting only the most common
# options. The filebeat.reference.yml file from the same directory contains all the
# supported options with more comments. You can use it as a reference.
#
# You can find the full configuration reference here:
# https://www.elastic.co/guide/en/beats/filebeat/index.html

# For more available modules and options, please see the filebeat.reference.yml sample
# configuration file.

# ============================== Filebeat inputs ===============================

filebeat.inputs:

# Each - is an input. Most options can be set at the input level, so
# you can use different inputs for various configurations.
# Below are the input specific configurations.

# filestream is an input for collecting log messages from files.

  # Unique ID among all inputs, an ID is required.

  # Change to true to enable this input configuration.

  # Paths that should be crawled and fetched. Glob based paths.

- type: filestream
  id: sfwidxt-06-08-2024
  enabled: true
  paths:
   - /cbdata/elastic/cb4lv1/democsv/*.csv # path to your CSV file
  exclude_lines: ['^\"\"']          # header line

filebeat.registry.flush: 60s

 

# ============================== Filebeat modules ==============================

filebeat.config.modules:
  # Glob pattern for configuration loading
  path: ${path.config}/modules.d/*.yml

  # Set to true to enable config reloading
  reload.enabled: false

  # Period on which files under path should be checked for changes
  #reload.period: 10s

# ======================= Elasticsearch template setting =======================

setup.template.settings:
  index.number_of_shards: 1
  #index.codec: best_compression
  #_source.enabled: false


setup.template.enabled: false


setup.ilm.enabled: false
setup.ilm.policy_name: sfwilm


# ================================== General ===================================

# The name of the shipper that publishes the network data. It can be used to group
# all the transactions sent by a single shipper in the web interface.
#name:

# The tags of the shipper are included in their own field with each
# transaction published.
#tags: ["service-X", "web-tier"]

# Optional fields that you can specify to add additional information to the
# output.
#fields:
#  env: staging

# ================================= Dashboards =================================
# These settings control loading the sample dashboards to the Kibana index. Loading
# the dashboards is disabled by default and can be enabled either by setting the
# options here or by using the `setup` command.
#setup.dashboards.enabled: false

# The URL from where to download the dashboards archive. By default this URL
# has a value which is computed based on the Beat name and version. For released
# versions, this URL points to the dashboard archive on the artifacts.elastic.co
# website.
#setup.dashboards.url:

# =================================== Kibana ===================================

# Starting with Beats version 6.0.0, the dashboards are loaded via the Kibana API.
# This requires a Kibana endpoint configuration.
setup.kibana:

  # Kibana Host
  # Scheme and port can be left out and will be set to the default (http and 5601)
  # In case you specify and additional path, the scheme is required: http://localhost:5601/path
  # IPv6 addresses should always be defined as: https://[2001:db8::1]:5601
  host: "http://xx.xx.xx:5601"

  # Kibana Space ID
  # ID of the Kibana Space into which the dashboards should be loaded. By default,
  # the Default Space will be used.
  #space.id:


# ================================== Outputs ===================================

# Configure what output to use when sending the data collected by the beat.

# ---------------------------- Elasticsearch Output ----------------------------
output.elasticsearch:
  # Array of hosts to connect to.
  hosts: ["https://xx.xx.xx.xx:9200","https://xx.xx.xx.xx:9200"]
  worker: 8
  bulk_max_size: 3000
  # Protocol - either `http` (default) or `https`.
  protocol: "https"


  # Authentication credentials - either API key or username/password.
  # api_key: "id:api_key"
  username: "elastic"
  password: "elastic"
  ssl:
    enabled: true
    certificate_authorities: ["/etc/filebeat/certs/cert.pem"]
# ------------------------------ Logstash Output -------------------------------
#output.logstash:
  # The Logstash hosts
  #hosts: ["localhost:5044"]

  # Optional SSL. By default is off.
  # List of root certificates for HTTPS server verifications
  #ssl.certificate_authorities: ["/etc/pki/root/ca.pem"]

  # Certificate for SSL client authentication
  #ssl.certificate: "/etc/pki/client/cert.pem"

  # Client Certificate Key
  #ssl.key: "/etc/pki/client/cert.key"

# ================================= Processors =================================

# ================================== Logging ===================================

# Sets log level. The default log level is info.

# Available log levels are: error, warning, info, debug
logging:
  level: debug
  to_files: true
  files:
    path: /var/log/filebeat
    name: filebeat
    keepfiles: 7

# At debug level, you can selectively enable logging only for some components.
# To enable all selectors use ["*"]. Examples of other selectors are "beat",
# "publisher", "service".
#logging.selectors: ["*"]

#path.logs: /var/log/filebeat

Please let us know if anything is wrong in the yml file.

Thanks,
Debasis

Debasis_Mallick · August 8, 2024, 9:31am

@stephenb The main issue is that we do not see any error, even with Filebeat logging at debug level. Could you please help me find what is missing?

Thanks,
Debasis

stephenb · August 8, 2024, 2:29pm

Hi @Debasis_Mallick

Can you tell me:

a) What are you trying to accomplish, and which data stream are you trying to write to?

When I see this

setup.template.enabled: false
setup.ilm.enabled: false
setup.ilm.policy_name: sfwilm

With no output index name, I am not sure what you expect... that says send to filebeat-8.9.2, but you instructed Filebeat not to load its own template, so I am not sure what you are trying to do.

> we are not able to see the error even

How did you install (.deb, rpm, etc.) and how are you starting Filebeat?

Have you already loaded the CSV files once? If so, they will not be loaded again unless you clean out the registry database...

So not really sure what you are trying to do / accomplish and where you are at.

My normal suggestion is if you are new... don't change a bunch of settings use the default templates / mappings / index names etc..etc... get it working and then adjust...

Debasis_Mallick · August 9, 2024, 5:51am

Using a data stream, I aim to generate daily indices based on the index template by parsing the CSV file with Filebeat. Every day, after 12:00 AM midnight, it should transition to a new index, and Filebeat should ingest data into that index. Each index's lifespan should not exceed 90 days; hence, on the 91st day, the first index should be automatically removed.

For each load I create new CSV files that have not previously been loaded into Elasticsearch.

I installed Filebeat from the rpm package, and it is started as a systemctl service.

Thanks,
Debasis

stephenb · August 9, 2024, 3:27pm

Hi @Debasis_Mallick

This is how I would do it...

This is a complete working example.

Set up the filebeat.yml below and adjust it for your input, SSL, etc.

You can substitute sfwilm everywhere I have csv.

Run setup:

filebeat setup -e

It will create a template, a data stream, and an ILM policy.
(You can edit the template if you want.)
You probably need an ingest pipeline to parse the CSV... you will need to create it.
Here is the csv processor

filebeat.inputs:

- type: filestream

  # Unique ID among all inputs, an ID is required.
  id: my-filestream-id

  # Change to true to enable this input configuration.
  enabled: true

  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    - /var/log/*.log
    #- c:\programdata\elasticsearch\logs\*

output.elasticsearch:
  hosts: ["http://localhost:9200"]
  index: "csv-data-%{[agent.version]}"
  pipeline: myingestpipeline  # ingest pipeline that parses the CSV

setup.ilm.enabled: true
setup.ilm.check_exists: true
setup.ilm.rollover_alias: csv-data
setup.ilm.pattern: '{now/d}-000001'
setup.ilm.overwrite: false

setup.template.enabled: true  
setup.template.name: "csv-data-%{[agent.version]}" 
setup.template.pattern: "csv-data-%{[agent.version]}" 
setup.template.overwrite: false 
setup.ilm.policy_name: csv-data

Then edit the ILM policy to roll over every day... It will do it at about 12:00 AM UTC.

You can just PUT this or edit it in the UI.

PUT _ilm/policy/csv-data
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "rollover": {
            "max_age": "1d",
            "max_primary_shard_size": "50gb"
          }
        }
      },
      "delete": {
        "min_age": "90d",
        "actions": {
          "delete": {
            "delete_searchable_snapshot": true
          }
        }
      }
    }
  }
}

This will create data stream backing indices that look like this, which will roll over every day:

green open .ds-csv-data-8.9.2-2024.08.09-000001
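For the ingest pipeline mentioned above, a minimal sketch could look like the following. It is an illustration under assumptions, not the pipeline actually used in this thread: the pipeline name myingestpipeline comes from the filebeat.yml above, the column names are taken from the CSV header shared later in the thread, and the header check assumes the first line of each file starts with "sequence,". The second request uses the simulate API to try the pipeline against one sample line before Filebeat sends real data.

```console
# Assumptions: pipeline name from the filebeat.yml above, column names from
# the CSV header posted later in this thread.
PUT _ingest/pipeline/myingestpipeline
{
  "description": "Sketch: split the CSV line held in 'message' into named fields",
  "processors": [
    {
      "drop": {
        "description": "Skip the header row (assumes it starts with 'sequence,')",
        "if": "ctx.message != null && ctx.message.startsWith('sequence,')"
      }
    },
    {
      "csv": {
        "description": "Filebeat ships each CSV row in the 'message' field",
        "field": "message",
        "target_fields": ["sequence", "component", "tenant", "service_id", "session_id", "timestamp", "edr_version"]
      }
    }
  ]
}

# Dry run against one sample row; nothing is indexed
POST _ingest/pipeline/myingestpipeline/_simulate
{
  "docs": [
    { "_source": { "message": "45771,85,INDAT,33,11711802425874023106770,1720264154000,V2" } }
  ]
}
```

The csv processor extracts the values as strings; this sketch leaves type conversion to the index template mappings rather than adding convert or date processors.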

Debasis_Mallick · August 12, 2024, 2:16am

Thanks @stephenb. Let me try, and in case of an error I will post the result.

Thanks,
Debasis

Debasis_Mallick · August 12, 2024, 7:31am

@stephenb I am trying to update the csv-data-8.9.2 template so that it creates the index according to the CSV data we receive from the client.

PUT _index_template/csv-data-8.9.2
{
        "index_patterns": ["csv-data-8.9.2"],
          "settings": {
              "lifecycle": {"name": "csv-data"},
              "refresh_interval": "5s",
              "number_of_shards": "4",
              "number_of_replicas": "1"
          },
          "mappings": {
      "sequence": {
        "type": "keyword"
      },
	  "component":{
	  "type":"integer"
	  },
	  "tenant":{
	  "type":"keyword"
	  },
     "service_id":{
	  "type":"integer"
	  },
	  "session_id":{
	  "type":"keyword"
	  },
	  "timestamp":{
	  "type":"date"
	  },
	  "edr_version":{
	  "type":"keyword"
	  },
	  "action_id":{
	  "type":"integer"
	  }
    },      
        "composed_of": [],
        "priority": 150,
        "data_stream": {
          "hidden": false,
          "allow_custom_routing": false
        }
      }

But I am getting the error below. I double-checked the syntax against the Elastic docs, but that did not help. Any advice on where it is going wrong?

{
  "error": {
    "root_cause": [
      {
        "type": "x_content_parse_exception",
        "reason": "[3:11] [index_template] unknown field [settings]"
      }
    ],
    "type": "x_content_parse_exception",
    "reason": "[3:11] [index_template] unknown field [settings]"
  },
  "status": 400
}

Below is the sample data from csv which I am trying to upload.

> [root@cb-4 democsv]# cat 2024-08-03.csv
> sequence,component,tenant,service_id,session_id,timestamp,edr_version
> 45771,85,INDAT,33,11711802425874023106770,1720264154000,V2
> 45772,86,INDAT,33,11711802425874023106770,1720264154000,V2

Thanks,
Debasis

leandrojmp · August 12, 2024, 11:54am

Which doc did you check? The syntax is not correct.

Check the example here.

The settings and mappings objects need to be nested under template.

Something like this:

    "template": {
        "settings": { your settings },
        "mappings": { your mappings }
    }

Debasis_Mallick · August 13, 2024, 6:17am

@leandrojmp After adding the properties field inside mappings, the syntax error is resolved.

PUT _index_template/csv-data-8.9.2
{
  "index_patterns": ["csv-data-8.9.2"],
  "data_stream": {},
  "template": {
    "settings": {
      "lifecycle": {"name": "csv-data"},
      "refresh_interval": "5s",
      "number_of_shards": "4",
      "number_of_replicas": "1"
    },
    "mappings": {
      "properties": {
        "sequence": {"type": "keyword"},
        "component": {"type": "integer"},
        "tenant": {"type": "keyword"},
        "service_id": {"type": "integer"},
        "session_id": {"type": "keyword"},
        "timestamp": {"type": "date"},
        "edr_version": {"type": "keyword"},
        "action_id": {"type": "integer"}
      }
    }
  }
}

@stephenb Data is still not being ingested into the required index. I can also see that rollover to a new daily index is not happening. Below is the ILM policy definition.

PUT _ilm/policy/csv-data
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "rollover": {
            "max_age": "1d",
            "max_primary_shard_size": "50gb"
          }
        }
      },
      "delete": {
        "min_age": "90d",
        "actions": {
          "delete": {
            "delete_searchable_snapshot": true
          }
        }
      }
    }
  }
}

Thanks,
Debasis

leandrojmp · August 13, 2024, 12:44pm

Please share some evidence of it; the policy alone is not enough.

Which index is your data being written to?

Also, I think there is a little confusion here. With data streams you will not get a new daily index when the day changes, only once 1 day has passed since the creation of the index, as rollover is based on age.

Even if you create the index exactly at midnight, it is not guaranteed to roll over at the same time, as it can take some time for the rollover to trigger.

So, if the index is created in the middle of the day, for example, it will only roll over in the middle of the next day.

Also, please share the entire filebeat.yml you are using now.
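One way to gather that evidence is from Kibana Dev Tools. The requests below are a sketch that assumes the data stream is named csv-data-8.9.2, matching the index pattern in the template shared above. The first lists the backing indices and shows which one is the current write index; the second reports, for each backing index, its ILM phase, action, step, and age.

```console
# Assumption: the data stream is named csv-data-8.9.2
# Backing indices of the data stream and its current generation
GET _data_stream/csv-data-8.9.2

# Lifecycle state (phase, action, step, age) of every backing index
GET .ds-csv-data-8.9.2-*/_ilm/explain
```

ILM evaluates rollover conditions periodically (indices.lifecycle.poll_interval, 10 minutes by default), which is one reason a rollover does not land exactly on the age boundary.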

stephenb · August 13, 2024, 2:14pm

Hi @Debasis_Mallick

In Kibana Dev Tools run

GET _cat/indices/*csv*?v

Show the command and results.

What I provided absolutely works. All you needed to do is add your extra mappings to the template that was created by my code, and everything would work.

And yes, please share your entire filebeat.yml

Debasis_Mallick · August 13, 2024, 2:24pm

health status index                                uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   .ds-csv-data-8.9.2-2024.08.12-000001 LvWjWLhcTSKT7fWjOc1vCQ   1   1          0            0       494b           247b

I already shared the template earlier in this thread. I am sharing it once more for your information.

PUT _index_template/csv-data-8.9.2
{
  "index_patterns": ["csv-data-8.9.2"],
  "data_stream": {},
  "template": {
    "settings": {
      "lifecycle": {"name": "csv-data"},
      "refresh_interval": "5s",
      "number_of_shards": "4",
      "number_of_replicas": "1"
    },
    "mappings": {
      "properties": {
        "sequence": {"type": "keyword"},
        "component": {"type": "integer"},
        "tenant": {"type": "keyword"},
        "service_id": {"type": "integer"},
        "session_id": {"type": "keyword"},
        "timestamp": {"type": "date"},
        "edr_version": {"type": "keyword"},
        "action_id": {"type": "integer"}
      }
    }
  }
}

Please find the filebeat.yml for your reference.

[root@cb-4 ~]# cat /etc/filebeat/filebeat.yml

# This file is an example configuration file highlighting only the most common
# options. The filebeat.reference.yml file from the same directory contains all the
# supported options with more comments. You can use it as a reference.
#
# You can find the full configuration reference here:
# https://www.elastic.co/guide/en/beats/filebeat/index.html

# For more available modules and options, please see the filebeat.reference.yml sample
# configuration file.

# ============================== Filebeat inputs ===============================

filebeat.inputs:

# Each - is an input. Most options can be set at the input level, so
# you can use different inputs for various configurations.
# Below are the input specific configurations.

# filestream is an input for collecting log messages from files.

  # Unique ID among all inputs, an ID is required.

  # Change to true to enable this input configuration.

  # Paths that should be crawled and fetched. Glob based paths.

- type: filestream
  id: my-filestream-id
  enabled: true
  paths:
   - /cbdata/elastic/cb4lv1/democsv1/*.csv # path to your CSV file
  exclude_lines: ['^\"\"']          # header line

filebeat.registry.flush: 60s

  # Exclude lines. A list of regular expressions to match. It drops the lines that are
  # matching any regular expression from the list.
  # Line filtering happens after the parsers pipeline. If you would like to filter lines
  # before parsers, use include_message parser.
  #exclude_lines: ['^DBG']

  # Include lines. A list of regular expressions to match. It exports the lines that are
  # matching any regular expression from the list.
  # Line filtering happens after the parsers pipeline. If you would like to filter lines
  # before parsers, use include_message parser.
  #include_lines: ['^ERR', '^WARN']

  # Exclude files. A list of regular expressions to match. Filebeat drops the files that
  # are matching any regular expression from the list. By default, no files are dropped.
  #prospector.scanner.exclude_files: ['.gz$']

  # Optional additional fields. These fields can be freely picked
  # to add additional information to the crawled log files for filtering
  #fields:
  #  level: debug
  #  review: 1

# ============================== Filebeat modules ==============================

filebeat.config.modules:
  # Glob pattern for configuration loading
  path: ${path.config}/modules.d/*.yml

  # Set to true to enable config reloading
  reload.enabled: false

  # Period on which files under path should be checked for changes
  #reload.period: 10s

# ======================= Elasticsearch template setting =======================

setup.template.settings:
  index.number_of_shards: 1
  #index.codec: best_compression
  #_source.enabled: false

setup.ilm.enabled: true
setup.ilm.check_exists: true
setup.ilm.rollover_alias: csv-data
setup.ilm.pattern: '{now/d}-000001'
setup.ilm.overwrite: false

setup.template.enabled: true
setup.template.name: "csv-data-%{[agent.version]}"
setup.template.pattern: "csv-data-%{[agent.version]}"
setup.template.overwrite: false
setup.ilm.policy_name: csv-data




# ================================== General ===================================

# The name of the shipper that publishes the network data. It can be used to group
# all the transactions sent by a single shipper in the web interface.
#name:

# The tags of the shipper are included in their own field with each
# transaction published.
#tags: ["service-X", "web-tier"]

# Optional fields that you can specify to add additional information to the
# output.
#fields:
#  env: staging

# ================================= Dashboards =================================
# These settings control loading the sample dashboards to the Kibana index. Loading
# the dashboards is disabled by default and can be enabled either by setting the
# options here or by using the `setup` command.
#setup.dashboards.enabled: false

# The URL from where to download the dashboards archive. By default this URL
# has a value which is computed based on the Beat name and version. For released
# versions, this URL points to the dashboard archive on the artifacts.elastic.co
# website.
#setup.dashboards.url:

# =================================== Kibana ===================================

# Starting with Beats version 6.0.0, the dashboards are loaded via the Kibana API.
# This requires a Kibana endpoint configuration.
setup.kibana:

  # Kibana Host
  # Scheme and port can be left out and will be set to the default (http and 5601)
  # In case you specify and additional path, the scheme is required: http://localhost:5601/path
  # IPv6 addresses should always be defined as: https://[2001:db8::1]:5601
  host: "http://10.10.17.54:5601"

  # Kibana Space ID
  # ID of the Kibana Space into which the dashboards should be loaded. By default,
  # the Default Space will be used.
  #space.id:

# =============================== Elastic Cloud ================================

# These settings simplify using Filebeat with the Elastic Cloud (https://cloud.elastic.co/).

# The cloud.id setting overwrites the `output.elasticsearch.hosts` and
# `setup.kibana.host` options.
# You can find the `cloud.id` in the Elastic Cloud web UI.
#cloud.id:

# The cloud.auth setting overwrites the `output.elasticsearch.username` and
# `output.elasticsearch.password` settings. The format is `<user>:<pass>`.
#cloud.auth:

# ================================== Outputs ===================================

# Configure what output to use when sending the data collected by the beat.

# ---------------------------- Elasticsearch Output ----------------------------
output.elasticsearch:
  # Array of hosts to connect to.
  hosts: ["https://10.10.18.174:9200","https://10.10.18.215:9200"]
  worker: 8
  bulk_max_size: 3000
  # Protocol - either `http` (default) or `https`.
  protocol: "https"
  index: "csv-data-%{[agent.version]}"
  pipeline: parse_elastic_data_v2

  # Authentication credentials - either API key or username/password.
  # api_key: "id:api_key"
  username: "elastic"
  password: "elastic"
  ssl:
    enabled: true
    certificate_authorities: ["/etc/filebeat/certs/cert.pem"]
# ------------------------------ Logstash Output -------------------------------
#output.logstash:
  # The Logstash hosts
  #hosts: ["localhost:5044"]

  # Optional SSL. By default is off.
  # List of root certificates for HTTPS server verifications
  #ssl.certificate_authorities: ["/etc/pki/root/ca.pem"]

  # Certificate for SSL client authentication
  #ssl.certificate: "/etc/pki/client/cert.pem"

  # Client Certificate Key
  #ssl.key: "/etc/pki/client/cert.key"

# ================================= Processors =================================

# ================================== Logging ===================================

# Sets log level. The default log level is info.

# Available log levels are: error, warning, info, debug
logging:
  level: debug
  to_files: true
  files:
    path: /var/log/filebeat
    name: filebeat
    keepfiles: 7

# At debug level, you can selectively enable logging only for some components.
# To enable all selectors use ["*"]. Examples of other selectors are "beat",
# "publisher", "service".
#logging.selectors: ["*"]

#path.logs: /var/log/filebeat

# ============================= X-Pack Monitoring ==============================
# Filebeat can export internal metrics to a central Elasticsearch monitoring
# cluster.  This requires xpack monitoring to be enabled in Elasticsearch.  The
# reporting is disabled by default.

# Set to true to enable the monitoring reporter.
#monitoring.enabled: false

monitoring:
  enabled: true
  elasticsearch:
    username: beats_system
    password: beats_system

# Sets the UUID of the Elasticsearch cluster under which monitoring data for this
# Filebeat instance will appear in the Stack Monitoring UI. If output.elasticsearch
# is enabled, the UUID is derived from the Elasticsearch cluster referenced by output.elasticsearch.
#monitoring.cluster_uuid:

# Uncomment to send the metrics to Elasticsearch. Most settings from the
# Elasticsearch output are accepted here as well.
# Note that the settings should point to your Elasticsearch *monitoring* cluster.
# Any setting that is not set is automatically inherited from the Elasticsearch
# output configuration, so if you have the Elasticsearch output configured such
# that it is pointing to your Elasticsearch monitoring cluster, you can simply
# uncomment the following line.
#monitoring.elasticsearch:

# ============================== Instrumentation ===============================

# Instrumentation support for the filebeat.
#instrumentation:
    # Set to true to enable instrumentation of filebeat.
    #enabled: false

    # Environment in which filebeat is running on (eg: staging, production, etc.)
    #environment: ""

    # APM Server hosts to report instrumentation results to.
    #hosts:
    #  - http://localhost:8200

    # API Key for the APM Server(s).
    # If api_key is set then secret_token will be ignored.
    #api_key:

    # Secret token for the APM Server(s).
    #secret_token:


# ================================= Migration ==================================

# This allows to enable 6.7 migration aliases
#migration.6_to_7.enabled: true

Thanks,
Debasis

stephenb · August 13, 2024, 3:05pm

It looks like you have not actually written any data.

So that first index was created when you ran setup

.ds-csv-data-8.9.2-2024.08.12-000001

You can see there are no documents in it

If you were to write documents today they would go into

.ds-csv-data-8.9.2-2024.08.13-000002

So to me it looks like you are not actually reading any files and shipping them, or you are dropping every line...

or your ingest pipeline is failing..

or you have mapping errors...

What I would suggest is to use the tar.gz distribution of Filebeat:

Find a directory you can work in

Download it
Untar it
Put your config in
Run Filebeat in the foreground
$ ./filebeat -e
If you run this and wait about a minute, one of the log lines will show you exactly how many events you shipped, acked, etc.

or even turn on all debug
$ ./filebeat -e -d "*"

See what you see... I think you are not reading any files, or you already read them, in which case you need to clean out the Filebeat data registry.

Debasis_Mallick · August 13, 2024, 4:04pm

@stephenb Is there any way to upload the Filebeat log, since it is almost 3 MB? Below is a snippet which clearly shows that Filebeat is able to read the records but not able to ingest them, and no error is printed either.

{"log.level":"debug","@timestamp":"2024-08-13T10:51:37.139+0530","log.logger":"processors","log.origin":{"file.name":"processing/processors.go","file.line":213},"message":"Publish event: {\n  \"@timestamp\": \"2024-08-13T05:21:37.138Z\",\n  \"@metadata\": {\n    \"beat\": \"filebeat\",\n    \"type\": \"_doc\",\n    \"version\": \"8.9.2\"\n  },\n  \"log\": {\n    \"offset\": 129,\n    \"file\": {\n      \"path\": \"/cbdata/elastic/cb4lv1/democsv1/2024-08-12.csv\"\n    }\n  },\n  \"message\": \"45784,86,INDAT,33,11711802425874023106770,1720264154000,V2\",\n  \"input\": {\n    \"type\": \"filestream\"\n  },\n  \"ecs\": {\n    \"version\": \"8.0.0\"\n  },\n  \"host\": {\n    \"name\": \"cb-4\"\n  },\n  \"agent\": {\n    \"version\": \"8.9.2\",\n    \"ephemeral_id\": \"2c2281ad-a804-4c8b-bf6a-bc627d102e1b\",\n    \"id\": \"9afb3b58-4756-4969-98bf-5d2487aef70f\",\n    \"name\": \"cb-4\",\n    \"type\": \"filebeat\"\n  }\n}","service.name":"filebeat","ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2024-08-13T10:51:37.139+0530","log.logger":"input.filestream","log.origin":{"file.name":"filestream/filestream.go","file.line":131},"message":"End of file reached: /cbdata/elastic/cb4lv1/democsv1/2024-08-12.csv; Backoff now.","service.name":"filebeat","id":"my-filestream-id","source_file":"filestream::my-filestream-id::native::9437223-64513","path":"/cbdata/elastic/cb4lv1/democsv1/2024-08-12.csv","state-id":"native::9437223-64513","ecs.version":"1.6.0"}

Thanks,
Debasis

stephenb · August 13, 2024, 7:24pm

If you ingested it once, it will not be reingested; Filebeat keeps track of that.

Take out the pipeline in the output section. You did not share your ingest pipeline, BTW.

Clean up / DELETE the data stream.

Re-run setup.

Clean up the registry in the data directory: remove everything in

/var/lib/filebeat

Then run again.

You can always use Pastebin to share a file.
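The Elasticsearch side of this reset can be sketched in Kibana Dev Tools as below. The data stream name csv-data-8.9.2 is an assumption based on the index listing earlier in the thread. Deleting a data stream also deletes its backing indices and their documents, so this only makes sense on test data. Stop Filebeat first; the registry cleanup and filebeat setup -e still happen on the Filebeat host.

```console
# Assumption: the data stream is named csv-data-8.9.2
# Deletes the data stream together with all of its backing indices
DELETE _data_stream/csv-data-8.9.2

# After re-running setup and restarting Filebeat, check that documents arrive
GET _cat/indices/*csv*?v
GET csv-data-8.9.2/_count
```

If the count stays at 0 while the Filebeat log shows published events, the ingest pipeline or the mappings are the next things to check.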

Debasis_Mallick · August 14, 2024, 9:16am

Thanks @stephenb, after a minor correction in the pipeline I am able to ingest records into the index. I will monitor whether rollover is happening or not.
Could you please help me understand the parameter below that you suggested for filebeat.yml?
setup.ilm.pattern: '{now/d}-000001'

Thanks,
Debasis

Debasis_Mallick · August 16, 2024, 12:12am

@leandrojmp Can we not tweak the Filebeat yml file to create each day's index at midnight? We checked with the client, and they will post those CSV files in the output directory only after 12 AM.

Thanks,
Debasis

Debasis_Mallick · August 19, 2024, 1:21pm

@stephenb While observing the indices, we found that they do not roll over if no CSV files are available in the paths folder mentioned in filebeat.yml. Is this expected behavior?

- type: filestream
  id: my-filestream-id
  enabled: true
  paths:
   - /cbdata/elastic/cb4lv1/democsv1/*.csv

Thanks,
Debasis
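On the empty-index question: as far as I know, since Elasticsearch 8.4 ILM does not roll over an index that contains no documents, even when max_age has been reached, so a day without CSV files would not produce a new backing index. If empty daily indices are really wanted, the rollover action accepts a min_docs condition, and setting it to 0 opts in to rolling over empty indices. The sketch below is the policy from earlier in the thread with that one condition added; treat it as an assumption to verify against the documentation for your version.

```console
# Assumption: same csv-data policy as above, with min_docs: 0 added so that
# empty write indices are also rolled over once max_age is reached
PUT _ilm/policy/csv-data
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "rollover": {
            "max_age": "1d",
            "max_primary_shard_size": "50gb",
            "min_docs": 0
          }
        }
      },
      "delete": {
        "min_age": "90d",
        "actions": {
          "delete": {
            "delete_searchable_snapshot": true
          }
        }
      }
    }
  }
}
```

Rolling over empty indices produces one backing index per day regardless of traffic, so the default behavior is usually the cheaper choice when files arrive irregularly.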
