```
PUT _ingest/pipeline/csv_pipeline
{
  "description": "A pipeline to parse CSV data",
  "processors": [
    {
      "csv": {
        "field": "message",
        "target_fields": ["cluster", "index", "ilm_policy", "time_since_index_creation"],
        "ignore_missing": false
      }
    }
  ]
}
```
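To check the parsing in isolation, the pipeline can be exercised with the `_simulate` API. A minimal sketch, using one of the CSV rows shown further down in this post:

```
POST _ingest/pipeline/csv_pipeline/_simulate
{
  "docs": [
    {
      "_source": {
        "message": "Cluster1,.ds-logs-enterprise_search.api-default-2023.11.27-000002,logs-enterprise_search.api-default,1.66d"
      }
    }
  ]
}
```

If the processor is working, the response should show the four target fields split out of `message`.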
```
###################### Filebeat Configuration Example #########################

# ============================== Filebeat modules ==============================
filebeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: false

# ============================== Filebeat inputs ===============================
filebeat.inputs:
- type: filestream
  id: my-csv-filestream-id
  enabled: true
  paths:
    - /Users/taran/stocks/ansible/indices_ilm_policies_prod1.es.us-central1.gcp.cloud.es.io.csv
    - /Users/taran/stocks/ansible/indices_ilm_policies_prod2.es.asia-south1.gcp.elastic-cloud.com.csv

- type: filestream   # Input for macOS logs
  enabled: true
  paths:
    - /var/log/system.log

# ======================= Elasticsearch template setting =======================
setup.template.settings:
  index.number_of_shards: 1
setup.template.name: "ilmprod2"     # Replace with your template name
setup.template.pattern: "ilm-*"     # Replace with your template pattern
setup.ilm.overwrite: true

# ================================== General ===================================
# The name of the shipper that publishes the network data.
#name:

# The tags of the shipper are included in their own field with each
# transaction published.
#tags: ["service-X", "web-tier"]

# Optional fields that you can specify to add additional information to the
# output.
#fields:
#  env: staging

# =================================== Kibana ===================================
setup.kibana:
  host: "prod1.kb.us-central1.gcp.cloud.es.io:9243"
  ssl.verification_mode: "none"

# =============================== Elastic Cloud ================================
cloud.id: "7455d8019d7e455ebf45b0704a20d83e:d"
cloud.auth: ""

# ================================== Outputs ===================================
# ---------------------------- Elasticsearch Output ----------------------------
output.elasticsearch:
  hosts: ["prod1.es.us-central1.gcp.cloud.es.io"]
  protocol: "https"
  username: "elastic"
  password: ""
  indices:
    - index: "combined-csv-index"   # Single index for both CSV files
      pipeline: csv_pipeline
      when.contains:
        log.file.path: "/Users/taran/stocks/ansible/indices_ilm_policies"
    - index: "macos-log"
      when.contains:
        log.file.path: "/var/log/system.log"

# ================================= Processors =================================
processors:
  - add_host_metadata:
      when.not.contains.tags: forwarded
  - add_cloud_metadata: ~
  - add_docker_metadata: ~
  - add_kubernetes_metadata: ~

# ================================== Logging ===================================
# Sets log level. The default log level is info.
# Available log levels are: error, warning, info, debug
#logging.level: debug

# At debug level, you can selectively enable logging only for some components.
# To enable all selectors, use ["*"]. Examples of other selectors are "beat",
# "publisher", "service".
#logging.selectors: ["*"]

# ============================= X-Pack Monitoring ==============================
# Filebeat can export internal metrics to a central Elasticsearch monitoring
# cluster. This requires xpack monitoring to be enabled in Elasticsearch.
# The reporting is disabled by default.

# Set to true to enable the monitoring reporter.
#monitoring.enabled: false

# Sets the UUID of the Elasticsearch cluster under which monitoring data for this
# Filebeat instance will appear in the Stack Monitoring UI.
#monitoring.cluster_uuid:

# Uncomment to send the metrics to Elasticsearch.
#monitoring.elasticsearch:

# ============================== Instrumentation ===============================
# Instrumentation support for the filebeat.
#instrumentation:
  # Set to true to enable instrumentation of filebeat.
  #enabled: false

  # Environment in which filebeat is running on (eg: staging, production, etc.)
  #environment: ""

  # APM Server hosts to report instrumentation results to.
  #hosts:
  #  - http://localhost:8200

  # API Key for the APM Server(s).
  #api_key:

  # Secret token for the APM Server(s).
  #secret_token:

# ================================= Migration ==================================
# This allows to enable 6.7 migration aliases
#migration.6_to_7.enabled: true
```
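For testing, the configuration can be sanity-checked and run in the foreground with Filebeat's built-in commands:

```
# Validate the configuration file
filebeat test config -c filebeat.yml

# Check connectivity to the configured Elasticsearch output
filebeat test output -c filebeat.yml

# Run in the foreground, logging to stderr
filebeat -e -c filebeat.yml
```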
Here is a sample of the CSV data being ingested (the four columns correspond to the pipeline's target fields: cluster, index, ilm_policy, time_since_index_creation):

```
Cluster1,.ds-logs-enterprise_search.api-default-2023.11.27-000002,logs-enterprise_search.api-default,1.66d
Cluster1,.internal.alerts-observability.apm.alerts-default-000001,.alerts-ilm-policy,8.67d
Cluster1,.elastic-connectors-v1,None,Not available
Cluster1,.ds-.monitoring-beats-8-mb-2023.11.26-000003,.monitoring-8-ilm-policy,2.61d
Cluster1,.ds-.monitoring-ent-search-8-mb-2023.11.26-000003,.monitoring-8-ilm-policy,2.61d
Cluster1,.ds-logs-enterprise_search.api-default-2023.11.20-000001,logs-enterprise_search.api-default,8.66d
```
Firstly, I have multiple CSV files in a folder, but when I use an asterisk (*) in the path to select them, they are not picked up. I have also tried defining the input type as "log", but it still doesn't work. For example, I would expect a glob like the one in the sketch below to match both files.
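A sketch of the pattern I would expect to work (same directory as in the config above, with the explicit file names replaced by a glob):

```
filebeat.inputs:
- type: filestream
  id: my-csv-filestream-id
  enabled: true
  paths:
    - /Users/taran/stocks/ansible/indices_ilm_policies_*.csv
```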
Secondly, the ingest pipeline is not applied properly. It only takes effect if I copy the data into another index with a reindex after ingestion; it is not applied at ingest time.
To address these issues, I have created a component template and made the necessary adjustments to the ILM policy and settings. I have even tried changing the index name to "policy" and other names, but the problem persists.
The main challenges I am facing are twofold. Firstly, I need a way to feed multiple files through the pipeline. Secondly, the pipeline itself is not applied as expected when indexing through Filebeat; I have attempted alternative methods (see the sketch below), but the pipeline continues to fail.
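For reference, one alternative I am considering is moving the pipeline selection out of the `indices` rules and into the output's `pipelines` rules. This is only a sketch, assuming a `when.contains` condition on `log.file.path` like the one in the config above:

```
output.elasticsearch:
  # ... hosts and credentials as above ...
  pipelines:
    - pipeline: csv_pipeline
      when.contains:
        log.file.path: "indices_ilm_policies"
```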
It is crucial to resolve these issues promptly to ensure smooth and efficient data processing.
Please format your code, logs, or configuration files using the </> icon, as explained in this guide, and not the citation button. It will make your post more readable.
Or use markdown style like:
```
CODE
```
This is the icon to use if you are not using markdown format:
There's a live preview panel for exactly this reason.
Lots of people read these forums, and many of them will simply skip over a post that is difficult to read, because it's just too large an investment of their time to try and follow a wall of badly formatted text.
If your goal is to get an answer to your questions, it's in your interest to make it as easy to read and understand as possible.
Please update your post.