Ingest processor: remove a line from a CSV?

Hi all, I'm trying to create a processor for an ingest pipeline. I'm going through all the available processors, but I can't figure out whether it's possible to remove a line from a CSV.

The data we need to import has two header lines: line 1 contains the human-readable column names and line 2 contains the column codes (for example, "Quantity" and "QTY").

We need to keep only line 1. Do you think that's possible?
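
For example (with made-up values, apart from the "Quantity"/"QTY" pair above), the file might look like this:

Quantity,Price,Description
QTY,PRC,DESC
10,2.50,Some article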

You can do something like this in your filter block (before the csv filter):

if [message] =~ "QTY" { drop {} }
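
In a full Logstash pipeline that might look like the following (just a sketch; the separator and column names are assumptions):

filter {
  # Drop the second header line, the one carrying the column codes
  # (the human-readable header line can be matched and dropped the same way)
  if [message] =~ "QTY" {
    drop {}
  }
  csv {
    separator => ","
    columns => ["Quantity", "Price"]
  }
}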

Hi @Fosco, thanks for your reply. I tried adding:

message =~ "ARTICOLO/CODICE"

in the "Condition" field of the processor but when I try to save the pipeline it throws an error:

drop processor: compile error

I checked the Drop processor documentation, but it doesn't explain the condition syntax.

Also, I'm not sure the Drop processor is what I need; the documentation says:

Drops the document without raising any errors. This is useful to prevent the document from getting indexed based on some condition.

But I don't want to drop the entire document, I just want to drop one line of my document (the CSV).

Hi @QuartoStato

How are you sending the CSV to the ingest pipeline?

Perhaps there's a bit of confusion: the CSV processor processes a single line, not multiple lines.

It is not the same as the CSV filter in Logstash.

The way I would do this is to use Filebeat (or something similar) to send each line as a message, and then use the CSV processor.

You already know what the structure is, so you can throw out the line that has the codes in it.

Hi @stephenb, you're right, I was confused between sending the whole CSV file and sending a single row of my CSV.

I can write my own script that parses the CSV, but then I don't understand what the processors are useful for. If I have to parse the CSV myself, I can simply build the document the way I need it to be stored, without any processor.

What I'd like is less code in my script and more configuration on the Elasticsearch side (with processors).

If I send the whole CSV as a single document and use the CSV processor... that brings me back to my first question: how can I instruct the pipeline to skip line 2 of the CSV? Maybe I should use the Script processor before the CSV processor?

Here's the way I would do it, since it does not make sense to send a whole multi-line file to the CSV processor.

I would simply use Filebeat to harvest the file.

Filebeat will end up sending the file line by line to Elasticsearch.

In the Filebeat settings you'll define the ingest pipeline that will be used.
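
For example, a minimal filebeat.yml sketch (the paths, host, and pipeline name are placeholders):

filebeat.inputs:
  - type: filestream
    id: csv-files
    paths:
      - /data/csv/*.csv

output.elasticsearch:
  hosts: ["http://localhost:9200"]
  # Route every line through the ingest pipeline named here
  pipeline: csv-two-headers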

In that ingest pipeline, use the CSV or Dissect processor to break each line up into fields.

Then add a Drop processor that looks at the first field and, if it's equal to either of the header values, drops the document.

You will end up with all the lines parsed into individual documents, with the two header lines dropped.
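
Putting it together, the pipeline could look something like this (a sketch; the field names, separator, and header values are assumptions based on the examples above):

PUT _ingest/pipeline/csv-two-headers
{
  "description": "Parse CSV lines, dropping the two header lines",
  "processors": [
    {
      "drop": {
        "if": "ctx.message != null && (ctx.message.startsWith('Quantity') || ctx.message.startsWith('QTY'))"
      }
    },
    {
      "csv": {
        "field": "message",
        "target_fields": ["Quantity", "Price"],
        "separator": ","
      }
    }
  ]
}

Note that the drop condition is a Painless script, which is why the Logstash-style message =~ "ARTICOLO/CODICE" condition earlier failed to compile; a Painless equivalent would be ctx.message.contains('ARTICOLO/CODICE').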

Thank you @stephenb, your explanation cleared up all my doubts!

I have another question related to Filebeat; I hope you can help with this too. I have many CSVs in different folders, and each folder corresponds to a different index. So my question is: how can I pass this information to the ingest pipeline?

I don't need to read the folder names dynamically; I can manually create an input for each folder.

Hi @QuartoStato, I am not sure I'm following... perhaps open a new thread with all the details and examples.

But in short: add a fields entry or a tag in each input, then set the indices on the output (see the Filebeat Elasticsearch output documentation).

Here's an example; you can just use your own condition and index:

output.elasticsearch:
  hosts: ["http://localhost:9200"]
  indices:
    - index: "warning-%{[agent.version]}-%{+yyyy.MM.dd}"
      when.contains:
        message: "WARN"
    - index: "error-%{[agent.version]}-%{+yyyy.MM.dd}"
      when.contains:
        message: "ERR"

Thank you, I think you put me on the right track. Clearly I still need to RTFM anyway 🙂
