Issue:
I have two kinds of CSV files: one kind (default) has two columns, and the other kind (custom) has about seventeen columns.
Default CSV:

```
rosbagTimestamp,data
1.56E+18,53829632
1.56E+18,53829632
1.56E+18,53829632
1.56E+18,53829632
1.56E+18,53829632
1.56E+18,53829632
1.56E+18,53829632
1.56E+18,53829632
1.56E+18,53829632
1.56E+18,53829632
1.56E+18,53829632
```
Custom CSV (the header is a flattened ROS message: header.seq, header.stamp.secs/nsecs, header.frame_id, status.goal_id.stamp.secs/nsecs, status.goal_id.id, status.status, status.text, feedback.message, feedback.percent_compressed, feedback.percent_uploaded, feedback.duration_time):

```
rosbagTimestamp,seq,secs,nsecs,frame_id,secs,nsecs,id,status,text,message,percent_compressed,percent_uploaded,duration_time
1.56E+18,1,1.56E+09,9.93E+08,'',1.56E+09,8.51E+08,"/uploader-1-1557443728.850739955",1,"This goal has been accepted by the simple action server","LabelAggregationStrategy complete",0,4,0
1.56E+18,2,1.56E+09,13680934,'',1.56E+09,8.51E+08,"/uploader-1-1557443728.850739955",1,"This goal has been accepted by the simple action server","LabelConversionStrategy complete",0,8,0
1.56E+18,3,1.56E+09,23626089,'',1.56E+09,8.51E+08,"/uploader-1-1557443728.850739955",1,"This goal has been accepted by the simple action server","AuxAggregationStrategy complete",0,12,0
1.56E+18,4,1.56E+09,4.95E+08,'',1.56E+09,8.51E+08,"/uploader-1-1557443728.850739955",1,"This goal has been accepted by the simple action server","WaveAggregationStrategy complete",0,16,0
1.56E+18,5,1.56E+09,9.54E+08,'',1.56E+09,8.51E+08,"/uploader-1-1557443728.850739955",1,"This goal has been accepted by the simple action server","LabelSplitStrategy complete",0,20,0
1.56E+18,6,1.56E+09,3.91E+08,'',1.56E+09,8.51E+08,"/uploader-1-1557443728.850739955",1,"This goal has been accepted by the simple action server","AuxConversionStrategy complete",0,25,0
1.56E+18,7,1.56E+09,5.23E+08,'',1.56E+09,8.51E+08,"/uploader-1-1557443728.850739955",1,"This goal has been accepted by the simple action server","LogAggregationStrategy complete",0,29,4
```
Objective:
I am trying to filter the data, but my Logstash configuration works for only one kind of CSV at a time. The two kinds of files are mixed in one directory, so I cannot separate them manually.
Questions:
Is there any way I can count the number of columns in these CSV files using Logstash and apply a different configuration for each?
What is the best and easiest way to handle multiple configuration formats in a single configuration file?
One more question: if you had two kinds of CSV files that each have 15+ columns, what is the best way to apply different filtering patterns to them? Is counting the number of columns still the best option, or can we do better?
You could use the add_field option on the csv filters to add a document_type field, then make the filtering conditional on that, either

```
if [document_type] == "oneThing" {
    # Filters for oneThing
}
```
Or possibly using pipeline-to-pipeline communication with a distributor pattern. If the processing is only slightly different I would lean towards conditionals. If it is significantly different I would lean towards pipelines.
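As a minimal sketch of the conditional approach (the "oneThing" type name and the mutate filter are illustrative placeholders, not taken from the thread):

```
filter {
  csv {
    columns => ["rosbagTimestamp", "data"]
    add_field => { "document_type" => "oneThing" }  # label events parsed by this filter
  }

  if [document_type] == "oneThing" {
    # Filters that should only run for this kind of file, e.g.
    mutate { convert => { "data" => "integer" } }
  }
}
```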
Yes. I just realized I might have misunderstood the second question. Do you have two types of CSVs that both have 15 columns, or two types of CSVs with different numbers of columns?
If it is the former, you are going to have to find a regular expression that allows you to recognize one of them (anything that does not match is the other).
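If such a distinguishing pattern exists, the dispatch might be sketched like this (the regular expression is a placeholder that matches lines with exactly two comma-separated fields, and it assumes fields contain no embedded commas; the column lists are abbreviated):

```
filter {
  if [message] =~ /^[^,]+,[^,]+$/ {
    # Exactly two fields: treat as the default format.
    csv { columns => ["rosbagTimestamp", "data"] }
  } else {
    # Anything else: treat as the custom format.
    csv { columns => ["rosbagTimestamp", "seq", "secs", "nsecs"] }
  }
}
```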
One CSV file has around 17 columns; the other has around 100 columns, some of whose values are themselves CSV key, value pairs. Here is an example of it:
OK, so count the number of columns using the method I linked to, then save that in a field on the event (possibly a field within the [@metadata] object), then do things conditionally based on that field.
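A sketch of that column-counting approach with a ruby filter (the threshold of 2 is illustrative, and naively counting commas will miscount lines whose quoted fields contain embedded commas):

```
filter {
  ruby {
    code => '
      # Count comma-separated columns in the raw line and stash the count
      # in @metadata, so it is visible to conditionals but is not indexed.
      event.set("[@metadata][fields]", event.get("message").count(",") + 1)
    '
  }

  if [@metadata][fields] > 2 {
    mutate { add_tag => ["custom"] }   # then parse as the wide format
  } else {
    mutate { add_tag => ["default"] }  # then parse as the two-column format
  }
}
```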
@Badger, a question about your code. Could you please explain the purpose of having "[@metadata][fields]" inside event.set(...) in your code below? The objective is to count the number of columns, so I don't understand why we need to take @metadata into account; I thought just having event.get("message").count(",") + 1 would be enough. I would appreciate it if you could explain your logic.