Indexing CSV data into a nested field using Logstash

With the settings below I'm unable to index the data.
The Logstash pipeline starts and then stands still, showing no errors and producing no output.
Please correct me where it is going wrong.

Input CSV:

cust_name,state1,city1,state2,city2
ab,CA,LA,IL,Chicago

Mapping used:

"mappings": {
"info": {
"properties": {
"cust_name": {
"type": "string"
},
"address": {
"properties": {
"address1": {
"properties": {
"city": {
"type": "string"
},
"state": {
"type": "string"
}
}
},
"address2": {
"properties": {
"city": {
"type": "string"
},
"state": {
"type": "string"
}
}
}
},
"type": "nested"
}
}
}
}

Conf file:

input {
  file {
    path => "/home/cloudera/Desktop/nested_csv.csv"
    type => "core2"
    start_position => "beginning"
  }
}
filter {
  csv {
    columns => ["cust_name", "state1", "city1", "state2", "city2"]
    separator => ","
  }
  mutate {
    rename => {
      "state1" => "[address][address1][state]"
      "city1"  => "[address][address1][city]"
      "state2" => "[address][address2][state]"
      "city2"  => "[address][address2][city]"
    }
  }
}
output {
  elasticsearch {
    action => "index"
    hosts => ["localhost:9200"]
    index => "nested_sample"
    document_type => "info"
    workers => 1
  }
}

Desired output:

cust_name: "ab",
address: {
  address1: {
    state: "CA",
    city: "LA"
  },
  address2: {
    state: "IL",
    city: "Chicago"
  }
}

Thanks in advance.

I just tried your filter config, and it works great!

I think your problem is that you have a "sincedb" file pointing to the end of your CSV file, so the CSV content is not processed.
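As a side note, if you want to see whether any events are being produced at all before they reach Elasticsearch, you can temporarily add a stdout output with the rubydebug codec, something like:

output {
  # print every event to the console so you can check what the csv/mutate filters produced
  stdout { codec => rubydebug }
}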

So I invite you to stop Logstash, delete all the $HOME/.sincedb* files, and restart Logstash.

Yes, it works. Thank you, Fab, for your response.
Do I have to delete the .sincedb files every time?
Is there a workaround to avoid this scenario?

Normally, you only have to do it once, provided that either a program keeps writing into the input CSV file, or the file input's path contains a * wildcard; each time a new CSV file is detected, Logstash will process it from the beginning.
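For example (the directory here is just a placeholder, adjust it to your setup), a wildcard path lets the same pipeline pick up every new CSV file dropped into a folder:

input {
  file {
    # any *.csv file that appears in this directory will be picked up
    # and read from the beginning
    path => "/home/cloudera/Desktop/incoming/*.csv"
    type => "core2"
    start_position => "beginning"
  }
}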

Other than that, if you need to make sure the CSV is indexed no matter what, you can always set sincedb_path in the file input to /dev/null, and Logstash will read the file again whenever needed.
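Roughly like this, based on your original input block:

input {
  file {
    path => "/home/cloudera/Desktop/nested_csv.csv"
    type => "core2"
    start_position => "beginning"
    # never persist the read position, so the file is re-read on each run
    sincedb_path => "/dev/null"
  }
}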

Thorsten
