Importing a file into an existing index

I wanted to update my index by importing a file via Logstash. I used the same Logstash config file (the input file had been updated with new records) because I thought it would overwrite the existing index. I tested this on a different index first and, as far as I could see, everything worked. But now I somehow have more documents in the index than there are records in the input file. The records can't all have been duplicated, though, because the difference in count is not that big.
Do you have any idea what the issue could be? Thanks in advance.

That might happen, but it's not a guarantee.

You'd need to share your config for us to comment more.

That happens because each document has an _id, and that _id is unique.
If you have not defined one, Elasticsearch generates it automatically.

So the next time you import the same record, it is duplicated under a different _id.
If you import it a third time, the document count increases again.

To avoid this, you have to define a unique document_id in the Logstash output section.

You have to build that ID yourself, for example by combining several fields into one unique value. Which fields to use depends on your input.
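
As a rough sketch of the "combine several fields" idea: the fingerprint filter can hash a few fields into one value, which is then used as the document ID. The field names "field_a" and "field_b" and the index name "my_index" are only placeholders, so pick whatever combination is actually unique in your data. This assumes a reasonably recent version of the fingerprint filter.

```
filter {
	fingerprint {
		# Placeholder field names: use fields that together identify one record.
		source => ["field_a", "field_b"]
		# Hash all source fields together instead of only the last one.
		concatenate_sources => true
		method => "SHA256"
		# Fields under @metadata are not sent to Elasticsearch.
		target => "[@metadata][fingerprint]"
	}
}

output {
	elasticsearch {
		hosts => ["http://localhost:9200"]
		index => "my_index"
		# The same input values always produce the same _id.
		document_id => "%{[@metadata][fingerprint]}"
	}
}
```

If a single field is already unique on its own, it can be used directly as the document_id and the fingerprint step is not needed.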

I think I am already defining a unique document_id in the config file. Here is what my config looks like:

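input {
	file {
		# Note: the file input requires an absolute path.
		path => "input_sample.csv"
		start_position => "beginning"
		# "NUL" is the Windows null device: no read position is saved,
		# so the whole file is read again on every run.
		sincedb_path => "NUL"
	}
}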

filter {
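	csv {
		separator => ","
		# Column names come from the first line of the file;
		# this option only works with pipeline.workers set to 1.
		autodetect_column_names => true
	}
	ruby {
		# Remove every top-level field that is not in the wanted list.
		code => "wanted_columns = ['License Plate','Brand','Expiry Date','Catalogue Price']
		event.to_hash.keys.each { |k| event.remove(k) unless wanted_columns.include? k }"
	}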
	mutate {
		rename => {
			"License Plate" => "licensePlate"
			"Brand" => "brand"
			"Expiry Date" => "expiryDate"
			"Catalogue Price" => "cataloguePrice"
		}
	}
	date {
		match => ["expiryDate", "yyyyMMdd"]
		target => "expiryDate"
		timezone => "UTC"
	}
}

output {
	elasticsearch {
		hosts => ["http://localhost:9200"]
		index => "car"
		document_id => "%{licensePlate}"
	}
}

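With this config, re-importing a record does not create a new entry: documents are keyed on licensePlate, so a plate that is already in the index keeps the same _id.

For example, take licensePlate=ABC123.

In the first run you created that document.
In the second run, when Logstash sends the record with updated values, Elasticsearch sees that the _id already exists. With the default action ("index") it replaces the whole document with the new version and does not add a second one. If you only want to change some fields of the existing document rather than replace it, use action => "update" in the output section, together with doc_as_upsert => true so that plates not yet in the index are still created.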
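
A minimal sketch of an output section that updates existing documents and still creates missing ones, assuming the same host, index and ID field as in the config above. Without doc_as_upsert, an update for an _id that is not in the index yet fails.

```
output {
	elasticsearch {
		hosts => ["http://localhost:9200"]
		index => "car"
		document_id => "%{licensePlate}"
		# Merge the event's fields into the existing document.
		action => "update"
		# Create the document if its _id does not exist yet.
		doc_as_upsert => true
	}
}
```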

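Now to the original problem: you will have to find out which records are duplicated, new, or unwanted, and why they are there. One thing to check: re-importing never deletes anything, so records that were in an earlier import but are missing from the updated file remain in the index.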
