Logstash & Elasticsearch - Inserting / Updating data

(Adrian Moreno) #1

I'm testing out Logstash and Elasticsearch on my local dev machine (Win 7) as a
replacement for our current SQL Server-based search pages.

I'm using the following Logstash config to import a folder full of
pipe-delimited CSV files into Elasticsearch:

    input {
      stdin {
        type => "stdin-type"
      }

      file {
        path => ["C:/Users/.../export.csv"]
      }
    }

    filter {
      csv {
        columns => [ ... ]  # full column list not shown in the post; the first column is "property_id"
        separator => "|"
      }
    }

    output {
      elasticsearch {
        embedded => true
        index => "assets"
        index_type => "asset"
      }
    }
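For illustration, here is a rough Python approximation of what the `csv` filter above does to each line: split it on `|` and name the values by position. The column names other than `property_id` are hypothetical, since the post doesn't list them, and this sketch simply drops any values beyond the column list rather than imitating Logstash's exact handling.

```python
import csv
import io

# Hypothetical column list: the post only says the first column is "property_id".
COLUMNS = ["property_id", "address", "price"]


def parse_pipe_line(line, columns=COLUMNS):
    """Split one pipe-delimited line and name its fields by position."""
    # delimiter="|" mirrors separator => "|"; csv.reader also handles quoted values
    values = next(csv.reader(io.StringIO(line), delimiter="|"))
    # zip() pairs values with names and silently drops values past the last column
    return dict(zip(columns, values))


event = parse_pipe_line("12345|1 Main St|250000")
print(event)  # {'property_id': '12345', 'address': '1 Main St', 'price': '250000'}
```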

  1. Sometimes it imports, sometimes it doesn't. I've deleted the .sincedb
    files over and over and have changed the index name to make sure the data is
    going in correctly (when the import actually runs). Any idea why it's sporadic?

  2. I have a data set of over a million records. The "_id" value of each
    record in ES is, of course, a unique string. If I add a new CSV file with
    updates for 100 records, how do Logstash or ES know which existing record
    each update should be matched to? In the original data set, the "property_id"
    value is the primary key.
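The matching question comes down to how Elasticsearch treats `_id`: indexing a document whose `_id` already exists replaces the stored document, while a document sent without an ID gets a new auto-generated one. The sketch below is a minimal in-memory simulation of that behavior, assuming a plain dict as a stand-in for the index; `index_docs` is a hypothetical helper, not an Elasticsearch API.

```python
import uuid


def index_docs(store, docs, id_field=None):
    """Simulate indexing: a document whose id already exists replaces the old one."""
    for doc in docs:
        # With no id field, each doc gets a fresh random id, like an auto-generated _id
        doc_id = doc[id_field] if id_field else uuid.uuid4().hex
        store[doc_id] = doc  # same key => overwrite (a full replace, not a merge)
    return store


original = [{"property_id": "1", "price": "100"}, {"property_id": "2", "price": "200"}]
update = [{"property_id": "1", "price": "150"}]

auto = index_docs({}, original + update)                   # update becomes a duplicate
keyed = index_docs({}, original + update, "property_id")   # record "1" is replaced
print(len(auto), len(keyed), keyed["1"]["price"])  # 3 2 150
```

With random IDs the update lands as a third document; keyed by the primary key, it overwrites record "1".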

I looked at http://logstash.net/docs/1.3.3/outputs/elasticsearch#document_id ,
which seems to be the right setting for the import, but what value should it
have? I tried "property_id", the first column name, but that doesn't work: the
import doesn't even run with that setting.
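For reference, Logstash settings such as `document_id` accept sprintf-style field references, so the value is normally written `document_id => "%{property_id}"` rather than the bare column name; a bare string is taken literally, which would give every event the same ID. The Python sketch below is a simplified, hypothetical imitation of that expansion (top-level fields only; `sprintf` here is not a Logstash API), just to show the difference between the two forms.

```python
import re


def sprintf(template, event):
    """Expand %{field} references against an event dict (top-level fields only)."""
    # References to fields the event lacks are left untouched in this sketch
    return re.sub(r"%\{([^}]+)\}",
                  lambda m: str(event.get(m.group(1), m.group(0))),
                  template)


events = [{"property_id": "1"}, {"property_id": "2"}]
print([sprintf("property_id", e) for e in events])     # ['property_id', 'property_id']
print([sprintf("%{property_id}", e) for e in events])  # ['1', '2']
```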

Any help would be appreciated. Thanks.

