Can we set a different sincedb location for each input?

I have changed path.data in logstash.yml:

path.data: D:/www/logstash-6.2.3/pr/

Config file:

input {
  file {
    path => "D:/ELK-Sample-CSV/txt-5/test-import.csv"    <-- this file has 100k lines
    start_position => "beginning"
    #sincedb_path => "D:/www/logstash-6.2.3/pr"
    sincedb_write_interval => 1
  }
}
filter {
  csv {
    separator => ","
  }
}
output {
  elasticsearch {
    hosts => "http://localhost:9200"
    index => "test-3"
  }
  stdout { codec => rubydebug }
}

When I started running it:
1. The sincedb file was not created in the location I specified, nor in the default location.
2. Logstash unexpectedly shut down (about 10k lines had been imported).
3. I restarted with the config below and got the debug messages shown further down.

input {
  file {
    path => "D:/ELK-Sample-CSV/txt-5/test-import.csv"
    start_position => "end"    <--- changed to "end"
    #sincedb_path => "D:/www/logstash-6.2.3/pr"
    sincedb_write_interval => 1
  }
}
filter {
  csv {
    separator => ","
  }
}
output {
  elasticsearch {
    hosts => "http://localhost:9200"
    index => "test-3"
  }
  stdout { codec => rubydebug }
}

Now, shouldn't it continue reading from where it left off?

The plugin keeps track of the current position in each file by recording it in a separate file
named sincedb. This makes it possible to stop and restart Logstash and have it pick up
where it left off without missing the lines that were added to the file while Logstash was
stopped.
Ex: after the restart it should continue from line 10,001, i.e. from the lines that have not yet been read.
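A hedged aside on how start_position interacts with the sincedb (based on the file input's documented behaviour, not on anything else in this thread): start_position is only consulted for files that have no sincedb entry yet, and with "end" the input seeks straight to EOF, so the existing unread lines are skipped and the EOF offset is what ends up being recorded. The "initial create, no sincedb, seeking to end 2007847" line below looks consistent with that. A sketch:

    input {
      file {
        path => "D:/ELK-Sample-CSV/txt-5/test-import.csv"
        # start_position only matters on "first contact", i.e. when the file
        # has no sincedb entry yet; on later restarts the sincedb offset wins.
        #   "beginning" -> read the existing content from offset 0
        #   "end"       -> seek to EOF first, so pre-existing lines are never read
        start_position => "beginning"
        sincedb_write_interval => 1
      }
    }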

Debug message:

 _open_file: D:/ELK-Sample-CSV/txt-5/test-import.csv: opening
D:/ELK-Sample-CSV/txt-5/test-import.csv: initial create, no sincedb, seeking to end 2007847
each: file grew: D:/ELK-Sample-CSV/txt-5/test-import.csv: old size 0, new size 2007847
each: file grew: D:/ELK-Sample-CSV/txt-5/test-import.csv: old size 0, new size 2007847
each: file grew: D:/ELK-Sample-CSV/txt-5/test-import.csv: old size 0, new size 2007847
each: file grew: D:/ELK-Sample-CSV/txt-5/test-import.csv: old size 0, new size 2007847
each: file grew: D:/ELK-Sample-CSV/txt-5/test-import.csv: old size 0, new size 2007847
Pushing flush onto pipeline {:pipeline_id=>"main", :thread=>"#<Thread:0x5432d49b sleep>"}

You can configure the exact path to the sincedb file or indirectly set the directory (letting Logstash pick the filename) by changing the path.data setting. What you're trying to do doesn't work.

If you explain what you're trying to accomplish maybe we can suggest something that solves that problem.
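To make that concrete, a minimal sketch (the second input's path and both sincedb filenames are made up for illustration): sincedb_path is a per-input option and should point to a file, not a directory, so per-input sincedb locations would look roughly like this:

    input {
      file {
        path => "D:/ELK-Sample-CSV/txt-5/test-import.csv"
        start_position => "beginning"
        # one sincedb file per input; note this is a file path, not a directory
        sincedb_path => "D:/www/logstash-6.2.3/pr/test-import.sincedb"
      }
      file {
        path => "D:/ELK-Sample-CSV/txt-5/another-import.csv"   # hypothetical second input
        start_position => "beginning"
        sincedb_path => "D:/www/logstash-6.2.3/pr/another-import.sincedb"
      }
    }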

Please read it again; I have updated my question.

Perhaps no sincedb file was ever written? I wonder if the file is perhaps created only after Logstash hits EOF on the input file, which it never did in this case.

Hi Magnus,

I completely deleted the Logstash folder, reinstalled it, and ran the same file again.

When I started running it, the same steps as above played out:

  1. The sincedb file was created in the default location D:\www\logstash-6.2.3\data\plugins\inputs\file

  2. Logstash unexpectedly shut down (about 10k lines had been imported)

  3. I restarted and pointed sincedb_path at the sincedb file created by the last run:

    input {
      file {
        path => "D:/ELK-Sample-CSV/txt-5/test-import.csv"
        start_position => "end"
        sincedb_path => "D:/www/logstash-6.2.3/data/plugins/inputs/file/.sincedb_3618408ba030b67f0d325141aa988e75"
        sincedb_write_interval => 1
      }
    }

I got these messages:

[2018-04-06T17:33:07,836][DEBUG][logstash.inputs.file     ] _open_file: D:/ELK-Sample-CSV/txt-5/test-import.csv: opening
[2018-04-06T17:33:07,867][DEBUG][logstash.inputs.file     ] D:/ELK-Sample-CSV/txt-5/test-import.csv: sincedb last value 2007847, cur size 2007847
[2018-04-06T17:33:07,867][DEBUG][logstash.inputs.file     ] D:/ELK-Sample-CSV/txt-5/test-import.csv: sincedb: seeking to 2007847

I expected it to import from where it left off.
Does my question make sense?
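One hedged suggestion that is not from the thread itself: since the sincedb entry already records the EOF offset (2007847), reusing it means nothing more will be read until the file grows. Pointing sincedb_path at a fresh file and going back to start_position => "beginning" should make the input re-read the whole file. The filename below is made up, and note that the roughly 10k lines already indexed would be indexed again unless the elasticsearch output sets a document_id.

    input {
      file {
        path => "D:/ELK-Sample-CSV/txt-5/test-import.csv"
        # back to "beginning" so the existing content is read from offset 0
        start_position => "beginning"
        # fresh sincedb file (hypothetical name) so the old EOF offset is ignored
        sincedb_path => "D:/www/logstash-6.2.3/pr/test-import-retry.sincedb"
        sincedb_write_interval => 1
      }
    }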

Hi Magnus,
Can you share any sample configuration for unexpected-shutdown cases,

for example: the pipeline, the system, or the server machine shuts down?

When I rerun the same input, how do I resume from where it left off? I'm importing a large amount of data into Elasticsearch on a daily basis.

How does Logstash track or identify watched and unwatched files?

I have read "Tracking of the current position in watched files" and tried everything, but no luck.
How can I achieve this?
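Not something discussed in this thread, but possibly relevant as a hedged aside: for crash scenarios Logstash also offers a persistent queue, enabled in logstash.yml, which keeps events that have already been read across an unexpected shutdown. It complements the sincedb mechanism rather than replacing it; the values below are illustrative.

    # logstash.yml -- persistent queue sketch (values are illustrative)
    queue.type: persisted
    queue.max_bytes: 1gb
    # path.queue: D:/www/logstash-6.2.3/queue    # optional, defaults to a directory under path.data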

The log entries

[2018-04-06T17:33:07,867][DEBUG][logstash.inputs.file ] D:/ELK-Sample-CSV/txt-5/test-import.csv: sincedb last value 2007847, cur size 2007847
[2018-04-06T17:33:07,867][DEBUG][logstash.inputs.file ] D:/ELK-Sample-CSV/txt-5/test-import.csv: sincedb: seeking to 2007847

indicate that the file's size is 2007847 bytes and that Logstash has sought to that position, which suggests it actually is continuing where it left off.

I understand that from the log statements, but nothing new is being appended to my existing index, and I still have unread data in my input file.
Here is the log:

[2018-04-06T18:52:57,490][DEBUG][logstash.inputs.file     ] _open_file: D:/ELK-Sample-CSV/txt-5/test-import.csv: opening
[2018-04-06T18:52:57,522][DEBUG][logstash.inputs.file     ] D:/ELK-Sample-CSV/txt-5/test-import.csv: sincedb last value 2007847, cur size 2007847
[2018-04-06T18:52:57,526][DEBUG][logstash.inputs.file     ] D:/ELK-Sample-CSV/txt-5/test-import.csv: sincedb: seeking to 2007847
[2018-04-06T18:53:02,107][DEBUG][logstash.pipeline        ] Pushing flush onto pipeline {:pipeline_id=>"main", :thread=>"#<Thread:0x32b6f385 sleep>"}
[2018-04-06T18:53:02,474][DEBUG][logstash.instrument.periodicpoller.jvm] collector name {:name=>"ParNew"}
[2018-04-06T18:53:02,536][DEBUG][logstash.instrument.periodicpoller.jvm] collector name {:name=>"ConcurrentMarkSweep"}
[2018-04-06T18:53:07,116][DEBUG][logstash.pipeline        ] Pushing flush onto pipeline {:pipeline_id=>"main", :thread=>"#<Thread:0x32b6f385 sleep>"}
[2018-04-06T18:53:07,532][DEBUG][logstash.instrument.periodicpoller.jvm] collector name {:name=>"ParNew"}
[2018-04-06T18:53:07,532][DEBUG][logstash.instrument.periodicpoller.jvm] collector name {:name=>"ConcurrentMarkSweep"}
[2018-04-06T18:53:11,687][DEBUG][logstash.inputs.file     ] _globbed_files: D:/ELK-Sample-CSV/txt-5/test-import.csv: glob is: ["D:/ELK-Sample-CSV/txt-5/test-import.csv"]
[2018-04-06T18:53:12,116][DEBUG][logstash.pipeline        ] Pushing flush onto pipeline {:pipeline_id=>"main", :thread=>"#<Thread:0x32b6f385 sleep>"}
[2018-04-06T18:53:12,544][DEBUG][logstash.instrument.periodicpoller.jvm] collector name {:name=>"ParNew"}
[2018-04-06T18:53:12,544][DEBUG][logstash.instrument.periodicpoller.jvm] collector name {:name=>"ConcurrentMarkSweep"}
[2018-04-06T18:53:17,124][DEBUG][logstash.pipeline        ] Pushing flush onto pipeline {:pipeline_id=>"main", :thread=>"#<Thread:0x32b6f385 sleep>"}
[2018-04-06T18:53:17,554][DEBUG][logstash.instrument.periodicpoller.jvm] collector name {:name=>"ParNew"}
[2018-04-06T18:53:17,578][DEBUG][logstash.instrument.periodicpoller.jvm] collector name {:name=>"ConcurrentMarkSweep"}
[2018-04-06T18:53:22,125][DEBUG][logstash.pipeline        ] Pushing flush onto pipeline {:pipeline_id=>"main", :thread=>"#<Thread:0x32b6f385 sleep>"}
[2018-04-06T18:53:22,614][DEBUG][logstash.instrument.periodicpoller.jvm] collector name {:name=>"ParNew"}

I don't see any error in the log, yet there are no new records in the existing index.

Is the size of the file actually 2007847 bytes?

Yes, the CSV file size is 2007847 bytes.
The sincedb value is:

   385240477-968648-196608 0 0 2007847
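For reference, my reading of that entry (based on the file input shipped with Logstash 6.x, not on anything stated in the thread): each sincedb line is roughly <file identifier> <major device> <minor device> <byte offset>, and on Windows the first field is a generated identifier rather than a real inode. So this entry says 2007847 bytes, i.e. the whole file, have been recorded as read:

    385240477-968648-196608  0  0  2007847
    # file identifier        maj min byte offset recorded as read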

Is it possible to track the unread (unwatched) lines and insert the remaining records from where the last run left off?

Okay, but now I'm confused. Logstash clearly seeks to the end of the file so the problem doesn't appear to be that it doesn't pick up where it left off. What are you having problems with?

Actually, I'm working with big data, so I have more than 50 million records spread across multiple CSV files:

path => "D:/ELK-Sample-CSV/pr/pr_*.csv"

Logstash is running on three different EC2 machines to import the records.

Now, the problem is:

  1. 20 million records have been imported.
  2. Unfortunately, one of the server machines shuts down unexpectedly.
  3. Do I have to rerun with start_position set to beginning or to end?
  4. How do I rerun and import the remaining records from where it left off?

How should I handle this case?
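A hedged sketch of one way to set this up with the file input (the sincedb filename is made up, and this is not advice from the thread): keep start_position => "beginning", give the input an explicit, stable sincedb_path so the offsets survive restarts, and make sure each of the three machines watches its own disjoint set of files, because every Logstash instance keeps its own sincedb and they do not coordinate with each other.

    input {
      file {
        # each machine should watch its own subset, e.g. pr_a*.csv / pr_b*.csv / pr_c*.csv
        path => "D:/ELK-Sample-CSV/pr/pr_*.csv"
        # only used on first contact with a file; on restart the sincedb offsets win
        start_position => "beginning"
        # explicit file path so the offsets are easy to find and survive reinstalls
        sincedb_path => "D:/ELK-Sample-CSV/pr/pr.sincedb"
        sincedb_write_interval => 1
      }
    }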

It might be the case that Logstash doesn't update the sincedb file until it has hit EOF, i.e. if you start reading a large file from scratch you won't be able to interrupt it, but if you continuously read a growing file it works fine.

I believe Filebeat is much better at dealing with this.
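For anyone reading along, a minimal sketch of what that might look like (Filebeat 6.2-era syntax is assumed, and the paths and port are illustrative). Filebeat keeps its own registry of per-file read offsets and ships lines to a Logstash beats input:

    # filebeat.yml (Filebeat 6.2-style configuration)
    filebeat.prospectors:
      - type: log
        paths:
          - D:/ELK-Sample-CSV/pr/pr_*.csv

    output.logstash:
      hosts: ["localhost:5044"]

On the Logstash side, the matching input would be something like:

    # Logstash pipeline receiving from Filebeat
    input {
      beats {
        port => 5044
      }
    }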

Thanks for your support @magnusbaeck, I'm going to learn #beats.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.