When I started running Logstash:
1. The sincedb file was not created in the configured location, nor in the default location.
2. Logstash shut down unexpectedly (after importing 10k lines).
3. I restarted it and got the messages shown below.
Now, shouldn't it resume reading from the line where it left off? The documentation says:
The plugin keeps track of the current position in each file by recording it in a separate file
named sincedb. This makes it possible to stop and restart Logstash and have it pick up
where it left off without missing the lines that were added to the file while Logstash was
stopped.
For example: is the restart handled like file rotation, with reading resuming from line 10001, or is the file treated as a new, unwatched file?
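For context, a minimal file input for this kind of CSV import looks roughly like the sketch below (the path is taken from the debug output; start_position is shown for illustration, since the default is "end"):

input {
  file {
    # The CSV file being imported
    path => "D:/ELK-Sample-CSV/txt-5/test-import.csv"
    # "beginning" reads an existing file from the start;
    # the default "end" only picks up newly added lines
    start_position => "beginning"
  }
}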
Debug messages:
_open_file: D:/ELK-Sample-CSV/txt-5/test-import.csv: opening
D:/ELK-Sample-CSV/txt-5/test-import.csv: initial create, no sincedb, seeking to end 2007847
each: file grew: D:/ELK-Sample-CSV/txt-5/test-import.csv: old size 0, new size 2007847
each: file grew: D:/ELK-Sample-CSV/txt-5/test-import.csv: old size 0, new size 2007847
each: file grew: D:/ELK-Sample-CSV/txt-5/test-import.csv: old size 0, new size 2007847
each: file grew: D:/ELK-Sample-CSV/txt-5/test-import.csv: old size 0, new size 2007847
each: file grew: D:/ELK-Sample-CSV/txt-5/test-import.csv: old size 0, new size 2007847
Pushing flush onto pipeline {:pipeline_id=>"main", :thread=>"#<Thread:0x5432d49b sleep>"}
You can configure the exact path to the sincedb file or indirectly set the directory (letting Logstash pick the filename) by changing the path.data setting. What you're trying to do doesn't work.
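For example, the direct route is the file input's sincedb_path option; a minimal sketch, with a made-up path (note that sincedb_path must point to a file, not a directory):

input {
  file {
    path => "D:/ELK-Sample-CSV/txt-5/test-import.csv"
    # Exact file where the plugin records how far it has read;
    # this particular path is just an example
    sincedb_path => "D:/ELK-Sample-CSV/sincedb-test-import"
  }
}

The indirect route is changing path.data (in logstash.yml or via --path.data), in which case Logstash picks the filename itself under that directory.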
If you explain what you're trying to accomplish, maybe we can suggest something that solves that problem.
Perhaps no sincedb file was ever written? I wonder if the sincedb file is created only after Logstash hits EOF on the input file, which it never did in this case.
[2018-04-06T17:33:07,867][DEBUG][logstash.inputs.file ] D:/ELK-Sample-CSV/txt-5/test-import.csv: sincedb last value 2007847, cur size 2007847
[2018-04-06T17:33:07,867][DEBUG][logstash.inputs.file ] D:/ELK-Sample-CSV/txt-5/test-import.csv: sincedb: seeking to 2007847
These two entries indicate that the file's size is 2007847 bytes and that Logstash has sought to that position, so it actually is continuing where it left off.
I understand that from the log statements, but nothing new is being appended to my existing index, and I still have remaining data in my input file.
Here is the log:
[2018-04-06T18:52:57,490][DEBUG][logstash.inputs.file ] _open_file: D:/ELK-Sample-CSV/txt-5/test-import.csv: opening
[2018-04-06T18:52:57,522][DEBUG][logstash.inputs.file ] D:/ELK-Sample-CSV/txt-5/test-import.csv: sincedb last value 2007847, cur size 2007847
[2018-04-06T18:52:57,526][DEBUG][logstash.inputs.file ] D:/ELK-Sample-CSV/txt-5/test-import.csv: sincedb: seeking to 2007847
[2018-04-06T18:53:02,107][DEBUG][logstash.pipeline ] Pushing flush onto pipeline {:pipeline_id=>"main", :thread=>"#<Thread:0x32b6f385 sleep>"}
[2018-04-06T18:53:02,474][DEBUG][logstash.instrument.periodicpoller.jvm] collector name {:name=>"ParNew"}
[2018-04-06T18:53:02,536][DEBUG][logstash.instrument.periodicpoller.jvm] collector name {:name=>"ConcurrentMarkSweep"}
[2018-04-06T18:53:07,116][DEBUG][logstash.pipeline ] Pushing flush onto pipeline {:pipeline_id=>"main", :thread=>"#<Thread:0x32b6f385 sleep>"}
[2018-04-06T18:53:07,532][DEBUG][logstash.instrument.periodicpoller.jvm] collector name {:name=>"ParNew"}
[2018-04-06T18:53:07,532][DEBUG][logstash.instrument.periodicpoller.jvm] collector name {:name=>"ConcurrentMarkSweep"}
[2018-04-06T18:53:11,687][DEBUG][logstash.inputs.file ] _globbed_files: D:/ELK-Sample-CSV/txt-5/test-import.csv: glob is: ["D:/ELK-Sample-CSV/txt-5/test-import.csv"]
[2018-04-06T18:53:12,116][DEBUG][logstash.pipeline ] Pushing flush onto pipeline {:pipeline_id=>"main", :thread=>"#<Thread:0x32b6f385 sleep>"}
[2018-04-06T18:53:12,544][DEBUG][logstash.instrument.periodicpoller.jvm] collector name {:name=>"ParNew"}
[2018-04-06T18:53:12,544][DEBUG][logstash.instrument.periodicpoller.jvm] collector name {:name=>"ConcurrentMarkSweep"}
[2018-04-06T18:53:17,124][DEBUG][logstash.pipeline ] Pushing flush onto pipeline {:pipeline_id=>"main", :thread=>"#<Thread:0x32b6f385 sleep>"}
[2018-04-06T18:53:17,554][DEBUG][logstash.instrument.periodicpoller.jvm] collector name {:name=>"ParNew"}
[2018-04-06T18:53:17,578][DEBUG][logstash.instrument.periodicpoller.jvm] collector name {:name=>"ConcurrentMarkSweep"}
[2018-04-06T18:53:22,125][DEBUG][logstash.pipeline ] Pushing flush onto pipeline {:pipeline_id=>"main", :thread=>"#<Thread:0x32b6f385 sleep>"}
[2018-04-06T18:53:22,614][DEBUG][logstash.instrument.periodicpoller.jvm] collector name {:name=>"ParNew"}
I don't see any errors in the log, but at the same time there are no new records in the existing index.
Okay, but now I'm confused. Logstash clearly seeks to the end of the file, so the problem doesn't appear to be that it isn't picking up where it left off. What exactly are you having problems with?
It might be the case that Logstash doesn't update the sincedb file until it has hit EOF, i.e. if you start reading a large file from scratch you won't be able to interrupt it, but if you continuously read a growing file it works fine.
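If that is what's happening, one experiment that might be worth trying (a sketch, not something I have verified) is lowering the file input's sincedb_write_interval so the position is flushed to disk more often:

input {
  file {
    path => "D:/ELK-Sample-CSV/txt-5/test-import.csv"
    sincedb_path => "D:/ELK-Sample-CSV/sincedb-test-import"
    # Flush the current position every second instead of the
    # default 15 seconds; this only helps if the in-memory
    # position is actually updated before EOF is reached
    sincedb_write_interval => 1
  }
}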
I believe Filebeat is much better at dealing with this.