Data loading in real time using Logstash CSV input

Alternatively, in your Python script you could give each file a different name, using a UUID for example, and delete all files whose names do not match that UUID. Make sure you do the delete after writing the new file, to minimize the chance of the same inode being reused.
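A minimal sketch of that approach in Python (the output directory and the `.csv` suffix are assumptions; adapt them to whatever path your file input watches):

```python
import os
import uuid

OUTPUT_DIR = "/data/csv"  # hypothetical directory watched by the Logstash file input

def write_snapshot(rows):
    # Write the new data under a unique, UUID-based name first...
    new_name = f"{uuid.uuid4().hex}.csv"
    new_path = os.path.join(OUTPUT_DIR, new_name)
    with open(new_path, "w") as f:
        f.writelines(rows)

    # ...then delete the older files, so the new write is less likely to
    # land on a just-freed inode that Logstash still has state for.
    for name in os.listdir(OUTPUT_DIR):
        if name.endswith(".csv") and name != new_name:
            os.remove(os.path.join(OUTPUT_DIR, name))
```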

The reason you get some or all of the data is that, while LS is running, it keeps track of the number of bytes processed per inode, in memory and periodically persisted to disk. If it detects a file size change when you overwrite the file contents, it does one of three things:

1. If the size is less than before, it sees this as wholly new content and rereads the file from the start.
2. If the size is the same, it does nothing.
3. If the size is more than before, it reads the bytes from the previous size point up to the new size (this is the "tailing" behaviour).
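A simplified model of that decision logic in Python, just to make the three cases concrete (this is an illustration of the behaviour described above, not Logstash's actual implementation):

```python
def decide_action(previous_size, current_size):
    """Model of how the tracked-inode state reacts when the file size changes."""
    if current_size < previous_size:
        # Shrink: treated as wholly new content, reread from byte 0.
        return ("reread", 0, current_size)
    if current_size == previous_size:
        # Same size: nothing is read, so an overwrite of equal length goes unnoticed.
        return ("noop", previous_size, previous_size)
    # Growth: only the new bytes are read (the "tailing" behaviour).
    return ("tail", previous_size, current_size)

print(decide_action(100, 40))   # ('reread', 0, 40)   overwritten with smaller content
print(decide_action(100, 100))  # ('noop', 100, 100)  same size, new data is skipped
print(decide_action(100, 160))  # ('tail', 100, 160)  only bytes 100..160 are read
```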
