Sorry, what do you mean by 'second file of the configuration'?
And what is your suggested path forward? If we want to run 4 parallel pipelines (3 with TCP input and 1 with file input), can't we do that?
Appreciate your insights.
I mean that if you have
"pipeline.sources"=>[
"/opt/elastic-release/logstash/pipeline/security1100/100-ingestion.conf",
"/opt/elastic-release/logstash/pipeline/security1100/102-date.conf",
"/opt/elastic-release/logstash/pipeline/security1100/200-tcapLookups.conf",
"/opt/elastic-release/logstash/pipeline/security1100/201-sccpLookups.conf",
"/opt/elastic-release/logstash/pipeline/security1100/202-gtaLookups.conf",
"/opt/elastic-release/logstash/pipeline/security1100/300-secMessagesLookup.conf",
"/opt/elastic-release/logstash/pipeline/security1100/301-secMessages.conf",
"/opt/elastic-release/logstash/pipeline/security1100/302-JanKOrdering.yml",
"/opt/elastic-release/logstash/pipeline/security1100/400-fieldRemoval.conf",
"/opt/elastic-release/logstash/pipeline/security1100/900-output.conf"],
then the error will occur for security1100/102-date.conf
OK. So I understand that this known issue is preventing me from running 4 parallel pipelines (3 with TCP input and 1 with file input)? Are you suggesting that the two are linked?
No, I do not think they are linked. You said you only get the "Could not determine ID for filter/date" error when the Java execution engine is disabled, but the pipeline with the file input stops even when it is enabled.
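For what it is worth, running several pipelines in parallel is supported in itself; they are declared in pipelines.yml. A minimal sketch (the pipeline ids and paths here are hypothetical; adjust them to your layout):

- pipeline.id: tcp-1
  path.config: "/opt/elastic-release/logstash/pipeline/tcp1/*.conf"
- pipeline.id: tcp-2
  path.config: "/opt/elastic-release/logstash/pipeline/tcp2/*.conf"
- pipeline.id: tcp-3
  path.config: "/opt/elastic-release/logstash/pipeline/tcp3/*.conf"
- pipeline.id: file-ingest
  path.config: "/opt/elastic-release/logstash/pipeline/security1100/*.conf"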
Do you have any insights on the pipeline issue I am encountering?
No, sorry.
Hi @hsalim we could try to reproduce it, but some information is needed to work on this.
Are you able to provide a minimal pipeline config that reproduces it? Perhaps a pipeline with a minimal input, filter, and output that manifests the problem, as sketched below.
Which version of Logstash are you using?
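Something along these lines would be enough; this is only a sketch (the glob path is a placeholder, and stdout is used just to keep the repro minimal):

input {
  file {
    path => "/home/ftpuser/tdr/input/diameter/SYSTEM/local/*.gz"
    mode => "read"
  }
}
output {
  stdout { codec => rubydebug }
}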
Hi @Andrea_Selva, the issue we are having is that files waiting to be ingested are not being marked as a "new discovery"; see the trace below. Any ideas?
[2021-05-20T18:20:02,794][TRACE][filewatch.discoverer ] discover_files {"count"=>2254}
[2021-05-20T18:20:02,794][TRACE][filewatch.discoverer ] discover_files handling: {"new discovery"=>false, "watched_file details"=>"<FileWatch::WatchedFile: @filename='202010260420-332388-subset.txt.gz', @state='ignored', @recent_states='[:watched, :watched]', @bytes_read='354', @bytes_unread='0', current_size='351', last_stat_size='351', file_open?='false', @initial=false, @sincedb_key='67116587 0 2058'>"}
So it has read 354 bytes from a 351-byte file. That suggests inode reuse: the sincedb is keyed by inode, so a new file that reuses the inode of a deleted one inherits the old read offset and looks as if it has already been read.
@Badger I am not using sincedb (config is below), so I am not sure how inode reuse would impact me. And if for some weird reason it does, what is the solution? By the way, this only happens on a 3-node cluster, not on a single node.
input {
  file {
    path => "/home/ftpuser/tdr/input/diameter/SYSTEM/local/*.gz"
    max_open_files => 8000
    mode => "read"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
The sincedb_path option determines whether the in-memory sincedb is persisted to disk across restarts. The in-memory db is always used.
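For example, to persist the position database across restarts you would point sincedb_path at a writable file instead of /dev/null; a sketch (the path here is just an example):

input {
  file {
    path => "/home/ftpuser/tdr/input/diameter/SYSTEM/local/*.gz"
    mode => "read"
    sincedb_path => "/var/lib/logstash/sincedb-diameter"
  }
}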
Reliably determining whether a file has been read really requires maintaining a hash of the part of the file that has been read. That means re-hashing the file every time additional data is read, which would be extremely expensive. The file input uses a much cheaper technique which sometimes gets things wrong.
When in read mode and using file_completed_action => "delete", the problem could be solved by deleting the sincedb entry as part of the file deletion process. If someone contributed a PR that implemented that, I expect Elastic would be happy to take a look at it.
Yes, I read up and now better understand the role of sincedb.
My files do not change once written to the directory for ingestion. file_completed_action => "delete" is active (as the default in read mode). I am now testing with sincedb_clean_after set to 45 seconds (sketch below); let's see...
Any other ideas and suggestions are welcome.
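For reference, a sketch of the settings under test (file_completed_action => "delete" is the default in read mode and is shown here only for clarity; the sincedb_clean_after value is the one mentioned above):

input {
  file {
    path => "/home/ftpuser/tdr/input/diameter/SYSTEM/local/*.gz"
    max_open_files => 8000
    mode => "read"
    file_completed_action => "delete"
    sincedb_path => "/dev/null"
    sincedb_clean_after => "45 seconds"
  }
}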