Logstash input file configuration

Hi!
I use Logstash to import a JSON file into Elasticsearch.

I use the following configuration, but the index is not created in Elasticsearch and nothing appears on the console.

input {
  file {
    codec => multiline {
      pattern => "^{header"
      negate => true
      what => "previous"
      auto_flush_interval => 5
      multiline_tag => ""
    }

    path => "C:/xxx/xxx/xxx/xxxx/result.json"
    start_position => "beginning"
    sincedb_path => "NUL"
    sincedb_write_interval => 5
  }
}
filter {
}


output {
  elasticsearch {
    hosts => "localhost:9200"
    index => "pythonfile"
  }
  stdout { codec => rubydebug }
}

I tried with codec => "json" or "json_lines" but got the same problem: Logstash is running but doesn't import the file.
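
That is, with the codec replacing the whole multiline block and everything else unchanged:

input {
  file {
    codec => "json_lines"
    path => "C:/xxx/xxx/xxx/xxxx/result.json"
    start_position => "beginning"
    sincedb_path => "NUL"
    sincedb_write_interval => 5
  }
}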

[DEBUG] 2022-05-18 16:02:34.800 [pool-9-thread-1] cgroup - One or more required cgroup files or directories not found: /proc/self/cgroup, /sys/fs/cgroup/cpuacct, /sys/fs/cgroup/cpu
[DEBUG] 2022-05-18 16:02:35.369 [pool-9-thread-1] jvm - collector name {:name=>"G1 Young Generation"}
[DEBUG] 2022-05-18 16:02:35.369 [pool-9-thread-1] jvm - collector name {:name=>"G1 Old Generation"}
[DEBUG] 2022-05-18 16:02:38.455 [logstash-pipeline-flush] PeriodicFlush - Pushing flush onto pipeline.

Can you help me fix the problem? Thank you!

If you enable log.level TRACE then the filewatch module in the file input will log messages about whether it found the file etc.
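
One way to do that is the --log.level flag when starting Logstash (the pipeline path below is just a placeholder):

  bin\logstash.bat --log.level trace -f C:\path\to\pipeline.conf

Alternatively, set log.level: trace in logstash.yml.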

Thank you for the response.
When I enable log.level TRACE, I get this:

[TRACE] 2022-05-19 00:12:45.036 [[main]<file] processor - process_active no change {:path=>"result.txt"}
[DEBUG] 2022-05-19 00:12:46.013 [pool-3-thread-1] cgroup - One or more required cgroup files or directories not found: /proc/self/cgroup, /sys/fs/cgroup/cpuacct, /sys/fs/cgroup/cpu
[TRACE] 2022-05-19 00:12:46.039 [[main]<file] processor - process_closed
[TRACE] 2022-05-19 00:12:46.040 [[main]<file] processor - process_ignored
[TRACE] 2022-05-19 00:12:46.040 [[main]<file] processor - process_delayed_delete
[TRACE] 2022-05-19 00:12:46.041 [[main]<file] processor - process_restat_for_watched_and_active
[TRACE] 2022-05-19 00:12:46.041 [[main]<file] processor - process_rotation_in_progress
[TRACE] 2022-05-19 00:12:46.041 [[main]<file] processor - process_watched
[TRACE] 2022-05-19 00:12:46.042 [[main]<file] processor - process_active

NB: I have renamed the initial file (result.json) to result.txt.

There should be a lot more than that. Enough that you may need to use a file sharing site (pastebin.com, gist.github.com or anywhere similar).

OK, you are right.
Here is the link: logleveltrace

OK, so it found the file:

handling: {:new_discovery=>true, :watched_file=>"<FileWatch::WatchedFile: @filename='result.txt', @state=:watched, @recent_states=[:watched], @bytes_read=0, @bytes_unread=0, current_size=2199147, last_stat_size=2199147, file_open?=false, @initial=true, sincedb_key='1243747818-162593-11796480 0 0'>"}

and knows it has to read 2.2 MB from it. So it reads the entire 2.2 MB into memory and is still waiting to see the end of the first line.

buffer_extract: a delimiter can't be found in current chunk, maybe there are no more delimiters or the delimiter is incorrect or the text before the delimiter, a 'line', is very large, if this message is logged often try increasing the file_chunk_size setting. {"delimiter"=>"\n", "read_position"=>2195456, "bytes_read_count"=>3691, "last_known_file_size"=>2199147, "file_path"=>"C:/data/result.txt"}
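
If the delimiter were correct and a single "line" really could be that large, the file input's file_chunk_size option (the number of bytes read per pass, 32768 by default) could be raised, for example:

  file {
    path => "C:/data/result.txt"
    file_chunk_size => 4194304   # read up to 4 MB per pass instead of the default 32 KB
  }

But here the buffer never finds a "\n" at all, which suggests the delimiter option on the file input needs to match whatever the file actually uses.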


Good news!
When I add delimiter => "\n", the index is created. But I think I missed something: there are only 3 documents in the index.
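
The input block is now essentially:

input {
  file {
    codec => multiline {
      pattern => "^{header"
      negate => true
      what => "previous"
      auto_flush_interval => 5
      multiline_tag => ""
    }

    path => "C:/data/result.txt"
    delimiter => "\n"
    start_position => "beginning"
    sincedb_path => "NUL"
    sincedb_write_interval => 5
  }
}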

I am surprised you get 3. If it was a single line then I would have expected a single 2.2 MB event.

The index store size is 2.58 MB and the docs count is 3.
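
One way to check these numbers is the _cat indices API:

  curl 'localhost:9200/_cat/indices/pythonfile?v'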

The codec logs some messages at log level DEBUG. Taking a look at what those messages say might tell you something.
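
If needed, DEBUG can be enabled for just that codec at runtime through the logging API (the logger name here is an assumption based on the plugin's namespace):

  curl -XPUT 'localhost:9600/_node/logging?pretty' -H 'Content-Type: application/json' -d '{"logger.logstash.codecs.multiline": "DEBUG"}'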


Okay, thank you very much.
