Logstash problem with images

Hi everyone, I'm new to Elasticsearch and Logstash.
The problem:
I want to use Logstash to collect every image in a file system tree (I don't need the image itself, only its information, like path and name), but when Logstash reads an image it creates and sends multiple events to Elasticsearch for the same image. I want to send it only once.
I tried

drop {}

but it drops everything and no information reaches Elasticsearch.

Logstash rubydebug:

             "path" => [
        [0] "C:",
        [1] "Users",
        [2] "U1",
        [3] "Desktop",
        [4] "porva",
        [5] "lkml",
        [6] "1.jpg"
    ],
    "shortHostname" => "C:",
             "host" => "PC3130",
    "Link_for_file" => "C:\\Users\U1\Desktop\\porva\\lkml\\1.jpg",
         "NameFile" => "1.jpg",
         "@version" => "1",
       "@timestamp" => 2021-06-15T13:46:51.142Z
}
{
             "path" => [
        [0] "C:",
        [1] "Users",
        [2] "U1",
        [3] "Desktop",
        [4] "porva",
        [5] "lkml",
        [6] "1.jpg"
    ],
    "shortHostname" => "C:",
             "host" => "PC3130",
    "Link_for_file" => "C:\\Users\\U1\\Desktop\\porva\\lkml\\1.jpg",
         "NameFile" => "1.jpg",
         "@version" => "1",
       "@timestamp" => 2021-06-15T13:46:51.158Z
}
{
             "path" => [
        [0] "C:",
        [1] "Users",
        [2] "U1",
        [3] "Desktop",
        [4] "porva",
        [5] "lkml",
        [6] "1.jpg"
    ],
    "shortHostname" => "C:",
             "host" => "PC3130",
    "Link_for_file" => "C:\\Users\\U1\\Desktop\\porva\\lkml\\1.jpg",
         "NameFile" => "1.jpg",
         "@version" => "1",
       "@timestamp" => 2021-06-15T13:46:51.166Z
}

(it sends the same information too many times).
Is there a way to skip the file contents (read only the file names) and send the information to Elasticsearch only once?
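One way to end up with a single document per image, regardless of how many events a file produces, is to let Elasticsearch deduplicate: hash the file path into the document `_id` with the fingerprint filter, so every event for the same image overwrites one document instead of creating a new one. A sketch, not from the thread (check option names against your plugin versions):

```
filter {
  # Run this before any filter that rewrites "path".
  fingerprint {
    source => "path"
    target => "[@metadata][fingerprint]"
    method => "SHA256"
  }
}

output {
  elasticsearch {
    hosts       => [ "localhost:9200" ]
    # Same path => same _id => the document is overwritten, not duplicated.
    document_id => "%{[@metadata][fingerprint]}"
  }
}
```

This is also worker-safe, since no state is shared between Logstash threads.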

Sorry for my English.

Welcome to our community! :smiley:

It'd be helpful if you posted your config :slight_smile:


Thanks @warkolm for your reply.
My config is this:

input {
  file {
    path => "C:/Users/U1/Desktop/porva/*/*"
    start_position => "beginning"
    #end position don't work :(
    #mode => "read"
  }
}

filter {
  mutate {
    split     => { "path" => "/" }
    add_field => { "shortHostname" => "%{[path][0]}" }
    add_field => { "NameFile" => "%{[path][6]}" }
    add_field => { "Link_for_file" => "%{[path][0]}\%{[path][1]}\%{[path][2]}\%{[path][3]}\%{[path][4]}\%{[path][5]}\%{[path][6]}" }
    remove_field => [ "message" ]
  }
  ruby {
    init => "@c = 0"
    code => "
      @c += 1
      event.set('count', @c)
    "
  }
}

output {
  if [count] == 2 {
    stdout { codec => rubydebug }
    elasticsearch { hosts => [ "localhost:9200" ] }
  }
}

With the counter @c I wanted to send only the first event. The problem is that Logstash sends the event with count == 1 multiple times; sometimes it sends only one, sometimes nothing.
How? It shouldn't do that. Is the counter slower than Logstash's reading?

I notice that in the example below the condition is [count] == 1, but it's the same thing: the behavior doesn't change if I put [count] == 2.

input {
  file {
    path => "C:/Users/U1/Desktop/porva/*/*"
    start_position => "beginning"
    #end position don't work :(
    #mode => "read"
  }
}

filter {
  mutate {
    split     => { "path" => "/" }
    add_field => { "shortHostname" => "%{[path][0]}" }
    add_field => { "NameFile" => "%{[path][6]}" }
    add_field => { "Link_for_file" => "%{[path][0]}\%{[path][1]}\%{[path][2]}\%{[path][3]}\%{[path][4]}\%{[path][5]}\%{[path][6]}" }
    remove_field => [ "message" ]
  }
  ruby {
    init => "@c = 0"
    code => "
      @c += 1
      event.set('count', @c)
    "
  }
}

output {
  if [count] == 1 {
    stdout { codec => rubydebug }
    elasticsearch { hosts => [ "localhost:9200" ] }
  }
}

Now it's correct

Note that if you have multiple CPUs, by default Logstash will run a worker thread on each CPU, which means there will be as many instances of that ruby filter as there are CPUs, each maintaining @c independently. That code will only work reliably if you set pipeline.workers to 1.
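For reference, the worker count can be pinned in logstash.yml (a minimal sketch):

```
# logstash.yml
pipeline.workers: 1
```

The same thing can be done per run on the command line with `-w 1` (`--pipeline.workers`).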


Thanks @Badger for your reply.

The pipeline.workers setting is 1 by default. I set it to 1 manually, but nothing changed.

Maybe there is a way to read only a few bytes of each file; the effect would be the same.

I was wrong: I set pipeline.workers to 1 and now it works, but the counter doesn't reset when Logstash starts reading a new file.

I tried this, but it seems the ruby filter runs faster than the mutate filter (the mutate filter creates the Link_for_file field)

ruby {
  init => "@c = 0
           @help = [Link_for_file]"
  code => "
    if @help != [Link_for_file]
      @c = 0
      @help = [Link_for_file]
    end

    @c += 1
    event.set('count', @c)
  "
}

so it gives this error

[logstash.javapipeline    ][main] Pipeline error {:pipeline_id=>"main", :exception=>#<NameError: uninitialized constant

I found a solution :

ruby {
  init => '
    @c = 0
    @name_file = ""
  '
  code => '
    if @c == 0
      @name_file = event.get("Link_for_file")
    end

    # New file: reset the counter so its first event gets count == 1
    if @name_file != event.get("Link_for_file")
      @c = 0
      @name_file = event.get("Link_for_file")
    end

    @c += 1
    event.set("debug", @name_file)
    event.set("count", @c)
  '
}

It's not the best solution (I don't need the counter), but it works.
It has two defects:
- it's slower (it only works with one CPU)
- it reads the whole file but keeps only one event (I waste time reading the entire file; it would be better to read only the file name, or only the first line, but I don't know how :expressionless:)
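The counter above could also be replaced by a set of already-seen paths, which avoids the reset logic entirely. A sketch, still assuming pipeline.workers is 1, since @seen lives inside one worker:

```
filter {
  ruby {
    init => '
      require "set"
      @seen = Set.new   # Link_for_file values already emitted by this worker
    '
    code => '
      path = event.get("Link_for_file")
      # Set#add? returns nil when the path was already present:
      # keep the first event per file, cancel the rest.
      event.cancel unless @seen.add?(path)
    '
  }
}
```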

Thanks for your help @warkolm @Badger.

Hi @Badger.
I have a question: when I was reading the files with the file input and pipeline.workers set to 1, everything worked fine. But when I read the files with Filebeat and hand them to Logstash, pipeline.workers seems to be ignored and Logstash starts counting double (multiple CPUs). Why? Are events arriving from Filebeat handled differently?
You said before that each CPU uses its own instance of @c. Is there a way to use one global variable across all CPUs? In this article it seems not possible: Global variable? - #2 by Chemse. Can I use a field (like count) to store the value globally (accessible by all CPUs)?

I have noticed that Filebeat sends multiple files at the same time (it starts another one before it has finished the first). Is that right? If yes, is there a way to avoid it?
harvester_limit works fine with close_eof, but it is very slow. Is there a way to increase its speed?
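For context, these are the filebeat.yml keys involved; a sketch only, so check the names and defaults against your Filebeat version:

```
filebeat.inputs:
  - type: log
    paths:
      - C:/Users/U1/Desktop/porva/*/*
    harvester_limit: 1   # at most one file open at a time
    close_eof: true      # release the harvester as soon as EOF is reached
    scan_frequency: 1s   # rescan sooner (default 10s) to offset the serialization
```

Lowering scan_frequency is the usual lever for the slowness: with harvester_limit at 1 and close_eof set, most of the wait is Filebeat's rescan interval rather than the reading itself.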
