Logstash problem with images

Hi everyone, I'm new to Elasticsearch and Logstash.
The problem:
I want to use Logstash to collect every image in a file system tree (I don't need the image itself, only its information, like path and name), but when Logstash reads an image it creates and sends multiple events to Elasticsearch for the same image. I want to send it only once.
I tried

drop {}

but it drops everything and no information reaches Elasticsearch.

Logstash rubydebug:

             "path" => [
        [0] "C:",
        [1] "Users",
        [2] "U1",
        [3] "Desktop",
        [4] "porva",
        [5] "lkml",
        [6] "1.jpg"
    ],
    "shortHostname" => "C:",
             "host" => "PC3130",
    "Link_for_file" => "C:\\Users\U1\Desktop\\porva\\lkml\\1.jpg",
         "NameFile" => "1.jpg",
         "@version" => "1",
       "@timestamp" => 2021-06-15T13:46:51.142Z
}
{
             "path" => [
        [0] "C:",
        [1] "Users",
        [2] "U1",
        [3] "Desktop",
        [4] "porva",
        [5] "lkml",
        [6] "1.jpg"
    ],
    "shortHostname" => "C:",
             "host" => "PC3130",
    "Link_for_file" => "C:\\Users\\U1\\Desktop\\porva\\lkml\\1.jpg",
         "NameFile" => "1.jpg",
         "@version" => "1",
       "@timestamp" => 2021-06-15T13:46:51.158Z
}
{
             "path" => [
        [0] "C:",
        [1] "Users",
        [2] "U1",
        [3] "Desktop",
        [4] "porva",
        [5] "lkml",
        [6] "1.jpg"
    ],
    "shortHostname" => "C:",
             "host" => "PC3130",
    "Link_for_file" => "C:\\Users\\U1\\Desktop\\porva\\lkml\\1.jpg",
         "NameFile" => "1.jpg",
         "@version" => "1",
       "@timestamp" => 2021-06-15T13:46:51.166Z
}

(it sends the same information too many times).
Is there a way to skip the file contents (read only the file names) and send the information to Elasticsearch only once?
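One way to end up with a single document per image, regardless of how many events a file produces, is to let Elasticsearch deduplicate: hash the file path into the document `_id` with the fingerprint filter, so every event for the same image overwrites one document instead of creating a new one. A sketch, not from the thread (check option names against your plugin versions):

```
filter {
  # Run this before any filter that rewrites "path".
  fingerprint {
    source => "path"
    target => "[@metadata][fingerprint]"
    method => "SHA256"
  }
}

output {
  elasticsearch {
    hosts       => [ "localhost:9200" ]
    # Same path => same _id => the document is overwritten, not duplicated.
    document_id => "%{[@metadata][fingerprint]}"
  }
}
```

This is also worker-safe, since no state is shared between Logstash threads.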

Sorry for my English.

Welcome to our community! :smiley:

It'd be helpful if you posted your config :slight_smile:


Thanks @warkolm for your reply.
My config is this:

input {
  file {
    path => "C:/Users/U1/Desktop/porva/*/*"
    start_position => "beginning"
    #end position don't work :(
    #mode => "read"
  }
}

filter {
  mutate {
    split     => { "path" => "/" }
    add_field => { "shortHostname" => "%{[path][0]}" }
    add_field => { "NameFile" => "%{[path][6]}" }
    add_field => { "Link_for_file" => "%{[path][0]}\%{[path][1]}\%{[path][2]}\%{[path][3]}\%{[path][4]}\%{[path][5]}\%{[path][6]}" }
    remove_field => [ "message" ]
  }
  ruby {
    init => "@c = 0"
    code => "
      @c += 1
      event.set('count', @c)
    "
  }
}

output {
  if [count] == 2 {
    stdout { codec => rubydebug }
    elasticsearch { hosts => [ "localhost:9200" ] }
  }
}

With the counter @c I wanted to send only the first event. The problem is that Logstash sends the event with count == 1 multiple times; sometimes it sends only one, sometimes nothing.
How? It shouldn't do that. Is the counter slower than Logstash's reading?

I notice that in the example below the condition is [count] == 1, but it's the same thing: the behavior doesn't change if I put [count] == 2.

input {
  file {
    path => "C:/Users/U1/Desktop/porva/*/*"
    start_position => "beginning"
    #end position don't work :(
    #mode => "read"
  }
}

filter {
  mutate {
    split     => { "path" => "/" }
    add_field => { "shortHostname" => "%{[path][0]}" }
    add_field => { "NameFile" => "%{[path][6]}" }
    add_field => { "Link_for_file" => "%{[path][0]}\%{[path][1]}\%{[path][2]}\%{[path][3]}\%{[path][4]}\%{[path][5]}\%{[path][6]}" }
    remove_field => [ "message" ]
  }
  ruby {
    init => "@c = 0"
    code => "
      @c += 1
      event.set('count', @c)
    "
  }
}

output {
  if [count] == 1 {
    stdout { codec => rubydebug }
    elasticsearch { hosts => [ "localhost:9200" ] }
  }
}

Now it's correct

Note that if you have multiple CPUs, by default Logstash will run a worker thread on each CPU, which means there will be as many instances of that ruby filter as there are CPUs, each maintaining @c independently. That code will only work reliably if you set pipeline.workers to 1.
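For reference, the worker count can be pinned in logstash.yml (a minimal sketch):

```
# logstash.yml
pipeline.workers: 1
```

The same thing can be done per run on the command line with `-w 1` (`--pipeline.workers`).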


Thanks @Badger for your reply.

The pipeline.workers setting is 1 by default. I set it to 1 manually, but nothing changed.

Maybe there is a way to read only a few bytes of each file; the effect would be the same.

I was wrong: I set pipeline.workers to 1 and now it works, but the counter doesn't reset when Logstash starts reading a new file.

I tried this, but it seems the ruby filter runs faster than the mutate filter (the mutate filter creates the Link_for_file field)

ruby {
  init => "@c = 0
           @help = [Link_for_file]"
  code => "
    if @help != [Link_for_file]
      @c = 0
      @help = [Link_for_file]
    end

    @c += 1
    event.set('count', @c)
  "
}

so it gives this error

[logstash.javapipeline    ][main] Pipeline error {:pipeline_id=>"main", :exception=>#<NameError: uninitialized constant

I found a solution :

ruby {
  init => '
    @c = 0
    @name_file = ""
  '
  code => '
    if @c == 0
      @name_file = event.get("Link_for_file")
    end

    # New file: reset the counter so its first event gets count == 1
    if @name_file != event.get("Link_for_file")
      @c = 0
      @name_file = event.get("Link_for_file")
    end

    @c += 1
    event.set("debug", @name_file)
    event.set("count", @c)
  '
}

It's not the best solution (I don't need the counter), but it works.
It has two defects:
- it's slower (it only works with one CPU)
- it reads the whole file but keeps only one event (I waste time reading the entire file; it would be better to read only the file name, or only the first line, but I don't know how :expressionless:)
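The counter above could also be replaced by a set of already-seen paths, which avoids the reset logic entirely. A sketch, still assuming pipeline.workers is 1, since @seen lives inside one worker:

```
filter {
  ruby {
    init => '
      require "set"
      @seen = Set.new   # Link_for_file values already emitted by this worker
    '
    code => '
      path = event.get("Link_for_file")
      # Set#add? returns nil when the path was already present:
      # keep the first event per file, cancel the rest.
      event.cancel unless @seen.add?(path)
    '
  }
}
```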

Thanks for your help @warkolm @Badger.

Hi @Badger.
I have a question: when I was reading the files with the file input and pipeline.workers set to 1, everything worked fine. But when I read the files with Filebeat and hand them to Logstash, pipeline.workers seems to be ignored and Logstash starts counting double (multiple CPUs). Why? Are events arriving from Filebeat handled differently?
You said before that each CPU uses its own instance of @c. Is there a way to use one global variable across all CPUs? In this article it seems not possible: Global variable? - #2 by Chemse. Can I use a field (like count) to store the value globally (accessible by all CPUs)?

I have noticed that Filebeat sends multiple files at the same time (it starts another one before it has finished the first). Is that right? If yes, is there a way to avoid it?
harvester_limit works fine with close_eof, but it is very slow. Is there a way to increase its speed?
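For context, these are the filebeat.yml keys involved; a sketch only, so check the names and defaults against your Filebeat version:

```
filebeat.inputs:
  - type: log
    paths:
      - C:/Users/U1/Desktop/porva/*/*
    harvester_limit: 1   # at most one file open at a time
    close_eof: true      # release the harvester as soon as EOF is reached
    scan_frequency: 1s   # rescan sooner (default 10s) to offset the serialization
```

Lowering scan_frequency is the usual lever for the slowness: with harvester_limit at 1 and close_eof set, most of the wait is Filebeat's rescan interval rather than the reading itself.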
