Use the same connection for multiple Logstash configurations


(André Silva) #1

I'm using Logstash 2.4.1 to load data to Elasticsearch 2.4.6.
I have the following Logstash config:

input {
    jdbc {
        jdbc_connection_string => "jdbc:oracle:thin:@database:1521:db1"
        jdbc_user => "user"
        jdbc_password => "password"
        jdbc_driver_library => "ojdbc6-11.2.0.jar"
        jdbc_driver_class => "Java::oracle.jdbc.driver.OracleDriver"
        parameters => { "id" => 1 }
        statement => "SELECT modify_date, userName from user where id = :id AND modify_date >= :sql_last_value"
        schedule => "*/1 * * * *"
        tracking_column => modify_date
    }
}
output {
    elasticsearch {
        hosts => ["localhost:9200"]
        index => "index1"
        document_type => "USER"
    }
    stdout { codec => rubydebug }
}

So, every minute, it queries the database to check whether there is new data to send to Elasticsearch.
It works perfectly, but there is one problem:
We have around 100 clients, and they are all in the same database instance.

That means I have 100 scripts and will have 100 instances of Logstash running, meaning 100 open connections:

nohup ./logstash -f client-1.conf Logstash startup
nohup ./logstash -f client-2.conf Logstash startup
nohup ./logstash -f client-3.conf Logstash startup
nohup ./logstash -f client-4.conf Logstash startup
nohup ./logstash -f client-5.conf Logstash startup
and so on...

This is just bad.

Is there any way I can use the same connection for all my scripts?
The only differences between those scripts are the id parameter and the index name; each client will have a different id and a different index:

parameters => { "id" => 1 }
index => "index1"

Any ideas?


(Elastic-for-me) #2

I am not an expert in this, but can't you just pull userName and modify_date for all rows of the user table? That would load the records for all 100 clients into Elasticsearch, and you could then check the modify date via Kibana.


(Magnus Bäck) #3

Just select all rows (i.e. drop the id = :id condition in the query), include the id column in the SELECT clause, and reference the customer id in the output configuration:

index => "index%{id}"

(It's most likely not a great idea to have a separate index for each customer. Make sure you know what you're doing.)
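
As a rough sketch of what that combined configuration could look like, here is the config from the original post with only the query and the index name changed. It assumes the id column reaches the event as a lowercase field named id (the jdbc input lowercases column names by default, as far as I know) and that the generated names (index1, index2, ...) match the existing per-client indexes; verify both against your own setup.

```
input {
    jdbc {
        jdbc_connection_string => "jdbc:oracle:thin:@database:1521:db1"
        jdbc_user => "user"
        jdbc_password => "password"
        jdbc_driver_library => "ojdbc6-11.2.0.jar"
        jdbc_driver_class => "Java::oracle.jdbc.driver.OracleDriver"
        # No per-client filter and no parameters block: one query covers
        # every client, so a single Logstash instance is enough.
        statement => "SELECT id, modify_date, userName from user where modify_date >= :sql_last_value"
        schedule => "*/1 * * * *"
        tracking_column => modify_date
    }
}
output {
    elasticsearch {
        hosts => ["localhost:9200"]
        # %{id} is replaced with each event's id field (assumed to be the client id).
        index => "index%{id}"
        document_type => "USER"
    }
    stdout { codec => rubydebug }
}
```

With this, a single command such as nohup ./logstash -f all-clients.conf (the file name is just a placeholder) would replace the 100 separate instances.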


(André Silva) #4

Hi Magnus.

Yeah, I can do that. I was thinking of many indexes because I could keep the clients separate, making each index smaller and searches faster.
Why do you think it is not a great idea to use separate indexes?


(Magnus Bäck) #5

Indexes have a fixed memory overhead so you'll waste resources if you have too many of them. What gives the best performance depends on a lot of factors and you shouldn't assume that greater separation is necessarily advantageous.


(André Silva) #6

Makes sense... I'll try that and let you know, thanks!


(André Silva) #7

Worked like a charm, no performance issues.
Thanks!

