Running Logstash properly

I have installed Logstash using rpm and enabled it with systemctl. I have to stash a CSV file and create a table in Grafana daily. Do I have to run bin/logstash -f /etc/logstash/conf.d/logstash.config via cron if I want to stash a CSV file hourly? I am not seeing any content in Grafana when I try to add a table panel.


Indices are getting created.

health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
green open .kibana _G9Bg9S8SruqNG12CBrU2g 1 0 2 1 16kb 16kb
yellow open implementationcrq-2018.03.15 H7B50zQZQsuwHsx5XHK2pA 5 1 43 0 88.4kb 88.4kb
yellow open pendingcrq-2018.03.15 Nnb8_XnVQQmxPMzpBJO9tA 5 1 2 0 19.1kb 19.1kb
yellow open completedcrq-2018.03.15 5FIT8AxsSfKry22xKEleHQ 5 1 16 0 85.8kb 85.8kb
yellow open closedcrq-2018.03.15 ecf0hIpZSUCObNR7hDZQKg 5 1 53 0 129.7kb 129.7kb

You can keep Logstash running all the time and configure a file input to read *.csv or whatever the files are named. When you want Logstash to process a file, just copy it to the directory that Logstash is monitoring.
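
For instance, a minimal sketch of such a file input (the directory and sincedb path below are only illustrative, borrowed from the configs later in this thread):

input {
  file {
    # watch the drop directory; any new *.csv copied here gets picked up
    path => "/samba/CRQ_reports/*.csv"
    start_position => "beginning"
    sincedb_path => "/opt/sincedb/.sincedb_crq"
  }
}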

I have created a cron job to run my Logstash config every 3 minutes. Is that allowed? In my logs I am seeing this.

[2018-03-18T22:03:11,256][WARN ][logstash.config.source.multilocal] Ignoring the 'pipelines.yml' file because modules or command line options are specified
[2018-03-18T22:03:11,262][FATAL][logstash.runner ] Logstash could not be started because there is already another instance using the configured data directory. If you wish to run multiple instances, you must change the "path.data" setting.
[2018-03-18T22:03:11,265][ERROR][org.logstash.Logstash ] java.lang.IllegalStateException: org.jruby.exceptions.RaiseException: (SystemExit) exit
[2018-03-18T22:06:11,320][INFO ][logstash.modules.scaffold] Initializing module {:module_name=>"fb_apache", :directory=>"/usr/share/logstash/modules/fb_apache/configuration"}

Should I enable the Logstash service via systemctl rather than running Logstash using cron?

Yes, run Logstash as a background service instead and follow the advice I gave earlier. Logstash doesn't shut down when it has processed all data via the file input, so starting it every three minutes doesn't make sense.

Thanks. Follow-up question: I am receiving this error in the logs: "the type event field won't be used to determine the document _type"

What's wrong in my Logstash config?

input {
  file {
    type => "samscrq"
    path => "/samba/CRQ_reports/samscrq.csv"
    start_position => "beginning"
    sincedb_path => "/opt/sincedb/.sincedb_samscrq"
  }
  file {
    type => "pending"
    path => "/samba/CRQ_reports/pending.csv"
    start_position => "beginning"
    sincedb_path => "/opt/sincedb/.sincedb_pending"
  }
  file {
    type => "implementation"
    path => "/samba/CRQ_reports/implementation.csv"
    start_position => "beginning"
    sincedb_path => "/opt/sincedb/.sincedb_implementation"
  }
  file {
    type => "completed"
    path => "/samba/CRQ_reports/completed.csv"
    start_position => "beginning"
    sincedb_path => "/opt/sincedb/.sincedb_completed"
  }
  file {
    type => "closed"
    path => "/samba/CRQ_reports/closed.csv"
    start_position => "beginning"
    sincedb_path => "/opt/sincedb/.sincedb_closed"
  }
}
filter {
  if [type] == "samscrq" {
    csv {
      separator => ","
      columns => ["change_id","summary","Status","start_date","coordinator"]
      skip_empty_columns => true
      skip_empty_rows => true
    }
  }
  if [type] == "completed" {
    csv {
      separator => ","
      columns => ["change_id","summary","Status","start_date","coordinator"]
      skip_empty_columns => true
      skip_empty_rows => true
    }
  }
  if [type] == "closed" {
    csv {
      separator => ","
      columns => ["change_id","summary","Status","start_date","coordinator"]
      skip_empty_columns => true
      skip_empty_rows => true
    }
  }
  if [type] == "implementation" {
    csv {
      separator => ","
      columns => ["change_id","summary","Status","start_date","coordinator"]
      skip_empty_columns => true
      skip_empty_rows => true
    }
  }
  if [type] == "pending" {
    csv {
      separator => ","
      columns => ["change_id","summary","Status","start_date","coordinator"]
      skip_empty_columns => true
      skip_empty_rows => true
    }
  }
}
output {
  if [type] == "samscrq" {
    elasticsearch {
      hosts => ["146.40.233.10:9200"]
      index => "samscrq"
    }
  }
  if [type] == "completed" {
    elasticsearch {
      hosts => ["146.40.233.10:9200"]
      index => "completed"
    }
  }
  if [type] == "closed" {
    elasticsearch {
      hosts => ["146.40.233.10:9200"]
      index => "closed"
    }
  }
  if [type] == "implementation" {
    elasticsearch {
      hosts => ["146.40.233.10:9200"]
      index => "implementation"
    }
  }
  if [type] == "pending" {
    elasticsearch {
      hosts => ["146.40.233.10:9200"]
      index => "pending"
    }
  }
}

It means exactly what it says: the value of the type field will no longer be used by default to determine the type of the documents. If this really is what you want, you can use

document_type => "%{type}"

in your elasticsearch output configuration.
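
For example, the output could look something like this (host and index name are copied from your config; only document_type is new):

elasticsearch {
  hosts => ["146.40.233.10:9200"]
  index => "samscrq"
  # reuse the value of the type field as the document _type
  document_type => "%{type}"
}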

What should be my correct approach? This is the scenario: every 2 hours a CSV file is dumped in my samba directory. I have a Python script that parses the CSV file and creates new CSV files (pending.csv, completed.csv and implementation.csv). From these new input files I would like to create 3 different indices that I will be using for my table panels in Grafana. I want to display this in my dashboard with table panels.

I have updated it; I used type instead of type. But still the same error: "the type event field won't be used to determine the document _type"

input {
  file {
    type => "samscrq"
    path => "/samba/CRQ_reports/samscrq.csv"
    start_position => "beginning"
    sincedb_path => "/opt/sincedb/.sincedb_samscrq"
  }
  file {
    type => "pending"
    path => "/samba/CRQ_reports/pending.csv"
    start_position => "beginning"
    sincedb_path => "/opt/sincedb/.sincedb_pending"
  }
  file {
    type => "implementation"
    path => "/samba/CRQ_reports/implementation.csv"
    start_position => "beginning"
    sincedb_path => "/opt/sincedb/.sincedb_implementation"
  }
  file {
    type => "completed"
    path => "/samba/CRQ_reports/completed.csv"
    start_position => "beginning"
    sincedb_path => "/opt/sincedb/.sincedb_completed"
  }
  file {
    type => "closed"
    path => "/samba/CRQ_reports/closed.csv"
    start_position => "beginning"
    sincedb_path => "/opt/sincedb/.sincedb_closed"
  }
}
filter {
  if [type] == "samscrq" {
    csv {
      separator => ","
      columns => ["change_id","summary","Status","start_date","coordinator"]
      skip_empty_columns => true
      skip_empty_rows => true
    }
  }
  if [type] == "completed" {
    csv {
      separator => ","
      columns => ["change_id","summary","Status","start_date","coordinator"]
      skip_empty_columns => true
      skip_empty_rows => true
    }
  }
  if [type] == "closed" {
    csv {
      separator => ","
      columns => ["change_id","summary","Status","start_date","coordinator"]
      skip_empty_columns => true
      skip_empty_rows => true
    }
  }
  if [type] == "implementation" {
    csv {
      separator => ","
      columns => ["change_id","summary","Status","start_date","coordinator"]
      skip_empty_columns => true
      skip_empty_rows => true
    }
  }
  if [type] == "pending" {
    csv {
      separator => ","
      columns => ["change_id","summary","Status","start_date","coordinator"]
      skip_empty_columns => true
      skip_empty_rows => true
    }
  }
}
output {
  if [type] == "samscrq" {
    elasticsearch {
      hosts => ["XXX.XX.XXX.10:9200"]
      index => "samscrq"
    }
  }
  if [type] == "completed" {
    elasticsearch {
      hosts => ["XXX.XX.XXX.10:9200"]
      index => "completed"
    }
  }
  if [type] == "closed" {
    elasticsearch {
      hosts => ["XXX.XX.XXX.10:9200"]
      index => "closed"
    }
  }
  if [type] == "implementation" {
    elasticsearch {
      hosts => ["XXX.XX.XXX.10:9200"]
      index => "implementation"
    }
  }
  if [type] == "pending" {
    elasticsearch {
      hosts => ["XXX.XX.XXX.10:9200"]
      index => "pending"
    }
  }
}

I have updated it; I used type instead of type.

What?

But still the same error: "the type event field won't be used to determine the document _type"

It's not an error, it's a warning. If you don't want to see the warning, you can set document_type to something.

I see. I thought it was already an error and the config was not good at all.

The earlier config was using tag where it was supposed to be type :slight_smile: sorry for that.

What is your Python script doing to the incoming CSV file? Is it using the Status field to decide which file to write the line to?
If so, then Logstash can do all of what you want to do in a single pass of the incoming file.
I am sure you can use Logstash functions to send each event to the correct index on-the-fly using string interpolation.
Perhaps:

input {
  file {
    type => "samscrq"
    path => "/samba/CRQ_reports/incoming.csv"
    start_position => "beginning"
    sincedb_path => "/opt/sincedb/.sincedb_crq"
  }
}
filter {
  csv {
    separator => ","
    columns => ["change_id","summary","status","start_date","coordinator"]
    skip_empty_columns => true
    skip_empty_rows => true
  }
  mutate {
    lowercase => ["status"]
  }
}
output {
  elasticsearch {
    hosts => ["XXX.XX.XXX.10:9200"]
    index => "%{status}"
  }
}

Yes, it is using the status field to decide which file to write the line to. I'll try this Logstash config and let you know the outcome. Thanks.

I tried running the config below. See the pic below; I am seeing these kinds of indices.

input {
  file {
    type => "samscrq"
    path => "/samba/CRQ_reports/*.csv"
    start_position => "beginning"
    sincedb_path => "/opt/sincedb/.sincedb_crq"
  }
}
filter {
  csv {
    separator => ","
    columns => ["change_id","summary","status","start_date","end_date","coordinator"]
    skip_empty_columns => true
    skip_empty_rows => true
  }
  mutate {
    lowercase => ["status"]
  }
}
output {
  elasticsearch {
    hosts => ["XXX.XX.XXX.XX:9200"]
    index => "%{status}"
  }
}

If you replace the elasticsearch output with

  stdout { codec => rubydebug }

what does one event look like? Copy and paste it here.
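
For example, a temporary debugging output section (just a sketch; it stands in for the elasticsearch output while troubleshooting) could look like:

output {
  # print every parsed event to the console so the fields can be inspected
  stdout { codec => rubydebug }
}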
