Logstash: ingestion via Grok pattern does not work

Hello everybody,

I want to ingest some CSV and text files into Elasticsearch via Logstash. The problem is that Logstash generates a single field named "message" containing the whole text line. I would like to split that line into several columns. Someone gave me the tip to use a Grok pattern instead of the split-and-mutate approach, so I tried a Grok pattern, but it does not work.

I've seen that the syntax is: %{SYNTAX:SEMANTIC}

Is there a list of SYNTAX keywords somewhere?

Here are two example lines from the files I want to ingest:

1234;unmot

1234;un mot;;UN AUTRE MOT;PA ;;14/02567/AB/167;
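
For the longer line, I imagine something along these lines would be needed (an untested sketch; the column names col0 to col6 are just placeholders I made up):

grok {
  match => {
    # DATA matches anything (including nothing) up to the next literal ";"
    "message" => "%{NUMBER:col0};%{DATA:col1};%{DATA:col2};%{DATA:col3};%{DATA:col4};%{DATA:col5};%{DATA:col6};"
  }
}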

This is the current Logstash configuration:

input {
  file {
    path => "C:/logs/*.txt"
    start_position => "beginning"
    sincedb_path => "NULL"
  }
}
filter {
  grok {
    match => {
      "message" => "%{NUMBER:col0};%{WORD:col1}"
    }
  }
}
output {
  elasticsearch { hosts => ["localhost:9200"] }
  stdout { codec => rubydebug }
}

This is the log content:

[2020-02-13T14:35:46,738][WARN ][logstash.config.source.multilocal] Ignoring the 'pipelines.yml' file because modules or command line options are specified
[2020-02-13T14:35:46,845][INFO ][logstash.runner          ] Starting Logstash {"logstash.version"=>"7.5.1"}
[2020-02-13T14:35:48,795][INFO ][org.reflections.Reflections] Reflections took 38 ms to scan 1 urls, producing 20 keys and 40 values 
[2020-02-13T14:35:50,239][INFO ][logstash.outputs.elasticsearch][main] Elasticsearch pool URLs updated {:changes=>{:removed=>[], :added=>[http://localhost:9200/]}}
[2020-02-13T14:35:50,423][WARN ][logstash.outputs.elasticsearch][main] Restored connection to ES instance {:url=>"http://localhost:9200/"}
[2020-02-13T14:35:50,471][INFO ][logstash.outputs.elasticsearch][main] ES Output version determined {:es_version=>7}
[2020-02-13T14:35:50,479][WARN ][logstash.outputs.elasticsearch][main] Detected a 6.x and above cluster: the `type` event field won't be used to determine the document _type {:es_version=>7}
[2020-02-13T14:35:50,547][INFO ][logstash.outputs.elasticsearch][main] New Elasticsearch output {:class=>"LogStash::Outputs::ElasticSearch", :hosts=>["//localhost:9200"]}
[2020-02-13T14:35:50,607][INFO ][logstash.outputs.elasticsearch][main] Using default mapping template
[2020-02-13T14:35:50,703][INFO ][logstash.outputs.elasticsearch][main] Attempting to install template {:manage_template=>{"index_patterns"=>"logstash-*", "version"=>60001, "settings"=>{"index.refresh_interval"=>"5s", "number_of_shards"=>1, "index.lifecycle.name"=>"logstash-policy", "index.lifecycle.rollover_alias"=>"logstash"}, "mappings"=>{"dynamic_templates"=>[{"message_field"=>{"path_match"=>"message", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false}}}, {"string_fields"=>{"match"=>"*", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false, "fields"=>{"keyword"=>{"type"=>"keyword", "ignore_above"=>256}}}}}], "properties"=>{"@timestamp"=>{"type"=>"date"}, "@version"=>{"type"=>"keyword"}, "geoip"=>{"dynamic"=>true, "properties"=>{"ip"=>{"type"=>"ip"}, "location"=>{"type"=>"geo_point"}, "latitude"=>{"type"=>"half_float"}, "longitude"=>{"type"=>"half_float"}}}}}}}
[2020-02-13T14:35:50,835][WARN ][org.logstash.instrument.metrics.gauge.LazyDelegatingGauge][main] A gauge metric of an unknown type (org.jruby.specialized.RubyArrayOneObject) has been create for key: cluster_uuids. This may result in invalid serialization.  It is recommended to log an issue to the responsible developer/development team.
[2020-02-13T14:35:50,843][INFO ][logstash.javapipeline    ][main] Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>8, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50, "pipeline.max_inflight"=>1000, "pipeline.sources"=>["C:/logstash-file-read.conf"], :thread=>"#<Thread:0x154d1065 run>"}
[2020-02-13T14:35:51,507][INFO ][logstash.javapipeline    ][main] Pipeline started {"pipeline.id"=>"main"}
[2020-02-13T14:35:51,559][INFO ][filewatch.observingtail  ][main] START, creating Discoverer, Watch with file and sincedb collections
[2020-02-13T14:35:51,567][INFO ][logstash.agent           ] Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
[2020-02-13T14:35:52,227][INFO ][logstash.agent           ] Successfully started Logstash API endpoint {:port=>9600}

Thank you for your help.

Hi @romainfoulono,

The Grok pattern seems fine. I have had problems getting Logstash to "pick up" files, especially if they already exist when Logstash starts...

Lately I have used Filebeat anytime I want to ingest from a file. There is even a CSV module for Filebeat :)

There is also a CSV filter for Logstash
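
For your sample lines, something like this might work (untested, and the column names are made up):

filter {
  csv {
    # use ";" as the field separator instead of the default ","
    separator => ";"
    columns => ["col0", "col1", "col2", "col3", "col4", "col5", "col6"]
  }
}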

Hope that helps,
AB

There is a patterns directory somewhere under the Logstash install directory that contains multiple text files defining patterns. Also...

sincedb_path => "NULL"

If you do not want the file input to persist the sincedb to disk when it stops you should use the value "NUL", not "NULL".
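
That is, something like this (your input block with just that value changed):

input {
  file {
    path => "C:/logs/*.txt"
    start_position => "beginning"
    # "NUL" is the Windows null device, so no sincedb is persisted to disk
    sincedb_path => "NUL"
  }
}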

Hi both of you
@Badger

I guess this is the path you were talking about:

C:\logstash-7.5.1\vendor\bundle\jruby\2.5.0\gems\logstash-patterns-core-4.1.2\patterns

In this folder, there is a file named grok-patterns.
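
For reference, it contains definitions like these (a few entries as they appear in my copy; other versions may differ):

WORD \b\w+\b
INT (?:[+-]?(?:[0-9]+))
NUMBER (?:%{BASE10NUM})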

@A_B
Okay, and I think I'm going to use conditionals to handle the different kinds of files.
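
Something like this is what I have in mind (untested, and the filename tests are placeholders):

filter {
  # route each kind of file to its own parser based on the source path
  if [path] =~ "fileA" {
    grok { match => { "message" => "%{NUMBER:col0};%{WORD:col1}" } }
  } else if [path] =~ "fileB" {
    csv { separator => ";" }
  }
}
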
I have a question: can Filebeat ingest an entire file into ES as a single document, rather than line by line? I mean, if I need to ingest an entire log file, can ES treat all of its lines as one record containing the entire content of the file? Is that possible?

Thank you

This depends a bit on the structure of the file... We do handle many "multiline" logs, like Java stack traces, with Filebeat. I'm not sure if there is any limit on single document size in ES...
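
On the Logstash side, one trick I have seen (untested here) is a multiline codec whose pattern never matches real data, so every line is appended to the same event; the pattern string below is just a placeholder:

input {
  file {
    path => "C:/logs/*.txt"
    start_position => "beginning"
    sincedb_path => "NUL"
    codec => multiline {
      # no line matches this, so with "negate" every line joins the previous event
      pattern => "^PATTERN_THAT_NEVER_MATCHES"
      negate => true
      what => "previous"
      # flush the accumulated event after 1 second of inactivity
      auto_flush_interval => 1
    }
  }
}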
