Ingesting .doc, .txt, .csv, .sql and some other file types

Hi everyone,

I have a question. What would be the fastest way to ingest data? I have a folder that contains a variety of file formats, including .doc, .txt, .csv and .sql. I have roughly 30 GB of files like this that I would like to be able to search in Kibana.

Any help would be great, thank you.

At the moment there is no single way to get all that data into Elasticsearch:

  • Workplace Search can handle the .doc files
  • .csv can be ingested with Filebeat or Logstash
  • .sql is text, but the custom format means you'd need to figure out exactly how you want to search these
  • .txt is similar, it depends on what is in them and how you want to search them

Awesome. I am on Ubuntu 20.04 LTS Server, and I wrote this as my config file:

    input {
      file {
        path => "/media/alex/XXXX/converted/*/*/*.csv"
        start_position => "beginning"
        sincedb_path => "/dev/null"
      }
    }

    filter {
      csv {
        separator => ","
        columns => [ ""]
      }

    output {
      elasticsearch {
        hosts => "localhost"
        index => "docs"
      }

      stdout {}
    }

Then I ran:

    $ sudo bin/logstash -f /usr/share/logstash/csv.conf

and I get this error:

    WARNING: Could not find logstash.yml which is typically located in $LS_HOME/config or /etc/logstash. You can specify the path using --path.settings. Continuing using the defaults
    ERROR: Unknown command 'sudo'

    See: 'bin/logstash --help'
    [ERROR] 2020-08-31 03:49:03.102 [main] Logstash - java.lang.IllegalStateException: Logstash stopped processing because of an error: (SystemExit) exit

So then I tried to run:

    sudo -u logstash /usr/share/logstash/bin/logstash --path.settings=/etc/logstash -f /usr/share/logstash/csv.conf

and I get this error:
    Sending Logstash logs to /var/log/logstash which is now configured via log4j2.properties
    [2020-08-31T03:51:15,351][INFO ][logstash.runner ] Starting Logstash {"logstash.version"=>"7.9.0", "jruby.version"=>"jruby 9.2.12.0 (2.5.7) 2020-07-01 db01a49ba6 OpenJDK 64-Bit Server VM 25.265-b01 on 1.8.0_265-8u265-b01-0ubuntu2~20.04-b01 +indy +jit [linux-x86_64]"}
    [2020-08-31T03:51:15,898][WARN ][logstash.config.source.multilocal] Ignoring the 'pipelines.yml' file because modules or command line options are specified
    [2020-08-31T03:51:17,164][ERROR][logstash.agent ] Failed to execute action {:action=>LogStash::PipelineAction::Create/pipeline_id:main, :exception=>"LogStash::ConfigurationError", :message=>"Expected one of [ \t\r\n], "#", "{" at line 9, column 3 (byte 120) after filter\n\t\t", :backtrace=>["/usr/share/logstash/logstash-core/lib/logstash/compiler.rb:32:in `compile_imperative'", "org/logstash/execution/AbstractPipelineExt.java:183:in `initialize'", "org/logstash/execution/JavaBasePipelineExt.java:69:in `initialize'", "/usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:44:in `initialize'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline_action/create.rb:52:in `execute'", "/usr/share/logstash/logstash-core/lib/logstash/agent.rb:357:in `block in converge_state'"]}
    [2020-08-31T03:51:17,466][INFO ][logstash.agent ] Successfully started Logstash API endpoint {:port=>9600}
    [2020-08-31T03:51:22,558][INFO ][logstash.runner ] Logstash shut down.
    [2020-08-31T03:51:22,575][ERROR][org.logstash.Logstash ] java.lang.IllegalStateException: Logstash stopped processing because of an error: (SystemExit) exit

Any ideas what I am doing wrong?

You seem to be missing a final closing bracket on your filter section.
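
For reference, here is a rough sketch of how the config might look with the filter section closed. Everything is copied from the config posted above except the column names, which are hypothetical placeholders: the posted config lists a single empty name, so substitute the real headers of your CSV files. Whether this clears the parse error cannot be confirmed from the posted log alone, since the message points at whatever follows `filter` in the file on disk.

    input {
      file {
        path => "/media/alex/XXXX/converted/*/*/*.csv"
        start_position => "beginning"
        sincedb_path => "/dev/null"
      }
    }

    filter {
      csv {
        separator => ","
        # Hypothetical column names; replace with your actual CSV headers.
        columns => ["column1", "column2"]
      }
    }   # closes the filter section; this brace was missing in the posted config

    output {
      elasticsearch {
        hosts => "localhost"
        index => "docs"
      }

      stdout {}
    }

Adding --config.test_and_exit to the same logstash command should check the syntax of the file and exit without starting the pipeline, which makes it quicker to confirm that the braces balance.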

Yeah, it didn't copy over for some reason. Any ideas why it won't execute?

Did you add the bracket? Is there a different error?

I added the bracket. It seems to be an issue with Java.

If there is a different outcome, posting the log would be helpful.
