How to import data into Elasticsearch?

Hi all,
I want to know how to import data into Elasticsearch. I have the following scenarios:
1. The data size is small, e.g. the import rate is about 10 MB/s;
2. The data size is very big, e.g. 100 TB or 1 PB per day.

In my opinion, the data source matters a lot.
When the data is already stored in a database such as MySQL, Oracle, or MongoDB, we should use a sync method,
such as logstash-input-jdbc or mongo-connector to sync the data.
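For illustration, a minimal logstash-input-jdbc pipeline might look like the sketch below; the driver path, connection string, credentials, and SQL statement are all placeholders, not a working setup:

input {
  jdbc {
    # All values below are hypothetical placeholders; adjust for your database
    jdbc_driver_library => "/path/to/mysql-connector-java.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/mydb"
    jdbc_user => "user"
    jdbc_password => "password"
    statement => "SELECT * FROM my_table"
    schedule => "* * * * *"    # poll once a minute
  }
}
output {
  elasticsearch { hosts => ["localhost:9200"] }
}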

When the data is not stored in a database, it is simply crawled from websites with a web crawler.
If the data format is JSON, I think we can use Logstash to import the data into Elasticsearch.
But when the data is not JSON, how do I import it into Elasticsearch? I don't know.

The above is just my opinion; it may not be correct.
If you have a detailed demo, please share the link.
I want to know the correct method; any help would be sincerely appreciated.
Thank you very much!


IMO it depends on the data you have.
If you have JSON files or XML files on your disk (structured data), you can use the FSCrawler project.
If you have unstructured files on disk (PDF, OOo...), you can use FSCrawler as well.

You can also write a shell script (e.g. using find) that cats the content of each file to Logstash, and then define the pipeline you want in Logstash.
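For example (a sketch; the directory, file pattern, and grok pattern are assumptions), you could run something like

[root@laoyang bin]# find /var/log/myapp -name '*.log' -exec cat {} + | ./logstash -f stdin-pipeline.conf

with a stdin-pipeline.conf (hypothetical name) along these lines:

input {
  stdin {}
}
filter {
  grok {
    # Assumed pattern; replace with one that matches your log format
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
}
output {
  elasticsearch { hosts => ["localhost:9200"] }
}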

What kind of files do you have?

Hi, @dadoonet:
Thanks for your reply.
My files are *.log files at the moment, and I followed the configuration in this document:
https://www.elastic.co/guide/en/logstash/current/advanced-pipeline.html

But when I run it, the following error appears:
[root@laoyang bin]# ./logstash -f ./logstash_conf/first-pipeline.conf
Settings: Default pipeline workers: 16
Connection refused {:class=>"Manticore::SocketException", :level=>:error}
Pipeline main started

So, what is the correct configuration for the input and output?

My current configuration is:
[root@laoyang bin]# ls ./logstash_conf/
first-pipeline.conf  logstash-tutorial-dataset.log  shakespare.json
[root@laoyang bin]# cat ./logstash_conf/first-pipeline.conf
input {
  file {
    path => "/opt/logstash/bin/logstash_conf/logstash-tutorial-dataset.log"
    start_position => "beginning"
  }
}
filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
  geoip {
    source => "clientip"
  }
}
output {
  elasticsearch {}
  stdout {}
}

I moved your question to #logstash

Connection refused {:class=>"Manticore::SocketException", :level=>:error}

Is Elasticsearch running on localhost?
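"Connection refused" usually means nothing is listening at the address the elasticsearch output connects to (localhost:9200 by default). A sketch of an output section with the host set explicitly; the host and port are assumptions and must match where your node actually listens:

output {
  elasticsearch {
    hosts => ["localhost:9200"]   # assumed host:port of your Elasticsearch node
  }
  stdout { codec => rubydebug }   # print parsed events for debugging
}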

Hi @magnusbaeck,
Thanks for your reply. My Elasticsearch is running on localhost, and my hostname is "laoyang".

[root@laoyang lib]# hostname
laoyang

In elasticsearch.yml:
cluster.name: my-application

OK, thanks.

My Elasticsearch is now running on localhost, and my hostname is "laoyang"

And you're still getting "connection refused" in Logstash?
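If so, it's worth checking which address and port Elasticsearch actually binds to in elasticsearch.yml; the values below are examples, not your actual settings:

network.host: 127.0.0.1   # address Elasticsearch binds to
http.port: 9200           # port that Logstash's elasticsearch output connects to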