Hi all,
I want to know how to import data into Elasticsearch. I have the following scenarios:
1. The data volume is small, e.g. an import rate of about 10 MB/s;
2. The data volume is very large, e.g. 100 TB or 1 PB of data per day.
In my opinion, the data source matters a lot.
When the data is already stored in a database such as MySQL, Oracle, or MongoDB, we should use a sync method
such as logstash-input-jdbc or mongo-connector to sync the data, as in the sketch below.
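For example, a minimal logstash-input-jdbc pipeline might look like this; the driver path, connection string, query, and index name are just placeholders for my setup:

```
input {
  jdbc {
    # Path to the MySQL JDBC driver jar (placeholder; adjust to your install)
    jdbc_driver_library => "/path/to/mysql-connector-java.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/mydb"
    jdbc_user => "user"
    jdbc_password => "password"
    # Poll the table once a minute and ship the rows
    schedule => "* * * * *"
    statement => "SELECT * FROM my_table"
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "mydb"
  }
}
```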
When the data is not stored in a database, it is just crawled from websites through a web crawler.
If the data format is JSON, I think we can use Logstash to import it into Elasticsearch.
But when the data is not JSON, how do I import it into Elasticsearch? I don't know.
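Here is a rough sketch of what I imagine for both cases: a json codec when each line is already a JSON document, and a csv filter when it is plain text. The paths, columns, and index name are just placeholders:

```
input {
  file {
    # JSON case: every line is already a JSON document
    path => "/data/json/*.log"
    codec => "json"
  }
  file {
    # Non-JSON case: plain CSV lines, parsed by the filter below
    path => "/data/csv/*.log"
    type => "csv"
  }
}
filter {
  if [type] == "csv" {
    csv {
      # Turn comma-separated lines into named fields
      columns => ["timestamp", "user", "action"]
    }
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "events"
  }
}
```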
The above is my opinion; maybe it's not correct.
If you have a detailed demo, please share the link.
I want to know the correct method; any help would be sincerely appreciated.
Thank you very much!
IMO it depends on the data you have.
If you have JSON or XML files on your disk (structured data), you can use the FSCrawler project.
If you have unstructured files on disk (PDF, OOo, ...), you can use FSCrawler as well.
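For example (a rough sketch; the job name and directory are placeholders, and the exact settings format depends on the FSCrawler version, so check the project README):

```
# First run creates a job and its settings file under ~/.fscrawler/
bin/fscrawler mydocs
```

The job settings file (`_settings.json` or `_settings.yaml` depending on the version) then points FSCrawler at your directory and your Elasticsearch node, e.g.:

```
{
  "name": "mydocs",
  "fs": {
    "url": "/path/to/documents"
  },
  "elasticsearch": {
    "nodes": [ { "host": "127.0.0.1", "port": 9200 } ]
  }
}
```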
You can also write a shell script (find) which cats the content of any file to Logstash, then define in Logstash the pipeline you want to use, e.g.:
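Something along these lines (paths and file pattern are placeholders):

```
# Feed the content of every matching file to a Logstash pipeline reading stdin
find /data -type f -name '*.txt' -exec cat {} \; | bin/logstash -f stdin-pipeline.conf
```

with a pipeline whose input is simply `stdin { }` and whose output is the elasticsearch plugin.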
But when I try to run it, the following error appears:
```
[root@laoyang bin]# ./logstash -f ./logstash_conf/first-pipeline.conf
Settings: Default pipeline workers: 16
Connection refused {:class=>"Manticore::SocketException", :level=>:error}
Pipeline main started
```
So, I want to know the correct config for the input and output.
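That `Connection refused (Manticore::SocketException)` error usually means the elasticsearch output cannot reach Elasticsearch at the configured host and port, most often because Elasticsearch is not running or is bound to a different address; it is not a syntax problem in the pipeline. A minimal config might look like this (the input path and index name are placeholders):

```
input {
  file {
    path => "/path/to/your/data.log"
    start_position => "beginning"
  }
}
output {
  # Must point at a running Elasticsearch node;
  # verify first with: curl http://localhost:9200
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "first-pipeline"
  }
  # Also print events to the console while debugging
  stdout { codec => rubydebug }
}
```

If `curl http://localhost:9200` gets no answer, fix Elasticsearch first; no Logstash config will work until that responds.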