Load Elasticsearch from MongoDB backup


(Martin) #1

I have been provided a MongoDB export .json file by a client, and they want to load this into Elasticsearch (ES) to perform testing. They have a process currently in place to extract data from MySQL and load it into Mongo, so I expected there to be an easy way to take a Mongo extract and load it into ES.

I tried installing the river plugin with the mapper attachment, but I have now learned that rivers are deprecated. I have ES 2.2.0 installed and running (with Kibana), and I have loaded data from MySQL into the node. I also have MongoDB 2.6 installed and functional, and both backups have been imported. This is all on a CentOS 7 instance.

I have created the mappings and analyzers I need for ES, and I just need to load the MongoDB data into it. It is two indexes that are very large. Surely there is some easy, straightforward way to load the two MongoDB .json files into ES, or do I need to write custom code to do this?

I would like Logstash to be the solution here, but I cannot get access to the production server where Mongo resides to see and understand the setup I would need for Filebeat.

If someone would just point me in the right direction it would be much appreciated.

Thanks


(David Pilato) #2

If you have a JSON file as input, I'd probably give Logstash a try. If the file has one single JSON document per line, that'd be super easy.

If you have multiple JSON files (one per document), you can also have a look at fscrawler.

Filebeat can also help read the source file and send it to Logstash, but here the following command might be equivalent:

cat mybackup.json | bin/logstash -f myconfig.conf

Writing your own script might be also a good option.
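For the one-document-per-line case, a minimal Logstash pipeline could look like this sketch (the file path, index name, and host are placeholders to adapt):

```
input {
  file {
    path => "/path/to/mybackup.json"
    start_position => "beginning"
    # parse each line as a JSON document rather than a raw message string
    codec => "json"
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "myindex"
  }
}
```

With the json codec on the input, each line becomes a structured event, so the fields arrive in Elasticsearch as JSON fields instead of one opaque message.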


(Martin) #3

Thank you, sir.

It is one document per line, and I do already have Logstash installed. I thought the backups were just too big to begin with, so I culled it down to about 10 documents. However, my mappings don't match the Mongo format (and I don't want them to), so a curl will not work. I hope Logstash will. I was about to start on a Logstash config file, and I just wanted to be sure there wasn't some other way I missed in my Google searching. With all of the information available, the proper Google search has serious meaning. :smile:
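To illustrate the format mismatch: mongoexport emits MongoDB Extended JSON, one document per line, so ObjectIds and dates arrive wrapped in sub-fields. A line looks roughly like this (field names invented for illustration):

```
{"_id":{"$oid":"56a1b2c3d4e5f60718293a4b"},"name":"example","created":{"$date":"2016-02-01T12:00:00.000Z"}}
```

Those `$oid`/`$date` wrappers are what a mapping built for flat fields will not match.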

Best Regards


(Martin) #4

Still having trouble with this. I had Logstash running as a service, so I stopped it. Should I have done this?

I created the following Logstash config file and placed it in the /opt/logstash directory. Should it be in the /etc/logstash/conf.d directory? (The filter section is empty, so it is not included below.)

input {
  file {
    path => "/data/mongoexport/log.json"
    start_position => "beginning"
  }
}
output {
  elasticsearch {
    codec => json
    hosts => ["localhost:9200"]
    index => "myesindex"
    document_type => "myestype"
  }
  stdout { }
}

I ran --configtest: OK.
I tried your exact command (with my json file and config). It starts Logstash and gives the following message:

Settings: Default filter workers: 2
Logstash startup completed

but it just sits there and no data is being loaded into Elasticsearch. I at least expected to see something on stdout. I eventually Ctrl-C out of it to try something else. I know from the simple test on the site that it expects some kind of input as it sits there (hello world); so is there something else I need to do to get it started, or do you see something wrong with my config?

I can't find anything in any /var/log files, so I am a dumbfounded newbie.

Thanks


(Mark Walkom) #5

It's because of the sincedb: the file input records how far it has already read the file, so it won't re-read it from the beginning.

You are better off using stdin and catting the file directly into a Logstash call.
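A minimal sketch of that approach, reusing the index/type names from the config above (the config filename is a placeholder):

```
input {
  stdin {
    # parse each incoming line as a JSON document
    codec => "json"
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "myesindex"
    document_type => "myestype"
  }
  stdout { }
}
```

Then: cat /data/mongoexport/log.json | bin/logstash -f stdin.conf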


(Martin) #6

Thanks Warkolm!

That definitely landed it on stdout, but nothing made it into Elasticsearch.

It eventually went into:

retrying failed action with response code: 503 {:level=>:warn}

_cat/health gives:

yellow 1 1 6 6 0 0 11 0 - 35.3%

I will work on my Elasticsearch "output" config and try to find a log of this somewhere.
It looks like I also have some junk data. Imagine that!

Regards


(Martin) #7

removing

index => "myesindex"
document_type => "myestype"

did load it into a logstash index. Though it is not what I want, it is progress.

I have to assume that my existing mapping for the index/type I want it in is rejecting the documents. I guess I need to do some filter conversion.

I take it the two output config parameters themselves are fine to use?
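If the rejection comes from the Extended JSON wrappers in the mongoexport (the `$oid` and `$date` sub-fields), a filter along these lines might flatten them before indexing. The field names here are guesses for illustration, not the actual ones:

```
filter {
  mutate {
    # lift the wrapped values up to plain top-level fields (assumed names)
    rename => {
      "[_id][$oid]"      => "mongo_id"
      "[created][$date]" => "created_at"
    }
    # drop the now-empty wrapper objects; a top-level "_id" field in the
    # source would be rejected by Elasticsearch as a reserved metadata field
    remove_field => [ "_id", "created" ]
  }
}
```

If you want to keep the Mongo ObjectId as the document id, `document_id => "%{mongo_id}"` on the elasticsearch output should do it.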


(system) #8