MongoDB River Plugin 1.1.0


MongoDB River Plugin 1.1.0 has just been released [0].
MongoDB River will now work with the new Elasticsearch 0.19.0.

To install it use:
plugin.bat -install elasticsearch/elasticsearch-mapper-attachments/1.2.0
plugin.bat -install richardwilly98/elasticsearch-river-mongodb/1.1.0

[0] -


I am not sure what I am doing wrong. I can't get the river to replicate
from Mongo to ElasticSearch.

Here is what I am doing.

In Mongo:

Database: cm
Collection: screens

  1. Installed Mongo 2.0.3 and ElasticSearch 0.19.0.
  2. Installed the ElasticSearch plugins:

plugin.bat -install elasticsearch/elasticsearch-mapper-attachments/1.2.0

plugin.bat -install richardwilly98/elasticsearch-river-mongodb/1.1.0

  3. Changed the mongo and elasticsearch logging to debug at root level
  4. Started ElasticSearch (./elasticsearch)
  5. Started mongo (./mongod -replSet funWithOplogs)
  6. In the mongo console, executed 'rs.initiate()'
  7. Using curl, added the river:

curl -XPUT http://localhost:9200/_river/mongo/_meta -d '{
  type: "mongodb",
  mongodb: {db: "cm", collection: "screens"},
  index: {name: "mongoindex", type: "screens"}
}'

  8. Inserted data into the mongo database 'cm' and collection 'screens'
  9. Verified that the data is in mongo
  10. Queried ElasticSearch:

curl -XGET localhost:9200/cm/screens/_search?pretty

Result: I get nothing from elasticsearch other than 0 hits:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : [ ]
  }
}

I am not sure if I am querying the wrong index in elasticsearch or if I am
configuring the river correctly. Also is Mongo running with oplog on all
the time or do I need to start it with -replSet?
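
One thing that may be worth checking here: the river definition above sets index.name to "mongoindex", while the search request goes to an index called "cm". The Python sketch below derives the search URL from the river settings. The helper name search_url and the default host are assumptions made for illustration, and the sketch presumes that the river writes into the index and type named in its "index" section, which this thread does not confirm.

```python
def search_url(river_meta: dict, host: str = "http://localhost:9200") -> str:
    """Build the _search URL for the index/type a river definition points at.

    Hypothetical helper: it only reads the "index" section of the river's
    _meta document and does not talk to Elasticsearch.
    """
    index = river_meta["index"]
    return f"{host}/{index['name']}/{index['type']}/_search?pretty"


# The river definition from the message above, written as a Python dict.
river_meta = {
    "type": "mongodb",
    "mongodb": {"db": "cm", "collection": "screens"},
    "index": {"name": "mongoindex", "type": "screens"},
}

print(search_url(river_meta))
```

If that assumption holds, the documents would be searched under mongoindex/screens rather than cm/screens.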

A little additional information from elastic search:

From log file

[2012-03-08 16:00:34,007][DEBUG][cluster.action.shard ] [Songbird]
sending shard started for [amazon][2], node[XO-XIlcrSrCagCvYR4QUOw], [R],
s[INITIALIZING], reason [after recovery (replica) from node [[Ghost
[2012-03-08 16:00:34,010][DEBUG][index.shard.service ] [Songbird]
[_river][0] state: [RECOVERING]->[STARTED], reason [post recovery]
[2012-03-08 16:00:34,010][DEBUG][index.shard.service ] [Songbird]
[_river][0] scheduling refresher every 1s
[2012-03-08 16:00:34,010][DEBUG][index.shard.service ] [Songbird]
[_river][0] scheduling optimizer / merger every 1s
[2012-03-08 16:00:34,013][DEBUG][indices.recovery ] [Songbird]
[_river][0] recovery completed from [Ghost
Girl][4lwZnx8QR_euxyP8UtWIkQ][inet[/]], took[214ms]
phase1: recovered_files [1] with total_size of [1.5kb], took [10ms],
throttling_wait [0s]
: reusing_files [16] with total_size of [1kb]
phase2: start took [92ms]
: recovered [2] transaction log operations, took [93ms]
phase3: recovered [0] transaction log operations, took [16ms]
[2012-03-08 16:00:34,014][DEBUG][cluster.action.shard ] [Songbird]
sending shard started for [_river][0], node[XO-XIlcrSrCagCvYR4QUOw], [R],
s[INITIALIZING], reason [after recovery (replica) from node [[Ghost
[2012-03-08 16:00:34,014][DEBUG][cluster.service ] [Songbird]
processing [zen-disco-receive(from master [[Ghost
Girl][4lwZnx8QR_euxyP8UtWIkQ][inet[/]]])]: done applying
updated cluster_state
[2012-03-08 16:00:34,014][DEBUG][cluster.service ] [Songbird]
processing [zen-disco-receive(from master [[Ghost
Girl][4lwZnx8QR_euxyP8UtWIkQ][inet[/]]])]: execute
[2012-03-08 16:00:34,014][DEBUG][cluster.service ] [Songbird]
cluster state updated, version [276], source [zen-disco-receive(from master
[[Ghost Girl][4lwZnx8QR_euxyP8UtWIkQ][inet[/]]])]
[2012-03-08 16:00:34,015][DEBUG][cluster.action.shard ] [Songbird]
sending shard started for [_river][0], node[XO-XIlcrSrCagCvYR4QUOw], [R],
s[INITIALIZING], reason [master [Ghost
Girl][4lwZnx8QR_euxyP8UtWIkQ][inet[/]] marked shard as
initializing, but shard already started, mark shard as started]

A look inside the river configuration:

curl -XGET localhost:9200/_river/mongo/_search?pretty

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "_river",
      "_type" : "mongo",
      "_id" : "_meta",
      "_score" : 1.0, "_source" : {type: "mongodb", mongodb : {db: "cm", collection: "screens"}, index: {name: "mongoindex", type: "screens"}}
    }, {
      "_index" : "_river",
      "_type" : "mongo",
      "_id" : "_status",
      "_score" : 1.0, "_source" :
    } ]
  }
}

I am running exactly as you did:

curl -XPUT 'http://localhost:9200/_river/test/_meta' -d '{
  "type": "mongodb",
  "mongodb": {
    "db": "test",
    "collection": "es_test"
  },
  "index": {
    "name": "mongoindex",
    "type": "es_test"
  }
}'

However, I am getting the following exception again and again:

{"error":"MapperParsingException[Failed to parse]; nested:
JsonParseException[Unexpected character ('m' (code 109)): expected a valid
value (number, String, array, object, 'true', 'false' or 'null')\n at
[Source: [B@61f133ea; line: 1, column: 8]]; ","status":400}

Can you please point out what I am doing wrong? I am completely new to
ElasticSearch and would appreciate your assistance.

What do your MongoDB docs look like?

On 22 May 2012 at 08:27, Serikozz wrote:

I am running exactly as you did:

curl -XPUT 'http://localhost:9200/_river/test/_meta' -d '{
  "type": "mongodb",
  "mongodb": {
    "db": "test",
    "collection": "es_test"
  },
  "index": {
    "name": "mongoindex",
    "type": "es_test"
  }
}'

However, I am getting the following exception again and again:

{"error":"MapperParsingException[Failed to parse]; nested:
JsonParseException[Unexpected character ('m' (code 109)): expected a valid
value (number, String, array, object, 'true', 'false' or 'null')\n at
[Source: [B@61f133ea; line: 1, column: 8]]; ","status":400}

Can you please point out what I am doing wrong? I am completely new to
Elasticsearch and would appreciate your assistance.


They are tweets that I've collected:

{ "_id" : ObjectId("4fbb380cfed8f515a0000005"),
  "created_at" : "Tue May 22 06:54:05 +0000 2012",
  "in_reply_to_user_id" : null, "place" : null,
  "in_reply_to_screen_name" : null, "favorited" : false,
  "source" : "<a href="http://www.echofon.com/" rel="nofollow">Echofon",
  "truncated" : false, "in_reply_to_status_id" : null,
  "entities" : { "urls" : [ ], "hashtags" : [ ], "user_mentions" : [ ] },
  "id_str" : "204827455935086592", "retweet_count" : 0,
  "contributors" : null, "geo" : null, "in_reply_to_user_id_str" : null,
  "user" : {
    "profile_background_image_url" : "",
    "profile_text_color" : "333333", "show_all_inline_media" : false,
    "notifications" : null, "contributors_enabled" : false,
    "profile_background_tile" : false,
    "created_at" : "Mon Aug 08 19:19:02 +0000 2011",
    "time_zone" : "Central Time (US & Canada)", "listed_count" : 4,
    "profile_image_url" : " /madamGABalot_normal.jpg",
    "follow_request_sent" : null, "is_translator" : false, "url" : null,
    "friends_count" : 530, "utc_offset" : -21600, "verified" : false,
    "screen_name" : "madamGABalot", "profile_background_color" : "C0DEED",
    "lang" : "en", "id_str" : "351080465", "statuses_count" : 26720,
    "default_profile_image" : false,
    "description" : "Life is the only thing one should take advantage of FOLLOW my instagram GABsnapsALOT\n",
    "favourites_count" : 58, "profile_use_background_image" : true,
    "default_profile" : true, "profile_sidebar_border_color" : "C0DEED",
    "following" : null, "profile_sidebar_fill_color" : "DDEEF6",
    "geo_enabled" : false, "id" : 351080465,
    "profile_link_color" : "0084B4", "followers_count" : 770,
    "profile_background_image_url_https" : " hemes/theme1/bg.png",
    "location" : "On Top of Myself", "name" : "Ssshhh",
    "protected" : false,
    "profile_image_url_https" : " /2239532534/madamGABalot_normal.jpg" },
  "retweeted" : false, "id" : NumberLong("204827455935086592"),
  "in_reply_to_status_id_str" : null, "coordinates" : null,
  "text" : "In a happy place right now but the best part is its only gonna get better" }

But I only need to retrieve the tweets' text from MongoDB:

{ "_id" : ObjectId("4fbb380cfed8f515a0000004"), "text" : "Lil Wayne Singlee Oh Yopp #SiyahTweetin❤❤❤" }
{ "_id" : ObjectId("4fbb380cfed8f515a0000005"), "text" : "In a happy place right now but the best part is its only gonna get better" }

I think the problem is here: "_id" : ObjectId("4fbb380cfed8f515a0000005")

IMHO, ObjectId("4fbb380cfed8f515a0000005") cannot be used as an ID for an ES document.

I don't know how the mongodb river works (does it extract the id from the ObjectId?), but the error log seems to indicate that the error comes from this.
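
To make the question above concrete: an ObjectId is usually shown as a 24-character hexadecimal string, and that string on its own would be a plain value that can serve as a document ID. The Python sketch below pulls the hex string out of the mongo shell notation. Whether the river performs any such conversion is not established in this thread; the function name object_id_to_str and the regular expression are assumptions for illustration only.

```python
import re

# Matches the mongo shell notation ObjectId("<24 hex characters>").
OBJECT_ID = re.compile(r'ObjectId\("([0-9a-fA-F]{24})"\)')


def object_id_to_str(shell_repr: str) -> str:
    """Return the 24-character hex string inside an ObjectId("...") literal.

    Hypothetical helper: it works on the textual shell representation only
    and raises ValueError for anything that does not look like an ObjectId.
    """
    match = OBJECT_ID.fullmatch(shell_repr.strip())
    if match is None:
        raise ValueError(f"not an ObjectId literal: {shell_repr!r}")
    return match.group(1)


print(object_id_to_str('ObjectId("4fbb380cfed8f515a0000005")'))
```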


On 23 May 2012 at 07:58, Serikozz wrote:

"_id" : ObjectId("4fbb380cfed8f515a0000005")

I'm running it from the Windows command prompt.
I've tried putting the command on a single line, and even tried to execute an example from with double quotes, because Windows doesn't seem to like single quotes:

curl -XPUT "http://localhost:9200/_river/mongodb/_meta" -d "{"type":"mongodb", "mongodb":{"db":"testmongo", "collection":"person"}, "index":{"name":"mongoindex", "type":"person"} }"

However, I am still getting this error:

{"error":"MapperParsingException[Failed to parse]; nested: JsonParseException[Unexpected character ('m' (code 109)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')\n at [Source: [B@5fa13de; line: 1, column: 8]]; ","status":400}

What does this error actually mean?
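
A possible reading of the error, offered as an assumption rather than a confirmed diagnosis: the parser met a bare m where a JSON value should start, which is what would happen if the double quotes inside the -d argument were consumed by the Windows command line, so that the server received {type:mongodb, ... instead of {"type":"mongodb", .... The Python sketch below imitates that effect in a simplified way and prints a command line with the inner quotes escaped as \". The helper names strip_bare_quotes and quote_for_cmd are hypothetical, and real Windows argument parsing has more rules than this.

```python
import json

# The river definition from the command above.
river = {
    "type": "mongodb",
    "mongodb": {"db": "testmongo", "collection": "person"},
    "index": {"name": "mongoindex", "type": "person"},
}


def strip_bare_quotes(arg: str) -> str:
    """Simplified model (an assumption): unescaped double quotes vanish."""
    return arg.replace('"', "")


def quote_for_cmd(body: str) -> str:
    """Wrap body in double quotes and escape the inner ones as \\"."""
    return '"' + body.replace('"', '\\"') + '"'


body = json.dumps(river, separators=(",", ":"))

# What the server may have received: the value mongodb is no longer a string.
print(strip_bare_quotes(body))

# A command line that keeps the inner quotes intact.
print("curl -XPUT http://localhost:9200/_river/mongodb/_meta -d " + quote_for_cmd(body))
```

Another way to avoid shell quoting altogether is to save the JSON in a file and pass it with curl's -d @filename form.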


A Pinyin analysis plugin that integrates Pinyin4j has just been released. It can be used to convert Chinese characters to Pinyin.

To install it use:
plugin -install medcl/elasticsearch-analysis-pinyin/1.1.0

And here is the project url:

Have fun~
