MongoDB River Plugin 1.1.0


(Richard Louapre) #1

Hi,

MongoDB River Plugin 1.1.0 has just been released [0].
MongoDB River will now work with the new Elasticsearch 0.19.0.

To install it use:
plugin.bat -install elasticsearch/elasticsearch-mapper-attachments/1.2.0
plugin.bat -install richardwilly98/elasticsearch-river-mongodb/1.1.0

[0] - https://github.com/richardwilly98/elasticsearch-river-mongodb

Ciao,
Richard.


(d95sld95-2) #2

I am not sure what I am doing wrong. I can't get the river to replicate
from Mongo to ElasticSearch.

Here is what I am doing.

In Mongo:

Database: cm
Collection: screens

  1. Installed Mongo 2.0.3 and ElasticSearch 0.19.0.
  2. Install ElasticSearch plugins

plugin.bat -install elasticsearch/elasticsearch-mapper-attachments/1.2.0
plugin.bat -install richardwilly98/elasticsearch-river-mongodb/1.1.0

  3. Changed mongo and elastic search logging to debug at root level
  4. Started ElasticSearch (./elasticsearch)
  5. Started mongo (./mongod -replSet funWithOplogs)
  6. In mongo console I execute 'rs.initiate()'
  7. Using curl I add the river

curl -PUT http://localhost:9200/_river/mongo/_meta -d '{
type: "mongodb",
mongodb : {db: "cm", collection: "screens"},
index: {name: "mongoindex", type: "screens"}
}'

  8. I insert data into the mongo database 'cm' and collection 'screens'
  9. Verify that the data is in mongo
  10. Query ElasticSearch

curl -XGET localhost:9200/cm/screens/_search?pretty

  11. I get nothing from elasticsearch other than 0 hits

{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 0,
"max_score" : null,
"hits" : [ ]
}

I am not sure if I am querying the wrong index in elasticsearch or if I am
configuring the river correctly. Also is Mongo running with oplog on all
the time or do I need to start it with -replSet?
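
Two things in the steps above look worth checking. The river definition sends documents to the index named in index.name (mongoindex), while the search goes to cm/screens, which is the MongoDB database and collection rather than the Elasticsearch index and type. On the oplog question: as far as I know, a mongod started without replication options keeps no oplog, so starting it with -replSet and running rs.initiate() is needed for the river to see changes. A minimal Python sketch of deriving the search URL from the river definition follows; search_url is a hypothetical helper, and the host and port are taken from the curl commands above.

```python
import json

# River definition as posted above, with keys quoted so it is strict JSON.
river_meta = {
    "type": "mongodb",
    "mongodb": {"db": "cm", "collection": "screens"},
    "index": {"name": "mongoindex", "type": "screens"},
}


def search_url(meta, host="http://localhost:9200"):
    """Build the _search URL for the index and type the river writes to."""
    index = meta["index"]
    return "%s/%s/%s/_search?pretty" % (host, index["name"], index["type"])


# Body that would be sent with -d when registering the river.
body = json.dumps(river_meta)
print(body)

# URL to query instead of .../cm/screens/_search
print(search_url(river_meta))
```

With the definition above, the helper points at the mongoindex index and the screens type, not at cm.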


(d95sld95-2) #3

A little additional information from elastic search:

From log file

[2012-03-08 16:00:34,007][DEBUG][cluster.action.shard ] [Songbird] sending shard started for [amazon][2], node[XO-XIlcrSrCagCvYR4QUOw], [R], s[INITIALIZING], reason [after recovery (replica) from node [[Ghost Girl][4lwZnx8QR_euxyP8UtWIkQ][inet[/192.168.10.84:9300]]]]
[2012-03-08 16:00:34,010][DEBUG][index.shard.service ] [Songbird] [_river][0] state: [RECOVERING]->[STARTED], reason [post recovery]
[2012-03-08 16:00:34,010][DEBUG][index.shard.service ] [Songbird] [_river][0] scheduling refresher every 1s
[2012-03-08 16:00:34,010][DEBUG][index.shard.service ] [Songbird] [_river][0] scheduling optimizer / merger every 1s
[2012-03-08 16:00:34,013][DEBUG][indices.recovery ] [Songbird] [_river][0] recovery completed from [Ghost Girl][4lwZnx8QR_euxyP8UtWIkQ][inet[/192.168.10.84:9300]], took[214ms]
   phase1: recovered_files [1] with total_size of [1.5kb], took [10ms], throttling_wait [0s]
         : reusing_files [16] with total_size of [1kb]
   phase2: start took [92ms]
         : recovered [2] transaction log operations, took [93ms]
   phase3: recovered [0] transaction log operations, took [16ms]
[2012-03-08 16:00:34,014][DEBUG][cluster.action.shard ] [Songbird] sending shard started for [_river][0], node[XO-XIlcrSrCagCvYR4QUOw], [R], s[INITIALIZING], reason [after recovery (replica) from node [[Ghost Girl][4lwZnx8QR_euxyP8UtWIkQ][inet[/192.168.10.84:9300]]]]
[2012-03-08 16:00:34,014][DEBUG][cluster.service ] [Songbird] processing [zen-disco-receive(from master [[Ghost Girl][4lwZnx8QR_euxyP8UtWIkQ][inet[/192.168.10.84:9300]]])]: done applying updated cluster_state
[2012-03-08 16:00:34,014][DEBUG][cluster.service ] [Songbird] processing [zen-disco-receive(from master [[Ghost Girl][4lwZnx8QR_euxyP8UtWIkQ][inet[/192.168.10.84:9300]]])]: execute
[2012-03-08 16:00:34,014][DEBUG][cluster.service ] [Songbird] cluster state updated, version [276], source [zen-disco-receive(from master [[Ghost Girl][4lwZnx8QR_euxyP8UtWIkQ][inet[/192.168.10.84:9300]]])]
[2012-03-08 16:00:34,015][DEBUG][cluster.action.shard ] [Songbird] sending shard started for [_river][0], node[XO-XIlcrSrCagCvYR4QUOw], [R], s[INITIALIZING], reason [master [Ghost Girl][4lwZnx8QR_euxyP8UtWIkQ][inet[/192.168.10.84:9300]] marked shard as initializing, but shard already started, mark shard as started]

Inside look at river configuration

curl -GET localhost:9200/_river/mongo/_search?pretty

{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : 1.0,
"hits" : [ {
"_index" : "_river",
"_type" : "mongo",
"_id" : "_meta",
"_score" : 1.0, "_source" : {type: "mongodb", mongodb : {db: "cm", collection: "screens"}, index: {name: "mongoindex", type: "screens"}}
}, {
"_index" : "_river",
"_type" : "mongo",
"_id" : "_status",
"_score" : 1.0, "_source" : {"ok":true,"node":{"id":"4lwZnx8QR_euxyP8UtWIkQ","name":"Ghost Girl","transport_address":"inet[/192.168.10.84:9300]"}}
} ]
}
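
For reading the response above: the _river index holds two documents for the river named mongo, _meta (the definition that was sent with curl) and _status (which appears to be written by the node hosting the river). A minimal Python sketch of pulling the status out of such a response follows. The sample dict is transcribed from the output above, keeping only the fields used here, and river_status is a hypothetical helper, not part of Elasticsearch.

```python
# Sample transcribed from the search response above (only the fields used here).
response = {
    "hits": {
        "total": 2,
        "hits": [
            {"_index": "_river", "_type": "mongo", "_id": "_meta",
             "_source": {"type": "mongodb",
                         "mongodb": {"db": "cm", "collection": "screens"},
                         "index": {"name": "mongoindex", "type": "screens"}}},
            {"_index": "_river", "_type": "mongo", "_id": "_status",
             "_source": {"ok": True,
                         "node": {"id": "4lwZnx8QR_euxyP8UtWIkQ",
                                  "name": "Ghost Girl",
                                  "transport_address": "inet[/192.168.10.84:9300]"}}},
        ],
    }
}


def river_status(resp):
    """Return the _source of the _status hit, or None if there is none."""
    for hit in resp["hits"]["hits"]:
        if hit["_id"] == "_status":
            return hit["_source"]
    return None


status = river_status(response)
print(status["ok"], status["node"]["name"])
```

Here the status says ok and names the node Ghost Girl, which suggests the river was registered; it does not by itself show that any documents were indexed.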


(Serikozz) #4

I am running exactly as you did
curl -XPUT 'http://localhost:9200/_river/test/_meta' -d '{
"type": "mongodb",
"mongodb": {
"db": "test",
"collection": "es_test"
},
"index": {
"name": "mongoindex",
"type": "es_test"
}
}'
However I am getting the following exception again and again:

{"error":"MapperParsingException[Failed to parse]; nested: JsonParseException[Unexpected character ('m' (code 109)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')\n at [Source: [B@61f133ea; line: 1, column: 8]]; ","status":400}

Can you please point out what I am doing wrong? I am completely new to
ElasticSearch and will appreciate your assistance.


(David Pilato) #5

What do your MongoDB docs look like?

On 22 May 2012 at 08:27, Serikozz serikozz@mail.ru wrote:



(Serikozz) #6

They are tweets that I've collected:

{ "_id" : ObjectId("4fbb380cfed8f515a0000005"),
  "created_at" : "Tue May 22 06:54:05 +0000 2012",
  "in_reply_to_user_id" : null, "place" : null,
  "in_reply_to_screen_name" : null, "favorited" : false,
  "source" : "<a href=\"http://www.echofon.com/\" rel=\"nofollow\">Echofon</a>",
  "truncated" : false, "in_reply_to_status_id" : null,
  "entities" : { "urls" : [ ], "hashtags" : [ ], "user_mentions" : [ ] },
  "id_str" : "204827455935086592", "retweet_count" : 0,
  "contributors" : null, "geo" : null, "in_reply_to_user_id_str" : null,
  "user" : {
    "profile_background_image_url" : "http://a0.twimg.com/images/themes/theme1/bg.png",
    "profile_text_color" : "333333", "show_all_inline_media" : false,
    "notifications" : null, "contributors_enabled" : false,
    "profile_background_tile" : false,
    "created_at" : "Mon Aug 08 19:19:02 +0000 2011",
    "time_zone" : "Central Time (US & Canada)", "listed_count" : 4,
    "profile_image_url" : "http://a0.twimg.com/profile_images/2239532534/madamGABalot_normal.jpg",
    "follow_request_sent" : null, "is_translator" : false, "url" : null,
    "friends_count" : 530, "utc_offset" : -21600, "verified" : false,
    "screen_name" : "madamGABalot", "profile_background_color" : "C0DEED",
    "lang" : "en", "id_str" : "351080465", "statuses_count" : 26720,
    "default_profile_image" : false,
    "description" : "Life is the only thing one should take advantage of FOLLOW my instagram GABsnapsALOT\n",
    "favourites_count" : 58, "profile_use_background_image" : true,
    "default_profile" : true, "profile_sidebar_border_color" : "C0DEED",
    "following" : null, "profile_sidebar_fill_color" : "DDEEF6",
    "geo_enabled" : false, "id" : 351080465, "profile_link_color" : "0084B4",
    "followers_count" : 770,
    "profile_background_image_url_https" : "https://si0.twimg.com/images/themes/theme1/bg.png",
    "location" : "On Top of Myself", "name" : "Ssshhh", "protected" : false,
    "profile_image_url_https" : "https://si0.twimg.com/profile_images/2239532534/madamGABalot_normal.jpg" },
  "retweeted" : false, "id" : NumberLong("204827455935086592"),
  "in_reply_to_status_id_str" : null, "coordinates" : null,
  "text" : "In a happy place right now but the best part is its only gonna get better" }

But I only need to retrieve the tweets' text from MongoDB:

{ "_id" : ObjectId("4fbb380cfed8f515a0000004"), "text" : "Lil Wayne Singlee Oh Yopp #SiyahTweetin❤❤❤" }
{ "_id" : ObjectId("4fbb380cfed8f515a0000005"), "text" : "In a happy place right now but the best part is its only gonna get better" }


(David Pilato) #7

I think the problem is here: "_id" : ObjectId("4fbb380cfed8f515a0000005")

IMHO, ObjectId("4fbb380cfed8f515a0000005") cannot be an ID for an ES document.

I don't know how the MongoDB river works (does it extract the id from the ObjectId?), but the error log seems to indicate that the error comes from this.

HTH
David

On 23 May 2012 at 07:58, Serikozz serikozz@mail.ru wrote:

"_id" : ObjectId("4fbb380cfed8f515a0000005")


(Serikozz) #8

I'm running it from the command prompt.
I've tried it on a single line, and even tried to execute an example from https://github.com/richardwilly98/elasticsearch-river-mongodb with double quotes, because Windows doesn't seem to like single quotes:

curl -XPUT "http://localhost:9200/_river/mongodb/_meta" -d "{"type":"mongodb", "mongodb":{"db":"testmongo", "collection":"person"}, "index":{"name":"mongoindex", "type":"person"} }"

however still getting this error:

{"error":"MapperParsingException[Failed to parse]; nested: JsonParseException[Unexpected character ('m' (code 109)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')\n at [Source: [B@5fa13de; line: 1, column: 8]]; ","status":400}

What does this error actually mean?
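
A likely reading of the error (an assumption, not something confirmed in this thread): the Windows command prompt consumes the unescaped inner double quotes, so the body that reaches Elasticsearch starts with {type:mongodb and the parser stops at the bare m, which would match the 'm' near the start of line 1 in the message. The Python sketch below compares that stripped body with a strictly quoted one and prints a command line with the inner quotes backslash-escaped. is_strict_json and quote_for_cmd are hypothetical helpers, and the stripped string is a reconstruction of what curl probably received.

```python
import json

# Assumption: what is left of the -d argument once the command prompt has
# consumed the unescaped inner double quotes.
received = ("{type:mongodb, mongodb:{db:testmongo, collection:person}, "
            "index:{name:mongoindex, type:person} }")

# The same river definition as a Python dict, taken from the command above.
river_meta = {
    "type": "mongodb",
    "mongodb": {"db": "testmongo", "collection": "person"},
    "index": {"name": "mongoindex", "type": "person"},
}


def is_strict_json(text):
    """Return True if text parses as strict JSON, False otherwise."""
    try:
        json.loads(text)
    except ValueError:
        return False
    return True


def quote_for_cmd(json_text):
    """Wrap text in double quotes and backslash-escape the inner ones."""
    return '"' + json_text.replace('"', '\\"') + '"'


body = json.dumps(river_meta)
print(is_strict_json(received))
print(is_strict_json(body))
print('curl -XPUT "http://localhost:9200/_river/mongodb/_meta" -d '
      + quote_for_cmd(body))
```

Another route that avoids shell quoting altogether is to save the JSON in a file and pass it with curl's -d @filename form.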


(medcl.net) #9

Hi,

A pinyin analysis plugin that integrates Pinyin4j (http://pinyin4j.sourceforge.net/) has just been released; it can be used to convert Chinese characters to pinyin.

To install it use:
plugin -install medcl/elasticsearch-analysis-pinyin/1.1.0

And here is the project url:

Have fun~

Medcl

