River plugin clarification

Hi,

The doc on the Elasticsearch River plugin says:

“A river instance (and its name) is a type within the _river index. All
different rivers implementations accept a document called _meta that at the
very least has the type of the river (twitter / couchdb / …) associated
with it.”

Isn’t the word “_meta” an ‘id’ or an ‘_action’ according to the Elasticsearch
documentation?

http://host:port/[index]/[type]/[_action/id]

Can somebody give us a good example description, like:

curl -XPUT 'http://localhost:9200/_river/mongodb/_meta' -d '{
  "type": "mongodb",           // type
  "mongodb": {                 // mongodb instance: does it have to be the same as the URL type?
    "db": "testmongo",         // I think that is straightforward
    "collection": "person"     // I think that is straightforward
  },
  "index": {
    "name": "mongoindex",
    "type": "person"           // why do I have to repeat it (it is already defined as the collection)?
  }
}'

  • _river – an index
  • mongodb – a type
  • _meta – an id

Regards,

Janusz

--

Hi,

On Wed, Jan 2, 2013 at 11:27 AM, JD jdalecki@tycoint.com wrote:

Hi,

The doc on the Elasticsearch River plugin says:

“A river instance (and its name) is a type within the _river index. All
different rivers implementations accept a document called _meta that at the
very least has the type of the river (twitter / couchdb / …) associated
with it.”

Isn’t the word “_meta” an ‘id’ or an ‘_action’ according to the Elasticsearch
documentation?

Yes, "_meta" is the document ID, as far as I understand.
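To make that mapping concrete, here is a small shell sketch (my own illustration, not taken from the docs) that splits the path from your example into index, type and id. The variable names are made up for the example.

```shell
# Path from the question: /[index]/[type]/[id]
path='/_river/mongodb/_meta'

# Split on "/"; the first field is empty because the path starts with "/".
IFS='/' read -r leading index type id <<EOF
$path
EOF

printf 'index=%s type=%s id=%s\n' "$index" "$type" "$id"
# prints: index=_river type=mongodb id=_meta
```

So in this request "_river" is the index, "mongodb" is the type (the river name) and "_meta" is the document ID.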


http://host:port/[index]/[type]/[_action/id]

Can somebody give us a good example description, like:

curl -XPUT 'http://localhost:9200/_river/mongodb/_meta' -d '{
  "type": "mongodb",           // type
  "mongodb": {                 // mongodb instance: does it have to be the same as the URL type?

No, the URL type is the name of your river (which can be anything AFAIK),
while "mongodb" is a field that's required by the mongodb plugin.
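As a rough sketch of that distinction, the snippet below uses a river name that differs from "mongodb" while the body stays the same. The name "person_river" is hypothetical; the other values come from your example. It only prints the curl command rather than sending it, so nothing is created until you run the printed line yourself.

```shell
# Hypothetical river name; it only appears in the URL.
river_name='person_river'

# The body still says "mongodb" twice: once as the river type and once as
# the settings field that the mongodb plugin reads.
body='{
  "type": "mongodb",
  "mongodb": { "db": "testmongo", "collection": "person" },
  "index": { "name": "mongoindex", "type": "person" }
}'

url="http://localhost:9200/_river/${river_name}/_meta"

# Dry run: print the request instead of sending it.
printf "curl -XPUT '%s' -d '%s'\n" "$url" "$body"
```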

    "db": "testmongo",         // I think that is straightforward
    "collection": "person"     // I think that is straightforward
  },
  "index": {
    "name": "mongoindex",
    "type": "person"           // why do I have to repeat it (it is already defined as the collection)?

The type here is the ES type into which the data from your collection is
indexed. It doesn't have to have the same name; it can be anything.
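For illustration, here is a minimal sketch with a hypothetical helper, river_body, that takes the collection and the ES type as separate arguments. The type name "people" is made up to show that it can differ from the collection name; only the four fields from your example are included.

```shell
# Hypothetical helper: build a river _meta body.
# $1 = MongoDB db, $2 = MongoDB collection, $3 = ES index, $4 = ES type
river_body() {
  printf '{ "type": "mongodb", "mongodb": { "db": "%s", "collection": "%s" }, "index": { "name": "%s", "type": "%s" } }\n' \
    "$1" "$2" "$3" "$4"
}

# Collection "person" indexed under the ES type "people".
river_body testmongo person mongoindex people
```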

  }
}'

  • _river – an index
  • mongodb – a type
  • _meta – an id

If you want more info about the mongodb river itself, I think the best
place to look (if you didn't already :D) is here:

Best regards,
Radu

http://sematext.com/ -- Elasticsearch -- Solr -- Lucene

--

Hi,

What I find a little confusing in the mongodb river doc is the lack of an
example for a multi-collection setup.

The wiki doc says that you need to create a new river for a MongoDB collection
and gives this example:

$ curl -XPUT "localhost:9200/_river/mongodb/_meta" -d '
{
  "type": "mongodb",
  "mongodb": {
    "servers": [
      { "host": ${mongo.instance1.host}, "port": ${mongo.instance1.port} },
      { "host": ${mongo.instance2.host}, "port": ${mongo.instance2.port} }
    ],
    "options": { "secondary_read_preference": true },
    "credentials": [
      { "db": "local", "user": ${mongo.local.user}, "password": ${mongo.local.password} },
      { "db": ${mongo.db.name}, "user": ${mongo.db.user}, "password": ${mongo.db.password} }
    ],
    "db": ${mongo.db.name},
    "collection": ${mongo.collection.name},
    "gridfs": ${mongo.is.gridfs.collection},
    "filter": ${mongo.filter}
  },
  "index": {
    "name": ${es.index.name},
    "throttle_size": ${es.throttle.size},
    "type": ${es.type.name}
  }
}'

I tried it, and it does not work until I use a URL like this:
"localhost:9200/_river/{collection_name_river}/_meta"

… so in other words, I need to replace the word “mongodb” in the URL with a
unique river name for the collection, which I think is not clearly stated in
the documentation. Am I right?

Regards,

Janusz

--

Hi Janusz,

On Fri, Jan 4, 2013 at 10:05 AM, JD jdalecki@tycoint.com wrote:

Hi,

What I find a little confusing in the mongodb river doc is the lack of an
example for a multi-collection setup.

The wiki doc says that you need to create a new river for a MongoDB collection
and gives this example:

$ curl -XPUT "localhost:9200/_river/mongodb/_meta" -d '
{
  "type": "mongodb",
  "mongodb": {
    "servers": [
      { "host": ${mongo.instance1.host}, "port": ${mongo.instance1.port} },
      { "host": ${mongo.instance2.host}, "port": ${mongo.instance2.port} }
    ],
    "options": { "secondary_read_preference": true },
    "credentials": [
      { "db": "local", "user": ${mongo.local.user}, "password": ${mongo.local.password} },
      { "db": ${mongo.db.name}, "user": ${mongo.db.user}, "password": ${mongo.db.password} }
    ],
    "db": ${mongo.db.name},
    "collection": ${mongo.collection.name},
    "gridfs": ${mongo.is.gridfs.collection},
    "filter": ${mongo.filter}
  },
  "index": {
    "name": ${es.index.name},
    "throttle_size": ${es.throttle.size},
    "type": ${es.type.name}
  }
}'

I tried it, and it does not work until I use a URL like this:
"localhost:9200/_river/{collection_name_river}/_meta"

… so in other words, I need to replace the word “mongodb” in the URL with a
unique river name for the collection, which I think is not clearly stated in
the documentation. Am I right?

Yeah, I don't see anything like that in the documentation. It's the first
time I've seen that reported, although I don't use the mongodb river
plugin myself.

Either way, it seems to me that if you want to use the river with multiple
collections you'd have to set up one river for each collection.
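A rough sketch of what that could look like, assuming two collections and a "<collection>_river" naming scheme. Both the second collection, "address", and the naming scheme are made up for illustration, and I haven't run this against the mongodb river. The function only prints one curl command per collection so you can check them before running anything.

```shell
# Dry run: print one river definition per MongoDB collection.
print_river_commands() {
  db='testmongo'       # MongoDB database from the first post
  index='mongoindex'   # Elasticsearch index from the first post

  # "person" is from the thread; "address" is a hypothetical second collection.
  for collection in person address; do
    river="${collection}_river"   # assumed naming scheme, one unique name per river
    url="http://localhost:9200/_river/${river}/_meta"
    # The ES type is set to the collection name here, but it could be anything.
    body=$(printf '{ "type": "mongodb", "mongodb": { "db": "%s", "collection": "%s" }, "index": { "name": "%s", "type": "%s" } }' \
      "$db" "$collection" "$index" "$collection")
    printf "curl -XPUT '%s' -d '%s'\n" "$url" "$body"
  done
}

print_river_commands
```

Each printed command targets a different type under _river, so each collection gets its own _meta document.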

Best regards,
Radu

http://sematext.com/ -- Elasticsearch -- Solr -- Lucene

--