Queries regarding the rivers implementation


(Mahendra M) #1

Hi,

I was going through the elasticsearch code and docs (couchdb river)
and had some questions. (I am still in the process of setting up ES
0.11.)

  1. The couchdb river plugin provides extra settings for "index" [1].
    Does this allow creating an index other than "_river"?
    From the rivers documentation [2], it looks like the index name will
    be _river and the user can control only the type. Or is it actually
    possible to override _river with a different index name?

  2. CouchDB filter args: CouchDB supports arguments to filters [3],
    which can be useful for filtering out specific data. From the code, it
    looks like we can pass them as part of the 'filter' configuration:

    "filter" : "abc&option1=value1"

In the code for the CouchDB river, it is just:

    if (couchFilter != null) {
        file = file + "&filter=" + couchFilter;
    }

So the HTTP URL gets formed as '....&filter=abc&option1=value1',
but it looks like a hack.
Would it be better to provide a separate configuration for filter
parameters, or is the above hack a safe way to do things?
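A separate setting for filter parameters could look roughly like the sketch below. It is a hypothetical illustration only: `buildChangesPath` and the `filterParams` map are not part of the river, and the base `_changes` query string is just an example. Each parameter is appended as its own URL-encoded `key=value` pair, so an `&` or `=` inside a value cannot break the query string.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class ChangesUrl {

    // Hypothetical helper: builds the path of a CouchDB _changes request.
    // filterParams models a separate "filter params" setting (an assumption,
    // not an existing river option) instead of packing them into "filter".
    static String buildChangesPath(String db, String filter, Map<String, String> filterParams) {
        StringBuilder path = new StringBuilder("/")
                .append(db)
                .append("/_changes?feed=continuous&include_docs=true");
        if (filter != null) {
            path.append("&filter=").append(encode(filter));
            // Filter arguments only make sense together with a filter.
            // Each one needs its own "key=value" pair; without the "=",
            // key and value run together into a single wrong parameter name.
            for (Map.Entry<String, String> e : filterParams.entrySet()) {
                path.append('&').append(encode(e.getKey()))
                    .append('=').append(encode(e.getValue()));
            }
        }
        return path.toString();
    }

    // URL-encode a single query-string component (Java 10+ overload).
    static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("option1", "value1");
        System.out.println(buildChangesPath("my_db", "abc", params));
    }
}
```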

[1] http://www.elasticsearch.com/docs/elasticsearch/river/couchdb/
[2] http://www.elasticsearch.com/blog/2010/09/28/the_river.html
[3] http://blog.couchone.com/post/468392274/whats-new-in-apache-couchdb-0-11-part-three-new

Regards,
Mahendra

http://twitter.com/mahendra


(Mahendra M) #2

Hi,

I guess I found an answer to one of my own questions.

From trying it out, I think the index settings work as follows
(please correct me if I am wrong).

While creating a couchdb river, if "index" is not specified, then the
default values are:
    index = <db_name>
    type = <db_name>

So searching for data has to be done against:
    http://localhost:9200/<db_name>/<db_name>/_search

I was (wrongly) assuming that the index would be created as:
    index = _river
    type = <db_name>

and that search queries had to be done using:
    http://localhost:9200/_river/<db_name>/_search

Also, if "index" and "type" parameters are specified while setting up
the river, then ElasticSearch will use those instead of <db_name>.
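The defaulting behaviour described above can be summarised in a small sketch. It assumes, hypothetically, that the river's "index" section is available as a plain `Map` and that "index" and "type" each fall back to the database name independently; `RiverTarget` and its methods are illustrative names, not river code.

```java
import java.util.HashMap;
import java.util.Map;

public class RiverTarget {

    // Target index: the "index" setting if present, else the CouchDB db name.
    static String targetIndex(String dbName, Map<String, String> indexSettings) {
        return indexSettings.getOrDefault("index", dbName);
    }

    // Target type: the "type" setting if present, else the CouchDB db name.
    static String targetType(String dbName, Map<String, String> indexSettings) {
        return indexSettings.getOrDefault("type", dbName);
    }

    // Documents land in <index>/<type>, not in _river/<db_name>.
    static String searchUrl(String host, String dbName, Map<String, String> indexSettings) {
        return "http://" + host + "/" + targetIndex(dbName, indexSettings)
                + "/" + targetType(dbName, indexSettings) + "/_search";
    }

    public static void main(String[] args) {
        System.out.println(searchUrl("localhost:9200", "my_db", new HashMap<>()));

        Map<String, String> custom = new HashMap<>();
        custom.put("index", "blog");
        custom.put("type", "post");
        System.out.println(searchUrl("localhost:9200", "my_db", custom));
    }
}
```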

Sorry for the confusion. I could not figure it out from the documentation.

Regards,
Mahendra

http://twitter.com/mahendra


(Shay Banon) #3

Hey,

Yeah, the _river index acts as the registry for rivers and their metadata
and state (providing persistence and state management for the rivers, not
for the data they index). For example, the CouchDB river stores the last
indexed sequence number (seq) in it. You can control the name of the index
and the type that the data is indexed into.

Regarding the filter, I missed the part where you can provide parameters
to filters; I can add it as a configuration option. Want to open an issue
for it?

-shay.banon



(Lukáš Vlček) #4

Just out of curiosity, how is it possible to control the index and type names
for the RabbitMQ river?

Also, I found that it is possible to customize the river registry name by
setting the river.index_name property (defaults to "_river"). This should
probably be added to the documentation.
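To keep the two roles apart: river definitions and state live in the registry index, while documents go to the river's target index. A minimal sketch of where a river's definition is stored, assuming the `_meta` document id used when registering a river and treating the `river.index_name` override as an optional string; `RiverRegistry` is a hypothetical name.

```java
public class RiverRegistry {

    // Name of the registry index when river.index_name is not set.
    static final String DEFAULT_REGISTRY_INDEX = "_river";

    // configured models the river.index_name setting; null or empty means "not set".
    static String registryIndex(String configured) {
        return (configured == null || configured.isEmpty()) ? DEFAULT_REGISTRY_INDEX : configured;
    }

    // A river is registered by indexing its definition as <registry>/<river_name>/_meta.
    // This document holds the river's configuration, not the data it indexes.
    static String metaPath(String configured, String riverName) {
        return "/" + registryIndex(configured) + "/" + riverName + "/_meta";
    }

    public static void main(String[] args) {
        System.out.println(metaPath(null, "my_db"));
        System.out.println(metaPath("rivers", "my_db"));
    }
}
```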

Lukas



(Shay Banon) #5

The RabbitMQ river supports the bulk format, so all the information (index,
type) is provided per bulk item.
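For illustration, a bulk-format message carries an action line with `_index` and `_type` before each document, which is why the river itself needs no index or type settings. A minimal sketch that builds such a payload; `BulkPayload` and `indexAction` are hypothetical names, and the inputs are assumed not to need JSON escaping.

```java
public class BulkPayload {

    // One bulk item: an action line naming index/type/id, then the document source.
    // Both lines are newline-terminated, as the bulk format requires.
    static String indexAction(String index, String type, String id, String sourceJson) {
        return "{ \"index\" : { \"_index\" : \"" + index + "\", \"_type\" : \"" + type
                + "\", \"_id\" : \"" + id + "\" } }\n" + sourceJson + "\n";
    }

    public static void main(String[] args) {
        // Two items in one message, each targeting a different index/type.
        String body = indexAction("blog", "post", "1", "{ \"title\" : \"hello\" }")
                + indexAction("logs", "entry", "42", "{ \"level\" : \"info\" }");
        System.out.print(body);
    }
}
```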

-shay.banon



(Lukáš Vlček) #6

Ah... yeah... thanks.



(Mahendra M) #7

Hi Shay,

On Wed, Sep 29, 2010 at 3:03 PM, Shay Banon wrote:

Yeah, the _river index acts as the registry for rivers and their metadata
and state (providing persistence and state management for the rivers, not
for the data they index). For example, the CouchDB river stores the last
indexed sequence number (seq) in it. You can control the name of the index
and the type that the data is indexed into.

OK, cool. So I guess that if one has to upload a mapping for an index
(river), it must go on the actual index and not on the _river index.

Regarding the filter, I missed the part where you can provide parameters
to filters; I can add it as a configuration option. Want to open an issue
for it?

OK. I will open an issue for this.

Also, CouchDB supports user authentication. Some setups may require
authentication for registering a stream.

The URL has to be 'http://user:pass@host:port'.

Should I open another issue for this as well? I guess this cannot be
done with just the java.net.URL class.

Mahendra
http://twitter.com/mahendra


(Shay Banon) #8

On Wed, Sep 29, 2010 at 12:27 PM, Mahendra M mahendra.m@gmail.com wrote:

Hi Shay,

OK cool. So, I guess if one has to upload a mapping for an index
(river), it must be on the actual index and not the _river index.

Yeah, you can pre-create the index and put the mappings, and then create the
river that indexes data from CouchDB into it.
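The order described here (index first, then mapping, then river) can be written out as a list of HTTP calls. This is a hedged sketch: `RiverSetupPlan`, the index `blog`, type `post`, database `my_db` and the mapping body are made-up examples, and the river body follows the `couchdb`/`index` sections of the couchdb river documentation as understood here, so the field names should be checked against that page.

```java
import java.util.List;

public class RiverSetupPlan {

    // One HTTP request: method, path and JSON body.
    static final class Call {
        final String method;
        final String path;
        final String body;

        Call(String method, String path, String body) {
            this.method = method;
            this.path = path;
            this.body = body;
        }

        @Override
        public String toString() {
            return method + " " + path + "\n" + body;
        }
    }

    static List<Call> plan(String db, String index, String type) {
        return List.of(
            // 1. Create the target index up front.
            new Call("PUT", "/" + index, "{}"),
            // 2. Put the mapping on the target index, not on _river.
            new Call("PUT", "/" + index + "/" + type + "/_mapping",
                "{ \"" + type + "\" : { \"properties\" : { \"title\" : { \"type\" : \"string\" } } } }"),
            // 3. Register the river last, pointing it at the prepared index/type.
            new Call("PUT", "/_river/" + db + "/_meta",
                "{ \"type\" : \"couchdb\", "
                + "\"couchdb\" : { \"host\" : \"localhost\", \"port\" : 5984, \"db\" : \"" + db + "\" }, "
                + "\"index\" : { \"index\" : \"" + index + "\", \"type\" : \"" + type + "\" } }"));
    }

    public static void main(String[] args) {
        for (Call call : plan("my_db", "blog", "post")) {
            System.out.println(call);
            System.out.println();
        }
    }
}
```

Creating the river last means the first documents it indexes already see the custom mapping instead of a dynamically generated one.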

OK. I will open an issue for this.

Also, CouchDB supports user authentication. Some setups may require
authentication for registering a stream.

The URL has to be 'http://user:pass@host:port'.

Should I open another issue for this as well? I guess this cannot be
done with just the java.net.URL class.

Yeah, open another issue. It can be done with java.net.URL, not a problem.

Mahendra
http://twitter.com/mahendra
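The point that java.net.URL is enough can be sketched as follows: the `user:pass` part is read with `URL.getUserInfo()` and turned into an HTTP Basic `Authorization` header, since `HttpURLConnection` does not send URL credentials on its own. This is a minimal sketch, not the actual patch; `CouchAuth` is hypothetical, it uses `java.util.Base64` (Java 8+), and it assumes the credentials contain no percent-encoded characters, because `getUserInfo()` returns them undecoded.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class CouchAuth {

    // Turns "user:pass" into an HTTP Basic Authorization header value.
    // Returns null when the URL carried no credentials.
    static String basicAuthHeader(String userInfo) {
        if (userInfo == null) {
            return null;
        }
        byte[] raw = userInfo.getBytes(StandardCharsets.UTF_8);
        return "Basic " + Base64.getEncoder().encodeToString(raw);
    }

    public static void main(String[] args) throws MalformedURLException {
        URL url = URI.create("http://user:pass@localhost:5984/my_db/_changes").toURL();
        String header = basicAuthHeader(url.getUserInfo());
        // A client would pass this to setRequestProperty("Authorization", header)
        // on the connection from url.openConnection() before reading the feed.
        System.out.println(header);
    }
}
```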


(Mahendra M) #9

Hi,

I have opened the issues:


Regards,
Mahendra

http://twitter.com/mahendra


(Mahendra M) #10

Hi Shay,

On Wed, Sep 29, 2010 at 4:19 PM, Mahendra M wrote:

I have opened the issues:
http://github.com/elasticsearch/elasticsearch/issues/issue/389

I tested this out. While forming the URL params, the "=" was missing.
I have sent a patch for it on that issue.

http://github.com/elasticsearch/elasticsearch/issues/issue/390

I was not able to test this one. It looks like there is an issue in CouchDB
with respect to authentication for _changes. I will clarify it and give it a try.

Regards,
Mahendra

http://twitter.com/mahendra

