River index by date

Hi,

I'm trying to index tweets stored in CouchDB with the river plugin (I'm
starting with a CouchDB full of tweets and a brand new, empty ES server). That
works fine, but I also need to index them by date. So if a tweet has a
created_at attribute "Mon, 14 Jan 2013 02:22:33 +0000", I'd like to index
it under a 2013-01-14 index/type. I thought the easy solution might be to use
the "script" attribute when creating the river, where I'd set the ctx._type
attribute to a dynamically computed value, e.g. '2013-01-14'. I've been
trying that for quite a long time without luck. The idea should be apparent
from the following river definition:

{
  "type": "couchdb",
  "couchdb": {
    "host": "localhost",
    "port": "5984",
    "db": "my_db",
    "user": "user",
    "password": "password",
    "filter": null,
    "script": "here I put javascript commands; that compute the type name from ctx.doc.created_at (var computed_type_name=...) and then set it as follows; ctx._type=computed_type_name;"
  },
  "index": {
    "index": "my_index"
  }
}

I put some semicolons there to suggest that I have multiple commands there.
When I PUT this JSON into ES with "script": null, it indexes all documents.
When I put in the example shown above, it creates the index and type
my_index, which stays empty. I get no error.
This thread
(https://groups.google.com/forum/?fromgroups=#!searchin/elasticsearch/ctx._type/elasticsearch/S0bmeMvnLyQ/Pla3UShs6fgJ)
led me to believe this should work. I also tried a naive solution
with "script": "ctx._type=ctx.doc.created_at", but that also didn't index
anything.

My question is: should I expect this to work, or is there a better approach?
I only need to keep roughly the last 7 days alive; there is no need to search
through older tweets.
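
For reference, the date computation such a script would need could look roughly like the sketch below. It is only an illustration: dayFromCreatedAt is a hypothetical helper name, the ctx object is a hand-made stand-in for whatever the river passes to the script, and nothing here explains why the river stayed empty. The string is parsed by hand so the result does not depend on the script engine's Date parser or on the local time zone.

```javascript
// Month abbreviations as they appear in created_at, mapped to two-digit numbers.
var MONTHS = {
  Jan: "01", Feb: "02", Mar: "03", Apr: "04", May: "05", Jun: "06",
  Jul: "07", Aug: "08", Sep: "09", Oct: "10", Nov: "11", Dec: "12"
};

// Hypothetical helper: "Mon, 14 Jan 2013 02:22:33 +0000" -> "2013-01-14".
// The date is taken as written in the string; no time zone conversion is done.
function dayFromCreatedAt(createdAt) {
  var m = /^\w{3},\s+(\d{1,2})\s+(\w{3})\s+(\d{4})\s/.exec(createdAt);
  if (!m || !MONTHS.hasOwnProperty(m[2])) {
    return null; // unexpected format: leave the decision to the caller
  }
  var day = m[1].length === 1 ? "0" + m[1] : m[1];
  return m[3] + "-" + MONTHS[m[2]] + "-" + day;
}

// Stand-in for the ctx object handed to the river script (shape assumed).
var ctx = { doc: { created_at: "Mon, 14 Jan 2013 02:22:33 +0000" } };

var computed_type_name = dayFromCreatedAt(ctx.doc.created_at);
if (computed_type_name !== null) {
  ctx._type = computed_type_name; // only override the type for parsable dates
}
```

To use something like this in the river definition, the whole script would have to be collapsed into a single JSON string, with any double quotes inside it escaped.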

--

Not sure this is the problem, but did you install the lang-javascript plugin
(https://github.com/elasticsearch/elasticsearch-lang-javascript)?
ES needs it to run your script snippet.

$ bin/plugin -install elasticsearch/elasticsearch-river-couchdb/1.1.0

--
Michael Jackson
@mjackson

On Tue, Jan 15, 2013 at 2:11 PM, Martin Janda damdrdidlda@gmail.com wrote:

> I'm trying to index tweets stored in CouchDB with a river plugin [...]

--

Yes, I have. This version: elasticsearch-lang-javascript-1.2.0. My river
plugin version matches yours.

On Tuesday, 15 January 2013 23:26:12 UTC+1, Michael Jackson wrote:

> Not sure this is the problem, but did you install the lang-javascript plugin? [...]

--

I found a satisfying solution: the elasticsearch-rollindex plugin
(https://github.com/karussell/elasticsearch-rollindex).
The idea is to create a rolling index that rolls over each day. It always
creates an index with a unique name like index_2013-02-05-00-00. The
important thing is that the most recent index always has a constant alias,
index_feed (rollindex takes care of that). Then I create a river that
streams data into index_feed. To make it work, I had to give the river a
dummy_index and redirect the data into index_feed via a script, as follows:

# you have to set up a cron job to run this periodically

curl -XPUT "localhost:9200/_rollindex?indexPrefix=index&searchIndices=1&rollIndices=7&closeAfterRoll=false"

# this needs to be run just once

curl -XPUT localhost:9200/_river/dummy_index/_meta -d '{
  "type": "couchdb",
  "couchdb": {
    "host": "localhost",
    "port": "5984",
    "db": "my_couchdb_database",
    "user": "username",
    "password": "password",
    "filter": null,
    "script": "ctx._index=\"index_feed\""
  },
  "index": {
    "index": "dummy_index",
    "type": "some_type",
    "bulk_size": "100",
    "bulk_timeout": "10ms"
  }
}'
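
Two details of this setup can be illustrated with a small JavaScript sketch. First, the river definition is JSON, so the double quotes around index_feed inside the script string have to be escaped. Second, the redirect relies on the script setting ctx._index. The targetIndex function below is a toy model written for this illustration, not the plugin's actual code; it assumes the river falls back to the configured index when the script leaves ctx._index unset.

```javascript
// The script as plain JavaScript source, before it is embedded in JSON.
var meta = { couchdb: { script: 'ctx._index="index_feed"' } };

// JSON.stringify shows the escaped form the river definition needs.
var json = JSON.stringify(meta);
console.log(json); // {"couchdb":{"script":"ctx._index=\"index_feed\""}}

// Toy model (an assumption, not the plugin's code): run the script against
// ctx, then use ctx._index if the script set it, else the configured index.
function targetIndex(script, doc, configuredIndex) {
  var ctx = { doc: doc };
  new Function("ctx", script)(ctx);
  return ctx._index || configuredIndex;
}

var parsed = JSON.parse(json);
console.log(targetIndex(parsed.couchdb.script, { text: "a tweet" }, "dummy_index")); // index_feed
console.log(targetIndex("", { text: "a tweet" }, "dummy_index")); // dummy_index
```

In this model every document ends up in index_feed no matter what "index" is configured, which is why the configured dummy_index is only a placeholder.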

I hope someone will find this information useful. If you know about a more
elegant solution, please let me know.

On Tuesday, 15 January 2013 23:28:09 UTC+1, Martin Janda wrote:

> Yes, I have. This version: elasticsearch-lang-javascript-1.2.0. [...]