[ANN] RSS River Plugin 0.0.6

RSS River 0.0.6 has just been released [1].

RSS River now works with the new Elasticsearch 0.19.0.RC1 release
candidate.

Please feel free to test it, fork it, create issues, contribute. :wink:

[1] http://dadoonet.github.com/rssriver/

--

David Pilato

http://dev.david.pilato.fr/

Twitter : https://twitter.com/#!/dadoonet

Cheers!

On Wednesday, February 8, 2012 at 12:15 AM, David Pilato wrote:

RSS River 0.0.6 has just been released [1].

RSS River now works with the new Elasticsearch 0.19.0.RC1 release candidate.

Please feel free to test it, fork it, create issues, contribute… :wink:

[1] http://dadoonet.github.com/rssriver/

--

David Pilato

http://dev.david.pilato.fr/

Twitter : https://twitter.com/#!/dadoonet

Hi,

I'm new to Java, but experienced in .NET and PHP. I still have trouble
getting the IDE and development environment set up.

What I have accomplished so far:

  1. Downloaded the IntelliJ IDEA Community Edition
  2. Opened the following projects from existing source on git:
  • elasticsearch
  • some elasticsearch-river-* (twitter, wikipedia, couchdb) and
  • your rssriver.
  3. I was able to build all sources and run the tests in
    elasticsearch and rssriver.
  4. By running tests I mean full debugging: step by step, variable
    inspection and so on

What I'm missing is:

  • not able to test elasticsearch-river-twitter or -wikipedia in the
    IDE. (no test file?)
  • I don't know how to contribute changes back to the community

Hi,

What I'm missing is:

  • not able to test elasticsearch-river-twitter or -wikipedia in the
    IDE. (no test file?)

There are no tests in river twitter. There is no src/test/java folder in
https://github.com/elasticsearch/elasticsearch-river-twitter/tree/master/src
nor in
https://github.com/elasticsearch/elasticsearch-river-wikipedia

So if you want to test the twitter river, start an ES node, install the
twitter plugin and start the river.
If you want to "debug" the twitter river, build your own test classes and
start them in debug mode.

  • I don't know how to contribute changes back to the community

Not sure I understand this. I would answer : fork the project on github,
modify it, commit and push your changes and then make a pull request. Is
that what you are after ?

HTH
David.

Hi, thanks for the answer!

There are no tests in river twitter. No src/test/java folder
So if you want to test the twitter river, start an ES node, install the
twitter plugin and start the river.

I've done that, and it works like a charm :slight_smile:

If you want to "debug" the twitter river, build your own test classes and
start them in debug mode.

That is my problem. Without experience in Java, I can't even
create a proper test class...

Not sure I understand this. I would answer : fork the project on github,
modify it, commit and push your changes and then make a pull request. Is
that what you are after ?

Yes, I signed up on github and will play around a bit. I have to learn
this too, I guess.

The change that comes to mind is some RSS "automation" work I've
already done in PHP and will try to port to Java. It is basically
an automatic recognition when the feed changes, instead of using fixed
intervals (hourly, daily, every minute, ...). It looks for changes,
saves them in a database and makes calculations for the next run. So,
very active feeds are pulled frequently and some others just once in a
while.

If you want to "debug" the twitter river, build your own test classes
and start them in debug mode.

That is my problem. Without experience in Java, I can't even
create a proper test class...

Have a look at other projects:
https://github.com/dadoonet/rssriver/tree/master/src/test/java/org/elasticsearch/river/rss

RomeTest.java is a real test class (I mean that's a JUnit test case so you
need to launch it with JUnit).
RssRiverTestLauncher.java is a standalone class (main()). You can launch it
directly.
It uses RssRiverTest.java and its parent class AbstractRssRiverTest.java.

You can clone this code for your own needs.

Not sure I understand this. I would answer : fork the project on github,
modify it, commit and push your changes and then make a pull request.
Is that what you are after ?

Yes, I signed up on github and will play around a bit. I have to learn
this too, I guess.

Oh yes! Learn Git and GitHub. They are very powerful, but IMHO they
require a revolution in your mind.
If you are used to working with SVN or CVS, forget everything you have
learned before and start by understanding the Git concepts.

But the ES mailing list is not the best place to talk about it.

The change that comes to mind is some RSS "automation" work I've
already done in PHP and will try to port to Java. It is basically
an automatic recognition when the feed changes, instead of using fixed
intervals (hourly, daily, every minute, ...). It looks for changes,
saves them in a database and makes calculations for the next run. So,
very active feeds are pulled frequently and some others just once in a
while.

So you mean that you try to fetch content every 5 minutes and if there is no
change, you change the period to 10 mn and so on ???
I was thinking of using the RSS specification, as some feeds provide
information on how often they change. See:
RSS 2.0 Specification (Current)

Hi David,

great, thank you so much again - I will look for some java/github
tutorials, books and videos to get up to date.

So you mean that you try to fetch content every 5 minutes and if there is no
change, you change the period to 10 mn and so on ???

That's what I used to do: double the time (if no new posts) and then
halve it (if new posts). But that doesn't work with real
bloggers. Nowadays I save the last seven days, with the corresponding
hours, and then decide when to fetch the feed again. On some blogs
you see activity only during weekdays from 9am to 5pm in their
timezone. Other, autogenerated blogs may fill up the RSS feed every 5
minutes. It depends. I guess this will be way easier to implement
here, since you already save the timestamp of the new link and
Elasticsearch could give us some nice faceting over the last 7 days /
24 hours. Then we would only need one more field for the timestamp of
the next fetch date. It's not that I save bandwidth or computing time
with such a strategy, but if we think of millions of RSS feeds it
might matter.
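
For readers following along, the doubling and halving rule mentioned above could be sketched in Java roughly as follows. This is only an illustration: the class `AdaptiveInterval`, the method `next`, and the 5-minute and 24-hour bounds are assumptions made up for this example and are not part of rssriver.

```java
import java.time.Duration;

/** Hypothetical helper (not rssriver code): doubles or halves a polling interval. */
final class AdaptiveInterval {
    // Assumed bounds; the thread does not specify any.
    static final Duration MIN = Duration.ofMinutes(5);
    static final Duration MAX = Duration.ofHours(24);

    /** Halves the interval when new posts were found, doubles it otherwise, clamped to [MIN, MAX]. */
    static Duration next(Duration current, boolean foundNewPosts) {
        Duration candidate = foundNewPosts ? current.dividedBy(2) : current.multipliedBy(2);
        if (candidate.compareTo(MIN) < 0) {
            return MIN;
        }
        if (candidate.compareTo(MAX) > 0) {
            return MAX;
        }
        return candidate;
    }

    public static void main(String[] args) {
        Duration interval = Duration.ofMinutes(10);
        interval = next(interval, false); // no new posts: 20 minutes
        interval = next(interval, false); // still nothing: 40 minutes
        interval = next(interval, true);  // new posts: back to 20 minutes
        System.out.println(interval.toMinutes() + " minutes");
    }
}
```

As noted above, a rule like this reacts slowly to feeds that are only active at certain hours, which is the motivation for keeping a seven-day history instead.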

Jean

Something we probably can do is to modify the
update_rate
to be a cron expression.

With a new property like
auto_adjust_time:true/false
we can implement what you described here.
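
As a rough sketch of what an `auto_adjust_time` mode might compute, the snippet below follows the seven-day, hour-by-hour history Jean describes. Everything in it is hypothetical: `ActivitySchedule`, `recordPost` and `nextFetch` are invented names, the in-memory array stands in for data that would really be stored in Elasticsearch, and the 24-hour fallback is an arbitrary choice. It does not cover the cron-expression idea.

```java
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.temporal.ChronoUnit;

/** Hypothetical sketch (not rssriver code): picks the next fetch time from a weekly activity history. */
final class ActivitySchedule {
    // posts[day][hour] counts posts seen in that weekly slot; day 0 is Monday.
    private final int[][] posts = new int[7][24];

    /** Records one post in the slot matching its publication day and hour. */
    void recordPost(ZonedDateTime publishedAt) {
        posts[publishedAt.getDayOfWeek().getValue() - 1][publishedAt.getHour()]++;
    }

    /**
     * Returns the start of the next hourly slot after now that had posts in the
     * recorded week, or now plus 24 hours (assumed fallback) if the history is empty.
     */
    ZonedDateTime nextFetch(ZonedDateTime now) {
        ZonedDateTime currentSlot = now.truncatedTo(ChronoUnit.HOURS);
        for (int i = 1; i <= 7 * 24; i++) {
            ZonedDateTime candidate = currentSlot.plusHours(i);
            int day = candidate.getDayOfWeek().getValue() - 1;
            if (posts[day][candidate.getHour()] > 0) {
                return candidate;
            }
        }
        return now.plusHours(24);
    }

    public static void main(String[] args) {
        ActivitySchedule schedule = new ActivitySchedule();
        // A blog that posted on a Monday morning.
        schedule.recordPost(ZonedDateTime.of(2012, 2, 6, 9, 30, 0, 0, ZoneOffset.UTC));
        // Asked on the following Friday evening, the next fetch lands on Monday 13 Feb, 09:00 UTC.
        ZonedDateTime now = ZonedDateTime.of(2012, 2, 10, 18, 15, 0, 0, ZoneOffset.UTC);
        System.out.println(schedule.nextFetch(now));
    }
}
```

The value returned by `nextFetch` corresponds to the single extra "next fetch date" field mentioned earlier in the thread.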

What do you think ?

David :wink:
@dadoonet

On Feb 8, 2012, at 23:09, "jeangld@yahoo.com" jeangld@yahoo.com wrote:

Hi David,

great, thank you so much again - I will look for some java/github
tutorials, books and videos to get up to date.

So you mean that you try to fetch content every 5 minutes and if there is no
change, you change the period to 10 mn and so on ???

That's what I used to do: double the time (if no new posts) and then
halve it (if new posts). But that doesn't work with real
bloggers. Nowadays I save the last seven days, with the corresponding
hours, and then decide when to fetch the feed again. On some blogs
you see activity only during weekdays from 9am to 5pm in their
timezone. Other, autogenerated blogs may fill up the RSS feed every 5
minutes. It depends. I guess this will be way easier to implement
here, since you already save the timestamp of the new link and
Elasticsearch could give us some nice faceting over the last 7 days /
24 hours. Then we would only need one more field for the timestamp of
the next fetch date. It's not that I save bandwidth or computing time
with such a strategy, but if we think of millions of RSS feeds it
might matter.

Jean

Hi David,

sorry for taking so long to reply, but I got my feet wet with
Java :slight_smile:

I found a great Java tutorial on SourceForge and was able to do a lot
of things. I've chosen the free version of IntelliJ IDEA, but will
also try out Eclipse. In Java I find myself just typing one or two
letters and using the autocomplete feature a lot. I also ran some
speed tests against my old PHP scripts, and Java was on average 20x
faster. Unbelievable :slight_smile: It's almost C speed...

I can't believe it, but I really managed to write tests for the
twitter and wikipedia river plugins. I took the examples you mentioned
and I finally understood that in the tests we start a new node, feed it
with the river data and then wait until the code is triggered.

With a new property like
auto_adjust_time:true/false
we can implement what you described here.
What do you think ?

Yes, something like this would be great. I will have another look at
my old PHP code, since a lot changes with the object-oriented way of
doing things.

Jean

Hi Jean,

With a new property like
auto_adjust_time:true/false
we can implement what you described here.
What do you think ?

Yes, something like this would be great. I will have another look at
my old PHP code, since a lot changes with the object-oriented way of
doing things.

Ok. You can start by opening an issue here :

If you want to contribute, you're welcome ! :wink:

David.