I don't know how to contribute changes back to the community.
Not sure I understand this. I would answer: fork the project on GitHub,
modify it, commit and push your changes, and then make a pull request. Is
that what you are after?
There are no tests in the Twitter river. There is no src/test/java folder.
So if you want to test the Twitter river, start an ES node, install the
Twitter plugin and start the river.
I've done that, and it works like a charm.
If you want to "debug" the Twitter river, build your own test classes and
start them in debug mode.
That is my problem. Without experience in Java, I can't even
create a proper test class...
Yes, I signed up on GitHub and will play around a bit. I have to learn
this too, I guess.
The change that comes to mind is some RSS "automation" work I've
already done in PHP and will try to port to Java. It basically
detects automatically when a feed changes, instead of polling at fixed
intervals (hourly, daily, every minute, ...). It looks for changes,
saves them in a database and calculates when the next run should happen. So
very active feeds are pulled frequently and others just once in a
while.
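One way to sketch the "look for changes" step is to compare the item identifiers of each fetch against the ones already stored. This minimal sketch assumes each RSS item carries a stable <guid> (or link) that can serve as its identity. The in-memory set stands in for the database mentioned above, and the class name FeedChangeDetector is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class FeedChangeDetector {
    // Stand-in for the database: GUIDs of items already seen for one feed (assumption).
    private final Set<String> seenGuids = new HashSet<>();

    /** Returns the GUIDs in this fetch that were not seen before, and remembers them. */
    public List<String> detectNew(List<String> fetchedGuids) {
        List<String> fresh = new ArrayList<>();
        for (String guid : fetchedGuids) {
            // Set.add returns false when the GUID is already known.
            if (seenGuids.add(guid)) {
                fresh.add(guid);
            }
        }
        return fresh;
    }

    public static void main(String[] args) {
        FeedChangeDetector detector = new FeedChangeDetector();
        System.out.println(detector.detectNew(List.of("a", "b")));      // [a, b]
        System.out.println(detector.detectNew(List.of("a", "b", "c"))); // [c]
    }
}
```

The number of new items per fetch is then the input for deciding when to run next.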
Have a look at other projects: https://github.com/dadoonet/rssriver/tree/master/src/test/java/org/elasticsearch/river/rss
RomeTest.java is a real test class (I mean it's a JUnit test case, so you
need to launch it with JUnit).
RssRiverTestLauncher.java is a standalone class (with a main() method). You can launch it
directly.
It uses RssRiverTest.java and its parent class AbstractRssRiverTest.java.
You can clone this code for your own needs.
Oh yes! Learn Git and GitHub. Very powerful, but IMHO it requires a revolution in
your mind.
If you are used to SVN or CVS, forget everything you have learned
before and start to understand Git concepts.
But the ES mailing list is not the best place to talk about it.
So you mean that you try to fetch content every 5 minutes and, if there is no
change, you change the period to 10 minutes and so on?
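The doubling scheme in the question can be sketched as a clamped exponential backoff. The interval doubles after an idle fetch and halves after a fetch that found new items. This is a minimal sketch: the 5-minute floor and the 24-hour ceiling are assumptions chosen for illustration, and the class name BackoffInterval is hypothetical.

```java
import java.time.Duration;

public class BackoffInterval {
    // Illustrative bounds (assumptions), so the interval never runs away in either direction.
    public static final Duration MIN = Duration.ofMinutes(5);
    public static final Duration MAX = Duration.ofHours(24);

    /** Doubles the interval after an idle fetch, halves it after a fetch with new items. */
    public static Duration next(Duration current, boolean hadNewItems) {
        Duration candidate = hadNewItems ? current.dividedBy(2) : current.multipliedBy(2);
        if (candidate.compareTo(MIN) < 0) return MIN;
        if (candidate.compareTo(MAX) > 0) return MAX;
        return candidate;
    }

    public static void main(String[] args) {
        Duration d = MIN;
        d = next(d, false); // 10 minutes
        d = next(d, false); // 20 minutes
        d = next(d, true);  // back to 10 minutes
        System.out.println(d); // PT10M
    }
}
```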
I was thinking of using the RSS specification, as some feeds provide information
on their update rate. See: RSS 2.0 Specification (Current)
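The RSS 2.0 specification defines an optional channel-level <ttl> element, giving the number of minutes a channel may be cached before refreshing. It also defines <skipHours> and <skipDays> as hints that tell aggregators when they can skip a fetch. The minimal sketch below reads <ttl> and <skipHours> with the JDK's built-in DOM parser. The class and method names are hypothetical, and a real river could instead read these values from the feed it has already parsed.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.OptionalInt;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class RssRefreshHints {
    /** Parses an RSS document from a string; malformed XML becomes an unchecked exception. */
    public static Document parse(String rssXml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(rssXml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not a well-formed RSS document", e);
        }
    }

    /** Channel <ttl> in minutes, if present and numeric. */
    public static OptionalInt ttlMinutes(Document doc) {
        NodeList ttl = doc.getElementsByTagName("ttl");
        if (ttl.getLength() == 0) return OptionalInt.empty();
        try {
            return OptionalInt.of(Integer.parseInt(ttl.item(0).getTextContent().trim()));
        } catch (NumberFormatException e) {
            return OptionalInt.empty(); // malformed value: keep our own schedule
        }
    }

    /** GMT hours (0-23) listed under <skipHours>; empty if the feed gives none. */
    public static Set<Integer> skipHours(Document doc) {
        Set<Integer> hours = new TreeSet<>();
        NodeList skip = doc.getElementsByTagName("skipHours");
        if (skip.getLength() == 0) return hours;
        NodeList hourNodes = ((Element) skip.item(0)).getElementsByTagName("hour");
        for (int i = 0; i < hourNodes.getLength(); i++) {
            try {
                hours.add(Integer.parseInt(hourNodes.item(i).getTextContent().trim()));
            } catch (NumberFormatException ignored) {
                // skip malformed <hour> entries
            }
        }
        return hours;
    }

    public static void main(String[] args) {
        Document doc = parse("<rss version=\"2.0\"><channel><title>t</title><ttl>60</ttl>"
                + "<skipHours><hour>0</hour><hour>1</hour></skipHours></channel></rss>");
        System.out.println(ttlMinutes(doc)); // OptionalInt[60]
        System.out.println(skipHours(doc));  // [0, 1]
    }
}
```

Not every feed publishes these elements, so they work best as a lower bound or hint on top of an observed schedule.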
Great, thank you so much again. I will look for some Java/GitHub
tutorials, books and videos to get up to speed.
That's what I used to do: double the interval (if there are no new posts) and
halve it (if there are new posts). But that doesn't work with real
bloggers. Nowadays I save the last seven days, with the corresponding
hours, and then I decide when to fetch the feed again. On some blogs
you see activity only on weekdays from 9am to 5pm in their
timezone. Other, autogenerated blogs may fill up the RSS feed every 5
minutes. It depends. I guess this will be much easier to implement
here, since you already save the timestamp of each new link, and
Elasticsearch could give us some nice faceting over the last 7 days /
24 hours. Then we would only need one more field for the timestamp of
the next fetch. It's not that I save bandwidth or computing time
with such a strategy, but if we think of millions of RSS feeds it
might matter.
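The seven-day approach can be sketched as an hour-of-week histogram of new items. The next fetch is then scheduled at the end of the next hour slot that saw activity during the past week. This minimal sketch rests on several assumptions:

- Item timestamps are already available, for example from the indexed links.
- The histogram is computed in plain Java rather than with Elasticsearch facets.
- The 24-hour fallback for feeds with no recent activity is arbitrary.
- The class name WeeklyActivitySchedule is hypothetical.

```java
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.temporal.ChronoUnit;
import java.util.List;

public class WeeklyActivitySchedule {
    /** counts[day-of-week 0=Monday..6=Sunday][hour 0..23] of items published in the last 7 days. */
    public static int[][] histogram(List<ZonedDateTime> itemTimes, ZonedDateTime now) {
        int[][] counts = new int[7][24];
        ZonedDateTime weekAgo = now.minusDays(7);
        for (ZonedDateTime t : itemTimes) {
            if (t.isAfter(weekAgo) && !t.isAfter(now)) {
                // Bucket in the feed's own timezone, taken here from 'now' (assumption).
                ZonedDateTime local = t.withZoneSameInstant(now.getZone());
                counts[local.getDayOfWeek().getValue() - 1][local.getHour()]++;
            }
        }
        return counts;
    }

    /** End of the next hour slot that had activity last week, or now + 24h if none did. */
    public static ZonedDateTime nextFetch(int[][] counts, ZonedDateTime now) {
        ZonedDateTime hourStart = now.truncatedTo(ChronoUnit.HOURS);
        for (int i = 0; i < 7 * 24; i++) {
            ZonedDateTime slot = hourStart.plusHours(i);
            if (counts[slot.getDayOfWeek().getValue() - 1][slot.getHour()] > 0) {
                return slot.plusHours(1); // fetch once that busy hour is over
            }
        }
        return now.plusHours(24);
    }

    public static void main(String[] args) {
        ZonedDateTime now = ZonedDateTime.of(2024, 1, 1, 10, 15, 0, 0, ZoneOffset.UTC); // a Monday
        ZonedDateTime item = ZonedDateTime.of(2023, 12, 26, 14, 30, 0, 0, ZoneOffset.UTC); // a Tuesday
        System.out.println(nextFetch(histogram(List.of(item), now), now)); // 2024-01-02T15:00Z
    }
}
```

The result would go into the extra "next fetch" field mentioned above. A feed that is busy only on weekday office hours then gets no fetches at night or at weekends.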
Sorry for taking so long to reply, but I got my feet wet with
Java.
I found a great Java tutorial on SourceForge and was able to do a lot
of things. I've chosen the free version of IntelliJ IDEA, but will
also try out Eclipse. In Java I find myself typing just one or two
letters and using the autocomplete feature a lot. I also made some
speed tests with my old PHP scripts, and Java was on average 20x
faster. Unbelievable! It's almost C speed...
I can't believe it, but I really managed to write tests for the
Twitter and Wikipedia river plugins. I took the examples you mentioned,
and I finally understood that in the tests we start a new node, feed it
with the river data and then wait until the code is triggered.
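The "wait until the code is triggered" step is often written as polling a condition with a timeout rather than sleeping for a fixed time. This is a minimal sketch and is not taken from the river test classes. The helper name waitUntil is hypothetical, and in a real test the condition might check whether the river has indexed any documents yet.

```java
import java.time.Duration;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

public class WaitFor {
    /** Polls the condition until it holds or the timeout elapses; returns whether it held. */
    public static boolean waitUntil(BooleanSupplier condition, Duration timeout, Duration pollInterval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (condition.getAsBoolean()) return true;
            if (System.nanoTime() - deadline >= 0) return false;
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and give up
                return false;
            }
        }
    }

    public static void main(String[] args) {
        AtomicBoolean indexed = new AtomicBoolean(false);
        // Simulates the river doing its work in the background.
        new Thread(() -> {
            try {
                Thread.sleep(200);
            } catch (InterruptedException ignored) {
                // nothing to clean up in this simulation
            }
            indexed.set(true);
        }).start();
        boolean ok = waitUntil(indexed::get, Duration.ofSeconds(5), Duration.ofMillis(50));
        System.out.println(ok ? "triggered" : "timed out");
    }
}
```

Compared with a fixed sleep, the test finishes as soon as the river has done its work, and it still fails cleanly if it never does.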
With a new property like
auto_adjust_time:true/false
we can implement what you described here.
What do you think?
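Here is a minimal sketch of how such a flag could be read from a feed's settings. It assumes the river's JSON definition has already been parsed into a Map<String, Object>. The property name auto_adjust_time comes from the suggestion above; everything else, including the class and method names and the default of false, is hypothetical.

```java
import java.time.Duration;
import java.util.Map;

public class AutoAdjustSetting {
    /** Reads the proposed auto_adjust_time flag; absent or unrecognized values mean false. */
    public static boolean autoAdjust(Map<String, Object> feedSettings) {
        Object value = feedSettings.get("auto_adjust_time");
        if (value instanceof Boolean) return (Boolean) value;
        return value != null && Boolean.parseBoolean(value.toString()); // also accepts "true" as a string
    }

    /** Uses the adaptive delay only when the flag is on; otherwise keeps the fixed interval. */
    public static Duration nextDelay(Map<String, Object> feedSettings,
                                     Duration fixedInterval, Duration adaptiveDelay) {
        return autoAdjust(feedSettings) ? adaptiveDelay : fixedInterval;
    }

    public static void main(String[] args) {
        Duration fixed = Duration.ofMinutes(15);
        Duration adaptive = Duration.ofHours(3);
        System.out.println(nextDelay(Map.<String, Object>of("auto_adjust_time", true), fixed, adaptive)); // PT3H
        System.out.println(nextDelay(Map.<String, Object>of(), fixed, adaptive));                         // PT15M
    }
}
```

Defaulting to false keeps existing river definitions on their fixed interval unless they opt in.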
Yes, something like this would be great. I will have another look at
my old PHP code, since a lot changes with the object-oriented way of
doing things.