For those who are not regulars on the mailing list, I am a fairly active
member who has used Elasticsearch for years.
I am leaving my full-time job to focus on other (techie and non-techie)
goals and would love to work on some interesting projects part-time. These
can be either paid assignments or free open-source projects. My main
interest is search, with a focus on development. I am not too keen on devops
tasks such as administering servers; I would rather work on my own stuff
than be a sysadmin.
We could always use help with CirrusSearch. It is the open-source project
that links MediaWiki to Elasticsearch. We have it installed on all the
wikis at the Wikimedia Foundation, but it isn't the default search backend
on the largest ones yet.
"Selling" points:
- Huge user community
- Basic queries work reasonably well
- Expert syntax to support power users
- PHP
- Elastica
- I manage the Elasticsearch installation
- I contribute changes we need upstream
- Uses a customized highlighter (also needs contributors)
- Reasonably easy development installation with Vagrant
- Working on it is my full-time job, so review would be quick
If you want to lend a hand with interesting projects, here are some of my
current favorites:
- building a global library catalog index with Elasticsearch of all the
  open data / metadata on academic library servers, complete with harvester
  and updater, SRU, OAI etc. A starting point for an SRU implementation is
  https://github.com/xbib/elasticsearch-sru
- implementing a plugin for Elasticsearch that turns ES into a W3C Linked
  Data Platform (see the Linked Data Platform 1.0 Primer), with HTTP PATCH
  support, JSON Patch (RFC 6902), maybe even a SPARQL-to-ES-DSL translator
- a harvester/pull plugin framework for ES, in order to supersede the river
  singleton concept, with provisioning for all kinds of different sources,
  e.g. JDBC or web crawling
- helping British Library Labs to find correct image legend texts in OCR
  XML from the book scanning project. See "Millions of historical images
  posted to Flickr" (BBC News). I think Elasticsearch can handle the 230G
  zipped input. I got a copy from BL. No good algorithm exists yet. Maybe
  with ES? A first step would be to design an index and to index/publish
  the OCR for better search?
Not sure where the incentives are. Everlasting fame, honor, glory, world
domination, super power etc.
Jörg
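To make the JSON Patch (RFC 6902) idea from the list above a little more concrete, here is a minimal Python sketch of applying a patch to a JSON document. It is only an illustration under stated assumptions: the function names (`parse_pointer`, `list_index`, `apply_patch`) and the sample record are hypothetical, it covers only the `add`, `remove`, `replace` and `test` operations, and it is not how an Elasticsearch plugin would necessarily implement HTTP PATCH.

```python
import copy


def parse_pointer(pointer):
    """Split an RFC 6901 JSON Pointer into unescaped reference tokens."""
    if pointer == "":
        return []
    if not pointer.startswith("/"):
        raise ValueError("a JSON Pointer is empty or starts with '/'")
    # Decode "~1" before "~0" so that "~01" becomes "~1" and not "/".
    return [t.replace("~1", "/").replace("~0", "~") for t in pointer[1:].split("/")]


def list_index(token, upper):
    """Convert an array reference token to an int no larger than `upper`."""
    if not token.isdigit():
        raise ValueError("invalid array index: " + token)
    index = int(token)
    if index > upper:
        raise IndexError("array index out of range: " + token)
    return index


def apply_patch(document, patch):
    """Apply a subset of RFC 6902 (add, remove, replace, test) to a copy.

    Hypothetical helper for illustration; "move" and "copy" and patches
    that target the document root are deliberately left out.
    """
    doc = copy.deepcopy(document)
    for operation in patch:
        name = operation["op"]
        tokens = parse_pointer(operation["path"])
        if not tokens:
            raise ValueError("this sketch does not patch the document root")
        # Walk down to the container that holds the target location.
        parent = doc
        for token in tokens[:-1]:
            if isinstance(parent, list):
                parent = parent[list_index(token, len(parent) - 1)]
            else:
                parent = parent[token]
        last = tokens[-1]
        is_list = isinstance(parent, list)
        if name == "add":
            if is_list:
                # "-" refers to the position after the last array element.
                index = len(parent) if last == "-" else list_index(last, len(parent))
                parent.insert(index, operation["value"])
            else:
                parent[last] = operation["value"]
        elif name in ("remove", "replace", "test"):
            key = list_index(last, len(parent) - 1) if is_list else last
            current = parent[key]  # raises KeyError if the target is missing
            if name == "remove":
                del parent[key]
            elif name == "replace":
                parent[key] = operation["value"]
            elif current != operation["value"]:
                raise ValueError("test operation failed at " + operation["path"])
        else:
            raise ValueError("unsupported operation: " + name)
    return doc


# Made-up catalog record, used only to exercise the sketch.
catalog_record = {"title": "Moby Dick", "subjects": ["novel"]}
patch = [
    {"op": "replace", "path": "/title", "value": "Moby-Dick"},
    {"op": "add", "path": "/subjects/-", "value": "whaling"},
    {"op": "remove", "path": "/subjects/0"},
]
print(apply_patch(catalog_record, patch))
```

The patch is applied to a deep copy, so a failing operation leaves the original document untouched, which is roughly the all-or-nothing behavior RFC 6902 asks for.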
Thanks Jörg.
The incentive for an open-source project is to pad my resume, since I have
been working with obsolete technologies and processes for almost the past
three years. I implemented many changes at my company (Elasticsearch,
Maven, central logging, application-level monitoring), but there is only so
much one person can do. Plus, I love this stuff. The incentive for simply
contracting is purely money! I do not really need the cash, but I plan to
embark on some travels and it would ease my mind a bit.
Your project list reminds me of a project I have been working on, but I
could use some help. I am looking for datasets that also include example
queries and golden records for those queries. My goal is to test different
similarity algorithms using unknown data. I would love to use the Wikipedia
dump, but I never found any golden records. Perhaps Nik has something. The
only things I have found are the TREC datasets, but I was hoping for a more
sizable example.
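As a rough illustration of how golden records could be used to compare similarity algorithms, here is a minimal Python sketch that scores ranked results with precision at k and mean average precision. Everything in it is an assumption made for the example: the function names, the query and document IDs, and the two runs labelled "bm25" and "tfidf" are made up, and each ranked list is assumed to contain no duplicate IDs.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top k results that appear in the golden records."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / k


def average_precision(ranked_ids, relevant_ids):
    """Average of the precision values at each rank where a relevant hit occurs."""
    if not relevant_ids:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            total += hits / rank
    # Relevant documents that were never retrieved contribute zero.
    return total / len(relevant_ids)


def mean_average_precision(run, golden):
    """Mean of average_precision over every query in the golden records."""
    scores = [average_precision(run.get(query, []), relevant)
              for query, relevant in golden.items()]
    return sum(scores) / len(scores)


# Golden records: query -> set of document IDs judged relevant (made up).
golden = {"q1": {"d1", "d3"}, "q2": {"d2"}}

# Ranked results per query from two hypothetical similarity settings.
runs = {
    "bm25": {"q1": ["d1", "d2", "d3"], "q2": ["d2", "d1"]},
    "tfidf": {"q1": ["d2", "d1", "d3"], "q2": ["d1", "d2"]},
}

for name, run in runs.items():
    print(name, round(mean_average_precision(run, golden), 4))
```

In practice the ranked lists would come from running each example query against an index configured with the similarity under test, and the golden records from a judged collection such as TREC.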
Ivan, I'm actually working on something like this (and I don't think Jörg
actually meant that..). I was involved with the Open Relevance Project at
Apache Lucene, but it's now discontinued, and in some spare time I have I'm
trying to take that initiative forward.
Ping me privately if that sounds interesting and we can continue discussing.
I did not realize Jörg's response was to the list and not private (as
most other responses were). I am thankful that I did not bad-mouth my
employer too badly!
I am very aware of the Open Relevance Project and its discontinued status.
I emailed the Lucene mailing list about it not too long ago. I would love to
work on something in that regard.
Cheers,
Ivan