Elastic Search with Mahout


(sam-2) #1

Hi,

I am having trouble integrating Elastic Search with Mahout because in
ES I am not able to control the format in which documents are indexed.
For Solr or Lucene, for example, the following article
http://www.lucidimagination.com/blog/2010/03/16/integrating-apache-mahout-with-apache-lucene-and-solr-part-i-of-3/
says that you can enable term vectors, and it gives a command that
creates Mahout documents from the search index.

Say I want to trigger an event so that whenever a record is added to
Elastic Search it is also added to Mahout. The problem is that Mahout
doesn't understand the index files. Is it possible to index the
records in ES in another format, such as CSV or JSON? Since ES is
built on Lucene, I believe there should be a way to do it.

I would really appreciate any information on this topic.

Many Thanks!
Sambodhi


(ofavre) #2

Aren't rivers what you need?
I haven't fully understood what they are, but I see them as a way to plug
another source or sink into ES.
If I'm right, you could write a river that notifies Mahout of any new
documents indexed in ES.

I only skimmed the page you linked, but I think that is what is done with Solr.

--
Olivier Favre



(Berkay Mollamustafaoglu-2) #3

Docs are already stored as JSON in ES. All you need to do is execute a search
in ES, get the source of the docs as JSON, and then push the data to
Mahout.
Have you created an index in ES and run queries against it?
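
To make that flow concrete, here is a minimal sketch, not a definitive implementation. It assumes ES is reachable over HTTP at a base URL you supply, and that each document has one text field you want to hand to Mahout. The function names, the `field` parameter, and the one-file-per-document layout are all hypothetical choices made for illustration; only the `hits.hits[]._id` / `_source` response shape comes from ES itself.

```python
import json
import os
import urllib.request


def fetch_search_response(base_url, index, size=100):
    """POST a match_all query to <base_url>/<index>/_search and parse the reply.

    base_url and index are placeholders, e.g. "http://localhost:9200" and "docs".
    """
    body = json.dumps({"query": {"match_all": {}}, "size": size}).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/{index}/_search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def hits_to_records(response):
    """Return (id, source) pairs taken from a search response dict."""
    return [(hit["_id"], hit["_source"]) for hit in response["hits"]["hits"]]


def write_text_files(records, out_dir, field):
    """Write one plain-text file per document, named after the document id.

    Assumes ids are safe to use as file names; a missing field yields an empty file.
    """
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for doc_id, source in records:
        path = os.path.join(out_dir, f"{doc_id}.txt")
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(str(source.get(field, "")))
        paths.append(path)
    return paths
```

You would call `fetch_search_response`, pass the result through `hits_to_records`, and write the text out with `write_text_files`. A directory of text files like this is the kind of input Mahout's `seqdirectory` and `seq2sparse` tools are meant to convert into vectors, though the exact options depend on your Mahout version.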

Regards,
Berkay Mollamustafaoglu
mberkay on yahoo, google and skype


