ElasticSearch and Mahout


(Adam Estrada) #1

Has anyone successfully pulled tokenized fields out of ES and converted
them into Mahout's vector format?

You can point Mahout at a Lucene index using
$MAHOUT_HOME/bin/mahout lucene.vector, but an ES setup is very likely to
have multiple indexes, so what is the best strategy for handling that?
Can anyone give me a good starting point?

Thanks,
Adam


(AGuereca) #2

Hi Adam,

Just curious, did you find a working approach for this?

Thanks,
~AG



(Adam Estrada) #3

I have not, but I assume (after watching one of Shay's videos) that the
main index, or (0), is where the data I need is located.

Adam



(Tomislav Poljak) #4

Hi,

2011/8/24 Adam Estrada estrada.adam@gmail.com:

I have not but I assume (after watching one of Shay's videos) that the
main index or (0) is where the data I need is located.

If you don't want to deal with multiple indices (multiple index
shards), you can set the number of shards to 1 when creating the index:

curl -XPUT 'http://localhost:9200/mahout_index/' -d '
index :
    number_of_shards : 1
    number_of_replicas : 1
'

With the default settings, the shard's Lucene index, which you can open
with Luke, Mahout, etc., is located in
ES/data/elasticsearch/nodes/0/indices/mahout_index/0/index
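
As a rough sketch of the single-shard case, the shard directory can be handed to lucene.vector directly. The helper names (shard_index_dir, lucene_vector_cmd), the field name body, and the output directory are placeholders for illustration and do not come from this thread. The functions only print the command so it can be checked before running. One caveat: as far as I know, lucene.vector only works on fields that were indexed with term vectors, so the ES mapping for that field would need term vectors enabled.

```shell
#!/bin/sh
# Sketch only: helper names, field name and output paths are placeholders.

# Path of one shard's Lucene index, following the layout quoted above
# (default cluster name "elasticsearch", first local node "0").
shard_index_dir() {
    # $1 = ES data directory, $2 = index name, $3 = shard number
    printf '%s/elasticsearch/nodes/0/indices/%s/%s/index\n' "$1" "$2" "$3"
}

# Print (do not run) the lucene.vector command for one Lucene directory.
lucene_vector_cmd() {
    # $1 = Lucene index dir, $2 = output dir, $3 = field to vectorize
    printf '%s/bin/mahout lucene.vector --dir %s --output %s/vectors --dictOut %s/dict.txt --field %s\n' \
        "$MAHOUT_HOME" "$1" "$2" "$2" "$3"
}
```

For example, lucene_vector_cmd "$(shard_index_dir ES/data mahout_index 0)" /tmp/mahout_out body prints the command for shard 0. Once it looks right, it can be piped to sh.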

If you do need multiple indices (multiple index shards) and have a
shared FS gateway defined, you can open each index in the shared gateway
location one by one and create Mahout vectors from each.
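
The one-by-one approach could look roughly like the sketch below. It takes the shard directories as explicit arguments, because the on-disk layout of the gateway is not spelled out here. The function name shard_vector_cmds, the field name body, and the output naming are made up for illustration. Note that each run writes its own dictionary, so term ids will most likely not line up across shards without an extra merge step.

```shell
#!/bin/sh
# Sketch only: prints one lucene.vector command per shard directory.
# Function name, field name and output naming are placeholders.
shard_vector_cmds() {
    # $1 = output dir, $2 = field to vectorize, remaining args = Lucene dirs
    _out=$1
    _field=$2
    shift 2
    _n=0
    for _dir in "$@"; do
        # One vectors file and one dictionary file per shard
        printf '%s/bin/mahout lucene.vector --dir %s --output %s/vectors-%d --dictOut %s/dict-%d.txt --field %s\n' \
            "$MAHOUT_HOME" "$_dir" "$_out" "$_n" "$_out" "$_n" "$_field"
        _n=$((_n + 1))
    done
}
```

Calling shard_vector_cmds /tmp/mahout_out body dir0 dir1 prints two commands, one per directory, which can be reviewed and then piped to sh.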

Hope this helps,

Tomislav


