ElasticSearch and Mahout

Adam_Estrada · August 18, 2011, 6:04pm

Has anyone been successful in grabbing tokenized fields from ES and
getting them into Mahout's format?

You can point Mahout at a single Lucene index using
$MAHOUT_HOME/bin/mahout lucene.vector, but an ES setup is quite likely to
have multiple indexes, so what is the best strategy for doing this? Can
anyone give me a good starting point?

Thanks,
Adam

AGuereca · August 23, 2011, 4:31pm

Hi Adam,

Just curious, did you find a working approach for this?

Thanks,
~AG

Adam_Estrada · August 23, 2011, 10:07pm

I have not, but I assume (after watching one of Shay's videos) that the
main index, or (0), is where the data I need is located.

Adam

Tomislav_Poljak · August 29, 2011, 1:43pm

Hi,

2011/8/24 Adam Estrada estrada.adam@gmail.com:

I have not but I assume (after watching one of Shay's videos) that the
main index or (0) is where the data I need is located.

If you don't want to deal with multiple Lucene indices (one per index
shard), you can set the number of shards to 1 when you create the index:

curl -XPUT 'http://localhost:9200/mahout_index/' -d '
index :
    number_of_shards : 1
    number_of_replicas : 1
'

With the default settings, you will find the shard's Lucene index, which
you can open with Luke, Mahout, etc., in
ES/data/elasticsearch/nodes/0/indices/mahout_index/0/index
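
As a rough sketch of what pointing Mahout at that directory could look like, the script below assembles a lucene.vector command for shard 0 and only prints it. The values for MAHOUT_HOME, ES_DATA, FIELD and OUT_DIR, and the helper names shard_index_dir and vector_command, are placeholders made up for illustration; --dir, --field, --output and --dictOut are the options listed in Mahout's lucene.vector help. As far as I know, lucene.vector reads term vectors, so the chosen field would need term vectors enabled in its ES mapping.

```shell
#!/bin/sh
# Sketch only: the defaults below (MAHOUT_HOME, ES_DATA, FIELD, OUT_DIR) are
# placeholder values, not taken from the thread.
MAHOUT_HOME="${MAHOUT_HOME:-/opt/mahout}"
ES_DATA="${ES_DATA:-ES/data}"
FIELD="${FIELD:-body}"
OUT_DIR="${OUT_DIR:-/tmp/mahout_vectors}"

# Lucene directory of one shard, following the path layout quoted above.
# $1 = index name, $2 = shard number
shard_index_dir() {
    printf '%s/elasticsearch/nodes/0/indices/%s/%s/index\n' "$ES_DATA" "$1" "$2"
}

# Print the lucene.vector command for one Lucene directory.
# $1 = Lucene index directory
vector_command() {
    printf '%s/bin/mahout lucene.vector --dir %s --field %s --output %s/vectors --dictOut %s/dictionary.txt\n' \
        "$MAHOUT_HOME" "$1" "$FIELD" "$OUT_DIR" "$OUT_DIR"
}

# Dry run: show the command for shard 0 of mahout_index instead of executing it.
vector_command "$(shard_index_dir mahout_index 0)"
```

Once the printed command looks right for your installation, it can be pasted into a shell and run as is.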

If you do need multiple index shards and have a shared FS gateway
defined, you can open each shard's Lucene index in the shared gateway
location, one by one, and create Mahout vectors from each.
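
For the multi-shard case, a loop along these lines might do, assuming every shard directory under SHARD_ROOT contains an openable Lucene index in an index subdirectory, as in the local data path above; the layout under a shared gateway may differ, so check it first. SHARD_ROOT, FIELD, OUT_DIR and the function names are hypothetical. It is a dry run unless RUN=1 is set.

```shell
#!/bin/sh
# Sketch only: SHARD_ROOT, MAHOUT_HOME, FIELD and OUT_DIR are placeholder values.
MAHOUT_HOME="${MAHOUT_HOME:-/opt/mahout}"
SHARD_ROOT="${SHARD_ROOT:-ES/data/elasticsearch/nodes/0/indices/mahout_index}"
FIELD="${FIELD:-body}"
OUT_DIR="${OUT_DIR:-/tmp/mahout_vectors}"

# Print every <root>/<shard>/index directory that exists, one per line.
list_shard_dirs() {
    for dir in "$1"/*/index; do
        if [ -d "$dir" ]; then
            printf '%s\n' "$dir"
        fi
    done
}

# Echo a command, and execute it only when RUN=1.
run_or_print() {
    echo "$*"
    if [ "${RUN:-0}" = "1" ]; then
        # Detach stdin so the command cannot swallow the loop's input.
        "$@" </dev/null
    fi
}

# Create one vector file and one dictionary per shard.
vectorize_shards() {
    list_shard_dirs "$1" | while IFS= read -r dir; do
        shard=$(basename "$(dirname "$dir")")
        run_or_print "$MAHOUT_HOME/bin/mahout" lucene.vector \
            --dir "$dir" --field "$FIELD" \
            --output "$OUT_DIR/shard-$shard.vec" \
            --dictOut "$OUT_DIR/shard-$shard.dict"
    done
}

vectorize_shards "$SHARD_ROOT"
```

Each shard gets its own dictionary here, so term ids will generally not line up across the per-shard vector files; they would probably need to be reconciled before everything is fed to a single Mahout job.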

Hope this helps,

Tomislav
