Lucene index generated by Elasticsearch as input to Mahout


(pitaga) #1

I'm trying to use a Lucene index generated by Elasticsearch as input to Mahout. The response to the command

mahout lucene.vector -d /var/lib/elasticsearch/elasticsearch/nodes/0/indices/testindex8/0/index -f body -o /usr/share/mahout/foo.txt -t /usr/share/mahout/bar.txt

includes the error message

Exception in thread "main" java.lang.IllegalArgumentException: A SPI class of type org.apache.lucene.codecs.PostingsFormat with name 'es090' does not exist. You need to add the corresponding JAR file supporting this SPI to your classpath.The current classpath supports the following names: [Lucene40, Lucene41]

The "body" field is specified in the mapping with "term_vector" set to "yes". I tried replacing lucene-core-4.6.1.jar in the directory under Mahout with a copy of lucene-core-4.6.1.jar from the directory under Elasticsearch, but the response to the command above did not change.

I'd appreciate any guidance on how to get around this obstacle. I'd rather not get into a customized build of either Elasticsearch or Mahout, especially if the Lucene index that Elasticsearch generates is perfectly OK for Mahout except for occurrences of literal names. I've skimmed the source of the mahout command and experimented unsuccessfully with command-line variants. My Java is rusty at this point, so I may be missing a simple classpath manipulation.
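
For context on the error: Lucene resolves names such as 'es090' through Java's service-provider (SPI) mechanism, which reads META-INF/services entries from the jars on the classpath. The following is only a minimal diagnostic sketch, assuming Python 3 is available; the function name spi_providers and the script name are made up for illustration. It lists which implementation classes each jar registers for PostingsFormat, which may help identify the jar that Mahout's classpath is missing (presumably one shipped with Elasticsearch, though that is an assumption). Note that the service file lists class names, not SPI names like 'es090'.

```python
import sys
import zipfile

# Standard Java SPI registration file for Lucene postings formats.
SERVICE = "META-INF/services/org.apache.lucene.codecs.PostingsFormat"


def spi_providers(jar_path, service=SERVICE):
    """Return the implementation class names a jar registers for `service`.

    Hypothetical helper: returns an empty list when the jar has no such entry.
    """
    with zipfile.ZipFile(jar_path) as jar:
        try:
            raw = jar.read(service)
        except KeyError:
            # The jar does not register anything for this service.
            return []
    providers = []
    for line in raw.decode("utf-8").splitlines():
        # Service files allow '#' comments and blank lines.
        line = line.split("#", 1)[0].strip()
        if line:
            providers.append(line)
    return providers


if __name__ == "__main__":
    # Each command-line argument is a path to a jar file.
    for path in sys.argv[1:]:
        for cls in spi_providers(path):
            print(f"{path}: {cls}")
```

Saved as, say, list_spi.py (a hypothetical name), it could be run against the jars in the Elasticsearch and Mahout lib directories to compare what each side registers.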

