Hi,
Is there a convenient way to index office documents, rather than
parsing them myself (using Tika or Aperture) and then feeding
elasticsearch the parsed data? If so, a reference to the Java
API would be very helpful.
I haven't tried it yet, but there is a plugin, see:
I use the plugin mentioned by Rui in our production environment and
most files work as expected. It uses Tika internally. Installation
into elasticsearch is pretty easy. Stop your elasticsearch server and
run the following command in your elasticsearch folder:
./bin/plugin install mapper-attachments
Then restart the server and index your documents (make sure the
server has enough memory available if you index large documents).
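
Since the original question asked about the Java side, here is a rough sketch of preparing a document for that plugin. As far as I know, a field mapped with type "attachment" expects the raw file bytes as a base64 string. The class name AttachmentPayload, the helper buildBody and the field name "file" are made up for illustration, and java.util.Base64 assumes Java 8 or later. It only builds the JSON body; sending it to the index is left to whichever client you already use.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class AttachmentPayload {

    // Builds a one-field JSON body whose value is the base64-encoded file content.
    // fieldName is hypothetical: it has to match a field mapped as type "attachment".
    // The field name is not JSON-escaped here, so keep it to plain letters and digits.
    static String buildBody(String fieldName, byte[] fileBytes) {
        String encoded = Base64.getEncoder().encodeToString(fileBytes);
        // The standard base64 alphabet contains no characters that need JSON escaping.
        return "{\"" + fieldName + "\":\"" + encoded + "\"}";
    }

    public static void main(String[] args) {
        // Stand-in for real file content, e.g. from Files.readAllBytes(path).
        byte[] bytes = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(buildBody("file", bytes));
    }
}
```

Base64 makes the payload roughly a third larger than the original file, which is part of why memory matters for large documents.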