Indexing office documents


(Slava G ) #1

Hi,
Is there any convenient way to index office documents rather than
parsing them myself (using Tika or Aperture) and then feeding
elasticsearch the parsed data? If so, a reference to the Java
API would be very helpful.

Thank You and Best Regards.


(Rui Lopes) #2

I haven't tried it yet, but there is a plugin, see:

http://www.elasticsearch.org/guide/reference/mapping/attachment-type.html

Best regards,
Rui Lopes


(ruflin-2) #3

I use the plugin mentioned by Rui in our production environment and
most files work as expected. It uses Tika internally. Installation
into elasticsearch is pretty easy. Stop your elasticsearch server and
run the following command in your elasticsearch folder:

./bin/plugin install mapper-attachments

Then restart the server and push your documents to the index (make
sure you have enough memory for large documents).
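
For the Java side of the question, here is a minimal sketch of what "push your documents" could look like, assuming the plugin's documented convention that a field mapped as type "attachment" receives the file's bytes as a base64 string. The class name, the type name "doc" and the field name "file" are made up for illustration, and the sketch only builds the two JSON bodies with plain JDK classes (Java 8+ for java.util.Base64); sending them to the server with whichever client you use is left out.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Base64;

// Hypothetical helper, not part of elasticsearch or the plugin.
class AttachmentSketch {

    // Builds a mapping body that declares one field of type "attachment".
    // typeName and fieldName are illustrative; pick your own.
    public static String mappingJson(String typeName, String fieldName) {
        return "{\"" + typeName + "\":{\"properties\":{\"" + fieldName
                + "\":{\"type\":\"attachment\"}}}}";
    }

    // Builds a document body whose attachment field holds the raw file
    // bytes, base64-encoded. The standard base64 alphabet needs no JSON escaping.
    public static String documentJson(String fieldName, byte[] fileBytes) {
        String encoded = Base64.getEncoder().encodeToString(fileBytes);
        return "{\"" + fieldName + "\":\"" + encoded + "\"}";
    }

    public static void main(String[] args) throws Exception {
        // Reads the file given on the command line, or falls back to a tiny sample.
        byte[] bytes = args.length > 0
                ? Files.readAllBytes(Paths.get(args[0]))
                : "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(mappingJson("doc", "file"));
        System.out.println(documentJson("file", bytes));
    }
}
```

Note that the whole file is held in memory once as bytes and again as the base64 string (roughly a third larger), which is one reason large documents need the extra memory mentioned above.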


(Slava G ) #4

Thank you for the helpful replies, I'm going to try this.

