Indexing office documents

Slava_G · June 7, 2011, 7:10am

Hi,
Is there a convenient way to index office documents without parsing
them myself (using Tika or Aperture) and then feeding the parsed data
to Elasticsearch? If so, a pointer to the relevant Java API would be
very helpful.

Thank you and best regards.

Rui_Lopes · June 7, 2011, 8:36am

I haven't tried it yet, but there is a plugin, see:

Elasticsearch Platform — Find real-time answers at scale | Elastic

Best regards,
Rui Lopes

ruflin_2 · June 7, 2011, 8:43am

I use the plugin Rui mentioned in our production environment, and
most files work as expected. It uses Tika internally. Installing it
into Elasticsearch is easy: stop your Elasticsearch server and run
the following command in your Elasticsearch folder:

./bin/plugin install mapper-attachments

Then restart the server and push your documents to the index (make
sure you have enough memory for large documents).
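
Since the original question asked about Java, here is a minimal sketch of the "push your documents" step using only the JDK. It assumes, as far as I recall from the plugin's documentation, that the plugin expects the file content as a Base64-encoded string in a field mapped with type "attachment". The class name, the field name "file", and the document URL are hypothetical placeholders, not something taken from this thread.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Base64;

class AttachmentIndexer {

    // Builds a one-field JSON body whose value is the Base64-encoded file content.
    // Assumption: the field name contains no characters that need JSON escaping.
    static String buildBody(String fieldName, byte[] content) {
        String encoded = Base64.getEncoder().encodeToString(content);
        return "{\"" + fieldName + "\":\"" + encoded + "\"}";
    }

    // Sends the body with an HTTP PUT and returns the response status code.
    static int putDocument(String url, String body) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("PUT");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/json");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body.getBytes(StandardCharsets.UTF_8));
        }
        int status = conn.getResponseCode();
        conn.disconnect();
        return status;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("usage: AttachmentIndexer <file> <document-url>");
            return;
        }
        // The whole file is read into memory, so large documents need enough heap.
        byte[] content = Files.readAllBytes(Paths.get(args[0]));
        // Hypothetical field name; it must match a field mapped as type "attachment".
        String body = buildBody("file", content);
        System.out.println("HTTP " + putDocument(args[1], body));
    }
}
```

You would run it with something like `java AttachmentIndexer report.doc http://localhost:9200/docs/doc/1`, where the index, type, and id in the URL are made up for illustration. The index mapping would also need to declare the "file" field with type "attachment" before documents are pushed.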

Slava_G · June 7, 2011, 8:09pm

Thank you for the helpful replies; I'm going to try this.
