Attachments from ECM to Elastic Search using Java API

PARAG_BHAGWAT · February 25, 2017, 10:44pm

Hello,

I have done basic indexing in Elasticsearch using the Java API. I am using Elasticsearch 5.1.1. I am really new to this, so sorry if this is a basic question, but this is what I am trying to do:

I installed the ingest-attachment plugin using the command-line tool.
We have an attachment store. For now, let's assume I can read a document as bytes, or have access to the location where the document is stored, such as c:\temp\transactionid.pdf.

The question is: how do I tie all of this together? How do I use the Java transport client and the Index API (PUT) to submit this attachment to an index named attachment_index so that it becomes searchable?

Thanks in advance.

dadoonet · February 25, 2017, 11:06pm

You first need to create a pipeline that uses the ingest-attachment processor.
Then simply index a document as usual, but with that pipeline. In this document, encode the binary content as BASE64 and put it in the field of your doc that the processor reads.

FYI, you can also look at the FSCrawler project in case it helps.
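
The encoding step can be sketched with the JDK alone. This is a minimal sketch, not a definitive implementation: the class and method names (AttachmentSource, buildSource) are hypothetical, and the field name "data" is an assumption taken from the pipeline definition posted later in this thread.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: builds the document source for the attachment pipeline.
class AttachmentSource {

    // Puts the BASE64-encoded bytes into the "data" field.
    // "data" is an assumption: it must match the "field" configured
    // on the attachment processor of the pipeline.
    static Map<String, Object> buildSource(byte[] content) {
        Map<String, Object> source = new HashMap<>();
        source.put("data", Base64.getEncoder().encodeToString(content));
        return source;
    }

    // Convenience overload for a file on disk, e.g. c:\temp\transactionid.pdf.
    static Map<String, Object> buildSource(Path file) throws IOException {
        return buildSource(Files.readAllBytes(file));
    }
}
```

The returned map can be used as the source of the index request; the transport client's setSource has an overload that accepts a Map.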

PARAG_BHAGWAT · February 25, 2017, 11:58pm

Thanks David. I will take a look at FSCrawler. Where I am getting stuck is calling the pipeline through Java. Here's what I have done so far.

I created a pipeline, as given in the documentation, through the Kibana Dev Tools console.

PUT _ingest/pipeline/ecm_attachment
{
  "description" : "Extract attachment information",
  "processors" : [
    {
      "attachment" : {
        "field" : "data"
      }
    }
  ]
}

Now I want to use this pipeline in my Java client during my client.prepareIndex call. Is there a way to specify which pipeline to use when sending the data? I will be using the Apache Commons Codec Base64 encoder to encode the file.

https://commons.apache.org/proper/commons-codec/apidocs/org/apache/commons/codec/binary/Base64.html#encodeBase64(byte[])

Thanks,
Parag Bhagwat

dadoonet · February 26, 2017, 12:13am

Off the top of my head, it should be something like:

client.prepareIndex(...).setSource(json).setPipeline("ecm_attachment").get()
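
For comparison, the pipeline named in the call above corresponds to the pipeline query parameter on the REST index API. The following is a rough sketch using only the JDK rather than the transport client. The class and method names, the base URL http://localhost:9200, the type name doc, and the document id are all assumptions for illustration.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: indexes one JSON document through an ingest pipeline over REST.
class RestPipelineIndexer {

    // Builds {base}/{index}/{type}/{id}?pipeline={pipeline}.
    // Assumes every part is already URL-safe (no spaces or reserved characters).
    static String indexUrl(String base, String index, String type, String id, String pipeline) {
        return base + "/" + index + "/" + type + "/" + id + "?pipeline=" + pipeline;
    }

    // Sends the JSON body with an HTTP PUT and returns the HTTP status code.
    static int index(String url, String json) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("PUT");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/json");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(json.getBytes(StandardCharsets.UTF_8));
        }
        int status = conn.getResponseCode();
        conn.disconnect();
        return status;
    }
}
```

With the names used in this thread, indexUrl("http://localhost:9200", "attachment_index", "doc", "1", "ecm_attachment") gives the URL to PUT a body such as {"data": "<BASE64 content>"} to.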

system · March 26, 2017, 12:14am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
