Attachments from ECM to Elasticsearch using Java API

Hello,

I have done basic indexing in Elasticsearch using the Java API. I am using Elasticsearch 5.1.1. I am really new to this, so sorry if this is a basic question, but this is what I am trying to do:

  1. I installed the ingest-attachment plugin using the command-line tool.
  2. We have an attachment store. For now, let's assume I can read a document as bytes or have access to the location where the document is stored, like c:\temp\transactionid.pdf.

The question now is: how do I tie all of this together? How do I use the Java transport client and the Index API (PUT) to submit this attachment to an index named attachment_index so it becomes searchable?

Thanks in advance.

Thanks,

You first need to create a pipeline which uses the ingest-attachment processor.
Then simply index a document as usual, but with that pipeline. In this document, encode the binary content as BASE64 and add it to a field of your doc.

FYI you can also look at the FSCrawler project in case it can help.

Thanks David. I will take a look at FSCrawler. Where I am getting stuck is calling the pipeline through Java. Here's what I have done so far:

I created a pipeline as given in the documentation through the Kibana Dev Tools console.

PUT _ingest/pipeline/ecm_attachment
{
  "description" : "Extract attachment information",
  "processors" : [
    {
      "attachment" : {
        "field" : "data"
      }
    }
  ]
}

Now I want to use this pipeline in my Java client during my client.prepareIndex call. Is there a way to specify which pipeline to use when sending the data? I will be using the Apache Commons BASE64 encoder to encode the file.

https://commons.apache.org/proper/commons-codec/apidocs/org/apache/commons/codec/binary/Base64.html#encodeBase64(byte[])
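For reference, the encoding step could look roughly like the sketch below. It is only an illustration under a few assumptions: it uses the JDK's own java.util.Base64 (Java 8+) rather than the Apache Commons encoder linked above, the class and method names (AttachmentSource, fromBytes, fromFile) are made up for this example, and the field name "data" is taken from the ecm_attachment pipeline definition. The resulting map is the kind of document source that would then be sent with the index request.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: builds the document source for the attachment pipeline.
class AttachmentSource {

    // Puts the Base64-encoded bytes into the "data" field, which is the
    // field the ecm_attachment pipeline's attachment processor reads.
    static Map<String, Object> fromBytes(byte[] content) {
        Map<String, Object> source = new HashMap<>();
        source.put("data", Base64.getEncoder().encodeToString(content));
        return source;
    }

    // Convenience variant for a document stored on disk.
    static Map<String, Object> fromFile(Path file) throws IOException {
        return fromBytes(Files.readAllBytes(file));
    }

    public static void main(String[] args) throws IOException {
        // The default path is the example location from the question.
        Path file = Paths.get(args.length > 0 ? args[0] : "c:\\temp\\transactionid.pdf");
        Map<String, Object> source = fromFile(file);
        System.out.println(((String) source.get("data")).length() + " Base64 characters");
    }
}
```

Reading the whole file into memory is fine for small documents; very large attachments would probably need a different approach.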

Thanks,
Parag Bhagwat

Off the top of my head, it should be something like:

client.prepareIndex(...).setSource(json).setPipeline("ecm_attachment")
