What is the curl command to convert a PDF into Base64 format?

I want to convert my PDF to Base64.
I am using the code below, but it is giving me an error:

curl -XPOST "http://localhost:9200/test/xmlfile?pretty=1" -d '
{
"attachment" : "' base64 /path/filename | perl -pe 's/\n/\\n/g' '"
}'

Hi,

NOTE: Please say "Hi / Hello", "Thank you" in your message to optimize your chance of response.

What do you want to do exactly? The code you have given seems to be a Linux command. You
can run it in a shell, but it's not a JSON command, is it?

bye
Xavier

You need to transform your binary file to Base64 before sending it to Elasticsearch. This has to be done before calling Elasticsearch.

You can do that with Linux commands like base64, or by writing some code.
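
For example, assuming the ingest-attachment plugin is installed, a rough sketch of the whole flow with base64 and curl could look like this (the pipeline name attachment, the index name test, the document id and the file path are only placeholders, adjust them to your setup):

# 1) Create an ingest pipeline that uses the attachment processor
# (pipeline and index names below are placeholders)
curl -XPUT "http://localhost:9200/_ingest/pipeline/attachment" -H 'Content-Type: application/json' -d '
{
  "description" : "Extract text from Base64-encoded documents",
  "processors" : [
    { "attachment" : { "field" : "data" } }
  ]
}'

# 2) Base64-encode the PDF (stripping the newlines base64 adds) and index it through that pipeline
curl -XPUT "http://localhost:9200/test/_doc/1?pipeline=attachment" -H 'Content-Type: application/json' -d '
{
  "data" : "'"$(base64 /path/to/file.pdf | tr -d '\n')"'"
}'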

You can also have a look at the FSCrawler project. It has an upload endpoint where you can directly upload your binary document to Elasticsearch. See https://fscrawler.readthedocs.io/en/latest/admin/fs/rest.html#uploading-a-binary-document

Hi,

Thanks for your reply!!!!:slightly_smiling_face:
I have written JavaScript code to transform a .pdf file to Base64, and I am getting the value for the data field that needs to be passed. But can I pass more than one document to Elasticsearch? Currently I am indexing only one PDF document, and I want to index more than one. How can I do that using the code below?

PUT my_index/_doc/my_id?pipeline=attachment
{
"data": "e1xydGYxXGFuc2kNCkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0DQpccGFyIH0="
}

Thanks,
Priyanka

Why do you want to index documents together and not individually? Are they related?

To answer your question, you can define multiple attachment processors within the same pipeline.
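
For example, a pipeline with two attachment processors, each reading its own field, could look roughly like this (the pipeline, field and target field names are only examples):

PUT _ingest/pipeline/attachments
{
  "description" : "Extract text from two Base64-encoded documents",
  "processors" : [
    { "attachment" : { "field" : "data1", "target_field" : "attachment1" } },
    { "attachment" : { "field" : "data2", "target_field" : "attachment2" } }
  ]
}

PUT my_index/_doc/my_id?pipeline=attachments
{
  "data1" : "BASE64-of-first-pdf",
  "data2" : "BASE64-of-second-pdf"
}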

Hi,

Thanks for your reply!!!
Yes, I want to index the documents together, because it is our business requirement.
After creating the pipeline and passing the data value to a new index, when you create an index pattern and open Discover, you can see the one PDF file that was indexed. I want more indexed PDF records under one index pattern, and I want to search through them.

Thanks,
Priyanka

Hello @dadoonet,

Thanks for your help!!!
As per your reply, I tried multiple attachment processors within the same pipeline. It is indexing the documents together, but when I create an index pattern and open Discover, it gives me one single record even though I have indexed 3 documents through one pipeline. If I have indexed 3 documents, I should be getting 3 different records. Correct me if I am wrong.

Thanks,
Priyanka

So when you search, you won't get back one document but an array of documents? Meaning that the user will have to guess in which document the text was found.

Is that what you really want?

Hello,

Yes, like a Google search: if a user searches for any word from an attachment file, it should tell them in which document the text was found.

Thanks,
Priyanka

This won't be possible if you index an array of attachments. You need to index attachments individually.

Hi @dadoonet ,

Thanks for your reply!!!!

If I index attachments individually, I have to create a new index every time. I want all the indexed attachments in one index only, so that I can see all the documents as separate records and search through them.

Thanks,
Priyanka

No. All documents will go to the same index.

Hi @dadoonet,

Thanks for reply!!!
Could you please suggest how I can index multiple documents into the same index, since I cannot use multiple attachment processors?

Thanks,
Priyanka

Like this:

PUT my_index/_doc/1?pipeline=attachment
{
  "data": "BASE64-doc1"
}
PUT my_index/_doc/2?pipeline=attachment
{
  "data": "BASE64-doc2"
}
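
And if you have a whole folder of PDFs, a small shell loop around curl does the same thing, one document per file, all in the same index (the folder, index name and pipeline name are only placeholders):

# Index every PDF in the folder as its own document, all into my_index
i=1
for f in /path/to/pdfs/*.pdf; do
  curl -XPUT "http://localhost:9200/my_index/_doc/$i?pipeline=attachment" \
       -H 'Content-Type: application/json' \
       -d '{ "data" : "'"$(base64 "$f" | tr -d '\n')"'" }'
  i=$((i+1))
done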

Hi @dadoonet,

Thanks for your quick help!!!!! :slight_smile:

This solves my problem.

Thanks,
Priyanka
