What is the curl command to convert a PDF into Base64 format?


(Priyanka Suresh Yerunkar) #1

I want to convert my PDF to Base64.
I am using the code below, but it is giving me an error:

curl -XPOST "http://localhost:9200/test/xmlfile?pretty=1" -d '
{
"attachment" : "' base64 /path/filename | perl -pe 's/\n/\\n/g' '"
}'


(Xavier Facq) #2

Hi,


What do you want to do exactly? The code you have given seems to be a Linux shell command. You
can run it in a shell, but it is not a JSON request, is it?

bye
Xavier


(David Pilato) #4

You need to transform your file's binary content to Base64 before sending it to Elasticsearch.
This has to be done before calling Elasticsearch.

You can do that using a Linux command like base64, or by writing some code in your language of choice.

You can also have a look at FSCrawler project. It has an upload endpoint where you can directly upload your binary document to elasticsearch. See https://fscrawler.readthedocs.io/en/latest/admin/fs/rest.html#uploading-a-binary-document
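To sketch the Linux-command approach: assuming a cluster on localhost:9200 with an ingest pipeline named `attachment` already defined (the index name `my_index` is illustrative, and the demo file just stands in for a real PDF), the encoding and indexing could look like this:

```shell
# Demo file standing in for your PDF (any binary works the same way)
printf 'dummy pdf bytes' > /tmp/sample.pdf

# Encode as a single-line Base64 string
# (tr -d '\n' strips the line wrapping that base64 adds by default)
B64=$(base64 < /tmp/sample.pdf | tr -d '\n')
echo "$B64"   # → ZHVtbXkgcGRmIGJ5dGVz

# Send it through the ingest pipeline; the attachment processor decodes
# the "data" field and extracts the text. Requires a running cluster,
# so it is commented out here:
# curl -XPUT "http://localhost:9200/my_index/_doc/1?pipeline=attachment" \
#   -H 'Content-Type: application/json' \
#   -d "{\"data\": \"$B64\"}"
```

The key point is that the Base64 encoding happens in the shell, before the JSON body is built, rather than inside the JSON itself.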


(Priyanka Suresh Yerunkar) #5

Hi,

Thanks for your reply!!!! :slightly_smiling_face:
I have written JavaScript code to transform a .pdf file to Base64, and I am getting the value for the data field that needs to be passed. But can I pass more than one document to ES? Currently I am indexing only one PDF document, and I want to index more than one. How can I do that using the code below?

PUT my_index/_doc/my_id?pipeline=attachment
{
"data": "e1xydGYxXGFuc2kNCkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0DQpccGFyIH0="
}

Thanks,
Priyanka


(David Pilato) #6

Why do you want to index documents together and not individually? Are they related?

To answer your question, you can define multiple attachment processors within the same pipeline.
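To illustrate, such a pipeline could look like the sketch below (the pipeline name multi_attachment and the field names data1/data2 are made up for the example; each attachment processor decodes its own Base64 field into its own target_field):

```
PUT _ingest/pipeline/multi_attachment
{
  "description": "Extract text from two Base64-encoded attachments in one document",
  "processors": [
    { "attachment": { "field": "data1", "target_field": "attachment1" } },
    { "attachment": { "field": "data2", "target_field": "attachment2" } }
  ]
}
```

A document sent through this pipeline would then carry both data1 and data2 fields, and both extracted texts would end up inside that one document.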


(Priyanka Suresh Yerunkar) #7

Hi,

Thanks for your reply!!!
Yes, I want to index the documents together, because it is a business requirement.
After creating the pipeline and passing the data value to a new index, when I create an index pattern and open Discover, I can see the one PDF file that was indexed. I want more indexed PDF records under one index pattern, and I want to search through them.

Thanks,
Priyanka


(Priyanka Suresh Yerunkar) #8

Hello @dadoonet,

Thanks for your help!!!
As per your reply, I tried multiple attachment processors within the same pipeline. It indexes the documents together, but when I create an index pattern and open Discover, I get one single record even though I indexed 3 documents through one pipeline. If I index 3 documents, I should get 3 different records. Correct me if I am wrong.

Thanks,
Priyanka


(David Pilato) #9

So when you search, you won't get back individual documents but a single record containing an array of attachments? Meaning that the user will have to guess in which attachment the text has been found.

Is that what you really want?


(Priyanka Suresh Yerunkar) #10

Hello,

Yes, like a Google search: if the user searches for any word from an attachment, it should show in which document the text has been found.

Thanks,
Priyanka


(David Pilato) #11

This won't be possible if you index an array of attachments. You need to index attachments individually.


(Priyanka Suresh Yerunkar) #12

Hi @dadoonet ,

Thanks for your reply!!!!

If I index attachments individually, I have to create a new index every time. I want all the indexed attachments in one index only, so that I can see each document as a separate record and search through them.

Thanks,
Priyanka


(David Pilato) #13

No. All documents will go to the same index.


(Priyanka Suresh Yerunkar) #14

Hi @dadoonet,

Thanks for reply!!!
Could you please suggest how I can index multiple documents into the same index, given that I cannot use multiple attachment processors?

Thanks,
Priyanka


(David Pilato) #15

Like this:

PUT my_index/_doc/1?pipeline=attachment
{
  "data": "BASE64-doc1"
}
PUT my_index/_doc/2?pipeline=attachment
{
  "data": "BASE64-doc2"
}
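The same pattern can be scripted over a whole folder of PDFs, e.g. with a small shell loop. This is only a sketch: the demo files stand in for real PDFs, the index name my_index and the `attachment` pipeline are assumed to exist, and the curl call is commented out so the snippet runs without a cluster:

```shell
# Demo files standing in for real PDFs
mkdir -p /tmp/pdfs
printf 'first'  > /tmp/pdfs/a.pdf
printf 'second' > /tmp/pdfs/b.pdf

# Index each file as its own document, all in the same index
i=0
for f in /tmp/pdfs/*.pdf; do
  i=$((i + 1))
  b64=$(base64 < "$f" | tr -d '\n')
  echo "doc $i: $f"
  # Requires a running cluster; uncomment to actually index:
  # curl -XPUT "http://localhost:9200/my_index/_doc/$i?pipeline=attachment" \
  #   -H 'Content-Type: application/json' \
  #   -d "{\"data\": \"$b64\"}"
done
```

Each iteration creates one document with its own `_id`, so every PDF shows up as a separate record under the same index pattern in Discover.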

(Priyanka Suresh Yerunkar) #16

Hi @dadoonet,

Thanks for your quick help!!!!! :slight_smile:

This solves my problem.

Thanks,
Priyanka