How to create tokens from the text data

Hi guys, I have converted a PDF into base64 format, and now I want to convert the text data into tokens. Can you please guide me?

What is the use case?

I have to extract data from a resume.

What do you want to do with the extracted data?

This is a small task assigned to me. The extracted data will later be used for analysis by another team.

What kind of analysis?

Anyway, you can use the _analyze API and use its output. With that you might be able to do the rest.

Yes, I used _analyze. It takes these two parameters: "tokenizer": "" and "text": "".
But my approach is different:

  1. Convert the PDF into base64 to read the text data present in it.
  2. Filter out the stop words and convert the text data into tokens.

Yes. That's what I wrote. Use the _analyze API to do step 2.
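
In case it helps, here is a minimal sketch of step 2 in Python. It assumes the text has already been extracted from the PDF and that an unsecured Elasticsearch node is reachable at http://localhost:9200 (an assumption; adjust the URL and add authentication for your cluster). The helper names build_analyze_body and analyze are made up for illustration. The request uses the built-in "standard" tokenizer with the "lowercase" and "stop" token filters, so English stop words are dropped before the tokens come back.

```python
import json
import urllib.request


def build_analyze_body(text):
    """Build an _analyze request body (hypothetical helper).

    The "stop" filter removes English stop words by default. It is
    case-sensitive by default, so "lowercase" runs first.
    """
    return {
        "tokenizer": "standard",
        "filter": ["lowercase", "stop"],
        "text": text,
    }


def analyze(text, base_url="http://localhost:9200"):
    """POST the body to /_analyze and return the token strings.

    base_url is an assumption; point it at your own cluster.
    """
    request = urllib.request.Request(
        base_url + "/_analyze",
        data=json.dumps(build_analyze_body(text)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        result = json.load(response)
    # Each entry in "tokens" also carries offsets and position;
    # only the token text is kept here.
    return [entry["token"] for entry in result["tokens"]]


if __name__ == "__main__":
    # Requires a running Elasticsearch node at base_url.
    print(analyze("Worked as a senior developer at the company"))
```

If you need a different stop word list, the "stop" filter can be defined inline in the "filter" array with its own "stopwords" setting instead of being referenced by name.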

Okay, thank you.
