Index PDF in Elastic App Search

JorgeL-TI · September 13, 2020, 9:28pm

Hello,

I am using Elastic App Search, and I want to index some PDF documents along with metadata such as title, author, etc.

Is there any way to index them directly? (I know about FSCrawler for Elasticsearch.)

My second option is to write a Ruby program using gems that transform the PDF data into a JSON document.

I am thinking of using Apache Tika, but I do not know if there are better options for doing it.

Thank you for your help.
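
A minimal sketch of the second option described above: turning already-extracted PDF metadata and text into a JSON document ready for indexing. The extraction step itself (Tika or a gem) is left out. The metadata keys `dc:title` and `dc:creator` are assumed to be what the extractor returns, and the `id`, `title`, `author`, and `body` field names are hypothetical choices, kept lowercase on the assumption that the search engine expects simple lowercase field names.

```ruby
require "json"

# Hypothetical mapping from extractor metadata keys to document field names.
# Adjust the left-hand keys to whatever your extractor actually emits.
FIELD_MAP = {
  "dc:title"   => "title",
  "dc:creator" => "author"
}.freeze

# Build one document hash from extracted metadata plus the extracted text.
# Metadata keys not listed in FIELD_MAP are dropped.
def build_document(id, metadata, text)
  doc = { "id" => id, "body" => text.strip }
  FIELD_MAP.each do |source_key, field|
    value = metadata[source_key]
    doc[field] = value unless value.nil? || value.to_s.empty?
  end
  doc
end

# Example metadata, shaped like what an extractor might return (made up here).
metadata = {
  "dc:title"     => "Frogs of the World",
  "dc:creator"   => "A. Writer",
  "Content-Type" => "application/pdf"
}

doc = build_document("frogs-001", metadata, "  Frogs are amphibians...  ")
puts JSON.pretty_generate(doc)
```

Keeping the mapping in one constant means a new metadata field only needs one extra line, and the resulting hash can be sent to whichever indexing API is chosen later.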

dadoonet · September 14, 2020, 6:07am

There's a PR coming to connect to Workplace Search. See

It should not be super hard to then implement a similar thing for App Search.

But do you really want App Search, or Workplace Search?

JorgeL-TI · September 16, 2020, 3:04pm

My idea is to index some PDFs and be able to find one using a label that describes its topic.

For example, take a library of science books: if I am interested in learning about frogs, I just want to look for documents with that label.

So the idea is a search over this library that looks for documents that have a specific label.

Thank you very much.
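
The label lookup described above can be sketched as a filtered search request. This only builds the request body; it assumes the documents were indexed with a field named `labels` (a hypothetical name) and that the body is POSTed to the engine's App Search search endpoint, where, as far as I understand the API, `filters` restricts results to documents whose field matches one of the listed values.

```ruby
require "json"

# Build a search request body that restricts results to documents carrying
# the given label. "labels" is a hypothetical field name; an empty query
# string is used so that the filter alone decides what comes back.
def label_search_body(label, query: "", label_field: "labels")
  {
    "query"   => query,
    "filters" => { label_field => [label] }
  }
end

puts JSON.generate(label_search_body("frogs"))
# prints {"query":"","filters":{"labels":["frogs"]}}
```

Passing a non-empty `query:` would combine a normal full-text search with the same label restriction.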

dadoonet · September 16, 2020, 6:32pm

I feel Workplace Search fits this use case more naturally.

JorgeL-TI · September 19, 2020, 2:33pm

Yes, I think so, but Workplace Search needs a monthly subscription, which is why I chose Elastic App Search.
I see your point and yes, I think Workplace Search could be more useful in this case.

stephenb · September 19, 2020, 3:27pm

Some Workplace Search capabilities are included in the Basic / Free license.

JorgeL-TI · September 22, 2020, 12:04pm

Hello,
I am trying to install Workplace Search and it shows me the following message: "Workplace Search requires Platinum features of the Elastic Stack. Starting a trial enables the full product functionality for 30 days. Learn more about Elastic Stack licenses."
So I understand it is not included.
I am installing it on Ubuntu 20.04, and I have version 7.8.1.

Thank you so much

stephenb · September 23, 2020, 2:15am

How did you install the stack and Enterprise Search and where does that message come from?

JorgeL-TI · September 29, 2020, 5:59pm

I have installed it locally on Ubuntu 20.04.
I run Elasticsearch, and I run Enterprise Search.
When I log in to Enterprise Search it shows me the two options:

That is a link to a screenshot of my screen.

Thank you very much

stephenb · September 29, 2020, 6:56pm

Version 7.8.1... hmm. Any chance you can try a clean install with 7.9.2?

JorgeL-TI · September 29, 2020, 7:12pm

I will try that.

Thank you, and I will write up my experience with it then, hehe.

JorgeL-TI · October 1, 2020, 3:55pm

Hello,

I figured out how to run Workplace Search on Ubuntu, so now I want to index some PDFs. If I use FSCrawler, I will need a Windows machine, so I would need to reach my Ubuntu machine from outside it.

My second option is to use another source, like Google Drive or OneDrive.

Thank you very much

dadoonet · October 1, 2020, 5:04pm

Why this?

JorgeL-TI · October 2, 2020, 3:02pm

Because as far as I know, FSCrawler doesn't work on Linux.

I was able to run it on Windows.

What I saw yesterday was workplace-search-ruby-master, which I can use to index some documents, but from what I see I can't index PDFs directly; I need to parse them into JSON first.

The best solution for my project would be to index PDFs directly and create a "label" field. I don't know if that is possible.

Thank you very much
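
A rough sketch of the "parse to JSON first, then add a label" approach mentioned above. It assumes the PDFs have already been converted to hashes with `id`, `title`, and `body` keys (hypothetical names), adds a `label` field to each, and groups them into batches. The batch size of 100 is an assumption about the per-request limit of the indexing API and should be checked against its documentation. Sending the batches with a client library is left out.

```ruby
require "json"

# Assumed per-request document limit; verify against the indexing API docs.
BATCH_SIZE = 100

# Attach a label to already-extracted PDF records and group them into
# batches small enough to send one request per batch.
def labeled_batches(records, label, batch_size: BATCH_SIZE)
  docs = records.map { |record| record.merge("label" => label) }
  docs.each_slice(batch_size).to_a
end

# Made-up records standing in for parsed PDFs.
records = (1..250).map do |i|
  { "id" => "pdf-#{i}", "title" => "Document #{i}", "body" => "extracted text" }
end

batches = labeled_batches(records, "frogs")
puts batches.map(&:size).inspect
# prints [100, 100, 50]
```

Each batch is a plain array of hashes, so it can be serialized with `JSON.generate` or handed to a client as-is.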

dadoonet · October 2, 2020, 3:26pm

Source?

Sorry but I think this is wrong.

JorgeL-TI · October 2, 2020, 3:37pm

OK, I will try to use it and tell you what I get.

Thank you very much

system · October 30, 2020, 3:37pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
