Hello,
I am using Workplace Search, and now I want to create a custom source.
Let me explain my project.
It consists of searching documents (PDFs) that need to be labelled with their topic.
So I tried to download this program, GitHub - elastic/workplace-search-ruby: Elastic Workplace Search Official Ruby Client, but I do not know how it works.
On the other hand, I tried to use FSCrawler to ingest the PDF documents, but I am not able to do that either.
Which of these two options is better?
I am using Ubuntu 20.04.1 and Elastic version 7.9.1.
For FSCrawler, remember that it is not an Elastic product, so this forum probably isn't the right place to seek support for it. However, I can tell you that Workplace Search support in FSCrawler is still unmerged. You can follow its progress here: https://github.com/dadoonet/fscrawler/issues/723, and I am sure that the owner of the project would be happy to assist you with any issues you file.
The Ruby Client, which you've linked there, has instructions for usage in the README. For ingesting documents, you'd be particularly interested in this section. Is there a particular piece that you don't understand how to use?
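If it helps, here is a minimal sketch of the indexing flow as I read the README. The token, endpoint, content source key, and the topic_label field are placeholders for your own values, and the exact configuration calls may vary by gem version, so do check the README against the version you install:

require 'elastic/workplace-search'   # gem 'elastic-workplace-search'

# Configuration: both values come from your custom source in Workplace Search (placeholders here)
Elastic::WorkplaceSearch.access_token = 'your-access-token'
Elastic::WorkplaceSearch.endpoint = 'http://localhost:3002/api/ws/v1'

client = Elastic::WorkplaceSearch::Client.new

# One document per PDF; 'topic_label' is a hypothetical custom field carrying your topic
documents = [
  {
    'id'          => 'doc-1',
    'title'       => 'First PDF',
    'body'        => 'Text extracted from the PDF',
    'url'         => 'http://example.com/first.pdf',
    'topic_label' => 'invoices'
  }
]

client.index_documents('your-content-source-key', documents)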
I have just indexed some PDFs on Linux, and now I will configure FSCrawler in Kibana, just to learn how to do it.
I had some problems (which I have just solved) writing the path in the YAML file.
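In case it helps anyone else, the part of ~/.fscrawler/resum/_settings.yaml I had trouble with now looks roughly like this (the job name and path are mine):

name: "resum"
fs:
  url: "/home/jorge/Escritorio/FSCRAWLER/FSCrawler/Ficheros"
  update_rate: "1m"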
Hi David,
I have just installed IntelliJ and I am trying to clone the project (fscrawler-master), but I do not know how.
jorge@ubuntu:~/Escritorio/Scripts$ git clone git@github.com:dadoonet/fscrawler.git
Cloning into 'fscrawler'...
Warning: Permanently added the RSA host key for IP address '*******' to the list of known hosts.
git@github.com: Permission denied (publickey).
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.
So I cannot access the repository.
It seems that I do not have a public key set up.
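Switching to the HTTPS URL should avoid the SSH key requirement, so I will try cloning like this:

git clone https://github.com/dadoonet/fscrawler.git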
[ERROR] No plugin found for prefix 'docker-compose' in the current project and in the plugin groups [org.apache.maven.plugins, org.codehaus.mojo] available from the repositories [local (/root/.m2/repository), central (https://repo.maven.apache.org/maven2)] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/NoPluginFoundForPrefixException
But it still works with Elasticsearch, just not with Workplace Search.
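From the "No plugin found for prefix" message above, I guess docker-compose cannot be invoked as a Maven goal; building the project with the plain Maven lifecycle and then running docker-compose separately seems to be the intended flow, something like:

mvn clean install -DskipTests
cd contrib/docker-compose-example
docker-compose up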
Secondly, I tried to use the program that you uploaded, but I am still having a problem with docker-compose.
It shows this error:
jorge@ubuntu:~/Escritorio/Proyectos/fscrawler/contrib/docker-compose-example$ docker-compose up
ERROR: Named volume "path_to_files_to_scan:/usr/app/data:ro" is used in service "fscrawler" but no declaration was found in the volumes section.
I read the file, but I am not sure whether I have to change any of the data in it.
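Reading it again, I think path_to_files_to_scan is just a placeholder: once I replace it with an absolute path on my machine, compose should treat the entry as a bind mount instead of a named volume. Something like this (the host path is mine):

services:
  fscrawler:
    volumes:
      - /home/jorge/Escritorio/FSCRAWLER/FSCrawler/Ficheros:/usr/app/data:ro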
Thank you very much again; I am finding great help here.
But I understand that to make FSCrawler work with Workplace Search I have to download and build the project, and that is where I have the problem with docker-compose.
jorge@ubuntu:~/Escritorio/FSCRAWLER/FSCrawler Workplace/fscrawler-es7-2.7-SNAPSHOT/bin$ ./fscrawler /home/jorge/Escritorio/FSCRAWLER/FSCrawler/resum
17:39:49,299 INFO [f.p.e.c.f.c.BootstrapChecks] Memory [Free/Total=Percent]: HEAP [129.8mb/2.8gb=4.44%], RAM [4.4gb/11.7gb=37.55%], Swap [1.9gb/1.9gb=100.0%].
17:39:49,718 INFO [f.p.e.c.f.c.FsCrawlerCli] Workplace Search integration is an experimental feature. As is it is not fully implemented and settings might change in the future.
17:39:49,719 WARN [f.p.e.c.f.c.FsCrawlerCli] Workplace Search integration does not support yet watching a directory. It will be able to run only once and exit. We manually force from --loop -1 to --loop 1. If you want to remove this message next time, please start FSCrawler with --loop 1
17:39:49,740 INFO [f.p.e.c.f.FsCrawlerImpl] Starting FS crawler
17:39:50,471 INFO [f.p.e.c.f.c.v.ElasticsearchClientV7] Elasticsearch Client for version 7.x connected to a node running version 7.9.2
17:39:50,872 INFO [f.p.e.c.f.c.v.ElasticsearchClientV7] Elasticsearch Client for version 7.x connected to a node running version 7.9.2
17:39:50,891 INFO [f.p.e.c.f.FsParserAbstract] FS crawler started for [resum] for [/home/jorge/Escritorio/FSCRAWLER/FSCrawler/Ficheros] every [1m]
17:39:51,015 INFO [f.p.e.c.f.FsParserAbstract] FS crawler is stopping after 1 run
17:39:51,096 INFO [f.p.e.c.f.FsCrawlerImpl] FS crawler [resum] stopped
So, because the program is not able to upload a whole folder, I am going to try with a single file.
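Also, to get rid of the WARN message next time, the log says to start FSCrawler with --loop 1 explicitly:

./fscrawler resum --loop 1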