Create a custom source in Elastic Workplace Search

I am using Workplace Search, and I now want to create a customised source.
Let me explain my project.
It consists of a search over documents (PDFs) that need to be labelled with their topic.
So I tried to download this program, but I do not know how it works.
On the other hand, I tried to use FSCrawler to ingest the PDF documents, but I was not able to do that either.

Which of these two options is better?
I am using Ubuntu 20.04.1 and Elastic version 7.9.1

Thank you very much

Hi @JorgeL-TI,

Sounds like a cool project.

For FSCrawler, remember that it is not an Elastic product, so this forum probably isn't the right place to seek support for it. However, I can tell you that Workplace Search support in FSCrawler is still unmerged. You can follow its progress here, and I am sure that the owner of the project would be happy to assist you with any issues you file.

The Ruby Client, which you've linked there, has instructions for usage in the README. For ingesting documents, you'd be particularly interested in this section. Is there a particular piece that you don't understand how to use?
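As a side note, the custom source documents API that the Ruby client wraps is plain HTTP, so it can also be exercised directly with curl. The following is a hedged sketch only: the `localhost:3002` address, the source key, and the document fields are all assumptions to be replaced with the values from your own custom source.

```shell
# Hedged sketch: index documents into a Workplace Search custom source over
# HTTP. HOST and SOURCE_KEY below are placeholders, not real values.
HOST="http://localhost:3002"
SOURCE_KEY="your_content_source_key"
ENDPOINT="$HOST/api/ws/v1/sources/$SOURCE_KEY/documents/bulk_create"

# Dry run: print the request instead of sending it, since no server is
# assumed to be running here.
echo "curl -X POST $ENDPOINT" \
     "-H 'Authorization: Bearer \$ACCESS_TOKEN'" \
     "-H 'Content-Type: application/json'" \
     "-d '[{\"id\":\"doc-1\",\"title\":\"My PDF\",\"body\":\"extracted text\"}]'"
```

The Authorization header carries the custom source access token, and the source key in the URL identifies which custom source receives the documents.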


On the other hand, I tried to use FSCrawler to ingest the PDF documents, but I was not able to do that either.

I'll be happy to help. Could you tell us more about what exactly you did?



I have just indexed some PDFs on Linux, and now I am going to configure FSCrawler in Kibana just to learn how to do it.
I had some problems (which I have just resolved) writing the path in the YAML file.
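For reference, a minimal sketch of an FSCrawler job settings file, with the path quoted so YAML parses it cleanly. The job name `resumes` and the paths below are examples based on this thread, not authoritative values; in FSCrawler 2.7 the file lives at `~/.fscrawler/<job_name>/_settings.yaml`.

```yaml
# ~/.fscrawler/resumes/_settings.yaml (hedged sketch -- values are examples)
name: "resumes"
fs:
  # A plain absolute path with single slashes; quoting avoids YAML
  # surprises with special characters.
  url: "/home/jorge/Escritorio/FSCRAWLER/Ficheros"
  update_rate: "1m"
elasticsearch:
  nodes:
  - url: "https://localhost:9200"
  username: "elastic"
  password: "changeme"
```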


Could you share your configuration, logs, etc.?



I am sharing my screenshot:

My difficulty here was configuring the YAML, because you have to erase the path.
But it is working with Elasticsearch, thank you very much.


I don't understand. Is it working?


Yes. Now I have to try it with Workplace Search.
Sorry if I did not explain myself properly.

Thank you

Is there anything I can help with?

Not right now, but I will keep in touch. Thank you very much,
and thanks again for solving my questions.

Hi David,
I have just installed IntelliJ and I am trying to clone the project (fscrawler-master), but I do not know how.

jorge@ubuntu:~/Escritorio/Scripts$ git clone
Cloning into 'fscrawler'...
Warning: Permanently added the RSA host key for IP address '*******' to the list of known hosts.
Permission denied (publickey).
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and that the repository exists.

So I cannot access it.
It seems that I do not have a public key set up.
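`Permission denied (publickey)` usually means the clone went over the SSH remote without an SSH key registered on GitHub. A hedged workaround sketch: use the HTTPS form of the URL instead, which needs no key for cloning public repositories. The repository URL below is an assumption based on the project name.

```shell
# Hedged sketch: convert a GitHub SSH remote URL to its HTTPS form, which
# does not require a registered public key for public repositories.
ssh_url="git@github.com:dadoonet/fscrawler.git"   # assumed repository URL
https_url="$(echo "$ssh_url" | sed -e 's#^git@github.com:#https://github.com/#')"
echo "$https_url"
# -> https://github.com/dadoonet/fscrawler.git
```

Then `git clone` that HTTPS URL, or point IntelliJ's "Get from VCS" dialog at it.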

Thank you very much

That's a problem with git and GitHub. I'm afraid I can't really help.

Out of curiosity why do you want to clone it?


What I wanted was to download it, and I have just done that.
I am running the different tests from this site:

I ran the Elasticsearch test and it worked:

mvn verify -pl fr.pilato.elasticsearch.crawler:fscrawler-it-v7 \
    -Dtests.cluster.user=elastic \
    -Dtests.cluster.pass=changeme \
    -Dtests.cluster.url=

and now I am doing the same with workplace search:

sudo mvn docker-compose:up waitfor:waitfor -pl fr.pilato.elasticsearch.crawler:fscrawler-it-v7

but it shows an error:

[ERROR] No plugin found for prefix 'docker-compose' in the current project and in the plugin groups [org.apache.maven.plugins, org.codehaus.mojo] available from the repositories [local (/root/.m2/repository), central (] -> [Help 1]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1]
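The `No plugin found for prefix 'docker-compose'` error means Maven could not map the short prefix to a plugin: prefixes only resolve from plugins declared in the project's own `pom.xml` or a configured plugin group, so the goal has to be run from inside the FSCrawler source checkout rather than from an unpacked distribution. As a hedged sketch of the general fallback, a plugin can always be invoked by its full coordinates instead of its prefix; the coordinates below are illustrative placeholders, not the actual plugin FSCrawler declares.

```shell
# Hedged sketch: build a fully-qualified Maven goal from its coordinates,
# which bypasses prefix resolution entirely. All names are placeholders.
group_id="com.example.plugins"
artifact_id="docker-compose-maven-plugin"
version="1.0.0"
goal="up"
full_goal="$group_id:$artifact_id:$version:$goal"
echo "mvn $full_goal -pl fr.pilato.elasticsearch.crawler:fscrawler-it-v7"
```

The real coordinates can be read from the `<plugin>` declaration in the project's `pom.xml`.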

and it did the same with this command:

sudo mvn verify -pl fr.pilato.elasticsearch.crawler:fscrawler-it-v7 \
    -Dtests.cluster.user=elastic \
    -Dtests.workplace.url=

I am not able to get the app working.

I figured out how to index some files in Elasticsearch, but now I want to index them in Workplace Search.

This is the settings file that I created using FSCrawler:

name: "resumes"
fs:
  url: "//home//jorge//Escritorio//FSCRAWLER//Ficheros"
  update_rate: "1m"
  excludes:
  - "*/~*"
  json_support: false
  filename_as_id: false
  add_filesize: true
  remove_deleted: true
  add_as_inner_object: false
  store_source: false
  index_content: true
  attributes_support: false
  raw_metadata: false
  xml_support: false
  index_folders: true
  lang_detect: false
  continue_on_error: false
  ocr:
    language: "eng"
    enabled: true
    pdf_strategy: "ocr_and_text"
  follow_symlinks: false
elasticsearch:
  username: "elastic"
  password: "L3pfydSSgRtZxfg5gWmX"
  nodes:
  - url: ""
  bulk_size: 100
  flush_interval: "5s"
  byte_size: "10mb"
workplace_search:
  access_token: "489e56799532ca13c49161f82093a41387fca45458617277705b5e8d0e250e77"
  key: "5f959a6e1d41c88afcdc280e"
  nodes:
  - url: ""
  bulk_size: 100
  flush_interval: "5s"
  byte_size: "10mb"

But it still works with Elasticsearch, not with Workplace Search.

Secondly, I tried to use the program that you uploaded to the internet, but I am still having a problem with docker-compose.
It shows this error:

jorge@ubuntu:~/Escritorio/Proyectos/fscrawler/contrib/docker-compose-example$ docker-compose up
ERROR: Named volume "path_to_files_to_scan:/usr/app/data:ro" is used in service "fscrawler" but no declaration was found in the volumes section.

I read the file, but I am not sure whether I have to change any of the data in it.
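The error suggests the `volumes:` entry of the `fscrawler` service still contains the placeholder `path_to_files_to_scan`, which Compose then treats as a named volume that was never declared. A hedged sketch of the fix, assuming the placeholder is meant to become a real host path (the path below is only an example):

```yaml
# docker-compose.yml fragment (hedged sketch -- service and container paths
# are taken from the error message; the host path is an example placeholder)
services:
  fscrawler:
    volumes:
      # A bind mount: absolute host path on the left, container path on the
      # right. Replace the left side with the directory to scan.
      - /home/jorge/Escritorio/FSCRAWLER/Ficheros:/usr/app/data:ro
```

With an absolute path on the left, Compose treats the entry as a bind mount instead of looking for a declared named volume.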

Thank you very much again, I am finding great help here.

What URL did you use to download FSCrawler?

The version that is working I downloaded from here, and this is the YAML that I modified.

But I understand that to make FSCrawler work with Workplace Search I have to download the project, and that is where I have the problem with docker-compose.

Thank you

So this branch is not pushed to Sonatype. I don't think it can work with Workplace Search then.

The last build I shared is available at


Ok, thank you very much.

I am going to try it and I will share the results.

Thank you

This is the message that it shows:

jorge@ubuntu:~/Escritorio/FSCRAWLER/FSCrawler Workplace/fscrawler-es7-2.7-SNAPSHOT/bin$ ./fscrawler /home/jorge/Escritorio/FSCRAWLER/FSCrawler/resum
17:39:49,299 INFO  [f.p.e.c.f.c.BootstrapChecks] Memory [Free/Total=Percent]: HEAP [129.8mb/2.8gb=4.44%], RAM [4.4gb/11.7gb=37.55%], Swap [1.9gb/1.9gb=100.0%].
17:39:49,718 INFO  [f.p.e.c.f.c.FsCrawlerCli] Workplace Search integration is an experimental feature. As is it is not fully implemented and settings might change in the future.
17:39:49,719 WARN  [f.p.e.c.f.c.FsCrawlerCli] Workplace Search integration does not support yet watching a directory. It will be able to run only once and exit. We manually force from --loop -1 to --loop 1. If you want to remove this message next time, please start FSCrawler with --loop 1
17:39:49,740 INFO  [f.p.e.c.f.FsCrawlerImpl] Starting FS crawler
17:39:50,471 INFO  [f.p.e.c.f.c.v.ElasticsearchClientV7] Elasticsearch Client for version 7.x connected to a node running version 7.9.2
17:39:50,872 INFO  [f.p.e.c.f.c.v.ElasticsearchClientV7] Elasticsearch Client for version 7.x connected to a node running version 7.9.2
17:39:50,891 INFO  [f.p.e.c.f.FsParserAbstract] FS crawler started for [resum] for [/home/jorge/Escritorio/FSCRAWLER/FSCrawler/Ficheros] every [1m]
17:39:51,015 INFO  [f.p.e.c.f.FsParserAbstract] FS crawler is stopping after 1 run
17:39:51,096 INFO  [f.p.e.c.f.FsCrawlerImpl] FS crawler [resum] stopped

So, because the program is not able to upload a folder, I am going to try with a single file.

Thank you

It looks good. To make sure, start it with the --debug and --restart options.