Why is my index not growing past 4GB when using fscrawler?

So I am trying to index about 2.5TB of data (about 3 million files) using FSCrawler. I have 40GB of RAM, of which I have set aside a 20GB heap for FSCrawler for maximum throughput.
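For reference, FSCrawler's heap is configured through the `FS_JAVA_OPTS` environment variable, which its launcher script reads. A sketch of how a 20GB heap would be set on Windows cmd, using the install path and job name from the log:

```shell
:: Set the FSCrawler JVM heap to 20GB before starting the job (Windows cmd).
:: FS_JAVA_OPTS is read by FSCrawler's launcher script.
set FS_JAVA_OPTS=-Xmx20g -Xms20g
C:\Elastic\fscrawler-MAR15\bin\fscrawler trial2
```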

```
C:\Elastic\fscrawler-MAR15\bin>fscrawler trial2
14:58:51,919 INFO  [f.p.e.c.f.c.BootstrapChecks] Memory [Free/Total=Percent]: HEAP [18.8gb/19.1gb=98.43%], RAM [37gb/40.9gb=90.53%], Swap [1.8gb/47.1gb=3.97%].
14:58:52,998 INFO  [f.p.e.c.f.c.v.ElasticsearchClientV7] Elasticsearch Client for version 7.x connected to a node running version 7.1.1
14:58:53,185 INFO  [f.p.e.c.f.FsCrawlerImpl] Starting FS crawler
14:58:53,185 INFO  [f.p.e.c.f.FsCrawlerImpl] FS crawler started in watch mode. It will run unless you stop it with CTRL+C.
```

However, only about 1 million documents have been indexed so far, with the index size stuck at 4GB for the last 3 weeks. I don't know whether indexing is still going on or has stalled. (Kindly also explain what swap memory is with regard to FSCrawler; does mine, which is 1.8GB, affect performance?)

N.B. I once had to restart indexing because I got the error "your computer is low on memory.. save files and close programs. Java (TM) Platform binary".

Kindly advise me on the indexing situation and memory.
Thank you.

Please don't post images of text as they are hard to read, may not display correctly for everyone, and are not searchable.

Instead, paste the text and format it with the </> icon or pairs of triple backticks (```), and check the preview window to make sure it's properly formatted before posting. This makes it more likely that your question will receive a useful answer.

It would be great if you could update your post to solve this.

kindly also explain to me what the swap memory is

Swap is used when you don't have enough RAM: the OS can use the hard disk as extra memory. But this makes everything super slow.

The important number to look at is the number of documents that need to be indexed.
You can't really compare the source size with the index size in Elasticsearch, as only the extracted content is indexed.
So I can see that almost 1M documents have been indexed by FSCrawler. Do you know how many documents should have been indexed in total?
FSCrawler has also visited somewhere around 100k folders.
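One way to watch progress is to ask Elasticsearch for the document counts directly. A sketch, assuming Elasticsearch listens on localhost:9200 and the index is named after the job (`trial2` here); FSCrawler also maintains a companion `trial2_folder` index when `index_folders` is enabled:

```shell
# Count indexed documents (adjust host and index name to your setup)
curl "http://localhost:9200/trial2/_count?pretty"

# Count indexed folders (the companion *_folder index)
curl "http://localhost:9200/trial2_folder/_count?pretty"
```

Running these counts periodically shows whether indexing is still moving or has stalled.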

Sadly, there is no "progress report" available yet in FSCrawler.

The only way to know for now is to start FSCrawler with the --debug option.
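For example, with the install path and job name from the question:

```shell
C:\Elastic\fscrawler-MAR15\bin>fscrawler trial2 --debug
```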

I have tried to update the question.

Immediately after posting this question, I got another "memory full" error. What could be causing this? From Task Manager, memory used is at about 30%. Are my heap and RAM sufficient?

Could you share it?
What are the fscrawler job settings?

I am sorry, but I can only use a photo here.
This is the error!

My job settings look like this:

```
name: "index"
fs:
  url: "\\\\DESKTOP-O5VVOG6\\shared_docs"
  update_rate: "15m"
  excludes:
  - "*/~*"
  json_support: false
  filename_as_id: false
  add_filesize: true
  remove_deleted: true
  add_as_inner_object: false
  store_source: false
  index_content: true
  attributes_support: false
  raw_metadata: false
  xml_support: false
  index_folders: true
  lang_detect: false
  continue_on_error: true
  ocr:
    language: "eng"
    enabled: true
    pdf_strategy: "ocr_and_text"
elasticsearch:
  nodes:
  - url: ""
  bulk_size: 100
  flush_interval: "5s"
  byte_size: "10mb"
```

It's strange that Windows complains about the memory usage.
I'd expect a Java OutOfMemoryError instead.

The memory is supposed to be allocated to and available for the process.
Is there any option on Windows machines to make sure that a process can actually lock its memory?

So I think I found the error: my C: drive, which is where my index lives, was full. After a reboot I found 0 bytes available, so I added more disk space and tried again. I'm monitoring it and it seems to be running fine.
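Since the root cause was a full drive, it may be worth keeping an eye on free space; a sketch for Windows cmd (nothing FSCrawler-specific):

```shell
:: Show remaining free bytes on the drive that holds the index data
dir C:\ | find "bytes free"
```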

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.