I liked the simulate API approach, and I got it to work with curl:
curl -F "file=@pdf_files/non-searchable-text.pdf" "http://127.0.0.1:8080/fscrawler/_upload?debug=true&simulate=true"
This gave me the following response:
{
  "ok": true,
  "filename": "non-searchable-text.pdf",
  "url": "http://127.0.0.1:9200/resumes/_doc/f39614d4716aed76167498ac4945ed7",
  "doc": {
    "content": "\n \n\nTABLE OF CONTENTS\n\n \n\nIntroduction 1\nChapter 1: The ABC of Programming 11\nChapter 2: Basic JavaScript Instructions 53\nChapter 3: Functions, Methods & Objects 85\n(i aY-] 0) (=) ae Sam DY -101 |} (0) aioe\" Mole) 0-) 145\nChapter 5: Document Object Model 183\nChapter 6: Events 243\nChapter 7: jQuery 293\nChapter 8: Ajax & JSON 367\nChapter 9: APIs 409\nChapter 10: Error Handling & Debugging 449\nChapter 11: Content Panels 487\nChapter 12: Filtering, Searching & Sorting 527\nChapter 13: Form Enhancement & Validation 567\nTare toy 623\n\n \n\nTry out & download the code in this book\nwww. javascriptbook.com\n\n \n\n \n\n \n\n\n",
    "meta": {
      "date": "2021-02-02T11:44:56.000+00:00",
      "format": "application/pdf; version=1.6"
    },
    "file": {
      "extension": "pdf",
      "content_type": "application/pdf",
      "indexing_date": "2021-02-04T13:49:29.353+00:00",
      "filename": "non-searchable-text.pdf"
    },
    "path": {
      "virtual": "non-searchable-text.pdf",
      "real": "non-searchable-text.pdf"
    }
  }
}
I then tried to make the same request from PHP, like this:
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, 'http://127.0.0.1:8080/fscrawler/_upload?debug=true&simulate=true');
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl, CURLOPT_POST, 1);
$args['file'] = '@/pdf_files/non-searchable-text.pdf';
curl_setopt($curl, CURLOPT_POSTFIELDS, $args);
$result = curl_exec($curl);
if (curl_errno($curl)) {
    echo 'Error:' . curl_error($curl);
} else {
    echo '<pre> response : ', print_r($result, true), '</pre>';
}
curl_close($curl);
But this was the response:

Any idea what I'm doing wrong?