+1. Feel free to drop a note at user@tika.apache.org if you have questions, or open an issue on our JIRA (https://issues.apache.org/jira/projects/TIKA/summary) if you find problems. Cheers, Tim
Hi @Tim_Allison
Any idea how to convert a doc file to PDF, or how to extract content from a doc file page by page?
Please suggest
Thanks for your time always
-Rahul
Doc and docx are, unfortunately, paragraph-based, not page/coordinate-based. Tika doesn't calculate page breaks in doc/x. You might drop a note to the Apache POI user list or see what you can find via Google... sorry I can't help.
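To illustrate why page-by-page extraction is hard: a .docx file is a ZIP archive whose main part, `word/document.xml`, stores a sequence of paragraphs (`w:p`), not laid-out pages. The Python sketch below (standard library only) groups paragraph text by explicit hard page breaks (`<w:br w:type="page"/>`) and nothing else. It assumes a .docx input; legacy binary .doc files need a different parser (e.g. Apache POI or Tika). The function name is hypothetical, and text boxes and other nested content are not handled carefully. Page breaks that Word computes while laying out the text are not reliably stored in the file, so the result will generally not match the pages you see in Word.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml in a .docx package
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs_by_hard_page(path):
    """Return a list of 'pages', each a list of paragraph strings.

    Hypothetical helper (sketch): only explicit hard page breaks start a
    new page. Soft page breaks produced by Word's layout engine are not
    part of the paragraph model, so they are invisible here.
    """
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    body = root.find(W + "body")

    pages = [[]]
    for para in body.iter(W + "p"):
        current = []  # text runs collected for the current paragraph fragment
        for node in para.iter():
            if node.tag == W + "t":
                current.append(node.text or "")
            elif node.tag == W + "br" and node.get(W + "type") == "page":
                # Hard page break: flush the text gathered so far, open a new page.
                if current:
                    pages[-1].append("".join(current))
                    current = []
                pages.append([])
        if current:
            pages[-1].append("".join(current))
    return pages
```

Usage: `docx_paragraphs_by_hard_page("report.docx")` returns something like `[["Intro", "..."], ["Chapter 2", "..."]]` when the author inserted manual page breaks; a document without them comes back as a single "page".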
Thanks @Tim_Allison
No problem.
How about reading the text from doc files and then converting them into PDFs? Is it possible? I tried with Python but couldn't make it work.
This might be a lead, but I have no experience with it: https://stackoverflow.com/questions/50982064/converting-docx-to-pdf-with-pure-python-on-linux-without-libreoffice
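One option the thread didn't try, and one that the linked question deliberately avoids: if LibreOffice can be installed, its headless mode can convert .doc/.docx to PDF, and the resulting PDF can then be parsed page by page (Tika's PDF parser marks page boundaries in its output). A minimal Python sketch, assuming the `soffice` (or `libreoffice`) executable is on PATH; both function names are hypothetical. LibreOffice lays the document out itself, so its page breaks may differ from Word's.

```python
import shutil
import subprocess
from pathlib import Path


def build_soffice_command(src, outdir, soffice="soffice"):
    """Build the argv for a headless LibreOffice conversion to PDF."""
    return [soffice, "--headless", "--convert-to", "pdf",
            "--outdir", str(outdir), str(src)]


def convert_to_pdf(src, outdir="."):
    """Convert a .doc/.docx file to PDF via LibreOffice (assumed installed).

    Returns the path where LibreOffice writes the PDF (same stem, .pdf suffix).
    """
    soffice = shutil.which("soffice") or shutil.which("libreoffice")
    if soffice is None:
        raise RuntimeError("LibreOffice (soffice) not found on PATH")
    # check=True raises CalledProcessError if the conversion command fails
    subprocess.run(build_soffice_command(src, outdir, soffice), check=True)
    return Path(outdir) / (Path(src).stem + ".pdf")
```

Usage: `convert_to_pdf("report.doc", "out")` would write `out/report.pdf`. Keeping the command builder separate makes it easy to inspect or log the exact invocation.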
+1 Definitely ask on the Apache POI user list...those folks know their MSOffice formats.
Sure @Tim_Allison
Thank you =D