My hunch is that Tika is running out of memory on you. Tika moved from
using the filesystem for temporary storage to an in-memory approach for
PDFs, which led to some out-of-memory issues on my side. Strictly
speaking, it was PDFBox under Tika that made the change, and I believe
Tika 0.9 picked it up. You should be able to confirm this by analyzing
the heap dump.
In Tika 1.1, you can control this behavior by passing in a file object
instead of a stream.
On Apr 3, 9:05 am, Shay Banon kim...@gmail.com wrote:
You just specify the two index names in the URI, or if you use the Java
API, specify both indices in the search API.
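As a rough illustration of the URI form, here is a minimal sketch that builds a search URL covering two indices by joining their names with a comma. The host, the index names `documents` and `attachments`, and the helper name `multi_index_search_url` are all made up for the example; adjust them to your own setup.

```python
from urllib.parse import quote, urlencode


def multi_index_search_url(base_url, indices, query):
    """Build a URI search URL that targets several indices at once.

    Hypothetical helper: the index names are joined with commas
    in the path segment that precedes /_search.
    """
    index_part = ",".join(quote(name, safe="") for name in indices)
    query_string = urlencode({"q": query})
    return f"{base_url.rstrip('/')}/{index_part}/_search?{query_string}"


# Assumed host and index names, for illustration only.
url = multi_index_search_url(
    "http://localhost:9200", ["documents", "attachments"], "title:report"
)
print(url)
```

The resulting URL can be requested with any HTTP client; hits from both indices come back in a single response.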
On Tue, Apr 3, 2012 at 5:40 PM, Shane Witbeck sh...@digitalsanctum.com wrote:
I don't have experience (yet) with searching on more than one index at a
time. Is the multisearch API what I need here, or is there some way of
associating one index with another?
On Tuesday, April 3, 2012 10:32:10 AM UTC-4, kimchy wrote:
Yea, probably breaking down the attachments to their own docs makes more
sense.
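One way to picture that split is the sketch below, which turns a document carrying several attachments into a parent document plus one small document per attachment, so that each indexing request carries a single attachment. The field names (`attachments`, `parent_id`, `position`, `name`, `content`) and the helper `split_attachments` are assumptions for illustration, not part of the attachment plugin.

```python
def split_attachments(doc_id, doc):
    """Return (parent_doc, attachment_docs) for a doc with embedded attachments.

    Hypothetical layout: the source doc keeps its attachments in a list
    under the "attachments" key, each with a "name" and a "content" payload.
    """
    # The parent keeps every field except the bulky attachment list.
    parent = {key: value for key, value in doc.items() if key != "attachments"}

    children = []
    for position, attachment in enumerate(doc.get("attachments", [])):
        children.append({
            "parent_id": doc_id,      # assumed back-reference to the parent doc
            "position": position,     # order of the attachment in the parent
            "name": attachment["name"],
            "content": attachment["content"],
        })
    return parent, children


# Made-up sample document with two attachments.
sample = {
    "title": "Quarterly report",
    "attachments": [
        {"name": "a.pdf", "content": "QQ=="},
        {"name": "b.pdf", "content": "Qg=="},
    ],
}
parent, children = split_attachments("doc-1", sample)
print(parent)
print(len(children))
```

Each child can then be indexed on its own, and the `parent_id` field gives a way to look up the owning document after a hit on an attachment.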
On Tue, Apr 3, 2012 at 12:18 AM, Shane Witbeck sh...@digitalsanctum.com wrote:
Given the scenario I've outlined, does it make more sense to put
attachments in their own index? It seems I've hit a limitation of the
attachment plugin with the limited amount of RAM that I have and the
potential of several multi-MB attachments per document. I'm also curious if
you think increasing the amount of RAM on the machines would help in this
case.
I have just the one index and was hoping to avoid creating another index
for attachments but if this is the way to go what would be the best way to
On Saturday, March 31, 2012 7:36:37 PM UTC-4, Shane Witbeck wrote:
Yes, each document may have several attachments associated with it.