Indexing PDF documents with ElasticSearch


(cs.irfan) #1

Hello all,
I am a beginner with Elasticsearch and very interested in it. I am using
the Elasticsearch 0.90.5 binary on Windows. I have copied the Apache Tika
1.4 jar file (tika-app-1.4.jar) and
elasticsearch-mapper-attachments-1.9.0.jar into the lib folder of
Elasticsearch. When I index a PDF file, I get the following exception:

{
  "error": "ClassCastException[java.util.ArrayList cannot be cast to java.util.Map]",
  "status": 500
}
I am using a Dell Core i3 machine with Windows 7 64-bit.

Kindly advise.
Regards

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Lukáš Vlček) #2

Hi,

If you plan to index a lot of documents, I would consider extracting the
text from the PDFs on the client side, i.e. before you send the data to
Elasticsearch. PDF parsing can be quite expensive, and the index request
will probably be much smaller if it contains plain text rather than a
Base64-encoded PDF file.
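
To make the size argument concrete, here is a minimal Python sketch that builds the two kinds of request body and compares their lengths. It is only an illustration under stated assumptions: the field names "file" and "content" are hypothetical, the 3000 stand-in bytes take the place of a real PDF, and the short string takes the place of whatever a client-side extractor (Tika, for example) would return. No extraction or indexing actually happens here.

```python
import base64
import json


def attachment_body(pdf_bytes: bytes) -> str:
    """Body that ships the whole file, Base64-encoded.

    The field name "file" is hypothetical.
    """
    encoded = base64.b64encode(pdf_bytes).decode("ascii")
    return json.dumps({"file": encoded})


def plain_text_body(text: str) -> str:
    """Body that ships only text extracted on the client.

    The field name "content" is hypothetical.
    """
    return json.dumps({"content": text})


# Stand-in data: 3000 arbitrary bytes instead of a real PDF, and a short
# string instead of the output of a real client-side text extractor.
fake_pdf = bytes(range(250)) * 12
extracted_text = "Quarterly report. Revenue grew in all regions."

b64_body = attachment_body(fake_pdf)
text_body = plain_text_body(extracted_text)

print("raw file bytes:      ", len(fake_pdf))
print("Base64 request body: ", len(b64_body))
print("plain-text body:     ", len(text_body))
```

Base64 alone makes the payload about a third larger than the raw file, and a PDF usually carries fonts, images and layout data on top of its text, so the plain-text body is typically the smaller of the two.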

Just my 2 cents.

Regards,
Lukas

On Thu, Oct 10, 2013 at 5:53 PM, cs.irfan@upesh.edu.pk wrote:

Hello all,
I am a beginner with Elasticsearch and very interested in it. I am using
the Elasticsearch 0.90.5 binary on Windows. I have copied the Apache Tika
1.4 jar file (tika-app-1.4.jar) and
elasticsearch-mapper-attachments-1.9.0.jar into the lib folder of
Elasticsearch. When I index a PDF file, I get the following exception:

{
  "error": "ClassCastException[java.util.ArrayList cannot be cast to java.util.Map]",
  "status": 500
}
I am using a Dell Core i3 machine with Windows 7 64-bit.

Kindly advise.
Regards


