I have installed Elasticsearch 7.3 and am trying to index PDF files using the ingest-attachment plugin. I use Python to convert each PDF to base64 and the requests library to send the data. I am able to create the pipeline successfully. However, when I try to index the PDF, I receive a 500 error, and the log file shows the following error:
Compressor detection can only be called on some xcontent bytes or compressed xcontent bytes
The code, its output, and the relevant log file entries follow.
Any help would be greatly appreciated, thank you in advance!
Code:
#!/usr/local/bin/python3.7
import sys
import os
import json
import base64
import requests
from elasticsearch import Elasticsearch
es = Elasticsearch()
# open up a pipeline so that documents may be ingested
cpipe = '''
curl -XPUT "localhost:9200/_ingest/pipeline/attachment" -H "Content-Type: application/json" -d '
{
"description" : "Field for processing file attachments",
"processors" : [
{ "attachment" : { "field" : "data", "indexed_chars" : -1 } }
]
}
'
'''
print(cpipe)
cout = os.popen(cpipe).read()
print(cout)
file = 'PDFS/880905.pdf'
print("file = {}".format(file))
with open(file, "rb") as pdf_file:
    print(pdf_file)
    enc_pdf = base64.b64encode(pdf_file.read())
putstr = "http://localhost:9200/cdocs/_doc/880905?pipeline=attachment"
headers = {
    'Content-type': 'application/json'
}
print("headers={}".format(headers))
print("putstr={}".format(putstr))
response = requests.put(putstr, headers=headers, data=enc_pdf)
print(response)
Output with 500 error:
curl -XPUT "localhost:9200/_ingest/pipeline/attachment" -H "Content-Type: application/json" -d '
{
"description" : "Field for processing file attachments",
"processors" : [
{
"attachment" : { "field" : "data", "indexed_chars" : -1 }
}
]
}
'
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 184 100 21 100 163 723 5618 --:--:-- --:--:-- --:--:-- 5821
{"acknowledged":true}
file = PDFS/880905.pdf
<_io.BufferedReader name='PDFS/880905.pdf'>
headers={'Content-type': 'application/json'}
putstr=http://localhost:9200/cdocs/_doc/880905?pipeline=attachment
<Response [500]>
Log file snippet:
[2019-08-26T10:29:48,416][DEBUG][o.e.a.b.TransportBulkAction] [host-1] failed to execute pipeline [attachment] for document [cdocs/_doc/868612]
org.elasticsearch.common.compress.NotXContentException: Compressor detection can only be called on some xcontent bytes or compressed xcontent bytes
More log file data:
[2019-08-26T10:29:48,416][DEBUG][o.e.a.b.TransportBulkAction] [host-1] failed to execute pipeline [attachment] for document [cdocs/_doc/868612]
org.elasticsearch.common.compress.NotXContentException: Compressor detection can only be called on some xcontent bytes or compressed xcontent bytes
at org.elasticsearch.common.compress.CompressorFactory.compressor(CompressorFactory.java:56) ~[elasticsearch-7.3.0.jar:7.3.0]
at org.elasticsearch.common.xcontent.XContentHelper.convertToMap(XContentHelper.java:101) ~[elasticsearch-7.3.0.jar:7.3.0]
at org.elasticsearch.action.index.IndexRequest.sourceAsMap(IndexRequest.java:356) ~[elasticsearch-7.3.0.jar:7.3.0]
at org.elasticsearch.ingest.IngestService.innerExecute(IngestService.java:425) ~[elasticsearch-7.3.0.jar:7.3.0]
at org.elasticsearch.ingest.IngestService.access$100(IngestService.java:70) ~[elasticsearch-7.3.0.jar:7.3.0]
at org.elasticsearch.ingest.IngestService$3.doRun(IngestService.java:355) [elasticsearch-7.3.0.jar:7.3.0]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:758) [elasticsearch-7.3.0.jar:7.3.0]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.3.0.jar:7.3.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
at java.lang.Thread.run(Thread.java:835) [?:?]
[2019-08-26T10:29:48,417][WARN ][r.suppressed ] [host-1] path: /cdocs/_doc/868612, params: {pipeline=attachment, index=cdocs, id=868612}
org.elasticsearch.common.compress.NotXContentException: Compressor detection can only be called on some xcontent bytes or compressed xcontent bytes
at org.elasticsearch.common.compress.CompressorFactory.compressor(CompressorFactory.java:56) ~[elasticsearch-7.3.0.jar:7.3.0]
at org.elasticsearch.common.xcontent.XContentHelper.convertToMap(XContentHelper.java:101) ~[elasticsearch-7.3.0.jar:7.3.0]
at org.elasticsearch.action.index.IndexRequest.sourceAsMap(IndexRequest.java:356) ~[elasticsearch-7.3.0.jar:7.3.0]
at org.elasticsearch.ingest.IngestService.innerExecute(IngestService.java:425) ~[elasticsearch-7.3.0.jar:7.3.0]
at org.elasticsearch.ingest.IngestService.access$100(IngestService.java:70) ~[elasticsearch-7.3.0.jar:7.3.0]
at org.elasticsearch.ingest.IngestService$3.doRun(IngestService.java:355) [elasticsearch-7.3.0.jar:7.3.0]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:758) [elasticsearch-7.3.0.jar:7.3.0]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.3.0.jar:7.3.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
at java.lang.Thread.run(Thread.java:835) [?:?]
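A note on what the trace suggests: the exception is raised in XContentHelper.convertToMap, called from IndexRequest.sourceAsMap, which is where Elasticsearch tries to read the request body as a document. The code above sends the raw base64 bytes as the body even though the header declares application/json, and the pipeline's attachment processor is configured to read a field named "data". One likely fix, offered as an untested sketch rather than a confirmed solution, is to wrap the base64 string in a JSON object under that field. The helper names build_attachment_body and index_pdf below are hypothetical and used only for illustration.

```python
import base64
import json


def build_attachment_body(pdf_bytes, field="data"):
    """Return a JSON document whose `field` holds the base64-encoded PDF."""
    # b64encode returns bytes; decode to str so json.dumps can serialize it.
    encoded = base64.b64encode(pdf_bytes).decode("ascii")
    return json.dumps({field: encoded})


def index_pdf(path, url):
    """PUT the PDF at `path` to `url` as a JSON document (hypothetical helper)."""
    # Imported lazily so build_attachment_body works without requests installed.
    import requests

    with open(path, "rb") as pdf_file:
        body = build_attachment_body(pdf_file.read())
    headers = {"Content-Type": "application/json"}
    return requests.put(url, headers=headers, data=body)
```

With the values from the question, the call would be index_pdf('PDFS/880905.pdf', 'http://localhost:9200/cdocs/_doc/880905?pipeline=attachment'). Printing response.text in addition to the response object should show the error details returned by Elasticsearch if the request still fails.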