Compressor detection can only be called on some xcontent bytes or compressed xcontent bytes

sup3race · January 19, 2017, 4:57pm

I wrote this easy connection on python:

class ElasticS(object):

def __init__(self,portI,ip):
    try:
        self.es = Elasticsearch([ip],port=portI )
        print ("Connected: ",  self.es.info())
    except Exception as ex:
            print ("Error: ", ex)
            return None
        
def get_elastic(self):
    return self.es 
if __name__ == '__main__':
ElasticS(portI=9200,ip='localhost')

Pipeline:

from elasticsearch import Elasticsearch
class Pipeline(object):
    def __init__(self):
        es = Elasticsearch()
        es.cat.health()
        body = {
         "description": "parse arxiv pdfs and index into ES",
         "processors" :
        [
            { "attachment" : { "field": "pdf" } },
            { "remove" : { "field": "pdf" } },
            ]
                 }
        es.index(index='_ingest', doc_type='pipeline', id='attachment', body=body)
    
if __name__ == '__main__':
    Pipeline()

Insert with bulk streaming:

from elasticsearch.helpers import streaming_bulk
from Config.Connection import ElasticS
from Config.Parse import Parse

es = ElasticS(ip='localhost',portI=9200)
p = Parse()

for ok, result in streaming_bulk(
            client=es.get_elastic(),
            actions=p.parse_pdf(filename="QzpccHJvdmFcMTcwMS4wMDg5MC5wZGY="),
            index="arxiv",
            doc_type="arxiv",
            chunk_size=4,
            params={"pipeline": "attachment"}
        ):
            action, result = result.popitem()
if not ok:
            print("failed to index document")
else:
            print("Success!")

I GOT THIS ERROR:

[2017-01-19T17:53:14,342][DEBUG][o.e.a.i.IngestActionFilter] [node1] failed to execute pipeline [attachment] for document [arxiv/arxiv/null]
org.elasticsearch.common.compress.NotXContentException: Compressor detection can only be called on some xcontent bytes or compressed xcontent bytes
        at org.elasticsearch.common.compress.CompressorFactory.compressor(CompressorFactory.java:57) ~[elasticsearch-5.1.1.jar:5.1.1]
        at org.elasticsearch.common.xcontent.XContentHelper.convertToMap(XContentHelper.java:65) ~[elasticsearch-5.1.1.jar:5.1.1]
        at org.elasticsearch.action.index.IndexRequest.sourceAsMap(IndexRequest.java:403) ~[elasticsearch-5.1.1.jar:5.1.1]
        at org.elasticsearch.ingest.PipelineExecutionService.innerExecute(PipelineExecutionService.java:164) ~[elasticsearch-5.1.1.jar:5.1.1]
        at org.elasticsearch.ingest.PipelineExecutionService.access$000(PipelineExecutionService.java:41) ~[elasticsearch-5.1.1.jar:5.1.1]
        at org.elasticsearch.ingest.PipelineExecutionService$2.doRun(PipelineExecutionService.java:88) [elasticsearch-5.1.1.jar:5.1.1]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:527) [elasticsearch-5.1.1.jar:5.1.1]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-5.1.1.jar:5.1.1]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_101]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_101]
        at java.lang.Thread.run(Thread.java:745) [?:1.8.0_101]

i'm following this guide btw: GUIDE

My main goal is to insert documents on elasticsearch and work with them...
i'm pretty sure im doing something wrong. thank you

sup3race · January 20, 2017, 8:48am

can someone help me out please

sup3race · January 20, 2017, 12:10pm

@dadoonet

talevy · January 20, 2017, 8:07pm

Hi @sup3race, I think I am missing some context with the code, but it seems p.parse_pdf(filename="QzpccHJvdmFcMTcwMS4wMDg5MC5wZGY=") is to blame.

however you are parsing the pdf and generating the appropriate actions objects. It seems they do not follow the appropriate json structure that is required.

take a look at the example action objects found in the python docs here: http://elasticsearch-py.readthedocs.io/en/master/helpers.html. the pdf needs to be converted into a base64 string value in a json field.

let me know if that helps!

system · February 17, 2017, 8:07pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Compressor detection can only be called on some xcontent bytes Elasticsearch	2	32125	July 8, 2019
Problem ingesting PDF Elasticsearch	1	618	September 23, 2019
How to use pipeline (ingest) in Java Index/Update API Elasticsearch	1	653	September 12, 2017
Inserting PDF's and PPT's into 5.1.1 failing ingest attachment - not_x_content_exception Elasticsearch	3	697	January 14, 2017
[solved] - BulkIndexError - Compressor detection can only be called on some xcontent bytes or compressed xcontent bytes Elasticsearch	3	10896	June 21, 2019

Compressor detection can only be called on some xcontent bytes or compressed xcontent bytes

Related topics