Errors indexing binary attachment using ingest-attachment pipeline

Hi all, I am ingesting a document A and I receive the error: "Last unit does not have enough valid bits" when the document contains only the word "Lorem".

For document B, which contains the words "Lorem ipsum", I get the error "Illegal base64 character 20" because of the space.

How do I approach fixing these errors and also enable indexing a document that has spaces?

Content of document A:

Lorem

CBOR encoded data for A:

\xa1\x64\x64\x61\x74\x61\x65\x4c\x6f\x72\x65\x6d

Diagnostic decoding of encoded data A:

{"data": "Lorem"}

Content of document B:

Lorem ipsum

CBOR encoded data for B:

\xa1\x64\x64\x61\x74\x61\x6b\x4c\x6f\x72\x65\x6d\x20\x69\x70\x73\x75\x6d

Diagnostic decoding of encoded data B:

{"data": "Lorem ipsum"}
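
To make the encodings above easier to read, here is a small sketch that splits a CBOR initial byte into its major type and length, following the layout in RFC 8949. The helper name cbor_head is made up for illustration. It suggests that the value in both documents (0x65 and 0x6b) is encoded as a text string rather than a byte string, which may be why the field is later treated as Base64 text.

```shell
# Hypothetical helper: split a CBOR initial byte into its major type
# (top 3 bits) and additional information (low 5 bits), per RFC 8949.
cbor_head() {
  byte=$(( $1 ))
  major=$(( byte >> 5 ))
  info=$(( byte & 0x1f ))
  case $major in
    2) kind="byte string" ;;
    3) kind="text string" ;;
    5) kind="map" ;;
    *) kind="other" ;;
  esac
  echo "major=$major ($kind) info=$info"
}

cbor_head 0xa1   # first byte of both documents: a map with 1 pair
cbor_head 0x65   # value header in document A: text string, length 5
cbor_head 0x6b   # value header in document B: text string, length 11
cbor_head 0x45   # for comparison: byte string, length 5
```

Under this reading, a byte-string value for "Lorem" would start with 0x45 instead of 0x65, and one for "Lorem ipsum" with 0x4b instead of 0x6b.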

Create ES index

curl -X PUT "localhost:9200/example?pretty" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "index": {
      "number_of_shards": 1,  
      "number_of_replicas": 0 
    }
  },
  "mappings": {
    "properties": {
      "data": {
        "type": "binary"
      }
    }
  }
}'

Create ES ingest attachment pipeline

curl -X PUT "localhost:9200/_ingest/pipeline/cbor-attachment?pretty" -H 'Content-Type: application/json' -d'
{
  "description" : "Extract attachment information encoded in CBOR",
  "processors" : [
    {
      "attachment": {
        "description" : "Ingest attachment",
        "field": "data",
        "indexed_chars": -1
      }
    }
  ]
}'

Ingest document A

echo -e '\xa1\x64\x64\x61\x74\x61\x65\x4c\x6f\x72\x65\x6d' | \
    curl -X PUT \
       "localhost:9200/example/_doc/test1?pipeline=cbor-attachment" \
       -H 'Content-Type: application/cbor' --data-binary @-

Error ingesting document A

illegal_argument_exception: Last unit does not have enough valid bits
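
This message looks like what a strict Base64 decoder reports when the input, after removing padding, leaves a final group of only one character: 6 bits, which is not enough for a whole byte. "Lorem" has 5 characters, so it would hit exactly that case, while the space in "Lorem ipsum" is outside the Base64 alphabet. The sketch below only mimics those two conditions; b64_diagnose is a hypothetical helper, not Elasticsearch's actual decoding code.

```shell
# Hypothetical helper: describe why a string would fail strict Base64 decoding.
b64_diagnose() {
  s=$1
  # Any character outside A-Z, a-z, 0-9, '+', '/', '=' is illegal.
  case $s in
    *[!A-Za-z0-9+/=]*) echo "illegal character"; return 0 ;;
  esac
  # Drop padding, then check whether the last unit has a single character.
  body=${s%%=*}
  if [ $(( ${#body} % 4 )) -eq 1 ]; then
    echo "last unit too short"
    return 0
  fi
  echo "ok"
}

b64_diagnose 'Lorem'         # 5 characters: last unit has only one
b64_diagnose 'Lorem ipsum'   # contains a space (0x20)
b64_diagnose 'TG9yZW0='      # Base64 encoding of "Lorem"
```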

Ingest document B

echo -e '\xa1\x64\x64\x61\x74\x61\x6b\x4c\x6f\x72\x65\x6d\x20\x69\x70\x73\x75\x6d' | \
    curl -X PUT \
       "localhost:9200/example/_doc/test2?pipeline=cbor-attachment" \
       -H 'Content-Type: application/cbor' --data-binary @-

Error ingesting document B

illegal_argument_exception: Illegal base64 character 20
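
One thing that might be worth trying, assuming the binary field accepts a CBOR byte string without Base64 decoding, is to encode the value with major type 2 instead of as a text string. The sketch below builds such a payload by hand for short ASCII content. cbor_bytes_doc is a hypothetical helper, and it uses printf rather than echo -e so that no trailing newline byte follows the CBOR item.

```shell
# Hypothetical helper: emit {"data": <byte string>} as CBOR for short ASCII
# content (under 24 bytes, so the length fits in the initial byte).
cbor_bytes_doc() {
  content=$1
  len=${#content}
  if [ "$len" -ge 24 ]; then
    echo "content too long for this sketch" >&2
    return 1
  fi
  printf '\241'        # 0xa1: map with one pair
  printf '\144data'    # 0x64: text string of length 4, then "data"
  # 0x40 + len: byte string (major type 2) of length len
  printf "\\$(printf '%03o' $(( 0x40 + len )))"
  printf '%s' "$content"
}

# Show the resulting bytes for both documents.
cbor_bytes_doc 'Lorem' | od -An -tx1
cbor_bytes_doc 'Lorem ipsum' | od -An -tx1

# Possible usage, assuming the index and pipeline from this thread exist:
# cbor_bytes_doc 'Lorem ipsum' | curl -X PUT \
#   "localhost:9200/example/_doc/test2?pipeline=cbor-attachment" \
#   -H 'Content-Type: application/cbor' --data-binary @-
```

The other route would be to keep the text string but send the Base64 form of the content (for example, TG9yZW0= for "Lorem"), at the cost of the encoding overhead.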

Any ideas, please @dadoonet @spinscale @Badger @rugenl or anyone else experienced with Elasticsearch? Can't find anything in the docs about these two issues.

Please be patient in waiting for responses to your question, and refrain from pinging multiple times asking for a response or opening multiple topics for the same question. This is a community forum, so it may take time for someone to reply to your question. For more information, please refer to the Community Code of Conduct, specifically the section "Be patient". Also, please refrain from pinging folks directly. This is a forum, and anyone who participates might be able to assist you.

If you are in need of a service with an SLA that covers response times for questions then you may want to consider talking to us about a subscription.

It's fine to reply on your own thread after 2 or 3 days (not including weekends) if you haven't received an answer.

I have never used CBOR; I use Base64-encoded JSON documents.
I don't think I can help.

I apologize.

I figured this was a different question from the other thread I created, since that was a different error message and I am also trying something else here now. I also didn't know pinging people was considered rude; I am new here.

This attempt is for a proof of concept. Unfortunately, I can't get approval for a subscription because I haven't yet been able to get Elasticsearch to work.

Thank you for your response, and once again I am sorry to have bothered you.

What is the goal of your POC?

BTW, have a look at the FSCrawler project. It could also help you build your POC easily.
