Hi all, when I ingest document A, which contains only the word "Lorem", I get the error "Last unit does not have enough valid bits".
For document B, which contains the words "Lorem ipsum", I get the error "Illegal base64 character 20" because of the space (0x20).
How should I approach fixing these errors, and how can I index a document that contains spaces?
Content of document A:
Lorem
CBOR encoded data for A:
\xa1\x64\x64\x61\x74\x61\x65\x4c\x6f\x72\x65\x6d
Diagnostic decoding of encoded data A:
{"data": "Lorem"}
Content of document B:
Lorem ipsum
CBOR encoded data for B:
\xa1\x64\x64\x61\x74\x61\x6b\x4c\x6f\x72\x65\x6d\x20\x69\x70\x73\x75\x6d
Diagnostic decoding of encoded data B:
{"data": "Lorem ipsum"}
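To make the hex encodings above easier to read, here is a minimal bash sketch that splits a CBOR initial byte into its major type (high 3 bits) and additional info (low 5 bits), following RFC 8949. `cbor_head` is a hypothetical helper written only for this illustration, not part of any tool. Run against the bytes above, it shows that in both documents the `data` value (0x65 for A, 0x6b for B) is a major type 3 item, i.e. a text string of length 5 and 11 respectively.

```shell
#!/usr/bin/env bash
# Hypothetical helper: decode one CBOR initial byte (RFC 8949).
# For additional info < 24, the additional info is the length/count itself.
cbor_head() {
  local byte=$(( $1 ))          # accepts hex input such as 0x65
  local major=$(( byte >> 5 ))  # high 3 bits: major type
  local info=$(( byte & 0x1f )) # low 5 bits: additional info
  local name
  case $major in
    0) name="unsigned int" ;;
    1) name="negative int" ;;
    2) name="byte string" ;;
    3) name="text string" ;;
    4) name="array" ;;
    5) name="map" ;;
    6) name="tag" ;;
    7) name="simple/float" ;;
  esac
  echo "$1: major type $major ($name), additional info $info"
}

cbor_head 0xa1   # the outer map with 1 key/value pair
cbor_head 0x64   # the key "data" (4 characters)
cbor_head 0x65   # the value in document A
cbor_head 0x6b   # the value in document B
```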
Create ES index
curl -X PUT "localhost:9200/example?pretty" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "index": {
      "number_of_shards": 1,
      "number_of_replicas": 0
    }
  },
  "mappings": {
    "properties": {
      "data": {
        "type": "binary"
      }
    }
  }
}'
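For reference, the Elasticsearch documentation describes the `binary` field type as accepting a Base64-encoded string. As a quick sketch (assuming a `base64` command such as the one in GNU coreutils is available; `to_base64` is a hypothetical helper), this is what the text of documents A and B looks like in that form:

```shell
#!/usr/bin/env bash
# Hypothetical helper: Base64-encode a string exactly as given.
# printf '%s' adds no trailing newline, so only the text itself is encoded.
to_base64() {
  printf '%s' "$1" | base64
}

to_base64 'Lorem'        # TG9yZW0=
to_base64 'Lorem ipsum'  # TG9yZW0gaXBzdW0=
```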
Create ES ingest attachment pipeline
curl -X PUT "localhost:9200/_ingest/pipeline/cbor-attachment?pretty" -H 'Content-Type: application/json' -d'
{
  "description" : "Extract attachment information encoded in CBOR",
  "processors" : [
    {
      "attachment": {
        "description" : "Ingest attachment",
        "field": "data",
        "indexed_chars": -1
      }
    }
  ]
}'
Ingest document A
echo -ne '\xa1\x64\x64\x61\x74\x61\x65\x4c\x6f\x72\x65\x6d' | \
curl -X PUT \
"localhost:9200/example/_doc/test1?pipeline=cbor-attachment" \
-H 'Content-Type: application/cbor' --data-binary @-
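One detail worth checking when piping raw bytes into curl: in bash, `echo -e` appends a trailing newline (0x0a) after the interpreted escapes, while `echo -ne` and `printf '%b'` do not. This sketch (bash-specific; the `count_*` helpers are hypothetical) counts the bytes produced for document A's 12-byte CBOR payload each way:

```shell
#!/usr/bin/env bash
# Compare how many bytes each method writes for document A's CBOR payload.
payload='\xa1\x64\x64\x61\x74\x61\x65\x4c\x6f\x72\x65\x6d'

# tr strips the padding some wc implementations add around the count.
count_echo_e()  { echo -e "$payload" | wc -c | tr -d ' '; }
count_echo_ne() { echo -ne "$payload" | wc -c | tr -d ' '; }
count_printf()  { printf '%b' "$payload" | wc -c | tr -d ' '; }

count_echo_e    # 13: the extra byte is the trailing newline
count_echo_ne   # 12
count_printf    # 12
```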
Error ingesting document A
illegal_argument_exception: Last unit does not have enough valid bits
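This message appears to come from a Base64 decoder (its wording matches Java's `java.util.Base64`), which suggests the text "Lorem" is being decoded as Base64. Each Base64 character carries 6 bits and every 4 characters decode to 3 bytes, so a final group of a single character holds only 6 bits, not enough for one byte. A minimal sketch of that length check, with `base64_last_unit` as a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: describe what the final (unpadded) group of a
# Base64 string can decode to, based only on the string's length.
base64_last_unit() {
  local s=$1
  local rem=$(( ${#s} % 4 ))
  case $rem in
    0) echo "complete" ;;                           # only full 4-char groups
    1) echo "invalid: 6 bits, less than 1 byte" ;;  # cannot form a byte
    2) echo "1 byte (12 bits, 4 unused)" ;;
    3) echo "2 bytes (18 bits, 2 unused)" ;;
  esac
}

base64_last_unit 'Lorem'     # invalid: 6 bits, less than 1 byte
base64_last_unit 'TG9yZW0='  # complete
```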
Ingest document B
echo -ne '\xa1\x64\x64\x61\x74\x61\x6b\x4c\x6f\x72\x65\x6d\x20\x69\x70\x73\x75\x6d' | \
curl -X PUT \
"localhost:9200/example/_doc/test2?pipeline=cbor-attachment" \
-H 'Content-Type: application/cbor' --data-binary @-
Error ingesting document B
illegal_argument_exception: illegal base64 character 20
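The "20" in this message appears to be a hexadecimal character code: 0x20 is the space character. A small bash sketch that lists any characters outside the standard Base64 alphabet (A-Z, a-z, 0-9, +, /, =) together with their zero-based position and hex code; `non_base64_chars` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "position:hexcode" for every character that is
# not part of the standard Base64 alphabet.
non_base64_chars() {
  local s=$1 i c
  for (( i = 0; i < ${#s}; i++ )); do
    c=${s:i:1}
    if [[ ! $c =~ [A-Za-z0-9+/=] ]]; then
      # A leading quote makes printf use the character's numeric code.
      printf '%d:%x\n' "$i" "'$c"
    fi
  done
}

non_base64_chars 'Lorem ipsum'   # 5:20
non_base64_chars 'Lorem'         # prints nothing
```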