Node.js: ingesting a PDF

Hello,
I'm trying to ingest a PDF file into Elasticsearch.

PUT _ingest/pipeline/pdfnew
{
  "description": "Extract attachment information encoded in Base64 with UTF-8 charset",
  "processors": [
    {
      "attachment": {
        "field": "file"
      }
    },
    {
      "remove": {
        "field": ["file"]
      }
    }
  ]
}

First I created the ingest pipeline above manually in Kibana,
and then in my route I'm trying something like this:

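const fs = require('fs');

// Read the PDF as a Base64 string; the attachment processor expects Base64 input.
const base64Pdf = fs.readFileSync('../../../Desktop/sample.pdf', {
  encoding: 'base64',
});

// `client` is assumed to be an existing Elasticsearch client instance.
// client.index() returns a promise, so await it inside an async route handler.
const index = await client.index({
  id: 101,
  index: 'pdf-test9',
  pipeline: 'pdfnew',
  body: {
    file: base64Pdf,
  },
});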

When the request is made, the content I get back in Kibana has these weird """ at the start and end.
Can you help me with this?

This is what I run in Kibana:
GET pdf-test9/_search

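The triple quotes are normal. Kibana's Console uses them to display large, multi-line text so that special characters don't need to be escaped and line breaks are preserved. The attachment ingest processor extracts the text from the PDF, and the Console shows it this way. You can use the same syntax when writing requests in the Console: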

POST test/_doc/
{
  "message": """
  Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
  
  ";';';^#$*()$#*&$(#*&@$@*#$&@#)_)({}|}{:>?<,./;'[])
  
  Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
  """
}
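
To see that the triple quotes are only a display convention, here is a minimal Node.js sketch. The sample string is made up and merely stands in for text extracted from a PDF. It shows that the same multi-line text travels as ordinary JSON with escaped characters and comes back as a plain string:

```javascript
// Hypothetical sample text standing in for content extracted from a PDF.
const content = 'First line\nSecond "quoted" line\nand more text.';

// Over the wire this is ordinary JSON: line breaks and quotes are escaped.
const wire = JSON.stringify({ attachment: { content } });

// Parsing the JSON gives back the original string, with no triple quotes anywhere.
const roundTripped = JSON.parse(wire).attachment.content;

console.log(wire);
console.log(roundTripped === content);
```

So when you read the document from Node.js, attachment.content should be a normal string.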

Thank you for the response, Aaron.
So that means all my content is searchable now, like this?

GET pdf-test9/_search
{
  "query": {
    "match_phrase": {
      "attachment.content": {
        "query": "and more text."
      }
    }
  }
}

Yes, as long as the mapping for attachment.content is of type text, which I think it should be.

Use GET pdf-test9/_mapping to verify.
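
If you would rather check this from Node.js, something like the following sketch might work. The helper name getContentFieldType is made up for illustration, and it assumes client is an official Elasticsearch JavaScript client and that the index name is a concrete index rather than an alias:

```javascript
// Hypothetical helper: returns the mapped type of attachment.content
// for the given index, or undefined if the field is not mapped yet.
async function getContentFieldType(client, index) {
  const response = await client.indices.getMapping({ index });

  // 7.x clients wrap the payload in `body`; 8.x clients return it directly.
  const payload = response.body ?? response;

  // The response is keyed by index name; walk down to the nested field.
  return payload[index]?.mappings?.properties?.attachment?.properties?.content?.type;
}

// Usage (inside an async function):
//   const type = await getContentFieldType(client, 'pdf-test9');
//   console.log(type); // expected to be 'text' if match_phrase should work
```

The optional chaining means a missing index entry or an unmapped field gives undefined instead of throwing.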
