How to use the Elasticsearch attachment processor with Node.js

I'm building an application that searches for words or phrases in PDF files, and I put together the following code (found on the internet):

const elasticsearch = require('elasticsearch');
const fse = require('fs-extra');

let client = new elasticsearch.Client({
    host: "localhost:9200",
    log: ["error", "warning"]
});

client.indices.create({index: 'files'})
.then(() => {
    // create a mapping for the attachment
    return client.indices.putMapping({
        index: 'files',
        type: 'document',
        body: {
            document: {
                properties: {
                    file: {
                        type: 'attachment',
                        fields: {
                            content: {
                                type: 'string',
                                term_vector: 'with_positions_offsets',
                                store: true
                            }
                        }
                    }
                }
            }
        }
    });
});

const fileContents = fse.readFileSync('C:\\Users\\JoaoDJunior\\Downloads\\João D. Junior - rc.pdf');
// readFileSync already returns a Buffer, so the deprecated `new Buffer()` wrapper is not needed
const fileBase64 = fileContents.toString('base64');
//console.log(fileBase64);
client.create({
    index: 'files',
    type: 'document',
    id: 'somefileid',
    body: {
        file_id: 'somefileid',
        file: {
            _content: fileBase64
        }
    }
})
.catch((err) => {
    console.error('Error while creating elasticsearch record', err);
});

client.search({
    q: 'java',
    index: 'files'
}, (error, result) => {
    if (error) return console.log(error);
    console.log(result.hits);
});

The problem is that I cannot search for the words inside the document. Is there an error in my code? Could anyone help?

You are using the mapper-attachments plugin here, which was deprecated in 5.x and removed in 6.0.

Read https://www.elastic.co/guide/en/elasticsearch/plugins/6.6/ingest-attachment.html to learn about the ingest-attachment plugin and how to use it.

From the command line I get results, but I cannot get the same thing working from the Node.js application. Is it possible to do this with the client library?

What did you run on the command line, and what does your code look like now? Did you change it?

I ran the following requests via curl:

PUT _ingest/pipeline/attachment
{
  "description": "Extract attachment information",
  "processors": [
    {
      "attachment": {
        "field": "data"
      }
    }
  ]
}

PUT my_index/_doc/my_id?pipeline=attachment
{
  "data": "e1xydGYxXGFuc2kNCkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0DQpccGFyIH0="
}

GET my_index/_doc/my_id

but I do not know how to send my base64-encoded PDF file in the body.
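For the question of how to get a PDF into the request body, here is a minimal sketch in Node.js. It only reads a file, encodes it as base64, and builds the parameters for an index call. The function names `encodeFileAsBase64` and `buildIndexParams` and the file name `./example.pdf` are hypothetical; the pipeline name `attachment` and the `data` field are taken from the requests above.

```javascript
const fs = require('fs');

// Read a file from disk and return its contents as a base64 string.
// fs.readFileSync already returns a Buffer, so no extra wrapping is needed.
function encodeFileAsBase64(filePath) {
  return fs.readFileSync(filePath).toString('base64');
}

// Build the parameters for an index call that goes through the ingest pipeline.
// Assumptions: a pipeline named "attachment" already exists and its
// attachment processor reads the "data" field.
function buildIndexParams(index, id, base64Data) {
  return {
    index,
    type: '_doc',
    id,
    pipeline: 'attachment', // same role as ?pipeline=attachment in the REST request
    body: { data: base64Data }
  };
}

// Usage (hypothetical file name; `client` is an elasticsearch.Client instance):
// const params = buildIndexParams('my_index', 'my_id', encodeFileAsBase64('./example.pdf'));
// client.index(params).then(console.log).catch(console.error);
```

The important part is that the base64 string goes into the same field the attachment processor is configured to read, and that the index call names the pipeline.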

Is your question "How do I generate Base64 from a binary file in Python?"

With Python I got it working. However, I have to build this with Node.js and the Elasticsearch client, and I'm not getting results. I saw that FSCrawler might be a solution, but I wanted to check whether everything could be solved with Elasticsearch alone.

Hmmm. Weird. Not sure what happened but I think this answer:

Was not related to your question but to another one... :woozy_face:

Anyway I'm not sure I understand what your problem is. The code you shared initially is not about ingest-attachment but mapper-attachments.

You need to write the JavaScript equivalent of:

The code I posted was the only example I could find, and it appeared to do what I want: read the contents of a PDF file and make them searchable.

But it does not do what I need, so I would like a Node.js example equivalent to the code in the Elasticsearch plugin documentation.
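As a rough sketch of what such an equivalent might look like with the legacy `elasticsearch` npm client used in the original code: register the pipeline, index the document through it, then search the extracted text. This assumes Elasticsearch 6.x with the ingest-attachment plugin installed. The function name `indexAndSearchPdf` and its options object are hypothetical; `client.ingest.putPipeline`, `client.index`, and `client.search` are the client calls it relies on.

```javascript
// `client` is assumed to be an instance of the legacy client, for example:
//   const elasticsearch = require('elasticsearch');
//   const client = new elasticsearch.Client({ host: 'localhost:9200' });
async function indexAndSearchPdf(client, { index, id, base64Data, query }) {
  // 1. Register the ingest pipeline (PUT _ingest/pipeline/attachment).
  await client.ingest.putPipeline({
    id: 'attachment',
    body: {
      description: 'Extract attachment information',
      processors: [{ attachment: { field: 'data' } }]
    }
  });

  // 2. Index the base64 content through the pipeline
  //    (PUT <index>/_doc/<id>?pipeline=attachment).
  //    refresh: 'wait_for' makes the document visible to the search below.
  await client.index({
    index,
    type: '_doc',
    id,
    pipeline: 'attachment',
    refresh: 'wait_for',
    body: { data: base64Data }
  });

  // 3. Search the text that the attachment processor extracted.
  //    By default it is stored under the "attachment.content" field.
  const result = await client.search({
    index,
    body: { query: { match: { 'attachment.content': query } } }
  });
  return result.hits.hits;
}

// Usage (hypothetical values):
// indexAndSearchPdf(client, {
//   index: 'files', id: 'somefileid', base64Data: fileBase64, query: 'java'
// }).then(console.log).catch(console.error);
```

Awaiting each step in order avoids searching before the document has been indexed, which the original code does not guard against.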

I got the result I wanted with FSCrawler.
