[PHP] How to ingest my pdf file using PHP ES Client


(evert) #1

I have upgraded from ES 2.3 to 5. The docs explain how to ingest data into ES, but I could not figure out how it should be done when using processors, specifically Ingest Attachment.

I created my pipeline successfully, as shown below:

$params = [
    'id' => 'attachment',
    'body' => [
        'description' => 'Extract attachment information',
        'processors' => [
            [
                'attachment' => [
                    'field' => 'content',
                    'indexed_chars' => -1
                ]
            ]
        ]
    ]
];
return $client->ingest()->putPipeline($params);

I tried to index my PDF file with:

$params = [
    'index' => 'index',
    'type'  => 'type',
    'id'    => 'document_id',
    'body'  => [
        'content' => base64_encode(file_get_contents($fullfile))
    ]                
];
return $client->index($params);

or with:

return $client->ingest()->putPipeline($params);

With no success...

Using plain JSON (with Postman), the request below works smoothly:

PUT /index/type/my_indexed_id?pipeline=attachment
{
  "content" : "BuDQowMDAwNDgyMzA0I.....MY_WHOLE_ENCODED_PDF_FILE"
}

So, how do we tell our $client that it must use a specific pipeline?

As in the JSON request above: PUT /index/type/my_indexed_id?pipeline=attachment

Thanks!


(Zachary Tong) #2

Note: I'm on the run and haven't actually tried this personally yet, but you should be able to just add a 'pipeline' param to the indexing request. Set it to the pipeline's name:

$params = [
    'index' => 'index',
    'type'  => 'type',
    'id'    => 'document_id',
    'pipeline' => 'attachment',  // <----- here
    'body'  => [
        'content' => base64_encode(file_get_contents($fullfile))
    ]                
];
return $client->index($params);
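
If it helps, here is a minimal sketch of the same idea wrapped in a small helper, so the file read is checked before anything is sent. The function name buildAttachmentIndexParams is made up for illustration and is not part of the PHP client; it assumes the 'pipeline' key is passed through by $client->index() as the ?pipeline= query-string parameter, as in the snippet above.

```php
<?php
// Hypothetical helper (not part of elasticsearch-php): builds the parameter
// array for $client->index() so the document goes through an ingest pipeline,
// mirroring PUT /index/type/<id>?pipeline=attachment.
function buildAttachmentIndexParams(string $path, string $id, string $pipeline = 'attachment'): array
{
    // Read the raw file; fail loudly instead of indexing an empty document.
    $raw = @file_get_contents($path);
    if ($raw === false) {
        throw new RuntimeException("Cannot read file: $path");
    }

    return [
        'index'    => 'index',
        'type'     => 'type',
        'id'       => $id,
        'pipeline' => $pipeline, // assumed to be sent as ?pipeline=<name>
        'body'     => [
            // The attachment processor reads base64-encoded data from the
            // field named in the pipeline definition ('content' here).
            'content' => base64_encode($raw),
        ],
    ];
}
```

Usage, assuming $client is an already-built Elasticsearch\Client: return $client->index(buildAttachmentIndexParams($fullfile, 'document_id'));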

(evert) #3

Thanks!!

