I can't index Microsoft Office files and PDFs in Elasticsearch


(sara) #1

Hello,
I use Elasticsearch 2.0.0 and I am trying to index and search PDF and Word files, but it always fails: the content is stored as encoded text, and reading the file with file_get_contents also fails.
I searched and tried to use Apache Tika, but I can't install the mapper plugin and I don't know how to integrate it. I'm new to Elasticsearch.
Does Elasticsearch support searching in Word or PDF files?


(David Pilato) #2

Have a look at:

If you are looking for something more OOTB, look at https://swiftype.com/


(sara) #3

But ingest-attachment is not available on Elasticsearch 2.0.0.

I use the elasticsearch-mapper-attachments plugin and try to index a document with this code:

    $content = base64_encode(file_get_contents($doc_src));

    // Indexing by mapping
    $json = '{
        "attachment" : {
            "_content_type" : "application/pdf",
            "_name" : "'.$doc_src.'",
            "_content" : "'.$content.'"
        }
    }';

    $params = array();
    $params['index'] = 'documents';
    $params['type']  = 'attachment';
    $params['id']    = '17';
    $params['body']  = $json;

    $response = $this->elasticsearch->client->index($params);
    print_r($response);

It indexes the document with the encoded content, but when I search:

    $params['index'] = 'documents';
    $params['type']  = 'attachment';
    $params2['body']['query']['match']['file.content'] = 'Text';
    $response = $this->elasticsearch->client->search($params2);
    print_r($response);

there is no result; it always returns an empty array.


(David Pilato) #4

But ingest-attachment is not available on Elasticsearch 2.0.0.

That's true. I'd suggest upgrading to 5.6 or 6.2.
At the very least you should upgrade to the latest 2.x. 2.0.0 is so old.

The mapper-attachments plugin has been removed in 6.x, so I'd really recommend not using it.
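
For reference, here is a rough sketch of what the ingest-attachment route could look like from PHP after upgrading. It assumes a 5.x/6.x node with the ingest-attachment plugin installed and a matching elasticsearch-php client. The pipeline id `attachment`, the source field `data`, the type `doc` and the helper function names are only illustrative and do not come from this thread. The helpers just build the request arrays; the client calls are shown as comments at the end.

```php
<?php
// Sketch only, under the assumptions stated above.
// Pipeline id, field name, type and function names are hypothetical.

// Pipeline with a single "attachment" processor that reads base64 data
// from $field and writes the extracted text under "attachment.*".
function attachmentPipelineParams(string $pipelineId = 'attachment', string $field = 'data'): array
{
    return [
        'id'   => $pipelineId,
        'body' => [
            'description' => 'Extract text from base64-encoded files',
            'processors'  => [
                ['attachment' => ['field' => $field]],
            ],
        ],
    ];
}

// Index request that sends the base64-encoded file through the pipeline.
// The body is a PHP array, so the client handles the JSON encoding.
function indexFileParams(string $rawBytes, string $id, string $pipelineId = 'attachment', string $field = 'data'): array
{
    return [
        'index'    => 'documents',
        'type'     => 'doc',
        'id'       => $id,
        'pipeline' => $pipelineId,
        'body'     => [$field => base64_encode($rawBytes)],
    ];
}

// Search request against the extracted text, which the attachment
// processor stores in "attachment.content" by default.
function searchContentParams(string $text): array
{
    return [
        'index' => 'documents',
        'body'  => ['query' => ['match' => ['attachment.content' => $text]]],
    ];
}

// Usage with an elasticsearch-php client ($client):
// $client->ingest()->putPipeline(attachmentPipelineParams());
// $client->index(indexFileParams(file_get_contents($doc_src), '17'));
// print_r($client->search(searchContentParams('Text')));
```

Note that the index name is passed in the same array as the query, and that the search goes against the extracted text field rather than the base64 source field.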

