Indexing files (.doc, .pdf, .xls, etc.)

I want to index the content of files like .doc or .pdf using Elasticsearch. I read about elasticsearch-mapper-attachments, but it stores the whole file. I just want it to index the content of the file, not store the whole file, as I have to index more than 90000 files. Does anyone have a suggestion?

Wait for 5.0 and use ingest attachment plugin.

In the meantime you can use source exclude.
IIRC it's written in plugin docs.
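
A minimal sketch of what a source exclude could look like in the mapping, assuming an Elasticsearch 2.x index with the mapper-attachments plugin installed. The type name `person` and the field names `documents` and `path` are placeholders chosen for illustration, not names the plugin requires.

```json
{
  "mappings": {
    "person": {
      "_source": {
        "excludes": ["documents"]
      },
      "properties": {
        "documents": { "type": "attachment" },
        "path": { "type": "string" }
      }
    }
  }
}
```

With a mapping along these lines, the excluded attachment field should still be indexed and searchable but is left out of the stored `_source`, while the other fields remain in `_source` and come back with search hits.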


Thanks for the reply, I will try this solution.

Hey David, I tried disabling _source. My JSON schema is:
{
  "mappings": {
    "person": {
      "_source": { "enabled": false },
      "properties": {
        "documents": { "type": "attachment" },
        "path": { "type": "string" }
      }
    }
  }
}

Now I am able to search through the documents, but my "path" field is not showing in the results. I need "path" to retrieve the file from disk.

What did you send in the path field?

The path to the document: the directory address where my document is located.

So if you sent a doc containing field path, this document should come back.

If not, create a script which recreates your problem. Note that you don't need attachment to reproduce it.