I am indexing the text content of a file using
res = es.index(index=myIndex, body=jsonDict), where jsonDict is a JSON dictionary holding the text extracted from a PDF file.
I am able to query the content and retrieve the relevant text.
My question is: how do I index the file contents so that I can also retrieve file info such as:
"file" : {
  "extension" : "pdf",
  "content_type" : "application/pdf",
  "created" : "2019-10-01T15:28:31.000+0000",
  "last_modified" : "2019-10-01T15:28:31.000+0000",
  "last_accessed" : "2019-10-01T15:38:50.000+0000",
  "indexing_date" : "2019-10-01T15:39:08.055+0000",
  "filesize" : 877861,
and
"path" : {
  "root" : "4a997482a3826d51751b8e7c01e476c",
  "virtual" : "/P_GB27980_20120213.pdf",
  "real" : "/Users/madabhuc/Documents/IR/presto/eval/P_GB27980_20120213.pdf"
Do I need to provide this info explicitly when I submit the files to Elasticsearch for indexing?
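If I do have to supply it myself, I imagine it would look roughly like the sketch below. This is only my guess at an approach, not something I have confirmed is the recommended way: build_file_doc and the "content" field name are made up for illustration, the metadata comes from the standard os and mimetypes modules, and I have left out "root" because I do not know how that value is derived.

```python
import mimetypes
import os
from datetime import datetime, timezone


def iso(ts):
    """Format a POSIX timestamp as an ISO 8601 string in UTC."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).isoformat()


def build_file_doc(path, text):
    """Combine extracted text with metadata read from the filesystem.

    Hypothetical helper: the field layout mirrors the example above,
    but the "content" key is an assumption.
    """
    st = os.stat(path)
    content_type, _ = mimetypes.guess_type(path)
    return {
        "content": text,
        "file": {
            "extension": os.path.splitext(path)[1].lstrip(".").lower(),
            "content_type": content_type,
            # st_ctime is creation time on Windows, but the time of the
            # last metadata change on Unix, so "created" is approximate.
            "created": iso(st.st_ctime),
            "last_modified": iso(st.st_mtime),
            "last_accessed": iso(st.st_atime),
            "indexing_date": datetime.now(timezone.utc).isoformat(),
            "filesize": st.st_size,
        },
        "path": {
            "virtual": "/" + os.path.basename(path),
            "real": os.path.abspath(path),
        },
    }


# Usage with the client from my question (pdf_path and text are placeholders):
# res = es.index(index=myIndex, body=build_file_doc(pdf_path, text))
```

The idea would be to pass this combined dictionary as the body instead of the text-only jsonDict, so that the metadata is stored in the same document as the content.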
Thanks.