Hi,
I have indexed document metadata with FSCrawler, so in ES I have an index; a search on it returns results like this:
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 19,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "test",
        "_type" : "_doc",
        "_id" : "BigData.pptx",
        "_score" : 1.0,
        "_source" : {
          "meta" : {
            "author" : "Jin",
            "title" : "PowerPoint Presentation",
            "date" : "2012-01-12T20:47:47.000+0000",
            "modifier" : "Jin",
            "created" : "2012-01-12T19:50:20.000+0000",
            "raw" : {
              "date" : "2012-01-12T21:47:47Z",
              "cp:revision" : "20",
              "Total-Time" : "57",
              "extended-properties:AppVersion" : "14.0000",
              "meta:paragraph-count" : "273",
              "meta:word-count" : "1319",
              "extended-properties:PresentationFormat" : "On-screen Show (4:3)",
              "dc:creator" : "Jin",
              "Word-Count" : "1319",
              "dcterms:created" : "2012-01-12T20:50:20Z",
              "dcterms:modified" : "2012-01-12T21:47:47Z",
              "Last-Modified" : "2012-01-12T21:47:47Z",
              "title" : "PowerPoint Presentation",
              "Last-Save-Date" : "2012-01-12T21:47:47Z",
              "Paragraph-Count" : "273",
              "meta:save-date" : "2012-01-12T21:47:47Z",
              "dc:title" : "PowerPoint Presentation",
              "Application-Name" : "Microsoft Office PowerPoint",
              "extended-properties:TotalTime" : "57",
              "modified" : "2012-01-12T21:47:47Z",
              "Notes" : "17",
              "Content-Type" : "application/vnd.openxmlformats-officedocument.presentationml.presentation",
              "Slide-Count" : "47",
              "X-Parsed-By" : "org.apache.tika.parser.DefaultParser",
              "creator" : "Jin",
              "extended-properties:Notes" : "17",
              "meta:author" : "Jin",
              "meta:creation-date" : "2012-01-12T20:50:20Z",
              "extended-properties:Application" : "Microsoft Office PowerPoint",
              "meta:last-author" : "Jin",
              "meta:slide-count" : "47",
              "Creation-Date" : "2012-01-12T20:50:20Z",
              "xmpTPg:NPages" : "47",
              "resourceName" : "BigData.pptx",
              "Last-Author" : "Jin",
              "Revision-Number" : "20",
              "Application-Version" : "14.0000",
              "Author" : "Jin",
              "Presentation-Format" : "On-screen Show (4:3)"
            }
          },
          "file" : {
            "extension" : "pptx",
            "content_type" : "application/vnd.openxmlformats-officedocument.presentationml.presentation",
            "created" : "2019-07-08T10:45:34.000+0000",
            "last_modified" : "2019-07-08T10:45:34.000+0000",
            "last_accessed" : "2019-07-17T09:50:04.000+0000",
            "indexing_date" : "2019-07-17T13:32:13.807+0000",
            "filesize" : 2496305,
            "filename" : "BigData.pptx",
            "url" : "file:///home/ubuntu/Downloads/FSCrawler/BigData.pptx",
            "indexed_chars" : 0
          },
          "path" : {
            "root" : "4d1f91a687e6d7c4e1dd3e1cbb4bd2",
            "virtual" : "/BigData.pptx",
            "real" : "/home/ubuntu/Downloads/FSCrawler/BigData.pptx"
          }
        }
      },
...and so on; that is just one hit from the hits array.
I want to get the matching documents when I search for one or more words across the metadata.
For example, here the metadata has many fields, such as meta.raw.date, meta.raw.title, etc.
If I search for the words 'Big Data' across the whole 'meta' object, I should get this document, because the field meta.raw.resourceName contains "BigData.pptx".
I couldn't find a way to run such a search. I've tried 'more_like_this' and 'multi_match', but the problem is that I have to name the exact field in the query (I have to put "meta.raw.resourceName": "BigData.pptx") and also give the exact value 'BigData.pptx' to get the result; if I search for just 'Big', I get nothing.
Can anyone help me?
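For reference, this is roughly the kind of request that does return the document: a match query sent as the body of GET test/_search (the index name test is taken from the output above). It is a reconstruction, so it may not be exactly the query I ran:

```json
{
  "query": {
    "match": {
      "meta.raw.resourceName": "BigData.pptx"
    }
  }
}
```
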