Hi David,
Thank you for your quick responses. I would be grateful if you could clarify the questions below.
C:\Users\Administrator\.fscrawler\job1\_settings.json
"elasticsearch" : {
  "index" : "jobindex1",
  "index_folder" : "jobfoldersindex1",
  "pipeline" : "fscrawler",
  "nodes" : [ {
    "url" : "http://127.0.0.1:9200"
  } ],
  "bulk_size" : 100,
  "flush_interval" : "5s",
  "byte_size" : "25mb"
}
The above are my FSCrawler settings for Elasticsearch.
Request to create the pipeline:
PUT _ingest/pipeline/fscrawler
{
  "description" : "fscrawler pipeline",
  "processors" : [
    {
      "set" : {
        "field": "foo",
        "value": "bar"
      }
    }
  ]
}
The files are imported into Elasticsearch successfully.
Request to get the files that contain the string below:
GET /jobindex1/_search
{
  "query" : {
    "match": {
      "content" : "emad"
    }
  }
}
Result:
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 3.15744,
    "hits" : [
      {
        "_index" : "jobindex1",
        "_type" : "_doc",
        "_id" : "7bfc1ba6cb2ea96a7cea1b84f4dbd",
        "_score" : 3.15744,
        "_source" : {
          "path" : {
            "virtual" : "/Attendance_Appraisals_07022017120507.pdf",
            "root" : "8f384e4d1aa1e6127ed1195953dccce3",
            "real" : """D:\PDf list\Attendance_Appraisals_07022017120507.pdf"""
          },
          "file" : {
            "extension" : "pdf",
            "last_accessed" : "2019-01-13T06:41:24.289+0000",
            "filename" : "Attendance_Appraisals_07022017120507.pdf",
            "content_type" : "application/pdf",
            "created" : "2019-01-13T06:41:24.289+0000",
            "indexing_date" : "2019-01-13T12:04:23.798+0000",
            "filesize" : 509153,
            "last_modified" : "2017-02-07T09:05:03.872+0000",
            "url" : """file://D:\PDf list\Attendance_Appraisals_07022017120507.pdf"""
          },
          "meta" : {
            "created" : "2017-02-07T02:59:52.000+0000",
            "format" : "application/pdf; version=1.3",
            "raw" : {
              "pdf:PDFVersion" : "1.3",
              "X-Parsed-By" : "org.apache.tika.parser.pdf.PDFParser",
              "xmp:CreatorTool" : "Canon ",
              "access_permission:modify_annotations" : "true",
              "access_permission:can_print_degraded" : "true",
              "meta:creation-date" : "2017-02-07T06:59:52Z",
              "created" : "2017-02-07T06:59:52Z",
              "access_permission:extract_for_accessibility" : "true",
              "access_permission:assemble_document" : "true",
              "xmpTPg:NPages" : "1",
              "Creation-Date" : "2017-02-07T06:59:52Z",
              "resourceName" : "Attendance_Appraisals_07022017120507.pdf",
              "dcterms:created" : "2017-02-07T06:59:52Z",
              "dc:format" : "application/pdf; version=1.3",
              "access_permission:extract_content" : "true",
              "access_permission:can_print" : "true",
              "pdf:docinfo:creator_tool" : "Canon ",
              "access_permission:fill_in_form" : "true",
              "pdf:encrypted" : "false",
              "producer" : " ",
              "access_permission:can_modify" : "true",
              "pdf:docinfo:producer" : " ",
              "pdf:docinfo:created" : "2017-02-07T06:59:52Z",
              "Content-Type" : "application/pdf"
            },
            "creator_tool" : "Canon "
          },
          "foo" : "bar",
          "content" : """
A.
NPtr
INTER OFFICE MEMO
Dear All Employees
ln reference to the above mentioned subject, irrespective of numerous correspondences, it
has been noticed that many employees are still reporting to work late on many occasions.
The grace period for morning Punch lN time is only 15 minutes from the official start timing
irrespective of Head Office or Sites. Late attendance will be deducted from the monthly
salary. Also PUNCH IN/OUT is mandatory. The Missed Punching will also be considered as
Absent.
Also note that the late Punching and related deduction will be affecting the Performance
Appraisal of the employees.
ln view of all the above all staff are requested to do proper attendance punching and if any
technical issue please coordinate with the IT/HR department to rectify the same at the
earliest will be given to any staff on the attendance punching
E Janabi
HR & Admin Manager
Ref No. Trojan/lOM/HR & ADM/44581 17 Date: 07to2t2017 Pages 1
To All Staff TROJAN & NPC
From Emad AI Janabi HR & Admin Manager
CC: Engr. Hamad Al Ameri Managing Director
Subject Attendance Regulations & Performance Appraisals
P.o. Box 111059, Abu Dhabi, uAE. Tel. no. +9t1 2 so973oo - Fax: +gl1 2 5g2gs94
.i - oi
I'l{( )f n N
          """
        }
      }
    ]
  }
}
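For comparison, the search response above already carries the extracted text and the content type, under `_source.content` and `_source.file.content_type` rather than under an `attachment` object. Below is a minimal Python sketch of pulling those fields out of each hit, assuming the response has already been loaded as a dict. The helper name `summarize_hits` and the trimmed `sample_response` are hypothetical and only there for illustration.

```python
from typing import Any, Dict, List


def summarize_hits(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Reduce each search hit to its id, filename, content type and text.

    Hypothetical helper: it only reads fields that appear in the search
    response shown above and tolerates missing keys.
    """
    summaries = []
    for hit in response.get("hits", {}).get("hits", []):
        source = hit.get("_source", {})
        file_info = source.get("file", {})
        summaries.append({
            "_id": hit.get("_id"),
            "filename": file_info.get("filename"),
            "content_type": file_info.get("content_type"),
            # The extracted text starts and ends with newlines, so trim it.
            "content": source.get("content", "").strip(),
        })
    return summaries


# Trimmed stand-in for the response above (content shortened by hand).
sample_response = {
    "hits": {
        "total": 1,
        "hits": [
            {
                "_index": "jobindex1",
                "_id": "7bfc1ba6cb2ea96a7cea1b84f4dbd",
                "_source": {
                    "file": {
                        "filename": "Attendance_Appraisals_07022017120507.pdf",
                        "content_type": "application/pdf",
                    },
                    "foo": "bar",
                    "content": "\nINTER OFFICE MEMO\nDear All Employees\n",
                },
            }
        ],
    }
}

for summary in summarize_hits(sample_response):
    print(summary["filename"], summary["content_type"])
```

This only reshapes what the search already returns; it does not add fields such as `language` that are not present in the response above.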
Why am I not able to get a result like this?
{
  "found": true,
  "_index": "my_index",
  "_type": "_doc",
  "_id": "my_id",
  "_version": 1,
  "_source": {
    "data": "e1xydGYxXGFuc2kNCkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0DQpccGFyIH0=",
    "attachment": {
      "content_type": "application/rtf",
      "language": "ro",
      "content": "Lorem ipsum dolor sit amet",
      "content_length": 28
    }
  }
}