Hi all,
Using exiftool, I add a custom XMP field to a JPG via
exiftool -config exif.hash.conf -xmp-xmp:Test="T1" data/images/13-Moss-Mushrooms-1.jpg
where exif.hash.conf contains
%Image::ExifTool::UserDefined = (
    'Image::ExifTool::XMP::xmp' => {
        ImageHashAverage => { Writable => 'string' },
        ImageHashPerception => { Writable => 'string' },
        ImageHashDifference => { Writable => 'string' },
        ImageHashWavelet => { Writable => 'string' },
        Test => { Writable => 'string' },
    },
);
1;
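Since the goal is to write all four hash tags, they can go into a single exiftool call with the config above. A minimal Python sketch, assuming exiftool is on the PATH; the helper names `build_exiftool_args` and `write_tags` are hypothetical, and skipping exiftool's `*_original` backup via `-overwrite_original` is an assumption you may not want:

```python
import subprocess
from typing import Dict, List


def build_exiftool_args(image: str, tags: Dict[str, str],
                        config: str = "exif.hash.conf") -> List[str]:
    """Build one exiftool invocation that writes several user-defined XMP-xmp tags."""
    # exiftool requires -config to be the first option on the command line
    args = ["exiftool", "-config", config]
    for name, value in tags.items():
        # Same -XMP-xmp:Tag=value syntax as the single-tag command above
        args.append(f"-XMP-xmp:{name}={value}")
    # Assumption: overwrite in place instead of keeping a *_original backup
    args.append("-overwrite_original")
    args.append(image)
    return args


def write_tags(image: str, tags: Dict[str, str]) -> None:
    # Passing a list (no shell) means values with spaces need no quoting;
    # check=True raises CalledProcessError if exiftool exits non-zero
    subprocess.run(build_exiftool_args(image, tags), check=True)
```

For example, `write_tags("data/images/13-Moss-Mushrooms-1.jpg", {"ImageHashAverage": "f0e0c0c080808080", "Test": "T1"})` would run one exiftool process for both tags (the hash value here is made up).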
exiftool is able to read the tag, e.g. with
exiftool 13-Moss-Mushrooms-1.jpg | grep -i t1
Test : T1
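One way to check, independently of both exiftool and fscrawler, whether the tag really ends up in the standard XMP packet is to pull the XMP APP1 segment out of the JPEG directly. A minimal Python sketch, assuming a well-formed JPEG whose header segments all carry a length field, and ignoring Extended XMP (packets larger than one segment); `extract_xmp` is a hypothetical helper:

```python
from typing import Optional

# Identifier that prefixes the standard XMP packet inside a JPEG APP1 segment
XMP_SIGNATURE = b"http://ns.adobe.com/xap/1.0/\x00"


def extract_xmp(jpeg: bytes) -> Optional[str]:
    """Return the standard XMP packet from a JPEG's APP1 segment, or None."""
    if jpeg[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG (missing SOI marker)")
    pos = 2
    while pos + 4 <= len(jpeg):
        if jpeg[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = jpeg[pos + 1]
        if marker == 0xFF:  # fill byte before a marker: skip it
            pos += 1
            continue
        if marker == 0xDA:  # SOS: compressed image data follows, no more headers
            break
        # Segment length is big-endian and includes its own two bytes
        length = int.from_bytes(jpeg[pos + 2:pos + 4], "big")
        payload = jpeg[pos + 4:pos + 2 + length]
        if marker == 0xE1 and payload.startswith(XMP_SIGNATURE):
            return payload[len(XMP_SIGNATURE):].decode("utf-8")
        pos += 2 + length
    return None
```

Running it on the file (`extract_xmp(open(path, "rb").read())`) and looking for `Test` and `T1` in the returned XML would at least show whether the value sits in the ordinary XMP packet that other readers also parse.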
But if I run
fscrawler --loop 1 --config_dir /jobs/images/config --restart images
in my Docker container, I get plenty of raw metadata but none of my custom fields.
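To see exactly which metadata keys did arrive, one could fetch a single indexed document and list its raw metadata keys. A rough Python sketch, assuming Elasticsearch is reachable without authentication and that FSCrawler, with `raw_metadata: true`, stores the Tika metadata under `meta.raw`; the helper names are hypothetical:

```python
import json
import urllib.request
from typing import Any, Dict, List


def fetch_one_source(es_url: str, index: str) -> Dict[str, Any]:
    """Fetch the _source of one document via a plain GET /<index>/_search."""
    with urllib.request.urlopen(f"{es_url}/{index}/_search?size=1") as resp:
        body = json.load(resp)
    return body["hits"]["hits"][0]["_source"]


def raw_metadata_keys(source: Dict[str, Any]) -> List[str]:
    """Sorted raw metadata keys of one FSCrawler document."""
    # Assumption: raw metadata lives under meta.raw; adjust if your index differs
    return sorted(source.get("meta", {}).get("raw", {}))


def keys_matching(source: Dict[str, Any], needle: str) -> List[str]:
    """Raw metadata keys containing `needle`, case-insensitively (e.g. "hash")."""
    return [k for k in raw_metadata_keys(source) if needle.lower() in k.lower()]
```

For instance, `keys_matching(fetch_one_source("http://elasticsearch:9200", "images"), "test")` returning an empty list would confirm that the custom tag never made it into the index, rather than being stored under an unexpected name.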
Here is my _settings.yaml:
name: "images"
fs:
  url: "/tmp/es/images"
  update_rate: "15m"
  includes:
    - "*/*.jpg"
    - "*/*.png"
    - "*/*.tif"
    - "*/*.tiff"
  #excludes:
  #  - "*/resume*"
  #filters:
  #  - ".*foo.*"
  # error from fscrawler: cannot support both json and xml
  #json_support: true
  #xml_support: true
  add_as_inner_object: true
  #index_folders: false
  attributes_support: true
  raw_metadata: true
  #add_filesize: false
  remove_deleted: true
  continue_on_error: true
  lang_detect: true
  store_source: false
  #indexed_chars: "100000"
  #ignore_above: "512mb"
  # required
  index_content: true
  #indexed_chars: 0
  checksum: "SHA-1"
  #follow_symlink: true
elasticsearch:
  nodes:
    - url: "http://elasticsearch:9200"
  index: "images"
  index_folder: "images"
Sorry if this is somewhat off-topic, but should fscrawler be able to index my metadata? Or does exiftool store the tags in some non-standard way that fscrawler does not recognize? Is there a standard command-line tool that gets this working, and could anyone provide an example of how to create a custom field in an image (jpg, png and/or pdf) that fscrawler will pick up?
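For PNG, the XMP specification places the packet in an `iTXt` chunk with the keyword `XML:com.adobe.xmp` rather than in a JPEG-style APP1 segment (PDF keeps XMP in a metadata stream and is not covered here). A minimal Python sketch for reading it, assuming a well-formed PNG and skipping CRC validation; `extract_png_xmp` is a hypothetical helper:

```python
import zlib
from typing import Optional

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
XMP_KEYWORD = b"XML:com.adobe.xmp"


def extract_png_xmp(png: bytes) -> Optional[str]:
    """Return the XMP packet from a PNG's iTXt chunk, or None."""
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(png):
        length = int.from_bytes(png[pos:pos + 4], "big")
        ctype = png[pos + 4:pos + 8]
        data = png[pos + 8:pos + 8 + length]
        pos += 12 + length  # length field + type + data + CRC
        if ctype == b"IEND":
            break
        if ctype != b"iTXt":
            continue
        keyword, _, rest = data.partition(b"\x00")
        if keyword != XMP_KEYWORD:
            continue
        compressed = rest[0] == 1  # rest[1] is the compression method (0 = zlib)
        # Skip the language tag and translated keyword (both NUL-terminated)
        _lang, _, rest = rest[2:].partition(b"\x00")
        _translated, _, text = rest.partition(b"\x00")
        if compressed:
            text = zlib.decompress(text)
        return text.decode("utf-8")
    return None
```

Whether fscrawler (via Tika) then exposes custom properties from that packet is a separate question; this only shows where the data is expected to sit.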
My goal is to add a set of perceptual hashes to each image. I also want users to be able to mark faces within the image, including their coordinates and the person's name. This information should be stored in the image itself and indexed by fscrawler, so that with a few ES queries I can search for duplicate images, OCR'ed text, and the set of images in which a given person appears.
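The four tag names appear to match the hash types of the Python ImageHash package (`average_hash`, `phash`, `dhash`, `whash`). As an illustration of the duplicate-search idea only, here is a sketch of the simplest one, an average hash over an already downscaled grayscale grid, plus Hamming distance and an Elasticsearch search body that lists hash values shared by two or more documents. The field name passed to `duplicate_hash_agg` is hypothetical and must be a `keyword` field, and `file.filename` is assumed to be where FSCrawler keeps the file name:

```python
from typing import Any, Dict, Sequence


def average_hash(gray: Sequence[Sequence[int]]) -> str:
    """Average hash (aHash) of a downscaled grayscale grid, as a hex string."""
    pixels = [p for row in gray for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        # One bit per pixel: 1 if brighter than the mean, else 0
        bits = (bits << 1) | (1 if p > mean else 0)
    width = (len(pixels) + 3) // 4  # hex digits needed for all bits
    return f"{bits:0{width}x}"


def hamming(a: str, b: str) -> int:
    """Number of differing bits between two equal-length hex hashes."""
    return bin(int(a, 16) ^ int(b, 16)).count("1")


def duplicate_hash_agg(field: str, size: int = 100) -> Dict[str, Any]:
    """ES search body: hash values occurring in 2+ documents (exact duplicates)."""
    return {
        "size": 0,
        "aggs": {
            "dupes": {
                "terms": {"field": field, "min_doc_count": 2, "size": size},
                "aggs": {"files": {"top_hits": {"size": 10, "_source": ["file.filename"]}}},
            }
        },
    }
```

The 8x8 grid would typically come from something like Pillow's `convert("L")` followed by `resize((8, 8))`. Note that a `terms` aggregation only finds identical hashes; near-duplicates (a small Hamming distance) would still need a client-side comparison such as `hamming()` above.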
Many thanks for your time and help