How to add metadata to an image such that it gets indexed by fscrawler?

Hi all,

Using exiftool, I add a custom XMP field to a JPG via

exiftool -config exif.hash.conf -xmp-xmp:Test="T1" data/images/13-Moss-Mushrooms-1.jpg

where exif.hash.conf contains

%Image::ExifTool::UserDefined = (
    'Image::ExifTool::XMP::xmp' => {
        ImageHashAverage => { Writable => 'string' },
        ImageHashPerception => { Writable => 'string' },
        ImageHashDifference => { Writable => 'string' },
        ImageHashWavelet => { Writable => 'string' },
        Test => { Writable => 'string' },
    },
);

1;

exiftool is able to read the tag, e.g. with

exiftool 13-Moss-Mushrooms-1.jpg | grep -i t1
Test                            : T1
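
To double-check, independently of exiftool, that the value is physically stored in the file, one option is to look for the XMP packet in the raw bytes. XMP is embedded as an XML text packet that normally contains an x:xmpmeta element, so a plain byte search is often enough. The following is only a minimal sketch under that assumption: the function names are made up for illustration, it does not parse JPEG segments, and it ignores edge cases such as extended XMP split across several segments.

```python
from typing import Optional

# Markers that normally delimit an embedded XMP packet (assumption: the
# writer emitted an x:xmpmeta wrapper element, which is the common case).
XMP_START = b"<x:xmpmeta"
XMP_END = b"</x:xmpmeta>"


def extract_xmp_packet(data: bytes) -> Optional[str]:
    """Return the first XMP packet found in the raw file bytes, or None."""
    start = data.find(XMP_START)
    if start == -1:
        return None
    end = data.find(XMP_END, start)
    if end == -1:
        return None
    # XMP embedded in JPEG files is UTF-8; undecodable bytes are replaced.
    return data[start:end + len(XMP_END)].decode("utf-8", errors="replace")


def xmp_contains(data: bytes, needle: str) -> bool:
    """Return True if the embedded XMP packet contains the given text."""
    packet = extract_xmp_packet(data)
    return packet is not None and needle in packet
```

Reading the image with open(path, "rb") and passing the bytes to xmp_contains(data, "T1") would then show whether the value sits in a regular XMP packet. If it does, the tag is stored in the standard place, and the open question is whether the indexing side extracts it.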

But when I run fscrawler --loop 1 --config_dir /jobs/images/config --restart images in my Docker container, I get a bunch of raw metadata but none of my custom fields.
Here is my _settings.yaml:

name: "images"
fs:
  url: "/tmp/es/images"
  update_rate: "15m"
  includes:
    - "*/*.jpg"
    - "*/*.png"
    - "*/*.tif"
    - "*/*.tiff"
  #excludes:
  #  - "*/resume*"
  #filters:
  #  - ".*foo.*"
  # error from fscrawler: cannot support both json and xml
  #json_support: true
  #xml_support: true
  add_as_inner_object: true
  #index_folders: false
  attributes_support: true
  raw_metadata: true
  #add_filesize: false
  remove_deleted: true
  continue_on_error: true
  lang_detect: true
  store_source: false
  #indexed_chars: "100000"
  #ignore_above: "512mb"
  # required
  index_content: true
  #indexed_chars: 0
  checksum: "SHA-1"
  #follow_symlink: true

elasticsearch:
  nodes:
    - url: "http://elasticsearch:9200"
  index: "images"
  index_folder: "images"

Sorry if this is somewhat off-topic, but should fscrawler be able to index my metadata? Or does exiftool store the tags in some custom way that fscrawler does not recognize? Is there a standard command-line tool to get this working, and can anyone provide an example of how to create a custom field in an image (JPG, PNG and/or PDF) that fscrawler should then index?

My goal is to add a set of perceptual hashes to each image. I also want users to be able to mark faces in an image, including the coordinates and the name of the person. This information should be stored in the image itself and indexed by fscrawler. Finally, with a few ES queries, I'd like to search for duplicate images, OCR'ed text, and the set of images in which a given person appears.
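
For the duplicate search, the kind of query I have in mind could look roughly like the sketch below. It builds an Elasticsearch search body with a terms aggregation that only keeps hash values shared by at least two documents. Several things here are assumptions: the helper name is made up, the field name passed in is hypothetical (I do not know under which key, if any, the custom tag would end up in the index), and the field would need to be mapped so that it can be aggregated on, e.g. as a keyword. It also only finds identical hashes, not near-duplicates.

```python
from typing import Any, Dict


def duplicate_hash_query(hash_field: str, max_groups: int = 100) -> Dict[str, Any]:
    """Build a search body that groups documents by an exact hash value.

    hash_field is the name of the indexed field holding the hash; the
    caller has to supply the real name, none is assumed here.
    """
    return {
        "size": 0,  # no individual hits, only the aggregation buckets
        "aggs": {
            "duplicate_hashes": {
                "terms": {
                    "field": hash_field,
                    "min_doc_count": 2,  # keep values shared by 2+ documents
                    "size": max_groups,  # upper bound on returned buckets
                }
            }
        },
    }
```

The resulting dictionary would be sent as the JSON body of a search request against the images index, for example duplicate_hash_query("meta.raw.ImageHashAverage"), where that field name is purely illustrative.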

Many thanks for your time and help

Interesting.

I don't know (yet) whether Tika (which FSCrawler uses behind the scenes) is able to extract that information.

Could you open an FSCrawler GitHub issue with a sample file attached and a description of the metadata you would like to have indexed?
Similar to what you did here, but with the file attached. :wink:

Thanks!
