Bounding boxes in images

I am interested in indexing metadata about images. In particular, what's the best way to store and search image tags with bounding boxes? For example, a single image could have many items tagged in it, e.g. a person, a car, a face, an object. Each tag comes with a bounding box defining where in the image it appears.
For example:
{
  "BoundingBox": {
    "Height": 0.05817228928208351,
    "Left": 0.4477301836013794,
    "Top": 0.633576512336731,
    "Width": 0.12381361424922943
  },
  "Confidence": 94.39461517333984
}
I am wondering what's the best way to index this so I can search for a tag at a particular location in the image. Is it possible to re-use any of the geo_bounding_box logic to do positional search?

//cc: @cbuescher

Hi Gethin,

Just to let you know, I haven't forgotten our f2f conversation about this, but all the solutions we talked about have one problem or another. To rephrase, in my own words, the problem from our chat that you are trying to solve:

You are indexing a dataset of images that you run image detection on. As a result, for each image you get N objects that are part of the main image, together with the bounding box information you mentioned.

The query pattern we talked about was "Give me all images with Object A and Object B in it, and A should be left/right/... of B".

The first part is relatively easy: storing all object ids (e.g. their names) in a multi-valued field in the parent image allows filtering for all images containing both objects A and B. But the additional constraint on the spatial relation requires some calculation (e.g. A.left < B.left).
I played around with storing the objects both as plain "inner" objects and as nested objects, but in both cases the positional filtering I tried with Painless scripts wasn't possible, because scripted filters don't allow access to inner/nested objects (one issue around this is https://github.com/elastic/elasticsearch/issues/23719). Access to the document "_source", which would allow pulling out the positional properties, isn't currently possible either, but there is some debate about making it possible even though it would potentially be slow.

I'm still toying around with another idea to retrieve the documents matching the query above with two consecutive queries, but haven't got a working example yet. It's an interesting puzzle.

About re-using geo_bounding_box or similar geo queries: there is some discussion going on about making them work on generic coordinate sets but this wouldn't really solve the issue at hand here. They would allow for querying all documents that contain object A in a bbox that is "left of" some particular Object B, but not all pairs of them that exist in all images, I think. For this I think you would still need some scripting (which currently doesn't seem to work due to the limitations mentioned above).

Will keep you updated if I find a two-phase approach though.

I also just learned about two open issues about enabling indexing and searching of general cartesian geometries, one in Lucene and the follow-up in Elasticsearch. They might be worth watching, although in your case this still leaves the problem of having to compare the locations of two or more objects in the same image.

Thanks for continuing to look into this. I will keep an eye on the various tickets.

Sorry, it's been a while, but I kept this in the back of my mind and recently got another idea. It's a bit "hacky" and nowhere near ready, but I thought I might as well share it because the direction should be clear. The idea is to:

  • store one document per image
  • include one multi-valued field with the ids of the detected objects (called "detected_objects" in the example below; maybe use some space-efficient short version of the ids)
  • alongside the other image properties and the above field, store a simple inner object under each detected object id that includes the bbox information
  • filter for images containing both objects "a" and "b" (or more) using match clauses in a filter on the "detected_objects" field
  • add another script filter that accesses the bbox information for each required object, using the object id in the path to look up the corner values
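
To make the document layout from the list above concrete, here is a minimal Python sketch that turns detector output into one such document per image. It assumes each detection carries an "Id" key next to its "BoundingBox"; that key name and the function build_image_doc are made up for illustration, and "detected_objects" is the field name used in the example documents in this post.

```python
from typing import Any, Dict, List


def build_image_doc(detections: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Build one document per image from a list of detections.

    Each detection is assumed to look like
    {"Id": "a", "BoundingBox": {"Height": ..., "Left": ..., "Top": ..., "Width": ...}}.
    "Id" is a hypothetical key; the detector output quoted in the question
    only shows "BoundingBox" and "Confidence".
    """
    doc: Dict[str, Any] = {"detected_objects": []}
    for det in detections:
        obj_id = det["Id"]
        # Rejects duplicate ids and ids that would clash with an existing
        # top-level field such as "detected_objects".
        if obj_id in doc:
            raise ValueError(f"duplicate or reserved object id: {obj_id!r}")
        doc["detected_objects"].append(obj_id)
        box = det["BoundingBox"]
        # Store the bbox as a plain inner object under the object id.
        doc[obj_id] = {key: box[key] for key in ("Height", "Left", "Top", "Width")}
    return doc


if __name__ == "__main__":
    sample = [
        {"Id": "a", "BoundingBox": {"Height": 0.05, "Left": 0.4, "Top": 0.6, "Width": 0.1}},
        {"Id": "b", "BoundingBox": {"Height": 0.05, "Left": 0.6, "Top": 0.6, "Width": 0.1}},
    ]
    print(build_image_doc(sample))
```

Because every object id becomes its own top-level field, ids have to be unique within an image, so two detections of the same kind of object would need distinct ids.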

Here is a simplified example of what I mean. The query should return document 1. The script models some kind of "a left of b" relationship, meaning that the left+width value of the "a" object must be smaller than the left value of the "b" object.

Caveats:

  • I don't know how performant this is on larger datasets, but hopefully the first filter on the "detected_objects" field narrows the search space down enough for the script to not have to iterate over too many docs.
  • real-world locality relationships might be harder to express, but it should be doable
  • you need the object ids when you generate the query (maybe the script can be parameterized, though) in order to use the right "lookup" path for the inner objects

There are a lot of edge cases to consider, e.g. this simple script breaks badly if any of the docs matching the first two filters is missing one of the inner objects named in its "detected_objects" field. But it should be possible to manage those cases somehow.
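
One possible way to address the last caveat and the missing-object edge case together is to generate the query from the object ids and pass them to the script as params. The Python sketch below only builds the request body; the function name build_left_of_query and the guard logic are my own assumptions, and the Painless source has not been run against a cluster, so treat it as a starting point rather than a tested solution.

```python
import json
from typing import Any, Dict

# Painless source. The object ids arrive via params, and the script returns
# false instead of throwing when one of the bbox fields is unmapped or empty.
LEFT_OF_SCRIPT = """
for (def f : [params.a + '.Left', params.a + '.Width', params.b + '.Left']) {
  if (!doc.containsKey(f) || doc[f].size() == 0) {
    return false;
  }
}
return doc[params.a + '.Left'].value + doc[params.a + '.Width'].value
       < doc[params.b + '.Left'].value;
"""


def build_left_of_query(a_id: str, b_id: str) -> Dict[str, Any]:
    """Build a search body for "a_id is left of b_id" in the same image."""
    return {
        "query": {
            "bool": {
                "filter": [
                    # Cheap filters first: both objects must be listed.
                    {"match": {"detected_objects": a_id}},
                    {"match": {"detected_objects": b_id}},
                    # Positional check on the remaining documents.
                    {
                        "script": {
                            "script": {
                                "lang": "painless",
                                "source": LEFT_OF_SCRIPT,
                                "params": {"a": a_id, "b": b_id},
                            }
                        }
                    },
                ]
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_left_of_query("a", "b"), indent=2))
```

Keeping the script source constant and varying only params should also let Elasticsearch reuse the compiled script across different object pairs.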

Hope this gives you some direction to play with. I'm surprised this works at all and is so short. Maybe I missed an important constraint, but there you go :wink:

PUT test/_doc/1
{
  "detected_objects" : [ "a", "b" ],
  "a" :  {
    "Height": 0.05,
    "Left": 0.4,
    "Top": 0.6,
    "Width" : 0.1
  },
  "b" :  {
    "Height": 0.05,
    "Left": 0.6,
    "Top": 0.6,
    "Width" : 0.1
  }
}

PUT test/_doc/2
{
  "detected_objects" : [ "a", "b" ],
  "a" :  {
    "Height": 0.05,
    "Left": 0.6,
    "Top": 0.6,
    "Width" : 0.1
  },
  "b" :  {
    "Height": 0.05,
    "Left": 0.4,
    "Top": 0.6,
    "Width" : 0.1
  }
}

PUT test/_doc/3
{
  "detected_objects" : [ "a", "c" ],
  "a" :  {
    "Height": 0.05,
    "Left": 0.4,
    "Top": 0.6,
    "Width" : 0.1
  },
  "c" :  {
    "Height": 0.05,
    "Left": 0.6,
    "Top": 0.6,
    "Width" : 0.1
  }
}

Document 4 shouldn't match currently, because b.Left sits right on a.Left + a.Width, but you can play with it by "moving" that value:

PUT test/_doc/4
{
  "detected_objects" : [ "a", "b" ],
  "a" :  {
    "Height": 0.05,
    "Left": 0.4,
    "Top": 0.6,
    "Width" : 0.1
  },
  "b" :  {
    "Height": 0.05,
    "Left": 0.5,
    "Top": 0.6,
    "Width" : 0.1
  }
}

GET /test/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "match": {
            "detected_objects": "a"
          }
        },
        {
          "match": {
            "detected_objects": "b"
          }
        },
        {
          "script": {
            "script": {
              "source": """
                return doc['a.Left'].value + doc['a.Width'].value < doc['b.Left'].value;
              """,
              "lang": "painless"
            }
          }
        }
      ]
    }
  }
}
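
As a quick sanity check of which documents the query above is expected to return, the same "a left of b" condition can be evaluated locally over the four sample documents. This is only a plain Python simulation of the three filters, not something that talks to Elasticsearch; the helper names are made up, and values sitting exactly on the boundary (as in document 4) may behave differently depending on the numeric type of the indexed fields.

```python
from typing import Any, Dict, List


def box(left: float) -> Dict[str, float]:
    """All sample boxes share height, top and width; only Left varies."""
    return {"Height": 0.05, "Left": left, "Top": 0.6, "Width": 0.1}


# The four sample documents from this post, keyed by document id.
SAMPLE_DOCS: Dict[str, Dict[str, Any]] = {
    "1": {"detected_objects": ["a", "b"], "a": box(0.4), "b": box(0.6)},
    "2": {"detected_objects": ["a", "b"], "a": box(0.6), "b": box(0.4)},
    "3": {"detected_objects": ["a", "c"], "a": box(0.4), "c": box(0.6)},
    "4": {"detected_objects": ["a", "b"], "a": box(0.4), "b": box(0.5)},
}


def left_of(doc: Dict[str, Any], a: str, b: str) -> bool:
    """Mirror the two match filters plus the script filter."""
    objects = doc["detected_objects"]
    if a not in objects or b not in objects:
        return False
    # Right edge of a must be strictly left of the left edge of b.
    return doc[a]["Left"] + doc[a]["Width"] < doc[b]["Left"]


def matching_ids(docs: Dict[str, Dict[str, Any]], a: str, b: str) -> List[str]:
    return [doc_id for doc_id, doc in docs.items() if left_of(doc, a, b)]


if __name__ == "__main__":
    print(matching_ids(SAMPLE_DOCS, "a", "b"))  # prints ['1']
```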