Categorizing images with deep learning into Elasticsearch


(Emmanuel Benazera) #1

Hi all,

I've built a short pipeline for using deep learning over images (e.g. for image categorization) and indexing them into Elasticsearch. I thought this could be of interest to other practitioners as well. In a nutshell, this allows you to tag and retrieve image documents via ES even when no caption is available.

On the technical side, a deep learning server directly pushes the image classification results into an instance of ES, so that no glue code is necessary.
See http://www.deepdetect.com/tutorials/es-image-classifier/ for a short tutorial, and https://github.com/beniz/deepdetect for the more generic deep learning server (that relies on the Caffe library).
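For a feel of what the coupling looks like, here is a minimal sketch in Python of a predict request that asks the server to push its output straight into ES. The service name, URLs and output-connector parameters below are illustrative assumptions based on the tutorial, not a definitive reference; check the tutorial link above for the exact current syntax:

```python
import json

ES_URL = "http://localhost:9200"  # assumed local Elasticsearch instance

def build_predict_payload(service, image_url, es_doc_url):
    """Build a DeepDetect-style /predict request body that asks the
    deep learning server to POST its classification output to the
    given Elasticsearch document URL, so no glue code sits between
    the two services."""
    return {
        "service": service,
        "data": [image_url],
        "parameters": {
            "output": {
                "best": 3,  # keep the top-3 categories
                # output connector pushing the result over HTTP
                # instead of returning it inline (assumed parameters)
                "network": {
                    "url": es_doc_url,
                    "http_method": "POST",
                },
            }
        },
    }

payload = build_predict_payload(
    "imageserv",                       # hypothetical service name
    "http://example.com/ambulance.jpg",
    ES_URL + "/images/img",
)
print(json.dumps(payload, indent=2))
```

The point of the sketch is the shape of the request: the classifier's output destination is just another parameter, which is why no intermediate code is needed.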

Many of the machine learning pipelines that I am involved with include an ES instance, so my guess is that the same is true for others as well.

This type of coupling is pretty generic and should capture a large set of common cases, from text classification to prediction based on a variety of features and data. A typical extension would be image similarity search, much more powerful than the existing LIRE plugin.

Let me know your thoughts and issues if any,

Thanks,

Em


(Isabel Drost-Fromm) #2

The examples in your tutorial look really nice - both the actual tagging and the ease of integrating with Elasticsearch.

[quote="beniz, post:1, topic:33217"]
On the technical side, a deep learning server directly pushes the image classification results into an instance of ES, so that no glue code is necessary.[/quote]

So essentially you are making those classification results easily available and searchable through ES, right? Sounds like a useful integration to me.

Just a random thought wrt. text classification in particular: Right now you are using ES as a sink for the data your model generates. There are quite a few text analysis and pre-processing capabilities that ES ships with. I'm wondering what it would take to integrate those with DeepDetect (and whether you think this would make any sense at all).

Thanks for sharing your interesting project,
Isabel


(Emmanuel Benazera) #3

So essentially you are making those classification results easily available and searchable through ES, right? Sounds like a useful integration to me.

This is correct.

Just a random thought wrt. text classification in particular: Right now you are using ES as a sink for the data your model generates. There are quite a few text analysis and pre-processing capabilities that ES ships with. I'm wondering what it would take to integrate those with DeepDetect (and whether you think this would make any sense at all).

Using ES as a source for prediction with DeepDetect is one of the envisioned next steps. It is actually an interesting idea to use text features internal to ES as a source for machine learning. That said, DeepDetect comes with a generic text connector that automatically computes reasonable text features for input to the neural nets.
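To make the "ES as a source" idea concrete, here is a small sketch of the glue it would take on the client side: pull documents from an ES search response and hand their raw text to a prediction service. The field name, service name and payload shape are illustrative assumptions, not part of either product:

```python
def es_hits_to_predict_payload(es_response, service="sentiment",
                               text_field="content"):
    """Turn the hits of an Elasticsearch search response into a
    predict payload for a raw-text input connector. Both the
    service name and the text field are hypothetical."""
    texts = [hit["_source"][text_field]
             for hit in es_response["hits"]["hits"]
             if text_field in hit["_source"]]
    return {"service": service, "data": texts}

# Example with an ES-style search response:
sample = {"hits": {"hits": [
    {"_source": {"content": "great product, works as advertised"}},
    {"_source": {"content": "arrived broken, very disappointed"}},
]}}
payload = es_hits_to_predict_payload(sample)
```

A tighter integration would push this loop server-side, which is what the envisioned next step amounts to.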

Em.


(Mark Walkom) #4

Awesome stuff!


(Isabel Drost-Fromm) #5

Oh - nice.

Sorry if it's listed in the docs - couldn't find the information: Which types of features do you support? Which languages do you cover?

Cheers,
Isabel


(Emmanuel Benazera) #6

Sorry if it's listed in the docs - couldn't find the information: Which types of features do you support?

Text features are hidden away since they are computed out of the raw text. This means you can pass raw text to the deep learning server in either training or prediction mode. I'm not sure about your familiarity with machine learning at large, so briefly: the training phase builds a model, e.g. for sentiment classification, while the prediction phase leverages that model to predict, for instance, the sentiment category of pieces of text. The text feature computation is all handled by the 'txt' connector to DeepDetect. The following may help:

Now, slightly more technical: the server computes text features that range from word counts to TF/IDF when needed. These features are known as BOW (bag of words), but others exist, such as word vectors (e.g. word2vec).

If you happen to already have features similar to the above (i.e. word counts, etc., maybe from ES itself), they can be used for training and prediction as well, but through a different connector (the 'csv' connector).
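For illustration, the kind of features described above can be sketched in a few lines of plain Python. This is a generic illustration of BOW and TF/IDF, not DeepDetect's actual implementation:

```python
import math
from collections import Counter

def bow(doc):
    """Bag-of-words: plain word counts for one document."""
    return Counter(doc.lower().split())

def tf_idf(docs):
    """TF/IDF over a small corpus: term frequency weighted by
    how rare the term is across documents, so ubiquitous words
    (which carry little signal) get a weight near zero."""
    n = len(docs)
    counts = [bow(d) for d in docs]
    df = Counter()                    # document frequency per word
    for c in counts:
        df.update(c.keys())
    return [{w: tf * math.log(n / df[w]) for w, tf in c.items()}
            for c in counts]

docs = ["the cat sat", "the dog sat", "the cat ran"]
weights = tf_idf(docs)
# "the" occurs in every document, so its weight collapses to zero,
# while rarer words like "dog" keep a positive weight.
```

Features of exactly this shape (word/weight pairs) are what could equally come out of ES and go in through a CSV-style connector.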

My apologies if this feels a bit hairy still, let me know if this is not clear enough.

Which languages do you cover?

Models can be trained for any language and task, really. The existing input text connector may not be appropriate for Chinese, Arabic, Korean and a few others, though. But as you mentioned, taking the raw features out of ES, which certainly supports a large variety of languages, would be a smart thing to do. This is something I'd need to look at in more detail.

FYI, at the moment I do not provide a list of models for generic tasks and/or languages. I'm in the process of doing so, typically for a range of image classification tasks, and it would be easy to do similar things for text.

If you have an application in mind that you can share, I'd be interested to know, typically in order to best target the range of generic models above.


(Isabel Drost-Fromm) #7

I spent a few years in the area of web spam classification based on textual and linkage features in a previous life. I also co-founded Apache Mahout at a time when there were barely any production-grade machine learning suites out there. Fortunately the latter has changed in the past few years...

Not hairy at all. You do a great job of balancing technical depth with simplicity, which I know is hard when writing for someone whose background one doesn't know exactly.

Yeah - those are the really tricky ones. Though there are a few caveats, e.g. when using the exact same processing pipeline developed for English text on something like German. Just one example that comes to mind: for the latter you might need a more sophisticated implementation for stemming words.

Are you re-using any text processing toolkits, e.g. OpenNLP, or is all the feature generation your own implementation?

Sorry - mostly asking out of pure personal interest here. Always nice to see ppl making classification easier for everyone else. Especially after having spent quite a bit of time on Apache Mahout I think I know how hard it is to make machine learning in general and classification in particular approachable for your average Joe Developer who easily gets scared when confronted with Greek letters :wink:

Isabel


(Emmanuel Benazera) #8

Also co-founded Apache Mahout at a time when there were barely any production grade machine learning suites out there

haha OK, great! In between research labs, I've spent a good chunk of the last six years applying and simplifying ML in various industries. I have a strong belief that ML is the next commodity on the stack. Just some people need to do it :wink:

For the latter you might need some more sophisticated implementation for stemming words. Are you re-using any text processing toolkits like e.g. OpenNLP or are all feature generation implementations your own implementation?

My experience from current applications of and experiments with Deep Learning (DL) for NLP is that some classical features (e.g. ngrams/BOW) yield the best results, and that sometimes TF/IDF is not even needed (though I still have a range of experiments to run before definitely confirming this last point). DeepDetect does not rely on OpenNLP and does simple parsing and counting, mostly because other connectors (e.g. 'csv') can handle any type of data, including features generated by OpenNLP. That said, building a direct input connector to OpenNLP sounds like a pretty good idea.

But there's another hidden reason: my current understanding, and somewhat belief, is that DL is a game changer and that the parsing/chunking etc. steps will disappear and be handled directly by the training phase. Recent work on character-level classification (1) does a great job summarizing the performance of a variety of schemes, including word vectors, BOW and ngrams. When and if character-level models take the lead, the text pipeline will be greatly simplified.
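To illustrate why character-level models remove the need for a text pipeline, here is a sketch of the character quantization idea from (1): each character in a fixed alphabet becomes a one-hot vector, so a document turns into a fixed-size matrix the network consumes directly, with no tokenization, stemming or chunking. The alphabet and length below are toy values, not the paper's exact configuration:

```python
# Toy alphabet; the paper uses a larger one including punctuation.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789 "

def quantize(text, max_len=16):
    """One-hot encode `text` character by character; characters
    outside the alphabet become all-zero rows, and the result is
    padded/truncated to a fixed length so every document has the
    same shape."""
    rows = []
    for ch in text.lower()[:max_len]:
        row = [0] * len(ALPHABET)
        if ch in ALPHABET:
            row[ALPHABET.index(ch)] = 1
        rows.append(row)
    while len(rows) < max_len:
        rows.append([0] * len(ALPHABET))
    return rows

matrix = quantize("deep nets")
```

Everything a word-level pipeline would do by hand (segmentation, normalization, feature choice) is left for the convolutional layers to learn from this raw encoding.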

FYI, over the years I must confess I've never witnessed stemming yield better results for any ML task, so it's not part of the current pipeline. This typically does not apply to indexing & searching.

(1) http://arxiv.org/abs/1509.01626


(Isabel Drost-Fromm) #9

We've certainly come a long way since I first touched neural networks back in 2001. I like the approach DeepDetect is taking on "give some data to my service - I'll figure out the rest". If I'm not mistaken, the part of Mahout that received the most adoption was the part that had gone to similar lengths wrt. automating things.

I remember hearing similar things in the past, but can't remember in which context - definitely was classification related IIRC.

The thing I remember is that when chaining pre-processing steps, errors essentially add up, so it's often better to find a way to encode as much as possible in the final model. So, without knowing much about DL (thanks for sharing the link by the way), what you describe might actually be a very interesting approach.

Cheers,
Isabel


(Emmanuel Benazera) #10

I like the approach DeepDetect is taking on "give some data to my service - I'll figure out the rest".

Thanks, yes. The rationale is that there are best practices in processing the features to squeeze the most out of a given ML technique. This is even truer for neural nets, which are very sensitive to un-scaled features etc. Having these steps automated amounts to passing the expertise and knowledge on to users while avoiding the pain!
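As a concrete example of the kind of step that gets automated, here is a sketch of feature standardization, the usual fix for the un-scaled-features sensitivity mentioned above (a generic illustration, not DeepDetect's own code):

```python
def standardize(column):
    """Z-score a feature column: zero mean, unit variance. Keeping
    all features on a comparable scale keeps the gradient updates
    of a neural net well-conditioned."""
    n = len(column)
    mean = sum(column) / n
    var = sum((x - mean) ** 2 for x in column) / n
    std = var ** 0.5 or 1.0  # guard against constant columns
    return [(x - mean) / std for x in column]

scaled = standardize([10.0, 20.0, 30.0])
# The scaled column is centered on zero with unit spread, whatever
# the original units were.
```

Baking steps like this into the server is exactly the "I'll figure out the rest" part: the user never has to know the heuristic existed.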

