Auto Classification

I'm working on a model where I will have the input data in Avro format
and index the data into ES.
Before indexing, I would like to add categories to the data, i.e.
auto-classification.

E.g., the data contains books. Each book will have a field, topics or text.
Before indexing, I would like to assign categories to it. Say a
particular book is about Hadoop, servlets, etc.; any number of
categories may be assigned.

For assigning categories I will have a dictionary in text format.

Any suggestions as to how I can proceed?
Is there any plugin available for this?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

There are no existing plugins for classification. It would be great to see
an Elasticsearch wrapper around a library like Weka or LingPipe. There are
many variables involved, so it would be tough to get a general classifier to
work for everybody.

If you already have auto-classification running, I would keep the logic on
the client-indexing side.
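
As a rough illustration of keeping the logic on the client side, a dictionary-based tagger could look like the sketch below. The dictionary layout (one "category: term, term" line per category), the function names, and the use of the "topics" and "text" fields are assumptions made for illustration; none of this is an Elasticsearch API, and only single-word terms are matched.

```python
import re
from typing import Dict, List, Set


def parse_dictionary(text: str) -> Dict[str, Set[str]]:
    """Parse lines of the assumed form 'category: term1, term2'."""
    categories: Dict[str, Set[str]] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, _, terms = line.partition(":")
        categories[name.strip()] = {
            t.strip().lower() for t in terms.split(",") if t.strip()
        }
    return categories


def tokenize(text: str) -> Set[str]:
    """Lower-case the text and split it into alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def categorize(doc: dict, categories: Dict[str, Set[str]],
               fields=("topics", "text")) -> List[str]:
    """Return every category whose dictionary shares a token with the doc."""
    tokens: Set[str] = set()
    for field in fields:
        value = doc.get(field, "")
        if isinstance(value, (list, tuple)):
            value = " ".join(value)
        tokens |= tokenize(value)
    return sorted(name for name, terms in categories.items() if terms & tokens)


# Hypothetical dictionary contents and document, for illustration only.
DICTIONARY = """
# category: terms
hadoop: hadoop, mapreduce, hdfs
servlets: servlet, servlets, jsp
"""

cats = parse_dictionary(DICTIONARY)
book = {"title": "Example book",
        "text": "Running MapReduce jobs behind a servlet front end"}
book["categories"] = categorize(book, cats)
print(book["categories"])
```

The enriched document, now carrying a "categories" field, would then be sent to Elasticsearch with whatever client is already used for indexing.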

--
Ivan

In reply to kuwar sahani kuwarsahani@gmail.com, Fri, Apr 5, 2013 at 4:28 AM.

Lucene 4.2 has started a simple classification package

http://lucene.apache.org/core/4_2_0/classification/org/apache/lucene/classification/package-summary.html

Lucene 5.0 will get a classification module; I'm sure it will be usable
through the Elasticsearch API then.

https://issues.apache.org/jira/browse/LUCENE-4345

Jörg

In reply to Ivan Brusic, 05.04.13 22:25.

Another idea could be to use the percolator feature.
Before indexing a doc, percolate it to find the list of matching queries (percolator names).
Add this list to your document as tags or categories.
Index the doc.

That said, you'll probably have to register many percolators.
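
The percolate-then-index flow described above can be sketched in plain Python. This is only a stand-in that mimics the idea of matching a document against named, pre-registered queries; it does not call Elasticsearch, and the class, the helper functions, and the index_fn callback are all hypothetical names chosen for illustration.

```python
from typing import Callable, Dict, List

Query = Callable[[dict], bool]


class PercolatorSketch:
    """In-memory stand-in: named queries that are matched against a doc."""

    def __init__(self) -> None:
        self._queries: Dict[str, Query] = {}

    def register(self, name: str, query: Query) -> None:
        self._queries[name] = query

    def percolate(self, doc: dict) -> List[str]:
        """Return the names of all registered queries that match the doc."""
        return sorted(name for name, q in self._queries.items() if q(doc))


def text_contains(field: str, word: str) -> Query:
    """Build a query matching docs whose field contains the given word."""
    word = word.lower()
    return lambda doc: word in str(doc.get(field, "")).lower().split()


def index_with_categories(doc: dict, percolator: PercolatorSketch,
                          index_fn: Callable[[dict], None]) -> dict:
    """Percolate first, attach the matching names, then index the result."""
    enriched = dict(doc, categories=percolator.percolate(doc))
    index_fn(enriched)  # in a real setup this would be the indexing call
    return enriched


registry = PercolatorSketch()
registry.register("hadoop", text_contains("text", "hadoop"))
registry.register("servlets", text_contains("text", "servlets"))

indexed: List[dict] = []  # stands in for the index
result = index_with_categories({"text": "Hadoop in practice"},
                               registry, indexed.append)
print(result["categories"])
```

One query per category term is what makes the number of registered percolators grow quickly with the size of the dictionary.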

My 2 cents.

--
David :wink:
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

In reply to Ivan Brusic ivan@brusic.com, 5 Apr 2013 at 22:25.

I've come across an OpenNLP plugin, though I've never used it:
GitHub - spinscale/elasticsearch-opennlp-plugin: Additional opennlp mapping type for elasticsearch in order to perform named entity recognition

Hey,

The OpenNLP plugin does not do classification, only named entity
recognition. I mainly wrote it as an exercise in converting parts of
the excellent "Taming Text" book into an Elasticsearch architecture. Loading
the models requires about half a gigabyte of additional memory, IIRC, so this
is definitely something you would rather have in an external
application.

But maybe it is a nice boilerplate for further work, such as a classification
module. I need to take a closer look at the Lucene classification package;
maybe it is easier to implement.

--Alex

In reply to Borislav Gankov bgankov@gmail.com, Mon, Apr 8, 2013 at 2:56 PM.

I perform automatic classification using the "fuzzy like this" query. I
search for documents similar to the one I am indexing and give the new
document the same classification. I tried this technique and it works nicely
for me, but at first you will have to classify some documents manually to be
used as learning data.
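
The idea of classifying by similarity to already-labelled documents can be sketched without Elasticsearch. The example below swaps the "fuzzy like this" query for a simple Jaccard similarity over word sets and a majority vote among the k most similar documents; the function names, the value of k, and the training data are assumptions for illustration only.

```python
import re
from collections import Counter
from typing import List, Optional, Set, Tuple


def tokens(text: str) -> Set[str]:
    """Lower-case the text and split it into alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by the size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def classify(text: str, labelled: List[Tuple[str, str]],
             k: int = 3) -> Optional[str]:
    """Majority label among the k most similar manually labelled docs."""
    query = tokens(text)
    scored = [(jaccard(query, tokens(doc)), label) for doc, label in labelled]
    # Ignore documents that share no words with the query at all.
    nearest = sorted((s for s in scored if s[0] > 0),
                     key=lambda s: s[0], reverse=True)[:k]
    if not nearest:
        return None
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical, manually classified learning data.
TRAINING = [
    ("hadoop mapreduce cluster tuning", "hadoop"),
    ("hdfs and hadoop administration", "hadoop"),
    ("java servlets and jsp pages", "servlets"),
]

label = classify("tuning a hadoop cluster", TRAINING)
print(label)
```

A document that shares no words with the learning data gets no label, which is one reason some manual classification is needed up front.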


Hello, I was looking at the document classification problem and found that Solr 6.1 (not yet released) has a "Document Classification" feature. It's based on the "Lucene Classification Module".

More details in this blog post:

I tried to find if the equivalent feature exists in Elasticsearch, but it doesn't seem so.

This topic also mentions that "Lucene 4.2 has started a simple classification package", but the topic itself is 865 days old.

Do you know if there are any plans to integrate the "Lucene Classification Module" into Elasticsearch?