I'm working on a model where I will have the input data in Avro format
and index the data into ES.
Before indexing I would like to add categories to the data, i.e.
auto-classification.
E.g. the data contains books. Each book will have a field such as topics or text.
Now before indexing I would like to assign categories to it. Say a
particular book is about Hadoop, servlets, etc. There may be any number of
categories assigned.
For assigning categories I will have a dictionary in text format.
Any suggestions as to how I can proceed?
Is there any plugin available for this?
There are no existing plugins for classification. It would be great to see
an Elasticsearch wrapper around a library like Weka or LingPipe. There are
many variables involved, so it would be tough to get a general classifier to
work for everybody.
If you already have auto-classification running, I would keep the logic on
the client-indexing side.
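For what it's worth, keeping the logic on the client side with a text dictionary could look roughly like the sketch below. The dictionary layout (one "category: term, term" entry per line), the field names and the helper names are all assumptions made for illustration, and the matching is plain single-word lookup rather than a trained classifier.

```python
import re

# Hypothetical dictionary format: one "category: term, term, ..." entry per line.
DICTIONARY_TEXT = """\
hadoop: hadoop, mapreduce, hdfs
servlets: servlet, jsp, tomcat
"""

def load_dictionary(text):
    """Parse the dictionary text into {category: set of lowercase terms}."""
    dictionary = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        category, _, terms = line.partition(":")
        dictionary[category.strip()] = {
            t.strip().lower() for t in terms.split(",") if t.strip()
        }
    return dictionary

def categorize(doc, dictionary, fields=("topics", "text")):
    """Return a copy of doc with a sorted 'categories' list added."""
    words = set()
    for field in fields:
        words.update(re.findall(r"[a-z0-9]+", str(doc.get(field, "")).lower()))
    # A category is assigned when any of its terms occurs in the document.
    categories = sorted(c for c, terms in dictionary.items() if terms & words)
    return {**doc, "categories": categories}

book = {"title": "Example book", "text": "Running MapReduce jobs from a servlet"}
tagged = categorize(book, load_dictionary(DICTIONARY_TEXT))
print(tagged["categories"])
```

The tagged document can then be sent to Elasticsearch by whatever indexing client you already use. Multi-word terms and stemming are not handled here and would need extra work.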
--
You received this message because you are subscribed to the Google
Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it,
send an email to elasticsearch+unsubscribe@googlegroups.com
<mailto:elasticsearch%2Bunsubscribe@googlegroups.com>.
For more options, visit https://groups.google.com/groups/opt_out.
Another idea could be to use the percolator feature.
Before indexing a doc, percolate it and find the list of matching queries (percolator names).
Add this list to your document as tags or categories.
Index this doc.
That said, you'll probably have to register many percolators.
My 2 cents.
--
David
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
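The percolate-then-index steps described above can be sketched as follows. This is only an outline under assumptions: the percolate request is taken to carry the document under a "doc" key, the response is taken to list matches either as plain query names or as objects with an "_id" key (this differs between Elasticsearch versions, so check the one you run), and `fake_percolate` is a hypothetical stand-in for the real HTTP call so that the example runs on its own.

```python
def extract_query_names(percolate_response):
    """Collect matched query names from a percolate response.

    Assumption: 'matches' holds either plain names or objects with an
    '_id' key, depending on the Elasticsearch version.
    """
    names = []
    for match in percolate_response.get("matches", []):
        names.append(match["_id"] if isinstance(match, dict) else match)
    return names

def tag_document(doc, percolate):
    """Percolate doc, then return a copy carrying the matches as categories."""
    response = percolate({"doc": doc})
    return {**doc, "categories": sorted(extract_query_names(response))}

def fake_percolate(body):
    """Hypothetical stand-in for the HTTP call: one 'query' per category."""
    registered = {"hadoop": "hadoop", "servlets": "servlet"}
    text = body["doc"]["text"].lower()
    return {"matches": [name for name, term in registered.items() if term in text]}

doc = {"title": "Example book", "text": "Hadoop jobs triggered from a servlet"}
tagged = tag_document(doc, fake_percolate)
print(tagged["categories"])
```

In a real setup `fake_percolate` would be replaced by a call to the percolate endpoint, with one registered query per category, and `tagged` would then be indexed as usual.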
On 5 Apr 2013 at 22:25, Ivan Brusic ivan@brusic.com wrote:
There are no existing plugins for classification. It would be great to see an Elasticsearch wrapper around a library like Weka or LingPipe. There are many variables involved, so it would be tough to get a general classifier to work for everybody.
If you already have auto-classification running, I would keep the logic on the client-indexing side.
The OpenNLP plugin does not do classification, only named entity
recognition. I merely wrote it as an exercise in converting parts of
the excellent "Taming Text" book into the Elasticsearch architecture. Loading
the models requires half a gig of additional memory, IIRC. This is
definitely something you would rather have in an external
application.
But maybe it is a nice boilerplate for further work, like a classification
module. I need to take a closer look at the Lucene classification package; maybe it
is easier to implement.
--Alex
On Mon, Apr 8, 2013 at 2:56 PM, Borislav Gankov bgankov@gmail.com wrote:
I perform automatic classification using the "fuzzy like this" query. I
search for documents similar to the one I am indexing and give it the
same classification as those documents. I tried this technique and it works
nicely for me, but at first you will have to classify some documents
manually to be used as learning data.
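A rough sketch of this similar-documents approach is below. The query body uses the `fuzzy_like_this` query with its `fields` and `like_text` parameters as found in Elasticsearch releases of that era (the query was removed in later versions). The voting rule, the `categories` field name and the sample hits are assumptions added for illustration, not details from the post above.

```python
from collections import Counter

def build_flt_query(text, fields=("text",), size=10):
    """Build a search body that asks for documents similar to `text`."""
    return {
        "size": size,
        "query": {"fuzzy_like_this": {"fields": list(fields), "like_text": text}},
    }

def vote_categories(hits, min_share=0.5):
    """Keep categories carried by at least min_share of the similar documents."""
    if not hits:
        return []
    counts = Counter()
    for hit in hits:
        # Count each category once per hit, even if it is listed twice.
        counts.update(set(hit["_source"].get("categories", [])))
    return sorted(c for c, n in counts.items() if n / len(hits) >= min_share)

# Made-up hits in the shape of a search response's hits.hits array.
sample_hits = [
    {"_source": {"categories": ["hadoop", "java"]}},
    {"_source": {"categories": ["hadoop"]}},
    {"_source": {"categories": ["servlets", "java"]}},
    {"_source": {"categories": ["hadoop"]}},
]
print(vote_categories(sample_hits))
```

The body from `build_flt_query` would be sent to the search endpoint of the index holding the manually classified documents, and the returned hits passed to `vote_categories` before indexing the new document.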
Hello, I was looking at the document classification problem and found that Solr 6.1 (not yet released) has a "Document Classification" feature. It's based on the "Lucene Classification Module".
More details in this blog post:
I tried to find if the equivalent feature exists in Elasticsearch, but it doesn't seem so.
This topic also mentions that "Lucene 4.2 has started a simple classification package", but the topic itself is 865 days old.
Do you know if there are any plans to integrate the "Lucene Classification Module" into Elasticsearch?