Hey,
First, did you see the uax_url_email tokenizer?
http://www.elasticsearch.org/guide/reference/index-modules/analysis/uaxurlemail-tokenizer/
Also, if you do not want certain documents returned in your search
results, it might make more sense not to index them at all.
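To illustrate the "do not index them at all" idea, here is a minimal
sketch of a filter you could run over documents before sending them to
the index. Everything in it is an assumption made for illustration: the
should_index helper, the two tiny exclude lists (yours would hold the
~1000 domains and ~5000 phrases), and the plain substring matching on
lower-cased text. It only relies on the Url and Content field names from
your mail.

```python
from urllib.parse import urlparse

# Hypothetical exclude lists, standing in for the real ones.
EXCLUDED_DOMAINS = {"spam.example.com", "ads.example.org"}
EXCLUDED_PHRASES = ["buy now", "free trial"]


def should_index(doc):
    """Return False if the document's Url or Content hits an exclude list."""
    host = (urlparse(doc["Url"]).hostname or "").lower()
    # Build the host plus all of its parent domains, so that
    # "www.spam.example.com" is caught by the entry "spam.example.com".
    parts = host.split(".")
    candidates = {".".join(parts[i:]) for i in range(len(parts))}
    if candidates & EXCLUDED_DOMAINS:
        return False
    # Naive case-insensitive substring match over both fields.
    text = (doc["Url"] + " " + doc["Content"]).lower()
    return not any(phrase in text for phrase in EXCLUDED_PHRASES)


docs = [
    {"Url": "http://good.example.net/a", "Content": "Hello world"},
    {"Url": "http://www.spam.example.com/a", "Content": "Hello"},
    {"Url": "http://good.example.net/b", "Content": "Get your FREE trial today"},
]
to_index = [d for d in docs if should_index(d)]
```

Substring matching is crude (it also matches inside longer words), so
with real lists you would probably want to match on tokens instead.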
A short note about the OpenNLP plugin: I merely wrote it as a trial
balloon, to find out whether I could. There are several reasons why NLP
is more likely a pre-indexing step (at least the way I implemented it;
there are deeper Lucene integrations, like the UIMA one, where it might
be useful to integrate it into elasticsearch):
a) The model takes up a lot of RAM, and it is duplicated on each node.
b) You have to shut down your cluster if you want to update your model.
c) You have to shut down your cluster if you want to update your OpenNLP
libraries.
Last, I think the OpenNLP plugin does not help you with your second
requirement (at least I did not intend it to do that).
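For that second requirement, the "extract the numbers first, store them
in a separate field" route from your mail can be sketched as a
pre-indexing step. This is only a rough sketch under assumptions: the
extract_numbers helper, the small multiplier table, and the "numbers"
field name are all made up for illustration, and the regular expression
only handles plain digits with an optional thousand/million/billion
suffix.

```python
import re
from decimal import Decimal

# Hypothetical multiplier table; extend as needed.
MULTIPLIERS = {"thousand": 10**3, "million": 10**6, "billion": 10**9}

# A number such as "1.15", optionally followed by a scale word.
NUMBER_RE = re.compile(
    r"(\d+(?:\.\d+)?)\s*(thousand|million|billion)?\b", re.IGNORECASE
)


def extract_numbers(text):
    """Return all numeric values found in text as plain integers."""
    values = []
    for match in NUMBER_RE.finditer(text):
        # Decimal avoids float rounding when scaling values like 1.15.
        value = Decimal(match.group(1))
        if match.group(2):
            value *= MULTIPLIERS[match.group(2).lower()]
        values.append(int(value))
    return values


content = "Facebook Now Has 1.15 Billion Monthly Active Users"
# "numbers" is an assumed field name for the extracted values.
doc = {"Content": content, "numbers": extract_numbers(content)}

# A range clause over that assumed field, as a plain dict.
range_clause = {"range": {"numbers": {"gte": 1000000000, "lte": 2000000000}}}
```

With the values in their own numeric field, range searches no longer
depend on how the text itself is analyzed.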
--Alex
On Sat, Aug 3, 2013 at 3:55 PM, Janno Järv <jannojarv@gmail.com> wrote:
Just found the OpenNLP plugin for ElasticSearch - it can detect entities
like money, location, date, etc. and store them in separate fields for
filtering.
Janno
On Saturday, 3 August 2013 16:16:41 UTC+3, Janno Järv wrote:
Hi!
I just started experimenting with ElasticSearch and everything is still
very overwhelming.
So I was hoping that somebody could point me in the right direction with
these questions.
- My document contains *Url* and *Content* fields. I have two lists,
  one containing ~1000 domain names and another with ~5000 words/phrases,
  that I would like to act as stop words. For example, if I do a search
  and a document's Url or Content contains any of these excluded terms,
  I don't want it returned in the search results.
- What is the best way to accomplish numeric value search within text?
  For example, I have the text "Facebook Now Has 1.15 Billion Monthly
  Active Users". I would like to search for 1 500 000 000, and likewise
  for ranges, such as 1 500 000 000 - 2 000 000 000. Can it be done
  within a text field using some kind of special number analyzer and
  tokenizer? Or should I extract all numeric values first, store them as
  an array in a separate field, and then use a numeric range filter?
Thanks!
Janno
--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.