Newbie questions

Hi!

I just started experimenting with ElasticSearch and everything is still
very overwhelming.

So I was hoping somebody could point me in the right direction with
these questions.

  1. My documents contain *Url* and *Content* fields. I have two lists, one
    containing ~1000 domain names and another with ~5000 words/phrases, that I
    would like to act as stop words. For example, when I run a search and a
    document's Url or Content contains any of these excluded terms, I don't
    want that document returned in the search results.

  2. What is the best way to search for numeric values within text? For
    example, I have the text "Facebook Now Has 1.15 Billion Monthly Active Users".
    I would like to search for 1 500 000 000, and also for ranges such as
    1 500 000 000 - 2 000 000 000. Can this be done within a text field using some
    kind of special number analyzer and tokenizer? Or should I extract all
    numeric values first, store them as an array in a separate field, and then
    use a numeric range filter?
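
A minimal sketch of the second option in question 2 (extract the numbers before indexing, then filter by range), written in Python. Everything named here is an assumption for illustration: the `numbers` field, the helper functions, and the small table of scale words are hypothetical, and the regular expression only covers simple English forms such as "1.15 Billion" or "1 500 000 000".

```python
import re
from decimal import Decimal

# Scale words mapped to multipliers (illustrative and incomplete).
SCALES = {"thousand": 10**3, "million": 10**6, "billion": 10**9}

# Group 1: integer part, either grouped in threes ("1 500 000") or plain digits.
# Group 2: optional decimal part. Group 3: optional scale word.
NUMBER_RE = re.compile(
    r"(\d{1,3}(?:[ ,]\d{3})+|\d+)(\.\d+)?(?:\s*(thousand|million|billion)s?\b)?",
    re.IGNORECASE,
)


def extract_numbers(text):
    """Return every numeric value found in text as a float."""
    values = []
    for match in NUMBER_RE.finditer(text):
        digits = re.sub(r"[ ,]", "", match.group(1))  # drop thousands separators
        value = Decimal(digits + (match.group(2) or ""))
        scale = match.group(3)
        if scale:
            value *= SCALES[scale.lower()]
        values.append(float(value))
    return values


def build_document(url, content):
    """Document body with the extracted values in a separate array field."""
    return {"Url": url, "Content": content, "numbers": extract_numbers(content)}


def range_query(low, high):
    """Search body for a range query on the hypothetical `numbers` field."""
    return {"query": {"range": {"numbers": {"gte": low, "lte": high}}}}


doc = build_document(
    "http://example.com/news",
    "Facebook Now Has 1.15 Billion Monthly Active Users",
)
query = range_query(1_000_000_000, 2_000_000_000)
```

With this approach the headline above is indexed with the value 1 150 000 000 in `numbers`, so it would match a range of 1 000 000 000 - 2 000 000 000 but not an exact search for 1 500 000 000.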

Thanks!
Janno

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Just found the OpenNLP plugin for Elasticsearch. It can detect entities like
money, location, date, etc. and store them in separate fields for filtering.

Janno

Hey,

First, have you seen the uax_url_email tokenizer?
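
For context, that tokenizer behaves like the standard tokenizer but keeps URLs and email addresses as single tokens. A rough sketch of index settings that use it, written as a Python dict so it can be serialized to JSON; the analyzer name `url_analyzer` is made up for illustration, and the field mapping is left out because its syntax depends on the Elasticsearch version.

```python
import json

# Index settings defining a custom analyzer around the uax_url_email
# tokenizer. "url_analyzer" is a hypothetical name chosen for this example.
index_body = {
    "settings": {
        "analysis": {
            "analyzer": {
                "url_analyzer": {
                    "type": "custom",
                    "tokenizer": "uax_url_email",
                    "filter": ["lowercase"],
                }
            }
        }
    }
}

# Serialize to the JSON body that would be sent when creating the index.
print(json.dumps(index_body, indent=2))
```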

Also, if you do not want certain documents returned in your search
results, it might make more sense not to index them at all.
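
A minimal sketch of such a pre-indexing check in Python. The function names are hypothetical, and it assumes documents are dicts with `Url` and `Content` keys, that URLs include a scheme such as `http://`, and that the excluded phrases start and end with word characters.

```python
import re
from urllib.parse import urlparse


def build_exclusion_check(excluded_domains, excluded_phrases):
    """Return a function that tells whether a document should be skipped."""
    domains = {d.lower() for d in excluded_domains}
    # One case-insensitive pattern matching any excluded phrase on word
    # boundaries; None when the phrase list is empty.
    phrase_re = None
    if excluded_phrases:
        alternatives = "|".join(re.escape(p) for p in excluded_phrases)
        phrase_re = re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)

    def is_excluded(doc):
        host = (urlparse(doc["Url"]).hostname or "").lower()
        parts = host.split(".")
        # Compare the host and each parent domain against the domain list,
        # so "spam.example.com" is caught by an "example.com" entry.
        if any(".".join(parts[i:]) in domains for i in range(len(parts))):
            return True
        return bool(phrase_re and phrase_re.search(doc["Content"]))

    return is_excluded


is_excluded = build_exclusion_check(["example.com"], ["buy now"])
docs = [
    {"Url": "http://spam.example.com/page", "Content": "hello"},
    {"Url": "http://ok.org/", "Content": "Please BUY NOW!"},
    {"Url": "http://ok.org/about", "Content": "nothing to see"},
]
# Only documents that pass the check would be sent to the index.
to_index = [d for d in docs if not is_excluded(d)]
```

The trade-off is that changing either list later means reindexing, whereas a query-time exclusion would pick up the change immediately.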

A short note about the OpenNLP plugin: I wrote it merely as a trial
balloon to find out whether I could. There are several reasons why NLP is
better suited to a pre-indexing step (at least the way I implemented it;
there are deeper Lucene integrations, like the UIMA one, where integrating
it into Elasticsearch might be useful):

a) The model takes up a lot of RAM, and that cost is duplicated on each node.
b) You have to shut down your cluster if you want to update your model.
c) You have to shut down your cluster if you want to update your OpenNLP
libraries.

Last, I don't think the OpenNLP plugin helps with your second requirement
(at least I did not intend it to) ;-)

--Alex

On Sat, Aug 3, 2013 at 3:55 PM, Janno Järv jannojarv@gmail.com wrote:
