Newbie questions

Janno_Jarv · August 3, 2013, 1:16pm

Hi!

I just started experimenting with Elasticsearch and everything is still
very overwhelming.

So I was hoping that somebody could point me in the right direction with
these questions.

My documents contain *Url* and *Content* fields. I have two lists, one
containing ~1000 domain names and another with ~5000 words/phrases, that I
would like to act as stop words. That is, if a document's Url or Content
contains any of these excluded terms, I don't want it returned in search
results.

What is the best way to search for numeric values within text? For
example, I have the text "Facebook Now Has 1.15 Billion Monthly Active Users".
I would like to search for 1 500 000 000, and also for ranges such as
1 500 000 000 - 2 000 000 000. Can this be done within a text field using some
kind of special number analyzer and tokenizer? Or should I extract all
numeric values first, store them as an array in a separate field, and then use
a numeric range filter?
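
For the second approach (extracting the numbers before indexing), a minimal Python sketch might look like the following. It assumes magnitudes are written as a number optionally followed by an English scale word. The MULTIPLIERS table, the extract_numbers helper, and the content_numbers field name are hypothetical and not part of Elasticsearch.

```python
import re
from decimal import Decimal

# Hypothetical scale words; extend this table for your own data.
MULTIPLIERS = {
    "thousand": 10**3,
    "million": 10**6,
    "billion": 10**9,
    "trillion": 10**12,
}

# A number, optionally followed by one of the scale words above.
NUMBER_RE = re.compile(
    r"(\d+(?:\.\d+)?)\s*(thousand|million|billion|trillion)?\b",
    re.IGNORECASE,
)


def extract_numbers(text):
    """Return every numeric value found in text as a plain integer."""
    values = []
    for digits, scale in NUMBER_RE.findall(text):
        # Decimal avoids float rounding surprises for values like 1.15.
        value = Decimal(digits) * MULTIPLIERS.get(scale.lower(), 1)
        values.append(int(value))
    return values


text = "Facebook Now Has 1.15 Billion Monthly Active Users"

# Document to index: the original text plus an assumed numeric array field.
doc = {"Content": text, "content_numbers": extract_numbers(text)}

# A range clause over the assumed numeric field (1 to 2 billion).
range_clause = {
    "range": {"content_numbers": {"gte": 1_000_000_000, "lte": 2_000_000_000}}
}
```

With the numbers stored in their own numeric field, range matching no longer depends on how the text field is analyzed. The regular expression is deliberately naive: it ignores thousands separators, currencies, and abbreviations such as "1.15bn".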

Thanks!
Janno

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Janno_Jarv · August 3, 2013, 1:55pm

Just found the OpenNLP plugin for Elasticsearch. It can detect entities like
money, location, date etc. and store them in separate fields for filtering.

Janno

spinscale · August 5, 2013, 7:39am

Hey,

first, did you see the uax_url_email tokenizer?
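
As a rough illustration, index settings that wire this tokenizer into a custom analyzer could look like the sketch below, written as a Python dict that would be sent as the JSON body when creating an index. The analyzer name url_analyzer is an assumption made for this example, and the lowercase filter is optional.

```python
import json

# Sketch of index settings; "url_analyzer" is a hypothetical name.
index_settings = {
    "settings": {
        "analysis": {
            "analyzer": {
                "url_analyzer": {
                    "type": "custom",
                    # Keeps URLs and email addresses as single tokens.
                    "tokenizer": "uax_url_email",
                    "filter": ["lowercase"],
                }
            }
        }
    }
}

# JSON body to send when creating the index.
body = json.dumps(index_settings, indent=2)
```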

Also, if you do not want to return certain documents in your search
results, it might make more sense not to index them at all...
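
A minimal sketch of such a pre-indexing check, in Python, might look like this. The is_excluded helper and the sample lists are hypothetical. It assumes every Url carries a scheme (otherwise no hostname is parsed) and that the excluded phrases are already lowercase.

```python
from urllib.parse import urlparse


def is_excluded(doc, excluded_domains, excluded_phrases):
    """Return True if the document should be skipped before indexing."""
    # hostname is None when the URL has no scheme; treat that as no match.
    host = (urlparse(doc["Url"]).hostname or "").lower()
    # Match the excluded domain itself or any of its subdomains.
    if any(host == d or host.endswith("." + d) for d in excluded_domains):
        return True
    # Naive substring match; this also matches inside longer words.
    content = doc["Content"].lower()
    return any(phrase in content for phrase in excluded_phrases)


# Sample data, invented for illustration only.
excluded_domains = {"spam.example"}
excluded_phrases = {"buy now"}
docs = [
    {"Url": "http://news.example.org/a", "Content": "Facebook now has 1.15 billion users"},
    {"Url": "http://www.spam.example/b", "Content": "Harmless text"},
    {"Url": "http://blog.example.org/c", "Content": "Buy now and save"},
]

# Only documents that pass the check would be sent to Elasticsearch.
to_index = [d for d in docs if not is_excluded(d, excluded_domains, excluded_phrases)]
```

With ~1000 domains and ~5000 phrases a linear scan per document is probably fine for small collections; for larger ones a proper multi-pattern matcher would be a better fit.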

A short note about the OpenNLP plugin: I merely wrote it as a trial
balloon to find out whether I could. There are several reasons why NLP is
better suited as a pre-indexing step (at least the way I implemented it;
there are deeper Lucene integrations, like the UIMA one, where it might be
useful to integrate it into Elasticsearch):

a) The model takes up a lot of RAM, which will be duplicated on each node.
b) You have to shut down your cluster if you want to update your model.
c) You have to shut down your cluster if you want to update your OpenNLP
libraries.

Last, I don't think the OpenNLP plugin helps with your second requirement
(at least I did not intend it to).

--Alex

