How can I make this search requirement work?


#1

I have a bit of an odd requirement as far as the analyzer is concerned, and
I am wondering if anyone has any tips or suggestions.
I have an item I am indexing (grade) that has a property (name) whose value
can be "0# (99.995%)".
I am doing a prefix search on _all.
I want users to be able to search using 99 or 99.9 or 99.995 or 99.995%.
I also want the user to be able to copy-paste "0# (99.995%)" and have it
work.

I am currently using the whitespace analyzer, which works for many of my
cases except the tricky one above.
99.995 doesn't work, but "(99.995" does, because after whitespace
tokenization the token begins with "(".
I could filter out the "(" and ")" characters, but then "0# (99.995%)" won't
work.
Does anyone have any other suggestions?
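
To make the failure concrete, here is a minimal Python sketch that only imitates whitespace tokenization plus lowercasing and a naive prefix match. It is not Elasticsearch itself; the helper names `whitespace_tokens` and `prefix_match` are made up for illustration, and the real prefix query may analyze the input differently.

```python
def whitespace_tokens(text):
    # Rough stand-in for a whitespace tokenizer followed by a lowercase filter.
    return text.lower().split()


def prefix_match(indexed_tokens, query):
    # True when every query token is a prefix of at least one indexed token.
    return all(
        any(tok.startswith(q) for tok in indexed_tokens)
        for q in whitespace_tokens(query)
    )


indexed = whitespace_tokens("0# (99.995%)")
print(indexed)                                 # ['0#', '(99.995%)']
print(prefix_match(indexed, "99.995"))         # False: the token starts with "("
print(prefix_match(indexed, "(99.995"))        # True
print(prefix_match(indexed, "0# (99.995%)"))   # True
```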

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/9813b93a-249d-41a9-be21-12c8ec5d6d23%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(vineeth mohan-2) #2

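Hello Mooky,

You can apply multiple analyzers to a field with the combo analyzer plugin:
https://github.com/yakaz/elasticsearch-analysis-combo/

So you can add all of your analyzers there and apply them together.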

Thanks
Vineeth



(Glen Smith) #3

I would start by suggesting that you create an indexing/querying analyzer
specifically for the field you know has this format.

Otherwise, I think your likeliest path to success is somewhere in the
character filters domain.
Character filters are applied to the string before the tokenizer:
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/custom-analyzers.html

One possibility here is a pattern replace char filter:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-pattern-replace-charfilter.html

If you can write a pattern that matches all of the allowed values of this
field and replaces them with just the number, and you apply that pattern at
both index and search time, then you are only dealing with searching for the
numbers.

You may need a different character filter for the search analyzer, though,
since you are allowing for more formats than are found in the source
document field.
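
As a rough sketch of the character-filter idea, the Python below builds a settings body that uses a `pattern_replace` char filter and then imitates its effect locally with `re`. It is a simpler variant than the one described above: it only strips parentheses rather than reducing the value to the bare number. The names `strip_parens` and `grade_name` are hypothetical, and the local `analyze` function is only an approximation of what the analyzer would do.

```python
import json
import re

# Hypothetical analysis settings; "strip_parens" and "grade_name" are made-up names.
settings = {
    "analysis": {
        "char_filter": {
            "strip_parens": {
                "type": "pattern_replace",
                "pattern": "[()]",
                "replacement": "",
            }
        },
        "analyzer": {
            "grade_name": {
                "type": "custom",
                "char_filter": ["strip_parens"],
                "tokenizer": "whitespace",
                "filter": ["lowercase"],
            }
        },
    }
}


def analyze(text):
    # Local imitation: strip parentheses, split on whitespace, lowercase.
    return re.sub(r"[()]", "", text).lower().split()


def prefix_match(indexed_tokens, query):
    # The query goes through the same analysis as the indexed value.
    return all(
        any(tok.startswith(q) for tok in indexed_tokens) for q in analyze(query)
    )


indexed = analyze("0# (99.995%)")
print(json.dumps(settings, indent=2))
print(indexed)
for q in ["99", "99.9", "99.995", "99.995%", "0# (99.995%)"]:
    print(q, prefix_match(indexed, q))
```

Because the same filter runs on the query text, the copy-pasted value still matches even though the parentheses are gone from the index.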



#4

Thanks. That looks interesting!

On Tuesday, 15 July 2014 16:15:23 UTC+1, vineeth mohan wrote:

Hello Mooky ,

You can apply multiple analyzers to a field -
https://github.com/yakaz/elasticsearch-analysis-combo/

So you can add all your analyzer here and apply it.

Thanks
Vineeth



#5

I think I can probably use a combo of the whitespace and standard
analyzers.

My current analyzer settings are:

{
    "analysis": {
        "analyzer": {
            "default_index": {
                "tokenizer": "whitespace",
                "filter": ["lowercase"]
            },
            "default_search": {
                "tokenizer": "whitespace",
                "filter": ["lowercase"]
            }
        }
    }
}

-M



#6

And it works a treat. Thanks.

It leads me to think that it would be very useful to combine a series of
specialist (special-case) analyzers with the standard analyzer.

Back to my original example - "0# (99.995%)" - what I really want is
something that will extract "99.995%".
The standard analyzer will extract "99.995" (and the rest of the text), while
the whitespace analyzer will extract "(99.995%)".
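
The effect of combining the two analyzers can be pictured with the small Python sketch below. It is only an illustration: `rough_standard_tokens` is a crude regex approximation of the standard tokenizer, not its real rules, and `combined_tokens` is a made-up helper that simply merges both token lists the way a combined analyzer might.

```python
import re


def whitespace_tokens(text):
    # Whitespace tokenizer plus lowercase filter, approximated.
    return text.lower().split()


def rough_standard_tokens(text):
    # Crude approximation: digit runs (dots allowed between digits) or letter/digit runs.
    return re.findall(r"\d+(?:\.\d+)*|[a-z0-9]+", text.lower())


def combined_tokens(text):
    # Union of both token streams, keeping first-seen order.
    seen = []
    for tok in whitespace_tokens(text) + rough_standard_tokens(text):
        if tok not in seen:
            seen.append(tok)
    return seen


print(combined_tokens("0# (99.995%)"))
# ['0#', '(99.995%)', '0', '99.995']
```

Note that neither stream contains "99.995%" on its own, which is the gap described above.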

Does a financial/numeric/accounting analyzer already exist, i.e. something
that extracts "99.995%" or "$44.5665" or "-45bps"?

-M



(smonasco-2) #7

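A little late to the party, but I would have used a custom index analyzer with a pattern tokenizer followed by lowercase and edge n-gram token filters, and a search analyzer with the same pattern tokenizer and lowercase filter but no edge n-grams (the tokenizer always runs before the token filters, so pattern comes first).

With the pattern tokenizer you can specify a regex.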
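
Here is a minimal Python sketch of that idea, assuming the regex is used to extract numeric-looking tokens rather than to split on separators. The `NUMERIC` pattern and the function names are hypothetical, and the code only imitates the analysis chain locally: edge n-grams are produced at index time, while the search side keeps whole tokens and looks them up exactly.

```python
import re

# Hypothetical pattern for tokens such as 99.995%, $44.5665 or -45bps.
NUMERIC = re.compile(r"[-+]?\$?\d+(?:\.\d+)?(?:%|bps)?")


def search_tokens(text):
    # Search side: lowercase, then keep only the regex matches.
    return NUMERIC.findall(text.lower())


def index_tokens(text):
    # Index side: also emit every leading prefix (edge n-gram) of each token.
    grams = []
    for tok in search_tokens(text):
        grams.extend(tok[:n] for n in range(1, len(tok) + 1))
    return grams


def matches(indexed, query):
    # Exact term lookup of each query token against the indexed grams.
    tokens = search_tokens(query)
    return bool(tokens) and all(t in indexed for t in tokens)


indexed = set(index_tokens("0# (99.995%)"))
for q in ["99", "99.9", "99.995", "99.995%", "0# (99.995%)"]:
    print(q, matches(indexed, q))
print(search_tokens("$44.5665 and -45bps"))   # ['$44.5665', '-45bps']
```

With the prefixes stored in the index, the query no longer needs to be a prefix query; a plain term match against the grams is enough.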



(vineeth mohan-2) #8

Hello Mooky,

Elasticsearch is not domain-specific, so it won't extract these financial
terms out of the box.
You will need to write your own analyzer to do this.

Thanks
Vineeth


