I'm a bit lost with a couple of ES string issues.
So the idea is that I receive a search string, and I would like to be able to search on special characters as well.
I've found out that ES strips out all the special characters, and I found a mapping that is supposed to make this work.
But then suppose I index something like this:
{'name': 'special',
'created_datetime': '2012-02-14T13:10:47Z',
'doc_type': 'User',
'ideas': range(20),
'description': 'Another tester with @special'},
pyes.StringQuery('@special', search_fields=['description'])
never returns anything (even with the mapping), but this:
pyes.StringQuery('@special')
always works, even though I think it should not without the mapping.
Any idea what could be going on here, and how do I correctly allow special
characters? Is there a quoting function somewhere that lets me escape the
special characters that would otherwise be interpreted in some other way?
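To make it concrete, something along these lines is the kind of helper I have
in mind (a hand-rolled sketch, not part of pyes; the character list is the one
from the Lucene query syntax documentation, and the function name is made up):

def escape_query_string(text):
    # Backslash-escape the characters that Lucene's query_string syntax
    # treats specially, so they are matched literally.
    special = '+-&|!(){}[]^"~*?:\\/'
    return ''.join('\\' + c if c in special else c for c in text)

# escape_query_string('tester with @special AND foo:bar')
# -> tester with @special AND foo\:bar   (the @ itself is not special)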
The latter search is probably working because it is defaulting to the _all
field, which is using the standard analyzer. A non-analyzed field will not
work, since the field will be indexed as one term: "Another tester with @special".
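To illustrate what I mean by non-analyzed, a mapping sketch along these lines
(field and type names follow your example) indexes the whole description as
that single term, so a query for '@special' on its own can never match it:

mapping = {
    'User': {
        'properties': {
            # not_analyzed: the value is indexed verbatim as one term,
            # i.e. "Another tester with @special"
            'description': {'type': 'string', 'index': 'not_analyzed'},
        }
    }
}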
You need to create your own custom analyzer that does not strip out the
special characters. Are you referring to the @ symbol? The issue with the
standard analyzer is not that it strips out characters (that would be a char
filter), but that its tokenizer has specific word boundaries.
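As a sketch, such a custom analyzer could be declared in the body of the
index-creation request roughly like this (the name keep_specials is made up;
the whitespace tokenizer splits only on whitespace, so @special survives as a
term):

index_body = {
    'settings': {
        'analysis': {
            'analyzer': {
                'keep_specials': {               # made-up name
                    'type': 'custom',
                    'tokenizer': 'whitespace',   # or a pattern tokenizer
                    'filter': ['lowercase'],
                }
            }
        }
    }
}
# ...and referenced from the field mapping, e.g.
# 'description': {'type': 'string', 'analyzer': 'keep_specials'}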
You can see how a sentence is parsed by using the analysis API. You have a
few options for tokenizers: you can use the whitespace analyzer, which might
be too lenient for most cases, or you can use the pattern tokenizer. There
are many other tokenizers, but these should help with your case. All these
concepts come from Lucene, so you can learn more about them by reading up
on Lucene.
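For example, comparing the built-in standard and whitespace analyzers through
the analyze API (a sketch assuming a local node and the requests library; on
older versions the text goes in the request body with the analyzer as a query
parameter, while newer versions expect a JSON body with analyzer and text
fields):

import requests  # any HTTP client, or plain curl, works the same way

def analyze(analyzer, text, es='http://localhost:9200'):
    # older ES: analyzer as a query parameter, raw text as the body;
    # newer ES: JSON body {"analyzer": ..., "text": ...}
    resp = requests.get(es + '/_analyze',
                        params={'analyzer': analyzer}, data=text)
    return [t['token'] for t in resp.json()['tokens']]

print analyze('standard', 'Another tester with @special')
# roughly ['another', 'tester', 'with', 'special'] -- the @ is dropped at
# the token boundary (stopwords such as 'with' may also go, depending on
# the version and stopword settings)
print analyze('whitespace', 'Another tester with @special')
# ['Another', 'tester', 'with', '@special'] -- the @ survives, but nothing
# is lowercased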
Thanks for the reply.
I think it's not worth it then after all, also because I would then have to
make sure I escape the special symbols that would be interpreted by ES, and
so on.