Searching on special characters

I'm a bit lost with a couple of ES string issues.
The idea is that I receive a search string, and I would like to be able
to search on special characters as well.

I've found out that ES strips out all the special characters, and that
adding a special mapping should make this work, for example:

NOT_ANALYZE_MAP = {
    "properties": {
        "description": {
            "type": "string",
            "index": "not_analyzed"
        }
    }
}
CON.put_mapping(DOC_TYPE, NOT_ANALYZE_MAP, indices=[TEST_INDEX])

But then suppose I index something like this:
{'name': 'special',
 'created_datetime': '2012-02-14T13:10:47Z',
 'doc_type': 'User',
 'ideas': range(20),
 'description': 'Another tester with @special'}

pyes.StringQuery('@special', search_fields=['description'])
never returns anything (even with the mapping), but this:

pyes.StringQuery('@special')
always works, even though I don't think it should without the mapping.

Any idea what could be going on, and how do I correctly allow special
characters? Is there a quoting function somewhere that lets me escape the
special characters that would otherwise be interpreted in some other way?

--

The latter search is probably working because it is defaulting to the _all
field, which is using the standard analyzer. A non-analyzed field will not
work, since the field will be indexed as one term: "Another tester with
@special".
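To make the single-term behaviour concrete: with "index": "not_analyzed"
the whole field value is stored as exactly one term, so a term-level lookup
for a fragment can never match. A minimal illustration in plain Python, no
cluster needed:

```python
# With "index": "not_analyzed", the entire field value becomes ONE term
# in the inverted index -- no tokenization happens at all.
indexed_term = 'Another tester with @special'

# A term-level lookup for just '@special' compares against that single
# term and therefore finds nothing:
assert '@special' != indexed_term

# Only the exact, complete value matches:
assert 'Another tester with @special' == indexed_term
```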

You need to create your own custom analyzer that does not strip out the
special characters. Are you referring to the @ symbol? The issue with the
standard analyzer is not that it strips out characters (that would be a
char filter), but that its tokenizer uses specific word boundaries.
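For example, the index could be created with a custom analyzer built on the
whitespace tokenizer, so a token like @special survives analysis. This is
only a sketch: the analyzer name my_analyzer is made up, and the exact pyes
call for creating an index with settings varies between versions.

```python
# Hypothetical sketch: index settings defining a custom analyzer that
# keeps '@' intact by splitting on whitespace only.
SETTINGS = {
    "analysis": {
        "analyzer": {
            "my_analyzer": {            # made-up name
                "type": "custom",
                "tokenizer": "whitespace",
                "filter": ["lowercase"]
            }
        }
    }
}

# The field mapping then points at the custom analyzer instead of
# disabling analysis entirely:
MAPPING = {
    "properties": {
        "description": {
            "type": "string",
            "analyzer": "my_analyzer"
        }
    }
}

# Applying it (method names differ across pyes versions):
# CON.create_index(TEST_INDEX, settings={"analysis": SETTINGS["analysis"]})
# CON.put_mapping(DOC_TYPE, MAPPING, indices=[TEST_INDEX])
```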

You can see how a sentence is parsed by using the analysis API. You have a
few options for tokenizers. You can use the whitespace tokenizer, which
might be too lenient for most cases, or you can use the pattern tokenizer.
There are many other tokenizers, but these should help with your case. All
these concepts come from Lucene, so you can learn more about them by
reading up on Lucene.
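To see why the standard tokenizer drops the match, here is a rough
approximation of the two tokenizers in plain Python. The real Lucene
implementations are considerably more sophisticated; this only mimics the
word-boundary behaviour:

```python
import re

text = 'Another tester with @special'

# Roughly what the standard tokenizer does: split on word boundaries,
# so punctuation such as '@' is discarded.
standard_tokens = re.findall(r'\w+', text.lower())

# Roughly what the whitespace tokenizer does: split on whitespace only,
# so '@special' survives as a single token.
whitespace_tokens = text.lower().split()

# The term 'special' was indexed, not '@special', which is why the
# query against the analyzed field finds nothing:
assert '@special' not in standard_tokens
assert 'special' in standard_tokens
assert '@special' in whitespace_tokens
```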

Cheers,

Ivan

On Mon, Jan 28, 2013 at 4:11 AM, andrea crotti andrea.crotti.0@gmail.com wrote:


--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group, send email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Thanks for the reply.
I think it's not worth it then after all, also because I'd have to make
sure I quote the special symbols that would be interpreted by ES, and so
on.
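For completeness, if you do go down the quoting road: Lucene's query syntax
reserves a fixed set of characters, and a small helper can backslash-escape
them before the string reaches the query parser. This is just a sketch, not
a pyes built-in; note that @ is not among the reserved characters, so
escaping would not have fixed the original problem anyway.

```python
import re

# Characters Lucene's query parser treats specially ('&&' and '||' are
# the actual operators, but escaping the single characters is harmless).
_SPECIAL = '+-&|!(){}[]^"~*?:\\/'
_PATTERN = re.compile('([%s])' % re.escape(_SPECIAL))

def escape_query(text):
    """Backslash-escape Lucene query-syntax characters.

    '@' is NOT special to the query parser, so '@special' passes through
    unchanged -- the missing match is an analysis problem, not a quoting
    problem.
    """
    return _PATTERN.sub(r'\\\1', text)

print(escape_query('@special'))   # -> @special
print(escape_query('(1+1):2'))    # -> \(1\+1\)\:2
```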

On Monday, January 28, 2013 6:48:07 PM UTC, Ivan Brusic wrote:
