Search with special characters


(Taieb) #1

Hi,

How can I search for fields that contain special characters like @ , | - _
? * and so on?
I tried query_string and wildcard queries, but they don't work.

Thanks


(Ævar Arnfjörð Bjarmason) #2

This comes down to what analyzer you use:

$ searchanalyze -t '@ , | - _ ? *'
We analyzed <@ , | - _ ? *> with as <{
'tokens' => []
}

$ searchanalyze -a keyword -t '@ , | - _ ? *'
We analyzed <@ , | - _ ? *> with as <{
'tokens' => [
{
'end_offset' => 13,
'position' => 1,
'start_offset' => 0,
'token' => '@ , | - _ ? *',
'type' => 'word'
}
]
}

As you can see the standard analyzer completely ignores all those
special characters. You could use the whitespace analyzer:

$ searchanalyze -a whitespace -t '@ , | - _ ? *' | grep token
'tokens' => [
'token' => '@',
'token' => ',',
'token' => '|',
'token' => '-',
'token' => '_',
'token' => '?',
'token' => '*',

But you should think carefully about which characters you want to make
searchable: some of them or all of them, and in what context.
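
To see the difference without a cluster at hand, here is a rough pure-Python imitation of the three behaviours shown above. It is only a sketch: the function names are made up for illustration, and `standard_like_tokens` is a crude stand-in for the standard analyzer (assumption: keep runs of letters and digits, drop everything else), not what Elasticsearch actually runs.

```python
import re


def keyword_tokens(text):
    """Imitate the keyword analyzer: the whole input is one token."""
    return [text] if text else []


def whitespace_tokens(text):
    """Imitate the whitespace analyzer: split on whitespace, keep punctuation."""
    return text.split()


def standard_like_tokens(text):
    """Crude stand-in for the standard analyzer (an assumption, see above):
    keep lowercased runs of letters/digits, drop punctuation and underscores."""
    return [m.group(0).lower() for m in re.finditer(r"[^\W_]+", text)]


if __name__ == "__main__":
    sample = "@ , | - _ ? *"
    for name, fn in [
        ("standard-like", standard_like_tokens),
        ("keyword", keyword_tokens),
        ("whitespace", whitespace_tokens),
    ]:
        print(name, fn(sample))
```

With the whitespace variant, a word such as text) stays a single token including the parenthesis, which is why the choice of characters to keep matters.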


(Taieb) #3

Thank you AEvar,

But how can I use searchanalyze? (Currently I use elasticsearch.yml to
change the analyzer.)

--
Taieb Charrada
Third-year computer science engineering student at ENIT
Tel : 06 52 37 18 39
E-mail : taieb.charrada@gmail.com


(Ævar Arnfjörð Bjarmason) #4

It's a small wrapper I wrote that I haven't gotten around to open sourcing.
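
Something similar can be done against Elasticsearch's `_analyze` REST endpoint. The following is a minimal sketch and not the searchanalyze wrapper itself. Assumptions: a node listening on http://localhost:9200, and a version that accepts a JSON body with `text` and `analyzer` keys (older releases may expect the text and analyzer to be passed differently, so check the documentation for your version). The function names are made up for illustration.

```python
import json
import urllib.request


def build_analyze_request(text, analyzer=None, host="http://localhost:9200"):
    """Build (but do not send) a POST request for the _analyze endpoint."""
    body = {"text": text}
    if analyzer is not None:
        body["analyzer"] = analyzer  # e.g. "keyword" or "whitespace"
    return urllib.request.Request(
        host.rstrip("/") + "/_analyze",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze(text, analyzer=None, host="http://localhost:9200"):
    """Send the request and return the token strings from the response."""
    request = build_analyze_request(text, analyzer, host)
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    return [entry["token"] for entry in payload.get("tokens", [])]
```

Calling `analyze("@ , | - _ ? *", analyzer="whitespace")` against a running node should list the tokens that analyzer would put in the index.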


(Taieb) #5

So could you explain how I can create a custom whitespace tokenizer?
So far I have created a keyword tokenizer in my yml file. For some
characters like "@", ",", and "-" it works, but for others like "(", ")",
and "!" I must write query = ""text)"" to find *text), and in that case I
can't search for it with only a keyword like text)


(Ævar Arnfjörð Bjarmason) #6

Read up on mappings and analyzers in the ES manual. Basically, ES takes
the text you feed it and analyzes it (tokenizes it) before adding it
to the index.

If you want to search for something, it needs to be in the index. With
the standard tokenizer, the punctuation you want to search for is
filtered out before it ever gets there.
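
As a starting point for the custom analyzer question, here is a hedged sketch of index settings that define a custom analyzer on top of the built-in whitespace tokenizer. The analyzer name `my_whitespace` and the helper function are hypothetical, the lowercase filter is optional, and the exact way to apply the settings (and to point a field's mapping at the analyzer by name) depends on your ES version.

```python
import json


def whitespace_analyzer_settings(name="my_whitespace"):
    """Return index settings defining a custom analyzer on the whitespace tokenizer."""
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    name: {
                        "type": "custom",
                        "tokenizer": "whitespace",  # split on whitespace only
                        "filter": ["lowercase"],    # optional: case-insensitive matching
                    }
                }
            }
        }
    }


if __name__ == "__main__":
    # Print the JSON body that could be sent when creating an index.
    print(json.dumps(whitespace_analyzer_settings(), indent=2))
```

The field you search must be mapped to use this analyzer, and existing documents need to be reindexed before the punctuation shows up in the index.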

