Phonetic search && i18n

Hi,

Does anyone know a way to use a specific (localized) version of the phonetic
analysers through the existing plugin? (For anyone wondering, I'm
looking for something for the French language...)

Thanks.

Yann

--

The Beider-Morse phonetic analyzer was also developed for French and is
available in Lucene Core.

http://stevemorse.org/phonetics/bmpm.htm

In Elasticsearch, the phonetic filter's encoder name is "beider_morse".
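
A minimal sketch of how such a filter might be declared, assuming the analysis-phonetic plugin is installed. The filter name "french_bm", the analyzer name "phoneticAnalyzer" and the choice of the standard tokenizer are illustrative assumptions; "languageset", "rule_type" and "name_type" are options the plugin documents for the "beider_morse" encoder. The script only prints the index settings as JSON and sends nothing to a cluster.

```python
import json

# Index settings declaring a Beider-Morse phonetic filter restricted to French.
# "french_bm" and "phoneticAnalyzer" are illustrative names, not plugin defaults.
settings = {
    "settings": {
        "analysis": {
            "filter": {
                "french_bm": {
                    "type": "phonetic",
                    "encoder": "beider_morse",
                    "languageset": ["french"],
                    "rule_type": "approx",
                    "name_type": "generic",
                }
            },
            "analyzer": {
                "phoneticAnalyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "french_bm"],
                }
            },
        }
    }
}

# Print the body that would be sent when creating the index.
print(json.dumps(settings, indent=2))
```

Restricting "languageset" to French is one possible choice; leaving it out lets the encoder guess the language of each name.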

Best regards,

Jörg

--

Hi Jörg,

I had not seen this one. Double Metaphone also seems to do the job. Am I
wrong?
I'll try both in the next few days, hopefully...

Thanks!

Regards,
Yann Barraud

--

If you try Double Metaphone, you can decide whether it meets your requirements.

Note the development timeline of phonetic encodings:

  • Soundex, 1918 (only the start of a name is recognized, number codes)
  • American Soundex, ~1930 (for American English names, used by the U.S.
    Census Bureau)
  • Kölner Phonetik, 1970 (for German names)
  • Daitch-Mokotoff, 1985 (for Eastern European names)
  • Metaphone, 1990 (improvements for variants in English names)
  • Double Metaphone, 2000 (foreign pronunciation extension, only the start of
    a name is recognized)
  • Beider-Morse, 2008 (pronunciation rules for identified languages,
    the full name is recognized)

So I think Alexander Beider (Paris) must have done a good job in 2008
when he developed a family name matching algorithm.
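
To make the oldest entry in that timeline concrete, here is a small sketch of American Soundex in Python. It is not what the Elasticsearch plugin runs for Beider-Morse; it only illustrates the "number codes" idea and why such an encoding keys on the start of a name. The function name "soundex" and the sample names are illustrative.

```python
# Consonant groups and their Soundex digits.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name):
    """Return the four-character American Soundex code of a name."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    out = letters[0].upper()           # the first letter is kept as-is
    prev = CODES.get(letters[0], "")   # its digit still suppresses a repeat
    for ch in letters[1:]:
        if ch in "hw":
            continue                   # h and w do not separate equal digits
        code = CODES.get(ch, "")       # vowels and unknown letters map to ""
        if code and code != prev:
            out += code
        prev = code                    # a vowel resets the previous digit
        if len(out) == 4:
            break                      # anything after this point is ignored
    return out.ljust(4, "0")           # pad short codes with zeros


print(soundex("yann"), soundex("yan"))  # Y500 Y500
```

Both spellings collapse to the same code, but so would many unrelated names, which is the kind of imprecision the later language-aware encodings try to reduce.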

Best regards,

Jörg

--

Mmmm... Makes (lots of) sense!!

Regards,
Yann Barraud

--

Hi,

Can anyone tell me how to use this filter?

"query": {
  "bool": {
    "must": [
      { "field": { "prenom": { "query": "yann" } } },
      { "field": { "nom": { "query": "rimbault" } } },
      { "field": { "code_postal": { "query": "75*" } } }
    ]
  }
}

gives the correct answer (exact match), while

"query": {
  "bool": {
    "must": [
      { "field": { "prenom": { "query": "yan" } } },
      { "field": { "nom": { "query": "rimbault" } } },
      { "field": { "code_postal": { "query": "75*" } } }
    ]
  }
}

gives no answer.

The mapping is set to use the Beider-Morse analyzer on the fields "nom" and "prenom".

--

Yes, using it is non-trivial, so I prepared an example of how to use
Beider-Morse with Elasticsearch in a gist: "Demonstration of Beider-Morse
phonetic filter with Elasticsearch" (GitHub).
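
What follows is not the content of the gist, only a minimal sketch of the general shape such a setup might take. It assumes an analyzer called "phoneticAnalyzer" is already defined in the index settings, uses the "text" field type of recent Elasticsearch versions, and swaps the "field" query for a "match" query, which analyzes the query text with the same analyzer as the field. The postal code clause is left out to keep the sketch short.

```python
import json

# Mapping fragment: analyze the name fields with a phonetic analyzer.
# "phoneticAnalyzer" is assumed to be defined in the index settings.
# "text" is the field type in recent Elasticsearch versions (older ones used "string").
mapping = {
    "properties": {
        "prenom": {"type": "text", "analyzer": "phoneticAnalyzer"},
        "nom": {"type": "text", "analyzer": "phoneticAnalyzer"},
    }
}


def name_query(prenom, nom):
    """Build a bool query whose match clauses are analyzed like the fields."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"match": {"prenom": prenom}},
                    {"match": {"nom": nom}},
                ]
            }
        }
    }


print(json.dumps(mapping, indent=2))
print(json.dumps(name_query("yan", "rimbault"), indent=2))
```

The point of the design is that indexed names and query terms pass through the same phonetic encoding, so a match is decided on the phonetic codes rather than on the literal spelling.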

Regards,

Jörg

--

Thanks a lot!

What are the following parts used for?

curl -XGET 'localhost:9200/test/_analyze?analyzer=phoneticAnalyzer&text=yann'
echo
echo "Query 1"
echo

--

The bash script calls the _analyze and _search APIs to demonstrate
the usage for the terms 'yann' and 'yan'. The echo lines only print blank
lines and labels to separate the output of each call.
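
As a small sketch of the _analyze half, the Python below rebuilds the request URL for both terms. The index name "test" and the analyzer name "phoneticAnalyzer" come from the curl call quoted in this thread; the helper name "analyze_url" and the default host are assumptions. As far as I know, recent Elasticsearch versions expect the analyzer and text in a JSON request body rather than in the query string, so treat the URL form as specific to older versions.

```python
from urllib.parse import urlencode


def analyze_url(index, analyzer, text, host="localhost:9200"):
    """Build the _analyze URL with the analyzer and text as query parameters."""
    params = urlencode({"analyzer": analyzer, "text": text})
    return f"http://{host}/{index}/_analyze?{params}"


# One request per spelling; the token lists returned by the server can then be compared.
for term in ("yann", "yan"):
    print(analyze_url("test", "phoneticAnalyzer", term))
```

If the two terms come back with overlapping phonetic tokens, a search for one spelling can match documents indexed with the other.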

Jörg

--

Hello,

Works like a charm.

Do you have any idea what the scores mean? I get scores > 13. What does
that mean? I can't figure out why I get such scores, and I find little or no
documentation about how to interpret them...

Yann

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Don't worry. A document's score is not absolute; it is only meaningful
relative to the other scores in the same result set. What you see in the
scores are very short query terms matching very short words (phonetic codes)
in documents. Elasticsearch's default scoring is the same as Lucene's scoring;
you can find more information in "Apache Lucene - Scoring".
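
A rough sketch of why such values are unremarkable, assuming Lucene's classic TF-IDF similarity (the default at the time; newer versions default to BM25). The document counts are hypothetical, and the formula is deliberately simplified: it leaves out queryNorm, coord and boosts, which rescale all scores within one query, and it uses an exact field norm where Lucene stores a lossy one.

```python
import math


def idf(num_docs, doc_freq):
    """Inverse document frequency as in Lucene's classic TF-IDF similarity."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def term_score(freq, num_docs, doc_freq, field_length):
    """Simplified single-term score: tf * idf^2 * fieldNorm.

    Omits queryNorm, coord and boosts, and uses an exact (not byte-encoded) norm.
    """
    tf = math.sqrt(freq)
    field_norm = 1.0 / math.sqrt(field_length)
    return tf * idf(num_docs, doc_freq) ** 2 * field_norm


# Hypothetical index: one million documents, a phonetic code found in only 10,
# matched once in a one-token field.
print(term_score(freq=1, num_docs=1_000_000, doc_freq=10, field_length=1))
# The same match on a code found in 100,000 documents scores far lower.
print(term_score(freq=1, num_docs=1_000_000, doc_freq=100_000, field_length=1))
```

Nothing in the formula caps the result at 1: a rare code in a one-token field drives both the idf and the field norm up, so only the ordering of scores within one result set carries meaning.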

Jörg
