Custom analyzers with elasticsearch-php API


(Olivier Revollat) #1

Hello, I am currently creating an Elasticsearch extension for Bolt (a [very
cool] Symfony2-based CMS).
For the PHP client I'm using elasticsearch-php.

Great! I have already written the code that creates the index and mapping and
adds some data :)

Now I would like to customize the analyzer. In the mapping I can name one
(as described in
http://www.elasticsearch.org/guide/en/elasticsearch/client/php-api/current/_index_operations.html):

    $myTypeMapping = array(
        '_source' => array(
            'enabled' => true
        ),
        'properties' => array(
            'first_name' => array(
                'type' => 'string',
                'analyzer' => 'whatever_analyzer'
            ),
            'age' => array(
                'type' => 'integer'
            )
        )
    );

But I don't really understand how/where to declare the so-called
whatever_analyzer instead of the default "standard" analyzer. Can you point
me to an example in the documentation?

For example, if I want this kind of analyzer, how can I declare it with
elasticsearch-php?

    analyzer:
        default_index:
            type: "custom"
            char_filter: html_strip
            tokenizer: "standard" # "my_edge_ngram_tokenizer"
            filter: [ trim, lowercase, stop_fr, fr_stemmer, my_edge_ngram_filter, asciifolding ]
        default_search:
            type: custom
            tokenizer: standard
            filter: [ trim, lowercase, stop_fr, fr_stemmer, asciifolding ]
        mots_clefs:
            type: "custom"
            tokenizer: "keyword"
            filter: [ standard, trim, lowercase, asciifolding ]
    filter:
        my_edge_ngram_filter:
            type: "edgeNGram"
            min_gram: "3"
            max_gram: "20"
        stop_fr:
            type: "stop"
            stopwords: [ french ]
        fr_stemmer:
            type: "stemmer"
            name: "french"

Thanks :)

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/0a9cd8f0-a6ac-473f-b090-9cff90fa9fa3%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Olivier Revollat) #2

I'm posting my own answer :)

Following the example at
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/custom-analyzers.html

you can add custom analyzers simply at index creation:

    $indexParams['body'] = $this->config['indexsetting'];
    $result = $this->client->indices()->create($indexParams);

given that, for example, $this->config['indexsetting'] is the following YAML
(which has to be converted to a PHP array):

    indexsetting:
        analysis:
            analyzer:
                nom_analyzer:
                    type: custom
                    tokenizer: standard
                    filter: [ trim, lowercase, asciifolding ]
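
For anyone who wants to see the YAML above as a PHP array, here is a minimal sketch. It is not the exact code from the extension: the function name buildIndexParams, the index name my_index and the type name my_type are placeholders I made up, and I am assuming the usual create-index body shape where analysis sits under a settings key and the mapping from the first post sits under mappings. The 'string' field type matches the Elasticsearch version discussed in this thread.

```php
<?php
// Hypothetical helper: builds the parameter array for indices()->create().
// 'my_index' and 'my_type' are placeholder names, not from the original post.
function buildIndexParams($indexName)
{
    return array(
        'index' => $indexName,
        'body'  => array(
            // Assumption: analysis settings are nested under 'settings'.
            'settings' => array(
                'analysis' => array(
                    'analyzer' => array(
                        'nom_analyzer' => array(
                            'type'      => 'custom',
                            'tokenizer' => 'standard',
                            'filter'    => array('trim', 'lowercase', 'asciifolding'),
                        ),
                    ),
                ),
            ),
            // The mapping refers to the analyzer by the name declared above.
            'mappings' => array(
                'my_type' => array(
                    'properties' => array(
                        'first_name' => array(
                            'type'     => 'string',
                            'analyzer' => 'nom_analyzer',
                        ),
                        'age' => array('type' => 'integer'),
                    ),
                ),
            ),
        ),
    );
}

$indexParams = buildIndexParams('my_index');

// With a configured elasticsearch-php client this would be passed as:
// $result = $client->indices()->create($indexParams);

// Print the request body to check its shape without a running cluster.
echo json_encode($indexParams['body'], JSON_PRETTY_PRINT), "\n";
```

Custom token filters such as the stop_fr and fr_stemmer definitions from the first post would go in a 'filter' array next to 'analyzer', inside 'analysis'.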

(In reply to Olivier Revollat's post of Monday, 23 June 2014, 14:44:49 UTC+2.)

