How to use one token filter for indexing and another for search?


(Petr Janský) #1

Hello,

I would like to use two different token filters for the same field(s):
1st - for indexing, which creates these terms:

  • stem
  • sentence elements (noun, adjective...)
  • verb conjugations, noun declensions and adjective forms
  • .....

2nd - for search, which retrieves only:

  • stem (the same as the 1st filter)

I want to be able to use classic search and also search for specific sentence
elements using

curl -X GET 'localhost:9200/_search?pretty=true' -d '{
  "query" : {
    "term" : { "content" : "==noun==" }
  },
  "highlight" : {
    "tags_schema" : "styled",
    "fields" : {
      "content" : {}
    }
  }
}'

I know it's very unusual to use a different token filter for search than
for indexing, but I think it's a better way than duplicating index fields for
each token filter.

Thanks
Petr

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Clinton Gormley) #2

Hi Petr

Create two custom analyzers in your index; then you can set each field to use
one as its search_analyzer and the other as its index_analyzer.

clint
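
As a rough sketch of that setup using only built-in filters (Petr's real filters come from a plugin): the index-time analyzer keeps each original token next to its stem, while the search-time analyzer produces only the stem. The index name `demo`, the type `doc`, and the analyzer and filter names are made up for illustration, and the sketch assumes your version provides the `keyword_repeat` and `stemmer` filters. It uses the `index_analyzer`/`search_analyzer` keys from the advice above; later Elasticsearch releases replaced `index_analyzer` with plain `analyzer` and the `string` type with `text`.

```shell
ES="${ES:-localhost:9200}"

# Hypothetical names: index "demo", type "doc", analyzers "index_side"/"search_side".
SETTINGS_BODY='{
  "settings": {
    "analysis": {
      "filter": {
        "cs_stem": { "type": "stemmer", "language": "czech" }
      },
      "analyzer": {
        "index_side": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "keyword_repeat", "cs_stem"]
        },
        "search_side": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "cs_stem"]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "content": {
          "type": "string",
          "index_analyzer": "index_side",
          "search_analyzer": "search_side"
        }
      }
    }
  }
}'

# keyword_repeat emits every token twice, once protected from stemming,
# so the index holds both the original form and the stem.
curl -s --max-time 5 -X PUT "$ES/demo" \
  -H 'Content-Type: application/json' -d "$SETTINGS_BODY" \
  || echo "request failed: is a node listening on $ES?"
```

The `Content-Type` header is required by recent releases; the 2013-era versions discussed here did not need it.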



(Clinton Gormley) #3

Alternatively, you can always specify a particular analyzer at query time
in the query itself.
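
To illustrate the query-time option, here is a minimal sketch. It assumes the index `index3`, the field `content` and the analyzer `search_cestina` that Petr defines later in this thread; the query text `kočky` is made up. A `match` query accepts an `analyzer` parameter that overrides the field's search analyzer for that one request. A `term` query like the one in the first post is not analyzed at all, so a marker such as `==noun==` is looked up exactly as written.

```shell
ES="${ES:-localhost:9200}"

# "kočky" is a hypothetical search text; the other names come from the thread.
QUERY_BODY='{
  "query": {
    "match": {
      "content": {
        "query": "kočky",
        "analyzer": "search_cestina"
      }
    }
  }
}'

# The analyzer named in the body is applied to the query text only;
# nothing in the index or the mapping changes.
curl -s --max-time 5 -X GET "$ES/index3/_search?pretty=true" \
  -H 'Content-Type: application/json' -d "$QUERY_BODY" \
  || echo "request failed: is a node listening on $ES?"
```

The `Content-Type` header is required by recent Elasticsearch releases and can be left out on the 2013-era versions discussed here.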



(Petr Janský) #4

Hi Clint,

thank you for your quick reply.

So if I get you right, I should change my analyzers and mapping to:

curl -X POST 'localhost:9200/index3/' -d '{
  "settings" : {
    "analysis" : {
      "analyzer" : {
        "index_cestina" : {
          "type" : "custom",
          "tokenizer" : "standard",
          "filter" : ["stopwords_CZ", "index_mor_czech"]
        },
        "search_cestina" : {
          "type" : "custom",
          "tokenizer" : "standard",
          "filter" : ["stopwords_CZ", "search_mor_czech"]
        }
      },
      "filter" : {
        "stopwords_CZ" : {
          "type" : "stop",
          "stopwords" : [ "právě", "že", "_czech_" ],
          "ignore_case" : true
        },
        "index_mor_czech" : {
          "type" : "morphologyIndex",
          "language" : "cs",
          "path" : "/opt/morphologyCs"
        },
        "search_mor_czech" : {
          "type" : "morphologySearch",
          "language" : "cs",
          "path" : "/opt/morphologyCs"
        }
      }
    }
  },
  "mappings" : {
    "article" : {
      "_id" : {
        "path" : "reference"
      },
      "properties" : {
        "title" : { "type" : "string", "index_analyzer" : "index_cestina", "search_analyzer" : "search_cestina" },
        "content" : { "type" : "string", "index_analyzer" : "index_cestina", "search_analyzer" : "search_cestina" }
      }
    }
  }
}'

Thanks
Petr



(Lukáš Vlček) #5

Hi Petr,

Your example looks good to me. I was testing search/index analyzers,
for which I created the following small recreation script:


(It tests that the search_analyzer is used, and it works fine.)
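
The script itself is not included above. As a rough stand-in, and not Lukáš's actual script, one way to see what each analyzer does is to run the same text through both with the `_analyze` API and compare the tokens. The index and analyzer names are taken from Petr's post; the sample text `kočky` is made up. Recent Elasticsearch versions take the JSON body shown here, while the 2013-era releases took the analyzer as a query-string parameter (`_analyze?analyzer=...`) and the text as the raw request body.

```shell
ES="${ES:-localhost:9200}"
INDEX="index3"   # index name from Petr's post
TEXT="kočky"     # hypothetical sample text

# Build the _analyze request body for a given analyzer name.
analyze_body() {
  printf '{ "analyzer": "%s", "text": "%s" }' "$1" "$TEXT"
}

# Run the same text through the index-time and the search-time analyzer.
for analyzer in index_cestina search_cestina; do
  echo "== $analyzer =="
  curl -s --max-time 5 -X GET "$ES/$INDEX/_analyze?pretty=true" \
    -H 'Content-Type: application/json' -d "$(analyze_body "$analyzer")" \
    || echo "request failed: is a node listening on $ES?"
done
```

If the setup works as intended, the first response should list more tokens per word than the second.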

Out of curiosity, is there any public record about what the morphologyIndex
token filter type is based on? Is it based on @imotov's analysis-morphology
plugin or some other (proprietary?) solution?

Thanks,
Lukáš


