How to use one token filter for indexing and another for search?

Petr_Jansky · October 25, 2013, 11:32am

Hello,

I would like to use two different token filters for the same field(s):

1st - for indexing, which creates these terms:

stem
sentence elements (noun, adjective, ...)
verb conjugations, noun and adjective declensions
...

2nd - for search, which produces only:

stem (the same as the 1st filter)

I want to be able to run a classic search and also search for specific
sentence elements, using

curl -X GET 'localhost:9200/_search?pretty=true' -d '{
  "query" : {
    "term" : { "content" : "==noun==" }
  },
  "highlight" : {
    "tags_schema" : "styled",
    "fields" : {
      "content" : {}
    }
  }
}'

I know it's unusual to use a different token filter for search than
for indexing, but I think it's a better way than duplicating the indexed
fields for each token filter.

Thanks
Petr

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Clinton_Gormley · October 25, 2013, 11:46am

Hi Petr

Create two custom analyzers in your index. You can then set each field to
use one as its search_analyzer and the other as its index_analyzer.

clint

Clinton_Gormley · October 25, 2013, 11:46am

Alternatively, you can always specify a particular analyzer at query time,
in the query itself.
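
As a rough sketch of that second option: a match query accepts an analyzer parameter that overrides the analyzer applied to the query text. The field name content and the analyzer name search_cestina are borrowed from Petr's mapping later in this thread; the helper name and the query text "pes" are made up for illustration.

```shell
#!/bin/sh
# Sketch only: "index3", "content" and "search_cestina" come from this thread;
# the helper name and the query text are hypothetical.
ES_URL="${ES_URL:-localhost:9200}"

# Print a match query body that names its analyzer explicitly.
# $1 = field, $2 = query text, $3 = analyzer (no JSON escaping is done here).
match_with_analyzer() {
  cat <<EOF
{
  "query" : {
    "match" : {
      "$1" : { "query" : "$2", "analyzer" : "$3" }
    }
  }
}
EOF
}

# Show the body that would be sent.
match_with_analyzer content "pes" search_cestina

# Against a running node:
# curl -X GET "$ES_URL/index3/_search?pretty=true" \
#      -d "$(match_with_analyzer content pes search_cestina)"
```

Setting the analyzer in the query is handy for one-off experiments; putting search_analyzer in the mapping keeps every query on the field consistent without repeating the name.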

Petr_Jansky · October 25, 2013, 11:56am

Hi Clint,

Thank you for your quick reply.

So, if I understand you correctly, I should change my analyzers and mapping to:

```
curl -X POST 'localhost:9200/index3/' -d '{
  "settings" : {
    "analysis" : {
      "analyzer" : {
        "index_cestina" : {
          "type" : "custom",
          "tokenizer" : "standard",
          "filter" : ["stopwords_CZ", "index_mor_czech"]
        },
        "search_cestina" : {
          "type" : "custom",
          "tokenizer" : "standard",
          "filter" : ["stopwords_CZ", "search_mor_czech"]
        }
      },
      "filter" : {
        "stopwords_CZ" : {
          "type" : "stop",
          "stopwords" : [ "právě", "že", "_czech_" ],
          "ignore_case" : true
        },
        "index_mor_czech" : {
          "type" : "morphologyIndex",
          "language" : "cs",
          "path" : "/opt/morphologyCs"
        },
        "search_mor_czech" : {
          "type" : "morphologySearch",
          "language" : "cs",
          "path" : "/opt/morphologyCs"
        }
      }
    }
  },
  "mappings" : {
    "article" : {
      "_id" : {
        "path" : "reference"
      },
      "properties" : {
        "title"   : { "type" : "string", "index_analyzer" : "index_cestina", "search_analyzer" : "search_cestina" },
        "content" : { "type" : "string", "index_analyzer" : "index_cestina", "search_analyzer" : "search_cestina" }
      }
    }
  }
}'
```

Thanks
Petr
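
One point worth keeping in mind with this mapping: a term query is not analyzed, so "==noun==" is looked up exactly as typed and search_cestina never sees it, while a match query runs its text through the field's search analyzer. A minimal sketch of the two request bodies, assuming the index-time filter really does emit tokens such as ==noun== (the helper names and the word "pes" are illustrative only):

```shell
#!/bin/sh
# Sketch only: the field "content" and the "==noun==" token come from this
# thread; the helper names and the query word are hypothetical.

# term: the value is matched against indexed terms as-is, with no analysis.
term_query() {
  cat <<EOF
{ "query" : { "term" : { "content" : "$1" } } }
EOF
}

# match: the text is analyzed first, using the field's search analyzer
# unless the query names another one.
match_query() {
  cat <<EOF
{ "query" : { "match" : { "content" : "$1" } } }
EOF
}

term_query "==noun=="   # finds the tag token written at index time
match_query "pes"       # reduced to its stem by the search analyzer

# Against a running node:
# curl -X GET 'localhost:9200/index3/_search?pretty=true' -d "$(term_query '==noun==')"
```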

Lukas_Vlcek1 · October 27, 2013, 11:49am

Hi Petr,

Your example looks good to me. I was testing search/index analyzers myself
and created the following small recreation script:

https://gist.github.com/lukas-vlcek/6972058 (gistfile1.sh)

#!/bin/sh

echo "Elasticsearch version and build:"
curl localhost:9200; echo; echo;

echo "Delete index"
curl -X DELETE 'localhost:9200/i'; echo; echo;

echo "Create index with analysis and mappings"
curl -X PUT 'localhost:9200/i' -d '{

(The script is truncated here; the full version is in the gist.)

(It tests that the search_analyzer is used, and it works fine.)
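
A quick way to compare what the two analyzers produce is the _analyze API. The sketch below only prints the commands rather than running them. It assumes the index3 index and the analyzer names from Petr's post, and the sample text is a placeholder. The query-string form shown matches releases from around the time of this thread; newer releases expect a JSON body with analyzer and text fields instead.

```shell
#!/bin/sh
# Sketch only: "index3", "index_cestina" and "search_cestina" come from this
# thread; the helper name and the sample text are hypothetical.
ES_URL="${ES_URL:-localhost:9200}"

# Build the _analyze URL for a named analyzer of an index.
# $1 = index, $2 = analyzer
analyze_url() {
  printf '%s/%s/_analyze?analyzer=%s&pretty=true\n' "$ES_URL" "$1" "$2"
}

# Print one command per analyzer so the token lists can be compared.
for analyzer in index_cestina search_cestina; do
  echo "curl -X GET '$(analyze_url index3 "$analyzer")' -d 'sample text'"
done
```

If the setup works as intended, the index analyzer's output should contain the extra tokens and the search analyzer's output only the stems.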

Out of curiosity, is there any public record about what the morphologyIndex
token filter type is based on? Is it based on @imotov's analysis-morphology
plugin or some other (proprietary?) solution?

Thanks,
Lukáš
