Exact match search on field with "standard" analyzer


(hemant pahilwani) #1

Hi,

I have the following mapping, where firstname is analyzed using the standard
analyzer. The reason for using the standard analyzer is to allow regular
search features on the firstname field. I have a requirement to do exact
search, and the match query works fine as long as there are no special
characters in the search string or the field value. However, if a special
character like '@' is stored in the firstname field, or if there are special
characters in the search string, the results are not as expected. Is there
any way to make sure the exact search works with special characters?

Mapping:

"mappings": {
    "user": {
        "_source": {
            "enabled": true
        },
        "properties": {
            "myuniqueid": {
                "type": "string",
                "index": "not_analyzed"
            },
            "firstname": {
                "type": "string",
                "index": "analyzed",
                "analyzer": "standard"
            },

.
.
.

        }
    }
}

--


(simonw-2) #2

Hey,

Can you elaborate on what an exact match means to you? An example would be
great too, so I can give you good advice.

simon

--


(hemant pahilwani) #3

By exact match I mean that a record should be returned only if the value in
the firstname field is exactly the same as the search string (ignoring case).

For example, if the search string is John, only records that have firstname
"John" should be returned. Records with firstname Johnathan, Johny, or
Dijohn should not be returned. The match query works fine and the results
are as expected as long as there are no special characters like '@'.
If the search string is "good@one", it returns records that have firstname
"good", records that have firstname "one", and records that have firstname
"good@one". I want only the records that have "good@one" as firstname. The
query seems to split the search string into "good" and "one" and return
results for each.

--


(hemant pahilwani) #4

This is the match query that I am currently using. It is a multi_match
because I will be adding more fields in the future:

{"query": {"multi_match": {"query": "good@one", "fields": ["firstname"],
"use_dis_max": false}}}

--


(Ivan Brusic) #5

You cannot achieve what you require by using the standard analyzer. The
standard analyzer is tokenizing the string and creating two tokens: "good"
and "one". You can see the results by using the analysis API:

curl 'http://localhost:9200/_analyze?text=good@one&analyzer=standard&pretty=true'

The field would need to be not_analyzed in order for it to have an exact
match, or at the very least, an analyzer that creates only one token (use
case dependent).

If the field needs to be analyzed for other queries, you can use the
multi-field type, which is a common use case:

http://www.elasticsearch.org/guide/reference/mapping/multi-field-type.html

If you choose to set the field as not_analyzed, then it will be case
sensitive. You can create a custom analyzer that does not tokenize, but
still applies a lowercase filter. Search the mailing list for examples.

Cheers,

Ivan
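
For readers who want a starting point, here is a minimal sketch of the
multi-field approach described above, written as an index creation body. It
is an illustration under assumptions, not a tested configuration: the
analyzer name "lowercase_keyword" and the sub-field name "exact" are
hypothetical, and the "multi_field" and "string" types reflect the
Elasticsearch syntax of that era (later releases replaced multi_field with a
"fields" section on the main field). The custom analyzer combines the keyword
tokenizer, which emits the whole value as a single token, with the lowercase
filter, so matching should be exact but case-insensitive.

```json
{
  "settings": {
    "analysis": {
      "analyzer": {
        "lowercase_keyword": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "user": {
      "properties": {
        "firstname": {
          "type": "multi_field",
          "fields": {
            "firstname": {
              "type": "string",
              "index": "analyzed",
              "analyzer": "standard"
            },
            "exact": {
              "type": "string",
              "index": "analyzed",
              "analyzer": "lowercase_keyword"
            }
          }
        }
      }
    }
  }
}
```

With a mapping along these lines, regular searches would keep using
"firstname", while an exact search would target the sub-field, for example:

```json
{
  "query": {
    "match": {
      "firstname.exact": "good@one"
    }
  }
}
```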

--


(hemant pahilwani) #6

Thanks Ivan, this is the answer I was looking for. One small question:

I have an existing index with half a million records, and I will be using
the put mapping API to change the mapping of the firstname field to the
multi-field type in order to store a not_analyzed version of firstname. After
changing the mapping, do I need to reindex all the records?

--


(simonw-2) #7

Hey there,

The simple answer is: yes!

simon

--


(Girdhar Sojitra) #8

You can use a match_phrase query to solve this problem.
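
A minimal sketch of that suggestion, run against the original
standard-analyzed firstname field, might look like the query below. Note the
likely limitation: a phrase query only requires the tokens "good" and "one"
to appear next to each other in that order. It should therefore stop matching
a firstname of just "good" or just "one", but it would presumably still match
values such as "good one" or "good-one", and longer values that contain the
phrase, because the standard analyzer discards the punctuation. If a strict
exact match is required, a single-token field as discussed earlier in the
thread is the safer choice.

```json
{
  "query": {
    "match_phrase": {
      "firstname": "good@one"
    }
  }
}
```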