How can Percolate perform queries on a not analyzed index?

sarath_r_nair · October 7, 2017, 3:08am

After a lot of digging around, I decided to ask here. Let me start with a simple use case.

curl -XPUT 'localhost:9200/my-index?pretty' -H 'Content-Type: application/json' -d'
{
    "mappings": {
        "doctype": {
            "properties": {
                "message": {
                    "type": "text"
                }
            }
        },
        "queries": {
            "properties": {
                "query": {
                    "type": "percolator"
                }
            }
        }
    }
}
'

curl -XPUT 'localhost:9200/my-index/queries/2?refresh&pretty' -H 'Content-Type: application/json' -d'
{
    "query" : {
        "match_phrase" : {
            "message" : "pub/sub"
        }
    }
}
'


curl -XPUT 'localhost:9200/my-index/queries/1?refresh&pretty' -H 'Content-Type: application/json' -d'
{
    "query" : {
        "match_phrase" : {
            "message" : "x++"
        }
    }
}
'

Now my problem: if I execute

curl -XGET 'localhost:9200/my-index/_search?pretty' -H 'Content-Type: application/json' -d'
{
    "query" : {
        "percolate" : {
            "field" : "query",
            "document_type" : "doctype",
            "document" : {
                "message" : "A new bonsai pub sub tree in the office x"
            }
        }
    }
}
'

I get two matches: the "pub/sub" query matches "pub sub" and the "x++" query matches "x". I know it's because of the analyzer. But if I change the field mapping to

curl -XPUT 'localhost:9200/my-index?pretty' -H 'Content-Type: application/json' -d'
{
    "mappings": {
        "doctype": {
            "properties": {
                "message": {
                    "type": "string",
                    "index": "not_analyzed"
                }
            }
        },
        "queries": {
            "properties": {
                "query": {
                    "type": "percolator"
                }
            }
        }
    }
}
'

then the document "message" : "A new bonsai pub sub tree in the office x" gives zero matches, because the entire text is indexed as a single, not_analyzed term.
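To make the zero-match result concrete, here is a tiny shell sketch. It assumes that a not_analyzed (keyword) field turns the whole value into one single term and that the match_phrase query text is likewise kept whole, so a stored query only matches when it equals the entire message. `keyword_matches` is a hypothetical helper used for illustration only.

```shell
# Sketch of keyword (not_analyzed) matching: the stored field value and the
# query text are each treated as one single term, so they must be equal.
# keyword_matches is a hypothetical helper, not an Elasticsearch command.
keyword_matches() {
  [ "$1" = "$2" ]
}

doc='A new bonsai pub sub tree in the office x'
for q in 'pub/sub' 'x++'; do
  if keyword_matches "$doc" "$q"; then
    echo "match:    $q"
  else
    echo "no match: $q"
  fi
done
# no match: pub/sub
# no match: x++
```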

Is there a simple way to solve this? I only want phrase and non-phrase queries to match when the text is indexed without removing special characters such as / and +.

val · October 7, 2017, 4:42am

By default, the text field uses the standard analyzer, which splits pub/sub into pub and sub and reduces x++ to x. If you use the whitespace analyzer instead, your input will simply be split on whitespace (but the tokens will not be lowercased):

"mappings": {
    "doctype": {
        "properties": {
            "message": {
                "type": "text",
                "analyzer": "whitespace"
            }
        }
    },
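The difference between the two analyzers can be approximated offline with a small shell sketch. This is not Elasticsearch's implementation: the real standard tokenizer follows Unicode word-boundary rules, whereas the hypothetical `standard_like` helper simply treats every non-alphanumeric character as a separator, which happens to reproduce its effect on `/` and `+`. `whitespace_only` (also hypothetical) splits on whitespace and leaves everything else untouched.

```shell
# Rough approximation of the standard analyzer: split on any run of
# non-alphanumeric characters, drop empty tokens, then lowercase.
standard_like() {
  printf '%s\n' "$1" | tr -cs '[:alnum:]' '\n' | sed '/^$/d' | tr '[:upper:]' '[:lower:]'
}

# Approximation of the whitespace analyzer: split on whitespace only,
# keeping punctuation and case as they are.
whitespace_only() {
  printf '%s\n' "$1" | awk '{ for (i = 1; i <= NF; i++) print $i }'
}

standard_like 'pub/sub'    # pub, sub (one per line)
standard_like 'x++'        # x
whitespace_only 'pub/sub'  # pub/sub
whitespace_only 'x++'      # x++
```

This is why the stored "x++" phrase query matched a document that only contains "x" under the default mapping.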

If you also want the tokens to be lowercased, then you need to create a custom analyzer that combines the whitespace tokenizer with the lowercase filter:

curl -XPUT 'localhost:9200/my-index?pretty' -H 'Content-Type: application/json' -d'{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": [
            "lowercase"
          ]
        }
      }
    }
  },
  "mappings": {
    "doctype": {
      "properties": {
        "message": {
          "type": "text",
          "analyzer": "my_analyzer"
        }
      }
    },
    "queries": {
      "properties": {
        "query": {
          "type": "percolator"
        }
      }
    }
  }
}'
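Assuming the custom analyzer above behaves as its definition says (whitespace tokenizer, then the lowercase filter), its effect on the sample text can be approximated with this shell sketch. `my_analyzer_like` is a hypothetical helper, not part of Elasticsearch.

```shell
# Offline approximation of "my_analyzer": whitespace tokenizer
# followed by the lowercase token filter.
my_analyzer_like() {
  printf '%s\n' "$1" \
    | awk '{ for (i = 1; i <= NF; i++) print $i }' \
    | tr '[:upper:]' '[:lower:]'
}

my_analyzer_like 'A new bonsai Pub/Sub tree in the office X++'
# a, new, bonsai, pub/sub, tree, in, the, office, x++ (one per line)
```

On a running cluster, the actual tokens can be checked with the `_analyze` API on `my-index`, passing `"analyzer": "my_analyzer"` and a `"text"` value in the request body.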

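Then the search below will only match the pub/sub query, because x++ is now kept as a single token that does not occur in the document (which only contains x):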

curl -XGET 'localhost:9200/my-index/_search?pretty' -H 'Content-Type: application/json' -d'
{
    "query" : {
        "percolate" : {
            "field" : "query",
            "document_type" : "doctype",
            "document" : {
                "message" : "A new bonsai pub/sub tree in the office x"
            }
        }
    }
}
'
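Why only the pub/sub query matches can be sketched by treating match_phrase as "the query's tokens appear consecutively in the document's tokens", with both sides run through a whitespace-plus-lowercase approximation. This is a simplification: roughly speaking, the percolator runs the stored Lucene queries against an in-memory index of the document. `analyze` and `phrase_matches` are hypothetical helpers.

```shell
# Whitespace split + lowercase, emitted as space-separated tokens
# (with a trailing space) so phrases can be compared as substrings.
analyze() {
  printf '%s\n' "$1" | awk '{ for (i = 1; i <= NF; i++) printf "%s ", tolower($i) }'
}

# Succeeds when the phrase's tokens occur consecutively in the document.
# Padding both sides with spaces forces whole-token boundaries, which is
# safe because whitespace-analyzed tokens never contain spaces.
phrase_matches() {
  _doc_tokens=" $(analyze "$1")"
  _phrase_tokens=" $(analyze "$2")"
  case "$_doc_tokens" in
    *"$_phrase_tokens"*) return 0 ;;
    *) return 1 ;;
  esac
}

doc='A new bonsai pub/sub tree in the office x'
for q in 'pub/sub' 'x++'; do
  if phrase_matches "$doc" "$q"; then
    echo "match:    $q"
  else
    echo "no match: $q"
  fi
done
# match:    pub/sub
# no match: x++
```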

sarath_r_nair · October 7, 2017, 10:22am

Thank you so much, Val Crettaz. Amazing and very precise explanation.

val · October 7, 2017, 11:47am

Awesome, glad it helped


Related topics

Topic	Category	Replies	Views	Activity
Url percolation index:no, "index" : "not_analyzed" and uax_url_email not working	Elasticsearch	1	340	July 6, 2017
Problem with Percolation on an existing document	Elasticsearch	1	353	July 6, 2017
Some Change in Percolation/QueryStringQuery/Mapping between 7.6.2 and 7.7.0?	Elasticsearch	5	467	December 4, 2020
Is percolator supposed to work with regex queries?	Elasticsearch	2	705	July 6, 2017
Percolator issue?	Elasticsearch	3	265	July 6, 2017