Exact match using sentinel token technique (Elasticsearch 7.x)

stonebanks · October 25, 2024, 10:05pm

Hello,
I was reading the Relevant Search book, in particular the section on the sentinel token technique, which is also described here.

Following the book's recipe, the easiest way I found to achieve an exact match is to add a new field to my document, like this:

{
   "name": "big blue rental car",
   "nameWithSentinels": "sentinel_begin big blue rental car sentinel_end"
}

and use a match_phrase query against the nameWithSentinels field with sentinel_begin <user_search> sentinel_end as a value.
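
To make that concrete, here is a minimal sketch of the search request body, assuming the user typed "big blue rental car" and that the analyzer on nameWithSentinels keeps sentinel_begin and sentinel_end as single tokens (the standard tokenizer does not split on underscores, as far as I know). The values are only illustrative.

```json
{
  "query": {
    "match_phrase": {
      "nameWithSentinels": "sentinel_begin big blue rental car sentinel_end"
    }
  }
}
```

With the default slop of 0, a shorter search such as "blue rental car" would not match this document, because sentinel_begin would have to sit directly before "blue".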

However, I wonder if there's a way I can achieve this without creating the nameWithSentinels field, by using token filters instead.

I'm thinking of creating a filter that, given the tokens ["big", "blue", "rental", "car"], would return either ["sentinel_begin", "big", "blue", "rental", "car", "sentinel_end"] or ["sentinel_begin big", "blue", "rental", "car sentinel_end"] (I believe the same match_phrase query presented above would still work).

So far my settings look like this:

{
    "idx_0001": {
        "settings": {
         // stuff removed for brevity
            "analysis": {
                "filter": {
                    "sentinel_border_condition_end": {
                        "filter": [
                            "sentinel_border_end"
                        ],
                        "type": "condition",
                        "script": {
                            "source": "token.getPosition() === [NO IDEA HOW I CAN GET THE NUMBER OF TOKENS]"
                        }
                    },
                    "sentinel_border_begin": {
                        "pattern": "^",
                        "type": "pattern_replace",
                        "replacement": "SENTINEL_BEGIN"
                    },
                    "sentinel_border_end": {
                        "pattern": "$",
                        "type": "pattern_replace",
                        "replacement": "SENTINEL_END"
                    },
                    "sentinel_border_condition_begin": {
                        "filter": [
                            "sentinel_border_begin"
                        ],
                        "type": "condition",
                        "script": {
                            "source": "token.getPosition() === 0"
                        }
                    }
                },
                "analyzer": {
                    "my-analyzer": {
                        "filter": [
                            "sentinel_border_condition_begin",
                            "sentinel_border_condition_end",
                            "lowercase",
                            "stop"
                        ],
                        "type": "custom",
                        "tokenizer": "standard"
                    }
                }
            }
        }
    }
}

But as you can see, I'm not sure what the Painless script for the sentinel_border_condition_end token filter should be.

As a last resort I could write a plugin, but from what I can see the documentation on token filter plugins is sparse.

Thank you in advance,

A.

stonebanks · October 26, 2024, 2:35pm

I just realized I can actually achieve what I need using an ingest pipeline.
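
For anyone landing here later, a minimal sketch of what such a pipeline could look like. This is an assumption, not necessarily the exact pipeline used: it relies on the set processor with a Mustache template to build nameWithSentinels from name at index time. The body below would be sent to PUT _ingest/pipeline/add_sentinels, where add_sentinels is a hypothetical pipeline name.

```json
{
  "description": "Illustrative sketch: wrap the name field in sentinel tokens",
  "processors": [
    {
      "set": {
        "field": "nameWithSentinels",
        "value": "sentinel_begin {{name}} sentinel_end"
      }
    }
  ]
}
```

Documents indexed with ?pipeline=add_sentinels (or through the index's default_pipeline setting) would then get the extra field without the client having to send it, and the match_phrase query from the first post can stay unchanged.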

system · November 23, 2024, 2:35pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.