Match with forward slash and dash


(Jake He) #1

Hi,

I am trying to find the best way to index addresses.

The problem is that I get no match when the search input contains a forward slash.

For example: I get no match for 1/3 example st.

I think the problem is that the whitespace tokenizer produces [1/3, example, st], but I need [1, /, 3, example, st].

How do I build a custom tokenizer that tokenizes based on whitespace and forward slash?
This is what I have.

    client.indices.create({
        index: 'address',
        body: {
            settings: {
                analysis: {
                    analyzer: {
                        address_analyzer: {
                            type: "custom",
                            tokenizer: "whitespace",
                            filter: [
                                "lowercase",
                                "asciifolding",
                                "synonym"
                            ]
                        }
                    },
                    filter: {
                        synonym: {
                            type: "synonym",
                            synonyms_path: "analysis/street_types.txt"
                        }
                    }
                }
            }
        }
    });

    client.indices.putMapping({
        index: 'address',
        type: 'singleAddress',
        body: {
            properties: {
                suggest: {
                    type: "completion",
                    analyzer: "address_analyzer",
                    preserve_separators: "false"
                }
            }
        }
    });

(Igor Motov) #2

If you want a really powerful, but potentially slow, solution, you can write your own regex-based tokenizer. But the simplest way would be to replace symbols like / with special tokens. For example:

    POST _analyze
    {
      "char_filter": [
        {
          "type": "mapping",
          "mappings": [
            "/ => -_slash_-"
          ]
        }
      ],
      "tokenizer": "standard",
      "text": "1/3 example st"
    }

This will produce the tokens 1, _slash_, 3, example, st.
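To carry either idea over into the index settings from the first post, something like the following might work. It is only a sketch: the names `slash_to_token`, `slash_tokenizer`, `charFilterSettings`, `patternTokenizerSettings` and `previewTokens` are made up for illustration, and option A assumes you are happy to switch from the `whitespace` tokenizer to `standard`, as in the `_analyze` request above. Option B uses the built-in `pattern` tokenizer with `group: 0`, so each regex match (a run of non-space, non-slash characters, or a single slash) becomes a token. The helpers only build plain objects and do not contact a cluster.

```javascript
// Synonym filter copied from the original settings, shared by both options.
const synonymFilter = {
    synonym: { type: "synonym", synonyms_path: "analysis/street_types.txt" }
};

// Option A: mapping char filter ("slash_to_token" is a hypothetical name)
// rewrites "/" before the standard tokenizer runs.
function charFilterSettings() {
    return {
        analysis: {
            char_filter: {
                slash_to_token: { type: "mapping", mappings: ["/ => -_slash_-"] }
            },
            analyzer: {
                address_analyzer: {
                    type: "custom",
                    char_filter: ["slash_to_token"],
                    tokenizer: "standard",
                    filter: ["lowercase", "asciifolding", "synonym"]
                }
            },
            filter: synonymFilter
        }
    };
}

// Option B: regex-based tokenizer ("slash_tokenizer" is a hypothetical name).
// group: 0 emits every whole match of the pattern as its own token.
const SLASH_PATTERN = "[^\\s/]+|/";

function patternTokenizerSettings() {
    return {
        analysis: {
            tokenizer: {
                slash_tokenizer: { type: "pattern", pattern: SLASH_PATTERN, group: 0 }
            },
            analyzer: {
                address_analyzer: {
                    type: "custom",
                    tokenizer: "slash_tokenizer",
                    filter: ["lowercase", "asciifolding", "synonym"]
                }
            },
            filter: synonymFilter
        }
    };
}

// Local approximation of option B's tokenizer, handy for trying the regex
// without a cluster. It does not apply any token filters.
function previewTokens(text) {
    return text.match(new RegExp(SLASH_PATTERN, "g")) || [];
}
```

Either object can be passed as `body: { settings: charFilterSettings() }` (or `patternTokenizerSettings()`) to `client.indices.create`. Running the real `_analyze` API against the created index is still the reliable way to check what tokens the analyzer emits.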

