How do I search words separated by _ and . in ElasticSearch?


(Steve Bissett) #1

I am new to ElasticSearch and I am very happy with the speed at the moment. I still have one requirement that I haven't been able to achieve.

ElasticSearch seems to split the following text when indexing:

"bmw.co.uk"
"apple_macbook_pro"

I am searching in Ruby like this using the ElasticSearch gem:

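require 'elasticsearch'

client = Elasticsearch::Client.new # assumes a node on localhost:9200

result = client.search(
  index: 'elasticsearch_dev',
  body: {
    query: {
      bool: {
        # every term of the query must appear in search_text
        must: { match: { search_text: { query: 'bmw', operator: 'and' } } },
        # terms to exclude; empty here, so nothing is excluded
        must_not: { match: { search_text: { query: '', operator: 'or' } } }
      }
    }
  }
)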
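
Whether a match query like this finds a record depends on the tokens that were stored for search_text. One way to inspect them is the _analyze API. This is a minimal sketch, assuming a client from the elasticsearch gem; `tokens_for` is a hypothetical helper name, and `standard` is only the default analyzer, which may not be the one configured on the field:

```ruby
# Hypothetical helper: returns the tokens an analyzer produces for `text`.
# `client` is assumed to be an Elasticsearch::Client (or any object that
# responds to `indices.analyze` in the same way).
def tokens_for(client, text, analyzer: 'standard')
  response = client.indices.analyze(body: { analyzer: analyzer, text: text })
  # The response carries a "tokens" array; each entry has a "token" string.
  response['tokens'].map { |t| t['token'] }
end
```

Calling `tokens_for(client, 'bmw.co.uk')` and `tokens_for(client, 'apple_macbook_pro')` against a running node shows the terms a search has to hit; a query term only matches if it equals one of them.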

What I'm trying to achieve:

When I search for bmw or bmw.co, I want records with bmw.co.uk to match.

When I search for acboo or apple, I want records with apple_macbook_pro to match.

How do I go about achieving this?

Additional Info

I have looked at this site, which is along the lines of what I am looking to do, but not quite:

http://www.fullscale.co/blog/2013/03/04/preserving_specific_characters_during_tokenizing_in_elasticsearch.html

If I have the URL:

uk-on-sale.com

I want this to be tokenized as:

[uk, on, sale, com, uk-on, on-sale, uk-on-sale.com, on-sale.com, sale.com, .com]
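
To make the target concrete, the list above can be reproduced in plain Ruby. This is only a sketch of one reading of the list (single words, adjacent pairs with their original separator, every suffix, and the final separator plus last word); `desired_tokens` is a hypothetical name and nothing here calls ElasticSearch:

```ruby
# Hypothetical helper: builds the token list from the post for a hostname.
def desired_tokens(text)
  # Split on - and . but keep the separators:
  # "uk-on-sale.com" => ["uk", "-", "on", "-", "sale", ".", "com"]
  parts = text.split(/([-.])/)
  word_idx = (0...parts.size).step(2).to_a # positions of the words

  words    = word_idx.map { |i| parts[i] }
  # word + separator + next word, e.g. "uk-on"
  pairs    = word_idx.select { |i| i + 2 < parts.size }.map { |i| parts[i, 3].join }
  # everything from each word to the end, e.g. "on-sale.com"
  suffixes = word_idx.map { |i| parts[i..-1].join }
  # last separator plus last word, e.g. ".com"
  tail     = parts.size >= 3 ? [parts[-2..-1].join] : []

  (words + pairs + suffixes + tail).uniq
end
```

A custom analyzer would have to emit this same set of tokens at index time for each of those search terms to match.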

(Mark Walkom) #2

This comes down to analysis: the analyzer on the field decides which tokens get indexed, and each of the outputs you want needs a tokenizer or filter that produces it.
For some context, you may want to read through this chapter of the Definitive Guide: https://www.elastic.co/guide/en/elasticsearch/guide/master/languages.html
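
As a rough illustration of what such an analyzer could look like for the . and _ cases, here is a sketch that splits on those characters with a pattern tokenizer and then breaks each word into n-grams. The names `delimiter_split`, `inner_ngram` and `delimited_ngram` are made up, the gram sizes are arbitrary, and older releases spell the filter type `nGram`. The `simulated_*` methods are a plain-Ruby approximation for reasoning about matches, not ElasticSearch itself:

```ruby
# Index settings sketch; pass as the body when creating the index.
def analysis_settings
  {
    settings: {
      analysis: {
        tokenizer: { delimiter_split: { type: 'pattern', pattern: '[._]' } },
        filter: { inner_ngram: { type: 'ngram', min_gram: 3, max_gram: 4 } },
        analyzer: {
          delimited_ngram: {
            type: 'custom',
            tokenizer: 'delimiter_split',
            filter: %w[lowercase inner_ngram]
          }
        }
      }
    }
  }
end

# Approximation of the analyzer above: split on . and _, lowercase,
# then emit every substring of length min_gram..max_gram per word.
def simulated_tokens(text, min_gram: 3, max_gram: 4)
  text.downcase.split(/[._]/).flat_map do |word|
    (min_gram..max_gram).flat_map do |n|
      (0..word.size - n).map { |i| word[i, n] } # empty when the word is too short
    end
  end.uniq
end

# Mirrors a match query with operator "and": every query token must be indexed.
def simulated_match?(query, document)
  query_tokens = simulated_tokens(query)
  !query_tokens.empty? && (query_tokens - simulated_tokens(document)).empty?
end
```

Under this approximation, `bmw` and `bmw.co` match `bmw.co.uk`, and `acboo` and `apple` match `apple_macbook_pro`. Parts shorter than min_gram, such as `co` and `uk`, produce no grams here, so they would not be searchable on their own. The search_text field would also need to be mapped to the `delimited_ngram` analyzer, and the mapping syntax depends on the ElasticSearch version.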

