Query questions (autocomplete)

Hello,

I am trying to craft a query that powers a real-time search dropdown (autocomplete).

My search data has 3 columns:

  • name (text)
  • alternate_names (array of text)
  • description (text)

The only column that is shown in the dropdown, and the only one that needs to be highlighted, is name. The other fields are used only for matching.

Here is some example data:

{
  name: '.30-30 Winchester', 
  alternate_names: [
    '.30 Winchester Centerfire', 
    '.30 WCF', 
    '7.62×51mmR',
  ]
},
{
  name: '.22 Long Rifle', 
  alternate_names: [
    '.22 LR',
  ]
}

I also have a long list of synonyms:

[
  ['lr', 'long rifle'],
  ['wcf', 'winchester centerfire'],
  ['mag', 'magnum'],
  ... 50+ more
]

In the example data above, alternate_names holds the "official" alternate designations. Sometimes they align with the synonyms, and other times they do not (see below).

Synonyms

For example, I want a search for 44mag or 44 mag to find .44 Remington Magnum, but I wouldn't want to publicly list "44 Mag" as an alternate name. This is where the synonyms come in: they exist purely to cover the raw permutations of user input.
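
A minimal sketch of how the synonym pairs above could be turned into rules for Elasticsearch's built-in synonym_graph token filter, which accepts comma-separated rule strings in its synonyms parameter. The helper names and the filter name cartridge_synonyms are made up for illustration, and applying the filter at search time rather than index time is an assumption. It does not address the 22lr case, where the synonym touches the number and never becomes its own token.

```ruby
# The first three pairs from the synonym list; the real list has 50+ more.
SYNONYM_PAIRS = [
  ['lr', 'long rifle'],
  ['wcf', 'winchester centerfire'],
  ['mag', 'magnum']
].freeze

# Hypothetical helper: converts each pair into a comma-separated rule
# string such as "lr, long rifle".
def synonym_rules(pairs)
  pairs.map { |terms| terms.join(', ') }
end

# Hypothetical helper: builds the filter definition that would go under
# settings.analysis.filter. The name "cartridge_synonyms" is an assumption.
def synonym_filter(pairs)
  {
    'cartridge_synonyms' => {
      'type' => 'synonym_graph',
      'synonyms' => synonym_rules(pairs)
    }
  }
end
```

Keeping the pairs in Ruby and generating the rules means the list stays in one place and is never exposed as a public alternate name.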

Highlighting

I'd like the highlighting to work well, but it doesn't have to be granular. The only important thing is that name is still highlighted when the text that matched the record is NOT in name. I'm not sure whether this is possible.

E.g., the user types 3030 w, 30 30 win, or 30-30 win, and the result .30-30 Winchester appears.

In the first example, 30-30 Winchester would highlight, or optionally 30-30 Winchester; highlighting the entire token is also OK (I believe this may actually look better, and might be easier).

I'm not super picky here. My main goal is just that it's not completely broken, like:

30-30 Winchester

Also, if they type 22lr, it would highlight 22 Long Rifle (because lr is a synonym).
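
One possible fallback, sketched in plain Ruby on the application side: if Elasticsearch returns no highlight fragment for name (because the match came from alternate_names or a synonym), wrap the whole name instead. This assumes the hit is a parsed response Hash with string keys, that highlighting was requested on name, and that the default <em> tags are in use. The method name display_name is hypothetical.

```ruby
# Returns the HTML to show in the dropdown for a single search hit.
# `hit` is assumed to be one element of hits.hits from a parsed response.
def display_name(hit, pre_tag: '<em>', post_tag: '</em>')
  fragments = hit.dig('highlight', 'name')

  # Elasticsearch highlighted part of `name`: use its fragment as-is.
  return fragments.first if fragments && !fragments.empty?

  # Nothing in `name` was highlighted, so the match came from another
  # field: highlight the entire name instead.
  "#{pre_tag}#{hit.dig('_source', 'name')}#{post_tag}"
end
```

With this, a hit that matched only through wcf would render as a fully highlighted .30-30 Winchester rather than with no highlight at all.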

Inputs/Tokenizing

Lastly, there are many different permutations of what users may type. Here are the ones I can think of:

.22 Long Rifle

  • 22lr - note the synonym is touching the number
  • 22 lr
  • 22 long

9.3×74mmR

  • 9.3x74 mm (must allow both x and ×)
  • 9.3 × 74 mm (spaces around "mm" and "x/×"; should exclude an x surrounded by letters on both sides, like "axe" or "ox man")
  • 9.3 X74mm R (space before R, space on one side of x)

.30-30 Winchester

  • 3030 win (hyphens optional)
  • 30 30 win (hyphens to spaces)
  • 30-30 wcf (wcf triggers synonym which matches an alternate name, even though "Winchester Centerfire" doesn't appear in the name)
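
The equivalences in the lists above can be pinned down with a small plain-Ruby sketch. It is only a statement of the expected behaviour, not an Elasticsearch analyzer. The names compact_key and prefix_match? are hypothetical. It assumes that lowercasing, mapping × to x, and dropping everything except letters and digits is acceptable, and it deliberately leaves synonyms (22lr, wcf) out.

```ruby
# Reduces text to a comparison key: lowercase, "×" mapped to "x",
# and every character that is not a letter or digit removed.
def compact_key(text)
  text.downcase
      .tr('×', 'x')
      .gsub(/[^a-z0-9]/, '')
end

# True when the compacted query is a prefix of the compacted name,
# which is the behaviour an autocomplete dropdown needs.
def prefix_match?(query, name)
  compact_key(name).start_with?(compact_key(query))
end

# Inputs from the lists above that should match without any synonyms.
EXPECTED_MATCHES = {
  '.30-30 Winchester' => ['3030 win', '30 30 win'],
  '9.3×74mmR'         => ['9.3x74 mm', '9.3 × 74 mm', '9.3 X74mm R'],
  '.22 Long Rifle'    => ['22 long']
}.freeze
```

Because × is only ever mapped to a plain x and never the other way round, words such as "axe" are left alone. An input like 22lr still fails against .22 Long Rifle here, which is the gap the synonym list is meant to close.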

My Attempts

I almost don't want to show my attempts so far, as I'd rather start with a clean slate on a pure implementation than make my hack job work. I'm using Searchkick for Rails, which is very limiting. I eventually figured out how to get it working with a completely custom analyzer, but now I have to start from scratch.

My current method seems extremely questionable and sketchy. I take every permutation of the name with and without those symbols and spaces, then merge them into one very long string called combined. I couldn't get this working within a filter, so I do it with a plain Ruby method before storing the document in the index.

This leaves the highlighting very broken, which, as I mentioned, is why I'd rather build a brand-new implementation from scratch using JUST Elasticsearch.

Any help guiding me in the right direction would be very greatly appreciated.

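# Builds the "combined" string: several spacing and hyphen variants of
# the text, joined with spaces.
def self.ghetto_token_filter(text)
  return nil if text.nil?

  # Variant 1: add spaces around the "x" separator and before/after "mm"
  phase1 = text.gsub('×', ' x ')
  phase1 = phase1.gsub(/(\d)\s?x/, '\1 x ')
  phase1 = phase1.gsub(/(\d)mm/, '\1 mm ')
  # Variant 2: all whitespace removed
  phase2 = phase1.gsub(/\s/, '')
  # Variants 3 and 4: hyphens turned into spaces, or dropped entirely
  phase3 = phase1.tr('-', ' ')
  phase4 = phase2.delete('-')
  [phase1, phase2, phase3, phase4].join(' ')
end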
