Elasticsearch stemming (dictionary and otherwise): plural matches singular, but nothing matches plural (PHP)


(George) #1

I have an Elasticsearch instance holding about 800k documents. The documents carry all sorts of information, but every type has a 'name' field, which is the field I mostly search on.

With the setup and search pasted below, searching works reasonably well for most queries, provided they don't involve plurals. Using 'stockings' (as in stocking filler) as an example, this is what I currently see:

search "stocking" - receive matches where source (name) contains "stocking" but not "stockings"

search "stockings" - receive matches where source (name) contains "stocking" but not "stockings"

search "stocking*" - receive matches where source (name) contains "stocking" and "stockings"

Finally, searching for a short string containing a dash (BT-103) returns no results at all, even when a field such as the SKU contains exactly BT-103.

My best guess is that:

  1. My search is being tokenised.
  2. My 'documents' are not being tokenised at the time of indexing. See the third code block below for the ingestion code. Each time I run a test I delete the index and re-ingest it (bad for production, but it means I am absolutely certain re-indexing occurs whenever I make changes).
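To make the two guesses above concrete, here is a minimal toy sketch in plain PHP. It is not Elasticsearch code: `toyStem`, `toyIndexTerms` and `toySearch` are hypothetical helpers, and the "stemmer" is a crude stand-in that only strips a trailing "s". It models an index whose terms are stored unstemmed while the query text is stemmed, and under that assumption it reproduces the three results listed earlier.

```php
<?php
// Toy model only: documents are indexed WITHOUT stemming, queries ARE stemmed.

// Crude stand-in for a stemmer (assumption): lowercase, strip one trailing "s".
function toyStem($term)
{
    $term = strtolower($term);
    return (strlen($term) > 3 && substr($term, -1) === 's') ? substr($term, 0, -1) : $term;
}

// Index-time analysis in this model: lowercase and split on non-alphanumerics.
function toyIndexTerms($text)
{
    return preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
}

// Returns the ids of matching documents. In this model a trailing "*" means
// a prefix match on the raw indexed terms, with no stemming of the query.
function toySearch(array $docs, $query)
{
    $hits = [];
    $isPrefix = substr($query, -1) === '*';
    $needle = $isPrefix ? strtolower(rtrim($query, '*')) : toyStem($query);
    foreach ($docs as $id => $name) {
        foreach (toyIndexTerms($name) as $term) {
            $match = $isPrefix ? strpos($term, $needle) === 0 : $term === $needle;
            if ($match) {
                $hits[] = $id;
                break;
            }
        }
    }
    return $hits;
}

$docs = [1 => 'Christmas stocking', 2 => 'Red stockings'];
print_r(toySearch($docs, 'stocking'));   // ids: 1
print_r(toySearch($docs, 'stockings'));  // ids: 1
print_r(toySearch($docs, 'stocking*'));  // ids: 1, 2
```

If the real index behaved like this model, the indexed term "stockings" could never equal the stemmed query term "stocking", which would fit the symptoms described.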

I have tested the analyzer directly ('raw') with my queries, and it returns the expected results.

I would be most grateful for any guidance. I'm currently running a 2.x version of Elasticsearch.
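For reference, one way to run that kind of check from PHP is the analyze API, which can be asked either about a named analyzer or about a mapped field. The sketch below only builds the request parameters; `buildAnalyzeParams` is a hypothetical helper, `'my_index'` is a placeholder index name, and the commented-out call assumes an already connected elasticsearch-php client in `$client`.

```php
<?php
// Builds parameters for the indices analyze API.
// Pass $analyzer to test a named analyzer, or $field to see how text is
// analysed for a specific mapped field in that index.
function buildAnalyzeParams($index, $text, $analyzer = null, $field = null)
{
    $params = ['index' => $index, 'text' => $text];
    if ($analyzer !== null) {
        $params['analyzer'] = $analyzer;
    }
    if ($field !== null) {
        $params['field'] = $field;
    }
    return $params;
}

// 'my_index' is a placeholder (assumption), not a name from the original setup.
$byAnalyzer = buildAnalyzeParams('my_index', 'stockings', 'english');
$byField    = buildAnalyzeParams('my_index', 'stockings', null, 'name');

// With a connected client (not created here), the tokens could be listed as:
// $result = $client->indices()->analyze($byField);
// print_r(array_column($result['tokens'], 'token'));
print_r($byAnalyzer);
print_r($byField);
```

Comparing the tokens returned for the named analyzer with those returned for the field, in each index that is actually searched, is one way to test whether the documents are analysed the same way as the query.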

Index settings and mappings:

$params = [
    'index' => SEARCH2_NORMAL_INDEX,
    'body' => [
        'settings' => [
            'analysis' => [
                'filter' => [
                    'english_stop' => [
                        'type' => 'stop',
                        'stopwords' => '_english_'
                    ],
                    'light_english_stemmer' => [
                        'type' => 'stemmer',
                        'language' => 'light_english'
                    ]
                ],
                'analyzer' => [
                    'english' => [
                        'tokenizer' => 'standard',
                        'filter' => [
                            'lowercase',
                            'english_stop',
                            'light_english_stemmer',
                            'asciifolding'
                        ]
                    ]
                ]
            ]
        ],
        'mappings' => [
            'component' => [
                'properties' => [
                    'name' => [
                        'type' => 'string',
                        'analyzer' => 'english'
                    ]
                ]
            ]
        ]
    ]
];

Searching:

$params = [
    'index' => $indexes,
    //'index' => SEARCH2_NORMAL_INDEX,
    'body' => [
        'from' => 0,
        'size' => 200,
        'query' => [
            'query_string' => [
                'query' => $final_search,
                'analyzer' => 'english',
                'default_operator' => 'AND'
            ]
        ],
        'indices_boost' => [
            SEARCH2_ORDERS_INDEX => 0.5
        ]
    ]
];

And ingesting:

foreach($suppliers as $supplier) {
  $params3['body'][] = [
      'index' => [
          '_index' => SEARCH2_PRIV_INDEX,
          '_type' => 'supplier',
          '_id' => 'S' . $supplier['supplier_id']
      ]
  ];

  $params3['body'][] = [
      'id' => $supplier['supplier_id'],
      'name' => $supplier['supplier_name'],
      'supplier_code' => $supplier['account_code'],
      'contact_name' => $supplier['contact_name'],
      'contact_email' => $supplier['contact_email']
  ];
}

$responses = $client->bulk($params3);
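Since the bulk call above does not inspect `$responses`, a small check like the following sketch could confirm that every document was actually accepted. `collectBulkErrors` is a hypothetical helper, and the sample response is hand-written for illustration, not real output from the cluster; it assumes the usual bulk response shape with a top-level `errors` flag and an `items` list.

```php
<?php
// Collects per-item failures from a bulk response array.
// Returns a map of document id => error details (empty if nothing failed).
function collectBulkErrors(array $response)
{
    $failures = [];
    if (empty($response['errors']) || empty($response['items'])) {
        return $failures; // top-level flag reports no failures
    }
    foreach ($response['items'] as $item) {
        // Each item has one key naming the action (index, create, update, delete).
        foreach ($item as $action => $result) {
            if (isset($result['error'])) {
                $failures[$result['_id']] = $result['error'];
            }
        }
    }
    return $failures;
}

// Hand-written sample response (assumption), for illustration only.
$sample = [
    'took' => 5,
    'errors' => true,
    'items' => [
        ['index' => ['_index' => 'idx', '_type' => 'supplier', '_id' => 'S1', 'status' => 201]],
        ['index' => ['_index' => 'idx', '_type' => 'supplier', '_id' => 'S2', 'status' => 400,
            'error' => ['type' => 'mapper_parsing_exception', 'reason' => 'sample reason']]],
    ],
];
print_r(collectBulkErrors($sample));
```

In the ingestion code this would be called as `collectBulkErrors($responses)` after the bulk request, logging anything it returns.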
