I have an Elasticsearch instance with about 800k documents in it. Those documents hold all sorts of information, but most notably every type has a 'name' field, which is what I mostly search on.
With the setup (and search) pasted below, searching works reasonably well for most queries, provided those queries don't involve plurals. Using 'stockings' (as in stocking filler) as an example, my situation at this point is:
- Search "stocking": I receive matches where the source (name) contains "stocking" but not "stockings".
- Search "stockings": I receive matches where the source (name) contains "stocking" but not "stockings".
- Search "stocking*": I receive matches where the source (name) contains "stocking" and "stockings".
Finally, searching for a short string containing a dash (BT-103) returns no results at all, even when there is an exact match for BT-103 in another field (for instance, SKU).
My best guess is that:
- My search is being tokenised.
- My documents are not being tokenised at the time of indexing. See the ingestion code at the end of this post. Each time I run a test I delete the index and re-ingest everything (bad for production, but it means I am absolutely certain re-indexing occurs whenever I make changes).
I have tested the analyzer on its own ('raw') with sample queries, and it returns the expected results.
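For reference, one way to run that kind of check is the `_analyze` API. This is only a minimal sketch, assuming the official elasticsearch-php client is available as `$client` and that the analyzer lives on the index created below. `buildAnalyzeParams` is a hypothetical helper introduced purely for illustration, and the fallback index name is a placeholder.

```php
<?php
// Placeholder so the sketch runs on its own; the real constant is defined elsewhere.
if (!defined('SEARCH2_NORMAL_INDEX')) {
    define('SEARCH2_NORMAL_INDEX', 'search2_normal'); // assumed value
}

// Hypothetical helper: builds the parameters for an _analyze request
// that runs $text through the named analyzer of the given index.
function buildAnalyzeParams($index, $analyzer, $text)
{
    return [
        'index'    => $index,
        'analyzer' => $analyzer,
        'text'     => $text,
    ];
}

$singular = buildAnalyzeParams(SEARCH2_NORMAL_INDEX, 'english', 'stocking');
$plural   = buildAnalyzeParams(SEARCH2_NORMAL_INDEX, 'english', 'stockings');

// With a live cluster and an Elasticsearch\Client in $client (assumption):
// print_r($client->indices()->analyze($singular));
// print_r($client->indices()->analyze($plural));
```

If stemming applies as intended, both requests should report the same token. Passing a value such as BT-103 as the text shows how the standard tokenizer handles the dash.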
I would be most grateful for any guidance. I'm running a 2.x version of Elasticsearch at the moment.
Index creation (settings and mapping):

    // Create the index with a custom "english" analyzer and apply it to `name`
    $params = [
        'index' => SEARCH2_NORMAL_INDEX,
        'body' => [
            'settings' => [
                "analysis" => [
                    "filter" => [
                        // Drop common English stop words
                        "english_stop" => [
                            "type" => "stop",
                            "stopwords" => "_english_"
                        ],
                        // Light stemming, intended to fold plurals into singulars
                        "light_english_stemmer" => [
                            "type" => "stemmer",
                            "language" => "light_english"
                        ]
                    ],
                    "analyzer" => [
                        "english" => [
                            "tokenizer" => "standard",
                            "filter" => [
                                "lowercase",
                                "english_stop",
                                "light_english_stemmer",
                                "asciifolding"
                            ]
                        ]
                    ]
                ]
            ],
            'mappings' => [
                // Mapping for the 'component' type only
                'component' => [
                    'properties' => [
                        'name' => [
                            'type' => 'string',
                            'analyzer' => 'english'
                        ]
                    ]
                ]
            ]
        ]
    ];
Searching:

    $params = [
        'index' => $indexes,
        //'index' => SEARCH2_NORMAL_INDEX,
        'body' => [
            'from' => 0,
            'size' => 200,
            'query' => [
                'query_string' => [
                    'query' => $final_search,
                    'analyzer' => 'english',
                    'default_operator' => 'AND'
                ]
            ],
            'indices_boost' => [
                SEARCH2_ORDERS_INDEX => 0.5
            ]
        ]
    ];
And ingesting:

    foreach ($suppliers as $supplier) {
        // Bulk action line: which index, type and id the document goes to
        $params3['body'][] = [
            'index' => [
                '_index' => SEARCH2_PRIV_INDEX,
                '_type' => 'supplier',
                '_id' => 'S' . $supplier['supplier_id']
            ]
        ];
        // Document source for the action above
        $params3['body'][] = [
            'id' => $supplier['supplier_id'],
            'name' => $supplier['supplier_name'],
            'supplier_code' => $supplier['account_code'],
            'contact_name' => $supplier['contact_name'],
            'contact_email' => $supplier['contact_email']
        ];
    }
    $responses = $client->bulk($params3);