Hi, I was able to obtain decent but not perfect results with ES v1.4.0 in my local environment, but production runs 5+ and the results there are worse.
Please help me understand how Elasticsearch maps a field that is an array.
Below is a practical example of the data I have and the results I'd like to obtain.
I have a list of items:
[
{
"extract": "text 1",
"id": "1",
"ingredients": [
"Riso a chicco corto,",
"Pasta di salame",
"Scalogno",
"Vino bianco da tavola",
"Burro"
],
"name": "Recipe That is Awesome 1"
},
{
"extract": "Text 2",
"id": "2",
"ingredients": [
"Pomodori pelati in scatola",
"Costine di maiale (backribs)",
"Carne di vitello",
"Ricotta di bufala",
"Mozzarella di bufala",
"Provola affumicata",
"Salame",
"Pecorino romano",
"Pangrattato"
],
"name": "Recipe That is Good 2"
}
]
I want to query against the properties "name" and "ingredients", excluding the ingredients I don't want.
Further down I list the search criteria, illustrated with examples on the data above.
I want to index items to search against an array of properties with boolean criteria in Elasticsearch (using legacy 1.4.0 - https://www.elastic.co/guide/en/elasticsearch/reference/1.4/query-dsl-bool-query.html )
See below an example with two items - two recipes with a "name" field and an "ingredients" field that is an array of strings.
I want to query for keywords against the ingredients array with boolean search, but I get empty results in some cases.
How does Elasticsearch map a field that is an array of strings?
Could you provide a mapping for indexing and searching, so that I can query against name and ingredients, excluding ingredients I don't want?
Below is an example of what I tried and what I'd like to achieve.
Search criteria
- Search against the "name" field with autocomplete:
queries like q="Recipe awe", q="recipe awesome" and q="awesome" should return recipe 1, and q="recipe" should return both.
- Search against the "ingredients" field:
a query like q="recipe"&ingredients="salame&pecorino" would match recipe 2 ("Pecorino romano" contains the substring "pecorino", so both "salame" and "pecorino" match ingredients of recipe 2), and q="recipe"&ingredients="salame" would match both (recipe 1 has the ingredient "Pasta di salame", which includes the substring "salame").
- I want to exclude recipes with certain ingredients:
a query like q="recipe"&ingredients="salame"&not="pecorino" would return just recipe 1, which does not have "pecorino" in its ingredients array.
- Keywords like "salame", "salami" or "Salamino" should provide the same results:
I tried to include a stemmer in my analyzer; feel free to suggest something better.
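To pin down the results I expect from criteria 1 to 3, here is a small in-memory Python sketch of the intended semantics on the two sample recipes. It does not use Elasticsearch at all: it assumes prefix matching on words of "name" and case-insensitive substring matching on each element of "ingredients". The function names (`name_matches`, `has_ingredient`, `expected_ids`) are made up for this illustration, and criterion 4 (stemming) is deliberately left out.

```python
# Sample data copied from the two recipes in the question.
RECIPES = [
    {
        "id": "1",
        "name": "Recipe That is Awesome 1",
        "ingredients": ["Riso a chicco corto,", "Pasta di salame", "Scalogno",
                        "Vino bianco da tavola", "Burro"],
    },
    {
        "id": "2",
        "name": "Recipe That is Good 2",
        "ingredients": ["Pomodori pelati in scatola", "Costine di maiale (backribs)",
                        "Carne di vitello", "Ricotta di bufala", "Mozzarella di bufala",
                        "Provola affumicata", "Salame", "Pecorino romano", "Pangrattato"],
    },
]


def name_matches(recipe, q):
    """Criterion 1: every query term is a prefix of some word in the name."""
    words = recipe["name"].lower().split()
    return all(any(w.startswith(term) for w in words) for term in q.lower().split())


def has_ingredient(recipe, keyword):
    """Criteria 2 and 3: keyword is a substring of at least one array element."""
    keyword = keyword.lower()
    return any(keyword in item.lower() for item in recipe["ingredients"])


def expected_ids(q, ingredients=(), exclude=()):
    """Ids of recipes matching q, containing all `ingredients`, none of `exclude`."""
    return [
        r["id"] for r in RECIPES
        if name_matches(r, q)
        and all(has_ingredient(r, k) for k in ingredients)
        and not any(has_ingredient(r, k) for k in exclude)
    ]


print(expected_ids("Recipe awe"))                                      # ['1']
print(expected_ids("recipe"))                                          # ['1', '2']
print(expected_ids("recipe", ingredients=["salame", "pecorino"]))      # ['2']
print(expected_ids("recipe", ingredients=["salame"]))                  # ['1', '2']
print(expected_ids("recipe", ingredients=["salame"], exclude=["pecorino"]))  # ['1']
```

These are the results I would like the Elasticsearch queries to reproduce.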
What would be a proper mapping for ES 5+?
Below is what has worked best for me in ES 1.4, although it still does not meet criteria 2 and 3.
curl -X PUT localhost:9200/my_index -d '
{
"settings": {
"analysis": {
"analyzer": {
"index_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": ["lowercase", "asciifolding", "standard", "my_stemmer"]
}
},
"filter": {
"my_stemmer" : {
"type" : "stemmer",
"name" : "italian"
}
}
}
},
"mappings": {
"recipe": {
"properties": {
"name": {
"type": "string",
"index_analyzer" : "index_analyzer",
"search_analyzer" : "index_analyzer"
},
"ingredients" : {
"type" : "string",
"index_analyzer" : "index_analyzer",
"search_analyzer" : "index_analyzer"
}
}
}
}
}
'
The mapping above still fails for the following two queries.
The following query returns an empty list instead of recipe 2:
curl -XPOST "http://localhost:9200/my_index/_search" -d'
{
"query": {
"bool": {
"must": [
{ "match": { "name": { "query" : "recipe that" , "operator" : "and"}}},
{ "match": { "ingredients": "salame" }},
{ "match": { "ingredients": "percorino" }}
]
}
}
}'
This query also returns empty results, instead of returning just recipe 1.
curl -XPOST "http://localhost:9200/my_index/_search" -d'
{
"query": {
"bool": {
"must": [
{ "match": { "name": { "query" : "recipe that" , "operator" : "and"}}}
],
"must_not" : {"term" : {"ingredients" : "pecorino" }}
}
}
}'
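One thing I am unsure about in the second query: a `term` query is not analyzed, while the indexed tokens have passed through the Italian stemmer, so the literal term "pecorino" may not equal what is stored in the index. The Python sketch below only builds a request body (it does not call Elasticsearch) that uses `match` clauses for both the required and the excluded ingredients, so they go through the field's analyzer. The helper name `build_query` is hypothetical, and I have not verified this variant against either ES version.

```python
import json


def build_query(name_query, ingredients=(), exclude=()):
    """Build a bool query body: name terms ANDed, ingredients required or excluded."""
    # All words of the name query must match (same clause as in the curl examples).
    must = [{"match": {"name": {"query": name_query, "operator": "and"}}}]
    # One analyzed match clause per required ingredient keyword.
    must += [{"match": {"ingredients": kw}} for kw in ingredients]
    # Analyzed match clauses instead of an unanalyzed term clause for exclusions.
    must_not = [{"match": {"ingredients": kw}} for kw in exclude]

    bool_query = {"must": must}
    if must_not:
        bool_query["must_not"] = must_not
    return {"query": {"bool": bool_query}}


# Criterion 3: recipes with "salame" but without "pecorino".
body = build_query("recipe that", ingredients=["salame"], exclude=["pecorino"])
print(json.dumps(body, indent=2))
```

The printed JSON can be passed as the `-d` payload of the same `_search` request used above.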
If you could post settings for mapping with comments explaining what it does, that would be much appreciated!