How to map multiple field properties, including boolean search


#1

Hi, I was able to get decent but not perfect results with ES v1.4.0 on my local environment, but production runs 5+ and the results there are worse.

Please help me understand how Elasticsearch maps a field that is an array.

Below is a practical example of the data I have and the results I'd like to obtain.

I have a list of items:

[
  {
    "extract": "text 1",
    "id": "1",
    "ingredients": [
      "Riso a chicco corto",
      "Pasta di salame",
      "Scalogno",
      "Vino bianco da tavola",
      "Burro"
    ],
    "name": "Recipe That is Awesome 1"
  },
  {
    "extract": "Text 2",
    "id": "2",
    "ingredients": [
      "Pomodori pelati in scatola",
      "Costine di maiale (backribs)",
      "Carne di vitello",
      "Ricotta di bufala",
      "Mozzarella di bufala",
      "Provola affumicata",
      "Salame",
      "Pecorino romano",
      "Pangrattato"
    ],
    "name": "Recipe That is Good 2"
  }
]

I want to query against the properties name and ingredients, excluding the ingredients I don't want.

Further below I list the search criteria, with examples based on the data above.

I want to index items to search against an array of properties with boolean criteria in Elasticsearch (using legacy 1.4.0 - https://www.elastic.co/guide/en/elasticsearch/reference/1.4/query-dsl-bool-query.html )

See below an example with two items - two recipes with a "name" field and an "ingredients" field that is an array of strings.

I want to query for keywords against the ingredients array with boolean search, but I get empty results in some cases.

How does Elasticsearch map a field that is an array of strings?

Could you provide a mapping for indexing and searching, so that I can query against name and ingredients, excluding ingredients I don't want?

Below is an example of what I tried and what I'd like to achieve.

Search criteria

  1. Search against "name" field with autocomplete:
    queries like q="Recipe awe", q="recipe awesome" and q="awesome" should return recipe 1, and q="recipe" should return both.
  2. Search against "ingredients" field:
    a query like q="recipe"&ingredients="salame&pecorino" should match only recipe 2 ("Pecorino romano" contains the substring "pecorino", so both "salame" and "pecorino" match ingredients of recipe 2), and q="recipe"&ingredients="salame" should match both (recipe 1 has the ingredient "Pasta di salame", which includes the substring "salame").
  3. I want to exclude recipes with certain ingredients:
    a query like q="recipe"&ingredients="salame"&not="pecorino" should return only recipe 1, which does not have "pecorino" in its ingredients array.
  4. Keywords like "salame", "salami" or "Salamino" should give the same results:
    I tried to include a stemmer in my analyzer; suggestions for a better approach are welcome.

What would be a proper mapping for ES 5+?

Below is what has worked best for me in ES 1.4, although it still did not meet criteria 2 and 3.

curl -X PUT localhost:9200/my_index -d '
{
  "settings": {
      "analysis": {
          "analyzer": {
              "index_analyzer": {
                  "type": "custom",
                  "tokenizer": "standard",
                  "filter": ["lowercase", "asciifolding", "standard", "my_stemmer"]
              }
          },
          "filter": {
              "my_stemmer": {
                  "type": "stemmer",
                  "name": "italian"
              }
          }
      }
  },

  "mappings": {
    "recipe": {
      "properties": {
        "name": {
          "type": "string",
          "index_analyzer" : "index_analyzer",
          "search_analyzer" : "index_analyzer"
        },
        "ingredients" : {
          "type" : "string",
          "index_analyzer" : "index_analyzer",
          "search_analyzer" : "index_analyzer"          
        }
      }
    }
  }
}
'

The mapping above still fails for the following two queries.

The following query returns an empty list, instead of recipe 2:

// The following yields empty results, instead of recipe 2
curl -XPOST "http://localhost:9200/my_index/_search" -d'
{
  "query": { 
    "bool": { 
      "must": [
        { "match": { "name":  { "query" : "recipe that" , "operator" : "and"}}},
        { "match": { "ingredients": "salame" }},
        { "match": { "ingredients": "percorino" }}
        ]
    }
  }
}'

This query also returns empty results, instead of returning just recipe 1.

curl -XPOST "http://localhost:9200/my_index/_search" -d'
{
  "query": { 
    "bool": { 
      "must": [
        { "match": { "name":  { "query" : "recipe that" , "operator" : "and"}}}
      ],
      "must_not" : {"term" : {"ingredients" : "pecorino" }}
    }
  }
}'

If you could post settings for mapping with comments explaining what it does, that would be much appreciated!


(Ivan Brusic) #2

At a quick glance, your first example looks like it should work. For the
second example, you are using a term query against an analyzed field. The
query term of the term query (confusing? :)) is not analyzed, so you are
comparing the raw string 'pecorino' with the lowercased/stemmed token that
was indexed for 'Pecorino'. I am not sure how that word is stemmed (I am
fluent in Italian, but I have never done any stemming in it), but I would
assume it would not be 'pecorino'. Try using a match query.

The first example still had me confused, so I tried running the example.
Your example has a typo in 'percorino'. Hopefully that is your issue.
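
As a rough sketch of both suggestions (not run against a real cluster), the two request bodies could be built like this in Python. `recipe_query` is a hypothetical helper name of mine, not part of any Elasticsearch client; the field names are the ones from your mapping.

```python
import json


def recipe_query(name_text, include=(), exclude=()):
    """Build a bool query body for the recipe index.

    Hypothetical helper: every word of name_text must match "name", every
    keyword in include must match "ingredients", and every keyword in
    exclude must not match "ingredients".
    """
    must = [{"match": {"name": {"query": name_text, "operator": "and"}}}]
    must += [{"match": {"ingredients": word}} for word in include]
    # match instead of term, so the excluded keyword goes through the same
    # analyzer (lowercase, stemmer, ...) as the indexed ingredients.
    must_not = [{"match": {"ingredients": word}} for word in exclude]

    bool_query = {"must": must}
    if must_not:
        bool_query["must_not"] = must_not
    return {"query": {"bool": bool_query}}


# First example, with the typo fixed ("pecorino", not "percorino").
print(json.dumps(recipe_query("recipe that", include=["salame", "pecorino"]), indent=2))

# Second example, with must_not using match instead of term.
print(json.dumps(recipe_query("recipe that", exclude=["pecorino"]), indent=2))
```

The printed JSON can be pasted as the -d body of the same curl calls you already have.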


#3

Hi Ivan,

thank you for your reply.
The example I posted was a mapping for ES 1.4. I was not able to get the
two results I mentioned, regardless of the typo.

Now, I need to make a mapping for ES 5+, as I have it in production.

Possibly a better mapping, one that covers all the criteria and matches the
examples.

Could you help provide a mapping for ES 5, and briefly explain how an array
field is mapped?

My goal is to be able to query from a text field, without filtered
navigation.

About:
query term of the term query (confusing? :)) is not analyzed

yeah, a bit confusing :)

Could you provide an example mapping, if you know this well enough that it
would not take too much of your time?


(Ivan Brusic) #4

The only changes you need for the mapping are to update "string" to "text",
and update "index_analyzer" to just "analyzer". Since your index and search
analyzer are the same, you do not need to specify both.

Not tested, but it should be something like this:

{
  "mappings": {
    "recipe": {
      "properties": {
        "name": {
          "type": "text",
          "analyzer": "index_analyzer"
        },
        "ingredients": {
          "type": "text",
          "analyzer": "index_analyzer"
        }
      }
    }
  }
}
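
If you have more fields to convert, the same renames can be applied mechanically. Below is a minimal Python sketch, also untested against a cluster; `migrate_field` is a hypothetical helper name, and it only covers analyzed string fields like yours (a not_analyzed string would become "keyword" in ES 5, which this sketch does not handle).

```python
import copy


def migrate_field(field):
    """Convert one ES 1.x analyzed string field definition to the ES 5 form.

    Hypothetical helper covering only the renames described above:
    "string" -> "text" and "index_analyzer" -> "analyzer".
    """
    field = copy.deepcopy(field)  # leave the caller's dict untouched
    if field.get("type") == "string":
        field["type"] = "text"
    if "index_analyzer" in field:
        field["analyzer"] = field.pop("index_analyzer")
    # When no search_analyzer is given, the index-time analyzer is used at
    # search time as well, so an identical search_analyzer is redundant.
    if field.get("search_analyzer") == field.get("analyzer"):
        field.pop("search_analyzer", None)
    return field


old_field = {
    "type": "string",
    "index_analyzer": "index_analyzer",
    "search_analyzer": "index_analyzer",
}
print(migrate_field(old_field))
```

The "settings" block with the analyzer and stemmer definitions stays as it is in your original request; only the field definitions change.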

There is nothing special about arrays in Elasticsearch. At the Lucene
level, you will just have the same field with different values. Lucene is
schemaless, so it can have the same field repeated many times or not at all.
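
To make that concrete, here is a toy Python model of a multi-valued field. It is not how Lucene is implemented: `analyze` is a stand-in that only lowercases and splits on non-letters (no stemming, so "salami" would not match "salame" here), and all function names are made up for the illustration. The point is only that the tokens of every array element end up under the same field, so must / must_not work on the pooled tokens.

```python
import re

RECIPES = {
    "1": ["Riso a chicco corto", "Pasta di salame", "Scalogno",
          "Vino bianco da tavola", "Burro"],
    "2": ["Pomodori pelati in scatola", "Costine di maiale (backribs)",
          "Carne di vitello", "Ricotta di bufala", "Mozzarella di bufala",
          "Provola affumicata", "Salame", "Pecorino romano", "Pangrattato"],
}


def analyze(text):
    """Toy analyzer: lowercase, then split on anything that is not a-z."""
    return [token for token in re.split(r"[^a-z]+", text.lower()) if token]


def field_tokens(values):
    """Pool the tokens of every value, as if the field were repeated."""
    tokens = set()
    for value in values:
        tokens.update(analyze(value))
    return tokens


def search(include=(), exclude=()):
    """Return ids of recipes containing all include words and no exclude words."""
    wanted = field_tokens(include)
    unwanted = field_tokens(exclude)
    hits = []
    for recipe_id, ingredients in RECIPES.items():
        tokens = field_tokens(ingredients)
        if wanted <= tokens and not (unwanted & tokens):
            hits.append(recipe_id)
    return hits


print(search(include=["salame", "pecorino"]))             # ['2']
print(search(include=["salame"]))                         # ['1', '2']
print(search(include=["salame"], exclude=["pecorino"]))   # ['1']
```

The query words go through the same `analyze` step as the ingredients, which is the part a term query skips.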

