path_hierarchy in my settings / mapping Not Working


#1

Hello,

I'm trying to add taxonomies to my index, as shown below.

However, when I run the aggregation shown below, the output splits "best price" into two separate buckets.
I want "best price" to be a single bucket. In other words, my taxonomies should be split only on /, not on whitespace or on special characters such as &.

When I run curl -XGET 'localhost:9200/_analyze?tokenizer=path_hierarchy&pretty' -d '/cars/economy/best price', it behaves the way I want: "best price" is not split on whitespace. So the tokenizer itself seems fine, and I'm not sure what is wrong with my index.
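For reference, here is a rough Python emulation of what the path_hierarchy tokenizer returns for that input: one token per path prefix, with whitespace left untouched. This is only a sketch of the observable behaviour, not Elasticsearch's implementation, and the function name `path_hierarchy_tokens` is made up for illustration.

```python
def path_hierarchy_tokens(text, delimiter="/"):
    """Roughly emulate path_hierarchy: emit one token per path prefix.

    Hypothetical helper for illustration; edge cases (empty input,
    trailing delimiters) may differ from the real tokenizer.
    """
    tokens = []
    # Start searching at index 1 so a leading delimiter is not a split point.
    position = text.find(delimiter, 1)
    while position != -1:
        tokens.append(text[:position])
        position = text.find(delimiter, position + 1)
    tokens.append(text)  # the full path is always the last token
    return tokens


print(path_hierarchy_tokens("/cars/economy/best price"))
# ['/cars', '/cars/economy', '/cars/economy/best price']
```

The buckets I expect from the aggregation are these prefixes, not individual words.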

Below is my code:

curl -XPUT 'localhost:9200/myindex' -d '{
   "settings":{
      "index":{
         "analysis":{
            "analyzer":{
               "analyzer_taxonomy":{
                  "type":"custom",
                  "tokenizer":"path_hierarchy"
               }
            }
         }
      }
   },
   "mappings":{
      "myindex":{
         "properties":{
            "catalog":{
               "properties":{
                  "price":{
                     "type":"float"
                  },
                  "name":{
                     "type":"string",
                     "index":"not_analyzed"
                  },
                  "description":{
                     "type":"string"
                  },
                  "taxonomies":{
                     "type":"string",
                     "analyzer":"analyzer_taxonomy"
                  }
               }
            }
         }
      }
   }
}'
    
    curl -XPUT 'localhost:9200/myindex/catalog/1' -d ' {
    "name":"corolla",
    "price":17999,
    "description":"this is a car that is a car that gets you places",
    "taxonomies":"/cars/economy"
    }'
    
    curl -XPUT 'localhost:9200/myindex/catalog/2' -d ' {
    "name":"ferrari",
    "price":221500,
    "description":"this is a car that is a quick car",
    "taxonomies":"/cars/ultimate"
    }'
    
    curl -XPUT 'localhost:9200/myindex/catalog/3' -d ' {
    "name":"ferrari500",
    "price":521500,
    "description":"this is a car that is fast car!!!",
    "taxonomies":"/cars/ultimate/extreme"
    }'
    
    curl -XPUT 'localhost:9200/myindex/catalog/4' -d ' {
    "name":"tercel",
    "price":2500,
    "description":"this is a car that is affordable",
    "taxonomies":"/cars/economy/best price"
    }'
    
    
    curl -XPOST 'localhost:9200/myindex/_search?pretty' -d ' {  
       "size":0,
       "aggs":{  
          "stuff":{  
             "terms":{  
                "field":"taxonomies"
             }
          }
       }
    }'
    
    -------------------------------------------------------------------------------
        OUTPUT
    -------------------------------------------------------------------------------
    {
      "took" : 1,
      "timed_out" : false,
      "_shards" : {
        "total" : 5,
        "successful" : 5,
        "failed" : 0
      },
      "hits" : {
        "total" : 4,
        "max_score" : 0.0,
        "hits" : [ ]
      },
      "aggregations" : {
        "stuff" : {
          "doc_count_error_upper_bound" : 0,
          "sum_other_doc_count" : 0,
          "buckets" : [ {
            "key" : "cars",
            "doc_count" : 4
          }, {
            "key" : "economy",
            "doc_count" : 2
          }, {
            "key" : "ultimate",
            "doc_count" : 2
          }, {
            "key" : "best",
            "doc_count" : 1
          }, {
            "key" : "extreme",
            "doc_count" : 1
          }, {
            "key" : "price",
            "doc_count" : 1
          } ]
        }
      }
    }

(Lee Hinman) #2

Hi Jack,

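It looks like you are specifying the mapping incorrectly: there is an extra level where "myindex" is specified. The type name ("catalog") should sit directly under "mappings", but your request defines a type called "myindex" with "catalog" as an object field inside it.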

I tested and this works correctly for me:

PUT /myindex
{
  "settings":{
    "index": {
      "analysis":{
        "analyzer":{
          "analyzer_taxonomy": {
            "type": "custom",
            "tokenizer": "path_hierarchy"
          }
        }
      }
    }
  },
  "mappings":{
    "catalog":{
      "properties":{
        "taxonomies":{
          "analyzer":"analyzer_taxonomy",
          "type":"string"
        }
      }
    }
  }
}

(I removed the other fields while I was trying to reproduce this)
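To make the difference concrete, the sketch below walks both mapping bodies and lists which field paths each type defines. It is plain Python over dictionaries, not an Elasticsearch API; `mapped_fields` and `describe` are hypothetical helper names, and the mappings are trimmed to the taxonomies field. With the original body the analyzer is attached to `catalog.taxonomies` on a type named `myindex`, so documents indexed under the `catalog` type presumably fall back to dynamic mapping with the default analyzer, which would be consistent with the word-level buckets in the output above.

```python
def mapped_fields(properties, prefix=""):
    """Yield (dotted_path, definition) for every leaf field under 'properties'."""
    for name, definition in properties.items():
        path = prefix + name
        if "properties" in definition:
            # Object field: recurse into its sub-fields.
            yield from mapped_fields(definition["properties"], path + ".")
        else:
            yield path, definition


def describe(mappings):
    """Map each type name to its {field_path: definition} dictionary."""
    return {
        doc_type: dict(mapped_fields(body["properties"]))
        for doc_type, body in mappings.items()
    }


taxonomies = {"type": "string", "analyzer": "analyzer_taxonomy"}

# "mappings" section from the original request (other fields omitted).
original = {"myindex": {"properties": {"catalog": {"properties": {
    "taxonomies": taxonomies}}}}}

# "mappings" section from the corrected request.
corrected = {"catalog": {"properties": {"taxonomies": taxonomies}}}

print(describe(original))   # type "myindex", field "catalog.taxonomies"
print(describe(corrected))  # type "catalog", field "taxonomies"
```

Comparing this with the output of GET /myindex/_mapping on the live index should show whether the analyzer is attached to the field you are aggregating on.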

;; Lee

