Hi Jorge,
Your immediate problem is how the field is tokenized. You can use the Analyze API to inspect what's going on.
First we create an index:
PUT /customers
{
  "mappings": {
    "customers_list": {
      "properties": {
        "categories": {
          "type": "string"
        }
      }
    }
  }
}
Then we can use the Analyze API to see the tokenization in action:
GET /customers/_analyze
{
  "field": "customers_list.categories",
  "text": "[{\"first_level\":585,\"second_level\":[1559,2445]},{\"first_level\":987,\"second_level\":[20384]}]"
}
The response (shortened here to highlight the problem):
{
  "tokens": [
    {
      "token": "first_level",
      "start_offset": 3,
      "end_offset": 14,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "585",
      "start_offset": 16,
      "end_offset": 19,
      "type": "<NUM>",
      "position": 1
    },
    {
      "token": "second_level",
      "start_offset": 21,
      "end_offset": 33,
      "type": "<ALPHANUM>",
      "position": 2
    },
    {
      "token": "1559,2445",
      "start_offset": 36,
      "end_offset": 45,
      "type": "<NUM>",
      "position": 3
    },
    ...
    {
      "token": "20384",
      "start_offset": 83,
      "end_offset": 88,
      "type": "<NUM>",
      "position": 7
    }
  ]
}
So the analyzer of this field emits the contents of second_level as the single token "1559,2445", not as two separate tokens as you seem to expect. You could change the analyzer of this field, but a better solution is to change your data model.
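If you do want to experiment with a different analyzer first, the Analyze API also accepts an analyzer name directly. As a sketch, assuming the built-in pattern analyzer with its default settings (which split on non-word characters), sending the following body to GET /customers/_analyze should produce 1559 and 2445 as separate tokens:

```json
{
  "analyzer": "pattern",
  "text": "[{\"first_level\":585,\"second_level\":[1559,2445]}]"
}
```

Even then the numbers are indexed as text, so a range query on them compares strings rather than numeric values. That is one more reason to prefer a different data model.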
If you really just have two levels of categories, I'd store them as two separate fields. I also don't understand why you have two numbers on the second level. Either way, I'd suggest changing your mapping to something along these lines:
PUT /customers
{
  "mappings": {
    "customers_list": {
      "properties": {
        "categories": {
          "type": "nested",
          "properties": {
            "first_level": {
              "type": "integer"
            },
            "second_level": {
              "type": "integer"
            }
          }
        }
      }
    }
  }
}
Since I don't know what the second number in the second_level field is for, I simplified the example, but this should get you started. I've also assumed that your keys are integers; you can use long instead if you need to.
If you really need the JSON structure in Elasticsearch, you can still add it as a not_analyzed field.
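A minimal sketch of that idea, assuming the same string-based mapping syntax as above: the body below would take the place of the mapping body sent to PUT /customers. The field name categories_raw is hypothetical; it keeps the original JSON string as one unanalyzed term next to the nested categories field.

```json
{
  "mappings": {
    "customers_list": {
      "properties": {
        "categories": {
          "type": "nested",
          "properties": {
            "first_level": { "type": "integer" },
            "second_level": { "type": "integer" }
          }
        },
        "categories_raw": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}
```

Because the whole string is indexed as a single term, such a field is only useful for exact matches or for reading the original value back, not for searching individual category IDs.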
To complete the mapping example, let's insert two customers:
POST /customers/customers_list/1
{
  "categories": [
    {
      "first_level": 585,
      "second_level": 2445
    },
    {
      "first_level": 987,
      "second_level": 20384
    }
  ]
}
POST /customers/customers_list/2
{
  "categories": [
    {
      "first_level": 600,
      "second_level": 3500
    },
    {
      "first_level": 987,
      "second_level": 20384
    }
  ]
}
Now search for the customer with a range query similar to your original one (for second_level = 2445), which should return only the first customer:
GET /customers/customers_list/_search
{
  "query": {
    "nested": {
      "path": "categories",
      "query": {
        "range": {
          "categories.second_level": {
            "from": 2445,
            "to": 2445
          }
        }
      }
    }
  }
}
And indeed, only the first customer is returned:
{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "customers",
        "_type": "customers_list",
        "_id": "1",
        "_score": 1,
        "_source": {
          "categories": [
            {
              "first_level": 585,
              "second_level": 2445
            },
            {
              "first_level": 987,
              "second_level": 20384
            }
          ]
        }
      }
    ]
  }
}
If you need to search on multiple levels you can combine the individual queries with a bool query.
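As a sketch of such a combination, using the example data above, the following body could be sent to GET /customers/customers_list/_search. It uses term queries for exact values and places the bool query inside the nested query, so both conditions have to hold for the same category entry:

```json
{
  "query": {
    "nested": {
      "path": "categories",
      "query": {
        "bool": {
          "must": [
            { "term": { "categories.first_level": 585 } },
            { "term": { "categories.second_level": 2445 } }
          ]
        }
      }
    }
  }
}
```

With the two example customers this should match only the first one. If you instead wrap two separate nested queries in an outer bool query, the conditions may be satisfied by different category entries of the same customer, so choose the form that matches what you mean.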
The data modelling chapter of the Elasticsearch Definitive Guide should also help.
Daniel