Okay, here's an example. It may not be perfect, but I think it shows some of the possibilities you have.
I created a test index with two analyzers, one for indexing (using the path_hierarchy
tokenizer) and one for the query, and a mapping for a doc containing just this ip field:
PUT /ip4test
{
"settings": {
"analysis": {
"tokenizer": {
"ip_4_tokenizer": {
"type": "path_hierarchy",
"delimiter": "."
}
},
"filter": {
"remove_trailing_dot": {
"type": "pattern_replace",
"pattern": "\\.$",
"replacement": ""
}
},
"analyzer": {
"my_analyzer": {
"type": "custom",
"tokenizer": "ip_4_tokenizer"
},
"dedot_keyword": {
"type": "custom",
"tokenizer": "keyword",
"filter": [
"remove_trailing_dot"
]
}
}
}
},
"mappings": {
"my_type": {
"properties": {
"ip": {
"type": "string",
"analyzer": "my_analyzer",
"search_analyzer": "dedot_keyword"
}
}
}
}
}
If you now use the _analyze endpoint, you can see how the IP address gets broken up at index time:
curl -XGET 'localhost:9200/ip4test/_analyze?pretty&analyzer=my_analyzer' -d 11.22.33.44
{
"tokens" : [ {
"token" : "11",
"start_offset" : 0,
"end_offset" : 2,
"type" : "word",
"position" : 0
}, {
"token" : "11.22",
"start_offset" : 0,
"end_offset" : 5,
"type" : "word",
"position" : 0
}, {
"token" : "11.22.33",
"start_offset" : 0,
"end_offset" : 8,
"type" : "word",
"position" : 0
}, {
"token" : "11.22.33.44",
"start_offset" : 0,
"end_offset" : 11,
"type" : "word",
"position" : 0
} ]
}
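To make the index-time behaviour a bit more concrete, here is a rough Python sketch of what a path_hierarchy tokenizer with "." as delimiter emits. It is only an approximation for illustration: the function name is made up, and edge cases such as trailing delimiters are ignored, so don't read it as Elasticsearch's actual implementation.

```python
def path_hierarchy_tokens(value, delimiter="."):
    """Roughly emulate path_hierarchy: emit every leading prefix of the value."""
    parts = value.split(delimiter)
    return [delimiter.join(parts[:i]) for i in range(1, len(parts) + 1)]


print(path_hierarchy_tokens("11.22.33.44"))
# ['11', '11.22', '11.22.33', '11.22.33.44']
```

Every one of these prefixes ends up as a separate term in the index, which is what makes the partial lookups further down possible.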
So now enter some docs:
PUT /ip4test/my_type/1
{
"ip" : "11.4.76.03"
}
PUT /ip4test/my_type/2
{
"ip" : "11.4.71.04"
}
PUT /ip4test/my_type/3
{
"ip" : "11.41.71.04"
}
And do some querying:
GET /ip4test/my_type/_search
{
"query": {
"match": {
"ip": "11.4"
}
},
"highlight": {
"fields": {
"ip": {}
}
}
}
"hits": [
{
"_index": "ip4test",
"_type": "my_type",
"_id": "2",
"_score": 0.30685282,
"_source": {
"ip": "11.4.71.04"
},
"highlight": {
"ip": [
"<em>11.4</em>.71.04"
]
}
},
{
"_index": "ip4test",
"_type": "my_type",
"_id": "1",
"_score": 0.30685282,
"_source": {
"ip": "11.4.76.03"
},
"highlight": {
"ip": [
"<em>11.4</em>.76.03"
]
}
}
]
Note that this one didn't match 11.41.71.04. I'm not sure if that was the intention or not; if not, you have to use prefix queries.
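The reason is that match compares whole terms: the query 11.4 stays the single term 11.4, while doc 3 was indexed as 11, 11.41, 11.41.71 and 11.41.71.04. A small Python sketch of the difference between that exact-term lookup and a prefix-style lookup (the helper names are hypothetical, and this only mimics the idea, not how Elasticsearch executes queries):

```python
def index_terms(ip):
    # Same leading prefixes the path_hierarchy analyzer stores for an address.
    parts = ip.split(".")
    return {".".join(parts[:i]) for i in range(1, len(parts) + 1)}


docs = {"1": "11.4.76.03", "2": "11.4.71.04", "3": "11.41.71.04"}


def term_hits(query):
    # Exact term lookup, which is effectively what the match query above does.
    return sorted(i for i, ip in docs.items() if query in index_terms(ip))


def prefix_hits(query):
    # Prefix-style lookup: any indexed term may merely start with the query.
    return sorted(i for i, ip in docs.items()
                  if any(t.startswith(query) for t in index_terms(ip)))


print(term_hits("11.4"))    # ['1', '2']
print(prefix_hits("11.4"))  # ['1', '2', '3']
```

In Elasticsearch terms, the second behaviour is roughly what a prefix query on the ip field would give you.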
Without removing the dot at the end of the query term, the next example would return no results, but thanks to the pattern_replace filter it does:
GET /ip4test/my_type/_search
{
"query": {
"match": {
"ip": "11.4."
}
},
"highlight": {
"fields": {
"ip": {}
}
}
}
"hits": [
{
"_index": "ip4test",
"_type": "my_type",
"_id": "2",
"_score": 0.30685282,
"_source": {
"ip": "11.4.71.04"
},
"highlight": {
"ip": [
"<em>11.4</em>.71.04"
]
}
},
{
"_index": "ip4test",
"_type": "my_type",
"_id": "1",
"_score": 0.30685282,
"_source": {
"ip": "11.4.76.03"
},
"highlight": {
"ip": [
"<em>11.4</em>.76.03"
]
}
}
]
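What the dedot_keyword search analyzer does to the query string can be sketched in Python like this: the keyword tokenizer keeps the whole input as one term, and the pattern_replace filter strips a single trailing dot. The function name is made up for illustration, and Python's re module stands in for the Java regex engine Elasticsearch uses.

```python
import re


def dedot_keyword(query):
    # keyword tokenizer: the whole query stays one term;
    # pattern_replace with "\\.$": drop one trailing dot, if present.
    return re.sub(r"\.$", "", query)


print(dedot_keyword("11.4."))  # 11.4
print(dedot_keyword("11.4"))   # 11.4
```

Both inputs end up as the term 11.4, which is why the two queries return the same hits.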
Depending on your exact use case you might have to modify this a bit. Hope that helps.