Just wanted to provide some additional details here. I'm currently trying to import log files from Cloudflare into the Elastic Stack. I'm attempting to follow the instructions here:
In addition to the above tests, I have also performed a more accurate test:
Create pipeline (pulled from the Cloudflare file):
PUT /_ingest/pipeline/jmggeoip
{
"description": "My Log Pipeline",
"processors": [
{
"geoip": {
"field": "ClientIP",
"target_field": "source.geo",
"properties": [
"ip",
"country_name",
"continent_name",
"region_iso_code",
"region_name",
"city_name",
"timezone",
"location"
]
}
}
]
}
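As a quick sanity check (this wasn't part of my original steps, just a verification), the pipeline can be exercised without indexing anything by using the simulate API:

POST /_ingest/pipeline/jmggeoip/_simulate
{
  "docs": [
    { "_source": { "ClientIP": "8.8.8.8" } }
  ]
}

If the geoip processor is working, the simulated document in the response should contain a "source.geo" object with the requested properties.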
Create index template mapping (pulled from the Cloudflare file):
PUT /_template/jmgtemplate
{
"index_patterns": [
"jmgindex-*"
],
"mappings": {
"properties": {
"source.geo": {
"properties": {
"ip": {
"type": "ip"
},
"postal_code": {
"type": "keyword"
},
"location": {
"type": "geo_point"
},
"dma_code": {
"type": "long"
},
"country_code3": {
"type": "keyword"
},
"latitude": {
"type": "float"
},
"longitude": {
"type": "float"
},
"region_name": {
"type": "keyword"
},
"city_name": {
"type": "keyword"
},
"timezone": {
"type": "keyword"
},
"country_code2": {
"type": "keyword"
},
"continent_code": {
"type": "keyword"
},
"country_name": {
"type": "keyword"
},
"region_code": {
"type": "keyword"
},
"continent_name": {
"type": "keyword"
},
"region_iso_code": {
"type": "keyword"
}
}
}
}
},
"settings": {
"index": {
"number_of_shards": "1",
"number_of_replicas": "1",
"mapping.ignore_malformed": true
}
}
}
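To confirm the template was stored exactly as submitted, it can be fetched back (again, just a verification step):

GET /_template/jmgtemplate

The response should echo the index pattern, the source.geo mappings, and the index settings above.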
Index a test document (index name matching the pattern above, using the pipeline created above):
PUT /jmgindex-test/_doc/my_id?pipeline=jmggeoip
{"ClientIP":"8.8.8.8"}
Fetch the index:
GET /jmgindex-test/_doc/my_id
This call returns the following information:
{
"_index" : "jmgindex-test",
"_type" : "_doc",
"_id" : "my_id",
"_version" : 1,
"_seq_no" : 0,
"_primary_term" : 1,
"found" : true,
"_source" : {
"source" : {
"geo" : {
"continent_name" : "North America",
"timezone" : "America/Chicago",
"ip" : "8.8.8.8",
"country_name" : "United States",
"location" : {
"lon" : -97.822,
"lat" : 37.751
}
}
},
"ClientIP" : "8.8.8.8"
}
}
So, as you can see, we are still getting latitude and longitude back. Now, let’s look at the field mapping:
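For reference, the mapping check can be done with the field mapping API (index name here is the one from my test above):

GET /jmgindex-test/_mapping/field/source.geo.location

This should report "type" : "geo_point" for the location field.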
Now, we are properly mapping to “geo_point”. However, while this example seems to be working, the ingest process I set up for Cloudflare is not working. So, there must be something missing in the setup process. I'm leaning towards the Lambda function provided by Cloudflare being the culprit.
I just ran an even more accurate test to try to narrow this down. I created a brand new index using the existing Cloudflare pipeline and Cloudflare index template that I had already submitted to Elastic. I also pulled a single JSON record from one of our edge logs that is being dumped to S3:
PUT /cloudflare-123/_doc/my_ip?pipeline=cloudflare-pipeline-weekly
{"BotScore":76,"BotScoreSrc":"Machine Learning","CacheCacheStatus":"hit","CacheResponseBytes":257510,"CacheResponseStatus":200,"CacheTieredFill":false,"ClientASN":####,"ClientCountry":"us","ClientDeviceType":"desktop","ClientIP":"###.###.###.###","ClientIPClass":"noRecord","ClientRequestBytes":4147,"ClientRequestHost":"www.sample.com","ClientRequestMethod":"GET","ClientRequestPath":"/shop/test","ClientRequestProtocol":"HTTP/2","ClientRequestReferer":"https://www.sample.com/shop/test2","ClientRequestURI":"/shop/test","ClientRequestUserAgent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36","ClientSSLCipher":"AEAD-AES128-GCM-SHA256","ClientSSLProtocol":"TLSv1.3","ClientSrcPort":52355,"ClientXRequestedWith":"","EdgeColoCode":"BNA","EdgeColoID":115,"EdgeEndTimestamp":"2020-05-20T00:00:06Z","EdgePathingOp":"wl","EdgePathingSrc":"macro","EdgePathingStatus":"nr","EdgeRateLimitAction":"","EdgeRateLimitID":0,"EdgeRequestHost":"www.sample.com","EdgeResponseBytes":61001,"EdgeResponseCompressionRatio":4.29,"EdgeResponseContentType":"text/html","EdgeResponseStatus":200,"EdgeServerIP":"","EdgeStartTimestamp":"2020-05-20T00:00:06Z","FirewallMatchesActions":[],"FirewallMatchesRuleIDs":[],"FirewallMatchesSources":[],"OriginIP":"","OriginResponseBytes":0,"OriginResponseHTTPExpires":"","OriginResponseHTTPLastModified":"","OriginResponseStatus":0,"OriginResponseTime":0,"OriginSSLProtocol":"unknown","ParentRayID":"00","RayID":"####","SecurityLevel":"med","WAFAction":"unknown","WAFFlags":"0","WAFMatchedVar":"","WAFProfile":"unknown","WAFRuleID":"","WAFRuleMessage":"","ZoneID":####}
This created a new index called “cloudflare-2020-05-18” (not “cloudflare-123” as targeted in the PUT, presumably because the weekly pipeline rewrites the destination index). When I queried that index, it returned a valid result with geo_point information:
GET /cloudflare-2020-05-18/_doc/my_ip
…
"found" : true,
"_source" : {
"BotScoreSrc" : "Machine Learning",
"source" : {
"geo" : {
"continent_name" : "North America",
"region_iso_code" : "US-TN",
"city_name" : "Murfreesboro",
"country_iso_code" : "us",
"timezone" : "America/Chicago",
"ip" : "###.###.###.###",
"country_name" : "United States",
"region_name" : "Tennessee",
"location" : {
"lon" : -86.3881,
"lat" : 35.8437
}
},
"as" : {
"number" : ####
},
…
This is why I’m hitting a wall. Everything “seems” to be set up properly on the Elastic side, and I think the above proves the geo_point mapping and geoip functionality are working fine. I believe this also verifies that the Cloudflare index template and pipeline are working. The only thing I noticed about the Lambda function is that it uses a deprecated bulk request format, so perhaps that is impacting this? Here is that warning:
WARNING ... [types removal] Specifying types in bulk requests is deprecated.
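For context, this warning means the Lambda is sending bulk action lines that still include a "_type" (the pre-7.x style). A sketch of the two forms, with an illustrative index name (not the actual Cloudflare Lambda payload):

Deprecated form (includes "_type"):
POST /_bulk
{ "index" : { "_index" : "jmgindex-test", "_type" : "_doc", "_id" : "1" } }
{ "ClientIP" : "8.8.8.8" }

Current form (no "_type"):
POST /_bulk
{ "index" : { "_index" : "jmgindex-test", "_id" : "1" } }
{ "ClientIP" : "8.8.8.8" }

As far as I know, in 7.x the deprecated form still works and only logs this warning, so it shouldn't by itself break the geoip enrichment, but it does indicate the Lambda code targets an older Elasticsearch version.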
Any help tracking down this issue would be greatly appreciated!