Elastic APM GeoIP Pipeline on cloud service

Kibana version: 7
Elasticsearch version: 7
APM Server version: 7
APM Agent language and version: RUM/js
Browser version:
Original install method (e.g. download page, yum, deb, from source, etc.) and version: cloud
Fresh install or upgraded from other version? Deployment created by Elastic Cloud

Hi, I am using the Elastic Cloud deployment of the Elastic Stack, and I have two pipelines for APM data:

  • user_agent
  • apm_user_geoip

The first pipeline I added was user_agent, and it seems to be working. I am not sure about the second one.

  1. The pipeline:

"apm_user_geoip" : {
  "description" : "Resolve GeoIP information for APM events",
  "processors" : [
    {
      "geoip" : {
        "field" : "client.ip",
        "target_field" : "qageo",
        "ignore_missing" : true
      }
    }
  ]
}

  2. The test:

GET /_ingest/pipeline/apm_user_geoip/_simulate
{
  "docs": [
    {
      "_source": {
        "client": {
          "ip": "108.2.12.80"
        }
      }
    }
  ]
}

  3. The test response:

{
  "docs" : [
    {
      "doc" : {
        "_index" : "_index",
        "_type" : "_doc",
        "_id" : "_id",
        "_source" : {
          "client" : {
            "ip" : "108.2.12.80"
          },
          "qageo" : {
            "continent_name" : "North America",
            "region_iso_code" : "US-PA",
            "city_name" : "Philadelphia",
            "region_name" : "Pennsylvania",
            "location" : {
              "lon" : -75.1968,
              "lat" : 39.9597
            },
            "country_iso_code" : "US"
          }
        },
        "_ingest" : {
          "timestamp" : "2019-05-06T19:31:19.219Z"
        }
      }
    }
  ]
}

  4. The pipelines in the APM Server user settings:

apm-server.register.ingest.pipeline.enabled: true
output.elasticsearch.pipelines:
- pipeline: "apm_user_agent"
- pipeline: "apm_user_geoip"

The problem:
I can see the client.ip field, but I cannot see the "qageo" target field being created for it.

I am sure I have missed something.

Any ideas?
Thanks,
Ariel.

Hey @ariel_k, thanks for the detailed description of the issue. Your setup looks good to me, so we'll need more information to figure this out. How are you checking for the qageo field? Would you post whatever you're able to share from this query:

GET apm-*/_search
{
  "query": {
    "exists": {
      "field": "client.ip"
    }
  }
}

qageo won't be indexed by default, so I'd like to rule out the chance that the data is present but just not queryable yet. If that's the case, you'll want to update your index template and recreate the index.
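
One way to check this without wading through a full response is to ask for only the relevant parts of `_source`. The request below is a sketch, assuming the field names from this thread (`client.ip`, `qageo`) and the `apm-*` index pattern; `size` and `_source` are standard search options. Since `_source` holds the document as it was stored after any ingest pipeline ran, `qageo` should show up there if the geoip processor was applied, even when the field is not mapped.

```
GET apm-*/_search
{
  "size": 1,
  "_source": ["client.ip", "qageo"],
  "query": {
    "exists": {
      "field": "client.ip"
    }
  }
}
```

If the returned hit contains `client.ip` but no `qageo` object, the geoip pipeline most likely never ran for that document.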

Hey @gil, the search returns results and I can see the client.ip field in the documents.
This is part of the response to the search you suggested. The complete response is huge, so let me know if you need more of it.
{
  "took" : 8,
  "timed_out" : false,
  "_shards" : {
    "total" : 19,
    "successful" : 19,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1127,
      "relation" : "eq"
    },

Regarding the index, I was able to refresh it,
but I am not sure how to re-create it.
Thanks,
Ariel

Hey @gil, is there any information you need that I have not added?
Thanks,
Ariel

I initially overlooked your configuration. The problem is here:

output.elasticsearch.pipelines:
- pipeline: "apm_user_agent"
- pipeline: "apm_user_geoip"

Only the first matching pipeline is used for the indexing request. Since none of your entries has a condition, the first one always matches, so only apm_user_agent is ever used. I'd suggest combining the pipelines into a single one with multiple processors, like:

[
  {
    "id": "apm_user_info",
    "body": {
      "description": "Add user agent information for APM events",
      "processors": [
        {
          "user_agent": {
            "field": "user_agent.original",
            "target_field": "user_agent",
            "ignore_missing": true
          }
        },
        {
          "geoip": {
            "field": "client.ip",
            "target_field": "qageo",
            "ignore_missing": true
          }
        }
      ]
    }
  }
]
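
The snippet above uses the `id`/`body` layout of a pipeline definition file. To create the same pipeline directly in Elasticsearch, for example from Kibana Dev Tools, the contents of `body` can be sent to the put-pipeline API. This is a sketch that assumes the pipeline name `apm_user_info` from above and that you are allowed to manage ingest pipelines on the deployment; the description text is only illustrative.

```
PUT _ingest/pipeline/apm_user_info
{
  "description": "Add user agent and GeoIP information for APM events",
  "processors": [
    {
      "user_agent": {
        "field": "user_agent.original",
        "target_field": "user_agent",
        "ignore_missing": true
      }
    },
    {
      "geoip": {
        "field": "client.ip",
        "target_field": "qageo",
        "ignore_missing": true
      }
    }
  ]
}
```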

And then using it:

GET /_ingest/pipeline/apm_user_info/_simulate
{
  "docs": [
    {
      "_source": {
        "user_agent": {
          "original": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36"
        },
        "client": {
          "ip": "108.2.12.80"
        }
      }
    }
  ]
}

yields the expected result:

{
  "docs" : [
    {
      "doc" : {
        "_index" : "_index",
        "_type" : "_doc",
        "_id" : "_id",
        "_source" : {
          "client" : {
            "ip" : "108.2.12.80"
          },
          "user_agent" : {
            "name" : "Chrome",
            "original" : "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36",
            "os" : {
              "name" : "Mac OS X",
              "version" : "10.13.6",
              "full" : "Mac OS X 10.13.6"
            },
            "device" : {
              "name" : "Other"
            },
            "version" : "73.0.3683"
          },
          "qageo" : {
            "continent_name" : "North America",
            "region_iso_code" : "US-PA",
            "city_name" : "Philadelphia",
            "region_name" : "Pennsylvania",
            "location" : {
              "lon" : -75.1968,
              "lat" : 39.9597
            },
            "country_iso_code" : "US"
          }
        },
        "_ingest" : {
          "timestamp" : "2019-05-07T23:20:16.598455Z"
        }
      }
    }
  ]
}

We plan to add something similar as a default in the near future. You can follow https://github.com/elastic/apm-server/issues/1283 for updates on that effort.
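
For completeness, the APM Server user settings then need to reference the combined pipeline instead of the two separate ones. A minimal sketch, assuming the pipeline is named `apm_user_info` as above and already exists in Elasticsearch:

```yaml
# Single unconditional rule: every event goes through apm_user_info,
# which runs both the user_agent and the geoip processor.
output.elasticsearch.pipelines:
- pipeline: "apm_user_info"
```

With only one rule in the list, the first match is always `apm_user_info`. The singular `output.elasticsearch.pipeline: "apm_user_info"` setting should be equivalent here, though that has not been checked against this deployment.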

Hey @gil, thank you, it's working :)
