I'm working with the Zscaler integration and noticed that, when logs pass through the pipeline, the uri_parts ingest processor does not populate url.domain from the URL that Zscaler sends. I'm seeing this with the test logs from the repo.
I built a pipeline that only runs the uri_parts processor, and the missing scheme appears to be the cause: adding https:// to the URL populates url.domain. Is this expected behavior? Zscaler does not send the scheme with the URL, and while modifying the pipeline probably isn't a big deal, I'm trying to keep it as close to vanilla as possible so that upgrading the integration stays simple. Sample outputs below:
## Sample without scheme
```
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "description": "test uri_parts",
    "processors": [
      {
        "uri_parts": {
          "field": "message"
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "message": "www.example.com/testpath/index.php?user=joe"
      }
    }
  ]
}
```
Response:
```
{
  "docs" : [
    {
      "doc" : {
        "_index" : "_index",
        "_id" : "_id",
        "_source" : {
          "message" : "www.example.com/testpath/index.php?user=joe",
          "url" : {
            "path" : "www.example.com/testpath/index.php",
            "extension" : "php",
            "original" : "www.example.com/testpath/index.php?user=joe",
            "scheme" : null,
            "domain" : null,
            "query" : "user=joe"
          }
        },
        "_ingest" : {
          "timestamp" : "2022-03-20T18:35:01.534115808Z"
        }
      }
    }
  ]
}
```
## Sample with scheme
```
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "description": "test uri_parts",
    "processors": [
      {
        "uri_parts": {
          "field": "message"
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "message": "https://www.example.com/testpath/index.php?user=joe"
      }
    }
  ]
}
```
Response:
```
{
  "docs" : [
    {
      "doc" : {
        "_index" : "_index",
        "_id" : "_id",
        "_source" : {
          "message" : "https://www.example.com/testpath/index.php?user=joe",
          "url" : {
            "path" : "/testpath/index.php",
            "extension" : "php",
            "original" : "https://www.example.com/testpath/index.php?user=joe",
            "scheme" : "https",
            "domain" : "www.example.com",
            "query" : "user=joe"
          }
        },
        "_ingest" : {
          "timestamp" : "2022-03-20T18:35:13.970828081Z"
        }
      }
    }
  ]
}
```
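
For what it's worth, the same split shows up outside Elasticsearch. Below is a minimal Python sketch using the standard library's urllib.parse.urlsplit. It is only an analogy: I'm assuming, not claiming, that uri_parts applies similar generic URI rules, where a string without a scheme or a leading // has no host component and is treated entirely as a path. The helper names split_url and ensure_scheme are hypothetical and exist only for this illustration.

```python
from urllib.parse import urlsplit


def split_url(raw: str) -> dict:
    """Split a URL string into the fields compared in the samples above."""
    parts = urlsplit(raw)
    return {
        "scheme": parts.scheme or None,
        "domain": parts.hostname,  # None when the URL has no authority part
        "path": parts.path,
        "query": parts.query or None,
    }


def ensure_scheme(raw: str, default: str = "https") -> str:
    """Hypothetical helper: prepend a default scheme when none is present.

    Naive on purpose: it only looks for "://" anywhere in the string, so a
    schemeless URL whose query contains "://" would be left unchanged.
    """
    return raw if "://" in raw else f"{default}://{raw}"


raw = "www.example.com/testpath/index.php?user=joe"

without_scheme = split_url(raw)
with_scheme = split_url(ensure_scheme(raw))

print(without_scheme)
print(with_scheme)
```

Without the scheme, the host ends up inside the path and the domain is empty, which matches the first simulate response. With the scheme prepended, the domain and path separate the same way as in the second response.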