Hey,
I'm looking for a way to transform
{
  "related": {
    "user": [
      "user1@domain",
      "user2@anotherdomain"
    ]
  }
}
into
{
  "related": {
    "user": [
      "user1@domain",
      "user1",
      "user2@anotherdomain",
      "user2"
    ]
  }
}
... using ingest processors. Basically, I have an array of User Principal Names (UPNs) and want to extend the field so that it also contains the bare sAMAccountNames.
I've tried
- Just Grok: does not work on Array fields
- Just Split: does not work on Array fields
- Foreach + Grok: the target expression %{DATA:_ingest.temp_user_name}@%{GREEDYDATA:_ingest.temp_user_domain} always overwrites _ingest.temp_user_name instead of appending to it, so at the end of the foreach only one value is left in _ingest.temp_user_name.
- Foreach + Split without target_field (in-place): yields [["user1","domain"],["user2","anotherdomain"]], but this also overwrites the original data. Plus, I wouldn't know how to collect this into just ["user1", "user2"].
- Foreach + Split with target_field: also overwrites the target_field and only keeps the value of the last iteration.
Does anyone have more ideas on how to solve this? Do I have to go with script processors?
Hi @nemhods,
Welcome to the Elastic community. Yes, I think you have to go with a script processor, since this requirement is fairly custom.
Below is something that worked for me.
Create a Pipeline
PUT _ingest/pipeline/test-p1
{
  "description": "Convert email addresses to both full and username format",
  "processors": [
    {
      "script": {
        "source": """
          ctx.related.tmp_user = new ArrayList();
          for (int i = 0; i < ctx.related.user.size(); i++) {
            def email = ctx.related.user[i];
            def username = email.splitOnToken('@')[0];
            ctx.related.tmp_user.add(username);
            ctx.related.tmp_user.add(email);
          }
          ctx.related.user = ctx.related.tmp_user;
          ctx.related.remove('tmp_user');
        """,
        "lang": "painless"
      }
    }
  ]
}
Index sample data
POST test-index1/_doc?pipeline=test-p1
{
  "related": {
    "user": [
      "test@domain.com",
      "test1@domain1.com",
      "test2@domain2.com"
    ]
  }
}
Output
GET test-index1/_search
Docs
{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "test-index1",
        "_id": "KuIriosBa7YTodDiw2wW",
        "_score": 1,
        "_source": {
          "related": {
            "user": [
              "test",
              "test@domain.com",
              "test1",
              "test1@domain1.com",
              "test2",
              "test2@domain2.com"
            ]
          }
        }
      }
    ]
  }
}
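To try the pipeline without indexing anything, the simulate API can run it against inline documents. This is only a minimal sketch: it assumes the test-p1 pipeline from above already exists, and the sample addresses are just placeholders.

```console
POST _ingest/pipeline/test-p1/_simulate
{
  "docs": [
    {
      "_source": {
        "related": {
          "user": [
            "test@domain.com",
            "test1@domain1.com"
          ]
        }
      }
    }
  ]
}
```

With the script as written, each returned document should list the short name followed by the full address for every entry in related.user.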
Hey,
awesome, thanks for already providing the script solution.
I've added a check to make sure the operation only runs on array members that contain an "@" sign.
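// Build the extended list in a temporary field
ctx.related.tmp_user = new ArrayList();
for (int i = 0; i < ctx.related.user.size(); i++) {
  def email = ctx.related.user[i]; // email/upn
  // Always keep the original value first
  ctx.related.tmp_user.add(email);
  // Only derive a short name when the value contains an "@"
  if (email.contains("@")) {
    def username = email.splitOnToken('@')[0];
    ctx.related.tmp_user.add(username);
  }
}
// Swap the new list in and drop the temporary field
ctx.related.user = ctx.related.tmp_user;
ctx.related.remove('tmp_user');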
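If some documents might arrive without related.user, or with values that are already short names, a slightly more defensive variant could look like the following. This is a sketch under assumptions that are not part of the thread: that documents with a missing or non-list related.user should be left untouched, that duplicates should be skipped, and that the sample values (including "localadmin") are representative. It uses an inline pipeline with the simulate API, so nothing is stored.

```console
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "script": {
          "lang": "painless",
          "source": """
            // Assumption: leave the document alone if related.user is missing or not a list
            if (ctx.related?.user instanceof List) {
              def users = new ArrayList();
              for (def entry : ctx.related.user) {
                // Keep the original value, skipping duplicates
                if (!users.contains(entry)) {
                  users.add(entry);
                }
                if (entry instanceof String) {
                  int at = entry.indexOf('@');
                  // Only derive a short name when something precedes the "@"
                  if (at > 0) {
                    def name = entry.substring(0, at);
                    if (!users.contains(name)) {
                      users.add(name);
                    }
                  }
                }
              }
              ctx.related.user = users;
            }
          """
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "related": {
          "user": ["user1@domain", "user2@anotherdomain", "localadmin", "user1"]
        }
      }
    }
  ]
}
```

Using a local list instead of a temporary field on ctx means there is nothing to remove afterwards. For the sample document, related.user should come back as user1@domain, user1, user2@anotherdomain, user2, localadmin.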