Hi @Ankita_Pachauri
You have a couple of things going on.
Take a look at this simulation, which runs the kv processor against two versions of the message:
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "kv": {
          "field": "message",
          "field_split": "\\\n",
          "value_split": ":",
          "trim_key": " ",
          "trim_value": " "
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "message": " URL : https://www.google.com\n Action Type : Response Received\n RequestDateTime : 9/24/2024 18:06:15\n ResponseDateTime : 9/24/2024 18:06:15\n ErrorCode : \n Message : \n Stage : Response Came From Web Request\n ErrLog Generated_on : 2024/09/24 06:06:15:718\n\n TrxRef.No : 111111111\n InputXML : XXXXXXXXXXX\n ResponseReceivedData : AN is mandatory0001\n---------------------------------END LOG---------------------------------------------------"
      }
    },
    {
      "_source": {
        "message": " URL : https://www.google.com\n Action Type : Response Received\n RequestDateTime : 9/24/2024 18:06:15\n ResponseDateTime : 9/24/2024 18:06:15\n ErrorCode : \n Message : \n Stage : Response Came From Web Request\n ErrLog Generated_on : 2024/09/24 06:06:15:718\n TrxRef.No : 111111111\n InputXML : XXXXXXXXXXX\n ResponseReceivedData : AN is mandatory0001\n"
      }
    }
  ]
}
And the result: the first document fails, the second one parses.
{
"docs": [
{
"error": {
"root_cause": [
{
"type": "illegal_argument_exception",
"reason": "field [message] does not contain value_split [:]"
}
],
"type": "illegal_argument_exception",
"reason": "field [message] does not contain value_split [:]"
}
},
{
"doc": {
"_index": "_index",
"_version": "-3",
"_id": "_id",
"_source": {
"ResponseDateTime": "9/24/2024 18:06:15",
"Message": "",
"InputXML": "XXXXXXXXXXX",
"ErrLog Generated_on": "2024/09/24 06:06:15:718",
"message": """ URL : https://www.google.com
Action Type : Response Received
RequestDateTime : 9/24/2024 18:06:15
ResponseDateTime : 9/24/2024 18:06:15
ErrorCode :
Message :
Stage : Response Came From Web Request
ErrLog Generated_on : 2024/09/24 06:06:15:718
TrxRef.No : 111111111
InputXML : XXXXXXXXXXX
ResponseReceivedData : AN is mandatory0001
""",
"TrxRef": {
"No": "111111111"
},
"ResponseReceivedData": "AN is mandatory0001",
"Action Type": "Response Received",
"URL": "https://www.google.com",
"RequestDateTime": "9/24/2024 18:06:15",
"Stage": "Response Came From Web Request",
"ErrorCode": ""
},
"_ingest": {
"timestamp": "2024-09-27T16:24:23.001941814Z"
}
}
}
]
}
So, a couple of things.
First, the text at the end
---------------------------------END LOG---------------------------------------------------
does not contain a :
so it cannot be split, and the processor fails.
Also, in the middle of the message,
ErrLog Generated_on : 2024/09/24 06:06:15:718\n\n
has two consecutive \n characters. That leaves an empty segment with no : in it, so that fails too.
I manually cleaned both of those up in the second document of the example above.
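To see why both cases trip the processor, here is a rough Python emulation of a key-value split. It is only a sketch, not Elasticsearch's actual implementation: the function name kv_split is made up for illustration, and dropping trailing empty segments is an assumption based on the second document above parsing cleanly even though it ends with \n.

```python
import re


def kv_split(message, field_split=r"\n", value_split=":"):
    """Rough emulation of a key-value split (hypothetical helper, not Elasticsearch code)."""
    parts = re.split(field_split, message)
    # Assumption: trailing empty segments are ignored, as the working example suggests.
    while parts and parts[-1] == "":
        parts.pop()
    result = {}
    for part in parts:
        # Split on the first separator only, so values may contain ":" themselves.
        pieces = part.split(value_split, 1)
        if len(pieces) != 2:
            raise ValueError(f"segment {part!r} does not contain value_split [{value_split}]")
        key, value = pieces
        result[key.strip(" ")] = value.strip(" ")
    return result


# Parses fine: every segment has a ":" and the trailing newline is ignored.
print(kv_split(" URL : https://www.google.com\n ErrorCode : \n"))

# Both of these raise ValueError: an empty middle segment, and a marker line without ":".
for bad in (" a : 1\n\n b : 2", " a : 1\n-----END LOG-----"):
    try:
        kv_split(bad)
    except ValueError as err:
        print("failed:", err)
```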
You will need to clean that up with some processing ahead of time, before the kv processor runs.
It's not pretty, but this works:
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "gsub": {
          "field": "message",
          "pattern": "\\\n\\\n",
          "replacement": "\\\n"
        }
      },
      {
        "gsub": {
          "field": "message",
          "pattern": "---------------------------------END LOG---------------------------------------------------",
          "replacement": ""
        }
      },
      {
        "kv": {
          "field": "message",
          "field_split": "\\\n",
          "value_split": ":",
          "trim_key": " ",
          "trim_value": " "
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "message": " URL : https://www.google.com\n Action Type : Response Received\n RequestDateTime : 9/24/2024 18:06:15\n ResponseDateTime : 9/24/2024 18:06:15\n ErrorCode : \n Message : \n Stage : Response Came From Web Request\n ErrLog Generated_on : 2024/09/24 06:06:15:718\n\n TrxRef.No : 111111111\n InputXML : XXXXXXXXXXX\n ResponseReceivedData : AN is mandatory0001\n---------------------------------END LOG---------------------------------------------------"
      }
    }
  ]
}
Result:
{
"docs": [
{
"doc": {
"_index": "_index",
"_version": "-3",
"_id": "_id",
"_source": {
"ResponseDateTime": "9/24/2024 18:06:15",
"Message": "",
"InputXML": "XXXXXXXXXXX",
"ErrLog Generated_on": "2024/09/24 06:06:15:718",
"message": """ URL : https://www.google.com
Action Type : Response Received
RequestDateTime : 9/24/2024 18:06:15
ResponseDateTime : 9/24/2024 18:06:15
ErrorCode :
Message :
Stage : Response Came From Web Request
ErrLog Generated_on : 2024/09/24 06:06:15:718
TrxRef.No : 111111111
InputXML : XXXXXXXXXXX
ResponseReceivedData : AN is mandatory0001
""",
"TrxRef": {
"No": "111111111"
},
"ResponseReceivedData": "AN is mandatory0001",
"Action Type": "Response Received",
"URL": "https://www.google.com",
"RequestDateTime": "9/24/2024 18:06:15",
"Stage": "Response Came From Web Request",
"ErrorCode": ""
},
"_ingest": {
"timestamp": "2024-09-27T16:31:03.335552899Z"
}
}
}
]
}
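If you want to sanity-check the cleanup logic outside Elasticsearch, the two gsub steps can be sketched in Python. Assumptions here: clean_message is a hypothetical helper, the sample is a shortened version of the message, and -+END LOG-+ is a looser pattern than the literal dash run used in the pipeline.

```python
import re


def clean_message(message):
    """Mirror the two gsub steps in plain Python (illustrative sketch only)."""
    # Step 1: collapse each pair of newlines into a single newline.
    message = re.sub(r"\n\n", "\n", message)
    # Step 2: drop the END LOG marker; "-+" matches any run of dashes (an assumption).
    message = re.sub(r"-+END LOG-+", "", message)
    return message


sample = (
    " ErrLog Generated_on : 2024/09/24 06:06:15:718\n\n"
    " TrxRef.No : 111111111\n"
    "---------------END LOG---------------"
)
cleaned = clean_message(sample)

# After cleanup, every non-empty line should contain the ":" separator.
lines = [line for line in cleaned.split("\n") if line]
print(all(":" in line for line in lines))
```

One thing to keep in mind: replacing \n\n with \n still leaves a blank line if the message ever has three or more newlines in a row. In this sketch, a pattern like \n+ would collapse longer runs too.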