HTML log file / PARSE FAILURE

Hello everyone!

I am trying to build a Kibana dashboard from several HTML log files. The logs all follow the pattern of this example:

<META http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>
          Job
          </title>
<style type="text/css">
            body {font-family: verdana,arial,sans-serif; font-size:10pt;}
            h1 {font-size: 14pt }
            h2 {font-size: 12pt }
            table, tr, td, th {border: 1px;  border-spacing: 0px; border-collapse: collapse;font-size:10pt; text-align: left}
            td, th { padding: 5px; }
            th { padding-left: 0px; }
            a:link,a:visited { text-decoration: none; }
        </style>
</head>
<body>
<h1>Job 30226863</h1>
<table>
<tr>
<th>Domain</th><td>idcs-5e64742091c341b58db050b43599e161</td>
</tr>
<tr>
<th>Service</th><td>Database20180625203859QS</td>
</tr>
<tr>
<th>Namespace</th><td>cloudstack</td>
</tr>
<tr>
<th>Service Type</th><td>CloudStack</td>
</tr>
<tr>
<th>Compute Site</th><td>uscom-central-1</td>
</tr>
<tr>
<th>Username</th><td>siyuan15@uw.edu</td>
</tr>
<tr>
<th>Operation</th><td>create-psm-stack-service</td>
</tr>
<tr>
<th>Status</th><td>Failed</td>
</tr>
<tr>
<th>Sub Status</th><td></td>
</tr>
<tr>
<th>Create Time</th><td>2018-06-25T20:40:06.656+00:00</td>
</tr>
<tr>
<th>Start Time</th><td>2018-06-25T20:40:06.656+00:00</td>
</tr>
<tr>
<th>End Time</th><td>2018-06-25T21:28:35.417+00:00</td>
</tr>
<tr>
<th>Update Time</th><td>2018-06-25T21:28:35.419+00:00</td>
</tr>
<tr>
<th>Job Info</th><td></td>
</tr>
<tr>
<th>Request Parameters</th><td>{{{namespace=cloudstack, serviceInstance=Database20180625203859QS, securityAuthUser=siyuan15@uw.edu, is_sit_service=false, serviceType=CloudStack, isUpdateRequest=true, quick_start_instance=true, parent_aggr_service_name=Database20180625203859QS, stack_service_base_uri=https://paassvcmngrinternal-us2-sm02.oraclecloud.com:8888/paas/, OPERATION_SERVICE_TYPE=CloudStack, activityType=CREATE_SERVICE, stackTemplate=Oracle-DBCS-Enterprise-Edition:1.0.4, tenant=idcs-5e64742091c341b58db050b43599e161, serviceVersion=All, serviceName=Database20180625203859QS, serviceId=733181, operationName=create-psm-stack-service}}}</td>
</tr>
<tr>
<th>Supplemental Logs</th><td>none</td>
</tr>
<tr>
<th>Summary</th><td>
<pre>Job &lt;30226863&gt; v11, action=handleFailure, Failed, namespace=cloudstack, service type=CloudStack, version=All, operation=create-psm-stack-service, cleanupActionIndex=-1, retryCount:0, jobRetryCount:0, jobRetryWaitTime:0, created: 2018-06-25T20:40:06.656+0000, started: 2018-06-25T20:40:06.656+0000, failingStartTime: 2018-06-25T21:28:35.272+0000, domain:idcs-5e64742091c341b58db050b43599e161, instance:Database20180625203859QS, wm:SM-MS-chr302ru26.usdc2.oracleclo, owner:siyuan15@uw.edu

FAILED CURRENT JOB 30226863: action: createChildJobsLevel1

FAILED CHILD JOB 30226864: action: createServiceAssociation
FAILED CHILD JOB 29985247: action: addTags, job trail: 30226863 &gt; 30226864 &gt; 29985247
    code: sm.job.unexpected.execution, message: sm.job.unexpected.execution: An exception occurred during operation execution: action: com.oracle.cloudservice.db.service.operation.CreateDBaaSServiceOperation, addTags</pre>
</td>
</tr>

I created this file.conf in Logstash and it is "working"...

# THIS FILE HAS THE INPUT + FILTER AND OUTPUT OF HTML FILES (FOR THE BUGS)
# THIS IS THE INPUT CONFIGURATION
input {
  file {
    id => "htmlLogs_input_file"
    path => "/home/logs/create-dbaas-service/*.htm"
    type => "htm"
    start_position => beginning
    sincedb_path => "/dev/null"
  }
}
# THIS IS THE FILTER CONFIGURATION
filter {
  if [type] == "htm" {
    grok {
      id => "htmlLogs_filter_grok"
      match => { "message" => "<h1>%{DATA:Job_word}%{SPACE}%{NUMBER:job_id}</h1><table><tr><th>%{DATA:Domain_word}</th><td>%{DATA:Domain_id}</td></tr><tr><th>%{DATA:Service_word}</th><td>%{DATA:Service_id}</td></tr><tr><th>%{DATA:Namespace_word</th><td>%{DATA:Namespace_type}</td></tr><tr><th>%{DATA:ServiceType_word}</th><td>%{DATA:ServiceType_id}</td></tr><tr><th>%{DATA:ComputeSite_word}</th><td>%{DATA:ComputeSite_id}</td></tr><tr><th>>%{DATA:Username_word}</th><td>%{DATA:Username_value}</td></tr><tr><th>%{DATA:Operation_word}</th><td>%{DATA:Operation_value}</td></tr><tr><th>%{DATA:Status_word}</th><td>%{DATA:Status_value}</td></tr><tr><th>%{DATA:SubStatus_word}</th><td>%{DATA:SubStatus_value</td></tr><tr><th>%{DATA:CreateTime_word}</th><td>%{TIMESTAMP_ISO8601}</td></tr><tr><th>%{DATA:StartTime_word}</th><td>%{TIMESTAMP_ISO8601}</td></tr><tr><th>%{DATA:EndTime_word}</th><td>%{TIMESTAMP_ISO8601}</td></tr><tr><th>%{DATA:UpdateTime_word}</th><td>%{TIMESTAMP_ISO8601}</td></tr><tr><th>%{DATA:JobInfo_word}</th><td>%{DATA:JobInfo_text}</td></tr><tr><th>%{DATA:RequestParameters_word}</th><td>%{DATA:RequestParameters_log}</tr><tr><th>%{DATA:SupplementalLogs_word}</th><td>%{DATA:SupplementalLogs_value}</td></tr><tr><th>%{DATA:Summary_word}</th><td><pre>%{DATA:Summary_log}</pre></td></tr></table>" }
    }
  }
}
# THIS IS THE OUTPUT CONFIGURATION
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    id => "htmlLogs_output_ES" }
  stdout {
    codec => rubydebug }
}

... but the grok filter section does not seem to be defined correctly. In the Discover section of Kibana, the message field shows the content of the HTML log, which looks fine, but I need to filter the data, and the events also carry the tag _grokparsefailure.

Could someone help me find what is wrong with the grok filter section of my file.conf?

THANKS!!
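
One possible cause, offered as a sketch rather than a confirmed diagnosis: by default the file input emits one event per line, so a grok pattern that describes the whole table only ever sees a single line of HTML and cannot match. Separately, the pattern as posted has a few malformed pieces: `%{DATA:Namespace_word` and `%{DATA:SubStatus_value` lack the closing brace, there is a stray `>` in front of `%{DATA:Username_word}`, `</td>` is missing after `%{DATA:RequestParameters_log}`, and the pattern has no whitespace between tags although the log puts each row on its own line. The input below joins the lines of a file into one event with the multiline codec. The `^<META` start marker is an assumption taken from the sample above, and the flush interval is an arbitrary value.

```conf
input {
  file {
    id => "htmlLogs_input_file"
    path => "/home/logs/create-dbaas-service/*.htm"
    type => "htm"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    # Join lines into one event per file.
    # Assumption: every file starts with a line beginning "<META",
    # as in the sample above; adjust the pattern if the real files differ.
    codec => multiline {
      pattern => "^<META"
      negate => true
      what => "previous"
      # Emit the pending event after 2 seconds without new lines
      # (arbitrary value); otherwise the last event can stay buffered.
      auto_flush_interval => 2
    }
  }
}
```

With `negate => true` and `what => "previous"`, every line that does not start with `<META` is appended to the event that the last `<META` line opened, so the message field should then hold the whole document.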

Hello, I improved my grok filter using the Grok Debugger website, and there I could match this HTML log:

<META http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>
          Job
          </title>
<style type="text/css">
            body {font-family: verdana,arial,sans-serif; font-size:10pt;}
            h1 {font-size: 14pt }
            h2 {font-size: 12pt }
            table, tr, td, th {border: 1px;  border-spacing: 0px; border-collapse: collapse;font-size:10pt; text-align: left}
            td, th { padding: 5px; }
            th { padding-left: 0px; }
            a:link,a:visited { text-decoration: none; }
        </style>
</head>
<body>
<h1>Job 30742430</h1>
<table>
<tr>
<th>Domain</th><td>Z14JHV5895044878</td>
</tr>
<tr>
<th>Service</th><td>DgA0630US2Z14301222</td>
</tr>
<tr>
<th>Namespace</th><td>dbaas</td>
</tr>
<tr>
<th>Service Type</th><td>dbaas</td>
</tr>
<tr>
<th>Compute Site</th><td>US006_Z16</td>
</tr>
<tr>
<th>Username</th><td>c9qa-infra_ww@oracle.com</td>
</tr>
<tr>
<th>Operation</th><td>create-dbaas-service</td>
</tr>
<tr>
<th>Status</th><td>Failed</td>
</tr>
<tr>
<th>Sub Status</th><td></td>
</tr>
<tr>
<th>Create Time</th><td>2018-06-30T12:23:46.092+00:00</td>
</tr>
<tr>
<th>Start Time</th><td>2018-06-30T12:31:33.936+00:00</td>
</tr>
<tr>
<th>End Time</th><td>2018-06-30T12:39:22.174+00:00</td>
</tr>
<tr>
<th>Update Time</th><td>2018-06-30T12:39:22.184+00:00</td>
</tr>
<tr>
<th>Job Info</th><td></td>
</tr>
<tr>
<th>Request Parameters</th><td>{{{trial=false, enableListenerPort=false, description=Description For Test Service, subscriptionType=HOURLY, dbConsolePort=1158, disasterRecovery=false, listenerPort=1521, cloudStorageContainer=https://us2.storage.oraclecloud.com/v1/Storage-Z14JHV5895044878/dbbackup, serviceInstance=DgA0630US2Z14301222, server_base_uri=https://jaas.oraclecloud.com:443/paas/service/dbcs/, ibkupOnPremise=false, operationName=create-dbaas-service, backupDestination=BOTH, noRetry=false, goldenGate=false, createStorageContainerIfMissing=false, version=11.2.0.4, serviceVersion=11.2.0.4, serviceEntitlementId=14099, timezone=UTC, isRac=false, usableStorage=50, isBYOL=false, sid=ORCL, emExpressPort=5500, computeSiteName=US006_Z16, useHighPerformanceStorage=false, noRollback=false, sla=NONE, assignPublicIP=true, ibkup=no, useOAuthForStorage=false, edition=EE, tenant=Z14JHV5895044878, hdg=false, provisioningTimeout=180, cloudStorageUser=c9qa-infra_ww@oracle.com, level=PAAS, count=2, serviceType=dbaas, enableNotification=false, failoverDatabase=false, serviceName=DgA0630US2Z14301222, identity_domain_id=Z14JHV5895044878, charset=AL32UTF8, tags=[], vmPublicKeyText=ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCxIdPC1Nh+DjSHLlLjum/sqZHjgG4R/Qftc7b8FBbSy3fVp7HNYetPimEPgcz5D6kBHCmaweQTj2VhhxrKEFkfRG43jdWg9ycnrCkvfkqIhBGj9j5UcBuWnPpwR9qN7KTsagrUNx4EqdGxbW7Yda38hIN5vREip8rnc/IQdGEg/8waD2oCkDb2xxSAmJ4uwGW+QPt7DGOPRH7+8PSeoeuTD2A2N2leA7pwIhWzHe/0jm4I85Gj98wD6dHfOOr1GbPiT4vPc/qJeMpelEddjsQ88buiIHZ0AOz9lQEskJ4gHAgohSz5g7x5HUz9x6Tc2fW2chFJgt4T6VYAv79wZA4f root@395b5e41f127, namespace=dbaas, shape=oc3, ncharset=AL16UTF16}}}</td>
</tr>
<tr>
<th>Supplemental Logs</th><td>none</td>
</tr>
<tr>
<th>Summary</th><td>
<pre>Job &lt;30742430&gt; v41, action=handleFailure, Failed, namespace=dbaas, version=11.2.0.4, operation=create-dbaas-service, cleanupActionIndex=1, retryCount:0, jobRetryCount:1, jobRetryWaitTime:0, created: 2018-06-30T12:23:46.092+0000, started: 2018-06-30T12:31:33.936+0000, to retry: 2018-06-30T12:31:31.690+0000, failingStartTime: 2018-06-30T12:35:01.969+0000, domain:Z14JHV5895044878, instance:DgA0630US2Z14301222, wm:SM-MS-chr302ru25.usdc2.oracleclo, owner:c9qa-infra_ww@oracle.com

FAILED CURRENT JOB 30742430: action: startServices

FAILED CHILD JOB 30703665: action: awaitResourcesForVMs
    code: PSM-COMPUTE-ERROR-004, message: Unable to start the Compute resources...    The orchestration /Compute-Z14JHV5895044878/c9qa-infra_ww@oracle.com/dbaas/DgA0630US2Z14301222/db_1/vm-1/resources is in 'error' state since Sat Jun 30 2018 12:24:51:000</pre>
</td>
</tr>
</table>

with this grok pattern:

<h1>%{DATA:Job_word}%{SPACE}%{NUMBER:job_id}</h1>
<table>
<tr>
<th>%{DATA:Domain_word}</th><td>%{DATA:Domain_id}</td>
</tr>
<tr>
<th>%{DATA:Service_word}</th><td>%{DATA:Service_id}</td>
</tr>
<tr>
<th>%{DATA:Namespace_word</th><td>%{DATA:Namespace_id}</td>
</tr>
<tr>
<th>%{DATA:ServiceType_word}</th><td>%{DATA:ServiceType_id}</td>
</tr>
<tr>
<th>%{DATA:ComputeSite_word}</th><td>%{DATA:ComputeSite_id}</td>
</tr>
<tr>
<th>%{DATA:Username_word}</th><td>%{DATA:Username_value}</td>
</tr>
<tr>
<th>%{DATA:Operation_word}</th><td>%{DATA:Operation_value}</td>
</tr>
<tr>
<th>%{DATA:Status_word}</th><td>%{DATA:Status_value}</td>
</tr>
<tr>
<th>%{DATA:SbSts_word}</th><td>%{DATA:SbSts_id}</td>
</tr>
<tr>
<th>%{DATA:CreateTime_word}</th><td>%{DATA:CreateTime_time}</td>
</tr>
<tr>
<th>%{DATA:StartTime_word}</th><td>%{DATA:StartTime_time}</td>
</tr>
<tr>
<th>%{DATA:EndTime_word}</th><td>%{DATA:EndTime_time}</td>
</tr>
<tr>
<th>%{DATA:UpdateTime_word}</th><td>%{DATA:UpdateTime_time}</td>
</tr>
<tr>
<th>%{DATA:JobInfo_word}</th><td>%{DATA:JobInfo_text}</td>
</tr>
<tr>
<th>%{DATA:RequestParameters_word}</th><td>%{GREEDYDATA:[ (.*?) ] }</td>
</tr>
<tr>
<th>%{DATA:SupplementalLogs_word}</th><td>%{DATA:SupplementalLogs_value}</td>
</tr>
<tr>
<th>%{DATA:Summary_word}</th><td>
<pre>(?<message>(.|\r|\n)*)
</tr>
</table>

but I am still getting the TAG:

_grokparsefailure

Any idea?
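
A sketch of one alternative, assuming each file already reaches the filter as a single multi-line event (for example through the multiline codec on the input): instead of one pattern that has to match the whole table including every line break, use several short patterns that each capture one row. Some things in the pattern above look likely to fail in Logstash: `%{DATA:Namespace_word` is missing its closing brace, `%{GREEDYDATA:[ (.*?) ] }` does not look like a valid `%{SYNTAX:SEMANTIC}` reference, and `%{DATA}` does not cross line breaks unless the pattern starts with `(?m)`. The field names below are illustrative, not taken from an existing mapping.

```conf
filter {
  if [type] == "htm" {
    grok {
      id => "htmlLogs_filter_grok"
      # Try every pattern instead of stopping at the first one that matches.
      break_on_match => false
      match => {
        "message" => [
          "<h1>Job%{SPACE}%{NUMBER:job_id}</h1>",
          "<th>Domain</th><td>%{DATA:domain_id}</td>",
          "<th>Service</th><td>%{DATA:service_id}</td>",
          "<th>Operation</th><td>%{DATA:operation}</td>",
          "<th>Status</th><td>%{DATA:status}</td>",
          "<th>Create Time</th><td>%{TIMESTAMP_ISO8601:create_time}</td>",
          "<th>End Time</th><td>%{TIMESTAMP_ISO8601:end_time}</td>",
          # (?m) lets GREEDYDATA run across the line breaks inside <pre>.
          "(?m)<pre>%{GREEDYDATA:summary}</pre>"
        ]
      }
    }
  }
}
```

Each row pattern stays on one line of the HTML, so only the summary needs `(?m)`. The header text is written literally rather than captured into `*_word` fields, since it is the same in every file.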
