The documents JSON is not valid

Hi everyone,
I'd like to ask for help with a problem: when I try to test my pipeline, I get the error "The documents JSON is not valid."

This is the document I'm testing the pipeline with:

[
  {
    "_source": {
      "message": "10.0.2.4 - - [14/Mar/2023:16:12:41 +0000] "HEAD /drupal/modules/search/tests/0302-exploits.php HTTP/1.1" 404 139 "-" "DirBuster-1.0-RC1 (http://www.owasp.org/index.php/Category:OWASP_DirBuster_Project)""
    }
  }
]

The message is one of the log lines that I have in Elastic.

And this is the grok pattern:

%{IPORHOST:source.ip} %{USER:user.id} %{USER:user.name} \[%{HTTPDATE:@timestamp}\] "%{WORD:http.request.method} %{URIPATHPARAM:request} HTTP/%{NUMBER:http.version}" %{NUMBER:response:int} %{NUMBER:response.bytes:int} "%{DATA:referrer}" "%{DATA:useragent}"

I thought about modifying the log by adding a \ before every " in it, but I would need to do this for all of my logs, and I'm collecting them directly with Filebeat, which delivers them in this format.

And that's the error:

Hi @Hajar_Lachhab Welcome to the community

You have a couple of issues... and you are running into the subtle escaping of the " characters

%{NUMBER:response:int} %{NUMBER:response.bytes:int}
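This is a fundamental issue: the first capture sets response as an int, while the second treats response as an object (the parent of response.bytes). I fixed that by renaming the first capture to response.code.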
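
To illustrate why the two captures clash, here is a minimal Python sketch. It is a simplified model of how dotted field names expand into nested objects, not Elasticsearch's actual code, and `set_dotted` and `build_doc` are hypothetical helper names.

```python
def set_dotted(doc, path, value):
    """Set value at a dotted path, creating nested dicts as needed."""
    *parents, leaf = path.split(".")
    node = doc
    for key in parents:
        child = node.setdefault(key, {})
        if not isinstance(child, dict):
            # The parent already holds a scalar, so it cannot also be an object.
            raise TypeError(f"cannot set {path!r}: {key!r} is not an object")
        node = child
    node[leaf] = value
    return doc


def build_doc(captures):
    """Apply (path, value) captures in order, as a grok match would."""
    doc = {}
    for path, value in captures:
        set_dotted(doc, path, value)
    return doc


# Works: both values live under the same parent object.
fixed = build_doc([("response.code", 404), ("response.bytes", 139)])
print(fixed)

# Fails: "response" is first an int, then needed as an object.
try:
    build_doc([("response", 404), ("response.bytes", 139)])
except TypeError as err:
    print("conflict:", err)
```

The working case produces the same `response` object shape that appears in the results further down.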

Go to Dev Tools and try the following...
Then go look at the UI

DELETE _ingest/pipeline/discuss-test

PUT _ingest/pipeline/discuss-test
{
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": [
          "%{IPORHOST:source.ip} %{USER:user.id} %{USER:user.name} \\[%{HTTPDATE:@timestamp}\\] \"%{WORD:http.request.method} %{URIPATHPARAM:request} HTTP/%{NUMBER:http.version}\" %{NUMBER:response.code:int} %{NUMBER:response.bytes:int} \"%{DATA:referrer}\" \"%{DATA:useraegent}\""
        ]
      }
    }
  ]
}



DELETE discuss-test

POST discuss-test/_doc?pipeline=discuss-test
{
  "message": "10.0.2.4 - - [14/Mar/2023:16:12:41 +0000] \"HEAD /drupal/modules/search/tests/0302-exploits.php HTTP/1.1\" 404 139 \"-\" \"DirBuster-1.0-RC1 (http://www.owasp.org/index.php/Category:OWASP_DirBuster_Project)\""
}

Results

{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "discuss-test",
        "_id": "IyxfQ4cBVfQentc85xSk",
        "_score": 1,
        "_source": {
          "request": "/drupal/modules/search/tests/0302-exploits.php",
          "referrer": "-",
          "useraegent": "DirBuster-1.0-RC1 (http://www.owasp.org/index.php/Category:OWASP_DirBuster_Project)",
          "@timestamp": "14/Mar/2023:16:12:41 +0000",
          "response": {
            "code": 404,
            "bytes": 139
          },
          "http": {
            "request": {
              "method": "HEAD"
            },
            "version": "1.1"
          },
          "source": {
            "ip": "10.0.2.4"
          },
          "message": "10.0.2.4 - - [14/Mar/2023:16:12:41 +0000] \"HEAD /drupal/modules/search/tests/0302-exploits.php HTTP/1.1\" 404 139 \"-\" \"DirBuster-1.0-RC1 (http://www.owasp.org/index.php/Category:OWASP_DirBuster_Project)\"",
          "user": {
            "name": "-",
            "id": "-"
          }
        }
      }
    ]
  }
}

Now go to the UI and Run with this document

[
  {
    "_source": {
      "message": "10.0.2.4 - - [14/Mar/2023:16:12:41 +0000] \"HEAD /drupal/modules/search/tests/0302-exploits.php HTTP/1.1\" 404 139 \"-\" \"DirBuster-1.0-RC1 (http://www.owasp.org/index.php/Category:OWASP_DirBuster_Project)\""
    }
  }
]
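
On the worry about adding a backslash to every log line: the backslashes are only JSON string syntax in the test document, not part of the log itself. Here is a minimal Python sketch, assuming you have the raw line as plain text. `build_test_docs` is a hypothetical helper name, and any JSON serializer should do the same escaping.

```python
import json

# The raw log line exactly as it appears in the access log (no backslashes).
raw_line = (
    '10.0.2.4 - - [14/Mar/2023:16:12:41 +0000] '
    '"HEAD /drupal/modules/search/tests/0302-exploits.php HTTP/1.1" 404 139 "-" '
    '"DirBuster-1.0-RC1 (http://www.owasp.org/index.php/Category:OWASP_DirBuster_Project)"'
)


def build_test_docs(lines):
    """Wrap raw log lines in the documents array the pipeline test expects."""
    docs = [{"_source": {"message": line}} for line in lines]
    # json.dumps escapes the inner double quotes as \" automatically.
    return json.dumps(docs, indent=2)


test_docs = build_test_docs([raw_line])
print(test_docs)
```

Parsing the output again gives back the original line without any backslashes, which is what the grok processor sees.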

Thank you @stephenb for your response and for welcoming me to the community.
The problem I still have is that none of the logs I collect from Apache have () before the ("), so I need a solution for logs without the (). For example, my log looks like this:

"message": "10.0.2.4 - - [14/Mar/2023:16:12:41 +0000] "HEAD /drupal/modules/search/tests/0302-exploits.php HTTP/1.1"

And the log in your example looks like this:

"message": "10.0.2.4 - - [14/Mar/2023:16:12:41 +0000] \"HEAD /drupal/modules/search/tests/0302-exploits.php HTTP/1.1\"

Thank you.

I don't understand... there are no () in either example, and your message is certainly invalid, as the " don't match and the internal ones are not escaped. In short, most likely the way you are cutting and pasting the JSON is not correct. I have also found some minor issues with that test window on occasion, but I don't think that is your problem.
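
A quick way to check this outside Kibana is to feed both forms to a JSON parser. This is a small Python sketch using shortened versions of the two messages quoted above, wrapped in braces so each is a complete object; `is_valid_json` is a hypothetical helper name.

```python
import json

# Inner quotes not escaped: the string ends at the quote before HEAD.
unescaped = '{"message": "10.0.2.4 - - [14/Mar/2023:16:12:41 +0000] "HEAD /drupal/modules/search/tests/0302-exploits.php HTTP/1.1""}'

# Inner quotes escaped with a backslash: one well-formed JSON string.
escaped = r'{"message": "10.0.2.4 - - [14/Mar/2023:16:12:41 +0000] \"HEAD /drupal/modules/search/tests/0302-exploits.php HTTP/1.1\""}'


def is_valid_json(text):
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


print(is_valid_json(unescaped))
print(is_valid_json(escaped))

# The parsed value contains plain double quotes and no backslashes.
print(json.loads(escaped)["message"])
```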

You can try adding a document using its index and ID instead. Go to Discover, find a document, and note its index and document ID.

Then use Add test document and enter that index and document ID (this eliminates the cut-and-paste step).

AND / OR

What are you using to collect and ship the logs Filebeat or Elastic Agent?

Please provide 3 raw lines from the actual log file.

I mean the backslash before the " in the logs.
And this is the full log:

@timestamp
    Mar 25, 2023 @ 23:06:07.797
agent.ephemeral_id
    1207d914-ca5a-48ff-98de-ab5f808f648c
agent.hostname
    metasploitable3-ub1404
agent.id
    edb9ae2f-34fa-4254-abab-bb185c0b024c
agent.name
    metasploitable3-ub1404
agent.type
    filebeat
agent.version
    8.6.2
ecs.version
    8.0.0
host.architecture
    x86_64
host.containerized
    false
host.hostname
    metasploitable3-ub1404
host.id
    30ac251c9b0b9a3ed652aacc5f9b16cd
host.ip
    [10.0.2.10, fe80::a00:27ff:fe20:f902, 172.28.128.3, fe80::a00:27ff:fe79:2a34, 172.17.0.1, fe80::42:1fff:fe2d:c2c7]
host.mac
    [02-42-1F-2D-C2-C7, 08-00-27-20-F9-02, 08-00-27-79-2A-34]
host.name
    metasploitable3-ub1404
host.os.codename
    trusty
host.os.family
    debian
host.os.kernel
    3.13.0-170-generic
host.os.name
    Ubuntu
host.os.platform
    ubuntu
host.os.type
    linux
host.os.version
    14.04.6 LTS, Trusty Tahr
input.type
    filestream
log.file.path
    /var/log/apache2/access.log
log.offset
    978,987,264
message
    127.0.0.1 - - [25/Mar/2023:23:04:42 +0000] "GET /chat/read_log.php HTTP/1.1" 200 285 "about:blank" "Node.js (linux; U; rv:v4.9.1) AppleWebKit/537.36 (KHTML, like Gecko)"
_id
    WEc6UocBCEkhMMQZ9klB
_index
    .ds-filebeat-8.6.2-2023.03.21-000001
_score
    - 

And I'm using Filebeat to collect the logs.

Apologies... the format above is not meaningful to me; I am not sure what it means. It is neither a JSON doc nor a raw log line, so I cannot do anything with it.

I recommend trying the process of adding the document by index and document id that I showed above.

Thank you @stephenb, I tried it and it worked well for me.
