Logstash: JSON array in input file. Not so simple

elastock · March 17, 2019, 11:40am

Hello,
I thought I could solve this problem in 2 minutes, but I have ended up stuck on it for quite a while, testing a whole bunch of solutions found online.
My problem is simple.
I have a JSON document on my hard drive that currently looks like this:

[
    {
        "reference": "2-4",
        "title": "jeanclaude",
        "createdAt": "2019-01-22 09:37:49",
        "publishedAt": "2019-01-22 09:37:49",
        "updatedAt": null,
        "trashed": false,
        "trashedStatus": null,
        "authorId": "VXNlcjoxMTQwMTc0YS0xZTFmLTExZTktOTRkMi1mYTE2M2VlYjExZTE=",
        "authorType": "Citoyen / Citoyenne"
    },
    {
        "reference": "2-5",
        "title": "hello boi",
        "createdAt": "2019-01-22 09:39:33",
        "publishedAt": "2019-01-22 09:39:33",
        "updatedAt": null,
        "trashed": false,
        "trashedStatus": null,
        "authorId": "VXNlcjpjOWYxZWQ1NS0xYzEwLTExZTktOTRkMi1mYTE2M2VlYjExZTE=",
        "authorType": "Citoyen / Citoyenne",
        "authorZipCode": "57000"
    },
    {
        "reference": "2-6",
        "title": "non pas trop",
        "createdAt": "2019-01-22 09:39:50",
        "publishedAt": "2019-01-22 09:39:50",
        "updatedAt": null,
        "trashed": false,
        "trashedStatus": null,
        "authorId": "VXNlcjozZjlhNzAwOS0xYTc2LTExZTktOTRkMi1mYTE2M2VlYjExZTE=",
        "authorType": "Citoyen / Citoyenne",
        "authorZipCode": "34140"
    },
    {
        "reference": "2-7",
        "title": "race carré",
        "createdAt": "2019-01-22 09:40:19",
        "publishedAt": "2019-01-22 09:40:19",
        "updatedAt": null,
        "trashed": false,
        "trashedStatus": null,
        "authorId": "VXNlcjozOWQwNzJjNC0xZDEwLTExZTktOTRkMi1mYTE2M2VlYjExZTE=",
        "authorType": "Citoyen / Citoyenne",
        "authorZipCode": "17400"
    }
]

For now, I simply want to index these 4 JSON objects in Elasticsearch.
At first glance it looks simple. There are plenty of topics about it on the forum, but none of the solutions I tested was really satisfactory.
I initially started with this configuration to see how Logstash interpreted the file:

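input {
  file {
    path => ["C:/ELK/Logstash/logstash-6.6.2/conf/test.json"]
    # "NUL" is the Windows null device: no read position is persisted,
    # so the file is read again from the beginning on each run.
    sincedb_path => "NUL"
    start_position => "beginning"
    codec => "json"
  }
}
filter {
}
output {
  stdout {
    codec => rubydebug
  }
}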

But of course, it goes straight to a JSON parse failure.
After scouring the forums, I realize there is no really simple way to treat this JSON file as a proper array of JSON objects, because it is pretty-printed with indentation and Logstash's file input only reads it line by line. In some cases you have to define a specific multiline rule and pattern; in others you simply have to reorganize the JSON file so that there is 1 object per line.
I have tried pretty much everything without much success. It is surprising that there is no way to handle this JSON object the way you could in any programming language.
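
The "1 object per line" reorganization mentioned above can be done outside Logstash. Here is a minimal Python sketch, assuming the whole file fits in memory. The function name, the sample content and the file names are hypothetical and only serve the illustration.

```python
import json
import tempfile
from pathlib import Path


def array_to_ndjson(src: Path, dst: Path) -> int:
    """Rewrite a JSON array file as one compact JSON object per line."""
    # Parse the whole file at once, so pretty-printing does not matter.
    records = json.loads(src.read_text(encoding="utf-8"))
    if not isinstance(records, list):
        raise ValueError("expected a JSON array at the root of the file")
    with dst.open("w", encoding="utf-8", newline="\n") as out:
        for record in records:
            # ensure_ascii=False keeps accented characters readable.
            out.write(json.dumps(record, ensure_ascii=False) + "\n")
    return len(records)


# Self-contained demo in a temporary directory (hypothetical sample data).
demo_dir = Path(tempfile.mkdtemp())
demo_src = demo_dir / "test.json"
demo_dst = demo_dir / "test.ndjson"
demo_src.write_text(
    '[\n    {"reference": "2-4", "trashed": false},\n'
    '    {"reference": "2-5", "trashed": false}\n]',
    encoding="utf-8",
)
print(array_to_ndjson(demo_src, demo_dst), "objects written to", demo_dst)
```

Each line of the resulting file is a complete JSON object, which is the shape a line-oriented reader combined with a JSON codec expects.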

Does anyone have a simple method or a trick to solve this problem?
EDIT:
All my JSON objects systematically start with the string "{
"reference":
I wonder whether a multiline pattern could be based on that.
Thanks for your help.

dadoonet · March 17, 2019, 12:49pm

Can't you modify your file to add:

{ "foo" :

at the beginning, and this at the end:

}

?

elastock · March 17, 2019, 12:56pm

Hello,
Yes, I can modify my file that way without any problem.
I see where you are going with this.
I will try this technique.
The document then looks like this:

{
    "foo": [
        {
            "reference": "2-4",
            "title": "transition écologique",
            "createdAt": "2019-01-22 09:37:49",
            "publishedAt": "2019-01-22 09:37:49",
            "updatedAt": null,
            "trashed": false,
            "trashedStatus": null,
            "authorId": "VXNlcjoxMTQwMTc0YS0xZTFmLTExZTktOTRkMi1mYTE2M2VlYjExZTE=",
            "authorType": "Citoyen / Citoyenne"
        },
        {
            "reference": "2-5",
            "title": "La surpopulation",
            "createdAt": "2019-01-22 09:39:33",
            "publishedAt": "2019-01-22 09:39:33",
            "updatedAt": null,
            "trashed": false,
            "trashedStatus": null,
            "authorId": "VXNlcjpjOWYxZWQ1NS0xYzEwLTExZTktOTRkMi1mYTE2M2VlYjExZTE=",
            "authorType": "Citoyen / Citoyenne",
            "authorZipCode": "57000"
        },
        {
            "reference": "2-6",
            "title": "climat",
            "createdAt": "2019-01-22 09:39:50",
            "publishedAt": "2019-01-22 09:39:50",
            "updatedAt": null,
            "trashed": false,
            "trashedStatus": null,
            "authorId": "VXNlcjozZjlhNzAwOS0xYTc2LTExZTktOTRkMi1mYTE2M2VlYjExZTE=",
            "authorType": "Citoyen / Citoyenne",
            "authorZipCode": "34140"
        },
        {
            "reference": "2-7",
            "title": "POLLUTION AIR EAU",
            "createdAt": "2019-01-22 09:40:19",
            "publishedAt": "2019-01-22 09:40:19",
            "updatedAt": null,
            "trashed": false,
            "trashedStatus": null,
            "authorId": "VXNlcjozOWQwNzJjNC0xZDEwLTExZTktOTRkMi1mYTE2M2VlYjExZTE=",
            "authorType": "Citoyen / Citoyenne",
            "authorZipCode": "17400"
        }
    ] 
}

Launching the following config produces the errors shown after it:

input {
  file {
    path => ["C:/ELK/Logstash/logstash-6.6.2/conf/test.json"]
    sincedb_path => "NUL"
    start_position => "beginning"
    codec => "json"
  }
}
filter {
  json {
    source => "foo"
  }
}
output {
  stdout {
    codec => rubydebug
  }
}

JSON parse error, original data now in message field {:error=>#<LogStash::Json::ParserError: Unexpected end-of-input: expected close marker for Object (start marker at [Source: (String)"{
"; line: 2, column: 3]>, :data=>"{\r"}
[2019-03-17T13:59:02,994][ERROR][logstash.codecs.json     ] JSON parse error, original data now in message field {:error=>#<LogStash::Json::ParserError: incompatible json object type=java.lang.String , only hash map or arrays are supported>, :data=>"    \"foo\": [\r"}
"; line: 1, column: 9])4][ERROR][logstash.codecs.json     ] JSON parse error, original data now in message field {:error=>#<LogStash::Json::ParserError: Unexpected end-of-input: expected close marker for Object (start marker at [Source: (String)"        {
"; line: 2, column: 11]>, :data=>"        {\r"}
[2019-03-17T13:59:02,994][ERROR][logstash.codecs.json     ] JSON parse error, original data now in message field {:error=>#<LogStash::Json::ParserError: incompatible json object type=java.lang.String , only hash map or arrays are supported>, :data=>"            \"reference\": \"2-4\",\r"}
[2019-03-17T13:59:03,010][ERROR][logstash.codecs.json     ] JSON parse error, original data now in message field {:error=>#<LogStash::Json::ParserError: incompatible json object type=java.lang.String , only hash map or arrays are supported>, :data=>"
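
The `:data` fields in these errors each hold a single line of the file, which suggests the codec receives the document one line at a time. A minimal Python sketch of that behaviour, assuming a line-by-line reader; the sample document is a shortened, hypothetical version of the file and the function name is made up for the illustration:

```python
import json

# Shortened, hypothetical version of the pretty-printed document.
PRETTY = """{
    "foo": [
        {
            "reference": "2-4",
            "trashed": false
        }
    ]
}"""


def parse_line_by_line(text: str) -> list:
    """Try to parse each line on its own and collect the error messages."""
    errors = []
    for line in text.splitlines():
        try:
            json.loads(line)
        except json.JSONDecodeError as exc:
            errors.append(f"{line.strip()!r}: {exc.msg}")
    return errors


# No single line of a pretty-printed document is valid JSON by itself.
errors = parse_line_by_line(PRETTY)
print(f"{len(errors)} of {len(PRETTY.splitlines())} lines failed to parse")

# The same text parses fine when it is handed over as a whole.
whole = json.loads(PRETTY)
print(len(whole["foo"]), "object(s) parsed from the whole document")
```

The document itself is valid; only the line-by-line delivery breaks it, so wrapping the array in an object does not change the outcome.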

elastock · March 17, 2019, 2:14pm

Hello,
Well, in the end I found a partial workaround.
A real hack.
You just tell Logstash to run the cat command (or type on Windows),
and, miracle, it understands the JSON.
Example:

input {
  exec {
    command => "type C:/ELK/Logstash/logstash-6.6.2/conf/test.json"
    codec => json
    interval => 60
  }
}
output {
  stdout { codec => rubydebug }
}

Result:

  "@version" => "1",
   "command" => "type C:/ELK/Logstash/logstash-6.6.2/conf/test.json",
"@timestamp" => 2019-03-17T14:11:11.931Z,
      "host" => "PC-de-philippe",
       "foo" => [
    [0] {
           "authorType" => "Citoyen / Citoyenne",
              "trashed" => false,
            "updatedAt" => nil,
          "publishedAt" => "2019-01-22 09:37:49",
            "reference" => "2-4",
        "trashedStatus" => nil,
            "createdAt" => "2019-01-22 09:37:49",
                "title" => "transition écologique",
             "authorId" => "VXNlcjoxMTQwMTc0YS0xZTFmLTExZTktOTRkMi1mYTE2M2VlYjExZTE="
    },
    [1] {
           "authorType" => "Citoyen / Citoyenne",
              "trashed" => false,
            "updatedAt" => nil,
          "publishedAt" => "2019-01-22 09:39:33",
        "authorZipCode" => "57000",
            "reference" => "2-5",
        "trashedStatus" => nil,
            "createdAt" => "2019-01-22 09:39:33",
                "title" => "La surpopulation",
             "authorId" => "VXNlcjpjOWYxZWQ1NS0xYzEwLTExZTktOTRkMi1mYTE2M2VlYjExZTE="
    },
    [2] {
           "authorType" => "Citoyen / Citoyenne",
              "trashed" => false,
            "updatedAt" => nil,
          "publishedAt" => "2019-01-22 09:39:50",
        "authorZipCode" => "34140",
            "reference" => "2-6",
        "trashedStatus" => nil,
            "createdAt" => "2019-01-22 09:39:50",
                "title" => "climat",
             "authorId" => "VXNlcjozZjlhNzAwOS0xYTc2LTExZTktOTRkMi1mYTE2M2VlYjExZTE="
    },
    [3] {
           "authorType" => "Citoyen / Citoyenne",
              "trashed" => false,
            "updatedAt" => nil,
          "publishedAt" => "2019-01-22 09:40:19",
        "authorZipCode" => "17400",
            "reference" => "2-7",
        "trashedStatus" => nil,
            "createdAt" => "2019-01-22 09:40:19",
                "title" => "POLLUTION AIR EAU",
             "authorId" => "VXNlcjozOWQwNzJjNC0xZDEwLTExZTktOTRkMi1mYTE2M2VlYjExZTE="
    }
]

}

dadoonet · March 17, 2019, 3:49pm

You can also launch Logstash like this:

cat file | bin/logstash

and use the stdin input plugin.

Otherwise, I also like the http input plugin. It is nicer for hot-reloading the configuration.

I showed an example here: https://www.elastic.co/blog/enriching-your-postal-addresses-with-the-elastic-stack-part-2

HTH
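
For the http input route, the client side could look like the following minimal Python sketch. It assumes a pipeline with an `http` input listening locally on port 8080 and decoding `application/json` bodies with the json codec, in which case an array at the root should yield one event per element. The URL and the function names are assumptions for the illustration, not taken from the blog post.

```python
import json
import urllib.request


def build_request(payload: bytes, url: str = "http://localhost:8080") -> urllib.request.Request:
    """Build a POST request carrying a JSON document to a Logstash http input."""
    # Fail early (ValueError) if the payload is not valid JSON.
    json.loads(payload)
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 10.0) -> int:
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Building the request needs no running server; sending it does.
request = build_request(b'[{"reference": "2-4"}, {"reference": "2-5"}]')
print(request.get_method(), request.full_url)
```

With Logstash running, `send(build_request(open("test.json", "rb").read()))` would post the whole file in one request, so the pretty-printed layout no longer matters.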

