Hello,
I am trying to automate the regular (weekly) retrieval of data through an API.
- The requests take this form: https://api.acleddata.com/{data}/{command}.csv
They return a file in CSV format.
- I would like this file to act as a buffer file, i.e. always with the same name and overwritten each time by the next download, to avoid files piling up (a rough script of what I have in mind is sketched after the mapping below).
- Then I would like to automatically index the data from the file (into an index that would be named acld_<import date>), which I currently do by hand. In Kibana, a pipeline like the following is generated after the import (a sketch of how it could be reused outside Kibana follows it):
```
[
  {
    "csv": {
      "field": "message",
      "target_fields": [
        "event_id_cnty", "event_date", "year", "time_precision", "disorder_type",
        "event_type", "sub_event_type", "actor1", "assoc_actor_1", "inter1",
        "actor2", "assoc_actor_2", "inter2", "interaction", "civilian_targeting",
        "iso", "region", "country", "admin1", "admin2", "admin3", "location",
        "latitude", "longitude", "geo_precision", "source", "source_scale",
        "notes", "fatalities", "tags", "timestamp"
      ],
      "ignore_missing": false
    }
  },
  {
    "date": {
      "field": "event_date",
      "timezone": "America/New_York",
      "formats": ["dd MMMM yyyy"]
    }
  },
  { "convert": { "field": "fatalities", "type": "long", "ignore_missing": true } },
  { "convert": { "field": "geo_precision", "type": "long", "ignore_missing": true } },
  { "convert": { "field": "inter1", "type": "long", "ignore_missing": true } },
  { "convert": { "field": "inter2", "type": "long", "ignore_missing": true } },
  { "convert": { "field": "interaction", "type": "long", "ignore_missing": true } },
  { "convert": { "field": "iso", "type": "long", "ignore_missing": true } },
  { "convert": { "field": "latitude", "type": "double", "ignore_missing": true } },
  { "convert": { "field": "longitude", "type": "double", "ignore_missing": true } },
  { "convert": { "field": "time_precision", "type": "long", "ignore_missing": true } },
  { "convert": { "field": "year", "type": "long", "ignore_missing": true } },
  { "remove": { "field": "message" } },
  { "set": { "field": "point_location", "value": "{{latitude}},{{longitude}}" } }
]
```
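As far as I can tell, this generated pipeline could be stored once under a fixed name through the Elasticsearch ingest API (`PUT _ingest/pipeline/<id>`) and then reused by whatever does the automated import. A minimal sketch in Python with `requests`, assuming a local unsecured cluster and a pipeline name of my own (`acled_csv`); `acled_pipeline.json` is simply the processor array above saved to a file:

```
import json

import requests

ES_URL = "http://localhost:9200"   # assumption: local, unsecured cluster
PIPELINE_ID = "acled_csv"          # hypothetical name I would give the pipeline

# acled_pipeline.json = the processor array shown above, saved verbatim
with open("acled_pipeline.json", encoding="utf-8") as f:
    processors = json.load(f)

resp = requests.put(
    f"{ES_URL}/_ingest/pipeline/{PIPELINE_ID}",
    json={"description": "ACLED weekly CSV import", "processors": processors},
)
resp.raise_for_status()
print(resp.json())  # expect {'acknowledged': True} once the pipeline is stored
```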
The mapping applied is this one:
"mappings": {
"_meta": {
"created_by": "file-data-visualizer"
},
"properties": {
"@timestamp": {
"type": "date"
},
"actor1": {
"type": "keyword"
},
"actor2": {
"type": "keyword"
},
"admin1": {
"type": "keyword"
},
"admin2": {
"type": "keyword"
},
"admin3": {
"type": "keyword"
},
"assoc_actor_1": {
"type": "keyword"
},
"assoc_actor_2": {
"type": "keyword"
},
"civilian_targeting": {
"type": "keyword"
},
"country": {
"type": "keyword"
},
"disorder_type": {
"type": "keyword"
},
"event_date": {
"type": "keyword"
},
"event_id_cnty": {
"type": "keyword"
},
"event_type": {
"type": "keyword"
},
"fatalities": {
"type": "long"
},
"geo_precision": {
"type": "long"
},
"inter1": {
"type": "long"
},
"inter2": {
"type": "long"
},
"interaction": {
"type": "long"
},
"iso": {
"type": "long"
},
"latitude": {
"type": "double"
},
"location": {
"type": "keyword"
},
"longitude": {
"type": "double"
},
"notes": {
"type": "text"
},
"point_location": {
"type": "geo_point"
},
"region": {
"type": "keyword"
},
"source": {
"type": "text"
},
"source_scale": {
"type": "keyword"
},
"sub_event_type": {
"type": "keyword"
},
"tags": {
"type": "text"
},
"time_precision": {
"type": "long"
},
"timestamp": {
"type": "date",
"format": "epoch_second"
},
"year": {
"type": "long"
}
}
}
} ```
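To make the goal more concrete, here is a rough sketch in Python of the weekly job I have in mind, outside of Logstash: download the CSV into a fixed buffer file that is overwritten on each run, create an index named acld_<import date> with the mapping above, and push every CSV line through the pipeline registered earlier via the bulk API. The Elasticsearch URL, the pipeline name, the file names and the exact ACLED query parameters are placeholders to adapt:

```
from datetime import date
import json

import requests

ES_URL = "http://localhost:9200"        # assumption: local, unsecured cluster
PIPELINE_ID = "acled_csv"               # the pipeline registered above
CSV_PATH = "acled_buffer.csv"           # fixed buffer file, overwritten every week
API_URL = "https://api.acleddata.com/{data}/{command}.csv"  # fill in {data}/{command} and the required ACLED parameters


def fetch_csv() -> str:
    """Download the weekly export and overwrite the buffer file."""
    resp = requests.get(API_URL, timeout=300)
    resp.raise_for_status()
    with open(CSV_PATH, "w", encoding="utf-8", newline="") as f:
        f.write(resp.text)              # mode "w" overwrites last week's file
    return resp.text


def index_csv(text: str) -> None:
    """Create acld_<import date> with the mapping above and bulk-index each CSV line."""
    index = f"acld_{date.today():%Y.%m.%d}"

    # acled_mapping.json = the {"mappings": ...} object shown above, saved verbatim
    with open("acled_mapping.json", encoding="utf-8") as f:
        mapping = json.load(f)
    requests.put(f"{ES_URL}/{index}", json=mapping).raise_for_status()

    # The generated pipeline expects each raw CSV line in a "message" field,
    # exactly like the Kibana file importer does; the header line is skipped.
    actions = []
    for line in text.splitlines()[1:]:
        actions.append(json.dumps({"index": {}}))
        actions.append(json.dumps({"message": line}))
    resp = requests.post(
        f"{ES_URL}/{index}/_bulk?pipeline={PIPELINE_ID}",
        data=("\n".join(actions) + "\n").encode("utf-8"),  # bulk body must end with a newline
        headers={"Content-Type": "application/x-ndjson"},
    )
    resp.raise_for_status()


if __name__ == "__main__":
    index_csv(fetch_csv())
```

Run weekly (with cron, for example), that would cover the whole chain; for a large export the bulk request would of course need to be split into chunks.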
However, I do not see at all how this could be done with Logstash, or even whether it is possible.
If you have any advice...
Thank you!