Setting Up Logstash In Docker-Compose For Bulk Ingest Of CSV Files In Local Machine

No, think of it as a link

I did suggest getting rid of the special characters ()
Also make sure there is read access for all users on that directory etc... yup, perhaps the others can help

Yes, because we know that works, you could create a directory under there, copy in a few CSV files, fix all the paths, and see if that works

Also make sure there is read access for all users on that directory etc... yup, perhaps the others can help

Two days back I tried using Exec to change the permissions via chmod and chown. Both commands were not permitted when I ran them.

I don't think I can change any permissions within the file structure of my Docker container.

I was dead-ended there. I don't even know what user I was acting as within the Docker container environment, and I couldn't sudo to find out either.

Yes, because we know that works, you could create a directory under there, copy in a few CSV files, fix all the paths, and see if that works

Sorry, that still doesn't work out. I don't think where I park my data files is a contributing factor to my problem.

[image]

OK, this is on my Mac, and I think you can do the same thing on your Windows machine.
I put the files under csv_files, where the docker compose is run...
I think you can actually use the exact same compose (with your passwords) on Windows if you put the CSV file there.

$ pwd
/Users/sbrown/workspace/elastic-install/docker/8.x/logstash
hyperion:logstash sbrown$ ls
csv_files/              docker-compose.yml      logstash.conf
hyperion:logstash sbrown$ ls csv_files/
test.csv

hyperion:logstash sbrown$ cat csv_files/test.csv 
id,name,code
123,ferarri,sports
234,honda,family
hyperion:logstash sbrown$ 

All in the same directory

Here is my Logstash docker-compose.yml

networks:
  default:
    name: elastic
    external: true
    
services:
  logstash:
    image: docker.elastic.co/logstash/logstash:${STACK_VERSION}
    labels:
      co.elastic.logs/module: logstash
    user: root
    environment:
      - xpack.monitoring.enabled=false
    volumes:
      - ./csv_files:/usr/share/logstash/csv_files
      - ./:/usr/share/logstash/pipeline/
    command: logstash -r -f /usr/share/logstash/pipeline/logstash.conf
    mem_limit: ${LS_MEM_LIMIT}
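For reference, ${STACK_VERSION} and ${LS_MEM_LIMIT} are resolved from a .env file sitting next to the compose file. A minimal sketch with example values only (your actual .env will have different values and more entries):

# .env (example values only -- adjust to your stack)
STACK_VERSION=8.11.1
LS_MEM_LIMIT=1073741824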

Here is my logstash.conf

input { 
    file { 
        path => "/usr/share/logstash/csv_files/test.csv"
        start_position => "beginning" 
        sincedb_path => "/dev/null"
    } 
}

filter { 
    csv { 
        separator => ","
        columns => [ "id","name","code"] 
    } 
}

output { 
    elasticsearch {
        index => "ats-logs" 
        hosts => ["https://es01:9200"]
        user => "elastic"
        password => "mypassword"
        ssl_verification_mode => "none"
    }
    stdout{ codec => "rubydebug"} 
}

Here is the docker exec; you can see the files:
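Something along these lines, as a hedged sketch (the container name matches the log prefix below and may differ on your machine):

docker exec -it logstash-logstash-1 ls -la /usr/share/logstash/csv_files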

And the output of logstash

logstash-logstash-1  | /usr/share/logstash/vendor/bundle/jruby/3.1.0/gems/manticore-0.9.1-java/lib/manticore/client.rb:534: warning: already initialized constant Manticore::Client::ByteArrayEntity
logstash-logstash-1  | {
logstash-logstash-1  |           "host" => {
logstash-logstash-1  |         "name" => "a43161456fb3"
logstash-logstash-1  |     },
logstash-logstash-1  |           "name" => "name",
logstash-logstash-1  |        "message" => "id,name,code",
logstash-logstash-1  |          "event" => {
logstash-logstash-1  |         "original" => "id,name,code"
logstash-logstash-1  |     },
logstash-logstash-1  |            "log" => {
logstash-logstash-1  |         "file" => {
logstash-logstash-1  |             "path" => "/usr/share/logstash/csv_files/test.csv"
logstash-logstash-1  |         }
logstash-logstash-1  |     },
logstash-logstash-1  |     "@timestamp" => 2023-11-16T19:29:22.555841418Z,
logstash-logstash-1  |       "@version" => "1",
logstash-logstash-1  |             "id" => "id",
logstash-logstash-1  |           "code" => "code"
logstash-logstash-1  | }
logstash-logstash-1  | {
logstash-logstash-1  |           "host" => {
logstash-logstash-1  |         "name" => "a43161456fb3"
logstash-logstash-1  |     },
logstash-logstash-1  |           "name" => "ferarri",
logstash-logstash-1  |        "message" => "123,ferarri,sports",
logstash-logstash-1  |          "event" => {
logstash-logstash-1  |         "original" => "123,ferarri,sports"
logstash-logstash-1  |     },
logstash-logstash-1  |            "log" => {
logstash-logstash-1  |         "file" => {
logstash-logstash-1  |             "path" => "/usr/share/logstash/csv_files/test.csv"
logstash-logstash-1  |         }
logstash-logstash-1  |     },
logstash-logstash-1  |     "@timestamp" => 2023-11-16T19:29:22.561935993Z,
logstash-logstash-1  |       "@version" => "1",
logstash-logstash-1  |             "id" => "123",
logstash-logstash-1  |           "code" => "sports"
logstash-logstash-1  | }
logstash-logstash-1  | {
logstash-logstash-1  |           "host" => {
logstash-logstash-1  |         "name" => "a43161456fb3"
logstash-logstash-1  |     },
logstash-logstash-1  |           "name" => "honda",
logstash-logstash-1  |        "message" => "234,honda,family",
logstash-logstash-1  |          "event" => {
logstash-logstash-1  |         "original" => "234,honda,family"
logstash-logstash-1  |     },
logstash-logstash-1  |            "log" => {
logstash-logstash-1  |         "file" => {
logstash-logstash-1  |             "path" => "/usr/share/logstash/csv_files/test.csv"
logstash-logstash-1  |         }
logstash-logstash-1  |     },
logstash-logstash-1  |     "@timestamp" => 2023-11-16T19:30:43.731945488Z,
logstash-logstash-1  |       "@version" => "1",
logstash-logstash-1  |             "id" => "234",
logstash-logstash-1  |           "code" => "family"
logstash-logstash-1  | }

And the Data is loaded in elasticsearch

in the Kibana Dev Tools

GET ats-logs/_search

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 5,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "ats-logs",
        "_id": "Ul2X2YsBRz0ztDDX5gqo",
        "_score": 1,
        "_source": {
          "@version": "1",
          "event": {
            "original": "123,ferarri,sports"
          },
          "name": "ferarri",
          "code": "sports",
          "message": "123,ferarri,sports",
          "id": "123",
          "log": {
            "file": {
....

So I think you just need to get the directory and path correct

I don't know what this means

Look closely at my directory structure above... and the picture of it right here
Simple... somehow you are making this too complicated.

This is all I have; not sure what you are doing...

This is my complete fileset and works


It's my mistake!

I was putting my ats-logs-mainline folder one level above my logstash-standard folder.

I just moved it in!

The magic has happened and it's growing!

This is amazing. Thank you so much for your help.

So now, how do I extend this to cover all my files?

The reason I had to give my data folder a name is that I might be receiving more data of other types.

So in the future, I will need to ingest more CSV files into a separate index to classify them.

I'm not sure whether, when that time comes and I ingest new logs, my current ats-logs index / data will persist.

For my logstash.conf and .yml, do I just need to change which folder I'm piping from in order to ingest new logs? Changing these pointers, like

    volumes:
      - ./:/usr/share/logstash/pipeline/
      - ./ats-logs-mainline:/usr/share/logstash/csv_files

and

path => "/usr/share/logstash/csv_files/events2022-01-01.csv"

won't affect existing indices piped in previously?
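To make the intended change concrete, a hedged sketch of the pieces that would change for a new log type (the folder and index names here are only illustrative):

input {
    file {
        # hypothetical new source folder, mounted to the same container path
        path => "/usr/share/logstash/csv_files/*.csv"
        start_position => "beginning"
        sincedb_path => "/dev/null"
    }
}

output {
    elasticsearch {
        index => "other-logs"   # a separate index for the new log type; existing indices are untouched
        hosts => ["https://es01:9200"]
        user => "elastic"
        password => "mypassword"
        ssl_verification_mode => "none"
    }
}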


Most Excellent!!!

It should not; that data will already be in Elasticsearch.

However, does this mean:

  1. once I docker-compose down my ES/Kibana containers, I lose all my data?
  2. if I change any ES/Kibana configuration, I risk losing all my piped data?
  3. if I want to upgrade the ELK version in the future, I have to re-pipe everything?

You really need to start learning more about Docker, volumes, and Elasticsearch.

First and foremost, TEST TEST TEST.... :)

docker-compose down

destroys the containers (so you can change / update / upgrade them), but it does not destroy the volumes, i.e. where the Elasticsearch data resides. The volumes persist.

However, you CAN make changes that may not be compatible... but generally that is not the case...

If you delete the volumes then you will lose your data.

If you run the containers without volumes (you are using volumes today), then when you docker-compose down you will lose your data.
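For concreteness, a hedged sketch of what "using volumes" looks like in a compose file (the esdata01 name matches the line quoted further down in this thread; the rest is illustrative):

services:
  es01:
    # ... image, environment, ports etc. as in your existing compose ...
    volumes:
      - esdata01:/usr/share/elasticsearch/data   # named volume: survives docker-compose down

volumes:
  esdata01:   # managed by Docker; deleted only by `docker-compose down -v` or `docker volume rm`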

So back to test test test.... so that you can understand

Apologies if I occasionally ask silly questions; the documentation isn't always able to give me the very matter-of-fact kind of answer.

And also because I'm under time pressure to deliver something for the project.

I'm a little scared to test at the expense of breaking something we've worked so hard to get.

I know I'm kind of asking ahead in a planning sense, so that I know the risks of putting a perfectly working configuration in danger.

Oh no.

I left my docker containers and pipeline to continuously pipe in data overnight.

I just woke up and my storage is now full. My Docker crashed on me overnight.

I had 196GB of free space.

My data was only 10.7GB.

I'm forced to restart my machine. Why is this happening?

[image]

[image]

On restart, my laptop has only 16GB of space left on the C drive. This is not normal; I cannot seem to find why so much space is being taken up!
[image]

I hope you now understand why I originally preferred to host my data on the D drive instead of copying it over to the same location as my Logstash docker setup on C.

This is frustrating. I'm so sickened by how problems arise again despite trying so hard to get this to work. Why is it just so difficult to get something up and running?

@stephenb

Docker containers wise, I just started up my containers. Logstash exited with error code 255. There are no error messages inside the logs for logstash-1.

At 09:05:30 (GMT+8), the crash happened. I woke up at 11:40 (GMT+8).
[image]

Docker Volumes

I have continued the pipeline process. The number of rows continues to grow.

I'm not sure if duplicate entries from my CSV files are now going to be ingested into Elasticsearch. How do I remove duplicates?
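One common approach (not something specific to this setup) is to derive the Elasticsearch document _id from the row content with the fingerprint filter, so re-reading the same line overwrites the same document instead of creating a new one. A hedged sketch, added alongside the existing csv filter:

filter {
    fingerprint {
        # hash the raw CSV line so identical rows always produce the same id
        source => ["message"]
        target => "[@metadata][fingerprint]"
        method => "SHA256"
    }
}

output {
    elasticsearch {
        index => "ats-logs"
        document_id => "%{[@metadata][fingerprint]}"   # dedupes on re-ingest
        # ... hosts / user / password / ssl settings as before ...
    }
}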


In Explorer, all my 537 CSV files were 10.7GB.

Now in Elasticsearch, it's 26.43GB.

I'm not sure how to react to this. I still don't know why I suddenly have insufficient space despite having several hundred GB to work with.

@Ethan777100

Let's take a step back.

We have spent more than 50 responses showing you how to debug / set up the Docker containers etc. JUST to get it to work...

Not necessarily how you wanted to use all this in the long term, and yes, I understood WHY you wanted to leave the files where they were, but we were debugging to get it to JUST work for you... as there were challenges understanding paths etc.

You never communicated the data volumes etc., which, for a laptop, are quite significant.

I repeatedly reminded you to try to get it working with 1 file... 1 file.

AFTER you did that and knew how everything worked... you then could have configured your paths, both the sources and the Elasticsearch data, where you wanted.

What I would suggest is to clean up all the volumes...

Set up the Elasticsearch data volumes where you have space.
You will probably need to read more about volume mounts vs bind mounts.

- esdata01:/usr/share/elasticsearch/data

Perhaps you should read this...

You can also then leave your sources where they are and set up the paths as you have learned...
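For contrast with the named volume above, a hedged sketch of the bind-mount form, with a hypothetical path on the D: drive (the advice here still leans toward a volume mount):

services:
  es01:
    volumes:
      # long-form bind mount: keep the Elasticsearch data at an explicit host path
      - type: bind
        source: D:/elastic/esdata01   # hypothetical folder on the drive with free space
        target: /usr/share/elasticsearch/data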

So from now on, you should learn about mappings (schema).

And replicas, of which you have 1, which I find odd... did you go back to 3 nodes? I would have expected 1 primary and 0 replicas if you were running 1 node.

You don't need 1 replica, as it doubles the space...

Also, you did not create a mapping, so you got the default one, which takes up more space...

It seems that you are trying to do the minimum without some of the basic understanding of Elasticsearch, which is fine... but then you should not be surprised.

We have been helping you with Docker and Logstash and all these things... these are just about setting up the data ingest... but you have not spent any time learning about Elasticsearch itself...

What I would do.....

Clean up everything including the volumes (that will free up your filesystem space again).

Now set up where you want to read the sources from.

I would only run 1 node... I'm still confused whether you went back to 3... you should not have a replica with 1 node.

Determine, for the Elasticsearch data, whether you want to use a volume mount or a bind mount (a volume mount would probably be good).

I would upload 1 CSV file via the file uploader; look at the types, and it will upload the file and create an index and mapping etc...

Then I would set everything up and upload 1 file via Logstash etc... to the SAME index that you uploaded to, and see if you like the way it works.
Make sure everything is OK.

Then I would probably divide up your sources into subdirectories, or load some subset of them at a time with a wildcard (*), for example:

events2022-01*.csv
events2022-02*.csv
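A hedged sketch of how one of those patterns might be used in the file input (the path option accepts glob patterns):

input {
    file {
        # one month of files at a time; widen the glob once each batch looks right
        path => "/usr/share/logstash/csv_files/events2022-01*.csv"
        start_position => "beginning"
        sincedb_path => "/dev/null"
    }
}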

In short ... be methodical ... Plan, Code, Test, Run, Fix... repeat THEN go for volume.

I did not go back to 3 nodes. I've always stuck with one.

So you would like me to sweep everything clean (ES/Kibana and Logstash) with docker-compose down

and docker-compose up -d again?

OK good... sorry, so it is asking for 1 primary + 1 replica, but it looks like you only have 1 primary allocated... which is why it is yellow.

So that is not what is causing the double space.

But you are using the default schema, which makes 2 fields for every field (a keyword and a text), which is more than you probably need...

Space can be compacted when you are done...

So Again... you probably should have loaded 10 or 20 files and looked at the space first...

Sorry, but we can not do this all for you... my suggestion is to slow down... and iterate...

So when I look at your Docs vs Space

When I look at your docs vs space, I see 27GB for 51,159,726 docs: 27*1024^3 / 51,159,726 ≈ 567 bytes per document for the source and all the data structures to search, filter, and aggregate... that does not seem too bad.

With a proper mapping (schema), I'm sure the space could be reduced...

Why all of your filesystem space is being taken up, I'm not sure...

@Ethan777100 Let me be clear: I have been very happy to help, but I cannot do this project for you... Slow down and consider your steps.

Me, I would not just leave the system ingesting for hours on its own and walk away; I would plan, segment, test, run, test etc...

You need to figure out what works for you, understand the components and process, and proceed in an orderly fashion.

It's late here... good luck.


Regarding mapping / schema

{
  "mappings": {
    "properties": {
      "@timestamp": {
        "type": "date"
      },
      "@version": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "alarm": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "alarmvalue": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "column19": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "description": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "equipment": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "event": {
        "properties": {
          "original": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
        }
      },
      "eventtype": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "graphicelement": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "host": {
        "properties": {
          "name": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
        }
      },
      "id": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "location": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "log": {
        "properties": {
          "file": {
            "properties": {
              "path": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                  }
                }
              }
            }
          }
        }
      },
      "message": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "mmsstate": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "operator": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "severity": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "sourcetime": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "state": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "subsystem": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "system": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "uniqueid": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "value": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "zone": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      }
    }
  }
}

This is what I retrieved, and yeah, I know it's a JSON file. 19+ columns, and I don't really see anything weird about this?

It was a 1-to-1 mapping of column to object (that's how I understand it).

But you are using the default schema, which makes 2 fields for every field (a keyword and a text), which is more than you probably need...

More than I need?

I just read up about mappings inside Mapping | Elasticsearch Guide [8.11] | Elastic

I don't think it looks that straightforward to edit the fields directly.

I realised quite a lot of my fields have been read as text and not numbers.

Could this help me reduce file size?

Meanwhile: Docker Hard Disk Image File Being Too Large - Elastic Stack / Logstash - Discuss the Elastic Stack

Yes, this is the default mapping I was speaking of. I suspect many of these fields are exact matches or need to be aggregated, not free-text searched. So instead, the mapping should look something like this for most of your fields:

"alarm": {
        "type": "keyword",
         "ignore_above": 256
        
      },

Yes, also set the fields as numbers for the ones you want.
You simply have to create the mapping through Dev Tools ahead of time and then write to that index. It's very simple and straightforward. I would suggest trying it with one file and seeing what the results are.

Also, for the time fields you should probably make them a date type... but you need to pay attention to the format.
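A hedged sketch of doing that in Dev Tools, with an illustrative index name and just a few of the fields from this thread (adjust the types and date formats to your data):

PUT ats-logs-typed
{
  "mappings": {
    "properties": {
      "alarm":      { "type": "keyword" },
      "severity":   { "type": "long" },
      "sourcetime": { "type": "date", "format": "yyyy-MM-dd HH:mm:ss.SSS||yyyy-MM-dd HH:mm:ss" }
    }
  }
}

Then point the Logstash output's index option at that name so the documents land with the intended types.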

This is interesting. So I managed to retrieve the mapping and pipeline.json from the Import File approach.

I uploaded 1 CSV file under Integrations > Upload File.

mapping.json

{
  "properties": {
    "@timestamp": {
      "type": "date"
    },
    "alarm": {
      "type": "keyword"
    },
    "alarmvalue": {
      "type": "long"
    },
    "description": {
      "type": "text"
    },
    "equipment": {
      "type": "keyword"
    },
    "eventtype": {
      "type": "keyword"
    },
    "graphicelement": {
      "type": "long"
    },
    "id": {
      "type": "long"
    },
    "location": {
      "type": "keyword"
    },
    "mmsstate": {
      "type": "long"
    },
    "operator": {
      "type": "keyword"
    },
    "severity": {
      "type": "long"
    },
    "sourcetime": {
      "type": "date",
      "format": "yyyy-MM-dd HH:mm:ss.SSS||yyyy-MM-dd HH:mm:ss.SS||yyyy-MM-dd HH:mm:ss||yyyy-MM-dd HH:mm:ss.S"
    },
    "state": {
      "type": "long"
    },
    "subsystem": {
      "type": "keyword"
    },
    "system": {
      "type": "keyword"
    },
    "uniqueid": {
      "type": "keyword"
    },
    "value": {
      "type": "keyword"
    },
    "zone": {
      "type": "long"
    }
  }
}

vs

{
  "mappings": {
    "properties": {
      "@timestamp": {
        "type": "date"
      },
      "@version": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "alarm": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "alarmvalue": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "column19": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "description": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "equipment": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "event": {
        "properties": {
          "original": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
        }
      },
      "eventtype": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "graphicelement": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "host": {
        "properties": {
          "name": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
        }
      },
      "id": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "location": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "log": {
        "properties": {
          "file": {
            "properties": {
              "path": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                  }
                }
              }
            }
          }
        }
      },
      "message": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "mmsstate": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "operator": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "severity": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "sourcetime": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "state": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "subsystem": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "system": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "uniqueid": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "value": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "zone": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      }
    }
  }
}

There are more fields mapped as long when I upload 1 CSV.

When I ingest via the Logstash pipeline, everything comes out as text/keyword, even if the entire column is a number.

ingest-pipeline
Not sure what I should look out for in the ingest pipeline:

{
  "description": "Ingest pipeline created by text structure finder",
  "processors": [
    {
      "csv": {
        "field": "message",
        "target_fields": [
          "id",
          "uniqueid",
          "alarm",
          "eventtype",
          "system",
          "subsystem",
          "sourcetime",
          "operator",
          "alarmvalue",
          "value",
          "equipment",
          "location",
          "severity",
          "description",
          "state",
          "mmsstate",
          "zone",
          "graphicelement"
        ],
        "ignore_missing": false
      }
    },
    {
      "date": {
        "field": "sourcetime",
        "timezone": "{{ event.timezone }}",
        "formats": [
          "yyyy-MM-dd HH:mm:ss.SSS",
          "yyyy-MM-dd HH:mm:ss.SS",
          "yyyy-MM-dd HH:mm:ss",
          "yyyy-MM-dd HH:mm:ss.S"
        ]
      }
    },
    {
      "convert": {
        "field": "alarmvalue",
        "type": "long",
        "ignore_missing": true
      }
    },
    {
      "convert": {
        "field": "graphicelement",
        "type": "long",
        "ignore_missing": true
      }
    },
    {
      "convert": {
        "field": "id",
        "type": "long",
        "ignore_missing": true
      }
    },
    {
      "convert": {
        "field": "mmsstate",
        "type": "long",
        "ignore_missing": true
      }
    },
    {
      "convert": {
        "field": "severity",
        "type": "long",
        "ignore_missing": true
      }
    },
    {
      "convert": {
        "field": "state",
        "type": "long",
        "ignore_missing": true
      }
    },
    {
      "convert": {
        "field": "zone",
        "type": "long",
        "ignore_missing": true
      }
    },
    {
      "remove": {
        "field": "message"
      }
    }
  ]
}
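One way to check what the pipeline does to a single row, as a hedged sketch, is the simulate API in Dev Tools (the pipeline id and the sample line are made up; the line just follows the column order above):

POST _ingest/pipeline/ats-logs-pipeline/_simulate
{
  "docs": [
    {
      "_source": {
        "message": "123,EVT-1,ALM,alarm,SYS,SUB,2022-01-01 00:00:00.000,op1,1,0,EQ1,LOC1,2,test event,1,0,3,42",
        "event": { "timezone": "+08:00" }
      }
    }
  ]
}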

The top mapping.json is much closer to what you want....

You might need to fix a few.... just change the mapping... to long etc.

And you could just use Logstash to read the file, and then set the ingest pipeline on the Logstash output so the ingest pipeline runs as the data comes in.

pipeline

  • Value type is string
  • There is no default value.

Set which ingest pipeline you wish to execute for an event. You can also use event dependent configuration here like pipeline => "%{[@metadata][pipeline]}". The pipeline parameter won’t be set if the value resolves to empty string ("").

input { 
    file { 
        path => "/usr/share/logstash/csv_files/test.csv"
        start_position => "beginning" 
        sincedb_path => "/dev/null"
    } 
}

output { 
    elasticsearch {
        index => "discuss-test" <!--- my test index... 
        hosts => ["https://es01:9200"]
        user => "elastic"
        password => "mypassword"
        ssl_verification_mode=> "none"
        pipeline => "discuss-test-pipeline"  <!--- This is from the file uploaded
    }
    stdout{ codec => "rubydebug"} 
}

Note the above is with my test data ... but I think you see the pattern...

You can see the ingest pipeline and index in Stack Management ...
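They can also be checked from Dev Tools; a hedged sketch using the example names above:

# the ingest pipeline created by the file upload
GET _ingest/pipeline/discuss-test-pipeline

# the mapping and document count for the test index
GET discuss-test/_mapping
GET discuss-test/_count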


