Importing multiple large CSV and JSON files into a single index

I have a folder containing 20 GB of data, and this folder contains 26 subfolders sorted city-wise. Each of these subfolders contains many more subfolders made up of CSV and JSON files (the data stored in the files has some fields in common and some that differ) that I want to upload in bulk to Elasticsearch. Apart from this, I also want to be able to specify the index name and mapping during the upload. For this, I have the following queries/requests:

  1. Is this possible?
  2. Can anyone please help out with a detailed explanation as to how this can be done, as I'm a beginner in this field?

  1. Yes
  2. Yes!

You can tell Filebeat to look for particular files in its input and then have those in specific input sections, e.g. one for *.csv and one for *.json. On the input you can also tag each event.

Then, when you send them to Elasticsearch, you can tell the output to route only specific events, so the CSV or JSON ones will go to the index you specify in each output section; you can also set the mapping there.

Take a look at filestream input | Filebeat Reference [8.6] | Elastic and Configure the Elasticsearch output | Filebeat Reference [8.6] | Elastic.
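To make this a bit more concrete, here is a rough filebeat.yml sketch of the idea. The data path, tag names, and index names are placeholders you would adapt to your own folder layout, so treat it as a starting point rather than a drop-in config:

filebeat.inputs:
  - type: filestream
    id: csv-files
    paths:
      - 'E:\data\**\*.csv'              # placeholder path; ** is expanded recursively by the filestream scanner
    tags: ["csv"]

  - type: filestream
    id: json-files
    paths:
      - 'E:\data\**\*.json'
    tags: ["json"]
    parsers:
      - ndjson:
          target: ""                    # decode each JSON line into the event itself

output.elasticsearch:
  hosts: ["https://localhost:9200"]     # or point at your cloud deployment via cloud.id / cloud.auth
  indices:
    - index: "city-data-csv"            # events tagged "csv" land in this index
      when.contains:
        tags: "csv"
    - index: "city-data-json"           # events tagged "json" land in this index
      when.contains:
        tags: "json"

# Custom index names also need a matching template name/pattern,
# and ILM has to be disabled or it will override the index settings:
setup.ilm.enabled: false
setup.template.name: "city-data"
setup.template.pattern: "city-data-*"

One thing to keep in mind: Filebeat ships each CSV line as a plain message field, so splitting the columns into proper fields (and therefore the mapping you want) is usually done with an ingest pipeline or a custom index template on the Elasticsearch side.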

Can you please provide an example of exactly how this can be done, along with a step-wise set of instructions? I have referred to multiple pieces of documentation related to this, but I find the instructions a bit confusing.

Sorry for this, but I'm new to Elasticsearch and took this up as my first project.

How about you share what you have tried and what's not working and we can offer suggestions? That way we can help point out any mistakes and make it easier to learn.

Alright, that makes sense. To start with, I'm facing issues with connecting Filebeat to Elasticsearch and Kibana using cloud.auth (I'm unable to find it in the deployment overview's Security section).

E:\elastic\filebeats\Elastic\Beats\filebeat>filebeat -e -E cloud.id="DETAILS -E cloud.auth="DETAILS"
Usage:
  filebeat [flags]
  filebeat [command]

Available Commands:
  export      Export current config or index template
  generate    Generate Filebeat modules, filesets and fields.yml
  help        Help about any command
  keystore    Manage secrets keystore
  modules     Manage configured modules
  run         Run filebeat
  setup       Setup index template, dashboards and ML jobs
  test        Test config
  version     Show current version info

Flags:
  -E, --E setting=value              Configuration overwrite
  -M, --M setting=value              Module configuration overwrite
  -N, --N                            Disable actual publishing for testing
  -c, --c string                     Configuration file, relative to path.config (default "filebeat.yml")
      --cpuprofile string            Write cpu profile to file
  -d, --d string                     Enable certain debug selectors
  -e, --e                            Log to stderr and disable syslog/file output
      --environment environmentVar   set environment being ran in (default default)
  -h, --help                         help for filebeat
      --httpprof string              Start pprof http server
      --memprofile string            Write memory profile to this file
      --modules string               List of enabled modules (comma separated)
      --once                         Run filebeat only once until all harvesters reach EOF
      --path.config string           Configuration path (default "")
      --path.data string             Data path (default "")
      --path.home string             Home path (default "")
      --path.logs string             Logs path (default "")
      --strict.perms                 Strict permission checking on config files (default true)
  -v, --v                            Log at INFO level

Use "filebeat [command] --help" for more information about a command.

This is the response I'm getting after I reset the password in the Security section of the deployment that I created on Elastic Cloud.
I also made changes in the filebeat.yml file directly; hope that works.
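For reference, the change I made in filebeat.yml is along these lines (with my actual values in place of the placeholders below):

cloud.id: "My_Deployment:BASE64-ENCODED-CLOUD-ID"   # copied from the deployment overview page
cloud.auth: "elastic:PASSWORD-I-RESET"              # user:password for the deployment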

Additionally, I'm unable to figure out exactly how to tell Filebeat to look for particular files in its input.

Since this is your first Elastic project, I'd also recommend taking a look at uploading the files through Kibana. By default it can handle CSV and JSON files up to 100 MB in size. It will let you get the data in easily and get more familiar with mappings etc. without too much overhead. Once you are more comfortable with that, it might help with configuring Filebeat as well.

You can upload files through Kibana by going to Integrations and then searching for "Upload".

The only question I have regarding this is that I have a very large number of files. Via Kibana I'd have to upload every file individually and create indices for each of them. In this case, is there any way I can upload multiple files under a common index during the import process?

If it’s more than a one-off, using Filebeat (or Logstash) would probably be best then.

I have edited your post; you really want to avoid posting auth details publicly like that.

filebeat -E cloud.id="DETAILS" -E cloud.auth="DETAILS" is what you should need to use.

"Access is denied" is the error that gets displayed when I run the above command.

Please make sure you share the full command you are running and the error; it helps us help you 🙂

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.