Question about Elasticsearch index and Logstash ingestion

Hello everyone
I'd like to ask you two questions.
I receive approximately 270 CSV files per month, about 9 per day. Each file is between 1 kB and 3 MB in size. All of these CSVs are sent to the same Elasticsearch index via Logstash. I would like to know the maximum storage capacity of an index in Elasticsearch.
I would also like to know how to automate the import into Elasticsearch via Logstash, so that each time a new file arrives in the directory it is picked up, without re-ingesting the files that were already there.

The sky is the limit... :stuck_out_tongue:

More seriously, the limit will most likely come from your hardware.
As Elasticsearch is elastic, you can scale out (add more machines) to get even more space...

But I'd recommend not going above 50 GB per shard.

Depending on your use case, you may want to consider using data streams.
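
If you go that route, here is a rough sketch of the Logstash side, assuming a recent elasticsearch output plugin with data stream support (the type, dataset, and namespace values below are illustrative placeholders):

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    # Write into a data stream instead of a plain index; the three
    # name parts below combine into e.g. logs-csv-default.
    data_stream => "true"
    data_stream_type => "logs"
    data_stream_dataset => "csv"
    data_stream_namespace => "default"
  }
}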

How can I shard my index with Logstash?

You can define the number of shards when you create the index in Elasticsearch, or by using an index template.

But I am not creating my index with Elasticsearch; I create it with Logstash.

You need to create an index template in Elasticsearch first.

Do I need to create the index and specify the column names?

You need to create an index template.

You can let Elasticsearch guess what you want to do with the "columns" (we say "fields" in Elasticsearch vocabulary), but I'd definitely recommend defining this before the first document is indexed.

Something like:

PUT _index_template/csv-template
{
  "index_patterns": ["csv*"],
  "template": {
    "settings": {
      "number_of_shards": 1
    },
    "mappings": {
      "properties": {
        "my_field1": {
          "type": "text"
        },
        "my_field2": {
          "type": "date"
        }
      }
    }
  },
  "priority": 100
}

Source: the Elasticsearch index template documentation.

This example assumes that you configure Logstash to send the data to an index whose name starts with csv...

Thank you. But I don't understand this line: "index_patterns": ["csv*"]

This means that the template will be applied to every index whose name starts with csv, so in your Logstash output you would set the output index to csv-something, for example.
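
For example, a minimal Logstash output along those lines (the host and index name are placeholders):

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    # Any name matching the csv* template pattern works;
    # "csv-data" is just an illustrative choice.
    index => "csv-data"
  }
}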

It is explained in the already shared documentation.

Got it. Thank you.

What sort of data are you ingesting? Is it time-based?

Yes!

Hello, I have a question.
After creating my index template, I'd like to send my data to an Elasticsearch index. In my context, I receive new data every day. How can I automate the Logstash task so that it sends the new files that arrive to Elasticsearch every day?

The best way is to run Logstash as a service and configure your input to read files from a specific folder; when a new file appears in that folder, Logstash will automatically read and process it.

How to run Logstash as a service depends on your operating system; just follow the documentation.
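
As a rough sketch, assuming the CSVs land in a single drop folder and have a header row (the paths and column names below are placeholders; the columns are chosen to match the fields declared in the template above):

input {
  file {
    # Watch the drop folder; new files are picked up automatically.
    path => "/path/to/csv/folder/*.csv"
    mode => "tail"
    # The sincedb file records how far each file has been read, so
    # files that were already processed are not ingested again.
    sincedb_path => "/var/lib/logstash/sincedb_csv"
  }
}

filter {
  csv {
    # Explicit column names; skip_header drops the header row itself.
    columns => ["my_field1", "my_field2"]
    skip_header => true
  }
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "csv-data"
  }
}

The sincedb bookkeeping is also what answers the very first question in this thread: files that were already read are not re-ingested when Logstash rescans the folder or restarts.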

Make sure you are using ILM (index lifecycle management) then.
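
The elasticsearch output plugin has built-in ILM options; here is a hedged sketch (the rollover alias, pattern, and policy name are placeholders, and the ILM policy itself must already exist in Elasticsearch):

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    ilm_enabled => true
    # Writes go through this rollover alias; backing indices are
    # created as csv-<date>-000001 and rolled over from there.
    ilm_rollover_alias => "csv"
    ilm_pattern => "{now/d}-000001"
    # Name of an ILM policy previously created in Elasticsearch.
    ilm_policy => "csv-policy"
  }
}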

When I create my index template, if I set the number of shards to 2, what happens when I send my data to the indices?

Hello,
I have one more question.
Let me put it in context:
I've created an index template in Elasticsearch. Then, using Logstash, I created an index that follows this template. Before I can visualize my data in Kibana, I need to create a data view from this index. Every day, a script is launched and Logstash sends new data to my index. My question is: every time new data arrives in my index, does the data view created from this index update automatically, along with the data in my dashboard?

The data in Elasticsearch will be seen by Kibana,
as long as the data view points at the same index, alias...

Thank you