Split data into multiple indices automatically on insert


#1

Hi guys,

What I'd like is this: when I insert my data, which has a timestamp field, it should automatically be split across weekly indices, and if necessary a new index with a pre-defined mapping should be created.

Is there a way to do that with ES config, or do I have to do it in business logic outside Elasticsearch?


(Aditya Budi Utomo) #2

If you use multiple nodes in a server, Elasticsearch splits the data automatically; you only have to configure the same cluster_name.


#3

Thanks for your reply. Maybe you misunderstood me.

Let's say I have data from 2016.01.01 to 2016.01.20.
When I start to insert my data, I want ES to create an index named mytype-20160101 that stores all the data where the timestamp field is less than 2016.01.08 00:00:00.
Next, it should create a new index named mytype-20160108 that stores all the data where the timestamp field is between 2016.01.08 00:00:00 and 2016.01.15 00:00:00
... and so on.

If I use Logstash for inserting the data, can I achieve this?


(Magnus Bäck) #4

When I start to insert my data, I want ES to create an index named mytype-20160101 that stores all the data where the timestamp field is less than 2016.01.08 00:00:00.

ES won't do this automatically. You have to tell it which index to store a new document in.

If I use Logstash for inserting the data, can I achieve this?

Yes.
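
For what it's worth, here's a minimal sketch of what that could look like on the Logstash side, assuming your events already carry a parsed @timestamp (the index name pattern and host are illustrative):

output {
    elasticsearch {
        hosts => ["localhost:9200"]
        # %{+xxxx.ww} expands to the ISO week-year and week number of
        # @timestamp (e.g. mytype-2016.02), giving one index per week.
        index => "mytype-%{+xxxx.ww}"
    }
}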


#5

Thanks for your reply, Magnus. I'm importing my data with Logstash, and I want it to go into monthly indices. I have a date field called createDate, and that's what I want to use.

Here's my config:

input {
    jdbc {
        jdbc_connection_string => "jdbc:mysql://localhost:3306/mydb"
        jdbc_user => "user"
        jdbc_password => "pass"
        jdbc_driver_library => "mysql-connector-java-5.1.38-bin.jar"
        jdbc_driver_class => "com.mysql.jdbc.Driver"
        jdbc_paging_enabled => true
        jdbc_page_size => 2
        # just for testing: fetch only the first ten rows
        statement => "SELECT * FROM mytable LIMIT 10 OFFSET 0"
    }
}

output {
    elasticsearch {
        hosts => ["localhost:9200"]
        # here's the trick: reference the field with sprintf syntax
        index => "my-index-%{myformatteddate}"
        document_type => "mydoc"
    }
}

What filter should I apply, and how, to create a new field myformatteddate that contains the year and month of createDate?


(Magnus Bäck) #6

One typically uses the date filter to parse date strings and convert them to ISO8601 format in UTC. Then you can use the %{+YYYY.MM.dd} notation in the index name to have Logstash insert the date from the @timestamp field into the index name. Exactly what the date filter should look like depends on what the createDate field looks like; I haven't used the jdbc input, so I'm not sure what happens with date fields. Use a stdout { codec => rubydebug } output until you get what you want, and don't jump into Elasticsearch prematurely.
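
A minimal sketch of how those pieces could fit together, assuming createDate arrives from the jdbc input as a string like "2016-01-05 13:42:00" (the match pattern is an assumption; check the rubydebug output and adjust it to the actual format):

filter {
    # Parse createDate into @timestamp (ISO8601, UTC).
    date {
        match => ["createDate", "yyyy-MM-dd HH:mm:ss"]
        timezone => "UTC"
    }
}

output {
    # Inspect events here first; switch to elasticsearch once they look right.
    stdout { codec => rubydebug }

    elasticsearch {
        hosts => ["localhost:9200"]
        # %{+YYYY.MM} expands to the year and month of @timestamp,
        # e.g. my-index-2016.01, giving one index per month. No separate
        # myformatteddate field is needed.
        index => "my-index-%{+YYYY.MM}"
        document_type => "mydoc"
    }
}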

