James said, "As christian has mentioned above, ES-Hadoop will not create an index name using the current time, but rather will let you give a field name on your document to use in the index name creation. This field can contain a timestamp, which is usually what users want when they are using time based indices."
What field name does ES-Hadoop (or possibly ES) look for when creating the index name? And how do we configure which field to use?
I had to combine two examples (one and two) to get it working for Structured Streaming time-series indices.
In Scala it looks something like this:
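import org.apache.spark.sql.functions.{col, date_format}

// Assumes `spark` (a SparkSession) and `schema` (a StructType) are already defined.
val df = spark.readStream.schema(schema)
  .parquet("/path-to/*.parquet") // assuming the Parquet files have a `timestamp` field
  .withColumn("@timestamp", date_format(col("timestamp"), "yyyy")) // format the value as you want it to appear in the index name

val stream = df.writeStream
  .option("checkpointLocation", "/save/location")
  .format("es")
  .start("timeseries-{@timestamp}/doc") // {@timestamp} is filled in from each row's @timestamp column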
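To make the placeholder idea concrete, here is a small plain-Scala sketch of how a `{field}` pattern in the resource string could be resolved per document. It is only an illustration of the substitution and not ES-Hadoop's actual code. The names `IndexPatternSketch` and `resolveIndex` are made up for this example, and the document is modelled as a simple `Map[String, String]`.

```scala
import scala.util.matching.Regex

// Illustration only: mimics filling "{field}" placeholders in a resource
// string from a document's fields. This is NOT ES-Hadoop's implementation.
object IndexPatternSketch {
  // Matches "{fieldName}" and captures the field name.
  private val Placeholder: Regex = """\{([^}]+)\}""".r

  // Hypothetical helper: replaces every placeholder with the value of the
  // same-named field, failing loudly if the document lacks that field.
  def resolveIndex(pattern: String, doc: Map[String, String]): String =
    Placeholder.replaceAllIn(pattern, m => {
      val field = m.group(1)
      val value = doc.getOrElse(field, sys.error(s"missing field: $field"))
      Regex.quoteReplacement(value) // keep '$' and '\' in values literal
    })

  def main(args: Array[String]): Unit = {
    val doc = Map("@timestamp" -> "2019", "message" -> "hello")
    println(resolveIndex("timeseries-{@timestamp}/doc", doc)) // prints timeseries-2019/doc
  }
}
```

So the field name is whatever you put between the braces, and there is no fixed name that ES-Hadoop looks for. The ES-Hadoop configuration docs also describe a formatted form, `{@timestamp|yyyy.MM.dd}`, which may remove the need for the extra `date_format` column. Whether it behaves the same under Structured Streaming is worth testing before relying on it.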