Among other things, our logs record geo-coordinates and their accuracy in two "string" fields. There are currently over 3,200,000 logs going back to the beginning of 2021. Now we want to display these coordinates on a map, but the map requires the "geo_point" type instead of "string", so the "geolocation" field in all of our logs has the wrong data type. The other fields in the logs work fine for other evaluations.
How can we get the data from both the existing and the new logs to show up on the map? I've already read about the mapping API and scripts, but unfortunately I don't know how to use them. Does anyone know of a tutorial or an example of how the data can be mapped or converted? The logs and the index must not be deleted, however.
Can the mapping of the logs be corrected easily during operation? With this query, the "geolocation" field would be set to "geo_point", wouldn't it? All the other fields would remain as they are, so they are not included here.
Or could a new field "geolocationNew" be created with the type "geo_point"? For this I would rebuild the query from above, using "geolocationNew" instead of "geolocation". Then all new logs could perhaps be written to the "geolocationNew" field. But how could the old logs with the wrong field be written to the new one?
Elasticsearch encourages you not to modify your existing indices. Still, if you are OK with modifying your current index, you can update your mapping and run an update by query, as the documentation states (bold is mine):
If no query is specified, performs an update on every document in the data stream or index without modifying the source, which is useful for picking up mapping changes.
Of course, be sure you have backups of your data.
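As a sketch of that first option (index and field names are assumptions on my part): since Elasticsearch won't let you change an existing field's type in place, one way to apply this is to add a geo_point sub-field to the existing string field, then run an update by query with no query so every document is re-processed and the new mapping is picked up:

```json
PUT /my-logs/_mapping
{
  "properties": {
    "geolocation": {
      "type": "text",
      "fields": {
        "point": { "type": "geo_point" }
      }
    }
  }
}

POST /my-logs/_update_by_query?conflicts=proceed
```

geo_point accepts strings in "lat,lon" form, which is why the existing values can feed the sub-field directly. Afterwards you would point the map at `geolocation.point`.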
If you prefer not to touch your existing data and better move to new indices, I guess the workflow would be as follows:
[Optional] Consider adopting Data Streams for easier management
Create an index template with the correct mapping and settings that matches against an index pattern name say new-logs-*
Point your ingestion process to the new index, defined by your template (example: new-logs-YYYY-MM-DD if you want to create one index per day)
Reindex your old data into a new index (new-logs-old, for example). If the field names are the same and the types are compatible, you only need to specify the source and destination index names. That is, if your geolocation strings are compatible with geo_point.
If you need to adjust types or make any other changes you should create an ingest pipeline to transform your data on re-index.
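In Dev Tools syntax, the steps above might look roughly like this (template, index, and field names are my assumptions, not from your setup):

```json
PUT /_index_template/new-logs
{
  "index_patterns": ["new-logs-*"],
  "template": {
    "mappings": {
      "properties": {
        "geolocation": { "type": "geo_point" }
      }
    }
  }
}

POST /_reindex
{
  "source": { "index": "old-logs" },
  "dest":   { "index": "new-logs-old" }
}
```

Because the destination name matches the `new-logs-*` pattern, the template's geo_point mapping is applied when the index is created, and geo_point will parse your existing "lat,lon" strings on the way in.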
I'll move this question to the Elasticsearch forum since this is not really a Kibana question.
Thanks for your quick answer, but isn't there an easy (!) way?
It is possible to add new attributes to the documents in the existing index. For example, I could send an attribute "Test" with the content "Hello" to Elasticsearch and then see exactly this content under "Discover". At first the data type is "Unknown field"; later it changes to "String field". We have often done this for other fields.
Can't I simply send a new attribute "geolocationNew" with the content "1.123, 5.678" to Elasticsearch? Isn't that the same thing? The type would then just have to be "geo_point" instead of "String field".
The first option, using update_by_query, seems quite straightforward to me. The second option involves learning more about Elasticsearch, but in my opinion it puts you in a better position to manage your data.
You can do that, yes. Then create a query that filters documents without that field (or just use the date, or any other method) and use it in a reindex call with an ingest pipeline, so all of your data ends up with that new geometry field.
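A sketch of that backfill (pipeline, index, and field names are illustrative assumptions): an ingest pipeline copies the string into the new field, and the reindex only picks up documents that don't have it yet:

```json
PUT /_ingest/pipeline/add-geolocation-new
{
  "processors": [
    {
      "set": {
        "field": "geolocationNew",
        "value": "{{geolocation}}"
      }
    }
  ]
}

POST /_reindex
{
  "source": {
    "index": "old-logs",
    "query": {
      "bool": {
        "must_not": { "exists": { "field": "geolocationNew" } }
      }
    }
  },
  "dest": {
    "index": "new-logs-old",
    "pipeline": "add-geolocation-new"
  }
}
```

This assumes the destination index maps geolocationNew as geo_point, which will parse the copied "lat,lon" string.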
Additionally, you can just add your new field and if you ever need geospatial support for your old data you can just generate that field on the fly using a Runtime Field. Check this other interesting thread on this topic.
Runtime fields are always going to be slower than stored data, but they are useful for changing mappings and for ad-hoc analysis.
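For illustration, a runtime field along those lines might look like this (index and field names are assumptions, and it presumes the string is indexed with a `.keyword` sub-field):

```json
PUT /old-logs/_mapping
{
  "runtime": {
    "geolocationNew": {
      "type": "geo_point",
      "script": {
        "source": "def parts = doc['geolocation.keyword'].value.splitOnToken(','); emit(Double.parseDouble(parts[0].trim()), Double.parseDouble(parts[1].trim()));"
      }
    }
  }
}
```

The Painless script parses the "lat, lon" string at query time and emits it as a geo_point, so the old documents become mappable without any reindexing.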