Hi All,
I have an index in Elasticsearch that tracks a user's journey through a web portal. Each document records the URL and timestamp of every page the user visited as part of their journey.
E.g. if the user visited 10 pages of the portal, the document records 10 fields (each timestamp is an event; events can be minutes apart or within the same second):
page1.timestamp
page2.timestamp
.
.
page10.timestamp
I need to determine the last page visited by the user (for example, when there is a session dropout). That means comparing the timestamp fields within a single document and finding the largest value. There are a few ways of doing this:
- Maintain a field holding the pageId of the current event, updated on every event. At the end of the journey, this field holds the last page. However, this will not be 100% accurate: after any downtime, a large backlog of events may be ingested in a short time, possibly out of order. It also won't work retrospectively.
- Run an update_by_query at fixed intervals to 'sweep' the records, find the latest timestamp field, and record the corresponding page URL in a field. Since we are using ES 6.8, my understanding from the documentation is that we cannot use a for-loop in a Painless script, so this looks difficult to achieve.
- Use the Ruby plugin. However, it only lets me work with the current event; I cannot query ES from it.
- Use an external script (JavaScript, shell, Python) to fetch documents from Elasticsearch through the REST API, process them locally, and write the result back to the required field by calling the update API.
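To make the last option concrete, here is a minimal Python sketch of the local processing step only. It makes several assumptions that are not confirmed above: the timestamps are ISO 8601 strings in one consistent format, the fields appear in _source either as nested objects ({"page1": {"timestamp": ...}}) or as dotted keys ("page1.timestamp"), and the result is stored in a hypothetical field called lastPage. The function names are made up for illustration. Fetching the documents and sending the update are left out.

```python
from datetime import datetime
from typing import Optional, Tuple


def parse_ts(value) -> datetime:
    """Parse an ISO 8601 string (assumed format); a trailing 'Z' is read as UTC."""
    return datetime.fromisoformat(str(value).replace("Z", "+00:00"))


def last_visited_page(source: dict) -> Optional[str]:
    """Return the page with the largest timestamp, or None if there are no page fields."""
    latest: Optional[Tuple[datetime, str]] = None
    for key, value in source.items():
        # Accept both {"page1": {"timestamp": ...}} and {"page1.timestamp": ...}.
        if isinstance(value, dict) and "timestamp" in value:
            page, raw = key, value["timestamp"]
        elif key.endswith(".timestamp"):
            page, raw = key[: -len(".timestamp")], value
        else:
            continue  # not a page timestamp field
        ts = parse_ts(raw)
        # Strict '>' means that on a tie the first field encountered wins.
        if latest is None or ts > latest[0]:
            latest = (ts, page)
    return latest[1] if latest else None


def build_update_body(source: dict) -> dict:
    """Build a partial-update body; 'lastPage' is a hypothetical field name."""
    return {"doc": {"lastPage": last_visited_page(source)}}
```

For example, build_update_body({"page1": {"timestamp": "2020-01-01T10:00:00Z"}, "page2": {"timestamp": "2020-01-01T10:05:00Z"}}) gives {"doc": {"lastPage": "page2"}}, which could then be sent as the body of an update request for that document. Events within the same second would need a finer-grained timestamp or another tie-breaker.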
Is there a better way of achieving this? Have I overlooked any existing features? Please advise.