We are currently importing Matomo logs into Elasticsearch and I have run into an issue that I cannot figure out how to solve.
I am not sure how familiar you are with Matomo (also known as Piwik), but it is a web analytics tool which we use to track things such as:
- visit count
- unique visitors
- device usage
- and much much more
Something else that can be set up, and which we currently use, is event tracking. We specify JS for when an event is triggered and then receive this info each time the event fires.
To get a more accurate unique visitor count we introduced the userID setting: each time a customer logs in, we link the visitor ID to that particular userID.
Let us assume that you as a visitor have 10 visits to our website, but you only log in on the 8th. The logs will then look as follows in Elasticsearch:
- Visits 1-7 contain the visitor ID, page URL and other information, but no userID.
- Visits 8-10 contain the userID as well.
Now let us assume that on the 6th visit you entered an ABtest. We now wish to collect all the userIDs that entered the ABtest.
These are the issues I have:
- Segmentation is limited to a single log, so if I search for those who entered an ABtest I only get the one log where the ABtest entry is specified.
- I then thought I could collect all the visitorIDs linked to an ABtest entry (or any other event I may be interested in), and use this list of visitorIDs to segment on and collect the userIDs. This however still gives me 0 results.
- I then looked into scripted fields to create a new field which links the visitorID and userID. However, this is also limited to a single log, and I am not able to loop through the logs to find a visitorID and userID match (or maybe I just do not know how to do it).
- Another issue is that we have customers with multiple accounts. One visitorID can be linked to just 1 userID, but in some cases it is linked to more than 50 different accounts trying to exploit free offers.
- We therefore looked into multi-value fields to use for the mapping, and into creating a separate event called 'userID', which would then tell us which account was used to log in when there are multiple accounts.
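To make the join described in the list above concrete, here is a minimal Python sketch over in-memory records. The keys `visitor_id`, `user_id` and `event_category`, the sample data and the function names are all made up for illustration; this is not how Elasticsearch or Matomo works internally, only the logic I am trying to express. It keeps a set of userIDs per visitorID, so a visitor with several accounts is handled too.

```python
from collections import defaultdict

# Hypothetical, simplified log records (one dict per log entry).
logs = [
    {"visitor_id": "v1", "user_id": None, "event_category": None},
    {"visitor_id": "v1", "user_id": None, "event_category": "abtesting"},
    {"visitor_id": "v1", "user_id": "u42", "event_category": None},
    {"visitor_id": "v2", "user_id": "u7", "event_category": None},
    {"visitor_id": "v3", "user_id": "u8", "event_category": "abtesting"},
    {"visitor_id": "v3", "user_id": "u9", "event_category": None},
]


def visitors_with_event(records, category):
    """Step 1: visitor IDs that have at least one log with this event category."""
    return {r["visitor_id"] for r in records if r["event_category"] == category}


def user_ids_by_visitor(records):
    """Map each visitor ID to the set of user IDs seen in any of its logs."""
    mapping = defaultdict(set)
    for r in records:
        if r["user_id"]:
            mapping[r["visitor_id"]].add(r["user_id"])
    return dict(mapping)


def users_for_event(records, category):
    """Step 2: all user IDs linked to the visitors found in step 1."""
    mapping = user_ids_by_visitor(records)
    users = set()
    for visitor in visitors_with_event(records, category):
        users |= mapping.get(visitor, set())
    return users


print(sorted(users_for_event(logs, "abtesting")))  # ['u42', 'u8', 'u9']
```

Note that `v1` only logs in after the ABtest event, yet `u42` is still returned, which is exactly the behaviour I cannot get from per-log segmentation.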
Please see the attached images, which might clarify some things.
url_details_id contains the visitorID.
url_details.e_c is the event category (this can be basically anything; for instance we have abtesting and userID, which tracks the login).
url_details.e_a is the associated event action name. For the event category userID it specifies the exact userID used by the customer; if the category is abtesting, e_a will reflect the abtest name, etc.
url_details.uid contains the userID. As you can see, Matomo sometimes fails to set this, and in other cases the logs simply predate the customer's login, so of course they do not contain a userID.
If I segment for url_details.e_c = abtesting I get 1 log. I would like to be able to search for url_details.e_c = abtesting and get all of the logs associated with the visitorID used to enter the abtest, in particular those that also contain the userID.
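Expressed as Elasticsearch request bodies, what I have in mind is roughly the two-step lookup below, written as Python dicts. This is only a sketch under assumptions I have not verified: that `url_details_id`, `url_details.e_c` and `url_details.uid` are mapped so that term queries and terms aggregations work on them (e.g. as keyword fields), and that the bucket sizes are large enough. The helper names are mine, and the dicts would still have to be sent to the search API with whatever client is in use.

```python
import json


def visitor_ids_query(category, size=10000):
    """Step 1 body: bucket the visitor IDs of all logs with the given event category."""
    return {
        "size": 0,  # only the aggregation is needed, not the hits
        "query": {"term": {"url_details.e_c": category}},
        "aggs": {
            "visitors": {"terms": {"field": "url_details_id", "size": size}}
        },
    }


def user_ids_query(visitor_ids, size=10000, users_per_visitor=100):
    """Step 2 body: for those visitor IDs, bucket the user IDs found in any log."""
    return {
        "size": 0,
        "query": {
            "bool": {
                "filter": [
                    {"terms": {"url_details_id": list(visitor_ids)}},
                    {"exists": {"field": "url_details.uid"}},  # skip pre-login logs
                ]
            }
        },
        "aggs": {
            "visitors": {
                "terms": {"field": "url_details_id", "size": size},
                "aggs": {
                    "users": {
                        "terms": {"field": "url_details.uid", "size": users_per_visitor}
                    }
                },
            }
        },
    }


# The visitor IDs here are placeholders; in practice they would come from
# the "visitors" buckets returned by the first request.
print(json.dumps(visitor_ids_query("abtesting"), indent=2))
print(json.dumps(user_ids_query(["visitor-a", "visitor-b"]), indent=2))
```

The nested `users` aggregation is there because one visitorID may map to many userIDs, so each visitor bucket lists every account seen for it.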
The text in black explains the idea with the multi-value field that we wish to try out. Please let me know if you have any input or questions.
I am looking for some input regarding this as I am a bit stuck and do not know how to proceed from here.
We wish to have all of our data in one place to give us a clear overview of where things go wrong and where we can improve, but before this is even possible we need to be able to link the web analytics data with the rest of the data (finance etc.).
The userID is essential, as it will be used as the identifier in the other data sets.
Has anyone had a similar issue? I am trying to figure out what might be the best solution.
Please let me know or just guide me towards something you think might be useful.
Thank you very much in advance.