Structure of value for a key that represents nested classes

Xavi_Montero · March 31, 2017, 11:00pm

First of all, I'm a complete newbie to Elasticsearch and related tools. Although I have been programming for 30 years, this is my first project using Elasticsearch. So sorry if this is not the kind of question I should be asking here.

So, the thing is:

I am planning to use Elasticsearch to store a sequence of events representing activity in my application.
I have designed an object structure in which I will store several types of events, similar but different, in the same collection of documents.
Each document will have at least the following data:

{
  "id": [a sha1 Id]
  "class": [the type of event being stored]
  "version": [version of the structure of data for that event class]
  [...]
}

My question is about the "type of event" to be stored in the key "class". Let me show a partial drill-down in the event type hierarchy:

I am planning to store http events (users calling my web controllers) and console events (users invoking CLI application calls).
For http I will have normal pages and API calls. Something like http.page and http.api for example.

For pages I will record one event when the controller is producing the page and another event when the page is being rendered in the browser, which I will learn about when the page loads an event tracker URL via Ajax. Something like http.page.production and http.page.consumption. For example, my PHP class for this last one is named HttpPageConsumptionEvent.

In addition I have "contextualized" events, which represent the previous ones with added data representing "who says what". For example, if event 1 says "user logged in at date XXX" and an importer running one year later transforms the data, the importer may say "the importer says at date YYY that event 1 said 'user logged in at date XXX'".

In my PHP code, those classes carry the ContextualizedEvent suffix. For example, HttpPageConsumptionContextualizedEvent.

So here's my question:

Q1) I am wondering if I should name the events in some kind of structured way, separated by dots, like http.page.consumption, versus something compact like HttpPageConsumption. Will those dots help me in later analysis of the data, for example to place things into buckets? Or does it not matter at all?

Q2) I am wondering if I should name the events in a way that feels more natural, adding the word event at the end as in http.page.consumption.event, or whether it is better not to add any word at all, as in http.page.consumption.

Q3) If I do add the word event to the value of the class key, I wonder whether it can go anywhere, for example at the end as in http.page.consumption.event, or whether it should act as a namespace for search, grouping or aggregation reasons, as in event.http.page.consumption, to separate it from event.http.page.production, from event.http.api..., from event.console..., so that sub-concepts nest after the grouping words.

Q4) I wonder if I may store all the events in the same collection of documents, or whether it is better to put the http events in one place and the console events in another, or even http.page in one place and http.api in another. Or is it completely transparent from a technical point of view, so that neither solution will limit me when doing complex data analysis (through Kibana, for example)?

Q5) When it comes to contextualized events, the same problem arises with one more word: HttpPageConsumptionContextualizedEvent or event.contextualized.http.page.consumption or contextualized.event.http.page.consumption or contextualizedEvent.http.page.consumption...

You will probably feel this is a stupid question, but I tend to be very purist when programming and I would love to kick off with a naming scheme that does not limit me later when querying.

Thanks in advance. Any tips are really very welcome.

dadoonet · April 1, 2017, 6:57am

Hi!

"So sorry if this is not the kind of question I should be asking here."

That's totally the place to ask questions. Any question. Welcome!

I would use something like http/page/consumption. Why? Because you can then benefit from the Path Hierarchy Tokenizer (path_hierarchy), which will help you search within the path. Dot notation will work as well, though, as you can set delimiter to . when configuring the tokenizer.

But I'd be curious what a document would look like. Can you share a sample document? I'm curious what you have in [...] actually.
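
To make the tokenizer suggestion above concrete, here is a minimal Python sketch. The helper path_hierarchy_tokens is a hypothetical stand-in that only emulates what the path_hierarchy tokenizer emits for values without a leading delimiter. The names class_path and class_path_analyzer are made up for illustration, and the settings body is an assumption about how one might configure the tokenizer, not something taken from this thread.

```python
import json


def path_hierarchy_tokens(value, delimiter="/"):
    """Emulate path_hierarchy output: one token per ancestor prefix of the value."""
    parts = value.split(delimiter)
    return [delimiter.join(parts[: i + 1]) for i in range(len(parts))]


# Hypothetical index settings (names are illustrative assumptions).
# "delimiter" is set to "." so that dotted class names are split like paths.
index_settings = {
    "settings": {
        "analysis": {
            "tokenizer": {
                "class_path": {"type": "path_hierarchy", "delimiter": "."}
            },
            "analyzer": {
                "class_path_analyzer": {"type": "custom", "tokenizer": "class_path"}
            },
        }
    }
}

if __name__ == "__main__":
    print(path_hierarchy_tokens("http/page/consumption"))
    print(path_hierarchy_tokens("http.page.consumption", delimiter="."))
    print(json.dumps(index_settings, indent=2))
```

Because every prefix becomes its own token, a search for the exact term http.page would also match documents whose class is http.page.consumption or http.page.production, which is what makes the namespaced naming useful.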

Xavi_Montero · April 1, 2017, 11:34am

Ah fantastic!! Didn't know about the Path Tokenizer thing. It seems that writing some "namespacing" can be useful!!

About the [...], here is my full document of a contextualized event for a confirmation sent over Ajax. It essentially confirms at the very end of the HTML <body> that everything was okay. At every page production I create a unique token that is used by all the internal events of the page consumption: pageView, scrolling to certain places, mouse over an important item, etc.

The document is essentially a "contextualized event" which mainly consists of a "context" and an "event".

The "context" part is the "history" of all the software, servers and deploys by which the event is processed and transformed. In this example it only has a single element in the array: the controller attending the Ajax query. If I later re-import this event into other places, the "transformers" will be appended in order. This is the part of "Alice says that Bob says that Charlie says..."
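
The "appended in order" idea can be sketched in a few lines of Python. This is only an illustration under assumptions: the function append_context_unit and the importer class name are hypothetical, while the keys context, contextUnits, callingClass and creationTimeStamp follow the document posted in this thread.

```python
import copy


def append_context_unit(contextualized_event, unit):
    """Return a copy of the event with one more context unit appended at the end."""
    updated = copy.deepcopy(contextualized_event)
    updated["context"]["contextUnits"].append(unit)
    return updated


# A trimmed-down contextualized event with a single context unit (the tracker controller).
stored = {
    "class": "contextualizedEvent.http.page.consumption",
    "context": {
        "contextUnits": [
            {
                "callingClass": "XaviMontero\\DirtyBundle\\Controller\\EventTrackerController",
                "creationTimeStamp": "2017-04-01T11:07:08.399831Z",
            }
        ]
    },
}

# Hypothetical importer that re-processes the event a year later.
importer_unit = {
    "callingClass": "Example\\Importer",  # assumed name, not from the thread
    "creationTimeStamp": "2018-04-01T00:00:00.000000Z",
}

reimported = append_context_unit(stored, importer_unit)

if __name__ == "__main__":
    for unit in reimported["context"]["contextUnits"]:
        print(unit["callingClass"])
```

Copying before appending keeps the originally stored document untouched, so the earlier "who said what" chain is never rewritten, only extended.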

The "event" part is the thing that actually happened. In this case, that there is an http.page.consumption sending a pageView (at the very end of the document). Other consumers in the very same page could send signals when the images are fully loaded, or when the user scrolls so that a certain HTML element, initially off-screen, is revealed in the browser, for example. This is what I named eventAction in the event part of the document.

Besides the eventAction there is the pageToken, which is the identifier generated by the controller producing the page. Via this property I will be able to link all the events (either http.page.production or http.page.consumption) that go together and belong to the same page load from the user's point of view. Every production will create a pageToken that will be consumed by several page consumption events.
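
As a rough illustration of linking events through pageToken, here is a small Python sketch. It assumes simplified, flattened events (the real documents nest these keys under "event"), and the token values and the imagesLoaded action are invented for the example.

```python
from collections import defaultdict


def group_by_page_token(events):
    """Collect events that share a pageToken, preserving their input order."""
    groups = defaultdict(list)
    for event in events:
        groups[event["pageToken"]].append(event)
    return dict(groups)


# Simplified sample events; token values are made up.
events = [
    {"class": "http.page.production", "pageToken": "token-A"},
    {"class": "http.page.consumption", "pageToken": "token-A", "eventAction": "pageView"},
    {"class": "http.page.consumption", "pageToken": "token-A", "eventAction": "imagesLoaded"},
    {"class": "http.page.production", "pageToken": "token-B"},
]

page_loads = group_by_page_token(events)

if __name__ == "__main__":
    for token, group in page_loads.items():
        print(token, [e["class"] for e in group])
```

Each group holds one production event followed by its consumption events, which matches the one-to-many relationship described above.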

I also store the "received" cookies, the "desired" cookies (the ones I will set when answering the HttpResponse), and I just store all the server info raw data there, to have IPs, languages, referrers, servers, etc just to be able to post-process this info another day when I want to use it for any reason.

Here is an example of what it looks like as of today. My question arose when trying to set the content of the three "class" attributes that appear (on the contextualized event, on the context and on the event itself). They have not been changed yet, so the classes still look weird.

As the body is limited to 7000 chars, I'm posting the event itself in the next reply...

Xavi_Montero · April 1, 2017, 11:40am

Hmm, the forum still does not let me post the event itself... I will strip a central part which does not add value...

So here is the version I plan to put in production.

If you find I'm storing something wrong, I'll be more than happy to listen to your advice:

{ 
    "id": "80dd2586976d35fab59eb6be79008cc33fd045fc",
    "class": "contextualizedEvent.http.page.consumption",
    "version": "v1.0.0",
    "context": {
        "class": "context",
        "version": "v1.0.0",
        "contextUnits": [
            {
                "callingClass": "XaviMontero\\DirtyBundle\\Controller\\EventTrackerController",
                "action": "XaviMontero\\DirtyBundle\\Controller\\EventTrackerController::registerEventAction",
                "executionEnvironment": {
                    "serverInstanceId": "b30dd3baca21119064ec6b51333e49cbb28f2eaf",
                    "projectId": "tighten-violet",
                    "environmentId": "devel",
                    "deployVersionHash": "3960d2b098309848a6a601c453337319c54ee943",
                    "deployRootDir": "/files/custom_www/hello-trip/tighten-violet",
                    "deployTimeStamp": "2017-03-29T14:58:03.638200Z"
                },
                "creationTimeStamp": "2017-04-01T11:07:08.399831Z"
            }
        ]
    },
    "event": {
        "id": "88541bae1b99bfca9015a32655bd1cbbb93f0262",
        "class": "http.page.consumption.event",
        "version": "v1.0.0",
        "request": {
            "request": [],
            "query": [],
            "attributes": {
                "_controller": "XaviMontero\\DirtyBundle\\Controller\\EventTrackerController::registerEventAction",
                "pageToken": "7894758923478",
                "eventAction": "pageView",
                "_route": "eventTracker.registerEvent",
                "_route_params": {
                    "pageToken": "7894758923478",
                    "eventAction": "pageView"
                },
                "_method": {}
            },
            "cookies": {
                "_ga": "GA1.3.1772714855.1484259236",
                "hellotrip_a": "b861bbecd1e51cca7ff00455fa0aee4b98ce74b0",
                "hellotrip_c": "dc37e1feced3ba884cc36b1acf4bcce2cc4b4598"
            },
            "files": [],
            "server": {
                "BASE": "\/hello-trip\/tighten-violet\/web",
                "HTTP_HOST": "bromo.dsitelecom.com",
                "HTTP_CONNECTION": "keep-alive",
                "HTTP_CACHE_CONTROL": "max-age=0",
                "HTTP_UPGRADE_INSECURE_REQUESTS": "1",
                "HTTP_USER_AGENT": "Mozilla\/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit\/537.36 (KHTML, like Gecko) Chrome\/56.0.2924.87 Safari\/537.36",
[stripped some data to reduce to 7000 chars]
                "HTTP_ACCEPT_LANGUAGE": "es-ES,es;q=0.8",
                "HTTP_COOKIE": "_ga=GA1.3.1772714855.1484259236; hellotrip_a=b861bbecd1e51cca7ff00455fa0aee4b98ce74b0; hellotrip_b=d309a2c0a7d6b987f846619aa2e900ff4e936273; hellotrip_c=dc37e1feced3ba884cc36b1acf4bcce2cc4b4598",
                "PATH": "\/usr\/local\/sbin:\/usr\/local\/bin:\/usr\/sbin:\/usr\/bin:\/sbin:\/bin",
[stripped some data to reduce to 7000 chars]
                "SERVER_ADDR": "192.168.2.55",
[stripped some data to reduce to 7000 chars]
                "SCRIPT_NAME": "\/hello-trip\/tighten-violet\/web\/app_dev.php",
                "PATH_INFO": "\/registerEvent\/7894758923478\/pageView",
                "PATH_TRANSLATED": "\/var\/www\/registerEvent\/7894758923478\/pageView",
                "PHP_SELF": "\/hello-trip\/tighten-violet\/web\/app_dev.php\/registerEvent\/7894758923478\/pageView",
                "REQUEST_TIME_FLOAT": 1491044828.337,
                "REQUEST_TIME": 1491044828
            },
            "server.headers": {
                "HOST": "bromo.dsitelecom.com",
                "CONNECTION": "keep-alive",
                "CACHE_CONTROL": "max-age=0",
                "UPGRADE_INSECURE_REQUESTS": "1",
                "USER_AGENT": "Mozilla\/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit\/537.36 (KHTML, like Gecko) Chrome\/56.0.2924.87 Safari\/537.36",
                "ACCEPT": "text\/html,application\/xhtml+xml,application\/xml;q=0.9,image\/webp,*\/*;q=0.8",
                "ACCEPT_ENCODING": "gzip, deflate, sdch",
                "ACCEPT_LANGUAGE": "es-ES,es;q=0.8",
                "COOKIE": "_ga=GA1.3.1772714855.1484259236; hellotrip_a=b861bbecd1e51cca7ff00455fa0aee4b98ce74b0; hellotrip_b=d309a2c0a7d6b987f846619aa2e900ff4e936273; hellotrip_c=dc37e1feced3ba884cc36b1acf4bcce2cc4b4598"
            },
            "headers": {
                "host": [
                    "bromo.dsitelecom.com"
                ],
                "connection": [
                    "keep-alive"
                ],
[stripped some data to reduce to 7000 chars]
                "accept-language": [
                    "es-ES,es;q=0.8"
                ],
                "cookie": [
                    "_ga=GA1.3.1772714855.1484259236; hellotrip_a=b861bbecd1e51cca7ff00455fa0aee4b98ce74b0; hellotrip_b=d309a2c0a7d6b987f846619aa2e900ff4e936273; hellotrip_c=dc37e1feced3ba884cc36b1acf4bcce2cc4b4598"
                ],
                "x-php-ob-level": [
                    1
                ]
            },
            "content": ""
        },
        "response": {
            "cookies": [
                {
                    "name": "hellotrip_a",
                    "value": "b861bbecd1e51cca7ff00455fa0aee4b98ce74b0",
                    "type": "persistent",
                    "validPeriodInSeconds": 173491200
                },
                {
                    "name": "hellotrip_b",
                    "value": "d309a2c0a7d6b987f846619aa2e900ff4e936273",
                    "type": "persistent",
                    "validPeriodInSeconds": 1800
                },
                {
                    "name": "hellotrip_c",
                    "value": "dc37e1feced3ba884cc36b1acf4bcce2cc4b4598",
                    "type": "session"
                }
            ]
        },
        "creationTimeStamp": "2017-04-01T11:07:08.400410Z",
        "pageToken": "472c22d12dd3a90b3a1853a116fafe74d8e24621",
        "eventAction": "pageView",
        "eventData": []
    }
}

dadoonet · April 2, 2017, 8:54am

Ok. You just need to make sure that fields are consistent across documents.

If a field is a text in one doc it can't become a number or a date later.
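
A rough way to check this before indexing is sketched below in Python. It is only a pre-flight check on JSON value types under assumptions of my own: the function names are hypothetical, only leaf values are compared, and Elasticsearch's actual mapping rules differ in detail (some values are coerced), so treat a reported conflict as a hint to look closer rather than as a guaranteed indexing error.

```python
def field_types(value, path=""):
    """Map each dotted leaf path to the set of Python type names found there."""
    found = {}
    if isinstance(value, dict):
        for key, child in value.items():
            child_path = f"{path}.{key}" if path else key
            for p, kinds in field_types(child, child_path).items():
                found.setdefault(p, set()).update(kinds)
    elif isinstance(value, list):
        # Arrays are treated as multiple values of the same field.
        for item in value:
            for p, kinds in field_types(item, path).items():
                found.setdefault(p, set()).update(kinds)
    else:
        found.setdefault(path, set()).add(type(value).__name__)
    return found


def find_conflicts(docs):
    """Return the paths whose leaf values have more than one type across docs."""
    seen = {}
    for doc in docs:
        for p, kinds in field_types(doc).items():
            seen.setdefault(p, set()).update(kinds)
    return {p: sorted(kinds) for p, kinds in seen.items() if len(kinds) > 1}


# Two simplified documents: pageToken is a string in one and a number in the other.
docs = [
    {"version": "v1.0.0", "event": {"pageToken": "7894758923478"}},
    {"version": "v1.0.0", "event": {"pageToken": 7894758923478}},
]

if __name__ == "__main__":
    print(find_conflicts(docs))
```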

system · April 30, 2017, 8:55am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
