Agg script that returns multiple documents from 1 document? (json_each in PostgreSQL)

ddorian43 · March 14, 2016, 11:33am

I'm recording daily country-level analytics for each url on my site.
But I don't want to store each pageview as a document, because that would take too much data and too much time to process.

So I can store 1 document for each country for each day for each url, but that too would be too much data.

So I came to the conclusion that I should store 1 document for each url on each day, and keep all the country stats in there:

{"url": "/", "country": {"CA": {"seconds": 500, "pageviews": 20}, "US": {"pageviews": 5}}}

Is there a way to break this into multiple documents at search time, so that I can do:

top countries for url in date-range
top countries for all urls
sum(pageviews) for url in country "US"
top(urls) for country "US"
top(urls) (summing the values of all countries together)
etc.

The way this is done in PostgreSQL is by using json_each() on a json column: http://www.postgresql.org/docs/9.3/static/functions-json.html

Is there something like that in Elasticsearch? Maybe I need to change the schema a little from what I have to something else (for example, from an object keyed by country to an array of {country, pageviews})?
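The schema change mentioned above can be sketched as follows. This is only an illustration, not a confirmed answer from the thread: the helper name `to_country_array` and the field name `country_stats` are hypothetical, and the request body assumes `country_stats` is mapped as a `nested` field with `country` stored as an exact (non-analyzed) value.

```python
def to_country_array(doc):
    """Reshape {"country": {"CA": {...}}} into an array of per-country objects."""
    stats = [
        {"country": code, **values}
        for code, values in sorted(doc["country"].items())
    ]
    return {"url": doc["url"], "country_stats": stats}


doc = {"url": "/", "country": {"CA": {"seconds": 500, "pageviews": 20}, "US": {"pageviews": 5}}}
reshaped = to_country_array(doc)

# Hypothetical search body for "top countries for all urls":
# a nested aggregation, then a terms bucket per country ordered by summed pageviews.
# Assumes "country_stats" is mapped as type "nested" (an assumption, not from the thread).
top_countries_body = {
    "size": 0,
    "aggs": {
        "stats": {
            "nested": {"path": "country_stats"},
            "aggs": {
                "by_country": {
                    "terms": {
                        "field": "country_stats.country",
                        "order": {"pageviews": "desc"},
                    },
                    "aggs": {
                        "pageviews": {"sum": {"field": "country_stats.pageviews"}}
                    },
                }
            },
        }
    },
}
```

With this shape each array entry behaves like its own small document inside the parent, which is roughly the effect json_each() gives in PostgreSQL. Restricting to one url or a date range would be a matter of adding a query to the same body.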

If this is possible, then I would like to go 1 level further, by storing inside the 'country' object an array of cities, each with its own pageviews. Then I could do the queries above but also group by city?

Thank You!

warkolm · March 15, 2016, 3:17am

Having a massive nested document that is constantly updated is not really a good use of Elasticsearch.

Why not store an event each day and then just aggregate to get the results? That's more what it's good at.
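A minimal sketch of a flat layout, assuming one small document per (url, day, country) rather than one per pageview. The helper `explode`, the `day` field and the date value are hypothetical, and the local `Counter` only stands in for what a terms aggregation on `country` with a sum of `pageviews` would compute inside Elasticsearch.

```python
from collections import Counter


def explode(doc, day):
    """Yield one flat document per country from a per-url daily summary."""
    for code, values in doc["country"].items():
        yield {
            "url": doc["url"],
            "day": day,
            "country": code,
            "pageviews": values.get("pageviews", 0),
            "seconds": values.get("seconds", 0),
        }


daily = {"url": "/", "country": {"CA": {"seconds": 500, "pageviews": 20}, "US": {"pageviews": 5}}}
rows = list(explode(daily, "2016-03-14"))

# Local stand-in for "top countries": sum pageviews per country, highest first.
totals = Counter()
for row in rows:
    totals[row["country"]] += row["pageviews"]
top = totals.most_common()
```

Each flat document can then be filtered and grouped on any field directly, at the cost of more documents per day.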

ddorian43 · March 15, 2016, 10:43am

Hi warkolm,

Note that the document won't ever be updated; it will just be inserted once at the end of the day (the increments are kept outside Elasticsearch). I know that updates are delete + insert, which is inefficient.

What I want in this case is to avoid the per-document overhead I would incur by storing the data in a non-nested way.

What do you mean by "an event each day"? By "event" I understand a pageview, and that means 1 document for each pageview, which is a lot (multiple millions of pageviews, i.e. documents, per day).

warkolm · March 15, 2016, 8:06pm

A million documents a day is unlikely to be majorly different from one massive document a day. Plus you then don't need to deal with collating that information external to ES.

ddorian43 · March 16, 2016, 8:49am

I will have 1 document for each distinct url in each day. Just the per-document overhead will be very big. I would love to store raw events but that is costly currently.

Is there a way to do what I originally asked?
