I'm recording daily country-level analytics for each URL on my site.
I don't want to store each pageview as its own document, because that would require too much storage and too much time to process.
I could store one document per country per day per URL, but that would still be too much data.
So I've concluded that I should store one document per URL per day and keep all the country stats inside it.
Is something like that possible in Elasticsearch? Maybe I need to change the schema a little from what I have to something else (for example, from nested to an array of {country, pageviews})?
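To make the array-of-{country, pageviews} idea concrete, here is a minimal sketch written as plain Python dicts in the shape of Elasticsearch request bodies. The field names (`url`, `day`, `countries`, `country`, `pageviews`) and the sample values are assumptions for illustration, not an existing schema, and the mapping uses the typeless form of recent Elasticsearch versions. Mapping the array as `nested` is what should let each country/pageviews pair be aggregated as its own unit.

```python
import json

# Hypothetical mapping: one document per URL per day, holding a nested
# array of per-country counters. All field names are assumptions.
mapping = {
    "mappings": {
        "properties": {
            "url": {"type": "keyword"},
            "day": {"type": "date"},
            "countries": {
                "type": "nested",
                "properties": {
                    "country": {"type": "keyword"},
                    "pageviews": {"type": "long"},
                },
            },
        }
    }
}

# Example document, inserted once at the end of the day (made-up values).
doc = {
    "url": "/some/page",
    "day": "2015-01-01",
    "countries": [
        {"country": "US", "pageviews": 120},
        {"country": "DE", "pageviews": 45},
    ],
}

# Pageviews per country across all matching documents:
# nested aggregation -> terms on country -> sum of pageviews.
per_country_query = {
    "size": 0,
    "aggs": {
        "countries": {
            "nested": {"path": "countries"},
            "aggs": {
                "by_country": {
                    "terms": {"field": "countries.country"},
                    "aggs": {
                        "views": {"sum": {"field": "countries.pageviews"}}
                    },
                }
            },
        }
    },
}

print(json.dumps(per_country_query, indent=2))
```

A filter on `url` or `day` could be added as a top-level `query` next to `aggs` to restrict which daily documents are counted.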
If this is possible, I would like to go one level further and store, inside each 'country' object, an array of cities, each with its own pageviews. Could I then run the queries above but also group by city?
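A sketch of what the extra city level might look like, again as plain Python dicts. It assumes both `countries` and `countries.cities` are mapped as `nested`; the field names and values are hypothetical. The query groups by country and then by city, and the small `city_totals` helper is a pure-Python equivalent of that grouping, included only to check the layout locally (it does not call Elasticsearch).

```python
from collections import defaultdict

# Hypothetical daily document with cities nested inside each country.
doc = {
    "url": "/some/page",
    "day": "2015-01-01",
    "countries": [
        {
            "country": "US",
            "pageviews": 120,
            "cities": [
                {"city": "New York", "pageviews": 70},
                {"city": "Boston", "pageviews": 50},
            ],
        },
        {
            "country": "DE",
            "pageviews": 45,
            "cities": [{"city": "Berlin", "pageviews": 45}],
        },
    ],
}

# Group by country, then step into the inner nested level and group by city.
per_city_query = {
    "size": 0,
    "aggs": {
        "countries": {
            "nested": {"path": "countries"},
            "aggs": {
                "by_country": {
                    "terms": {"field": "countries.country"},
                    "aggs": {
                        "cities": {
                            "nested": {"path": "countries.cities"},
                            "aggs": {
                                "by_city": {
                                    "terms": {"field": "countries.cities.city"},
                                    "aggs": {
                                        "views": {
                                            "sum": {"field": "countries.cities.pageviews"}
                                        }
                                    },
                                }
                            },
                        }
                    },
                }
            },
        }
    },
}


def city_totals(docs):
    """Sum pageviews per (country, city) over a list of daily documents."""
    totals = defaultdict(int)
    for d in docs:
        for country in d["countries"]:
            for city in country["cities"]:
                totals[(country["country"], city["city"])] += city["pageviews"]
    return dict(totals)


print(city_totals([doc]))
```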
Note that the document won't ever be updated; it will just be inserted once at the end of the day (the increments are kept outside Elasticsearch). I know that an update is a delete plus an insert, which is inefficient.
What I want here is to avoid the per-document overhead I would incur if I stored the data in a non-nested way.
What do you mean by "an event each day"? By event, I understand a pageview, and that means one document per pageview, which is a lot (multiple millions of pageviews, and therefore documents, per day).
A million documents a day is unlikely to be majorly different from one massive document a day. Plus you then don't need to deal with collating that information external to ES.
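For comparison, a rough sketch of a flat layout with one small document per URL, day, and country, built here from an already-collated daily record and serialized as the newline-delimited body that the `_bulk` API expects. The index name `pageviews`, the field names, and the values are assumptions. With flat documents no `nested` mapping is needed and the per-country totals come from a plain `terms` plus `sum` aggregation.

```python
import json

# Collated daily record (made-up values, assumed field names).
doc = {
    "url": "/some/page",
    "day": "2015-01-01",
    "countries": [
        {"country": "US", "pageviews": 120},
        {"country": "DE", "pageviews": 45},
    ],
}


def flatten(doc):
    """Turn one per-URL record into one flat document per country."""
    return [
        {
            "url": doc["url"],
            "day": doc["day"],
            "country": c["country"],
            "pageviews": c["pageviews"],
        }
        for c in doc["countries"]
    ]


def bulk_lines(docs, index="pageviews"):
    """Build a newline-delimited _bulk body: an action line, then the source."""
    lines = []
    for d in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(d))
    return "\n".join(lines) + "\n"  # the bulk body must end with a newline


flat_docs = flatten(doc)
body = bulk_lines(flat_docs)

# Per-country totals over flat documents: no nested aggregation required.
per_country_query = {
    "size": 0,
    "aggs": {
        "by_country": {
            "terms": {"field": "country"},
            "aggs": {"views": {"sum": {"field": "pageviews"}}},
        }
    },
}

print(body)
```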
I will have one document for each distinct URL each day. The per-document overhead alone would be very big. I would love to store raw events, but that is currently too costly.