Hi,
We plan to do statistical aggregations, and ES has the facets feature,
but how do facets perform under heavy load? Our goal is to count how many
times each application is accessed within a time slot, based on the app log.
We are concerned about facet performance when there are lots of logs.
So we are considering running an hourly cron job that searches the logs and
stores the counters in a MySQL database. That way the counts are available
in SQL, and the statistical aggregations can be done by querying the
MySQL data.
But there's a problem: the logs contain multiple applications, each
identified by an id, and every application has its own URLs. I need to
count the accesses for every URL of every application, and the
statistical aggregations should show the URL access counts for a specific
application.
Here is my plan:
1. Search for all the application ids in ES.
2. For every id, get the statistical aggregations of URL access counts
using facets.
For step 1, there are multiple fields in the ES doc. How can I get just the
distinct values of the id field from ES?
Hi,
We plan to do statistical aggregations, and ES has the facets function,
but what's the performance of facets under big load?
The performance is good. Whether it's good enough for you or not, you
won't be able to say exactly without testing. People here might be able
to say whether your expectations are realistic if you give some more
details, like:
- what your documents look like
- what your facets would look like
- what hardware you have available for the job
for step 1, there are multiple fields in the ES doc, how I can just get
the distinct id field from ES?
I'm not sure I'm following. There's no ID for a specific field in a
document. But you have field names and document IDs.
If the application ID is a field in your document, then you can get it
during a search or a get by using the "fields" parameter. Take a look here:
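As a rough sketch of the "fields" idea, the snippet below builds a search body that asks for a single field and then collects the distinct values from a response. The field name `app_id`, the helper names, and the assumed response layout (`hits.hits[].fields`) are illustrative assumptions, not something taken from your setup, and the exact response shape may differ between Elasticsearch versions.

```python
import json


def build_fields_search(field_names, size=100):
    """Build a search body that returns only the named fields for each hit."""
    return {
        "query": {"match_all": {}},
        "fields": list(field_names),  # assumption: the classic "fields" parameter
        "size": size,
    }


def distinct_values(response, field_name):
    """Collect the distinct values of one field from a search response dict."""
    seen = set()
    for hit in response.get("hits", {}).get("hits", []):
        value = hit.get("fields", {}).get(field_name)
        if value is None:
            continue
        # A field may come back as a scalar or as a list, so handle both.
        values = value if isinstance(value, list) else [value]
        seen.update(values)
    return sorted(seen)


# "app_id" is a hypothetical field name used only for illustration.
body = build_fields_search(["app_id"])
print(json.dumps(body, indent=2))
```

Note that this only sees the hits that are actually returned, so with many log lines it will not reliably list every application. A terms facet on the same field may be a better fit for "distinct values", but that is worth testing on your data.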
There's already a range in the query part. The Elasticsearch guide says
that the filter in the query doesn't apply to facets, and that a facet
filter is needed. That's why I'm using two filters: one for the query and
the other for the facet.
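For illustration, a request of that shape might look like the sketch below: the same time range is used once as the search filter and once inside the facet's `facet_filter`, combined with a term filter for one application. The field names `timestamp`, `app_id` and `url`, the function name, and the example values are all hypothetical, and the body is only built here, not sent anywhere.

```python
import json


def build_url_facet(app_id, start, end, size=50):
    """Build a search body that counts URL hits for one application in a time range."""
    # Hypothetical field names: "timestamp", "app_id", "url".
    time_range = {"range": {"timestamp": {"from": start, "to": end}}}
    app_term = {"term": {"app_id": app_id}}
    return {
        "query": {"match_all": {}},
        # Filter applied to the returned hits.
        "filter": time_range,
        "facets": {
            "urls": {
                "terms": {"field": "url", "size": size},
                # The facet gets its own filter, repeating the time range
                # and narrowing to a single application.
                "facet_filter": {"and": [time_range, app_term]},
            }
        },
        "size": 0,  # only the facet counts are wanted, not the hits
    }


request_body = build_url_facet("app-1", "2013-01-01T00:00:00", "2013-01-01T01:00:00")
print(json.dumps(request_body, indent=2))
```

Running one such request per application id, once per hour, would match the cron job plan described earlier in the thread. Whether that is fast enough still depends on the data volume and hardware.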