The performance of facets

Hi,
We plan to do statistical aggregations, and ES has the facets feature,
but how do facets perform under heavy load? Our goal is to count how many
times each application is accessed within a time slot, based on the app log.
We are concerned about facet performance when there are lots of logs,
so we are considering an hourly cron job that searches the logs and stores
the counters in a MySQL database. That way the counts are available in SQL,
and the statistical aggregations can be done by querying MySQL.
But there's a problem: the logs contain multiple applications, each
identified by an ID, and every application has its own URLs. I need to
count the accesses for every URL of every application, so that the
statistics show the URL access counts for a specific application.

Here is my plan:

  1. Fetch all the application IDs from ES.
  2. For every ID, get the URL access counts using facets.

For step 1, there are multiple fields in the ES doc. How can I get just the
distinct values of the ID field from ES?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Hello,

On Thu, Jul 4, 2013 at 11:31 AM, lijionly@gmail.com wrote:

Hi,
We plan to do statistical aggregations, and ES has the facets function,
but what's the performance of facets under big load?

The performance is good :) Whether it's good enough for you, you
won't be able to say exactly without testing. People here might be
able to tell you whether your expectations are realistic if you give some
more details, such as:

  • what your documents look like
  • what your facets would look like
  • what hardware you have available for the job

For step 1, there are multiple fields in the ES doc. How can I get just the
distinct values of the ID field from ES?

I'm not sure I'm following. There's no ID for a specific field in a
document. But you have field names and document IDs.

If the application ID is a field in your document, then you can get it
during a search or a get by using the "fields" parameter. Take a look here:
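As an illustration only (not taken from the page referenced above), here is a minimal Python sketch of two request bodies. The first uses the "fields" parameter to return only AppID for each hit. The second is an assumed alternative for the "distinct" part of the question: a terms facet on AppID, which lists each value once together with its count. The function names and the `max_ids` default are hypothetical.

```python
import json


def app_id_only_body(size=10):
    """Search body that asks only for the AppID field of each hit."""
    return {
        "query": {"match_all": {}},
        "fields": ["AppID"],  # return just this field, not the whole document
        "size": size,
    }


def distinct_app_ids_body(max_ids=1000):
    """Assumed alternative: a terms facet on AppID.

    Each distinct AppID comes back once, with the number of matching docs.
    max_ids is a hypothetical upper bound on how many IDs to return.
    """
    return {
        "query": {"match_all": {}},
        "size": 0,  # no hits needed, only the facet
        "facets": {
            "app_ids": {"terms": {"field": "AppID", "size": max_ids}}
        },
    }


if __name__ == "__main__":
    print(json.dumps(app_id_only_body(), indent=2))
    print(json.dumps(distinct_app_ids_body(), indent=2))
```

Note that a terms facet counts indexed terms, so if AppID is an analyzed field the values returned may be tokens rather than the exact original strings.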

Best regards,
Radu

http://sematext.com/ -- Elasticsearch -- Solr -- Lucene


Hi,

The documents look like this:

{
  "LogDate": "2013-03-12T15:25:11",
  "SourceIP": "1.1.1.1",
  "AppID": "51w",
  "AppVersion": "0",
  "AppSubVersion": "0",
  "RemoteAddr": "10.2.43.89",
  "RemoteUser": "",
  "Request": "GET /forum.php",
  "Status": "200",
  "HttpReferer": "forum-72-1.html",
  "RequestLength": 1432,
  "ResponseLength": 699,
  "ResponseTime": 1.259,
  "HttpUserAgent": "Mozilla/5.0 ",
  "HttpForwardFor": ""
}

And the facets request:

{
  "size": 0,
  "query": {
    "term": {"AppID": "51w"}
  },
  "filter": {
    "range": {
      "LogDate": {
        "from": "2013-03-12T15:00:00",
        "to": "2013-03-12T16:00:00"
      }
    }
  },
  "facets": {
    "tags": {
      "terms": {
        "field": "Request"
      },
      "facet_filter": {
        "range": {
          "LogDate": {
            "from": "2013-03-12T15:00:00",
            "to": "2013-03-12T16:00:00"
          }
        }
      }
    }
  }
}

The hardware is not clear yet. We create one index per day, named in the
format yyyymmdd, so we may need to search multiple indices to cover the
time slot.

Hi, looking at your query, I suggest moving the LogDate range into the query
part. This will reduce the number of documents the facet phase has to count.

--
Jason You @Chengdu China

There's already a range filter in the request. The Elasticsearch guide says
that the top-level filter doesn't apply to facets and that a facet filter is
needed as well. That's why I'm using two filters, one for the query and the
other for the facet.


You can use a filtered query: a filter placed inside the query also
restricts the documents that the facets count.

2013/7/5 lijionly [via Elasticsearch Users] <
ml-node+s115913n4037586h37@n3.nabble.com>

There's already a range filter in the request. The Elasticsearch guide says
that the top-level filter doesn't apply to facets and that a facet filter is
needed as well. That's why I'm using two filters, one for the query and the
other for the facet.

--
Jason You @Chengdu China

Thank you, this is what I ended up with:

{
  "query": {
    "filtered": {
      "query": {
        "term": {
          "AppID": "51weixuew"
        }
      },
      "filter": {
        "range": {
          "LogDate": {
            "from": "2013-03-12T15:25:10",
            "to": "2013-03-12T15:25:12"
          }
        }
      }
    }
  },
  "facets": {
    "tags": {
      "terms": {
        "field": "Request"
      }
    }
  }
}
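For the hourly cron job described at the start of the thread, the request above could be generated per application and time slot. The following Python sketch is one way to do that under some assumptions: the function names, the `top_n` parameter, and the top-level "size": 0 (taken from the earlier request, not the final one) are my additions, and the sample response is hand-written for illustration, not real output.

```python
def url_counts_body(app_id, start, end, top_n=10):
    """Build a filtered query plus a terms facet on Request.

    The range filter sits inside the filtered query, so the facet
    only counts documents for this AppID within the time slot.
    """
    return {
        "size": 0,  # assumption: only the facet counts are needed
        "query": {
            "filtered": {
                "query": {"term": {"AppID": app_id}},
                "filter": {
                    "range": {"LogDate": {"from": start, "to": end}}
                },
            }
        },
        "facets": {
            "tags": {"terms": {"field": "Request", "size": top_n}}
        },
    }


def facet_rows(app_id, slot_start, response):
    """Flatten a terms facet response into (app_id, slot, url, count) rows."""
    terms = response["facets"]["tags"]["terms"]
    return [(app_id, slot_start, t["term"], t["count"]) for t in terms]


if __name__ == "__main__":
    body = url_counts_body("51w", "2013-03-12T15:00:00", "2013-03-12T16:00:00")
    # Hand-written sample in the shape of a terms facet response.
    sample = {"facets": {"tags": {"terms": [{"term": "forum.php", "count": 3}]}}}
    for row in facet_rows("51w", "2013-03-12T15:00:00", sample):
        print(row)
```

Each row could then be inserted into the MySQL counters table; the table layout is left open here since the thread does not describe it.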
