Does Elasticsearch fit my use case?

Hi all!

I don't want to discuss my project directly, so I've kept the description below a bit vague, as it's a client's project.

I'm a LAMP developer, and I'm working on a side project. Essentially, let's say that I need to log every hit on a page, along with any GET variables (e.g. ?ref=google, ?colour=red, etc.). The hit info to be stored is the usual basics: IP, country, referrer if applicable, time, etc.

I need to present this data on a per-page basis (e.g. all stats for page=home) in a dashboard. I originally planned to use Graphite as I have some experience with it, and I know it's easy to just get JSON out of it to draw graphs and so on.

I'm very new to ES, but would it be possible to just ship all of my logs to ES, and then get dashboard data on the fly? Given a start + end date, I'd need to do things like work out how many hits were in the time frame, break it down per country, get info about hits per set of GET parameters and so on. The reason this doesn't work in Graphite is that once you get to things like GET params + countries, you need a separate metric per country, for example, and this will chew through disk space.

I need to extract hit data in a way I can graph it, and draw nice tables of hits per country etc (similar to awstats really), and if it all worked in realtime that'd be great.

  1. Would ES fit this use case?
  2. Would I need to use something like Logstash to put my data from PHP -> Logstash -> ES? Or can I just have PHP workers write hit data straight to ES?
  3. Would ES allow me to get data back quick enough to draw a dashboard for a customer without loads of delay whilst it runs the query?

Sorry for the newbie questions, I've read a fair few docs but thought it'd be best to just ask if ES is the right tooling for this job.

  1. It's a perfect fit, from what I know. In terms of graphing, I can't tell you how much I love Kibana ^^
  2. If your data is already structured as JSON, then just have the PHP workers send it to ES. Otherwise, you may need to use Logstash to parse the data with regex or grok ...
  3. It depends on the amount of data, but you can definitely get data back in milliseconds.

That's cool, so I can store JSON straight in to ES?

That'd be awesome as obviously I could just build a PHP flat array of the data (IP, country, time, etc.) and then json_encode it and pass it to ES. Would I need to do much in terms of optimisation? I'm going to read up on some tutorials and maybe buy a book I think.

Kibana is good, I've played with it on someone else's cluster, but in this use case all data needs to be served through my custom LAMP stack web app.

One thing I'm confused about from reading PHP ES searching tutorials is how do I format my results? For example, I want to get back that there were 1000 hits, 900 from USA, 50 from UK, 50 from Australia, how do I get ES to tell me that without having to get ALL the results back between the two dates, and then count etc. in PHP?

That's cool, so I can store JSON straight in to ES?

ES is essentially a JSON document database optimized for text searching.
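
To make that concrete, here is a rough sketch of what "build an array, encode it, send it" could look like. It's written in Python rather than PHP, but the body is plain JSON, so the same structure works with json_encode. The index name `hits`, the field names, and the local URL are all made up for illustration, and the `/_doc` endpoint assumes a reasonably recent Elasticsearch version (older releases used a type name in the path).

```python
import json
import urllib.request
from datetime import datetime, timezone

ES_URL = "http://localhost:9200"   # assumption: a local single-node cluster
INDEX = "hits"                     # hypothetical index name


def build_hit(page, ip, country, referrer, params, when=None):
    """Return one hit record as a plain dict (field names are illustrative)."""
    when = when or datetime.now(timezone.utc)
    return {
        "page": page,
        "ip": ip,
        "country": country,
        "referrer": referrer,
        "params": params,              # GET variables, e.g. {"ref": "google"}
        "timestamp": when.isoformat(),
    }


def build_index_request(hit):
    """Prepare (but do not send) the HTTP request that indexes one hit."""
    return urllib.request.Request(
        url=f"{ES_URL}/{INDEX}/_doc",
        data=json.dumps(hit).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


hit = build_hit(
    "home", "203.0.113.7", "US", "https://www.google.com/",
    {"ref": "google", "colour": "red"},
    when=datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc),
)
req = build_index_request(hit)
# To actually send it to a running cluster: urllib.request.urlopen(req)
```

For a busy site you would probably batch hits through the bulk API instead of sending one request per hit, but the document itself looks the same.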

One thing I'm confused about from reading PHP ES searching tutorials is how do I format my results? For example, I want to get back that there were 1000 hits, 900 from USA, 50 from UK, 50 from Australia, how do I get ES to tell me that without having to get ALL the results back between the two dates, and then count etc. in PHP?

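Use a terms aggregation on the country field, with a range filter on the timestamp for your two dates. ES does the counting and returns one bucket per country with its hit count, so you never have to pull the matching documents back into PHP.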
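
As a rough sketch, the search body for the per-country breakdown might look like the one built below. The field names (`page`, `country`, `timestamp`) are assumptions carried over from a hypothetical hit document; `country` and `page` would need to be mapped as keyword fields for this to work, and the `hits.total.value` shape assumes Elasticsearch 7 or later. The sample response is hand-written to mirror the numbers in the question, not output from a real cluster.

```python
import json


def country_breakdown_query(page, start, end, max_countries=50):
    """Build a search body that counts hits per country in a date range."""
    return {
        "size": 0,                      # return no documents, only counts
        "track_total_hits": True,       # ask for an exact total
        "query": {
            "bool": {
                "filter": [
                    {"term": {"page": page}},
                    {"range": {"timestamp": {"gte": start, "lte": end}}},
                ]
            }
        },
        "aggs": {
            "by_country": {
                "terms": {"field": "country", "size": max_countries}
            }
        },
    }


def parse_breakdown(response):
    """Pull the total and the per-country counts out of a search response."""
    total = response["hits"]["total"]["value"]
    buckets = response["aggregations"]["by_country"]["buckets"]
    return total, {b["key"]: b["doc_count"] for b in buckets}


query = country_breakdown_query("home", "2024-01-01", "2024-01-31")
request_body = json.dumps(query)   # POST this to /hits/_search

# Hand-written sample of the relevant parts of a response (illustrative only).
sample_response = {
    "hits": {"total": {"value": 1000, "relation": "eq"}, "hits": []},
    "aggregations": {
        "by_country": {
            "buckets": [
                {"key": "US", "doc_count": 900},
                {"key": "GB", "doc_count": 50},
                {"key": "AU", "doc_count": 50},
            ]
        }
    },
}
total, per_country = parse_breakdown(sample_response)
```

Setting `size` to 0 is what keeps the response small: ES sends back only the total and the buckets, which is all a dashboard table needs.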