Retrieving query stats


(Vinicius Carvalho) #1

Hello there! I'm trying to find a way to get stats on the searches our ES
nodes are serving. My first idea was to index all the search terms we
receive in a separate index. This would also help with a future effort to
add click-through analysis.
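
Logging each search as its own document is enough to get started. A minimal sketch, assuming a hypothetical query-log index whose field names (`query`, `query_normalized`, `hits`, `user_id`, `timestamp`) are made up for illustration; sending the document to ES with a client is not shown:

```python
import json
from datetime import datetime, timezone


def query_log_entry(query, hits, user_id=None, when=None):
    """Build one document for a hypothetical query-log index.

    Field names are illustrative assumptions, not an ES convention.
    """
    when = when or datetime.now(timezone.utc)
    return {
        "query": query,  # raw text exactly as the user typed it
        # lowercased, whitespace-collapsed copy so "Free  MP3" and
        # "free mp3" are counted as the same search later on
        "query_normalized": " ".join(query.lower().split()),
        "hits": hits,  # number of results returned for this search
        "user_id": user_id,  # hypothetical key for later click-through joins
        "timestamp": when.isoformat(),
    }


entry = query_log_entry(
    "Free  MP3", hits=42, user_id="u1",
    when=datetime(2012, 7, 30, 14, 41, tzinfo=timezone.utc),
)
print(json.dumps(entry))
```

Keeping something like `user_id` on each entry is what would later make it possible to join clicks back to the searches that produced them.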

So my plan was to store all the terms and later display the top terms.
I know Luke has some support for that, so I'm assuming it shouldn't be too
complicated to write a plugin of some sort for ES that uses Lucene's
internal API to retrieve term frequencies and counts.
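
Before writing a plugin against Lucene internals, the top terms can also be computed in application code straight from the logged queries. This is a swapped-in, simpler technique, not the Luke-style approach described above; a minimal sketch, assuming queries are plain strings split on whitespace:

```python
from collections import Counter


def top_terms(queries, n=10, stopwords=frozenset()):
    """Return the n most frequent terms across logged queries."""
    counts = Counter()
    for q in queries:
        # set() so a term repeated inside one query counts once for it
        counts.update(set(q.lower().split()) - stopwords)
    return counts.most_common(n)


queries = ["free mp3", "mp3 download", "free mp3 download"]
print(top_terms(queries, n=3))
```

Each term is counted once per query here, so the result reads as "how many searches mentioned this term", which is usually what a top-terms report wants.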

But I'd also like to visualize phrases, and this is where I'm a bit lost.
Let me give an example:

Suppose we have lots of searches for "free mp3", but also for
"mp3 download" and "free mp3 download". What I would like is a sort of
radix tree for the shingles:

mp3 = 3
  • free = 2
    • download = 1
    • {empty} = 1
  • download = 1
    • {empty} = 1
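
One way to read the tree above: pick a pivot term (`mp3`), keep only the queries containing it, and insert the remaining words of each query, in their original order, into a word-level trie that counts how many queries pass through each node. A minimal sketch under that interpretation; `build_tree`, `render` and the `{empty}` end marker are illustrative names:

```python
from collections import defaultdict

END = "{empty}"  # marker meaning "the query ends here", as in the tree above


class Node:
    def __init__(self):
        self.count = 0  # number of queries passing through this node
        self.children = defaultdict(Node)


def build_tree(queries, pivot):
    """Word-level trie of the queries containing `pivot` (lowercase).

    Each matching query contributes the pivot, then its remaining words
    in their original order, then the END marker.
    """
    root = Node()
    for q in queries:
        words = q.lower().split()
        if pivot not in words:
            continue
        words.remove(pivot)  # drops the first occurrence only
        node = root.children[pivot]
        node.count += 1
        for w in words + [END]:
            node = node.children[w]
            node.count += 1
    return root


def render(node, depth=0):
    """Yield 'word = count' lines, two spaces per level, busiest first."""
    for word, child in sorted(node.children.items(), key=lambda kv: -kv[1].count):
        yield "  " * depth + f"{word} = {child.count}"
        yield from render(child, depth + 1)


queries = ["free mp3", "mp3 download", "free mp3 download"]
for line in render(build_tree(queries, "mp3")):
    print(line)
# mp3 = 3
#   free = 2
#     {empty} = 1
#     download = 1
#       {empty} = 1
#   download = 1
#     {empty} = 1
```

The counts match the tree above. This version also records an `{empty}` child below `download` in the `free` branch, which the hand-drawn tree leaves out.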

Could someone share some thoughts on how to achieve that? If it turns out
to be useful, I'm willing to contribute the code as an ES plugin, in case
anyone else is interested.

Regards


(David Pilato) #2

+1 for the idea. It would help to suggest searches based on popular
searches sent by users, as Google does.
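
Suggestions of this kind can come straight from the logged query counts. A minimal sketch, assuming the raw query log fits in memory; a real setup would precompute the counts rather than rescan the log on every keystroke, and `suggest` is an illustrative name:

```python
from collections import Counter


def suggest(query_log, prefix, k=3):
    """Return up to k logged queries starting with `prefix`, most frequent first."""
    # normalize case and whitespace so variants of a query are counted together
    counts = Counter(" ".join(q.lower().split()) for q in query_log)
    prefix = prefix.lower()
    matches = [(q, c) for q, c in counts.items() if q.startswith(prefix)]
    # most frequent first; ties broken alphabetically for stable output
    matches.sort(key=lambda qc: (-qc[1], qc[0]))
    return [q for q, _ in matches[:k]]


log = ["free mp3", "Free MP3", "free mp3 download", "free movies", "mp3 download"]
print(suggest(log, "free m"))
```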

David.

In reply to Vinicius Carvalho (viniciusccarvalho@gmail.com), 30 July 2012, 16:41.

--
David Pilato
http://www.scrutmydocs.org/
http://dev.david.pilato.fr/
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs


(Otis Gospodnetić) #3

Hi Vinicius,

That might be interesting.
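You may also want to check out the Search Analytics service we run (still
all free); see the URL in the signature below. While we don't yet provide
the feedback loop you are describing, that's coming soon. You may find the
following interesting:

http://blog.sematext.com/2011/11/02/search-analytics-at-enterprise-search-summit-fall-2011-presentation/

http://blog.sematext.com/2012/01/06/relevance-tuning-and-competitive-advantage-via-search-analytics/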

Otis

Search Analytics - http://sematext.com/search-analytics/index.html
Scalable Performance Monitoring - http://sematext.com/spm/index.html

In reply to Vinicius Carvalho, Monday, July 30, 2012 10:41:00 AM UTC-4.


(Vinicius Carvalho) #4

Hi Otis, thanks a lot for the information. I'm rereading your book on
Lucene to get more info on the subject.

I'll look into the Sematext solution; it looks like a very powerful tool
for our current needs, since all I need is a way to display this type of
query information. I'm still working out how to implement it myself, and
I'm looking at word suffix trees as a possible approach.
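
A word-level suffix trie is one concrete form of that idea: insert every word suffix of every query into a single trie, and each node's count becomes the number of times the phrase spelled by its path occurs in the log. A minimal, uncompressed sketch (a true suffix tree would merge single-child chains); `build_suffix_trie` and `phrase_count` are illustrative names:

```python
from collections import defaultdict


class SuffixTrieNode:
    def __init__(self):
        self.count = 0
        self.children = defaultdict(SuffixTrieNode)


def build_suffix_trie(queries):
    """Insert every word suffix of every query into one trie.

    A node's count is the number of times the phrase spelled by the path
    from the root occurs as a contiguous word sequence; a phrase repeated
    inside one query is counted once per occurrence.
    """
    root = SuffixTrieNode()
    for q in queries:
        words = q.lower().split()
        for start in range(len(words)):
            node = root
            for w in words[start:]:
                node = node.children[w]
                node.count += 1
    return root


def phrase_count(root, phrase):
    """Occurrences of `phrase`; 0 if it never appears."""
    node = root
    for w in phrase.lower().split():
        if w not in node.children:  # membership test does not create nodes
            return 0
        node = node.children[w]
    return node.count


trie = build_suffix_trie(["free mp3", "mp3 download", "free mp3 download"])
print(phrase_count(trie, "mp3"), phrase_count(trie, "mp3 download"))
```

With the three example queries, `mp3` occurs 3 times and `mp3 download` twice. The trie grows quadratically with query length, which is usually acceptable for short search queries.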

Thanks for the valuable links.

Regards

On Monday, July 30, 2012 6:16:14 PM UTC-4, Otis Gospodnetic wrote:

Hi Vinicius,

That might be interesting.
You may also want to check out the Search Analytics service we run (still
all free) - see URL in signature below. While we don't yet provide the
feedback loop you are describing, that's coming soon. You may find the
following interesting:

http://blog.sematext.com/2011/11/02/search-analytics-at-enterprise-search-summit-fall-2011-presentation/

http://blog.sematext.com/2012/01/06/relevance-tuning-and-competitive-advantage-via-search-analytics/

Otis

Search Analytics - http://sematext.com/search-analytics/index.html
Scalable Performance Monitoring - http://sematext.com/spm/index.html


