Cluster query to fetch particular label records in carrot2


(Prashant Agrawal) #1

Hi ES Users,

Is there a specific query that can be used to fetch the records belonging to one particular cluster label?

For example, using _search_with_clusters I got the following clusters in the response:

{
  "id": 3,
  "score": 0.9494177601684511,
  "label": "Nokia",
  "phrases": ["Nokia"],
  "documents": [
    "XQqcFFOlRDm7-FfIVz-U1A",
    "rXJT_mKpQWKB9x9Zs2ipJw"
  ]
}

{
  "id": 4,
  "score": 0.9399315182096429,
  "label": "Samsung",
  "phrases": ["Samsung"],
  "documents": [
    "rXJT_mKpQWKB9x9Zs2ipJw",
    "nxrIS71jRGqcJ1U11LPlJw"
  ]
}

{
  "id": 5,
  "score": 0.9353337839487279,
  "label": "LG",
  "phrases": ["LG"],
  "documents": [
    "2wGWpV8OQXCPWwlnsm_uCw",
    "nxrIS71jRGqcJ1U11LPlJw"
  ]
}

Now suppose I want to filter the response for one particular label, say Nokia. Is there a query I can run to get that specific set of records?


(Dawid Weiss) #2

Add the cluster's label as a boolean must-occur phrase to your search query.

Dawid

(In reply to prashy's message of Tue, Mar 11, 2014 at 10:19 AM.)
--
View this message in context: http://elasticsearch-users.115913.n3.nabble.com/Cluster-query-to-fetch-particular-label-records-in-carrot2-tp4051485.html
Sent from the ElasticSearch Users mailing list archive at Nabble.com.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/1394529582225-4051485.post%40n3.nabble.com.
For more options, visit https://groups.google.com/d/optout.



(Prashant Agrawal) #3

The cluster labels are only known after running one search query.
So if my search query is:
{
  "search_request": {
    "fields": [
      "Host",
      "URL",
      "Content"
    ],
    "query": {
      "match": {
        "_all": "mobile"
      }
    },
    "size": 1000,
    "from": 0
  },
  "query_hint": "mobile",
  "field_mapping": {
    "content": [
      "fields.Content"
    ]
  },
  "algorithm": "lingo3g"
}

I will get the results mentioned in my previous post.

So which query should I execute once I know that the "mobile" search produced a cluster labelled "Nokia"?

"Add the cluster's label as a boolean must-occur phrase to your search query."
By this, do you mean that I should execute something like:
{
  "search_request": {
    "fields": [
      "Host",
      "URL",
      "Content"
    ],
    "query": {
      "bool": {
        "must": {
          "term": {
            "Content": "Nokia"
          }
        }
      }
    },
    "size": 1000,
    "from": 0
  },
  "query_hint": "Nokia",
  "field_mapping": {
    "content": [
      "fields.Content"
    ]
  },
  "algorithm": "lingo3g"
}

Correct me if I am wrong.


(Dawid Weiss) #4

It would probably be better to add a filtering expression on top of
your original query instead of replacing it entirely (so that a subset
of the original documents matches).

Add a "filter" clause to your original "search_request".
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-filters.html

Dawid
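
A minimal Python sketch of the two suggestions so far, under stated assumptions: the original query is kept and the cluster label is added as a second required clause, so only a subset of the original documents can match. The helper name restrict_to_label is hypothetical, the "Content" field is taken from the request earlier in the thread, and using a "match_phrase" clause for the label is an assumption rather than something confirmed here. On Elasticsearch versions that support it, the label clause could be placed in a filter clause instead.

```python
import copy


def restrict_to_label(clustering_request, label, field="Content"):
    """Return a copy of the request whose query also requires `label` as a phrase.

    Hypothetical helper: the original query stays in place and the label is
    added as a second must-occur clause, so the result is a subset of the
    documents matched by the original query.
    """
    request = copy.deepcopy(clustering_request)  # leave the caller's dict untouched
    search_request = request["search_request"]
    original_query = search_request["query"]
    search_request["query"] = {
        "bool": {
            "must": [
                original_query,
                {"match_phrase": {field: label}},
            ]
        }
    }
    return request


# The clustering request from earlier in the thread (stored fields omitted).
original = {
    "search_request": {
        "query": {"match": {"_all": "mobile"}},
        "size": 1000,
        "from": 0,
    },
    "query_hint": "mobile",
    "field_mapping": {"content": ["fields.Content"]},
    "algorithm": "lingo3g",
}

narrowed = restrict_to_label(original, "Nokia")
```

The narrowed request can be sent to the same endpoint as the original one; whether the documents it returns match the cluster's document list exactly is not guaranteed.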

(In reply to prashy's message of Tue, Mar 11, 2014 at 10:33 AM.)


(Prashant Agrawal) #5

Adding the filter clause will filter the data and return cluster labels. But if I search with a cluster label as the filter, I will not necessarily get the same records (the same set of record IDs in that particular cluster) as I got from the original query.

To explain my problem in terms of the Carrot2 clustering website: http://search.carrot2.org/

If I search for "mobile" on that page, it returns the records in the main body and the labels in the left panel.
Clicking a label in the left panel displays the records for that particular label in the main body.

This is the result I ultimately want, so any idea how I can proceed?
That is, after firing the query I get the cluster labels; now I want to display the data for each label on a user event.

Snapshot attached: http://elasticsearch-users.115913.n3.nabble.com/file/n4051495/cluster.bmp


(Dawid Weiss) #6

Every clustering request may hit a different collection of documents,
so splitting it into multiple requests may not be a good idea.

The solution I offered will sort of work -- you just need to issue a
regular search query (filtered with the cluster's label); the
documents returned from Elasticsearch will revolve around the
cluster's phrase and the original phrase, so it should be all right.

The code for the web site you provided a snapshot of is open source
and is part of the Carrot2 project. The way the user interface works
there is that one query fetches both documents and clusters; the
filtering is then done on the client side (showing only the subset of
documents from the original query that belongs to that particular
cluster). You could do the same with the clustered results returned
from ES -- the clusters contain document identifiers that link them
back to the content of the original search query.

Dawid
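
The client-side approach described above can be sketched in Python as follows. This is only an illustration under assumptions: the function name documents_for_label is hypothetical, and the response layout (hits under response["hits"]["hits"] with an "_id" key, clusters in a top-level "clusters" list) is assumed rather than confirmed in this thread. The cluster fields and document IDs are the ones from the first post.

```python
def documents_for_label(response, label):
    """Return the hits that belong to the cluster named `label`.

    No second request is made: the cluster's document IDs are looked up
    among the hits already returned by the single clustering request.
    """
    hits_by_id = {hit["_id"]: hit for hit in response["hits"]["hits"]}
    for cluster in response["clusters"]:
        if cluster["label"] == label:
            return [
                hits_by_id[doc_id]
                for doc_id in cluster["documents"]
                if doc_id in hits_by_id
            ]
    return []  # no cluster carries this label


# Assumed response shape, reduced to the keys the function needs.
response = {
    "hits": {
        "hits": [
            {"_id": "XQqcFFOlRDm7-FfIVz-U1A"},
            {"_id": "rXJT_mKpQWKB9x9Zs2ipJw"},
            {"_id": "nxrIS71jRGqcJ1U11LPlJw"},
            {"_id": "2wGWpV8OQXCPWwlnsm_uCw"},
        ]
    },
    "clusters": [
        {"id": 3, "label": "Nokia",
         "documents": ["XQqcFFOlRDm7-FfIVz-U1A", "rXJT_mKpQWKB9x9Zs2ipJw"]},
        {"id": 4, "label": "Samsung",
         "documents": ["rXJT_mKpQWKB9x9Zs2ipJw", "nxrIS71jRGqcJ1U11LPlJw"]},
        {"id": 5, "label": "LG",
         "documents": ["2wGWpV8OQXCPWwlnsm_uCw", "nxrIS71jRGqcJ1U11LPlJw"]},
    ],
}

nokia_hits = documents_for_label(response, "Nokia")
```

Because a document can sit in more than one cluster (rXJT_mKpQWKB9x9Zs2ipJw is in both Nokia and Samsung above), the lookup goes from cluster to documents rather than tagging each hit with a single label.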

(In reply to prashy's message of Tue, Mar 11, 2014 at 11:17 AM.)


(Prashant Agrawal) #7

I was also looking for a single query that returns both the results and the clusters, but I got stuck when it came to displaying the records of a particular label.

"The code for the web site you provided a snapshot of is open source and is part of the Carrot2 project."
Can I get the UI code for this, so that I can relate my requirements to this particular behaviour?

"the clusters contain document identifiers that link them back to the content of the original search query."
How can I achieve this, i.e. retrieve the identifiers for a particular label and display the matching records in the UI?


(Prashant Agrawal) #8
(deleted)

(Dawid Weiss) #9

I'm sorry but you need to take some initiative too. Questions like this one:

"Can I get the UI code for this, so that I can relate my requirements to this particular behaviour?"

really discourage me from trying to help you. I've pointed you at the
project, it's really 5 minutes' work to check it out from GitHub and
look at where the web application is.

I hope you won't take it personally, but I think you should read this
document. Smart questions yield better answers.

http://www.catb.org/esr/faqs/smart-questions.html#before

Dawid

On Tue, Mar 11, 2014 at 12:21 PM, prashy prashant.agrawal@paladion.net wrote:

Hi Dawid,

Any input for the above problem case?



(Prashant Agrawal) #10

Sorry if I have asked a question that is not relevant to the forum.
I had already explored the project repository, and what I found for the web app is a WAR file at http://project.carrot2.org/download.html

I was just wondering whether there is any source code (HTML files) instead of WAR files, so that I can check how the HTML pages are rendered.


(Dawid Weiss) #11

Did you try the "source code" link?...

http://project.carrot2.org/source-code.html

(In reply to prashy's message of Tue, Mar 11, 2014 at 1:06 PM.)


(Prashant Agrawal) #12

Yes, I checked this link as well, down to the web app, whose code is entirely Java.

What I assumed is that when a search is done through a JSON request from the HTML pages, some information is stored on the client side as cookies, and the results are later retrieved through the HTML pages alone when a label name is clicked.

As I have no working experience in Java, I did not go through the code in depth; I was expecting to find plain HTML files that retrieve the data (like the data mining example in the carrot2 plugin).


(Dawid Weiss) #13

The part you're interested in is in JavaScript. It won't be easy to
pull out and it's not a particularly nice piece of code (we plan to
change it to something nicer, but there's never enough time).

(In reply to prashy's message of Tue, Mar 11, 2014 at 1:24 PM.)


(Prashant Agrawal) #14

Yes, that is what I was looking for throughout the source code but was not able to pull out.

What I wanted to learn from the source code is how the data is handled on the web side after the first query is fired (the one that returns the set of cluster labels, after which the user can navigate through the labels on user events). I was unable to locate that in the source code.


(Prashant Agrawal) #15

It may be a silly question, but can we integrate the carrot2 web application with our Elasticsearch instance?


(Dawid Weiss) #16

Not without some code-writing (in Java). You'd have to add an
IDocumentSource implementation that would access your ES for documents
matching user's query. This isn't that difficult, but requires some
changes to the source code and recompiling Carrot2. Look at the Solr
document source, for example; ES would be quite similar:

Dawid

(In reply to prashy's message of Tue, Mar 11, 2014 at 2:31 PM.)


(Prashant Agrawal) #17

When I click a label name on the left-hand side of search.carrot2.org, how exactly is the query processed by ES?

i.e.

  1. Is a cookie or some other structure maintained in the web GUI that holds the doc IDs for each label, so that on a click event we can fetch the records with the specified IDs from ES?

  2. Or is another query sent to ES when a label is clicked, to retrieve the data for that label? If so, what type of query is it (a normal search query by IDs, or some other query)?

Note: Search on search.carrot2.org is not working properly; it throws the exception org.carrot2.source.etools.IpBannedException: org.apache.http.client.HttpResponseException: Forbidden


(Dawid Weiss) #18

"When I click a label name on the left-hand side of search.carrot2.org, how exactly is the query processed by ES?"

This search is not driven by ES, so your question has no answer.

"1. Is a cookie or some other structure maintained in the web GUI that holds the doc IDs for each label, so that on a click event we can fetch the records with the specified IDs from ES?"

All documents and clusters are stored in the user interface. Clusters
map to document IDs and the filtering/redisplay is done via JS,
without additional queries.

"Note: Search on search.carrot2.org is not working properly; it throws the exception org.carrot2.source.etools.IpBannedException: org.apache.http.client.HttpResponseException: Forbidden"

It is working properly. You've exceeded the request limit for your IP
address, it's a simple spam-blocker.

Dawid
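
For completeness, the "query by IDs" idea raised in the question could be sketched as below, for cases where keeping every hit in the user interface is impractical. This is not what the Carrot2 web application does, and whether re-fetching suits a given setup is an assumption. The helper name ids_query_for_cluster is hypothetical; the "ids" query itself is a standard Elasticsearch query type, and the cluster dict is the Nokia cluster from the first post.

```python
def ids_query_for_cluster(cluster):
    """Build a search body that fetches exactly the documents listed in a cluster."""
    doc_ids = list(cluster["documents"])
    return {
        "query": {"ids": {"values": doc_ids}},
        # Ask for as many hits as there are IDs so none are cut off by paging.
        "size": len(doc_ids),
    }


# The Nokia cluster as returned in the first post of this thread.
nokia_cluster = {
    "id": 3,
    "score": 0.9494177601684511,
    "label": "Nokia",
    "phrases": ["Nokia"],
    "documents": ["XQqcFFOlRDm7-FfIVz-U1A", "rXJT_mKpQWKB9x9Zs2ipJw"],
}

body = ids_query_for_cluster(nokia_cluster)
```

The resulting body is a plain search request, so it would go to the regular search endpoint rather than to _search_with_clusters.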


