The cluster labels are known only after performing one search query.
So if my search query is:
{
  "search_request": {
    "fields": [
      "Content"
    ],
    "query": {
      "match": {
        "_all": "mobile"
      }
    },
    "size": 1000,
    "from": 0
  },
  "query_hint": "mobile",
  "field_mapping": {
    "content": [
      "fields.Content"
    ]
  },
  "algorithm": "lingo3g"
}
I will get the results as mentioned in my previous post.
So what query should I execute if I know that the "mobile" search produces a cluster labelled "Nokia"?
Add the cluster's label as a boolean must-occur phrase to your search query.
By this, do you mean that I should execute something like the following?
{
  "search_request": {
    "fields": [
      "Host",
      "URL",
      "Content"
    ],
    "query": {
      "bool": {
        "must": {
          "term": {
            "Content": "Nokia"
          }
        }
      }
    },
    "size": 1000,
    "from": 0
  },
  "query_hint": "Nokia",
  "field_mapping": {
    "content": [
      "fields.Content"
    ]
  },
  "algorithm": "lingo3g"
}
It would probably be better to add a filtering expression on top of
your original query instead of replacing it entirely (so that only a
subset of the original documents matches).
Add a "filter" clause to your original "search_request".
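A minimal sketch of that idea in Python, for illustration only. The helper name narrow_to_cluster is hypothetical, and using a match_phrase clause on the "Content" field is an assumption; depending on your ES version you may prefer a dedicated filter clause instead of a second "must" entry. The point is only that the original query is kept and the cluster label is added next to it:

```python
import copy
import json


def narrow_to_cluster(request, label, field="Content"):
    """Return a copy of `request` whose query must match both the
    original query and the cluster label (as a phrase).

    Hypothetical helper; `field` is assumed to hold the clustered text.
    """
    narrowed = copy.deepcopy(request)  # leave the caller's request untouched
    search = narrowed["search_request"]
    search["query"] = {
        "bool": {
            "must": [
                search["query"],                    # original query, kept as-is
                {"match_phrase": {field: label}},   # cluster label must occur
            ]
        }
    }
    return narrowed


original = {
    "search_request": {
        "fields": ["Content"],
        "query": {"match": {"_all": "mobile"}},
        "size": 1000,
        "from": 0,
    },
    "query_hint": "mobile",
    "field_mapping": {"content": ["fields.Content"]},
    "algorithm": "lingo3g",
}

print(json.dumps(narrow_to_cluster(original, "Nokia"), indent=2))
```

Because the original "mobile" query stays in place, the narrowed request can only return documents that the first request could also have matched.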
Adding the filter clause will filter the data and return the cluster labels. But if I search with a cluster label as a filter, I will not necessarily get the same records (the same set of record IDs in that particular cluster) as I got from the former query.
If I search for "mobile" on this page, it returns the records in the main body and the labels in the left panel.
Clicking a label in the left panel then displays the records for that particular label in the main body.
This is the result I want in the end, so any idea how I can proceed?
That is, firing the query gives me the cluster labels; now I want to display the data for each particular label on a user event.
Snapshot attached for reference (cluster.bmp).
Every clustering request may hit a different collection of documents,
so splitting it into multiple requests may not be a good idea.
The solution I offered will sort of work -- you just need to issue a
regular search query (filtered with the cluster's label); the
documents returned from Elastic search will revolve around the
cluster's phrase and the original phrase, so it should be all right.
The code for the web site you provided a snapshot of is open source
and is part of the Carrot2 project. The user interface there works
like this: one query fetches both documents and clusters, and the
filtering is then done on the client side (showing only the subset of
the original query's documents that belongs to the selected cluster).
You could do the same with the clustered results returned from ES --
the clusters contain document identifiers that link them back to the
content of the original search query.
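To make the client-side approach concrete, here is a rough sketch in Python (the real UI does this in JavaScript). It assumes the clustering response has a "clusters" list whose entries carry a "label" and a "documents" list of hit _id values; check the response you actually get, since that shape is an assumption here. The function name documents_by_label and the sample data are made up for illustration:

```python
def documents_by_label(response):
    """Map each cluster label to the hits it references, using only the
    data already present in a single clustering response."""
    hits_by_id = {hit["_id"]: hit for hit in response["hits"]["hits"]}
    grouped = {}
    for cluster in response.get("clusters", []):
        grouped[cluster["label"]] = [
            hits_by_id[doc_id]
            for doc_id in cluster.get("documents", [])
            if doc_id in hits_by_id  # skip IDs that are not among the hits
        ]
    return grouped


# Made-up response, for illustration only.
sample_response = {
    "hits": {"hits": [
        {"_id": "1", "fields": {"Content": ["Nokia mobile phone"]}},
        {"_id": "2", "fields": {"Content": ["Mobile home insurance"]}},
        {"_id": "3", "fields": {"Content": ["Nokia mobile accessories"]}},
    ]},
    "clusters": [
        {"label": "Nokia", "documents": ["1", "3"]},
        {"label": "Insurance", "documents": ["2"]},
    ],
}

by_label = documents_by_label(sample_response)

# Simulates the user clicking the "Nokia" label: no further query is sent.
for hit in by_label["Nokia"]:
    print(hit["_id"], hit["fields"]["Content"][0])
```

Built once per search, this mapping lets a click on a label redisplay the stored hits without another request, so the records shown always match the cluster exactly.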
I was also looking to fire one query that returns the results as well as the clusters, but I got stuck when it came to displaying the records for a particular label.
The code for the web site you provided a snapshot of is open source and is part of the Carrot2 project.
Can I get the UI code for this so that I can correlate my requirements with this particular behaviour?
the clusters contain document identifiers that link them back to the content of the original search query.
How can I achieve this, i.e. retrieve the identifiers for a particular label and display the matching records in the UI?
I'm sorry but you need to take some initiative too. Questions like this one:
Can I get the UI code for this so that I can correlate my requirements with this particular behaviour?
really discourage me from trying to help you. I've pointed you at the
project; it's really five minutes' work to check it out from GitHub and
look at where the web application is.
I hope you won't take it personally, but I think you should read this
document. Smart questions yield better answers.
Sorry if I have asked a question that is not relevant to the forum.
I have already explored the project repository, and what I found for the web app is a WAR file at http://project.carrot2.org/download.html
But I was wondering whether there is any source code (HTML files) rather than WAR files, so that I could check how the HTML pages are rendered.
Yes, I checked this link as well, down to the web app, which has all of its code in Java.
My assumption was that if the search is done just through a JSON request from the HTML pages, some information is stored in cookies on the client side, and the results are later retrieved through the HTML pages alone when a label name is clicked.
As I don't have working experience in Java, I didn't go through the code in depth, and I was thinking there would be just some HTML files to retrieve the data (like the data mining example in the carrot2 plugin).
The part you're interested in is in JavaScript. It won't be easy to
pull out and it's not a particularly nice piece of code (we plan to
change it to something nicer, but there's never enough time).
Yes, that's what I was looking for throughout the source code and was not able to pull out.
What I wanted to learn from the source code is how the data is handled on the web page after the first query is fired (the one that returns the set of cluster labels, which the user can then navigate through), but I was unable to locate that in the source code.
Not without some code-writing (in Java). You'd have to add an
IDocumentSource implementation that would access your ES for documents
matching user's query. This isn't that difficult, but requires some
changes to the source code and recompiling Carrot2. Look at the Solr
document source, for example; ES would be quite similar:
When I click a label name on the left-hand side of search.carrot2.org, how exactly is the query sent to ES?
That is:
Is any cookie or structure maintained in the web GUI for the document IDs of each label, so that on a click event we can fetch the records with the specified IDs from ES?
Or are we sending another query to ES when a label is clicked to retrieve the data for that label? If yes, what type of query is it (a normal search query by IDs, or some other query)?
Note: Search on search.carrot2.org is not working properly; it throws the exception org.carrot2.source.etools.IpBannedException: org.apache.http.client.HttpResponseException: Forbidden
This search is not driven by ES, so your question has no answer.
Is any cookie or structure maintained in the web GUI for the document
IDs of each label, so that on a click event we can fetch the records
with the specified IDs from ES?
All documents and clusters are stored in the user interface. Clusters
map to document IDs and the filtering/ redisplay is done via JS,
without additional queries.
Note: Search on search.carrot2.org is not working properly; it throws
the exception org.carrot2.source.etools.IpBannedException:
org.apache.http.client.HttpResponseException: Forbidden
It is working properly. You've exceeded the request limit for your IP
address, it's a simple spam-blocker.