On Wed, Dec 31, 2014 at 8:37 AM, N Bijalwan <ahci...@gmail.com> wrote:
I am trying to update a document to capture page visits (a hit count).
The document's id contains http://, say Comments on: About
On Wednesday, 31 December 2014 20:39:48 UTC+5:30, Nikolas Everett wrote:
That is probably a bad idea. Partial updates don't exist at the level of
on-disk storage and indexing. Elasticsearch's scripted updates are great
for preventing race conditions between concurrent updates, but the entire
old document still has to be tombstoned (marked as deleted) and a whole new
one has to be indexed. So it's probably a bad idea to implement a counter
that way.
Escaping the URL is a good suggestion, and I agree that implementing a
counter this way is not a good idea. But we are crawling pages with
ManifoldCF and want to maintain a counter from another source directly in
ES.
Is there any other way of recording attributes specific to the user who
hit the searched id in Elasticsearch, or should that be done outside
Elasticsearch?
If this is not the best way to do it, then we are planning to create a new
index type where we will store the counter as:
Although maintaining the hit count in Elasticsearch may have nothing to do
with indexing, keeping it in Elasticsearch might mean we need only one
query (a join) to fetch it.
Thanks for the useful comments. It would be great if someone could share
some relevant links along the same lines.
regards
naveen
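The URL-escaping suggestion can be illustrated with a short sketch. It assumes the raw page URL is used as the document `_id` and sent in a REST path of the form `/<index>/_doc/<id>` (the layout of recent Elasticsearch versions; the 2014-era API used `/<index>/<type>/<id>` instead), and `pages` is a hypothetical index name. Percent-encoding with `safe=""` turns `:`, `/`, `?`, `=` and `&` into escape sequences, so the whole URL travels as a single path segment.

```python
from urllib.parse import quote


def doc_path(index: str, url: str) -> str:
    """Return a REST path for a document whose _id is a raw page URL.

    quote() with safe="" percent-encodes every reserved character,
    including '/' and ':', so the URL cannot be mistaken for extra
    path segments by the HTTP layer.
    """
    return f"/{index}/_doc/{quote(url, safe='')}"


if __name__ == "__main__":
    # "pages" is a hypothetical index name used only for illustration.
    print(doc_path("pages", "http://example.com/about/"))
    # prints /pages/_doc/http%3A%2F%2Fexample.com%2Fabout%2F
```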
Thanks Nikolas. We will keep your suggestions in mind.
On Thursday, 1 January 2015 00:41:52 UTC+5:30, Nikolas Everett wrote:
Keeping the hit counter in Elasticsearch is fine so long as you don't
update it every time the page is accessed. If you push updates daily,
hourly, or on some similar schedule, you should be much better off. That
just means you can't use Elasticsearch as the accumulator. To be fair, it'll
be fine either way if you have hundreds or a couple thousand hits a day. But
you get the idea: at some point, that kind of high update rate will cause
trouble.
Nik
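Nik's advice (count outside Elasticsearch, push the totals on a schedule) combined with naveen's idea of a separate counter index could look roughly like the sketch below. It is a minimal, hypothetical example: `HitAccumulator`, the `hitcounts` index and the `hits` field are invented names, and it assumes a recent Elasticsearch whose `_bulk` API accepts `update` actions with a Painless `script` plus an `upsert` document (Elasticsearch 1.x, current when this thread was written, used Groovy or MVEL scripts instead). It only builds the request body; sending it is left to whatever HTTP client is already in use.

```python
import json
import threading
from collections import Counter


class HitAccumulator:
    """Counts page hits in memory and emits them as one _bulk request body.

    Hypothetical helper: the default index ("hitcounts") and field ("hits")
    are illustrative assumptions, not names from the thread.
    """

    def __init__(self, index: str = "hitcounts", field: str = "hits") -> None:
        self.index = index
        self.field = field
        self._counts = Counter()
        self._lock = threading.Lock()

    def hit(self, url: str) -> None:
        # Cheap in-memory increment; Elasticsearch is not contacted per hit.
        with self._lock:
            self._counts[url] += 1

    def flush(self) -> str:
        """Return an NDJSON _bulk body and reset the in-memory counts.

        Each distinct URL yields one update action: a Painless script adds
        the accumulated count, and "upsert" creates the counter document
        if it does not exist yet. Returns "" when nothing was counted.
        """
        with self._lock:
            counts, self._counts = self._counts, Counter()
        lines = []
        for url, n in sorted(counts.items()):
            # In a bulk body the _id is a JSON string, so the URL needs no
            # percent-encoding here (unlike an id placed in a URL path).
            lines.append(json.dumps({"update": {"_index": self.index, "_id": url}}))
            lines.append(json.dumps({
                "script": {
                    "source": f"ctx._source.{self.field} += params.n",
                    "lang": "painless",
                    "params": {"n": n},
                },
                "upsert": {self.field: n},
            }))
        # The _bulk API requires the body to end with a newline.
        return "\n".join(lines) + "\n" if lines else ""


if __name__ == "__main__":
    acc = HitAccumulator()
    for url in ["http://example.com/a", "http://example.com/a", "http://example.com/b"]:
        acc.hit(url)
    # In a real setup an hourly job would POST this to /_bulk with
    # Content-Type: application/x-ndjson.
    print(acc.flush(), end="")
```

With an hourly flush, 10,000 hits spread over 50 pages within one hour become 50 update operations instead of 10,000, and because the counters live in small documents of their own, each of those updates reindexes only a tiny counter document rather than the full crawled page.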
On Wed, Dec 31, 2014 at 12:32 PM, N Bijalwan <ahci...@gmail.com> wrote:
thanks Jorg, Nikolas