First of all, congrats on the 1.0 release! Thumbs up for the aggregation
framework.
I'm trying to build a system for analytics-style queries. I have a
document called event, and I have events of specific types (e.g. click,
open, etc.) per page. So for a given page I might have, for example, an
open event. The thing is that I might receive the same open event more
than once, but I want to count it only once. So I use the versioning API
and provide the same document id, which causes the version to increase.
In my queries I use the _timestamp field to determine the last document
that I counted. My problem is that since ES reindexes the document, it
updates _timestamp, so the document looks recent and my queries count it
again.
Is there a way to simply discard a document if a document with the same
id already exists, without stopping the bulk upload?
If we use op_type=create in the index request, it will probably reject
the duplicate document. But in a bulk operation, will that stop the bulk
upload, or will it generate an error for that document and move on to the
next one?
thanks
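To make the question concrete, here is a minimal sketch of a bulk request body that uses the create action for every document. The index name "analytics", the type "event", the ids, and the helper build_bulk_create_body are all hypothetical and chosen only for illustration; the sketch only builds the newline-delimited JSON payload and does not send it anywhere.

```python
import json


def build_bulk_create_body(events, index="analytics", doc_type="event"):
    """Build a newline-delimited bulk body in which every action is `create`.

    `index` and `doc_type` are hypothetical names used for illustration.
    """
    lines = []
    for event in events:
        # `create` asks ES to reject the document if the id already exists,
        # instead of overwriting it as the default `index` action would.
        action = {"create": {"_index": index, "_type": doc_type, "_id": event["id"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(event["source"]))
    # The bulk body must end with a trailing newline.
    return "\n".join(lines) + "\n"


# Hypothetical events: the id encodes page and event type, so a repeated
# "open" for the same page maps to the same document id.
events = [
    {"id": "page1-open", "source": {"page": "page1", "type": "open"}},
    {"id": "page1-click", "source": {"page": "page1", "type": "click"}},
]

body = build_bulk_create_body(events)
print(body)
```

The resulting string would be sent as the body of a _bulk request. Deriving the id from the page and event type is an assumption about how duplicates are recognised, not something stated in the post.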
On Saturday, 15 February 2014 16:53:20 UTC+2, Thomas wrote:
[...]
Just for anyone else who might find this post useful: we finally managed
to get the expected functionality as described here
Thanks
Thomas
Just to confirm: the bulk API will report only that specific doc as failed
and will continue to process all the rest.
Cheers,
Boaz
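As a rough illustration of the per-item behavior described above, the sketch below separates the items of a bulk response into created documents, skipped duplicates, and other failures. The function name split_bulk_results and the sample response are hypothetical; the sample only mimics the general shape of a bulk response (one entry per action, each with its own status), and the error text in it is a placeholder rather than a real ES message.

```python
def split_bulk_results(response):
    """Sort bulk `create` results into created ids, duplicate ids, and other failures."""
    created, duplicates, failed = [], [], []
    for item in response.get("items", []):
        result = item.get("create", {})
        status = result.get("status")
        if status == 201:
            created.append(result.get("_id"))
        elif status == 409:
            # Conflict: a document with this id already exists, so it was skipped.
            duplicates.append(result.get("_id"))
        else:
            failed.append(result)
    return created, duplicates, failed


# Hypothetical response for a two-document bulk request.
sample_response = {
    "took": 5,
    "errors": True,
    "items": [
        {"create": {"_index": "analytics", "_type": "event", "_id": "page1-open",
                    "status": 409, "error": "document already exists (placeholder text)"}},
        {"create": {"_index": "analytics", "_type": "event", "_id": "page1-click",
                    "_version": 1, "status": 201}},
    ],
}

created, duplicates, failed = split_bulk_results(sample_response)
print("created:", created)
print("duplicates skipped:", duplicates)
print("other failures:", failed)
```

Treating a 409 as an expected outcome rather than an error means a repeated event is simply dropped, and the stored document keeps its original _timestamp.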
On Tuesday, February 18, 2014 9:41:01 AM UTC+1, Thomas wrote:
[...]
Just to clarify: did you get the behavior you wanted by using the create
op_type? Looking at the ES code, I would expect the default op_type (INDEX)
not to create duplicate documents if the id, type, and version are the same.
Is this not true?