A scan type would be awesome. I can raise a ticket for that. It
would also remove the need to work around determining the size of
the results (admittedly not a big issue unless under update load,
where the counts will drift between the count call and then a
Are there other strategies people are using to verify their index
state is consistent with some external 'truth'?
On Wednesday, 5 January 2011, Shay Banon email@example.com wrote:
The versioning is not stored as a field, but you get it back when you do a "get" or when searching (per hit). Regarding getting back all the hits for a query (can be match_all), you can use search, with scrolling. The problem is that with large result sets, it can get very expensive, even with scrolling. I am thinking of adding a search_type called "scan" that will support something like that, without the requirement to do any type of sorting (score or other fields).
On Tue, Jan 4, 2011 at 4:43 AM, Paul Smith firstname.lastname@example.org wrote:
This is great Shay, an optimistic locking pattern usage. I like it.
This reminded me of an old topic I had raised a while back: http://elasticsearch-users.115913.n3.nabble.com/Index-Verification-td1430219.html . We're beginning to use ES for some production support tools at the moment, but the real goal is to surgically remove the cancer that is our own home grown index infrastructure (going to take a while).
On the versioning stuff: is that field automatically stored? In an ideal case I'd love an API that returns all _id and _version values for all items in a given index. I can drop the sort requirement from the original post, as I can probably do that outside ES. I'm after the fastest stream of these 2 fields from an index, to be compared with another stream coming from the originating data source (our db) to validate the info in ES.
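As a rough illustration of the comparison described above, here is a minimal sketch in Python. It assumes both sides can be reduced to (id, version) pairs and that the versions are directly comparable (for example, a client-supplied version derived from the DB row), which the thread does not confirm. The function name compare_versions and the in-memory dict approach are made up for the example; a very large index would more likely need a streaming merge over sorted input.

```python
def compare_versions(es_pairs, db_pairs):
    """Compare (id, version) pairs from the index against the source of truth.

    Hypothetical helper: both arguments are iterables of (doc_id, version).
    Returns three sorted lists: ids missing from the index, ids present
    only in the index, and ids whose versions differ.
    """
    es = dict(es_pairs)
    db = dict(db_pairs)

    missing = sorted(k for k in db if k not in es)      # in DB, not in index
    extra = sorted(k for k in es if k not in db)        # in index, not in DB
    mismatched = sorted(k for k in db if k in es and es[k] != db[k])
    return missing, extra, mismatched


if __name__ == "__main__":
    index_side = [("a", 1), ("b", 2), ("x", 9)]
    db_side = [("a", 1), ("b", 3), ("c", 1)]
    print(compare_versions(index_side, db_side))
```

Loading both sides into dicts removes the need for sorted input, at the cost of holding every id in memory.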
I notice that this versioning should address issue #490? (For #491, I agree the _version information should be provided by the client, letting the client worry about what clock to use. In our case, our DB is the 'primary source of truth' and it generates a trigger-based lastupdated timestamp for the row.)
love your work (and the amount of it.. )
On 4 January 2011 13:07, Shay Banon email@example.com wrote:
The versioning feature makes it possible to handle conflicts when doing a get/search and then an index/delete. Each document has a version number, and each index and delete (tombstone) operation increases the version number.
A conflict is detected when providing a version parameter to the index/delete request and there is a mismatch between the version provided and the real time version of the current document. If a version number is not provided, then the operation is forced without doing any version check.
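To make the check concrete, here is a small in-memory model of the behaviour described above. It is only a sketch of the semantics, not Elasticsearch code: the class VersionedStore, the exception VersionConflict, and the choice to treat a never-seen document as version 0 are all assumptions made for the example.

```python
class VersionConflict(Exception):
    """Raised when the supplied version does not match the current one."""


class VersionedStore:
    """Toy model: doc_id -> (version, source); source is None for a tombstone."""

    def __init__(self):
        self._docs = {}

    def get(self, doc_id):
        # Assumption: an unknown document reports version 0 and no source.
        return self._docs.get(doc_id, (0, None))

    def _check(self, doc_id, version):
        current, source = self.get(doc_id)
        # No version supplied: the operation is forced, no check is done.
        if version is not None and version != current:
            raise VersionConflict(
                f"{doc_id}: expected version {version}, current is {current}"
            )
        return current, source

    def index(self, doc_id, source, version=None):
        current, _ = self._check(doc_id, version)
        self._docs[doc_id] = (current + 1, source)
        return current + 1

    def delete(self, doc_id, version=None):
        current, source = self._check(doc_id, version)
        found = source is not None
        # A delete leaves a tombstone and still increases the version.
        self._docs[doc_id] = (current + 1, None)
        return current + 1, found
```

A caller using this for optimistic locking would read the version with get, pass it back on index, and retry (re-read, re-apply the change) on VersionConflict.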
Search and get operations return the version number associated with the hit under _version. Note, the version number reflects the near real time status of the search. In other words, the version number will point to the doc when the index was last refreshed.
Nice side effects of this feature include returning on delete whether the document was actually found or not, as well as fixing a rare out of order replication sync between a primary and its replica (under heavy changes of the same doc within a short period of time).
The bulk operation supports providing a version number as well, using _version on each bulk item.
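As a sketch of what a versioned bulk body might look like, the Python below builds newline-delimited JSON with _version placed in each item's action line. The placement in the action metadata, the item dictionary keys, and the helper name build_bulk_body are assumptions for illustration; check the bulk API documentation for the exact format your release expects.

```python
import json


def build_bulk_body(items):
    """Build a newline-delimited bulk body for index operations.

    Hypothetical helper: each item is a dict with "index", "type", "id",
    "source" and an optional "version". When a version is given it is
    emitted as _version on that item's action line (assumed placement).
    """
    lines = []
    for item in items:
        meta = {
            "_index": item["index"],
            "_type": item["type"],
            "_id": item["id"],
        }
        if item.get("version") is not None:
            meta["_version"] = item["version"]
        lines.append(json.dumps({"index": meta}))   # action line
        lines.append(json.dumps(item["source"]))    # document line
    # Bulk bodies are newline-terminated.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    body = build_bulk_body([
        {"index": "twitter", "type": "tweet", "id": "1",
         "version": 2, "source": {"user": "kimchy"}},
        {"index": "twitter", "type": "tweet", "id": "2",
         "source": {"user": "paul"}},
    ])
    print(body, end="")
```

Items without a version are sent without _version, so they would be applied without a version check.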
The versioning support is backward compatible, though only new operations on documents will cause the versioning system to start and kick in for that doc.