How to perform bulk unique inserts into an Elasticsearch index using the .NET API?


(bdb) #1

We are currently researching elasticsearch as a replacement for our current
system, which uses table-valued parameters (TVPs,
http://msdn.microsoft.com/en-us/library/bb675163(v=vs.110).aspx) on
SQL Server 2008 R2.

The system ensures that the records about to be inserted are unique in the
table (based on a composite key).

The two data fields to be checked are UserID and ContentTitle. Instead of
checking whether both exist, a hash can be created and stored in a single field.
When a record is to be added, the hash value of the incoming record can
be checked against the hash values already in the table.
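
As a rough illustration of the hashing idea only (not the .NET API, and not the existing SQL Server implementation), here is a minimal Python sketch. The function names `composite_key_hash` and `filter_new`, the choice of SHA-256, and the JSON encoding of the pair are all assumptions made for the example.

```python
import hashlib
import json


def composite_key_hash(user_id, content_title):
    """Return a hex SHA-256 digest for the (UserID, ContentTitle) pair."""
    # Encode the pair as a JSON list so the field boundary stays unambiguous:
    # ("ab", "c") and ("a", "bc") serialize differently.
    encoded = json.dumps([user_id, content_title],
                         ensure_ascii=False, separators=(",", ":"))
    return hashlib.sha256(encoded.encode("utf-8")).hexdigest()


def filter_new(records, existing_hashes):
    """Keep records whose hash is not yet known; duplicates within the batch are dropped too."""
    seen = set(existing_hashes)
    fresh = []
    for rec in records:
        h = composite_key_hash(rec["UserID"], rec["ContentTitle"])
        if h not in seen:
            seen.add(h)
            fresh.append(rec)
    return fresh
```

Hashing an unambiguous encoding of the two fields, rather than their plain concatenation, avoids two different pairs producing the same input to the hash.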

How would this be accomplished in elasticsearch using the .Net API?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Alexander Reelsen) #2

Hey,

Two possible solutions (among a couple of others):

  1. If you want to create a hash, use that hash as the ID when indexing your
    data. The tuple of ID and type is unique in an index, so if you reindex a
    document with that hash, the data simply gets overwritten. Just make sure
    you have a good hash function so that you do not overwrite data by
    accident. You could also configure the mapping of the index so that the
    content of the hash field is used as the ID. See
    http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-id-field.html
  2. Instead of creating a hash, it might be sufficient to simply concatenate
    userId and contentTitle into a single ID (123_456, for example) and use
    this ID when indexing. This would save a couple of CPU cycles, as you don't
    need to hash, but it might not be unique, depending on the format of these
    IDs.
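
To make the second option concrete, here is a minimal sketch in Python (not the .NET client) that builds the newline-delimited JSON body the bulk API expects, with the concatenated value as `_id`. The helper names `concat_id` and `bulk_body` and the index name `content` are hypothetical, and the sketch assumes UserID is an integer so that the first underscore always marks the field boundary. Older Elasticsearch versions also expect a `_type` in the action metadata, which is left out here.

```python
import json


def concat_id(user_id, content_title):
    # Assumption: user_id is an integer, so it cannot contain "_" and the
    # first underscore unambiguously separates the two parts.
    return f"{int(user_id)}_{content_title}"


def bulk_body(records, index_name, action="create"):
    """Build a bulk request body: one action line plus one source line per record.

    action="create" asks Elasticsearch to reject a document whose ID already
    exists; action="index" overwrites it instead.
    """
    if action not in ("create", "index"):
        raise ValueError("action must be 'create' or 'index'")
    lines = []
    for rec in records:
        doc_id = concat_id(rec["UserID"], rec["ContentTitle"])
        lines.append(json.dumps({action: {"_index": index_name, "_id": doc_id}}))
        lines.append(json.dumps(rec))
    # Every line of a bulk body, including the last, must end with a newline.
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    print(bulk_body([{"UserID": 123, "ContentTitle": "456"}], "content"), end="")
```

With `create`, duplicates come back as per-item conflicts in the bulk response rather than silently replacing the stored document, which is closer to a "unique insert" than plain overwriting. If titles can be long, a hash-based ID may be preferable, since recent Elasticsearch versions cap the length of `_id`.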

Sorry, cannot tell you anything about the .NET API, but this should give
you a first hint I hope.

--Alex

In reply to bdb's post (#1) of Mon, Nov 4, 2013 at 3:56 PM.

