Generating globally unique mostly-sequential keys for DB rows across multiple processes

Note: This agorithm was co-authored by Phillipa Berresford.

I wanted my domain classes to be able to reference each other by Id without having to add association properties - as I felt having all of this navigation in the model made it messy. I'd much rather have association properties within my aggregates (PurchaseOrder to PurchaseOrderLines) but not elsewhere.

This was okay when I needed to add a reference to an object that already existed in the DB, but when I was adding a reference to a new object I needed to know the object's Id before Entity Framework Core saved it to the DB.

As my tables were using DB assigned incrementing INT columns this was not possible, so it become obvious I was going to have to switch to GUID Ids. The problem with using GUIDs as clustered primary keys is that GUIDs are designed to be random. Whereas an auto-incrementing INT key will always result in new rows being appended to the end of a table, GUID keys would result in new rows being inserted at random locations in the table (which is slower).

As we have a CreatedUtc column on each of our tables we could have made those our clustered index to ensure new rows are appended at least near the end of the table (if not at the very end), however, as these values will be used in foreign key indexes it means we have the same problem for inserting GUIDs into an index instead of always appending.

What I needed was some kind of sequential GUID. Microsoft has provided the SequentialGuidValueGenerator class for this kind of thing.

The Microsoft class creates a GUID, keeps the 8 most random bytes, and then assigns a sequential number to the remaining 8 bytes (these are the bytes that SQL Server uses for sorting GUIDs). Using this approach we are able to ensure we always append to the end of the table, and the 8 random bytes ensure we don't get key clashes.

The SequentialGuidValueGenerator class starts its first sequential number on UtcNow, and then simply increments the value each time we need a new key. This is fine in a single process scenario, but when we are running our app on Azure with multiple processes running it means that the sequential value of some of the processes can be significantly lagged behind the others.

For example, if we go through a period of time where we are receiving a single job to perform at a time then we might have a single process used more often than the others. The other Azure instances either sit idle or get shut down. Then, when things get busy and we need multiple processed then either

The running (but idle) processes find themselves in demand, and continue to generate sequential numbers based on the last time they were busy.
Azure starts up new processes to handle the extra load, their sequential numbers are based on the current time whereas the already running processes' sequential numbers are based on some point in the past.

Either way, if all of our processes aren't started at approximately the same time or they don't get allocated jobs on a purely round-robin basis (each process gets exactly the same number of jobs) then we end up having multiple processes inserting into different parts of the table because their sequential numbers are based on different historical times.

So, as I am developing an Azure system, I needed an alternative.

Requirements

I want GUID length unique identifiers at runtime *before* I update the DB so I can add references to other rows that have not yet been inserted.
I need the value to be unique not only within the current process, but across processes.
The last 8 bytes need to be sequential so that new rows on SQL Server tables with clustered primary indexes are always added to (or near to) then end of the table*
The last 8 bytes must be fairly close in range across processes so we don't end up with one processes appending new rows to the end of the table and another inserting them in the middle.

*SQL Server sorts GUIDs on the last 6 or 8 bytes.

My proposed format is

8 crypto secure random bytes to identify the current process
then
8 bytes sequential time that must always be unique within the current process

Resulting in a unique Id that is not only sequential, but where across processes will be based on approximately the same time and therefore mostly be appened to the end of tables, and also to the end of indexes.

According to benchmarks, this approach is approximately 12% faster than the code in the SequentialGuidValueGenerator class.


// Gets the current DateTime.UtcNow.Ticks and adds
// an offset so we never get the same value twice
// If the resulting ticks are before UtcNow.Ticks
// then we jump the values forward so we don't return a
// value that would cause inserts to the middle of the table.
// In a single process this would ensure
// the value is always added to the end of the table.
public static class LongGenerator
{
  private static long LastValueUsed = DateTime.UtcNow.Ticks;

  public static long Next()
  {
    long result;
    long ticksNow = DateTime.UtcNow.Ticks;
    do
    {
      result = Interlocked.Increment(ref LastValueUsed);
      if (result >= ticksNow)
        return result;
    } while (Interlocked.CompareExchange(ref LastValueUsed, ticksNow, result) != result);
    return result;
  }
}


// Creates a one-off 8 byte crypto secure random
// number as the process Id when the app starts
// and then appends the time based sequence number
// as the next 8 bytes to give us a sequential
// value this is statistically globally unique
public static class SequentialGuidGenerator
{
  private static byte[] ProcessId = new byte[8];

  static SequentialGuidGenerator()
  {
    using (var rng = new RNGCryptoServiceProvider())
    {
      rng.GetBytes(ProcessId);
    }
  }

  public static Guid Next()
  {
    byte[] result = new byte[16];
    byte[] sequentialTimeStamp =
   BitConverter.GetBytes(LongGenerator.Next());
    if (!BitConverter.IsLittleEndian)
      Array.Reverse(sequentialTimeStamp);

    result[0] = ProcessId[0];
    result[1] = ProcessId[1];
    result[2] = ProcessId[2];
    result[3] = ProcessId[3];
    result[4] = ProcessId[4];
    result[5] = ProcessId[5];
    result[6] = ProcessId[6];
    result[7] = ProcessId[7];
    result[8] = sequentialTimeStamp[1];
    result[9] = sequentialTimeStamp[0];
    result[10] = sequentialTimeStamp[7];
    result[11] = sequentialTimeStamp[6];
    result[12] = sequentialTimeStamp[5];
    result[13] = sequentialTimeStamp[4];
    result[14] = sequentialTimeStamp[3];
    result[15] = sequentialTimeStamp[2];
    return new Guid(result);
  }
}

Search This Blog

A day in the life of...

Generating globally unique mostly-sequential keys for DB rows across multiple processes

Comments

Popular posts from this blog

ECO IV

Convert absolute path to relative path

Printing bitmaps using CPCL