This paper introduces the RDCSS (Restricted double compare and single swap) concurrent instruction using CAS (compare and swap).
This RDCSS instruction is then used to write MWCAS (Multi-word compare and swap).
RDCSS checks to make sure that two locations are set to the correct value provided (i.e. no other thread has mutated them) and then updates the second memory location to a value provided.
We could try something like this, but this isn’t thread-safe (between
the first line of the function and the second, something could’ve
updated *a2
, so this function is not linearizable.
uint64_t RDCSS(uint64_t *a1, uint64_t *a2, uint64_t o1, uint64_t o2, uint64_t n2) {
uint64_t a2_local = *a2;
if(*a1 == o1 && a2_local == o2) *a2 = n2;
return a2_local;
}
We can reduce this algorithm to 2PL (2 Phase Locking) like so:
uint64_t RDCSS(uint64_t *a1, uint64_t *a2, uint64_t o1, uint64_t o2, uint64_t n2) {
(a2);
lockuint64_t a2_local = *a2; // Access protected by lock on a2
(a1);
lockuint64_t a1_local = *a1; // Access protected by lock on a1
(a1);
unlockif (a1_local == o1 && a2_local == o2) *a2 = n2; // Protected by lock on a2
(a2);
unlockreturn a2_local;
}
Unfortunately, this requires locking a2
and
a1
, so it’s not correct either.
The paper sidesteps this in two ways: having a cooperative approach, and using descriptors.
If other threads notice that an address has a descriptor on it, and the thread did not place that descriptor in that memory address, they wait until the other thread is done with its work before continuing on.
This looks like this:
struct RDCSSDescriptor_t {
* a1; // control address
word_tconst word_t o1; // expected control value
* a2; // data address
word_tconst word_t o2; // expected data value
const word_t n2; // new data value
};
(RDCSSDescriptor_t* d) {
word_t RDCSSdo {
= CAS1(d->a2, d->o2, d);
r if (IsDescriptor(r)) Complete(r); // H1
} while (IsDescriptor(r));
if (r == d->o2) Complete(d);
return r;
}
(addr_t *addr) {
word_t RDCSSReaddo {
= __atomic_load(addr);
r if (IsDescriptor(r)) Complete(r);
} while (IsDescriptor(r));
return r;
}
void Complete(RDCSSDescriptor_t* d) {
= __atomic_load(d->a1);
v if (v == d->o1) CAS1(d->a2, d, d->n2);
else CAS1(d->a2, d, d->o2);
}
Now, we can implement multi-word compare and swap, which looks like 2PL (try to install descriptors into both memory addresses using RDCSS. If both operations succeed, we write the values to both, since we know this is race-free (since other threads will know to wait, since our descriptors are still there).
If either or both operations fail, the operation is marked as failed, and retried.
struct CASNDescriptor_t {
const int n; // Number of words we are updating
; // Status of overall operation
Status status[]; // Array of entries with `addr`, `old_val`, `new_val`
Entry entry}
bool CASN(CASNDescriptor_t* cd) {
if (__atomic_load(&(cd->status)) == UNDECIDED) {
// phase1: Install the descriptors
= SUCCEEDED;
status for (i = 0; (i < cd->n) && (status == SUCCEEDED); i++) {
:
retry_entry= cd->entry[i];
entry = RDCSS(new RDCSSDescriptor_t( // X1
val &(cd->status), UNDECIDED, entry.addr,
.old_val, cd));
entryif (IsCASNDescriptor(val)) {
if (val != cd) {
(val);
CASNgoto retry_entry;
} // else we installed descriptor successfully.
} else if (val != entry.old_val) {
= FAILED;
status }
}
(&(cd->status), UNDECIDED, status); // C4: Update status
CAS1}
// phase2: Roll forward/back the descriptors to values.
= (__atomic_load(&(cd->status)) == SUCCEEDED);
succeeded for (i = 0; i < cd->n; i++)
(cd->entry[i].addr, cd,
CAS1? (cd->entry[i].new_val) : (cd->entry[i].old_val));
succeeded return succeeded;
}
(addr_t *addr) {
word_t CASNReaddo {
= RDCSSRead(addr);
r if (IsCASNDescriptor(r)) CASN(r);
} while (IsCASNDescriptor(r));
return r;
}
Shockingly, according to the paper, the algorithm performs roughly as well as the lock version, even though this algorithm is lock-free.