Works best when
- Sets are large (10k to millions of elements).
- Bandwidth is constrained (cuckoo hashing compresses communication).
- Elements are exact-match identifiers (addresses, LEIs, wallet hashes).
Avoid when
- Sets are very small (the DH-based variant is simpler and sufficient).
- Aggregates over the intersection are required (use the circuit-based variant).
- Fuzzy or approximate matching is required.
I2I vs I2U — context differences
Institution to institution
I2IBetween institutions, the protocol runs bilaterally over an authenticated channel with set sizes in the hundreds of thousands or millions. Parties pre-negotiate OPRF parameters and cuckoo-hash sizing, and typically wrap the protocol in legal agreements covering fair play.
Institution to end user
I2UFor end users, the user runs their side on commodity hardware. Minimum-set-size policies prevent the institution from turning the protocol into a yes/no oracle for a single user-submitted identifier. The user should only submit sets they can stand behind committing to in bulk.
Post-quantum exposure
Risk · medium- Vector
- Base OT in standard OT-extension pipelines uses elliptic-curve primitives broken by a CRQC. OPRF evaluation itself is pseudorandom and resists quantum analysis once base OT is secure.
- Mitigation
- Swap base OT for a lattice-based or code-based post-quantum OT construction; the OT-extension layer and cuckoo hashing carry over unchanged. See Post-Quantum Threats.
Components
- OPRF construction (RFC 9497 or batched-OPRF) and an OT extension library (IKNP, Silent OT, or similar).
- Cuckoo hash tables with an agreed number of hash functions and stash slots, used by both parties to pack their inputs.
- Authenticated bilateral channel for OT base setup, OPRF evaluation, and tag exchange.
- Local matcher on each side that compares received OPRF tags against locally-computed tags to surface the intersection.
Protocol
- counterparty Parties A and B agree OPRF parameters, cuckoo-hash functions, and tag length.
- counterparty Each party inserts its elements into a cuckoo hash table, producing fixed-size tables.
- counterparty A holds OPRF key k_A; B sends its table entries through an OT-based OPRF protocol and receives PRF(k_A, y) for each y without A learning y.
- counterparty Roles reverse: B holds OPRF key k_B; A sends its table entries and receives PRF(k_B, x) for each x without B learning x.
- counterparty A locally computes PRF(k_A, x) for its own elements and sends these tags to B. B does the same with PRF(k_B, y) and sends its tags to A.
- counterparty Each party compares the tags it computed via the OPRF step against the tags the other party sent, and maps equal tags back to its own plaintext elements.
Guarantees & threat model
Guarantees:
- Input privacy: non-intersecting elements are computationally hidden under OT hardness and PRF security.
- Bilateral: runs directly between two parties over an authenticated channel.
- Completeness: all shared elements are found, assuming cuckoo insertion succeeds (negligible failure probability with stash slots).
- Soundness: zero false positives under a collision-resistant PRF.
- Scalability: O(n + m) communication with small constants; amortises well as sets grow.
Threat model:
- OT and PRF security; in particular, the base OT must remain secure against the network adversary.
- Honest cuckoo construction by both parties. A malicious party can skew table layout to probe specific identifiers; commit-and-prove variants mitigate.
- Authenticated transport between the parties.
- Set-size hygiene: singleton or near-singleton submissions turn the protocol into a yes/no oracle on the submitted element.
Trade-offs
- More moving parts than the DH-based variant: an OT extension library and cuckoo-hashing machinery are required.
- Larger per-element constant factor than DH-based due to OT base setup, but linear communication amortises for large sets.
- Cuckoo hashing has a small insertion-failure probability. Stash slots or larger tables mitigate at minor cost.
- Equality-only: aggregates over the intersection require the circuit-based variant.
Example
Two custodians hold flagged-address lists under different jurisdictions. Custodian A holds 500k addresses and Custodian B holds 400k. Both run the bilateral OPRF intersection protocol. They discover 8,200 addresses appearing on both lists; the remaining 491,800 and 391,800 non-overlapping entries are not disclosed. Matched addresses feed into a joint investigation workflow.
See also
Open-source implementations
- github.com C++
Microsoft APSI: asymmetric PSI library supporting labeled and unlabeled modes with unbalanced set sizes
- github.com C++
OpenMined PSI: cardinality protocol based on ECDH and Bloom filters, with bindings for several languages
- github.com C++
Encrypto Group PSI: benchmarking suite for multiple PSI protocol variants including OT-based
- github.com C++
libOTe: OT extension library underpinning many OPRF-PSI implementations