Testing 4-validator BFT consensus in 200 lines of code — zero network sockets — Devlog

Phase 3.3 was the meat of the BFT consensus loop. Block proposal, prevote gossip, precommit gossip, 2f+1 quorum, real cross-node voting via HTTP. The kind of thing that, in any other ecosystem, you'd test by writing a docker-compose, spinning up four separate node processes, opening four RPC sockets, and running for thirty minutes hoping nothing flakes. Then you'd have a flaky integration test that fails on CI every third Tuesday and nobody knows why.

I tested it in 200 lines of vitest. No sockets. No processes. No HTTP. No mocks. The test runs in 384 milliseconds and exercises the entire BFT protocol surface.

I want to talk about how this is possible because I think it's the most underrated property of how I'm building this whole chain.

the architecture trick (which is not really a trick)

OK here's the setup. The consensus engine is a pure JavaScript class. It has no I/O. It doesn't know what HTTP is. It doesn't know what TCP is. It's never imported the node:net module. When a vote arrives at the engine — whether it's the producer's own self-signed vote or one that came over HTTP from a peer — the engine receives the same thing: a SignedVote JavaScript object. Same shape, same method, same code path.

This means the network layer is COMPLETELY external to the protocol. The HTTP server in packages/node/src/rpc/server.ts does this:

if (method === 'POST' && path === '/consensus/vote') {
  const body = await readJsonBody(req);
  const vote = SignedVoteSchema.deserialize(parseHex(body.rawVote));
  return this.json(res, 200, this.options.node.ingestConsensusVote(vote));
}

That's it. Receive HTTP request, decode SSZ bytes, hand to ingestConsensusVote. The transport is a 5-line wrapper around the protocol. The protocol does the work.

So if I want to simulate four validators voting on the same block via "the network", I don't need a network. I need a function that, when one engine ingests a vote, also calls ingestVote on the other three engines with the same vote. That's it. That's the entire gossip simulation:

const gossip = (vote) => {
  for (const node of nodes) {
    node.engine.ingestVote(vote);
  }
};

Five lines including braces. A perfect simulation of an unreliable broadcast network, except that it's 100% reliable, deterministic, and synchronous, which is exactly what you want in a unit test.

the test

Here's the actual multi-engine.test.ts, slightly trimmed for the article. This is real code, not pseudocode:

it('all 4 engines reach commit on the same block hash via gossip fan-out', () => {
  const nodes = [makeNode(), makeNode(), makeNode(), makeNode()];
  const validators = nodes.map(n => n.view);

  for (const node of nodes) {
    node.engine.startHeight({ height: 1n, validators });
    node.engine.setPhase('prevote');
    expect(node.engine.threshold()).toBe(3);  // 2f+1 for n=4
  }

  const gossip = (vote) => {
    for (const node of nodes) node.engine.ingestVote(vote);
  };

  // Each node signs its own prevote and gossips it
  for (const node of nodes) {
    gossip(buildAndSignVote({
      kind: 'prevote', chainId: CHAIN_ID, height: 1n, round: 0n,
      blockHash: BLOCK_HASH, keypair: node.keypair,
    }));
  }

  // After fan-out, every engine has 4 prevotes and is in 'precommit' phase
  for (const node of nodes) {
    expect(node.engine.getPhase()).toBe('precommit');
  }

  // Same dance for precommits
  for (const node of nodes) {
    gossip(buildAndSignVote({
      kind: 'precommit', chainId: CHAIN_ID, height: 1n, round: 0n,
      blockHash: BLOCK_HASH, keypair: node.keypair,
    }));
  }

  // Every engine is committed on the same hash
  for (const node of nodes) {
    expect(node.engine.committedHash()).toBe(expectedHashHex);
    expect(node.engine.commitResultFor(BLOCK_HASH).committed).toBe(true);
  }
});

That's roughly 30 lines of meaningful test logic. It exercises:

4 separate Dilithium3 keypairs (real post-quantum signatures, not stubs)
4 independent engine state machines
Sorted-validator-set proposer selection
2f+1 quorum math (3 of 4)
Phase auto-advancement (prevote → precommit on quorum)
Commit detection
Vote signature verification (every ingest re-verifies)
Vote-by-blockHash bucketing
Cross-engine consistency (all four reach the same commit)

That is the entire BFT protocol surface, in one test, deterministic, repeatable, sub-second.
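The quorum math in that list is small enough to sketch. This is a guess at the shape, not the actual quorumThreshold from the repo: it assumes the rule is "strictly more than two thirds of the set", which is the formula that reproduces every value in the round.test.ts output further down (3 for both n=3 and n=4, 67 for n=100). maxFaulty is a hypothetical helper added only for illustration.

```typescript
// Assumed rule: a quorum is strictly more than two thirds of the validator set.
// Hypothetical reimplementation; the real quorumThreshold may differ.
function quorumThreshold(n: number): number {
  if (n <= 0) return 0;               // empty set: nothing to agree on
  return Math.floor((2 * n) / 3) + 1; // smallest integer above 2n/3
}

// Hypothetical helper: the largest f such that n >= 3f + 1.
function maxFaulty(n: number): number {
  return n <= 0 ? 0 : Math.floor((n - 1) / 3);
}

// Rows of [n, threshold, tolerated faults] for the set sizes the suite checks.
const quorumTable = [1, 2, 3, 4, 7, 10, 100].map(
  (n) => [n, quorumThreshold(n), maxFaulty(n)] as const,
);
```

Note that the threshold only equals 2f+1 exactly when n = 3f+1 (4, 7, 10, 100). At n=3 the set tolerates zero faults but still needs all three votes.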

the byzantine test, which is the one that made me grin

Here's the test I'm proudest of, right after this one:

it('survives one byzantine validator (n=4, f=1)', () => {
  const nodes = [makeNode(), makeNode(), makeNode(), makeNode()];
  // ... setup engines, set phase ...

  const gossip = (vote) => {
    for (const node of nodes) node.engine.ingestVote(vote);
  };

  // Only the first 3 nodes participate honestly. Node 4 is silent.
  for (let i = 0; i < 3; i++) {
    gossip(buildAndSignVote({ kind: 'prevote',   /* ... */ keypair: nodes[i]!.keypair }));
  }
  for (let i = 0; i < 3; i++) {
    gossip(buildAndSignVote({ kind: 'precommit', /* ... */ keypair: nodes[i]!.keypair }));
  }

  // ALL FOUR nodes — including the silent one — should reach commit
  // because the silent node still received the gossiped votes from the
  // other three.
  for (const node of nodes) {
    expect(node.engine.commitResultFor(BLOCK_HASH).committed).toBe(true);
  }
});

Read that last assertion again. The byzantine (silent) node is in the for-loop. It still reaches committed = true. Because in a real network, even a non-participating validator passively observes the votes flying around and updates its own engine state. Once it sees 2f+1 precommits from the honest validators, it knows the block is finalized — regardless of whether it personally participated.

This is THE property that makes BFT work, and seeing it pop out of the test in one assertion was the moment I felt like the protocol was real. You don't need every validator to vote. You need enough HONEST validators to vote: a quorum of 2f+1, which is 3 of 4 here. Up to f byzantine ones can lie, stay silent, or equivocate, and the honest ones still reach consensus.

There's a third negative test for the boundary: "doesn't commit when only 2 of 4 vote". 2 < 3, no quorum, no commit. Asserts committedHash() === null for all four engines. Because tests for "this thing should NOT happen" matter just as much as tests for "this thing SHOULD happen", and most test suites I've seen are missing the negative half.
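The commit rule these three tests pin down can be sketched without the engine at all. This is a toy under stated assumptions: the Vote shape, tally, and this standalone committedHash function are illustrative names, signatures are not checked, and the threshold of 3 is simply passed in.

```typescript
// Toy vote: no signature, no height/round. Illustrative shape only.
type Vote = { validator: string; blockHash: string };

// Bucket votes by block hash; a Set per bucket drops repeated votes
// from the same validator.
function tally(votes: Vote[]): Map<string, Set<string>> {
  const buckets = new Map<string, Set<string>>();
  for (const { validator, blockHash } of votes) {
    let bucket = buckets.get(blockHash);
    if (!bucket) {
      bucket = new Set<string>();
      buckets.set(blockHash, bucket);
    }
    bucket.add(validator);
  }
  return buckets;
}

// Return the hash whose bucket reached the threshold, or null if none did.
function committedHash(votes: Vote[], threshold: number): string | null {
  for (const [hash, voters] of tally(votes)) {
    if (voters.size >= threshold) return hash;
  }
  return null;
}

const vote = (validator: string, blockHash = '0xaa'): Vote => ({ validator, blockHash });

// One silent validator: 3 of 4 still reaches quorum.
const threeOfFour = committedHash([vote('v1'), vote('v2'), vote('v3')], 3);
// Boundary case: 2 of 4 is below the threshold.
const twoOfFour = committedHash([vote('v1'), vote('v2')], 3);
// Four votes split 2/2 across two hashes: neither bucket reaches 3.
const split = committedHash(
  [vote('v1'), vote('v2'), vote('v3', '0xbb'), vote('v4', '0xbb')],
  3,
);
// One validator repeating itself three times counts once.
const duplicated = committedHash([vote('v1'), vote('v1'), vote('v1')], 3);
```

Only the first case yields a hash. The other three return null, which is the "this should NOT happen" half of the suite.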

the vitest output (the porn shot)

Here's the whole consensus suite running. I'm including every line because, look, I know my audience, you crypto nerds love a clean test feed:

$ pnpm --filter @asentum/node test:consensus

 RUN  v1.6.1 /Users/jlmbp/Documents/GitHub/AsentumChain/packages/node

 ✓ test/consensus/round.test.ts            (13 tests) 3ms
   ✓ quorumThreshold (2f+1 math)
     ✓ returns 0 for an empty set
     ✓ returns 1 for a single validator (federation mode)
     ✓ returns 2 for n=2
     ✓ returns 3 for n=3 and n=4 (tolerates f=1 at n=4)
     ✓ returns 5 for n=7 (tolerates f=2)
     ✓ returns 7 for n=10 (tolerates f=3)
     ✓ returns 67 for n=100 (tolerates f=33)
   ✓ proposerForRound
     ✓ returns null for an empty set
     ✓ deterministically rotates proposer across heights
     ✓ rotates by round at the same height
     ✓ is deterministic regardless of input order
   ✓ activeValidators filter
     ✓ drops zero-stake entries and keeps positive-stake ones
   ✓ totalBondedStake
     ✓ sums bigint-parseable stakes and ignores malformed ones

 ✓ test/consensus/sign.test.ts             (6 tests)  173ms
   ✓ voteDigest is deterministic for the same body
   ✓ voteDigest is different for prevote vs precommit
   ✓ buildAndSignVote produces a signed vote that verifies
   ✓ rejects a vote with a tampered blockHash
   ✓ rejects a vote whose body.validator doesn't match the public key
   ✓ SignedVote round-trips through SSZ encode/decode

 ✓ test/consensus/engine.test.ts           (10 tests) 238ms
   ✓ self-certifies its own prevote and precommit at threshold=1
   ✓ requires 3 prevotes and 3 precommits before quorum (n=4)
   ✓ doesn't commit if votes are split across two block hashes
   ✓ rejects votes at a different height
   ✓ rejects votes at a different round
   ✓ rejects votes with a different chainId
   ✓ rejects votes from a validator not in the active set
   ✓ deduplicates repeated votes from the same validator
   ✓ snapshot reports current height/round/phase/counts
   ✓ advanceRound bumps round and clears vote buckets

 ✓ test/consensus/wait-for-quorum.test.ts  (5 tests)  342ms
   ✓ resolves immediately if quorum is already met
   ✓ blocks until votes arrive then resolves on threshold-crossing vote
   ✓ rejects with a timeout error if quorum is never reached
   ✓ only resolves on votes for the requested blockHash
   ✓ startHeight clears stale waiters from a prior height

 ✓ test/consensus/multi-engine.test.ts     (3 tests)  384ms
   ✓ all 4 engines reach commit on the same block hash via gossip fan-out
   ✓ survives one byzantine validator (n=4, f=1)
   ✓ does NOT commit when only 2 of 4 vote (below 2f+1)

 ✓ test/consensus/validator-key.test.ts    (5 tests)  85ms
   ✓ returns null when missing and generateIfMissing=false
   ✓ generates and persists a fresh key when missing
   ✓ reloading the same path produces the same keypair
   ✓ throws if the file exists with the wrong size
   ✓ wraps a keypair and derives the address

 ✓ test/consensus/node-integration.test.ts (4 tests)  172ms
   ✓ consensus engine is primed at height 1 after genesis
   ✓ produces block 1 through the voting cycle and re-homes engine
   ✓ emits the node-produced votes via the vote listener
   ✓ produces a sequence of blocks without losing consensus coherence

 Test Files  7 passed (7)
      Tests  46 passed (46)
   Start at  23:39:01
   Duration  617ms

46 tests. 617 milliseconds. Every BFT property I care about, asserted, in under a second. This is what I want every consensus implementation to look like.
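The proposerForRound tests in that feed describe a rotation that is deterministic, moves with both height and round, and ignores input order. One rule that satisfies all of those is sketched below. It is an assumption, not the repo's code: the real function may take validator objects rather than address strings, and may combine height and round differently.

```typescript
// Hypothetical rotation rule: sort the addresses into a canonical order,
// then index by (height + round) mod n.
function proposerForRound(
  addresses: readonly string[],
  height: bigint,
  round: bigint,
): string | null {
  if (addresses.length === 0) return null;
  // Copy before sorting so the caller's array is left untouched.
  const sorted = [...addresses].sort();
  const index = Number((height + round) % BigInt(sorted.length));
  return sorted[index] ?? null;
}

const validatorSet = ['0xbb', '0xaa', '0xcc'];
const shuffledSet = ['0xcc', '0xbb', '0xaa'];

const atHeight1 = proposerForRound(validatorSet, 1n, 0n);
const atHeight2 = proposerForRound(validatorSet, 2n, 0n);
const atHeight1Round1 = proposerForRound(validatorSet, 1n, 1n);
const fromShuffled = proposerForRound(shuffledSet, 1n, 0n);
```

Sorting first is what makes every node agree on the proposer even if each one learned the validator set in a different order.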

why this is possible (the actual lesson)

The trick — and again, I cannot stress enough that this is just good systems design, not me being clever — is that the consensus engine has NO BUSINESS knowing about networking. Networking is a transport detail. The protocol is: "validators sign votes, the engine aggregates them, when 2f+1 are in agreement we commit." Whether those votes arrived via HTTP, libp2p, gRPC, raven, or a for loop in a vitest file is COMPLETELY IRRELEVANT to the protocol.

So I get to test the protocol. The transport gets its own (much smaller) tests — basically "the HTTP endpoint correctly deserializes SSZ bytes and calls ingestConsensusVote." That's a 5-line test against a mock node. Done.

The consequence of this layering is profound: the same engine code that runs in the test runs in production. There's no test harness, no mock layer, no fake network, no simulated time. The tests use the same ConsensusEngine class the producer uses to commit real blocks. If the test passes, you have very strong evidence that the production path works too. The only thing that's different is what calls ingestVote — a for loop in the test, an HTTP server in production.
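To make the layering concrete, here is a toy version of "same engine, two callers". Everything in it is a stand-in: ToyEngine is not ConsensusEngine (it ignores signatures, heights, and block hashes), and JSON stands in for the SSZ wire format.

```typescript
type Vote = { validator: string; blockHash: string };

// Stand-in engine: no I/O, just a method that accepts a vote object.
// It counts distinct voters and ignores blockHash for brevity.
class ToyEngine {
  private readonly voters = new Set<string>();
  private readonly threshold: number;

  constructor(threshold: number) {
    this.threshold = threshold;
  }

  ingestVote(vote: Vote): void {
    this.voters.add(vote.validator);
  }

  committed(): boolean {
    return this.voters.size >= this.threshold;
  }
}

// Caller 1: a transport adapter. It decodes a wire payload and hands the
// resulting object to the engine. JSON is a placeholder for the real encoding.
function handleVoteRequest(engine: ToyEngine, rawBody: string): void {
  engine.ingestVote(JSON.parse(rawBody) as Vote);
}

// Caller 2: the test harness. It skips the wire format entirely.
function gossip(engines: ToyEngine[], vote: Vote): void {
  for (const engine of engines) engine.ingestVote(vote);
}

const viaTransport = new ToyEngine(3);
const viaLoop = new ToyEngine(3);
for (const validator of ['v1', 'v2', 'v3']) {
  const vote: Vote = { validator, blockHash: '0xaa' };
  handleVoteRequest(viaTransport, JSON.stringify(vote));
  gossip([viaLoop], vote);
}
```

Both engines end up in the same state, because neither can tell which caller delivered the vote.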

the validator key sidebar

One quick aside because I think it's neat. Each test node gets its own real validator key:

function makeNode() {
  const keypair = generateKeypair();  // real Dilithium3, 1952-byte pubkey
  const view = {
    address: bytesToHexLower(addressBytesFromPublicKey(keypair.publicKey)),
    stake: '1000',
  };
  return { keypair, view, engine: new ConsensusEngine(CHAIN_ID) };
}

Real post-quantum signatures, derived addresses, real ml_dsa65 operations. No mocking. The 4-validator commit test signs eight votes per round (4 prevotes + 4 precommits), and every engine re-verifies each vote on ingestion, so that's 8 × 4 = 32 verifications. And it still finishes in 384ms, including vitest startup overhead.

That's the @noble/post-quantum library doing its job, and that's a big part of why I picked Dilithium3 over the other lattice schemes. Small enough to fit in a vote message without nightmares, fast enough that I don't think about performance in tests, and the JavaScript implementation is genuinely good.

Persisting validator keys to disk is also nice and tight: we store a 32-byte seed, not the 4032-byte expanded secret, because the keygen function is deterministic from the seed and you lose nothing by regenerating on load. Five lines of loadOrCreateValidatorKey and you're done. The test for that is also dumb-fast:

 ✓ test/consensus/validator-key.test.ts (5 tests)  85ms
   ✓ generates and persists a fresh key when missing
   ✓ reloading the same path produces the same keypair (deterministic from seed)
   ✓ throws if the file exists with the wrong size
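The seed-on-disk idea can be sketched as follows. The names and signatures are assumptions (the real loadOrCreateValidatorKey is not shown in this post), and the keygen is injected so the sketch runs without dependencies. In the real node it would presumably be the Dilithium3 keygen from @noble/post-quantum. The toyKeygen here is NOT a signature scheme, just a deterministic placeholder.

```typescript
import { existsSync, mkdtempSync, readFileSync, writeFileSync } from 'node:fs';
import { randomBytes } from 'node:crypto';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

const SEED_BYTES = 32;

type Keypair = { publicKey: Uint8Array; secretKey: Uint8Array };
type Keygen = (seed: Uint8Array) => Keypair;

// Hypothetical helper: read the seed, or create one if allowed.
function loadOrCreateSeed(path: string, generateIfMissing: boolean): Uint8Array | null {
  if (existsSync(path)) {
    const bytes = readFileSync(path);
    if (bytes.length !== SEED_BYTES) {
      throw new Error(`expected a ${SEED_BYTES}-byte seed at ${path}, found ${bytes.length}`);
    }
    return new Uint8Array(bytes);
  }
  if (!generateIfMissing) return null;
  const seed = new Uint8Array(randomBytes(SEED_BYTES));
  writeFileSync(path, seed, { mode: 0o600 }); // owner read/write only
  return seed;
}

// Only the seed is persisted; the expanded keypair is regenerated on every load.
function loadOrCreateKeypair(
  path: string,
  keygen: Keygen,
  generateIfMissing = true,
): Keypair | null {
  const seed = loadOrCreateSeed(path, generateIfMissing);
  return seed ? keygen(seed) : null;
}

// Placeholder keygen: deterministic, but not cryptography.
const toyKeygen: Keygen = (seed) => ({ publicKey: seed.slice(0, 8), secretKey: seed });

const keyPath = join(mkdtempSync(join(tmpdir(), 'validator-key-')), 'validator.seed');
const whenMissing = loadOrCreateKeypair(keyPath, toyKeygen, false);
const firstLoad = loadOrCreateKeypair(keyPath, toyKeygen);
const secondLoad = loadOrCreateKeypair(keyPath, toyKeygen);
```

Because the keygen is deterministic in the seed, the second load yields the same keypair as the first, and the file on disk stays at 32 bytes.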

what's next

Phase 3.4 is slashing. Detecting double-signing — a validator who signs two different blockHash values at the same (height, round) — and tombstoning them. Marking them as ejected, possibly burning their bond. The detection turns out to be embarrassingly easy because the engine already tracks Map<blockHash, Set<validator>> for both prevotes and precommits. Finding a double-signer is just: walk the prevote map at this height/round, collect any validator address that appears in two different blockHash buckets. If the set is non-empty, you have a slashable offense.
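That bucket walk might look something like the sketch below. It assumes the buckets for one (height, round) are available as a Map from block hash to a Set of validator addresses, as described above. findDoubleSigners and VoteBuckets are hypothetical names, and real evidence would also need the two conflicting signed votes, which this leaves out.

```typescript
// blockHash -> validators that voted for it at a single (height, round).
type VoteBuckets = Map<string, Set<string>>;

// A validator is unique within a bucket (it's a Set), so seeing the same
// address a second time means it appears under two different block hashes.
function findDoubleSigners(buckets: VoteBuckets): Set<string> {
  const seen = new Set<string>();
  const offenders = new Set<string>();
  for (const voters of buckets.values()) {
    for (const validator of voters) {
      if (seen.has(validator)) offenders.add(validator);
      seen.add(validator);
    }
  }
  return offenders;
}

// v3 prevoted for both 0xaa and 0xbb in the same round.
const prevotes: VoteBuckets = new Map([
  ['0xaa', new Set(['v1', 'v2', 'v3'])],
  ['0xbb', new Set(['v3', 'v4'])],
]);
const offenders = findDoubleSigners(prevotes);

// An honest round: every validator appears in exactly one bucket.
const honestRound: VoteBuckets = new Map([['0xaa', new Set(['v1', 'v2', 'v3', 'v4'])]]);
const noOffenders = findDoubleSigners(honestRound);
```

The same walk would apply to the precommit map, since it has the same shape.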

I'm going to write the slashing rules as a contract, of course. Because (a) the staking module is already a contract so it's the natural place to add slash(validatorAddr, evidence), and (b) the whole point of the "small extensible core, everything is a contract" thesis is to keep the protocol thin. The slashing logic is policy. Policy lives in contracts.

Then Phase 3.5: spin up four ACTUAL nodes — real processes, real HTTP, real Dilithium3 — and watch them survive me killing one of them mid-round. That's the moment we know it works for real, not just in 384ms of vitest. I'm looking forward to it.

— milkie
