APTOGON — Human Verification

AI agents that drive browsers through automation tools such as Playwright, Puppeteer, and Selenium, or through vision models with computer-use capabilities such as GPT-4V, have gotten remarkably good at impersonating humans. They solve text CAPTCHAs, navigate complex UIs, and generate plausible form submissions. A naive "is this a bot?" check based only on JavaScript environment variables or mouse movement will fail against a well-configured headless browser.

Gesture verification works on a different premise: don't try to detect a bot environment. Instead, verify that a human hand performed a specific physical movement at a specific time, and make that verification cryptographically unforgeable.

Layer 1: Canvas Fingerprinting

When a browser renders a WebGL canvas, the output varies subtly based on the GPU driver, operating system font renderer, and hardware. These variations are deterministic — the same hardware produces the same fingerprint — but differ measurably across hardware configurations.

Headless Chrome running in a cloud VM typically produces a different canvas fingerprint than Chrome on a physical MacBook or Android device. Even when bots spoof the User-Agent and JS environment, it is much harder for them to change how the underlying GPU driver rasterizes WebGL geometry. This gives us a hardware-correlated signal before the gesture even begins.
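As a rough illustration of the idea (not APTOGON's actual implementation), a fingerprint can be thought of as a hash over the pixels read back from the rendered canvas. The function name `canvas_fingerprint` and the tiny pixel buffers below are hypothetical; a real readback would come from the browser and be far larger.

```python
import hashlib


def canvas_fingerprint(pixels: bytes) -> str:
    """Hash a raw RGBA pixel readback into a short, comparable fingerprint.

    Hypothetical helper for illustration only.
    """
    return hashlib.sha256(pixels).hexdigest()[:16]


# Two made-up readbacks of the same scene. They are identical except for one
# edge pixel, the kind of small difference a different GPU driver or software
# rasterizer can introduce.
macbook_pixels = bytes([255, 0, 0, 255] * 4)
cloud_vm_pixels = bytes([255, 0, 0, 255] * 3 + [254, 0, 0, 255])

fp_macbook = canvas_fingerprint(macbook_pixels)
fp_cloud_vm = canvas_fingerprint(cloud_vm_pixels)
fp_macbook_again = canvas_fingerprint(macbook_pixels)
```

The same readback always hashes to the same value, while a single differing pixel yields an unrelated fingerprint, which is what makes the signal both stable and hardware-correlated.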

Layer 2: Neuromuscular Gesture Biometrics

A human performing a gesture leaves a characteristic signature in the raw touch/pointer data: velocity curves, acceleration peaks, micro-tremors, pressure variation (on touch devices), and the specific timing of segment transitions. This signature reflects the neuromuscular control of a human hand — not cognitive ability, but physical biomechanics.

AI agents fail here in two ways:

1. Synthetic mouse movement libraries (like ghost-cursor) generate Bézier-curve-based trajectories that look smooth and human-like to the eye, but their velocity distributions don't match real human jitter at a statistical level.
2. Touch pressure and timing on mobile devices require actual hardware contact. Cloud-based agents operating through browser APIs cannot generate real pressure events: they either omit them entirely (detectable) or generate implausibly uniform pressure (also detectable).

The gesture backend runs the raw pointer event stream through a classifier trained on millions of real human gestures. Synthetic inputs cluster differently in feature space, consistently enough that they can be flagged in real time.
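To make the two failure modes concrete, here is a toy sketch of the kind of features such a pipeline might extract. It is a stand-in heuristic, not the trained classifier described above; the `PointerSample` type, the feature names, and both thresholds are assumptions chosen for illustration.

```python
import math
import statistics
from typing import NamedTuple


class PointerSample(NamedTuple):
    t_ms: float      # timestamp in milliseconds
    x: float
    y: float
    pressure: float  # 0.0 to 1.0, as reported by the pointer event


def segment_speeds(samples):
    """Speed (pixels per millisecond) between consecutive samples."""
    speeds = []
    for a, b in zip(samples, samples[1:]):
        dt = b.t_ms - a.t_ms
        if dt > 0:
            speeds.append(math.hypot(b.x - a.x, b.y - a.y) / dt)
    return speeds


def gesture_features(samples):
    """Two illustrative features: speed variability and pressure variability."""
    speeds = segment_speeds(samples)
    if len(speeds) < 2:
        raise ValueError("need at least three samples with increasing timestamps")
    mean_speed = statistics.fmean(speeds)
    return {
        "speed_cv": statistics.pstdev(speeds) / mean_speed if mean_speed else 0.0,
        "pressure_stdev": statistics.pstdev([s.pressure for s in samples]),
    }


def looks_synthetic(features, min_speed_cv=0.05, min_pressure_stdev=0.005):
    """Flag inputs that are too uniform. Thresholds are hypothetical."""
    return (features["speed_cv"] < min_speed_cv
            or features["pressure_stdev"] < min_pressure_stdev)


# A perfectly regular trace with constant pressure, as a script might emit.
synthetic = [PointerSample(i * 10, i * 5.0, 0.0, 0.5) for i in range(6)]

# A made-up human-like trace with irregular timing, speed, and pressure.
human = [
    PointerSample(0, 0.0, 0.0, 0.31),
    PointerSample(16, 3.1, 0.4, 0.38),
    PointerSample(33, 9.8, 1.9, 0.45),
    PointerSample(49, 14.2, 2.2, 0.44),
    PointerSample(66, 22.9, 4.8, 0.52),
    PointerSample(82, 27.5, 5.1, 0.47),
]
```

A production system would use many more features and a learned decision boundary rather than two fixed cutoffs. The point here is only that uniformity in speed or pressure is measurable from the raw event stream.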

Layer 3: Device-Bound DID

The DID (Decentralized Identifier) is derived from hardware characteristics, and its private key is stored in a platform keystore (TPM on Windows, Secure Enclave on iOS/macOS, StrongBox on Android). The private key never leaves the secure element.

This matters because even if a bot successfully mimics the gesture biometrics, the verification request must still be signed with the DID's private key, which is non-exportable, hardware-bound material. A cloud-based bot doesn't have a hardware Secure Enclave. It has a software key store that, when inspected, lacks the hardware attestation certificates.

For environments without hardware keystore access (such as desktop browsers without TEE access), the DID is derived from a stable combination of hardware identifiers and stored encrypted in IndexedDB. It's not as strong as hardware attestation, but it is stable across sessions and correlated with specific hardware, which makes it costly to rotate.
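One way to picture the fallback derivation is as a hash over a canonical encoding of the identifiers. The sketch below is an assumption about how this could work, not the documented scheme: the function name, the identifier keys, and the `did:example:` prefix (the placeholder method used in DID specification examples) are all illustrative.

```python
import hashlib


def derive_fallback_did(hardware_ids: dict[str, str]) -> str:
    """Derive a stable identifier from a set of hardware identifiers.

    Keys are sorted so the result does not depend on collection order.
    Hypothetical helper; the real identifier set is not specified here.
    """
    canonical = "\n".join(f"{key}={hardware_ids[key]}" for key in sorted(hardware_ids))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"did:example:{digest[:32]}"


# Made-up identifier values for illustration.
session_one = {"canvas_fp": "a3f1c2d4", "cpu_cores": "8", "screen": "2560x1600"}
session_two = {"screen": "2560x1600", "canvas_fp": "a3f1c2d4", "cpu_cores": "8"}
other_device = {"canvas_fp": "9b7e0012", "cpu_cores": "8", "screen": "2560x1600"}

did_one = derive_fallback_did(session_one)
did_two = derive_fallback_did(session_two)
did_other = derive_fallback_did(other_device)
```

Because the identifier is a deterministic function of the hardware, the same device yields the same DID across sessions, and obtaining a fresh DID means changing the hardware-correlated inputs.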

Layer 4: Challenge Freshness

Each verification request contains a server-generated nonce and an expiry timestamp, and is valid for 60 seconds. The signature covers the nonce along with the gesture data, which prevents replay attacks in which a bot records a valid human gesture and reuses it.

```json
{
  "challenge": {
    "nonce": "7f4a2c9e...",
    "expires_at": 1732801234,
    "gesture_path": [3, 1, 4, 1, 5, 9]
  }
}
```

The gesture path specifies which segments to trace, and the pattern is different for each challenge. A bot cannot pre-compute or cache a valid gesture because the required path changes with every request.
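The freshness rules can be sketched as a small server-side check. This is a minimal illustration under stated assumptions: the function names, the in-memory store, the six-segment path, and the 0 to 9 segment range are hypothetical, and signature verification against the DID's key is deliberately left out.

```python
import secrets
import time

CHALLENGE_TTL_SECONDS = 60

# In-memory store of outstanding challenges, keyed by nonce (illustration only).
_issued = {}


def issue_challenge(now=None):
    """Create a challenge with a fresh nonce, expiry, and random gesture path."""
    now = time.time() if now is None else now
    challenge = {
        "nonce": secrets.token_hex(16),
        "expires_at": int(now) + CHALLENGE_TTL_SECONDS,
        # Assumed: six segments, each identified by a digit from 0 to 9.
        "gesture_path": [secrets.randbelow(10) for _ in range(6)],
    }
    _issued[challenge["nonce"]] = challenge
    return challenge


def check_response(nonce, traced_path, now=None):
    """Check freshness and path; signature verification is omitted here."""
    now = time.time() if now is None else now
    # pop() removes the nonce, so a second use of the same nonce is rejected.
    challenge = _issued.pop(nonce, None)
    if challenge is None:
        return "unknown_or_replayed_nonce"
    if now > challenge["expires_at"]:
        return "expired"
    if traced_path != challenge["gesture_path"]:
        return "wrong_path"
    return "ok"


first = issue_challenge(now=1000)
result_fresh = check_response(first["nonce"], first["gesture_path"], now=1030)
result_replay = check_response(first["nonce"], first["gesture_path"], now=1031)

second = issue_challenge(now=1000)
result_late = check_response(second["nonce"], second["gesture_path"], now=1061)
```

Consuming the nonce on first use is what blocks replay, and the expiry check is what rules out recording a gesture now and submitting it later.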

Layer 5: Bond Graph Clustering

A DID that passes all the above checks in isolation might still be suspicious if it exists in a vacuum. APTOGON's bond graph tracks social connections between verified humans — explicit vouching relationships, shared community membership, interaction history.

A newly created DID with no social connections, no history, and a suspicious canvas fingerprint is assigned a lower trust score even if its gesture verification passes. This is analogous to how email spam filters use sender reputation alongside content analysis.

Legitimate new users build trust through normal activity. Bot farms can't replicate years of organic social interaction at scale.
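A simplified way to see how these signals might combine is a weighted score. Everything in this sketch is an assumption made for illustration: the `trust_score` function, its inputs, the weights, and the caps are not taken from APTOGON's actual scoring.

```python
def trust_score(gesture_passed, vouches, account_age_days, fingerprint_suspicious):
    """Combine verification and bond-graph signals into a score from 0.0 to 1.0.

    All weights below are hypothetical.
    """
    if not gesture_passed:
        return 0.0
    score = 0.4                                        # baseline for a passing gesture
    score += min(vouches, 5) * 0.08                    # vouching, capped at 5 vouches
    score += min(account_age_days / 365, 1.0) * 0.2    # history, capped at one year
    if fingerprint_suspicious:
        score -= 0.2
    return max(0.0, min(1.0, round(score, 2)))


# A brand-new DID with no connections and a suspicious canvas fingerprint.
new_isolated = trust_score(True, vouches=0, account_age_days=0,
                           fingerprint_suspicious=True)

# An established DID with vouches and two years of history.
established = trust_score(True, vouches=5, account_age_days=730,
                          fingerprint_suspicious=False)
```

Both DIDs pass gesture verification, but the isolated one ends up with a much lower score, mirroring how sender reputation shifts a spam filter's verdict on otherwise similar content.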

Why This Combination Holds

Each layer can be bypassed individually with enough effort. The combination is far harder to beat, for a simple economic reason: the cost of defeating all five layers simultaneously exceeds the value of a single fake verification.

| Layer | What bots must fake | Cost to fake |
| --- | --- | --- |
| Canvas fingerprint | Real GPU hardware | High (dedicated physical hardware per DID) |
| Gesture biometrics | Human neuromuscular pattern | High (human in the loop, or ML model that still clusters as synthetic) |
| Device-bound DID | Hardware Secure Enclave with real attestation cert | Extremely high (requires physical hardware) |
| Challenge freshness | Real-time human response within 60 seconds | High (removes async/batch operation) |
| Bond graph | Years of organic social history | Near-impossible at scale |

The goal isn't zero false negatives — it's making fraud economically unviable. When the cost of a fake verification exceeds its value, the attack stops.

For platforms that need it, the on-chain HumanCredential serves as a final anchor: the Aptos blockchain holds an immutable, auditable record that a specific DID passed verification at a specific time. Even if an attacker eventually gets past all five layers, the resources spent on each fake verification keep the operation from scaling.

How Gesture Verification Detects AI Agents