Research
Latest research
Measuring what models can't fake
Most evaluations grade from text alone, flattening the reasoning and edge cases that decide correctness. We describe an expert-graded benchmark built to resist gaming, including gaming through undisclosed-AI submissions.
Dr. Helena Vogt
May 27, 2026
The case for verifiable human judgment
Frontier labs increasingly train on the judgment of human experts, yet that judgment is barely verifiable today. We argue that signed provenance and live agreement turn trust into something a reviewer can check.
Dr. Helena Vogt
April 28, 2026
Why scarce experts churn, and what it costs
Incumbents optimise for volume and speed, treating scarce specialists as interchangeable gig workers. We look at the churn this creates and how it quietly degrades data quality over time.
Pathwize Research
March 16, 2026
Detecting AI disguised as human feedback
When contractors secretly use LLMs, the data meant to capture human judgment is poisoned. We share the timing, telemetry and content signals we use to catch undisclosed-AI submissions in the loop.
Jonas Albrecht
Lena Brandt
February 9, 2026
How human expertise shapes modern AI
We explore why human judgment, skill and domain expertise remain essential to training modern AI, and how credentialed experts shape the frontier that synthetic data can't reach.
Jonas Albrecht
January 5, 2026