Glitchlings development setup#
This guide walks through preparing a local development environment, running the automated checks, and working with the Rust acceleration layer that now powers the core runtime. It is the source of truth for contributor workflow details.
Prerequisites#
- Python 3.10+
- pip and a virtual environment tool of your choice (the examples below use python -m venv)
- A Rust toolchain (rustup or system packages) and maturin for compiling the PyO3 extensions
Install the project#
- Clone the repository and create an isolated environment:
git clone https://github.com/osoleve/glitchlings.git
cd glitchlings
python -m venv .venv
source .venv/bin/activate
- Install the package in editable mode with the development dependencies:
pip install -e .[dev]
Add the prime extra (pip install -e .[dev,prime]) when you need the Prime Intellect integration and its verifiers dependency.
- Install the git hooks so the shared formatting, linting, and type checks run automatically:
Run the test suite#
Execute the automated tests from the repository root:
The suite covers determinism guarantees, dataset integrations, and the compiled Rust implementation that now backs orchestration.
Automated checks#
Run the shared quality gates before opening a pull request:
Additional tips#
- Rebuild the Rust extension after editing files under rust/zoo/:
- Regenerate the CLI reference page, Monster Manual (both repo root and docs site copies), and glitchling gallery together with:
Pattern masking#
All glitchlings support two base-class parameters for controlling which regions of text are corrupted:
exclude_patterns#
A list of regex patterns marking text that must not be modified. Matched regions are treated as immutable and passed through unchanged.
from glitchlings import Typogre

# Preserve HTML tags while corrupting surrounding text
typo = Typogre(rate=0.1, exclude_patterns=[r"<[^>]+>"])
typo("<h1>Welcome</h1> to the show!")
# -> "<h1>Welcoem</h1> to teh shwo!"

# Protect code blocks in Markdown
from glitchlings import Gaggle, Mim1c, Rushmore

gaggle = Gaggle(
    [Mim1c(rate=0.02), Rushmore(rate=0.01)],
    seed=404,
    exclude_patterns=[r"```[\s\S]*?```", r"`[^`]+`"],
)
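To make the masking behaviour concrete, here is a minimal sketch of how exclusion could work in plain Python. It is not the library's implementation (the real runtime is backed by Rust); the helper name apply_outside_matches and the use of str.upper as a stand-in "corruption" are assumptions made purely for illustration.

```python
import re
from typing import Callable


def apply_outside_matches(
    text: str,
    transform: Callable[[str], str],
    exclude_patterns: list[str],
) -> str:
    """Apply `transform` only to text not matched by any exclude pattern.

    Hypothetical helper for illustration; not part of the glitchlings API.
    """
    # Collect every protected span across all patterns, ordered by start.
    spans = sorted(
        match.span()
        for pattern in exclude_patterns
        for match in re.finditer(pattern, text)
    )
    pieces: list[str] = []
    cursor = 0
    for start, end in spans:
        if end <= cursor:
            continue  # already covered by an earlier protected span
        start = max(start, cursor)  # trim overlap with the previous span
        pieces.append(transform(text[cursor:start]))  # mutable gap
        pieces.append(text[start:end])  # protected region, copied verbatim
        cursor = end
    pieces.append(transform(text[cursor:]))  # trailing mutable text
    return "".join(pieces)


result = apply_outside_matches(
    "<h1>Welcome</h1> to the show!", str.upper, [r"<[^>]+>"]
)
print(result)  # <h1>WELCOME</h1> TO THE SHOW!
```

The tags pass through untouched while everything between them is handed to the transform, which mirrors the "immutable region" behaviour described above.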
include_only_patterns#
A list of regex patterns restricting corruption to only matched regions. Text outside these matches is treated as immutable.
from glitchlings import Typogre
# Only corrupt text inside backticks
typo = Typogre(rate=0.5, include_only_patterns=[r"`[^`]+`"])
typo("Run `echo hello` to test")
# -> "Run `ecoh helo` to test"
Gaggle-level patterns#
When patterns are set on a Gaggle, they apply to all member glitchlings and merge with any patterns set on individual glitchlings:
from glitchlings import Gaggle, Typogre, Mim1c

# Protect system tags for the entire roster
gaggle = Gaggle(
    [Typogre(rate=0.02), Mim1c(rate=0.01)],
    seed=404,
    exclude_patterns=[r"<system>.*?</system>"],
)
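One way to picture the merge is as an ordered union of the roster-wide list and each member's own list. The sketch below assumes exactly that; the function name merge_patterns, the ordering, and the de-duplication are all hypothetical, and the library may combine patterns differently.

```python
def merge_patterns(
    gaggle_patterns: list[str], member_patterns: list[str]
) -> list[str]:
    """Combine roster-wide and per-glitchling patterns, dropping duplicates.

    Hypothetical helper: assumes merging means an ordered union.
    """
    # dict.fromkeys keeps first-seen order while removing repeats.
    return list(dict.fromkeys([*gaggle_patterns, *member_patterns]))


roster_wide = [r"<system>.*?</system>"]
typogre_only = [r"`[^`]+`", r"<system>.*?</system>"]
effective = merge_patterns(roster_wide, typogre_only)
print(len(effective))  # 2
```

Under this assumption a member that already protects system tags gains nothing new from the Gaggle, while its backtick pattern stays local to that member.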
CLI usage#
Pass patterns directly in the glitchling specification:
Functional Purity Architecture#
The codebase follows a layered architecture that separates pure (deterministic, side-effect-free) code from impure (stateful, side-effectful) code, and confines defensive coding to module boundaries rather than scattering it throughout the code. This pattern improves maintainability, testability, and clarity, especially when working with AI coding agents that tend to add defensive checks everywhere.
What is Pure Code?#
Pure functions:
- Return the same output given the same inputs
- Have no side effects (no IO, logging, or external state mutation)
- Do not manipulate RNG objects directly; they accept pre-computed random values
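The last point is easiest to see with a small example. The sketch below is illustrative only: swap_adjacent and swap_adjacent_seeded are hypothetical names, not functions from the codebase. The pure function receives positions that were already drawn, while a thin impure wrapper owns the random.Random instance.

```python
import random


def swap_adjacent(text: str, positions: list[int]) -> str:
    """Pure: swap each listed character with its right-hand neighbour.

    Trusts the caller to supply positions in range(len(text) - 1).
    """
    chars = list(text)
    for i in positions:
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def swap_adjacent_seeded(text: str, seed: int, count: int) -> str:
    """Impure wrapper: seeds the RNG, draws positions, then delegates."""
    if len(text) < 2:
        return text  # nothing to swap; checked once, at the boundary
    rng = random.Random(seed)
    positions = [rng.randrange(len(text) - 1) for _ in range(count)]
    return swap_adjacent(text, positions)


print(swap_adjacent("hello", [0, 3]))  # ehlol
```

Because swap_adjacent never touches an RNG, it can be tested with fixed position lists, and all seeding decisions live in one place.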
What is Impure Code?#
Impure code includes:
- File IO (configuration loading, cache reading/writing)
- Rust FFI calls via get_rust_operation()
- RNG state management (random.Random instantiation, seeding)
- Optional dependency imports (compat.py loaders)
- Global state access (get_config(), cached singletons)
Module Organization#
The zoo subpackage organizes code by purity:
| Module | Type | Purpose |
|---|---|---|
| zoo/validation.py | Pure | Boundary validation, rate clamping, parameter normalization |
| zoo/transforms.py | Pure | Text tokenization, keyboard processing, string diffs, word splitting |
| zoo/rng.py | Pure | Seed resolution, hierarchical derivation |
| compat/types.py | Pure | Type definitions for optional dependency loading |
| conf/types.py | Pure | Configuration dataclasses (RuntimeConfig, AttackConfig) |
| constants.py | Pure | Centralized default values and constants |
| attack/compose.py | Pure | Result assembly helpers |
| attack/encode.py | Pure | Tokenization helpers |
| attack/metrics_dispatch.py | Pure | Metric dispatch logic |
| internal/rust.py | Impure | Low-level Rust FFI loader and primitives |
| internal/rust_ffi.py | Impure | Centralized Rust operation wrappers (preferred) |
| compat/loaders.py | Impure | Optional dependency lazy loading machinery |
| conf/loaders.py | Impure | Configuration file loading, caching, Gaggle construction |
Boundary Layer Pattern#
Validation and defensive code belong at module boundaries where untrusted input enters:
- CLI argument parsing (main.py)
- Public API entry points (Glitchling.__init__, Attack.__init__)
- Configuration loaders (conf/ module)
Core transformation functions inside these boundaries should:
- Trust that inputs are already validated
- NOT check for None on required parameters
- NOT re-validate types that the boundary already checked
- NOT add defensive try/except around trusted calls
Example: Correct Pattern#
# In validation.py (boundary layer)
import math

def validate_rate(rate: float | None) -> float:
    if rate is None:
        raise ValueError("rate cannot be None")
    if not isinstance(rate, (int, float)):
        raise TypeError("rate must be numeric")
    if math.isnan(rate):
        return 0.0
    # Clamp into the closed interval [0.0, 1.0]
    return max(0.0, min(1.0, float(rate)))

# In typogre.py (uses boundary layer, trusts result)
def fatfinger(text: str, rate: float, ...) -> str:
    # rate is already validated - just use it
    return keyboard_typo_rust(
        text,
        rate,
        layout,
        seed,
        shift_slip_rate=slip_rate,
        shift_slip_exit_rate=slip_exit_rate,
        shift_map=shift_map,
    )
Example: Anti-Pattern#
# DON'T: Re-validate everywhere
def some_pure_transform(text: str, rate: float) -> str:
    # Bad: re-validating what boundary should have checked
    if rate is None:
        raise ValueError("rate cannot be None")
    if not isinstance(rate, (int, float)):
        raise TypeError("rate must be numeric")
    if math.isnan(rate):
        rate = 0.0
    # ... actual logic
Import Conventions#
Pure modules must follow strict import rules:
- Pure modules can only import from:
  - Python standard library
  - Other pure modules
- Pure modules must NOT import:
  - glitchlings.internal.rust
  - glitchlings.compat
  - Any module that triggers side effects at import time
- Use TYPE_CHECKING guards for type-only imports:
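As a minimal sketch of the guard, the example below imports a type only for static analysis. pathlib.Path is used here as a stand-in for a type that would otherwise come from an impure module, and describe_source is a hypothetical function; neither is taken from the codebase.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Evaluated only by static type checkers, never at runtime, so this
    # import cannot trigger side effects when the module is loaded.
    from pathlib import Path  # stand-in for a type from an impure module


def describe_source(path: "Path | None") -> str:
    """The quoted annotation is never evaluated at runtime."""
    return "from file" if path is not None else "in memory"


print(describe_source(None))  # in memory
```

The annotation is written as a string so the name does not need to exist at runtime; type checkers still resolve it through the guarded import.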
Enforcement#
The architecture is enforced by automated tests in tests/test_purity_architecture.py:
These tests verify:
- Pure modules don't import impure modules
- Pure modules only use stdlib imports
- All pure modules have docstrings documenting their purity guarantees
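The first check can be approximated with the standard-library ast module. This is a hedged sketch of the general technique, not the contents of tests/test_purity_architecture.py: the function name find_impure_imports is hypothetical, and the deny-list simply reuses the two module paths named under Import Conventions.

```python
import ast

# Assumed deny-list for illustration; the real tests may use other rules.
IMPURE_PREFIXES = ("glitchlings.internal.rust", "glitchlings.compat")


def find_impure_imports(source: str) -> list[str]:
    """Return imported module names in `source` that match an impure prefix."""
    offenders: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        else:
            continue
        offenders.extend(
            name
            for name in names
            if any(name == p or name.startswith(p + ".") for p in IMPURE_PREFIXES)
        )
    return offenders


sample = "import math\nfrom glitchlings.compat import loaders\n"
print(find_impure_imports(sample))  # ['glitchlings.compat']
```

A test could read each pure module's source and assert that this list is empty. Note that this simplified version does not resolve relative imports and would also flag imports placed under a TYPE_CHECKING guard, which a real check would likely need to exempt.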
Why This Matters for AI Agents#
AI coding agents tend to add defensive checks everywhere. This architecture makes it explicit:
- If you're in a pure/ or transforms/ module: trust your inputs
- If you're at a boundary: validate thoroughly once
- If you're unsure: check which layer the file belongs to
This reduces noise in the codebase and keeps agent-written code consistent with human-written code.