Skip to content

Glitchlings development setup#

This guide walks through preparing a local development environment, running the automated checks, and working with the Rust acceleration layer that now powers the core runtime. It is the source of truth for contributor workflow details.

Prerequisites#

  • Python 3.10+
  • pip and a virtual environment tool of your choice (the examples below use python -m venv)
  • A Rust toolchain (rustup or system packages) and maturin for compiling the PyO3 extensions

Install the project#

  1. Clone the repository and create an isolated environment:
git clone https://github.com/osoleve/glitchlings.git
cd glitchlings
python -m venv .venv
source .venv/bin/activate
  1. Install the package in editable mode with the development dependencies:
pip install -e .[dev]

Add the prime extra (pip install -e .[dev,prime]) when you need the Prime Intellect integration and its verifiers dependency.

  1. Install the git hooks so the shared formatting, linting, and type checks run automatically:
pre-commit install

Run the test suite#

Execute the automated tests from the repository root:

pytest

The suite covers determinism guarantees, dataset integrations, and the compiled Rust implementation that now backs orchestration.

Automated checks#

Run the shared quality gates before opening a pull request:

ruff check .
python -m mypy --config-file pyproject.toml src
uv build
pytest

Additional tips#

  • Rebuild the Rust extension after editing files under rust/zoo/:
uv build -Uq
  • Regenerate the CLI reference page, Monster Manual (both repo root and docs site copies), and glitchling gallery together with:
python -m glitchlings.dev.docs
# or, once installed: glitchlings-refresh-docs

Pattern masking#

All glitchlings support two base-class parameters for controlling which regions of text are corrupted:

exclude_patterns#

A list of regex patterns marking text that must not be modified. Matched regions are treated as immutable and passed through unchanged.

from glitchlings import Typogre

# Preserve HTML tags while corrupting surrounding text
typo = Typogre(rate=0.1, exclude_patterns=[r"<[^>]+>"])
typo("<h1>Welcome</h1> to the show!")
# -> "<h1>Welcoem</h1> to teh shwo!"

# Protect code blocks in Markdown
from glitchlings import Gaggle, Mim1c, Rushmore

gaggle = Gaggle(
    [Mim1c(rate=0.02), Rushmore(rate=0.01)],
    seed=404,
    exclude_patterns=[r"```[\s\S]*?```", r"`[^`]+`"],
)

include_only_patterns#

A list of regex patterns restricting corruption to only matched regions. Text outside these matches is treated as immutable.

from glitchlings import Typogre

# Only corrupt text inside backticks
typo = Typogre(rate=0.5, include_only_patterns=[r"`[^`]+`"])
typo("Run `echo hello` to test")
# -> "Run `ecoh helo` to test"

Gaggle-level patterns#

When patterns are set on a Gaggle, they apply to all member glitchlings and merge with any patterns set on individual glitchlings:

from glitchlings import Gaggle, Typogre, Mim1c

# Protect system tags for the entire roster
gaggle = Gaggle(
    [Typogre(rate=0.02), Mim1c(rate=0.01)],
    seed=404,
    exclude_patterns=[r"<system>.*?</system>"],
)

CLI usage#

Pass patterns directly in the glitchling specification:

glitchlings -g "Typogre(rate=0.1, exclude_patterns=['<[^>]+>'])" "<b>Bold</b> text"

Functional Purity Architecture#

The codebase follows a layered architecture that separates pure (deterministic, side-effect-free) code from impure (stateful, side-effectful) code, and requires all defensive coding to occur at module boundaries instead of all throughout. This pattern improves maintainability, testability, and clarity, especially when working with AI coding agents that tend to add defensive checks everywhere.

What is Pure Code?#

Pure functions:

  • Return the same output given the same inputs
  • Have no side effects (no IO, logging, or external state mutation)
  • Do not manipulate RNG objects directlyβ€”they accept pre-computed random values

What is Impure Code?#

Impure code includes:

  • File IO (configuration loading, cache reading/writing)
  • Rust FFI calls via get_rust_operation()
  • RNG state management (random.Random instantiation, seeding)
  • Optional dependency imports (compat.py loaders)
  • Global state access (get_config(), cached singletons)

Module Organization#

The zoo subpackage organizes code by purity:

Module Type Purpose
zoo/validation.py Pure Boundary validation, rate clamping, parameter normalization
zoo/transforms.py Pure Text tokenization, keyboard processing, string diffs, word splitting
zoo/rng.py Pure Seed resolution, hierarchical derivation
compat/types.py Pure Type definitions for optional dependency loading
conf/types.py Pure Configuration dataclasses (RuntimeConfig, AttackConfig)
constants.py Pure Centralized default values and constants
attack/compose.py Pure Result assembly helpers
attack/encode.py Pure Tokenization helpers
attack/metrics_dispatch.py Pure Metric dispatch logic
internal/rust.py Impure Low-level Rust FFI loader and primitives
internal/rust_ffi.py Impure Centralized Rust operation wrappers (preferred)
compat/loaders.py Impure Optional dependency lazy loading machinery
conf/loaders.py Impure Configuration file loading, caching, Gaggle construction

Boundary Layer Pattern#

Validation and defensive code belong at module boundaries where untrusted input enters:

  • CLI argument parsing (main.py)
  • Public API entry points (Glitchling.__init__, Attack.__init__)
  • Configuration loaders (conf/ module)

Core transformation functions inside these boundaries should:

  • Trust that inputs are already validated
  • NOT check for None on required parameters
  • NOT re-validate types that the boundary already checked
  • NOT add defensive try/except around trusted calls

Example: Correct Pattern#

# In validation.py (boundary layer)
def validate_rate(rate: float | None) -> float:
    if rate is None:
        raise ValueError("rate cannot be None")
    if not isinstance(rate, (int, float)):
        raise TypeError("rate must be numeric")
    if math.isnan(rate):
        return 0.0
    return max(0.0, min(1.0, float(rate)))

# In typogre.py (uses boundary layer, trusts result)
def fatfinger(text: str, rate: float, ...) -> str:
    # rate is already validated - just use it
    return keyboard_typo_rust(
        text,
        rate,
        layout,
        seed,
        shift_slip_rate=slip_rate,
        shift_slip_exit_rate=slip_exit_rate,
        shift_map=shift_map,
    )

Example: Anti-Pattern#

# DON'T: Re-validate everywhere
def some_pure_transform(text: str, rate: float) -> str:
    # Bad: re-validating what boundary should have checked
    if rate is None:
        raise ValueError("rate cannot be None")
    if not isinstance(rate, (int, float)):
        raise TypeError("rate must be numeric")
    if math.isnan(rate):
        rate = 0.0
    # ... actual logic

Import Conventions#

Pure modules must follow strict import rules:

  1. Pure modules can only import from:
  2. Python standard library
  3. Other pure modules

  4. Pure modules must NOT import:

  5. glitchlings.internal.rust
  6. glitchlings.compat
  7. Any module that triggers side effects at import time

  8. Use TYPE_CHECKING guards for type-only imports:

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from glitchlings.zoo.core import Glitchling

Enforcement#

The architecture is enforced by automated tests in tests/test_purity_architecture.py:

pytest tests/test_purity_architecture.py -v

These tests verify:

  • Pure modules don't import impure modules
  • Pure modules only use stdlib imports
  • All pure modules have docstrings documenting their purity guarantees

Why This Matters for AI Agents#

AI coding agents tend to add defensive checks everywhere. This architecture makes it explicit:

  • If you're in a pure/ or transforms/ module: trust your inputs
  • If you're at a boundary: validate thoroughly once
  • If you're unsure: check which layer the file belongs to

This reduces noise in the codebase and makes the agent-written code more consistent with human-written code.