Glitchlings development setup

This guide walks through preparing a local development environment, running the automated checks, and exercising the optional Rust acceleration layer.

Prerequisites

You need a recent Python 3 interpreter with venv and pip, plus git for cloning the repository and installing the hooks. The optional Rust acceleration layer additionally requires a Rust toolchain and maturin (see Rust acceleration below).

Install the project

  1. Clone the repository and create an isolated environment:

    git clone https://github.com/osoleve/glitchlings.git
    cd glitchlings
    python -m venv .venv
    source .venv/bin/activate
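
    On Windows, the activation step differs; the PowerShell equivalent is:

    .venv\Scripts\Activate.ps1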
    
  2. Install the package in editable mode with the development dependencies:

    pip install -e .[dev]
    

    Add the prime extra (pip install -e .[dev,prime]) when you need the Prime Intellect integration and its verifiers dependency. Enable the embedding-backed lexicon helpers with the vectors extra (pip install -e .[dev,vectors]) to pull in numpy, spaCy, and gensim.
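
    To spot-check that an extra resolved, try importing its top-level modules (module names here are assumed to match the packages listed above):

    python -c "import verifiers"               # prime extra
    python -c "import numpy, spacy, gensim"    # vectors extra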

  3. Install the git hooks so the shared formatting, linting, and type checks run automatically:

    pre-commit install
    
  4. The package ships a compact vector cache for Jargoyle so you can exercise synonym swaps without heavyweight models. Regenerate or extend that cache with the bundled CLI when you have larger embeddings available:

    glitchlings build-lexicon \
        --source spacy:en_core_web_md \
        --output data/vector_lexicon.json \
        --limit 50000 \
        --overwrite
    

    The command accepts gensim-compatible KeyedVectors or Word2Vec formats via --source /path/to/vectors.kv. Pass --tokens words.txt to restrict caching to a curated vocabulary, tweak --min-similarity/--max-neighbors to trade breadth for precision, and bake in deterministic seeds with --seed. Hugging Face SentenceTransformer checkpoints work too: install the st extra and pass --source sentence-transformers:sentence-transformers/all-mpnet-base-v2 --tokens words.txt to mirror the bundled cache.
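
    For example, a local KeyedVectors file restricted to a curated vocabulary could be cached like this (the path, thresholds, and seed are illustrative):

    glitchlings build-lexicon \
        --source /path/to/vectors.kv \
        --tokens words.txt \
        --min-similarity 0.5 \
        --max-neighbors 10 \
        --seed 42 \
        --output data/vector_lexicon.json \
        --overwrite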

    Need to sanity-check new lexical sources? Import glitchlings.lexicon.metrics and call compare_lexicons(...) to benchmark synonym diversity, ≥3-substitute coverage, and mean cosine similarity against previously captured baselines.
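
    A minimal sketch of such a comparison (the positional arguments and paths are illustrative; check the function's signature before relying on it):

    from glitchlings.lexicon.metrics import compare_lexicons

    # Hypothetical usage: benchmark a freshly built cache against a saved baseline.
    report = compare_lexicons("data/vector_lexicon.json", "data/baseline_lexicon.json")
    print(report)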

    Prefer the legacy WordNet behaviour? Install nltk, download its WordNet corpus (python -m nltk.downloader wordnet), and update config.toml so that lexicon.priority lists "wordnet" ahead of the vector cache, as sketched below.
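
    A sketch of the corresponding config.toml entry (the "wordnet" name comes from the paragraph above; the other backend name is illustrative):

    [lexicon]
    priority = ["wordnet", "vectors"]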

Run the test suite

Execute the automated tests from the repository root:

pytest

The suite covers determinism guarantees, dataset integrations, and parity between Python and Rust implementations. Vector-backed lexicons ship with the repository so the Jargoyle tests run without external downloads, while optional WordNet checks are gated behind the legacy backend being available.

Key regression guardrails live in the determinism, dataset-integration, and parity suites described above.

Automated checks

Run the shared quality gates before opening a pull request:

ruff check .
black --check .
isort --check-only .
python -m mypy --config-file pyproject.toml
pytest --maxfail=1 --disable-warnings -q
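
If you installed the git hooks earlier with pre-commit install, the same hooks can be run across the whole tree in one pass (assuming the hook configuration mirrors the tools above):

pre-commit run --all-files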

Rust acceleration

Glitchlings ships PyO3 extensions that accelerate Typogre, Mim1c, Reduple, Adjax, Rushmore, Redactyl, and Scannequin. Compile them with maturin; the Python interfaces pick them up automatically when available:

# Compile the shared Rust crate (rerun after Rust or Python updates)
maturin develop -m rust/zoo/Cargo.toml

# Optional: disable the fast path before importing glitchlings
export GLITCHLINGS_RUST_PIPELINE=0
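
# Optional: pass --release for an optimized build when profiling the fast path
maturin develop --release -m rust/zoo/Cargo.toml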

Gaggle prefers the compiled fast path whenever the extension is importable. Set the environment variable to 0/false (or any other falsy value) to force the pure-Python orchestrator when debugging or profiling. The test suite exercises whichever path is active, so re-run pytest once normally and once with the flag set to 0 to verify changes across both implementations.
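
For example, from the repository root:

# Fast path (compiled extension importable)
pytest

# Forced pure-Python orchestrator
GLITCHLINGS_RUST_PIPELINE=0 pytest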

Additional tips