Legacy / quarantine notice

This baseline note is preserved as a historical design artifact. It contains superseded assumptions, including KaHIP 3.14, greedy inside the benchmark yardstick, and NFE-first budgeting language. Do not use it as the active benchmark specification.

FORJA — Evolutive Benchmarking Baseline v1.0 (revisable)¶

This baseline is the initial, revisable yardstick for FORJA. It codifies fairness, budgets, metrics, and provenance. It can be updated after the pilot; record changes in the Revision History.

Date: 2025-09-24

1) Purpose and problem¶

Goal: compare graph partitioning algorithms under a strict, reproducible protocol.
Problem: k-way partitioning on unweighted graph G=(V,E), minimize edge-cut under cardinality balance with tolerance β.
Objective: (cut(P) = |{(u,v)\in E : part(u) \neq part(v)}|)
Balance constraint: (\forall i,\ \big| |V_i| - |V|/k \big| / (|V|/k) \le \beta)

Primary metrics - cut(P) (minimize), effective imbalance δ(P) (%) Complementary - normalized cut cut/|E|, boundary size |∂P|, per-part volume, wall time (ms)

2) Fairness rule (anytime + budgets)¶

k: {4, 8, 16} (k=32 is a future extension)
β: {1%, 3%, 5%}
Primary budget (NFE): NFE_max = 50 · |E| per run
Time cap: t_max = 0.5 · |E| ms (stop on whichever comes first)
Anytime checkpoints (by NFE): {1e2, 3e2, 1e3, 3e3, 1e4, 3e4, 1e5, 3e5, …} up to NFE_max
Seeds: pilot = 1 (42); main campaign = 10 replicas per (algorithm, instance). Use a master-seed and deterministic derivation.

Note: Values are initial defaults; they may be tuned after the pilot, but changes must be documented and applied uniformly.

3) Algorithm portfolio (versions & defaults)¶

METIS 5.1.0 (gpmetis) — required
-ufactor = round(1000·β), -ptype=kway
KaHIP 3.14 (kaffpa) — optional in CI; required locally for final experiments
-k, -imbalance = 100·β, --preconfiguration=fast
Greedy + 1-swap local refinement (respects balance). Connectivity is not enforced globally to stay comparable to METIS/KaHIP.
Metaheuristics (future, e.g., SA/GA/MEALPY) — cold start (random k-balanced init). Warm-start is reserved for ablation studies.

4) Dataset and naming¶

Synthetic v1.0: current canonical set in repo (8–12 instances spanning n, p, CV)
Planned extension: factorial grid n ∈ {2000,3000,3500} × p ∈ {0.35,0.40,0.50} × CV ∈ {0.15,0.30,0.45} with 2 seeds/point
Real (RoadNet-CA): 3–5 induced subgraphs via BFS with target sizes (~10k, ~20k), defined after pilot
Canonical naming: n{N}_p{P}_cv{CV}_seed{S}.json.gz (0-based ids, unit weights)

5) Provenance and reproducibility¶

Threads: OMP=1, OPENBLAS=1, MKL=1
Per-run capture:
git_sha, hostname, started_at, finished_at, k, β,
budget (NFE_max, t_max), seed, cmdline, exit_code
checkpoints: list of (NFE, cut, time_ms)
Label normalization: map labels to contiguous 0..k-1; if remapped, store the mapping in the manifest
Versions logged: Python, FORJA pkg, METIS, KaHIP, OS, CPU
Results contract (solver_run.v1): fields for metrics, budgets, checkpoints, and artifact paths (consistent with analysis scripts).

6) Analysis and statistics¶

Performance profiles (Dolan–Moré) (ratio to best per instance)
Anytime curves cut × NFE
TTT (time-to-target) with gaps to best-known: {0%, 1%, 2%, 5%}; explicit censoring
Inference: Friedman + Nemenyi (multi-algo), Wilcoxon (paired). Multiple testing correction: Holm (BH if justified)
Stratification: terciles for density, modularity Q, |V|, CV

Revision history¶

v1.0 (2025-09-24) — initial evolutive baseline