Legacy / quarantine notice
This baseline note is preserved as a historical design artifact. It contains superseded assumptions, including
KaHIP 3.14, greedy inside the benchmark yardstick, and NFE-first budgeting language. Do not use it as the active benchmark specification.
FORJA — Evolutive Benchmarking Baseline v1.0 (revisable)¶
This baseline is the initial, revisable yardstick for FORJA. It codifies fairness, budgets, metrics, and provenance. It can be updated after the pilot; record changes in the Revision History.
Date: 2025-09-24
1) Purpose and problem¶
- Goal: compare graph partitioning algorithms under a strict, reproducible protocol.
- Problem: k-way partitioning on unweighted graph G=(V,E), minimize edge-cut under cardinality balance with tolerance β.
- Objective: (cut(P) = |{(u,v)\in E : part(u) \neq part(v)}|)
- Balance constraint: (\forall i,\ \big| |V_i| - |V|/k \big| / (|V|/k) \le \beta)
Primary metrics - cut(P) (minimize), effective imbalance δ(P) (%) Complementary - normalized cut cut/|E|, boundary size |∂P|, per-part volume, wall time (ms)
2) Fairness rule (anytime + budgets)¶
- k: {4, 8, 16} (k=32 is a future extension)
- β: {1%, 3%, 5%}
- Primary budget (NFE):
NFE_max = 50 · |E|per run - Time cap:
t_max = 0.5 · |E| ms(stop on whichever comes first) - Anytime checkpoints (by NFE):
{1e2, 3e2, 1e3, 3e3, 1e4, 3e4, 1e5, 3e5, …}up toNFE_max - Seeds: pilot = 1 (42); main campaign = 10 replicas per (algorithm, instance). Use a master-seed and deterministic derivation.
Note: Values are initial defaults; they may be tuned after the pilot, but changes must be documented and applied uniformly.
3) Algorithm portfolio (versions & defaults)¶
- METIS 5.1.0 (
gpmetis) — required -ufactor = round(1000·β),-ptype=kway- KaHIP 3.14 (
kaffpa) — optional in CI; required locally for final experiments -k,-imbalance = 100·β,--preconfiguration=fast- Greedy + 1-swap local refinement (respects balance). Connectivity is not enforced globally to stay comparable to METIS/KaHIP.
- Metaheuristics (future, e.g., SA/GA/MEALPY) — cold start (random k-balanced init). Warm-start is reserved for ablation studies.
4) Dataset and naming¶
- Synthetic v1.0: current canonical set in repo (8–12 instances spanning n, p, CV)
- Planned extension: factorial grid
n ∈ {2000,3000,3500} × p ∈ {0.35,0.40,0.50} × CV ∈ {0.15,0.30,0.45}with 2 seeds/point - Real (RoadNet-CA): 3–5 induced subgraphs via BFS with target sizes (~10k, ~20k), defined after pilot
- Canonical naming:
n{N}_p{P}_cv{CV}_seed{S}.json.gz(0-based ids, unit weights)
5) Provenance and reproducibility¶
- Threads:
OMP=1,OPENBLAS=1,MKL=1 - Per-run capture:
git_sha,hostname,started_at,finished_at,k,β,budget(NFE_max,t_max),seed,cmdline,exit_codecheckpoints: list of(NFE, cut, time_ms)- Label normalization: map labels to contiguous
0..k-1; if remapped, store the mapping in the manifest - Versions logged: Python, FORJA pkg, METIS, KaHIP, OS, CPU
- Results contract (
solver_run.v1): fields for metrics, budgets, checkpoints, and artifact paths (consistent with analysis scripts).
6) Analysis and statistics¶
- Performance profiles (Dolan–Moré) (ratio to best per instance)
- Anytime curves
cut × NFE - TTT (time-to-target) with gaps to best-known: {0%, 1%, 2%, 5%}; explicit censoring
- Inference: Friedman + Nemenyi (multi-algo), Wilcoxon (paired). Multiple testing correction: Holm (BH if justified)
- Stratification: terciles for density, modularity Q, |V|, CV
Revision history¶
- v1.0 (2025-09-24) — initial evolutive baseline