GridFM interchange: a parsed case as the gridfm-datakit Parquet schema.
gridfm-datakit writes per-scenario Parquet tables that gridfm-graphkit’s HeteroGridDatasetDisk trains a GNN on. This module emits the same four tables (bus_data, gen_data, branch_data, y_bus_data) from one parsed Network. Graphkit can therefore train on powerio output directly, and the scenario-batch path (issue #14) gets its on-disk format.
The reverse direction rebuilds a Network from such a dataset. It is provided by read_gridfm_dataset, read_gridfm_scenarios, and gridfm_base_case, plus the pure read_gridfm_network over in-memory batches. The read is lossy but power-flow-complete (see GridfmRead) and forms the ML→classical return leg (issue #60). One reader plus the existing writers means a gridfm dataset can be converted to any classical format. y_bus_data is ignored on read; branches carry raw r/x/b.
Snapshots and scenarios
powerio has no power flow solver. One parsed case is one snapshot
(scenario = 0): voltages and generator dispatch are the case’s stored
values, and branch flows pf/qf/pt/qt are computed from those voltages and
the branch admittances (branch_flows). For a solved MATPOWER case the
stored voltages are the converged operating point, so the flows match what a
solver would report to within floating-point tolerance; for an unsolved or
flat-start case they are the flows at the stored voltages, not a re-solved dispatch.
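For a branch without an off-nominal tap or phase shift, computing flows "from those voltages and the branch admittances" reduces to the standard π-model: S = V · conj(I) at each end, in per-unit, then scaled by base_mva. The sketch below assumes that simple case and leaves out taps and shifts. C and branch_flows_sketch are hypothetical helpers, not powerio's branch_flows.

```rust
/// Minimal complex number, enough for the pi-model arithmetic below.
#[derive(Debug, Clone, Copy)]
struct C {
    re: f64,
    im: f64,
}

impl C {
    fn new(re: f64, im: f64) -> Self { C { re, im } }
    fn add(self, o: C) -> C { C::new(self.re + o.re, self.im + o.im) }
    fn sub(self, o: C) -> C { C::new(self.re - o.re, self.im - o.im) }
    fn mul(self, o: C) -> C {
        C::new(self.re * o.re - self.im * o.im, self.re * o.im + self.im * o.re)
    }
    fn conj(self) -> C { C::new(self.re, -self.im) }
    fn inv(self) -> C {
        let d = self.re * self.re + self.im * self.im;
        C::new(self.re / d, -self.im / d)
    }
    /// Phasor from a per-unit magnitude and an angle in degrees.
    fn polar(mag: f64, deg: f64) -> C {
        let rad = deg.to_radians();
        C::new(mag * rad.cos(), mag * rad.sin())
    }
}

/// (pf, qf, pt, qt) in MW/MVAr for a branch with no tap or phase shift.
/// vm in per-unit, va in degrees; r, x, b in per-unit on base_mva.
#[allow(clippy::too_many_arguments)]
pub fn branch_flows_sketch(
    vm_f: f64, va_f: f64, vm_t: f64, va_t: f64,
    r: f64, x: f64, b: f64, in_service: bool, base_mva: f64,
) -> (f64, f64, f64, f64) {
    if !in_service {
        // Out of service: the admittances stay in the table, flows are zero.
        return (0.0, 0.0, 0.0, 0.0);
    }
    let vf = C::polar(vm_f, va_f);
    let vt = C::polar(vm_t, va_t);
    let ys = C::new(r, x).inv();    // series admittance 1 / (r + jx)
    let ysh = C::new(0.0, b / 2.0); // half the line charging at each end
    // Current entering the branch at each end (per-unit).
    let i_f = ys.add(ysh).mul(vf).sub(ys.mul(vt));
    let i_t = ys.add(ysh).mul(vt).sub(ys.mul(vf));
    // Complex power S = V * conj(I), scaled from per-unit to MW/MVAr.
    let s_f = vf.mul(i_f.conj());
    let s_t = vt.mul(i_t.conj());
    (s_f.re * base_mva, s_f.im * base_mva, s_t.re * base_mva, s_t.im * base_mva)
}
```

As a sanity check, a lossless line (r = 0, x = 0.1, b = 0) with a 10° angle difference at unit voltages carries about sin(10°)/0.1 ≈ 1.736 p.u., i.e. roughly 173.6 MW on a 100 MVA base, and pf + pt is zero.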
A scenario batch (write_gridfm_batch / gridfm_record_batches_batch)
row-stacks many snapshots into the four tables, keyed by the scenario
column. The snapshots share a base element set — the same bus/branch/gen
counts and bus-id ordering, so the dense bus index means the same bus across
scenarios — enforced by the shape check (Error::ScenarioShapeMismatch).
Within that, load, dispatch, voltages, branch status, bus type, and costs may
all differ per snapshot. This matches datakit, whose topology variants (N-K,
random component drop) toggle BR_STATUS/GEN_STATUS on a fixed element set,
and graphkit’s HeteroGridDatasetDisk, which groups by scenario and
rebuilds the graph independently for each one. powerio doesn’t generate the
perturbations; a caller (e.g. a scenario generator) supplies the snapshots.
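The shape check and the row-stacking can be sketched together. Every snapshot is compared against the first one, and each snapshot's per-bus values are then appended under its own scenario id. SnapshotSketch, ShapeError, and stack_bus_pd are hypothetical names, and only a single load column stands in for the full bus_data table. ShapeError::Mismatch plays the role of Error::ScenarioShapeMismatch.

```rust
/// Hypothetical, trimmed-down snapshot: just enough to show the shape check.
#[derive(Debug, Clone)]
pub struct SnapshotSketch {
    pub scenario: i64,
    pub bus_ids: Vec<u32>, // external bus ids, in dense-index order
    pub n_branch: usize,
    pub n_gen: usize,
    pub pd: Vec<f64>,      // per-bus load in MW; may differ per snapshot
}

#[derive(Debug, PartialEq)]
pub enum ShapeError {
    /// Stand-in for powerio's Error::ScenarioShapeMismatch.
    Mismatch { scenario: i64 },
}

/// Check every snapshot against the first, then row-stack (scenario, bus, pd).
pub fn stack_bus_pd(snaps: &[SnapshotSketch]) -> Result<Vec<(i64, usize, f64)>, ShapeError> {
    let mut rows = Vec::new();
    let Some(first) = snaps.first() else { return Ok(rows) };
    for s in snaps {
        // Same counts and the same bus-id ordering, so dense index k means
        // the same bus in every scenario.
        if s.bus_ids != first.bus_ids || s.n_branch != first.n_branch || s.n_gen != first.n_gen {
            return Err(ShapeError::Mismatch { scenario: s.scenario });
        }
        for (k, &p) in s.pd.iter().enumerate() {
            rows.push((s.scenario, k, p));
        }
    }
    Ok(rows)
}
```

Comparing the full bus-id vector, not just its length, is what makes a reordered bus list fail the check even though the counts agree.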
Units
- Pd, Qd, Pg, Qg, p_mw, q_mvar are MW/MVAr, passed through from the case (loads and generator setpoints are already MW/MVAr).
- The branch flows pf, qf, pt, qt are MW/MVAr too, computed in per-unit and scaled by base_mva.
- Vm is per-unit and Va is in degrees; r, x, b and the Y** admittances are per-unit.
- GS, BS are the MATPOWER shunt values (MW/MVAr at V = 1) divided by base_mva, matching datakit’s normalization.
- Costs are the raw MATPOWER coefficients: cp2 = c2, cp1 = c1, cp0 = c0. A cost row gridfm can’t represent (piecewise, missing, malformed, or cubic and higher) emits zeros (graphkit ignores the cost columns) and is counted in the manifest. The _eur suffixes are datakit’s column names, not a unit powerio converts to.
- bus, from_bus, to_bus are dense [0, n) indices; idx is the 0-based generator/branch row.
- An out-of-service branch keeps its physical Y** admittances but carries zero flows (its br_status is 0).
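The two unit rules that involve arithmetic, the shunt normalization and the cost mapping, are small enough to sketch. normalize_shunt, CostRow, and gridfm_cost are hypothetical names. The treatment of linear and constant polynomials (padding the missing high-order terms with zero) is an assumption; the rule above only states that piecewise, missing, malformed, and cubic-or-higher rows become zeros.

```rust
/// Shunt normalization: MATPOWER GS/BS (MW/MVAr at V = 1 p.u.)
/// divided by base_mva.
pub fn normalize_shunt(gs_mw: f64, bs_mvar: f64, base_mva: f64) -> (f64, f64) {
    (gs_mw / base_mva, bs_mvar / base_mva)
}

/// Hypothetical, simplified view of one MATPOWER gencost row.
pub enum CostRow<'a> {
    /// MODEL = 2 polynomial: coefficients highest order first.
    Polynomial(&'a [f64]),
    /// MODEL = 1 piecewise linear, or no cost row at all.
    Unrepresentable,
}

/// Map a cost row to (cp2, cp1, cp0). Rows gridfm can't hold yield zeros
/// and `false`, so a caller can count them for the manifest.
pub fn gridfm_cost(row: &CostRow) -> ((f64, f64, f64), bool) {
    match row {
        CostRow::Polynomial(c) => match c.len() {
            3 => ((c[0], c[1], c[2]), true),
            // Assumption: lower-order polynomials pad the missing terms with 0.
            2 => ((0.0, c[0], c[1]), true),
            1 => ((0.0, 0.0, c[0]), true),
            // Empty (malformed) or cubic and higher.
            _ => ((0.0, 0.0, 0.0), false),
        },
        CostRow::Unrepresentable => ((0.0, 0.0, 0.0), false),
    }
}
```

Returning the "representable" flag alongside the coefficients keeps the zero-filling and the manifest count in one place, so the two cannot disagree.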
Structs
- GridfmOptions - Options for the gridfm export: the batch-wide knobs. The scenario id is a per-snapshot property (set via GridfmSnapshot::new / numbered_snapshots, or the explicit argument to the single-case write_gridfm_dataset / gridfm_record_batches), not an option here.
- GridfmOutputs - What write_gridfm_dataset wrote, plus the counts of columns it had to zero (see the manifest) so a caller can surface them.
- GridfmRead - One scenario read out of a gridfm dataset: the reconstructed Network plus the fidelity warnings the lossy read couldn’t avoid (mirroring Conversion::warnings).
- GridfmSnapshot - One snapshot in a gridfm scenario batch: a parsed Network and the scenario id stamped into its rows.
- GridfmTables - The gridfm-datakit tables as Arrow record batches. The Parquet writer builds from these; a deferred gridfm-schema Arrow C Data Interface export (issue #38) would reuse them. (The raw network Arrow export that ships in powerio-capi is a different, lighter schema.)
Functions
- gridfm_base_case - The unperturbed base case: read_gridfm_dataset at scenario = 0 (datakit’s convention). There is no single “shared base” beyond a chosen scenario (bus types, branch status, and reference bus all vary per scenario), so the base case is just scenario 0.
- gridfm_record_batches - Build the four gridfm tables for one network, stamping scenario into the id columns. Pure (no I/O). A thin wrapper over gridfm_record_batches_batch for one snapshot.
- gridfm_record_batches_batch - Build the four gridfm tables for a batch of scenarios, row-stacked and keyed by the scenario column. Pure (no I/O). Each snapshot carries its own scenario id; the include_y_bus/taps/shifts flags apply to every snapshot.
- gridfm_scenario_ids - The distinct scenario ids in a gridfm dataset, ascending: the keys read_gridfm_scenarios rebuilds a Network for. Reads only bus_data’s scenario column, so it enumerates a dataset’s scenarios without rebuilding every network; the C ABI’s pio_gridfm_scenario_ids is a thin wrapper over it.
- numbered_snapshots - Number a list of networks into snapshots, stamping the k-th one with scenario base + k. This is the one place the “k-th input is scenario base + k” rule lives, so the CLI and the Python binding can’t drift. Checked: returns Error::ScenarioIdOverflow rather than wrapping or panicking if a scenario id would overflow i64.
- read_gridfm_dataset - Read one scenario from a gridfm dataset on disk and rebuild a Network. The inverse of write_gridfm_dataset.
- read_gridfm_network - Build one Network from in-memory gridfm tables, selecting the rows of scenario. The pure inverse of gridfm_record_batches: base_mva and name come from the caller (the disk path reads them from gridfm_meta.json).
- read_gridfm_scenarios - Read every scenario from a gridfm dataset, one Network per scenario id (sorted ascending) over the shared element set: the read side of the scenario batch (#57). Each scenario is rebuilt independently, so two scenarios may differ in branch status, bus types, and reference bus.
- write_gridfm_batch - Write a batch of scenarios as one gridfm-datakit dataset under out_dir/<network_name>/raw/, row-stacking every snapshot’s tables and keying them by the scenario column. The dataset name and the base element counts come from the first snapshot (shared across the batch by the shape check). The dropped/degenerate counts are summed over every snapshot, while reference_bus / n_branches_in_service record the first snapshot only (they can differ per scenario, so the manifest documents them as scenario 0’s).
- write_gridfm_dataset - Write the gridfm-datakit Parquet dataset for one case under out_dir/<network_name>/raw/, matching datakit’s directory layout. Stamps scenario into the id columns. Writes bus_data.parquet, gen_data.parquet, branch_data.parquet, optionally y_bus_data.parquet, and a gridfm_meta.json manifest.
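The checked numbering rule behind numbered_snapshots can be sketched with std's i64::checked_add. This version pairs the k-th input with base + k and stops at the first id that would overflow. number_snapshots and ScenarioIdOverflow here are hypothetical stand-ins, not powerio's signatures.

```rust
/// Hypothetical stand-in for Error::ScenarioIdOverflow.
#[derive(Debug, PartialEq)]
pub struct ScenarioIdOverflow {
    pub index: usize, // position of the input whose id overflowed
}

/// Pair the k-th input with scenario id base + k using checked i64 arithmetic.
pub fn number_snapshots<T>(items: Vec<T>, base: i64) -> Result<Vec<(i64, T)>, ScenarioIdOverflow> {
    items
        .into_iter()
        .enumerate()
        .map(|(k, item)| {
            i64::try_from(k)                       // usize -> i64 may itself fail
                .ok()
                .and_then(|k64| base.checked_add(k64)) // None instead of wrapping
                .map(|id| (id, item))
                .ok_or(ScenarioIdOverflow { index: k })
        })
        .collect() // the first Err short-circuits the whole batch
}
```

Collecting an iterator of Result into Result<Vec<_>, _> stops at the first error, so a batch never comes back partially numbered.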