Module gridfm


GridFM interchange: a parsed case as the gridfm-datakit Parquet schema.

gridfm-datakit writes per-scenario Parquet tables that gridfm-graphkit’s HeteroGridDatasetDisk trains a GNN on. This module emits the same four tables — bus_data, gen_data, branch_data, y_bus_data — from one parsed Network, so graphkit can train on powerio output directly and the scenario-batch path (issue #14) has its on-disk format.

The reverse direction, read_gridfm_dataset / read_gridfm_scenarios / gridfm_base_case (plus the pure read_gridfm_network over in-memory batches), rebuilds a Network from such a dataset. The read is lossy but power-flow-complete (see GridfmRead) and forms the ML→classical return leg (issue #60): one reader plus the existing writers converts gridfm to any classical format. y_bus_data is ignored on read; branches carry raw r/x/b.
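Because a dataset's tables are row-stacked and keyed by the scenario column, the first step in rebuilding one scenario is to select that scenario's rows and place them by dense bus index. A minimal sketch of just that step follows. It assumes a simplified, hypothetical row type (BusRow with scenario, bus, vm); powerio's readers work on Arrow record batches and rebuild much more than Vm.

```rust
/// Hypothetical, simplified bus_data row: scenario id, dense bus index, Vm (p.u.).
#[derive(Clone, Debug, PartialEq)]
pub struct BusRow {
    pub scenario: i64,
    pub bus: usize,
    pub vm: f64,
}

/// Select one scenario's rows and return Vm ordered by dense bus index.
/// Returns None if the scenario is absent, or if its bus indices are not
/// exactly 0..n with no duplicates.
pub fn select_scenario_vm(rows: &[BusRow], scenario: i64) -> Option<Vec<f64>> {
    let picked: Vec<&BusRow> = rows.iter().filter(|r| r.scenario == scenario).collect();
    if picked.is_empty() {
        return None;
    }
    // One slot per selected row; each dense index must land in a distinct slot.
    let mut vm: Vec<Option<f64>> = vec![None; picked.len()];
    for r in &picked {
        let slot = vm.get_mut(r.bus)?; // index outside 0..n
        if slot.is_some() {
            return None; // duplicate index
        }
        *slot = Some(r.vm);
    }
    // n rows went into n distinct slots, so every slot is filled here.
    vm.into_iter().collect()
}
```

The rows of different scenarios can interleave in any order; only the scenario key and the dense index decide where a value lands.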

§Snapshots and scenarios

powerio has no power flow solver. One parsed case is one snapshot (scenario = 0): voltages and generator dispatch are the case's stored values, and branch flows pf/qf/pt/qt are computed from those voltages and the branch admittances (branch_flows). For a solved MATPOWER case the stored voltages are the converged operating point, so the flows match what a solver would report to within floating-point tolerance. For an unsolved or flat-start case they are the flows at the stored voltages, not a re-solved dispatch.
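The following sketch computes such flows from stored voltages. It assumes the standard MATPOWER pi-model branch: series admittance 1/(r + jx), total charging b split evenly between the two ends, and a complex tap ratio·e^(j·shift) on the from side, with ratio 0 meaning 1. Complex, Branch and branch_flows_sketch are hypothetical names for illustration, not powerio's branch_flows signature. The sketch also does not guard against r = x = 0.

```rust
use std::ops::{Add, Div, Mul, Neg};

/// Minimal complex number, just enough for the pi-model arithmetic.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Complex {
    pub re: f64,
    pub im: f64,
}

impl Complex {
    pub fn new(re: f64, im: f64) -> Self {
        Complex { re, im }
    }
    pub fn from_polar(mag: f64, ang_rad: f64) -> Self {
        Complex::new(mag * ang_rad.cos(), mag * ang_rad.sin())
    }
    pub fn conj(self) -> Self {
        Complex::new(self.re, -self.im)
    }
}

impl Add for Complex {
    type Output = Complex;
    fn add(self, o: Complex) -> Complex {
        Complex::new(self.re + o.re, self.im + o.im)
    }
}
impl Mul for Complex {
    type Output = Complex;
    fn mul(self, o: Complex) -> Complex {
        Complex::new(self.re * o.re - self.im * o.im, self.re * o.im + self.im * o.re)
    }
}
impl Div for Complex {
    type Output = Complex;
    fn div(self, o: Complex) -> Complex {
        let d = o.re * o.re + o.im * o.im;
        Complex::new((self.re * o.re + self.im * o.im) / d, (self.im * o.re - self.re * o.im) / d)
    }
}
impl Neg for Complex {
    type Output = Complex;
    fn neg(self) -> Complex {
        Complex::new(-self.re, -self.im)
    }
}

/// Hypothetical branch record: per-unit r, x, b plus MATPOWER-style tap/shift.
pub struct Branch {
    pub r: f64,
    pub x: f64,
    pub b: f64,
    pub ratio: f64,     // 0.0 means a plain line (tap magnitude 1.0)
    pub shift_deg: f64, // phase shift in degrees
    pub in_service: bool,
}

/// Returns (pf, qf, pt, qt) in MW/MVAr from stored voltages (Vm p.u., Va degrees).
pub fn branch_flows_sketch(
    br: &Branch,
    (vm_f, va_f): (f64, f64),
    (vm_t, va_t): (f64, f64),
    base_mva: f64,
) -> (f64, f64, f64, f64) {
    if !br.in_service {
        return (0.0, 0.0, 0.0, 0.0); // out of service: zero flows
    }
    let ys = Complex::new(1.0, 0.0) / Complex::new(br.r, br.x); // series admittance
    let tap_mag = if br.ratio == 0.0 { 1.0 } else { br.ratio };
    let tap = Complex::from_polar(tap_mag, br.shift_deg.to_radians());
    // Pi-model branch admittances.
    let ytt = ys + Complex::new(0.0, br.b / 2.0);
    let yff = ytt / Complex::new(tap_mag * tap_mag, 0.0);
    let yft = -(ys / tap.conj());
    let ytf = -(ys / tap);
    let vf = Complex::from_polar(vm_f, va_f.to_radians());
    let vt = Complex::from_polar(vm_t, va_t.to_radians());
    // Terminal currents, then S = V * conj(I) in per-unit, scaled by base_mva.
    let i_f = yff * vf + yft * vt;
    let i_t = ytf * vf + ytt * vt;
    let sf = vf * i_f.conj();
    let st = vt * i_t.conj();
    (sf.re * base_mva, sf.im * base_mva, st.re * base_mva, st.im * base_mva)
}
```

An out-of-service branch keeps its physical admittances but carries zero flows, as the Units section notes. The sketch mirrors this by returning before any arithmetic when in_service is false.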

A scenario batch (write_gridfm_batch / gridfm_record_batches_batch) row-stacks many snapshots into the four tables, keyed by the scenario column. The snapshots share a base element set — the same bus/branch/gen counts and bus-id ordering, so the dense bus index means the same bus across scenarios — enforced by the shape check (Error::ScenarioShapeMismatch). Within that, load, dispatch, voltages, branch status, bus type, and costs may all differ per snapshot. This matches datakit, whose topology variants (N-K, random component drop) toggle BR_STATUS/GEN_STATUS on a fixed element set, and graphkit’s HeteroGridDatasetDisk, which groups by scenario and rebuilds the graph independently for each one. powerio doesn’t generate the perturbations; a caller (e.g. a scenario generator) supplies the snapshots.
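The shape check and row-stacking can be pictured with a stripped-down, hypothetical snapshot (Snap, which carries only the element counts, the bus-id order and one per-bus value). This is a sketch of the rule described above, not powerio's implementation. Every snapshot must match the first one's bus ids (in order), branch count and generator count. Its rows are then appended under its own scenario id, and the dense bus index is simply the position in the shared bus-id order.

```rust
use std::collections::HashMap;

/// Hypothetical, stripped-down snapshot: what the shape check compares,
/// plus one per-bus value (Vm) to stack.
pub struct Snap {
    pub scenario: i64,
    pub bus_ids: Vec<u32>, // external bus ids, in case order
    pub n_branch: usize,
    pub n_gen: usize,
    pub vm: Vec<f64>, // one entry per bus, same order as bus_ids
}

#[derive(Debug, PartialEq)]
pub enum StackError {
    ShapeMismatch { scenario: i64 },
}

/// Dense index of each external bus id: its position in the bus list.
pub fn dense_index(bus_ids: &[u32]) -> HashMap<u32, usize> {
    bus_ids.iter().enumerate().map(|(i, &id)| (id, i)).collect()
}

/// Row-stack (scenario, dense bus index, vm) over all snapshots, after
/// checking that each one shares the first snapshot's element set.
pub fn stack_bus_rows(snaps: &[Snap]) -> Result<Vec<(i64, usize, f64)>, StackError> {
    let Some(first) = snaps.first() else {
        return Ok(Vec::new());
    };
    let mut rows = Vec::new();
    for s in snaps {
        let same_shape = s.bus_ids == first.bus_ids
            && s.n_branch == first.n_branch
            && s.n_gen == first.n_gen
            && s.vm.len() == s.bus_ids.len();
        if !same_shape {
            return Err(StackError::ShapeMismatch { scenario: s.scenario });
        }
        for (i, &vm) in s.vm.iter().enumerate() {
            rows.push((s.scenario, i, vm));
        }
    }
    Ok(rows)
}
```

Comparing the bus-id lists element by element, rather than only their lengths, is what guarantees that dense index i names the same bus in every scenario. Load, dispatch and status values are free to differ.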

§Units

Pd, Qd, Pg, Qg, p_mw, q_mvar are MW/MVAr, passed through from the case (loads and generator setpoints are already MW/MVAr). The branch flows pf, qf, pt, qt are MW/MVAr too, computed in per-unit and scaled by base_mva.
Vm is per-unit and Va is in degrees; r, x, b and the Y** admittances are per-unit.
GS, BS are the MATPOWER shunt values (MW/MVAr at V = 1 p.u.) divided by base_mva, matching datakit's normalization.
Costs are the raw MATPOWER coefficients: cp2 = c2, cp1 = c1, cp0 = c0. A cost row gridfm can't represent (piecewise, missing, malformed, or of degree three or higher) is written as zeros (graphkit ignores the cost columns) and counted in the manifest. The _eur suffixes are datakit's column names, not a unit powerio converts to.
bus, from_bus, to_bus are dense [0, n) indices; idx is the 0-based generator/branch row. An out-of-service branch keeps its physical Y** admittances but carries zero flows (its br_status is 0).
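The following is a sketch of the cost mapping and shunt normalization described above. CostRow, gridfm_cost and normalize_shunt are hypothetical names. MATPOWER polynomial (MODEL = 2) coefficients are listed highest order first. Two behaviors here are assumptions about powerio, not confirmed by the description: linear and constant rows are treated as representable by zero-padding the higher coefficients, and empty or non-finite coefficient lists are classed as malformed.

```rust
/// Hypothetical view of one MATPOWER gencost row.
pub enum CostRow {
    /// MODEL = 2: polynomial coefficients, highest order first.
    Polynomial(Vec<f64>),
    /// MODEL = 1: piecewise linear (breakpoints not needed here).
    PiecewiseLinear,
    /// No gencost row for this generator.
    Missing,
}

/// Map a cost row to (cp2, cp1, cp0), plus whether it had to be zeroed
/// (a caller would add that flag to a manifest counter).
pub fn gridfm_cost(row: &CostRow) -> ((f64, f64, f64), bool) {
    const ZEROED: ((f64, f64, f64), bool) = ((0.0, 0.0, 0.0), true);
    match row {
        CostRow::Polynomial(c) if c.iter().all(|v| v.is_finite()) => match c.as_slice() {
            [c2, c1, c0] => ((*c2, *c1, *c0), false),
            [c1, c0] => ((0.0, *c1, *c0), false), // assumption: pad c2 = 0
            [c0] => ((0.0, 0.0, *c0), false),     // assumption: pad c2 = c1 = 0
            _ => ZEROED,                          // empty, or cubic and higher
        },
        _ => ZEROED, // piecewise, missing, or non-finite coefficients
    }
}

/// GS/BS from MATPOWER (MW / MVAr at V = 1 p.u.) to datakit's normalized values.
pub fn normalize_shunt(gs_mw: f64, bs_mvar: f64, base_mva: f64) -> (f64, f64) {
    (gs_mw / base_mva, bs_mvar / base_mva)
}
```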

Structs§

GridfmOptions: Options for the gridfm export — the batch-wide knobs. The scenario id is a per-snapshot property (set via GridfmSnapshot::new / numbered_snapshots, or the explicit argument to the single-case write_gridfm_dataset / gridfm_record_batches), not an option here.
GridfmOutputs: What write_gridfm_dataset wrote, plus the counts of columns it had to zero (see the manifest) so a caller can surface them.
GridfmRead: One scenario read out of a gridfm dataset: the reconstructed Network plus the fidelity warnings the lossy read couldn’t avoid (mirroring Conversion::warnings).
GridfmSnapshot: One snapshot in a gridfm scenario batch: a parsed Network and the scenario id stamped into its rows.
GridfmTables: The gridfm-datakit tables as Arrow record batches. The Parquet writer builds from these; a deferred gridfm-schema Arrow C Data Interface export (issue #38) would reuse them. (The raw network Arrow export that ships in powerio-capi is a different, lighter schema.)

Functions§

gridfm_base_case: The unperturbed base case: read_gridfm_dataset at scenario = 0 (datakit’s convention). There is no single “shared base” beyond a chosen scenario — bus types, branch status, and reference bus all vary per scenario — so the base case is just scenario 0.
gridfm_record_batches: Build the four gridfm tables for one network, stamping scenario into the id columns. Pure (no I/O). A thin wrapper over gridfm_record_batches_batch for one snapshot.
gridfm_record_batches_batch: Build the four gridfm tables for a batch of scenarios, row-stacked and keyed by the scenario column. Pure (no I/O). Each snapshot carries its own scenario id; the include_y_bus/taps/shifts flags apply to every snapshot.
gridfm_scenario_ids: The distinct scenario ids in a gridfm dataset, ascending — the keys read_gridfm_scenarios rebuilds a Network for. Reads only bus_data’s scenario column, so it enumerates a dataset’s scenarios without rebuilding every network; the C ABI’s pio_gridfm_scenario_ids is a thin wrapper over it.
numbered_snapshots: Number a list of networks into snapshots, stamping the k-th network with scenario id base + k. This is the one place the "k-th input is scenario base + k" rule lives, so the CLI and the Python binding can't drift. Checked: returns Error::ScenarioIdOverflow rather than wrapping or panicking if base + k would overflow i64.
read_gridfm_dataset: Read one scenario from a gridfm dataset on disk and rebuild a Network. The inverse of write_gridfm_dataset.
read_gridfm_network: Build one Network from in-memory gridfm tables, selecting scenario’s rows. The pure inverse of gridfm_record_batches: base_mva and name come from the caller (the disk path reads them from gridfm_meta.json).
read_gridfm_scenarios: Read every scenario from a gridfm dataset, one Network per scenario id (sorted ascending) over the shared topology — the read side of the scenario batch (#57). Each scenario is rebuilt independently, so two scenarios may differ in branch status, bus types, and reference bus.
write_gridfm_batch: Write a batch of scenarios as one gridfm-datakit dataset under out_dir/<network_name>/raw/, row-stacking every snapshot’s tables and keying them by the scenario column. The dataset name and the base element counts come from the first snapshot (shared across the batch by the shape check); the dropped/degenerate counts are summed over every snapshot, while reference_bus / n_branches_in_service record the first snapshot only (they can differ per scenario, so the manifest documents them as scenario 0’s).
write_gridfm_dataset: Write the gridfm-datakit Parquet dataset for one case under out_dir/<network_name>/raw/, matching datakit’s directory layout. Stamps scenario into the id columns. Writes bus_data.parquet, gen_data.parquet, branch_data.parquet, optionally y_bus_data.parquet, and a gridfm_meta.json manifest.
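The following sketches two small pieces of the scenario-id bookkeeping above, using hypothetical names. number_items stands in for numbered_snapshots' checked "k-th input is scenario base + k" rule, using i64::checked_add so that overflow becomes an error instead of wrapping. distinct_scenario_ids mirrors what gridfm_scenario_ids is described as doing: it reads only a scenario column and returns the distinct ids in ascending order.

```rust
use std::collections::BTreeSet;

#[derive(Debug, PartialEq)]
pub enum IdError {
    ScenarioIdOverflow { index: usize },
}

/// Stamp the k-th item with scenario id base + k, failing instead of wrapping.
/// Generic stand-in for numbered_snapshots; T plays the role of a Network.
pub fn number_items<T>(items: Vec<T>, base: i64) -> Result<Vec<(i64, T)>, IdError> {
    items
        .into_iter()
        .enumerate()
        .map(|(k, item)| {
            i64::try_from(k)
                .ok()
                .and_then(|k64| base.checked_add(k64))
                .map(|id| (id, item))
                .ok_or(IdError::ScenarioIdOverflow { index: k })
        })
        .collect()
}

/// Distinct scenario ids in ascending order, from a scenario column alone.
pub fn distinct_scenario_ids(scenario_column: &[i64]) -> Vec<i64> {
    scenario_column
        .iter()
        .copied()
        .collect::<BTreeSet<i64>>()
        .into_iter()
        .collect()
}
```

Collecting an iterator of Result values into Result<Vec<_>, _> stops at the first overflow, so no partially numbered batch is returned. A BTreeSet yields its keys in sorted order, which gives the ascending ids without a separate sort.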
