Export Schemas
Everything the export produces, in one place. 60 canonical columns across 8 sections, 13 warning codes, and the per-meter / invoice-level scope rules you need to read multi-meter bills correctly. Two download formats — a flat CSV for spreadsheet workflows and an advanced two-sheet Excel workbook that adds per-line charge detail — use identical column names and types, so anything that's true of one is true of the other.
Overview
The export turns each uploaded utility bill PDF into structured data. Every column has a stable canonical name regardless of what the utility happens to call it on the bill — see State-specific identifiers for the alias map. The Field reference below is the single source of truth for both CSV and Excel: identical column names, identical types, identical nullability.
CSV vs Excel — which one to choose
Both downloads come off the same extraction. They differ only in cardinality: the CSV is fixed-width and flat; the Excel workbook adds a second sheet for the variable-cardinality per-line detail (TOU bands, tier rows, supplier splits, individual taxes) that doesn't fit in a flat table.
| Base CSV | Advanced Excel (.xlsx) | |
|---|---|---|
| Layout | One flat table. | Two sheets — Bills and Line items — joined by FK columns. |
| Granularity | One row per metered service. | Sheet 1 matches the CSV (one row per metered service); Sheet 2 adds one row per charge line. |
| Columns | 60 canonical fields. | Same 60 fields on Sheet 1 (plus two FK columns), 18 columns on Sheet 2. |
| Best for | Bulk imports into Xero / QuickBooks / accounting tools, totals-only reporting, simple spreadsheets. | Reconciling charge breakdowns, TOU / tier analysis, splitting by supplier on CCA bills, auditing why a meter cost what it cost. |
| Opens in | Excel, Google Sheets, Numbers, any text editor. | Excel, Google Sheets, Numbers, LibreOffice (binary .xlsx). |
Sheet 1 of the workbook is a strict superset of the CSV — same column order with two FK columns prepended — so you can ignore Sheet 2 and treat the workbook as a richer CSV if you don't need per-line detail. Jump straight to the Excel workbook section for the sheet-by-sheet schema, or read on for the CSV.
- One row per metered service. A service produces a row only when the bill prints a volume reading for it (a physical meter, an NMI, a recorded usage). Flat-fee services that appear as line items without a meter reading — sewerage on a flat residential rate, parks levies, late fees — do not get their own row; they roll into
bill_new_chargesonly. A single-meter bill produces one row; a two-meter bill produces two rows that share invoice-level columns. - Dates are ISO 8601 (
YYYY-MM-DD). - Strings are emitted verbatim — utility names, tariff names, addresses are not normalized or abbreviated.
- Numbers are raw values: no commas, no currency symbols, no embedded units. Units live in their own column (e.g.
meter_total_unit,currency). - Units include both US-style (
kwh,therms,ccf,gallons,kw) and metric (mj,m3,kl) for Australian and European bills. - Blank means “not stated on the bill”, not zero. See Sign conventions & nulls.
Row scope & sums
Each column has one of three scopes. The scope is shown as a chip in the field-reference tables below.
Varies across the meter rows of a single bill. meter_total_cost, service_account_number, tariff, etc.
Same invoice-level value on every meter row of the same bill. bill_total, customer_name, addresses, dates, currency.
Information about the extraction itself. extraction_status, extraction_warnings, source_filename.
Calculations
The numeric fields satisfy two summing identities you can use to validate your own roll-ups.
1. Invoice-level rollup (all row-repeated):
bill_total = previous_balance
− payments_received
+ bill_new_charges
+ late_fee
The same numbers appear on every meter row of a single bill.
When the bill shows no prior balance and no in-period payment,
bill_total == bill_new_charges.
2. Per-meter to invoice rollup:
bill_new_charges ≈ sum of meter_total_cost
across every row of the same bill
Approximate because invoice-level fees, discounts, rounding, or
flat-fee services without their own meter reading land outside any
single meter row. A multi-utility municipal bill that mixes a
metered electric service with a flat sewerage charge will leave a
gap here by design — that case emits
MULTI_SERVICE, not COST_MISMATCH. We
reserve COST_MISMATCH for a single-service bill
where the meter-row total is more than 10% off the invoice's
stated new charges — the signal of an actual extraction
problem worth spot-checking.
Collapsing to one row per bill.
Group the CSV by bill_customer_account + bill_start_date.
Keep the first value for row-repeated columns; sum the per-meter ones
(meter_total_volume, meter_total_cost).
Field reference
60 columns, in the order they appear in the CSV.
Extraction metadata
Pipeline-level fields about the extraction itself. One value per CSV row.
| Column | Type | Scope | Nullable | Description |
|---|---|---|---|---|
| source_filename |
string
|
no | Original filename uploaded by the user. Used to trace each CSV row back to its source PDF. | |
| source_page_count |
number
|
yes | Number of pages in the source PDF. | |
| extraction_status |
enum
completed
partial
failed
|
no | Status of the extraction run. 'partial' means critical fields were missing or overall confidence fell below threshold; 'failed' means the bill was unreadable or extraction errored out. | |
| extraction_error |
string
|
yes | Error code when extraction_status is 'failed' (e.g. 'unreadable_file', 'extraction_failed'). Null otherwise. | |
| extraction_warnings |
string
|
yes | Semicolon-joined warning codes emitted during extraction (e.g. 'MULTI_METER;COST_SHARED_ACROSS_METERS'). See warning_codes section of this dictionary for the canonical set. | |
| extraction_confidence |
number
|
yes | Mean confidence (0-1) across all populated fields for this row, rounded to 4 decimal places. | |
| extracted_at |
string
|
yes | ISO 8601 datetime when the extraction completed. Falls back to the Extraction model's updated_at if the lambda didn't set it. |
Grid & market
The delivery utility and the ISO/RTO market that operates the wires. Same on every meter row of one bill.
| Column | Type | Scope | Nullable | Description |
|---|---|---|---|---|
| transmission_region |
enum
CAISO
ERCOT
NYISO
ISONE
PJM
MISO
SPP
SERC
WECC_SW
WECC_PC
NEM
WEM
OTHER
|
row-repeated | yes | Electric-only. Wholesale market / ISO-RTO that operates the transmission grid serving this meter. US ISO/RTOs: CAISO, ERCOT, NYISO, ISONE, PJM, MISO, SPP, plus the SERC reliability region and WECC_SW / WECC_PC for non-CAISO western US. Australian markets (AEMO-operated): NEM (eastern + southern states — NSW/QLD/VIC/SA/TAS/ACT) and WEM (Western Australia's SWIS). Use OTHER for jurisdictions outside these (e.g. Pilbara NWIS, Northern Territory, non-AU/US bills) and null for gas and water bills. |
| load_zone |
string
|
row-repeated | yes | Electric-only. ISO/RTO sub-zone where the meter is located. Format varies by market: CAISO uses SLAP identifiers like 'SLAP_SCEW'; ERCOT uses 'LZ_HOUSTON' / 'LZ_NORTH'; NYISO uses single-letter zones like 'J' or 'K'; PJM uses zone names like 'AEP', 'COMED'; NEM (Australia) uses AEMO region codes 'NSW1' / 'QLD1' / 'VIC1' / 'SA1' / 'TAS1' (the digit-1 suffix is part of the canonical code, never strip it); WEM (Western Australia) is a single region so populate 'SWIS' when applicable. Gas and water bills leave this null. |
| utility_name |
string
|
row-repeated | no | Name of the regulated utility / delivery company that owns the wires to this meter, as printed on the bill. Never normalized or abbreviated by the extractor. |
| utility_phone |
string
|
row-repeated | yes | Customer-service phone number for the utility, as printed on the bill. |
| utility_vendor_id |
string
|
row-repeated | yes | Utility's internal vendor/supplier code if printed on the bill. Rare. |
| utility_tax_id |
string
|
row-repeated | yes | Federal tax ID (FEIN) of the utility if printed on the bill. Mostly seen on commercial bills. |
| commodity_supplier |
string
|
row-repeated | yes | Competitive supplier the customer buys their commodity from, when distinct from the delivery utility. Electric markets call this the Load-Serving Entity (LSE), Competitive Retail Electric Service (CRES) provider, or Community Choice Aggregator (CCA). Gas markets call this a marketer or natural-gas supplier. Null on bundled / regulated-monopoly bills where the utility is both supplier and deliverer. |
| service_voltage |
string
|
row-repeated | yes | Electric-only. Service voltage level / category as printed on the bill. May be a tier name ('Primary', 'Secondary', 'Transmission'), a phase descriptor ('Secondary Three Phase'), or a kV range ('>= 600 V', '< 35 kV'). Capture verbatim. Gas/water bills leave this null. |
Identification
The customer plus the specific service point being billed. Varies per meter.
| Column | Type | Scope | Nullable | Description |
|---|---|---|---|---|
| service_account_number |
string
|
per-meter | yes | Canonical site/service identifier used by the utility to refer to the specific service point. Works for electric, gas, water, and other utility commodities. Goes by different names across regions and commodities: CAISO calls it Service Agreement ID / Service Account Number (SAN); ERCOT uses ESIID; ISO-NE uses Electric Account Number; NYISO uses TO Account Number; AES Ohio calls it Choice Service ID. All variants are the same concept under different labels — they identify the service/site at the utility's grid level. |
| utility_meter_serial_number |
string
|
per-meter | yes | Serial number of the physical meter device, as engraved on the meter and printed on the bill. One row is emitted per meter on multi-meter bills, each with its own utility_meter_serial_number. |
| point_of_delivery_id |
string
|
per-meter | yes | Premise-level identifier that stays stable across customer changes (e.g. SCE's 17-digit POD ID; an apartment building's identifier doesn't change when a new tenant moves in). Different from service_account_number, which can change when the customer's account changes. Populated only when the bill prints an explicit Point-of-Delivery / Premise / POD identifier distinct from the service-account number. |
| customer_name |
string
|
row-repeated | yes | Customer / account-holder name as printed on the bill (the entity that pays for service at this account). |
Location
Service address (where the meter is) and billing address (where the bill is sent). Row-repeated.
| Column | Type | Scope | Nullable | Description |
|---|---|---|---|---|
| service_street1 |
string
|
row-repeated | yes | Street number and street name of the service address (where the commodity is delivered). |
| service_street2 |
string
|
row-repeated | yes | Apartment, suite, unit, or floor on the service address (when present). |
| service_city |
string
|
row-repeated | yes | City of the service address. |
| service_state |
string
|
row-repeated | yes | State / province of the service address. 2-letter postal abbreviation for US/Canadian bills; full name otherwise. |
| service_zip |
string
|
row-repeated | yes | Postal code of the service address as printed on the bill (US 5-digit, ZIP+4, or international format). |
| service_country |
string
|
row-repeated | yes | ISO country name for the service address. Inferred from postal-code format / utility if not explicitly stated. |
| billing_street1 |
string
|
row-repeated | yes | Street number and name of the remittance / mailing address (where bills are sent). |
| billing_street2 |
string
|
row-repeated | yes | Apartment, suite, unit, or floor on the remittance address. |
| billing_city |
string
|
row-repeated | yes | City of the remittance address. |
| billing_state |
string
|
row-repeated | yes | State / province of the remittance address. |
| billing_zip |
string
|
row-repeated | yes | Postal code of the remittance address. |
| billing_country |
string
|
row-repeated | yes | ISO country name for the remittance address. |
Billing & service agreement
Customer-level account number plus any per-meter service agreement.
| Column | Type | Scope | Nullable | Description |
|---|---|---|---|---|
| bill_customer_account |
string
|
row-repeated | no | Customer-level account number printed on the bill, used for remittance. Distinct from service_account_number, which identifies the grid-side service point. Some utilities (e.g. SCE residential) use similar formats for both; others (e.g. AES Ohio) use very different numbers. |
| bill_service_agreement_number |
string
|
per-meter | yes | Service agreement / supplier contract identifier on the bill side. Distinct from service_account_number (grid-side) and bill_customer_account (customer-level). For SCE this often equals service_account_number; for deregulated supplier bills (AES Ohio + Dynegy, CCA territories), it's a separate identifier issued by the supplier. |
| bill_service_agreement_start_date |
date
|
per-meter | yes | Effective date of the supplier-side service agreement / contract, when printed on the bill. ISO 8601 YYYY-MM-DD. |
| bill_service_agreement_end_date |
date
|
per-meter | yes | Expiration date of the supplier-side service agreement / contract, when printed on the bill. ISO 8601 YYYY-MM-DD. |
Rate & service class
The tariff that prices the meter, plus the derived service-class classification.
| Column | Type | Scope | Nullable | Description |
|---|---|---|---|---|
| tariff |
string
|
per-meter | yes | Rate / tariff / schedule name as printed on the bill. Never normalized. |
| service_class |
enum
res-electric
comm-electric
ind-electric
agr-electric
res-gas
comm-gas
ind-gas
agr-gas
res-water
comm-water
ind-water
res-sewer
comm-sewer
res-steam
comm-steam
unknown
|
per-meter | no | Customer classification + commodity. Derived from the tariff name and customer profile on the bill. |
| service_type |
enum
Electric
Gas
Water
Sewer
Steam
Telecom
|
per-meter | yes | Single-word service category from the bill: 'Electric', 'Gas', 'Water', 'Sewer', 'Steam', or 'Telecom'. |
| is_tou_rate |
bool
|
per-meter | yes | Electric-only. True if the bill shows any time-of-use rate windows (on-peak / off-peak / mid-peak / shoulder); false otherwise. The detailed per-band breakdown lives in meters[i].line_items[] (kind=consumption, with name carrying the verbatim band label) rather than in flat columns. Gas bills rarely use TOU rates; leave null when not applicable. |
| baseline_allowance |
number
|
per-meter | yes | Electric-only. The kWh threshold that determines tier crossover on a tiered rate (e.g. SCE 'Your winter baseline allowance: 215.0 kWh', PG&E equivalent). Lets a customer reason about 'am I close to crossing into Tier 2?' Null on flat-rate bills (single-rate AU retailers, simple commercial flat tariffs). On a quarterly or cross-season bill that prints two allowances (winter and summer portions), leave null and emit MULTI_SEASON_BILL — the per-segment values surface on the corresponding tier line items via start_date / end_date. |
| baseline_season |
enum
winter
summer
|
per-meter | yes | Electric-only. Which seasonal baseline allowance applies for this bill. Implicit on SCE ('Your winter baseline allowance'); explicit on some utilities. Null on bills with no seasonal baseline. Same MULTI_SEASON_BILL caveat as baseline_allowance — null when the bill spans both seasons. |
| tax_in_total |
number
|
per-meter | yes | Region-neutral. The bill's stated total tax embedded in meter_total_cost — populated when the bill prints a single headline tax figure ('GST in total $59.46', 'Total includes GST component of $10.77', 'Includes Sales Tax $X', UK 'VAT total'). Null when the bill itemises tax as discrete line items instead (US bills like SCE itemise State tax as a line_item with kind=tax — the line item is the source of truth). Never both: if line items are tax-inclusive and the bill also prints a headline figure, capture the headline here; if line items are tax-exclusive (with a separate kind=tax line), leave this null. |
Dates
Statement, billing-period, and due dates. Row-repeated.
| Column | Type | Scope | Nullable | Description |
|---|---|---|---|---|
| bill_statement_date |
date
|
row-repeated | yes | Date the bill was issued by the utility. ISO 8601 YYYY-MM-DD. |
| bill_start_date |
date
|
row-repeated | yes | First day of the billing period covered by this row. ISO 8601 YYYY-MM-DD. |
| bill_end_date |
date
|
row-repeated | yes | Last day of the billing period covered by this row. ISO 8601 YYYY-MM-DD. |
| bill_due_date |
date
|
row-repeated | yes | Payment due date for the bill. ISO 8601 YYYY-MM-DD. |
| bill_frequency |
enum
monthly
bi-monthly
quarterly
annual
|
row-repeated | yes | How often the bill is issued. Taken from the bill if explicitly stated; otherwise inferred from (bill_end_date - bill_start_date): ~30 days = monthly, ~60 = bi-monthly, ~90 = quarterly. |
| days_in_period |
number
|
row-repeated | yes | Number of days the billing period covers. Use the value the bill prints directly when it states one (e.g. 'Service Period: 30 days', 'Number of Days: 31'); otherwise compute it from (bill_end_date - bill_start_date). Row-repeated: every meter row on the same invoice shares this value. Null when neither a stated days value nor both period dates are available. When the stated value disagrees with the computed value by more than one day, prefer the stated value and emit DATE_MISMATCH. |
Usage, demand & cost
Per-meter consumption and cost, plus the invoice-level adjustment fields that satisfy the summing identities.
| Column | Type | Scope | Nullable | Description |
|---|---|---|---|---|
| meter_total_volume |
number
|
per-meter | yes | Usage attributed to this single meter for the billing period. Per-meter scope: each meter on a multi-meter invoice gets its own value. Raw number with no units or commas. The unit is in meter_total_unit. |
| meter_total_unit |
enum
kwh
therms
ccf
gallons
kw
mj
m3
kl
|
per-meter | yes | Unit of meter_total_volume for this single meter. US-style units: kwh (electric), therms (gas energy), ccf (gas/water volume), gallons (water), kw (demand). Metric/AU-style units: mj (gas or electric energy in megajoules), m3 (gas or water volume in cubic metres), kl (water in kilolitres). Lowercase canonical form regardless of how the bill prints it (e.g. 'MJ' on the bill -> 'mj'). |
| read_type |
enum
actual
estimated
customer_read
|
per-meter | yes | How the meter reading at the end of the billing period was obtained: 'actual' (utility read the physical meter or AMI sent a reading), 'estimated' (utility estimated usage because no read was taken — common for skipped months, storm interruptions, or inaccessible meters), or 'customer_read' (customer self-read and reported the value). Lowercase canonical form regardless of how the bill prints it (e.g. 'Estimated Reading' / 'EST' -> 'estimated'). Null when the bill doesn't state how the read was obtained. |
| meter_total_cost |
number
|
per-meter | yes | Cost attributed to this single meter for the billing period. Per-meter scope: each meter on a multi-meter invoice gets its own value. Sum across all meter rows on the same invoice equals bill_new_charges (modulo invoice-level fees, discounts, or rounding). Negative values are valid - credits and solar feed-in show as negative. For the invoice's headline 'amount due' figure see bill_total. Raw number, no currency symbols; currency lives in the currency field. When a multi-meter bill reports only one combined cost with no per-meter breakdown, that single value is repeated on every row and the COST_SHARED_ACROSS_METERS warning is emitted. |
| bill_total |
number
|
row-repeated | yes | Invoice-level headline amount the customer is asked to pay (the 'amount due' / 'pay this' figure printed at the top of the bill). Row-repeated: every meter row on the same invoice shares this value. Relationship: bill_total = previous_balance - payments_received + bill_new_charges + late_fee. When the bill shows no previous balance and no in-period payment, bill_total equals bill_new_charges. Positive when something is owed; negative when the account is in credit. |
| bill_new_charges |
number
|
row-repeated | yes | Total charges for the current billing period, before previous-balance or payment adjustments. The sum of meter_total_cost across every meter row on the same invoice equals this value (modulo invoice-level fees, discounts, rounding, or flat-fee services that have no meter row of their own). On a single-service bill, a gap over 10% emits COST_MISMATCH; on a multi-service bill the gap is expected and emits MULTI_SERVICE instead. Row-repeated. Null when the bill does not separately show a current-charges subtotal distinct from the headline total. |
| previous_balance |
number
|
row-repeated | yes | Outstanding balance carried over from the previous invoice. Positive when the prior bill was unpaid; negative when the customer was in credit. Null on first bills or when the bill does not show a carry-forward line (do not default to 0 - null preserves the distinction between 'not stated' and 'explicitly zero'). Row-repeated. |
| payments_received |
number
|
row-repeated | yes | Sum of payments received from the customer since the previous invoice. Always reported as a positive magnitude; the summing relationship subtracts it from the running balance. Null when the bill does not show a payments line (do not default to 0 - null preserves the distinction between 'not stated' and 'explicitly zero'). Row-repeated. |
| late_fee |
number
|
row-repeated | yes | Penalty charge applied at the invoice level for late payment of the prior balance. Positive when present; null when no late fee applies (the common case - do not default to 0). Distinct from a meter-level disconnection or reconnection fee, which belongs to its meter's meter_total_cost rather than here. Row-repeated. |
| max_kw |
number
|
per-meter | yes | Electric-only. Peak / maximum demand for this meter during the billing period, in the unit declared by demand_unit (typically kW; some commercial bills report kVA). Null when the bill doesn't report demand. Gas analog (max therms/hour or peak Dth) and water analog (max GPM) are not currently captured; would be separate fields if needed. |
| demand_unit |
enum
kw
kva
|
per-meter | yes | Unit of max_kw for this meter: 'kw' (real power demand) or 'kva' (apparent power demand, common on industrial / large-commercial bills where the utility prices power-factor as well as real power). Electric-only. Lowercase canonical form regardless of how the bill prints it (e.g. 'kVA' -> 'kva'). Null when max_kw is null. When max_kw is populated but the bill prints only 'Demand' without further qualification, default to 'kw'. |
| currency |
string
|
row-repeated | no | ISO 4217 currency code for monetary fields. Inferred from utility/address when not explicitly stated; the CURRENCY_ASSUMED warning fires in that case. |
State-specific identifier notes
Different jurisdictions print the same canonical concept under different labels. We fold them into a single column so downstream tooling doesn't need to special-case them.
Folded into service_account_number
| Label on the bill | Region | Why we fold |
|---|---|---|
| ESIID | ERCOT, Texas | ERCOT's 17-digit Electric Service Identifier is the same concept as a SAN under a different naming convention. We fold it into service_account_number. |
| SP ID | CAISO, PG&E commercial | PG&E commercial Service Point ID is the same site identifier as Service Agreement ID under a different label. |
| Choice Service ID | PJM (AES Ohio), Ohio Electric Choice | Per AES Ohio bill glossary (background/NNNNNNNNNNNN.pdf, p.5): 'Unique customer identification number to provide your selected Competitive Retail Electric Service (CRES) provider when enrolling.' Same canonical concept as a SAN/ESIID. |
| Service Agreement ID | CAISO (PG&E) | PG&E's name for the canonical service-point ID. Same concept as SAN. |
| TO Account Number | NYISO | NYISO's name for the canonical service-point ID, printed by ConEd / PSEG-LI on their bills. |
When point_of_delivery_id is populated
A premise-level identifier that stays stable across customer turnover. Populated only when the bill prints it explicitly (most utilities do not).
| Label on the bill | Region | Notes |
|---|---|---|
| POD ID | Italy, CAISO (SCE commercial) | Premise-level identifier independent of customer (stable across tenant turnover). Kept as its own canonical field rather than folded into service_account_number because SCE's Greenbutton hierarchy explicitly distinguishes the two. |
Folded into bill_customer_account
| Label on the bill | Region | Notes |
|---|---|---|
| Summary Account | commercial multi-site | Parent account for customers with multiple sites under one billing relationship. Today the bill_customer_account on each row already identifies the customer; we'd promote this back to its own column only if a real customer needs site-level AND parent-level identifiers on the same row. |
| LDC Account Number | deregulated supplier markets (PJM, ERCOT, NY, OH) | On supplier-direct bills, the LDC may print its own account number alongside the supplier's. Whichever party is the issuer of the PDF we're parsing, we use their printed account as bill_customer_account. If both must coexist on one row, revisit this decision. |
Sign conventions & nulls
- Credits and solar feed-in:
meter_total_costis negative. A meter that exported more energy than it consumed will roll in as a negative number — that's intentional, the summing identity above relies on it. payments_receivedis always a positive magnitude. The identity above subtracts it; do not double-negate.previous_balanceis positive when the prior bill was unpaid, negative when the customer was in credit going in.late_feeis positive when present; blank when there was no late fee on this bill.
Null versus zero.
A blank cell means the bill did not state that line. Don't treat it as zero — that loses
the distinction between “no late fee was charged on this bill” (very common,
blank) and “the bill explicitly shows a $0.00 late fee” (rare, value 0).
This applies to previous_balance, payments_received,
and late_fee in particular.
Warning codes
Codes appear semicolon-joined in the extraction_warnings column.
Most are informational, not failures.
| Code | Meaning | What to do |
|---|---|---|
| LOW_QUALITY_SCAN | Source PDF was a poor-quality scan; extraction confidence may be reduced. | Re-upload a higher-resolution scan or the original PDF from the utility portal. OCR confidence is reduced on poor scans. |
| MULTI_METER | Bill lists more than one meter (each becomes its own CSV row). | Expected — each meter becomes its own row. Group by bill_customer_account + bill_start_date to collapse to one row per bill. |
| MULTI_SERVICE | Bill mixes services (e.g. electric + gas, or electric + flat-fee sewerage) on one statement. | The bill mixes services. Each metered service gets its own row; flat-fee services without a meter reading land in bill_new_charges only, so sum(meter_total_cost) may legitimately fall short of bill_new_charges. Filter by service_type if you want a single commodity. |
| COST_SHARED_ACROSS_METERS | Bill reported a single combined cost; the same value is repeated on every meter row. | The bill printed only one combined cost. Each row carries that combined value; do not sum meter_total_cost — use bill_new_charges for the invoice total. |
| DATE_MISMATCH | Period dates didn't agree across sections of the bill. | Two sections of the bill showed different period dates. Spot-check bill_start_date / bill_end_date before relying on them. |
| COST_MISMATCH | Line-item costs didn't sum to the reported total on a single-service bill. | On a single-service bill, per-meter costs do not sum to the invoice's stated new-charges total within 10%. Cross-check before relying on either figure. A gap on a multi-service bill is reported as MULTI_SERVICE instead, since flat-fee services legitimately fall outside meter rows. |
| UTILITY_UNKNOWN | Couldn't confidently identify the utility name. | Couldn't confidently identify the utility name. utility_name may be blank or wrong; correct it manually if you depend on it. |
| CURRENCY_ASSUMED | Currency wasn't stated; inferred from address/utility. | Currency code was inferred from the utility or service address because the bill didn't print it. Verify if you mix currencies. |
| PARTIAL_EXTRACTION | One or more critical fields were missing or low-confidence. | One or more critical fields were missing or low-confidence. Spot-check the row before trusting the totals. |
| LINE_ITEM_TOTAL_MISMATCH | Sum of line_items[].cost diverges from meter_total_cost beyond the floored tolerance max(0.01 * abs(meter_total_cost), $0.50). | Spot-check the bill's printed per-meter total against the line items in the XLSX 'Line items' sheet — likely a missing line, a sign error, or a decimal mistake. The absolute $0.50 floor exists because pure-percentage tolerances misfire on net-credit / near-zero bills (high-solar months can net to negative dollars from offsetting tens-of-dollar components). |
| LINE_ITEM_NAME_UNFAMILIAR | One or more line items couldn't be classified into a kind enum and landed at kind=other. | The line item's verbatim name is preserved in the XLSX 'Line items' sheet — the analyst can still see what it is, classification just didn't recognise the pattern. A rising kind_other_rate is an ops signal that new patterns should be added to common/kind_patterns.yaml. |
| TOU_BANDS_NOT_PRINTED | Bill is on a Time-of-Use rate (is_tou_rate=true) but doesn't print per-band figures — line_items[] only contains the aggregated General Usage line. | The bill's customer is on TOU but their statement doesn't itemise peak/off-peak/shoulder. Only the tariff name and aggregated usage are available; the per-band split isn't recoverable from the PDF alone. Origin's Go Variable bills are the canonical example. |
| MULTI_SEASON_BILL | A quarterly or cross-season bill prints two baseline allowances (one per season-portion of the period). base.baseline_allowance and base.baseline_season are left null. | The bill spans a seasonal baseline boundary (e.g. an AU summer→winter transition in May/Sep, a US winter→summer transition in May/Oct). The per-segment baselines surface on the relevant tier rows in the XLSX 'Line items' sheet via start_date / end_date — use those for analysis instead of the null base headline. |
Sample CSV
Three rows of synthetic data that exercise every concept above: a
single-meter US residential bill and a two-meter Australian bill
where solar feed-in shows as a negative meter_total_cost.
Download utilitybillreader-sample-export.csv
Three rows; one row per meter. Long lines wrap horizontally — download to see all 60 columns.
Excel workbook (advanced)
The .xlsx download is the same extraction laid out as a
two-sheet workbook so per-charge detail — TOU bands, tier rows,
supplier splits on CCA bills, individual taxes — doesn't get
flattened away. Available from the dashboard's Excel
button (next to CSV) and from the same per-row
download menu the CSV uses.
Two sheets, one foreign key
- Sheet 1 —
Bills. Same 60-column CSV schema, with two FK columns (extraction_id,meter_index) prepended. One row per metered service, same scope rules as the CSV. - Sheet 2 —
Line items. One row per charge line across every meter on every bill in the workbook, with three FK columns (extraction_id,meter_index,line_index) that link each line back to its meter row on Sheet 1.
Sheet 2 is empty (header row only) on bills where no per-line charge
detail was itemized — bill totals still appear on Sheet 1's
meter_total_cost. The workbook layout is identical either
way, so your tooling doesn't change whether or not a bill carries
line detail.
Sheet 1: Bills
Header row layout, left-to-right: the two FK columns, then every column from the Field reference in the same order as the CSV.
| Column | Type | Description |
|---|---|---|
| extraction_id | uuid | Stable identifier for the uploaded bill. Same across every row of a multi-meter bill, and the join key for Sheet 2. |
| meter_index | integer | Zero-based ordinal of the meter on the bill. Joins with Sheet 2's meter_index to attach each line item to its meter row. |
| … then the 60 canonical fields, in CSV order — see the Field reference for each one's type, scope, and description. | ||
Sheet 2: Line items
18 columns. One row per charge line,
in the order the line appeared on the bill. Strings come back
verbatim (no normalization on name, source_section,
or supplier); kind and section
are the only normalized fields.
| Column | Type | Description |
|---|---|---|
| extraction_id |
uuid
|
Foreign key joining back to Sheet 1's extraction_id column. Same value on every line item of the same bill. |
| meter_index |
integer
|
Zero-based index of the meter this line came from. Joins with Sheet 1's meter_index to pick out the row this charge contributes to. |
| line_index |
integer
|
Zero-based position of this line within its meter's line_items[] — preserves the bill's printed order. |
| kind |
enum
consumption
export
supply
credit
tier
demand
flat_fee
tax
other
|
Normalized charge type so you can pivot without parsing names. |
| section |
enum
delivery
generation
tax
other
|
Normalized section the line was filed under. |
| source_section |
string
|
Verbatim section header from the bill (e.g. "Delivery Charges", "Generation Charges", "Taxes & Fees"). Useful when section's four-bucket enum is too coarse. |
| supplier |
string
|
Verbatim supplier name when the bill splits charges by supplier (CCA / community choice aggregator bills do this). Null on single-supplier bills. |
| name |
string
|
Verbatim charge name as printed on the bill — never normalized. The raw text downstream tooling pivots and reconciles on. |
| volume |
number
|
Quantity charged for. Null for flat fees and most taxes. |
| unit |
string
|
Unit volume is denominated in (kwh, therms, ccf, mj, m3, kl, day, etc.). Null when no volume. |
| rate |
number
|
Per-unit price applied. Negative for credits and feed-in. Null when the bill prints only a total. |
| rate_unit |
string
|
How the rate is denominated — e.g. "$/kwh", "$/day", "$/mj". Null when rate is null. |
| cost |
number
|
Dollar amount the line contributes. Negative for credits, rebates, and net feed-in. Sums up to meter_total_cost within the floored tolerance — see COST_MISMATCH. |
| start_date |
date
|
ISO 8601 service-window start, when the bill prints per-line dates (rare). Usually null. |
| end_date |
date
|
ISO 8601 service-window end. Usually null — defer to meter's bill_start_date / bill_end_date. |
| time_window |
string
|
TOU window the line covers, verbatim (e.g. "4pm–9pm Mon–Fri"). Populated only on time-of-use rates that print per-band figures. |
| plan_label |
string
|
Verbatim plan or product name when the line was filed under one (e.g. "ZEROHERO"). Mostly null on US bills. |
| confidence |
number
|
Per-line extractor confidence, 0–1. |
Cost signs follow the CSV convention — credits, rebates, and
feed-in lines come back as negative cost. See
Sign conventions & nulls.
Joining Sheet 2 back to Sheet 1
Both sheets share the (extraction_id, meter_index)
pair. Group Sheet 2 by that pair to roll line items back up to a
meter; group by extraction_id alone to roll back up to
a bill. The pivot below sums Sheet 2's cost by meter
and reconciles it against Sheet 1's meter_total_cost:
// Excel formula on Sheet 1, alongside meter_total_cost:
SUMIFS('Line items'!M:M,
'Line items'!A:A, A2, // extraction_id
'Line items'!B:B, B2) // meter_index
Sheet 2's cost column is column M; the FK
columns are A and B. The sum should equal
Sheet 1's meter_total_cost within the floored tolerance
described under COST_MISMATCH in
Warning codes.
Other common pivots:
- Spend by charge type: pivot
costbykindto split consumption vs supply vs taxes vs credits across a whole portfolio of bills. - Spend by supplier: pivot by
supplieron CCA / community-aggregation bills (PG&E + MCE, SDG&E + Clean Energy Alliance, etc.) to see the delivery / generation split per supplier. - TOU breakdown: filter
kind = consumptionand pivotvolume+costbytime_windowto recover peak / off-peak / shoulder usage that the flat CSV folds into a singlemeter_total_volume.
Sample template
Same three bills as the Sample CSV above — the single-meter US residential bill and the two-meter Australian bill with solar feed-in — laid out in the workbook format with a populated Sheet 2 so you can see exactly how line items shake out on each.
Download utilitybillreader-sample-export.xlsx
Sheet 1: 3 rows + 2 FK columns + the 60 CSV columns. Sheet 2: 6 line items spread across the three meters.
Email Inbox Forwarding
Forward utility bills as PDF attachments to a unique address on
@ bills.utilitybillreader.com and they'll appear on your
dashboard the same way uploaded files do. Also recieve a reply email
summarising the parsed result and a CSV attached. No browser
upload or access steps required.
Workflow
- Sign up, then open the Email Tokens entry in your profile menu to copy your forwarding address. The default token is created automatically.
- Forward (or set up auto-forward in your inbox) any utility bill emails with PDF attachments to that address. Multiple PDFs in one email are processed as a batch.
-
Each PDF lands on the dashboard
as a row with the email-source badge and a "via email from
<sender>" subline — first as
queued, thencompletedwithin ~30s. - Once every PDF in the batch finishes, you receive a single reply email with a per-file summary and a CSV containing the parsed line items, threaded with the original forward so it lands in the same conversation in your inbox.
Forwarding addresses follow the format
ubr_eml_<20-char>@bills.utilitybillreader.com. The
ubr_eml_ prefix lets you spot the addresses at a glance
in your inbox and contact lists.
Multiple addresses per team are supported — create labelled tokens (e.g. AGL gas bills, Retail shops) to route different upstream inboxes without losing context.
Allowed senders — senders must be a team member
We only accept forwards from email addresses that are already on the team that owns the token. Mail from anyone else is silently dropped, and the token's owner receives a one-line notification: "Someone at <sender> tried to forward to your ingest address and we rejected it."
To let a colleague forward to your team's address, add them to your team using their registered email address. Any team member's address then becomes a valid sender for every token on the team.
SPF and DKIM must also pass on the sender's domain — we drop forgeries silently. Forwarding from your own Gmail/Outlook generally passes both; forwarding through an old POP3 server that re-stamps headers may not. If a legitimate forward keeps failing, drop us a note with the sender domain.
Limitations
- PDF attachments only. Any other attachment type is ignored. Inline images, bills embedded as the email body itself, or links to a utility's web portal are not parsed today.
- 40 MB total per inbound message (an AWS SES cap we can't override). Most utility bills are well under 1 MB. Oversized forwards bounce back from SES — the failure isn't visible in your dashboard, so the forwarder will see a bounce notice from their own mail server.
- Quota applies the same as web uploads. Pages consumed by email-forwarded bills draw from your team's monthly page allowance. A forward that would exceed the remaining quota is dropped and we log it but don't currently auto-reply — check the dashboard quota meter if a forward seems to have vanished.
- Forwarder identity is the outer From header. If you forward a thread, we attribute it to the immediate forwarder (you), not the utility that sent the original. The utility name shows up in the parsed data, not in the source attribution.
- One reply per inbound, not per attachment. Forward five PDFs in one email and you get one reply once all five have finished extracting (success or failure). Each PDF still becomes visible on the dashboard the moment its individual extraction completes — you don't need to wait for the reply to see results.
Reply email
Sent from noreply@utilitybillreader.com to the original
forwarder, threaded with the inbound message (via In-Reply-To
and References headers) so most mail clients attach it to
the same conversation. The body is a per-file table with status
(parsed / could not parse / duplicate — review), the utility we
detected, and a button back to the dashboard.
A CSV named parsed-bills-YYYY-MM-DD.csv is attached when
at least one PDF parsed successfully. Its column layout matches the
Field reference below — same
canonical names as the dashboard's CSV download, the
Excel workbook's Bills sheet, and the API's JSON
response. (The richer Excel workbook isn't auto-attached to email
replies; grab it from the dashboard when you need per-line detail.)
Replies are suppressed when the forwarder's address looks like an
automation account (postmaster@, noreply@,
mailer-daemon@, etc.) to avoid reply loops.
Rotate or revoke a token
Forwarding addresses are stable but rotatable. If one ends up in the wrong hands — or you'd just rather not keep using a particular upstream inbox — use the Email Ingest Tokens page to:
- Rotate — issues a new active address with the same label and immediately disables the old one. Forwards to the old address are dropped from that moment on.
- Revoke — disables an address without replacing it. Mail to the revoked address is dropped before it reaches your account.
Partner API
A small REST surface for partners: upload a bill, fetch the
parsed JSON, list every bill on your account, check your
remaining quota. JSON responses use the same canonical column
names as the Field reference below
— each result carries a rows array (identical
to the CSV download and Sheet 1 of the
Excel workbook) and a line_items
array (identical to Sheet 2), so the API mirrors the full
workbook, not just the flat CSV. The API returns JSON only;
use the dashboard's CSV / Excel buttons (or these response
arrays) to drive your own export.
Authentication
Mint a token from your account's API Tokens page (Profile menu → API Tokens). Send it on every request as a Bearer credential:
Authorization: Bearer <your-token>
Tokens are scoped to a single user account; treat them like passwords. Revoke a token anytime from the same page.
Quota
API uploads draw from the same monthly page quota as the web
app — one PDF page counts as one page used. Use
GET /api/v1/user to check
your remaining pages, or visit the
pricing page
to compare plans. Quota is decremented when an extraction
finishes, not at upload time, but a queued bill counts
against your remaining balance to prevent racing.
Bills are visible through the API for as long as your plan's
retention window lasts — 90 days on the paid tiers and
unlimited on Enterprise. Past that window the source PDF is
purged and the bill drops out of both the dashboard's
history table and every API response (404 on
/bill/{id}, omitted from /bill).
Endpoints
All endpoints live under /api/v1/, require the
Authorization header, and speak JSON. The
collection vs. single-item distinction is the presence of an
{id} path segment — there is no
/download suffix and no singular-vs-plural
convention to keep straight.
POST /api/v1/bill
Upload one PDF. Returns 202 Accepted with the
new extraction id. Use the id with
GET /api/v1/bill/{id} to track status and fetch
the parsed result.
Request — multipart/form-data with a single file field:
curl -X POST https://utilitybillreader.com/api/v1/bill \ -H "Authorization: Bearer $TOKEN" \ -F file=@/path/to/bill.pdf
Response 202:
{
"id": "0193e3a0-7c3b-7000-8f1a-1234567890ab",
"status": "queued",
"source_filename": "bill.pdf",
"source_page_count": 3,
"created_at": "2026-05-19T12:34:56+00:00"
}
GET /api/v1/bill/{id}
One endpoint covers status polling and result retrieval.
The response shape is identical regardless of state:
rows and line_items are both
null while status is
queued or processing, and become
arrays once the extraction reaches a terminal state —
rows with one object per meter,
line_items with one object per charge line
across every meter. Always 200 OK — check
status (or just rows !== null), not
the HTTP code.
curl https://utilitybillreader.com/api/v1/bill/$ID \ -H "Authorization: Bearer $TOKEN"
Response (terminal):
{
"id": "0193e3a0-7c3b-7000-8f1a-1234567890ab",
"status": "completed",
"source_filename": "bill.pdf",
"source_page_count": 3,
"extraction_confidence": 0.97,
"warnings": [],
"error_code": null,
"error_message": null,
"created_at": "2026-05-19T12:34:56+00:00",
"extracted_at": "2026-05-19T12:35:12+00:00",
"rows": [
{
"utility_name": "Southern California Edison",
"service_account_number": "2-12-345-6789",
"tariff": "DOMESTIC",
"meter_total_volume": 412,
"meter_total_unit": "kwh",
"meter_total_cost": 78.42,
"bill_total": 78.42,
"currency": "USD",
"customer_name": "LASTNAME, FIRSTNAME",
"...": "..."
}
],
"line_items": [
{
"extraction_id": "0193e3a0-7c3b-7000-8f1a-1234567890ab",
"meter_index": 0,
"line_index": 0,
"kind": "consumption",
"section": "delivery",
"supplier": null,
"name": "Peak",
"volume": 180,
"unit": "kwh",
"rate": 0.31,
"rate_unit": "$/kwh",
"cost": 55.8,
"time_window": "peak",
"...": "..."
}
]
}
Every key inside a rows[] object is a CSV
column name — see Field
reference for the full list, types, scopes, and
nullability rules. Every key inside a
line_items[] object is a column from the
Excel workbook's Line items
sheet; the meter_index /
line_index FKs join each line back to its
rows[] entry (row N is
meter_index = N). line_items is an
empty array on bills whose charge lines haven't been broken
out yet.
GET /api/v1/bill
Paginated list of every bill on your team, newest first.
Each entry is the same envelope as the single-id endpoint
(rows + line_items); in-flight
bills appear in the same list with rows: null
and line_items: null.
Query parameters:
limit— 1 to 100, default 50.since— ISO 8601 timestamp; only return bills created on or after this moment.cursor— opaque pagination token returned asnext_cursoron the previous page. Echo it back to fetch the next page.
curl "https://utilitybillreader.com/api/v1/bill?limit=50&since=2026-05-01T00:00:00Z" \ -H "Authorization: Bearer $TOKEN"
Response:
{
"data": [
{ "id": "...", "status": "completed", "rows": [ ... ], "line_items": [ ... ] },
{ "id": "...", "status": "processing", "rows": null, "line_items": null }
],
"next_cursor": "MjAyNi0wNS0xOVQxMjozNDo1NisrMDA6MDAi..."
}
When next_cursor is null, you've
reached the end of the list.
GET /api/v1/user
Current user, team, and remaining page quota. Poll this before a bulk upload to confirm you have enough pages.
curl https://utilitybillreader.com/api/v1/user \ -H "Authorization: Bearer $TOKEN"
Response:
{
"id": "0193e399-0000-7000-8f1a-1234567890ab",
"name": "Jane Smith",
"email": "jane@example.com",
"team": {
"id": "0193e398-0000-7000-8f1a-1234567890ab",
"name": "Jane Smith's Team",
"plan": "professional"
},
"remaining_pages": 1487,
"pages_limit": 1500,
"pages_used_this_period": 13,
"period_resets_at": "2026-06-01T00:00:00+00:00"
}
Response shape
Objects inside rows[] are keyed by the same
canonical column names the CSV uses. The
Field reference below is the
single source of truth: it lists every key, its type,
its scope (per-meter vs. row-repeated vs. metadata), and
whether it can be null.
Objects inside line_items[] are keyed by the
same columns as the Excel workbook's
Line items sheet. Each line
carries extraction_id, meter_index,
and line_index FKs: join on
meter_index to attach a charge line to its
rows[] entry, or sum cost per
meter_index to reconcile against that row's
meter_total_cost. line_items is an
empty array on bills with no itemized charge lines, so treat
it as optional detail on top of the always-present
rows[].
Conventions carry over from the CSV: dates are ISO 8601,
numbers carry no commas or currency symbols (units live in
their own column), blanks mean not stated on the bill
rather than zero, and credits/solar feed-in show as a
negative meter_total_cost. See
Sign conventions & nulls for the
full set.
Error codes
Errors are returned as
{ "error": "<message>" } with the HTTP
status set accordingly.
| Status | When |
|---|---|
| 400 | Validation failure — missing file, wrong mimetype, malformed since or cursor. |
| 401 | Missing or invalid Authorization token. |
| 402 | Upload would exceed your team's remaining pages. |
| 403 | Bill belongs to a different team. |
| 404 | No bill exists with that id, or the bill is past your plan's retention window. |
| 413 | File larger than the per-upload size cap. |
No 425 Too Early — an in-flight bill
returns 200 with rows: null and
line_items: null, not an error.