Utility Bill Reader logo UTILITY BILL READER

Export Schemas

Everything the export produces, in one place. 60 canonical columns across 8 sections, 13 warning codes, and the per-meter / invoice-level scope rules you need to read multi-meter bills correctly. Two download formats — a flat CSV for spreadsheet workflows and an advanced two-sheet Excel workbook that adds per-line charge detail — use identical column names and types, so anything that's true of one is true of the other.

Overview

The export turns each uploaded utility bill PDF into structured data. Every column has a stable canonical name regardless of what the utility happens to call it on the bill — see State-specific identifiers for the alias map. The Field reference below is the single source of truth for both CSV and Excel: identical column names, identical types, identical nullability.

CSV vs Excel — which one to choose

Both downloads come off the same extraction. They differ only in cardinality: the CSV is fixed-width and flat; the Excel workbook adds a second sheet for the variable-cardinality per-line detail (TOU bands, tier rows, supplier splits, individual taxes) that doesn't fit in a flat table.

Base CSV Advanced Excel (.xlsx)
Layout One flat table. Two sheets — Bills and Line items — joined by FK columns.
Granularity One row per metered service. Sheet 1 matches the CSV (one row per metered service); Sheet 2 adds one row per charge line.
Columns 60 canonical fields. Same 60 fields on Sheet 1 (plus two FK columns), 18 columns on Sheet 2.
Best for Bulk imports into Xero / QuickBooks / accounting tools, totals-only reporting, simple spreadsheets. Reconciling charge breakdowns, TOU / tier analysis, splitting by supplier on CCA bills, auditing why a meter cost what it cost.
Opens in Excel, Google Sheets, Numbers, any text editor. Excel, Google Sheets, Numbers, LibreOffice (binary .xlsx).

Sheet 1 of the workbook is a strict superset of the CSV — same column order with two FK columns prepended — so you can ignore Sheet 2 and treat the workbook as a richer CSV if you don't need per-line detail. Jump straight to the Excel workbook section for the sheet-by-sheet schema, or read on for the CSV.

  • One row per metered service. A service produces a row only when the bill prints a volume reading for it (a physical meter, an NMI, a recorded usage). Flat-fee services that appear as line items without a meter reading — sewerage on a flat residential rate, parks levies, late fees — do not get their own row; they roll into bill_new_charges only. A single-meter bill produces one row; a two-meter bill produces two rows that share invoice-level columns.
  • Dates are ISO 8601 (YYYY-MM-DD).
  • Strings are emitted verbatim — utility names, tariff names, addresses are not normalized or abbreviated.
  • Numbers are raw values: no commas, no currency symbols, no embedded units. Units live in their own column (e.g. meter_total_unit, currency).
  • Units include both US-style (kwh, therms, ccf, gallons, kw) and metric (mj, m3, kl) for Australian and European bills.
  • Blank means “not stated on the bill”, not zero. See Sign conventions & nulls.

Row scope & sums

Each column has one of three scopes. The scope is shown as a chip in the field-reference tables below.

per-meter

Varies across the meter rows of a single bill. meter_total_cost, service_account_number, tariff, etc.

row-repeated

Same invoice-level value on every meter row of the same bill. bill_total, customer_name, addresses, dates, currency.

Information about the extraction itself. extraction_status, extraction_warnings, source_filename.

Calculations

The numeric fields satisfy two summing identities you can use to validate your own roll-ups.

1. Invoice-level rollup (all row-repeated):

bill_total  =  previous_balance
            −  payments_received
            +  bill_new_charges
            +  late_fee

The same numbers appear on every meter row of a single bill. When the bill shows no prior balance and no in-period payment, bill_total == bill_new_charges.

2. Per-meter to invoice rollup:

bill_new_charges  ≈  sum of meter_total_cost
                  across every row of the same bill

Approximate because invoice-level fees, discounts, rounding, or flat-fee services without their own meter reading land outside any single meter row. A multi-utility municipal bill that mixes a metered electric service with a flat sewerage charge will leave a gap here by design — that case emits MULTI_SERVICE, not COST_MISMATCH. We reserve COST_MISMATCH for a single-service bill where the meter-row total is more than 10% off the invoice's stated new charges — the signal of an actual extraction problem worth spot-checking.

Collapsing to one row per bill. Group the CSV by bill_customer_account + bill_start_date. Keep the first value for row-repeated columns; sum the per-meter ones (meter_total_volume, meter_total_cost).

Field reference

60 columns, in the order they appear in the CSV.

Extraction metadata

Pipeline-level fields about the extraction itself. One value per CSV row.

Column Type Scope Nullable Description
source_filename
string
no Original filename uploaded by the user. Used to trace each CSV row back to its source PDF.
source_page_count
number
yes Number of pages in the source PDF.
extraction_status
enum
completed partial failed
no Status of the extraction run. 'partial' means critical fields were missing or overall confidence fell below threshold; 'failed' means the bill was unreadable or extraction errored out.
extraction_error
string
yes Error code when extraction_status is 'failed' (e.g. 'unreadable_file', 'extraction_failed'). Null otherwise.
extraction_warnings
string
yes Semicolon-joined warning codes emitted during extraction (e.g. 'MULTI_METER;COST_SHARED_ACROSS_METERS'). See warning_codes section of this dictionary for the canonical set.
extraction_confidence
number
yes Mean confidence (0-1) across all populated fields for this row, rounded to 4 decimal places.
extracted_at
string
yes ISO 8601 datetime when the extraction completed. Falls back to the Extraction model's updated_at if the lambda didn't set it.

Grid & market

The delivery utility and the ISO/RTO market that operates the wires. Same on every meter row of one bill.

Column Type Scope Nullable Description
transmission_region
enum
CAISO ERCOT NYISO ISONE PJM MISO SPP SERC WECC_SW WECC_PC NEM WEM OTHER
row-repeated yes Electric-only. Wholesale market / ISO-RTO that operates the transmission grid serving this meter. US ISO/RTOs: CAISO, ERCOT, NYISO, ISONE, PJM, MISO, SPP, plus the SERC reliability region and WECC_SW / WECC_PC for non-CAISO western US. Australian markets (AEMO-operated): NEM (eastern + southern states — NSW/QLD/VIC/SA/TAS/ACT) and WEM (Western Australia's SWIS). Use OTHER for jurisdictions outside these (e.g. Pilbara NWIS, Northern Territory, non-AU/US bills) and null for gas and water bills.
load_zone
string
row-repeated yes Electric-only. ISO/RTO sub-zone where the meter is located. Format varies by market: CAISO uses SLAP identifiers like 'SLAP_SCEW'; ERCOT uses 'LZ_HOUSTON' / 'LZ_NORTH'; NYISO uses single-letter zones like 'J' or 'K'; PJM uses zone names like 'AEP', 'COMED'; NEM (Australia) uses AEMO region codes 'NSW1' / 'QLD1' / 'VIC1' / 'SA1' / 'TAS1' (the digit-1 suffix is part of the canonical code, never strip it); WEM (Western Australia) is a single region so populate 'SWIS' when applicable. Gas and water bills leave this null.
utility_name
string
row-repeated no Name of the regulated utility / delivery company that owns the wires to this meter, as printed on the bill. Never normalized or abbreviated by the extractor.
utility_phone
string
row-repeated yes Customer-service phone number for the utility, as printed on the bill.
utility_vendor_id
string
row-repeated yes Utility's internal vendor/supplier code if printed on the bill. Rare.
utility_tax_id
string
row-repeated yes Federal tax ID (FEIN) of the utility if printed on the bill. Mostly seen on commercial bills.
commodity_supplier
string
row-repeated yes Competitive supplier the customer buys their commodity from, when distinct from the delivery utility. Electric markets call this the Load-Serving Entity (LSE), Competitive Retail Electric Service (CRES) provider, or Community Choice Aggregator (CCA). Gas markets call this a marketer or natural-gas supplier. Null on bundled / regulated-monopoly bills where the utility is both supplier and deliverer.
service_voltage
string
row-repeated yes Electric-only. Service voltage level / category as printed on the bill. May be a tier name ('Primary', 'Secondary', 'Transmission'), a phase descriptor ('Secondary Three Phase'), or a kV range ('>= 600 V', '< 35 kV'). Capture verbatim. Gas/water bills leave this null.

Identification

The customer plus the specific service point being billed. Varies per meter.

Column Type Scope Nullable Description
service_account_number
string
per-meter yes Canonical site/service identifier used by the utility to refer to the specific service point. Works for electric, gas, water, and other utility commodities. Goes by different names across regions and commodities: CAISO calls it Service Agreement ID / Service Account Number (SAN); ERCOT uses ESIID; ISO-NE uses Electric Account Number; NYISO uses TO Account Number; AES Ohio calls it Choice Service ID. All variants are the same concept under different labels — they identify the service/site at the utility's grid level.
utility_meter_serial_number
string
per-meter yes Serial number of the physical meter device, as engraved on the meter and printed on the bill. One row is emitted per meter on multi-meter bills, each with its own utility_meter_serial_number.
point_of_delivery_id
string
per-meter yes Premise-level identifier that stays stable across customer changes (e.g. SCE's 17-digit POD ID; an apartment building's identifier doesn't change when a new tenant moves in). Different from service_account_number, which can change when the customer's account changes. Populated only when the bill prints an explicit Point-of-Delivery / Premise / POD identifier distinct from the service-account number.
customer_name
string
row-repeated yes Customer / account-holder name as printed on the bill (the entity that pays for service at this account).

Location

Service address (where the meter is) and billing address (where the bill is sent). Row-repeated.

Column Type Scope Nullable Description
service_street1
string
row-repeated yes Street number and street name of the service address (where the commodity is delivered).
service_street2
string
row-repeated yes Apartment, suite, unit, or floor on the service address (when present).
service_city
string
row-repeated yes City of the service address.
service_state
string
row-repeated yes State / province of the service address. 2-letter postal abbreviation for US/Canadian bills; full name otherwise.
service_zip
string
row-repeated yes Postal code of the service address as printed on the bill (US 5-digit, ZIP+4, or international format).
service_country
string
row-repeated yes ISO country name for the service address. Inferred from postal-code format / utility if not explicitly stated.
billing_street1
string
row-repeated yes Street number and name of the remittance / mailing address (where bills are sent).
billing_street2
string
row-repeated yes Apartment, suite, unit, or floor on the remittance address.
billing_city
string
row-repeated yes City of the remittance address.
billing_state
string
row-repeated yes State / province of the remittance address.
billing_zip
string
row-repeated yes Postal code of the remittance address.
billing_country
string
row-repeated yes ISO country name for the remittance address.

Billing & service agreement

Customer-level account number plus any per-meter service agreement.

Column Type Scope Nullable Description
bill_customer_account
string
row-repeated no Customer-level account number printed on the bill, used for remittance. Distinct from service_account_number, which identifies the grid-side service point. Some utilities (e.g. SCE residential) use similar formats for both; others (e.g. AES Ohio) use very different numbers.
bill_service_agreement_number
string
per-meter yes Service agreement / supplier contract identifier on the bill side. Distinct from service_account_number (grid-side) and bill_customer_account (customer-level). For SCE this often equals service_account_number; for deregulated supplier bills (AES Ohio + Dynegy, CCA territories), it's a separate identifier issued by the supplier.
bill_service_agreement_start_date
date
per-meter yes Effective date of the supplier-side service agreement / contract, when printed on the bill. ISO 8601 YYYY-MM-DD.
bill_service_agreement_end_date
date
per-meter yes Expiration date of the supplier-side service agreement / contract, when printed on the bill. ISO 8601 YYYY-MM-DD.

Rate & service class

The tariff that prices the meter, plus the derived service-class classification.

Column Type Scope Nullable Description
tariff
string
per-meter yes Rate / tariff / schedule name as printed on the bill. Never normalized.
service_class
enum
res-electric comm-electric ind-electric agr-electric res-gas comm-gas ind-gas agr-gas res-water comm-water ind-water res-sewer comm-sewer res-steam comm-steam unknown
per-meter no Customer classification + commodity. Derived from the tariff name and customer profile on the bill.
service_type
enum
Electric Gas Water Sewer Steam Telecom
per-meter yes Single-word service category from the bill: 'Electric', 'Gas', 'Water', 'Sewer', 'Steam', or 'Telecom'.
is_tou_rate
bool
per-meter yes Electric-only. True if the bill shows any time-of-use rate windows (on-peak / off-peak / mid-peak / shoulder); false otherwise. The detailed per-band breakdown lives in meters[i].line_items[] (kind=consumption, with name carrying the verbatim band label) rather than in flat columns. Gas bills rarely use TOU rates; leave null when not applicable.
baseline_allowance
number
per-meter yes Electric-only. The kWh threshold that determines tier crossover on a tiered rate (e.g. SCE 'Your winter baseline allowance: 215.0 kWh', PG&E equivalent). Lets a customer reason about 'am I close to crossing into Tier 2?' Null on flat-rate bills (single-rate AU retailers, simple commercial flat tariffs). On a quarterly or cross-season bill that prints two allowances (winter and summer portions), leave null and emit MULTI_SEASON_BILL — the per-segment values surface on the corresponding tier line items via start_date / end_date.
baseline_season
enum
winter summer
per-meter yes Electric-only. Which seasonal baseline allowance applies for this bill. Implicit on SCE ('Your winter baseline allowance'); explicit on some utilities. Null on bills with no seasonal baseline. Same MULTI_SEASON_BILL caveat as baseline_allowance — null when the bill spans both seasons.
tax_in_total
number
per-meter yes Region-neutral. The bill's stated total tax embedded in meter_total_cost — populated when the bill prints a single headline tax figure ('GST in total $59.46', 'Total includes GST component of $10.77', 'Includes Sales Tax $X', UK 'VAT total'). Null when the bill itemises tax as discrete line items instead (US bills like SCE itemise State tax as a line_item with kind=tax — the line item is the source of truth). Never both: if line items are tax-inclusive and the bill also prints a headline figure, capture the headline here; if line items are tax-exclusive (with a separate kind=tax line), leave this null.

Dates

Statement, billing-period, and due dates. Row-repeated.

Column Type Scope Nullable Description
bill_statement_date
date
row-repeated yes Date the bill was issued by the utility. ISO 8601 YYYY-MM-DD.
bill_start_date
date
row-repeated yes First day of the billing period covered by this row. ISO 8601 YYYY-MM-DD.
bill_end_date
date
row-repeated yes Last day of the billing period covered by this row. ISO 8601 YYYY-MM-DD.
bill_due_date
date
row-repeated yes Payment due date for the bill. ISO 8601 YYYY-MM-DD.
bill_frequency
enum
monthly bi-monthly quarterly annual
row-repeated yes How often the bill is issued. Taken from the bill if explicitly stated; otherwise inferred from (bill_end_date - bill_start_date): ~30 days = monthly, ~60 = bi-monthly, ~90 = quarterly.
days_in_period
number
row-repeated yes Number of days the billing period covers. Use the value the bill prints directly when it states one (e.g. 'Service Period: 30 days', 'Number of Days: 31'); otherwise compute it from (bill_end_date - bill_start_date). Row-repeated: every meter row on the same invoice shares this value. Null when neither a stated days value nor both period dates are available. When the stated value disagrees with the computed value by more than one day, prefer the stated value and emit DATE_MISMATCH.

Usage, demand & cost

Per-meter consumption and cost, plus the invoice-level adjustment fields that satisfy the summing identities.

Column Type Scope Nullable Description
meter_total_volume
number
per-meter yes Usage attributed to this single meter for the billing period. Per-meter scope: each meter on a multi-meter invoice gets its own value. Raw number with no units or commas. The unit is in meter_total_unit.
meter_total_unit
enum
kwh therms ccf gallons kw mj m3 kl
per-meter yes Unit of meter_total_volume for this single meter. US-style units: kwh (electric), therms (gas energy), ccf (gas/water volume), gallons (water), kw (demand). Metric/AU-style units: mj (gas or electric energy in megajoules), m3 (gas or water volume in cubic metres), kl (water in kilolitres). Lowercase canonical form regardless of how the bill prints it (e.g. 'MJ' on the bill -> 'mj').
read_type
enum
actual estimated customer_read
per-meter yes How the meter reading at the end of the billing period was obtained: 'actual' (utility read the physical meter or AMI sent a reading), 'estimated' (utility estimated usage because no read was taken — common for skipped months, storm interruptions, or inaccessible meters), or 'customer_read' (customer self-read and reported the value). Lowercase canonical form regardless of how the bill prints it (e.g. 'Estimated Reading' / 'EST' -> 'estimated'). Null when the bill doesn't state how the read was obtained.
meter_total_cost
number
per-meter yes Cost attributed to this single meter for the billing period. Per-meter scope: each meter on a multi-meter invoice gets its own value. Sum across all meter rows on the same invoice equals bill_new_charges (modulo invoice-level fees, discounts, or rounding). Negative values are valid - credits and solar feed-in show as negative. For the invoice's headline 'amount due' figure see bill_total. Raw number, no currency symbols; currency lives in the currency field. When a multi-meter bill reports only one combined cost with no per-meter breakdown, that single value is repeated on every row and the COST_SHARED_ACROSS_METERS warning is emitted.
bill_total
number
row-repeated yes Invoice-level headline amount the customer is asked to pay (the 'amount due' / 'pay this' figure printed at the top of the bill). Row-repeated: every meter row on the same invoice shares this value. Relationship: bill_total = previous_balance - payments_received + bill_new_charges + late_fee. When the bill shows no previous balance and no in-period payment, bill_total equals bill_new_charges. Positive when something is owed; negative when the account is in credit.
bill_new_charges
number
row-repeated yes Total charges for the current billing period, before previous-balance or payment adjustments. The sum of meter_total_cost across every meter row on the same invoice equals this value (modulo invoice-level fees, discounts, rounding, or flat-fee services that have no meter row of their own). On a single-service bill, a gap over 10% emits COST_MISMATCH; on a multi-service bill the gap is expected and emits MULTI_SERVICE instead. Row-repeated. Null when the bill does not separately show a current-charges subtotal distinct from the headline total.
previous_balance
number
row-repeated yes Outstanding balance carried over from the previous invoice. Positive when the prior bill was unpaid; negative when the customer was in credit. Null on first bills or when the bill does not show a carry-forward line (do not default to 0 - null preserves the distinction between 'not stated' and 'explicitly zero'). Row-repeated.
payments_received
number
row-repeated yes Sum of payments received from the customer since the previous invoice. Always reported as a positive magnitude; the summing relationship subtracts it from the running balance. Null when the bill does not show a payments line (do not default to 0 - null preserves the distinction between 'not stated' and 'explicitly zero'). Row-repeated.
late_fee
number
row-repeated yes Penalty charge applied at the invoice level for late payment of the prior balance. Positive when present; null when no late fee applies (the common case - do not default to 0). Distinct from a meter-level disconnection or reconnection fee, which belongs to its meter's meter_total_cost rather than here. Row-repeated.
max_kw
number
per-meter yes Electric-only. Peak / maximum demand for this meter during the billing period, in the unit declared by demand_unit (typically kW; some commercial bills report kVA). Null when the bill doesn't report demand. Gas analog (max therms/hour or peak Dth) and water analog (max GPM) are not currently captured; would be separate fields if needed.
demand_unit
enum
kw kva
per-meter yes Unit of max_kw for this meter: 'kw' (real power demand) or 'kva' (apparent power demand, common on industrial / large-commercial bills where the utility prices power-factor as well as real power). Electric-only. Lowercase canonical form regardless of how the bill prints it (e.g. 'kVA' -> 'kva'). Null when max_kw is null. When max_kw is populated but the bill prints only 'Demand' without further qualification, default to 'kw'.
currency
string
row-repeated no ISO 4217 currency code for monetary fields. Inferred from utility/address when not explicitly stated; the CURRENCY_ASSUMED warning fires in that case.

State-specific identifier notes

Different jurisdictions print the same canonical concept under different labels. We fold them into a single column so downstream tooling doesn't need to special-case them.

Folded into service_account_number

Label on the bill Region Why we fold
ESIID ERCOT, Texas ERCOT's 17-digit Electric Service Identifier is the same concept as a SAN under a different naming convention. We fold it into service_account_number.
SP ID CAISO, PG&E commercial PG&E commercial Service Point ID is the same site identifier as Service Agreement ID under a different label.
Choice Service ID PJM (AES Ohio), Ohio Electric Choice Per AES Ohio bill glossary (background/NNNNNNNNNNNN.pdf, p.5): 'Unique customer identification number to provide your selected Competitive Retail Electric Service (CRES) provider when enrolling.' Same canonical concept as a SAN/ESIID.
Service Agreement ID CAISO (PG&E) PG&E's name for the canonical service-point ID. Same concept as SAN.
TO Account Number NYISO NYISO's name for the canonical service-point ID, printed by ConEd / PSEG-LI on their bills.

When point_of_delivery_id is populated

A premise-level identifier that stays stable across customer turnover. Populated only when the bill prints it explicitly (most utilities do not).

Label on the bill Region Notes
POD ID Italy, CAISO (SCE commercial) Premise-level identifier independent of customer (stable across tenant turnover). Kept as its own canonical field rather than folded into service_account_number because SCE's Greenbutton hierarchy explicitly distinguishes the two.

Folded into bill_customer_account

Label on the bill Region Notes
Summary Account commercial multi-site Parent account for customers with multiple sites under one billing relationship. Today the bill_customer_account on each row already identifies the customer; we'd promote this back to its own column only if a real customer needs site-level AND parent-level identifiers on the same row.
LDC Account Number deregulated supplier markets (PJM, ERCOT, NY, OH) On supplier-direct bills, the LDC may print its own account number alongside the supplier's. Whichever party is the issuer of the PDF we're parsing, we use their printed account as bill_customer_account. If both must coexist on one row, revisit this decision.

Sign conventions & nulls

  • Credits and solar feed-in: meter_total_cost is negative. A meter that exported more energy than it consumed will roll in as a negative number — that's intentional, the summing identity above relies on it.
  • payments_received is always a positive magnitude. The identity above subtracts it; do not double-negate.
  • previous_balance is positive when the prior bill was unpaid, negative when the customer was in credit going in.
  • late_fee is positive when present; blank when there was no late fee on this bill.

Null versus zero. A blank cell means the bill did not state that line. Don't treat it as zero — that loses the distinction between “no late fee was charged on this bill” (very common, blank) and “the bill explicitly shows a $0.00 late fee” (rare, value 0). This applies to previous_balance, payments_received, and late_fee in particular.

Warning codes

Codes appear semicolon-joined in the extraction_warnings column. Most are informational, not failures.

Code Meaning What to do
LOW_QUALITY_SCAN Source PDF was a poor-quality scan; extraction confidence may be reduced. Re-upload a higher-resolution scan or the original PDF from the utility portal. OCR confidence is reduced on poor scans.
MULTI_METER Bill lists more than one meter (each becomes its own CSV row). Expected — each meter becomes its own row. Group by bill_customer_account + bill_start_date to collapse to one row per bill.
MULTI_SERVICE Bill mixes services (e.g. electric + gas, or electric + flat-fee sewerage) on one statement. The bill mixes services. Each metered service gets its own row; flat-fee services without a meter reading land in bill_new_charges only, so sum(meter_total_cost) may legitimately fall short of bill_new_charges. Filter by service_type if you want a single commodity.
COST_SHARED_ACROSS_METERS Bill reported a single combined cost; the same value is repeated on every meter row. The bill printed only one combined cost. Each row carries that combined value; do not sum meter_total_cost — use bill_new_charges for the invoice total.
DATE_MISMATCH Period dates didn't agree across sections of the bill. Two sections of the bill showed different period dates. Spot-check bill_start_date / bill_end_date before relying on them.
COST_MISMATCH Line-item costs didn't sum to the reported total on a single-service bill. On a single-service bill, per-meter costs do not sum to the invoice's stated new-charges total within 10%. Cross-check before relying on either figure. A gap on a multi-service bill is reported as MULTI_SERVICE instead, since flat-fee services legitimately fall outside meter rows.
UTILITY_UNKNOWN Couldn't confidently identify the utility name. Couldn't confidently identify the utility name. utility_name may be blank or wrong; correct it manually if you depend on it.
CURRENCY_ASSUMED Currency wasn't stated; inferred from address/utility. Currency code was inferred from the utility or service address because the bill didn't print it. Verify if you mix currencies.
PARTIAL_EXTRACTION One or more critical fields were missing or low-confidence. One or more critical fields were missing or low-confidence. Spot-check the row before trusting the totals.
LINE_ITEM_TOTAL_MISMATCH Sum of line_items[].cost diverges from meter_total_cost beyond the floored tolerance max(0.01 * abs(meter_total_cost), $0.50). Spot-check the bill's printed per-meter total against the line items in the XLSX 'Line items' sheet — likely a missing line, a sign error, or a decimal mistake. The absolute $0.50 floor exists because pure-percentage tolerances misfire on net-credit / near-zero bills (high-solar months can net to negative dollars from offsetting tens-of-dollar components).
LINE_ITEM_NAME_UNFAMILIAR One or more line items couldn't be classified into a kind enum and landed at kind=other. The line item's verbatim name is preserved in the XLSX 'Line items' sheet — the analyst can still see what it is, classification just didn't recognise the pattern. A rising kind_other_rate is an ops signal that new patterns should be added to common/kind_patterns.yaml.
TOU_BANDS_NOT_PRINTED Bill is on a Time-of-Use rate (is_tou_rate=true) but doesn't print per-band figures — line_items[] only contains the aggregated General Usage line. The bill's customer is on TOU but their statement doesn't itemise peak/off-peak/shoulder. Only the tariff name and aggregated usage are available; the per-band split isn't recoverable from the PDF alone. Origin's Go Variable bills are the canonical example.
MULTI_SEASON_BILL A quarterly or cross-season bill prints two baseline allowances (one per season-portion of the period). base.baseline_allowance and base.baseline_season are left null. The bill spans a seasonal baseline boundary (e.g. an AU summer→winter transition in May/Sep, a US winter→summer transition in May/Oct). The per-segment baselines surface on the relevant tier rows in the XLSX 'Line items' sheet via start_date / end_date — use those for analysis instead of the null base headline.

Sample CSV

Three rows of synthetic data that exercise every concept above: a single-meter US residential bill and a two-meter Australian bill where solar feed-in shows as a negative meter_total_cost.

Download utilitybillreader-sample-export.csv

Three rows; one row per meter. Long lines wrap horizontally — download to see all 60 columns.

Excel workbook (advanced)

The .xlsx download is the same extraction laid out as a two-sheet workbook so per-charge detail — TOU bands, tier rows, supplier splits on CCA bills, individual taxes — doesn't get flattened away. Available from the dashboard's Excel button (next to CSV) and from the same per-row download menu the CSV uses.

Two sheets, one foreign key

  • Sheet 1 — Bills. Same 60-column CSV schema, with two FK columns (extraction_id, meter_index) prepended. One row per metered service, same scope rules as the CSV.
  • Sheet 2 — Line items. One row per charge line across every meter on every bill in the workbook, with three FK columns (extraction_id, meter_index, line_index) that link each line back to its meter row on Sheet 1.

Sheet 2 is empty (header row only) on bills where no per-line charge detail was itemized — bill totals still appear on Sheet 1's meter_total_cost. The workbook layout is identical either way, so your tooling doesn't change whether or not a bill carries line detail.

Sheet 1: Bills

Header row layout, left-to-right: the two FK columns, then every column from the Field reference in the same order as the CSV.

Column Type Description
extraction_id uuid Stable identifier for the uploaded bill. Same across every row of a multi-meter bill, and the join key for Sheet 2.
meter_index integer Zero-based ordinal of the meter on the bill. Joins with Sheet 2's meter_index to attach each line item to its meter row.
… then the 60 canonical fields, in CSV order — see the Field reference for each one's type, scope, and description.

Sheet 2: Line items

18 columns. One row per charge line, in the order the line appeared on the bill. Strings come back verbatim (no normalization on name, source_section, or supplier); kind and section are the only normalized fields.

Column Type Description
extraction_id
uuid
Foreign key joining back to Sheet 1's extraction_id column. Same value on every line item of the same bill.
meter_index
integer
Zero-based index of the meter this line came from. Joins with Sheet 1's meter_index to pick out the row this charge contributes to.
line_index
integer
Zero-based position of this line within its meter's line_items[] — preserves the bill's printed order.
kind
enum
consumption export supply credit tier demand flat_fee tax other
Normalized charge type so you can pivot without parsing names.
section
enum
delivery generation tax other
Normalized section the line was filed under.
source_section
string
Verbatim section header from the bill (e.g. "Delivery Charges", "Generation Charges", "Taxes & Fees"). Useful when section's four-bucket enum is too coarse.
supplier
string
Verbatim supplier name when the bill splits charges by supplier (CCA / community choice aggregator bills do this). Null on single-supplier bills.
name
string
Verbatim charge name as printed on the bill — never normalized. The raw text downstream tooling pivots and reconciles on.
volume
number
Quantity charged for. Null for flat fees and most taxes.
unit
string
Unit volume is denominated in (kwh, therms, ccf, mj, m3, kl, day, etc.). Null when no volume.
rate
number
Per-unit price applied. Negative for credits and feed-in. Null when the bill prints only a total.
rate_unit
string
How the rate is denominated — e.g. "$/kwh", "$/day", "$/mj". Null when rate is null.
cost
number
Dollar amount the line contributes. Negative for credits, rebates, and net feed-in. Sums up to meter_total_cost within the floored tolerance — see COST_MISMATCH.
start_date
date
ISO 8601 service-window start, when the bill prints per-line dates (rare). Usually null.
end_date
date
ISO 8601 service-window end. Usually null — defer to meter's bill_start_date / bill_end_date.
time_window
string
TOU window the line covers, verbatim (e.g. "4pm–9pm Mon–Fri"). Populated only on time-of-use rates that print per-band figures.
plan_label
string
Verbatim plan or product name when the line was filed under one (e.g. "ZEROHERO"). Mostly null on US bills.
confidence
number
Per-line extractor confidence, 0–1.

Cost signs follow the CSV convention — credits, rebates, and feed-in lines come back as negative cost. See Sign conventions & nulls.

Joining Sheet 2 back to Sheet 1

Both sheets share the (extraction_id, meter_index) pair. Group Sheet 2 by that pair to roll line items back up to a meter; group by extraction_id alone to roll back up to a bill. The pivot below sums Sheet 2's cost by meter and reconciles it against Sheet 1's meter_total_cost:

// Excel formula on Sheet 1, alongside meter_total_cost:
SUMIFS('Line items'!M:M,
       'Line items'!A:A, A2,    // extraction_id
       'Line items'!B:B, B2)    // meter_index

Sheet 2's cost column is column M; the FK columns are A and B. The sum should equal Sheet 1's meter_total_cost within the floored tolerance described under COST_MISMATCH in Warning codes.

Other common pivots:

  • Spend by charge type: pivot cost by kind to split consumption vs supply vs taxes vs credits across a whole portfolio of bills.
  • Spend by supplier: pivot by supplier on CCA / community-aggregation bills (PG&E + MCE, SDG&E + Clean Energy Alliance, etc.) to see the delivery / generation split per supplier.
  • TOU breakdown: filter kind = consumption and pivot volume + cost by time_window to recover peak / off-peak / shoulder usage that the flat CSV folds into a single meter_total_volume.

Sample template

Same three bills as the Sample CSV above — the single-meter US residential bill and the two-meter Australian bill with solar feed-in — laid out in the workbook format with a populated Sheet 2 so you can see exactly how line items shake out on each.

Download utilitybillreader-sample-export.xlsx

Sheet 1: 3 rows + 2 FK columns + the 60 CSV columns. Sheet 2: 6 line items spread across the three meters.

Email Inbox Forwarding

Forward utility bills as PDF attachments to a unique address on @ bills.utilitybillreader.com and they'll appear on your dashboard the same way uploaded files do. Also recieve a reply email summarising the parsed result and a CSV attached. No browser upload or access steps required.

Workflow

  1. Sign up, then open the Email Tokens entry in your profile menu to copy your forwarding address. The default token is created automatically.
  2. Forward (or set up auto-forward in your inbox) any utility bill emails with PDF attachments to that address. Multiple PDFs in one email are processed as a batch.
  3. Each PDF lands on the dashboard as a row with the email-source badge and a "via email from <sender>" subline — first as queued, then completed within ~30s.
  4. Once every PDF in the batch finishes, you receive a single reply email with a per-file summary and a CSV containing the parsed line items, threaded with the original forward so it lands in the same conversation in your inbox.

Forwarding addresses follow the format ubr_eml_<20-char>@bills.utilitybillreader.com. The ubr_eml_ prefix lets you spot the addresses at a glance in your inbox and contact lists.

Multiple addresses per team are supported — create labelled tokens (e.g. AGL gas bills, Retail shops) to route different upstream inboxes without losing context.

Allowed senders — senders must be a team member

We only accept forwards from email addresses that are already on the team that owns the token. Mail from anyone else is silently dropped, and the token's owner receives a one-line notification: "Someone at <sender> tried to forward to your ingest address and we rejected it."

To let a colleague forward to your team's address, add them to your team using their registered email address. Any team member's address then becomes a valid sender for every token on the team.

SPF and DKIM must also pass on the sender's domain — we drop forgeries silently. Forwarding from your own Gmail/Outlook generally passes both; forwarding through an old POP3 server that re-stamps headers may not. If a legitimate forward keeps failing, drop us a note with the sender domain.

Limitations

  • PDF attachments only. Any other attachment type is ignored. Inline images, bills embedded as the email body itself, or links to a utility's web portal are not parsed today.
  • 40 MB total per inbound message (an AWS SES cap we can't override). Most utility bills are well under 1 MB. Oversized forwards bounce back from SES — the failure isn't visible in your dashboard, so the forwarder will see a bounce notice from their own mail server.
  • Quota applies the same as web uploads. Pages consumed by email-forwarded bills draw from your team's monthly page allowance. A forward that would exceed the remaining quota is dropped and we log it but don't currently auto-reply — check the dashboard quota meter if a forward seems to have vanished.
  • Forwarder identity is the outer From header. If you forward a thread, we attribute it to the immediate forwarder (you), not the utility that sent the original. The utility name shows up in the parsed data, not in the source attribution.
  • One reply per inbound, not per attachment. Forward five PDFs in one email and you get one reply once all five have finished extracting (success or failure). Each PDF still becomes visible on the dashboard the moment its individual extraction completes — you don't need to wait for the reply to see results.

Reply email

Sent from noreply@utilitybillreader.com to the original forwarder, threaded with the inbound message (via In-Reply-To and References headers) so most mail clients attach it to the same conversation. The body is a per-file table with status (parsed / could not parse / duplicate — review), the utility we detected, and a button back to the dashboard.

A CSV named parsed-bills-YYYY-MM-DD.csv is attached when at least one PDF parsed successfully. Its column layout matches the Field reference below — same canonical names as the dashboard's CSV download, the Excel workbook's Bills sheet, and the API's JSON response. (The richer Excel workbook isn't auto-attached to email replies; grab it from the dashboard when you need per-line detail.)

Replies are suppressed when the forwarder's address looks like an automation account (postmaster@, noreply@, mailer-daemon@, etc.) to avoid reply loops.

Rotate or revoke a token

Forwarding addresses are stable but rotatable. If one ends up in the wrong hands — or you'd just rather not keep using a particular upstream inbox — use the Email Ingest Tokens page to:

  • Rotate — issues a new active address with the same label and immediately disables the old one. Forwards to the old address are dropped from that moment on.
  • Revoke — disables an address without replacing it. Mail to the revoked address is dropped before it reaches your account.

Partner API

A small REST surface for partners: upload a bill, fetch the parsed JSON, list every bill on your account, check your remaining quota. JSON responses use the same canonical column names as the Field reference below — each result carries a rows array (identical to the CSV download and Sheet 1 of the Excel workbook) and a line_items array (identical to Sheet 2), so the API mirrors the full workbook, not just the flat CSV. The API returns JSON only; use the dashboard's CSV / Excel buttons (or these response arrays) to drive your own export.

Authentication

Mint a token from your account's API Tokens page (Profile menu → API Tokens). Send it on every request as a Bearer credential:

Authorization: Bearer <your-token>

Tokens are scoped to a single user account; treat them like passwords. Revoke a token anytime from the same page.

Quota

API uploads draw from the same monthly page quota as the web app — one PDF page counts as one page used. Use GET /api/v1/user to check your remaining pages, or visit the pricing page to compare plans. Quota is decremented when an extraction finishes, not at upload time, but a queued bill counts against your remaining balance to prevent racing.

Bills are visible through the API for as long as your plan's retention window lasts — 90 days on the paid tiers and unlimited on Enterprise. Past that window the source PDF is purged and the bill drops out of both the dashboard's history table and every API response (404 on /bill/{id}, omitted from /bill).

Endpoints

All endpoints live under /api/v1/, require the Authorization header, and speak JSON. The collection vs. single-item distinction is the presence of an {id} path segment — there is no /download suffix and no singular-vs-plural convention to keep straight.

POST /api/v1/bill

Upload one PDF. Returns 202 Accepted with the new extraction id. Use the id with GET /api/v1/bill/{id} to track status and fetch the parsed result.

Requestmultipart/form-data with a single file field:

curl -X POST https://utilitybillreader.com/api/v1/bill \
  -H "Authorization: Bearer $TOKEN" \
  -F file=@/path/to/bill.pdf

Response 202:

{
  "id": "0193e3a0-7c3b-7000-8f1a-1234567890ab",
  "status": "queued",
  "source_filename": "bill.pdf",
  "source_page_count": 3,
  "created_at": "2026-05-19T12:34:56+00:00"
}

GET /api/v1/bill/{id}

One endpoint covers status polling and result retrieval. The response shape is identical regardless of state: rows and line_items are both null while status is queued or processing, and become arrays once the extraction reaches a terminal state — rows with one object per meter, line_items with one object per charge line across every meter. Always 200 OK — check status (or just rows !== null), not the HTTP code.

curl https://utilitybillreader.com/api/v1/bill/$ID \
  -H "Authorization: Bearer $TOKEN"

Response (terminal):

{
  "id": "0193e3a0-7c3b-7000-8f1a-1234567890ab",
  "status": "completed",
  "source_filename": "bill.pdf",
  "source_page_count": 3,
  "extraction_confidence": 0.97,
  "warnings": [],
  "error_code": null,
  "error_message": null,
  "created_at": "2026-05-19T12:34:56+00:00",
  "extracted_at": "2026-05-19T12:35:12+00:00",
  "rows": [
    {
      "utility_name": "Southern California Edison",
      "service_account_number": "2-12-345-6789",
      "tariff": "DOMESTIC",
      "meter_total_volume": 412,
      "meter_total_unit": "kwh",
      "meter_total_cost": 78.42,
      "bill_total": 78.42,
      "currency": "USD",
      "customer_name": "LASTNAME, FIRSTNAME",
      "...": "..."
    }
  ],
  "line_items": [
    {
      "extraction_id": "0193e3a0-7c3b-7000-8f1a-1234567890ab",
      "meter_index": 0,
      "line_index": 0,
      "kind": "consumption",
      "section": "delivery",
      "supplier": null,
      "name": "Peak",
      "volume": 180,
      "unit": "kwh",
      "rate": 0.31,
      "rate_unit": "$/kwh",
      "cost": 55.8,
      "time_window": "peak",
      "...": "..."
    }
  ]
}

Every key inside a rows[] object is a CSV column name — see Field reference for the full list, types, scopes, and nullability rules. Every key inside a line_items[] object is a column from the Excel workbook's Line items sheet; the meter_index / line_index FKs join each line back to its rows[] entry (row N is meter_index = N). line_items is an empty array on bills whose charge lines haven't been broken out yet.

GET /api/v1/bill

Paginated list of every bill on your team, newest first. Each entry is the same envelope as the single-id endpoint (rows + line_items); in-flight bills appear in the same list with rows: null and line_items: null.

Query parameters:

  • limit — 1 to 100, default 50.
  • since — ISO 8601 timestamp; only return bills created on or after this moment.
  • cursor — opaque pagination token returned as next_cursor on the previous page. Echo it back to fetch the next page.
curl "https://utilitybillreader.com/api/v1/bill?limit=50&since=2026-05-01T00:00:00Z" \
  -H "Authorization: Bearer $TOKEN"

Response:

{
  "data": [
    { "id": "...", "status": "completed",  "rows": [ ... ], "line_items": [ ... ] },
    { "id": "...", "status": "processing", "rows": null,     "line_items": null }
  ],
  "next_cursor": "MjAyNi0wNS0xOVQxMjozNDo1NisrMDA6MDAi..."
}

When next_cursor is null, you've reached the end of the list.

GET /api/v1/user

Current user, team, and remaining page quota. Poll this before a bulk upload to confirm you have enough pages.

curl https://utilitybillreader.com/api/v1/user \
  -H "Authorization: Bearer $TOKEN"

Response:

{
  "id": "0193e399-0000-7000-8f1a-1234567890ab",
  "name": "Jane Smith",
  "email": "jane@example.com",
  "team": {
    "id": "0193e398-0000-7000-8f1a-1234567890ab",
    "name": "Jane Smith's Team",
    "plan": "professional"
  },
  "remaining_pages": 1487,
  "pages_limit": 1500,
  "pages_used_this_period": 13,
  "period_resets_at": "2026-06-01T00:00:00+00:00"
}

Response shape

Objects inside rows[] are keyed by the same canonical column names the CSV uses. The Field reference below is the single source of truth: it lists every key, its type, its scope (per-meter vs. row-repeated vs. metadata), and whether it can be null.

Objects inside line_items[] are keyed by the same columns as the Excel workbook's Line items sheet. Each line carries extraction_id, meter_index, and line_index FKs: join on meter_index to attach a charge line to its rows[] entry, or sum cost per meter_index to reconcile against that row's meter_total_cost. line_items is an empty array on bills with no itemized charge lines, so treat it as optional detail on top of the always-present rows[].

Conventions carry over from the CSV: dates are ISO 8601, numbers carry no commas or currency symbols (units live in their own column), blanks mean not stated on the bill rather than zero, and credits/solar feed-in show as a negative meter_total_cost. See Sign conventions & nulls for the full set.

Error codes

Errors are returned as { "error": "<message>" } with the HTTP status set accordingly.

Status When
400Validation failure — missing file, wrong mimetype, malformed since or cursor.
401Missing or invalid Authorization token.
402Upload would exceed your team's remaining pages.
403Bill belongs to a different team.
404No bill exists with that id, or the bill is past your plan's retention window.
413File larger than the per-upload size cap.

No 425 Too Early — an in-flight bill returns 200 with rows: null and line_items: null, not an error.