MRA Dashboard — Methodology

This dashboard compares how four U.S. House members spend the ~$2 million per year office budget every member receives (officially the Members' Representational Allowance, or MRA). Every number on the dashboard is derived from the public House Statement of Disbursements (SOD) — quarterly CSVs the House Chief Administrative Officer publishes within 60 days of each quarter's end.

This document is the audit trail. It exists so that a reporter, opposing campaign, or skeptical voter can re-derive any headline figure on the dashboard from the raw filings, without trusting our pipeline.

1. Data source

Publisher: Office of the Chief Administrative Officer, U.S. House of Representatives
Archive page: https://www.house.gov/the-house-explained/open-government/statement-of-disbursements/archive
Format: Quarterly "Detail" CSVs (one per fiscal quarter)
Coverage on the dashboard: 2016 Q1 through the most recent published quarter
Mirrored copies: every CSV the dashboard uses is also mirrored at https://mra-sod-archive.marenalinforcongress.com/ for stable access (a manifest with direct download URLs is at /data/mra/manifest.json)

The House publishes new quarters approximately 60 days after each quarter ends. Q1 2026, for example, would not appear until ~May 2026.

What's also published elsewhere — and what this dashboard adds

Individual member spending and staff salary figures are not exclusive to this dashboard. LegiStorm.com publishes a free, searchable database of congressional staff salaries drawn from the same House Statement of Disbursements records, browsable by member, state, committee, and admin office. Their analytical features (Salary Rate Analysis, Historical Trends, lobbying / revolving-door data) are paywalled in their Pro tier.

The contribution of this dashboard, beyond what's freely available at LegiStorm:

Comparative analytical layer made public. Percentile rank for each member against the full national distribution of ~1,023 House offices observed in our 2016–2025 data window — the kind of analysis LegiStorm reserves for paying customers.
Vendor classification taxonomy. Each transaction's vendor name is bucketed into postage / mass-mailer / advertising / digital-constituent / design / translation / office-other. We've found no evidence LegiStorm publishes vendor classifications in this form.
Election-cycle visualization. Year-by-year outbound mail and ad spend across all four featured members, with primary-year cycles highlighted.
Documented methodology + mirrored source data. This document spells out the schema, classifier rules, and worked examples for every headline figure. The raw quarterly CSVs are mirrored at https://mra-sod-archive.marenalinforcongress.com/ so any reader can re-derive any number by hand. (The processing code itself currently lives in the campaign's private repo; if there's interest, a separately-licensed public ETL repo could be spun up.)

LegiStorm's underlying data and ours come from the same House filings, so the individual numbers should match. If you find a discrepancy on a per-member figure, please report it — that's the kind of error our methodology is designed to catch.

2. Schema (and a quirk you need to know)

The CSV schema changed in 2023. This matters for any audit.

2023 Q1 onward (current schema)

Columns we use, in order:

Column	What it contains	Example
`ORGANIZATION`	The office name with year prefix	`2025 HON. BRAD SHERMAN`
`SORT SUBTOTAL DESCRIPTION`	The category (8–13 distinct values)	`PERSONNEL COMPENSATION`
`SORT SEQUENCE`	Row type — `DETAIL`, `SUBTOTAL`, or `GRAND TOTAL FOR ORGANIZATION`	`DETAIL`
`TRANSACTION DATE`	When the payment posted	`3-Jan-25`
`VENDOR NAME`	Who got paid	`BULLSEYE MARKETING`
`DESCRIPTION`	Free-text purpose / job title	`DISTRICT DIRECTOR`
`BUDGET OBJECT CODE`	4-digit federal Budget Object Code	`1101` (personnel salary), `2352` (franked-mail postage), etc.
`AMOUNT`	Dollar amount (negative for refunds)	`41066.67`

2016 Q1 through 2022 Q4 (legacy schema)

The legacy CSVs lack the BUDGET OBJECT CODE column entirely. They have SORT SUBTOTAL DESCRIPTION (the broad category) but no fine-grained classification.

This means: for 2016–2022 we can only see, e.g., the total PRINTING AND REPRODUCTION for the office, not the split between mass-mailers and office printing. Our vendor classifier (described below) bridges the gap by classifying each transaction by vendor name — that works across all years.

Critical filter: only sum DETAIL rows

The CSVs interleave DETAIL rows (real transactions) with SUBTOTAL rows (per-category running totals) and a final GRAND TOTAL FOR ORGANIZATION row. Naively summing the AMOUNT column doubles every total because the subtotals re-sum the detail.

Always filter to SORT SEQUENCE == 'DETAIL' before any aggregation. Our ETL does this on every quarter.

Categorical strings have irregular spacing

The SORT SUBTOTAL DESCRIPTION field has a few entries with non-standard whitespace, e.g. RENT COMMUNICATION UTILITIES (two spaces, no commas). Our normalizer collapses any run of whitespace to a single space and uppercases before matching.

3. The four members

Every aggregation filters by ORGANIZATION matching one of these patterns (case-insensitive regex):

Member	Pattern	District (current)	First quarter on file
Brad Sherman	`HON\.\sBRAD\sSHERMAN`	CA-32 (was CA-30 pre-2024 redistricting)	2016 Q1
Alexandria Ocasio-Cortez	`HON\.\sALEXANDRIA\sOCASIO[\s\-]*CORTEZ`	NY-14	2019 Q1
Ilhan Omar	`HON\.\sILHAN\sOMAR`	MN-5	2019 Q1
Ayanna Pressley	`HON\.\sAYANNA\sPRESSLEY`	MA-7	2019 Q1

The ORGANIZATION field includes a year prefix (e.g., 2025 HON. BRAD SHERMAN); the regex ignores that. Sherman has been in office since 1997 across CA-24 → CA-27 → CA-30 → CA-32, all under the same name pattern.

4. Headline categories (from `SORT SUBTOTAL DESCRIPTION`)

These are the broad office-budget categories every quarter. The values shown are exact uppercase strings as they appear in the CSV after whitespace normalization.

Display	Raw string
Personnel Compensation	`PERSONNEL COMPENSATION`
Personnel Benefits	`PERSONNEL BENEFITS`
Travel	`TRAVEL`
Rent, Communications, Utilities	`RENT COMMUNICATION UTILITIES`
Other Services	`OTHER SERVICES`
Printing & Reproduction	`PRINTING AND REPRODUCTION`
Supplies & Materials	`SUPPLIES AND MATERIALS`
Equipment	`EQUIPMENT`
Franked Mail	`FRANKED MAIL`
Transportation of Things	`TRANSPORTATION OF THINGS`
Benefits to Former Personnel	`BENEFITS TO FORMER PERSONNEL`
Insurance Claims & Indemnities	`INSURANCE CLAIMS & INDEMNITIES`
Other	`OTHER`

5. Vendor classifier (the key audit-relevant piece)

The headline figure on the Mailings & Ads page — "outbound mail and ads spent" — is not a simple sum of one category. It's a vendor-by-vendor classification because:

The legacy schema (2016–2022) doesn't have a Budget Object Code column to distinguish, e.g., mass-mailer printing from office printing.
Within OTHER SERVICES, some vendors are digital ad firms (advertising), some are constituent CRMs (digital infrastructure), some are translators (case work), some are office cleaning.

So we classify each transaction's VENDOR NAME (uppercased) into one of eight buckets via regex matching. Summary:

Bucket	Examples	Counts toward headline?
`postage`	`UNITED STATES POSTAL SERVICE`	Yes (headline)
`mass_mailer`	`BULLSEYE MARKETING`, `UNION GRAPHICS`, `PUBLIC PRINTER`, `HONSA BINDER`, `MINUTEMAN PRESS`, …	Yes (headline)
`advertising`	`SABIO`, `CONEXION POLITICAL`, `VALLEY NEWS GROUP`, `IMPREMEDIA`, `RELATIVE INTEL`, `THE NEW MEDIA FIRM`, `FACEBK`, `GOOGLE ADS`, …	Yes (headline)
`digital_constituent`	`HOUSECALL`, `LEIDOS DIGITAL`, `ICONSTITUENT`, `FIRESIDE`, `INDIGOVERN`, `MAILCHIMP`, `GSUITE`, …	No (adjacent infrastructure)
`design_creative`	`PFEIFFER DESIGN`, `THREE METAL MONKEYS`, `45PRESS`, `ADOBE`, `CANVA`, …	No
`translation`	`ACCURATE WORD`, `INTERPRETING`, `LANGUAGE`, `TRANSLATIONS`, `TRINT`, …	No
`office_other`	`INNOVATION OFFICE`, `BENJAMIN OFFICE SUPPLY`, `AMERICLEAN`, `HACKETT SECURITY`, `STAPLES`, …	No
`unclassified`	Any vendor not matching the rules above	Surfaced as error bound on dashboard cards

Headline outbound spend = sum of postage + mass_mailer + advertising buckets, restricted to detail rows in the comms-adjacent SUBTOTAL categories (FRANKED MAIL, PRINTING AND REPRODUCTION, OTHER SERVICES).

The "± unclassified" disclosure on each member card is the upper bound: if we had wrongly classified every "unclassified" vendor as actually being an ad firm, the headline would rise by that amount. In practice, the largest unclassified entries are individual contractors (e.g., TATARIAN ALISA S. at ~$114k for Pressley) which are almost certainly not ad firms.

The full regex list is the source of truth; treat the table above as a summary.

Long-tail vendor classification (LLM-assisted)

For vendors not caught by the hand-written regex rules, a separate step asks a language model to classify the vendor name into one of the eight buckets above plus a confidence score. Only assignments at or above 0.7 confidence are trusted; lower-confidence ones are dropped back into unclassified so they show up in error bars rather than silently misclassified.

The two layers compose:

Hand-written regex rule wins if any matches (highest precision)
LLM-derived classification wins if regex misses AND confidence ≥ 0.7
Otherwise: unclassified, surfaced as the upper bound on uncertainty in per-member error bars

Regex rules are intentionally favored because they're auditable line-by-line in this document. Each LLM-assigned classification includes a short reasoning string for spot-check review.

6. Staff classification

For the Staff & Salaries page, we filter to SORT SUBTOTAL DESCRIPTION == 'PERSONNEL COMPENSATION' then bucket each staffer by job title (DESCRIPTION column).

Functional categories use keyword matching against the lowercased title:

Functional category	Keywords
`CONSTITUENT_SERVICES`	caseworker, casework, constituent, district director, district rep, field rep, field representative, community liaison, immigration, veterans, outreach, district office
`LEGISLATIVE`	legislative, policy advisor, policy deputy, sr policy, senior policy, policy, counsel, legislative aide, legislative correspondent
`COMMUNICATIONS`	communications, press, digital director, social media, comm director, speechwriter
`OPERATIONS`	chief of staff, deputy chief, office manager, staff assistant, scheduler, executive assistant, systems, intern coord, operations
`LEADERSHIP`	senior advisor, sr advisor, committee, chair
`OTHER`	(fall-through — untyped roles)

Note on interns and short-tenure roles: the role classifier keeps interns and shared/temporary/part-time staff in their assigned functional category. A row titled DISTRICT OFFICE PAID INTERN is classified as CONSTITUENT_SERVICES because they are doing constituent-services work, even if it's part-time. The per-function staff counts reflect everyone doing that kind of work, including support roles.

The salary distribution and floor calculations apply their own gates (full-year + exclude shared/temp/intern/PT — see §6.1) so per-function median pay is computed only over career staff, not skewed by intern stipends.

"Full-time, full-year" salary floor calculation

The salary-floor comparison on the Staff page restricts to staff who:

Were paid in all 4 quarters of the year (quartersPaid == 4)
Are not in any of these excluded title patterns: SHARED EMPLOYEE, TEMPORARY EMPLOYEE, PAID INTERN, PART-TIME EMPLOYEE

Excluding these prevents the floor comparison from being skewed by part-time or fractional roles. AOC's $52,500 part-time employee — at her stated salary floor — is documented elsewhere in the dashboard but excluded from the apples-to-apples floor.

Annual salary for each staffer = sum of quarterly compensation amounts in the year.

7. House-wide cross-cutting view

The House-Wide View page (/research/mra-dashboard/cross-cutting) renders one scatter dot per House office observed in at least 4 quarters of 2016–2025. Two axes encode the trade-off between broadcast outreach and direct constituent service:

X-axis: outbound mail + ads (postage + mass_mailer + advertising from §5 — same definition as the headline "outbound spend" elsewhere in the dashboard). The user toggles between three time windows:
- Cumulative 2016–2025 (default)
- Per-year average (cumulative ÷ years observed — normalizes for tenure)
- Peak year (each member's single biggest mail-spending year)
Y-axis: median wage of full-time career constituent-services staff at the most recent full year on file, using the same exclusion rules as §6.1 (excludes interns, temps, shared/PT).
Color: party from the public @unitedstates/congress-legislators dataset (CC0). Red = R, blue = D, gray = I/L.
Right-only X error bars: same proportional-allocation logic as elsewhere — assumes unclassified spend could turn out to be mail in the same proportion as the classified portion. The Y-axis has no equivalent because staff salaries are keyed by job title and don't depend on vendor classification.

7.1 Edge-case office filter

The national index excludes 145 of ~980 distinct office entries that aren't real serving members:

105 with only 1 quarter observed (vacancies, end-of-cycle transitions)
22 / 13 with 2 / 3 quarters (special-election partial terms)
8 caucus / committee / conference offices

The full list is written to public/data/mra/members/_excluded.json for transparency. Excluded offices stay out of _index.json, distributions.json, and scatter.json so they don't pollute percentile ranks.

7.2 Name canonicalization

Members appearing in the SOD under multiple spellings are collapsed before aggregation. Without this, ~30 members would occupy two slugs each, halving their headline totals and double-counting them in percentile rankings (e.g., GERALD E. CONNOLLY and GERALD E. "GERRY" CONNOLLY were two separate offices before).

The canonicalizer applies these rules in order:

Strip diacritics (ANDRÉ → ANDRE)
Strip nicknames in quotes (GERALD E. "GERRY" CONNOLLY → GERALD E. CONNOLLY)
Strip parenthetical nicknames (CHARLES (CHUCK) EDWARDS → CHARLES EDWARDS)
Split glued initials (G.K. BUTTERFIELD → G. K. BUTTERFIELD)
Strip stray apostrophes / quotes / Hawaiian okina (KAIALIʻI → KAIALII)
Normalize hyphens to spaces (WASSERMAN-SCHULTZ → WASSERMAN SCHULTZ)
Strip single-letter middle initials (JUSTIN A. AMASH → JUSTIN AMASH)
Normalize trailing periods on suffixes (JR. → JR)
Apply explicit reconciliation map for ~7 cases the algorithm misses — multi-letter middle names (RALPH LEE ABRAHAM → RALPH ABRAHAM), SOD typos (XAVIER BECARRA → XAVIER BECERRA, MARIE GLUESEKAMP PEREZ → MARIE GLUESENKAMP PEREZ), honorifics (AMATA COLEMAN RADEWAGEN → AUMUA AMATA RADEWAGEN)

7.3 Party data join

Party affiliation comes from the community-maintained @unitedstates/congress-legislators repository (CC0).

The fetcher generates multiple canonical-name variants per legislator: with/without suffix, formal name + nickname column + parenthetical-extracted nickname, with/without middle name. The lookup additionally tries common nickname expansions (ROBERT↔BOB, DANIEL↔DAN, ELIJAH↔ELI, etc.) and suffix-stripped variants (JR, MD, III).

Coverage: 829 / 837 retained members (99.0%). The 8 unmatched are all vacancy entries listed in the SOD under Nth DISTRICT OF X rather than a person's name — correctly null since no person held the seat.

7.4 Headline finding callout

Above the scatter, a dynamically-computed callout reports where Sherman ranks across the active filter set. The callout is recomputed on every toggle change and would automatically flip its phrasing if Sherman ever lost his position. As of this writing:

Cohort	N peers	Sherman pays CS less than	Sherman is in the top X% of mailers (per-year)
All offices ≥4 yr (mixed staff_year)	539	508 (94%)	top 8%
Currently in Congress (435 voting reps)	424	421 (99% — only 3 below him on median, 16 below on lowest)	top 8%

Across both cohorts and both wage metrics, Sherman remains in the bottom 1–6% for staff pay and top 8% for mail spending.

7.5 Interactive filter controls

The user can change the comparison set via three pill-toggle rows above the chart, plus a free-text member search:

Mail window — Cumulative 2016–2025 / Per-year average / Peak year. Per-year normalizes for tenure (Sherman's been in office since 1997 vs. Boebert / MTG since 2021); peak captures election-cycle spikes.
Y-axis metric — Median career CS wage / Lowest career CS wage. Median is the typical metric. Lowest is more robust to office-churn distortion (some offices have only 1–2 full-year career CS staff, so their median is fragile, but their lowest is always reliable). See §7.6.
Cohort — All offices (≥4 years observed) / Currently in Congress (the 435 voting House seats with 2025 payroll data, dropping 6 non-voting delegates and 2 vacancy/transition placeholders). The "currently in Congress" cohort gives apples-to-apples 2025 numbers; the default mixes year-records.
Highlight presets — preset member groupings (Firebrand Republicans, Recent Speakers, Top per-year mailers) one-click-apply highlight chips.
Member search — typeahead match against any of the ~840 House offices in the data. Adding a member labels their dot in the chart.

State is encoded in the URL search params (?mode=peakYear&metric=lowest&cohort=voting2025&highlights=marjorie-taylor-greene,matt-gaetz,lauren-boebert), so any specific view is shareable. The IG carousel images we've published are reproducible by visiting the corresponding URL.

7.6 Why "lowest career CS wage" is sometimes more honest than median

For the 4-quarter "career staff" filter (§6.1), an office's median CS staff wage can be misleading when:

The office has only 1–2 career CS staff (mid-year hires get filtered out)
High CS-staff turnover means many partial-year hires are excluded
The office's regional structure spawns short-tenured field-rep roles

Example: Lauren Boebert's office in 2025 had 8 distinct CS people but only 2 worked all 4 quarters. Her reported median ($115,597) is the average of those 2 senior staff. The other 6 (interns + 4 regional field reps with 1–3 quarters tenure) get dropped. The median of all her CS staff is closer to $8,086.

The lowest career CS wage doesn't have this problem. As long as any one full-year CS staffer is paid little, the office's "lowest" reflects that — and it's much harder for an office to game by hiring high-paid replacements. The Sherman-vs-everyone-else finding holds on both metrics, but lowest is the metric we'd use against a sophisticated rebuttal.

7.7 Reproducible static images

The same MRAScatter component is used for both the interactive explorer and the static IG / Reddit carousel images. Each static image lives at its own page URL:

Reddit (peak year): /research/mra-static-peak
Reddit (per-year): /research/mra-static-peryear
IG portrait carousel slides 1–5: /research/mra-static-ig-peak, /research/mra-static-ig-peak-mtg, /research/mra-static-ig-peak-mtg-boebert, /research/mra-static-ig-peak-mtg-boebert-gaetz, /research/mra-static-ig-peak-finale
IG 2025-cohort: /research/mra-static-ig-peryear-2025

These are screenshotted via headless puppeteer for posting. The same data and same chart logic feed both the live explorer and the static images, so any number visible in the carousel can be re-derived from the live dashboard.

8. Worked examples — re-derive these from scratch

Pick any of these and re-derive from the raw CSVs to verify our pipeline. The same quarterly CSVs we use are mirrored at the publicly-readable mra-sod-archive.marenalinforcongress.com bucket (download links also available in the dashboard's "Source data" disclosure), and they're identical bytes to the upstream files at house.gov/the-house-explained/open-government/statement-of-disbursements/archive. Each example below tells you exactly which CSVs and which filters to use; you can verify any line of every dashboard claim with a spreadsheet.

Example A: Brad Sherman 2024 total office spend

Expected: $2,163,600.23

Steps:

Open 2024Q1.csv, 2024Q2.csv, 2024Q3.csv, 2024Q4.csv
In each: filter to rows where ORGANIZATION contains "BRAD SHERMAN" (case-insensitive)
Filter further to SORT SEQUENCE == 'DETAIL'
Sum the AMOUNT column
Add the four quarterly sums

Example B: Brad Sherman 2024 personnel compensation

Expected: $1,438,861.93

Same as A, but additionally filter to SORT SUBTOTAL DESCRIPTION == 'PERSONNEL COMPENSATION'.

Example C: Brad Sherman 2024 outbound (postage + mailers + ads)

Expected: $381,910.07

Open all 4 quarters of 2024
Filter to BRAD SHERMAN and SORT SEQUENCE == 'DETAIL'
Filter SORT SUBTOTAL DESCRIPTION to be in {'FRANKED MAIL', 'PRINTING AND REPRODUCTION', 'OTHER SERVICES'} (after collapsing whitespace)
For each remaining row, classify the VENDOR NAME against the classifier rules in §5
Sum AMOUNT for rows whose vendor classifies as postage, mass_mailer, or advertising

Example D: Brad Sherman 2022 outbound

Expected: $517,338.34

Same as C, but for 2022 quarters. Note: 2022 CSVs lack the BUDGET OBJECT CODE column, but the vendor-name classifier still works because vendor names are present.

Example E: AOC 2024 outbound

Expected: $7,482.61

Same procedure as C, with OCASIO[\s\-]*CORTEZ as the member match.

Example F: Sherman lowest full-time, full-year staff salary in 2025

Expected: $41,939.23 (a "CASEWORKER", VENDOR NAME VINES JUSTIN)

Open 2025Q1.csv through 2025Q4.csv
Filter to BRAD SHERMAN, SORT SEQUENCE == 'DETAIL', SORT SUBTOTAL DESCRIPTION == 'PERSONNEL COMPENSATION'
For each VENDOR NAME that is not blank, sum AMOUNT across the four quarters
Count how many distinct quarters each name appears in; keep only those with quartersPaid == 4
Exclude any DESCRIPTION matching SHARED EMPLOYEE|TEMPORARY EMPLOYEE|PAID INTERN|PART-TIME EMPLOYEE (case-insensitive)
The minimum of the remaining annual sums is the "lowest full-time, full-year salary"

Example G: CONEXION POLITICAL LLC payments to Sherman in 2024

Expected: $101,093.00 (across 5 detail transactions)

Example H: Pressley constituent-services staff count in 2025

Expected: 13 (4 career staff — DISTRICT DIRECTOR, DEPUTY DISTRICT DIRECTOR, CONSTITUENT SERVICES DIRECTOR, CONSTITUENT ADVOCATE — plus 9 paid district-office interns). The dashboard's per-member CS staff count is total, not career-only.