MRA Dashboard — Methodology
This dashboard compares how four U.S. House members spend the ~$2 million per year office budget every member receives (officially the Members' Representational Allowance, or MRA). Every number on the dashboard is derived from the public House Statement of Disbursements (SOD) — quarterly CSVs the House Chief Administrative Officer publishes within 60 days of each quarter's end.
This document is the audit trail. It exists so that a reporter, opposing campaign, or skeptical voter can re-derive any headline figure on the dashboard from the raw filings, without trusting our pipeline.
1. Data source
- Publisher: Office of the Chief Administrative Officer, U.S. House of Representatives
- Archive page: https://www.house.gov/the-house-explained/open-government/statement-of-disbursements/archive
- Format: Quarterly "Detail" CSVs (one per fiscal quarter)
- Coverage on the dashboard: 2016 Q1 through the most recent published quarter
- Mirrored copies: every CSV the dashboard uses is also mirrored at https://mra-sod-archive.marenalinforcongress.com/ for stable access (a manifest with direct download URLs is at
/data/mra/manifest.json)
The House publishes new quarters approximately 60 days after each quarter ends. Q1 2026, for example, would not appear until ~May 2026.
What's also published elsewhere — and what this dashboard adds
Individual member spending and staff salary figures are not exclusive to this dashboard. LegiStorm.com publishes a free, searchable database of congressional staff salaries drawn from the same House Statement of Disbursements records, browsable by member, state, committee, and admin office. Their analytical features (Salary Rate Analysis, Historical Trends, lobbying / revolving-door data) are paywalled in their Pro tier.
The contribution of this dashboard, beyond what's freely available at LegiStorm:
- Comparative analytical layer made public. Percentile rank for each member against the full national distribution of ~1,023 House offices observed in our 2016–2025 data window — the kind of analysis LegiStorm reserves for paying customers.
- Vendor classification taxonomy. Each transaction's vendor name is bucketed into postage / mass-mailer / advertising / digital-constituent / design / translation / office-other. We've found no evidence LegiStorm publishes vendor classifications in this form.
- Election-cycle visualization. Year-by-year outbound mail and ad spend across all four featured members, with primary-year cycles highlighted.
- Documented methodology + mirrored source data. This document spells out the schema, classifier rules, and worked examples for every headline figure. The raw quarterly CSVs are mirrored at https://mra-sod-archive.marenalinforcongress.com/ so any reader can re-derive any number by hand. (The processing code itself currently lives in the campaign's private repo; if there's interest, a separately-licensed public ETL repo could be spun up.)
LegiStorm's underlying data and ours come from the same House filings, so the individual numbers should match. If you find a discrepancy on a per-member figure, please report it — that's the kind of error our methodology is designed to catch.
2. Schema (and a quirk you need to know)
The CSV schema changed in 2023. This matters for any audit.
2023 Q1 onward (current schema)
Columns we use, in order:
| Column | What it contains | Example |
|---|---|---|
ORGANIZATION |
The office name with year prefix | 2025 HON. BRAD SHERMAN |
SORT SUBTOTAL DESCRIPTION |
The category (8–13 distinct values) | PERSONNEL COMPENSATION |
SORT SEQUENCE |
Row type — DETAIL, SUBTOTAL, or GRAND TOTAL FOR ORGANIZATION |
DETAIL |
TRANSACTION DATE |
When the payment posted | 3-Jan-25 |
VENDOR NAME |
Who got paid | BULLSEYE MARKETING |
DESCRIPTION |
Free-text purpose / job title | DISTRICT DIRECTOR |
BUDGET OBJECT CODE |
4-digit federal Budget Object Code | 1101 (personnel salary), 2352 (franked-mail postage), etc. |
AMOUNT |
Dollar amount (negative for refunds) | 41066.67 |
2016 Q1 through 2022 Q4 (legacy schema)
The legacy CSVs lack the BUDGET OBJECT CODE column entirely. They have SORT SUBTOTAL DESCRIPTION (the broad category) but no fine-grained classification.
This means: for 2016–2022 we can only see, e.g., the total PRINTING AND REPRODUCTION for the office, not the split between mass-mailers and office printing. Our vendor classifier (described below) bridges the gap by classifying each transaction by vendor name — that works across all years.
Critical filter: only sum DETAIL rows
The CSVs interleave DETAIL rows (real transactions) with SUBTOTAL rows (per-category running totals) and a final GRAND TOTAL FOR ORGANIZATION row. Naively summing the AMOUNT column doubles every total because the subtotals re-sum the detail.
Always filter to SORT SEQUENCE == 'DETAIL' before any aggregation. Our ETL does this on every quarter.
Categorical strings have irregular spacing
The SORT SUBTOTAL DESCRIPTION field has a few entries with non-standard whitespace, e.g. RENT COMMUNICATION UTILITIES (two spaces, no commas). Our normalizer collapses any run of whitespace to a single space and uppercases before matching.
3. The four members
Every aggregation filters by ORGANIZATION matching one of these patterns (case-insensitive regex):
| Member | Pattern | District (current) | First quarter on file |
|---|---|---|---|
| Brad Sherman | HON\.\s*BRAD\s*SHERMAN |
CA-32 (was CA-30 pre-2024 redistricting) | 2016 Q1 |
| Alexandria Ocasio-Cortez | HON\.\s*ALEXANDRIA\s*OCASIO[\s\-]*CORTEZ |
NY-14 | 2019 Q1 |
| Ilhan Omar | HON\.\s*ILHAN\s*OMAR |
MN-5 | 2019 Q1 |
| Ayanna Pressley | HON\.\s*AYANNA\s*PRESSLEY |
MA-7 | 2019 Q1 |
The ORGANIZATION field includes a year prefix (e.g., 2025 HON. BRAD SHERMAN); the regex ignores that. Sherman has been in office since 1997 across CA-24 → CA-27 → CA-30 → CA-32, all under the same name pattern.
4. Headline categories (from SORT SUBTOTAL DESCRIPTION)
These are the broad office-budget categories every quarter. The values shown are exact uppercase strings as they appear in the CSV after whitespace normalization.
| Display | Raw string |
|---|---|
| Personnel Compensation | PERSONNEL COMPENSATION |
| Personnel Benefits | PERSONNEL BENEFITS |
| Travel | TRAVEL |
| Rent, Communications, Utilities | RENT COMMUNICATION UTILITIES |
| Other Services | OTHER SERVICES |
| Printing & Reproduction | PRINTING AND REPRODUCTION |
| Supplies & Materials | SUPPLIES AND MATERIALS |
| Equipment | EQUIPMENT |
| Franked Mail | FRANKED MAIL |
| Transportation of Things | TRANSPORTATION OF THINGS |
| Benefits to Former Personnel | BENEFITS TO FORMER PERSONNEL |
| Insurance Claims & Indemnities | INSURANCE CLAIMS & INDEMNITIES |
| Other | OTHER |
5. Vendor classifier (the key audit-relevant piece)
The headline figure on the Mailings & Ads page — "outbound mail and ads spent" — is not a simple sum of one category. It's a vendor-by-vendor classification because:
- The legacy schema (2016–2022) doesn't have a Budget Object Code column to distinguish, e.g., mass-mailer printing from office printing.
- Within
OTHER SERVICES, some vendors are digital ad firms (advertising), some are constituent CRMs (digital infrastructure), some are translators (case work), some are office cleaning.
So we classify each transaction's VENDOR NAME (uppercased) into one of eight buckets via regex matching. Summary:
| Bucket | Examples | Counts toward headline? |
|---|---|---|
postage |
UNITED STATES POSTAL SERVICE |
Yes (headline) |
mass_mailer |
BULLSEYE MARKETING, UNION GRAPHICS, PUBLIC PRINTER, HONSA BINDER, MINUTEMAN PRESS, … |
Yes (headline) |
advertising |
SABIO, CONEXION POLITICAL, VALLEY NEWS GROUP, IMPREMEDIA, RELATIVE INTEL, THE NEW MEDIA FIRM, FACEBK, GOOGLE ADS, … |
Yes (headline) |
digital_constituent |
HOUSECALL, LEIDOS DIGITAL, ICONSTITUENT, FIRESIDE, INDIGOVERN, MAILCHIMP, GSUITE, … |
No (adjacent infrastructure) |
design_creative |
PFEIFFER DESIGN, THREE METAL MONKEYS, 45PRESS, ADOBE, CANVA, … |
No |
translation |
ACCURATE WORD, INTERPRETING, LANGUAGE, TRANSLATIONS, TRINT, … |
No |
office_other |
INNOVATION OFFICE, BENJAMIN OFFICE SUPPLY, AMERICLEAN, HACKETT SECURITY, STAPLES, … |
No |
unclassified |
Any vendor not matching the rules above | Surfaced as error bound on dashboard cards |
Headline outbound spend = sum of postage + mass_mailer + advertising buckets, restricted to detail rows in the comms-adjacent SUBTOTAL categories (FRANKED MAIL, PRINTING AND REPRODUCTION, OTHER SERVICES).
The "± unclassified" disclosure on each member card is the upper bound: if we had wrongly classified every "unclassified" vendor as actually being an ad firm, the headline would rise by that amount. In practice, the largest unclassified entries are individual contractors (e.g., TATARIAN ALISA S. at ~$114k for Pressley) which are almost certainly not ad firms.
The full regex list is the source of truth; treat the table above as a summary.
Long-tail vendor classification (LLM-assisted)
For vendors not caught by the hand-written regex rules, a separate step asks a
language model to classify the vendor name into one of the eight buckets above
plus a confidence score. Only assignments at or above 0.7 confidence are
trusted; lower-confidence ones are dropped back into unclassified so they
show up in error bars rather than silently misclassified.
The two layers compose:
- Hand-written regex rule wins if any matches (highest precision)
- LLM-derived classification wins if regex misses AND confidence ≥ 0.7
- Otherwise:
unclassified, surfaced as the upper bound on uncertainty in per-member error bars
Regex rules are intentionally favored because they're auditable line-by-line in this document. Each LLM-assigned classification includes a short reasoning string for spot-check review.
6. Staff classification
For the Staff & Salaries page, we filter to SORT SUBTOTAL DESCRIPTION == 'PERSONNEL COMPENSATION' then bucket each staffer by job title (DESCRIPTION column).
Functional categories use keyword matching against the lowercased title:
| Functional category | Keywords |
|---|---|
CONSTITUENT_SERVICES |
caseworker, casework, constituent, district director, district rep, field rep, field representative, community liaison, immigration, veterans, outreach, district office |
LEGISLATIVE |
legislative, policy advisor, policy deputy, sr policy, senior policy, policy, counsel, legislative aide, legislative correspondent |
COMMUNICATIONS |
communications, press, digital director, social media, comm director, speechwriter |
OPERATIONS |
chief of staff, deputy chief, office manager, staff assistant, scheduler, executive assistant, systems, intern coord, operations |
LEADERSHIP |
senior advisor, sr advisor, committee, chair |
OTHER |
(fall-through — untyped roles) |
Note on interns and short-tenure roles: the role classifier keeps interns and shared/temporary/part-time staff in their assigned functional category. A row titled DISTRICT OFFICE PAID INTERN is classified as CONSTITUENT_SERVICES because they are doing constituent-services work, even if it's part-time. The per-function staff counts reflect everyone doing that kind of work, including support roles.
The salary distribution and floor calculations apply their own gates (full-year + exclude shared/temp/intern/PT — see §6.1) so per-function median pay is computed only over career staff, not skewed by intern stipends.
"Full-time, full-year" salary floor calculation
The salary-floor comparison on the Staff page restricts to staff who:
- Were paid in all 4 quarters of the year (
quartersPaid == 4) - Are not in any of these excluded title patterns:
SHARED EMPLOYEE,TEMPORARY EMPLOYEE,PAID INTERN,PART-TIME EMPLOYEE
Excluding these prevents the floor comparison from being skewed by part-time or fractional roles. AOC's $52,500 part-time employee — at her stated salary floor — is documented elsewhere in the dashboard but excluded from the apples-to-apples floor.
Annual salary for each staffer = sum of quarterly compensation amounts in the year.
7. House-wide cross-cutting view
The House-Wide View page (/research/mra-dashboard/cross-cutting) renders one scatter dot per House office observed in at least 4 quarters of 2016–2025. Two axes encode the trade-off between broadcast outreach and direct constituent service:
- X-axis: outbound mail + ads (
postage+mass_mailer+advertisingfrom §5 — same definition as the headline "outbound spend" elsewhere in the dashboard). The user toggles between three time windows:- Cumulative 2016–2025 (default)
- Per-year average (cumulative ÷ years observed — normalizes for tenure)
- Peak year (each member's single biggest mail-spending year)
- Y-axis: median wage of full-time career constituent-services staff at the most recent full year on file, using the same exclusion rules as §6.1 (excludes interns, temps, shared/PT).
- Color: party from the public
@unitedstates/congress-legislatorsdataset (CC0). Red = R, blue = D, gray = I/L. - Right-only X error bars: same proportional-allocation logic as elsewhere — assumes unclassified spend could turn out to be mail in the same proportion as the classified portion. The Y-axis has no equivalent because staff salaries are keyed by job title and don't depend on vendor classification.
7.1 Edge-case office filter
The national index excludes 145 of ~980 distinct office entries that aren't real serving members:
- 105 with only 1 quarter observed (vacancies, end-of-cycle transitions)
- 22 / 13 with 2 / 3 quarters (special-election partial terms)
- 8 caucus / committee / conference offices
The full list is written to public/data/mra/members/_excluded.json for transparency. Excluded offices stay out of _index.json, distributions.json, and scatter.json so they don't pollute percentile ranks.
7.2 Name canonicalization
Members appearing in the SOD under multiple spellings are collapsed before aggregation. Without this, ~30 members would occupy two slugs each, halving their headline totals and double-counting them in percentile rankings (e.g., GERALD E. CONNOLLY and GERALD E. "GERRY" CONNOLLY were two separate offices before).
The canonicalizer applies these rules in order:
- Strip diacritics (
ANDRÉ→ANDRE) - Strip nicknames in quotes (
GERALD E. "GERRY" CONNOLLY→GERALD E. CONNOLLY) - Strip parenthetical nicknames (
CHARLES (CHUCK) EDWARDS→CHARLES EDWARDS) - Split glued initials (
G.K. BUTTERFIELD→G. K. BUTTERFIELD) - Strip stray apostrophes / quotes / Hawaiian okina (
KAIALIʻI→KAIALII) - Normalize hyphens to spaces (
WASSERMAN-SCHULTZ→WASSERMAN SCHULTZ) - Strip single-letter middle initials (
JUSTIN A. AMASH→JUSTIN AMASH) - Normalize trailing periods on suffixes (
JR.→JR) - Apply explicit reconciliation map for ~7 cases the algorithm misses — multi-letter middle names (
RALPH LEE ABRAHAM→RALPH ABRAHAM), SOD typos (XAVIER BECARRA→XAVIER BECERRA,MARIE GLUESEKAMP PEREZ→MARIE GLUESENKAMP PEREZ), honorifics (AMATA COLEMAN RADEWAGEN→AUMUA AMATA RADEWAGEN)
7.3 Party data join
Party affiliation comes from the community-maintained @unitedstates/congress-legislators repository (CC0).
The fetcher generates multiple canonical-name variants per legislator: with/without suffix, formal name + nickname column + parenthetical-extracted nickname, with/without middle name. The lookup additionally tries common nickname expansions (ROBERT↔BOB, DANIEL↔DAN, ELIJAH↔ELI, etc.) and suffix-stripped variants (JR, MD, III).
Coverage: 829 / 837 retained members (99.0%). The 8 unmatched are all vacancy entries listed in the SOD under Nth DISTRICT OF X rather than a person's name — correctly null since no person held the seat.
7.4 Headline finding callout
Above the scatter, a dynamically-computed callout reports where Sherman ranks across the active filter set. The callout is recomputed on every toggle change and would automatically flip its phrasing if Sherman ever lost his position. As of this writing:
| Cohort | N peers | Sherman pays CS less than | Sherman is in the top X% of mailers (per-year) |
|---|---|---|---|
| All offices ≥4 yr (mixed staff_year) | 539 | 508 (94%) | top 8% |
| Currently in Congress (435 voting reps) | 424 | 421 (99% — only 3 below him on median, 16 below on lowest) | top 8% |
Across both cohorts and both wage metrics, Sherman remains in the bottom 1–6% for staff pay and top 8% for mail spending.
7.5 Interactive filter controls
The user can change the comparison set via three pill-toggle rows above the chart, plus a free-text member search:
- Mail window — Cumulative 2016–2025 / Per-year average / Peak year. Per-year normalizes for tenure (Sherman's been in office since 1997 vs. Boebert / MTG since 2021); peak captures election-cycle spikes.
- Y-axis metric — Median career CS wage / Lowest career CS wage. Median is the typical metric. Lowest is more robust to office-churn distortion (some offices have only 1–2 full-year career CS staff, so their median is fragile, but their lowest is always reliable). See §7.6.
- Cohort — All offices (≥4 years observed) / Currently in Congress (the 435 voting House seats with 2025 payroll data, dropping 6 non-voting delegates and 2 vacancy/transition placeholders). The "currently in Congress" cohort gives apples-to-apples 2025 numbers; the default mixes year-records.
- Highlight presets — preset member groupings (Firebrand Republicans, Recent Speakers, Top per-year mailers) one-click-apply highlight chips.
- Member search — typeahead match against any of the ~840 House offices in the data. Adding a member labels their dot in the chart.
State is encoded in the URL search params (?mode=peakYear&metric=lowest&cohort=voting2025&highlights=marjorie-taylor-greene,matt-gaetz,lauren-boebert), so any specific view is shareable. The IG carousel images we've published are reproducible by visiting the corresponding URL.
7.6 Why "lowest career CS wage" is sometimes more honest than median
For the 4-quarter "career staff" filter (§6.1), an office's median CS staff wage can be misleading when:
- The office has only 1–2 career CS staff (mid-year hires get filtered out)
- High CS-staff turnover means many partial-year hires are excluded
- The office's regional structure spawns short-tenured field-rep roles
Example: Lauren Boebert's office in 2025 had 8 distinct CS people but only 2 worked all 4 quarters. Her reported median ($115,597) is the average of those 2 senior staff. The other 6 (interns + 4 regional field reps with 1–3 quarters tenure) get dropped. The median of all her CS staff is closer to $8,086.
The lowest career CS wage doesn't have this problem. As long as any one full-year CS staffer is paid little, the office's "lowest" reflects that — and it's much harder for an office to game by hiring high-paid replacements. The Sherman-vs-everyone-else finding holds on both metrics, but lowest is the metric we'd use against a sophisticated rebuttal.
7.7 Reproducible static images
The same MRAScatter component is used for both the interactive explorer and the static IG / Reddit carousel images. Each static image lives at its own page URL:
- Reddit (peak year):
/research/mra-static-peak - Reddit (per-year):
/research/mra-static-peryear - IG portrait carousel slides 1–5:
/research/mra-static-ig-peak,/research/mra-static-ig-peak-mtg,/research/mra-static-ig-peak-mtg-boebert,/research/mra-static-ig-peak-mtg-boebert-gaetz,/research/mra-static-ig-peak-finale - IG 2025-cohort:
/research/mra-static-ig-peryear-2025
These are screenshotted via headless puppeteer for posting. The same data and same chart logic feed both the live explorer and the static images, so any number visible in the carousel can be re-derived from the live dashboard.
8. Worked examples — re-derive these from scratch
Pick any of these and re-derive from the raw CSVs to verify our pipeline. The same quarterly CSVs we use are mirrored at the publicly-readable mra-sod-archive.marenalinforcongress.com bucket (download links also available in the dashboard's "Source data" disclosure), and they're identical bytes to the upstream files at house.gov/the-house-explained/open-government/statement-of-disbursements/archive. Each example below tells you exactly which CSVs and which filters to use; you can verify any line of every dashboard claim with a spreadsheet.
Example A: Brad Sherman 2024 total office spend
Expected: $2,163,600.23
Steps:
- Open
2024Q1.csv,2024Q2.csv,2024Q3.csv,2024Q4.csv - In each: filter to rows where
ORGANIZATIONcontains "BRAD SHERMAN" (case-insensitive) - Filter further to
SORT SEQUENCE == 'DETAIL' - Sum the
AMOUNTcolumn - Add the four quarterly sums
Example B: Brad Sherman 2024 personnel compensation
Expected: $1,438,861.93
Same as A, but additionally filter to SORT SUBTOTAL DESCRIPTION == 'PERSONNEL COMPENSATION'.
Example C: Brad Sherman 2024 outbound (postage + mailers + ads)
Expected: $381,910.07
- Open all 4 quarters of 2024
- Filter to BRAD SHERMAN and
SORT SEQUENCE == 'DETAIL' - Filter
SORT SUBTOTAL DESCRIPTIONto be in{'FRANKED MAIL', 'PRINTING AND REPRODUCTION', 'OTHER SERVICES'}(after collapsing whitespace) - For each remaining row, classify the
VENDOR NAMEagainst the classifier rules in §5 - Sum
AMOUNTfor rows whose vendor classifies aspostage,mass_mailer, oradvertising
Example D: Brad Sherman 2022 outbound
Expected: $517,338.34
Same as C, but for 2022 quarters. Note: 2022 CSVs lack the BUDGET OBJECT CODE column, but the vendor-name classifier still works because vendor names are present.
Example E: AOC 2024 outbound
Expected: $7,482.61
Same procedure as C, with OCASIO[\s\-]*CORTEZ as the member match.
Example F: Sherman lowest full-time, full-year staff salary in 2025
Expected: $41,939.23 (a "CASEWORKER", VENDOR NAME VINES JUSTIN)
- Open
2025Q1.csvthrough2025Q4.csv - Filter to BRAD SHERMAN,
SORT SEQUENCE == 'DETAIL',SORT SUBTOTAL DESCRIPTION == 'PERSONNEL COMPENSATION' - For each
VENDOR NAMEthat is not blank, sumAMOUNTacross the four quarters - Count how many distinct quarters each name appears in; keep only those with
quartersPaid == 4 - Exclude any
DESCRIPTIONmatchingSHARED EMPLOYEE|TEMPORARY EMPLOYEE|PAID INTERN|PART-TIME EMPLOYEE(case-insensitive) - The minimum of the remaining annual sums is the "lowest full-time, full-year salary"
Example G: CONEXION POLITICAL LLC payments to Sherman in 2024
Expected: $101,093.00 (across 5 detail transactions)
Example H: Pressley constituent-services staff count in 2025
Expected: 13 (4 career staff — DISTRICT DIRECTOR, DEPUTY DISTRICT DIRECTOR, CONSTITUENT SERVICES DIRECTOR, CONSTITUENT ADVOCATE — plus 9 paid district-office interns). The dashboard's per-member CS staff count is total, not career-only.
Steps:
Open all 4 quarters of 2025
Filter to AYANNA PRESSLEY,
SORT SEQUENCE == 'DETAIL',SORT SUBTOTAL DESCRIPTION == 'PERSONNEL COMPENSATION'Find each distinct
VENDOR NAME(staff member)Apply the role classifier to each row's
DESCRIPTIONusing the keyword table from §6. A title likeDISTRICT OFFICE PAID INTERNmatches thedistrict officekeyword and classifies asCONSTITUENT_SERVICES.Count distinct names whose classification is
CONSTITUENT_SERVICES(no full-year filter; total team count).Open all 4 quarters of 2024
Filter to BRAD SHERMAN and
SORT SEQUENCE == 'DETAIL'Filter to
VENDOR NAMEcontaining "CONEXION POLITICAL"Sum
AMOUNTand count rows
9. Known limitations
- Pre-2023 schema lacks Budget Object Codes. The vendor classifier is the only way to break out
OTHER SERVICESandPRINTING AND REPRODUCTIONinto ad-vs-infrastructure-vs-supplies for those years. We surface the unclassified amount as the upper bound on classifier error. - Personnel compensation includes lump-sum bonuses. Many offices issue end-of-year bonuses through
PERSONNEL COMPENSATION. A one-quarter spike can reflect a bonus rather than a normal salary rate. We don't try to split this out. - Sherman district changes over time. Sherman's district has been CA-24 → CA-27 → CA-30 → CA-32 across his tenure. The dashboard's headline ("CA-32 incumbent") refers to his current district; historical spending was made while representing earlier district numbers.
- Squad members started Jan 2019. AOC, Omar, and Pressley have no SOD records before 2019 Q1.
- Q1 2026 is not yet published. It typically appears ~May 2026 (60 days after quarter end). Re-running the ETL will pick it up automatically.
- Sherman's "ADVERTISEMENTS" line isn't always coded as such. Some 2022 ad placements appear in
PRINTING AND REPRODUCTIONrather thanOTHER SERVICES. The vendor classifier catches these regardless of the parent category.