Bangladesh Banking Monthly Dashboard — Pipeline + Web Page

Context

Build a monthly data pipeline that scrapes Bangladesh banking/financial data from public sources (Bangladesh Bank, BBS, DSE), stores history in SQLite, exports static JSON files, and deploys them as a live dashboard page on sakib8275.com. Architecture: pipeline runs locally each month → JSON files committed to portfolio repo → Cloudflare Workers serves them statically → Astro page fetches and renders client-side. Zero hosting cost, no backend server.

The user already has:


Architecture

Python pipeline (local, monthly)
    ↓  scrape: bb.org.bd, bbs.gov.bd, dsebd.org
    ↓  store: SQLite history (with fallback on scrape failure)
    ↓  export: dashboard_latest.json, dashboard_YYYY_MM.json, history.json
    ↓  copy → portfolio-astro/public/data/
    ↓  git push + wrangler deploy
Cloudflare Workers serves /data/*.json as static assets

sakib8275.com/dashboard (Astro page, client-side fetch + vanilla JS SVG charts)

New Files to Create

Pipeline: /home/kitahara/Desktop/Projects/dashboard-pipeline/

config.py          — env-driven constants (DB_PATH, OUTPUT_DIR, PORTFOLIO_DATA_DIR, URLs)
db.py              — setup_database(), get_last_known(), upsert_raw_scrape(), upsert_monthly_metrics()
transform.py       — normalize_period(), build_dashboard_record(), compute_data_quality()
export.py          — write_json_files(), update_history(), copy_to_portfolio()
main.py            — CLI: python main.py [--period 2026-04] [--dry-run] [--skip-copy]
scrapers/
  __init__.py
  bb_indicators.py — BB Selected Economic Indicators HTML table → forex reserve, exchange rate, policy rate
  bb_pdf_sbs.py    — BB Scheduled Bank Statistics PDF → NPL, CAR, ADR, credit growth
  bb_pdf_psr.py    — BB Payment Systems Report PDF → MFS volume, agent banking, internet banking
  bbs_cpi.py       — BBS CPI page → headline/food/non-food CPI
  dse_market.py    — DSE market summary → DSEX, turnover, market cap
.env.example
requirements.txt
data/banking_dashboard.db   (gitignored)
output/                     (gitignored — JSON files before copy)

Portfolio additions

portfolio-astro/public/data/
  dashboard_latest.json       ← always current month
  dashboard_YYYY_MM.json      ← one per month archived
  history.json                ← all months, for sparklines (max 36)
portfolio-astro/src/pages/dashboard.astro   ← new dashboard page
portfolio-astro/src/components/Navbar.astro ← add Dashboard nav link
portfolio-astro/public/styles.css           ← append dashboard CSS

SQLite Schema (db.py)

WAL mode + PRAGMA synchronous = NORMAL — reuse RTGS pattern.

CREATE TABLE raw_scrapes (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    period_iso  TEXT NOT NULL,   -- '2026-04'
    source_name TEXT NOT NULL,   -- 'bb_indicators' | 'bb_sbs' | 'bb_psr' | 'bbs_cpi' | 'dse'
    scraped_at  TEXT NOT NULL,
    raw_json    TEXT,
    status      TEXT NOT NULL,   -- 'ok' | 'failed' | 'partial'
    UNIQUE(period_iso, source_name)
);

CREATE TABLE monthly_metrics (
    period_iso                      TEXT UNIQUE NOT NULL,
    period_label                    TEXT NOT NULL,
    generated_at                    TEXT NOT NULL,
    -- monetary
    forex_reserve_usd_bn            REAL,
    forex_import_cover_months       REAL,
    policy_rate_pct                 REAL,
    usd_bdt_rate                    REAL,
    cpi_headline_pct                REAL,
    cpi_food_pct                    REAL,
    cpi_nonfood_pct                 REAL,
    -- banking
    private_credit_growth_yoy_pct   REAL,
    npl_ratio_pct                   REAL,
    npl_state_bank_pct              REAL,
    npl_private_bank_pct            REAL,
    adr_pct                         REAL,
    car_pct                         REAL,
    -- digital_finance
    mfs_txn_value_bn_bdt            REAL,
    mfs_accounts_mn                 REAL,
    agent_banking_accounts_mn       REAL,
    agent_banking_disbursement_bn_bdt REAL,
    internet_banking_txn_mn         REAL,
    -- capital_markets
    dsex_close                      REAL,
    dse_turnover_bn_bdt             REAL,
    dse_market_cap_bn_bdt           REAL,
    data_quality_json               TEXT
);

Idempotency: INSERT ... ON CONFLICT(period_iso, source_name) DO UPDATE SET ... — same pattern as RTGS.


Output JSON Schema

{
  "meta": {
    "period": "April 2026",
    "period_iso": "2026-04",
    "generated_at": "ISO8601Z",
    "sources": [{"name": "...", "status": "ok|failed|partial"}],
    "data_quality": {"monetary": "complete|partial|stale", "banking": "...", ...}
  },
  "monetary":        { "forex_reserve_usd_bn": 20.4, "policy_rate_pct": 10.0, ... },
  "banking":         { "npl_ratio_pct": 11.4, "npl_state_bank_pct": 22.1, ... },
  "digital_finance": { "mfs_txn_value_bn_bdt": 1250, "mfs_accounts_mn": 230, ... },
  "capital_markets": { "dsex_close": 5200, "dse_turnover_bn_bdt": 8.4, ... }
}

history.json structure: {"updated_at": "...", "months": [flattened_record, ...]} — max 36 months, ascending by period_iso, upserted idempotently.


Scraper Contracts

Each scraper exports scrape(period_iso: str, session: requests.Session) -> dict. Raises ScraperError on failure.

ModuleSourceMethodKey Fields
bb_indicators.pybb.org.bd/econdata/selectedeconomicindicatorsBeautifulSoup HTML table, find by “forex”/“foreign exchange” textforex_reserve_usd_bn, usd_bdt_rate, policy_rate_pct
bb_pdf_sbs.pybb.org.bd/publication/publictn/0/16 (index → PDF)Stage 1: find latest PDF link; Stage 2: pdfplumber + regex on full textnpl_ratio_pct, npl_state_bank_pct, npl_private_bank_pct, car_pct, adr_pct, private_credit_growth_yoy_pct
bb_pdf_psr.pybb.org.bd/publication/publictn/0/29 (index → PDF)Same two-stage PDF patternmfs_txn_value_bn_bdt, mfs_accounts_mn, agent_banking_accounts_mn, agent_banking_disbursement_bn_bdt, internet_banking_txn_mn
bbs_cpi.pybbs.gov.bd/site/page/7bef52ab-...BeautifulSoup + regex on page text; HTML table fallbackcpi_headline_pct, cpi_food_pct, cpi_nonfood_pct
dse_market.pydsebd.org/market_summary.phpPlain HTTP first; REQUIRES_PLAYWRIGHT flag for JS-rendered fallbackdsex_close, dse_turnover_bn_bdt, dse_market_cap_bn_bdt

Graceful degradation: If a scraper raises, main.py catches it, marks status='failed', calls db.get_last_known() for each field from that source, and passes them as fallbacks to build_dashboard_record(). data_quality section becomes 'stale'.


requirements.txt

requests>=2.31.0
beautifulsoup4>=4.12.0
lxml>=5.0.0
pdfplumber>=0.10.0
tqdm>=4.66.0
python-dotenv>=1.0.0
playwright>=1.44.0   # optional, only if DSE needs JS rendering

Dashboard Page (dashboard.astro)

4 sections: Monetary Indicators | Banking Sector Health | Digital Finance | Capital Markets

Each section: KPI grid + sparklines from history.json. Banking section adds NPL bar chart.

CSS: append ~100 lines to public/styles.css using existing design tokens (--bg-surface, --accent, --indigo, --border, var(--radius), Inter + Outfit fonts).


Deployment (monthly flow)

# 1. Run pipeline
cd /home/kitahara/Desktop/Projects/dashboard-pipeline
python main.py --period 2026-05

# 2. Commit + deploy portfolio
cd /home/kitahara/Desktop/Projects/portfolio/portfolio-astro
git add public/data/
git commit -m "chore: dashboard data — May 2026"
git push && npx wrangler deploy

CSP connect-src 'self' already covers same-origin /data/*.json — no _headers change needed. Add to public/_headers: Cache-Control: public, max-age=3600 for /data/*.json.


First-Run Calibration (manual, before automating)

  1. BB Indicators column indices — GET the page, view source, verify which column index holds forex / exchange rate / policy rate. Mark as # VERIFY_COLUMN_INDEX in code.
  2. SBS PDF structure — Download one manually, run pdfplumber interactively on it, inspect page.extract_text() output for NPL/CAR/ADR tables. Tune regex patterns.
  3. PSR PDF structure — Same as above for MFS / agent banking tables.
  4. DSE JS checkrequests.get(DSE_MARKET_URL) and check if DSEX numeric value appears in resp.text. If not, set REQUIRES_PLAYWRIGHT = True in dse_market.py.
  5. BBS CPI structure — Check if values are in HTML table or <script> JS object.
  6. DSE units — Confirm whether turnover/market cap are published in crore BDT (÷10 for billions) or already in billions.

Verification

# Dry run (scrapes but writes nothing)
python main.py --period 2026-04 --dry-run

# Full run without portfolio copy (inspect output/ first)
python main.py --period 2026-04 --skip-copy
cat output/dashboard_latest.json | python -m json.tool

# Copy and test portfolio locally
python main.py --period 2026-04
cd ../portfolio/portfolio-astro && npm run dev
# Visit localhost:4321/dashboard — verify KPIs, sparklines, quality badges
# Rename public/data/dashboard_latest.json → verify error state shows

# Deploy
npx wrangler deploy
# Visit sakib8275.com/dashboard

Key Function Signatures

FunctionModuleSignature
setup_databasedb.py() -> None
get_last_knowndb.py`(conn, source_name, field) -> float
upsert_raw_scrapedb.py(period_iso, source_name, scraped_at, raw_json, status) -> None
scrapescrapers/*.py(period_iso, session) -> dict
normalize_periodtransform.py(period_arg) -> tuple[str, str]
build_dashboard_recordtransform.py(period_iso, period_label, scraped_data, fallbacks, statuses) -> dict
write_json_filesexport.py(period_iso, record) -> tuple[Path, Path]
update_historyexport.py(period_iso, record) -> Path
copy_to_portfolioexport.py(files: list[Path]) -> None
render_sparklinedashboard.astro JS(values, width, height, color) -> SVG string
render_bar_chartdashboard.astro JS(items, width, height, color) -> SVG string