Bangladesh Banking Monthly Dashboard — Pipeline + Web Page
Context
Build a monthly data pipeline that scrapes Bangladesh banking/financial data from public sources (Bangladesh Bank, BBS, DSE), stores history in SQLite, exports static JSON files, and deploys them as a live dashboard page on sakib8275.com. Architecture: pipeline runs locally each month → JSON files committed to portfolio repo → Cloudflare Workers serves them statically → Astro page fetches and renders client-side. Zero hosting cost, no backend server.
The user already has:
- RTGS-Report-Server (
/home/kitahara/Desktop/RTGS-Report-Server/) with proven ETL patterns (WAL SQLite, idempotent inserts, env config, pandas/openpyxl) - Portfolio on Astro 6.1.7 + Cloudflare Workers at
/home/kitahara/Desktop/Projects/portfolio/portfolio-astro/ - Design system:
--bg-primary: #0a0f1e,--accent: #00d4aa(teal),--indigo: #6366f1
Architecture
Python pipeline (local, monthly)
↓ scrape: bb.org.bd, bbs.gov.bd, dsebd.org
↓ store: SQLite history (with fallback on scrape failure)
↓ export: dashboard_latest.json, dashboard_YYYY_MM.json, history.json
↓ copy → portfolio-astro/public/data/
↓ git push + wrangler deploy
Cloudflare Workers serves /data/*.json as static assets
↓
sakib8275.com/dashboard (Astro page, client-side fetch + vanilla JS SVG charts)
New Files to Create
Pipeline: /home/kitahara/Desktop/Projects/dashboard-pipeline/
config.py — env-driven constants (DB_PATH, OUTPUT_DIR, PORTFOLIO_DATA_DIR, URLs)
db.py — setup_database(), get_last_known(), upsert_raw_scrape(), upsert_monthly_metrics()
transform.py — normalize_period(), build_dashboard_record(), compute_data_quality()
export.py — write_json_files(), update_history(), copy_to_portfolio()
main.py — CLI: python main.py [--period 2026-04] [--dry-run] [--skip-copy]
scrapers/
__init__.py
bb_indicators.py — BB Selected Economic Indicators HTML table → forex reserve, exchange rate, policy rate
bb_pdf_sbs.py — BB Scheduled Bank Statistics PDF → NPL, CAR, ADR, credit growth
bb_pdf_psr.py — BB Payment Systems Report PDF → MFS volume, agent banking, internet banking
bbs_cpi.py — BBS CPI page → headline/food/non-food CPI
dse_market.py — DSE market summary → DSEX, turnover, market cap
.env.example
requirements.txt
data/banking_dashboard.db (gitignored)
output/ (gitignored — JSON files before copy)
Portfolio additions
portfolio-astro/public/data/
dashboard_latest.json ← always current month
dashboard_YYYY_MM.json ← one per month archived
history.json ← all months, for sparklines (max 36)
portfolio-astro/src/pages/dashboard.astro ← new dashboard page
portfolio-astro/src/components/Navbar.astro ← add Dashboard nav link
portfolio-astro/public/styles.css ← append dashboard CSS
SQLite Schema (db.py)
WAL mode + PRAGMA synchronous = NORMAL — reuse RTGS pattern.
CREATE TABLE raw_scrapes (
id INTEGER PRIMARY KEY AUTOINCREMENT,
period_iso TEXT NOT NULL, -- '2026-04'
source_name TEXT NOT NULL, -- 'bb_indicators' | 'bb_sbs' | 'bb_psr' | 'bbs_cpi' | 'dse'
scraped_at TEXT NOT NULL,
raw_json TEXT,
status TEXT NOT NULL, -- 'ok' | 'failed' | 'partial'
UNIQUE(period_iso, source_name)
);
CREATE TABLE monthly_metrics (
period_iso TEXT UNIQUE NOT NULL,
period_label TEXT NOT NULL,
generated_at TEXT NOT NULL,
-- monetary
forex_reserve_usd_bn REAL,
forex_import_cover_months REAL,
policy_rate_pct REAL,
usd_bdt_rate REAL,
cpi_headline_pct REAL,
cpi_food_pct REAL,
cpi_nonfood_pct REAL,
-- banking
private_credit_growth_yoy_pct REAL,
npl_ratio_pct REAL,
npl_state_bank_pct REAL,
npl_private_bank_pct REAL,
adr_pct REAL,
car_pct REAL,
-- digital_finance
mfs_txn_value_bn_bdt REAL,
mfs_accounts_mn REAL,
agent_banking_accounts_mn REAL,
agent_banking_disbursement_bn_bdt REAL,
internet_banking_txn_mn REAL,
-- capital_markets
dsex_close REAL,
dse_turnover_bn_bdt REAL,
dse_market_cap_bn_bdt REAL,
data_quality_json TEXT
);
Idempotency: INSERT ... ON CONFLICT(period_iso, source_name) DO UPDATE SET ... — same pattern as RTGS.
Output JSON Schema
{
"meta": {
"period": "April 2026",
"period_iso": "2026-04",
"generated_at": "ISO8601Z",
"sources": [{"name": "...", "status": "ok|failed|partial"}],
"data_quality": {"monetary": "complete|partial|stale", "banking": "...", ...}
},
"monetary": { "forex_reserve_usd_bn": 20.4, "policy_rate_pct": 10.0, ... },
"banking": { "npl_ratio_pct": 11.4, "npl_state_bank_pct": 22.1, ... },
"digital_finance": { "mfs_txn_value_bn_bdt": 1250, "mfs_accounts_mn": 230, ... },
"capital_markets": { "dsex_close": 5200, "dse_turnover_bn_bdt": 8.4, ... }
}
history.json structure: {"updated_at": "...", "months": [flattened_record, ...]} — max 36 months, ascending by period_iso, upserted idempotently.
Scraper Contracts
Each scraper exports scrape(period_iso: str, session: requests.Session) -> dict. Raises ScraperError on failure.
| Module | Source | Method | Key Fields |
|---|---|---|---|
bb_indicators.py | bb.org.bd/econdata/selectedeconomicindicators | BeautifulSoup HTML table, find by “forex”/“foreign exchange” text | forex_reserve_usd_bn, usd_bdt_rate, policy_rate_pct |
bb_pdf_sbs.py | bb.org.bd/publication/publictn/0/16 (index → PDF) | Stage 1: find latest PDF link; Stage 2: pdfplumber + regex on full text | npl_ratio_pct, npl_state_bank_pct, npl_private_bank_pct, car_pct, adr_pct, private_credit_growth_yoy_pct |
bb_pdf_psr.py | bb.org.bd/publication/publictn/0/29 (index → PDF) | Same two-stage PDF pattern | mfs_txn_value_bn_bdt, mfs_accounts_mn, agent_banking_accounts_mn, agent_banking_disbursement_bn_bdt, internet_banking_txn_mn |
bbs_cpi.py | bbs.gov.bd/site/page/7bef52ab-... | BeautifulSoup + regex on page text; HTML table fallback | cpi_headline_pct, cpi_food_pct, cpi_nonfood_pct |
dse_market.py | dsebd.org/market_summary.php | Plain HTTP first; REQUIRES_PLAYWRIGHT flag for JS-rendered fallback | dsex_close, dse_turnover_bn_bdt, dse_market_cap_bn_bdt |
Graceful degradation: If a scraper raises, main.py catches it, marks status='failed', calls db.get_last_known() for each field from that source, and passes them as fallbacks to build_dashboard_record(). data_quality section becomes 'stale'.
requirements.txt
requests>=2.31.0
beautifulsoup4>=4.12.0
lxml>=5.0.0
pdfplumber>=0.10.0
tqdm>=4.66.0
python-dotenv>=1.0.0
playwright>=1.44.0 # optional, only if DSE needs JS rendering
Dashboard Page (dashboard.astro)
- Astro page with
<script is:inline>— no SSR data, pure client-side fetch - Fetches
Promise.all([/data/dashboard_latest.json, /data/history.json]) - Vanilla JS only — no Chart.js or Recharts
- SVG sparklines:
render_sparkline(values, 180, 52, color)→ polyline + final dot - SVG horizontal bars:
render_bar_chart(items, 520, 120, color)→ for NPL state vs private comparison - KPI cards: label + large value + unit + optional note
- Data quality badges: color-coded
complete(teal) /partial(amber) /stale(red) - Loading skeleton: CSS animated skeleton cards until fetch completes
- Error state: shown if fetch fails (data unavailable message)
4 sections: Monetary Indicators | Banking Sector Health | Digital Finance | Capital Markets
Each section: KPI grid + sparklines from history.json. Banking section adds NPL bar chart.
CSS: append ~100 lines to public/styles.css using existing design tokens (--bg-surface, --accent, --indigo, --border, var(--radius), Inter + Outfit fonts).
Deployment (monthly flow)
# 1. Run pipeline
cd /home/kitahara/Desktop/Projects/dashboard-pipeline
python main.py --period 2026-05
# 2. Commit + deploy portfolio
cd /home/kitahara/Desktop/Projects/portfolio/portfolio-astro
git add public/data/
git commit -m "chore: dashboard data — May 2026"
git push && npx wrangler deploy
CSP connect-src 'self' already covers same-origin /data/*.json — no _headers change needed.
Add to public/_headers: Cache-Control: public, max-age=3600 for /data/*.json.
First-Run Calibration (manual, before automating)
- BB Indicators column indices — GET the page, view source, verify which column index holds forex / exchange rate / policy rate. Mark as
# VERIFY_COLUMN_INDEXin code. - SBS PDF structure — Download one manually, run
pdfplumberinteractively on it, inspectpage.extract_text()output for NPL/CAR/ADR tables. Tune regex patterns. - PSR PDF structure — Same as above for MFS / agent banking tables.
- DSE JS check —
requests.get(DSE_MARKET_URL)and check ifDSEXnumeric value appears inresp.text. If not, setREQUIRES_PLAYWRIGHT = Trueindse_market.py. - BBS CPI structure — Check if values are in HTML table or
<script>JS object. - DSE units — Confirm whether turnover/market cap are published in crore BDT (÷10 for billions) or already in billions.
Verification
# Dry run (scrapes but writes nothing)
python main.py --period 2026-04 --dry-run
# Full run without portfolio copy (inspect output/ first)
python main.py --period 2026-04 --skip-copy
cat output/dashboard_latest.json | python -m json.tool
# Copy and test portfolio locally
python main.py --period 2026-04
cd ../portfolio/portfolio-astro && npm run dev
# Visit localhost:4321/dashboard — verify KPIs, sparklines, quality badges
# Rename public/data/dashboard_latest.json → verify error state shows
# Deploy
npx wrangler deploy
# Visit sakib8275.com/dashboard
Key Function Signatures
| Function | Module | Signature |
|---|---|---|
setup_database | db.py | () -> None |
get_last_known | db.py | `(conn, source_name, field) -> float |
upsert_raw_scrape | db.py | (period_iso, source_name, scraped_at, raw_json, status) -> None |
scrape | scrapers/*.py | (period_iso, session) -> dict |
normalize_period | transform.py | (period_arg) -> tuple[str, str] |
build_dashboard_record | transform.py | (period_iso, period_label, scraped_data, fallbacks, statuses) -> dict |
write_json_files | export.py | (period_iso, record) -> tuple[Path, Path] |
update_history | export.py | (period_iso, record) -> Path |
copy_to_portfolio | export.py | (files: list[Path]) -> None |
render_sparkline | dashboard.astro JS | (values, width, height, color) -> SVG string |
render_bar_chart | dashboard.astro JS | (items, width, height, color) -> SVG string |