Roadmap

Database Administrator (DBA)

The professional who installs, configures, secures, tunes, and maintains database management systems. Ensures data is available, performant, secure, and recoverable while supporting the developers and applications that depend on the database.

OPTIMISTIC 18–24 monthsREALISTIC 2–3 years

Stage 00

Computer & IT Fundamentals

Databases run on operating systems and hardware. Disk I/O, memory, CPU scheduling, and storage architecture directly affect database performance.

Hardware Relevant to Databases

Storage — HDD (spinning, high latency) vs SSD (low latency, higher throughput) vs NVMe (extremely low latency) — storage performance is the primary database performance bottleneck
IOPS and throughput — understanding database I/O patterns (random vs sequential)
RAID — RAID 10 preferred for database workloads (performance + redundancy); RAID 5/6 suitable for backup storage
Memory — ECC RAM, memory channels, NUMA (Non-Uniform Memory Access) — large database buffer pools require substantial RAM
CPU — cores vs clock speed; database workloads are often I/O-bound but complex queries are CPU-bound
Network — bandwidth and latency between database server, application server, and backup storage

Operating Systems

Linux — the dominant OS for production databases (Oracle, PostgreSQL, MySQL, MariaDB)
Full Linux administration depth (see System Administrator path Stage 2)
File system selection — ext4, XFS (preferred for databases), ZFS
I/O scheduler — deadline/mq-deadline preferred for database workloads
Transparent Huge Pages — should be disabled for Oracle and MySQL
Swap configuration — swap tuning for database workloads (swappiness)
ulimit settings — open files, processes — must be increased for database processes
SELinux/AppArmor — configuring database process contexts
Windows Server — required for SQL Server; see System Administrator path Stage 1

Storage Concepts

Block storage — SAN, local disks — database data files live here
File storage — NFS — shared storage for Oracle RAC, backup targets
Object storage — S3, Azure Blob — backup archives, cold data
Storage volumes — LVM on Linux for flexible volume management
I/O patterns — OLTP (many small random reads/writes) vs OLAP (sequential large reads)

Resources

Professor Messer CompTIA A+ (free YouTube)
storage vendor documentation
Linux administration materials

Stage 01

SQL — Deep Mastery

SQL is the primary language every DBA works in daily. Mastery of query writing, optimization, and database objects is the core competency.

SQL Fundamentals

Data types — INTEGER, BIGINT, DECIMAL/NUMERIC, FLOAT, CHAR, VARCHAR, TEXT, BLOB, DATE, TIMESTAMP, BOOLEAN, JSON, ARRAY, UUID
NULL semantics — three-valued logic; NULL comparisons; IS NULL / IS NOT NULL; COALESCE; NULLIF

Data Retrieval — SELECT

Basic SELECT — columns, aliases, DISTINCT, TOP/LIMIT/FETCH FIRST
WHERE clause — comparison operators, BETWEEN, IN, LIKE (%, _), IS NULL, EXISTS
ORDER BY — ASC/DESC, multiple columns, NULLs FIRST/LAST
GROUP BY — aggregate functions: COUNT, SUM, AVG, MIN, MAX, COUNT(DISTINCT)
HAVING — filtering after GROUP BY (vs WHERE before grouping)
CASE expressions — simple CASE and searched CASE; inline conditional logic

JOINs — Critical Skill

INNER JOIN — only matching rows from both tables
LEFT OUTER JOIN — all rows from left, matching from right (NULL when no match)
RIGHT OUTER JOIN — all rows from right, matching from left
FULL OUTER JOIN — all rows from both, NULL where no match
CROSS JOIN — Cartesian product; every row combined with every row
SELF JOIN — joining a table to itself
Multi-table joins — chaining multiple JOIN clauses
Join performance — index usage in joins, nested loop vs hash vs merge join strategies

Subqueries

Scalar subqueries — returning single value; used in SELECT or WHERE
Row subqueries — returning single row
Table subqueries — returning multiple rows (IN, NOT IN, EXISTS, NOT EXISTS)
Correlated subqueries — referencing outer query; re-executed for each outer row; performance implications
CTEs (Common Table Expressions) — WITH clause; named temporary result sets; recursive CTEs

Window Functions

ROW_NUMBER() — unique sequential number per partition
RANK() and DENSE_RANK() — ranking with gap (RANK) or without (DENSE_RANK)
NTILE(n) — divide into n buckets
LAG() and LEAD() — access previous/next row values
FIRST_VALUE() and LAST_VALUE() — first/last value in window frame
SUM(), AVG(), COUNT() OVER — running totals, rolling averages
PARTITION BY — dividing window function computation by group
ORDER BY in OVER — defining ordering within window
Frame specification — ROWS BETWEEN, RANGE BETWEEN — unbounded preceding, current row, n following

Set Operations

UNION / UNION ALL — combining result sets; UNION removes duplicates, UNION ALL does not
INTERSECT — rows in both result sets
EXCEPT / MINUS — rows in first but not second

Data Modification

INSERT — single row, multi-row, INSERT...SELECT
UPDATE — with WHERE clause; updating with JOIN; importance of WHERE (missing = full table update)
DELETE — with WHERE clause; DELETE vs TRUNCATE vs DROP
TRUNCATE — fast delete of all rows; not logged per-row; resets identity/sequences
UPSERT — INSERT ... ON CONFLICT (PostgreSQL), MERGE (SQL Server, Oracle), INSERT ... ON DUPLICATE KEY UPDATE (MySQL)
MERGE — synchronizing tables; matched/not matched by source/target

DDL (Data Definition Language)

CREATE TABLE — column definitions, data types, constraints
Constraints — PRIMARY KEY, FOREIGN KEY, UNIQUE, NOT NULL, CHECK, DEFAULT
ALTER TABLE — adding/modifying/dropping columns and constraints; impact on live tables
Indexes — CREATE INDEX, DROP INDEX; unique indexes; partial indexes (PostgreSQL); filtered indexes (SQL Server)
Views — CREATE VIEW; updatable vs read-only views; with CHECK OPTION
Materialized Views — precomputed result sets; refresh strategies (complete, fast/incremental)
Sequences — auto-increment alternatives; NEXTVAL, CURRVAL
Synonyms/Aliases — simplifying access to tables across schemas or databases

Stored Procedures, Functions, and Triggers

Stored procedures — reusable SQL blocks; input/output parameters; transaction control
User-defined functions (UDFs) — scalar functions (return single value), table-valued functions (return table)
Triggers — BEFORE/AFTER INSERT/UPDATE/DELETE; ROW/STATEMENT level; use sparingly (hidden complexity)
PL/SQL (Oracle) — procedural extension: variables, conditionals (IF/ELSIF/ELSE), loops (LOOP/FOR/WHILE), cursors, exception handling
T-SQL (SQL Server) — BEGIN/END blocks, variables, IF/ELSE, WHILE, TRY/CATCH, cursors
PL/pgSQL (PostgreSQL) — similar to PL/SQL; DO blocks for anonymous code

Transactions and Concurrency

ACID properties — Atomicity, Consistency, Isolation, Durability
Transaction control — BEGIN/START TRANSACTION, COMMIT, ROLLBACK, SAVEPOINT
Isolation levels:
READ UNCOMMITTED — dirty reads possible; almost never appropriate
READ COMMITTED — default in most databases; no dirty reads, phantom reads possible
REPEATABLE READ — no dirty or non-repeatable reads; phantom reads possible in some DBs
SERIALIZABLE — strictest; no anomalies; highest blocking
Concurrency problems:
Dirty read — reading uncommitted data
Non-repeatable read — row changed between reads in same transaction
Phantom read — new rows appear between reads in same transaction
Lost update — two transactions overwriting each other
Locking — shared (read), exclusive (write), intent locks; row, page, table lock granularity
Deadlocks — detection and resolution; minimizing via consistent lock order and short transactions
MVCC (Multi-Version Concurrency Control) — PostgreSQL, Oracle; readers don't block writers
Optimistic vs pessimistic locking — application-level vs database-level conflict management

Query Analysis

Execution plans — EXPLAIN (PostgreSQL/MySQL), EXPLAIN PLAN (Oracle), execution plan display in SSMS (SQL Server)
Plan components — table scan, index scan, index seek, hash join, nested loop join, sort, filter
Estimated vs actual row counts — large discrepancies indicate stale statistics
Reading costs — understanding relative cost units
Identifying performance problems from plans — full scans on large tables, sort operations, hash joins on large datasets without statistics

Resources

W3Schools SQL (free)
SQLZoo (free interactive exercises)
PostgreSQL tutorial (free)
leetcode SQL problems (free)
Mode Analytics SQL Tutorial (free)
"Learning SQL" by Alan Beaulieu (book)

Stage 02

Database Administration Fundamentals

Core DBA responsibilities apply across all DBMS platforms. Master these concepts before specializing in a specific platform.

Database Architecture Concepts

Instance vs database — process(es) + memory vs stored data files
Memory structures:
Buffer pool / buffer cache — caching data pages in memory; hit rate monitoring
Log buffer — staging area for write-ahead log entries before disk flush
Work memory / sort area — memory for sort operations, hash joins
Connection memory — per-connection overhead; connection pooling reduces this
Background processes — checkpoint, log writer, archiver, listener, connection handler
Physical files:
Data files — actual table and index data
Redo/transaction log — WAL (Write-Ahead Log); enables crash recovery and replication
Control file / system catalog — database metadata
Archive log — retained WAL for point-in-time recovery

Backup and Recovery

Recovery models / log modes:
Full recovery (SQL Server) / Archive log mode (Oracle/PostgreSQL) — complete recovery possible; transaction logs retained
Simple recovery / No archive — no point-in-time recovery; can only restore to last backup
Backup types:
Full backup — complete database copy
Differential backup (SQL Server) — changes since last full backup
Incremental backup (Oracle/PostgreSQL) — changes since last backup of any type
Transaction log backup — captures log records for point-in-time recovery
Copy-only backup — full backup without affecting backup chain
Physical vs logical backup:
Physical — binary copy of data files (faster for large databases); RMAN (Oracle), pg_basebackup (PostgreSQL)
Logical — SQL statements to recreate objects and data; expdp/impdp (Oracle), pg_dump (PostgreSQL), mysqldump (MySQL)
Recovery scenarios:
Complete recovery — restore to current state; all transactions recovered
Point-in-time recovery (PITR) — restore to specific timestamp or SCN/LSN
Tablespace/schema recovery — restoring subset of database
Lost data file recovery — restoring single data file
Backup testing — restore to test environment, verify data integrity, document RTO
3-2-1 rule for databases — 3 copies, 2 media types, 1 offsite/cloud
RPO/RTO — determining backup frequency and retention from business requirements

High Availability

Replication vs clustering:
Replication — copy of data on standby; lag possible; manual or automated failover
Clustering — shared storage, multiple nodes; automatic failover, minimal lag
Log shipping — simplest HA; apply archived logs to standby; manual failover; minutes of lag
Streaming replication — ship log records as they are generated; near real-time standby
Synchronous vs asynchronous replication:
Synchronous — commit only acknowledged when standby confirms receipt; no data loss; some latency
Asynchronous — commit acknowledged immediately; standby may lag; potential data loss
Read replicas — distributing read workload across multiple nodes
Automatic failover — detecting primary failure and promoting standby automatically

Performance Tuning — Concepts

Wait events / wait statistics — the database telling you what it is waiting for
I/O wait — disk is the bottleneck
CPU wait — processing is the bottleneck
Lock wait — contention on shared resources
Network wait — client is not reading results fast enough
Indexing strategy:
When to index — high selectivity columns, frequently queried columns, JOIN keys, ORDER BY columns
When not to index — low selectivity columns, tables with heavy write workloads, small tables
Index types — B-tree (default, good for equality/range), hash (equality only), GIN (full text, JSON), GiST, BRIN (large sequential tables)
Index maintenance — fragmentation, fill factor, rebuild vs reorganize
Covering indexes — including all columns needed by query to avoid table lookup
Composite indexes — column order matters; leading column rule
Statistics — how the query optimizer makes decisions; staleness causes poor plans; updating stats
Query optimization — rewriting for better plans; avoiding functions on indexed columns in WHERE; avoiding leading wildcards (%text); appropriate join hints

Security

Authentication — OS authentication, password-based, Kerberos, certificate-based
Roles and privileges — database-level, schema-level, object-level; role inheritance
Least privilege — application accounts should only have SELECT/INSERT/UPDATE/DELETE on required tables, not DBA
Service accounts — dedicated accounts for applications; never using DBA accounts for applications
Audit logging — logging authentication, DDL changes, sensitive data access
Transparent Data Encryption (TDE) — encrypting data files at rest
Data masking — masking sensitive data in non-production environments
Column-level encryption — encrypting specific sensitive columns
Row-level security — restricting which rows users can see

Monitoring

Baseline establishment — what is normal CPU, I/O, wait events, query times
Alert thresholds — disk space (80%/90%/95%), buffer cache hit rate, long-running queries, blocking
Tools — vendor native monitoring (Oracle OEM, SSMS, pgAdmin) + third-party (Datadog, SolarWinds DPA, Percona Monitoring)
Log monitoring — error log, slow query log, audit log
Key metrics:
Buffer cache hit rate — percentage of reads satisfied from memory (target > 95%)
Disk I/O — average read/write latency, IOPS
Connection count — near connection limit indicates connection pool issue
Long-running transactions — blocking other sessions
Blocking chains — queries waiting behind locks
Deadlock frequency — indicates application-level contention

Resources

"Database Administration: The Complete Guide to Practices and Procedures" (book)
vendor documentation
db-fiddle.com (free SQL sandbox)
SQLServerCentral (free)
Percona blogs (free)

Stage 03

SQL Server Administration

Microsoft SQL Server is the dominant enterprise RDBMS in Windows-based organizations. Deep SQL Server skills are required in the majority of DBA postings.

SQL Server Architecture

SQL Server services — SQL Server Engine, SQL Server Agent, SQL Server Browser, SQL Server Reporting Services (SSRS), Analysis Services (SSAS), Integration Services (SSIS)
SQL Server Management Studio (SSMS) — primary management GUI
SQL Server Agent — job scheduling, alerts, operators, proxy accounts
T-SQL — Microsoft's SQL extension; variables, control flow, error handling, cursors

Instance Configuration

SQL Server installation — instances (default vs named), edition differences (Express, Developer, Standard, Enterprise)
sp_configure — server-level configuration: max server memory, max degree of parallelism (MAXDOP), cost threshold for parallelism, remote connections
Memory configuration — max server memory (leave ~10% for OS); min server memory
MAXDOP — limiting parallelism; guideline: equal to number of physical cores per NUMA node, max 8
Cost threshold for parallelism — minimum cost before parallel plan considered; default 5 is often too low
Tempdb configuration — one file per CPU core (up to 8); pre-grow to avoid autogrowth

Database Configuration

Recovery models — Full, Simple, Bulk-Logged
Autogrowth settings — configure fixed size growth, not percentage; monitor and pre-grow proactively
Database files — data (.mdf, .ndf) and log (.ldf) files; separating onto different spindles/volumes
Filegroups — organizing data files; placing specific tables/indexes on specific filegroups
Compatibility level — controls query optimizer behavior; set to match SQL Server version

Backup and Recovery — SQL Server

Full backup — BACKUP DATABASE ... TO DISK
Differential backup — BACKUP DATABASE ... WITH DIFFERENTIAL
Transaction log backup — BACKUP LOG ... TO DISK (requires Full recovery model)
Backup compression — WITH COMPRESSION (reduces backup size ~70%)
Backup verification — RESTORE VERIFYONLY
Restore sequence — Full → Differential → Log backups in sequence WITH NORECOVERY, last WITH RECOVERY
Point-in-time restore — RESTORE LOG ... WITH STOPAT = 'datetime'
SQL Server Backup Automation — SQL Server Maintenance Plans, Ola Hallengren's scripts (free, industry standard)

High Availability — SQL Server

Always On Availability Groups (AG):
Primary and up to 8 secondary replicas
Synchronous and asynchronous replicas
Automatic failover with synchronous secondary
Read-scale — read replicas for reporting
Listener — virtual name for connection string (clients don't need to know which node is primary)
Windows Server Failover Cluster (WSFC) — underlying infrastructure for AGs
Always On Failover Cluster Instances (FCI):
Single SQL Server instance with shared storage
WSFC manages failover
Shared storage (SAN or S2D) replicated; no data loss
Database Mirroring — deprecated; replaced by AGs
Log Shipping — simple asynchronous standby; manual failover; separate from AGs
Azure Site Recovery / Azure SQL — cloud DR options

Performance Tuning — SQL Server

Wait statistics — sys.dm_os_wait_stats; top waits indicate bottleneck type
PAGEIOLATCH_SH/EX — buffer pool I/O wait
WRITELOG — log write wait (log disk performance)
LCK_M_* — lock contention
CXPACKET — parallel query coordination (tune MAXDOP)
RESOURCE_SEMAPHORE — memory pressure during query sort/hash operations
Dynamic Management Views (DMVs):
sys.dm_exec_requests — currently executing queries with wait type and duration
sys.dm_exec_query_stats — aggregated query stats including CPU, reads, elapsed time
sys.dm_exec_sql_text — SQL text for query handles
sys.dm_exec_query_plan — execution plan for query handles
sys.dm_db_index_usage_stats — index usage (seeks, scans, lookups, updates)
sys.dm_db_index_physical_stats — index fragmentation
sys.dm_os_memory_cache_counters — memory usage breakdown
Missing index DMVs — sys.dm_db_missing_index_details, sys.dm_db_missing_index_groups
Index maintenance:
ALTER INDEX REBUILD — full rebuild; locks table in Standard edition; Online in Enterprise
ALTER INDEX REORGANIZE — online defragmentation; less aggressive
Ola Hallengren IndexOptimize — automated adaptive rebuild/reorganize
Fill factor — leaving space in leaf pages for INSERT/UPDATE without page splits
Query Store — capturing query plan history; identifying plan regressions; forcing specific plans
Extended Events — lightweight event tracing replacing SQL Profiler; capturing specific events
Execution plans — graphical plan display in SSMS; Plan Explorer (SentryOne); missing indexes, expensive operators

SQL Server Security

Authentication modes — Windows Authentication (preferred), Mixed Mode (SQL + Windows)
Logins vs users — logins at server level, users at database level; orphaned users
Roles — server roles (sysadmin, serveradmin, securityadmin, dbcreator, bulkadmin), database roles (db_owner, db_datareader, db_datawriter, db_ddladmin)
Contained databases — users authenticated at database level, not server
Row-Level Security (RLS) — security policy filtering rows per user
Dynamic Data Masking — masking sensitive column data for non-privileged users
TDE — database file encryption; certificate stored in master database
Always Encrypted — client-side encryption; database server never sees plaintext
SQL Server Audit — tracking security events; server and database audit specifications

Resources

Ola Hallengren scripts (free, ola.hallengren.com)
Brent Ozar's SQLskills resources (free)
Redgate SQL Monitor/Toolbelt
SSMS documentation (free)

Stage 04

PostgreSQL Administration

PostgreSQL is the fastest-growing open-source RDBMS and is rapidly becoming the default choice for new applications and cloud deployments.

PostgreSQL Architecture

Process model — one process per connection (postmaster forks backend processes)
Memory — shared_buffers (buffer pool), work_mem (per-operation sort/hash memory), maintenance_work_mem, wal_buffers
Physical files — data directory ($PGDATA), WAL files (pg_wal), configuration files (postgresql.conf, pg_hba.conf)
pg_hba.conf — client authentication configuration:
Format: TYPE DATABASE USER ADDRESS METHOD
Methods: trust, md5, scram-sha-256, peer, ident, reject
SCRAM-SHA-256 is current best practice (md5 is deprecated)

PostgreSQL Configuration

postgresql.conf — server configuration; requires reload (SELECT pg_reload_conf()) or restart
Key parameters:
shared_buffers — 25% of total RAM as starting point
effective_cache_size — estimate of OS cache available; affects planner cost calculations
work_mem — memory per sort/hash operation; set conservatively (multiple operations per query)
maintenance_work_mem — VACUUM, CREATE INDEX, ANALYZE memory
max_connections — connection limit; use PgBouncer connection pooler in production
wal_level — minimal, replica, logical (replica required for streaming replication)
max_wal_senders — number of WAL sender processes for replication
checkpoint_timeout / max_wal_size — checkpoint frequency and WAL accumulation
effective_io_concurrency — for parallel I/O operations; set to number of disks
random_page_cost — planner cost for random I/O; reduce for SSD (1.1–2.0)
enable_* parameters — controlling specific planner features for debugging

Backup and Recovery — PostgreSQL

pg_dump — logical backup of single database: pg_dump -Fc dbname > dbname.dump
pg_dumpall — logical backup of all databases including global objects
pg_restore — restoring from pg_dump custom format
pg_basebackup — physical backup of entire cluster: pg_basebackup -D /backup -Ft -Xs -P
WAL archiving — archive_mode=on, archive_command; required for PITR
Point-in-time recovery — restore base backup + apply WAL to target time
Continuous Archiving and PITR — pg_basebackup + WAL archiving is the standard PITR approach
Barman (Backup and Recovery Manager) — enterprise backup tool for PostgreSQL

High Availability — PostgreSQL

Streaming replication — primary sends WAL to standby; synchronous or asynchronous
primary_conninfo — standby connection to primary
hot_standby — allowing read queries on standby
synchronous_standby_names — requiring synchronous confirmation
pg_stat_replication — monitoring replication lag
Patroni — HA and auto-failover for PostgreSQL using etcd/ZooKeeper/Consul for consensus
repmgr — replication management and automated failover
Pgpool-II — connection pooling, load balancing, and HA proxy for PostgreSQL
Logical replication — replicating specific tables or publications; supports upgrades

Performance Tuning — PostgreSQL

pg_stat_statements — aggregated query statistics; most important performance view
SELECT * FROM pg_stat_statements ORDER BY total_exec_time DESC LIMIT 20
EXPLAIN ANALYZE — execution plan with actual runtime statistics; always use ANALYZE for performance tuning
Wait events — pg_stat_activity.wait_event_type / wait_event
Lock — lock contention
IO — disk I/O wait
Client — waiting for client to read results
Index types — B-tree (default), GiST (geometric, range), GIN (full text, jsonb, arrays), BRIN (timestamp ranges on large tables), Hash (equality)
VACUUM and autovacuum:
PostgreSQL MVCC leaves dead tuple versions; VACUUM reclaims space
autovacuum — background process running VACUUM and ANALYZE automatically
autovacuum_vacuum_cost_delay — throttling autovacuum I/O impact
Table bloat — significant dead tuples; indicates autovacuum not keeping up
pg_stat_user_tables — n_dead_tup, n_live_tup, last_autovacuum
Connection pooling — PgBouncer (most common); transaction pooling vs session pooling vs statement pooling
ANALYZE — updating table statistics; should run after large data changes
pg_upgrade — major version upgrades; faster than dump/restore

PostgreSQL Security

pg_hba.conf — client authentication rules
Role system — roles are both users and groups; INHERIT attribute
Schema permissions — GRANT USAGE ON SCHEMA, GRANT SELECT ON ALL TABLES IN SCHEMA
Row-Level Security — CREATE POLICY; ALTER TABLE ENABLE ROW LEVEL SECURITY
SSL configuration — ssl=on, ssl_cert_file, ssl_key_file, ssl_ca_file
pgaudit — detailed audit logging extension

Resources

PostgreSQL documentation (postgresql.org, free, with excellent official docs)
Crunchy Data blog (free)
Percona PostgreSQL resources (free)
EDB documentation (free)
pganalyze blog (free)

Stage 05

Oracle Database Administration

Oracle is the dominant RDBMS in large enterprise, financial services, and government. Oracle DBAs command the highest salaries in the field.

Oracle Architecture

SGA (System Global Area) — shared memory: DB Buffer Cache, Shared Pool (Library Cache + Data Dictionary Cache), Redo Log Buffer, Large Pool, Java Pool, Streams Pool
PGA (Program Global Area) — per-session memory: sort area, hash join area, session information
Background processes — DBWR (Database Writer), LGWR (Log Writer), CKPT (Checkpoint), SMON (System Monitor), PMON (Process Monitor), ARCn (Archiver), MMON (Manageability Monitor)
Oracle Multitenant — Container Database (CDB) and Pluggable Databases (PDB); consolidation architecture
Physical files — datafiles, control files, redo log files, archive log files

Oracle Instance Management

STARTUP and SHUTDOWN — STARTUP NOMOUNT (instance, no database), MOUNT (control file read, no datafiles), OPEN (fully open)
Initialization parameter file — PFILE (text, spfile.ora) and SPFILE (binary, server parameter file)
Alter system — dynamic parameter changes without restart
Database links — connecting to remote Oracle databases

Oracle Backup and Recovery (RMAN)

RMAN (Recovery Manager) — Oracle's built-in backup and recovery tool
RMAN backup syntax:
BACKUP DATABASE — full backup
BACKUP DATABASE PLUS ARCHIVELOG — database + archive logs
BACKUP INCREMENTAL LEVEL 0 / LEVEL 1 — base and incremental backups
BACKUP TABLESPACE — single tablespace backup
BACKUP ARCHIVELOG ALL — backup all archive logs
DELETE OBSOLETE — remove backups per retention policy
RMAN catalog — centralized repository for backup metadata (optional but recommended)
Recovery scenarios:
RESTORE DATABASE — restore from backup
RECOVER DATABASE — apply archive logs and redo for complete recovery
RECOVER DATABASE UNTIL TIME — point-in-time recovery
RESTORE TABLESPACE — restoring single tablespace
Flashback technology:
Flashback Query — AS OF TIMESTAMP or AS OF SCN
Flashback Table — restoring table to past state without full recovery
Flashback Database — reverting entire database to past SCN
Undo Retention — undo_retention parameter controls flashback window

Oracle High Availability

Data Guard — standby database solutions:
Physical standby — redo log apply; identical block-for-block copy; MRP process
Logical standby — SQL Apply; transformed SQL applied; can be open for reads
Synchronous vs asynchronous — SYNC vs ASYNC redo transport
Active Data Guard — read-only open physical standby with redo apply continuing
Data Guard Broker — automated switchover/failover management
Oracle RAC (Real Application Clusters):
Multiple instances accessing shared storage simultaneously
Cache Fusion — cross-instance block transfer via private interconnect
Services — workload distribution across nodes
CRS (Cluster Ready Services) — cluster management layer
Oracle GoldenGate — heterogeneous replication; near-real-time change data capture

Oracle Performance Tuning

AWR (Automatic Workload Repository) — periodic snapshots of performance statistics
AWR report — comprehensive performance analysis between two snapshots
ADDM (Automatic Database Diagnostic Monitor) — automated analysis with recommendations
ASH (Active Session History) — per-second active session sampling
Wait events — v$session_wait, v$system_event; top wait events indicate bottlenecks
db file sequential read — single block I/O (index lookup)
db file scattered read — multi-block I/O (full table scan)
log file sync — transaction commit waiting for LGWR
buffer busy waits — contention on buffer cache blocks
row cache lock — dictionary cache contention
library cache lock — shared pool contention
v$ views (dynamic performance views):
v$session — current sessions, wait events, SQL IDs
v$sql — shared SQL cursor statistics
v$sqlplan — execution plans for cached SQL
v$datafile — datafile I/O statistics
v$event_name — wait event descriptions
Automatic SQL Tuning — SQL Tuning Advisor; automated analysis and recommendations
Indexes — B-tree, bitmap (data warehouse), function-based, partitioned
Partitioning — range, list, hash, composite; partition pruning for performance
Hints — /*+ FULL(t) */ /*+ INDEX(t idx_name) */ /*+ USE_NL(t1 t2) */ — overriding optimizer decisions

Oracle Security

Oracle Database Vault — preventing privileged user access to application data
Oracle Label Security — row-level security using sensitivity labels
Oracle Audit Vault — centralized audit data management
Transparent Data Encryption (TDE) — tablespace and column encryption
Unified Auditing — Oracle 12c+ audit framework

Resources

Oracle documentation (docs.oracle.com, free)
Oracle Learning Library (free)
Burleson Consulting Oracle tips (free)
"Oracle Database 12c DBA Handbook" (book)

Stage 06

Cloud Database Administration

Cloud managed databases have changed DBA work: less OS-level administration, more optimization, cost management, and migration expertise.

AWS Database Services

Amazon RDS — managed relational databases: MySQL, PostgreSQL, MariaDB, Oracle, SQL Server
Multi-AZ — synchronous standby in different AZ; automatic failover; ~1–2 min failover
Read replicas — asynchronous replication for read scaling; can promote
Parameter groups — server configuration; some require reboot
Automated backups — point-in-time recovery within retention window (1–35 days)
Snapshots — manual full backups; retained until deleted
Maintenance windows — patching schedule
Enhanced Monitoring — OS-level metrics visible to DBA
Performance Insights — query-level performance analysis
Amazon Aurora — MySQL/PostgreSQL compatible; Cluster storage; 6-way replication
Aurora Serverless v2 — auto-scaling capacity; per-second billing
Aurora Global Database — cross-region replication with <1 second RPO
Aurora Parallel Query — offloads analytics queries to storage layer
Backtrack — rewind the database without restoring backup (Aurora MySQL)
Amazon DynamoDB — serverless NoSQL; key-value and document; single-digit millisecond latency
Amazon ElastiCache — Redis and Memcached; in-memory caching
Amazon Redshift — columnar data warehouse; MPP architecture

Azure Database Services

Azure SQL Database — PaaS SQL Server; serverless and provisioned tiers; hyperscale
Azure SQL Managed Instance — near-complete SQL Server compatibility; VNet-integrated
Azure Database for PostgreSQL — flexible server and single server; built-in HA
Azure Database for MySQL — MySQL-compatible managed service
Azure Cosmos DB — globally distributed multi-model database; multiple APIs

GCP Database Services

Cloud SQL — managed MySQL, PostgreSQL, SQL Server
Cloud Spanner — globally distributed relational database; strong consistency; SQL
BigQuery — serverless data warehouse; columnar; petabyte scale

Cloud DBA Responsibilities

Migration — assessing on-premises databases for cloud migration; AWS DMS, Azure Database Migration Service, Oracle GoldenGate
Sizing — choosing instance types, storage types, IOPS allocation for managed services
Cost optimization — right-sizing instances, reserved instances, serverless for variable workloads
Backup configuration — backup windows, retention periods, cross-region backup
HA configuration — Multi-AZ, read replicas, global clusters
Parameter tuning — working within managed service constraints
Security — IAM authentication, VPC placement, encryption, secrets management
Monitoring — CloudWatch/Azure Monitor/Stackdriver + Performance Insights

Resources

AWS database documentation (free)
AWS re:Invent database talks (YouTube, free)
Azure database documentation (free)
GCP database documentation (free)

Stage 07

Scripting & Automation for DBAs

DBAs who automate routine tasks multiply their effectiveness. Scripting is increasingly expected.

T-SQL / PL/SQL / PL/pgSQL Scripting

Automated backup scripts — SQL Agent jobs, scheduled maintenance
Index maintenance scripts — adaptive rebuild/reorganize based on fragmentation
Statistics maintenance — updating statistics on a schedule
Blocking detection — alerting when sessions are blocked beyond threshold
Space monitoring — alerting when tablespace/database space reaches threshold
Data archiving — moving old data from hot tables to archive tables

PowerShell for SQL Server DBAs

SQLServer module — Invoke-Sqlcmd, Get-SqlInstance, Get-SqlDatabase
DbaTools (free) — comprehensive PowerShell module for SQL Server:
Copy-DbaDatabase, Backup-DbaDatabase, Restore-DbaDatabase
Get-DbaDbFragmentation, Invoke-DbaDbShrink (carefully)
Test-DbaLastBackup — testing backup recoverability
Sync-DbaLoginPermission — synchronizing logins across HA replicas
Copy-DbaLogin — migrating logins with SID matching

Python for DBAs

psycopg2 — PostgreSQL driver for Python
pyodbc / pymssql — SQL Server drivers
cx_Oracle / python-oracledb — Oracle drivers
SQLAlchemy — SQL toolkit and ORM; database-agnostic connection handling
Pandas — loading query results into DataFrames for reporting
Practical scripts:
Database size reporting with trend analysis
Automated backup verification (restore to test environment)
Index fragmentation reporting across all databases
Deadlock detection and logging
Replication lag monitoring and alerting

Bash for Linux DBAs

Database startup/shutdown scripts
Backup automation — cron + pg_dump/RMAN scripts
Log rotation for database logs
Disk space monitoring with alerting

Resources

DbaTools documentation (free)
PowerShell SQL Server documentation (free)
Percona blog Python/DBA content (free)

Stage 08

Hands-On Practice & Portfolio

Lab Setup

PostgreSQL — free download, install locally or in Docker; easiest to start learning
SQL Server Developer Edition — free for non-production use; full Enterprise features
Oracle Database Express Edition (XE) — free; limited resources but good for learning
MySQL / MariaDB — free; widely deployed in web applications
Docker — run multiple database instances simultaneously for comparison

Practice Activities

Load sample databases:
AdventureWorks (SQL Server) — Microsoft sample database
Northwind — classic relational sample database
Sakila (MySQL) — DVD rental sample database
dvdrental (PostgreSQL) — PostgreSQL equivalent of Sakila
HR, OE, SH schemas (Oracle) — installed by default
Write and optimize complex queries — window functions, CTEs, multi-table joins
Practice backup and restore — simulate database corruption and recovery
Performance tuning exercises — identify and fix slow queries with execution plans
HA setup — configure streaming replication in PostgreSQL, AlwaysOn in SQL Server
Automate routine tasks — backup scripts, index maintenance, monitoring alerts

What to Document on LabList

Schema design projects — ER diagrams and SQL for a database you designed
Performance tuning case studies — identifying a slow query, diagnosing with execution plan, applying fix, measuring improvement
Backup/recovery documentation — procedures for PITR recovery in your lab
Automation scripts — GitHub with DBA automation tools
Cert progression — SQL fundamentals → platform-specific cert (DP-300, OCP, PostgreSQL cert)

FAQ

Common questions

How long does it take to become a DBA?

18–24 months optimistic at 20–25 hours/week, 2–3 years realistic. SQL fluency comes quickly; the long tail is operational depth — backup and recovery testing, performance tuning, replication topologies, and the dozens of failure modes you only learn by surviving them. The fastest path is from sysadmin or developer roles with significant database exposure. Pure self-taught DBAs can break in but typically need a junior or analyst role first to develop production instincts.

Which certifications matter for DBAs?

Microsoft DP-300 for SQL Server. Oracle Certified Professional (OCP) for Oracle environments. AWS Database Specialty for cloud-heavy roles. PostgreSQL certifications are emerging as adoption surges. MongoDB Certified DBA for NoSQL environments. The cert that matters most is whichever DBMS your target employer runs — generic database knowledge is less valuable than platform-specific depth.

Do I need a CS degree?

No. Database administration is meritocratic — performance tuning skill, backup/recovery experience, and disaster recovery readiness are demonstrable without a degree. What you do need: comfort with one or more major DBMS platforms at depth, query optimization beyond surface-level indexing, and operational maturity (you've actually restored a database, not just read about it). Cloud database adoption increased total demand for practitioners who can manage both traditional and managed platforms.

What separates a hired DBA from one who doesn't?

Restored a database in production. The career differentiator is proven operational maturity: backup completion logs are easy; restoration evidence is harder. Show that you've done it. Other differentiators: query plan analysis depth, replication and HA topology design, and at least one cloud DBMS migration story. BLS median DBA salary is $104,620 (May 2024). Specialization in PostgreSQL, MongoDB, or Snowflake commands premiums in 2026 because adoption is moving faster than the talent pool.

Database Administrator (DBA)

Common questions

Related roles