What is SQL?
SQL (Structured Query Language) is the standard language for interacting with relational databases. Whether you're using MySQL, PostgreSQL, Oracle, SQL Server, or H2, you'll use SQL to create, read, update, and delete data. As a Java developer, SQL is a fundamental skill you'll use throughout your career.
SQL is NOT a programming language - it's a declarative query language. You describe WHAT you want, not HOW to get it. The database engine figures out the optimal way to execute your query.
Why Java Developers Need SQL
- Direct Database Access: JDBC requires raw SQL queries
- JPA/Hibernate Understanding: ORMs generate SQL - you need to understand it for debugging and optimization
- Performance Tuning: Slow applications often have slow queries
- Data Migration: Scripts for schema changes and data fixes
- Reporting: Complex queries for analytics and business reports
Database Fundamentals
Relational Database Concepts
// Key concepts in relational databases:
// TABLE (Relation) - Stores data in rows and columns
┌─────────────────────────────────────────────────────┐
│ employees │
├────────┬──────────┬────────────┬───────────┬───────┤
│ id │ name │ email │ dept_id │ salary│
├────────┼──────────┼────────────┼───────────┼───────┤
│ 1 │ Alice │ alice@... │ 1 │ 75000 │
│ 2 │ Bob │ bob@... │ 1 │ 80000 │
│ 3 │ Charlie │ charlie@...│ 2 │ 65000 │
└────────┴──────────┴────────────┴───────────┴───────┘
// ROW (Tuple/Record) - Single entry, represents one entity
// COLUMN (Attribute/Field) - Single property of all entries
// PRIMARY KEY - Unique identifier for each row
// FOREIGN KEY - Reference to primary key in another table
// INDEX - Data structure for faster lookups
// SCHEMA - Collection of database objects (tables, views, etc.)
// DATABASE - Container for schemas and data
SQL Categories
| Category | Full Name | Commands | Purpose |
|---|---|---|---|
| DDL | Data Definition Language | CREATE, ALTER, DROP, TRUNCATE | Define/modify structure |
| DML | Data Manipulation Language | SELECT, INSERT, UPDATE, DELETE | Work with data |
| DCL | Data Control Language | GRANT, REVOKE | Permissions |
| TCL | Transaction Control Language | COMMIT, ROLLBACK, SAVEPOINT | Transaction management |
DDL: Creating Database Structure
Creating Tables
-- Create a simple table
CREATE TABLE departments (
id BIGINT PRIMARY KEY AUTO_INCREMENT,
name VARCHAR(100) NOT NULL,
location VARCHAR(200),
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- Create table with foreign key
CREATE TABLE employees (
id BIGINT PRIMARY KEY AUTO_INCREMENT,
first_name VARCHAR(50) NOT NULL,
last_name VARCHAR(50) NOT NULL,
email VARCHAR(100) UNIQUE NOT NULL,
salary DECIMAL(10,2) CHECK (salary > 0),
hire_date DATE NOT NULL,
department_id BIGINT,
manager_id BIGINT,
active BOOLEAN DEFAULT TRUE,
-- Foreign key constraints
CONSTRAINT fk_department
FOREIGN KEY (department_id)
REFERENCES departments(id)
ON DELETE SET NULL
ON UPDATE CASCADE,
CONSTRAINT fk_manager
FOREIGN KEY (manager_id)
REFERENCES employees(id)
);
-- Create index for faster lookups
CREATE INDEX idx_emp_email ON employees(email);
CREATE INDEX idx_emp_dept ON employees(department_id);
-- Composite index for common query patterns
CREATE INDEX idx_emp_name ON employees(last_name, first_name);
Common Data Types
-- Numeric Types
INTEGER / INT -- 4 bytes, -2B to +2B
BIGINT -- 8 bytes, very large numbers
SMALLINT -- 2 bytes, -32K to +32K
DECIMAL(10,2) -- Exact numeric, 10 digits, 2 decimal places
FLOAT / DOUBLE -- Approximate numeric (avoid for money!)
-- String Types
CHAR(10) -- Fixed length, padded with spaces
VARCHAR(255) -- Variable length, up to 255 chars
TEXT -- Large text (CLOB in some databases)
-- Date/Time Types
DATE -- Date only: '2024-01-15'
TIME -- Time only: '14:30:00'
TIMESTAMP -- Date + Time: '2024-01-15 14:30:00'
DATETIME -- Similar to TIMESTAMP (MySQL specific)
-- Other Types
BOOLEAN -- TRUE/FALSE (or 1/0 in some databases)
BLOB -- Binary Large Object (files, images)
JSON -- JSON data (PostgreSQL, MySQL 5.7+)
UUID -- Universally Unique Identifier
Modifying Tables
-- Add a column
ALTER TABLE employees
ADD COLUMN phone VARCHAR(20);
-- Modify column type
ALTER TABLE employees
MODIFY COLUMN phone VARCHAR(30);
-- Add constraint
ALTER TABLE employees
ADD CONSTRAINT uk_phone UNIQUE(phone);
-- Drop column
ALTER TABLE employees
DROP COLUMN phone;
-- Rename table
ALTER TABLE employees RENAME TO staff;
-- Drop table (CAREFUL!)
DROP TABLE IF EXISTS temp_data;
-- Truncate (delete all data, keep structure)
TRUNCATE TABLE logs;
DML: CRUD Operations
INSERT - Creating Data
-- Insert single row
INSERT INTO departments (name, location)
VALUES ('Engineering', 'Building A');
-- Insert multiple rows
INSERT INTO departments (name, location) VALUES
('Sales', 'Building B'),
('HR', 'Building A'),
('Marketing', 'Building C');
-- Insert with all columns (not recommended - fragile)
INSERT INTO employees
VALUES (1, 'John', 'Doe', 'john@example.com', 75000, '2024-01-15', 1, NULL, TRUE);
-- Insert from SELECT
INSERT INTO employee_archive (id, name, email)
SELECT id, CONCAT(first_name, ' ', last_name), email
FROM employees
WHERE active = FALSE;
-- Insert with ON DUPLICATE KEY (MySQL)
INSERT INTO employees (id, first_name, last_name, email, salary, hire_date)
VALUES (1, 'John', 'Doe', 'john@example.com', 80000, '2024-01-15')
ON DUPLICATE KEY UPDATE salary = VALUES(salary);
-- Upsert with ON CONFLICT (PostgreSQL)
INSERT INTO employees (id, first_name, last_name, email, salary, hire_date)
VALUES (1, 'John', 'Doe', 'john@example.com', 80000, '2024-01-15')
ON CONFLICT (id) DO UPDATE SET salary = EXCLUDED.salary;
SELECT - Reading Data (The Most Important!)
-- Basic SELECT
SELECT * FROM employees;
-- Select specific columns
SELECT first_name, last_name, email FROM employees;
-- Column aliases
SELECT
first_name AS "First Name",
last_name AS "Last Name",
salary * 12 AS annual_salary
FROM employees;
-- WHERE clause - filtering
SELECT * FROM employees
WHERE department_id = 1
AND salary > 50000
AND active = TRUE;
-- Comparison operators
WHERE salary = 50000 -- Equal
WHERE salary != 50000 -- Not equal (or <>)
WHERE salary > 50000 -- Greater than
WHERE salary >= 50000 -- Greater or equal
WHERE salary < 50000 -- Less than
WHERE salary <= 50000 -- Less or equal
-- BETWEEN (inclusive)
WHERE salary BETWEEN 50000 AND 80000
-- IN - match any in list
WHERE department_id IN (1, 2, 3)
-- LIKE - pattern matching
WHERE last_name LIKE 'S%' -- Starts with 'S'
WHERE last_name LIKE '%son' -- Ends with 'son'
WHERE last_name LIKE '%mi%' -- Contains 'mi'
WHERE last_name LIKE '_ohn' -- 4 chars, ends with 'ohn'
-- IS NULL / IS NOT NULL
WHERE manager_id IS NULL
WHERE manager_id IS NOT NULL
-- Logical operators
WHERE (department_id = 1 OR department_id = 2)
AND salary > 50000
AND NOT (first_name = 'John')
Sorting and Limiting
-- ORDER BY - sorting results
SELECT * FROM employees
ORDER BY last_name ASC; -- Ascending (default)
SELECT * FROM employees
ORDER BY salary DESC; -- Descending
-- Multiple sort columns
SELECT * FROM employees
ORDER BY department_id ASC, salary DESC;
-- LIMIT - restrict number of rows (MySQL/PostgreSQL)
SELECT * FROM employees
ORDER BY salary DESC
LIMIT 10;
-- LIMIT with OFFSET - pagination
SELECT * FROM employees
ORDER BY id
LIMIT 10 OFFSET 20; -- Skip first 20, get next 10
-- TOP (SQL Server)
SELECT TOP 10 * FROM employees
ORDER BY salary DESC;
-- FETCH (Standard SQL / Oracle)
SELECT * FROM employees
ORDER BY salary DESC
FETCH FIRST 10 ROWS ONLY;
UPDATE - Modifying Data
-- Update single row
UPDATE employees
SET salary = 85000
WHERE id = 1;
-- Update multiple columns
UPDATE employees
SET
salary = 85000,
department_id = 2,
manager_id = 5
WHERE id = 1;
-- Update with calculation
UPDATE employees
SET salary = salary * 1.10 -- 10% raise
WHERE department_id = 1;
-- Update all rows (CAREFUL!)
UPDATE employees
SET active = FALSE; -- No WHERE = updates ALL rows!
-- Update with subquery
UPDATE employees
SET salary = (
SELECT AVG(salary) FROM employees WHERE department_id = 1
)
WHERE id = 1;
-- Update with JOIN (MySQL)
UPDATE employees e
JOIN departments d ON e.department_id = d.id
SET e.salary = e.salary * 1.05
WHERE d.name = 'Engineering';
DELETE - Removing Data
-- Delete single row
DELETE FROM employees
WHERE id = 1;
-- Delete multiple rows
DELETE FROM employees
WHERE active = FALSE
AND hire_date < '2020-01-01';
-- Delete all rows (CAREFUL!)
DELETE FROM employees; -- Deletes everything!
-- TRUNCATE is faster for deleting all rows
TRUNCATE TABLE employees; -- Cannot be rolled back in most DBs!
-- Delete with subquery
DELETE FROM employees
WHERE department_id IN (
SELECT id FROM departments WHERE location = 'Building X'
);
-- Soft delete pattern (recommended)
UPDATE employees
SET active = FALSE, deleted_at = NOW()
WHERE id = 1;
UPDATE and DELETE without WHERE affect ALL rows. In production, always:
- Start a transaction before UPDATE/DELETE
- Run a SELECT first with the same WHERE clause
- Verify the affected row count
- COMMIT only if correct, ROLLBACK if wrong
JOINs: Combining Tables
JOINs are one of the most powerful SQL features. They let you combine data from multiple tables in a single query.
Types of JOINs Visualized
-- Sample data for examples:
-- employees table
┌────┬───────┬─────────────┐
│ id │ name │ dept_id │
├────┼───────┼─────────────┤
│ 1 │ Alice │ 1 │
│ 2 │ Bob │ 1 │
│ 3 │ Carol │ 2 │
│ 4 │ Dave │ NULL │ -- No department assigned
└────┴───────┴─────────────┘
-- departments table
┌────┬─────────────┐
│ id │ name │
├────┼─────────────┤
│ 1 │ Engineering │
│ 2 │ Sales │
│ 3 │ Marketing │ -- No employees
└────┴─────────────┘
INNER JOIN
-- INNER JOIN: Only matching rows from both tables
SELECT e.name, d.name AS department
FROM employees e
INNER JOIN departments d ON e.dept_id = d.id;
-- Result: (Dave excluded - no match, Marketing excluded - no match)
┌───────┬─────────────┐
│ name │ department │
├───────┼─────────────┤
│ Alice │ Engineering │
│ Bob │ Engineering │
│ Carol │ Sales │
└───────┴─────────────┘
-- Visual representation:
employees ∩ departments
┌───────────┐
│ INNER │
│ JOIN │
└───────────┘
LEFT JOIN (LEFT OUTER JOIN)
-- LEFT JOIN: All from left table + matching from right
SELECT e.name, d.name AS department
FROM employees e
LEFT JOIN departments d ON e.dept_id = d.id;
-- Result: (All employees, NULL for Dave's department)
┌───────┬─────────────┐
│ name │ department │
├───────┼─────────────┤
│ Alice │ Engineering │
│ Bob │ Engineering │
│ Carol │ Sales │
│ Dave │ NULL │ -- No matching department
└───────┴─────────────┘
-- Find employees without departments:
SELECT e.name
FROM employees e
LEFT JOIN departments d ON e.dept_id = d.id
WHERE d.id IS NULL;
RIGHT JOIN (RIGHT OUTER JOIN)
-- RIGHT JOIN: All from right table + matching from left
SELECT e.name, d.name AS department
FROM employees e
RIGHT JOIN departments d ON e.dept_id = d.id;
-- Result: (All departments, NULL for Marketing's employees)
┌───────┬─────────────┐
│ name │ department │
├───────┼─────────────┤
│ Alice │ Engineering │
│ Bob │ Engineering │
│ Carol │ Sales │
│ NULL │ Marketing │ -- No employees in this dept
└───────┴─────────────┘
-- Find departments with no employees:
SELECT d.name
FROM employees e
RIGHT JOIN departments d ON e.dept_id = d.id
WHERE e.id IS NULL;
FULL OUTER JOIN
-- FULL OUTER JOIN: All rows from both tables
SELECT e.name, d.name AS department
FROM employees e
FULL OUTER JOIN departments d ON e.dept_id = d.id;
-- Result:
┌───────┬─────────────┐
│ name │ department │
├───────┼─────────────┤
│ Alice │ Engineering │
│ Bob │ Engineering │
│ Carol │ Sales │
│ Dave │ NULL │ -- No department
│ NULL │ Marketing │ -- No employees
└───────┴─────────────┘
-- MySQL doesn't support FULL OUTER JOIN directly, use UNION:
SELECT e.name, d.name AS department
FROM employees e LEFT JOIN departments d ON e.dept_id = d.id
UNION
SELECT e.name, d.name AS department
FROM employees e RIGHT JOIN departments d ON e.dept_id = d.id;
Self JOIN
-- Self JOIN: Join table with itself
-- Example: Find employees and their managers
SELECT
e.first_name AS employee,
m.first_name AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.id;
-- Result:
┌──────────┬─────────┐
│ employee │ manager │
├──────────┼─────────┤
│ Alice │ NULL │ -- No manager (CEO?)
│ Bob │ Alice │
│ Carol │ Alice │
│ Dave │ Bob │
└──────────┴─────────┘
Multiple JOINs
-- Joining multiple tables
SELECT
e.first_name,
e.last_name,
d.name AS department,
m.first_name AS manager_name,
p.title AS project
FROM employees e
LEFT JOIN departments d ON e.department_id = d.id
LEFT JOIN employees m ON e.manager_id = m.id
LEFT JOIN employee_projects ep ON e.id = ep.employee_id
LEFT JOIN projects p ON ep.project_id = p.id
WHERE e.active = TRUE
ORDER BY d.name, e.last_name;
Aggregate Functions and GROUP BY
Common Aggregate Functions
-- COUNT - number of rows
SELECT COUNT(*) FROM employees; -- All rows
SELECT COUNT(manager_id) FROM employees; -- Non-NULL values
SELECT COUNT(DISTINCT department_id) FROM employees; -- Unique values
-- SUM - total of numeric column
SELECT SUM(salary) FROM employees;
-- AVG - average
SELECT AVG(salary) FROM employees;
-- MIN / MAX
SELECT MIN(salary), MAX(salary) FROM employees;
-- String aggregates
SELECT GROUP_CONCAT(first_name) FROM employees; -- MySQL
SELECT STRING_AGG(first_name, ', ') FROM employees; -- PostgreSQL
GROUP BY
-- Group employees by department
SELECT
department_id,
COUNT(*) AS employee_count,
AVG(salary) AS avg_salary,
SUM(salary) AS total_payroll
FROM employees
GROUP BY department_id;
-- Group by multiple columns
SELECT
department_id,
YEAR(hire_date) AS hire_year,
COUNT(*) AS hires
FROM employees
GROUP BY department_id, YEAR(hire_date)
ORDER BY department_id, hire_year;
-- With JOIN
SELECT
d.name AS department,
COUNT(e.id) AS employee_count,
ROUND(AVG(e.salary), 2) AS avg_salary
FROM departments d
LEFT JOIN employees e ON d.id = e.department_id
GROUP BY d.id, d.name
ORDER BY employee_count DESC;
HAVING - Filtering Groups
-- HAVING filters groups (like WHERE filters rows)
SELECT
department_id,
COUNT(*) AS employee_count,
AVG(salary) AS avg_salary
FROM employees
GROUP BY department_id
HAVING COUNT(*) >= 5; -- Only departments with 5+ employees
-- WHERE vs HAVING
SELECT
department_id,
AVG(salary) AS avg_salary
FROM employees
WHERE active = TRUE -- Filters rows BEFORE grouping
GROUP BY department_id
HAVING AVG(salary) > 60000 -- Filters groups AFTER grouping
ORDER BY avg_salary DESC;
SQL clauses are processed in this order:
- FROM - Tables and JOINs
- WHERE - Filter rows
- GROUP BY - Create groups
- HAVING - Filter groups
- SELECT - Choose columns
- ORDER BY - Sort results
- LIMIT - Restrict rows
Subqueries
A subquery is a query nested inside another query. They're powerful but can impact performance if overused.
Subquery in WHERE
-- Find employees who earn more than average
SELECT * FROM employees
WHERE salary > (
SELECT AVG(salary) FROM employees
);
-- Find employees in the Engineering department
SELECT * FROM employees
WHERE department_id = (
SELECT id FROM departments WHERE name = 'Engineering'
);
-- Find employees in departments with more than 10 people
SELECT * FROM employees
WHERE department_id IN (
SELECT department_id
FROM employees
GROUP BY department_id
HAVING COUNT(*) > 10
);
Subquery in FROM (Derived Table)
-- Use subquery result as a table
SELECT
dept_stats.department_id,
dept_stats.avg_salary,
dept_stats.employee_count
FROM (
SELECT
department_id,
AVG(salary) AS avg_salary,
COUNT(*) AS employee_count
FROM employees
GROUP BY department_id
) AS dept_stats
WHERE dept_stats.employee_count > 5;
Subquery in SELECT
-- Scalar subquery (returns single value)
SELECT
first_name,
salary,
(SELECT AVG(salary) FROM employees) AS company_avg,
salary - (SELECT AVG(salary) FROM employees) AS diff_from_avg
FROM employees;
-- Correlated subquery (references outer query)
SELECT
e.first_name,
e.salary,
(SELECT AVG(salary)
FROM employees
WHERE department_id = e.department_id) AS dept_avg
FROM employees e;
EXISTS and NOT EXISTS
-- EXISTS - check if any rows exist
SELECT * FROM departments d
WHERE EXISTS (
SELECT 1 FROM employees e
WHERE e.department_id = d.id
);
-- NOT EXISTS - find departments with no employees
SELECT * FROM departments d
WHERE NOT EXISTS (
SELECT 1 FROM employees e
WHERE e.department_id = d.id
);
-- EXISTS is often more efficient than IN for large datasets
Common Table Expressions (CTEs)
CTEs make complex queries more readable by allowing you to define named subqueries.
-- Basic CTE
WITH dept_stats AS (
SELECT
department_id,
AVG(salary) AS avg_salary,
COUNT(*) AS emp_count
FROM employees
GROUP BY department_id
)
SELECT
d.name,
ds.avg_salary,
ds.emp_count
FROM departments d
JOIN dept_stats ds ON d.id = ds.department_id
WHERE ds.emp_count > 5;
-- Multiple CTEs
WITH
active_employees AS (
SELECT * FROM employees WHERE active = TRUE
),
dept_salaries AS (
SELECT
department_id,
SUM(salary) AS total_salary
FROM active_employees
GROUP BY department_id
)
SELECT
d.name,
ds.total_salary
FROM departments d
JOIN dept_salaries ds ON d.id = ds.department_id;
-- Recursive CTE (for hierarchical data)
WITH RECURSIVE org_chart AS (
-- Base case: top-level managers (no manager)
SELECT id, first_name, manager_id, 1 AS level
FROM employees
WHERE manager_id IS NULL
UNION ALL
-- Recursive case: employees with managers
SELECT e.id, e.first_name, e.manager_id, oc.level + 1
FROM employees e
JOIN org_chart oc ON e.manager_id = oc.id
)
SELECT * FROM org_chart
ORDER BY level, first_name;
Window Functions (Advanced)
Window functions perform calculations across rows related to the current row, without collapsing the result set like GROUP BY.
-- ROW_NUMBER - unique sequential number
SELECT
first_name,
department_id,
salary,
ROW_NUMBER() OVER (ORDER BY salary DESC) AS salary_rank
FROM employees;
-- RANK and DENSE_RANK - handle ties differently
SELECT
first_name,
salary,
RANK() OVER (ORDER BY salary DESC) AS rank, -- Gaps after ties
DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rank -- No gaps
FROM employees;
-- PARTITION BY - window functions per group
SELECT
first_name,
department_id,
salary,
RANK() OVER (
PARTITION BY department_id
ORDER BY salary DESC
) AS dept_salary_rank
FROM employees;
-- Running totals and averages
SELECT
hire_date,
first_name,
salary,
SUM(salary) OVER (ORDER BY hire_date) AS running_total,
AVG(salary) OVER (ORDER BY hire_date) AS running_avg,
COUNT(*) OVER (ORDER BY hire_date) AS cumulative_hires
FROM employees;
-- LAG and LEAD - access previous/next rows
SELECT
first_name,
hire_date,
salary,
LAG(salary) OVER (ORDER BY hire_date) AS prev_hire_salary,
LEAD(salary) OVER (ORDER BY hire_date) AS next_hire_salary
FROM employees;
-- FIRST_VALUE and LAST_VALUE
SELECT
first_name,
department_id,
salary,
FIRST_VALUE(first_name) OVER (
PARTITION BY department_id
ORDER BY salary DESC
) AS highest_paid_in_dept
FROM employees;
-- NTILE - divide rows into buckets
SELECT
first_name,
salary,
NTILE(4) OVER (ORDER BY salary) AS salary_quartile
FROM employees;
Transactions
Transactions ensure data integrity by grouping multiple operations into a single atomic unit.
-- Start a transaction
START TRANSACTION; -- or BEGIN
-- Multiple operations
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
-- If everything is OK
COMMIT;
-- If something went wrong
ROLLBACK;
-- Savepoints for partial rollback
START TRANSACTION;
INSERT INTO orders (customer_id, total) VALUES (1, 100);
SAVEPOINT order_created;
INSERT INTO order_items (order_id, product_id) VALUES (1, 1);
-- Oops, wrong product!
ROLLBACK TO order_created;
INSERT INTO order_items (order_id, product_id) VALUES (1, 2);
COMMIT;
Performance Tips
Query Optimization Best Practices
- Use indexes on columns in WHERE, JOIN, and ORDER BY clauses
- Avoid SELECT * - select only needed columns
- Use EXPLAIN to analyze query execution plans
- Avoid functions on indexed columns in WHERE clauses
- Prefer JOINs over subqueries when possible
- Use LIMIT for pagination instead of fetching all rows
- Batch INSERT/UPDATE operations instead of one-by-one
Common Anti-Patterns
-- BAD: Function on indexed column prevents index use
WHERE YEAR(hire_date) = 2024
-- GOOD: Range query can use index
WHERE hire_date >= '2024-01-01'
AND hire_date < '2025-01-01'
-- BAD: Wildcard at start prevents index use
WHERE name LIKE '%smith'
-- GOOD: Wildcard at end can use index
WHERE name LIKE 'smith%'
-- BAD: Implicit type conversion
WHERE phone = 1234567890 -- phone is VARCHAR
-- GOOD: Match types
WHERE phone = '1234567890'
Summary
- DDL: CREATE, ALTER, DROP - define structure
- DML: SELECT, INSERT, UPDATE, DELETE - work with data
- JOINs: INNER, LEFT, RIGHT, FULL - combine tables
- Aggregates: COUNT, SUM, AVG, MIN, MAX with GROUP BY
- Subqueries: Queries within queries for complex logic
- CTEs: Named subqueries for readability
- Window Functions: Calculations across related rows
- Transactions: ACID compliance for data integrity