SQL

SQL Interview Questions

178 questions with answers · Updated April 2026

Every question ranked by how often it actually gets asked. Compiled from popular open-source interview repositories and real interview patterns. The ones at the top are the ones you will almost certainly face.

bar_chartQuick stats
Most tested topicSQL Basics
Come up regularly67 questions
To feel prepared~80 questions
Review top 50 in~2 hours

Source: open-source interview repositories

search

Most Asked

15 most asked questions
1

What is SQL and what is it used for

SQL stands for Structured Query Language, and it's the standard language for interacting with relational databases. You use it to create and modify database structures, insert and update data, and query records from tables. Almost every relational database system, including PostgreSQL, MySQL, Oracle, and SQL Server, uses SQL as its primary interface. It's been around since the 1970s and remains one of the most essential skills in data engineering and backend development.

2

Describe the difference between SQL and NoSQL databases.

SQL databases are relational, meaning data is stored in structured tables with predefined schemas, and relationships between tables are enforced through keys. NoSQL databases cover a broad category including document stores like MongoDB, key-value stores like Redis, and columnar databases like Cassandra, and they typically have flexible or schema-less designs. SQL databases excel at complex queries, joins, and transactional consistency (ACID compliance), while NoSQL databases are often chosen for horizontal scalability, high write throughput, or unstructured data. The choice really depends on your data model and access patterns, not which one is universally better.

3

What are the different types of SQL commands

SQL commands are grouped into five main categories. DDL (Data Definition Language) handles structure, covering commands like CREATE, ALTER, and DROP. DML (Data Manipulation Language) deals with data itself using INSERT, UPDATE, DELETE, and SELECT. DCL (Data Control Language) manages permissions with GRANT and REVOKE. TCL (Transaction Control Language) handles transactions using COMMIT, ROLLBACK, and SAVEPOINT. Knowing which category a command falls into helps you understand its scope and whether it's reversible.

sql
-- DDL
CREATE TABLE users (id INT PRIMARY KEY, name VARCHAR(100));

-- DML
INSERT INTO users (id, name) VALUES (1, 'Alice');

-- DCL
GRANT SELECT ON users TO readonly_user;

-- TCL
BEGIN;
UPDATE users SET name = 'Bob' WHERE id = 1;
ROLLBACK;
4

Explain the purpose of the SELECT statement.

The SELECT statement is how you retrieve data from one or more tables in a database. You specify which columns you want, which table to pull from, and optionally filter, sort, join, or aggregate the results. It's the most frequently used SQL command and supports a huge range of operations through clauses like WHERE, JOIN, GROUP BY, ORDER BY, and LIMIT.

sql
SELECT name, email
FROM users
WHERE created_at > '2024-01-01'
ORDER BY name ASC
LIMIT 10;
5

What is the difference between WHERE and HAVING clauses

WHERE filters rows before any grouping or aggregation happens, while HAVING filters after grouping. If you want to filter on a raw column value, use WHERE. If you want to filter on the result of an aggregate function like COUNT or SUM, you have to use HAVING because WHERE runs before those calculations exist.

sql
SELECT department, COUNT(*) AS headcount
FROM employees
WHERE status = 'active'          -- filters rows before grouping
GROUP BY department
HAVING COUNT(*) > 10;            -- filters groups after aggregation
6

What is a primary key in a database

A primary key is a column or combination of columns that uniquely identifies each row in a table. It enforces two constraints automatically: uniqueness and NOT NULL, so no two rows can share the same primary key value and no row can have a null one. Every table should have a primary key because it gives you a reliable way to reference individual records, and it's the foundation for setting up foreign key relationships with other tables.

sql
CREATE TABLE orders (
  order_id INT PRIMARY KEY,
  customer_id INT,
  total DECIMAL(10,2)
);
7

Explain what a foreign key is and how it is used.

A foreign key is a column in one table that references the primary key of another table, creating a link between the two. It enforces referential integrity, meaning you can't insert a value in the foreign key column that doesn't exist in the referenced table, and depending on the setup, the database can restrict or cascade deletes and updates. For example, an orders table might have a customer_id column that references the id column in a customers table, ensuring every order belongs to a real customer.

sql
CREATE TABLE orders (
  order_id INT PRIMARY KEY,
  customer_id INT,
  FOREIGN KEY (customer_id) REFERENCES customers(id)
    ON DELETE CASCADE
);
8

How can you prevent SQL injections

The most effective defense against SQL injection is using parameterized queries or prepared statements, which separate SQL code from user-supplied data so the database never interprets input as executable SQL. You should also validate and sanitize inputs, apply least-privilege principles so database accounts only have the permissions they actually need, and avoid building queries through string concatenation. ORMs can help because they use parameterized queries under the hood, but you still need to be careful with raw query escapes in those tools.

sql
-- Vulnerable (never do this)
query = "SELECT * FROM users WHERE email = '" + userInput + "'";

-- Safe: parameterized query (PostgreSQL example)
SELECT * FROM users WHERE email = $1;  -- $1 is bound separately

-- Safe: prepared statement syntax (MySQL/general)
SELECT * FROM users WHERE email = ?;
9

What is normalization? Explain with examples.

Normalization is the process of organizing a database to reduce data redundancy and improve data integrity by applying a series of rules called normal forms. The most common ones are 1NF (no repeating groups, atomic values), 2NF (no partial dependencies on a composite key), and 3NF (no transitive dependencies, meaning non-key columns shouldn't depend on other non-key columns). For example, storing a customer's city and zip code in an orders table violates 3NF because city depends on zip code, not on the order itself. The fix is to move that data into a separate addresses or locations table.

sql
-- Violates 3NF: city depends on zip_code, not order_id
CREATE TABLE orders (
  order_id INT PRIMARY KEY,
  customer_id INT,
  zip_code VARCHAR(10),
  city VARCHAR(100)  -- transitive dependency
);

-- Better: separate the location data
CREATE TABLE zip_codes (
  zip_code VARCHAR(10) PRIMARY KEY,
  city VARCHAR(100)
);
10

Describe the concept of denormalization and when you would use it.

Denormalization is the deliberate process of introducing redundancy into a database by merging tables or duplicating data, with the goal of improving read performance. Normalized schemas require many joins, which can become expensive at scale, so denormalization trades write complexity and storage for faster query execution. A common example is storing a precomputed order total directly on the orders table instead of summing line items every time. You'd typically reach for denormalization in read-heavy systems, data warehouses, or reporting layers where query speed matters more than strict data consistency.

11

Explain the purpose of the GROUP BY clause.

GROUP BY collapses rows that share the same value in one or more columns into a single row, allowing you to apply aggregate functions like COUNT, SUM, AVG, MIN, and MAX to each group. Without GROUP BY, aggregate functions operate on the entire result set as one group. A classic use case is calculating total sales per region or counting users per signup date.

sql
SELECT region, SUM(sales_amount) AS total_sales
FROM sales
GROUP BY region
ORDER BY total_sales DESC;
12

What is a subquery, and when would you use one

A subquery is a query nested inside another query, and it can appear in the SELECT, FROM, or WHERE clause depending on what you need. You'd use one when you need to filter based on a derived value, reference an aggregated result, or break a complex problem into steps. They're useful but can hurt performance if they run row-by-row (correlated subqueries), so it's worth considering whether a JOIN or CTE would be cleaner or faster.

sql
-- Find employees earning above the average salary
SELECT name, salary
FROM employees
WHERE salary > (
  SELECT AVG(salary) FROM employees
);
13

What is subquery

A subquery is a SELECT statement nested inside another SQL statement. It can be used in the WHERE clause to filter based on computed values, in the FROM clause to treat a derived result set as a temporary table, or in the SELECT clause to compute a value per row. Correlated subqueries reference the outer query and execute once per row, which can be slow on large datasets, so it's often worth replacing them with JOINs or CTEs when performance is a concern.

sql
-- Subquery in FROM clause (derived table)
SELECT dept, avg_salary
FROM (
  SELECT department AS dept, AVG(salary) AS avg_salary
  FROM employees
  GROUP BY department
) AS dept_averages
WHERE avg_salary > 70000;
14

What is a foreign key of a database

A foreign key is a column in one table that points to the primary key in another table, establishing a relationship between them. The database uses this constraint to enforce referential integrity, so you can't have an orphaned record that references a non-existent row. For instance, if an orders table has a customer_id foreign key referencing the customers table, the database will reject any insert where that customer_id doesn't exist in customers. You can also configure behavior for cascading deletes or updates so child records are automatically handled when the parent changes.

sql
CREATE TABLE orders (
  order_id INT PRIMARY KEY,
  customer_id INT NOT NULL,
  FOREIGN KEY (customer_id) REFERENCES customers(id)
    ON DELETE RESTRICT
    ON UPDATE CASCADE
);
15

What is database normalization

Database normalization is the process of structuring a relational database to minimize redundancy and dependency issues by organizing data into well-defined tables. It works through a series of progressive rules called normal forms, with 1NF, 2NF, and 3NF being the most commonly applied in practice. 1NF requires atomic column values and no repeating groups. 2NF requires that every non-key column is fully dependent on the entire primary key, which matters when you have composite keys. 3NF removes transitive dependencies so non-key columns only depend on the primary key, not on other non-key columns. The end result is a schema that's easier to maintain and less prone to update, insert, or delete anomalies.

terminal

SQL Basics

54 questions

What is SQL and what is it used for

SQL stands for Structured Query Language, and it's the standard language for interacting with relational databases. You use it to create and modify database structures, insert and update data, and query records from tables. Almost every relational database system, including PostgreSQL, MySQL, Oracle, and SQL Server, uses SQL as its primary interface. It's been around since the 1970s and remains one of the most essential skills in data engineering and backend development.

Describe the difference between SQL and NoSQL databases.

SQL databases are relational, meaning data is stored in structured tables with predefined schemas, and relationships between tables are enforced through keys. NoSQL databases cover a broad category including document stores like MongoDB, key-value stores like Redis, and columnar databases like Cassandra, and they typically have flexible or schema-less designs. SQL databases excel at complex queries, joins, and transactional consistency (ACID compliance), while NoSQL databases are often chosen for horizontal scalability, high write throughput, or unstructured data. The choice really depends on your data model and access patterns, not which one is universally better.

What are the different types of SQL commands

SQL commands are grouped into five main categories. DDL (Data Definition Language) handles structure, covering commands like CREATE, ALTER, and DROP. DML (Data Manipulation Language) deals with data itself using INSERT, UPDATE, DELETE, and SELECT. DCL (Data Control Language) manages permissions with GRANT and REVOKE. TCL (Transaction Control Language) handles transactions using COMMIT, ROLLBACK, and SAVEPOINT. Knowing which category a command falls into helps you understand its scope and whether it's reversible.

sql
-- DDL
CREATE TABLE users (id INT PRIMARY KEY, name VARCHAR(100));

-- DML
INSERT INTO users (id, name) VALUES (1, 'Alice');

-- DCL
GRANT SELECT ON users TO readonly_user;

-- TCL
BEGIN;
UPDATE users SET name = 'Bob' WHERE id = 1;
ROLLBACK;
See all 54 SQL Basics questionsarrow_forward
rocket_launch

Advanced SQL

24 questions

What is Failover clustering overview

Failover clustering is a high-availability solution where multiple servers (nodes) share access to the same database storage, and if the active node fails, another node automatically takes over with minimal downtime. SQL Server Failover Cluster Instances (FCI) present a single virtual server name to clients, so applications reconnect transparently after a failover. The shared storage (usually a SAN or Storage Spaces Direct) holds the actual data files, and only one node actively owns the SQL Server instance at any given time. It protects against hardware and OS-level failures but not against data corruption, which is why it's often combined with Always On Availability Groups.

What is a Cursor

A cursor is a database object that lets you process a result set one row at a time, rather than operating on the whole set at once. You declare it with a SELECT statement, then open it, fetch rows in a loop, process each row, and finally close and deallocate it. Cursors are useful when you need to perform row-by-row operations that genuinely cannot be expressed as a set-based query, like complex procedural logic that depends on the previous row. The downside is that cursors are significantly slower than set-based operations and should be avoided or replaced with set-based alternatives whenever possible.

sql
DECLARE emp_cursor CURSOR FOR
    SELECT emp_id, salary FROM employees;

OPEN emp_cursor;
FETCH NEXT FROM emp_cursor INTO @emp_id, @salary;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- process each row here
    FETCH NEXT FROM emp_cursor INTO @emp_id, @salary;
END;

CLOSE emp_cursor;
DEALLOCATE emp_cursor;

When is the Explicit Cursor Used

An explicit cursor is used in PL/SQL when you need to manually control the fetch process for a query that returns multiple rows. You declare it, open it, fetch rows one at a time in a loop, and then close it. This is useful when you need fine-grained control, such as processing rows conditionally, fetching in bulk using BULK COLLECT, or managing multiple cursors at the same time. Implicit cursors handle single-row queries automatically, so explicit cursors are reserved for cases where that level of control matters.

sql
DECLARE
    CURSOR emp_cursor IS SELECT employee_id, name FROM employees WHERE department_id = 10;
    v_id employees.employee_id%TYPE;
    v_name employees.name%TYPE;
BEGIN
    OPEN emp_cursor;
    LOOP
        FETCH emp_cursor INTO v_id, v_name;
        EXIT WHEN emp_cursor%NOTFOUND;
        DBMS_OUTPUT.PUT_LINE(v_id || ' - ' || v_name);
    END LOOP;
    CLOSE emp_cursor;
END;
See all 24 Advanced SQL questionsarrow_forward
trending_up

Query Optimization

16 questions

SQL Query to find second highest salary of Employee

The cleanest approach uses a subquery to exclude the maximum salary and then picks the max of what remains. Another common approach uses DENSE_RANK() which handles ties correctly, which is usually what interviewers actually want.

sql
-- Subquery approach
SELECT MAX(salary) AS second_highest
FROM employees
WHERE salary < (SELECT MAX(salary) FROM employees);

-- Window function approach (handles ties)
SELECT salary
FROM (
    SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS rnk
    FROM employees
) ranked
WHERE rnk = 2;

What is the command used to fetch first 5 characters of the string

You use the SUBSTRING (or LEFT) function to extract the first 5 characters from a string. Both work across most major databases, but LEFT is more concise when you always want characters from the start.

sql
SELECT SUBSTRING(column_name, 1, 5) FROM table_name;

-- Or equivalently
SELECT LEFT(column_name, 5) FROM table_name;

SQL Query to find Max Salary from each department.

The standard approach is to use GROUP BY on the department and aggregate salary with MAX(). You join back to the departments table if you want department names alongside the results.

sql
SELECT department_id, MAX(salary) AS max_salary
FROM employees
GROUP BY department_id
ORDER BY department_id;

-- With department name via join
SELECT d.department_name, MAX(e.salary) AS max_salary
FROM employees e
JOIN departments d ON e.department_id = d.department_id
GROUP BY d.department_name
ORDER BY d.department_name;
See all 16 Query Optimization questionsarrow_forward
key

Keys & Constraints

14 questions

What is a primary key in a database

A primary key is a column or combination of columns that uniquely identifies each row in a table. It enforces two constraints automatically: uniqueness and NOT NULL, so no two rows can share the same primary key value and no row can have a null one. Every table should have a primary key because it gives you a reliable way to reference individual records, and it's the foundation for setting up foreign key relationships with other tables.

sql
CREATE TABLE orders (
  order_id INT PRIMARY KEY,
  customer_id INT,
  total DECIMAL(10,2)
);

Explain what a foreign key is and how it is used.

A foreign key is a column in one table that references the primary key of another table, creating a link between the two. It enforces referential integrity, meaning you can't insert a value in the foreign key column that doesn't exist in the referenced table, and depending on the setup, the database can restrict or cascade deletes and updates. For example, an orders table might have a customer_id column that references the id column in a customers table, ensuring every order belongs to a real customer.

sql
CREATE TABLE orders (
  order_id INT PRIMARY KEY,
  customer_id INT,
  FOREIGN KEY (customer_id) REFERENCES customers(id)
    ON DELETE CASCADE
);

What is a foreign key of a database

A foreign key is a column in one table that points to the primary key in another table, establishing a relationship between them. The database uses this constraint to enforce referential integrity, so you can't have an orphaned record that references a non-existent row. For instance, if an orders table has a customer_id foreign key referencing the customers table, the database will reject any insert where that customer_id doesn't exist in customers. You can also configure behavior for cascading deletes or updates so child records are automatically handled when the parent changes.

sql
CREATE TABLE orders (
  order_id INT PRIMARY KEY,
  customer_id INT NOT NULL,
  FOREIGN KEY (customer_id) REFERENCES customers(id)
    ON DELETE RESTRICT
    ON UPDATE CASCADE
);
See all 14 Keys & Constraints questionsarrow_forward
schema

Normalization

10 questions

What is normalization? Explain with examples.

Normalization is the process of organizing a database to reduce data redundancy and improve data integrity by applying a series of rules called normal forms. The most common ones are 1NF (no repeating groups, atomic values), 2NF (no partial dependencies on a composite key), and 3NF (no transitive dependencies, meaning non-key columns shouldn't depend on other non-key columns). For example, storing a customer's city and zip code in an orders table violates 3NF because city depends on zip code, not on the order itself. The fix is to move that data into a separate addresses or locations table.

sql
-- Violates 3NF: city depends on zip_code, not order_id
CREATE TABLE orders (
  order_id INT PRIMARY KEY,
  customer_id INT,
  zip_code VARCHAR(10),
  city VARCHAR(100)  -- transitive dependency
);

-- Better: separate the location data
CREATE TABLE zip_codes (
  zip_code VARCHAR(10) PRIMARY KEY,
  city VARCHAR(100)
);

Describe the concept of denormalization and when you would use it.

Denormalization is the deliberate process of introducing redundancy into a database by merging tables or duplicating data, with the goal of improving read performance. Normalized schemas require many joins, which can become expensive at scale, so denormalization trades write complexity and storage for faster query execution. A common example is storing a precomputed order total directly on the orders table instead of summing line items every time. You'd typically reach for denormalization in read-heavy systems, data warehouses, or reporting layers where query speed matters more than strict data consistency.

What is database normalization

Database normalization is the process of structuring a relational database to minimize redundancy and dependency issues by organizing data into well-defined tables. It works through a series of progressive rules called normal forms, with 1NF, 2NF, and 3NF being the most commonly applied in practice. 1NF requires atomic column values and no repeating groups. 2NF requires that every non-key column is fully dependent on the entire primary key, which matters when you have composite keys. 3NF removes transitive dependencies so non-key columns only depend on the primary key, not on other non-key columns. The end result is a schema that's easier to maintain and less prone to update, insert, or delete anomalies.

See all 10 Normalization questionsarrow_forward
code_blocks

Stored Procedures

10 questions

What is a stored procedure

A stored procedure is a precompiled collection of SQL statements saved in the database that can be executed as a single unit. You call it by name, optionally passing parameters, and it runs the enclosed logic on the server side. Stored procedures reduce network traffic since you send one call instead of multiple queries, and the database caches the execution plan for better performance.

sql
CREATE PROCEDURE GetEmployeesByDept
  @DeptId INT
AS
BEGIN
  SELECT name, salary
  FROM employees
  WHERE dept_id = @DeptId
  ORDER BY salary DESC;
END;

Why we use Stored Procedure

Stored procedures improve performance because the execution plan is compiled and cached on first run, so subsequent calls skip the parsing and optimization steps. They reduce network traffic by sending a single procedure call instead of multiple SQL statements. They also improve security by allowing users to execute the procedure without direct access to the underlying tables, and they help prevent SQL injection when used with parameterized inputs.

How to create a Stored Procedure

You create a stored procedure using the CREATE PROCEDURE statement, followed by the procedure name, parameters, and the SQL logic inside a BEGIN/END block.

sql
CREATE PROCEDURE GetActiveUsers
  @MinAge INT = 18
AS
BEGIN
  SELECT id, name, email
  FROM users
  WHERE is_active = 1
    AND age >= @MinAge;
END;

-- Call it
EXEC GetActiveUsers @MinAge = 25;
See all 10 Stored Procedures questionsarrow_forward
join_inner

Joins

8 questions

What is a join in SQL Server

A join in SQL Server is a way to combine rows from two or more tables based on a related column between them. It lets you query data spread across multiple tables as if it were a single result set. SQL Server supports several types: INNER JOIN, LEFT/RIGHT OUTER JOIN, FULL OUTER JOIN, and CROSS JOIN, each controlling which rows are included when there is or is not a match.

sql
SELECT e.name, d.department_name
FROM employees e
INNER JOIN departments d ON e.department_id = d.id;

What is an inner join in SQL

An INNER JOIN returns only the rows where there is a matching value in both tables based on the join condition. If a row in the left table has no corresponding row in the right table, it is completely excluded from the result, and the same goes for the right table. It is the most commonly used join type and the default when you just write JOIN without specifying a type.

sql
SELECT e.name, d.department_name
FROM employees e
INNER JOIN departments d ON e.department_id = d.id;
-- Only employees with a valid department are returned

What is a right join in SQL Server

A RIGHT JOIN (or RIGHT OUTER JOIN) returns all rows from the right table and only the matching rows from the left table. Where there is no match in the left table, the result shows NULL for those columns. It is essentially the mirror image of a LEFT JOIN. In practice, most developers rewrite RIGHT JOINs as LEFT JOINs by swapping the table order because it tends to be easier to read, but functionally they are identical.

sql
SELECT e.name, d.department_name
FROM employees e
RIGHT JOIN departments d ON e.department_id = d.id;
-- All departments are returned, even those with no employees
-- e.name will be NULL for departments with no staff
See all 8 Joins questionsarrow_forward
build

DDL

7 questions

What is the difference between DELETE TABLE and TRUNCATE TABLE commands

DELETE removes rows one at a time, logs each deletion in the transaction log, can use a WHERE clause to target specific rows, and fires any DELETE triggers on the table. TRUNCATE removes all rows at once with minimal logging (it deallocates data pages), cannot use a WHERE clause, does not fire triggers, and resets identity counters. DELETE is slower but more flexible; TRUNCATE is faster but removes everything. Both can be rolled back inside an explicit transaction in SQL Server and PostgreSQL.

How to alter a table schema in SQL Server

ALTER TABLE is the DDL command used to modify an existing table's structure in SQL Server. You can add columns, drop columns, change data types, add or drop constraints, and rename columns. The syntax varies slightly depending on what you're doing, for example adding a column uses ADD, while modifying an existing column's type uses ALTER COLUMN.

sql
-- Add a new column
ALTER TABLE employees ADD department_id INT NULL;

-- Change a column's data type
ALTER TABLE employees ALTER COLUMN salary DECIMAL(12, 2);

-- Drop a column
ALTER TABLE employees DROP COLUMN old_column;

-- Add a constraint
ALTER TABLE employees ADD CONSTRAINT chk_salary CHECK (salary > 0);

How to fetch alternate records from a table

Fetching alternate rows relies on using the row's position, typically via ROW_NUMBER() or a rowid. You filter on even or odd row numbers depending on which set you want. The exact syntax differs slightly between databases, but the pattern is the same.

sql
-- Fetch odd-numbered rows (1st, 3rd, 5th...)
SELECT * FROM (
    SELECT *, ROW_NUMBER() OVER (ORDER BY employee_id) AS rn
    FROM employees
) ranked
WHERE rn % 2 = 1;

-- Fetch even-numbered rows (2nd, 4th, 6th...)
SELECT * FROM (
    SELECT *, ROW_NUMBER() OVER (ORDER BY employee_id) AS rn
    FROM employees
) ranked
WHERE rn % 2 = 0;
See all 7 DDL questionsarrow_forward
speed

Indexes & Performance

6 questions

What is an index

An index is a database object that speeds up data retrieval by creating a separate data structure that points to rows in a table, similar to an index at the back of a book. Without an index, SQL Server has to scan every row in a table to find matches, which gets very slow on large tables. Indexes store a sorted copy of one or more columns along with pointers to the actual rows, so the engine can jump directly to relevant data. The trade-off is that indexes consume disk space and slow down INSERT, UPDATE, and DELETE operations because the index must be maintained alongside the table.

sql
CREATE INDEX idx_employee_lastname ON employees (last_name);

-- With multiple columns (composite index)
CREATE INDEX idx_emp_dept ON employees (department_id, last_name);

What is the difference between Cluster and Non-Cluster Index

A clustered index determines the physical sort order of the data in the table itself, so a table can only have one clustered index. The leaf nodes of a clustered index are the actual data pages. A non-clustered index is a separate structure that stores a sorted copy of the indexed columns along with a pointer back to the actual row, and a table can have up to 999 non-clustered indexes in SQL Server. Lookups using a non-clustered index may require an extra step called a key lookup to fetch columns not included in the index, while a clustered index read goes straight to the data. Primary keys are clustered by default in SQL Server, but you can change that.

You have a composite index of three columns, and you only provide the value of two columns in WHERE clause of a select query? Will Index be used for this operation

It depends on which columns you provide. Composite indexes follow the leftmost prefix rule, meaning the index is used only if your WHERE clause includes the leading column(s) of the index. If your index is on (col_a, col_b, col_c) and you filter on col_a and col_b, the index will be used effectively. But if you skip col_a and only filter on col_b and col_c, the optimizer generally cannot use that index and will fall back to a full table scan. Always make sure your most selective and most commonly queried column is at the front of a composite index.

See all 6 Indexes & Performance questionsarrow_forward
sync_lock

Transactions

5 questions

What are different transaction levels in SQL SERVER

SQL Server supports five transaction isolation levels that control how transactions interact with each other. READ UNCOMMITTED allows dirty reads (seeing uncommitted changes from other transactions). READ COMMITTED (the default) only reads committed data. REPEATABLE READ prevents other transactions from modifying rows you have read. SERIALIZABLE is the strictest level, preventing phantom reads by locking entire ranges. SNAPSHOT uses row versioning to give each transaction a consistent view of the data without blocking other transactions.

What is ACID fundamental? What are transactions in SQL SERVER

ACID stands for Atomicity, Consistency, Isolation, and Durability. Atomicity means all operations in a transaction succeed or all are rolled back, there is no partial completion. Consistency ensures the database moves from one valid state to another, respecting all constraints and rules. Isolation means concurrent transactions do not see each other's intermediate state, controlled by isolation levels like READ COMMITTED or SERIALIZABLE. Durability guarantees that once a transaction is committed, the changes survive even a system crash, ensured by the transaction log. In SQL Server, a transaction is a unit of work wrapped in BEGIN TRANSACTION and ended with COMMIT or ROLLBACK.

sql
BEGIN TRANSACTION;
    UPDATE accounts SET balance = balance - 500 WHERE account_id = 1;
    UPDATE accounts SET balance = balance + 500 WHERE account_id = 2;

    IF @@ERROR <> 0
        ROLLBACK;
    ELSE
        COMMIT;

If locking is not implemented, what issues can occur

Without locking, three classic concurrency problems can occur. First, dirty reads, where one transaction reads uncommitted data from another transaction that later gets rolled back, so you end up with data that never officially existed. Second, non-repeatable reads, where you read a row, another transaction modifies it and commits, and when you read it again in the same transaction you get a different value. Third, phantom reads, where a query returns a different set of rows when executed twice in the same transaction because another transaction inserted or deleted rows between the two reads.

See all 5 Transactions questionsarrow_forward
account_tree

Subqueries & CTEs

4 questions

What is a subquery, and when would you use one

A subquery is a query nested inside another query, and it can appear in the SELECT, FROM, or WHERE clause depending on what you need. You'd use one when you need to filter based on a derived value, reference an aggregated result, or break a complex problem into steps. They're useful but can hurt performance if they run row-by-row (correlated subqueries), so it's worth considering whether a JOIN or CTE would be cleaner or faster.

sql
-- Find employees earning above the average salary
SELECT name, salary
FROM employees
WHERE salary > (
  SELECT AVG(salary) FROM employees
);

What is subquery

A subquery is a SELECT statement nested inside another SQL statement. It can be used in the WHERE clause to filter based on computed values, in the FROM clause to treat a derived result set as a temporary table, or in the SELECT clause to compute a value per row. Correlated subqueries reference the outer query and execute once per row, which can be slow on large datasets, so it's often worth replacing them with JOINs or CTEs when performance is a concern.

sql
-- Subquery in FROM clause (derived table)
SELECT dept, avg_salary
FROM (
  SELECT department AS dept, AVG(salary) AS avg_salary
  FROM employees
  GROUP BY department
) AS dept_averages
WHERE avg_salary > 70000;

Explain correlated query work

A correlated subquery references a column from the outer query, which means it gets re-executed once for every row the outer query processes. This is different from a regular subquery, which runs once and returns a fixed result set. Because of that row-by-row execution, correlated subqueries can be slow on large tables, but they're useful when you need to compare each row against an aggregate or condition from a related set of rows.

sql
-- Find employees who earn more than the average salary in their own department
SELECT e1.employee_id, e1.name, e1.salary, e1.department_id
FROM employees e1
WHERE e1.salary > (
    SELECT AVG(e2.salary)
    FROM employees e2
    WHERE e2.department_id = e1.department_id  -- references outer query
);
See all 4 Subqueries & CTEs questionsarrow_forward
functions

Aggregation

3 questions

Explain the purpose of the GROUP BY clause.

GROUP BY collapses rows that share the same value in one or more columns into a single row, allowing you to apply aggregate functions like COUNT, SUM, AVG, MIN, and MAX to each group. Without GROUP BY, aggregate functions operate on the entire result set as one group. A classic use case is calculating total sales per region or counting users per signup date.

sql
SELECT region, SUM(sales_amount) AS total_sales
FROM sales
GROUP BY region
ORDER BY total_sales DESC;

What are aggregate and scalar functions

Aggregate functions operate on a set of rows and return a single summary value. Common examples are SUM(), COUNT(), AVG(), MIN(), and MAX(). They are typically used with GROUP BY to compute values per group. Scalar functions, on the other hand, operate on a single value and return a single value. Examples include UPPER(), LEN(), ROUND(), and GETDATE(). The distinction is that aggregate functions collapse multiple rows into one result, while scalar functions transform one input into one output on a row-by-row basis.

sql
-- Aggregate: summarizes multiple rows
SELECT department_id, AVG(salary) AS avg_salary
FROM employees
GROUP BY department_id;

-- Scalar: transforms a single value per row
SELECT UPPER(first_name), ROUND(salary, 2) FROM employees;

What does it mean to have QUOTED_IDENTIFIER ON? What are the implications of having it OFF

QUOTED_IDENTIFIER ON tells SQL Server to treat double-quoted strings as object identifiers, such as column or table names, rather than as string literals. With it ON, you must use single quotes for string literals, which is the ANSI standard behavior and required for creating or altering certain objects like computed columns and indexed views. With it OFF, double quotes behave like single quotes for string literals, which can break ANSI compliance and cause issues with any object that was created with QUOTED_IDENTIFIER ON. Most modern development keeps it ON by default, and SQL Server drivers like ODBC and OLE DB set it ON automatically.

Beginner SQL interview questions

If you are a fresher or have less than 2 years of experience, focus on SQL fundamentals: SELECT, WHERE, basic joins, primary and foreign keys, and normalization. Interviewers expect you to write simple queries confidently, not to know advanced optimization.

What is SQL and what is it used for

SQL stands for Structured Query Language, and it's the standard language for interacting with relational databases. You use it to create and modify database structures, insert and update data, and query records from tables. Almost every relational database system, including PostgreSQL, MySQL, Oracle, and SQL Server, uses SQL as its primary interface. It's been around since the 1970s and remains one of the most essential skills in data engineering and backend development.

What are the different types of SQL commands

SQL commands are grouped into five main categories. DDL (Data Definition Language) handles structure, covering commands like CREATE, ALTER, and DROP. DML (Data Manipulation Language) deals with data itself using INSERT, UPDATE, DELETE, and SELECT. DCL (Data Control Language) manages permissions with GRANT and REVOKE. TCL (Transaction Control Language) handles transactions using COMMIT, ROLLBACK, and SAVEPOINT. Knowing which category a command falls into helps you understand its scope and whether it's reversible.

What is a primary key in a database

A primary key is a column or combination of columns that uniquely identifies each row in a table. It enforces two constraints automatically: uniqueness and NOT NULL, so no two rows can share the same primary key value and no row can have a null one. Every table should have a primary key because it gives you a reliable way to reference individual records, and it's the foundation for setting up foreign key relationships with other tables.

Explain what a foreign key is and how it is used.

A foreign key is a column in one table that references the primary key of another table, creating a link between the two. It enforces referential integrity, meaning you can't insert a value in the foreign key column that doesn't exist in the referenced table, and depending on the setup, the database can restrict or cascade deletes and updates. For example, an orders table might have a customer_id column that references the id column in a customers table, ensuring every order belongs to a real customer.

What is the difference between WHERE and HAVING clauses

WHERE filters rows before any grouping or aggregation happens, while HAVING filters after grouping. If you want to filter on a raw column value, use WHERE. If you want to filter on the result of an aggregate function like COUNT or SUM, you have to use HAVING because WHERE runs before those calculations exist.

What is normalization? Explain with examples.

Normalization is the process of organizing a database to reduce data redundancy and improve data integrity by applying a series of rules called normal forms. The most common ones are 1NF (no repeating groups, atomic values), 2NF (no partial dependencies on a composite key), and 3NF (no transitive dependencies, meaning non-key columns shouldn't depend on other non-key columns). For example, storing a customer's city and zip code in an orders table violates 3NF because city depends on zip code, not on the order itself. The fix is to move that data into a separate addresses or locations table.

What is the difference between DELETE TABLE and TRUNCATE TABLE commands

DELETE removes rows one at a time, logs each deletion in the transaction log, can use a WHERE clause to target specific rows, and fires any DELETE triggers on the table. TRUNCATE removes all rows at once with minimal logging (it deallocates data pages), cannot use a WHERE clause, does not fire triggers, and resets identity counters. DELETE is slower but more flexible; TRUNCATE is faster but removes everything. Both can be rolled back inside an explicit transaction in SQL Server and PostgreSQL.

What is the difference between UNION and UNION ALL SQL syntax

UNION combines the results of two or more SELECT statements and removes duplicate rows from the final result. UNION ALL also combines results but keeps all rows including duplicates. UNION ALL is faster because it skips the duplicate-elimination step (which requires a sort or hash operation). Use UNION when you need distinct results across queries, and UNION ALL when you know there are no duplicates or when duplicates are acceptable.

What is a view in the database

A view is a virtual table defined by a SELECT query. It does not store data itself but provides a saved query that you can reference like a regular table. Views are useful for simplifying complex joins, restricting access to specific columns or rows, and presenting a consistent interface even when the underlying table structure changes.

What is the meaning of Not Null in SQL

NOT NULL is a column constraint that prevents any row from storing a NULL value in that column, meaning a value must always be provided. Without it, a column accepts NULL by default, which represents the absence of a value and can cause issues in calculations, comparisons, and joins since NULL does not equal anything, including itself. You apply it at column definition time, and if you try to insert a row without providing a value for a NOT NULL column (and there's no default), the database will throw an error.

Browse the SQL Basics and Keys & Constraints sections above for the full list.

Advanced SQL interview questions

With 3+ years of experience, interviewers expect you to go beyond basics. Expect questions about transactions, index optimization, stored procedures vs functions, locking, and complex queries. These come up regularly in mid-level and senior interviews:

What is ACID fundamental? What are transactions in SQL SERVER

ACID stands for Atomicity, Consistency, Isolation, and Durability. Atomicity means all operations in a transaction succeed or all are rolled back, there is no partial completion. Consistency ensures the database moves from one valid state to another, respecting all constraints and rules. Isolation means concurrent transactions do not see each other's intermediate state, controlled by isolation levels like READ COMMITTED or SERIALIZABLE. Durability guarantees that once a transaction is committed, the changes survive even a system crash, ensured by the transaction log. In SQL Server, a transaction is a unit of work wrapped in BEGIN TRANSACTION and ended with COMMIT or ROLLBACK.

What is the difference between Cluster and Non-Cluster Index

A clustered index determines the physical sort order of the data in the table itself, so a table can only have one clustered index. The leaf nodes of a clustered index are the actual data pages. A non-clustered index is a separate structure that stores a sorted copy of the indexed columns along with a pointer back to the actual row, and a table can have up to 999 non-clustered indexes in SQL Server. Lookups using a non-clustered index may require an extra step called a key lookup to fetch columns not included in the index, while a clustered index read goes straight to the data. Primary keys are clustered by default in SQL Server, but you can change that.

What is the difference between Stored Procedure (SP) and User Defined Function (UDF)

A stored procedure can perform any database operation (SELECT, INSERT, UPDATE, DELETE) and does not have to return a value. A user-defined function must return a value (scalar or table) and cannot modify database state (no INSERT, UPDATE, or DELETE). Functions can be used inside SELECT statements and WHERE clauses, while stored procedures cannot. Stored procedures support output parameters and error handling with TRY/CATCH, while functions are more limited but composable within queries.

What are different transaction levels in SQL SERVER

SQL Server supports five transaction isolation levels that control how transactions interact with each other. READ UNCOMMITTED allows dirty reads (seeing uncommitted changes from other transactions). READ COMMITTED (the default) only reads committed data. REPEATABLE READ prevents other transactions from modifying rows you have read. SERIALIZABLE is the strictest level, preventing phantom reads by locking entire ranges. SNAPSHOT uses row versioning to give each transaction a consistent view of the data without blocking other transactions.

SQL Query to find second highest salary of Employee

The cleanest approach uses a subquery to exclude the maximum salary and then picks the max of what remains. Another common approach uses DENSE_RANK() which handles ties correctly, which is usually what interviewers actually want.

How can you increase SQL performance

The biggest wins usually come from proper indexing. Make sure you have indexes on columns used in WHERE clauses, JOIN conditions, and ORDER BY, and use covering indexes where possible to avoid table lookups. Beyond indexes, avoid SELECT * and only retrieve the columns you need, rewrite correlated subqueries as JOINs or CTEs, and analyze query execution plans to find full table scans or expensive sort operations. At the database level, partitioning large tables, updating statistics regularly, and making sure queries aren't blocked by locks also have a big impact on throughput.

If locking is not implemented, what issues can occur

Without locking, three classic concurrency problems can occur. First, dirty reads, where one transaction reads uncommitted data from another transaction that later gets rolled back, so you end up with data that never officially existed. Second, non-repeatable reads, where you read a row, another transaction modifies it and commits, and when you read it again in the same transaction you get a different value. Third, phantom reads, where a query returns a different set of rows when executed twice in the same transaction because another transaction inserted or deleted rows between the two reads.

What is LOCK escalation

Lock escalation is a process where the database engine automatically converts many fine-grained locks, like row-level or page-level locks, into a single coarser lock, typically a table-level lock. This happens to reduce the memory overhead that comes from maintaining a huge number of individual locks. SQL Server, for example, escalates to a table lock when a transaction holds roughly 5,000 locks on a single object. The downside is reduced concurrency since a table lock blocks other transactions from accessing any part of that table, so escalation can cause unexpected blocking and timeouts under heavy load.

Describe the concept of denormalization and when you would use it.

Denormalization is the deliberate process of introducing redundancy into a database by merging tables or duplicating data, with the goal of improving read performance. Normalized schemas require many joins, which can become expensive at scale, so denormalization trades write complexity and storage for faster query execution. A common example is storing a precomputed order total directly on the orders table instead of summing line items every time. You'd typically reach for denormalization in read-heavy systems, data warehouses, or reporting layers where query speed matters more than strict data consistency.

SQL Query to find Max Salary from each department.

The standard approach is to use GROUP BY on the department and aggregate salary with MAX(). You join back to the departments table if you want department names alongside the results.

Browse the Transactions, Indexes, and Stored Procedures sections above for the full list.

How to prepare

1. Know the top 50 cold. The high frequency questions are the ones you will face. Joins, GROUP BY, WHERE vs HAVING, primary keys, normalization.

2. Go deep on Joins and Aggregation. These two topics alone cover a large portion of SQL interviews. Know INNER vs LEFT vs FULL joins, and be comfortable with GROUP BY, HAVING, and aggregate functions.

3. Practice writing queries. Use NeetCode 150 or LeetCode SQL 50 to practice actual query writing.

4. Mock interviews. Practice with an AI mock interview. Most candidates fail on communication, not knowledge.

Where this data comes from

These 178 SQL interview questions were compiled from popular open-source interview preparation repositories on GitHub, cross-referenced with real interview patterns from developer communities.

Each question was categorized by topic and ranked by frequency based on how often it appears across multiple sources. The high frequency questions (15) are the ones that come up most consistently. If you are short on time, start there.

Frequently Asked Questions

What are the most common SQL interview questions?add

The most frequently asked SQL questions cover joins (INNER, LEFT, RIGHT, FULL), the difference between WHERE and HAVING, primary vs foreign keys, normalization, indexes, GROUP BY with aggregate functions, subqueries, and the ACID properties of transactions. These appear across virtually all SQL interviews regardless of role.

How many SQL interview questions should I prepare?add

Focus on the high frequency questions first (about 50). Once those are solid, work through your weak topics like window functions or query optimization. Understanding the top 80-100 gives you strong coverage for most SQL interviews.

What SQL topics are asked most in interviews?add

Joins, Keys and Constraints, Aggregation (GROUP BY, HAVING), Indexes, and Normalization are the most tested areas. For data analyst roles, expect more window functions and complex query questions. For backend roles, expect more about transactions, stored procedures, and optimization.

Are SQL interview questions different for freshers vs experienced?add

Yes. Freshers get more fundamentals like SELECT, WHERE, basic joins, and normalization. Experienced candidates get window functions, query optimization, execution plans, transaction isolation levels, and complex subqueries. The core join and aggregation questions apply to all levels.

How should I prepare for an SQL coding interview?add

Start with the high frequency questions on this page to build your knowledge base. Then practice writing actual queries on platforms like LeetCode SQL 50 or HackerRank SQL. Finally, do mock interviews to practice explaining your query logic under pressure.

Knowing the answers is half the battle

The other half is explaining them clearly under pressure.

Try a free mock interviewarrow_forward

Continue preparing