SQL Normalization Interview Questions
10 questions with answers · SQL Interview Guide
Normal forms (1NF through BCNF), denormalization, schema design, and database relationships. Critical for database design discussions.
What is normalization? Explain with examples.
Normalization is the process of organizing a database to reduce data redundancy and improve data integrity by applying a series of rules called normal forms. The most common ones are 1NF (no repeating groups, atomic values), 2NF (no partial dependencies on a composite key), and 3NF (no transitive dependencies, meaning non-key columns shouldn't depend on other non-key columns). For example, storing a customer's city and zip code in an orders table violates 3NF because city depends on zip code, not on the order itself. The fix is to move that data into a separate addresses or locations table.
-- Violates 3NF: city depends on zip_code, not order_id
CREATE TABLE orders (
order_id INT PRIMARY KEY,
customer_id INT,
zip_code VARCHAR(10),
city VARCHAR(100) -- transitive dependency
);
-- Better: separate the location data
CREATE TABLE zip_codes (
zip_code VARCHAR(10) PRIMARY KEY,
city VARCHAR(100)
);

Describe the concept of denormalization and when you would use it.
Denormalization is the deliberate process of introducing redundancy into a database by merging tables or duplicating data, with the goal of improving read performance. Normalized schemas require many joins, which can become expensive at scale, so denormalization trades write complexity and storage for faster query execution. A common example is storing a precomputed order total directly on the orders table instead of summing line items every time. You'd typically reach for denormalization in read-heavy systems, data warehouses, or reporting layers where query speed matters more than strict data consistency.
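The precomputed-total example above might look like this. This is an illustrative sketch; the orders and order_items tables and their column names are assumed, not taken from a real schema.

-- Denormalized: order_total is stored directly on orders
CREATE TABLE orders (
order_id INT PRIMARY KEY,
customer_id INT,
order_total DECIMAL(10, 2) -- redundant, but avoids summing line items on every read
);
-- The fully normalized alternative recomputes the total per query:
SELECT o.order_id, SUM(i.quantity * i.unit_price) AS order_total
FROM orders o
JOIN order_items i ON i.order_id = o.order_id
GROUP BY o.order_id;

The write path now has to keep order_total in sync whenever line items change, which is exactly the consistency cost denormalization trades away.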
What is database normalization?
Database normalization is the process of structuring a relational database to minimize redundancy and dependency issues by organizing data into well-defined tables. It works through a series of progressive rules called normal forms, with 1NF, 2NF, and 3NF being the most commonly applied in practice. 1NF requires atomic column values and no repeating groups. 2NF requires that every non-key column is fully dependent on the entire primary key, which matters when you have composite keys. 3NF removes transitive dependencies so non-key columns only depend on the primary key, not on other non-key columns. The end result is a schema that's easier to maintain and less prone to update, insert, or delete anomalies.
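The 2NF rule about composite keys can be illustrated with a small sketch. Table and column names here are hypothetical, chosen only to show the partial dependency.

-- Violates 2NF: product_name depends only on product_id,
-- not on the full composite key (order_id, product_id)
CREATE TABLE order_items (
order_id INT,
product_id INT,
product_name VARCHAR(100), -- partial dependency
quantity INT,
PRIMARY KEY (order_id, product_id)
);
-- Fix: move product attributes to their own table
CREATE TABLE products (
product_id INT PRIMARY KEY,
product_name VARCHAR(100)
);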
What are database normalization forms?
The main normalization forms are: 1NF requires atomic values in each column with no repeating groups. 2NF builds on 1NF by removing partial dependencies, so every non-key column depends on the entire primary key. 3NF removes transitive dependencies, meaning non-key columns depend only on the primary key, not on other non-key columns. BCNF (Boyce-Codd Normal Form) is a stricter version of 3NF where every determinant must be a candidate key. Most production databases aim for 3NF as a practical balance between normalization and query performance.
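A classic way to show the gap between 3NF and BCNF is a schema where a non-key column determines part of the key. The tables below are a hypothetical sketch, assuming each instructor teaches exactly one course.

-- In 3NF but not BCNF: instructor -> course holds,
-- yet instructor is not a candidate key
CREATE TABLE enrollments (
student_id INT,
course VARCHAR(50),
instructor VARCHAR(50),
PRIMARY KEY (student_id, course)
);
-- BCNF fix: make the determinant a key in its own table
CREATE TABLE instructors (
instructor VARCHAR(50) PRIMARY KEY,
course VARCHAR(50)
);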
What is denormalization?
Denormalization is the deliberate introduction of redundancy into a normalized database to improve read performance. You add duplicate or precomputed data to avoid expensive joins at query time. Common examples include adding a customer_name column directly on the orders table instead of joining to customers every time, or storing aggregated totals. The tradeoff is increased storage, more complex writes, and the risk of data inconsistency. It is commonly used in reporting databases, data warehouses, and read-heavy applications.
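The customer_name example could be sketched like this (an assumed schema, shown only to make the redundancy concrete):

-- Denormalized: customer_name duplicated onto orders to skip a join
CREATE TABLE orders (
order_id INT PRIMARY KEY,
customer_id INT, -- still references customers
customer_name VARCHAR(100) -- redundant copy; must be updated if the customer renames
);

Any update to a customer's name now has to touch both tables, which is the inconsistency risk the answer above describes.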
What is the use of OLAP?
OLAP stands for Online Analytical Processing, and its purpose is to support fast, multidimensional analysis of large volumes of historical data. It's designed for read-heavy analytical workloads like reporting, trend analysis, forecasting, and business intelligence rather than transactional operations. OLAP systems let analysts slice and dice data across multiple dimensions such as time, geography, or product category, and they typically preaggregate or store data in ways that make these complex aggregation queries much faster than running them against a normalized OLTP database.
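A typical OLAP-style aggregation might look like the query below. The sales table and its columns are assumed for illustration; ROLLUP is standard SQL, though some databases (e.g. MySQL) use a WITH ROLLUP variant instead.

-- Aggregate a measure across dimensions, with subtotals
SELECT region, product_category, SUM(revenue) AS total_revenue
FROM sales
GROUP BY ROLLUP (region, product_category);
-- Produces per-category rows, per-region subtotals, and a grand total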
What is a measure in OLAP?
A measure in OLAP is a numeric, quantitative value that you want to analyze and aggregate. Typical examples are sales revenue, units sold, profit margin, or page views. Measures are the 'what you are measuring' part of analysis, and they live in the fact table. They are almost always aggregated using functions like SUM, AVG, COUNT, MIN, or MAX when sliced across different dimensions.
What are dimensions in OLAP?
Dimensions are the categorical attributes you use to filter, group, and slice your measures. Think of them as the 'by what' in a question like 'show me total sales by region, by product category, and by quarter.' Common dimensions include time, geography, product, customer, and sales channel. They provide the context around the numeric measures stored in fact tables, and they typically have hierarchies built into them.
What are fact tables and dimension tables in OLAP?
A fact table is the central table in an OLAP schema and stores the quantitative measures you want to analyze, like sales amounts, quantities, or costs. It contains foreign keys that point to the surrounding dimension tables. Dimension tables hold the descriptive attributes used to filter and label the data, such as product names, customer demographics, or date details. In a star schema, the fact table sits in the middle with dimension tables radiating outward. In a snowflake schema, dimension tables are further normalized into sub-dimensions.
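A minimal star schema sketch, with hypothetical table and column names, could look like this:

-- Dimension tables: descriptive attributes
CREATE TABLE dim_date (
date_id INT PRIMARY KEY,
full_date DATE,
quarter INT
);
CREATE TABLE dim_product (
product_id INT PRIMARY KEY,
product_name VARCHAR(100),
category VARCHAR(50)
);
-- Fact table: foreign keys to dimensions plus numeric measures
CREATE TABLE fact_sales (
date_id INT REFERENCES dim_date(date_id),
product_id INT REFERENCES dim_product(product_id),
quantity INT, -- measure
revenue DECIMAL(12, 2) -- measure
);

In a snowflake variant, dim_product might be split further, for example into a separate categories table referenced by product rows.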
What is a data warehouse?
A data warehouse is a centralized repository designed for analytical reporting and business intelligence, as opposed to transactional processing. It consolidates data from multiple source systems, cleans and transforms it through an ETL process, and stores it in a way that makes querying large historical datasets fast and efficient. Data warehouses typically use dimensional modeling with fact and dimension tables, and tools like Redshift, Snowflake, or SQL Server Analysis Services are common implementations. The key difference from an OLTP database is that a warehouse is optimized for read-heavy analytical queries across massive datasets, not for high-frequency insert and update operations.
Knowing the answers is half the battle
The other half is explaining them clearly under pressure.
Try a free mock interview