ServerlessBase Blog
  • Relational Database Fundamentals Explained

    A comprehensive guide to relational database concepts, ACID properties, normalization, and indexing for developers.

    Relational Database Fundamentals Explained

    You've probably worked with data before—spreadsheets, CSV files, maybe even a JSON blob. But when your application grows beyond a few dozen rows, those flat files stop working. You need structure, consistency, and reliability. That's where relational databases come in.

    Relational databases organize data into tables with predefined relationships, making it easy to store, retrieve, and manipulate complex information. They power everything from e-commerce platforms to banking systems, from social networks to content management systems. Understanding these fundamentals will help you design better applications and make smarter decisions about data storage.

    What Makes a Database "Relational"?

    The term "relational" comes from the mathematical foundation of set theory. In a relational database, data is stored in tables where each row represents a single record and each column represents a specific attribute. These tables are related to each other through shared columns, creating a network of connections that reflect real-world relationships.

    Think of a relational database like a well-organized filing cabinet. Each drawer is a table, each folder is a row, and each document inside the folder contains specific information. The drawers are connected through common labels, so you can easily find related documents across different sections.

    The power of this structure becomes clear when you need to answer complex questions. Want to know which customers haven't placed an order in the last six months? With a properly designed relational database, you can join multiple tables to get that answer in milliseconds. Without that structure, you'd be stuck writing complex queries or importing data into spreadsheets.

    Core Concepts: Tables, Rows, and Columns

    Every relational database is built on three fundamental components: tables, rows, and columns.

    Tables are the primary storage structure. Each table represents a specific entity or concept—users, orders, products, or any other domain object. A well-designed database will have one table per major entity, keeping related data organized and manageable.

    Rows (also called records or tuples) contain the actual data for a single instance of an entity. If you have a users table, each row represents one user. The first row might contain user information for Alice, the second for Bob, and so on.

    Columns (also called fields or attributes) define what information is stored in each row. In a users table, you might have columns for id, name, email, created_at, and status. Each column has a specific data type—integer, text, date, boolean, or more complex types like JSON or arrays.

    Here's a simple example of a users table:

    idnameemailcreated_atstatus
    1Alice Johnsonalice@example.com2024-01-15active
    2Bob Smithbob@example.com2024-02-20active
    3Carol Whitecarol@example.com2024-03-10inactive

    This table structure makes it easy to query specific information. Want to find all active users? Just filter where status = 'active'. Need to count how many users registered this month? A simple query can give you that answer instantly.

    Primary Keys and Foreign Keys

    Two concepts are essential for understanding how relational databases maintain data integrity: primary keys and foreign keys.

    Primary keys uniquely identify each row in a table. They're like a social security number for your data—every row must have one, and no two rows can have the same primary key value. Primary keys are typically integers that auto-increment, but they can also be UUIDs or other unique identifiers.

    Foreign keys establish relationships between tables. A foreign key in one table points to the primary key of another table, creating a link between related data. For example, an orders table might have a foreign key column called user_id that references the id column in the users table.

    This relationship structure is what makes relational databases so powerful. When you insert a new order, you're not just creating a row in the orders table—you're also creating a relationship to an existing user. When you delete a user, the database can enforce rules about what happens to their orders (cascade delete, set to null, or prevent the deletion).

    Here's how foreign keys work in practice:

    -- Create users table with primary key
    CREATE TABLE users (
      id SERIAL PRIMARY KEY,
      name VARCHAR(100) NOT NULL,
      email VARCHAR(255) UNIQUE NOT NULL
    );
     
    -- Create orders table with foreign key
    CREATE TABLE orders (
      id SERIAL PRIMARY KEY,
      user_id INTEGER REFERENCES users(id) ON DELETE CASCADE,
      product_name VARCHAR(200) NOT NULL,
      quantity INTEGER NOT NULL,
      total_price DECIMAL(10, 2) NOT NULL
    );

    The REFERENCES users(id) clause creates the foreign key relationship. The ON DELETE CASCADE option means that when a user is deleted, all their orders will also be deleted automatically. This prevents orphaned records and maintains data consistency.

    Data Types and Constraints

    Relational databases enforce strict data types and constraints to ensure data quality and prevent errors.

    Data types define what kind of data can be stored in a column. Common types include:

    • Integer types (SMALLINT, INTEGER, BIGINT) for whole numbers
    • Numeric/Decimal types (DECIMAL, NUMERIC) for precise financial calculations
    • Character types (CHAR, VARCHAR, TEXT) for text data
    • Date and time types (DATE, TIME, TIMESTAMP) for temporal data
    • Boolean types (BOOLEAN) for yes/no values
    • JSON types (JSON, JSONB) for flexible, semi-structured data

    Choosing the right data type isn't just about storage efficiency—it's about data integrity. Storing a phone number as a VARCHAR(20) is fine, but storing it as an INTEGER might cause issues if you need to perform calculations or if the number exceeds the maximum integer value.

    Constraints enforce rules on your data:

    • NOT NULL ensures a column always has a value
    • UNIQUE prevents duplicate values in a column
    • CHECK validates that values meet specific conditions
    • DEFAULT provides a default value when none is specified
    • FOREIGN KEY enforces relationships between tables

    Here's an example with multiple constraints:

    CREATE TABLE products (
      id SERIAL PRIMARY KEY,
      name VARCHAR(200) NOT NULL,
      price DECIMAL(10, 2) NOT NULL CHECK (price >= 0),
      stock_quantity INTEGER NOT NULL DEFAULT 0,
      category VARCHAR(100),
      is_active BOOLEAN NOT NULL DEFAULT TRUE
    );

    The price column has a CHECK constraint to ensure it's never negative. The stock_quantity column defaults to 0 if no value is provided. The is_active column defaults to TRUE, making it easy to soft-delete products without actually removing them from the database.

    Normalization: Organizing Data for Efficiency

    Normalization is the process of organizing data in a database to reduce redundancy and improve data integrity. It's one of the most important concepts in database design.

    First Normal Form (1NF) requires that each table cell contains a single value, each column contains atomic values, and each row is unique. This means no repeating groups or multi-valued attributes.

    Second Normal Form (2NF) builds on 1NF by ensuring that all non-key attributes are fully dependent on the primary key. If your primary key is composite (multiple columns), every non-key column must depend on the entire key, not just part of it.

    Third Normal Form (3NF) requires that there are no transitive dependencies—non-key columns should depend only on the primary key, not on other non-key columns.

    Here's an example of how normalization works:

    Before normalization (violates 1NF and 2NF):

    iduser_nameuser_emailorder_dateproduct_nameproduct_price
    1Alicealice@example.com2024-01-15Widget A10.00
    1Alicealice@example.com2024-01-15Widget B15.00
    2Bobbob@example.com2024-02-20Widget C20.00

    This structure has problems: user information is repeated for each order, and product information is duplicated. If Alice places 100 orders, her name and email will be stored 100 times.

    After normalization (3NF):

    users table:

    orders table:

    iduser_idorder_date
    112024-01-15
    212024-01-15
    322024-02-20

    products table:

    idnameprice
    1Widget A10.00
    2Widget B15.00
    3Widget C20.00

    order_items table:

    idorder_idproduct_idquantity
    1111
    2121
    3231

    Now each piece of information is stored exactly once. When Alice places more orders, we only add rows to the orders and order_items tables—no duplication of her user information.

    SQL: The Language of Relational Databases

    Structured Query Language (SQL) is the standard language for interacting with relational databases. It's powerful, declarative, and widely supported across different database systems.

    SELECT queries retrieve data from one or more tables:

    SELECT users.name, orders.order_date, products.name, products.price
    FROM users
    JOIN orders ON users.id = orders.user_id
    JOIN order_items ON orders.id = order_items.order_id
    JOIN products ON order_items.product_id = products.id
    WHERE users.status = 'active'
    ORDER BY orders.order_date DESC;

    This query joins four tables to get a comprehensive view of active users, their orders, and the products they purchased. The JOIN operation combines rows from multiple tables based on related columns.

    INSERT statements add new data:

    INSERT INTO users (name, email, created_at, status)
    VALUES ('David Lee', 'david@example.com', CURRENT_TIMESTAMP, 'active');

    UPDATE statements modify existing data:

    UPDATE users
    SET status = 'inactive'
    WHERE id = 3;

    DELETE statements remove data:

    DELETE FROM users
    WHERE id = 3;

    SQL is powerful because it's declarative—you specify what you want, not how to get it. The database engine optimizes the query execution, choosing the most efficient path based on indexes, statistics, and query patterns.

    Transactions and ACID Properties

    Relational databases guarantee data consistency through transactions and ACID properties.

    Transactions are sequences of one or more SQL statements treated as a single unit of work. Either all statements succeed, or none do. This is critical for maintaining data integrity in scenarios like financial transactions or inventory management.

    ACID properties ensure reliable transactions:

    • Atomicity: Either all operations in a transaction complete, or none do. If a failure occurs mid-transaction, the database rolls back to its previous state.
    • Consistency: Transactions move the database from one valid state to another, maintaining all constraints and rules.
    • Isolation: Concurrent transactions don't interfere with each other. Each transaction sees a consistent view of the database.
    • Durability: Once a transaction is committed, it persists even in the event of system failures.

    Here's an example of a transaction:

    BEGIN TRANSACTION;
     
    -- Deduct stock from inventory
    UPDATE products
    SET stock_quantity = stock_quantity - 1
    WHERE id = 1 AND stock_quantity > 0;
     
    -- Create order record
    INSERT INTO orders (user_id, order_date)
    VALUES (1, CURRENT_TIMESTAMP);
     
    -- Add order item
    INSERT INTO order_items (order_id, product_id, quantity)
    VALUES (LAST_INSERT_ID(), 1, 1);
     
    -- Commit if everything succeeded
    COMMIT;

    If any of these statements fail, the entire transaction rolls back, leaving the database in its original state. This prevents scenarios where inventory is deducted but no order is created, or where an order is created but stock isn't updated.

    Common Relational Database Systems

    Several relational database systems are widely used in production environments:

    PostgreSQL is an open-source, object-relational database known for its robustness, advanced features, and strong standards compliance. It supports complex queries, JSON data types, full-text search, and extensive extensions.

    MySQL is the world's most popular open-source database, known for its speed, reliability, and ease of use. It's a great choice for web applications and is supported by a large ecosystem of tools and services.

    Microsoft SQL Server is a powerful enterprise database with advanced analytics, business intelligence, and security features. It's tightly integrated with the Microsoft ecosystem.

    Oracle Database is a commercial database with enterprise-grade features, high availability, and scalability. It's commonly used in large organizations and mission-critical applications.

    Each database system has its strengths and trade-offs. PostgreSQL and MySQL are excellent choices for most applications, offering a good balance of features, performance, and community support. Oracle and SQL Server are better suited for enterprise environments with specific requirements.

    When to Use a Relational Database

    Relational databases excel in scenarios where data integrity, consistency, and complex relationships are critical:

    • E-commerce platforms with products, orders, customers, and inventory
    • Banking and financial systems where data accuracy is paramount
    • Content management systems with hierarchical content structures
    • Enterprise applications with complex business logic and data relationships
    • Systems requiring ACID compliance for financial transactions or inventory management

    They're particularly well-suited when you need to enforce data integrity constraints, perform complex joins across multiple tables, or maintain strict consistency guarantees.

    When to Consider Alternatives

    While relational databases are powerful, they're not always the right choice. Consider alternatives when:

    • Data volume is extremely high (petabytes or more) and you need columnar storage for analytics
    • Data is highly unstructured and doesn't fit neatly into tables
    • You need real-time analytics on massive datasets
    • Your application requires horizontal scaling beyond what relational databases can provide

    In these cases, you might consider NoSQL databases like MongoDB (document stores), Cassandra (wide-column stores), or specialized analytics databases like ClickHouse.

    Practical Example: Building a Simple E-Commerce Schema

    Let's put it all together with a practical example. Here's a complete e-commerce database schema:

    -- Users table
    CREATE TABLE users (
      id SERIAL PRIMARY KEY,
      name VARCHAR(100) NOT NULL,
      email VARCHAR(255) UNIQUE NOT NULL,
      password_hash VARCHAR(255) NOT NULL,
      phone VARCHAR(20),
      created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
      updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
      status VARCHAR(20) DEFAULT 'active'
    );
     
    -- Categories table
    CREATE TABLE categories (
      id SERIAL PRIMARY KEY,
      name VARCHAR(100) NOT NULL,
      slug VARCHAR(100) UNIQUE NOT NULL,
      description TEXT,
      parent_id INTEGER REFERENCES categories(id) ON DELETE SET NULL,
      created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
    );
     
    -- Products table
    CREATE TABLE products (
      id SERIAL PRIMARY KEY,
      category_id INTEGER REFERENCES categories(id) ON DELETE SET NULL,
      name VARCHAR(200) NOT NULL,
      slug VARCHAR(200) UNIQUE NOT NULL,
      description TEXT,
      price DECIMAL(10, 2) NOT NULL CHECK (price >= 0),
      stock_quantity INTEGER NOT NULL DEFAULT 0,
      is_active BOOLEAN NOT NULL DEFAULT TRUE,
      created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
      updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
    );
     
    -- Orders table
    CREATE TABLE orders (
      id SERIAL PRIMARY KEY,
      user_id INTEGER REFERENCES users(id) ON DELETE CASCADE,
      order_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
      status VARCHAR(20) DEFAULT 'pending',
      total_amount DECIMAL(10, 2) NOT NULL,
      shipping_address JSONB,
      notes TEXT
    );
     
    -- Order items table
    CREATE TABLE order_items (
      id SERIAL PRIMARY KEY,
      order_id INTEGER REFERENCES orders(id) ON DELETE CASCADE,
      product_id INTEGER REFERENCES products(id) ON DELETE SET NULL,
      quantity INTEGER NOT NULL CHECK (quantity > 0),
      unit_price DECIMAL(10, 2) NOT NULL,
      total_price DECIMAL(10, 2) NOT NULL
    );
     
    -- Create indexes for performance
    CREATE INDEX idx_users_email ON users(email);
    CREATE INDEX idx_products_category ON products(category_id);
    CREATE INDEX idx_products_active ON products(is_active);
    CREATE INDEX idx_orders_user ON orders(user_id);
    CREATE INDEX idx_orders_date ON orders(order_date);
    CREATE INDEX idx_order_items_order ON order_items(order_id);

    This schema follows normalization principles, uses appropriate data types and constraints, and includes indexes for common query patterns. It supports the core e-commerce functionality: users browsing products, placing orders, and viewing order history.

    Indexing: Speeding Up Queries

    Indexes are data structures that improve query performance by allowing the database to quickly locate rows without scanning every row in a table.

    Without indexes, a query must scan every row in a table to find matching rows. This is called a full table scan and becomes slow as tables grow.

    Indexes work like the index in a book. Instead of reading every page to find a specific topic, you go to the index, find the page number, and go directly to that page.

    -- Create an index on the username column
    CREATE INDEX idx_users_username ON users(username);
     
    -- Query with index
    SELECT * FROM users WHERE username = 'john_doe';
    -- The database uses the index to quickly find the matching row
     
    -- Query without index
    SELECT * FROM users WHERE email LIKE '%@gmail.com';
    -- The database must scan every row to find matching emails

    Index Types

    • B-Tree indexes: The most common type, efficient for range queries and equality comparisons
    • Hash indexes: Only support equality comparisons, faster for exact matches but not for ranges
    • GiST/GIN indexes: For full-text search and geometric data
    • Partial indexes: Index only a subset of rows based on a WHERE clause

    When to Use Indexes

    Indexes improve read performance but slow down write operations because the database must update the index whenever data changes. Use indexes strategically:

    • Index columns used in WHERE clauses, JOIN conditions, and ORDER BY clauses
    • Index foreign key columns
    • Don't index columns with low cardinality (few unique values, like gender or status)
    • Don't index columns that are rarely queried

    Query Optimization Techniques

    Writing efficient queries is just as important as designing good indexes. Here are some techniques to optimize your queries:

    Use Explicit Column Names

    -- Good: Explicit column names
    SELECT id, username, email FROM users WHERE id = 1;
     
    -- Bad: Implicit column selection
    SELECT * FROM users WHERE id = 1;

    Selecting only the columns you need reduces the amount of data the database must process and transfer.

    Avoid SELECT *

    Always specify the columns you need. This reduces network traffic and improves performance.

    Use WHERE Instead of HAVING for Filtering

    -- Good: Filter before grouping
    SELECT user_id, COUNT(*) as order_count
    FROM orders
    WHERE created_at >= '2024-01-01'
    GROUP BY user_id
    HAVING COUNT(*) > 10;
     
    -- Bad: Filter after grouping (less efficient)
    SELECT user_id, COUNT(*) as order_count
    FROM orders
    GROUP BY user_id
    HAVING COUNT(*) > 10 AND created_at >= '2024-01-01';

    Filtering in the WHERE clause reduces the number of rows before grouping, making the operation faster.

    Use JOIN Instead of Subqueries

    -- Good: JOIN is often more efficient
    SELECT u.username, o.order_id, o.total
    FROM users u
    JOIN orders o ON u.id = o.user_id
    WHERE u.id = 1;
     
    -- Bad: Subquery may be less efficient
    SELECT username, order_id, total
    FROM users
    WHERE id = (SELECT user_id FROM orders WHERE user_id = 1);

    JOINs are generally more efficient than subqueries because the database can optimize the query plan better.

    Use EXPLAIN to Analyze Queries

    EXPLAIN ANALYZE
    SELECT u.username, o.order_id, o.total
    FROM users u
    JOIN orders o ON u.id = o.user_id
    WHERE u.id = 1;

    The EXPLAIN command shows how the database plans to execute your query, including which indexes it will use and the estimated cost.

    Conclusion

    Relational databases provide a robust foundation for storing and managing structured data. Their table-based structure, strong data integrity guarantees, and powerful query capabilities make them ideal for applications where consistency and reliability are paramount.

    The key takeaways are:

    • Tables, rows, and columns form the basic structure of relational databases
    • Primary keys uniquely identify each record
    • Foreign keys establish relationships between tables
    • Data types and constraints ensure data quality
    • Normalization reduces redundancy and improves efficiency
    • SQL provides a powerful, declarative interface for data manipulation
    • ACID properties guarantee transaction reliability
    • Proper schema design is critical for performance and maintainability

    As you build applications, take time to design your database schema thoughtfully. The effort you invest in understanding relational database fundamentals will pay dividends in the form of more maintainable, reliable, and performant applications.

    Platforms like ServerlessBase make it easy to deploy and manage relational databases, handling the infrastructure so you can focus on building great applications. With managed database services, you get automatic backups, scaling, and security updates without the operational overhead of managing databases yourself.

    Leave comment