🧵 L5: Process Alternative — Threads

Lecture Goal

Understand why threads were invented as a lighter alternative to processes, how they differ from processes in resource sharing, and master the POSIX thread (pthread) API.

The Big Picture

Threads are the dominant concurrency mechanism in modern computing. They solve the two fatal flaws of processes: expensive creation and hard communication. But shared memory is a double-edged sword — it enables easy communication but causes race conditions!


1. Why Threads? (Motivation)

🧠 The Core Problem

Processes are expensive to create and manage for two main reasons:

```mermaid
flowchart TD
    Problem["Process Model Problems"]
    Problem --> P1["High Creation Cost<br/>fork() duplicates entire memory"]
    Problem --> P2["Hard IPC<br/>Processes have independent memory"]
    P1 --> Solution["Threads!"]
    P2 --> Solution

    style Problem fill:#ffcc99
    style Solution fill:#99ff99
```

| Problem | Description |
|---|---|
| High creation cost | fork() duplicates the entire memory space and process context — wasteful when you just want multiple tasks within the same program |
| Hard IPC | Processes have independent memory spaces; passing data requires special mechanisms (pipes, shared memory, message queues) |

🍱 The Cooking Analogy

Imagine preparing lunch: steam rice 🍚, fry fish 🐟, cook soup 🍲.

Single-threaded approach: One cook does all three tasks sequentially:

```mermaid
flowchart LR
    A["steamRice()"] --> B["fryFish()"] --> C["cookSoup()"] --> D["Lunch Ready!"]

    style A fill:#ffcc99
    style B fill:#ffcc99
    style C fill:#ffcc99
    style D fill:#99ff99
```

Total time = sum of all three task durations.

Multi-threaded approach: Three threads run concurrently within the same process:

```mermaid
flowchart TD
    subgraph Threads
        T1["Thread 1: steamRice()"]
        T2["Thread 2: fryFish()"]
        T3["Thread 3: cookSoup()"]
    end
    T1 --> Wait["Wait for all"]
    T2 --> Wait
    T3 --> Wait
    Wait --> Done["Lunch Ready!"]

    style T1 fill:#99ff99
    style T2 fill:#99ff99
    style T3 fill:#99ff99
    style Done fill:#99ff99
```

Total time ≈ duration of the longest task (not the sum!).


⚠️ Common Pitfalls

Pitfall 1: Concurrency ≠ Parallelism

Concurrent threads can interleave on a single CPU (they take turns). True parallelism requires multiple CPUs running threads simultaneously.

Pitfall 2: fork() is NOT threading

Using fork() creates a separate process with its own memory copy — threads within the same process share memory directly.


❓ Mock Exam Questions

Q1: Process Model Problems

Name two reasons why the process model (using fork()) is considered expensive.

Answer

Answer:

  1. High creation cost — fork() duplicates entire memory space
  2. Hard inter-process communication — processes have independent memory spaces, requiring IPC mechanisms

Q2: Multi-threaded Speedup

In the cooking analogy, why is the multi-threaded version faster than the single-threaded version?

Answer

Answer: Multi-threaded runs tasks concurrently, so total time ≈ longest task duration. Single-threaded runs sequentially, so total time = sum of all durations.

Q3: fork() Communication

Why does fork() make it hard for processes to communicate?

Answer

Answer: fork() creates an independent copy of the parent’s memory. Changes in the child don’t affect the parent. Communicating requires explicit IPC (pipes, shared memory, message queues).


2. What is a Thread?

🧠 Core Concept

A thread is a unit of execution within a process. Traditionally, a process has a single thread of control — only one instruction executes at any point in time.

```mermaid
flowchart TD
    subgraph Process["Process"]
        Shared["Shared: Code, Data, Heap, Files"]
        subgraph Threads
            T1["Thread 1<br/>PC, Registers, Stack"]
            T2["Thread 2<br/>PC, Registers, Stack"]
            T3["Thread 3<br/>PC, Registers, Stack"]
        end
    end
    Shared --> Threads

    style Shared fill:#ffcc99
    style Threads fill:#99ff99
```

Each thread has its own Program Counter (PC) that tracks where it is in the code.


⚠️ Common Pitfalls

Pitfall 1: Thread ≠ Process

Multiple threads live inside one process and share its address space.

Pitfall 2: "At the same time" is conceptual

On a single-core CPU, threads take turns very quickly, giving the illusion of parallelism.


❓ Mock Exam Questions

Q4: Thread of Control

What does “thread of control” mean? What hardware register tracks it?

Answer

Answer: “Thread of control” means the sequence of instructions being executed. The Program Counter (PC) register tracks it.

Q5: Program Counters

If a process has 3 threads, how many Program Counters exist?

Answer

Answer: 3. Each thread has its own PC to track its execution position.


3. Process vs Thread

🧠 What Threads Share vs. Own

| Resource | Shared Between Threads? |
|---|---|
| Code (Text Segment) | ✅ Yes |
| Global Data (Data Segment) | ✅ Yes |
| Heap (dynamic memory) | ✅ Yes |
| Open Files / Process ID | ✅ Yes (OS Context) |
| Registers (GPR, PC, SP, FP) | ❌ No — each thread has its own |
| Stack | ❌ No — each thread has its own |
| Thread ID | ❌ No — unique to each thread |

Why Each Thread Needs Its Own Stack

Each thread calls functions independently, so it needs its own call stack to track local variables and return addresses.


🔄 Context Switch Comparison

```mermaid
flowchart LR
    subgraph ProcessSwitch["Process Context Switch (Heavy)"]
        PS1["Save OS Context"]
        PS2["Save Hardware Context"]
        PS3["Switch Page Tables<br/>← Expensive!"]
        PS4["Load new PCB"]
    end

    subgraph ThreadSwitch["Thread Context Switch (Light)"]
        TS1["Save Hardware Context"]
        TS2["Change SP/FP<br/>← Just registers!"]
    end

    style ProcessSwitch fill:#ff9999
    style ThreadSwitch fill:#99ff99
```

| Component | Process Switch | Thread Switch (same process) |
|---|---|---|
| General Purpose Registers | ✅ Save/restore | ✅ Save/restore |
| Program Counter (PC) | ✅ Save/restore | ✅ Save/restore |
| Stack Pointer (SP) | ✅ Save/restore | ✅ Save/restore |
| Page Table | ✅ Switch — expensive! | ❌ Not needed |
| OS Context (PCB, files) | ✅ Load new PCB | ❌ Not needed |
| TLB Flush | ✅ Usually required | ❌ Not required |

This is why threads are called lightweight processes.


📊 Benefits of Threads

| Benefit | Explanation |
|---|---|
| Economy | Less resources needed than managing multiple processes |
| Resource Sharing | Threads share memory — no IPC overhead |
| Responsiveness | UI thread stays responsive while worker threads compute |
| Scalability | Can utilize multiple CPU cores simultaneously |

⚠️ Common Pitfalls

Pitfall 1: Shared memory is dangerous!

Threads can read/write each other’s data, enabling easy communication but causing race conditions if not synchronized.

Pitfall 2: Stack vs. Heap confusion

Local variables (on stack) are private to each thread. Global variables and heap allocations are shared by all threads.


❓ Mock Exam Questions

Q6: Shared vs. Unique Resources

List three things threads share, and two things unique to each thread.

Answer

Answer:

  • Shared: Code, global data, heap, open files, process ID
  • Unique: Registers (PC, SP, FP), stack, thread ID

Q7: Context Switch Cost

Why is a thread context switch cheaper than a process context switch?

Answer

Answer: Thread switching avoids expensive memory context switch (no page table swap, no TLB flush). Only hardware context (registers) needs saving/restoring.

Q8: Race Condition

Thread A and Thread B each execute counter++ on a shared counter 1000 times. Is the final value guaranteed to be 2000?

Answer

Answer: No. counter++ is not atomic (LOAD → ADD → STORE). If both threads read the same value before either writes, one increment is lost. Requires mutex locks to fix.


4. Thread Models: User vs Kernel Threads

🧠 Two Implementation Approaches

```mermaid
flowchart TD
    Model["Thread Models"]
    Model --> User["User Threads"]
    Model --> Kernel["Kernel Threads"]

    User --> U1["Library-managed<br/>OS unaware"]
    Kernel --> K1["OS-managed<br/>System calls"]

    style User fill:#ffcc99
    style Kernel fill:#99ff99
```

👤 User Threads

Implemented entirely as a user-space library. The OS kernel is completely unaware that multiple threads exist — it only sees one process.

| Advantage | Disadvantage |
|---|---|
| Works on any OS | OS schedules at process level — one thread blocks → all block |
| Thread ops are library calls — very fast | Cannot exploit multiple CPUs |
| Highly configurable scheduling | |


🔧 Kernel Threads

Threads are implemented inside the OS. Thread operations are system calls.

| Advantage | Disadvantage |
|---|---|
| Kernel schedules per-thread | Every operation is a system call — slower |
| One thread blocking does not block others | Less flexible — one-size-fits-all |
| True multi-core parallelism | |

⚠️ Common Pitfalls

Pitfall 1: User threads aren't "bad"

They’re faster for thread operations and great for programs that don’t need true parallelism. But they fail hard if any thread makes a blocking system call.

Pitfall 2: Kernel threads are heavier

Every thread operation costs a system call, but kernel threads are essential when you need real multi-core parallelism.


❓ Mock Exam Questions

Q9: User Thread Blocking

In the user thread model, what happens if one thread makes a blocking system call?

Answer

Answer: The OS sees the entire process as blocked. All other threads in that process are unable to run, even if ready.

Q10: Kernel Thread Advantage

Why can kernel threads exploit multiple CPUs but user threads cannot?

Answer

Answer: The OS is aware of each kernel thread and can schedule them independently on different CPUs. User threads are invisible to the OS, which only sees one schedulable unit (the process).

Q11: Web Server Scenario

A web server uses user threads for 100 clients. One thread blocks on a database query. What happens?

Answer

Answer: All 99 other client threads are blocked too — the web server becomes unresponsive until the database query returns.


5. Hybrid Thread Model

🧠 Best of Both Worlds

The hybrid model supports both user threads and kernel threads:

```mermaid
flowchart TD
    UT["User-level threads (many)"]
    LWP["Light-weight Processes / LWPs"]
    KT["Kernel-level threads"]
    CPU["Physical CPUs"]

    UT <-->|"many-to-many"| LWP
    LWP <-->|"1-to-1"| KT
    KT <--> CPU

    style UT fill:#ffcc99
    style LWP fill:#99ff99
    style KT fill:#99ff99
```

Benefits:

  • Flexibility — tune how many kernel threads a process gets
  • Concurrency control — limit parallelism per process/user
  • Efficiency — cheap user-thread operations within kernel thread context

🔧 Hardware-Level Threading (SMT)

Modern CPUs use Simultaneous Multi-Threading (SMT), aka Hyperthreading:

| Feature | Description |
|---|---|
| Multiple register sets | One physical core has multiple logical cores |
| Shared execution units | Threads share ALU, cache, etc. |
| True parallelism | No software context switch needed |

SMT ≠ Multiple Cores

Hyperthreading makes one physical core look like two logical cores, but they share execution units — performance gains are workload-dependent.


❓ Mock Exam Questions

Q12: Hybrid Model Advantage

How does the hybrid model avoid the main weakness of pure user threads?

Answer

Answer: Because user threads are multiplexed onto multiple kernel threads, a blocking call blocks only the kernel thread carrying it. The kernel can still schedule the process's other kernel threads, so the remaining user threads continue to run.

Q13: SMT vs. Software Threading

What is the key difference between SMT and software-level multi-threading?

Answer

Answer: SMT is hardware-level — multiple threads run truly simultaneously on the same physical core with no context switch overhead. Software threading requires the OS to save/restore context.


6. POSIX Threads (pthread)

🛠️ The Standard API

pthreads is the standard thread API for Unix/Linux, defined by POSIX.

```c
#include <pthread.h>
```

Compile with:

```shell
gcc myprogram.c -lpthread
```

Key data types:

| Type | Purpose |
|---|---|
| pthread_t | Thread ID |
| pthread_attr_t | Thread attributes (priority, stack size, etc.) |

🚀 Thread Creation: pthread_create

```c
int pthread_create(
    pthread_t* tidCreated,            // Output: TID of new thread
    const pthread_attr_t* threadAttr, // Thread attributes (NULL = defaults)
    void* (*startRoutine)(void*),     // Function for thread to execute
    void* argForStartRoutine          // Argument passed to that function
);
```
  • Returns 0 on success, non-zero on error
  • New thread begins executing startRoutine(arg) immediately
  • Caller and new thread run concurrently!

🛑 Thread Termination: pthread_exit

```c
void pthread_exit(void* exitValue);
```
  • exitValue is captured by pthread_join
  • If not called explicitly, thread terminates when startRoutine returns

🔗 Thread Synchronization: pthread_join

```c
int pthread_join(pthread_t threadID, void** status);
```
  • Blocks caller until target thread terminates
  • status receives the exit value
  • Analogous to waitpid() for processes
```mermaid
flowchart LR
    Main["Main Thread"] --> Create["pthread_create()"]
    Create --> Child["Child Thread Runs"]
    Child --> Exit["pthread_exit()"]
    Main --> Join["pthread_join()"]
    Join --> Continue["Continue"]

    style Join fill:#ffcc99
    style Exit fill:#99ff99
```

📖 Code Examples

Example 1 — Basic Creation

```c
void* sayHello(void* arg) {
    printf("Just to say hello!\n");
    pthread_exit(NULL);
}

int main() {
    pthread_t tid;
    pthread_create(&tid, NULL, sayHello, NULL);
    printf("Thread created with tid %lu\n", (unsigned long)tid);
    return 0;  // ⚠️ PROBLEM: may exit before child prints!
}
```

Bug: No pthread_join

main() might exit before sayHello prints! The main thread doesn’t wait.

Fix:

```c
pthread_join(tid, NULL);  // Wait for child to finish
return 0;
```

Example 2 — Shared Memory Race Condition

```c
int globalVar;  // Shared between ALL threads

void* doSum(void* arg) {
    for (int i = 0; i < 1000; i++)
        globalVar++;  // All 5 threads modify the SAME globalVar
    return NULL;      // start routine must return a void*
}

int main() {
    pthread_t tid[5];
    for (int i = 0; i < 5; i++)
        pthread_create(&tid[i], NULL, doSum, NULL);

    // ⚠️ PROBLEM: prints before threads finish!
    printf("Global variable is %i\n", globalVar);
    return 0;
}
```

Two problems:

  1. No pthread_join → main prints before threads finish
  2. globalVar++ is not atomic → race condition

Fix:

```c
for (int i = 0; i < 5; i++)
    pthread_join(tid[i], NULL);

printf("Global variable is %i\n", globalVar);
// Note: Race condition on globalVar++ still exists!
// Requires mutex locks (covered later)
```

⚠️ Common Pitfalls

Pitfall 1: Forgetting pthread_join

Main thread may exit and kill all child threads before they finish.

Pitfall 2: pthread_exit in main vs return

return 0 in main may terminate all threads. pthread_exit(NULL) lets other threads finish.

Pitfall 3: globalVar++ is NOT atomic

It compiles to LOAD → ADD → STORE. Another thread can interrupt between steps.


❓ Mock Exam Questions

Q14: pthread_join Purpose

What is the role of pthread_join? What happens if you forget it?

Answer

Answer: It blocks the calling thread until the target thread terminates. Forgetting it means the main thread may exit before child threads finish, killing them prematurely.

Q15: pthread_exit in main

Why does pthread_exit(NULL) in main() behave differently from return 0?

Answer

Answer: return 0 exits the process, terminating all threads. pthread_exit(NULL) terminates only the main thread, allowing other threads to continue running.

Q16: Expected vs. Actual Result

5 threads each increment globalVar 1000 times. What’s expected? What’s actual?

Answer

Answer:

  • Expected: 5000
  • Actual: Unpredictable (less than 5000) due to race condition on non-atomic increment.

7. Summary

📊 Key Takeaways

| Concept | Key Point |
|---|---|
| Why threads? | Solve expensive process creation and hard IPC |
| Thread vs Process | Threads share memory; each has own PC, registers, stack |
| Context switch | Thread switch is cheap (no page table swap) |
| User threads | Fast but can't use multiple CPUs; one blocks → all block |
| Kernel threads | Slower syscalls but true parallelism |
| pthread API | pthread_create, pthread_join, pthread_exit |
| Race conditions | Shared variables need synchronization (mutex) |

🔗 Connections

| 📚 Lecture | 🔗 Connection |
|---|---|
| L6 | Process Synchronization — how to fix race conditions with mutex locks and semaphores |

The Key Insight

Threads give you concurrency with cheap context switching and easy shared memory. But with great power comes great responsibility — shared mutable state requires synchronization!