📘 L11: File System Implementations
Lecture Goal
Understand how file systems are implemented at the disk level, including block allocation schemes, free space management, and directory structures.
The Big Picture
A file system bridges the logical view (files and folders) with the physical reality (disk blocks). The key challenge is mapping files to disk blocks efficiently, balancing speed, space utilization, and flexibility. Different allocation schemes (contiguous, linked, FAT, indexed) make different trade-offs, and understanding these is essential for systems programming.
1. Disk Organization
🤔 The Layered View
A disk is viewed differently at each layer:
```mermaid
flowchart TB
    subgraph User["User View"]
        U["Files & Folders"]
    end
    subgraph OS["OS (File System) View"]
        F["Flat array of logical blocks"]
    end
    subgraph HW["Hardware View"]
        H["Physical sectors on tracks/platters"]
    end
    User -->|"abstraction"| OS
    OS -->|"mapping"| HW
    style User fill:#e1f5fe
    style OS fill:#c8e6c9
    style HW fill:#fff3e0
```
The disk is organized as a 1-D array of logical blocks, where the block is the smallest accessible unit (typically 512 bytes to 4 KB).
💾 Disk Layout
```mermaid
flowchart LR
    subgraph Disk["Physical Disk"]
        MBR["MBR<br/>(sector 0)"]
        P1["Partition 1"]
        P2["Partition 2"]
        PN["Partition N"]
    end
    MBR --> P1
    P1 --> P2
    P2 --> PN
    style MBR fill:#ff9999
    style P1 fill:#c8e6c9
    style P2 fill:#c8e6c9
    style PN fill:#c8e6c9
```
Each partition contains:
```mermaid
flowchart LR
    subgraph Partition["Partition Contents"]
        Boot["OS Boot Block"]
        Details["Partition Details"]
        Dir["Directory Structure"]
        Info["Files Info"]
        Data["File Data"]
    end
    style Boot fill:#fff9c4
    style Details fill:#e1f5fe
    style Dir fill:#e8f5e9
    style Info fill:#e8f5e9
    style Data fill:#e3f2fd
```
| Region | Purpose |
|---|---|
| MBR (sector 0) | Master Boot Record; partition table + boot code |
| OS Boot Block | Instructions to boot the OS on this partition |
| Partition Details | Total blocks, number/location of free blocks |
| Directory Structure | Maps file names to their information |
| Files Info | Metadata and block locations for each file |
| File Data | The actual file contents |
📖 Key Terminology
| Term | Definition |
|---|---|
| Logical Block | OS abstraction; smallest addressable unit in the FS |
| Physical Sector | Hardware-level storage unit on disk |
| MBR | Master Boot Record; first sector with partition table |
| Partition | Logical subdivision of a disk with its own FS |
⚠️ Common Pitfalls
Pitfall 1: Confusing logical blocks with physical sectors
Logical blocks are an OS abstraction; physical sectors are hardware reality. The mapping between them is hardware-dependent.
Pitfall 2: MBR is global
The MBR is global to the disk, while each partition has its own independent file system.
Pitfall 3: Internal fragmentation
The last block of a file may be partially filled, wasting space (internal fragmentation).
❓ Mock Exam Questions
Q1: Logical vs Physical
What is the difference between a logical block and a disk sector?
Answer
Answer: Logical blocks are OS abstractions; disk sectors are hardware units
A logical block is how the OS views storage (e.g., 4 KB blocks). A disk sector is the physical hardware unit (e.g., 512 bytes). Multiple sectors may map to one logical block.
Q2: Partition Contents
What information is stored in the Partition Details region?
Answer
Answer: Total blocks, number and location of free blocks
Partition Details contains metadata about the partition itself, not file contents.
2. File Block Allocation Schemes
🎯 The Core Problem
Since a file is logically a collection of blocks, the OS must decide how to allocate disk blocks to files.
2a. Contiguous Allocation
```mermaid
flowchart LR
    subgraph Directory["Directory Entry"]
        Name["file name"]
        Start["start block"]
        Len["length"]
    end
    subgraph Disk["Disk Blocks"]
        B0["Block 0"]
        B1["Block 1"]
        B2["Block 2"]
        B3["Block 3"]
        B4["Block 4"]
    end
    Start --> B0
    Len -->|"2 blocks"| B1
    style B0 fill:#c8e6c9
    style B1 fill:#c8e6c9
```
Every file occupies consecutive disk blocks. Directory entry stores start block and length.
| Pros | Cons |
|---|---|
| Simple implementation | External fragmentation: free space scattered |
| Fast sequential and random access | File size must be declared in advance |
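With contiguous allocation, translating a byte offset into a disk address is pure arithmetic. A minimal Python sketch (the 4 KB block size and function name are illustrative, not from the lecture):

```python
BLOCK_SIZE = 4096  # assumed logical block size (4 KB)

def contiguous_block(start_block: int, offset: int) -> tuple[int, int]:
    """Map a byte offset within a file to (disk block, offset within block)."""
    return start_block + offset // BLOCK_SIZE, offset % BLOCK_SIZE

# File starting at block 10: byte 9000 falls in block 12, at offset 808
print(contiguous_block(10, 9000))  # (12, 808)
```

This O(1) arithmetic is exactly why contiguous allocation gives fast random access.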
2b. Linked List Allocation
```mermaid
flowchart LR
    subgraph Dir["Directory Entry"]
        First["first block"]
        Last["last block"]
    end
    subgraph Chain["Block Chain"]
        B9["Block 9<br/>+ptr→16"] --> B16["Block 16<br/>+ptr→1"]
        B16 --> B1["Block 1<br/>+ptr→10"]
        B1 --> B10["Block 10<br/>+ptr→25"]
        B10 --> B25["Block 25<br/>-1 (END)"]
    end
    First --> B9
    Last --> B25
    style B9 fill:#c8e6c9
    style B16 fill:#c8e6c9
    style B1 fill:#c8e6c9
    style B10 fill:#c8e6c9
    style B25 fill:#c8e6c9
```
Each block stores file data + pointer to next block. Directory stores first and last block numbers.
| Pros | Cons |
|---|---|
| No external fragmentation | Slow random access: must follow N pointers |
| File can grow dynamically | Pointer wastes part of each block |
| | A single corrupted pointer breaks the chain |
2c. FAT (File Allocation Table)
```mermaid
flowchart TB
    subgraph Memory["Memory"]
        FAT["FAT Table<br/>All pointers in RAM"]
    end
    subgraph Disk["Disk"]
        B9["Block 9"]
        B16["Block 16"]
        B1["Block 1"]
        B10["Block 10"]
        B25["Block 25"]
    end
    FAT -->|"9→16→1→10→25→-1"| Disk
    style Memory fill:#fff9c4
    style Disk fill:#e1f5fe
```
All pointers moved to an in-memory table. FAT[i] holds the index of the next block after block i. -1 marks the end.
FAT Example: File "jeep" starts at block 9
```
FAT index | value (next block)
----------|-------------------
    1     | 10
    9     | 16
   10     | 25
   16     |  1
   25     | -1  (end of file)

Block chain: 9 → 16 → 1 → 10 → 25 → END
```
| Pros | Cons |
|---|---|
| Fast random access (traversal in memory) | FAT must track every disk block |
| No external fragmentation | Large memory consumption for big disks |
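Because the whole table sits in RAM, following a file's chain costs no disk I/O. A minimal sketch of the traversal, reusing the "jeep" chain from the example above (the dict representation is illustrative):

```python
END = -1  # sentinel marking the last block of a file

def fat_blocks(fat: dict[int, int], start: int) -> list[int]:
    """Follow the FAT chain from a file's start block, entirely in memory."""
    chain, block = [], start
    while block != END:
        chain.append(block)
        block = fat[block]  # next-block lookup: no disk read needed
    return chain

fat = {9: 16, 16: 1, 1: 10, 10: 25, 25: END}
print(fat_blocks(fat, 9))  # [9, 16, 1, 10, 25]
```

Reading the Nth block of a file then takes one disk read: the chain walk itself is free.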
2d. Indexed Allocation
```mermaid
flowchart LR
    subgraph Dir["Directory Entry"]
        Ptr["→ Index Block"]
    end
    subgraph IndexBlock["Index Block 19"]
        I0["[0]→9"]
        I1["[1]→16"]
        I2["[2]→1"]
        I3["[3]→10"]
        I4["[4]→25"]
        I5["[5] -1"]
    end
    subgraph Data["Data Blocks"]
        D9["Block 9"]
        D16["Block 16"]
        D1["Block 1"]
    end
    Ptr --> IndexBlock
    I0 --> D9
    I1 --> D16
    I2 --> D1
    style IndexBlock fill:#fff9c4
    style Data fill:#c8e6c9
```
Each file has a dedicated index block storing an array of data block addresses.
| Pros | Cons |
|---|---|
| Fast direct access: jump to IndexBlock[N] | Max file size limited by index block capacity |
| Only index blocks of open files in memory | Index block overhead even for small files |
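Both the direct-access property and the size limit fall straight out of the index-block layout. A hedged Python sketch (function names are illustrative):

```python
def max_file_size(block_size: int, ptr_size: int) -> int:
    """Single-level index: (pointers per index block) x block size."""
    return (block_size // ptr_size) * block_size

def nth_block(index_block: list[int], n: int) -> int:
    """Random access is one array lookup; there is no chain to follow."""
    return index_block[n]

print(max_file_size(4096, 4))            # 4194304 bytes = 4 MB
print(nth_block([9, 16, 1, 10, 25], 3))  # 10
```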
2e. Indexed Allocation Variations (Large Files)
```mermaid
flowchart TB
    subgraph Inode["Unix Inode Structure"]
        D["Direct Pointers<br/>(12 entries)"]
        SI["Single Indirect<br/>(1 entry)"]
        DI["Double Indirect<br/>(1 entry)"]
        TI["Triple Indirect<br/>(1 entry)"]
    end
    D -->|"direct"| Data["Data Blocks"]
    SI -->|"index block"| SI_Blocks["Index Blocks"]
    DI -->|"index of indexes"| DI_Blocks["Double Level"]
    TI -->|"3 levels"| TI_Blocks["Triple Level"]
    style D fill:#c8e6c9
    style SI fill:#fff9c4
    style DI fill:#ffccbc
    style TI fill:#ff9999
```
| Scheme | How It Works |
|---|---|
| Linked index blocks | Index blocks form a linked list |
| Multilevel index | Index block points to other index blocks |
| Combined (Unix inode) | Mix of direct, single, double, triple indirect pointers |
Unix inode structure:
- Direct block pointers: point directly to data blocks (fast for small files)
- Single indirect: points to a block of direct pointers
- Double indirect: points to a block of single-indirect pointers
- Triple indirect: one more level deeper
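The maximum file size under the combined scheme can be computed directly. This sketch assumes the classic 12 direct pointers shown in the diagram:

```python
def inode_max_blocks(block_size: int, ptr_size: int, n_direct: int = 12) -> int:
    """Blocks addressable via direct + single + double + triple indirect."""
    ppb = block_size // ptr_size  # pointers per index block
    return n_direct + ppb + ppb**2 + ppb**3

# 512 B blocks, 4-byte pointers: ppb = 128
blocks = inode_max_blocks(512, 4)
print(blocks * 512)  # roughly 1 GB addressable
```

Note how each indirect level multiplies capacity by the pointers-per-block factor, which is why three levels suffice even for very large files.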
📊 Allocation Scheme Comparison
| Property | Contiguous | Linked List | FAT | Indexed |
|---|---|---|---|---|
| Sequential access | ✅ Fast | ✅ OK | ✅ OK | ✅ OK |
| Random access | ✅ Fast | ❌ Slow | ✅ Fast | ✅ Fast |
| External fragmentation | ❌ Yes | ✅ No | ✅ No | ✅ No |
| Memory usage | ✅ Low | ✅ Low | ❌ High | ✅ Low |
| File size limit | ❌ Fixed upfront | ✅ Flexible | ✅ Flexible | ❌ Limited |
📖 Key Terminology
| Term | Definition |
|---|---|
| Contiguous Allocation | File occupies consecutive disk blocks |
| External Fragmentation | Free space scattered across disk |
| FAT | File Allocation Table; in-memory table of block pointers |
| Index Block | Block storing array of data block addresses |
| Inode | Unix data structure with direct/indirect block pointers |
⚠️ Common Pitfalls
Pitfall 1: FAT vs Linked List random access
In a linked list, random access to block N requires N disk reads. In FAT, the traversal happens in memory, so only one disk read is needed for the final block.
Pitfall 2: Indexed allocation file size limit
Max file size = (block size / pointer size) × block size. E.g., a 4 KB block with 4-byte pointers holds 1024 pointers, so a single-level index caps files at ~4 MB.
Pitfall 3: Unix inode efficiency
The direct blocks in Unix inodes let small files (the common case!) be accessed extremely quickly without traversing indirect pointers.
❓ Mock Exam Questions
Q1: Random Access Speed
Why is random access slow in plain linked list allocation but faster with FAT?
Answer
Answer: FAT keeps all pointers in memory
In a linked list, following pointers requires disk reads. In FAT, the chain is traversed entirely in memory; only the final data block needs a disk read.
Q2: Sentinel Value
What is the purpose of the `-1` sentinel value in FAT and indexed allocation?
Answer
Answer: Marks the end of the file's block chain
`-1` indicates that no more blocks follow; the file ends at this block.
Q3: Inode Design
Why does the Unix inode use a combination of direct, single-indirect, double-indirect, and triple-indirect pointers?
Answer
Answer: Optimizes for common case (small files) while supporting large files
Direct pointers give instant access for small files. Indirect pointers scale to handle very large files when needed.
3. Free Space Management
🎯 The Problem
The OS needs to track which disk blocks are free for allocation.
3a. Bitmap (Bit Vector)
```mermaid
flowchart LR
    subgraph Bitmap["Bitmap (32 bits)"]
        B["0 1 0 1 1 1 0 0 1 0 1 1 ..."]
    end
    Occupied["0, 2, 6, 7, 9..."]
    Free["1, 3, 4, 5, 8, 10, 11..."]
    Bitmap -->|"0 = occupied"| Occupied
    Bitmap -->|"1 = free"| Free
    style Bitmap fill:#fff9c4
```
Every disk block is represented by 1 bit: 1 = free, 0 = occupied.
Size calculation: one bit per block, so bitmap size (in bits) = disk size / block size. E.g., a 1 TB disk with 4 KB blocks needs 2^28 bits = 32 MB.
| Pros | Cons |
|---|---|
| Efficient bit operations (AND, OR, scan) | Must be kept in memory for performance |
| Easy to find contiguous free blocks | Large for big disks |
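Allocation against a bitmap is a bit scan plus a bit clear. A minimal sketch (the bit ordering within a byte is an assumption; real file systems fix a convention):

```python
def find_free(bitmap: bytearray) -> int:
    """Return the first free block number (bit = 1 means free), or -1 if full."""
    for i, byte in enumerate(bitmap):
        if byte:  # at least one free bit somewhere in this byte
            for bit in range(8):
                if byte & (1 << bit):
                    return i * 8 + bit
    return -1

def mark_used(bitmap: bytearray, block: int) -> None:
    """Clear the block's bit: 0 = occupied."""
    bitmap[block // 8] &= ~(1 << (block % 8))

bm = bytearray([0b00000100])  # only block 2 free
print(find_free(bm))  # 2
mark_used(bm, 2)
print(find_free(bm))  # -1 (no free blocks left)
```

The per-byte check (`if byte:`) is the kind of cheap word-at-a-time skip that makes bitmap scans fast in practice.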
3b. Linked List of Free Blocks
```mermaid
flowchart LR
    subgraph Memory["Memory"]
        Head["Head Pointer → Block 5"]
    end
    subgraph Disk["Disk Free Blocks"]
        B5["Block 5<br/>[8,12,19,22]<br/>→ Block 30"]
        B30["Block 30<br/>[35,40...]<br/>→ Block 50"]
        B50["Block 50<br/>..."]
    end
    Head --> B5
    B5 --> B30
    B30 --> B50
    style Memory fill:#fff9c4
```
Free blocks are chained. Each free block stores several free block numbers + pointer to next group.
| Pros | Cons |
|---|---|
| Only head pointer needs to be in memory | High overhead if many free block lookups needed |
| Easy to allocate (take from head) | Fetching next group requires disk I/O |
📖 Key Terminology
| Term | Definition |
|---|---|
| Bitmap | Array of bits, one per disk block, tracking free/used status |
| Free Block List | Linked structure tracking free disk blocks |
⚠️ Common Pitfalls
Pitfall 1: Forgetting to update free space
When a file is deleted, you must add its blocks to the free space structure. Skipping this causes leaked (permanently lost) disk blocks.
Pitfall 2: Bitmap persistence
The bitmap must be written back to disk periodically. A crash before this causes free space inconsistency.
Pitfall 3: Linked list I/O cost
Only the first group is in memory. Fetching the next group requires a disk read, which is slow.
❓ Mock Exam Questions
Q1: Bitmap Size
A 1 TB disk uses 4 KB blocks. How large is the bitmap?
Answer
Answer: 32 MB
Total blocks = 2^40 / 2^12 = 2^28 blocks. Bitmap size = 2^28 bits = 2^25 bytes = 32 MB.
Q2: Free Space Update
What operation does the OS perform on the free space list when a file is deleted?
Answer
Answer: Add the file's blocks to the free space structure
All blocks previously used by the file are marked as free in the bitmap or added to the free list.
4. Directory Structure Implementation
🎯 Directory Purpose
A directory has two jobs:
- Track the files it contains (with metadata)
- Map file names to their file information
4a. Linear List
```mermaid
flowchart LR
    subgraph Dir["Directory (Linear List)"]
        E1["file1.txt → info"]
        E2["file2.txt → info"]
        E3["..."]
        E4["fileN.txt → info"]
    end
    Search["Lookup: O(N)"] --> Dir
    style Dir fill:#e1f5fe
```
Directory stored as a sequential list. Search requires a linear scan, O(N).
| Pros | Cons |
|---|---|
| Simple implementation | Slow for large directories |
4b. Hash Table
```mermaid
flowchart TB
    subgraph Hash["Hash Table Directory"]
        S0["Slot 0"]
        S1["Slot 1 → fileA, fileX (chain)"]
        S2["Slot 2 → fileB"]
        S3["Slot 3"]
    end
    Lookup["Hash(filename) → index"] --> Hash
    style Hash fill:#c8e6c9
```
Hash the file name to get an index K, then check HashTable[K] for a matching entry.
| Pros | Cons |
|---|---|
| O(1) average lookup | Fixed maximum size |
| | Depends on hash function quality |
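A chained hash-table directory fits in a few lines. A minimal sketch (the slot count and the `(name, inode)` entry layout are illustrative assumptions):

```python
N_SLOTS = 8  # fixed table size: one of the scheme's limitations

def dir_insert(table: list, name: str, inode: int) -> None:
    """Add an entry, rejecting duplicate names."""
    slot = hash(name) % N_SLOTS
    if any(n == name for n, _ in table[slot]):
        raise FileExistsError(name)
    table[slot].append((name, inode))  # chaining absorbs collisions

def dir_lookup(table: list, name: str) -> int:
    """O(1) average: hash once, then scan only the (short) chain."""
    for n, inode in table[hash(name) % N_SLOTS]:
        if n == name:
            return inode
    raise FileNotFoundError(name)

table = [[] for _ in range(N_SLOTS)]
dir_insert(table, "notes.txt", 42)
print(dir_lookup(table, "notes.txt"))  # 42
```

Chaining at each slot is what handles two filenames hashing to the same index.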
4c. Storing File Information
| Approach | Whatโs in the Entry |
|---|---|
| All-in-one | File name + all metadata + disk block info |
| Pointer-based | File name + pointer to separate structure (e.g., inode) |
Pointer-based (Unix) is more flexible: multiple directory entries can point to the same inode (hard links).
📖 Key Terminology
| Term | Definition |
|---|---|
| Linear List Directory | Sequential list of directory entries |
| Hash Table Directory | Hash-indexed directory for O(1) lookup |
⚠️ Common Pitfalls
Pitfall 1: Directory doesn't contain file data
A directory stores metadata and pointers, not actual file contents.
Pitfall 2: Name duplicate check
When creating a file, the OS must first check for name duplicates in the parent directory.
Pitfall 3: Hash collisions
Two filenames may hash to the same slot. Chaining (linked list at each slot) handles this.
❓ Mock Exam Questions
Q1: Hash vs Linear
Why is a hash table preferred over a linear list for large directories?
Answer
Answer: O(1) average lookup vs O(N) scan
Hash tables provide constant-time lookup regardless of directory size, while linear lists require scanning all entries.
Q2: Pointer-based Entries
What is the advantage of storing only a pointer to file information in the directory entry?
Answer
Answer: Enables hard links and flexible metadata management
Multiple directory entries can point to the same file structure (inode), allowing hard links.
5. File System in Action
📊 Runtime Data Structures
```mermaid
flowchart TB
    subgraph Process["Process P"]
        FD["File Descriptor Table<br/>(per-process)"]
    end
    subgraph OS["OS Kernel"]
        Open["System-Wide Open File Table"]
        Cache["Directory Cache"]
        Buffer["Disk Block Buffer Cache"]
    end
    subgraph Disk["Disk"]
        DirData["Directory Data"]
        FileData["File Data"]
    end
    FD -->|"fd → entry"| Open
    Open -->|"file info"| FileData
    Cache --> DirData
    Buffer --> FileData
    style Process fill:#e1f5fe
    style OS fill:#c8e6c9
    style Disk fill:#fff3e0
```
| Structure | Scope | Purpose |
|---|---|---|
| Per-process FD table | Per process | Maps fd numbers to open file table entries |
| System-wide open file table | Global | Stores file info and current offset for all open files |
| Directory cache | Global | Caches recently accessed directory entries |
| Buffer cache | Global | Caches recently read/written disk blocks |
5a. File Create
```mermaid
flowchart TD
    Start["create(/path/to/parent/F)"] --> Traverse["Traverse path to parent"]
    Traverse --> Check["Search parent for F"]
    Check -->|"Found"| Error["Error: duplicate name"]
    Check -->|"Not found"| FindFree["Find free disk blocks"]
    FindFree --> AddEntry["Add entry to parent directory"]
    AddEntry --> Done["File created"]
    style Error fill:#ff9999
    style Done fill:#c8e6c9
```
5b. File Open
```mermaid
flowchart TD
    Start["open(/path/to/F)"] --> Traverse["Traverse path to locate F"]
    Traverse -->|"Not found"| Error["Error"]
    Traverse -->|"Found"| Load["Load F's info into open file table"]
    Load --> AddFD["Add entry to process FD table"]
    AddFD --> Return["Return fd to process"]
    style Error fill:#ff9999
    style Return fill:#c8e6c9
```
📖 Key Terminology
| Term | Definition |
|---|---|
| File Descriptor Table | Per-process table mapping fd numbers to open files |
| Open File Table | System-wide table tracking all open files |
| Buffer Cache | In-memory cache of disk blocks |
⚠️ Common Pitfalls
Pitfall 1: FD numbers are per-process
Different processes can have the same fd number pointing to different files. An fd is just an index into the process's own table.
Pitfall 2: Open file table is global
Two processes opening the same file get separate entries (independent offsets), unless sharing via `fork()`.
Pitfall 3: Disk vs Memory structures
The inode is on disk. It is read into the open file table at `open()` time. Don't confuse the two.
❓ Mock Exam Questions
Q1: Independent Opens
Two processes both open the same file independently. Do they share a file offset?
Answer
Answer: No; each has an independent offset
Independent `open()` calls create separate entries in the system-wide open file table, each with its own offset.
Q2: FD Scope
What does a file descriptor number refer to?
Answer
Answer: Index into the process's own file descriptor table
FD numbers are meaningful only within a process. Different processes can have the same FD number for different files.
6. Disk I/O Scheduling
⏱️ Disk Access Time
| Stage | Description | Typical Range |
|---|---|---|
| Seek time | Move head to correct track | 2-10 ms |
| Rotational latency | Wait for sector under head | 2-6.25 ms |
| Transfer time | Read/write data | Very fast (µs) |
Seek time and rotational latency dominate, so the OS schedules requests to minimize head movement.
6a. FCFS (First Come First Served)
Requests served in arrival order. Simple but can cause large head movements.
FCFS Example
```
Requests: [13, 14, 2, 18, 17, 21, 15], starting at track 13
Path: 13 → 14 → 2 → 18 → 17 → 21 → 15
Total movement = 1 + 12 + 16 + 1 + 4 + 6 = 40 tracks
```
6b. SSF (Shortest Seek First)
Always serve the request closest to current head position.
SSF Example
```
Requests: [13, 14, 2, 18, 17, 21, 15], starting at track 13
Order: 13 → 14 → 15 → 17 → 18 → 21 → 2
Total movement = 1 + 1 + 2 + 1 + 3 + 19 = 27 tracks
```
| Pros | Cons |
|---|---|
| Minimizes total head movement | Can starve far-away requests |
📖 Key Terminology
| Term | Definition |
|---|---|
| Seek Time | Time to move disk head to correct track |
| Rotational Latency | Time for correct sector to rotate under head |
| FCFS | First Come First Served scheduling |
| SSF | Shortest Seek First scheduling |
⚠️ Common Pitfalls
Pitfall 1: FCFS fairness
FCFS is fair but inefficient: requests are served in arrival order regardless of head position.
Pitfall 2: SSF starvation
SSF can starve requests far from the busy area if nearby requests keep arriving.
Pitfall 3: Transfer time
Transfer time is negligible compared to seek + latency for small reads. Don't overweight it.
❓ Mock Exam Questions
Q1: Seek vs Transfer
Why does seek time dominate transfer time in magnetic disk performance?
Answer
Answer: Mechanical movement is slow; electronic transfer is fast
Moving the disk head (mechanical) takes milliseconds. Transferring data (electronic) takes microseconds, roughly 1000× faster.
Q2: SSF Disadvantage
What is the main disadvantage of SSF compared to FCFS?
Answer
Answer: Starvation of far-away requests
If requests near the current head position keep arriving, far-away requests may never be served.
7. Practice Problems
📝 Problem 1: Indexed Allocation
A disk uses indexed allocation with 512-byte blocks and 4-byte block addresses.
(a) How many pointers fit in one index block?
(b) What is the maximum file size with single-level index?
(c) What is the maximum file size with double-indirect pointer?
Solution
(a) Pointers per index block: 512 B / 4 B = 128 pointers.
(b) Max file size (single-level): 128 × 512 B = 64 KB.
(c) Max file size (double-indirect):
- First level: 128 index blocks
- Each second-level index block: 128 data blocks
- Total: 128 × 128 × 512 B = 8 MB
📝 Problem 2: FAT Traversal
File "alpha" starts at block 3. The FAT contains:
| Index | Value |
|---|---|
| 3 | 7 |
| 7 | 12 |
| 12 | 20 |
| 20 | -1 |
(a) List all blocks belonging to "alpha" in order.
(b) How many disk reads to access the 3rd block with plain linked list?
(c) How many disk reads with FAT (FAT in memory)?
Solution
(a) Block chain: 3 → 7 → 12 → 20
(b) Plain linked list: Must read blocks 3, 7, then 12. Answer: 3 disk reads
(c) FAT in memory: Chain traversal is free. Only need to read block 12. Answer: 1 disk read
📝 Problem 3: Disk Scheduling
Disk head at track 10. Requests: [3, 20, 5, 18, 8, 25, 12]
(a) Calculate total head movement using FCFS.
(b) Calculate total head movement using SSF.
Solution
(a) FCFS: Serve in arrival order.
10 → 3 → 20 → 5 → 18 → 8 → 25 → 12
7 + 17 + 15 + 13 + 10 + 17 + 13 = 92 tracks
Answer: 92 tracks
(b) SSF: Always pick closest.
10 → 12 (2) → 8 (4) → 5 (3) → 3 (2) → 18 (15) → 20 (2) → 25 (5)
Total: 2+4+3+2+15+2+5 = 33 tracks
Answer: 33 tracks
SSF is better: 33 vs 92 tracks.
🔗 Connections
| Lecture | Connection |
|---|---|
| L10 | File system implementation builds on file concepts from L10 |
| L9 | Virtual memory paging concepts apply to file buffer cache |
| L7 | Memory allocation concepts parallel disk block allocation |
The Key Insight
A file system is all about mapping: mapping logical files to physical blocks, mapping names to metadata, mapping free blocks to allocation requests. The genius of the different allocation schemes lies in their trade-offs: contiguous for speed, linked for flexibility, indexed for balance. The Unix inode's multi-level design is particularly elegant, optimizing for the common case (small files) while scaling to handle the uncommon case (huge files). 💾