📁 L10: File System Management

Lecture Goal

Understand how the file system provides persistent storage abstraction, how files are organized and accessed, and the mechanisms for directory structures and file sharing.

The Big Picture

The file system is the interface between users and persistent storage. It abstracts away the complexity of raw disk I/O, providing a logical view of data as files organized in directories. The FS must be self-contained, persistent, and efficient: the medium works across machines, data survives reboots, and wasted space is kept to a minimum.


1. File System: Motivation & Criteria

🤔 The Core Problem

Physical memory (RAM) is volatile: it loses all data when power is cut. We need a way to store information permanently on external storage (e.g., hard disks, SSDs).

A File System (FS) is an abstraction layer that sits between your programs and the raw physical storage hardware. It provides a standardized way to store, retrieve, and manage data persistently.

graph TB
    subgraph "User Space"
        A[Application Programs]
    end

    subgraph "Kernel Space"
        B[File System Layer]
        C[Device Drivers]
    end

    subgraph "Hardware"
        D[HDD / SSD]
        E[RAM - Volatile]
    end

    A -->|"read/write calls"| B
    B -->|"block I/O"| C
    C -->|"commands"| D
    A -.->|"memory access"| E

    style B fill:#e1f5fe
    style D fill:#fff3e0
    style E fill:#ffebee

✅ Three General Criteria

A well-designed file system must satisfy:

| Criterion | Meaning |
| --- | --- |
| Self-Contained | The storage medium carries enough information to fully describe itself; plug it into any compatible machine and it works |
| Persistent | Data survives beyond OS reboots and process termination |
| Efficient | Minimizes wasted space and overhead for bookkeeping; manages free vs. used space well |

Additional FS responsibilities:

  • Abstraction over hardware specifics (you don’t care if it’s HDD or SSD)
  • Resource management (tracking disk space usage)
  • Protection (preventing unauthorized access)
  • Sharing (allowing multiple processes/users to access files)

📚 Key Terminology

| Term | Definition |
| --- | --- |
| File System (FS) | OS component that manages persistent storage |
| Self-Contained | FS metadata is stored on the media itself, not in the OS |
| Persistent Storage | Storage that retains data without power (disk, SSD, tape) |
| Volatile Storage | Storage that loses data without power (RAM) |

⚠️ Common Pitfalls

Pitfall 1: Confusing "persistent" with "backed up"

Persistence means data survives power-off. It does not protect against deletion, corruption, or hardware failure. Backups are separate.

Pitfall 2: Thinking FS info is in the OS

The FS being “self-contained” means the medium itself holds the structural information. This is why a USB drive works on any computer that supports its FS type.


❓ Mock Exam Questions

Q1: FS Criteria

Which of the following is NOT a general criterion of a file system?

Answer

Answer: Real-Time

The three general criteria are Self-Contained, Persistent, and Efficient. File I/O times are variable and generally not guaranteed to be real-time.

Q2: Self-Contained Meaning

What does it mean for a file system to be “self-contained”?

Answer

Answer: The storage medium carries enough metadata to fully describe itself

A self-contained FS can be plugged into any compatible machine and work without needing external configuration.


2. Memory Management vs File Management

📊 The Comparison

These two systems both manage storage, but are fundamentally different:

flowchart LR
    subgraph MM["Memory Management"]
        direction TB
        MM1["RAM (Volatile)"]
        MM2["O(1) Constant Access"]
        MM3["Byte-level Addressing"]
        MM4["Implicit - OS handles"]
    end

    subgraph FM["File System Management"]
        direction TB
        FM1["Disk (Persistent)"]
        FM2["Variable I/O Time"]
        FM3["Sector Addressing"]
        FM4["Explicit - Programmer calls"]
    end

    CPU[CPU] --> MM
    MM -.->|"Power Off"| X[❌ Data Lost]
    FM -->|"Power Off"| OK[✓ Data Saved]

| Property | Memory Management | File System Management |
| --- | --- | --- |
| Underlying Storage | RAM | Disk |
| Access Speed | Constant | Variable disk I/O time |
| Unit of Addressing | Physical memory address (byte-level) | Disk sector |
| Usage | Address space for running processes (implicit) | Non-volatile data (explicit access) |
| Organization | Paging / Segmentation (HW & OS) | Many formats: ext4, FAT32, HFS+ |

Key Formula Insight

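Paging: $\text{Physical address} = \text{Frame number} \times \text{Page size} + \text{Page offset}$

Fixed Records: $\text{Offset} = (N - 1) \times \text{Size}$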


📚 Key Terminology

| Term | Definition |
| --- | --- |
| Implicit Access | OS handles memory management automatically when process runs |
| Explicit Access | Programmer must call open(), read(), write() for file access |
| Sector | Minimum disk I/O unit (typically 512 bytes or 4 KiB) |

⚠️ Common Pitfalls

Pitfall 1: Confusing access models

Memory management is implicit: the OS handles it when a process runs. File access is explicit: your program must call specific functions.

Pitfall 2: Mixing up addressing units

RAM uses byte-level addresses; disks use sectors (typically 512 bytes or 4 KiB). The unit of addressing differs.


3. File: Basic Description & Metadata

📄 What is a File?

A file is a logical unit of information created by a process. It is an Abstract Data Type (ADT): it defines operations while hiding implementation details.

Every file has two components:

graph LR
    subgraph File["File System Object"]
        direction TB
        subgraph Meta["Metadata (inode)"]
            M1["Name: report.pdf"]
            M2["Identifier: inode #1042"]
            M3["Type: Regular"]
            M4["Size: 2.5 MB"]
            M5["Protection: rwx------"]
            M6["Timestamps"]
            M7["Data Pointers"]
        end
        subgraph Data["Actual Data"]
            D1["Byte 0"]
            D2["..."]
            D3["Byte N"]
        end
        Meta --> Data
    end

    style Meta fill:#e8f5e9
    style Data fill:#e3f2fd

📋 Key Metadata Fields

| Field | Description |
| --- | --- |
| Name | Human-readable label (e.g., report.pdf) |
| Identifier | Internal unique ID used by the FS (inode number) |
| Type | e.g., executable, text, directory |
| Size | In bytes, words, or blocks |
| Protection | Read/Write/Execute permissions per user class |
| Time/Date/Owner | Creation time, last modified, owner ID |
| Table of Contents | Info the FS needs to locate actual data on disk |

📝 File Naming Rules (vary by FS)

  • Maximum length of name
  • Case sensitivity (Linux: File.txt ≠ file.txt; Windows: both names refer to the same file by default)
  • Allowed special characters
  • Extension rules (e.g., .txt, .exe)

📚 Key Terminology

| Term | Definition |
| --- | --- |
| File | Logical unit of information; an ADT for persistent storage |
| Metadata | “Data about data”: attributes describing a file |
| Inode | Unix data structure holding file metadata (except name) |
| File Identifier | Unique internal number used by the FS |

⚠️ Common Pitfalls

Pitfall 1: Confusing name with identifier

The name is for humans; the identifier (inode) is for the FS. Changing the name does NOT change the identifier.

Pitfall 2: Metadata is not file data

Metadata describes the file. It’s stored separately from the file’s actual content.

Pitfall 3: Extension changing file type (Windows)

On Windows, changing the file extension tells the OS to treat the file as a different type, even if the actual content hasn’t changed!


❓ Mock Exam Questions

Q1: Inode vs Name

What happens to a file’s inode number when you rename the file?

Answer

Answer: It stays the same

The inode is the internal identifier; renaming only changes the human-readable name in the directory entry.


4. File Types & Protection

📂 File Types

flowchart TB
    subgraph Types["File Types"]
        Reg["📄 Regular Files"]
        Dir["📁 Directories"]
        Spec["⚙️ Special Files"]
    end

    subgraph Regular["Regular Files"]
        ASCII["ASCII (text)<br/>Human-readable"]
        Binary["Binary<br/>Machine-specific"]
    end

    Reg --> Regular

    style Reg fill:#c8e6c9
    style Dir fill:#e1f5fe
    style Spec fill:#fff3e0

| Type | Description | Examples |
| --- | --- | --- |
| Regular files | User data | .txt, .mp3, .jpg, .exe |
| Directories | FS structure files | Folders |
| Special files | Character/block device files | /dev/sda in Unix |

🔍 How the OS Knows the File Type

flowchart LR
    subgraph Extension["File Extension (Windows)"]
        E1["report.docx"]
        E2["→ Word opens it"]
    end

    subgraph Magic["Magic Number (Unix)"]
        M1["📄 File bytes"]
        M2["%PDF (bytes 0-3)"]
        M3["→ PDF Reader"]
        M1 --> M2 --> M3
    end

    style Extension fill:#fff3e0
    style Magic fill:#e8f5e9

  • File Extension (Windows): report.docx → Word document. ⚠️ Renaming the extension changes how the OS treats the file!
  • Magic Number (Unix): Special bytes at the beginning of the file identify its type. Example: PDF files start with %PDF.

🔐 Unix Permission Model

Users are classified into 3 classes:

graph LR
    subgraph Classes["User Classes"]
        direction TB
        O["👤 Owner"]
        G["👥 Group"]
        U["🌍 Universe (Other)"]
    end

    subgraph Bits["Permission Bits (9 bits total)"]
        direction TB
        OB["rwx"]
        GB["rwx"]
        UB["rwx"]
    end

    O -->|"first 3 bits"| OB
    G -->|"next 3 bits"| GB
    U -->|"last 3 bits"| UB

    style O fill:#e8f5e9
    style G fill:#e3f2fd
    style U fill:#fff3e0

| Access Type | Meaning |
| --- | --- |
| Read | Retrieve file content |
| Write | Modify file content |
| Execute | Load into memory and run |
| Append | Add to the end |
| Delete | Remove from FS |
| List | Read file metadata |

ls -l Output

-rw------- 1 axgopala axgopala 14 Mar 13 19:00 test

Owner: Read+Write, Group: no access, Universe: no access

📋 Access Control List (ACL)

  • Minimal ACL: Same as the 9 permission bits above
  • Extended ACL: Adds named individual users or groups (granular control)

📚 Key Terminology

| Term | Definition |
| --- | --- |
| Magic Number | Special bytes at file start identifying type (Unix) |
| File Extension | Suffix indicating file type (Windows) |
| ACL | Access Control List: fine-grained permission specification |

⚠️ Common Pitfalls

Pitfall 1: Magic numbers beat extensions (Unix)

Unix tools such as the file utility identify a file’s type from its leading bytes, not from its name. A shell script named with a .txt suffix is still identified as a script by its #! first line.

Pitfall 2: Execute bit on directories

The execute bit on a directory means “can traverse/enter this directory”, NOT run it as a program!

Pitfall 3: Permission checking order

Permissions are checked in order: Owner → Group → Universe. The first class that matches decides the outcome: if you’re the owner, only the owner bits apply, even if universe has more permissions.


❓ Mock Exam Questions

Q1: Execute on Directory

What does execute permission mean on a directory?

Answer

Answer: Can traverse/enter the directory

Execute on a directory allows you to pass through it, not run it as a program.

Q2: Permission Checking

If owner has r-- and universe has rwx, what can the owner do?

Answer

Answer: Read only

Permission is checked in order (Owner → Group → Universe). As owner, only owner permissions apply.


5. File Data: Structure & Access Methods

📦 File Data Structures

| Structure | Description | Pros/Cons |
| --- | --- | --- |
| Array of bytes | Each byte has unique offset from file start | Simple, flexible; used by Unix/Windows |
| Fixed-length records | Array of equal-size records | Fast random access via formula |
| Variable-length records | Records differ in size | Flexible, but hard to locate specific record |

For fixed-length records, the byte offset of record N (1-indexed) is:

$\text{Offset} = (N - 1) \times \text{Size}$

🎯 File Access Methods

flowchart TB
    subgraph Sequential["Sequential Access 🎞️"]
        direction LR
        S1[Byte 0] --> S2[Byte 1] --> S3[Byte 2] --> S4["..."]
        S4 --> SN[Byte N]
    end

    subgraph Random["Random Access 🎲"]
        direction LR
        R1["Seek to any offset"]
        R2["Read/Write at position"]
        R1 --> R2
    end

    subgraph Direct["Direct Access 📼"]
        direction TB
        D1["Record N"]
        D2["Offset = (N-1) × Size"]
        D3["Seek(Offset) + Read()"]
        D1 --> D2 --> D3
    end

| Method | Description | Use Case |
| --- | --- | --- |
| Sequential | Read in order from beginning | Playing a podcast, processing logs |
| Random | Read at any position via seek | Jumping to any song in playlist |
| Direct | Jump to fixed-length record N | Database accessing employee record #5000 |

📚 Key Terminology

| Term | Definition |
| --- | --- |
| Sequential Access | Must read data in order |
| Random Access | Can read at any position |
| Direct Access | Random access for fixed-length records using formula |
| File Pointer | Current position in file for next read/write |

⚠️ Common Pitfalls

Pitfall 1: Sequential ≠ Slow

Sequential access means you must access data in order. It’s a constraint, not a speed characteristic.

Pitfall 2: Direct access needs fixed records

Direct access requires fixed-length records. Variable-length records can’t use the offset formula.

Pitfall 3: Seek doesn't read/write

The seek() operation only moves the file pointer; it transfers no data. The next read() or write() starts at the new position.


❓ Mock Exam Questions

Q1: Direct Access Requirement

What is required for direct access to work?

Answer

Answer: Fixed-length records

Only with fixed-length records can you compute the byte offset using a formula.

Q2: Seek Operation

What does seek() do?

Answer

Answer: Moves the file pointer to a specified position

seek() does NOT read or write; it only repositions the file pointer for subsequent operations.


6. File Operations as System Calls

🔄 System Call Flow

sequenceDiagram
    participant App as Application
    participant OS as OS Kernel
    participant Disk as Disk Storage

    App->>OS: open("file.txt", mode)
    OS->>Disk: locate file metadata
    Disk-->>OS: file info
    OS-->>App: file descriptor (fd)

    App->>OS: read(fd, buffer, size)
    OS->>Disk: read blocks
    Disk-->>OS: data
    OS-->>App: bytes read

    App->>OS: write(fd, buffer, size)
    OS->>Disk: write blocks
    Disk-->>OS: confirmation
    OS-->>App: bytes written

    App->>OS: close(fd)
    OS->>Disk: flush buffers
    OS-->>App: success

📋 Basic File Operations

| Operation | Description |
| --- | --- |
| Create | Allocates space, creates metadata entry |
| Open | Prepares file for operations; returns file descriptor |
| Read | Reads data from current file pointer position |
| Write | Writes data at current file pointer position |
| Seek (Reposition) | Moves file pointer; no actual I/O |
| Truncate | Removes data from position to end of file |
| Close | Releases file descriptor and data structures |

🗂️ Open-File Table Architecture

flowchart TB
    subgraph PerProcess["Per-Process Open-File Table"]
        subgraph PA["Process A"]
            FD_A0["fd 0"]
            FD_A1["fd 1"]
        end
        subgraph PB["Process B"]
            FD_B0["fd 0"]
            FD_B1["fd 1"]
        end
    end

    subgraph SystemWide["System-Wide Open-File Table"]
        Entry_X["Entry X: File1.abc\nOp Type: READ, Offset: 1234"]
        Entry_Y["Entry Y: File2.def\nOp Type: WRITE, Offset: 5678"]
    end

    FD_A0 --> Entry_X
    FD_A1 --> Entry_Y
    FD_B1 --> Entry_X

    style PerProcess fill:#fff3e0
    style SystemWide fill:#e8f5e9

When a file is opened, the OS maintains:

  • File Pointer: Current read/write position
  • Disk Location: Where actual data lives
  • Open Count: Number of processes with file open

🔗 File Sharing Cases

flowchart TB
    subgraph Case1["Case 1: Independent Opens"]
        direction TB
        P1A["Process A"]
        P1B["Process B"]
        E1A["Entry A\nOffset: 100"]
        E1B["Entry B\nOffset: 50"]
        F1["📄 File X"]

        P1A -->|"open()"| E1A
        P1B -->|"open()"| E1B
        E1A --> F1
        E1B --> F1
    end

    subgraph Case2["Case 2: Shared FD (after fork)"]
        direction TB
        P2A["Parent Process"]
        P2B["Child Process"]
        E2["Entry\nOffset: 100"]
        F2["📄 File Y"]

        P2A -->|"inherit"| E2
        P2B -->|"inherit"| E2
        E2 --> F2
    end

    style Case1 fill:#e8f5e9
    style Case2 fill:#e3f2fd

| Case | Description | Offset Sharing |
| --- | --- | --- |
| Case 1 | Two independent open() calls | Each has own offset |
| Case 2 | Shared FD (e.g., after fork()) | Single shared offset |

📚 Key Terminology

| Term | Definition |
| --- | --- |
| File Descriptor | Integer returned by open(); used for subsequent operations |
| Open-File Table | OS data structure tracking open files |
| lseek() | Unix system call to move file pointer |

⚠️ Common Pitfalls

Pitfall 1: Forgetting to close

Always close() when done. Leaking file descriptors can exhaust the OS limit on open files per process.

Pitfall 2: read() returning less than n

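A short read is not an error. read() may return fewer bytes than requested when the end of the file is near, or when reading from a pipe or terminal. A return value of 0 means end-of-file; -1 signals an error.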

Pitfall 3: write() extends file

write() can extend a file beyond its current size; it is not capped at the existing length.

Pitfall 4: Fork sharing

After fork(), parent and child share file offset. I/O by one moves the pointer for both!


❓ Mock Exam Questions

Q1: Independent Opens

In Unix, P1 and P2 both open the same file independently. P1 reads 100 bytes. What happens to P2’s pointer?

Answer

Answer: P2’s pointer is unaffected

Independent open() calls create separate entries with independent offsets (Case 1).

Q2: Default File Descriptors

What are the default file descriptors 0, 1, and 2?

Answer

Answer: 0 = STDIN, 1 = STDOUT, 2 = STDERR

These are opened by convention for every process.


7. Directory Structures

📁 Directory Purpose

A directory serves two purposes:

  1. Logical grouping of files (user’s perspective)
  2. File tracking: mapping names to metadata (system’s perspective)

🏗️ Directory Structure Types

1. Single-Level Directory

graph LR
    Root["📁 Root Directory"]
    F1[📄 file1]
    F2[📄 file2]
    F3[📄 file3]
    F4[📄 file4]

    Root --> F1
    Root --> F2
    Root --> F3
    Root --> F4

    style Root fill:#ffecb3

All files in one flat directory. Simple but problematic: name conflicts across users.

2. Tree-Structured Directory 🌲

graph TB
    Root["📁 /"]
    Dir1["📁 dir1"]
    Dir2["📁 dir2"]
    F1[📄 file1]
    F2[📄 file2]
    F3[📄 file3]

    Root --> Dir1
    Root --> Dir2
    Dir1 --> F1
    Dir1 --> F2
    Dir2 --> F3

    style Root fill:#c8e6c9
    style Dir1 fill:#e1f5fe
    style Dir2 fill:#e1f5fe

Directories can contain subdirectories forming a hierarchy.

  • Absolute pathname: Full path from the root: / → dir1 → file1 (written /dir1/file1)
  • Relative pathname: Path from Current Working Directory (CWD)

3. DAG (Directed Acyclic Graph) 🔗

graph TB
    DirA["📁 Dir A"]
    DirB["📁 Dir B"]
    FileF["📄 File F\n(actual data)"]
    LinkA["🔗 Link in A"]
    LinkB["🔗 Link in B"]

    DirA --> LinkA
    DirB --> LinkB
    LinkA --> FileF
    LinkB --> FileF

    style FileF fill:#ffebee
    style LinkA fill:#e8f5e9
    style LinkB fill:#e8f5e9

A file can appear in multiple directories (shared) with only one copy of actual data.

4. General Graph 🔄

graph TB
    DirA["📁 Dir A"]
    DirB["📁 Dir B"]
    DirC["📁 Dir C"]
    FileF["📄 File F"]

    DirA --> DirB
    DirB --> DirC
    DirC -->|"❌ Cycle!"| DirA
    DirA --> FileF

    style DirA fill:#ffcdd2
    style DirB fill:#ffcdd2
    style DirC fill:#ffcdd2

Allows cycles but is not desirable: infinite traversal loops, hard to know when files can be deleted.

flowchart TB
    subgraph HardLink["Hard Link Scenario"]
        direction TB
        HL_DirA["📁 Dir A"]
        HL_DirB["📁 Dir B"]
        HL_Inode["💾 Inode #1042\n(actual data)"]
        HL_EntryA["Entry: F.txt"]
        HL_EntryB["Entry: F.txt"]

        HL_DirA --> HL_EntryA
        HL_DirB --> HL_EntryB
        HL_EntryA -->|"direct pointer"| HL_Inode
        HL_EntryB -->|"direct pointer"| HL_Inode
    end

    subgraph SymLink["Symbolic Link Scenario"]
        direction TB
        SL_DirA["📁 Dir A"]
        SL_DirB["📁 Dir B"]
        SL_File["💾 File F.txt"]
        SL_Link["📄 Link G.txt\ncontains: /A/F.txt"]

        SL_DirA --> SL_File
        SL_DirB --> SL_Link
        SL_Link -.->|"path reference"| SL_File
    end

    style HardLink fill:#e8f5e9
    style SymLink fill:#e3f2fd

First, a bit of background 📝

In most filesystems (like Linux’s ext4), every file has two parts:

  • The inode: the on-disk record holding the file’s metadata (permissions, size, timestamps) and the pointers to its data blocks
  • The filename/directory entry: just a human-readable name that points to an inode

Think of an inode as a locker 🗄️, and the filename as a label stuck on the locker door.


A hard link creates another label on the same locker.

ln original.txt hardlink.txt

  • Both original.txt and hardlink.txt point to the exact same inode
  • The file’s data lives on disk once, but has two names
  • If you delete original.txt, the data is still safe, because hardlink.txt still points to the same inode
  • The inode keeps a reference count (the link count); data is only truly deleted when that count hits 0

🧠 Think of it like two keys to the same house. Losing one key doesn’t destroy the house!

Limitations ⚠️:

  • Cannot cross filesystem boundaries (both names must be on the same disk/partition)
  • Cannot link to directories (to avoid circular loops)


A symbolic link (symlink) is a special file that contains a path string pointing to another file.

ln -s original.txt symlink.txt

  • symlink.txt gets its own inode, but its content is just the string "original.txt"
  • It’s like a road sign 🪧 pointing you somewhere else
  • If you delete original.txt, the symlink becomes a broken/dangling link 💔

🧠 Think of it like a sticky note that says “the file is over there →”. If the file moves, the note is useless!

Advantages ✅:

  • Can cross filesystem boundaries
  • Can point to directories
  • Can point to non-existent files (dangling links)

| Property | Hard Link | Symbolic Link |
| --- | --- | --- |
| What it is | Direct pointer to inode | File containing path to target |
| Works for | Files only | Files AND directories |
| If target deleted | Data persists (other links valid) | Link becomes dangling |
| Cross filesystem? | ❌ No | ✅ Yes |
| Unix command | ln | ln -s |

📚 Key Terminology

| Term | Definition |
| --- | --- |
| Hard Link | Directory entry pointing directly to inode |
| Symbolic Link | Special file containing path to target |
| Dangling Symlink | Symbolic link pointing to non-existent file |
| Link Count | Number of hard links to an inode |

⚠️ Common Pitfalls

Pitfall 1: Hard links can't cross filesystems

Hard links point to inode numbers, which are unique only within a filesystem. Symbolic links store paths, so they work across filesystems.

Pitfall 2: Hard links to directories

Generally not allowed (except . and ..) because they create cycles too easily.

Pitfall 3: Forgetting link count

File data is only deleted when link count reaches 0. With hard links, deleting one link doesn’t delete the data.

Pitfall 4: Counting hard-linked files

When traversing directories, track visited inodes to avoid counting hard-linked files multiple times.


❓ Mock Exam Questions

Q1: Hard Link After Deletion

If a hard-linked file’s original entry is deleted, what happens to the hard link?

Answer

Answer: Hard link still works; data persists

Hard links point directly to the inode. As long as link count > 0, data remains.

Q2: Symbolic Link After Deletion

If a symbolic link’s target is deleted, what happens?

Answer

Answer: Link becomes dangling (broken)

The symlink file still exists but points to a non-existent path.

Q3: Cross-Filesystem Links

Why can’t hard links cross filesystems?

Answer

Answer: Inode numbers are unique only within a filesystem

Different filesystems have independent inode numbering, so inode #1042 on filesystem A is unrelated to #1042 on filesystem B.


8. Practice Problems

📝 Problem 1: Record Offset Calculation

A file uses fixed-length records of size 256 bytes. A program needs to read records 1, 5, and 10 (1-indexed).

(a) Calculate the byte offset for each record.

(b) Write lseek() and read() calls (pseudocode) to read each record.

(c) What problem arises with variable-length records?

Solution

(a) Byte Offsets:

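Using $\text{Offset} = (N - 1) \times \text{Size}$ with $\text{Size} = 256$:

| Record N | Calculation | Offset |
| --- | --- | --- |
| 1 | $(1 - 1) \times 256$ | 0 bytes |
| 5 | $(5 - 1) \times 256$ | 1024 bytes |
| 10 | $(10 - 1) \times 256$ | 2304 bytes |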

(b) Pseudocode:

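#include <fcntl.h>   // open, O_RDONLY
#include <unistd.h>  // lseek, read, close

int main(void) {
    char buf[256];
    int fd = open("myfile.dat", O_RDONLY);
    if (fd < 0) return 1;          // open failed

    lseek(fd, 0, SEEK_SET);        // (1 - 1) * 256
    read(fd, buf, 256);            // Record 1

    lseek(fd, 1024, SEEK_SET);     // (5 - 1) * 256
    read(fd, buf, 256);            // Record 5

    lseek(fd, 2304, SEEK_SET);     // (10 - 1) * 256
    read(fd, buf, 256);            // Record 10

    close(fd);
    return 0;
}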

(c) Variable-Length Problem:

No formula exists to compute byte offset directly. You must sequentially scan through records 1-9 to find where record 10 begins. Direct access is impossible without additional indexing.


📝 Problem 2: Directory Sharing

  • Alice owns /home/alice/project/data.csv
  • Bob wants to share access from /home/bob/shared/

(a) How would Bob create a hard link? Effect on link count?

(b) Alice deletes her file. What happens to Bob’s hard link?

(c) Instead, Bob creates a symbolic link. Alice deletes her file. What happens?

(d) Why can’t Bob create a hard link if Alice’s home is on a different filesystem?

Solution

(a) Hard Link:

ln /home/alice/project/data.csv /home/bob/shared/data.csv

Link count increments from 1 to 2.

(b) Alice Deletes (Hard Link):

  • Alice’s directory entry removed
  • Link count decrements to 1
  • File data persists; Bob’s link still works
  • Data only deleted when link count = 0

(c) Symbolic Link Scenario:

  • Bob’s symlink is a separate file containing the path /home/alice/project/data.csv
  • When Alice deletes her file, its link count drops to 0, so the inode and data are removed
  • Bob’s symlink becomes a dangling symlink
  • Accessing it returns “No such file or directory”

(d) Different Filesystem:

Hard links point to inode numbers, which are unique only within a filesystem. Different filesystems have independent inode numbering. The OS cannot resolve which inode on which device the link refers to. Symbolic links work because they store full paths that the OS can resolve.


🔗 Connections

| 📚 Lecture | 🔗 Connection |
| --- | --- |
| L6 | File systems use locks/semaphores for concurrent access control |
| L7 | Memory management provides the abstraction that file systems build upon |
| L9 | Virtual memory page replacement algorithms inform file buffer cache policies |

The Key Insight

A file system is the contract between users and persistent storage. It provides a logical view (files in directories) over physical reality (blocks on disk). The genius is making this abstraction self-contained (plug-and-play), persistent (survives reboots), and efficient (minimal overhead). When you understand that a file is just metadata pointing to data, and a directory is just a special file listing names → inodes, the whole system clicks into place. 💾