How Operating Systems Work · Part 3

Privilege Rings: How the CPU Enforces the Boundary

The separation between user space and kernel space is not a software convention — it is enforced by the CPU in hardware. Here is the mechanism that makes it work.

When a program tries to execute a privileged instruction — one that directly accesses hardware, modifies memory mappings, or changes interrupt handling — the CPU does not ignore it. It does not log a warning. It faults immediately, before the instruction completes, and hands control to the kernel.

This is not software policy. It is hardware enforcement. The mechanism is called privilege rings.


What Rings Are

The x86 architecture defines four privilege levels, numbered 0 through 3. These are called rings, from the way they are traditionally drawn: ring 0 at the centre with full access, ring 3 on the outside with the most restrictions, rings 1 and 2 in between.

Each ring defines what the CPU will and will not allow code running at that level to do. The lower the number, the more privileged.

Linux uses only two: ring 0 for the kernel, ring 3 for every user process. Rings 1 and 2 were intended for device drivers and operating-system services, but Linux places its drivers in the kernel at ring 0, so rings 1 and 2 go unused.

The x86-64 architecture, the 64-bit extension used on most modern desktops, laptops, and servers, keeps the same four privilege levels. The rings are a fundamental part of the instruction set architecture, present in hardware regardless of what operating system runs on top.


Ring 0: Kernel Mode

Code running at ring 0 has unrestricted access to the hardware. It can execute any instruction the CPU supports. It can read and write any physical memory address. It can configure the interrupt controller, modify page tables, halt the CPU, read and write I/O ports, and change the privilege level itself.

The kernel runs at ring 0. When a system call crosses from user space into the kernel, the CPU switches to ring 0 before executing any kernel code. When an interrupt fires and the CPU jumps to a kernel handler, it is running at ring 0. The kernel's ability to manage hardware, enforce memory isolation, and control scheduling depends entirely on having ring 0 access that user processes do not.

A bug in ring 0 code is severe precisely because there is no layer below it to catch it. A kernel bug that corrupts memory or executes an invalid instruction is not contained the way a process crash is. It can bring down the whole system in a kernel panic, because nothing below the kernel is in a position to handle the error.


Ring 3: User Mode

Every user process runs at ring 3. At this level the CPU enforces restrictions that do not apply to ring 0 code.

A ring 3 process cannot execute privileged instructions. Attempting to do so causes a general protection fault, a #GP exception raised by the CPU. The kernel's exception handler receives it and sends SIGSEGV to the offending process, which terminates it unless the process has installed a handler for that signal. Either way the instruction never runs: the CPU has already interrupted execution and handed control to the kernel before the instruction completed.
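The fault described above can be observed from an ordinary program. The following is a minimal sketch, assuming Linux on x86 and a GCC-compatible compiler; the function name `signal_from_privileged_insn` is made up for this illustration. It runs the privileged `hlt` instruction in a forked child, so the calling process survives, and reports which signal ended the child.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: executes `hlt` in a forked child.
 * Returns the number of the signal that killed the child, 0 if the child
 * exited normally, or -1 if fork/wait failed or the build is not x86. */
int signal_from_privileged_insn(void)
{
#if defined(__x86_64__) || defined(__i386__)
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: hlt is only legal at CPL 0, so the CPU raises #GP here. */
        __asm__ volatile("hlt");
        _exit(0); /* reached only if the instruction did not fault */
    }

    /* Parent: collect the child's fate. */
    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
#else
    return -1; /* the inline assembly above is x86-specific */
#endif
}
```

Called from a `main` on a typical x86-64 Linux system, the function should return `SIGSEGV`, matching the path described above: #GP from the CPU, then a signal from the kernel.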

A ring 3 process cannot access memory outside its own address space. The CPU's memory management unit checks every memory access against the current process's page tables. An access to an address not mapped in those tables, or mapped with insufficient permissions, causes a page fault. The kernel handles it — either by loading the page if it was validly mapped but not present, or by terminating the process if the access was illegal.

A ring 3 process cannot change its own privilege level directly. There is no instruction a user process can execute to promote itself to ring 0. Interrupts and exceptions also enter ring 0, but only at handlers the kernel has registered. The only path a process can deliberately take from ring 3 to ring 0 is through the mechanism the CPU and kernel define together: the system call.


How the CPU Tracks the Current Ring

The CPU knows which ring it is currently executing in through the Current Privilege Level — CPL. This is a 2-bit field stored in the lower two bits of the CS register (Code Segment register).

When the CPU is executing kernel code, CS holds a value with CPL=0. When executing user code, CS holds a value with CPL=3. The CPU checks the CPL before executing any privileged instruction and before servicing any memory access that requires a privilege check. The check is not a software lookup — it is a hardware gate built into the execution pipeline.

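Ring 3 code can read the CS register, because copying a segment register into a general-purpose register is not a privileged operation, but it cannot change the CPL by writing to it. CS is only loaded through control transfers, and the CPU checks each of those against the current privilege level. You can also observe the effect: run any privileged instruction from user space and the fault is immediate, not deferred.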


The Crossing

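The standard way for a 64-bit ring 3 process to reach ring 0 is the syscall instruction. (Older 32-bit code uses int 0x80 or sysenter, which lead to kernel-controlled entry points in the same way.)

When syscall executes, the CPU performs the transition as a single operation: it saves the address of the next instruction into RCX and the flags register into R11, loads CS with a kernel code selector so that CPL changes from 3 to 0, and jumps to a single fixed entry point in the kernel, which on Linux is entry_SYSCALL_64. The address of this entry point is stored in a model-specific register called LSTAR, written by the kernel at boot time. The syscall instruction does not switch stacks. The kernel's entry code does that itself, loading a kernel stack pointer from per-CPU data (in some kernel versions, a slot in the Task State Segment) before doing any other work.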

The critical constraint is that the CPU always jumps to the same address — the one the kernel placed in LSTAR. A user process cannot choose where in the kernel it enters. It cannot jump to an arbitrary kernel function. It cannot modify LSTAR because writing model-specific registers is a privileged operation requiring ring 0. The kernel controls the entry point; a user process can only trigger the transition, not direct it.

This is why the ring boundary is meaningful even against a malicious process. A process that has been compromised can execute any instruction ring 3 permits — but it cannot reach ring 0 except through the one door the kernel controls.

The return path is the sysret instruction, which reverses the transition: CPL switches back to 3, the instruction pointer is restored from RCX and the flags register from R11, and execution resumes in user space where it left off.
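The round trip can be exercised without going through the C library. The sketch below assumes x86-64 Linux and GCC-style inline assembly; `raw_getpid` is a made-up name. It places the system call number in RAX, as the Linux x86-64 calling convention requires, executes `syscall`, and reads the result back from RAX. RCX and R11 are listed as clobbered because the CPU overwrites them with the return address and the saved flags.

```c
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: asks the kernel for the process ID using the raw
 * `syscall` instruction. Returns the kernel's answer, or -1 when built
 * for anything other than x86-64 Linux. */
long raw_getpid(void)
{
#if defined(__x86_64__) && defined(__linux__)
    long ret;
    __asm__ volatile(
        "syscall"
        : "=a"(ret)                /* result comes back in RAX */
        : "0"((long)SYS_getpid)    /* system call number goes in RAX */
        : "rcx", "r11", "memory"); /* CPU overwrites RCX and R11 */
    return ret;
#else
    return -1;
#endif
}
```

The value returned should equal what `getpid()` reports. Note what the code cannot express: there is no operand for a destination address. The program chooses the system call number, but where the CPU lands in the kernel is decided by LSTAR alone.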


Verify It

You can observe the privilege level indirectly. The CS register value visible in a process's register state encodes the CPL in its lower two bits.

Attach to your own shell process with gdb and read the CS register (on systems that restrict ptrace, you may need to run gdb with sudo):

gdb -p $$

Once attached:

(gdb) info registers cs
cs    0x33    51

The value 0x33 in hexadecimal is 0011 0011 in binary. The lower two bits are 11 — that is 3 in decimal. CPL=3. Your shell is running at ring 3.
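The same decoding can be done in code. This sketch assumes a GCC-compatible compiler; `decode_selector`, `read_cs`, and the struct are names invented for the example. A segment selector packs three fields: the privilege level in bits 0 and 1, a table indicator in bit 2 (0 for the GDT, 1 for the LDT), and a descriptor table index in the remaining bits. Reading CS with `mov` is not a privileged operation, so the second function works from user space on x86.

```c
#include <stdint.h>

/* Fields packed into an x86 segment selector. */
struct selector_fields {
    unsigned index; /* bits 3..15: descriptor table index */
    unsigned table; /* bit 2: 0 = GDT, 1 = LDT */
    unsigned rpl;   /* bits 0..1: privilege level (the CPL, for CS) */
};

/* Hypothetical helper: splits a selector value into its fields. */
struct selector_fields decode_selector(uint16_t sel)
{
    struct selector_fields f;
    f.rpl = sel & 0x3u;
    f.table = (sel >> 2) & 0x1u;
    f.index = sel >> 3;
    return f;
}

/* Hypothetical helper: returns the current CS value, or -1 on non-x86
 * builds where the inline assembly does not apply. */
int read_cs(void)
{
#if defined(__x86_64__) || defined(__i386__)
    uint16_t cs;
    __asm__ volatile("mov %%cs, %0" : "=r"(cs));
    return cs;
#else
    return -1;
#endif
}
```

For the value gdb printed, `decode_selector(0x33)` gives a privilege level of 3, table 0, and index 6. Passing the result of `read_cs()` through the same function from any user process should also give a privilege level of 3.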

If you could attach to a kernel thread and read its CS register, you would see a value ending in 00 — CPL=0. You cannot do this from user space, which is itself a demonstration of the boundary.

Type detach then quit to exit gdb without affecting the shell process.


Why This Design Works

The ring model's strength comes from the fact that the CPU enforces it, not software. A kernel running on a CPU with no privilege levels could still attempt to restrict what user processes do, but a sufficiently determined or buggy process could overwrite the kernel's own memory and subvert the restrictions. The CPU would execute whatever instructions it was given.

With hardware-enforced rings, the CPU itself refuses to execute privileged instructions at ring 3, regardless of what the software says. The kernel does not need to defend against user processes attempting ring 0 operations — the hardware makes those attempts fail before they do anything.

Every mechanism that makes a Linux system secure and stable — process isolation, memory protection, controlled hardware access — depends on this one hardware primitive being reliable. The rest of the kernel is built on the assumption that ring 3 code cannot break out.

