Tracing the Linux system calls

Hello there!

Today, I want to talk a bit about Linux system calls and, more specifically, the mechanism the kernel offers for tracing them. In this article, I’ll describe a part of the /sys filesystem that exposes a kernel debug feature. But first, let’s talk about system calls.

Linux system calls

System calls are the main interface between your program and the kernel itself. They allow your program to temporarily switch into kernel mode and be granted the right to perform a specific action which wouldn’t be possible from user mode. System calls are commonly used when it comes to…

  • I/O operations
  • Device management and communication
  • Signal handling
  • IPC requests

One of the most obvious system calls is open. This call allows you to open a file somewhere on your system, create it if necessary, get write permissions on it, and so on. By calling open, your program switches into kernel mode and obtains a descriptor referencing the file. Once you’ve got this descriptor, you can use other system calls which rely on it, such as read, write, close, and so on.

System calls are documented in section 2 of the man pages. If you want a list of all documented system calls, you may use the apropos utility, restricting its search to section 2.

However, do not mix up system calls with the library functions (man section 3) which rely on them. For instance, printf is not a system call, but it relies on the write system call. The same can be said of scanf, which relies on read. In a way, many library functions are nothing but wrappers written around system calls.

Monitoring system calls

The mechanism

Since system calls are requests made to the kernel, it is possible to use the kernel’s own interface to trace them. This is done through the /sys filesystem. More specifically:

  • /sys/kernel/debug/tracing handles all system tracing features.
  • /sys/kernel/debug/tracing/events handles events tracing.
  • /sys/kernel/debug/tracing/events/syscalls handles a specific kind of events: system calls.

Now, if you have a look (with root privileges) at the last of these, you’ll find a huge number of directories. It is important to remember that the /sys filesystem is not a typical one: it is virtual. It acts as an interface between you and the system, yet you won’t find it anywhere on your hard drive. In this way, it is quite similar to /proc. Each file reflects a setting or a kernel data structure, and is not an actual file on your drive.

/sys/kernel/debug/tracing/events/syscalls has two types of directories, which you can tell apart by their names:

  1. sys_enter_[syscall], events triggered when entering a system call (in short: when you actually make the call).
  2. sys_exit_[syscall], events triggered when exiting a system call (in short: when the system call returns).

For instance, let’s say that mysupersystemcall is an actual system call known by the kernel. It means that it has two directories in events/syscalls: sys_enter_mysupersystemcall and sys_exit_mysupersystemcall.

An example: the mkdir system call

The mkdir system call is what the mkdir utility (man 1 mkdir) uses to create a directory at a given location. Now, let’s write a simple program which uses it:

First, let’s activate the trace for sys_enter_mkdir. This is done by writing to the enable file in its directory:

  • Writing 1 will activate the traces.
  • Writing 0 will deactivate them.

Now, execute our program (use_mkdir.c), and have a look at the kernel’s trace:

There, you’ll see some interesting information:

  • The PID of the process which used the system call, along with its name.
  • The time at which the system call was used (in seconds since boot).
  • The parameters passed to the system call, as hexadecimal values (0640 in octal is 1A0 in hexadecimal).

Now, if you use the watch utility to re-read the trace periodically, you’ll be able to monitor calls to mkdir as they happen.

And there you go! No matter what the program is, who runs it, if something calls mkdir anywhere on the system, you’ll know it… as long as you have root privileges.