Monitoring processes and resources
General purpose diagnostic programs (memory, CPU, I/O)
top/htop
We can use ps to list the currently running processes, but it does not say much about resource usage or how a process changes over time. We can use top to get more information.
top provides an interactive interface for the information that ps displays. It updates in real time and shows the most active processes, ordered by the CPU time they are using. You can also order by memory usage.
Here I have pressed u to show only the processes associated with my user:
Main commands
Command | Action |
---|---|
u | Show only the processes of a selected user |
M | Sort by memory usage |
P | Sort by CPU usage |
? | Show the help screen listing all keys |
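The per-user view can also be captured without the interactive interface. The sketch below assumes the procps version of top found on most Linux distributions, where -b selects batch mode, -n sets the number of refreshes and -u filters by user; the function name top_snapshot_for_user is only illustrative.

```shell
# Print one non-interactive snapshot of a user's processes.
# $1: user name (optional); defaults to the user running the script.
# Assumes procps top: -b = batch mode, -n 1 = one refresh, -u = filter by user.
top_snapshot_for_user() {
    user=${1:-$(id -un)}
    top -b -n 1 -u "$user"
}
```

Calling top_snapshot_for_user with no argument prints a snapshot for your own user; piping it through head -n 20 keeps just the summary and the busiest processes.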
Understanding the categories
Main/IO
- The Main tab covers all processes. The I/O tab focuses on input/output activity (i.e. reading and writing to disks and other devices)
PRI
- This stands for priority. It reflects the kernel’s current scheduling priority for the process. The higher the value, the less likely the kernel is to schedule the process when other processes are competing for CPU time. The lower the value, the greater the priority this process has over others.
NI
- This stands for nice value. This metric exists in order to allow administrators to nudge or influence the priority of a given process. You cannot directly tell the kernel to do x now instead of y but you can make what are effectively suggestions by manipulating the nice value.
- The kernel adds the nice value to the current priority value for the given process to determine its next time slot. When you increase the nice value of process P you are being “nicer” to the other processes by influencing the priority of P downwards so that the other processes receive greater precedence from the kernel.
- By default, the nice value is 0. Values range from -20 (most favourable to the process) to 19 (least favourable). To reduce the priority of PID 1234, raise its nice value:
$ renice -n 19 -p 1234
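A quick way to see the effect without touching a real process is to renice a throwaway shell and read the value back. This is a sketch that assumes a Linux renice and a nice command that prints the current nice value when run without arguments (GNU coreutils does). Raising a nice value needs no special privileges; lowering it normally requires root.

```shell
# Start a child shell, raise its own nice value to 5, then print the value
# that its children inherit. $$ inside the single quotes is the child's PID.
sh -c 'renice -n 5 -p $$ >/dev/null && nice'
```

On a system where the shell starts at the default nice value of 0, this should print 5.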
VIRT
- The total amount of virtual memory used by the process, including program code, data, shared libraries, pages that have been swapped out, and pages that have been mapped but not used.
RES
- Stands for resident size
- The non-swapped physical memory the process is currently using
SHR
- The size of the process’s shared pages
S
- Status:
  - S for sleeping (idle)
  - R for running
  - D for disk sleep (uninterruptible, usually waiting on I/O)
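The same state letters show up in the STAT column of ps, usually followed by modifier characters such as s or +. As a small sketch, the hypothetical helper describe_state below maps the first letter to the descriptions used above; it only covers the three states listed here.

```shell
# Translate a process state code (as shown by top, htop or ps) into words.
# Only the first character matters; ps may append modifiers such as "s" or "+".
describe_state() {
    case "$1" in
        R*) echo "running" ;;
        S*) echo "sleeping" ;;
        D*) echo "disk sleep" ;;
        *)  echo "other" ;;
    esac
}
```

For example, describe_state "$(ps -o stat= -p $$)" reports the state of the current shell, assuming a ps that supports the stat output field.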
vmstat
vmstat provides similar metrics to htop but tells you more about the memory state and the activity of the kernel, with each sample on a single row.
By default it prints a single line with the averages since boot. You can add a delay parameter (in seconds), which makes it print a new line at that interval so that you can watch memory usage in real time. The first line still shows the averages since boot, e.g.:
$ vmstat 5
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd    free   buff   cache   si   so    bi    bo   in   cs us sy id wa st
 2  0      0 4326768 334228 5050952    0    0     8    19   80   10  4  1 94  0  0
 0  0      0 4365520 334260 5054468    0    0     0   125 2140 3434  4  1 94  0  0
 1  0      0 4382400 334276 5068940    0    0     0    77 2102 3988  3  1 95  0  0
 1  0      0 4434000 334288 5052908    0    0     0    25 2859 4278  6  1 92  0  0
 0  0      0 4391576 334304 5086484    0    0     0   110 2899 6480  8  3 90  0  0
procs
- The number of runnable processes (r) and the number of blocked processes (b)
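memory
- The amount of swap space in use (swpd), idle memory (free), memory used as buffers (buff) and memory used as cache (cache), in KiB by default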
swap
- Distinguishes the amount of memory swapped in (si) from disk and swapped out (so) to disk
io
- Disk actions
  - Amount of data read from hard disk (bi)
  - Amount of data written to hard disk (bo)
system
- The number of interrupts (in) and context switches (cs) per second. A context switch is the kernel switching the CPU from one process to another
cpu
- Percentage of time the CPU spends on different kinds of work:
  - Responding to user tasks (us)
  - Time that it is idle (id)
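Because every sample is one row of whitespace-separated numbers, vmstat output is easy to post-process. The following sketch averages the id column with awk; the function name vmstat_mean_idle is made up for illustration, and it assumes the column layout shown above, where id is the fifteenth field.

```shell
# Read vmstat output on stdin and print the mean of the "id" (idle) column.
# Header lines are skipped because their first field is not a number.
# Assumes the default column layout, in which "id" is field 15.
vmstat_mean_idle() {
    awk '$1 ~ /^[0-9]+$/ { sum += $15; n++ }
         END { if (n > 0) printf "%.1f\n", sum / n }'
}
```

A typical call would be vmstat 5 3 | vmstat_mean_idle, which takes three samples five seconds apart. Keep in mind that the first sample is the average since boot, so it pulls the result towards the long-term figure.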
Files being used by active processes: lsof
lsof stands for list open files. It lists open files and the processes using them. Without options it outputs a huge amount of data, so the best way to use it is to run it against a specific PID. For example, the output below gives me some useful information about which files VS Code is using:
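Restricting lsof to one process is done with the -p option. A minimal sketch, with a hypothetical wrapper name and the current shell as the default target:

```shell
# List the files opened by a single process.
# $1: PID to inspect (optional); defaults to the current shell ($$).
open_files_for() {
    pid=${1:-$$}
    lsof -p "$pid"
}
```

open_files_for 1234 lists the files held open by PID 1234. Inspecting a process owned by another user generally needs root.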
System calls: strace
A system call is a request from a process for a service that only the kernel can provide, for instance reading from a file or allocating memory. We can trace these system calls with strace.
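strace can either start a command and trace it or attach to a running process with -p. The sketch below wraps the first form; -c replaces the call-by-call log with a summary table of counts and times per system call, and -f follows child processes. The function name is an illustrative assumption.

```shell
# Run a command under strace and print a per-system-call summary.
# -c: count calls, errors and time instead of logging each call
# -f: also trace child processes created by the command
syscall_summary() {
    strace -c -f "$@"
}
```

syscall_summary ls /tmp runs ls and then prints the summary on standard error.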
CPU performance
We can use the uptime program to assess overall CPU load in the form of a load average.
Load average is the average number of processes that are ready to run. It is an estimate of the number of processes that are capable of using the CPU at any given time.
uptime gives you three load averages:
$ uptime
11:19:16 up 14 days, 3:53, 1 user, load average: 0.84, 0.57, 0.50
The three numbers are load averages for the past 1 minute, 5 minutes and 15 minutes respectively.
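A load average close to 0 is usually a good sign because it means that your processor isn’t being challenged and you are conserving power. A load average of around 1 means that, on average, one process is ready to use the CPU at any time, which often comes down to a single process using the CPU nearly all the time. On a machine with several cores the load only becomes a concern as it approaches the number of cores. You can identify a busy process with htop, where it will be near the top of the list. (This is often caused by Chrome and Electron-based software.)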
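To judge a load average it helps to relate it to the number of CPU cores, since each core can run one process at a time. The sketch below pulls the 1-minute figure out of a line of uptime output and divides it by a core count. Both function names are illustrative, and the parsing assumes the "load average:" label and a dot as the decimal separator.

```shell
# Print the 1-minute load average found in a line of uptime output (stdin).
load_1min() {
    awk -F'load averages?: ' '{ split($2, parts, /[, ]+/); print parts[1] }'
}

# Divide a load average ($1) by a number of CPU cores ($2).
load_per_core() {
    awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}
```

With the uptime line shown above, load_1min yields 0.84, and on a four-core machine load_per_core 0.84 4 gives 0.21. On a live system the two can be combined as load_per_core "$(uptime | load_1min)" "$(nproc)", assuming nproc from GNU coreutils is available.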
Memory status
We know that processes primarily interact with virtual memory in the form of pages, which are translated to physical memory by the kernel with the help of the MMU. There are several tools which provide windows onto this process.
System page size
We can view the system page size, which is the size in bytes of a single page, the unit in which the kernel manages virtual memory:
$ getconf PAGE_SIZE
4096
This is the usual value on x86 systems; some other architectures can be configured with larger pages.
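The page size is mainly useful as a conversion factor, since some kernel interfaces report memory in pages rather than bytes. A minimal sketch, assuming a hypothetical helper name and a page size that is a multiple of 1024 bytes:

```shell
# Convert a number of pages to KiB.
# $1: page count
# $2: page size in bytes (optional); defaults to the system page size
pages_to_kib() {
    size=${2:-$(getconf PAGE_SIZE)}
    echo $(( $1 * size / 1024 ))
}
```

With 4096-byte pages, pages_to_kib 256 gives 1024.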
free: available physical memory
free displays the total amount of free and used physical and swap memory in the system, as well as the buffers and caches used by the kernel. Values are shown in kibibytes by default.
$ free
            total        used        free      shared  buff/cache   available
Mem:     16099420     5931512     5039344     2046460     5128564     7781904
Swap:     3145724           0     3145724
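The available column is usually the most useful single number, as it estimates how much memory can be given to new programs without swapping. As a sketch, the hypothetical helper below turns it into a percentage of total memory, assuming the column order shown above (available is the seventh field of the Mem: row).

```shell
# Read free output on stdin and print available memory as a percentage of total.
# Assumes the layout shown above: field 2 is "total", field 7 is "available".
mem_available_pct() {
    awk '$1 == "Mem:" { printf "%.1f\n", $7 / $2 * 100 }'
}
```

Used as free | mem_available_pct. For the sample output above the result is 48.3, i.e. just under half of physical memory is still available.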