Linux Performance Troubleshooting: A Top-Down Diagnostic Guide

When your Linux system feels sluggish, diagnose it methodically: start with high-level resource usage, then drill down to the specific bottleneck. This guide follows that top-down approach to identify and fix performance problems efficiently.

Step 1: Check System-Wide Resource Usage

First, identify which resource (CPU, RAM, Disk, Network) is under the most strain.

A. Run `htop` (or `top`) for Live Monitoring

htop  # Install with `sudo apt install htop` if missing

Key columns to watch:
- CPU%: Processes consuming high CPU.
- MEM%: Memory-hungry applications.
- Disk I/O: htop hides its per-process I/O columns by default, so use iotop (Step 2C) to see disk activity; if it is high, continue with Step 3.

B. Check System Load Averages

uptime

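Output example: load average: 1.25, 0.85, 0.60 (1-, 5-, and 15-minute averages)
- Values above the number of CPU cores (check with nproc) indicate congestion, e.g., >4 on a 4-core CPU.
- On Linux, load also counts tasks blocked on disk I/O, so a high load with mostly idle CPUs points to storage rather than compute.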
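To make the comparison with the core count concrete, here is a minimal sketch that divides a load average by the number of cores. The function name `load_per_core` is hypothetical; on a live system the inputs would come from `/proc/loadavg` and `nproc`.

```shell
# Hypothetical helper: print a load average divided by the core count.
# Usage: load_per_core <load> <cores>
load_per_core() {
    awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

# On a live system (not run here):
#   load_per_core "$(cut -d " " -f1 /proc/loadavg)" "$(nproc)"
```

A result that stays above 1.00 means more runnable or blocked tasks than cores, which is the congestion case described above.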

C. Summarize with `vmstat`

vmstat 1  # Updates every second

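Focus on (ignore the first row, which reports averages since boot):
- wa (I/O wait): sustained values above roughly 10% suggest a disk bottleneck.
- si/so (swap in/out): persistently non-zero values mean the system is swapping.
- free: a low value alone is normal because Linux uses spare RAM for cache; check the available column of free -h instead.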
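Watching the columns by eye gets tedious, so the sketch below filters `vmstat` output for the two signals that matter most here: high I/O wait and swap activity. The function name `flag_vmstat` and the 10% threshold are assumptions for illustration, not fixed rules.

```shell
# Hypothetical helper: read `vmstat` output on stdin and report samples
# where I/O wait exceeds 10% or the system is swapping (si/so > 0).
# Columns are located by header name rather than by fixed position.
flag_vmstat() {
    awk '
        $1 == "r" {                      # header row: map names to positions
            for (i = 1; i <= NF; i++) col[$i] = i
            next
        }
        !("wa" in col) || $1 !~ /^[0-9]+$/ { next }   # skip banner rows
        $(col["wa"]) > 10 { print "high iowait: " $(col["wa"]) "%" }
        $(col["si"]) + $(col["so"]) > 0 {
            print "swapping: si=" $(col["si"]) " so=" $(col["so"])
        }
    '
}
```

On a live system you might run `vmstat 1 10 | flag_vmstat`; no output means neither condition occurred during the sample window.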

Step 2: Identify the Culprit Process

Once you know the overloaded resource, pinpoint the problematic process.

A. For CPU Issues

ps aux --sort=-%cpu | head -n 10  # Top 10 CPU hogs

Common offenders:
- gnome-shell, chrome, java.
- Fix: Restart the process, lower its priority with renice, or cap its CPU usage with cpulimit (a separate package).

B. For Memory Issues

ps aux --sort=-%mem | head -n 10  # Top 10 RAM users

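If free -h shows little memory in the available column:

Kill leaking apps or add swap (the chmod keeps the swap file from being readable by other users):

sudo fallocate -l 2G /swapfile && sudo chmod 600 /swapfile && sudo mkswap /swapfile && sudo swapon /swapfile

The swap file is active only until the next reboot unless you also add it to /etc/fstab.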
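To judge memory pressure from a script rather than by reading `free -h`, one option is to compute available memory as a share of total memory. This is a minimal sketch; the function name `mem_available_pct` is hypothetical, and it relies on the `MemTotal` and `MemAvailable` fields of `/proc/meminfo`.

```shell
# Hypothetical helper: print MemAvailable as a whole-number percentage of
# MemTotal. Takes a meminfo-format file (defaults to /proc/meminfo).
mem_available_pct() {
    awk '
        $1 == "MemTotal:"     { total = $2 }
        $1 == "MemAvailable:" { avail = $2 }
        END { if (total > 0) printf "%d\n", avail * 100 / total }
    ' "${1:-/proc/meminfo}"
}
```

Calling `mem_available_pct` with no argument reads the live system; a value that stays in the single digits is a reasonable cue to look for leaking processes or add swap.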

C. For Disk I/O Issues

sudo iotop -o  # Show active I/O processes

High I/O processes:
- systemd-journald (logs), pool-nemo (file manager), updatedb (file indexing).
- Mitigate with:
```
ionice -c 3 -p <PID>  # Set low I/O priority  
```

Step 3: Diagnose Disk-Specific Problems

If vmstat shows high wa (I/O wait) or iotop shows heavy disk activity, investigate further.

A. Check Disk Space

df -h  # Look for >90% usage

If full:

sudo ncdu -x /  # Find large files to delete (-x stays on one filesystem)
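The 90% check can also be scripted. The sketch below reads `df -P` output (the POSIX format, one filesystem per line) and prints the mount points above a threshold. The function name `full_filesystems` is hypothetical, and the parsing assumes mount points without spaces.

```shell
# Hypothetical helper: read `df -P` output on stdin and list filesystems
# whose usage is above a threshold (default 90%).
full_filesystems() {
    awk -v limit="${1:-90}" '
        NR > 1 {
            used = $5
            sub(/%$/, "", used)          # strip the trailing percent sign
            if (used + 0 > limit + 0) print $6 " " $5
        }
    '
}
```

Typical use would be `df -P | full_filesystems`, or `df -P | full_filesystems 80` for an earlier warning.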

B. Test Disk Speed

sudo hdparm -Tt /dev/sdX  # Replace X with your disk (e.g., sda)

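Compare the "buffered disk reads" figure (-t); the "cached reads" figure (-T) measures memory throughput, not the disk.
HDD: Expect ~80–160 MB/s.
SSD: Expect ~300–500 MB/s (SATA) or >2000 MB/s (NVMe).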

C. Check for Fragmentation (ext4)

sudo e4defrag -c /home  # Fragmentation score

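The result is a score, not a percentage: 0–30 needs no action, 31–55 is slightly fragmented, and 56 or higher calls for defragmenting. ext4 rarely reaches that point, and defragmenting brings no benefit on SSDs. If needed, defragment after freeing space:
```
sudo e4defrag -v /home
```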

Step 4: Investigate Logs & Background Services

Persistent slowness may stem from misconfigured services.

A. Check System Logs

sudo journalctl -xe --no-pager | grep -i "error\|fail"

Common fixes:
- Update broken packages: sudo apt --fix-broken install.
- Restart crashed services: sudo systemctl restart <service>.
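When the log search returns many lines, grouping them by the program that logged them shows where to look first. This is a rough sketch that assumes journalctl's default short output format, where the fifth field is the program name followed by an optional `[pid]` and a colon; the function name `count_by_unit` is hypothetical.

```shell
# Hypothetical helper: read journalctl short-format lines on stdin and
# count them per logging program, most frequent first.
count_by_unit() {
    awk '
        NF >= 5 {
            id = $5
            sub(/\[[0-9]+\]:$|:$/, "", id)   # drop "[pid]:" or trailing ":"
            n[id]++
        }
        END { for (id in n) print n[id], id }
    ' | sort -k1,1nr -k2,2
}
```

For example, piping the grep output from the command above into `count_by_unit` gives one line per program with its match count.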

B. Disable Unnecessary Services

systemctl list-units --type=service --state=running

Examples to disable (if unused):
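- unattended-upgrades (auto-updates); disable it only if you install security updates some other way. The --now flag also stops the running service:
```
sudo systemctl disable --now unattended-upgrades
```
- bluetooth, cups (printing).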

Step 5: Hardware Checks

Rule out failing hardware.

A. Run SMART Tests on Disks

sudo smartctl -a /dev/sdX | grep -i "Reallocated\|Pending\|Uncorrectable"

Reallocated, pending, or uncorrectable counts above 0, especially if they keep rising: Back up your data and replace the disk soon.
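For a periodic check, the attribute table can be filtered so that only non-zero counters are shown. The sketch below assumes the ATA attribute table printed by `smartctl -A`, where the raw value is the tenth column; NVMe drives report different fields and are not covered. The function name `smart_warnings` is hypothetical.

```shell
# Hypothetical helper: read `smartctl -A` output on stdin and print the
# watched attributes whose raw value (column 10) is non-zero.
smart_warnings() {
    awk '
        $2 ~ /^(Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable)$/ &&
        $10 + 0 > 0 {
            print $2 "=" $10
        }
    '
}
```

Usage would look like `sudo smartctl -A /dev/sdX | smart_warnings`; empty output means none of the three counters has moved off zero.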

B. Monitor Temperature

sensors  # Install `lm-sensors` if missing

CPU/GPU > 90°C under load: Clean fans and heatsinks; reapply thermal paste if temperatures stay high.
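If lm-sensors is not installed, the kernel's thermal zones can be read directly. This sketch assumes the common sysfs layout in which each `/sys/class/thermal/thermal_zone*/temp` file holds a temperature in millidegrees Celsius; the function name `hot_zones` is hypothetical, and not every machine exposes these files.

```shell
# Hypothetical helper: print thermal zones hotter than a limit in degrees
# Celsius (default 90). The second argument overrides the sysfs directory.
hot_zones() {
    limit="${1:-90}"
    root="${2:-/sys/class/thermal}"
    for f in "$root"/thermal_zone*/temp; do
        [ -r "$f" ] || continue
        milli=$(cat "$f" 2>/dev/null) || continue
        [ -n "$milli" ] || continue
        if [ "$milli" -gt $((limit * 1000)) ]; then
            # Print the zone name and its temperature in whole degrees
            echo "$(basename "$(dirname "$f")") $((milli / 1000))"
        fi
    done
}
```

Running `hot_zones` with no arguments checks the live system against the 90°C limit mentioned above.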

Final Recommendations

Regular Maintenance:
- Clean logs: sudo journalctl --vacuum-size=200M.
- Update packages: sudo apt update && sudo apt upgrade.
Upgrade Hardware:
- Replace HDDs with SSDs.
- Add RAM if swapping is frequent.
Optimize Workloads:
- Use nice/ionice for CPU/I/O priority.
- Schedule heavy tasks (e.g., backups) during off-hours with cron.
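The two workload tips combine naturally: a small wrapper can start any command at the lowest CPU priority and, where supported, in the idle I/O class. This is a sketch only; the function name `run_low_priority` is hypothetical, and the ionice step is best effort.

```shell
# Hypothetical helper: run a command with the lowest CPU priority and,
# if possible, the idle I/O class.
# Usage: run_low_priority <command> [args...]
run_low_priority() {
    sh -c '
        ionice -c 3 -p "$$" >/dev/null 2>&1   # best effort; ignored if unsupported
        exec nice -n 19 "$@"
    ' sh "$@"
}
```

A cron job could then call a script that wraps its heavy work this way, for example `run_low_priority tar -czf /backup/home.tar.gz /home`, where the archive path is purely illustrative.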

By following this structured approach, you can efficiently diagnose and resolve Linux performance issues.
