Network Troubleshooting in Linux: A Beginner’s Step-by-Step Guide
Network issues can be among the most frustrating challenges for Linux users, system administrators, and developers. Whether you’re managing a server in a data center, a virtual machine, or a home lab router, having a systematic approach to troubleshooting is essential. In this beginner-friendly guide, you’ll learn how to identify and resolve network problems using a structured workflow: gather information, form a hypothesis, test and isolate, implement a fix, and verify the result. We’ll cover practical commands, interpretation tips, and checks organized by layer (link, IP, DNS, transport, and firewall) to help you pinpoint where the problem lies.
Important Notes Before You Begin
- Perform testing on non-production systems or during maintenance windows when possible.
- Many commands require elevated privileges; use `sudo` or operate from an admin account.
- Use persistent sessions (e.g., `tmux` or `screen`) to prevent SSH session drops during remote troubleshooting.
Keep these principles in mind: reproduce the issue, change one aspect at a time, and document each step for easier rollback if necessary.
Linux Networking Basics
A quick primer on key networking concepts:
- OSI Layers to Focus On: Link (Ethernet/Wi-Fi), Network (IP/routing), Transport (TCP/UDP), Application (DNS/HTTP/SSH).
- Key Terms:
  - Interface: A network device (eth0, enp3s0, wlan0, docker0).
  - IP Address & Netmask: Identifies the host and subnet (e.g., 192.168.1.10/24).
  - Gateway: The next-hop router for packets destined outside your local subnet.
  - DNS: Translates domain names to IP addresses.
  - ARP: Maps IP addresses to MAC addresses on the local network.
  - DHCP vs. Static: DHCP assigns IP addresses automatically, while static requires manual assignment.
- Common Services That Change Runtime Config: NetworkManager, `systemd-resolved`, `netplan` (on Ubuntu), and `systemd-networkd`. These may overwrite your manual changes.
Note: The ip/iproute2 suite is the modern tool for Linux networking. Prefer this over legacy tools like ifconfig and route. For more details, refer to the man page for ip.
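To make the address, netmask, and gateway terms concrete, here is a minimal sketch of the decision a host makes for every outgoing packet: if the destination shares the local network prefix it is delivered directly, otherwise it goes to the gateway. The function names `ip_to_int` and `same_subnet` are hypothetical helpers written for this illustration, and the sketch assumes plain IPv4 dotted-quad input without leading zeros.

```bash
# ip_to_int ADDR: convert a dotted-quad IPv4 address to a 32-bit integer.
# The body runs in a subshell "( ... )" so the IFS change stays local.
ip_to_int() (
    IFS=.
    set -- $1            # split "192.168.1.10" into four positional parameters
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# same_subnet ADDR1 ADDR2 PREFIX: succeed (exit 0) if both addresses fall
# inside the same /PREFIX network, fail (exit 1) otherwise.
same_subnet() (
    a=$(ip_to_int "$1")
    b=$(ip_to_int "$2")
    # Build the netmask from the prefix length, e.g. 24 -> 255.255.255.0
    mask=$(( (0xFFFFFFFF << (32 - $3)) & 0xFFFFFFFF ))
    [ $(( a & mask )) -eq $(( b & mask )) ]
)
```

For a host at 192.168.1.10/24, `same_subnet 192.168.1.10 192.168.1.200 24` succeeds (local delivery, resolved via ARP), while `same_subnet 192.168.1.10 8.8.8.8 24` fails, which means the packet is handed to the gateway.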
Prepare and Gather Information
Before making changes, establish a baseline and verify your access:
- Confirm permissions (`sudo`/`root`) and whether access is local or remote.
- Accurately record the problem: include timestamps, failed commands, screenshots, and complete error messages.
- Baseline Commands to Save:
# Interfaces and addresses
ip addr show
# Routes
ip route show
# DNS resolver status
resolvectl status || cat /etc/resolv.conf
# Listening sockets
ss -tuln
- If troubleshooting remotely, start a `tmux` session first (on Debian/Ubuntu: `sudo apt install tmux && tmux`).
- Save outputs for future comparison: `ip addr show > /tmp/ip-addr.before.txt`.
Always save logs and screenshots before making any changes.
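The baseline commands above can be collected in one step. The following is a rough sketch, not a standard tool: `snap` and `save_baseline` are hypothetical helper names, and the default output directory under `/tmp` is an assumption. A command that is missing or fails is noted in its output file rather than stopping the snapshot.

```bash
# snap FILE CMD...: run CMD and store stdout+stderr in FILE. If CMD fails
# (or is not installed), record that in FILE instead of aborting.
snap() {
    local file="$1"; shift
    { "$@" || echo "(command failed: $*)"; } > "$file" 2>&1
}

# save_baseline [DIR]: capture interfaces, routes, DNS state, and listening
# sockets into DIR (default: a timestamped directory under /tmp).
save_baseline() {
    local dir="${1:-/tmp/net-baseline-$(date +%Y%m%d-%H%M%S)}"
    mkdir -p "$dir" || return 1
    snap "$dir/ip-addr.txt"  ip addr show
    snap "$dir/ip-route.txt" ip route show
    snap "$dir/dns.txt"      sh -c 'resolvectl status || cat /etc/resolv.conf'
    snap "$dir/sockets.txt"  ss -tuln
    echo "$dir"              # print the location so it can be reused later
}
```

Running `before=$(save_baseline)` ahead of a change and `after=$(save_baseline)` afterwards allows a comparison with `diff -r "$before" "$after"`.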
Essential Commands
Here are the primary tools for troubleshooting, and when to use them:
- `ip` (`ip addr`, `ip link`, `ip route`, `ip neigh`): Inspect interfaces, routes, and ARP. Look for interface status (UP/DOWN), addresses, and `default via` routes.
    # Show interfaces
    ip addr show
    # Show routing table
    ip route
    # Look up ARP neighbors
    ip neigh
- `ss` (`ss -tuln`, `ss -s`) and `netstat` (legacy): Use `ss` on modern systems to show sockets and listening ports:
    # Show TCP/UDP listening ports
    ss -tuln
    # Summary
    ss -s
- `ping`: Test basic reachability and latency. Distinguish DNS problems by pinging both IPs and hostnames:
    ping -c 4 8.8.8.8
    ping -c 4 example.com
- `traceroute` / `tracepath` / `mtr`: Map the path packets take and analyze per-hop latency/loss. Use `mtr` for continuous monitoring:
    traceroute 8.8.8.8
    mtr --report example.com
- `dig` / `nslookup`: Query DNS servers directly:
    # Query Google's DNS
    dig @8.8.8.8 example.com +short
- `arp` / `ip neigh`: Inspect ARP cache to verify IP-MAC mappings and identify duplicates.
- `nmcli` / `resolvectl` / `systemctl`: Check NetworkManager, resolver, and service statuses:
    nmcli device status
    resolvectl status
    systemctl status NetworkManager
- `ethtool`: Diagnose NIC link status, speed, duplex, and offload settings:
    sudo ethtool eth0
- `iptables` / `nft` / `ufw` / `firewall-cmd`: Inspect firewall rules. Modern systems utilize nftables; many distros still support iptables:
    sudo nft list ruleset    # nftables
    sudo iptables -L -n -v   # legacy iptables
| Purpose | Modern Tool | Legacy Tool | Notes |
|---|---|---|---|
| Interfaces & Addresses | ip (ip addr) | ifconfig | Use ip; ifconfig may be missing on minimal systems |
| Routes | ip route | route | ip also handles policy routing (ip rule, ip route show table all) |
| Sockets | ss | netstat | ss is faster and more feature-rich |
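Because some of these tools may be missing on minimal systems, it can help to check what is installed before starting. This is a small sketch using the shell's built-in `command -v`; the function name `check_tools` is hypothetical.

```bash
# check_tools NAME...: report whether each named command is on the PATH.
check_tools() {
    local tool
    for tool in "$@"; do
        if command -v "$tool" > /dev/null 2>&1; then
            printf '%-12s found\n' "$tool"
        else
            printf '%-12s MISSING\n' "$tool"
        fi
    done
}
```

A typical call for the tools in this guide would be `check_tools ip ss ping dig traceroute mtr tcpdump ethtool nft`. Anything reported as MISSING can be installed through your distribution's package manager.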
Step-by-Step Diagnostics by Layer
This section guides you through targeted checks to isolate issues by OSI layer:
1. Link Layer (Physical/NIC)
   - Symptoms: Interface DOWN, NO-CARRIER, or frequent link flaps.
   - Commands:
       ip link show
       sudo ethtool enp3s0
       sudo dmesg | grep -iE 'eth|enp3s0'
   - Checklist: Look for UP/DOWN flags, `NO-CARRIER`, driver errors in `dmesg`, and RX/TX errors.
   - For Wi-Fi, verify SSID, signal strength, and authentication logs.
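2. IP Layer (Addressing, Routes, ARP)
   - Verify that the interface has an IP address in the expected subnet: `ip addr show`.
   - Check the default route: `ip route` should display `default via <gateway>`.
   - Test gateway reachability and inspect the neighbor (ARP) table:
       ping -c 3 <gateway-ip>
       ip neigh show
   - Investigate ARP problems: entries stuck in FAILED or INCOMPLETE mean the neighbor is not answering (possible cabling, switch, or VLAN issues), while a MAC address that keeps changing for the same IP may indicate a duplicate IP.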
3. DNS (Name Resolution vs. Connectivity)
   - Distinguish between DNS and connectivity: if `ping 8.8.8.8` succeeds but `ping example.com` fails, the problem is most likely DNS.
   - Test DNS directly:
       dig @8.8.8.8 example.com
       resolvectl query example.com
   - Confirm configuration in `/etc/resolv.conf` or check systemd-resolved status.
4. Routing and Path Checks
   - Use `ip route get <dest>` to see which route, interface, and source address the kernel selects for a destination.
   - Use `traceroute` or `mtr` to analyze where the path fails:
       ip route get 8.8.8.8
       traceroute 8.8.8.8
5. Transport-Level Checks (Services and Ports)
   - Verify that the service is listening: `ss -tuln`.
   - Test port reachability from a client:
       nc -vz example.com 22
   - Inspect service logs (e.g., SSH: `journalctl -u sshd -b`; the unit is named `ssh` on Debian/Ubuntu).
6. Firewall & Security Blocks
   - Examine firewall rules: `sudo nft list ruleset`, `sudo iptables -L -n -v`, or `sudo ufw status`.
   - Temporarily disable the firewall only if it is safe to do so:
       sudo ufw disable   # Test only; re-enable afterward with: sudo ufw enable
   - Consider security frameworks (AppArmor/SELinux) that may restrict networking.
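The layered checks can be partly automated. The sketch below is one possible approach, not a definitive diagnostic: it pings the default gateway, an external IP, and a hostname, then reports the first layer that looks broken. The helper names `default_gateway`, `classify_reachability`, and `quick_check` are hypothetical, the `-W` timeout flag assumes the Linux iputils version of `ping`, and a gateway that ignores ICMP will produce a false alarm.

```bash
# default_gateway: read `ip route` output on stdin and print the first
# default gateway address found.
default_gateway() {
    awk '$1 == "default" && $2 == "via" { print $3; exit }'
}

# classify_reachability GW_RC IP_RC NAME_RC: map three ping exit codes
# (0 = success) to the first stage that failed.
classify_reachability() {
    if   [ "$1" -ne 0 ]; then echo "gateway"    # check link, IP address, ARP
    elif [ "$2" -ne 0 ]; then echo "external"   # check routing, NAT, firewall
    elif [ "$3" -ne 0 ]; then echo "dns"        # check resolver configuration
    else                      echo "ok"         # move on to service/port checks
    fi
}

# quick_check: run the three pings and print the classification.
quick_check() {
    local gw gw_rc=0 ip_rc=0 name_rc=0
    gw=$(ip route show default | default_gateway)
    if [ -z "$gw" ]; then
        echo "no-default-route"
        return 1
    fi
    ping -c 2 -W 2 "$gw"       > /dev/null 2>&1 || gw_rc=$?
    ping -c 2 -W 2 8.8.8.8     > /dev/null 2>&1 || ip_rc=$?
    ping -c 2 -W 2 example.com > /dev/null 2>&1 || name_rc=$?
    classify_reachability "$gw_rc" "$ip_rc" "$name_rc"
}
```

Keeping the decision logic in `classify_reachability` separate from the pings makes it easy to adjust the order of checks or reuse it with results gathered another way.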
Packet Capture and Analysis
If checks don’t pinpoint the issue, capture traffic using tcpdump:
- Basic Commands:
# Capture traffic between local host and 1.2.3.4 on interface eth0, limit size and save
sudo tcpdump -i eth0 host 1.2.3.4 -s 96 -w /tmp/capture.pcap
# View a text summary
sudo tcpdump -r /tmp/capture.pcap -nn -tt
- Filters can minimize disk usage: capture only necessary hosts or protocols. Use `-s` to limit the snapshot length.
- Import captures into Wireshark for detailed analysis. Refer to Wireshark docs for guidance.
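Common patterns to look for in a capture:
- Repeated SYNs without SYN-ACK: The remote host is not responding, or a firewall along the path is silently dropping the packets.
- RST from remote: The connection was actively rejected, typically because nothing is listening on that port or a firewall is configured to reject.
- DNS responses with NXDOMAIN or delays: Indicate potential DNS server issues.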
Interpreting captures requires practice; start by focusing on SYNs, SYN-ACKs, RSTs, and ICMP messages.
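As a starting point for spotting those patterns, the text output of `tcpdump` can be tallied by TCP flag. This is a rough sketch: `summarize_tcp_flags` is a hypothetical helper, and it assumes the usual `Flags [S]`, `Flags [S.]`, and `Flags [R]`/`Flags [R.]` notation that tcpdump prints for SYN, SYN-ACK, and RST segments. Packets carrying additional flag letters are not counted.

```bash
# summarize_tcp_flags: read tcpdump text output on stdin and count
# SYN, SYN-ACK, and RST segments.
summarize_tcp_flags() {
    awk '
        /Flags \[S\]/     { syn++ }      # SYN only: connection attempt
        /Flags \[S\.\]/   { synack++ }   # SYN+ACK: remote accepted
        /Flags \[R\.?\]/  { rst++ }      # RST or RST+ACK: rejected/reset
        END { printf "SYN=%d SYN-ACK=%d RST=%d\n", syn, synack, rst }
    '
}
```

Usage would look like `sudo tcpdump -r /tmp/capture.pcap -nn | summarize_tcp_flags`. Several SYNs with zero SYN-ACKs points toward a silent drop, while a non-zero RST count points toward an active rejection.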
Logs and System Services
Logs can often reveal the underlying problem. Key commands include:
# NetworkManager logs this boot
journalctl -u NetworkManager -b
# Kernel and driver messages
journalctl -k | tail -n 200
# General syslog (Debian/Ubuntu)
tail -f /var/log/syslog
# For RedHat/CentOS, check /var/log/messages
Look for DHCP failures, repeated link flaps, driver errors, and messages correlating with the observed outages.
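To narrow a long journal down to those events, a simple filter can help. The sketch below is only a starting point: `filter_net_events` is a hypothetical name, and the search patterns are assumptions, since the exact wording of link and DHCP messages varies by driver and network service.

```bash
# filter_net_events: keep only log lines on stdin that mention DHCP
# activity or link/carrier state changes (case-insensitive).
filter_net_events() {
    grep -Ei 'dhcp|link is (up|down)|carrier'
}
```

For example, `journalctl -b --no-pager | filter_net_events | tail -n 50` shows the most recent matching lines from the current boot. Compare their timestamps with the outage times you recorded.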
Common Scenarios and Fixes
Frequent problems and practical solutions include:
- No Network on Machine (No IP)
  - Symptoms: No IP assigned.
  - Checks & Fixes:
    - Ensure the interface is UP: `ip link show` (bring it up with `sudo ip link set <iface> up` if needed).
    - For DHCP, check the service that manages the interface (for example `systemctl status NetworkManager` or `systemctl status systemd-networkd`) and renew the lease if necessary. Where `dhclient` is installed: `sudo dhclient -v <iface>`.
    - Assign a temporary static IP:
        sudo ip addr add 192.168.1.50/24 dev enp3s0
        sudo ip route add default via 192.168.1.1
- No Internet but LAN Works
  - Symptoms: Can connect to local devices but not external hosts.
  - Fixes:
    - Verify the default route: `ip route`.
    - Test the gateway: ping the gateway first. If that succeeds, ping an external IP (8.8.8.8); if the external ping fails, check the router/NAT.
    - If you’re running a Linux router, check NAT rules: `sudo nft list ruleset`.
- DNS Resolution Failures
  - Symptoms: IP pings succeed, but hostnames do not resolve.
  - Fixes:
    - Query a public DNS server: `dig @8.8.8.8 example.com`.
    - If the public query works, the local resolver is the likely culprit: restart it with `sudo systemctl restart systemd-resolved` and clear caches with `resolvectl flush-caches`.
- Unable to Reach Remote Server (SSH/HTTP)
  - Symptoms: Connection times out (often a firewall drop or routing problem) or is refused (usually nothing is listening on the port).
  - Fixes:
    - Verify on the server that the service is running and listening: `ss -tuln | grep :22`.
    - Test connectivity from another host within the same network.
    - Check firewall rules on both the server and any intermediate devices.
- Intermittent Connectivity and Packet Loss
  - Symptoms: Random drops or high latency.
  - Fixes:
    - Inspect NIC errors: `ip -s link` for RX/TX errors.
    - Identify duplex mismatches using `ethtool`.
    - Check Wi-Fi signal strength and interference.
    - Use `mtr` to pinpoint loss across hops.
- Slow Network Performance
  - Causes: Saturated link, mismatched MTU, packet loss.
  - Tests:
    - Measure throughput with `iperf3`: `iperf3 -s` on the server and `iperf3 -c <server>` on the client.
    - Identify high retransmissions with `tcpdump`.
    - Verify MTU settings with `ip link show` or adjust using:
        sudo ip link set dev eth0 mtu 1400
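Before lowering the MTU on an interface, one way to test whether a given MTU actually works along a path is to send pings that are not allowed to be fragmented. This is a sketch under stated assumptions: it covers IPv4 only (20-byte IP header without options plus 8-byte ICMP header), relies on the `-M do` option of the Linux iputils `ping`, and uses the hypothetical helper names `icmp_payload_for_mtu` and `probe_mtu`.

```bash
# icmp_payload_for_mtu MTU: largest ICMP echo payload that fits into a
# single IPv4 packet of MTU bytes (MTU - 20 IP header - 8 ICMP header).
icmp_payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# probe_mtu HOST MTU: ping HOST with the "don't fragment" bit set and a
# payload sized to fill exactly one MTU-sized packet.
probe_mtu() {
    ping -c 3 -M do -s "$(icmp_payload_for_mtu "$2")" "$1"
}
```

For instance, `probe_mtu example.com 1500` sends 1472-byte payloads. If that fails with a "message too long" style error or gets no replies while `probe_mtu example.com 1400` succeeds, a smaller MTU somewhere on the path is a plausible cause of the slowdown.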
Advanced Topics
- Containers and Namespaces: Containers add virtual interfaces and networks. Use `ip netns`, `docker network ls`, and `bridge` commands for inspection.
- VPNs / Overlays: Verify tunnel endpoints and routing policies.
- WSL/VM Networking Quirks: If troubleshooting WSL, consult the WSL configuration guide.
- When to Escalate: If issues trace beyond your network, gather evidence and contact your provider.
Also consider dependencies like LDAP for service access by consulting our LDAP integration guide.
Preventive Measures & Best Practices
- Monitoring & Alerting: Implement uptime checks and service latency monitors.
- Change Management: Document changes and use version control for config files.
- Backups: Maintain backups of network configurations.
- Security Hygiene: Enforce least privilege and ensure centralized logging.
Troubleshooting Checklist (Quick Reference)
Keep this checklist handy for rapid troubleshooting:
- Reproduce & record symptoms (timestamps, screenshots).
- Check physical/link status: `ip link`, `ethtool`.
- Verify IP & netmask: `ip addr`.
- Confirm default route & gateway reachability: `ip route`, `ping <gateway>`.
- Test external connectivity: `ping 8.8.8.8`.
- Test DNS functionality: `dig @8.8.8.8 example.com`.
- Check service ports: `ss -tuln`, `nc -vz`.
- Inspect firewall rules: `nft list ruleset` or `iptables -L`.
- Capture packets as needed: `tcpdump -i <iface> host <ip> -w capture.pcap`.
- Review logs: `journalctl -u NetworkManager -b`, `journalctl -k`.
- Document fixes and update runbooks.
(Consider converting this checklist into a printable PDF for operational runbooks.)
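Several checklist items lend themselves to a scripted pass/fail run. The following is a minimal sketch rather than a complete runbook: `run_check` and `quick_checklist` are hypothetical names, the SSH port is just an example service, `getent hosts` is used here as a resolver test because it returns a non-zero status when a name cannot be resolved, and the `-W` flag assumes the Linux iputils `ping`.

```bash
# run_check LABEL CMD...: run CMD quietly and print PASS or FAIL with LABEL.
run_check() {
    local label="$1"; shift
    if "$@" > /dev/null 2>&1; then
        echo "PASS: $label"
    else
        echo "FAIL: $label"
    fi
}

# quick_checklist GATEWAY: walk a few checklist items in order.
quick_checklist() {
    local gateway="$1"
    run_check "default route present"  sh -c 'ip route show default | grep -q .'
    run_check "gateway reachable"      ping -c 2 -W 2 "$gateway"
    run_check "external IP reachable"  ping -c 2 -W 2 8.8.8.8
    run_check "DNS resolves"           getent hosts example.com
    run_check "SSH port listening"     sh -c "ss -tln | grep -q ':22 '"
}
```

Calling `quick_checklist 192.168.1.1` (with your own gateway address) prints one line per check. The first FAIL line is usually the best place to start the deeper, manual investigation.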
Conclusion
Network troubleshooting is a vital skill developed through practice and documentation. Start in a lab environment: set up VMs or a home lab to simulate issues and fixes. For further practice, simulate DNS failures or intentionally cause DHCP outages.
For assistance with documenting incidents or creating post-mortem analyses, refer to our guide on creating technical presentations.