ServerlessBase Blog

    A comprehensive guide to reading and analyzing Linux system logs for troubleshooting and monitoring

    How to Read and Analyze Linux System Logs

    You've just deployed an application to your server, and something isn't working. The logs are full of cryptic error messages, but you don't know where to start. Reading Linux system logs is a fundamental skill for any system administrator or DevOps engineer. Without understanding what your logs are telling you, you're flying blind.

    This guide will teach you how to navigate the Linux logging ecosystem, identify common issues, and use tools to analyze logs effectively. You'll learn where logs live, how to read them, and how to extract meaningful information from the noise.

    Understanding the Linux Logging Ecosystem

    Modern Linux distributions centralize logging around systemd-journald, which collects messages from the kernel, system services, and applications. journald keeps its own journal (in a binary format) under /var/log/journal/ when persistent storage is enabled, while a traditional syslog daemon such as rsyslog writes plain-text files under /var/log/. Both locations survive reboots, so the most important directories for log analysis are /var/log/ and /var/log/journal/.

    The journalctl command is the interface to the journal. When you run journalctl without arguments, you see every entry the journal holds, from all services and all boots. This can be overwhelming, so it's crucial to understand how to filter and search through this data.

    Different services write to different log files. The kernel logs go to /var/log/kern.log, system messages go to /var/log/syslog or /var/log/messages, and authentication logs go to /var/log/auth.log (Debian/Ubuntu) or /var/log/secure (RHEL/CentOS). Understanding which log file contains which type of information saves time when troubleshooting.
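
    When you land on an unfamiliar server, a quick way to see which logs are active is to list /var/log by modification time:

```shell
# List the most recently written log files first; actively failing
# services usually have freshly updated logs near the top
ls -lt /var/log | head -n 10
```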

    Common Log Files and Their Purposes

    Linux systems generate logs for every major component. Here's a breakdown of the most important log files you'll encounter:

    Log File                             | Purpose                   | Typical Issues Found
    /var/log/syslog or /var/log/messages | General system messages   | Hardware errors, service startup failures
    /var/log/auth.log or /var/log/secure | Authentication events     | Failed login attempts, SSH issues
    /var/log/kern.log                    | Kernel messages           | Driver problems, hardware failures
    /var/log/dmesg                       | Boot-time kernel messages | Boot failures, driver loading issues
    /var/log/nginx/error.log             | Nginx web server errors   | Upstream connection failures, permission errors
    /var/log/apache2/error.log           | Apache web server errors  | Configuration errors, permission issues
    /var/log/mysql/error.log             | MySQL database errors     | Connection timeouts, query failures
    /var/log/docker.log                  | Docker daemon logs        | Container crashes, network issues

    Each log file serves a specific purpose. The syslog or messages file contains general system information, while auth.log tracks authentication events. Kernel logs (kern.log and dmesg) are critical for hardware and driver issues. Application-specific logs like nginx/error.log contain errors from your web server.

    When troubleshooting, start with the most relevant log file based on the symptoms you're seeing. A web application error usually points to the application or web server logs, while a database connection failure points to the database logs.

    Reading Logs with journalctl

    The journalctl command is the primary tool for reading logs managed by journald. It provides powerful filtering options to find exactly what you need.

    To see all logs from the current boot:

    journalctl -b

    To see logs from a specific service:

    journalctl -u nginx

    To see logs from the last hour:

    journalctl --since "1 hour ago"

    To see logs from a specific time range:

    journalctl --since "2026-03-12 00:00:00" --until "2026-03-12 12:00:00"

    To follow logs in real-time (similar to tail -f):

    journalctl -f

    The -u flag filters by systemd unit (service) name, -b shows logs from the current boot, and --since/--until specify time ranges. These flags can be combined; for example, to see all Nginx errors from the last hour:

    journalctl -u nginx --since "1 hour ago" | grep -i error

    This command filters Nginx logs for the last hour and extracts only error messages.

    Analyzing Log Files Directly

    While journalctl is convenient for systemd-managed logs, many applications write to traditional log files. The tail command is your primary tool for reading these files.

    To see the last 10 lines of a log file:

    tail -n 10 /var/log/syslog

    To follow logs in real-time:

    tail -f /var/log/nginx/error.log

    To see the last 50 lines and then follow:

    tail -n 50 -f /var/log/nginx/error.log

    The -n flag specifies the number of lines, and -f enables follow mode. This is essential for monitoring logs while an application runs.

    For searching within log files, grep is indispensable:

    grep "error" /var/log/syslog

    To search case-insensitively:

    grep -i "error" /var/log/syslog

    To count occurrences:

    grep -c "error" /var/log/syslog

    To search for multiple patterns:

    grep -E "error|failed|critical" /var/log/syslog

    The -i flag makes the search case-insensitive, -c counts matches, and -E enables extended regular expressions for complex patterns.
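
    One more grep feature worth knowing is context: -B and -A print lines before and after each match, which often reveals what led up to an error. A quick demonstration on a few sample lines (the temporary file here stands in for a real log):

```shell
# Sample log lines standing in for a real log file
cat > /tmp/sample.log <<'EOF'
Mar 12 10:00:01 server app[100]: starting worker
Mar 12 10:00:02 server app[100]: error: connection refused
Mar 12 10:00:03 server app[100]: retrying in 5s
EOF

# -B 1 prints one line of context before each match, -A 1 one line after
grep -B 1 -A 1 "error" /tmp/sample.log
```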

    Identifying Common Log Patterns

    Certain log patterns indicate specific types of issues. Recognizing these patterns speeds up troubleshooting significantly.

    Failed login attempts typically appear in /var/log/auth.log or /var/log/secure:

    Mar 12 10:23:45 server sshd[12345]: Failed password for invalid user admin from 192.168.1.100 port 45678 ssh2

    Multiple failed attempts indicate brute force attacks. You can count them with:

    grep "Failed password" /var/log/auth.log | wc -l
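
    Beyond counting, it helps to see which addresses the attempts come from. In the message format above, the trailing fields are always "from IP port N ssh2", so the source IP is the fourth field from the end and a short pipeline can rank offenders (shown here on sample lines; point it at /var/log/auth.log on a real system):

```shell
# Sample auth.log lines in the format shown above
cat > /tmp/auth-sample.log <<'EOF'
Mar 12 10:23:45 server sshd[12345]: Failed password for invalid user admin from 192.168.1.100 port 45678 ssh2
Mar 12 10:23:50 server sshd[12346]: Failed password for invalid user root from 192.168.1.100 port 45680 ssh2
Mar 12 10:24:01 server sshd[12347]: Failed password for invalid user test from 10.0.0.5 port 51234 ssh2
EOF

# $(NF-3) is the source IP; rank addresses by number of failed attempts
grep "Failed password" /tmp/auth-sample.log \
    | awk '{print $(NF-3)}' | sort | uniq -c | sort -rn
```

    The most frequent addresses are candidates for a firewall rule or a fail2ban jail.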

    Service startup failures appear in syslog or the service's own log file:

    Mar 12 10:30:00 server systemd[1]: nginx.service: Main process exited, code=exited, status=1/FAILURE

    This indicates Nginx failed to start. Check the configuration with nginx -t to identify syntax errors.

    Database connection errors in MySQL logs:

    Mar 12 10:35:00 server mysqld[67890]: [ERROR] Can't start server: Bind on TCP/IP port: Address already in use

    This means another process is using the MySQL port. Check with ss -tlnp | grep 3306 (or netstat -tlnp | grep 3306 on older systems).

    Disk space issues appear in syslog:

    Mar 12 10:40:00 server kernel: XFS: filesystem full

    This indicates the filesystem is full. Check disk usage with df -h and identify large files with du -sh * | sort -rh.

    Memory exhaustion appears in kern.log:

    Mar 12 10:45:00 server kernel: Out of memory: Kill process 12345 (nginx) score 900 or sacrifice child

    This means the system is running out of memory. Check memory usage with free -h and identify memory-hungry processes with top or htop.

    Practical Log Analysis Walkthrough

    Let's walk through a real-world troubleshooting scenario. Your web application is returning 500 errors, and you need to find the root cause.

    Step 1: Check the web server logs

    tail -n 50 -f /var/log/nginx/error.log

    You see repeated errors:

    2026/03/12 10:00:00 [error] 12345#12346: *1 connect() failed (111: Connection refused) while connecting to upstream, client: 192.168.1.50, server: example.com, request: "GET /api/users HTTP/1.1", upstream: "http://127.0.0.1:3000/api/users", host: "example.com"

    The error shows that Nginx can't connect to the upstream application on port 3000.

    Step 2: Check if the application is running

    systemctl status myapp

    The output shows:

    ● myapp.service - My Application
       Loaded: loaded (/etc/systemd/system/myapp.service; enabled; vendor preset: enabled)
       Active: inactive (dead) since Thu 2026-03-12 09:55:00 UTC; 5 minutes ago

    The application is not running.

    Step 3: Check application logs

    journalctl -u myapp --since "10 minutes ago"

    You see:

    Mar 12 09:55:00 server myapp[12345]: Error: Port 3000 is already in use
    Mar 12 09:55:00 server myapp[12345]: Failed to start server

    The application can't start because port 3000 is already in use.

    Step 4: Identify what's using port 3000

    netstat -tlnp | grep 3000

    Output:

    tcp6       0      0 :::3000                 :::*                    LISTEN      67890/python3

    Another Python process is using port 3000.

    Step 5: Check if it's the same application

    ps aux | grep 67890

    Output:

    user123   67890  0.5  1.2  123456  78901 ?        Ssl  09:50:00 python3 /opt/myapp/app.py

    It's a different instance of the same application.

    Step 6: Stop the conflicting process

    kill 67890

    Step 7: Start the application

    systemctl start myapp

    Step 8: Verify the application is running

    curl http://localhost:3000/health

    Output:

    {"status":"ok"}

    The application is now running, and Nginx can connect to it. Check the Nginx logs again:

    tail -n 5 /var/log/nginx/error.log

    No more errors. The issue is resolved.

    This walkthrough demonstrates the systematic approach to log analysis: identify the error, find the relevant logs, trace the root cause, and implement a fix. Each step builds on the previous one, using logs as your guide.
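
    The port-conflict part of this workflow can be scripted. Here is a minimal sketch using bash's /dev/tcp pseudo-device (assumptions: bash rather than plain sh, and a service expected on port 3000) that checks whether a listener already holds the port before you start a service:

```shell
# Hypothetical helper: succeeds if something is listening on the given
# local TCP port (uses bash's /dev/tcp; not available in plain sh)
port_in_use() {
    (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null && { exec 3>&-; return 0; }
    return 1
}

if port_in_use 3000; then
    echo "port 3000 is already in use - find the owner before starting"
else
    echo "port 3000 is free"
fi
```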

    Log Rotation and Management

    Logs grow quickly and can fill your disk if not managed properly. Linux uses logrotate to manage log rotation, which archives old logs and creates new ones.

    The main configuration file is /etc/logrotate.conf, and system-specific configurations are in /etc/logrotate.d/. A typical logrotate configuration looks like:

    /var/log/nginx/*.log {
        daily
        rotate 14
        compress
        delaycompress
        notifempty
        create 0640 www-data adm
        sharedscripts
        postrotate
            systemctl reload nginx > /dev/null 2>&1 || true
        endscript
    }

    This configuration rotates Nginx logs daily, keeps 14 days of logs, compresses old logs, and reloads Nginx after rotation.

    To check if logrotate is working:

    logrotate -d /etc/logrotate.conf

    The -d flag runs in debug mode without actually rotating logs. If rotation doesn't seem to happen, check the logrotate state file at /var/lib/logrotate/status (or /var/lib/logrotate/logrotate.status on RHEL-family systems), which records when each log was last rotated.

    For manual log rotation:

    logrotate -f /etc/logrotate.conf

    The -f flag forces rotation even if the rotation conditions aren't met.

    Advanced Log Analysis Techniques

    For more sophisticated analysis, consider these techniques:

    Structured logging: Use JSON-formatted logs for easier parsing. Most modern applications support this:

    {"timestamp":"2026-03-12T10:00:00Z","level":"error","service":"api","message":"Database connection failed","error":"Connection timeout"}
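
    Even without a dedicated JSON tool, grep and sed can pull fields out of such lines. A dedicated parser like jq is more robust, but this sketch shows the idea on a sample file:

```shell
# Sample JSON-formatted log entries
cat > /tmp/app.json.log <<'EOF'
{"timestamp":"2026-03-12T10:00:00Z","level":"error","service":"api","message":"Database connection failed","error":"Connection timeout"}
{"timestamp":"2026-03-12T10:00:01Z","level":"info","service":"api","message":"Retrying connection"}
EOF

# Keep only error-level entries and extract their message field
# (naive pattern matching; use jq for nested or escaped JSON)
grep '"level":"error"' /tmp/app.json.log \
    | sed -E 's/.*"message":"([^"]*)".*/\1/'
```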

    Log aggregation: Centralize logs from multiple servers using tools like ELK Stack (Elasticsearch, Logstash, Kibana) or Loki (Grafana's log aggregation system).

    Log analysis tools: Use specialized tools like grep, awk, and sed for pattern matching and text processing:

    # Extract ISO-format timestamps (when syslog is configured for RFC 3339 timestamps)
    grep -oE '[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}' /var/log/syslog

    # Count log entries by originating process (field 5 in the classic syslog format)
    awk '{print $5}' /var/log/syslog | sort | uniq -c | sort -rn

    Log monitoring: Set up alerts for critical log patterns using tools like logwatch or custom scripts:

    # Alert if the auth log contains more than 10 failed login attempts
    # (for a true per-hour window, filter by timestamp first)
    if [ "$(grep -c "Failed password" /var/log/auth.log)" -gt 10 ]; then
        echo "Multiple failed login attempts detected" | mail -s "Security Alert" admin@example.com
    fi

    Best Practices for Log Analysis

    Effective log analysis follows these principles:

    Always check the most recent logs first: Newer logs are more likely to contain the current issue.

    Use appropriate tools for the job: journalctl for systemd logs, tail and grep for traditional logs, specialized tools for structured logs.

    Understand the log format: Each log file has its own format. Read the documentation or examine sample logs to understand the structure.

    Combine multiple log sources: Issues often span multiple components. Check application logs, web server logs, and system logs together.

    Document your findings: Keep a record of common issues and their solutions. This builds institutional knowledge.

    Automate repetitive tasks: Create scripts for common log analysis tasks to save time.

    Monitor log growth: Ensure log rotation is configured and working to prevent disk space issues.

    Use log levels appropriately: Applications should use appropriate log levels (debug, info, warning, error, critical) to help filter logs.
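
    As a small example of automating a repetitive task, a helper function like this (a hypothetical snippet you might keep in ~/.bashrc) bundles the common failure keywords into one command:

```shell
# Hypothetical helper: search log files for common failure keywords,
# case-insensitively, printing line numbers
logerrs() {
    grep -inE "error|failed|critical" "$@"
}

# Demonstrate on a sample file
cat > /tmp/demo.log <<'EOF'
service started
ERROR: disk quota exceeded
all good
EOF
logerrs /tmp/demo.log
```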

    Conclusion

    Reading and analyzing Linux system logs is a critical skill for any system administrator or DevOps engineer. By understanding where logs live, how to read them, and how to identify common patterns, you can troubleshoot issues efficiently and keep your systems running smoothly.

    The key takeaways are: logs are your primary diagnostic tool, journalctl and tail are your main reading tools, grep is your search tool, and systematic analysis—identifying the error, finding the relevant logs, tracing the root cause, and implementing a fix—is your approach.

    Start by familiarizing yourself with the log files in /var/log/ and mastering the journalctl, tail, and grep commands. As you gain experience, you'll develop an intuition for common patterns and faster troubleshooting workflows.

    Platforms like ServerlessBase simplify deployment and monitoring, but understanding your logs remains essential. When issues arise, your ability to read and analyze logs will determine how quickly you can resolve them and keep your applications running smoothly.

    For more information on monitoring and troubleshooting, check out the ServerlessBase documentation on monitoring and applications.
