
Reading Files Line by Line in Bash

Process files safely — one line at a time. The canonical while-read pattern, mapfile, and how to handle edge cases like binary files and missing newlines.

January 22, 2026 · 6 min read
Bash · Files · IO · Scripting

Processing files — config files, logs, CSVs, lists — is one of Bash's most common tasks. Doing it correctly means handling spaces in lines, lines without trailing newlines, and avoiding the performance cost of launching subshells in a tight loop.

The Canonical Pattern

# The correct way to read a file line by line
while IFS= read -r line; do
    echo "Line: $line"
done < input.txt

# Breaking down the idiom:
# IFS=        — prevent leading/trailing whitespace stripping
# read -r     — don't interpret backslash escapes
# < input.txt — file is redirected to while's stdin (not cat | while!)

Don't use `cat file | while read`

cat file | while read line; do ...; done runs the while loop in a subshell (because of the pipe). Any variables you set inside the loop are lost when it ends.

Instead, redirect with < file or use process substitution < <(command) — these keep the loop in the current shell.
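A minimal demo of the difference (using a hypothetical `/tmp/demo.txt` created on the spot):

```shell
#!/usr/bin/env bash
printf 'a\nb\nc\n' > /tmp/demo.txt

# Pipe version: the loop runs in a subshell, so the increment is lost.
count=0
cat /tmp/demo.txt | while IFS= read -r line; do
    count=$((count + 1))
done
echo "pipe:     $count lines"      # still 0

# Redirect version: the loop runs in the current shell.
count=0
while IFS= read -r line; do
    count=$((count + 1))
done < /tmp/demo.txt
echo "redirect: $count lines"      # 3 — the variable survives
```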

Parsing Delimited Fields

# Parse CSV-ish data (simple, no quoted commas)
while IFS=, read -r name age city; do
    echo "Name: $name | Age: $age | City: $city"
done < people.csv

# Parse /etc/passwd (colon-delimited)
while IFS=: read -r user _ uid gid gecos home shell; do
    echo "$user: $shell"
done < /etc/passwd

# Parse key=value config file
while IFS='=' read -r key value; do
    # Skip comments and blank lines
    [[ "$key" =~ ^[[:space:]]*# ]] && continue
    [[ -z "$key" ]] && continue
    # Strip all spaces from key and value (simple, but not a true trim —
    # it also removes spaces inside the value)
    key="${key// /}"
    value="${value// /}"
    echo "Config: $key = $value"
done < app.conf
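The `${var// /}` trick above deletes every space, including interior ones. If you need a real trim that only strips leading and trailing whitespace, one way is extglob parameter expansion — this `trim` helper is a sketch, not part of the config parser above:

```shell
#!/usr/bin/env bash
shopt -s extglob    # enable extended globs for the +( ) patterns

# Strip leading and trailing whitespace, keep interior spaces.
trim() {
    local s="$1"
    s="${s##+([[:space:]])}"   # remove longest run of leading whitespace
    s="${s%%+([[:space:]])}"   # remove longest run of trailing whitespace
    printf '%s' "$s"
}

trim "   hello world  "   # → hello world
```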

mapfile — Read All Lines into Array

# Read entire file into array (one element per line)
mapfile -t lines < /etc/hosts    # -t strips the trailing newline from each line

echo "Total lines: ${#lines[@]}"

# Access by index
echo "Line 0: ${lines[0]}"

# Process
for line in "${lines[@]}"; do
    [[ "$line" == "#"* ]] && continue   # skip comments
    echo "$line"
done

# From command output
mapfile -t pids < <(pgrep python3)
echo "Python PIDs: ${pids[*]}"

Performance — Avoid Subshells in Loops

# Slow — spawns a subshell for each line's awk
while IFS= read -r line; do
    field=$(echo "$line" | awk '{print $1}')   # ← subprocess per line
    echo "$field"
done < bigfile.txt

# Fast — use read with IFS to split fields directly
while read -r field rest; do    # first word → $field, rest → $rest
    echo "$field"
done < bigfile.txt

# Even better — process the entire file with one awk/sed call
awk '{print $1}' bigfile.txt

Edge Cases

# Handle file with no trailing newline (last line still read)
while IFS= read -r line || [[ -n "$line" ]]; do
    # || [[ -n "$line" ]] catches the last line if file lacks trailing newline
    echo "$line"
done < file_without_newline.txt

# Skip empty lines
while IFS= read -r line; do
    [[ -z "$line" ]] && continue
    echo "$line"
done < file.txt

# Limit to first N lines (note: the pipe puts the loop in a subshell,
# so variables set inside won't survive — fine here, since we only print)
head -n 100 bigfile.txt | while IFS= read -r line; do
    process "$line"
done
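If you do need variables from the loop afterwards, feed head through process substitution instead of a pipe — the loop then runs in the current shell (file name is illustrative):

```shell
#!/usr/bin/env bash
seq 1 500 > /tmp/big.txt    # sample input for the demo

count=0
while IFS= read -r line; do
    count=$((count + 1))
done < <(head -n 100 /tmp/big.txt)   # process substitution, not a pipe

echo "Processed $count lines"        # 100 — count survives the loop
```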

Quick Check

Why does `cat file | while IFS= read -r line; do x=1; done; echo $x` print nothing (empty)?

Exercise

Write a script count-errors.sh that:

  1. Takes a log file path as argument
  2. Reads it line by line
  3. Counts lines containing the word "ERROR" (case-insensitive)
  4. Also counts lines containing "WARN"
  5. Prints a summary: "Errors: 5, Warnings: 2"
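One possible solution sketch, using simple substring matching (so "errors" also counts) and lowercase expansion for case-insensitivity; the sample log is created here only so the script has something to run against:

```shell
#!/usr/bin/env bash
# count-errors.sh — one possible solution (sample log for demonstration)
printf 'ERROR disk full\nok\nWARN low memory\nerror again\n' > /tmp/sample.log

file="${1:-/tmp/sample.log}"
errors=0
warnings=0
while IFS= read -r line || [[ -n "$line" ]]; do
    lower="${line,,}"          # lowercase copy (bash 4+) for case-insensitive match
    [[ "$lower" == *error* ]] && errors=$((errors + 1))
    [[ "$lower" == *warn*  ]] && warnings=$((warnings + 1))
done < "$file"

echo "Errors: $errors, Warnings: $warnings"
```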