Readable Streams: Processing Data Without Memory Overload

Have you ever tried to load a massive file into memory, only to watch your application freeze or crash? Or wondered how Netflix streams hours of video without consuming gigabytes of RAM? The answer lies in a powerful concept called readable streams. Let's discover how Node.js handles large amounts of data efficiently, one manageable piece at a time.

Quick Reference

When to use: Processing large files, handling network data, or working with any data source that's too big to load all at once

Basic syntax:

import { createReadStream } from "fs";

const stream = createReadStream("large-file.txt");
stream.on("data", (chunk) => console.log(chunk));
stream.on("end", () => console.log("Done!"));

Common patterns:

  • Reading large files chunk by chunk
  • Processing HTTP request/response data
  • Streaming database query results
  • Handling real-time data from WebSockets

Gotchas:

  • ⚠️ Streams don't load all data at once - you get it piece by piece
  • ⚠️ Always handle the 'error' event to prevent crashes
  • ⚠️ Remember to clean up resources when done

What You Need to Know First

To get the most out of this guide, you should be comfortable with:

  • Node.js basics: Understanding how to import modules and run JavaScript in Node.js
  • Asynchronous programming: Familiarity with callbacks and event listeners (we'll explain streams' event system in detail)
  • File system concepts: Basic understanding of what files are and how they're stored

If you're new to Node.js, we recommend starting with introductory Node.js tutorials first.

What We'll Cover in This Article

By the end of this guide, you'll understand:

  • What readable streams are and why they exist
  • How streams work internally with chunks and buffers
  • The four states a stream can be in
  • How to read data using events and methods
  • When to use streams vs loading everything into memory
  • How to handle errors and clean up resources properly

What We'll Explain Along the Way

Don't worry if you're unfamiliar with these—we'll explain them as we go:

  • Chunks and buffers (with visual examples)
  • Backpressure and flow control
  • Event-driven architecture for streams
  • The highWaterMark option and memory management

Understanding the Problem: Why Streams Exist

Let's start with a story. Imagine you're building an application that needs to process a 5GB log file. Here's what happens with two different approaches:

Approach 1: Loading Everything at Once (The Wrong Way)

import { readFile } from "fs/promises";

// ❌ This will consume 5GB of RAM!
async function processLargeFile() {
  try {
    // Step 1: Load entire file into memory
    const data = await readFile("huge-log.txt", "utf-8");

    // Step 2: Your computer struggles...
    console.log(`Loaded ${data.length} characters`);

    // Step 3: Process the data
    const lines = data.split("\n");

    // Problem: If file is 5GB, you just used 5GB of RAM!
  } catch (error) {
    console.error("Probably ran out of memory:", error);
  }
}

What happens:

  1. Node.js tries to load all 5GB into memory
  2. Your application slows to a crawl
  3. Other parts of your app can't get enough memory
  4. Eventually, the process crashes with "out of memory" error

Approach 2: Using Streams (The Right Way)

import { createReadStream } from "fs";

// ✅ This uses only a small, fixed amount of memory!
function processLargeFile() {
  // Step 1: Create a stream (doesn't load data yet)
  const stream = createReadStream("huge-log.txt", {
    encoding: "utf-8",
    highWaterMark: 64 * 1024, // Read 64KB at a time
  });

  // Step 2: Process data as it arrives, chunk by chunk
  stream.on("data", (chunk) => {
    // Each chunk is at most 64KB
    console.log(`Processing ${chunk.length} characters`);
    // Process this chunk and move on
  });

  // Step 3: Know when we're done
  stream.on("end", () => {
    console.log("Finished! Never used more than 64KB at once.");
  });

  // Step 4: Handle errors gracefully
  stream.on("error", (error) => {
    console.error("Error reading file:", error);
  });
}

What happens:

  1. Node.js reads only 64KB at a time
  2. You process each chunk and discard it
  3. Memory usage stays constant at ~64KB
  4. Your application remains responsive

See the difference? Streams let you handle massive amounts of data with minimal memory. Let's explore how this magic works.

The Conveyor Belt Analogy: Visualizing Streams

Think of a readable stream like a conveyor belt at a factory:

[Data Source] ═══> [Chunk 1] → [Chunk 2] → [Chunk 3] ═══> [Your Code]
    (File)           64KB        64KB        64KB         (Processing)

Conveyor Belt Speed: Controlled by backpressure
                     (slows down if you can't keep up)

Key insights from this analogy:

  1. Packages arrive one at a time - You don't get the whole shipment at once
  2. You control the speed - If you're not ready, the belt slows down
  3. Constant workflow - You're always processing, never waiting for everything to arrive
  4. Efficient use of space - You only need room for one package at a time

Let's see how this translates to actual code.

Creating Your First Readable Stream

Let's build this understanding step by step, starting with the simplest possible example.

Step 1: Import the Module

import { createReadStream } from "fs";
import { Readable } from "stream";

// We'll use these to create and work with streams

What's happening:

  • createReadStream: A built-in function to read files as streams
  • Readable: The base class for all readable streams (useful for understanding properties)

Step 2: Create a Stream

const stream = createReadStream("example.txt", {
encoding: "utf-8", // Read as text, not raw bytes
highWaterMark: 64 * 1024, // Read 64KB chunks
});

// At this point, the stream is created but not reading yet
console.log("Stream created, but not flowing yet");

What's happening behind the scenes:

  1. Node.js opens a file handle to "example.txt"
  2. It allocates a small internal buffer (64KB based on highWaterMark)
  3. The stream is in "paused" state - waiting for you to start reading
  4. No data has been read from disk yet

Step 3: Listen for Data

stream.on("data", (chunk) => {
// This event fires each time a chunk is ready
console.log("Received chunk:");
console.log(` Size: ${chunk.length} characters`);
console.log(` Content preview: ${chunk.slice(0, 50)}...`);

// After this callback finishes, Node.js will read the next chunk
});

What triggers the 'data' event:

  1. You attach a listener to 'data'
  2. Stream automatically switches to "flowing" mode
  3. Node.js starts reading from the file
  4. Each time 64KB is read, 'data' event fires
  5. Process repeats until file is exhausted

Step 4: Know When It's Finished

stream.on("end", () => {
// No more data will arrive
console.log("✅ Finished reading entire file");
console.log("Stream is now closed");
});

Step 5: Handle Errors (Critical!)

stream.on("error", (error) => {
// Something went wrong
console.error("❌ Error occurred:", error.message);

// Common errors:
// - File doesn't exist
// - Permission denied
// - Disk read error
// - File was deleted while reading
});

Why error handling is crucial: Without an error listener, an error will crash your entire application. Always include this!
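
If you prefer promises over raw event listeners, a minimal sketch (not from the steps above) is to combine the 'data' listener with the finished helper from stream/promises, which resolves when the stream is done and rejects if it emits 'error'. The readWithErrorHandling name and "example.txt" file are just placeholders:

import { createReadStream } from "fs";
import { finished } from "stream/promises";

// Sketch: promise-based completion and error handling for a read stream
async function readWithErrorHandling(): Promise<void> {
  const stream = createReadStream("example.txt", { encoding: "utf-8" });

  stream.on("data", (chunk: string) => {
    console.log(`Received ${chunk.length} characters`);
  });

  try {
    // Resolves after the stream finishes; rejects if 'error' is emitted
    await finished(stream);
    console.log("✅ Done without errors");
  } catch (error) {
    console.error("❌ Stream failed:", error);
  }
}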

Complete Working Example: Reading a File with Streams

Here's everything together, with detailed explanations of what happens at each step:

import { createReadStream } from "fs";

/**
 * Reads a file using streams and counts lines
 *
 * Why streams? This works for files of ANY size without loading
 * the entire file into memory.
 */
function countLinesInFile(filePath: string): Promise<number> {
  return new Promise((resolve, reject) => {
    // Track our progress
    let lineCount = 0;
    let partialLine = ""; // Store incomplete lines between chunks

    // Step 1: Create the stream
    const stream = createReadStream(filePath, {
      encoding: "utf-8",
      highWaterMark: 64 * 1024, // 64KB chunks for efficiency
    });

    // Step 2: Process each chunk as it arrives
    stream.on("data", (chunk: string) => {
      // Add any partial line from previous chunk
      const text = partialLine + chunk;

      // Split into lines
      const lines = text.split("\n");

      // Last element might be incomplete (no \n at end)
      partialLine = lines.pop() || "";

      // Count complete lines
      lineCount += lines.length;

      console.log(`Processed chunk: ${lineCount} lines so far`);
    });

    // Step 3: Handle completion
    stream.on("end", () => {
      // Don't forget the last line if it exists
      if (partialLine.length > 0) {
        lineCount++;
      }

      console.log(`✅ Finished! Total lines: ${lineCount}`);
      resolve(lineCount);
    });

    // Step 4: Handle errors
    stream.on("error", (error) => {
      console.error(`❌ Error reading ${filePath}:`, error.message);
      reject(error);
    });
  });
}

// Usage
countLinesInFile("large-log.txt")
  .then((count) => console.log(`File has ${count} lines`))
  .catch((error) => console.error("Failed:", error));

What makes this example special:

  1. Handles incomplete lines - Chunks might split in the middle of a line
  2. Memory efficient - Works on 10GB files using only 64KB memory
  3. Progress tracking - Shows you what's happening in real-time
  4. Proper cleanup - The promise settles once the stream ends or errors, and the stream then releases its resources
  5. Error resilient - Won't crash on missing files

Stream States: The Four Stages of a Stream's Life

Every readable stream goes through different states during its lifetime. Understanding these states helps you debug issues and write better stream code.

Let's explore each state with a real example:

import { createReadStream } from "fs";

const stream = createReadStream("data.txt");

// State 1: Initial State (Just Created)
console.log("=== Initial State ===");
console.log(`readableFlowing: ${stream.readableFlowing}`); // null
console.log(`readableEnded: ${stream.readableEnded}`); // false
console.log(`isPaused: ${stream.isPaused()}`); // false

// State 2: Flowing State (Actively Reading)
stream.on("data", (chunk) => {
  // Once we attach this listener, stream starts flowing
  console.log("\n=== Flowing State ===");
  console.log(`readableFlowing: ${stream.readableFlowing}`); // true
  console.log(`readableEnded: ${stream.readableEnded}`); // false
  console.log(`isPaused: ${stream.isPaused()}`); // false
});

// State 3 (Paused) isn't triggered here - see the sketch after this example
// State 4: Ended State (All Data Consumed)
stream.on("end", () => {
  console.log("\n=== Ended State ===");
  console.log(`readableFlowing: ${stream.readableFlowing}`); // true
  console.log(`readableEnded: ${stream.readableEnded}`); // true
  console.log(`isPaused: ${stream.isPaused()}`); // false
});
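
The example above never enters State 3 (Paused). Here's a minimal sketch, assuming the same "data.txt" file and a separate stream variable, that pauses briefly so you can observe that state too:

import { createReadStream } from "fs";

const pausable = createReadStream("data.txt");

pausable.on("data", () => {
  // State 3: Paused (Temporarily Stopped)
  pausable.pause();
  console.log("\n=== Paused State ===");
  console.log(`readableFlowing: ${pausable.readableFlowing}`); // false
  console.log(`readableEnded: ${pausable.readableEnded}`); // false
  console.log(`isPaused: ${pausable.isPaused()}`); // true

  // Resume shortly afterwards so the stream can reach the Ended state
  setTimeout(() => pausable.resume(), 100);
});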

Visual State Diagram

┌─────────────┐
│   Created   │  readableFlowing: null
│  (Initial)  │  readableEnded: false
└──────┬──────┘
       │ on('data') attached
       ▼
┌─────────────┐
│   Flowing   │  readableFlowing: true
│  (Reading)  │  readableEnded: false
└──────┬──────┘
       │ pause() called
       ▼
┌─────────────┐
│   Paused    │  readableFlowing: false
│  (Stopped)  │  readableEnded: false
└──────┬──────┘
       │ resume() called
       ▼
┌─────────────┐
│   Flowing   │  readableFlowing: true
│  (Reading)  │  readableEnded: false
└──────┬──────┘
       │ all data consumed
       ▼
┌─────────────┐
│    Ended    │  readableFlowing: true
│ (Complete)  │  readableEnded: true
└─────────────┘

State Comparison Table

Stream State | readableFlowing | readableEnded | isPaused() | What's Happening
-------------|-----------------|---------------|------------|------------------------------------
Initial      | null            | false         | false      | Stream created, waiting to start
Flowing      | true            | false         | false      | Actively reading and emitting data
Paused       | false           | false         | true       | Reading stopped, can be resumed
Ended        | true            | true          | false      | All data consumed, stream closed

Key insight: Notice that when a stream ends, readableFlowing is still true. This seems counterintuitive at first, but it makes sense - the stream finished flowing through all its data. Think of it like a river that flowed until it reached the ocean.

Controlling Stream Flow: Pause, Resume, and Backpressure

One of the most powerful features of streams is flow control. Let's discover how you can control the pace of data processing.

The Backpressure Problem

Imagine you're reading data faster than you can process it. Here's what happens without flow control:

import { createReadStream } from "fs";

// ❌ Potential problem: Processing too slowly
const stream = createReadStream("huge-file.txt");

stream.on("data", async (chunk) => {
// This takes 1 second per chunk
await slowProcessing(chunk);

// But stream keeps sending chunks every 100ms!
// Data piles up in memory...
// Eventually: Out of memory error
});

async function slowProcessing(data: string): Promise<void> {
// Simulate slow processing
return new Promise((resolve) => setTimeout(resolve, 1000));
}

The problem: Stream reads faster than you can process. Data accumulates in memory until your application crashes.

The Solution: Pause and Resume

import { createReadStream } from "fs";

// ✅ Proper flow control
const stream = createReadStream("huge-file.txt", {
highWaterMark: 64 * 1024,
});

stream.on("data", async (chunk) => {
// Step 1: Pause the stream immediately
stream.pause();
console.log("⏸️ Stream paused - processing current chunk");

// Step 2: Process the chunk (takes time)
await slowProcessing(chunk);
console.log("✅ Chunk processed");

// Step 3: Resume to get next chunk
stream.resume();
console.log("▶️ Stream resumed - ready for next chunk");
});

stream.on("end", () => {
console.log("🏁 All data processed at a safe pace");
});

async function slowProcessing(data: string): Promise<void> {
// Simulate time-consuming processing
return new Promise((resolve) => setTimeout(resolve, 1000));
}

What's happening:

  1. Chunk arrives → immediately pause stream
  2. Process chunk at your own pace (no new data arrives)
  3. When done → resume stream for next chunk
  4. Repeat until file is exhausted

Memory usage: Constant! Only one chunk in memory at a time.
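
There's also a more concise way to get the same behavior. Readable streams are async iterables, so a for await...of loop reads one chunk at a time and won't pull the next chunk until your loop body finishes. A minimal sketch, reusing the hypothetical slowProcessing helper from above:

import { createReadStream } from "fs";

// Sketch: flow control via async iteration instead of pause()/resume()
async function processWithForAwait(): Promise<void> {
  const stream = createReadStream("huge-file.txt", {
    highWaterMark: 64 * 1024,
  });

  for await (const chunk of stream) {
    // No new chunk is read until this await completes
    await slowProcessing(chunk);
  }

  console.log("🏁 All data processed, one chunk at a time");
}

async function slowProcessing(data: string | Buffer): Promise<void> {
  // Simulate time-consuming processing
  return new Promise((resolve) => setTimeout(resolve, 1000));
}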

Understanding the Internal Buffer

Streams have an internal buffer that stores data temporarily. Let's visualize this:

import { createReadStream } from "fs";

const stream = createReadStream("data.txt", {
highWaterMark: 16, // Very small for demonstration (16 bytes)
});

stream.on("data", (chunk) => {
console.log(`Received: ${chunk.length} bytes`);
console.log(`Buffer size: ${stream.readableLength} bytes waiting`);
console.log(`High water mark: ${stream.readableHighWaterMark} bytes\n`);
});

Visual representation:

File on Disk:     [████████████████████████████████████]  (1000 bytes)
                         ↓ reading
Internal Buffer:  [████]  (16 bytes - highWaterMark)
                         ↓ emitting
Your Code:        [data event]  (processing)

When buffer empties: Read more from disk
When buffer full: Pause reading (backpressure)
When you're slow: Data waits in buffer
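
To see the highWaterMark in action, here's a small sketch (assuming some local "data.txt" file; countChunks is a made-up helper) that counts how many 'data' events fire for two buffer sizes - the larger setting should produce fewer, bigger chunks:

import { createReadStream } from "fs";

// Count how many chunks a given highWaterMark produces for the same file
function countChunks(highWaterMark: number): Promise<number> {
  return new Promise((resolve, reject) => {
    let chunks = 0;
    const stream = createReadStream("data.txt", { highWaterMark });

    stream.on("data", () => chunks++);
    stream.on("end", () => resolve(chunks));
    stream.on("error", reject);
  });
}

async function compare(): Promise<void> {
  const small = await countChunks(16); // 16-byte chunks
  const large = await countChunks(64 * 1024); // 64KB chunks
  console.log(`16 B highWaterMark:  ${small} chunks`);
  console.log(`64 KB highWaterMark: ${large} chunks`);
}

compare().catch(console.error);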

Essential Stream Events: Complete Guide

Let's explore all the important events a readable stream can emit:

import { createReadStream } from "fs";

const stream = createReadStream("example.txt", {
  encoding: "utf-8",
  highWaterMark: 64 * 1024,
});

// Event 1: 'data' - Chunk is available
stream.on("data", (chunk: string) => {
  console.log("📦 Data arrived");
  console.log(`  Size: ${chunk.length} characters`);
  // Fires repeatedly until stream ends
});

// Event 2: 'end' - No more data
stream.on("end", () => {
  console.log("🏁 End: All data has been read");
  // Stream is finished, no more data will arrive
  // But stream is not yet closed
});

// Event 3: 'close' - Stream and resources closed
stream.on("close", () => {
  console.log("🔒 Close: Stream and underlying resources closed");
  // File handle is released
  // Memory is freed
  // Stream object can be garbage collected
});

// Event 4: 'error' - Something went wrong
stream.on("error", (error: Error) => {
  console.error("💥 Error:", error.message);
  // CRITICAL: Always handle this!
  // Without error handler, your app will crash
});

// Event 5: 'pause' - Stream was paused
stream.on("pause", () => {
  console.log("⏸️ Pause: Stream stopped flowing");
  // Useful for debugging flow control
});

// Event 6: 'resume' - Stream was resumed
stream.on("resume", () => {
  console.log("▶️ Resume: Stream started flowing again");
});

Event Firing Order

Here's the typical sequence of events:

1. 'data' (first chunk arrives)
2. 'data' × N (multiple chunks)
3. [optional] 'pause' / 'resume' (if you control flow)
4. 'end' (all data consumed)
5. 'close' (resources released)

If error occurs at any point:
X. 'error' (immediately)
X. 'close' (cleanup)
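
If you want to see this order for yourself, a quick sketch (any readable file works; "example.txt" is a placeholder) is to attach a logger to every event:

import { createReadStream } from "fs";

const stream = createReadStream("example.txt", { encoding: "utf-8" });

// Log each lifecycle event as it fires so the order becomes visible
// ('pause' and 'resume' only fire if you actually call those methods)
for (const event of ["data", "pause", "resume", "end", "close", "error"]) {
  stream.on(event, () => console.log(`event: ${event}`));
}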

Essential Stream Methods and Properties

Let's explore the most important methods and properties you'll use:

Key Methods

import { createReadStream } from "fs";

const stream = createReadStream("data.txt");

// (The calls below are shown together for reference - in real code you
// wouldn't apply all of them to the same stream.)

// Method 1: pause() - Stop reading
stream.pause();
console.log("Stream paused");
console.log(`Is paused? ${stream.isPaused()}`); // true

// Method 2: resume() - Start/continue reading
setTimeout(() => {
  stream.resume();
  console.log("Stream resumed");
  console.log(`Is paused? ${stream.isPaused()}`); // false
}, 1000);

// Method 3: setEncoding() - Set how to decode bytes
stream.setEncoding("utf-8"); // Read as text
// Options: 'utf8', 'ascii', 'base64', 'hex', etc.

// Method 4: destroy() - Force close stream
stream.destroy();
console.log("Stream destroyed");

// destroy() can optionally take an error, which is emitted as 'error'
stream.destroy(new Error("Something went wrong"));
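
As a small illustration of destroy() (a sketch, assuming "data.txt" exists): destroying a stream with an error emits 'error' followed by 'close', and 'end' never fires:

import { createReadStream } from "fs";

const stream = createReadStream("data.txt");

stream.on("error", (error) => console.log("error event:", error.message));
stream.on("close", () => console.log("close event: resources released"));

// Abort the read early - the 'end' event will not fire
stream.destroy(new Error("Cancelled by the application"));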

Key Properties

import { createReadStream } from "fs";

const stream = createReadStream("data.txt", {
  encoding: "utf-8",
  highWaterMark: 64 * 1024,
});

// Property 1: readableFlowing
// Indicates stream state: null, true, or false
console.log(`Flow state: ${stream.readableFlowing}`);
// null = Not started
// true = Flowing
// false = Paused

// Property 2: readableEnded
// Has stream received all data?
console.log(`Ended? ${stream.readableEnded}`); // false initially
stream.on("end", () => {
  console.log(`Ended? ${stream.readableEnded}`); // true after end
});

// Property 3: readableHighWaterMark
// Size limit of the internal buffer (the highWaterMark option)
console.log(`Buffer size: ${stream.readableHighWaterMark} bytes`);
// 64KB in our example (64 * 1024)

// Property 4: readableLength
// How much data is currently in buffer
stream.on("data", () => {
  console.log(`Buffered data: ${stream.readableLength} bytes`);
});

Common Misconceptions

❌ Misconception: Streams Load Everything into Memory Eventually

Reality: Streams never load the entire dataset into memory. They process one chunk at a time, and each chunk is discarded after processing (unless you explicitly store it).

Why this matters: This is the fundamental benefit of streams. If you're storing every chunk in an array, you're defeating the purpose.

Example:

import { createReadStream } from "fs";

const stream = createReadStream("data.txt", { encoding: "utf-8" });

// ❌ Wrong: Defeating the purpose of streams
const allChunks: string[] = [];
stream.on("data", (chunk: string) => {
  allChunks.push(chunk); // Storing everything = memory issues
});

// ✅ Right: Process and discard
stream.on("data", (chunk: string) => {
  processChunk(chunk); // Process immediately
  // Chunk is discarded after this handler returns
});

function processChunk(chunk: string): void {
  // ...your per-chunk processing logic
}

❌ Misconception: The 'end' Event Means the Stream is Closed

Reality: The 'end' event means all data has been consumed, but the stream isn't fully closed until the 'close' event fires. Resources like file handles may still be open.

Why this matters: If you need to ensure files are closed or resources freed, wait for 'close', not just 'end'.

Example:

stream.on("end", () => {
console.log("All data read");
// File handle might still be open!
});

stream.on("close", () => {
console.log("File handle closed");
// NOW it's safe to delete/move the file
});

❌ Misconception: You Can Use Array Methods on Streams

Reality: Streams are NOT arrays. They have no .length, no index access, and their chunks aren't available all at once. (Recent Node.js versions add experimental iterator helpers such as .map() and .filter() to Readable, but those return new streams, not arrays.)

Why this matters: Different data structure, different API. Streams are (async) iterables that deliver data over time, not arrays.

Example:

import { createReadStream } from "fs";

// ❌ Wrong: Expecting array behavior
const stream = createReadStream("data.txt", { encoding: "utf-8" });
stream.map((chunk) => chunk.toUpperCase());
// Not an array .map(): older Node versions throw here, newer ones return
// another stream (an experimental helper) - never an array

// ✅ Right: Process chunks in event handlers
stream.on("data", (chunk: string) => {
  const processed = chunk.toUpperCase();
  // Use the processed data
});
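
If you do want array-style transformations, one hedged alternative (a sketch, with a made-up uppercaseFile helper) is async iteration, which hands you each chunk in an ordinary loop:

import { createReadStream } from "fs";

// Sketch: transform chunks with plain code instead of array methods
async function uppercaseFile(path: string): Promise<string> {
  const stream = createReadStream(path, { encoding: "utf-8" });
  let result = "";

  for await (const chunk of stream) {
    result += String(chunk).toUpperCase();
  }

  return result;
}

// Note: accumulating into one string only makes sense for small files;
// for large files, write each transformed chunk somewhere instead.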

Real-World Use Case: Processing Large CSV Files

Let's see a practical example that brings everything together:

import { createReadStream } from "fs";
import { createInterface } from "readline";

/**
 * Process a huge CSV file line by line
 * Memory usage: Constant (one line at a time)
 */
async function processCsvFile(filePath: string): Promise<void> {
  const stream = createReadStream(filePath);

  // readline provides line-by-line interface
  const rl = createInterface({
    input: stream,
    crlfDelay: Infinity, // Handle all line endings
  });

  let lineNumber = 0;
  let headerProcessed = false;
  let headers: string[] = [];

  for await (const line of rl) {
    lineNumber++;

    if (!headerProcessed) {
      // First line is headers
      headers = line.split(",");
      headerProcessed = true;
      console.log(`Headers: ${headers.join(", ")}`);
      continue;
    }

    // Process data row
    const values = line.split(",");
    const row: Record<string, string> = {};

    headers.forEach((header, index) => {
      row[header] = values[index];
    });

    // Do something with the row
    await processRow(row);

    if (lineNumber % 1000 === 0) {
      console.log(`Processed ${lineNumber} lines...`);
    }
  }

  console.log(`✅ Complete! Processed ${lineNumber} lines total`);
}

async function processRow(row: Record<string, string>): Promise<void> {
  // Your processing logic here
  // Could save to database, transform data, etc.
}

// Usage
processCsvFile("sales-data-10gb.csv")
  .then(() => console.log("CSV processing complete"))
  .catch((error) => console.error("Error:", error));

Troubleshooting Common Issues

Problem: Stream Emits 'data' But I Don't See All My Data

Symptoms: You're logging chunks, but the total doesn't match the file size.

Common Causes:

  1. Not waiting for 'end' event (70% of cases)
  2. Encoding issues with binary data (20% of cases)
  3. Stream destroyed prematurely (10% of cases)

Solution:

import { createReadStream, statSync } from "fs";

const filePath = "yourfile.txt";
const stream = createReadStream(filePath);

let totalBytes = 0;
let chunkCount = 0;

stream.on("data", (chunk) => {
  chunkCount++;
  totalBytes += chunk.length;
  console.log(`Chunk ${chunkCount}: ${chunk.length} bytes`);
});

stream.on("end", () => {
  console.log(`Total: ${totalBytes} bytes in ${chunkCount} chunks`);

  // Verify against actual file size
  const stats = statSync(filePath);
  console.log(`File size: ${stats.size} bytes`);

  if (totalBytes === stats.size) {
    console.log("✅ All data received");
  } else {
    console.log("❌ Data mismatch - check encoding");
  }
});

Prevention: Always accumulate data across all chunks and verify in the 'end' event.

Problem: Application Running Out of Memory

Symptoms: Memory usage grows continuously until crash.

Common Causes:

  1. Storing all chunks instead of processing (50% of cases)
  2. Processing too slow (backpressure not handled) (30% of cases)
  3. Memory leak in processing logic (20% of cases)

Solution:

// ✅ Process and discard
stream.on("data", async (chunk) => {
  // Pause while processing
  stream.pause();

  // Process chunk
  await processChunk(chunk);

  // Chunk is now out of scope (garbage collected)

  // Resume for next chunk
  stream.resume();
});

Prevention: Never store all chunks. Process immediately and let garbage collection clean up.

Problem: File Handle Not Released

Symptoms: Cannot delete/move file, "file in use" errors.

Common Causes:

  1. Not waiting for 'close' event (80% of cases)
  2. Error occurred but cleanup didn't run (15% of cases)

Solution:

// ✅ Wait for close, handle errors
stream.on("close", () => {
  // NOW file handle is released
  performCleanup();
});

stream.on("error", (error) => {
  // Even on error, close will fire
  console.error("Error, but cleanup will still happen:", error);
});

Prevention: Always use 'close' event for cleanup operations involving file handles.

Summary: Key Takeaways

Let's review what we've discovered on our journey through readable streams:

Core Concepts:

  • ✅ Streams process data in small chunks, not all at once
  • ✅ They're like conveyor belts - data flows continuously
  • ✅ Memory usage stays constant regardless of data size
  • ✅ Perfect for large files, network data, and real-time processing

The Four States:

  • 📍 Initial (readableFlowing: null) - Stream created, waiting to start
  • 📍 Flowing (readableFlowing: true) - Actively reading and emitting data
  • 📍 Paused (readableFlowing: false) - Temporarily stopped
  • 📍 Ended (readableEnded: true) - All data consumed

Critical Events:

  • 📦 'data' - Process each chunk automatically
  • 🏁 'end' - All data received
  • 🔒 'close' - Resources fully released
  • 💥 'error' - Always handle this!

Flow Control:

  • Use pause() when processing is slow
  • Use resume() when ready for more data
  • Let backpressure protect your application
  • The highWaterMark controls buffer size

Best Practices:

  • Always handle 'error' event (prevents crashes)
  • Wait for 'close' before file cleanup
  • Don't store all chunks (defeats the purpose)
  • Use appropriate chunk sizes (64KB is a good default)
  • Process and discard chunks immediately

When to Use Streams:

  • ✅ Files larger than 10MB
  • ✅ Network data (HTTP, WebSockets)
  • ✅ Real-time data processing
  • ✅ Memory-constrained environments
  • ❌ Small files where simplicity matters more
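
As a rough illustration of that last point (a sketch with an arbitrary 10MB cutoff; readSmartly is a made-up helper), you might pick the approach based on file size:

import { createReadStream, readFileSync, statSync } from "fs";

const TEN_MB = 10 * 1024 * 1024; // example threshold, not a hard rule

function readSmartly(path: string): void {
  if (statSync(path).size < TEN_MB) {
    // Small file: simplicity wins
    const text = readFileSync(path, "utf-8");
    console.log(`Read ${text.length} characters in one go`);
  } else {
    // Large file: stream it to keep memory flat
    const stream = createReadStream(path, { encoding: "utf-8" });
    stream.on("data", (chunk: string) => console.log(`Chunk of ${chunk.length}`));
    stream.on("error", (error) => console.error(error));
  }
}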

What's Next?

Now that you understand readable streams, you're ready to explore:

  • Advanced Stream Techniques - Manual reading with 'readable' event, creating custom streams, and async iterators
  • Writable Streams - Learn how to write data efficiently using streams
  • Transform Streams - Modify data as it flows through your application
  • Piping Streams - Connect multiple streams for powerful data pipelines

You've taken a major step toward mastering Node.js data handling. Streams might seem complex at first, but they're one of the most powerful tools in your toolkit. You now understand how to process massive amounts of data without breaking a sweat - or running out of memory!

Version Information

Tested with:

  • Node.js: v18.x, v20.x, v22.x
  • TypeScript: v5.x

Known Issues:

  • ⚠️ Node.js 16 or earlier: Some stream methods may behave differently
  • ⚠️ ReadableStream from the Web Streams API has a different API from Node.js streams

Recommendations:

  • ⚠️ Avoid instantiating new Readable() directly unless you're implementing a custom stream
  • ✅ Prefer createReadStream() and other built-in factory functions (such as Readable.from())