Why This Matters
Inserting data row-by-row creates significant overhead. Each INSERT statement requires a round-trip to the database, transaction logging, and index updates. For bulk operations, this can take hours instead of minutes.
Batch inserts are essential for:
- Data migrations: Importing large datasets from external sources
- ETL pipelines: Loading transformed data into Postgres
- Bulk imports: CSV files, API responses, or cached data
- Initial data seeding: Setting up test or production databases
Warning: Single-row inserts for 10,000 rows can take 30+ seconds. Batch inserts reduce this to under 1 second.
The Problem
Looping through data and inserting one row at a time is extremely slow:
// Application code: inserting 10,000 rows one at a time
for (const item of items) {
  await db.query(
    'INSERT INTO products (name, price, category_id) VALUES ($1, $2, $3)',
    [item.name, item.price, item.category_id]
  );
}
// Result: 10,000 round-trips, 10,000 transaction commits
// Time: ~30 seconds for 10,000 rows
Each INSERT is a separate statement, requiring network round-trips, parsing, planning, and execution overhead. This scales linearly with the number of rows.
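You can see this overhead for yourself by timing the row-by-row pattern. Below is a rough sketch using psycopg2; the connection settings and the items list are placeholders, not anything prescribed by this guide:
# time the row-by-row pattern (psycopg2; connection settings and `items` are placeholders)
import time
import psycopg2

conn = psycopg2.connect("dbname=app user=app")
conn.autocommit = True  # commit each statement separately, as in the loop above

start = time.perf_counter()
with conn.cursor() as cur:
    for item in items:
        cur.execute(
            "INSERT INTO products (name, price, category_id) VALUES (%s, %s, %s)",
            (item["name"], item["price"], item["category_id"]),
        )
elapsed = time.perf_counter() - start
print(f"{len(items)} single-row inserts took {elapsed:.1f}s")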
The Solution
Use multi-row INSERT statements to batch multiple rows into a single statement:
-- Insert multiple rows in a single statement
INSERT INTO products (name, price, category_id)
VALUES
('Product 1', 29.99, 1),
('Product 2', 39.99, 1),
('Product 3', 49.99, 2),
-- ... up to 1000 rows per statement
('Product 1000', 19.99, 3);
-- Result: 1 round-trip and 1 commit per 1,000-row statement (10 for 10,000 rows)
-- Time: ~0.3 seconds for 10,000 rows (batched in groups of 1,000)
For even better performance with very large datasets, use the COPY command:
-- COPY is the fastest method for bulk inserts
COPY products (name, price, category_id)
FROM '/path/to/data.csv'
WITH (FORMAT csv, HEADER true);
-- Or use COPY with STDIN for programmatic inserts
-- COPY products (name, price, category_id) FROM STDIN;
-- [stream data]
-- \.
-- Result: Fastest possible insert speed
-- Time: ~0.1 seconds for 10,000 rows
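Note that COPY ... FROM '/path/to/data.csv' reads the file from the database server's filesystem, which requires server-side file access privileges. From application code, you would normally stream rows over COPY ... FROM STDIN instead. Here is a minimal sketch using psycopg2's copy_expert; the connection settings and the items list are placeholders:
# stream rows to the server via COPY ... FROM STDIN
# (psycopg2; connection settings and `items` are placeholders)
import csv
import io
import psycopg2

conn = psycopg2.connect("dbname=app user=app")

# Build an in-memory CSV from the rows to be inserted
buf = io.StringIO()
writer = csv.writer(buf)
for item in items:
    writer.writerow([item["name"], item["price"], item["category_id"]])
buf.seek(0)

with conn, conn.cursor() as cur:  # commits on success, rolls back on error
    cur.copy_expert(
        "COPY products (name, price, category_id) FROM STDIN WITH (FORMAT csv)",
        buf,
    )
With psycopg 3, the equivalent is the cursor.copy() context manager and its write_row() method, which avoids building the CSV by hand.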
Best Practice: Batch sizes of 500-1,000 rows per INSERT statement strike a good balance between performance and memory usage. For datasets larger than 100,000 rows, prefer COPY.
Result: Batch inserts reduce insert time from 30 seconds to 0.3 seconds - a 100x improvement. COPY can achieve even better performance for very large datasets.
Common Mistakes
- Batches that are too large: Inserting 10,000+ rows in a single statement can cause memory issues and lock contention. Keep batches to 500-1,000 rows.
- Not using transactions: Wrap batch inserts in a single transaction (BEGIN; INSERT ...; COMMIT;) for atomicity and better performance.
- Ignoring COPY for large datasets: For CSV imports or very large datasets, COPY is significantly faster than multi-row INSERT.
- Not disabling triggers/indexes temporarily: For one-time bulk loads, temporarily disable triggers, and drop indexes before loading and recreate them afterwards (see the sketch after this list).
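For the last two points, here is a minimal sketch of a one-time bulk load, assuming psycopg2 and ownership of the products table from above; the connection settings and the rows list are placeholders:
# one-time bulk load: single transaction, user-defined triggers disabled
# (psycopg2; connection settings and `rows` are placeholders, and
# ALTER TABLE ... DISABLE TRIGGER requires owning the table)
import psycopg2
from psycopg2.extras import execute_values

conn = psycopg2.connect("dbname=app user=app")
with conn:  # one transaction: commits on success, rolls back on error
    with conn.cursor() as cur:
        cur.execute("ALTER TABLE products DISABLE TRIGGER USER")
        # For very large one-off loads, non-essential indexes could also be
        # dropped here and recreated after the insert.
        execute_values(
            cur,
            "INSERT INTO products (name, price, category_id) VALUES %s",
            rows,  # list of (name, price, category_id) tuples
            page_size=1000,
        )
        cur.execute("ALTER TABLE products ENABLE TRIGGER USER")
conn.close()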
Implementation Tips
When implementing batch inserts in your application:
// JavaScript/Node.js example (node-postgres style client)
const batchSize = 1000;
const batches = [];

// Group items into batches
for (let i = 0; i < items.length; i += batchSize) {
  batches.push(items.slice(i, i + batchSize));
}

// Insert each batch with a parameterized multi-row INSERT;
// placeholders avoid SQL injection and quoting bugs
for (const batch of batches) {
  const placeholders = batch
    .map((_, i) => `($${i * 3 + 1}, $${i * 3 + 2}, $${i * 3 + 3})`)
    .join(', ');
  const params = batch.flatMap(item => [item.name, item.price, item.category_id]);
  await db.query(
    `INSERT INTO products (name, price, category_id) VALUES ${placeholders}`,
    params
  );
}
For Python with psycopg2, use execute_values:
# Python example (psycopg2; connection settings are illustrative)
import psycopg2
from psycopg2.extras import execute_values

conn = psycopg2.connect("dbname=app user=app")
with conn, conn.cursor() as cursor:  # commits on success, rolls back on error
    # execute_values expands the single %s into multi-row VALUES lists,
    # sending the rows in batches of page_size
    execute_values(
        cursor,
        "INSERT INTO products (name, price, category_id) VALUES %s",
        [(item['name'], item['price'], item['category_id']) for item in items],
        page_size=1000
    )