Why This Matters
Inserting data row-by-row creates significant overhead. Each INSERT statement requires a round-trip to the database, transaction logging, and index updates. For bulk operations, this can take hours instead of minutes.
Batch inserts are essential for:
- Data migrations: Importing large datasets from external sources
- ETL pipelines: Loading transformed data into Postgres
- Bulk imports: CSV files, API responses, or cached data
- Initial data seeding: Setting up test or production databases
Warning: Single-row inserts for 10,000 rows can take 30+ seconds. Batch inserts reduce this to under 1 second.
The Problem
Looping through data and inserting one row at a time is extremely slow:
// Application code: inserting 10,000 rows one at a time
for (const item of items) {
  await db.query(
    'INSERT INTO products (name, price, category_id) VALUES ($1, $2, $3)',
    [item.name, item.price, item.category_id]
  );
}
// Result: 10,000 round-trips, 10,000 transaction commits
// Time: ~30 seconds for 10,000 rows
Each INSERT is a separate statement, requiring network round-trips, parsing, planning, and execution overhead. This scales linearly with the number of rows.
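You can see this overhead for yourself by timing the row-by-row pattern. Below is a rough sketch using psycopg2; the connection settings and the items list are placeholders, not anything prescribed by this guide:
# time the row-by-row pattern (psycopg2; connection settings and `items` are placeholders)
import time
import psycopg2

conn = psycopg2.connect("dbname=app user=app")
conn.autocommit = True  # commit each statement separately, as in the loop above

start = time.perf_counter()
with conn.cursor() as cur:
    for item in items:
        cur.execute(
            "INSERT INTO products (name, price, category_id) VALUES (%s, %s, %s)",
            (item["name"], item["price"], item["category_id"]),
        )
elapsed = time.perf_counter() - start
print(f"{len(items)} single-row inserts took {elapsed:.1f}s")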
The Solution
Use multi-row INSERT statements to batch multiple rows into a single statement:
-- Insert multiple rows in a single statement
INSERT INTO products (name, price, category_id)
VALUES
('Product 1', 29.99, 1),
('Product 2', 39.99, 1),
('Product 3', 49.99, 2),
-- ... up to 1000 rows per statement
('Product 1000', 19.99, 3);
-- Result: 1 round-trip and 1 commit per 1,000-row statement (10 for 10,000 rows)
-- Time: ~0.3 seconds for 10,000 rows (batched in groups of 1,000)
For even better performance with very large datasets, use the COPY command:
-- COPY is the fastest method for bulk inserts
COPY products (name, price, category_id)
FROM '/path/to/data.csv'
WITH (FORMAT csv, HEADER true);
-- Or use COPY with STDIN for programmatic inserts
-- COPY products (name, price, category_id) FROM STDIN;
-- [stream data]
-- \.
-- Result: Fastest possible insert speed
-- Time: ~0.1 seconds for 10,000 rows
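Note that COPY ... FROM '/path/to/data.csv' reads the file from the database server's filesystem, which requires server-side file access privileges. From application code, you would normally stream rows over COPY ... FROM STDIN instead. Here is a minimal sketch using psycopg2's copy_expert; the connection settings and the items list are placeholders:
# stream rows to the server via COPY ... FROM STDIN
# (psycopg2; connection settings and `items` are placeholders)
import csv
import io
import psycopg2

conn = psycopg2.connect("dbname=app user=app")

# Build an in-memory CSV from the rows to be inserted
buf = io.StringIO()
writer = csv.writer(buf)
for item in items:
    writer.writerow([item["name"], item["price"], item["category_id"]])
buf.seek(0)

with conn, conn.cursor() as cur:  # commits on success, rolls back on error
    cur.copy_expert(
        "COPY products (name, price, category_id) FROM STDIN WITH (FORMAT csv)",
        buf,
    )
With psycopg 3, the equivalent is the cursor.copy() context manager and its write_row() method, which avoids building the CSV by hand.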
Best Practice: Batch sizes of 500-1,000 rows per INSERT statement strike a good balance between performance and memory usage. For datasets larger than 100,000 rows, prefer COPY.
Result: Batch inserts reduce insert time from 30 seconds to 0.3 seconds - a 100x improvement. COPY can achieve even better performance for very large datasets.
Common Mistakes
- Batches that are too large: Inserting 10,000+ rows in a single statement can cause memory issues and lock contention. Keep batches to 500-1,000 rows.
- Not using transactions: Wrap batch inserts in a single transaction (BEGIN; INSERT ...; COMMIT;) for atomicity and better performance.
- Ignoring COPY for large datasets: For CSV imports or very large datasets, COPY is significantly faster than multi-row INSERT.
- Not disabling triggers/indexes temporarily: For one-time bulk loads, temporarily disable triggers, and drop indexes before loading and recreate them afterwards (see the sketch after this list).
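For the last two points, here is a minimal sketch of a one-time bulk load, assuming psycopg2 and ownership of the products table from above; the connection settings and the rows list are placeholders:
# one-time bulk load: single transaction, user-defined triggers disabled
# (psycopg2; connection settings and `rows` are placeholders, and
# ALTER TABLE ... DISABLE TRIGGER requires owning the table)
import psycopg2
from psycopg2.extras import execute_values

conn = psycopg2.connect("dbname=app user=app")
with conn:  # one transaction: commits on success, rolls back on error
    with conn.cursor() as cur:
        cur.execute("ALTER TABLE products DISABLE TRIGGER USER")
        # For very large one-off loads, non-essential indexes could also be
        # dropped here and recreated after the insert.
        execute_values(
            cur,
            "INSERT INTO products (name, price, category_id) VALUES %s",
            rows,  # list of (name, price, category_id) tuples
            page_size=1000,
        )
        cur.execute("ALTER TABLE products ENABLE TRIGGER USER")
conn.close()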
Implementation Tips
When implementing batch inserts in your application:
// JavaScript/Node.js example (node-postgres style client)
const batchSize = 1000;
const batches = [];

// Group items into batches
for (let i = 0; i < items.length; i += batchSize) {
  batches.push(items.slice(i, i + batchSize));
}

// Insert each batch with a parameterized multi-row INSERT;
// placeholders avoid SQL injection and quoting bugs
for (const batch of batches) {
  const placeholders = batch
    .map((_, i) => `($${i * 3 + 1}, $${i * 3 + 2}, $${i * 3 + 3})`)
    .join(', ');
  const params = batch.flatMap(item => [item.name, item.price, item.category_id]);
  await db.query(
    `INSERT INTO products (name, price, category_id) VALUES ${placeholders}`,
    params
  );
}
For Python with psycopg2, use execute_values:
# Python example (psycopg2; connection settings are illustrative)
import psycopg2
from psycopg2.extras import execute_values

conn = psycopg2.connect("dbname=app user=app")
with conn, conn.cursor() as cursor:  # commits on success, rolls back on error
    # execute_values expands the single %s into multi-row VALUES lists,
    # sending the rows in batches of page_size
    execute_values(
        cursor,
        "INSERT INTO products (name, price, category_id) VALUES %s",
        [(item['name'], item['price'], item['category_id']) for item in items],
        page_size=1000
    )