You have files sitting in Cloudflare R2 and a user just clicked "Download All." Now what?
R2 doesn't have a built-in "zip these objects" operation. You need to figure it out yourself. After building a file processing API that has archived 550K+ files and 10TB+ on R2, here are the three approaches I've found — each with very different trade-offs.
Approach 1: Pull to a Server and Use the zip Command
The most straightforward approach. Spin up a container (Fargate, Cloud Run, EC2, etc.), pull the files from R2, and run the good old zip command.
# Pull files from R2 and zip them
aws s3 sync s3://your-r2-bucket/files/ /tmp/files/ \
--endpoint-url https://<account-id>.r2.cloudflarestorage.com
zip -r /tmp/archive.zip /tmp/files/
# Upload the archive back to R2 or serve it directly
Pros:
- Dead simple. zip is battle-tested and handles everything: compression, large files, edge cases
- No ZIP implementation needed. You're not writing any ZIP logic yourself
- Full control over the server environment, compression level, file structure
Cons:
- Disk and memory bound. You need enough disk space to hold all the files + the archive. For large archives (10GB+), this means provisioning beefy instances
- Egress costs if your server isn't on Cloudflare. Pulling files from R2 is free (R2 has zero egress fees), but once the ZIP lives on your AWS/GCP server, serving it to the user or uploading it back to R2 means paying your cloud provider's egress fees
- Infrastructure overhead. You need to manage containers, queues, autoscaling, and cleanup. It's no longer "just zip these files" — it's a whole pipeline
- Not real-time. The user has to wait for the entire download + zip + upload cycle before they can start downloading
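The egress point deserves numbers. Here's a back-of-envelope sketch; the per-GB rate is an assumption (public-cloud internet egress typically runs somewhere around $0.05 to $0.12/GB), so plug in your provider's actual price:

```javascript
// Rough egress cost for serving archives from a non-Cloudflare server.
// The $/GB rate is an assumed ballpark, not any provider's quoted price.
function egressCostUSD(gbServed, usdPerGB) {
  return gbServed * usdPerGB;
}

// Serving 10TB of archives from an AWS box at an assumed ~$0.09/GB:
const cost = egressCostUSD(10 * 1024, 0.09);
// on the order of $900, versus $0 if served from R2 or a Worker
```

That asymmetry is the whole reason Approaches 2 and 3 exist.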
Best for: Batch processing, internal tooling, or when you already have server infrastructure and egress costs aren't a concern.
Approach 2: Stream a ZIP in a Cloudflare Worker
Instead of pulling files to a server, you can stream a ZIP archive directly from a Cloudflare Worker. Libraries like JSZip and fflate support streaming, so you can pipe R2 objects through them without buffering entire files.
// Using a streaming ZIP library in a Worker
// (ZipWriter here is a placeholder for whichever streaming library you pick)
import { ZipWriter } from 'some-streaming-zip-lib';

export default {
  async fetch(request, env, ctx) {
    const keys = ['file1.pdf', 'file2.jpg', 'file3.csv'];
    const { readable, writable } = new TransformStream();
    const zipWriter = new ZipWriter(writable);

    // Pump files in the background while the response streams out
    ctx.waitUntil((async () => {
      for (const key of keys) {
        const obj = await env.BUCKET.get(key);
        if (obj === null) continue; // get() returns null for missing keys
        await zipWriter.addStream(key, obj.body);
      }
      await zipWriter.close();
    })());

    return new Response(readable, {
      headers: { 'Content-Type': 'application/zip' }
    });
  }
};
This works well for simple cases. But things get complicated fast when you need production-level reliability.
Pros:
- Constant memory usage. Only one file chunk in memory at a time
- Zero egress fees. R2 → Worker → client, all within Cloudflare's network
- Streaming. Client starts downloading immediately, no waiting for the full archive
- Horizontal scaling. Workers handle many concurrent requests naturally — high throughput isn't a problem
Cons:
- Per-archive size is limited. Workers have a 15-minute wall clock limit and a subrequest cap per invocation, so large archives (tens of GB+) won't complete in a single run
- Error handling is brutal. If file #500 of 1000 fails mid-stream, you've already sent 499 files to the client. The HTTP response is in-flight — you can't restart or send an error code. The client just gets a truncated ZIP
- Checkpoint/resume requires a custom ZIP implementation. To work around the wall clock limit, you'd need to serialize mid-stream state — CRC32 computations, byte offsets, multipart upload progress — and resume exactly where you left off. At that point, off-the-shelf libraries won't cut it, and you're deep in the ZIP spec implementing local file headers, data descriptors, central directory, and ZIP64 extensions yourself
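To make the mid-stream failure problem concrete, here's a minimal sketch (pumpFiles and fetchPart are illustrative names, not a real library): once bytes are flowing, the only honest move on error is to abort the stream so the client's download visibly fails, rather than letting a corrupt ZIP complete "successfully".

```javascript
// Sketch of the mid-stream failure problem. By the time a fetch fails,
// response headers and earlier entries are already on the wire, so an
// HTTP error code is no longer possible; aborting the stream is the
// only way to signal failure to the client.
async function pumpFiles(keys, fetchPart, writable) {
  const writer = writable.getWriter();
  try {
    for (const key of keys) {
      const chunk = await fetchPart(key); // may throw mid-archive
      await writer.write(chunk);
    }
    await writer.close();
    return { ok: true };
  } catch (err) {
    // Errors the readable side too: the client sees a failed download
    // instead of a silently truncated ZIP.
    await writer.abort(err);
    return { ok: false, error: String(err) };
  }
}
```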
Best for: Small-to-medium archives where the 15-minute wall clock limit isn't a concern. For anything larger, you'll need either serious engineering investment or a different approach.
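To give a feel for what "deep in the ZIP spec" means: every ZIP entry carries a CRC-32 of its uncompressed bytes, and for checkpoint/resume that running CRC state is exactly the kind of mid-stream state you'd have to serialize. A minimal sketch of the incremental computation:

```javascript
// Minimal CRC-32 (the polynomial ZIP uses), to show one of the pieces
// you end up owning if you implement the ZIP format yourself.
function makeCrcTable() {
  const table = new Uint32Array(256);
  for (let n = 0; n < 256; n++) {
    let c = n;
    for (let k = 0; k < 8; k++) {
      c = c & 1 ? 0xedb88320 ^ (c >>> 1) : c >>> 1;
    }
    table[n] = c >>> 0;
  }
  return table;
}

const CRC_TABLE = makeCrcTable();

// `crc` is the running state, so this can be fed chunk by chunk as the
// object streams through -- the value you'd checkpoint mid-archive.
function crc32Update(crc, bytes) {
  let c = crc ^ 0xffffffff;
  for (const b of bytes) {
    c = CRC_TABLE[(c ^ b) & 0xff] ^ (c >>> 8);
  }
  return (c ^ 0xffffffff) >>> 0;
}

// Feeding the standard check input "123456789" in two chunks:
const enc = new TextEncoder();
let crc = crc32Update(0, enc.encode("12345"));
crc = crc32Update(crc, enc.encode("6789"));
// crc is now 0xcbf43926, the well-known CRC-32 check value
```

And that's the easy part; local file headers, data descriptors, the central directory, and ZIP64 offsets all sit on top of it.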
Approach 3: Use a ZIP API Service
Instead of building and maintaining streaming ZIP infrastructure yourself, use an API that handles it for you. You send a list of R2 URLs (or presigned URLs), and get back a ZIP.
curl -X POST https://api.eazip.io/jobs \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"files": [
{ "url": "https://your-bucket.r2.dev/file1.pdf" },
{ "url": "https://your-bucket.r2.dev/file2.jpg" },
{ "url": "https://your-bucket.r2.dev/file3.csv" }
]
}'
The service handles streaming, CRC32, ZIP64, error recovery, and checkpoint/resume — all the hard parts — so you don't have to.
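A POST to a jobs endpoint implies the archive is built asynchronously, so the client typically polls until it's ready. The sketch below is deliberately API-agnostic: checkJob and the "pending"/"done"/"failed" status values are assumptions for illustration, not the service's documented response shape.

```javascript
// Generic poll-until-done helper for an async archive job. The status
// values are assumed; adapt to whatever the API actually returns.
async function pollUntilDone(checkJob, { intervalMs = 2000, maxAttempts = 30 } = {}) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const job = await checkJob(); // e.g. a fetch to the job's status URL
    if (job.status === "done") return job;
    if (job.status === "failed") throw new Error(job.error ?? "archive job failed");
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error("timed out waiting for archive job");
}
```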
Pros:
- One API call. No ZIP implementation to build or maintain
- Handles edge cases you don't want to think about (ZIP64 for large files, Data Descriptors, checkpoint/resume for failures)
- Zero egress if the service also runs on Cloudflare's network
- Scales to 5,000+ files per archive, up to 50GB
Cons:
- Third-party dependency. You're relying on an external service
- Cost. Free tier exists but large-scale usage has costs
- Less control over ZIP structure details
Best for: Teams that want ZIP functionality without building ZIP infrastructure. Ship in an afternoon instead of a sprint.
Full disclosure: I built Eazip because I went through Approach 2 myself and realized most teams shouldn't have to.
Comparison
| | Server + zip | Stream in Worker | ZIP API |
|---|---|---|---|
| Memory | O(total size) | O(chunk size) | N/A |
| Egress cost | Depends on server location | $0 | $0 |
| Max archive size | Limited by disk | Limited by wall clock (15 min) | 50GB |
| Implementation time | Hours | Hours–Weeks | Minutes |
| Maintenance | Medium (infra) | High (ZIP spec edge cases) | None |
| Error recovery | Easy (retry all) | Hard (mid-stream failures) | Built-in |
Which Should You Pick?
Have server infrastructure and don't mind egress costs? → Approach 1. Pull the files, run zip, and move on. Just keep in mind that egress fees add up fast at scale — especially if you're on AWS or GCP.
Want to stay on Cloudflare's network? → Approach 2 works great for small-to-medium archives. But once you hit the wall clock limit or need error recovery, complexity escalates quickly.
Want to ship the feature and move on? → Approach 3. One API call, zero infrastructure, zero egress. You can be done in an afternoon.
The reality is that ZIP archiving looks simple until it isn't. What starts as "just zip these files" turns into managing disk space, egress bills, wall clock limits, or mid-stream error recovery — depending on which approach you choose. I learned this the hard way after archiving 10TB+ of files.
What's your approach? Have you tried something different? Let me know in the comments.