How to escape file storage limits without losing visibility in Salesforce

https://n8n.io/workflows/6305-salesforce-to-s3-file-migration-and-cleanup/
If you’ve worked with Salesforce for more than a few years, you’ve almost certainly had this conversation:

“We’re running out of file storage again.”
“Can we just… delete old files?”
“Not without a backup. And we still need to know what was attached to what.”

Salesforce is fantastic for managing customer data, but it’s not a cheap or flexible place to park gigabytes of PDFs, contracts, reports, and exports forever. As orgs mature, files quietly become one of the biggest sources of bloat and a real operational headache.

That’s exactly the problem this Salesforce → S3 File Migration & Cleanup flow is designed to solve.

This article walks through:

  • Why file storage is so painful inside Salesforce
  • The technical limitations when trying to “fix it” purely with Salesforce tools
  • How an n8n + S3–based workflow can automatically archive, clean up, and still give you full visibility from Salesforce

Why large files are painful inside Salesforce

1. File storage is expensive and hard-capped

Salesforce file storage is:

  • Charged separately from data storage
  • Allocated per user license, with no cheap way to scale it up
  • Consumed extremely quickly by:
    • Email attachments
    • User uploads (screenshots, exports)
    • Auto-generated reports, signed contracts, etc.

Once you hit the threshold, your options are:

  • Buy more file storage (recurring cost), or
  • Start deleting… carefully.

Neither is attractive if you have compliance requirements or want a proper audit trail.


2. Bulk migration from within Salesforce hits platform limits

You can write Apex and Flows to move or manipulate files, but you quickly run into:

  • Governor limits
    • Heap size limits when working with large binary blobs
    • CPU time limits when processing many files in a single transaction
    • Strict limits on how many SOQL queries / DML operations you can perform
  • REST / Apex callout limits
    • Moving large binaries via callouts is constrained by body size and memory
    • Complex orchestration required if you want to chunk, stream, or retry
  • Operational complexity
    • Handling ContentDocument, ContentVersion, and ContentDocumentLink correctly
    • Ensuring you don’t accidentally delete files still actively in use
    • Building reporting and traceability on what was archived

In short: Salesforce is not a file-migration platform. It’s a CRM platform with some file capabilities.


3. Cleanup without losing access is tricky

Admins usually want three things:

  1. Free up storage
  2. Keep a backup somewhere cheaper (like S3)
  3. Still be able to see, from a Salesforce record, what files existed and where they are now

Out of the box, Salesforce gives you either:

  • Keep files in Salesforce → expensive but convenient, or
  • Export/delete files manually → cheap but blind (no easy way to know what was linked where over time).

What’s missing is a pattern that:

  • Moves the binary out to cheap storage
  • Cleans up the original in Salesforce
  • Leaves behind a clickable, queryable trace of every file that ever existed

The architecture: Salesforce + n8n + S3

The Salesforce to S3 File Migration & Cleanup solution does exactly that using three main components:

  1. Salesforce
    • Source of truth for ContentDocument and ContentDocumentLink
    • Custom object S3_File__c to track archived files
    • Optional LWC to view/download archived files from record pages
  2. n8n workflow
    • Orchestration engine that queries Salesforce, downloads files, pushes them to S3, logs them back, and cleans up
    • Runs on a schedule (e.g., daily) — no manual buttons to push
  3. Amazon S3
    • Long-term, cost-effective file storage
    • Accessible via pre-signed URLs or via your existing data lake patterns

How the n8n flow works (step-by-step)

Here’s what the published n8n workflow does under the hood.

1. Schedule Trigger – “Set it and forget it”

A Schedule Trigger node starts the workflow on an interval you define:

  • Daily at midnight
  • Weekly on Sundays
  • Or any cron expression you like

This turns file cleanup into a regular maintenance job, not a one-off emergency.
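
The cron option gives you full control over the cadence; as a hedged example, these standard expressions cover the schedules above:

0 0 * * *   → daily at midnight
0 0 * * 0   → weekly, Sundays at 00:00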


2. Query old ContentDocuments from Salesforce

A Salesforce node runs a SOQL query:

SELECT Id, FileExtension, Title, FileType 
FROM ContentDocument 
WHERE CreatedDate < N_DAYS_AGO:365

This means:

  • You only target files older than 365 days (adjustable)
  • You avoid touching recent or active files
  • You control the “retention policy” purely via query logic

3. Loop through each file

A Split In Batches / Loop node processes each ContentDocument one by one, which:

  • Avoids overloading Salesforce or n8n with huge bulk operations
  • Makes it easier to handle failures/retries at a per-file level
  • Keeps resource usage predictable

4. Download file content via REST

For each file, an HTTP Request node calls the Salesforce Files REST API:

  • Uses the ContentDocument Id to fetch the binary content
  • Receives the response as a file/binary object in n8n
  • This avoids needing Apex to handle binary blobs

You sidestep heap size and Apex limits by doing the heavy lifting outside Salesforce.
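
For reference, Salesforce exposes file binaries at REST endpoints like the ones below; the exact path the published workflow calls may differ, but both are standard:

GET /services/data/vXX.X/connect/files/{ContentDocumentId}/content
GET /services/data/vXX.X/sobjects/ContentVersion/{ContentVersionId}/VersionData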


5. Upload to S3 with the original filename

An AWS S3 node:

  • Uploads the binary file into your S3 bucket (e.g., crmaiinsight)
  • Uses a dynamic filename like:
    Title.FileExtension → e.g., Signed_Contract_2019.pdf

You can extend this easily to:

  • Folder by year / object type
  • Prefix with org or environment
  • Include ContentDocumentId in the key for uniqueness
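
As a sketch, a small Code node (or an inline n8n expression) could build such a key from the fields already queried — the s3Key property name here is hypothetical, not something the published workflow defines:

// n8n Code node (hypothetical): build a collision-safe S3 key per file
return $input.all().map((item) => {
  const doc = item.json; // ContentDocument fields from the earlier SOQL query
  item.json.s3Key = `salesforce-archive/${doc.Id}_${doc.Title}.${doc.FileExtension}`;
  return item;
});

The S3 node's file name field can then reference {{ $json.s3Key }} via an expression.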

6. Resolve where the file was linked (ContentDocumentLink)

Next, a Salesforce search node queries ContentDocumentLink:

SELECT Id, LinkedEntityId, ContentDocumentId, IsDeleted 
FROM ContentDocumentLink 
WHERE ContentDocumentId = '{current ContentDocument Id}'

This tells you:

  • Which record(s) the file was attached to (Account, Opportunity, Case, custom object, etc.)
  • Whether there are multiple parents
  • Which parent should be used for your S3_File__c record

7. Filter out user attachments

A Code node then:

  • Filters out any LinkedEntityId that starts with '005'
    • In Salesforce, the '005' key prefix identifies User records
  • Ensures the archival & logging focuses on business records, not user profile photos or chatter avatars

This keeps your S3 log clean and relevant.
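
A minimal sketch of that filter, assuming each incoming item is one ContentDocumentLink row from the previous query:

// n8n Code node: drop links whose parent is a User record ('005' key prefix)
return $input.all().filter(
  (item) => !String(item.json.LinkedEntityId || '').startsWith('005')
);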


8. Log an S3_File__c record back in Salesforce

A Salesforce node creates a record of type S3_File__c with fields such as:

  • Object_Id__c → the LinkedEntityId of the parent record
  • File_Name__c → Title.FileExtension
  • S3_URL__c → the S3 object URL or a pre-signed link pattern

This gives you:

  • Full traceability: every archived file is represented as a record
  • Easy reporting: run SOQL reports on archived files per object, per year, etc.
  • A foundation for UI: you can surface this in related lists or LWCs
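
As a sketch, a Code node just before that create could shape the payload — assuming each item already carries the link's LinkedEntityId, the file metadata, and the S3 key from the earlier sketch (the URL pattern is an example, not necessarily the workflow's exact value):

// n8n Code node (hypothetical): shape fields for the S3_File__c create
return $input.all().map((item) => ({
  json: {
    Object_Id__c: item.json.LinkedEntityId,
    File_Name__c: `${item.json.Title}.${item.json.FileExtension}`,
    S3_URL__c: `https://crmaiinsight.s3.amazonaws.com/${item.json.s3Key}`,
  },
}));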

9. Prepare and delete the original ContentDocument

Once archived and logged:

  1. A Code node extracts the ContentDocumentId
  2. An HTTP Request node sends a DELETE to Salesforce’s Files API:
DELETE /services/data/vXX.X/connect/files/{ContentDocumentId}

This:

  • Removes the original file from Salesforce
  • Immediately frees up file storage space
  • Leaves behind the S3_File__c record as a breadcrumb

Because you’ve already:

  • Backed it up to S3
  • Logged it via S3_File__c

…you can delete confidently.
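
A hedged sketch of that preparation step — the deleteUrl property is hypothetical, and the API version should match your org:

// n8n Code node (hypothetical): build the Files API delete path per file
return $input.all().map((item) => {
  const id = item.json.ContentDocumentId || item.json.Id;
  item.json.deleteUrl = `/services/data/v59.0/connect/files/${id}`;
  return item;
});

The HTTP Request node then issues a DELETE against that path on your instance URL.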


10. Notify the team in Slack

Finally, a Slack node posts something like:

“Salesforce to S3 cleanup: X files archived and deleted successfully.”

This gives admins / ops teams:

  • Peace of mind that the job ran
  • A quick alert if something fails or if counts look suspicious
  • An easy audit trail in your team’s Slack history

Keeping backups accessible inside Salesforce

One of the biggest objections to moving files out of Salesforce is:

“But my users still need to see what was there.”

That’s exactly why the solution uses the S3_File__c custom object and (optionally) an LWC.

Typical pattern:

  • S3_File__c has lookups to the parent object (Object_Id__c)
  • A related list or LWC component (s3FilesViewer) is added to the parent page layout
  • Users can:
    • See a list of archived files
    • Click a link to open/download from S3 (often via a pre-signed URL with expiry)

From the user’s perspective:

  • “Old files” are still visible on the record
  • You can visually separate on-platform files from those archived to S3
  • Nothing feels “lost”, even though you’ve dramatically cut storage usage
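
If you go the pre-signed route, links can be generated on demand rather than stored permanently. A minimal sketch with the AWS SDK for JavaScript v3 (bucket name and key are examples):

// Generate a short-lived download link for an archived S3 object
import { S3Client, GetObjectCommand } from "@aws-sdk/client-s3";
import { getSignedUrl } from "@aws-sdk/s3-request-presigner";

const s3 = new S3Client({ region: "us-east-1" });

const url = await getSignedUrl(
  s3,
  new GetObjectCommand({ Bucket: "crmaiinsight", Key: "Signed_Contract_2019.pdf" }),
  { expiresIn: 900 } // link stays valid for 15 minutes
);
console.log(url);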

Why not just do this all in Salesforce?

You could try to build something similar with pure Salesforce tools:

  • Apex batch classes to process ContentDocuments
  • Callouts to S3 from Apex
  • Custom metadata for S3 config
  • Complex retry logic to respect limits

But you’d be fighting:

  • Governor limits (heap, CPU, queries)
  • Deployment cycles and test coverage
  • Operational overhead of monitoring batch jobs

By pushing orchestration into n8n and storage into S3, you get:

  • A visual, low-code workflow that’s easy to maintain
  • Native nodes for HTTP, AWS S3, Slack, etc.
  • Freedom from Salesforce’s runtime constraints for the “heavy lifting”

Salesforce becomes what it’s best at: data + UI + security model, not a file processing engine.


When this pattern is a great fit

This Salesforce → S3 migration & cleanup flow is ideal if:

  • Your org is constantly hitting file storage limits
  • You must retain historical documents but don’t need them on-platform
  • You want traceable, reversible cleanup (you always know what was archived, when, and where)
  • You’re comfortable using or adopting n8n for background jobs and integrations

It’s especially powerful for:

  • Long-running orgs with years of attachments
  • High-volume case or opportunity orgs with lots of PDFs/images
  • Managed service providers and multi-org consultancies that want a repeatable pattern for clients

Summary

Salesforce’s native file capabilities are great for day-to-day work, but they’re not designed for long-term, large-scale file retention and migration. Storage is expensive, platform limits are strict, and “just delete some files” is never as simple as it sounds.

The Salesforce to S3 File Migration & Cleanup flow:

  • Offloads old files to S3
  • Logs every archived file back into Salesforce via S3_File__c
  • Cleans up the originals to free storage
  • Keeps access easy for users through related lists or custom UI
  • Automates everything on a schedule with Slack notifications

You end up with a Salesforce org that’s lean, compliant, and still user-friendly — and a file archive that lives where big files actually belong.