Description
Harbor Version: v2.13.1
Database: Postgres (AWS RDS, 4 vCPU - db.m7g.xlarge)
Summary:
When Tag Retention runs, our Postgres CPU usage spikes to 90–100% and stays there until the job finishes. Normally the DB runs at ~4% CPU, so keeping a large RDS instance just for these jobs is overprovisioning IMHO.
Example repository:
- ~3,403 artifacts
- ~138 GB size
Retention run results look like this:
Retained/Total
199/424
199/420
196/413
197/421
205/430
205/432
205/432
205/431
During these runs, however, Postgres is fully saturated. Scaling from 2 to 4 vCPUs helped only slightly.
Culprit SQL
From DB insights, we can see the following SQL query during a retention run:
SELECT b.digest_blob
FROM artifact a, artifact_blob b
WHERE a.digest = b.digest_af
AND a.project_id = $1
AND b.digest_blob IN ($2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12);
This query shows up repeatedly and drives CPU usage.
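To narrow this down, the plan for this query and the indexes on both tables could be inspected with something like the sketch below. The project ID and digest literals are placeholders, not values from our environment, and the commented-out index (including its name) is purely a hypothetical experiment, not something Harbor is known to ship or recommend.

```sql
-- Show the actual plan, timing, and buffer usage for the retention query.
-- The value 1 and the 'sha256:<...>' literals are placeholders; substitute real ones.
EXPLAIN (ANALYZE, BUFFERS)
SELECT b.digest_blob
FROM artifact a, artifact_blob b
WHERE a.digest = b.digest_af
AND a.project_id = 1
AND b.digest_blob IN ('sha256:<blob-digest-1>', 'sha256:<blob-digest-2>');

-- List the indexes that currently exist on the two tables involved.
SELECT tablename, indexname, indexdef
FROM pg_indexes
WHERE tablename IN ('artifact', 'artifact_blob')
ORDER BY tablename, indexname;

-- Hypothetical experiment (assumption, untested): only worth trying if the plan
-- shows a sequential scan on artifact_blob and no existing index leads with digest_blob.
-- CREATE INDEX CONCURRENTLY idx_artifact_blob_digest_blob_test
--     ON artifact_blob (digest_blob);
```

If the plan shows a sequential scan on either table for every execution, that would be consistent with the CPU pattern described above; if it already uses index scans, the cost is more likely in how often the query is issued than in each individual run.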
References
- Related issue: Tag deletion performance is very poor with a high number of tags #14708
- Related PR: fix: reduce the high db cpu usage for tag retention #17296
- And this one: The garbage collection job caused harbor-core and postgres high CPU in huge amount of data environment #17653
Even though these changes are already merged, we still observe the same CPU pressure in v2.13.1.
Questions
- Do you have any thoughts or recommendations to mitigate this?
- Are there indexes or config tweaks that could help with this specific query pattern?
- Any planned optimizations for retention/GC to avoid repeated heavy scans?