orion
anderson

Delete Untagged Images on Google Artifact Registry

Using Cloud Build and Cloud Scheduler

October 19, 2021
Ta Prohm Temple, Banteay Chhmar, Cambodia
Ta Prohm Temple, Banteay Chhmar, Cambodia ( unsplash.com )

When we push a container image to Google Artifact Registry with the same tag as an existing image, the tag on the existing image is removed, leaving us an image with no tag.

Empty Tags

All images take up space and count towards your bill. Assuming you have no use for the untagged images, wouldn’t it be nice if we could remove all of them at once? What if we also wanted to clean out unwanted tagged images as well?

Existing Solutions

Unlike Amazon’s ECR Lifecycle Policies, there is currently no official way to do this on GCP, but we have options.

I’ve seen a few ways to tackle this issue, notably GCR Cleaner, which is mentioned in Google’s Container Registry’s official documentation. There are also some good methods in this Stack Overflow question.

Give these approaches a read to see if they might fit your use case.

My Solution

My solution uses Cloud Build and Cloud Scheduler with a dash of PowerShell and shell scripting. It is a bit simpler than the solutions referenced above, but works well.

The commands below are for Google Cloud Artifact Registry (not to be confused with Container Registry). You can easily modify the gcloud commands below to accommodate the latter.

Prerequisites

List of Images to Clean

This is the list of repositories you want to clean out, in a text file.

# repos-to-clean.txt

us-central1-docker.pkg.dev/project/cool-app
us-central1-docker.pkg.dev/project/awesome-app
EOF

Cloud Build Script

# cloudbuild.yaml

steps:
  - id: Generate list of all images
    name: gcr.io/cloud-builders/gcloud
    entrypoint: sh
    args:
      - -c
      - |
        mkdir images
        while read i; do
          gcloud artifacts docker images list "$i" \
            --include-tags --format=json > "./images/$(echo $i | sed -e 's/\//-/g').json"
        done < ./repos-to-clean.txt
  - id: New list of images to delete
    name: mcr.microsoft.com/powershell
    entrypoint: pwsh
    args:
      - ./New-ImageList.ps1
  - id: Delete images
    name: gcr.io/cloud-builders/gcloud
    entrypoint: sh
    args:
      - -c
      - |
        if test -f clean.txt; then
          while read i; do
            gcloud artifacts docker images delete --async --delete-tags --quiet "$i"
          done < clean.txt
        else
          echo "No images to clean."
        fi

First read the repos-to-clean.txt file into a short shell script. For each line (repository), use gcloud to list Artifact Registry container images for the repo, and send the output to a json file at ./images. To name the output json file, use the repository name, but replace / with -. We’ll use these files in PowerShell below.

PowerShell Script

# New-ImageList.ps1

Get-ChildItem -Path ./images | ForEach-Object {
    $Images = Get-Content $_ | ConvertFrom-Json | Sort-Object createTime -Descending

    $Images | Where-Object {$_.tags -eq ""} | ForEach-Object {
        $p, $v = $_.package, $_.version
        Add-Content clean.txt "$p`@$v"
    }

    $Images | Where-Object {$_.tags -ne ""} | Select-Object -Skip 3 | ForEach-Object {
        $p, $v = $_.package, $_.version
        Add-Content clean.txt "$p`@$v"
    }
}

The PowerShell script reads each json file, parses it into an object, and sorts by date. The object represents all of the container images in the repository currently being processed, sorted from newest to oldest. PowerShell looks at the object tag values. It will find empty tags, and in this example, tags where the image count is greater than 3. If it finds any, it adds the repository name and sha256 hash to a text file.

At this point the CloudBuild script will read through the output text file and delete those images.

You can change -Skip to a larger value to keep more tagged images, or just omit that step.

This is a destructive process. Please take care.

The End

I created a Cloud Scheduler job to run this automatically once a week. Instead of using Cloud Scheduler to trigger your build, you could certainly integrate the steps into an existing build, or run it manually.

Hope this helps some of you clean out your repositories. Happy coding!

Contact Form

Form submissions are sent to my email inbox. Besides reading your comment, I don't do anything with your information. Caveat: I log IPs to prevent spam.

Enter a valid email if you want a response.