Cloudsmith

What happens when you upload a Package?

Packages in a Cloudsmith repository are so much more than just “bits on disk”, and treating them as such is really doing them a disservice!

Dan McKinney

Oct 23, 2024 • 2 min read

It’s a fairly common perception that a package repository is basically a file share or file storage, and for the simplest implementations this is perhaps a reasonable analogy.

However, when thinking of Cloudsmith, this analogy misses a lot of important details that make Cloudsmith package repositories rather unique.

Synchronization - what is it good for?

If you have used Cloudsmith, you may have noticed that when you publish or upload a package to Cloudsmith, or fetch a package from a public package repository (like Maven Central or PyPI) into Cloudsmith, the first thing that happens is that the package enters a “Synchronizing” state. What exactly is synchronizing? Put simply, synchronization is where Cloudsmith processes the uploaded or ingested package. But that still hides a lot of the detail. What possible processing could a package need? Well, quite a lot actually!

Some of the steps that synchronization/package processing involves are:

Initialization - Setting up the initial state, tools and environment for synchronization.
Retrieving - Getting the package files from the upload storage location.
Assembling - Extracting the package files and assembling the package (layers and configs for Docker images, for example).
Malware Scanning - Scanning the complete package file set for trojans and other malicious content.
Parsing - Verifying and generating package checksums and signatures, as well as parsing and verifying package metadata and licenses.
Final Synchronization - Synchronizing local and distributed storage.

These processes are automatic, require no user interaction and run asynchronously on Cloudsmith's global infrastructure. They are a large part of what empowers Cloudsmith users to implement effective package controls. As they say, knowledge is power, and it’s through synchronization that we gain that knowledge about packages.
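To make the idea of staged processing more concrete, here is a minimal Python sketch of a pipeline that runs a few of the stages listed above in order and stops at the first failure. It is not Cloudsmith's implementation: the `Package` class, the stage functions, the toy malware check and the "Completed" and "Failed" state names are all assumptions made up for illustration. Only the stage names and the "Synchronizing" state come from this article.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Package:
    """Hypothetical in-memory stand-in for an uploaded package."""
    name: str
    version: str
    content: bytes
    state: str = "Synchronizing"
    metadata: dict = field(default_factory=dict)
    completed_stages: list = field(default_factory=list)


def retrieve(pkg):
    # Stand-in for fetching the files from the upload storage location.
    if not pkg.content:
        raise ValueError("no package files found")


def scan_for_malware(pkg):
    # Toy check only: a real scanner does far more than match one byte pattern.
    if b"EVIL" in pkg.content:
        raise ValueError("malicious content detected")


def parse(pkg):
    # Generate a checksum and record some basic metadata.
    pkg.metadata["sha256"] = hashlib.sha256(pkg.content).hexdigest()
    pkg.metadata["size_bytes"] = len(pkg.content)


# Ordered subset of the stages described in the article.
STAGES = [
    ("Retrieving", retrieve),
    ("Malware Scanning", scan_for_malware),
    ("Parsing", parse),
]


def synchronize(pkg):
    """Run each stage in order and stop at the first one that fails."""
    for stage_name, stage in STAGES:
        try:
            stage(pkg)
        except ValueError as exc:
            pkg.state = "Failed"  # assumed state name
            pkg.metadata["failure"] = f"{stage_name}: {exc}"
            return pkg
        pkg.completed_stages.append(stage_name)
    pkg.state = "Completed"  # assumed state name
    return pkg
```

Calling `synchronize(Package("demo", "1.0.0", b"hello"))` in this sketch leaves the package in the assumed "Completed" state with a SHA-256 checksum in its metadata, which is the kind of generated data that later policies and workflows could build on.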

How does this help me?

Once a package has been synchronized, we can then use the metadata generated to apply things like Vulnerability, License and Package Deny policies, create a scoped access token, add tags to the package, or fire a webhook for specific packages/versions. The data generated from synchronization drives a lot of the subsequent actions and workflows that you can perform.
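As a rough illustration of how generated metadata can drive policies, the Python sketch below checks a package's metadata against three kinds of rule. Everything in it is hypothetical: the `PackageMetadata` fields, the severity scale and the `evaluate_policies` function are assumptions for this example and do not reflect how Cloudsmith policies are actually configured or evaluated.

```python
from dataclasses import dataclass

# Severity levels from least to most severe (an assumed scale for this sketch).
SEVERITY_ORDER = ["none", "low", "medium", "high", "critical"]


@dataclass(frozen=True)
class PackageMetadata:
    """Hypothetical summary of what synchronization learned about a package."""
    name: str
    version: str
    license: str
    max_severity: str = "none"  # worst known vulnerability severity


def evaluate_policies(meta, denied_licenses, severity_threshold, denied_packages):
    """Return the names of the policies this package violates (empty if none)."""
    violations = []
    if meta.license in denied_licenses:
        violations.append("license")
    if SEVERITY_ORDER.index(meta.max_severity) >= SEVERITY_ORDER.index(severity_threshold):
        violations.append("vulnerability")
    if (meta.name, meta.version) in denied_packages:
        violations.append("package-deny")
    return violations
```

The point of the sketch is that none of these checks is possible on raw bytes alone: each one depends on a license, a vulnerability result or a name and version having been extracted first.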

Also, you may encounter occasions where a package fails synchronization.

This is typically a good thing (contrary to initial impressions!) because it can alert you to a problem with the package itself, such as invalid, missing or incorrect metadata (the package not meeting the specification for the package type, for example), the presence of malware in the package, or an attempt to upload or publish a package that already exists in the repository. Package synchronization is an essential step in verifying the “correctness” of a package, and it’s always better to catch things earlier in your processes than later, as the cost of remediation rises dramatically the later issues are identified.
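Two of these failure causes, incomplete metadata and duplicate uploads, are simple enough to sketch in a few lines of Python. This is an illustrative assumption rather than Cloudsmith's actual validation logic: `REQUIRED_FIELDS`, `find_sync_failure` and the failure messages are invented for the example, and every real package format defines its own required fields.

```python
# Assumed minimal set of required fields; real package formats define their own.
REQUIRED_FIELDS = ("name", "version")


def find_sync_failure(metadata, existing):
    """Return a failure reason, or None if these two checks pass.

    metadata: dict of fields parsed from the package (hypothetical shape).
    existing: set of (name, version) pairs already in the repository.
    """
    missing = [f for f in REQUIRED_FIELDS if not metadata.get(f)]
    if missing:
        return "invalid metadata: missing " + ", ".join(missing)
    if (metadata["name"], metadata["version"]) in existing:
        return "duplicate: this package version already exists"
    return None
```

Returning a specific reason instead of a bare pass or fail is what makes an early failure useful: the publisher learns what to fix before the package reaches anyone downstream.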

In summary:

Cloudsmith package repositories do far more than just store your packages: they offer a lot more functionality than putting your packages in an AWS S3 bucket or Azure Blob Storage, or spinning up a simplistic instance of a package repository. Packages in a Cloudsmith repository are so much more than just “bits on disk”, and treating them as such is really doing them a disservice!
