Saving space on PyPI

We ran into a problem when we tried to release 11.0.0rc1:

Project size too large. Limit for project ‘ansible’ total size is 10 GB.

@felixfontein requested a limit increase, but this might take some time. To be able to release 11.0.0rc1, @gundalow deleted some yanked releases.

In the long run, I think a higher limit would be the best solution. However, we should also discuss what we can do to save space. One suggestion is to remove old pre-releases, although for pre-releases prior to 2.10 we should probably consult with core first.

Thanks for starting the discussion
Earlier today, I deleted the following releases which had been yanked previously:

  • 10.0.0: contains extra files which shouldn’t be included
  • 9.6.0: contains extra files which shouldn’t be included
  • 9.5.0: Accidentally contains breaking change
  • 9.0.0: (no reason given)

I count ~35 pre-releases from 3.0.0b1 (Feb 2021) through 10.0.0rc1 (May 2024) which I think would be safe to delete (though maybe we yank them first?). That should give us some more space back while we wait for the request to be processed.

I’m not a fan of deleting existing releases. Deleting yanked and pre-releases is IMO OK as a last resort (as in the current case), but I would avoid even deleting pre-releases if possible.

@felixfontein Could you please outline your concerns? There may well be an impact of deleting pre-releases that I haven’t considered.

Someone noticed that 9.0.0 is gone: Ansible 9.0.0 not available anymore from the pip repo? · Issue #500 · ansible-community/ansible-build-data · GitHub

@gundalow: it’s mainly a personal preference. IMO releases are immutable and should stay there forever unless there are very good reasons for not keeping them (like malicious code in them, or legal reasons). The pre-releases are part of the release history like every other release.

But I’d definitely still prefer deleting them over not being able to publish new releases :slight_smile:

Agreed that deleting releases shouldn’t be taken lightly, but judging from prior experience, we have no idea when someone might get around to evaluating the PyPI quota increase request. If hard decisions have to be made, I’d suggest starting with the oldest alphas, then the oldest betas, and so on, and only as needed (i.e., not pre-emptively killing off all old pre-releases).

I like @nitzmahone’s proposal. The oldest pre-releases are for Ansible 2.5, so 2.5.0a1 would be the first release to be deleted once we need more space (but not before that).

Does anyone have better/other suggestions?

To figure out the size of the ansible PyPI repository without having admin access to it, you can use this Python snippet:

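import requests

# Ask PyPI's Simple API for its JSON representation and sum the size of every uploaded file.
r = requests.get('https://pypi.org/simple/ansible/', headers={'Accept': 'application/vnd.pypi.simple.v1+json'}, timeout=30)
r.raise_for_status()  # fail loudly instead of parsing an error page
size = sum(file['size'] for file in r.json()['files'])
print(f"{size / 1024**3:0.4g} GiB")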

The current size is 9.731 GiB.

(Thanks to @nitzmahone for figuring most of this out, I just hacked a script together that summed up the numbers :wink: )
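
To estimate how much space pre-releases occupy, the same file list could be split by filename. This is only a sketch: the `size_by_kind` helper and the filename pattern are assumptions for illustration, the sample entries are made up, and oddly named old files would land in the "other" bucket.

```python
import re

# Hypothetical pattern: a pre-release marker such as a1, b2 or rc1 directly
# after the numeric part of the version in an 'ansible-<version>' filename.
PRE_RELEASE_RE = re.compile(r"^ansible-\d+(?:\.\d+)*(?:a|b|rc)\d+")


def size_by_kind(files):
    """Return total bytes for pre-release files and for all other files."""
    totals = {"pre-release": 0, "other": 0}
    for f in files:
        kind = "pre-release" if PRE_RELEASE_RE.match(f["filename"]) else "other"
        totals[kind] += f["size"]
    return totals


# Made-up entries shaped like the 'files' list of the JSON response.
sample = [
    {"filename": "ansible-2.5.0a1.tar.gz", "size": 100},
    {"filename": "ansible-10.0.0rc1-py3-none-any.whl", "size": 200},
    {"filename": "ansible-9.0.0.tar.gz", "size": 500},
]
totals = size_by_kind(sample)
print(totals)
```

With a live response, `size_by_kind(r.json()['files'])` would give a rough upper bound on what deleting every pre-release could free.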

Precisely, we have 3 problems at hand:

  1. Monitor the space in PyPI and request increase when needed
  2. Come up with rules for deleting released packages (if/when needed, which ones to delete, and the process to follow before and after the deletion)
  3. How and where to archive the deleted packages.
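
For the first point, a monitoring check could be as small as the sketch below. The function names and the 90% warning threshold are hypothetical, and reading the "10 GB" limit from the error message as 10 * 1024**3 bytes is an assumption, since the thread does not say how PyPI counts it.

```python
# Sketch only: the limit value and the warning threshold are assumptions.
ASSUMED_LIMIT_BYTES = 10 * 1024**3  # "10 GB" from the error message, read as GiB
WARN_FRACTION = 0.9                 # hypothetical point at which to request more quota


def total_size(files):
    """Sum the 'size' field of file entries shaped like the Simple API's JSON."""
    return sum(f["size"] for f in files)


def quota_status(files, limit=ASSUMED_LIMIT_BYTES, warn_fraction=WARN_FRACTION):
    """Return (used_fraction, needs_attention) for a list of file entries."""
    used = total_size(files) / limit
    return used, used >= warn_fraction


# Made-up sizes instead of a live request.
sample = [{"size": 4 * 1024**3}, {"size": 5 * 1024**3 + 512 * 1024**2}]
used, warn = quota_status(sample)
print(f"{used:.1%} used, request increase: {warn}")
```

Run on a schedule against the real file list, a check like this could open a reminder before a release fails.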

Now, for part of the rules, I agree with @felixfontein’s first comment on deleting existing releases.

I also wanted to spell out how people are able to hit yanked releases: pip install ansible never considers them during dependency resolution. Only if somebody has pinned the exact release will it be installed.
So the people affected by fully removing such releases are going to be those who favor reproducible deployments and pin ansible in their requirements files and scripts. We’ve seen one case, evidently, but there may be others who have such pins but only run their automation periodically. JFTR.
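
To see which files are currently yanked (and so only reachable through an exact pin), the JSON file entries can be filtered on their `yanked` key, which as far as I know is either a boolean or a reason string. A minimal sketch follows; `yanked_files` is a hypothetical helper and the sample entries are made up.

```python
def yanked_files(files):
    """Return {filename: reason} for entries whose 'yanked' value is truthy."""
    result = {}
    for f in files:
        yanked = f.get("yanked", False)
        if yanked:
            # A string carries the reason; a bare True means none was given.
            result[f["filename"]] = yanked if isinstance(yanked, str) else ""
    return result


# Made-up entries shaped like the 'files' list of the JSON response.
sample = [
    {"filename": "ansible-9.5.0.tar.gz", "yanked": "Accidentally contains breaking change"},
    {"filename": "ansible-9.0.0.tar.gz", "yanked": True},
    {"filename": "ansible-9.1.0.tar.gz", "yanked": False},
    {"filename": "ansible-9.2.0.tar.gz"},
]
found = yanked_files(sample)
print(found)
```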

Additionally, PyPI stats are available via BigQuery: Statistics · PyPI. We should be able to inspect it somehow and verify that the releases being removed have low downloads.
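
As a rough idea of what such a check could look like, the sketch below only builds a query string. It assumes the public table is `bigquery-public-data.pypi.file_downloads` with `file.project`, `file.version` and `timestamp` columns, which should be verified against the PyPI statistics documentation. `downloads_query` is a hypothetical helper, and running the query needs a BigQuery client and a billing project, which are not shown.

```python
def downloads_query(project, versions, days=30):
    """Build a BigQuery SQL string counting recent downloads per version."""
    # Values are inlined for readability; real code should use query parameters.
    version_list = ", ".join(f"'{v}'" for v in versions)
    return f"""
SELECT file.version AS version, COUNT(*) AS downloads
FROM `bigquery-public-data.pypi.file_downloads`
WHERE file.project = '{project}'
  AND file.version IN ({version_list})
  AND DATE(timestamp) >= DATE_SUB(CURRENT_DATE(), INTERVAL {days} DAY)
GROUP BY version
ORDER BY downloads
""".strip()


query = downloads_query("ansible", ["2.5.0a1", "2.5.0b1"])
print(query)
```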

I’m not sure I understand your proposal, so just to clarify:

  1. 2.5.0a1, 2.6.0a1, 2.6.0a2… 11.0.0a2, 2.5.0b1… or
  2. 2.5.0a1, 2.5.0b1, 2.5.0b2, 2.5.0rc1, 2.5.0rc2, 2.5.0rc3, 2.6.0a1…

So: alphas from oldest to newest, then betas from oldest to newest, and then rcs from oldest to newest? Or generally pre-releases from oldest to newest? I lean toward the latter.
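
The two orderings can be made concrete with a small sort. This is a sketch under assumptions: "oldest" is approximated by version number rather than upload date, the helper names are hypothetical, and only versions of the form X.Y.Z followed by a, b or rc and a number are handled.

```python
import re

PRE_RE = re.compile(r"^(\d+(?:\.\d+)*)(a|b|rc)(\d+)$")
PHASE_RANK = {"a": 0, "b": 1, "rc": 2}


def parse(version):
    """Split e.g. '2.5.0rc1' into ((2, 5, 0), 'rc', 1)."""
    m = PRE_RE.match(version)
    if m is None:
        raise ValueError(f"not a pre-release: {version}")
    release = tuple(int(p) for p in m.group(1).split("."))
    return release, m.group(2), int(m.group(3))


def by_phase_then_age(versions):
    """Option 1: all alphas oldest to newest, then all betas, then all rcs."""
    def key(v):
        release, phase, n = parse(v)
        return PHASE_RANK[phase], release, n
    return sorted(versions, key=key)


def by_age(versions):
    """Option 2: every pre-release of the oldest version first, then the next."""
    def key(v):
        release, phase, n = parse(v)
        return release, PHASE_RANK[phase], n
    return sorted(versions, key=key)


sample = ["2.6.0a1", "2.5.0b1", "11.0.0a2", "2.5.0a1", "2.5.0rc1", "2.6.0a2"]
print(by_phase_then_age(sample))
print(by_age(sample))
```

Under option 1 the newest alpha (11.0.0a2 in this sample) is deleted before the oldest beta; under option 2 it goes last.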

I fully agree. Let’s not do this generally, only when (as you put it) hard decisions have to be made in order to be able to do a new release.

The proposal is the former: first delete all a1 pre-releases (from oldest to newest), then all a2 pre-releases, etc.

I hope it won’t come to this, but this could mean we might have to delete the current 11.0.0a1 and 11.0.0a2 releases while keeping 2.5.0b1 which is 6 1/2 years old.

Why is it more important to keep old betas than current / pretty new alphas? I’m open to both, I just want to understand. I would have said the other way round makes more sense.