reimplementing cabal check

I spent Summer 2022 taking part in Google Summer of Code, working on the cabal-install Haskell package manager. My task, in brief: reimplement the command cabal check.

the problem and proposed solution

cabal check is a conceptually simple program which checks your package for common mistakes (dubious options, missing licence files, etc.). As the .cabal specification grew more complex, cabal check did not become smarter: checks were done on unstructured information without context, and individual settings (filepaths, options) were first merged into a single soup and only then checked.

The nature of the checks made it impossible to write simple logic such as “do not warn about GHC flag -xyz if put behind an off-by-default cabal flag”. It also produced spurious warnings, e.g. from checks on mutually exclusive branches.

There is only one way to reestablish context-aware and sensible checking: begin from the topmost type representing a .cabal file (GenericPackageDescription), pattern match on fields and initiate checks, unpacking information step by step:

checkExecutable :: Monad m => PackageId -> Executable -> CheckM m ()
checkExecutable pid exe@(Executable
                           exeName_ modulePath_ exeScope_ buildInfo_) = do
        scopeCheck exeScope_
        ⁝

This way:

breaking down what I did

A number of check functions were initially written in 2008, a long time ago in Haskell time. Moreover, most of the checks did not have corresponding tests, which made a rewrite a difficult task to handle. So the first part of my job consisted in:

The complete reimplementation (#8427) was more challenging. To understand the patch structure, note that checks performed by cabal check can be divided into three parts:

  1. Pure checks: these only need a GenericPackageDescription.
  2. Package checks: these relate to the content of files which are part of a package, as specified in the .cabal file.
  3. Working tree checks: checks on files which are in the working tree of a project (to appreciate the difference: files in the working tree which are not present in the .cabal manifest, e.g. because we forgot to add them, can only be checked here).

This is further complicated by the fact that the last two kinds of checks do not necessarily happen inside IO. Hackage (or another server) may want to check the content of a loaded .tar.gz archive; in a similar fashion, information on the working tree could be provided by a VCS. Some kind of abstraction was therefore needed. To handle all of this (and more) I implemented the checks in a monad transformer, CheckM m a, where the type parameter m can be IO or any other monad that supports the operations needed by the non-pure checks.
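To make the idea of parameterising checks over a monad more concrete, here is a minimal sketch. It is not the actual Cabal code: Warning, CheckOps, doesFileExist, warn, fileExists, checkLicenseFiles, runCheck and listingOps are hypothetical names invented for this illustration, and this CheckM is a much simplified stand-in for the real one. It only shows how the same check can be answered from an in-memory file listing (say, the index of a tarball) instead of the filesystem.

```haskell
import Data.Functor.Identity (Identity (..))

-- Hypothetical warning type; the real one in Cabal is much richer.
data Warning = MissingFile FilePath
  deriving (Eq, Show)

-- Hypothetical record of the operations non-pure checks rely on.
-- The monad m decides where the answers come from.
newtype CheckOps m = CheckOps
  { doesFileExist :: FilePath -> m Bool
  }

-- Simplified stand-in for CheckM: reads the operations, collects warnings.
newtype CheckM m a = CheckM
  { unCheckM :: CheckOps m -> m (a, [Warning])
  }

instance Functor m => Functor (CheckM m) where
  fmap f (CheckM g) = CheckM $ \ops -> fmap (\(a, ws) -> (f a, ws)) (g ops)

instance Monad m => Applicative (CheckM m) where
  pure a = CheckM $ \_ -> pure (a, [])
  CheckM f <*> CheckM g = CheckM $ \ops -> do
    (h, ws1) <- f ops
    (a, ws2) <- g ops
    pure (h a, ws1 ++ ws2)

instance Monad m => Monad (CheckM m) where
  CheckM g >>= k = CheckM $ \ops -> do
    (a, ws1) <- g ops
    (b, ws2) <- unCheckM (k a) ops
    pure (b, ws1 ++ ws2)

-- Record a single warning.
warn :: Applicative m => Warning -> CheckM m ()
warn w = CheckM $ \_ -> pure ((), [w])

-- Ask the underlying monad whether a file exists.
fileExists :: Functor m => FilePath -> CheckM m Bool
fileExists fp = CheckM $ \ops -> fmap (\b -> (b, [])) (doesFileExist ops fp)

-- A check written once, independent of how files are looked up.
checkLicenseFiles :: Monad m => [FilePath] -> CheckM m ()
checkLicenseFiles = mapM_ checkOne
  where
    checkOne fp = do
      present <- fileExists fp
      if present then pure () else warn (MissingFile fp)

-- Run a check with the given operations and return the warnings.
runCheck :: Functor m => CheckOps m -> CheckM m a -> m [Warning]
runCheck ops check = fmap snd (unCheckM check ops)

-- Pure interpreter: answers come from an in-memory list of paths.
listingOps :: [FilePath] -> CheckOps Identity
listingOps files = CheckOps {doesFileExist = \fp -> Identity (fp `elem` files)}

checkAgainstListing :: [FilePath] -> [FilePath] -> [Warning]
checkAgainstListing listing wanted =
  runIdentity (runCheck (listingOps listing) (checkLicenseFiles wanted))
```

In this sketch, checkAgainstListing ["LICENSE", "foo.cabal"] ["LICENSE", "COPYING"] evaluates to [MissingFile "COPYING"]. An IO interpreter would supply a CheckOps IO whose field queries the real filesystem, and checkLicenseFiles would not need to change.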

Before checking even begins we must have a realised configuration (remember: the goal is to avoid a “package soup”). Cabal targets (libraries, executables, benchmarks) come with conditional blocks (datatype: CondTree). Walking the tree and building targets slice by slice is the natural choice: it gives us real, instantiated targets to check, together with some context to pass along with them. Working on a CondTree also opens up more possibilities, e.g. detecting common dependencies in conditional branches.
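The following is a rough sketch of what walking a conditional tree with context can look like. It does not use Cabal's real CondTree: Block, Conditional, Context, FlagDefaults and checkBlock are hypothetical, heavily simplified types and functions made up for this example, and the rule it implements (stay silent about the placeholder option -xyz when it sits behind an off-by-default flag) is just the one used as an example earlier in this post.

```haskell
-- Hypothetical, simplified stand-in for a conditional tree:
-- the options set at this level plus nested conditional blocks.
data Block = Block
  { blockOptions :: [String]
  , blockConditionals :: [Conditional]
  }

-- "if flag(name)" with a then-block and an optional else-block.
data Conditional = IfFlag String Block (Maybe Block)

-- Context accumulated while descending into the tree.
newtype Context = Context
  { behindOffByDefaultFlag :: Bool
  }

-- Default value of each cabal flag (assumed to be known up front).
type FlagDefaults = [(String, Bool)]

-- Walk the tree slice by slice, checking each block with its own context.
-- Each branch is checked on its own, so mutually exclusive branches
-- are never mixed together.
checkBlock :: FlagDefaults -> Context -> Block -> [String]
checkBlock defaults ctx (Block opts conds) =
  warningsHere ++ concatMap checkConditional conds
  where
    -- Warn about -xyz only when it is not behind an off-by-default flag.
    warningsHere =
      [ "dubious GHC option: " ++ opt
      | opt <- opts
      , opt == "-xyz"
      , not (behindOffByDefaultFlag ctx)
      ]
    checkConditional (IfFlag name thenBlock elseBlock) =
      let offByDefault = lookup name defaults == Just False
          -- Only the then-branch of an off-by-default flag changes the context.
          thenCtx = if offByDefault then Context True else ctx
       in checkBlock defaults thenCtx thenBlock
            ++ maybe [] (checkBlock defaults ctx) elseBlock

-- A target that sets -xyz at the top level and again under flag "dev".
exampleTarget :: Block
exampleTarget =
  Block
    ["-Wall", "-xyz"]
    [IfFlag "dev" (Block ["-xyz"] []) Nothing]
```

With "dev" off by default, checkBlock [("dev", False)] (Context False) exampleTarget reports only the top-level occurrence; with "dev" on by default it reports both. In a flattened "soup" of options that distinction would be lost, which is why the context has to travel with the walk.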

challenges

The original plan had to be tweaked a little:

benefits

A number of benefits come with the rewrite, some immediate, some at later stages:

In summary, the path taken leads away from an ad-hoc, string-based approach and towards principled, typed checking.

conclusions and future work

I must say that the scope of the work caught me a little by surprise. Types and tests are nice to have during a refactor, but cabal is not a small codebase. I implemented a new approach whilst trying to minimise any breakage of the API or other functionality. This will be useful when the new checking method goes live: logic mistakes in checks will surface immediately and can (and will!) be addressed promptly. I will be on guard, particularly to iron out issues relating to Hackage and third-party tools (stack).

The last commit I made for GSoC is on PR #8427, Add changelog for haskell#8427, hash 9b94dde7c1419ac4eff5d57caaa32bf039bd6d0f.

Once the dust settles and the refactor is accepted, I will continue to work on the remaining open issues in the issue tracker.

thanks

Working on an open-source project and enjoying collaboration with other people was something I have wanted to do for a long time. Without support from Google I could not have devoted an entire summer to doing so.

cabal people are splendid: always ready to help, smart, hard-working. Writing code with them was satisfying and rewarding. Thank you Andreas, Francesco, Artem, Mikolaj, Emily, Ben, Andrea, Gershom, Kristen, Hécate.

I broke my PC during the project; thanks to mzan for quickly providing me with a VM to work on. Thanks to Jasper for guiding me through the GSoC process. Thanks to morganw for proofreading this, and thanks to #loh for moral support.