I spent Summer 2022 taking part in Google Summer of Code, working on
the cabal-install Haskell package manager. My task, in brief:
reimplement the command cabal check.
cabal check is a conceptually simple program which checks your package
for common mistakes (dubious options, missing licence files, etc). As the
.cabal specification grew more complex, cabal check did not become
smarter: checks were done on unstructured information without context;
individual settings (filepaths, options) were first merged into a single
soup and only then checked.
The nature of the checks disallowed writing simple logic such as “do not
warn about GHC flag -xyz if put behind an off-by-default cabal flag”.
It also resulted in the creation of spurious warnings, e.g., checks on
mutually exclusive branches.
There is only one way to reestablish context-aware and sensible checking:
begin from the topmost type representing a .cabal file
(GenericPackageDescription), pattern match on fields and initiate
checks, unpacking information step by step:

    checkExecutable :: Monad m => PackageId -> Executable -> CheckM m ()
    checkExecutable pid exe@(Executable
        exeName_ modulePath_ exeScope_ buildInfo_) = do
      scopeCheck exeScope_
      ⁝

This way:
- […] FilePaths from GenericPackageDescription and checking those is not
  sensible if, e.g., context dictates only some can be glob patterns);
- if Executable changes, checkExecutable will not compile (due to
  a mismatch in pattern-matched fields), prompting us to add relevant
  checks where required.

A number of check functions were initially written in 2008, a long time ago in Haskell time. Moreover, most of the checks did not have corresponding tests, which made a rewrite a difficult task to handle. So the first part of my job consisted in:
- […] cabal-check warning.
  Interestingly enough, reproducing some older warnings was not possible
  (e.g.: unrecognised testsuite type): today they are correctly caught
  by the parser.
- […] Strings. I rectified this in #8269 and #8311.

The complete reimplementation (#8427) was more challenging. To
understand the patch structure, note that checks performed by cabal check
can be divided into three parts:
- […] GenericPackageDescription.
- […] .cabal file.
- […] .cabal manifest, e.g. because we forgot to add them, can only be
  checked here).

This is further complicated by the previous two checks not strictly
happening inside IO. Hackage (or another server) may want to check the
content of a loaded .tar.gz archive; in a similar fashion, information
on the working tree could be provided by a VCS. For this reason there was
a need for some kind of abstraction. To handle all of this (and more) I
chose a monad transformer to implement all of these checks, CheckM m a,
where the m is a type parameter which can be IO or any other
monad that supports operations for non-basic checks.
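
To make the idea of parametrising over m more concrete, here is a minimal sketch. It is not the actual CheckM from #8427: the names Check, CheckOps, fileExists, warn, liftOp, checkLicenseFile, tarballOps and warningsFor are all hypothetical, invented for this illustration. The point is only that a check written against an abstract interface can run on an in-memory file list (say, the contents of an uploaded archive) just as well as on a real disk.

```haskell
import Data.Functor.Identity (Identity, runIdentity)

-- Hypothetical interface: what a check may ask of its environment.
data CheckOps m = CheckOps
  { fileExists :: FilePath -> m Bool
  }

-- Hypothetical, much simplified stand-in for CheckM: it reads the
-- interface and accumulates warnings while running in any monad m.
newtype Check m a = Check { runCheck :: CheckOps m -> m (a, [String]) }

instance Functor m => Functor (Check m) where
  fmap f (Check g) = Check $ \ops -> fmap (\(a, ws) -> (f a, ws)) (g ops)

instance Monad m => Applicative (Check m) where
  pure a = Check $ \_ -> pure (a, [])
  Check f <*> Check g = Check $ \ops -> do
    (h, ws1) <- f ops
    (a, ws2) <- g ops
    pure (h a, ws1 ++ ws2)

instance Monad m => Monad (Check m) where
  Check g >>= k = Check $ \ops -> do
    (a, ws1) <- g ops
    (b, ws2) <- runCheck (k a) ops
    pure (b, ws1 ++ ws2)

-- Record a warning.
warn :: Applicative m => String -> Check m ()
warn w = Check $ \_ -> pure ((), [w])

-- Run one operation of the interface inside a check.
liftOp :: Functor m => (CheckOps m -> m a) -> Check m a
liftOp f = Check $ \ops -> fmap (\a -> (a, [])) (f ops)

-- A check that needs the environment but never mentions IO.
checkLicenseFile :: Monad m => FilePath -> Check m ()
checkLicenseFile path = do
  present <- liftOp (\ops -> fileExists ops path)
  if present then pure () else warn ("missing licence file: " ++ path)

-- Pure interpreter: the "file system" is just a list of paths.
tarballOps :: [FilePath] -> CheckOps Identity
tarballOps files = CheckOps { fileExists = \p -> pure (p `elem` files) }

-- Collect the warnings of checkLicenseFile against an in-memory listing.
warningsFor :: [FilePath] -> FilePath -> [String]
warningsFor files path =
  snd (runIdentity (runCheck (checkLicenseFile path) (tarballOps files)))
```

An IO interpreter would be another CheckOps IO value, for instance one wrapping doesFileExist from the directory package; checkLicenseFile itself would not change.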
Before checking even begins we must have a realised configuration
(remember: the goal is to avoid a “package soup”). Cabal targets (libraries,
executables, benchmarks) come with conditional blocks (datatype:
CondTree). Walking the tree, building targets slice by slice is the
natural choice which allows us to have real, instantiated targets to check
and to provide some kind of context to pass along with them.
Working on a CondTree opens up more possibilities, e.g. detecting common
dependencies in conditional branches.
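
As a rough illustration of what walking the tree with context buys us, here is a sketch on a toy tree. The types Flag and Tree and the function checkO2 are hypothetical and far simpler than Cabal's CondTree (a guard here is a single flag, with no boolean expressions and no else branch). The walk remembers whether it is inside a block guarded by an off-by-default flag, and only complains about -O2 when it is not.

```haskell
-- Hypothetical, heavily simplified stand-ins for cabal flags and CondTree.
data Flag = Flag
  { flagName    :: String
  , flagDefault :: Bool
  }

data Tree = Tree
  { ghcOptions :: [String]        -- options set at this level
  , branches   :: [(Flag, Tree)]  -- blocks guarded by a single flag
  }

-- Walk the tree top-down, carrying one bit of context: are we inside
-- a block that is only active when an off-by-default flag is enabled?
checkO2 :: Tree -> [String]
checkO2 = go False
  where
    go :: Bool -> Tree -> [String]
    go guarded (Tree opts bs) =
      [ "-O2 used outside of an off-by-default flag"
      | not guarded, "-O2" `elem` opts ]
        ++ concatMap (\(f, t) -> go (guarded || not (flagDefault f)) t) bs

-- -O2 hidden behind an off-by-default flag: no warning expected.
guardedExample :: Tree
guardedExample =
  Tree ["-Wall"] [(Flag "force-o2" False, Tree ["-O2"] [])]

-- -O2 at the top level: one warning expected.
plainExample :: Tree
plainExample = Tree ["-O2"] []
```

A check that looked only at the merged list of options would see -O2 in both examples and could not tell them apart.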
The original plan had to be tweaked a little:
- […] BuildInfo, as an example, is a type with almost 50 fields. Not only
  that, those accessors are not that useful by themselves: if you check
  Distribution.Types.BuildInfo, there are a number of functions which
  only extract options of a certain flavour (e.g. hcStaticOptions).
  Duplicating those functions would have been non-economical and, most
  importantly, bug prone if in the future someone were to tweak the original
  definitions in the Type module. For this reason, I chose not to
  deconstruct BuildInfo and to rely instead on the above-mentioned accessors.
- […] CondTree in one pass.
  This happens in two checks: flag usage (checking that every declared flag
  is used) and module duplication.
  If the first is obvious (we need to fold the tree), the latter
  came as a bit of a surprise; alas the current monoidal instances of
  various targets call nub on module lists, hence rendering any
  checking on configured targets meaningless.
  Those are the only two checks (out of more than a hundred) which had
  to be accommodated in a special fashion.

A number of benefits come with the rewrite, some immediate, some at later stages:
- […] -O2 in --force-o2 will no longer generate a warning.
- […] .cabal files (e.g.: conditional library names).
- […] GenericPackageDescription downwards have pattern
  matched fields. Modify the type and the code will break, so the programmer
  is gently reminded to add new checks.
- […] and . map are now all, the use of catMaybes is minimised.
- CheckExplanation (the type describing warnings) is leaner, being passed
  just what is required to display an error.
- […] cabal codebase puts an end to the
  awkward split of GenericPackageDescription vs.
  PackageDescription, we are ready to accommodate that with zero or
  minimal refactoring.
- […] line:row numbers in warning output when the
  .cabal parser decides to add support for those (previously it was
  not possible because of the souping of information before checks). This
  can be beneficial for users of both terminal-based editors and IDEs.

In summary, the path taken is away from an ad-hoc string-based approach to one of principled and typed checking.
I must say that the scope of the work caught me a little by surprise.
Types and tests are nice to have during a refactor, but cabal is not a
small codebase. I implemented a new approach whilst trying to minimise any
breakage of the API or other functionality. This will be useful when the
new checking method goes live: logic mistakes in checks will surface
immediately and can (and will!) be promptly addressed. I will be on
guard, particularly to iron out issues relating to Hackage and third party
tools (stack).
The last commit I made for GSoC is on PR #8427, Add changelog for haskell#8427, hash 9b94dde7c1419ac4eff5d57caaa32bf039bd6d0f.
Once the dust settles and the refactor is accepted, I will continue to work on the remaining open issues in the issue tracker.
Working on an open-source project and enjoying collaboration with other people was something I have wanted to do for a long time. Without support from Google I could not have devoted an entire summer to doing so.
cabal people are splendid, always ready to help, smart, hard-working.
Writing code with them was satisfying and rewarding. Thank you Andreas,
Francesco, Artem, Mikolaj, Emily, Ben, Andrea, Gershom, Kristen, Hécate.
I broke my PC during the project; thanks to mzan for quickly providing me with a VM to work on. Thanks to Jasper for guiding me through the GSoC process. Thanks to morganw for proofreading this, thanks to #loh for moral support.