Ensuring all values of an ADT are explicitly handled OR finding all occurrences of type X in my app

Hi,
If I have the following ADT
    data BookingState = Confirmed | Cancelled
which has a very high chance of being expanded in the future to have more values, how do I ensure that every pattern match on BookingState matches each value explicitly? Basically, prevent the '_' matcher?
If that is not possible, is the following possible instead: evolve BookingState and then find all occurrences of values of BookingState throughout my app?
I'm basically trying to make sure that any newly added state, which may require special-case handling, is not automatically handled by the '_' branch.
-- Saurabh.

On 01/30/2017 09:47 PM, Saurabh Nanda wrote:
> How do I ensure that every pattern match on BookingState matches each value explicitly. Basically prevent the '_' matcher ?
Don't write the "_" case? GHC will warn you about any pattern matches you've missed.
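
A minimal sketch of this suggestion, assuming the module is compiled with `-Wincomplete-patterns` (also enabled by `-Wall`) and that the warning is promoted to an error. The function name `seatsHeld` and the seat counts are hypothetical and not from the thread.

```haskell
{-# OPTIONS_GHC -Wincomplete-patterns -Werror=incomplete-patterns #-}

data BookingState = Confirmed | Cancelled
  deriving (Show, Eq)

-- Hypothetical call site: every constructor is matched explicitly, no '_'.
-- If a constructor is later added to BookingState, these equations no longer
-- cover all cases; GHC reports incomplete patterns and, because of the
-- -Werror=incomplete-patterns flag above, refuses to compile the module.
seatsHeld :: BookingState -> Int
seatsHeld Confirmed = 1
seatsHeld Cancelled = 0

main :: IO ()
main = mapM_ (print . seatsHeld) [Confirmed, Cancelled]
```

This only helps at call sites that avoid the wildcard; a `_` equation makes the match exhaustive again and silences the warning.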

I would want the compiler (or linter) to help me here. Think of a
mid-to-large team where everyone may not know (or remember) what the
current best practices are.
_______________________________________________ Haskell-Cafe mailing list To (un)subscribe, modify options or view archives go to: http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe Only members subscribed via the mailman list are allowed to post.

Actually, in a team, writing a `_` match is very useful, as it prevents code from breaking when a new value is added... I can't really see any problem here. There are real-world use cases where a team member doesn't need to cover all cases, hence `_`.

> Actually in team, one who writes `_` match, is very useful as that prevents breaking code when adding new value...I can't really see any problem here. There is real world use case when member of team don't need to cover all cases therefore `_`.
I **want** code to break when I add a new value to an ADT. I want the programmer to go through all the sites where a pattern match was done on a particular ADT and explicitly consider the impact of the new value on these sites.
Actual use-case -- we start with the following ADT definition and call-sites:
    data BookingStatus = Confirmed | Cancelled | Abandoned
    computeRemainingSeats :: BookingStatus -> (...)
    computeBilling :: BookingStatus -> (...)
Now, assume that within `computeRemainingSeats` only `Confirmed` is explicitly matched to result in a reduction of available seats; everything else is matched by `_` and results in a no-op. What happens when `BookingStatus` evolves to have a new value of `ManualReview`? We want to reduce the number of seats until the manual review of the booking is complete. However, the compiler will not force us to look at this particular call-site to evaluate this impact.
Another one: assume that within `computeBilling` only `Confirmed` is explicitly matched to trigger an invoicing action; the others are matched via `_` to a no-op. What happens when `BookingStatus` evolves to have a new value of `Refunded`, which should trigger a credit-note action? Again, the compiler is not going to help us here.
One may reason that this is a conflation of concerns; that this should be split into **three** ADTs -- one for BookingStatus, one for AvailabilityStatus and one for BillingStatus. And pedantically this might be right. But all of these learnings evolve over time and one may not make the right decisions when writing v1 of the system. Under such circumstances, can Haskell still help the programmer ensure correctness?
-- Saurabh.
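
The first scenario can be sketched as follows. This is an illustration under assumptions: the functions `seatsHeldWildcard` and `seatsHeldExplicit` and the seat counts are hypothetical stand-ins for the elided `computeRemainingSeats` logic, and `ManualReview` has already been added to the type.

```haskell
{-# OPTIONS_GHC -Wincomplete-patterns #-}

-- BookingStatus after ManualReview has been added to the original three.
data BookingStatus = Confirmed | Cancelled | Abandoned | ManualReview
  deriving (Show, Eq, Enum, Bounded)

-- Written before ManualReview existed. The wildcard silently absorbs the new
-- constructor, so the module still compiles and no seat is held for it.
seatsHeldWildcard :: BookingStatus -> Int
seatsHeldWildcard Confirmed = 1
seatsHeldWildcard _         = 0

-- Explicit version. Had the ManualReview equation been left out, GHC would
-- report an incomplete pattern match here, forcing a decision at this site.
seatsHeldExplicit :: BookingStatus -> Int
seatsHeldExplicit Confirmed    = 1
seatsHeldExplicit ManualReview = 1
seatsHeldExplicit Cancelled    = 0
seatsHeldExplicit Abandoned    = 0

main :: IO ()
main = mapM_ report [minBound .. maxBound]
  where
    report s =
      putStrLn (show s ++ ": wildcard=" ++ show (seatsHeldWildcard s)
                       ++ ", explicit=" ++ show (seatsHeldExplicit s))
```

The two functions disagree only on `ManualReview`, which is exactly the case the wildcard version never forced anyone to reconsider.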

There is a proposal about that
https://github.com/ghc-proposals/ghc-proposals/pull/43

> There is a proposal about that https://github.com/ghc-proposals/ghc-proposals/pull/43
Thanks for the link. Added my 2 cents there: https://github.com/ghc-proposals/ghc-proposals/pull/43#issuecomment-27628340...
While this is being considered for inclusion in core GHC, can a linter help in the meantime?
-- Saurabh.

We want this at my company too, so we don't write fall-through cases.
-- Chris Allen Currently working on http://haskellbook.com

Btw, what about the other approach -- listing all possible occurrences of
type X throughout the app?
-- Saurabh.
On Tue, Jan 31, 2017 at 12:15 PM, Saurabh Nanda wrote:
> > We want this at my company too, so we don't write fall-through cases.
> That's a short-term workaround, yes. However it doesn't fit into Haskell's overall story of language-enforced correctness.
> -- Saurabh.

I've done exactly this a number of times. The approach I generally take is
to define a completely new ADT, mark the old one as deprecated, and then
simply plough through the compiler warnings and errors, which is a mostly
mechanical process.
On occasion I've written a (temporary) injection from the old datatype into
the new one which lets you make the changes piecemeal (by topologically
sorting the usages) which is nice as if there's any risk of making a
mistake then tools like `git bisect` can help you find the slip in the sea
of otherwise identical changes. Doesn't always work smoothly but it's often
ok.
This also has the advantage that you don't have to fix all the usages in
third-party code or dependent libraries straight away. The deprecation step
need not be immediate, depending on how stable your API is supposed to be.
Cheers,
David
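
A rough single-module sketch of the migration described above, under stated assumptions: the `V2` names, the `upgrade` injection and `seatsHeld` are hypothetical. In a real codebase the old type would live in its own module, because GHC reports deprecation warnings only for uses in modules that import the deprecated entity, not for uses inside the defining module.

```haskell
-- Old type, kept for the duration of the migration.
data BookingStatus = Confirmed | Cancelled | Abandoned
  deriving (Show, Eq)
{-# DEPRECATED BookingStatus "Migrate to BookingStatusV2" #-}

-- New type carrying the extra constructor (hypothetical names).
data BookingStatusV2
  = ConfirmedV2
  | CancelledV2
  | AbandonedV2
  | ManualReviewV2
  deriving (Show, Eq)

-- Temporary injection from the old type into the new one, so call sites can
-- be ported one at a time instead of all at once.
upgrade :: BookingStatus -> BookingStatusV2
upgrade Confirmed = ConfirmedV2
upgrade Cancelled = CancelledV2
upgrade Abandoned = AbandonedV2

-- A call site already ported to the new type, with no wildcard.
seatsHeld :: BookingStatusV2 -> Int
seatsHeld ConfirmedV2    = 1
seatsHeld ManualReviewV2 = 1
seatsHeld CancelledV2    = 0
seatsHeld AbandonedV2    = 0

-- Not-yet-ported code still produces old values and goes through 'upgrade'.
main :: IO ()
main = print (map (seatsHeld . upgrade) [Confirmed, Cancelled, Abandoned])
-- prints [1,0,0]
```

Once every use of the old type is gone, `upgrade` and the deprecated type can be deleted together.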

I can envisage some sort of tooling to fill in all the unmatched cases,
based on the GHC warnings, which could make the editing process trivial
for the "make it work quickly" case.

2017-01-31 7:45 GMT+01:00 Saurabh Nanda
> > We want this at my company too, so we don't write fall-through cases.
> That's a short-term workaround, yes. However it doesn't fit into Haskell's overall story of language-enforced correctness.
I don't see this as a workaround, this is *the* way to go IMHO. Using '_' is saying: "I know what I'm doing here, for all eternity, trust me...", so you get what you ask for. Unless you don't care about an argument at all, using '_' is counterproductive for maintenance.
But that's the usual tension between being able to write something down quickly which works *now* and writing something which will be maintained for a long time by lots of people. So '_' itself is fine, but you should be aware of what kind of SW you are writing. Implicitness will always hurt you sooner or later, it is only a matter of time, and '_' has a very implicit flavor.
For exactly this reason, using C++'s "default" case was banned in the last 2 companies I've worked for, and this turned out to be very beneficial: Finding all the places where a "default" was not really the default anymore hit us several times and resulted in actual bugs in released SW. Banning '_', just like Christopher mentioned, seems to be a sensible approach.

Branimir Maksimovic wrote:
> Actually in team, one who writes `_` match, is very useful as that prevents breaking code when adding new value...
I write Haskell code in a team of 15 engineers and wild card pattern matches are considered a bad idea exactly *because* we want compiler errors when a sum type is extended with a new constructor.
Erik
--
Erik de Castro Lopo
http://www.mega-nerd.com/

> I write Haskell code in a team of 15 engineers and wild card pattern matches are considered a bad idea exactly *because* we want compiler errors when a sum type is extended with a new constructor
How would you respond to this comment: https://github.com/ghc-proposals/ghc-proposals/pull/43#issuecomment-27629203...
-- Saurabh.


I think the best solution here would be a new warning, say -fwarn-wildcard-patterns. This would be a simple syntactic check for a wildcard pattern anywhere in the module. Combined with -Werror you would be prevented from compiling programs with wildcards. I think this would be quite valuable given the rest of the comments in the thread, and probably a simple addition to GHC.
Eric

> I think the best solution here would be a new warning, say -fwarn-wildcard-patterns. This would be a simple syntactic check for a wildcard pattern anywhere in the module.
Note that
    case <foo>
    | [] -> ...
    | (_ : xs) -> ...
also contains a wildcard-pattern. So emitting a warning for every use of wildcard patterns would likely lead to a lot of pain.
You'd instead want to warn about "default branch", e.g.
    case <foo>
    | [] -> ...
    | (1 : xs) -> ...
    | (_ : xs) -> ...
here the wildcard pattern does correspond to a "default branch" and might hence deserve a warning.
Stefan

> Note that
>     case <foo> | [] -> ... | (_ : xs) -> ...
> also contains a wildcard-pattern. So emitting a warning for every use of wildcard patterns would likely lead to a lot of pain.
Yes, good point, that would be too restrictive. When I said wildcard-pattern I was thinking specifically of a top-level wildcard, so your example would be accepted, but e.g.
    case ... of
      [] -> ...
      _ -> ...
would be rejected.
> You'd instead want to warn about "default branch", e.g.
>     case <foo> | [] -> ... | (1 : xs) -> ... | (_ : xs) -> ...
> here the wildcard pattern does correspond to a "default branch" and might hence deserve a warning.
This sounds promising, but how would you define “default branch”? Seems like it could be an involved definition, which could make the warning unpredictable for users.
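
To make the distinction between a nested wildcard and a top-level wildcard concrete, here is a small sketch in valid Haskell. The function names `count` and `isEmpty` are hypothetical and chosen only for illustration.

```haskell
-- Nested wildcard: '_' ignores the head element, but both list constructors
-- ([] and (:)) are still matched explicitly, so nothing is defaulted.
count :: [a] -> Int
count xs0 = case xs0 of
  []       -> 0
  (_ : xs) -> 1 + count xs

-- Top-level wildcard: '_' stands in for every alternative not listed above
-- it, which is the shape a "top-level wildcard" check would reject.
isEmpty :: [a] -> Bool
isEmpty xs0 = case xs0 of
  [] -> True
  _  -> False

main :: IO ()
main = do
  print (count "abc")
  print (isEmpty "abc")
```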

> This sounds promising, but how would you define “default branch”? Seems like it could be an involved definition, which could make the warning unpredictable for users.
A "default branch" seems to correspond to a wildcard overlapping a previous pattern. This would be a warning symmetrical to -Woverlapping-patterns.
- A wildcard which overlaps with a pattern below it makes the latter unreachable, which is certainly not intentional. This is caught by -Woverlapping-patterns.
    case x of
      _ -> y
      C -> z
- A wildcard which overlaps with a pattern above it has the risk mentioned in this thread: it will catch any new constructor added to the corresponding ADT, and thus the programmer may forget to update some case expressions when the new constructor is to be handled differently.
    case x of
      C -> y
      _ -> z
Li-yao
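
Both shapes can be tried in a small module. This is a sketch: the type `T` with constructors `C` and `D` and the function names are hypothetical. `-Woverlapping-patterns` is an existing GHC warning; the symmetrical warning for the second shape is only a proposal in this thread.

```haskell
{-# OPTIONS_GHC -Woverlapping-patterns #-}

data T = C | D
  deriving Show

-- Wildcard above a constructor: the C alternative can never be reached, and
-- -Woverlapping-patterns reports it as redundant when this module is compiled.
shadowed :: T -> String
shadowed x = case x of
  _ -> "anything"
  C -> "never reached"

-- Wildcard below a constructor: -Woverlapping-patterns says nothing here,
-- and any constructor later added to T lands in the '_' alternative.
defaulted :: T -> String
defaulted x = case x of
  C -> "C"
  _ -> "default"

main :: IO ()
main = mapM_ (\t -> putStrLn (shadowed t ++ " / " ++ defaulted t)) [C, D]
```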

On Tue, Jan 31, 2017 at 12:56 PM, Li-yao Xia wrote:
> A "default branch" seems to correspond to a wildcard overlapping a previous pattern. This would be a warning symmetrical to -Woverlapping-patterns.
Excellent. That also elegantly covers the case of functions which ignore the argument, like "const x _ = x". Good thinking!
--Will

I feel like there's no good technical solution here. For example, this is
potentially dangerous in the face of a changing data type, but presumably
wouldn't fall under the definition of a "default branch":
    data T = A | B -- Later, C could be added which might need special handling
    case ... of
      [] -> ...
      (A : xs) -> ...
      (_ : xs) -> ...
On the other hand, this would presumably be a "default branch" but I
challenge you to replace the wildcard pattern with an exhaustive list of
pattern matches:
    case ... of
      0 -> ...
      1 -> ...
      _ -> ...
Of course this is a special case, but there are many data types in the wild
with a large number of constructors: think of generated enumerations, for
example.
I'd say education is the best option here. I thought wildcards were
convenient, until I realized their effect on maintainability in the face of
future data type changes.
Erik
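
The first counterexample above can be spelled out as a sketch. The function names and result strings are hypothetical, and `C` is shown as already added to `T`.

```haskell
{-# OPTIONS_GHC -Wincomplete-patterns #-}

data T = A | B | C -- C is the constructor added later
  deriving (Show, Eq)

-- Shape from the message: the wildcard sits inside the (:) pattern rather
-- than at the top level, yet C still falls into it silently.
describeNested :: [T] -> String
describeNested ts = case ts of
  []      -> "empty"
  (A : _) -> "starts with A"
  (_ : _) -> "starts with something else"

-- Spelling out the head constructors instead. If the (C : _) alternative were
-- missing, -Wincomplete-patterns would report the uncovered case.
describeExplicit :: [T] -> String
describeExplicit ts = case ts of
  []      -> "empty"
  (A : _) -> "starts with A"
  (B : _) -> "starts with B"
  (C : _) -> "starts with C"

main :: IO ()
main = do
  putStrLn (describeNested [C])
  putStrLn (describeExplicit [C])
```

This works for a small type like `T`; as the message notes, it does not scale to `Int` literals or to types with very many constructors.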
participants (15)
- Alan & Kim Zimmerman
- Alexey Vagarenko
- Branimir Maksimovic
- Bryan Richter
- Christopher Allen
- David Turner
- Eric Seidel
- Erik de Castro Lopo
- Erik Hesselink
- Li-yao Xia
- Michael Orlitzky
- Saurabh Nanda
- Stefan Monnier
- Sven Panne
- William Yager