Ensuring all values of an ADT are explicitly handled OR finding all occurrences of type X in my app

Saurabh Nanda

31 Jan 2017

2:47 a.m.

Hi,

If I have the following ADT

data BookingState = Confirmed | Cancelled

which had a very high chance of being expanded in the future to have more values. How do I ensure that every pattern match on BookingState matches each value explicitly. Basically prevent the '_' matcher ?

If that is not possible, is the following possible instead: evolve BookingState and then find all possible occurrences of values of BookingState throughout my app?

I'm basically trying to make sure that any newly added state, which may require special-case handling, is not automatically handled by the '_' branch.

-- Saurabh.

Michael Orlitzky

31 Jan

4:19 a.m.

On 01/30/2017 09:47 PM, Saurabh Nanda wrote:

> Hi,
>
> If I have the following ADT
>
> data BookingState = Confirmed | Cancelled
>
> which had a very high chance of being expanded in the future to have more values. How do I ensure that every pattern match on BookingState matches each value explicitly. Basically prevent the '_' matcher ?

Don't write the "_" case? GHC will warn you about any pattern matches you've missed.
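A minimal sketch of what this reply describes, assuming the warning is enabled through GHC's `-Wincomplete-patterns` flag (the function name `describeState` and its result strings are made up for illustration):

```haskell
{-# OPTIONS_GHC -Wincomplete-patterns #-}

data BookingState = Confirmed | Cancelled
  deriving (Show, Eq)

-- Every constructor is matched by name; there is no '_' branch.
-- If a constructor is later added to BookingState, GHC reports this
-- function as having a non-exhaustive pattern match.
describeState :: BookingState -> String
describeState Confirmed = "confirmed"
describeState Cancelled = "cancelled"
```

Building with `-Werror` as well would turn that warning into a compile error, which is closer to the "code should break" behaviour asked for in the original question.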

Saurabh Nanda

4:59 a.m.

I would want the compiler (or linter) to help me here. Think of a mid-to-large team where everyone may not know (or remember) what the current best practices are.

_______________________________________________ Haskell-Cafe mailing list To (un)subscribe, modify options or view archives go to: http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe Only members subscribed via the mailman list are allowed to post.

Branimir Maksimovic

5:10 a.m.

Actually in a team, someone who writes a `_` match is very useful, as that prevents breaking code when adding a new value... I can't really see any problem here. There are real-world use cases where a member of the team doesn't need to cover all cases, hence `_`.

Saurabh Nanda

5:41 a.m.

> Actually in a team, someone who writes a `_` match is very useful, as that prevents breaking code when adding a new value... I can't really see any problem here. There are real-world use cases where a member of the team doesn't need to cover all cases, hence `_`.

I **want** code to break when I add a new value to an ADT. I want the programmer to go through all the sites where a pattern match was done on a particular ADT and explicitly consider the impact of the new value on these sites.

Actual use-case -- we start with the following ADT definition and call-sites:

data BookingStatus = Confirmed | Cancelled | Abandoned

computeRemainingSeats :: BookingStatus -> (...)
computeBilling :: BookingStatus -> (...)

Now, assume that within `computeAvailability` only `Confirmed` is explicitly matched to result in a reduction of available seats, everything else is matched by `_` and results in a no-op. What happens when `BookingStatus` evolves to have a new value of `ManualReview`? We want to reduce the number of seats till the manual-review of the booking is complete. However, the compiler will not force us to look at this particular call-site to evaluate this impact.

Another one: assume that within `computeBilling` only `Confirmed` is explicitly matched to trigger an invoicing action, others are matched via `_` to a no-op. What happens when `BookingStatus` evolves to have a new value of `Refunded`, which should trigger a credit-note action? Again, the compiler is not going to help us here.

One may reason that this is conflation of concerns; that this should be split into **three** ADTs -- one for BookingStatus, one for AvailabilityStatus and one for BillingStatus. And pedantically this might be right. But all of these learnings evolve over time and one may not make the right decisions when writing v1 of the system. Under such circumstances, can Haskell still help in helping the programmer ensure correctness?

-- Saurabh.
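The seat-counting scenario can be sketched roughly as follows. The message elides the real result types, so the `Int` result and the names `seatsHeldWildcard` and `seatsHeldExplicit` are assumptions made only for this illustration:

```haskell
-- BookingStatus after it has evolved to include ManualReview.
data BookingStatus = Confirmed | Cancelled | Abandoned | ManualReview
  deriving (Show, Eq)

-- Wildcard version, written before ManualReview existed.
-- It still compiles without any warning, and ManualReview silently
-- falls into the no-op branch.
seatsHeldWildcard :: BookingStatus -> Int
seatsHeldWildcard Confirmed = 1
seatsHeldWildcard _         = 0

-- Explicit version: adding ManualReview made this match non-exhaustive,
-- so the author had to visit it and decide what the new state means here.
seatsHeldExplicit :: BookingStatus -> Int
seatsHeldExplicit Confirmed    = 1
seatsHeldExplicit ManualReview = 1
seatsHeldExplicit Cancelled    = 0
seatsHeldExplicit Abandoned    = 0
```

The two functions disagree only on `ManualReview`, which is exactly the call-site the compiler never pointed at in the wildcard version.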

Alexey Vagarenko

6 a.m.

There is a proposal about that: https://github.com/ghc-proposals/ghc-proposals/pull/43

Saurabh Nanda

6:11 a.m.

> There is a proposal about that https://github.com/ghc-proposals/ghc-proposals/pull/43

Thanks for the link. Added my 2 cents there: https://github.com/ghc-proposals/ghc-proposals/pull/43#issuecomment-27628340...

While this is being considered for inclusion in core GHC, can a linter help, in the meantime?

-- Saurabh.

Christopher Allen

6:38 a.m.

We want this at my company too, so we don't write fall-through cases.

--
Chris Allen
Currently working on http://haskellbook.com

Saurabh Nanda

6:45 a.m.

> We want this at my company too, so we don't write fall-through cases.

That's a short-term workaround, yes. However it doesn't fit into Haskell's overall story of language-enforced correctness.

-- Saurabh.

Saurabh Nanda

6:45 a.m.

Btw, what about the other approach -- listing all possible occurrences of type X throughout the app?

-- Saurabh.

--
http://www.saurabhnanda.com

David Turner

8:11 a.m.

I've done exactly this a number of times. The approach I generally take is to define a completely new ADT, mark the old one as deprecated, and then simply plough through the compiler warnings and errors, which is a mostly mechanical process.

On occasion I've written a (temporary) injection from the old datatype into the new one which lets you make the changes piecemeal (by topologically sorting the usages) which is nice as if there's any risk of making a mistake then tools like `git bisect` can help you find the slip in the sea of otherwise identical changes. Doesn't always work smoothly but it's often ok.

This also has the advantage that you don't have to fix all the usages in third-party code or dependent libraries straight away. The deprecation step need not be immediate, depending on how stable your API is supposed to be.

Cheers,

David
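One possible reading of the "new ADT plus temporary injection" approach described by David Turner, as a rough sketch. The `V2` names and the `upgradeStatus` function are hypothetical, and the constructors carry a suffix only so that both types fit in one file; in a real codebase the two types would more likely live in separate modules:

```haskell
-- Old type. The pragma makes GHC warn wherever another module still
-- uses these names; uses inside the defining module are not reported.
data BookingStatus = Confirmed | Cancelled | Abandoned
  deriving (Show, Eq)
{-# DEPRECATED BookingStatus, Confirmed, Cancelled, Abandoned "Use BookingStatusV2 instead" #-}

-- New type, with the extra state added.
data BookingStatusV2
  = ConfirmedV2
  | CancelledV2
  | AbandonedV2
  | ManualReviewV2
  deriving (Show, Eq)

-- Temporary injection from the old type into the new one, so call-sites
-- can be migrated one at a time.
upgradeStatus :: BookingStatus -> BookingStatusV2
upgradeStatus Confirmed = ConfirmedV2
upgradeStatus Cancelled = CancelledV2
upgradeStatus Abandoned = AbandonedV2
```

Each deprecation warning then marks a call-site that still has to be reviewed, which also covers the "find all occurrences of type X" part of the question.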

Alan & Kim Zimmerman

8:41 a.m.

I can envisage some sort of tooling to fill in all the unmatched cases, based on the GHC warnings, which can make the editing process trivial for the "make it work quickly" case.

Sven Panne

7:43 a.m.

2017-01-31 7:45 GMT+01:00 Saurabh Nanda:

> > We want this at my company too, so we don't write fall-through cases.
>
> That's a short-term workaround, yes. However it doesn't fit into Haskell's overall story of language-enforced correctness.

I don't see this as a workaround, this is *the* way to go IMHO. Using '_' is saying: "I know what I'm doing here, for all eternity, trust me...", so you get what you ask for. Unless you don't care about an argument at all, using '_' is counterproductive for maintenance. But that's the usual tension between being able to write something down quickly which works *now* and writing something which will be maintained for a long time by lots of people. So '_' itself is fine, but you should be aware of what kind of SW you are writing.

Implicitness will always hurt you sooner or later, it is only a matter of time, and '_' has a very implicit flavor. For exactly this reason, using C++'s "default" case was banned in the last 2 companies I've worked for, and this turned out to be very beneficial: Finding all the places where a "default" was not really the default anymore hit us several times and resulted in actual bugs in released SW.

Banning '_', just like Christopher mentioned, seems to be a sensible approach.

Erik de Castro Lopo

7:34 a.m.

Branimir Maksimovic wrote:

> Actually in a team, someone who writes a `_` match is very useful, as that prevents breaking code when adding a new value...

I write Haskell code in a team of 15 engineers and wild card pattern matches are considered a bad idea exactly *because* we want compiler errors when a sum type is extended with a new constructor.

Erik
--
Erik de Castro Lopo
http://www.mega-nerd.com/

Saurabh Nanda

7:38 a.m.

> I write Haskell code in a team of 15 engineers and wild card pattern matches are considered a bad idea exactly *because* we want compiler errors when a sum type is extended with a new constructor

How would you respond to this comment: https://github.com/ghc-proposals/ghc-proposals/pull/43#issuecomment-27629203...

-- Saurabh.

Bryan Richter

5:25 a.m.

On Jan 30, 2017 21:00, "Saurabh Nanda" wrote:

> I would want the compiler (or linter) to help me here. Think of a mid-to-large team where everyone may not know (or remember) what the current best practices are.

I believe you are asking, "Is there an option to emit warnings when the underscore pattern is used?" I further believe the answer to that question is no.

I can appreciate your use case, however.

Eric Seidel

3:52 p.m.

I think the best solution here would be a new warning, say -fwarn-wildcard-patterns. This would be a simple syntactic check for a wildcard pattern anywhere in the module. Combined with -Werror you would be prevented from compiling programs with wildcards.

I think this would be quite valuable given the rest of the comments in the thread, and probably a simple addition to GHC.

Eric

Stefan Monnier

5:11 p.m.

> I think the best solution here would be a new warning, say -fwarn-wildcard-patterns. This would be a simple syntactic check for a wildcard pattern anywhere in the module.

Note that

    case <foo>
    | [] -> ...
    | (_ : xs) -> ...

also contains a wildcard-pattern. So emitting a warning for every use of wildcard patterns would likely lead to a lot of pain.

You'd instead want to warn about "default branch", e.g.

    case <foo>
    | [] -> ...
    | (1 : xs) -> ...
    | (_ : xs) -> ...

here the wildcard pattern does correspond to a "default branch" and might hence deserve a warning.

Stefan
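The two shapes Stefan distinguishes can be written out in Haskell syntax as below. This is only an illustrative sketch; the function names and result values are invented for the example:

```haskell
-- Nested wildcard, no default branch: '_' merely ignores the head of the
-- list, while both list constructors are still matched by name.
countItems :: [a] -> Int
countItems xs = case xs of
  []       -> 0
  (_ : ys) -> 1 + countItems ys

-- Default branch: the last alternative catches every head that the
-- earlier, more specific alternative did not.
describeHead :: [Int] -> String
describeHead xs = case xs of
  []      -> "empty"
  (1 : _) -> "starts with 1"
  (_ : _) -> "starts with something else"
```

A purely syntactic "any wildcard" check would flag both functions, while a "default branch" check would flag only `describeHead`.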

Eric Seidel

5:31 p.m.

> Note that
>
>     case <foo>
>     | [] -> ...
>     | (_ : xs) -> ...
>
> also contains a wildcard-pattern. So emitting a warning for every use of wildcard patterns would likely lead to a lot of pain.

Yes, good point, that would be too restrictive. When I said wildcard-pattern I was thinking specifically of a top-level wildcard, so your example would be accepted, but e.g.

    case ... of
      [] -> ...
      _  -> ...

would be rejected.

> You'd instead want to warn about "default branch", e.g.
>
>     case <foo>
>     | [] -> ...
>     | (1 : xs) -> ...
>     | (_ : xs) -> ...
>
> here the wildcard pattern does correspond to a "default branch" and might hence deserve a warning.

This sounds promising, but how would you define “default branch”? Seems like it could be an involved definition, which could make the warning unpredictable for users.

Li-yao Xia

6:56 p.m.

> > You'd instead want to warn about "default branch", e.g.
> >
> >     case <foo>
> >     | [] -> ...
> >     | (1 : xs) -> ...
> >     | (_ : xs) -> ...
> >
> > here the wildcard pattern does correspond to a "default branch" and might hence deserve a warning.
>
> This sounds promising, but how would you define “default branch”? Seems like it could be an involved definition, which could make the warning unpredictable for users.

A "default branch" seems to correspond to a wildcard overlapping a previous pattern. This would be a warning symmetrical to -Woverlapping-patterns.

- A wildcard which overlaps with a pattern below it makes the latter unreachable, which is certainly not intentional. This is caught by -Woverlapping-patterns.

    case x of
      _ -> y
      C -> z

- A wildcard which overlaps with a pattern above it has the risk mentioned in this thread, that it will catch any new constructor added to the corresponding ADT, and thus the programmer may forget to update some case expressions when the new constructor is to be handled differently.

    case x of
      C -> y
      _ -> z

Li-yao
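Both situations from Li-yao's message, written as a small compilable sketch. The type `T`, its second constructor `D`, and the function names are assumptions added for illustration:

```haskell
{-# OPTIONS_GHC -Woverlapping-patterns #-}

data T = C | D
  deriving (Show, Eq)

-- Wildcard above a specific pattern: the 'C' alternative can never be
-- reached, and GHC's existing -Woverlapping-patterns warning reports it.
wildcardFirst :: T -> String
wildcardFirst x = case x of
  _ -> "y"
  C -> "z"

-- Wildcard below a specific pattern: accepted without any warning today.
-- This is the "default branch" that the symmetrical warning discussed
-- in the thread would report.
wildcardLast :: T -> String
wildcardLast x = case x of
  C -> "y"
  _ -> "z"
```

Note that a function such as `const x _ = x` has no earlier alternative for its wildcard to overlap, so this definition of "default branch" would leave it alone.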

William Yager

1 Feb

12:05 a.m.

On Tue, Jan 31, 2017 at 12:56 PM, Li-yao Xia wrote:

> A "default branch" seems to correspond to a wildcard overlapping a previous pattern. This would be a warning symmetrical to -Woverlapping-patterns.
>
> - A wildcard which overlaps with a pattern below it makes the latter unreachable, which is certainly not intentional. This is caught by -Woverlapping-patterns.
>
>     case x of
>       _ -> y
>       C -> z
>
> - A wildcard which overlaps with a pattern above it has the risk mentioned in this thread, that it will catch any new constructor added to the corresponding ADT, and thus the programmer may forget to update some case expressions when the new constructor is to be handled differently.
>
>     case x of
>       C -> y
>       _ -> z
>
> Li-yao

Excellent. That also elegantly covers the case of functions which ignore the argument, like "const x _ = x". Good thinking!

--Will

Erik Hesselink

31 Jan

7:31 p.m.

I feel like there's no good technical solution here. For example, this is potentially dangerous in the face of a changing data type, but presumably wouldn't fall under the definition of a "default branch":

    data T = A | B
    -- Later, C could be added which might need special handling

    case ... of
      [] -> ...
      (A : xs) -> ...
      (_ : xs) -> ...

On the other hand, this would presumably be a "default branch" but I challenge you to replace the wildcard pattern with an exhaustive list of pattern matches:

    case ... of
      0 -> ...
      1 -> ...
      _ -> ...

Of course this is a special case, but there are many data types in the wild with a large number of constructors: think of generated enumerations, for example.

I'd say education is the best option here. I thought wildcards were convenient, until I realized their effect on maintainability in the face of future data type changes.

Erik
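Erik Hesselink's two counter-examples, filled in just enough to compile. The function names and result strings are hypothetical, and `C` has already been added to `T` here to show where it ends up:

```haskell
-- T after the later addition of C.
data T = A | B | C
  deriving (Show, Eq)

-- The nested '_' was written when only A and B existed. C now lands in
-- the last alternative without any warning, even though it may have
-- needed special handling.
firstTag :: [T] -> String
firstTag ts = case ts of
  []      -> "empty"
  (A : _) -> "starts with A"
  (_ : _) -> "starts with something else"

-- With numeric literals, the wildcard is the only practical way to make
-- the match total.
smallName :: Int -> String
smallName n = case n of
  0 -> "zero"
  1 -> "one"
  _ -> "many"
```

The first function shows a risky wildcard that is not a top-level default, and the second shows a top-level default that cannot reasonably be spelled out, which is why a purely mechanical rule is hard to state.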
