Functional design question


Joachim Durchholz

22 Mar 2026

11:10 p.m.

Hi all,

I have some design questions - not for Haskell but for the Java+Vavr combo. I am about to dip my toes into actual work in a functional manner, as far as that is possible, and I thought I might as well go for the community that's best subscribed to a purely functional style.

Bird's eye view:

Java everybody knows (I don't like it either, no worries). Vavr is a - to my eyes - pretty nice functional library, see https://docs.vavr.io/ for details if you're actually interested. I want to leverage Vavr to get a Java program to be as Haskell-ish as is reasonable; that's not going to be much by a Haskeller's standards, but... baby steps.

The application is a simple command-line thing: read configuration from command line and configuration file, emit diagnostics about any errors in the config, then process. I want the configuration processing to be as functional as reasonable, given the constraints. However, I'm undecided about many things, no doubt because I simply don't know the best design patterns, and it's frustrating to see multiple options and not know which ones will paint me into a corner and which ones will not.

Things I'm undecided about:

a) Data type variations

During the configuration phase, I need to carry information about where some configuration item came from (its "context", usually file, line number, column number). In the processing phase, configuration is considered final and error-free, so context is not needed (that's a done design decision). I could carry context information into the processing phase, but it's going to be awkward.

Say, we have the following types (forgive the most un-Haskellish pseudo syntax, but I don't dare to use Haskell style because I'd almost certainly get that wrong and provoke misunderstandings):

    DirectoryConfig {
      ConfigData<Path> path
      ConfigData<String> title
      ...
    }

    data ConfigData<a> {
      String fileName
      int lineNumber
      int columnNumber
      a value
    }

but in the processing phase I don't want my config objects polluted with context, so I want

    DirectoryConfig {
      Path path
      String title
      ...
    }

No idea how to deal with that. I'd use code generation in Java I guess, but that's horribly inelegant and complicated to set up (no, I don't particularly like Java, it's just what I'm currently using).

So... how would one do such a thing in a functional language? Not necessarily Haskell; I guess some language extensions exist for that kind of stuff, but I'm more-or-less tied to Java + functional libraries, so I'll most likely have to stick with the more basic approaches. Besides, even if I did Haskell, I'd want to avoid the advanced stuff until I get confident in the basics.

TL;DR: I have a deeply nested configuration data structure where each field has a "context", i.e. the place it came from; how do I make it so that the context is available during configuration evaluation but is unavailable in the later processing phase?

I hope this is understandable; it's really hard to do that when you don't even know enough to ask the questions precisely enough.

Regards, Jo


Akhra Gannon

23 Mar

12:07 a.m.

honestly, the way you've laid it out looks perfectly reasonable to me! it generally follows the principle of "parse, don't validate" laid out here: https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-validate/

the only core change I might make is to remove the payload from ConfigData and use a tuple of (SourceIndex, a) instead, so you can operate on one without examining the other. this lets you do all the source-related analysis with monomorphic code; and in cases where the final payload type is unchanged, you can just discard the tuple wrapper to finalize.

at the top level, for each final-config item I'd imagine the process would (in Haskell) look something like:

    finalizeConfigValue :: (a -> Maybe b) -> [(SourceIndex, a)] -> Maybe b
    finalizeConfigValue valueParser = listToMaybe . mapMaybe (valueParser . snd) . sortOn fst

with an Ord instance on SourceIndex, and the Maybe output representing failure of all candidates to parse (you could also use defaulting and/or exceptions)

On Sun, Mar 22, 2026, 4:10 PM Joachim Durchholz wrote:

...

Regards, Jo

_______________________________________________
Haskell-Cafe mailing list -- haskell-cafe@haskell.org
To (un)subscribe, modify options or view archives go to:
Only members subscribed via the mailman list are allowed to post.

jo＠durchholz.org

1:15 a.m.

On 23.03.26 at 01:07, Akhra Gannon wrote:

...

honestly, the way you've laid it out looks perfectly reasonable to me! it generally follows the principle of "parse, don't validate" laid out here: https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-validate/

Heh. Took me long enough to arrive at the same conclusions, though it was more an intuitive process than the clear(ish) explanations there.

...

the only core change I might make is to remove the payload from ConfigData and use a tuple of (SourceIndex, a) instead, so you can operate on one without examining the other. this lets you do all the source-related analysis with monomorphic code; and in cases where the final payload type is unchanged, you can just discard the tuple wrapper to finalize.

at the top level, for each final-config item I'd imagine the process would (in Haskell) look something like:

    finalizeConfigValue :: (a -> Maybe b) -> [(SourceIndex, a)] -> Maybe b
    finalizeConfigValue valueParser = listToMaybe . mapMaybe (valueParser . snd) . sortOn fst

with an Ord instance on SourceIndex, and the Maybe output representing failure of all candidates to parse (you could also use defaulting and/or exceptions)

Okay... I have to read this carefully.

Just to verify I understood it correctly:

It's declaring function finalizeConfigValue, with two parameters (disregarding currying): one is a function a -> Maybe b, the second is a list of SourceIndex/a tuples; the result is a Maybe b.

What kind of type is a? A raw parse tree as delivered from the YAML parser, i.e. a hierarchical blob of Maps, Lists, and Strings for terminals? (I decided to configure the parser for "everything is a string" because (a) there is no real need for numbers in my use case and (b) possibly YAML has some weird definition of what's a number and what's a string.)

What does listToMaybe do? Hm. Which parameter does it pick up, the first or the second? I'm not sure how dot notation and currying interact.

valueParser seems composed with snd ("second") - ah ok, this is essentially processing an event list/stream ("stream" would be a list that happens to be lazy in Haskell land).

Not sure what the mapMaybe serves. I guess it's dealing with error cases or some such, but I can't infer what kind of cases that would be and what the intended effect is (my lack of Haskell knowledge shows again).

Ah right. The (sortOn fst) somehow gets routed to the event stream. Not sure why that's needed, as the inputs should already be sorted by source position, but maybe you're thinking about situations where things are processed in a different order than they are read. That doesn't happen in my use case, so maybe that's why I'm guessing instead of understanding; I'm blocked by differences in implicit assumptions. I guess.

I believe the main block to me understanding such code is that I don't know how many parameters a function has - or, in the currying perspective, what order a function is. It's a great way to write and reason about higher-level code, but for noobs like me it's making it really hard to understand what's going on - I suspect it's because I don't know what order the functions inside the function's body are (resp. how many parameters they take).

I also suspect it's really an issue with dot-notation pipelines like the one above: if you don't know the order and return types of the functions being composed, you have no clue about what function is working on how many parameters - might be two, or might be just one and the extra parameter is accepted later in the queue (maybe? I am sooo unsure here...)

Oh my. I guess I'm going to learn a whole lot here. Which is exactly the point, actually :-)

Regards, Jo
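One way to see how the dot pipeline and currying fit together is to spell the same shape out in Java, where every stage is an explicit one-argument `Function`. This is only an illustrative sketch in plain Java (no Vavr): `Map.Entry<Integer, A>` stands in for the `(SourceIndex, a)` tuple, with an `Integer` rank as a hypothetical `SourceIndex`, and the class name `Pipeline` is made up. Unlike the Haskell original, each stage here builds a full list rather than working lazily.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

final class Pipeline {
    private Pipeline() {}

    // Curried reading of:
    //   finalizeConfigValue :: (a -> Maybe b) -> [(SourceIndex, a)] -> Maybe b
    // Supplying the parser returns a function that is still waiting for the list.
    static <A, B> Function<List<Map.Entry<Integer, A>>, Optional<B>> finalizeConfigValue(
            Function<A, Optional<B>> valueParser) {

        // sortOn fst: order the candidates by their (stand-in) source index.
        Function<List<Map.Entry<Integer, A>>, List<Map.Entry<Integer, A>>> sortOnFst =
            entries -> entries.stream().sorted(Map.Entry.comparingByKey()).toList();

        // mapMaybe (valueParser . snd): take each payload, parse it, keep the successes.
        Function<List<Map.Entry<Integer, A>>, List<B>> mapMaybeParser =
            entries -> entries.stream()
                .map(Map.Entry::getValue)      // snd
                .map(valueParser)              // valueParser
                .flatMap(Optional::stream)     // drop failures, unwrap the rest
                .toList();

        // listToMaybe: the first element, if there is one.
        Function<List<B>, Optional<B>> listToMaybe =
            values -> values.stream().findFirst();

        // Haskell's f . g . h runs h first; andThen chains in execution order.
        return sortOnFst.andThen(mapMaybeParser).andThen(listToMaybe);
    }
}
```

Read this way, every function in the pipeline takes exactly one argument. `valueParser` is the only parameter named on the left of the `=`; the list is the argument of the composed function on the right, which is why it never appears by name.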

Akhra Gannon

8:12 a.m.

On Sun, Mar 22, 2026, 6:15 PM wrote:

...

Just to verify I understood it correctly:

It's declaring function finalizeConfigValue, with two parameters (disregarding currying), one is a function a -> Maybe b, second is a list of SourceIndex/a tuples; result is a Maybe b.

correct!

...

What kind of type is a? A raw parse tree as delivered from the yaml parser, i.e. a hierarchical blob of Maps, Lists, and Strings for terminals? (I decided to configure the parser for "everything is a string" because (a) no real need for numbers in my use case and (b) possibly YAML has some weird definition of what's number and what's a string.)

in that case, it's probably either a string or a list of strings. the assumption is that you've already dispatched on config keys: the output is the final canonical value of *one* key; the input list represents potentially multiple files, repeat definitions within a file, maybe a hard default for fallback. the "sorting" would be via custom logic for your SourceIndex type, indicating which alternative to prefer if there are multiple candidates. depending on specifics, this may not be versatile enough; just read that line as "disambiguate between multiple definitions."

...

What does listToMaybe do?

head of list with an option wrapper, so it's safe on empty lists:

    listToMaybe [] = Nothing
    listToMaybe (x:_) = Just x

...

Not sure what the mapMaybe serves. I guess it's dealing with error cases or some such, but I can't infer what kind of cases that would be and what the intended effect is (my lack of Haskell knowledge shows again).

it takes a function with optional return (here, the final value parser) and maps it over a list, discarding failures and unwrapping the rest.

    mapMaybe :: (a -> Maybe b) -> [a] -> [b]

so the whole process is:

- sort by source preference
- parse, discarding failures
- return the first successful parse if one exists (list laziness skips the rest)

...

I believe the main block to me understanding such code is that I don't know how many parameters a function has - or, in the currying perspective, what order a function is.

yes, the type annotation is often critical for this. it includes all those things, which the function head might skip. it's common to read *only* types on a first pass through unfamiliar code!

...

I guess I'm going to learn a whole lot here. Which is exactly the point, actually :-)

cheers to that! 😄
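For the Java side of the thread, the three steps described in this reply (sort by source preference, parse while discarding failures, take the first success) might be sketched as follows. This is a hedged illustration in plain Java, not Vavr and not anything posted in the thread: the fields of `SourceIndex` (`rank`, `line`) and the `Candidate` and `ConfigFinalizer` names are hypothetical, and "lower rank wins" is just one possible preference rule.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical source position: lower rank is preferred, then the earlier line.
// Implementing Comparable plays the role of the Ord instance on SourceIndex.
record SourceIndex(int rank, int line) implements Comparable<SourceIndex> {
    @Override
    public int compareTo(SourceIndex other) {
        int byRank = Integer.compare(rank, other.rank);
        return byRank != 0 ? byRank : Integer.compare(line, other.line);
    }
}

// One candidate definition of a single config key: where it came from, and its raw value.
record Candidate<A>(SourceIndex source, A raw) {}

final class ConfigFinalizer {
    private ConfigFinalizer() {}

    static <A, B> Optional<B> finalizeConfigValue(
            Function<? super A, Optional<B>> valueParser,
            List<Candidate<A>> candidates) {
        return candidates.stream()
            // 1. sort by source preference
            .sorted(Comparator.comparing((Candidate<A> c) -> c.source()))
            // 2. parse each raw value, discarding failures
            .map(c -> valueParser.apply(c.raw()))
            .flatMap(Optional::stream)
            // 3. first successful parse, if one exists
            .findFirst();
    }
}
```

`findFirst` is a short-circuiting operation, which is the closest stream counterpart to the list laziness mentioned above. With Vavr, `Option` and its own collections could take the place of `Optional` and `Stream`; that variant is not shown here.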

Albert Y. C. Lai

12:09 a.m.

Perhaps

DirectoryConfig<C> { Pair path Pair file }

Context { String filename int lineNumber, columnNumber }

Pair is a 2-tuple type.

Unit is an informationless type.

Then you can have DirectoryConfig<Context> and DirectoryConfig<Unit>.

On 3/22/26 19:10, Joachim Durchholz wrote:

...


jo＠durchholz.org

12:32 a.m.

On 23.03.26 at 01:09, Albert Y. C. Lai wrote:

...

Perhaps

DirectoryConfig<C> { Pair path Pair file }

Context { String filename int lineNumber, columnNumber }

Pair is a 2-tuple type.

Unit is an informationless type.

Then you can have DirectoryConfig<Context> and DirectoryConfig<Unit>.

Assuming directoryConfig is a variable of type DirectoryConfig, I'd have to access the values as directoryConfig.path.first, so that's not an improvement over directoryConfig.path.value.

It does hide the context information; how would I map from a DirectoryConfig<Context> to a DirectoryConfig<Unit>? The data structure is deeply nested; there's a List<DirectoryConfig> children in each DirectoryConfig, and the whole thing is inside a SiteConfig which in turn is inside a GlobalConfig, so it's not just hierarchical but heterogeneously nested.

Regards, Jo
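The mapping asked about here could look roughly like the following in plain Java (records, no Vavr). Several things are assumptions rather than part of the proposal: the archive shows `Pair` without type arguments, so `Pair<Path, C>` with the value in `first` is a guess based on the `directoryConfig.path.first` access mentioned above; the `title` and `children` fields follow the original question; and `mapSecond`, `mapContext` and `stripContext` are hypothetical names.

```java
import java.nio.file.Path;
import java.util.List;
import java.util.function.Function;

// Where a value was read from.
record Context(String fileName, int lineNumber, int columnNumber) {}

// Informationless type: it has exactly one value.
enum Unit { UNIT }

// 2-tuple; by assumption the payload is first and the context slot is second.
record Pair<A, B>(A first, B second) {
    <D> Pair<A, D> mapSecond(Function<? super B, ? extends D> f) {
        return new Pair<A, D>(first, f.apply(second));
    }
}

record DirectoryConfig<C>(Pair<Path, C> path,
                          Pair<String, C> title,
                          List<DirectoryConfig<C>> children) {

    // Rebuilds the whole tree, replacing every context c by f(c).
    <D> DirectoryConfig<D> mapContext(Function<? super C, ? extends D> f) {
        List<DirectoryConfig<D>> mappedChildren =
            children.stream().map(child -> child.<D>mapContext(f)).toList();
        return new DirectoryConfig<D>(
            path.<D>mapSecond(f),
            title.<D>mapSecond(f),
            mappedChildren);
    }

    // DirectoryConfig<Context> -> DirectoryConfig<Unit>: forget all contexts.
    DirectoryConfig<Unit> stripContext() {
        return this.<Unit>mapContext(context -> Unit.UNIT);
    }
}
```

Under this approach `SiteConfig` and `GlobalConfig` would each need their own `mapContext` that delegates to the nested types, so the heterogeneous nesting is handled one record at a time. It does not remove the first objection: values are still reached through `.path().first()`.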

jo＠durchholz.org

10:53 p.m.

Thanks for all the feedback!

I believe I found out the basic misdesign: Thinking in terms of functions that are married to a type, as you do in OO. The functional approach would be to have a single function that does the conversion, broken down into subfunctions and thunks to manage each detail.

I believe that OO is fine in many design situations, but it seems it's inappropriate for complex conversion tasks: Any intermediate step tends to know about both source and destination data type, which gives just the kind of coupling that a clean design does not have. I suspect it's better to have a big conversion function, composed of subfunctions and thunks, freely being dependent on one, the other, or both data types, and once anybody tries to adapt this thing to other situations, you can still refactor the common pieces of code.

So - I guess I answered my own question, but your input has been invaluable to get me out of my existing tracks so I could find better ones! And sorry for all the sidetracking thoughts. I had a feeling I was having some very fundamental misconception, but I just didn't know how to identify it, so I couldn't ask the right questions.

Regards, and thanks again, Jo
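As a rough illustration of that conclusion, here is a minimal sketch in plain Java with two separate type families and a free-standing conversion between them. `ConfigData` and its fields come from the original question; the names `RawDirectoryConfig`, `FinalDirectoryConfig`, `ConfigConversion`, `toFinal` and `toFinalAll`, and the `children` field, are hypothetical. It assumes validation has already succeeded, so no diagnostics are produced.

```java
import java.nio.file.Path;
import java.util.List;

// Configuration-phase value: the payload plus where it came from.
record ConfigData<A>(String fileName, int lineNumber, int columnNumber, A value) {}

// Configuration-phase tree: every field carries its context.
record RawDirectoryConfig(ConfigData<Path> path,
                          ConfigData<String> title,
                          List<RawDirectoryConfig> children) {}

// Processing-phase tree: plain values only, no context anywhere.
record FinalDirectoryConfig(Path path, String title, List<FinalDirectoryConfig> children) {}

// The conversion lives outside both record types, so neither type knows the other.
final class ConfigConversion {
    private ConfigConversion() {}

    // Converts one node, dropping the context of each field.
    static FinalDirectoryConfig toFinal(RawDirectoryConfig raw) {
        return new FinalDirectoryConfig(
            raw.path().value(),
            raw.title().value(),
            toFinalAll(raw.children()));
    }

    // Subfunction for the nested list; recursion handles arbitrary depth.
    static List<FinalDirectoryConfig> toFinalAll(List<RawDirectoryConfig> raws) {
        return raws.stream().map(ConfigConversion::toFinal).toList();
    }
}
```

The two record families stay independent of each other, and only `ConfigConversion` depends on both. Containing types such as `SiteConfig` and `GlobalConfig` would get their own conversion functions in the same class, each calling the ones below it.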

Akhra Gannon

24 Mar

12:46 a.m.

glad to have helped!

and by the way, a very useful resource for learning about library functions you find in examples, and also for finding out if there's an established function that does something you need, is Hoogle: https://hoogle.haskell.org/

you can search by function name OR by type signature, and for the latter it will also find generic functions that could match a fixed-type search!

On Mon, Mar 23, 2026, 3:53 PM wrote:

...

