
Hi!

TL;DR: My implementation looks elegant (to me), but is rather slow compared to a well-optimized C++ implementation of the same algorithm. The runtime differences are currently huge. I am looking for input on what I could change to match the C++ performance without losing too much of the code's expressiveness.

I am writing software to deal with braille music code. As an exercise for me, and as an attempt to boil the algorithm down to a more readable form, I am in the process of translating my C++ program [1] to Haskell. So far, so good. Given that I only came back to Haskell (after a 10-year break) roughly two weeks ago, I think I have made quite nice progress. However, now that I have reached a stage where I can actually do a performance comparison, the dreaded thing has happened: the performance of my Haskell implementation is rather horrible compared to what my C++ implementation can deliver. While the code size has gone down roughly by a factor of 10, the runtime has gone up roughly by the same factor...

For the impatient, here is the code I am talking about:

  https://github.com/mlang/hbmc/blob/master/Data/Braille/Music.hs

The first half of the code is concerned with parsing braille. Performance is not critical here, so I opted to use Parsec, mostly because I am interested in good-quality error messages.

The fun starts with the definition of 'pvs'.

The problem: braille music values are inherently ambiguous. The sign for a whole note can also mean a 16th note, the sign for a half note can also mean a 32nd note, the sign for a quarter note can also mean a 64th, and an 8th note could also mean a 128th note. This is historical, and there is no way around it. The time signature (meter) is used to tell which note is which. This is relatively easy for an experienced human, but quite complicated for a computer. A measure (bar) of braille music can contain parallel *and* sequential parts. The data structure is basically:

  data Sign = ...
  type PartialVoice = [Sign]
  type PartialMeasure = [PartialVoice]
  type Voice = [PartialMeasure]
  type Measure = [Voice]

As a constraint, all PartialVoices in a PartialMeasure need to have exactly the same duration. Similarly, all Voices inside a Measure also need to have exactly the same duration.

So 'pvs', 'pms', 'vs' and 'ms' in my Data.Braille.Music module are concerned with calculating all possible interpretations of the input.

All in all, the current Haskell implementation is doing what it should. However, profiling tells me I am allocating roughly 2GB of memory to disambiguate a single measure of music in 3/2 time. This takes roughly 1.2s on my rather fast CPU. The amount of allocation and the runtime are unacceptable. If I leave it at that, I might as well ditch the code and terminate the experiment. As a comparison, my C++ implementation takes roughly 1s to disambiguate 32 measures of the same piece, while Haskell already takes 1.2s to disambiguate just one of these measures.

Profiling tells me I am spending 90% of the time in 'dur', which is my small helper method to calculate the duration of a PartialVoice, PartialMeasure or Voice.

The project is set up to support profiling:

  git clone https://github.com/mlang/hbmc
  cd hbmc
  cabal run profiling

Do you see a way to significantly speed this algorithm up without losing a lot of expressiveness/elegance?

Since I have only been back to Haskell for two weeks, I am basically happy about every sort of hint/tip you can give me, stylistic as well as performance-related.

For now, this is really just a toy project. However, if I can manage to improve the performance significantly without destroying the readability of the code too much, I might be interested in finishing the translation of my current C++ work into a more-or-less complete Haskell library. After all, BMC is more or less a compiler, and Haskell is supposed to be very good at this stuff :-)

Here is a link to the original C++ project:

[1] https://github.com/mlang/bmc

-- CYa, ⡍⠁⠗⠊⠕
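For readers who do not want to open the repository, here is a compressed sketch of the shape of the code being discussed. The type names and the Duration class with its dur method come from the post and the replies below; the constructor, the field, and the instance bodies are assumptions made up for illustration, not the actual implementation:

  {-# LANGUAGE FlexibleInstances #-}
  import Data.Ratio ((%))

  -- A heavily simplified stand-in for the real Sign type.
  newtype Sign = Sign { realValue :: Maybe Rational }  -- Nothing until disambiguated
    deriving Show

  type PartialVoice   = [Sign]           -- sequential signs
  type PartialMeasure = [PartialVoice]   -- parallel parts, all of equal duration
  type Voice          = [PartialMeasure] -- sequential
  type Measure        = [Voice]          -- parallel voices, all of equal duration

  class Duration a where
    dur :: a -> Maybe Rational

  instance Duration Sign where
    dur = realValue

  instance Duration PartialVoice where
    dur = fmap sum . traverse dur        -- Nothing if any sign is still ambiguous

  instance Duration PartialMeasure where
    dur []     = Nothing
    dur (pv:_) = dur pv                  -- parallel parts share one duration

  instance Duration Voice where
    dur = fmap sum . traverse dur

  instance Duration Measure where
    dur []    = Nothing
    dur (v:_) = dur v

  main :: IO ()
  main = print (dur ([[Sign (Just (1 % 2)), Sign (Just (1 % 4))]] :: PartialMeasure))

The functions pvs, pms, vs and ms mentioned in the post then enumerate all ways of assigning real values so that these durations line up with the time signature.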

Hi Mario,

I just took a quick look at the implementation of 'dur' and as a first thing I would replace foldl with the strict version foldl'. This already might explain the memory usage.

Greetings, Daniel
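For reference, this is the kind of one-character change Daniel means. The fold shown here is modelled on the sumDuration definition that comes up later in the thread, so treat the exact body as an assumption:

  import Control.Applicative (liftA2)
  import Data.List (foldl')

  -- Lazy foldl builds a long chain of (+) thunks before anything is evaluated:
  sumDurLazy :: [Maybe Rational] -> Maybe Rational
  sumDurLazy = foldl (liftA2 (+)) (pure 0)

  -- foldl' forces the accumulator at every step instead:
  sumDurStrict :: [Maybe Rational] -> Maybe Rational
  sumDurStrict = foldl' (liftA2 (+)) (pure 0)

  main :: IO ()
  main = print (sumDurStrict (map Just [1/2, 1/4, 3/4]))

Note that foldl' only forces the accumulator to weak head normal form, i.e. to the Just constructor, so the Rational inside can still pile up thunks; that is consistent with the modest improvement Mario reports below.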

Daniel Trstenjak
Hi Mario,
I just took a quick look at the implementation of 'dur' and as a first thing I would replace foldl with the strict version foldl'.
This already might explain the memory usage.
foldl' helps a bit, but it is not the main contributor to the overhead.

Before: 1,954,532,728 bytes allocated in the heap
After:  1,741,200,280 bytes allocated in the heap

And the runtime is more or less unaffected.

-- CYa, ⡍⠁⠗⠊⠕

I haven't had a chance to test anything, but list operations are frequently problematic because they make it easy to accidentally write O(n^2) algorithms. Have you tried using Sequences or Vectors? I would expect one of them to give you better performance for your access patterns. (A rough sketch of that kind of swap follows the quoted message below.)

On Thursday, September 24, 2015, Daniel Trstenjak <daniel.trstenjak@gmail.com> wrote:
Hi Mario,
I just took a quick look at the implementation of 'dur' and as a first thing I would replace foldl with the strict version foldl'.
This already might explain the memory usage.
Greetings, Daniel
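As a purely illustrative sketch of the swap Chas suggests (the real module keeps plain lists, and the names here are invented), a PartialVoice could be held in a Data.Sequence.Seq, which gives O(1) appends and O(1) length instead of O(n):

  import qualified Data.Sequence as Seq
  import Data.Sequence (Seq, (|>))
  import Data.Foldable (toList)

  -- Hypothetical Seq-based PartialVoice, parameterised over the sign type.
  type PartialVoice sign = Seq sign

  appendSign :: sign -> PartialVoice sign -> PartialVoice sign
  appendSign s pv = pv |> s              -- O(1), unlike (++ [s]) on a list

  fromSigns :: [sign] -> PartialVoice sign
  fromSigns = Seq.fromList

  toSigns :: PartialVoice sign -> [sign]
  toSigns = toList

  main :: IO ()
  main = print (toSigns (appendSign 3 (fromSigns [1, 2 :: Int])))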

I tried memoizing dur in some places, but it gets called with too many different arguments to make it worthwhile.

I tried changing StateT to Strict.StateT. As a last-ditch effort, I tried making your data types strict, but that didn't help.

The dur in your PartialVoice Duration instance gets called 3.5 million times. I don't know if you are going to find a magic bullet that makes this faster without tweaking your algorithm. I feel like there is a way to speed it up, maybe even exploit laziness to eliminate things early, but I'm just not familiar with this domain. Sorry.
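Roughly what those two experiments look like, as a toy sketch (the field and function names are invented, the real code uses StateT rather than plain State, and this needs the mtl package):

  import Control.Monad.State.Strict (evalState, get, modify')

  -- A strictness annotation on the field forces the value as soon as a
  -- Sign is constructed.  The real Sign has more structure than this.
  data Sign = Note { value :: !Rational }
    deriving Show

  -- Using the strict State monad (Control.Monad.State.Strict instead of the
  -- lazy default) and modify' to force the accumulator at every step:
  totalValue :: [Sign] -> Rational
  totalValue xs = evalState (mapM_ (\s -> modify' (+ value s)) xs >> get) 0

  main :: IO ()
  main = print (totalValue [Note 1, Note (1/2)])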

David McBride
I tried memoizing dur in some places, but it gets called with too many different arguments to make it worthwhile.
Likely with all possible combinations from the input :-)
I tried changing StateT to Strict.StateT. As a last-ditch effort, I tried making your data types strict, but that didn't help.
The dur in your PartialVoice Duration instance gets called 3.5 million times. I don't know if you are going to find a magic bullet that makes this faster without tweaking your algorithm. I feel like there is a way to speed it up, maybe even exploit laziness to eliminate things early, but I'm just not familiar with this domain. Sorry.
Laziness is a good topic. I am not sure how it actually affects this algorithm. I wonder if my use of 'filter' is the actual culprit, and if I should instead try to rewrite the uses of 'traverse' to not generate invalid possibilities in the first place. Right now, I am sort of hoping laziness would work together with filter being applied after all possibilities have been generated. Thanks for your time and insights.
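A toy contrast between the two shapes Mario describes, using the list monad; the generator and the cut-off predicate are made up, only the structure is the point:

  import Control.Monad (guard)

  -- Generate every combination first, then throw the invalid ones away:
  generateThenFilter :: [[Int]] -> [[Int]]
  generateThenFilter choices = filter ((<= 8) . sum) (sequence choices)

  -- Prune as soon as a partial combination is already too large, so doomed
  -- branches are never extended at all:
  generatePruned :: [[Int]] -> [[Int]]
  generatePruned = go 0
    where
      go _   []        = [[]]
      go acc (cs:rest) = do
        c <- cs
        let acc' = acc + c
        guard (acc' <= 8)        -- reject early instead of at the very end
        (c :) <$> go acc' rest

  main :: IO ()
  main = print ( length (generateThenFilter choices)
               , length (generatePruned choices) )
    where choices = replicate 6 [1 .. 4]

Laziness does let filter discard results as they are produced, but it cannot stop sequence from enumerating combinations whose prefix is already invalid; pruning inside the generator can.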
-- CYa,
 ⡍⠁⠗⠊⠕ | Debian Developer URL:http://debian.org/
 .''`.  | Get my public key via finger mlang/key@db.debian.org
 : :' : | 1024D/7FC1A0854909BCCDBE6C102DDFFC022A6B113E44
 `. `'
   `-     URL:http://delysid.org/ URL:http://www.staff.tugraz.at/mlang/

On 24 September 2015 at 15:11, Mario Lang wrote:
Profiling tells me I am spending 90% of all the time in 'dur', which is my small helper method to calculate the duration of a PartialVoice, PartialMeasure or Voice.
As a start, and to aid profiling, I suggest getting rid of the Duration type class and giving explicit names to the different dur functions. Which of the durs is getting called the most? How do they interact?

My hunch is that after you do this, (something like) memoization will be needed to avoid calculating the durations repeatedly.

Hope this helps,
Ozgur

PS: Cool problem! Do you have some input/output pairs available? If you can share some, others can validate their attempts without too much domain knowledge.
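Concretely, Ozgur's suggestion might look something like the following; the toy types and the bodies are assumptions, the point is only that each level gets its own name and therefore shows up under its own name in the profile (e.g. with -prof -fprof-auto):

  -- Simplified stand-in types, as in the sketches above.
  newtype Sign = Sign { realValue :: Maybe Rational }

  type PartialVoice   = [Sign]
  type PartialMeasure = [PartialVoice]

  -- One monomorphic function per level instead of one overloaded dur:
  durSign :: Sign -> Maybe Rational
  durSign = realValue

  durPartialVoice :: PartialVoice -> Maybe Rational
  durPartialVoice = fmap sum . traverse durSign

  durPartialMeasure :: PartialMeasure -> Maybe Rational
  durPartialMeasure []     = Nothing
  durPartialMeasure (pv:_) = durPartialVoice pv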

On 24.09.2015, Mario Lang wrote:
TL;DR: My implementation looks elegant (to me), but is rather slow compared to a well-optimized C++ implementation of the same algorithm. The runtime differences are currently huge. I am looking for input on what I could change to match the C++ performance without losing too much of the code's expressiveness.
[...]
Profiling tells me I am spending 90% of the time in 'dur', which is my small helper method to calculate the duration of a PartialVoice, PartialMeasure or Voice.

The project is set up to support profiling:

  git clone https://github.com/mlang/hbmc
  cd hbmc
  cabal run profiling
Some performance tips:

AFAICT, the main reason the dur methods are so slow is that they traverse the lists much too often. An easy way to reduce that number is to cache the duration of the lists of Sign values in PartialVoice by replacing the type alias with a new data type like this:

  data PartialVoice = PV (Maybe Rational) [Sign]

To always initialize the duration when constructing a PartialVoice, a helper function is useful:

  mkPV signs = PV (sumDuration signs) signs

Here sumDuration is defined in the same way as the current dur method for PartialVoice:

  sumDuration :: Duration a => [a] -> Maybe Rational
  sumDuration = foldl' (liftA2 (+)) (pure 0) . map dur

The dur method for PartialVoice can then be implemented by simply accessing the cached value:

  instance Duration PartialVoice where
    dur (PV d _) = d

Since that parameter of PV is lazy, the actual sum is only computed when the value is needed, and for any given PV value it is computed at most once. On my system, that change alone improved the running time by a factor of almost 10 according to the profiler output. The type Voice can obviously be treated in a similar way.

Some other changes that can improve the speed:

Use Integer instead of Rational. This should be possible since all duration values appear to be integer multiples of 1/128, so you can just represent them by that factor and use a newtype wrapper for increased type safety.

Parameterize Sign with the type of the duration. AFAICT the only reason to wrap the duration in a Maybe is that you cannot assign a duration during parsing, so the parsers always assign Nothing as the duration. The pvs function will always assign a Just value to each Sign, however. So after parsing, the Measure has Nothing for the duration in every Sign, but the Measures returned by ms always have Just values. You could express this in the types by parameterizing Sign and the other types with the type of the duration. The parser would then create e.g. Sign () values, and pvs would turn those into e.g. Sign Rational values. This improves type safety, makes the meaning clearer and simplifies the code because you don't need to lift operations into the Maybe or do case analyses.

Some stylistic things:

  instance Duration Measure where
    dur m = case m of [] -> Nothing; otherwise -> dur (head m)

would better be written like this:

  instance Duration Measure where
    dur []    = Nothing
    dur (x:_) = dur x

I.e. try not to use head; use pattern matching instead, particularly in cases like this, where you already pattern match on the list in question. This applies to some other functions as well, e.g. allEqDur.

Also, if you want to ignore parts of a pattern match, use "_" as the pattern, not "otherwise". In the way you used it, it introduces a new binding in that branch of the case expression and shadows the definition from the Prelude. Idiomatic use of "otherwise" is as the condition on the catch-all case of a guard. If you compile your code with GHC's -Wall option it warns about this and other things.

Bernhard
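A minimal illustration of the 'otherwise' pitfall Bernhard mentions at the end (toy functions, nothing to do with the music code):

  -- Here 'otherwise' is just an ordinary variable pattern: it matches anything
  -- and shadows Prelude.otherwise.  -Wall warns about the shadowing.
  describeBad :: [Int] -> String
  describeBad xs = case xs of
    []        -> "empty"
    otherwise -> "non-empty"

  -- The intended version uses a wildcard:
  describeGood :: [Int] -> String
  describeGood xs = case xs of
    [] -> "empty"
    _  -> "non-empty"

  -- Idiomatic 'otherwise' lives in guards:
  signum' :: Int -> String
  signum' n
    | n < 0     = "negative"
    | n == 0    = "zero"
    | otherwise = "positive"

  main :: IO ()
  main = mapM_ putStrLn [describeBad [], describeGood [1], signum' (-3)]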

Bernhard Herzog
Some performance tips:
AFAICT, the main reason the dur methods are so slow is that they traverse the lists much too often. An easy way to reduce that number is to cache the duration of the lists of Sign values in PartialVoice by replacing the type alias with a new data type like this:
data PartialVoice = PV (Maybe Rational) [Sign]
To always initialize the duration when constructing a PartialVoice, a helper function is useful:
mkPV signs = PV (sumDuration signs) signs
Here sumDuration is defined in the same way as the current dur method for PartialVoice:
sumDuration :: Duration a => [a] -> Maybe Rational
sumDuration = foldl' (liftA2 (+)) (pure 0) . map dur
The dur method for PartialVoice can then be implemented by simply accessing the cached value:
instance Duration PartialVoice where
dur (PV d _) = d
Since that parameter of PV is lazy the actual sum is only computed when the value is needed and for any given PV value it is computed at most once.
On my system, that change alone improved the running time by a factor of almost 10 according to the profiler output.
Confirmed (committed). Thanks a lot! This indeed brought a factor of 10 speedup. I wonder why the attempts to memoize dur by other people did not help at all. After all, this is just another type of memoisation, isn't it?
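One way to see the difference between the two kinds of memoisation (toy code, with Sign standing in for the real type): memoising dur from the outside has to use the whole list as the table key, so every lookup walks and compares the list again, and, as David observed, the arguments are mostly distinct so the table rarely hits. The cached field, by contrast, travels with the value and is reached in O(1):

  import qualified Data.Map as Map

  type Sign = Rational   -- stand-in

  -- External memoisation: keyed on the list itself.
  durMemo :: Map.Map [Sign] Rational -> [Sign] -> (Rational, Map.Map [Sign] Rational)
  durMemo table signs =
    case Map.lookup signs table of
      Just d  -> (d, table)
      Nothing -> let d = sum signs
                 in  (d, Map.insert signs d table)

  -- Caching in the constructor, as in Bernhard's PV: a lazy field computed
  -- at most once per value.
  data PartialVoice = PV (Maybe Rational) [Sign]

  mkPV :: [Sign] -> PartialVoice
  mkPV signs = PV (Just (sum signs)) signs

  durPV :: PartialVoice -> Maybe Rational
  durPV (PV d _) = d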
Some other changes that can improve the speed:
Use Integer instead of Rational.
This should be possible since all duration values appear to be integer multiples of 1/128, so you can just represent them by that factor and use a newtype wrapper for increased type safety.
I will eventually have to deal with musical tuplets, so it is not just 128th values. I see your point, but I am going to delay that optimisation until I know I really need it.
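For reference, the newtype wrapper Bernhard has in mind might look roughly like this, with durations counted in 128ths of a whole note (a sketch only; as Mario says, tuplets do not fit a fixed grid):

  -- Durations as integer multiples of a 128th note.
  newtype Dur = Dur Integer
    deriving (Eq, Ord, Show)

  instance Semigroup Dur where
    Dur a <> Dur b = Dur (a + b)

  instance Monoid Dur where
    mempty = Dur 0

  whole, half, quarter, eighth :: Dur
  whole   = Dur 128
  half    = Dur 64
  quarter = Dur 32
  eighth  = Dur 16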
Parameterize Sign with the type of the duration.
AFAICT the only reason to wrap the duration in a Maybe is that you cannot assign a duration during parsing, so the parsers always assign Nothing as the duration. The pvs function will always assign a Just value to each Sign, however. So after parsing, the Measure has Nothing for the duration in every Sign, but the Measures returned by ms always have Just values. You could express this in the types by parameterizing Sign and the other types with the type of the duration. The parser would then create e.g. Sign () values and pvs would turn those into e.g. Sign Rational values.
That is a good idea I was already thinking about. Actually, what I am trying to figure out is how to handle closely related, but different, data types in Haskell. In C++, I am used to being able to inherit from an existing class, just adding the new fields I need. This works without name clashes, obviously, because two classes can have the same field names without clashing. However, I am missing something like this in Haskell.

What I actually need in the long run is a separate data type for every enhancement step in my post-processing. I currently have AmbiguousSign and Sign. However, I will need more types like this: with every step, I'd like to add yet another field, which is going to be computed by that step. AFAIU the parametrisation trick you mention above works for a single field. How would I go about progressively adding more and more fields to a base data type?
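One common way to get that "progressively richer" effect without inheritance is to keep a single parameterised type and let each pass choose a richer annotation type. A rough sketch with invented field names:

  data Finger = Finger Int
    deriving Show

  -- One Sign type, parameterised by whatever the passes have computed so far.
  data Sign ann = Note { annotation :: ann, pitch :: Int }
    deriving Show

  type Parsed        = Sign ()                 -- straight from the parser
  type WithDuration  = Sign Rational           -- after value disambiguation
  type WithFingering = Sign (Rational, Finger) -- after a later pass

  -- Type-changing record updates let a pass add information without a new type:
  assignDuration :: Rational -> Parsed -> WithDuration
  assignDuration d s = s { annotation = d }

  assignFinger :: Finger -> WithDuration -> WithFingering
  assignFinger f s = s { annotation = (annotation s, f) }

Instead of nesting tuples, the annotation can of course be a small record per stage; the point is that only the annotation type changes from pass to pass.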
This improves type safety, makes the meaning clearer and simplifies the code because you don't need to lift operations into the Maybe or do case analyses.
I agree. Doing away with the Maybe was a good thing.
Also, if you want to ignore parts of a pattern match, use "_" as the pattern, not "otherwise". In the way you used it, it introduces a new binding in that branch of the case expression and shadows the definition from the Prelude. Idiomatic use of "otherwise" is as the condition on the catch-all case of a guard.
Oh, thanks for pointing that out. I didn't realize at all that otherwise doesn't work as expected in "normal" case expressions.

-- CYa, ⡍⠁⠗⠊⠕
participants (6)
- Bernhard Herzog
- Chas Leichner
- Daniel Trstenjak
- David McBride
- Mario Lang
- Ozgur Akgun