Haskell-Cafe
December 2007
This set of measurements was captured by Daniel Fischer on one of his older
machines, running SuSE 8.2, which has a Linux 2.4 kernel!
The benchmarks were run on 2007-12-16 using ghc 6.9.20071124.
Unfortunately, the ghc version is not quite the same as the one I've used for
most measurements (6.9.20071119) so things may be a little different just for
that reason.
The results seemed a bit off at first, but now that I have graphs of all the
runs on all the machines they don't seem strange at all. First of all, the
memory use is about the same as on the other machines. Secondly, the timing
differences for the C getchar/getwchar might be partly due to different
versions of the C library. The remaining differences (a "steeper" profile
than on the core duo and the Athlon64) may be due to different
microarchitectures.
-Peter
Fischer's machine
ghc 6.9.20071124
AMD Duron(tm) processor
1200.089 MHz
TESTKIND=THOROUGH
SUFFIX=
Time (byte counting) std
-------------------- avg dev slack
hs/byte-bs----acc: 1.892 21‰ 0.4 ██▊ |
hs/byte-bs----foldlx: 2.258 3‰ 0.1 ███▎ |
hs/byte-bs----foldrx: 2.933 0‰ 0.1 ████▎ |
hs/byte-bsl---acc: 14.319 45‰ 0.1 ████████████████████▌ |
hs/byte-xxxxx-acc-1: 20.915 17‰ 4.0 █████████████████████████████▉ |
hs/byte-xxxxx-acc-2: 20.691 8‰ 0.1 █████████████████████████████▌ |
hs/byte-xxxxx-foldl: 20.610 5‰ 1.4 █████████████████████████████▍ |
c/byte-getchar: 9.042 0‰ 0.1 ████████████▉ |
c/byte-getchar-u: 1.314 3‰ 0.2 █▉ |
c/byte-4k: 0.419 5‰ 0.5 ▋ |
Memory: Peak
------- KB
hs/byte-bs----acc: 147492 ████████████████████████████████████████ |
hs/byte-bs----foldlx: 147492 ████████████████████████████████████████ |
hs/byte-bs----foldrx: 147488 ████████████████████████████████████████ |
hs/byte-bsl---acc: 2896 ▊ |
hs/byte-xxxxx-acc-1: 1612 ▌ |
hs/byte-xxxxx-acc-2: 1612 ▌ |
hs/byte-xxxxx-foldl: 1612 ▌ |
c/byte-getchar: 384 ▏ |
c/byte-getchar-u: 384 ▏ |
c/byte-4k: 380 ▏ |
Time (space counting) std
--------------------- avg dev slack
hs/space-bs-c8-acc-1: 2.467 1‰ 0.3 ███▌ |
hs/space-bs-c8-foldlx-1: 2.585 2‰ 0.1 ███▊ |
hs/space-bs-c8-foldlx-2: 2.576 2‰ 0.3 ███▋ |
hs/space-bs-c8-foldrx: 2.982 8‰ 2.3 ████▎ |
hs/space-bs-c8-lenfil: 2.599 1‰ 0.2 ███▊ |
hs/space-bslc8-acc-1: 15.228 8‰ 0.1 █████████████████████▊ |
hs/space-bslc8-acc-2: 15.855 38‰ 0.0 ██████████████████████▋ |
hs/space-bslc8-acc-3: 14.980 14‰ 0.0 █████████████████████▍ |
hs/space-bslc8-chunk-1: 2.443 2‰ 0.2 ███▌ |
hs/space-bslc8-chunk-2: 2.449 1‰ 0.3 ███▌ |
hs/space-bslc8-chunk-3: 2.534 3‰ 0.3 ███▋ |
hs/space-bslc8-foldl: 2.938 1‰ 0.2 ████▎ |
hs/space-bslc8-foldlx-1: 2.928 1‰ 0.0 ████▏ |
hs/space-bslc8-foldlx-2: 2.937 2‰ 0.2 ████▎ |
hs/space-bslc8-foldr-1: 4.043 6‰ 0.1 █████▊ |
hs/space-bslc8-foldr-2: 4.007 4‰ 0.1 █████▊ |
hs/space-bslc8-lenfil-1: 3.240 1‰ 0.2 ████▋ |
hs/space-bslc8-lenfil-2: 3.236 1‰ 0.2 ████▋ |
hs/space-bsl---foldlx: 2.821 1‰ 0.1 ████ |
hs/space-xxxxx-acc-1: 21.002 4‰ 0.1 ██████████████████████████████ |
hs/space-xxxxx-acc-2: 21.270 22‰ 4.7 ██████████████████████████████▍ |
hs/space-xxxxx-foldl: 20.934 1‰ 0.1 █████████████████████████████▉ |
hs/space-xxxxx-lenfil: 25.915 3‰ 0.0 █████████████████████████████████████|
c/space-getchar: 9.354 0‰ 0.0 █████████████▍ |
c/space-getchar-u: 1.676 2‰ 0.2 ██▍ |
c/space-4k: 1.293 2‰ 0.5 █▉ |
c/space-megabuf: 1.830 3‰ 0.5 ██▋ |
c/space-getwchar: 14.721 1‰ 0.1 █████████████████████ |
c/space-getwchar-u: 4.814 0‰ 0.0 ██████▉ |
c/space-32k: 1.276 2‰ 0.6 █▉ |
c/space-32k-8: 1.275 2‰ 0.2 █▉ |
Memory: Peak
------- KB
hs/space-bs-c8-acc-1: 147488 ████████████████████████████████████████ |
hs/space-bs-c8-foldlx-1: 147492 ████████████████████████████████████████ |
hs/space-bs-c8-foldlx-2: 147492 ████████████████████████████████████████ |
hs/space-bs-c8-foldrx: 147488 ████████████████████████████████████████ |
hs/space-bs-c8-lenfil: 147492 ████████████████████████████████████████ |
hs/space-bslc8-acc-1: 2896 ▊ |
hs/space-bslc8-acc-2: 2896 ▊ |
hs/space-bslc8-acc-3: 2896 ▊ |
hs/space-bslc8-chunk-1: 65892 █████████████████▉ |
hs/space-bslc8-chunk-2: 65892 █████████████████▉ |
hs/space-bslc8-chunk-3: 76472 ████████████████████▊ |
hs/space-bslc8-foldl: 86772 ███████████████████████▋ |
hs/space-bslc8-foldlx-1: 86772 ███████████████████████▋ |
hs/space-bslc8-foldlx-2: 86772 ███████████████████████▋ |
hs/space-bslc8-foldr-1: 169360 ██████████████████████████████████████████████|
hs/space-bslc8-foldr-2: 169360 ██████████████████████████████████████████████|
hs/space-bslc8-lenfil-1: 110704 ██████████████████████████████▏ |
hs/space-bslc8-lenfil-2: 110704 ██████████████████████████████▏ |
hs/space-bsl---foldlx: 86776 ███████████████████████▋ |
hs/space-xxxxx-acc-1: 1612 ▌ |
hs/space-xxxxx-acc-2: 1612 ▌ |
hs/space-xxxxx-foldl: 1612 ▌ |
hs/space-xxxxx-lenfil: 1588 ▍ |
c/space-getchar: 384 ▏ |
c/space-getchar-u: 384 ▏ |
c/space-4k: 412 ▏ |
c/space-megabuf: 146904 ███████████████████████████████████████▉ |
c/space-getwchar: 440 ▏ |
c/space-getwchar-u: 440 ▏ |
c/space-32k: 436 ▏ |
c/space-32k-8: 436 ▏ |
Some of the scripts warrant a closer look.
'make zipdata'
creates a nice tarball with all the data necessary to recreate a report AND
to merge that report with other reports, possibly with rescaled bar charts.
Very handy.
All the files in the tarball are inside the 'ghc-measurements/' directory, so
there is less risk of things going wrong when unpacking the tarball.
The names of the benchmarks are put in ghc-measurements/progs, mainly to
ensure they end up in the right order when regenerating and merging reports.
tools/genreport.pl [list of benchmarks to put in the report]
It doesn't parse the command-line in any way because life is too short for
command-line parsing. Instead, it is controlled via (too many) environment
variables.
ASCII - set to avoid using UTF-8 for the bar charts and the "per mille" character.
NOSRC - the tool normally creates */*.srctimespace files containing the source
code for each benchmark with bar charts for time/mem appended to the
end. Setting this variable switches that off (necessary when
regenerating and merging reports).
EXCLUDE - disregard some of the benchmarks on the command line. Why is this
necessary? Because it makes regenerating and merging reports easier.
And because I was too lazy to filter the command line in
tools/regenreport.sh and tools/merge.pl.
FINDMAX - used by tools/merge.pl when rescaling. Outputs max time and max
mem to stdout instead of the normal report.
MAX_FILEWIDTH - used by tools/merge.pl to make merged reports look nice
MAX_TIME,
MAX_PEAKMEM - used by tools/merge.pl when rescaling
Note that, strictly speaking, there is a bug in the script(s): they conflate
the width of the time/mem measurements printed as numbers (which you always
want to take into account when merging) with MAX_TIME/MAX_PEAKMEM (which you
only care about when rescaling).
[FIXED now - 2007-12-21]
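The bars in the reports are made from the Unicode block characters U+2588 (a full cell) and U+2589 to U+258F (7/8 down to 1/8 of a cell). The C sketch that follows shows one way to turn a value into such a bar. It is not code from genreport.pl, which is Perl. The names bar_eighths and render_bar are hypothetical. The sketch scales value/max to width cells, rounds to the nearest eighth of a cell, and in ASCII mode falls back to '#' for full cells, dropping the fraction.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* UTF-8 encoding of U+2588 FULL BLOCK. */
static const char FULL[] = "\xE2\x96\x88";

/* UTF-8 for 0/8 .. 7/8 of a cell: "", then U+258F (1/8) up to U+2589 (7/8). */
static const char *const PARTIAL[8] = {
    "",
    "\xE2\x96\x8F", "\xE2\x96\x8E", "\xE2\x96\x8D", "\xE2\x96\x8C",
    "\xE2\x96\x8B", "\xE2\x96\x8A", "\xE2\x96\x89",
};

/* Bar length in eighths of a cell, rounded to nearest, when `max`
 * corresponds to a bar that is `width` cells wide. */
int bar_eighths(double value, double max, int width)
{
    assert(value >= 0.0 && max > 0.0 && width > 0);
    return (int)(value / max * width * 8 + 0.5);
}

/* Writes the bar for `value` into `out` as a NUL-terminated string.
 * With `ascii` set, full cells become '#' and the fraction is dropped.
 * Returns 0 on success, -1 if `out` is too small. */
int render_bar(double value, double max, int width, int ascii,
               char *out, size_t outsz)
{
    int eighths = bar_eighths(value, max, width);
    int full = eighths / 8;
    const char *cell = ascii ? "#" : FULL;
    const char *tail = ascii ? "" : PARTIAL[eighths % 8];
    size_t cellsz = strlen(cell), tailsz = strlen(tail);
    size_t pos = 0;

    if ((size_t)full * cellsz + tailsz + 1 > outsz)
        return -1;
    for (int i = 0; i < full; i++, pos += cellsz)
        memcpy(out + pos, cell, cellsz);
    memcpy(out + pos, tail, tailsz + 1);  /* also copies the terminating NUL */
    return 0;
}
```

As a sanity check, assume a chart 46 cells wide scaled to the largest peak memory in the tables (169360 KB). Under that assumption, this rounding gives 143 eighths for 65892 KB (17 full blocks plus ▉) and 4 eighths for 1612 KB (▌). Both match the bars shown for those rows.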
tools/regenreport.sh
unpacks a measurement tarball into a tmp directory and runs
tools/genreport.pl to generate the report.
Takes care not to disturb the normal files.
tools/merge.pl [tarballs]
Uses tools/regenreport.sh on each tarball in turn to generate a report which
it reads in and stores on a benchmark-by-benchmark basis. At the end, it
synthetically combines all the pieces it cut out of the original report(s)
into a brand-spanking new, merged report.
Even the headers and the platforminfo at the top of each report are cut out
and stored in data structures until they get spit out again at the end.
The reading magic is in the state machine in gather(). It is not as bad as
it looks. Some of the complications arise from marking repeated benchmark
names as ' -- ', which improves the readability of the merged reports
immensely. Other complications arise because not all tarballs contain the
exact same benchmarks! The missing ones get a nice 'n/a' instead of numbers
and a bar. And finally, the benchmarks should be in the right order. That is
trickier than it sounds...
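The ' -- ' and 'n/a' conventions are easy to state in code. This hedged C sketch is not taken from merge.pl, and row_label and format_cell are made-up names. It assumes that the rows for one benchmark are already adjacent, one per tarball. It also assumes that a missing measurement is represented by a NULL pointer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Label for row i of a merged report. Rows for the same benchmark are
 * assumed to be adjacent (one row per tarball); only the first row of
 * each group shows the name, the rest show " -- ". */
const char *row_label(const char *const names[], size_t i)
{
    assert(names != NULL);
    if (i > 0 && strcmp(names[i], names[i - 1]) == 0)
        return " -- ";
    return names[i];
}

/* Formats one measurement with three decimals, or "n/a" when the tarball
 * has no data for this benchmark (represented here by NULL). */
void format_cell(const double *value, char *out, size_t outsz)
{
    assert(out != NULL && outsz > 0);
    if (value == NULL)
        snprintf(out, outsz, "n/a");
    else
        snprintf(out, outsz, "%.3f", *value);
}
```

Getting the rows adjacent and in the right order in the first place is the hard part, as noted above. The sketch leaves that to the caller.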
When rescaling, tools/regenreport.sh is first run once for each tarball with
the FINDMAX environment variable set. This results in tools/regenreport.sh
outputting the maximum filename width, time, and peakmem for each tarball.
ASCII - use ASCII instead of UTF-8
RESCALE - sometimes you want to rescale and sometimes you don't
MAX_FILEWIDTH - if you want to force a specific width
MAX_TIME,
MAX_PEAKMEM - if you want to force a specific max
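Rescaling comes down to taking the element-wise maximum of what each FINDMAX run printed. That way the same bar length means the same time or memory in every section of the merged report. Here is a minimal C sketch, in which a hypothetical struct maxima stands in for the three numbers. Forcing MAX_FILEWIDTH, MAX_TIME, or MAX_PEAKMEM would simply overwrite the corresponding field afterwards.

```c
#include <assert.h>
#include <stddef.h>

/* What one FINDMAX run reports for a tarball (hypothetical layout). */
struct maxima {
    int filewidth;     /* widest benchmark name, in characters */
    double time;       /* largest average time */
    long peakmem_kb;   /* largest peak memory, in KB */
};

/* Element-wise maximum over all tarballs: the common scale for rescaling. */
struct maxima combine_maxima(const struct maxima *m, size_t n)
{
    assert(m != NULL && n > 0);
    struct maxima all = m[0];
    for (size_t i = 1; i < n; i++) {
        if (m[i].filewidth > all.filewidth)
            all.filewidth = m[i].filewidth;
        if (m[i].time > all.time)
            all.time = m[i].time;
        if (m[i].peakmem_kb > all.peakmem_kb)
            all.peakmem_kb = m[i].peakmem_kb;
    }
    return all;
}
```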
-Peter

[4/16] SBM: How to use the Makefile (how to run benchmarks etc.)
by Peter Firefly Brodersen Lund 22 Dec '07
Introduction
------------
Most of the smarts of the benchmark harness are in the Makefile.
If you want to rerun the benchmarks (or a single benchmark), or look at the
intermediate code for a benchmark, the I/O trace, the memory consumption,
the time spent, or ..., then you use the Makefile.
There are some support scripts in shell and Perl (and two C programs) that the
Makefile uses to do its job. And there are some that you, the user, will want
to interact directly with.
The benchmarks are only expected to work on Linux. They have been tested on
SuSE 8.2 (from 2003, with a 2.4 kernel), Ubuntu 7.04, and Ubuntu 7.10.
Quick howto
-----------
make phase1 -- compiles, generates test files, measures memory use.
Safe to run on a busy machine if there's no active memory
pressure.
make phase2 -- timing runs. NOT safe to run on a busy machine.
Should be run in runlevel 1 (= no X, no daemons, single-user
mode) for best measurements.
Outputs a report at the end. This is where you check the quality
of the measurements. If you don't like them, run 'make
redophase2' (or delete the low-quality .time and .stat files
and run 'make phase2' again).
make zipdata -- make a tarball with all the measurements, suitable for
emailing or putting on a website.
The Makefile will beep after phase 1 and 2.
The above will run a "NORMAL" run, which is fine during development if you
want to see if you nailed a performance bug. It runs reasonably fast (about
43 seconds on my Athlon64).
If you want better measurements, you should use:
make TESTKIND=THOROUGH phase1 phase2 zipdata
This will use a 150MB data file instead of a 15MB one, and it will run each
timing measurement 6 times instead of 4 (in both cases the first run is
thrown away).
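Here is a sketch of what tools/stat.pl computes from those runs. This C version illustrates the idea under assumptions and is not a port of the Perl script:

- "avg" is taken to be the mean of user+sys.
- The std. dev. is the sample standard deviation, expressed in per mille of the average (presumably what the ‰ column in the reports shows).
- The slack is the mean of real - (user+sys).

It needs at least two kept runs, so it does not cover SMOKETEST mode, where the single run is not skipped. The names struct run, struct summary, and summarize are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* One run as printed by bash's `time`, in seconds. */
struct run { double real, user, sys; };

struct summary {
    double avg;     /* mean of user+sys (assumption) */
    double dev_pm;  /* sample std. dev. in per mille of avg */
    double slack;   /* mean of real - (user+sys) */
};

/* Square root by Newton's method, so the sketch needs no -lm. Starting
 * at or above the root, the iterates decrease until they stop changing. */
static double newton_sqrt(double x)
{
    double g = x > 1.0 ? x : 1.0;
    if (x <= 0.0)
        return 0.0;
    for (;;) {
        double next = 0.5 * (g + x / g);
        if (next >= g)
            return g;
        g = next;
    }
}

/* Summarizes n runs, optionally discarding the first (sacrificial) one. */
struct summary summarize(const struct run *r, size_t n, int skip_first)
{
    size_t start = skip_first ? 1 : 0;
    size_t k;
    double sum = 0.0, slack = 0.0, ss = 0.0, avg;
    struct summary s;

    assert(r != NULL && n >= start + 2);  /* two kept runs for a std. dev. */
    k = n - start;
    for (size_t i = start; i < n; i++) {
        double cpu = r[i].user + r[i].sys;
        sum += cpu;
        slack += r[i].real - cpu;
    }
    avg = sum / (double)k;
    for (size_t i = start; i < n; i++) {
        double d = r[i].user + r[i].sys - avg;
        ss += d * d;
    }
    s.avg = avg;
    s.dev_pm = newton_sqrt(ss / (double)(k - 1)) / avg * 1000.0;
    s.slack = slack / (double)k;
    return s;
}
```

Skipping the first run keeps a cold-cache outlier out of both the average and the deviation.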
If you don't want to use single-user mode, you can improve the measurements
by piping the output to a file (or running the test from the console) instead
of involving a terminal and an X server: the screen update may kick in in the
middle of a timing run and disturb things, if for no other reason than by
polluting the CPU caches.
Filesystem layout
-----------------
The benchmarks are in:
hs/*.hs
c/*.c
hand/*.s
hand/*.hs and hand/*.c are not compiled. The two *.hs files are the
originals from which the tweaked assembly code has been derived. The two
*.c files are sketches of how the MMX tweaks work (because MMX code by itself
can be a bit off-putting).
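The hand/*.c sketches are not reproduced here. To give a rough idea of the general trick of handling several bytes per instruction, here is a portable SWAR version in C. It counts spaces eight bytes at a time in a 64-bit integer. This is a swapped-in technique: it is neither the MMX code nor the hand-tweaked assembly. The name count_spaces is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Counts ' ' bytes in buf[0..len), eight at a time (SWAR). */
size_t count_spaces(const unsigned char *buf, size_t len)
{
    const uint64_t ones  = 0x0101010101010101ULL;
    const uint64_t low7  = 0x7F7F7F7F7F7F7F7FULL;
    const uint64_t high  = 0x8080808080808080ULL;
    const uint64_t space = 0x20 * ones;    /* ' ' in every byte */
    size_t count = 0, i = 0;

    assert(buf != NULL || len == 0);
    for (; i + 8 <= len; i += 8) {
        uint64_t w;
        memcpy(&w, buf + i, 8);            /* alignment-safe 8-byte load */
        uint64_t x = w ^ space;            /* bytes equal to ' ' become 0 */
        /* The high bit of each byte is set iff that byte of x is nonzero;
         * masking with low7 first keeps the additions inside each byte. */
        uint64_t nz = (((x & low7) + low7) | x) & high;
        /* Move the flags to bit 0 of each byte; the multiply sums them
         * into the top byte. */
        size_t nonzero = (size_t)(((nz >> 7) * ones) >> 56);
        count += 8 - nonzero;
    }
    for (; i < len; i++)                   /* leftover tail bytes */
        count += buf[i] == ' ';
    return count;
}
```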
These are the support scripts:
tools/genfiles.pl -- generate the test input files.
tools/cutmem.pl
tools/cutpid.pl -- both are used to disentangle the outputs of strace
and pause-at-end (see below). I combine strace,
memory info, and +RTS -sstderr into a single run to
save time. This means that things end up in fewer
files than I'd like.
tools/cut.pl -- cut out main loop from disassembly ('make discut')
tools/stat.pl -- looks at all timings for a single benchmark and
calculates average and standard deviation and "time
slack", that is the discrepancy between user+sys and
real. It optionally throws away the first run.
tools/eatmem.c -- allocates a chunk of memory and makes damn sure
it really is in RAM!
tools/pause-at-end.c -- part of a hack that copies /proc/self/maps and
/proc/self/status to stderr just before a benchmark
exits.
tools/iosummary.pl -- takes an strace and sums up the I/O
tools/genreport.pl -- generate a nice report with bar charts.
Takes way too many options in the form of
environment variables.
tools/regenreport.sh -- regenerates the report from ANY measurement tarball.
tools/merge.pl -- merge data from many measurement tarballs, with or
without rescaling.
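Before each timing run, the Makefile calls 'tools/eatmem $(NEEDFREE)', where NEEDFREE is roughly 2.2 times the test file size in KB. Grabbing and touching that much memory presumably forces the kernel to make it available, by swapping out or dropping other data, before the benchmark starts. The memory is handed back when eatmem exits. Below is a minimal C sketch of the touching part. It assumes that writing one byte per page is enough to make the kernel back every page with real memory. eat_memory is a hypothetical name, not the actual tools/eatmem.c.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Allocates `kb` kilobytes and writes one byte in every page, so the kernel
 * has to back the whole block with real memory. Returns the block (the
 * caller frees it) or NULL on failure. */
unsigned char *eat_memory(size_t kb)
{
    long page = sysconf(_SC_PAGESIZE);
    size_t size, off;
    unsigned char *p;

    assert(page > 0);  /* sysconf(_SC_PAGESIZE) does not fail on Linux */
    if (kb == 0 || kb > SIZE_MAX / 1024)
        return NULL;
    size = kb * 1024;
    p = malloc(size);
    if (p == NULL)
        return NULL;
    /* Volatile stores so the compiler cannot drop the "useless" writes. */
    for (off = 0; off < size; off += (size_t)page)
        ((volatile unsigned char *)p)[off] = 1;
    return p;
}
```

The point of touching the pages is that malloc alone usually only reserves address space. Pages are not actually backed by RAM until they are written.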
Generated files:
hs/*.core hs/*.stg hs/*.cmm hs/*.s -- intermediate code
hs/*.hi -- "Haskell Interface"
*/*.o -- object code
*/* (the files in $(HSPROGS) $(CPROGS) $(HANDPROGS)) -- programs
*/*.dis */*.discut -- disassembled programs (and inner loops)
*/*.doc -- source + intermediate code + inner loops + timings
*/*.mem -- output from '+RTS -sstderr' + /proc/self/status +
/proc/self/maps + output from /usr/bin/time (where
the number of minor page faults is the most interesting
datum)
*/*.strace -- complete strace, taken together with */*.mem
*/*.iotrace -- only I/O operations from the strace (read/write/
select)
*/*.iosum -- summary of I/O operations
*/*.time -- time measurements
*/*.stat -- average + std.dev. + "time slack"
*/*.srctimespace -- source code + time/mem barchart (in ASCII)
sysinfo -- description of the platform (uname, ghc, gcc, etc)
platforminfo -- short description of the platform
report.txt [8-16K]
docs [1MB] -- sysinfo + all */*.doc concatenated
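To give an idea of what ends up in */*.iosum, here is a hedged C sketch of an iosummary-style pass over */*.iotrace. The Makefile produces that file by grepping the read, write, and select lines out of the strace. The sketch assumes the usual strace layout, with the return value after the last " = ". It counts calls and sums the bytes of successful reads and writes. struct iosum and iosum_line are hypothetical names, and the real tools/iosummary.pl may summarize differently.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Running totals (hypothetical layout). */
struct iosum {
    long reads, read_bytes;
    long writes, write_bytes;
    long selects;
};

/* Adds one *.iotrace line such as  read(0, "..."..., 32768) = 32768
 * to the totals. The number after the last " = " is the return value;
 * failed calls (negative results) are counted but add no bytes. */
void iosum_line(struct iosum *s, const char *line)
{
    const char *ret = NULL;
    long n;

    assert(s != NULL && line != NULL);
    for (const char *p = strstr(line, " = "); p != NULL; p = strstr(p + 1, " = "))
        ret = p + 3;
    n = ret != NULL ? strtol(ret, NULL, 10) : -1;

    if (strncmp(line, "read(", 5) == 0) {
        s->reads++;
        if (n > 0)
            s->read_bytes += n;
    } else if (strncmp(line, "write(", 6) == 0) {
        s->writes++;
        if (n > 0)
            s->write_bytes += n;
    } else if (strncmp(line, "select(", 7) == 0) {
        s->selects++;
    }
}
```

Dividing read_bytes by reads then gives the average read size, which shows what buffer size a program effectively uses.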
Makefile targets
----------------
This is taken from 'make help':
phase1 -- preparation + measurements that can run in background
phase2 -- measurements that should run on unloaded machine
redophase2 -- rerun phase2
doc, [ASCII=1] report, lastreport - reports
zipdata -- zip up measurements (to ghc-measurements.tar.gz)
prog,core,stg,cmm,asm,dis,discut
-- compile, compile to core/stg/cmm/asm, disassemble, cut out main loop
time,stat,mem,strace,iotrace,iosum,cache
-- measure run-time, GHC heap + OS mem, syscalls, I/O patterns, cache
cleartime, clean, distclean -- delete measurements etc
TESTKIND=(SMOKETEST,NORMAL,THOROUGH), defaults to NORMAL
STRACE=OLD, defaults to NEW
Necessary tools
---------------
Perl, sed, /usr/bin/time, bash (doesn't have to be the default shell as long as
it is in PATH), strace, GNU Make, objdump (from the binutils package), gcc.
Other things that could come in handy:
A console and/or terminal that understands UTF-8.
A less that understands UTF-8.
An editor that understands UTF-8.
A %!&# printing program that understands both UTF-8 and fonts. A2ps
doesn't do UTF-8. Uniprint used 1) a proportional font which 2) didn't
even have all the fractional-width blocks. U2ps used a font with none of the
block characters. I ended up resorting to gedit's print function :(
That's enough for this email.
-Peter
This is the entire Makefile. It perhaps ought to be sent as an attachment
but my hacky mailer script wouldn't like it.
A few of the lines are wider than 80 columns, unfortunately.
-Peter
# GHC benchmarks of parsing (bytestring, basic code generation, I/O).
#
# Copyright 2007 Peter Lund <firefly(a)vax64.dk>, licensed under GPLv2.
#
#
# You will need the following tools:
# perl, strace, /usr/bin/time, bash, a gcc that uses shared libraries and libc
# (not dietlibc, klibc, uclibc), objdump (usually found in the binutils
# package).
#
# A benchmark run on a new platform can be split into two phases:
#
# phase1: Compiles, dices, and slices the code in various ways. If this
# completes you can be pretty sure that everything works all right.
# Performs non-timing sensitive measurements.
#
# phase2: Performs timing sensitive measurements. It is a good idea to run
# this phase on an idle machine, preferably without using X.
# For example, you can log out of X and run "telinit 1" to get to
# single-user mode. If you do so, please remember to set the correct
# path to your ghc compiler.
# Make a few bits of the makefile less noisy.
Q:=@
#########################################################
# Ask ghc to optimize and warn
GHCFLAGS= -O2 -W
# Some newer versions of gcc prefer -Wextra -Wall
GCCWARNFLAGS=-W -Wall
# Default compilers
CC=gcc
GHC=ghc
GHCPKG=ghc-pkg
#########################################################
HSPROGS=hs/byte-bs----acc \
hs/byte-bs----foldlx \
hs/byte-bs----foldrx \
hs/byte-bsl---acc \
hs/byte-xxxxx-acc-1 \
hs/byte-xxxxx-acc-2 \
hs/byte-xxxxx-foldl \
\
hs/space-bs-c8-acc-1 \
hs/space-bs-c8-count \
hs/space-bs-c8-foldlx-1 \
hs/space-bs-c8-foldlx-2 \
hs/space-bs-c8-foldrx \
hs/space-bs-c8-lenfil \
hs/space-bslc8-acc-1 \
hs/space-bslc8-acc-2 \
hs/space-bslc8-acc-3 \
hs/space-bslc8-chunk-1 \
hs/space-bslc8-chunk-2 \
hs/space-bslc8-chunk-3 \
hs/space-bslc8-chunk-4 \
hs/space-bslc8-count \
hs/space-bslc8-foldl \
hs/space-bslc8-foldlx-1 \
hs/space-bslc8-foldlx-2 \
hs/space-bslc8-foldr-1 \
hs/space-bslc8-foldr-2 \
hs/space-bslc8-lenfil-1 \
hs/space-bslc8-lenfil-2 \
hs/space-bsl---foldlx \
hs/space-xxxxx-acc-1 \
hs/space-xxxxx-acc-2 \
hs/space-xxxxx-foldl \
hs/space-xxxxx-lenfil
# RMPROGS keeps track of programs that are not always included in the tests.
# We do want 'make clean' to delete them even when they are not currently
# part of the build (they may be left over from a previous build).
# stack overflow with long4.
#HSPROGS:=$(HSPROGS) hs/byte-xxxxx-foldr-1
RMPROGS:=$(RMPROGS) hs/byte-xxxxx-foldr-1
# stack overflow with long4.
#HSPROGS:=$(HSPROGS) hs/byte-xxxxx-foldr-2
RMPROGS:=$(RMPROGS) hs/byte-xxxxx-foldr-2
# stack overflow with long4.
#HSPROGS:=$(HSPROGS) hs/space-xxxxx-foldr-1
RMPROGS:=$(RMPROGS) hs/space-xxxxx-foldr-1
# stack overflow with long4.
#HSPROGS:=$(HSPROGS) hs/space-xxxxx-foldr-2
RMPROGS:=$(RMPROGS) hs/space-xxxxx-foldr-2
HANDPROGS= hand/byte-bs----acc-a \
hand/byte-bs----acc-b \
hand/byte-bs----acc-c \
hand/byte-bs----acc-d \
\
hand/space-bs-c8-acc-1-a \
hand/space-bs-c8-acc-1-b \
hand/space-bs-c8-acc-1-c \
hand/space-bs-c8-acc-1-d \
hand/space-bs-c8-acc-1-e \
hand/space-bs-c8-acc-1-f \
hand/space-bs-c8-acc-1-g \
hand/space-bs-c8-acc-1-h \
hand/space-bs-c8-acc-1-i \
hand/space-bs-c8-acc-1-j \
hand/space-bs-c8-acc-1-k \
hand/space-bs-c8-acc-1-l \
hand/space-bs-c8-acc-1-m \
hand/space-bs-c8-acc-1-n \
hand/space-bs-c8-acc-1-o \
hand/space-bs-c8-acc-1-p \
hand/space-bs-c8-acc-1-q \
hand/space-bs-c8-acc-1-r \
hand/space-bs-c8-acc-1-s
RMPROGS:=$(RMPROGS) $(HANDPROGS)
ifeq ($(shell $(GHCPKG) list | grep bytestring),)
# ghc 6.6.1 with an old version of bytestring in 'base' but without its own
# module name
HSPROGS:=$(shell printf "%s\n" $(HSPROGS) | grep -v '.*-chunk-.*')
endif
HANDTEXT:=including hand-tweaked assembly
ifeq ($(shell $(GHC) --version | grep 6.9.20071119),)
HANDPROGS:=
HANDTEXT:=no hand-tweaked assembly
endif
ifneq ($(SUFFIX),)
HANDPROGS:=
HANDTEXT:=no hand-tweaked assembly
endif
CPROGS= c/byte-getchar c/byte-getchar-u c/byte-4k \
\
c/space-getchar c/space-getchar-u c/space-4k \
c/space-megabuf c/space-getwchar c/space-getwchar-u \
c/space-32k c/space-32k-8
#########################################################
# The benchmarks can be run in three modes. The default can be overridden from
# command line:
#
# make TESTKIND=SMOKETEST phase1
#
# just tests the test suite, as fast as possible
#TESTKINDDEFAULT=SMOKETEST
# small test
TESTKINDDEFAULT=NORMAL
# very thorough test
#TESTKINDDEFAULT=THOROUGH
TESTKIND=$(TESTKINDDEFAULT)
ifeq ($(TESTKIND),THOROUGH)
TESTFILE= testfiles/long4
TESTFILECACHE= testfiles/long3
else
ifeq ($(TESTKIND),NORMAL)
TESTFILE= testfiles/long3
TESTFILECACHE= testfiles/long2
else
ifeq ($(TESTKIND),SMOKETEST)
TESTFILE= testfiles/long2
TESTFILECACHE= testfiles/long1
endif
endif
endif
# Older versions of strace don't support the -E parameter which we use to
# set LD_PRELOAD before running the straced command (so we can get by with
# a single run of each benchmark in phase 1 instead of 2).
#
# Override with STRACE=OLD on the command line if you need to work with an
# old strace.
STRACE=NEW
#########################################################
.PHONY: XXXXFIRST testfiles core stg cmm asm dis discut prog \
time stat strace iotrace iosum mem cache doc phase1 phase2 redophase2
XXXXFIRST: help
testfiles: testfiles/long1 testfiles/long2 testfiles/long3 testfiles/long4
core: $(addsuffix .core ,$(HSPROGS) )
stg: $(addsuffix .stg ,$(HSPROGS) )
cmm: $(addsuffix .cmm ,$(HSPROGS) )
asm: $(addsuffix .s ,$(HSPROGS) $(CPROGS))
dis: $(addsuffix .dis ,$(HSPROGS) $(HANDPROGS) $(CPROGS))
discut: $(addsuffix .discut ,$(HSPROGS) $(HANDPROGS) $(CPROGS))
prog: $(HSPROGS) $(HANDPROGS) $(CPROGS)
time: prog \
testfiles \
$(addsuffix .time ,$(HSPROGS) $(HANDPROGS) $(CPROGS))
stat: $(addsuffix .stat ,$(HSPROGS) $(HANDPROGS) $(CPROGS))
strace: $(addsuffix .strace ,$(HSPROGS) $(HANDPROGS) $(CPROGS))
iotrace:$(addsuffix .iotrace,$(HSPROGS) $(HANDPROGS) $(CPROGS))
iosum: $(addsuffix .iosum ,$(HSPROGS) $(HANDPROGS) $(CPROGS))
mem: $(addsuffix .mem ,$(HSPROGS) $(HANDPROGS) $(CPROGS))
cache: $(addsuffix .cache ,$(HSPROGS) $(HANDPROGS) $(CPROGS))
doc: $(addsuffix .doc ,$(HSPROGS) $(HANDPROGS) $(CPROGS))
###
phase1: testfiles tools/eatmem prog iosum mem
printf "Done!\07" # beep
phase2: time report
printf "Done!\07" # beep
redophase2: cleartime
rm -f */*.srctimespace report.txt
$(MAKE) phase2
#########################################################
testfiles/long1:
mkdir -p testfiles
tools/genfiles.pl 10000 > "$@"
testfiles/long2:
mkdir -p testfiles
tools/genfiles.pl 100000 > "$@"
testfiles/long3:
mkdir -p testfiles
tools/genfiles.pl 1000000 > "$@"
testfiles/long4:
mkdir -p testfiles
tools/genfiles.pl 10000000 > "$@"
#########################################################
tools/eatmem: tools/eatmem.c
$(CC) $(GCCWARNFLAGS) -O2 "$<" -o "$@"
tools/pause-at-end.so: tools/pause-at-end.c
$(CC) $(GCCWARNFLAGS) -shared -ldl "$<" -o "$@"
#########################################################
docs: core stg cmm asm discut time iotrace doc sysinfo
rm -f docs
cat */*.doc sysinfo > docs
hs/%.doc: hs/%.core hs/%.stg hs/%.cmm hs/%.s hs/%.discut hs/%.time
(export F="$(basename $@)" ; \
printf "\n" ; \
printf "*********************************************\n" ; \
printf "****\n" ; \
printf "**** %s:\n" "$$F" ; \
printf "****\n" ; \
printf "*********************************************\n\n" ; \
printf "Haskell code:\n\n" ; \
cat "$$F.hs" ; \
printf -- "-------------------------------\n" ; \
printf "%s:\n" "$$F.core" ; \
cat "$$F.core" ; \
printf -- "-------------------------------\n" ; \
printf "%s:\n" "$$F.stg" ; \
cat "$$F.stg" ; \
printf -- "-------------------------------\n" ; \
printf "%s:\n" "$$F.cmm" ; \
cat "$$F.cmm" ; \
printf -- "-------------------------------\n" ; \
printf "%s:\n" "$$F.s" ; \
cat "$$F.s" ; \
printf -- "-------------------------------\n" ; \
printf "%s:\n" "$$F.discut" ; \
cat "$$F.discut" ; \
printf -- "-------------------------------\n" ; \
printf "%s:\n" "$$F.time" ; \
cat "$$F.time" ; \
printf -- "-------------------------------\n" ; \
printf "\014" ; \
) >> "$@"
hand/%.doc: hand/%.discut hand/%.time
(export F="$(basename $@)" ; \
printf "\n" ; \
printf "*********************************************\n" ; \
printf "****\n" ; \
printf "**** %s:\n" "$$F" ; \
printf "****\n" ; \
printf "*********************************************\n\n" ; \
printf "Haskell code:\n\n" ; \
export X=`echo $$F | sed -e 's/[a-z]$$//'` ; \
cat "$$X.hs" ; \
printf -- "-------------------------------\n" ; \
printf "%s (hand tweaked):\n" "$$F.s" ; \
cat "$$F.s" ; \
printf -- "-------------------------------\n" ; \
printf "%s:\n" "$$F.discut" ; \
cat "$$F.discut" ; \
printf -- "-------------------------------\n" ; \
printf "%s:\n" "$$F.time" ; \
cat "$$F.time" ; \
printf -- "-------------------------------\n" ; \
printf "\014" ; \
) >> "$@"
c/%.doc: c/%.s c/%.discut c/%.time
(export F="$(basename $@)" ; \
printf "\n" ; \
printf "*********************************************\n" ; \
printf "****\n" ; \
printf "**** %s:\n" "$$F" ; \
printf "****\n" ; \
printf "*********************************************\n\n" ; \
printf "C code:\n\n" ; \
cat "$$F.c" ; \
printf -- "-------------------------------\n" ; \
printf "%s:\n" "$$F.s" ; \
cat "$$F.s" ; \
printf -- "-------------------------------\n" ; \
printf "%s:\n" "$$F.discut" ; \
cat "$$F.discut" ; \
printf -- "-------------------------------\n" ; \
printf "%s:\n" "$$F.time" ; \
cat "$$F.time" ; \
printf -- "-------------------------------\n" ; \
printf "\014" ; \
) >> "$@"
#########################################################
.PHONY: report.txt lastreport
report: report.txt
$(Q)cat report.txt
report.txt \
$(addsuffix .srctimespace,$(HSPROGS) $(HANDPROGS) $(CPROGS)): \
iosum mem time stat platforminfo
$(Q)tools/genreport.pl $(HSPROGS) $(HANDPROGS) $(CPROGS) > report.txt
lastreport:
$(Q)cat report.txt
#########################################################
# Probably all or most of the targets (left-hand sides) in these rules should
# be mentioned in a .SECONDARY rule so make won't delete them behind our backs,
# in its infinite wisdom. This is sometimes necessary when using pattern
# rules (i.e. rules with '%' wildcards in them).
#
# For some reason, it doesn't seem to be all that necessary, although I had to
# insert a couple of those .SECONDARY things earlier to make make behave. For
# example, I had to insert this rule at some point but now things keep working
# even when it's commented out:
#
# .SECONDARY: $(HSPROGS) $(HANDPROGS) $(CPROGS)
#
hs/%: hs/%.hs
$(GHC) $(GHCFLAGS) --make -fforce-recomp "$<" -o "$@"
hand/%: hand/%.s
$(GHC) -no-hs-main "$<" -o "$@" -package bytestring
c/%: c/%.c
$(CC) $(GCCWARNFLAGS) -O2 "$<" -o "$@"
###
%.dis: %
@# Limit the disassembly for speed reasons (10x+ difference) and
@# file size reasons (20x-30x difference).
@# The stuff we are interested in comes early in the .text segment so
@# there's no reason to disassemble the entire runtime system, which
@# comes afterwards in case of hand/ and hs/ binaries.
objdump -M intel -D --stop-address=0x08060000 "$<" > "$@"
###
%.discut: %.dis
tools/cut.pl < "$<" > "$@"
###
%.core: %.hs
$(GHC) $(GHCFLAGS) -c -ddump-simpl "$<" > "$@"
###
%.stg: %.hs
$(GHC) $(GHCFLAGS) -c -ddump-stg "$<" > "$@"
###
%.cmm: %.hs
$(GHC) $(GHCFLAGS) -c -ddump-cmm "$<" > "$@"
###
%.s: %.hs
$(GHC) $(GHCFLAGS) -c -fforce-recomp -keep-s-files "$<"
.SECONDARY: $(addsuffix .s,$(CPROGS))
%.s: %.c
$(CC) $(GCCWARNFLAGS) -O2 -S $< -o $@
#########################################################
# The first run is sacrificial, except when smoketesting, where there is only
# one run.
ifeq ($(TESTKIND),THOROUGH)
TIME= bash -c "time $<" < $(TESTFILE); \
bash -c "time $<" < $(TESTFILE); \
bash -c "time $<" < $(TESTFILE); \
bash -c "time $<" < $(TESTFILE); \
bash -c "time $<" < $(TESTFILE); \
bash -c "time $<" < $(TESTFILE)
NOSKIP:=
else
ifeq ($(TESTKIND),NORMAL)
TIME= bash -c "time $<" < $(TESTFILE); \
bash -c "time $<" < $(TESTFILE); \
bash -c "time $<" < $(TESTFILE); \
bash -c "time $<" < $(TESTFILE)
NOSKIP:=
else
ifeq ($(TESTKIND),SMOKETEST)
TIME= bash -c "time $<" < $(TESTFILE)
NOSKIP:= NOSKIP=1
endif
endif
endif
# In order to reduce the risk of swapping during the time test, we try to make
# sure there's twice the test file size free (and a bit).
# NEEDFREE is measured in kilobytes.
NEEDFREE=$(shell expr 22 '*' `ls -s $(TESTFILE) | cut -f1 -d' '` / 10)
%.time: % tools/eatmem $(TESTFILE)
printf "%s\n" "$<" > "$@"
tools/eatmem $(NEEDFREE)
dd if=$(TESTFILE) of=/dev/null
dd if="$<" of=/dev/null
($(TIME)) >>"$@" 2>&1
printf "%s\n\n" "-----" >> "$@"
%.stat: %.time
$(NOSKIP) tools/stat.pl < "$<" > "$@"
%.mem %.strace: % tools/pause-at-end.so $(TESTFILE)
ifeq ($(STRACE),OLD)
strace -o tmp.strace -f \
/usr/bin/time "$<" +RTS -sstderr < $(TESTFILE) > $(basename $@).mem 2>&1
LD_PRELOAD=tools/pause-at-end.so \
"$<" +RTS -sstderr < $(TESTFILE) >> $(basename $@).mem 2>&1
else
strace -o tmp.strace -ELD_PRELOAD=tools/pause-at-end.so -f \
/usr/bin/time "$<" +RTS -sstderr < $(TESTFILE) > $(basename $@).mem 2>&1
endif
tools/cutmem.pl < $(basename $@).mem > tmp
mv tmp $(basename $@).mem
tools/cutpid.pl < tmp.strace > $(basename $@).strace
rm -f tmp.strace
#%.strace: %.mem
# @echo > /dev/null
%.iotrace: %.strace
grep '^\(read\|write\|select\)' "$<" > "$@"
%.iosum: %.iotrace
tools/iosummary.pl < "$<" > "$@"
%.cache: % $(TESTFILECACHE)
valgrind --tool=cachegrind "$<" < $(TESTFILECACHE) 2> "$@"
#########################################################
.PHONY: zipdata help cleartime clean distclean
sysinfo:
hostname > sysinfo
cat /etc/*release >> sysinfo
@echo >> sysinfo
uname -a >> sysinfo
@echo >> sysinfo
cat /proc/cpuinfo >> sysinfo
$(GHC) --version >> sysinfo
echo >> sysinfo
$(CC) --version >> sysinfo
# This variable makes testing with weird /proc/cpuinfo files easier
CPUINFO=/proc/cpuinfo
platforminfo:
	hostname > platforminfo
	(printf 'ghc '; ($(GHC) --version | sed -ne 's/^The.*version //p')) >> platforminfo
	cat $(CPUINFO) | sed -ne '/model name.*:/ { s/model name.*: //p; q}' >> platforminfo
	printf "%s MHz\n" `cat $(CPUINFO) | sed -ne '/cpu MHz.*:/ { s/cpu MHz.*: //p; q}'` >> platforminfo
	printf "TESTKIND=$(TESTKIND)\n" >> platforminfo
	printf "SUFFIX=$(SUFFIX)\n" >> platforminfo
zipdata: time stat mem strace iotrace iosum sysinfo platforminfo report.txt
	rm -f ghc-measurements.tar.gz
	rm -rf ghc-measurements
	mkdir -p ghc-measurements
	cp --parents \
		$(addprefix */*, .time .stat .mem .iosum) sysinfo report.txt platforminfo \
		ghc-measurements
	printf "%s " "$(HSPROGS)" "$(HANDPROGS)" "$(CPROGS)" > ghc-measurements/progs
	tar -zcf ghc-measurements.tar.gz ghc-measurements
	rm -rf ghc-measurements
help:
	@echo 'Measurements of very simple string I/O and parsing.'
	@printf ' (%d benchmarks, %s)\n' `echo $(HSPROGS) $(HANDPROGS) $(CPROGS) | wc -w` "$(HANDTEXT)"
	@echo ''
	@echo ' phase1 -- preparation + measurements that can run in the background'
	@echo ' phase2 -- measurements that should run on an unloaded machine'
	@echo ' redophase2 -- rerun phase2'
	@echo ''
	@echo ' doc, [ASCII=1] report, lastreport -- reports'
	@echo ' zipdata -- zip up measurements (to ghc-measurements.tar.gz)'
	@echo ''
	@echo ' prog,core,stg,cmm,asm,dis,discut'
	@echo ' -- compile, compile to core/stg/cmm/asm, disassemble, cut out main loop'
	@echo ' time,stat,mem,strace,iotrace,iosum,cache'
	@echo ' -- measure run-time, GHC heap + OS mem, syscalls, I/O patterns, cache'
	@echo ''
	@echo ' cleartime, clean, distclean -- delete measurements etc'
	@echo ''
	@echo ' TESTKIND=(SMOKETEST,NORMAL,THOROUGH), defaults to $(TESTKINDDEFAULT)'
	@echo ' STRACE=OLD, defaults to NEW'
cleartime:
	rm -f */*.time
clean:
# keep and hand/*.s !
	rm -rf */*.hi */*.o *.o \
		*/*.core */*.stg */*.cmm hs/*.s c/*.s */*.dis */*.discut \
		*/*.hcr \
		*/*.time */*.stat \
		*/*.real \
		*/*.strace */*.iotrace */*.iosum \
		*/*.mem */*.cache cachegrind.out.* \
		*/*.doc */*.srctimespace \
		tmp.strace tmp \
		tools/eatmem tools/pause-at-end.so \
		$(HSPROGS) $(CPROGS) $(HANDPROGS) $(RMPROGS) a.out \
		testfiles/ \
		ghc-measurements/ \
		sysinfo platforminfo docs xx.ps
distclean: clean
	rm -f *~ */*~ report.txt ghc-measurements.tar.gz
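The NEEDFREE line near the top of this Makefile fragment reserves 2.2 times the space the test file occupies. It uses expr's truncating integer division. The helper below is a hypothetical C restatement of that arithmetic and is not part of the benchmark suite. It assumes `ls -s` reports 1 KiB blocks, which is the GNU ls default. Environment settings such as POSIXLY_CORRECT or BLOCK_SIZE can change that unit.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical C restatement of the Makefile line
       NEEDFREE=$(shell expr 22 '*' <blocks> / 10)
   'blocks' is the file's allocated size as printed by ls -s,
   assumed here to be in 1 KiB units.  Multiplying first and then
   dividing with truncation matches expr's left-to-right integer
   arithmetic. */
static unsigned long long needfree_kb(unsigned long long blocks)
{
    assert(blocks <= ULLONG_MAX / 22);   /* keep 22 * blocks from wrapping */
    return blocks * 22 / 10;             /* 2.2x, rounded down */
}
```

For example, a 100 MiB test file that occupies 102400 blocks gives `needfree_kb(102400) == 225280` KB, i.e. 220 MiB.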

[2/16] SBM: Inner loops of the hand-tweaked assembly benchmarks
by Peter Firefly Brodersen Lund 22 Dec '07
I've taken the two benchmarks byte-bs----acc and space-bs-c8-acc-1 and
tweaked their inner loops step by step. They start out as code that keeps its
loop variables in stack slots and goes to memory all the time, and end up as
code that uses registers more and more efficiently. I did this pretty much one
register at a time. Along the way, I've also done a simple common
subexpression/loop hoisting thing in which I combined the pointer to the start
of the string and the index into the string into a single pointer. Doing this
in real life may cause bad problems with the garbage collector.
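Folding the base pointer and the index together (the step from version i to version j below) can be sketched in C. The two functions are illustrative only: their names are hypothetical and they are not from the benchmark suite.

- The first keeps a base pointer and an index and addresses `base[idx]` on every iteration.
- The second computes `base + idx` once before the loop and then bumps a single pointer, which is what version j does with `%ecx`.

An optimizing C compiler will usually make this transformation on its own.

```c
#include <assert.h>
#include <stddef.h>

/* Base pointer plus index, roughly what hand/space-bs-c8-acc-1-i.s
   does: every load addresses base[idx], so both values stay live. */
static int count_spaces_indexed(const unsigned char *base, size_t idx,
                                size_t len)
{
    int spaces = 0;
    assert(base != NULL || len == 0);
    while (len > 0) {
        if (base[idx] == ' ')
            spaces++;
        idx++;
        len--;
    }
    return spaces;
}

/* Base and index folded into one pointer before the loop, roughly
   what hand/space-bs-c8-acc-1-j.s does with %ecx. */
static int count_spaces_pointer(const unsigned char *base, size_t idx,
                                size_t len)
{
    const unsigned char *p;
    int spaces = 0;
    assert(base != NULL || len == 0);
    p = base + idx;            /* computed once, outside the loop */
    while (len > 0) {
        if (*p++ == ' ')
            spaces++;
        len--;
    }
    return spaces;
}
```

Both return the same count for the same `(base, idx, len)`. Only the number of live values in the loop differs.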
At the end, I go a bit mad and start doing heroic optimizations (reading four
bytes at a time, using MMX registers to read 8 bytes at a time, twisted MMX
math to keep 8 space counters in an MMX register + a bit of loop unrolling).
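The 8-bytes-at-a-time idea from version q can be illustrated without MMX. The sketch below is a portable stand-in, not the author's code. It uses the well-known SWAR (SIMD within a register) trick on a `uint64_t` in place of `pcmpeqb` and `pand`, so each byte lane of `space_lanes(w)` holds 0x80 exactly where the input byte is a space. The lanes are then added together with a multiply, and a byte-at-a-time loop handles the remaining 0 to 7 bytes, like the `.Lc16w1` loop. `memcpy` stands in for the unaligned `movq` load, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LOW7   UINT64_C(0x7F7F7F7F7F7F7F7F)
#define SPACES UINT64_C(0x2020202020202020)
#define ONES   UINT64_C(0x0101010101010101)

/* 0x80 in every byte lane of w that equals ' ', 0x00 elsewhere.
   Exact: (x & 0x7F) + 0x7F never carries out of a lane. */
static uint64_t space_lanes(uint64_t w)
{
    uint64_t x = w ^ SPACES;               /* space bytes become 0x00 */
    return ~(((x & LOW7) + LOW7) | x | LOW7);
}

static size_t count_spaces_swar(const unsigned char *p, size_t n)
{
    size_t total = 0;
    assert(p != NULL || n == 0);
    while (n >= 8) {                       /* 8 bytes per iteration */
        uint64_t w;
        memcpy(&w, p, 8);                  /* unaligned load */
        /* after >> 7 each lane is 0 or 1; the multiply sums all
           eight lanes into the top byte */
        total += (size_t)(((space_lanes(w) >> 7) * ONES) >> 56);
        p += 8;
        n -= 8;
    }
    while (n > 0) {                        /* remainder, one byte at a time */
        if (*p++ == ' ')
            total++;
        n--;
    }
    return total;
}
```

Each lane holds 0 or 1 after the shift, so every partial sum in the multiply is at most 8 and cannot carry into the next lane.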
Here follow first the two original inner loops and then the 23 hand-tweaked
versions (four of byte-bs----acc and 19 of space-bs-c8-acc-1).
I used the following shell code to isolate the inner loops:
(for F in hs/byte-bs----acc.s hs/space-bs-c8-acc-1.s hand/*.s ; \
do echo "------------------------------"; \
echo "$F:"; \
echo ; \
cat "$F" | perl -e 'while(<>){ if (/Main_zdwcnt_info:/ .. /.section .data/) { print; }}' | head -n-1; \
done; \
echo "=============================="; \
) > xx.txt
-Peter
------------------------------
hs/byte-bs----acc.s:
Main_zdwcnt_info:
.LcYL:
cmpl $0,16(%ebp)
jle .LcYO
movl 12(%ebp),%eax
incl %eax
movl (%ebp),%ecx
incl %ecx
subl $1,16(%ebp)
movl %eax,12(%ebp)
movl %ecx,(%ebp)
jmp Main_zdwcnt_info
.LcYO:
movl (%ebp),%esi
addl $20,%ebp
jmp *(%ebp)
------------------------------
hs/space-bs-c8-acc-1.s:
Main_zdwcnt_info:
.Lc16u:
cmpl $0,16(%ebp)
jle .Lc16x
movl 4(%ebp),%eax
movl 12(%ebp),%ecx
movzbl (%eax,%ecx,1),%eax
cmpl $32,%eax
jne .Lc16F
movl 12(%ebp),%eax
incl %eax
movl (%ebp),%ecx
incl %ecx
subl $1,16(%ebp)
movl %eax,12(%ebp)
movl %ecx,(%ebp)
jmp Main_zdwcnt_info
.Lc16x:
movl (%ebp),%esi
addl $20,%ebp
jmp *(%ebp)
.Lc16F:
movl 12(%ebp),%eax
incl %eax
subl $1,16(%ebp)
movl %eax,12(%ebp)
jmp Main_zdwcnt_info
------------------------------
hand/byte-bs----acc-a.s:
Main_zdwcnt_info:
.LcYN:
cmpl $0,16(%ebp)
jle .LcYQ
movl 00(%ebp),%ecx
movl 12(%ebp),%eax
movl 16(%ebp),%edx
incl %ecx
incl %eax
decl %edx
movl %ecx,00(%ebp)
movl %eax,12(%ebp)
movl %edx,16(%ebp)
jmp Main_zdwcnt_info
.LcYQ:
movl (%ebp),%esi
addl $20,%ebp
jmp *(%ebp)
------------------------------
hand/byte-bs----acc-b.s:
Main_zdwcnt_info:
.LcYN:
cmpl $0,16(%ebp)
jle .LcYQ
movl 00(%ebp),%ecx
movl 12(%ebp),%eax
movl 16(%ebp),%edx
.L_again:
cmpl $0,%edx
jle .L_out
incl %ecx
incl %eax
decl %edx
jmp .L_again
.L_out:
movl %ecx,00(%ebp)
movl %eax,12(%ebp)
movl %edx,16(%ebp)
jmp Main_zdwcnt_info
.LcYQ:
movl (%ebp),%esi
addl $20,%ebp
jmp *(%ebp)
------------------------------
hand/byte-bs----acc-c.s:
Main_zdwcnt_info:
.LcYN:
cmpl $0,16(%ebp)
jle .LcYQ
movl 00(%ebp),%ecx
movl 12(%ebp),%eax
movl 16(%ebp),%edx
cmpl $0,%edx
jle .L_out
.L_again:
incl %ecx
incl %eax
decl %edx
cmpl $0,%edx
jg .L_again
.L_out:
movl %ecx,00(%ebp)
movl %eax,12(%ebp)
movl %edx,16(%ebp)
jmp Main_zdwcnt_info
.LcYQ:
movl (%ebp),%esi
addl $20,%ebp
jmp *(%ebp)
------------------------------
hand/byte-bs----acc-d.s:
Main_zdwcnt_info:
.LcYN:
cmpl $0,16(%ebp)
jle .LcYQ
movl 00(%ebp),%ecx
movl 12(%ebp),%eax
movl 16(%ebp),%edx
cmpl $0,%edx
jle .L_out
.align 16
.L_again:
incl %ecx
incl %eax
decl %edx
cmpl $0,%edx
jg .L_again
.L_out:
movl %ecx,00(%ebp)
movl %eax,12(%ebp)
movl %edx,16(%ebp)
jmp Main_zdwcnt_info
.LcYQ:
movl (%ebp),%esi
addl $20,%ebp
jmp *(%ebp)
------------------------------
hand/space-bs-c8-acc-1-a.s:
Main_zdwcnt_info:
.Lc16w:
cmpl $0,16(%ebp)
jle .Lc16z
movl 4(%ebp),%eax
movl 12(%ebp),%ecx
movzbl (%eax,%ecx,1),%eax
cmpl $32,%eax
jne .Lc16H
movl 12(%ebp),%eax
incl %eax
movl (%ebp),%ecx
incl %ecx
subl $1,16(%ebp)
movl %eax,12(%ebp)
movl %ecx,(%ebp)
jmp Main_zdwcnt_info
.Lc16H:
movl 12(%ebp),%eax
incl %eax
subl $1,16(%ebp)
movl %eax,12(%ebp)
jmp Main_zdwcnt_info
.Lc16z:
movl (%ebp),%esi
addl $20,%ebp
jmp *(%ebp)
------------------------------
hand/space-bs-c8-acc-1-b.s:
Main_zdwcnt_info:
.Lc16w:
cmpl $0,16(%ebp)
jle .Lc16z
movl 4(%ebp),%eax
movl 12(%ebp),%ecx
movzbl (%eax,%ecx,1),%eax
cmpl $32,%eax
je .Lc16H
movl 12(%ebp),%eax
incl %eax
subl $1,16(%ebp)
movl %eax,12(%ebp)
jmp Main_zdwcnt_info
.Lc16H:
movl 12(%ebp),%eax
incl %eax
movl (%ebp),%ecx
incl %ecx
subl $1,16(%ebp)
movl %eax,12(%ebp)
movl %ecx,(%ebp)
jmp Main_zdwcnt_info
.Lc16z:
movl (%ebp),%esi
addl $20,%ebp
jmp *(%ebp)
------------------------------
hand/space-bs-c8-acc-1-c.s:
Main_zdwcnt_info:
.Lc16w:
cmpl $0,16(%ebp)
jle .Lc16z
movl 4(%ebp),%eax
movl 12(%ebp),%ecx
movzbl (%eax,%ecx,1),%eax
cmpl $32,%eax
jne .Lc16H
movl (%ebp),%ecx
incl %ecx
movl 12(%ebp),%eax
incl %eax
movl %ecx,(%ebp)
movl %eax,12(%ebp)
subl $1,16(%ebp)
jmp Main_zdwcnt_info
.Lc16z:
movl (%ebp),%esi
addl $20,%ebp
jmp *(%ebp)
.Lc16H:
movl 12(%ebp),%eax
incl %eax
movl %eax,12(%ebp)
subl $1,16(%ebp)
jmp Main_zdwcnt_info
------------------------------
hand/space-bs-c8-acc-1-d.s:
Main_zdwcnt_info:
.Lc16w:
cmpl $0,16(%ebp)
jle .Lc16z
movl 4(%ebp),%eax
movl 12(%ebp),%ecx
movzbl (%eax,%ecx,1),%eax
cmpl $32,%eax
jne .Lc16H
addl $1,(%ebp)
addl $1,12(%ebp)
subl $1,16(%ebp)
jmp Main_zdwcnt_info
.Lc16z:
movl (%ebp),%esi
addl $20,%ebp
jmp *(%ebp)
.Lc16H:
addl $1,12(%ebp)
subl $1,16(%ebp)
jmp Main_zdwcnt_info
------------------------------
hand/space-bs-c8-acc-1-e.s:
Main_zdwcnt_info:
.Lc16w:
cmpl $0,16(%ebp)
jle .Lc16z
movl 4(%ebp),%eax
movl 12(%ebp),%ecx
movzbl (%eax,%ecx,1),%eax
cmpl $32,%eax
jne .Lc16H
movl 12(%ebp),%eax
incl %eax
incl %ecx
movl (%ebp),%eax
incl %eax
subl $1,16(%ebp)
movl %ecx,12(%ebp)
movl %eax,(%ebp)
jmp Main_zdwcnt_info
.Lc16z:
movl (%ebp),%esi
addl $20,%ebp
jmp *(%ebp)
.Lc16H:
incl %ecx
subl $1,16(%ebp)
movl %ecx,12(%ebp)
jmp Main_zdwcnt_info
------------------------------
hand/space-bs-c8-acc-1-f.s:
Main_zdwcnt_info:
.Lc16w:
cmpl $0,16(%ebp)
jle .Lc16z
movl 4(%ebp),%eax
movl 12(%ebp),%ecx
movzbl (%eax,%ecx,1),%eax
cmpl $32,%eax
jne .Lc16H
incl %ecx
subl $1,16(%ebp)
addl $1,(%ebp)
movl %ecx,12(%ebp)
jmp Main_zdwcnt_info
.Lc16z:
movl (%ebp),%esi
addl $20,%ebp
jmp *(%ebp)
.Lc16H:
incl %ecx
subl $1,16(%ebp)
movl %ecx,12(%ebp)
jmp Main_zdwcnt_info
------------------------------
hand/space-bs-c8-acc-1-g.s:
Main_zdwcnt_info:
movl (%ebp),%esi
.Lc16w:
cmpl $0,16(%ebp)
jle .Lc16z
movl 4(%ebp),%eax
movl 12(%ebp),%ecx
movzbl (%eax,%ecx,1),%eax
cmpl $32,%eax
jne .Lc16H
incl %ecx
subl $1,16(%ebp)
inc %esi
movl %ecx,12(%ebp)
jmp .Lc16w
.Lc16z:
addl $20,%ebp
jmp *(%ebp)
.Lc16H:
incl %ecx
subl $1,16(%ebp)
movl %ecx,12(%ebp)
jmp .Lc16w
------------------------------
hand/space-bs-c8-acc-1-h.s:
Main_zdwcnt_info:
movl (%ebp),%esi
movl 12(%ebp),%ecx
.Lc16w:
cmpl $0,16(%ebp)
jle .Lc16z
movl 4(%ebp),%eax
movzbl (%eax,%ecx,1),%eax
cmpl $32,%eax
jne .Lc16H
incl %ecx
subl $1,16(%ebp)
inc %esi
jmp .Lc16w
.Lc16z:
addl $20,%ebp
jmp *(%ebp)
.Lc16H:
incl %ecx
subl $1,16(%ebp)
jmp .Lc16w
------------------------------
hand/space-bs-c8-acc-1-i.s:
Main_zdwcnt_info:
movl (%ebp),%esi
movl 12(%ebp),%ecx
movl 16(%ebp),%edx
.Lc16w:
cmpl $0,%edx
jle .Lc16z
movl 4(%ebp),%eax
movzbl (%eax,%ecx,1),%eax
cmpl $32,%eax
jne .Lc16H
incl %ecx
decl %edx
inc %esi
jmp .Lc16w
.Lc16z:
addl $20,%ebp
jmp *(%ebp)
.Lc16H:
incl %ecx
decl %edx
jmp .Lc16w
------------------------------
hand/space-bs-c8-acc-1-j.s:
Main_zdwcnt_info:
movl (%ebp),%esi
movl 4(%ebp),%ecx
addl 12(%ebp),%ecx
movl 16(%ebp),%edx
.Lc16w:
cmpl $0,%edx
jle .Lc16z
movzbl (%ecx),%eax
cmpl $32,%eax
jne .Lc16H
incl %ecx
decl %edx
inc %esi
jmp .Lc16w
.Lc16z:
addl $20,%ebp
jmp *(%ebp)
.Lc16H:
incl %ecx
decl %edx
jmp .Lc16w
------------------------------
hand/space-bs-c8-acc-1-k.s:
Main_zdwcnt_info:
movl (%ebp),%esi
movl 4(%ebp),%ecx
addl 12(%ebp),%ecx
movl 16(%ebp),%edx
.Lc16w:
cmpl $0,%edx
jle .Lc16z
movzbl (%ecx),%eax
cmpl $32,%eax
jne .Lc16H
incl %ecx
decl %edx
inc %esi
jmp .Lc16w
.Lc16H:
incl %ecx
decl %edx
jmp .Lc16w
.Lc16z:
addl $20,%ebp
jmp *(%ebp)
------------------------------
hand/space-bs-c8-acc-1-l.s:
Main_zdwcnt_info:
movl (%ebp),%esi
movl 4(%ebp),%ecx
addl 12(%ebp),%ecx
movl 16(%ebp),%edx
.Lc16w:
cmpl $0,%edx
jle .Lc16z
movzbl (%ecx),%eax
incl %ecx
decl %edx
cmpl $32,%eax
jne .Lc16H
inc %esi
jmp .Lc16w
.Lc16H:
jmp .Lc16w
.Lc16z:
addl $20,%ebp
jmp *(%ebp)
------------------------------
hand/space-bs-c8-acc-1-m.s:
Main_zdwcnt_info:
movl (%ebp),%esi
movl 4(%ebp),%ecx
addl 12(%ebp),%ecx
movl 16(%ebp),%edx
.Lc16w:
cmpl $0,%edx
jle .Lc16z
movzbl (%ecx),%eax
incl %ecx
decl %edx
cmpl $32,%eax
jne .Lc16w
inc %esi
jmp .Lc16w
.Lc16z:
addl $20,%ebp
jmp *(%ebp)
------------------------------
hand/space-bs-c8-acc-1-n.s:
Main_zdwcnt_info:
movl (%ebp),%esi
movl 4(%ebp),%ecx
addl 12(%ebp),%ecx
movl 16(%ebp),%edx
.Lc16w:
cmpl $0,%edx
jle .Lc16z
.Lc16xx:
movzbl (%ecx),%eax
incl %ecx
decl %edx
cmpl $32,%eax
jne .Lc16w
inc %esi
cmpl $0,%edx
jg .Lc16xx
.Lc16z:
addl $20,%ebp
jmp *(%ebp)
------------------------------
hand/space-bs-c8-acc-1-o.s:
Main_zdwcnt_info:
movl (%ebp),%esi
movl 4(%ebp),%ecx
addl 12(%ebp),%ecx
movl 16(%ebp),%edx
.Lc16w:
cmpl $0,%edx
jle .Lc16z
.Lc16xx:
movzbl (%ecx),%eax
incl %ecx
decl %edx
cmpl $32,%eax
jne .Lc16w
inc %esi
cmpl $0,%edx
jle .Lc16z
movzbl (%ecx),%eax
incl %ecx
decl %edx
cmpl $32,%eax
jne .Lc16w
inc %esi
cmpl $0,%edx
jg .Lc16xx
.Lc16z:
addl $20,%ebp
jmp *(%ebp)
------------------------------
hand/space-bs-c8-acc-1-p.s:
Main_zdwcnt_info:
movl (%ebp),%esi
movl 4(%ebp),%ecx
addl 12(%ebp),%ecx
movl 16(%ebp),%edx
.Lc16w4:
cmpl $4,%edx
jl .Lc16w1
movl (%ecx),%eax
addl $4,%ecx
subl $4,%edx
cmpb $32,%al
jne .Lc16wa
incl %esi
.Lc16wa:
cmpb $32,%ah
jne .Lc16wb
incl %esi
.Lc16wb:
shrl $16,%eax
cmpb $32,%al
jne .Lc16wc
incl %esi
.Lc16wc:
cmpb $32,%ah
jne .Lc16w4
incl %esi
jmp .Lc16w4
.Lc16w1:
cmpl $0,%edx
jle .Lc16z
.Lc16wxx:
movzbl (%ecx),%eax
incl %ecx
decl %edx
cmpl $32,%eax
jne .Lc16w1
inc %esi
jmp .Lc16w1
.Lc16z:
addl $20,%ebp
jmp *(%ebp)
------------------------------
hand/space-bs-c8-acc-1-q.s:
Main_zdwcnt_info:
movl (%ebp),%esi /* #spaces found */
movl 4(%ebp),%ecx /* ptr */
addl 12(%ebp),%ecx /* ... + idx */
movl 16(%ebp),%edx /* cnt of remaining bytes */
emms /* clear fp tags so we can use mmx instrs */
mov $0x20202020,%eax
movd %eax,%mm1 /* mm1: 0000000020202020 */
movq %mm1,%mm0 /* mm0: 0000000020202020 */
psllq $32,%mm1 /* mm1: 2020202000000000 */
por %mm0,%mm1 /* mm1: 2020202020202020 */
mov $0x01010101,%eax
movd %eax,%mm2 /* mm2: 0000000001010101 */
movq %mm2,%mm0 /* mm0: 0000000001010101 */
psllq $32,%mm2 /* mm2: 0101010100000000 */
por %mm0,%mm2 /* mm2: 0101010101010101 */
/* MMX loads can use any alignment (potentially at a speed-hit) */
/* this loop looks at 8 bytes at a time */
.Lc16w8:
cmpl $8,%edx
jl .Lc16w1
movq (%ecx),%mm0 /* mm0 holds 8 characters */
addl $8,%ecx
subl $8,%edx
pcmpeqb %mm1,%mm0 /* cmp byte for byte with ' ' */
/* the result flag is 00 or FF */
pand %mm2,%mm0 /* turn FF into 01, which is actually useful */
/* if we could just add the bytes up horizontally in %mm0, sigh.. .*/
movd %mm0,%eax
push %eax
add %ah, %al
and $0x03,%eax
add %eax,%esi
pop %eax
shr $16,%eax
add %ah,%al
and $0x03,%eax
add %eax,%esi
psrlq $32,%mm0
movd %mm0,%eax
push %eax
add %ah, %al
and $0x03,%eax
add %eax,%esi
pop %eax
shr $16,%eax
add %ah,%al
and $0x03,%eax
add %eax,%esi
jmp .Lc16w8
/* this loop looks at one byte at a time to handle the remainder */
.Lc16w1:
cmpl $0,%edx
jle .Lc16z
movzbl (%ecx),%eax
incl %ecx
decl %edx
cmpl $32,%eax
jne .Lc16w1
inc %esi
jmp .Lc16w1
/* done, remember to clear fp/mmx tags with emms */
.Lc16z:
emms
addl $20,%ebp
jmp *(%ebp)
------------------------------
hand/space-bs-c8-acc-1-r.s:
Main_zdwcnt_info:
movl (%ebp),%esi /* #spaces found */
movl 4(%ebp),%ecx /* ptr */
addl 12(%ebp),%ecx /* ... + idx */
movl 16(%ebp),%edx /* cnt of remaining bytes */
emms /* clear fp tags so we can use mmx instrs */
mov $0x20202020,%eax
movd %eax,%mm1 /* mm1: 0000000020202020 */
movq %mm1,%mm0 /* mm0: 0000000020202020 */
psllq $32,%mm1 /* mm1: 2020202000000000 */
por %mm0,%mm1 /* mm1: 2020202020202020 */
mov $0x01010101,%eax
movd %eax,%mm2 /* mm2: 0000000001010101 */
movq %mm2,%mm0 /* mm0: 0000000001010101 */
psllq $32,%mm2 /* mm2: 0101010100000000 */
por %mm0,%mm2 /* mm2: 0101010101010101 */
/* MMX loads can use any alignment (potentially at a speed-hit) */
/* therefore we don't have to try to read 1-7 bytes one at a time */
/* first in order to end up with an aligned %ecx. */
.Lc16_mainloop:
cmpl $8,%edx
jl .Lc16w1
movl %edx,%eax
shr $3,%eax
cmpl $127,%eax
jle .Lc16_127
movl $127,%eax
.Lc16_127:
shl $3,%eax
sub %eax,%edx
shr $3,%eax
pxor %mm3,%mm3 /* clear block of space counters */
/* loop up to 127 times in a loop that looks at 8 bytes at a time. */
/* Going above 255 could overflow the 8 counters in mm3. */
/* Going above 127 could overflow the horizontal summation code. */
.Lc16w8:
cmpl $0,%eax
jle .Lc16w8end
movq (%ecx),%mm0 /* mm0 holds 8 characters */
addl $8,%ecx
decl %eax
pcmpeqb %mm1,%mm0 /* cmp byte for byte with ' ' */
/* the result flag is 00 or FF */
pand %mm2,%mm0 /* turn FF into 01, which is actually useful */
paddb %mm0,%mm3 /* add to the 8 space counters */
jmp .Lc16w8
.Lc16w8end:
/* sum the 8 space counters in mm3 and add to %esi */
/* if only MMX had horizontal byte adds... */
movd %mm3,%eax
push %eax
add %ah, %al /* NOTE! potential overflow! */
and $0xFF,%eax
add %eax,%esi
pop %eax
shr $16,%eax
add %ah,%al /* NOTE! potential overflow! */
and $0xFF,%eax
add %eax,%esi
psrlq $32,%mm3
movd %mm3,%eax
push %eax
add %ah, %al /* NOTE! potential overflow! */
and $0xFF,%eax
add %eax,%esi
pop %eax
shr $16,%eax
add %ah,%al /* NOTE! potential overflow! */
and $0xFF,%eax
add %eax,%esi
jmp .Lc16_mainloop
/* this loop looks at one byte at a time to handle the remainder */
.Lc16w1:
cmpl $0,%edx
jle .Lc16z
movzbl (%ecx),%eax
incl %ecx
decl %edx
cmpl $32,%eax
jne .Lc16w1
inc %esi
jmp .Lc16w1
/* done, remember to clear fp/mmx tags with emms */
.Lc16z:
emms
addl $20,%ebp
jmp *(%ebp)
------------------------------
hand/space-bs-c8-acc-1-s.s:
Main_zdwcnt_info:
movl (%ebp),%esi /* #spaces found */
movl 4(%ebp),%ecx /* ptr */
addl 12(%ebp),%ecx /* ... + idx */
movl 16(%ebp),%edx /* cnt of remaining bytes */
emms /* clear fp tags so we can use mmx instrs */
mov $0x20202020,%eax
movd %eax,%mm1 /* mm1: 0000000020202020 */
movq %mm1,%mm0 /* mm0: 0000000020202020 */
psllq $32,%mm1 /* mm1: 2020202000000000 */
por %mm0,%mm1 /* mm1: 2020202020202020 */
mov $0x01010101,%eax
movd %eax,%mm2 /* mm2: 0000000001010101 */
movq %mm2,%mm0 /* mm0: 0000000001010101 */
psllq $32,%mm2 /* mm2: 0101010100000000 */
por %mm0,%mm2 /* mm2: 0101010101010101 */
/* MMX loads can use any alignment (potentially at a speed-hit) */
/* therefore we don't have to try to read 1-7 bytes one at a time */
/* first in order to end up with an aligned %ecx. */
.Lc16_mainloop:
cmpl $8,%edx
jl .Lc16w1
movl %edx,%eax
shr $3,%eax
cmpl $127,%eax
jle .Lc16_127
movl $127,%eax
.Lc16_127:
shl $3,%eax
sub %eax,%edx
shr $3,%eax
pxor %mm3,%mm3 /* clear block of space counters */
/* loop up to 127 times in a loop that looks at 8 bytes at a time. */
/* Going above 255 could overflow the 8 counters in mm3. */
/* Going above 127 could overflow the horizontal summation code. */
cmpl $0,%eax
jle .Lc16w8end
/* this is an unspeakably ugly and sloppy loop unrolling. Doesn't */
/* seem to help much on an Athlon64 3000+. */
test $1,%eax
jz .Lc16w8
incl %eax
jmp .Lc16w8x
.Lc16w8:
movq (%ecx),%mm0 /* mm0 holds 8 characters */
addl $8,%ecx
pcmpeqb %mm1,%mm0 /* cmp byte for byte with ' ' */
/* the result flag is 00 or FF */
pand %mm2,%mm0 /* turn FF into 01, which is actually useful */
paddb %mm0,%mm3 /* add to the 8 space counters */
.Lc16w8x:
movq (%ecx),%mm0 /* mm0 holds 8 characters */
addl $8,%ecx
pcmpeqb %mm1,%mm0 /* cmp byte for byte with ' ' */
/* the result flag is 00 or FF */
pand %mm2,%mm0 /* turn FF into 01, which is actually useful */
paddb %mm0,%mm3 /* add to the 8 space counters */
subl $2,%eax
jnz .Lc16w8
.Lc16w8end:
/* sum the 8 space counters in mm3 and add to %esi */
/* if only MMX had horizontal byte adds... */
movd %mm3,%eax
push %eax
add %ah, %al /* NOTE! potential overflow! */
and $0xFF,%eax
add %eax,%esi
pop %eax
shr $16,%eax
add %ah,%al /* NOTE! potential overflow! */
and $0xFF,%eax
add %eax,%esi
psrlq $32,%mm3
movd %mm3,%eax
push %eax
add %ah, %al /* NOTE! potential overflow! */
and $0xFF,%eax
add %eax,%esi
pop %eax
shr $16,%eax
add %ah,%al /* NOTE! potential overflow! */
and $0xFF,%eax
add %eax,%esi
jmp .Lc16_mainloop
/* this loop looks at one byte at a time to handle the remainder */
.Lc16w1:
cmpl $0,%edx
jle .Lc16z
movzbl (%ecx),%eax
incl %ecx
decl %edx
cmpl $32,%eax
jne .Lc16w1
inc %esi
jmp .Lc16w1
/* done, remember to clear fp/mmx tags with emms */
.Lc16z:
emms
addl $20,%ebp
jmp *(%ebp)
==============================
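Versions r and s go one step further. They accumulate the comparison results in eight byte-wide counters (`paddb` into `%mm3`) and only sum them horizontally after a batch of at most 127 blocks. A hypothetical C rendering of that scheme is sketched below. It is not part of the posted benchmarks, and it uses a 64-bit integer in place of an MMX register. Each block adds at most 1 to each lane, so a lane could absorb 255 blocks before overflowing. The cap of 127 simply mirrors version r, where the pairwise `add %ah,%al` in the horizontal sum is what limits it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LOW7   UINT64_C(0x7F7F7F7F7F7F7F7F)
#define SPACES UINT64_C(0x2020202020202020)

/* 0x01 in every byte lane of w that equals ' ', 0x00 elsewhere. */
static uint64_t space_ones(uint64_t w)
{
    uint64_t x = w ^ SPACES;
    return (~(((x & LOW7) + LOW7) | x | LOW7)) >> 7;
}

static size_t count_spaces_blocked(const unsigned char *p, size_t n)
{
    size_t total = 0;
    assert(p != NULL || n == 0);
    while (n >= 8) {
        size_t blocks = n / 8;
        uint64_t counters = 0;             /* eight 8-bit counters, like %mm3 */
        int lane;
        if (blocks > 127)                  /* same cap as version r */
            blocks = 127;
        n -= blocks * 8;
        while (blocks-- > 0) {
            uint64_t w;
            memcpy(&w, p, 8);
            counters += space_ones(w);     /* like paddb: lanes never exceed 127 */
            p += 8;
        }
        for (lane = 0; lane < 8; lane++)   /* horizontal sum, once per batch */
            total += (size_t)((counters >> (8 * lane)) & 0xFF);
    }
    while (n > 0) {                        /* remainder, one byte at a time */
        if (*p++ == ' ')
            total++;
        n--;
    }
    return total;
}
```

The expensive horizontal sum runs once per batch of up to 1016 bytes instead of once per 8 bytes, which is the point of the batching in versions r and s.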
Here are the 48 Haskell and C benchmarks.
Don Stewart contributed three (although I had to fight a bit to make one of
them compile).
Jules Bean (quicksilver) contributed one.
Bertram Felgenhauer (int-e) contributed three (in the form of a single file,
which I untangled).
Spencer Janssen (sjanssen) contributed one.
wli (William Lee Irwin III) inspired me to add the getwchar benchmarks.
I used the following shell code to gather all the benchmarks:
(for F in hs/*.hs c/*.c; \
do echo "------------------------------"; \
echo "$F:"; \
echo ; \
cat "$F"; \
done; \
echo "==============================" \
) > xx.txt
They are not in the same order as in the Makefile or in the reports,
unfortunately.
-Peter
------------------------------
hs/byte-bs----acc.hs:
{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString as B
cnt :: Int -> B.ByteString -> Int
cnt !acc !bs = if B.null bs
    then acc
    else cnt (acc+1) (B.tail bs)
main = do s <- B.getContents
          print (cnt 0 s)
------------------------------
hs/byte-bs----foldlx.hs:
{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString as B
cnt :: B.ByteString -> Int
cnt !bs = B.foldl' (\sum _ -> sum+1) 0 bs
main = do s <- B.getContents
          print (cnt s)
------------------------------
hs/byte-bs----foldrx.hs:
{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString as B
cnt :: B.ByteString -> Int
cnt !bs = B.foldr' (\_ sum -> sum+1) 0 bs
main = do s <- B.getContents
          print (cnt s)
------------------------------
hs/byte-bsl---acc.hs:
{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString.Lazy as B
cnt :: Int -> B.ByteString -> Int
cnt !acc !bs = if B.null bs
    then acc
    else cnt (acc+1) (B.tail bs)
main = do s <- B.getContents
          print (cnt 0 s)
------------------------------
hs/byte-xxxxx-acc-1.hs:
{-# LANGUAGE BangPatterns #-}
cnt :: Int -> String -> Int
cnt !acc bs = if null bs
    then acc
    else cnt (acc+1) (tail bs)
main = do s <- getContents
          print (cnt 0 s)
------------------------------
hs/byte-xxxxx-acc-2.hs:
{-# LANGUAGE BangPatterns #-}
cnt :: Int -> String -> Int
cnt !acc !bs = if null bs
    then acc
    else cnt (acc+1) (tail bs)
main = do s <- getContents
          print (cnt 0 s)
------------------------------
hs/byte-xxxxx-foldl.hs:
{-# LANGUAGE BangPatterns #-}
cnt :: String -> Int
cnt !bs = foldl (\sum _ -> sum+1) 0 bs
main = do s <- getContents
          print (cnt s)
------------------------------
hs/byte-xxxxx-foldr-1.hs:
{-# LANGUAGE BangPatterns #-}
cnt :: String -> Int
cnt bs = foldr (\_ sum -> sum+1) 0 bs
main = do s <- getContents
          print (cnt s)
------------------------------
hs/byte-xxxxx-foldr-2.hs:
{-# LANGUAGE BangPatterns #-}
cnt :: String -> Int
cnt !bs = foldr (\_ sum -> sum+1) 0 bs
main = do s <- getContents
          print (cnt s)
------------------------------
hs/space-bs-c8-acc-1.hs:
{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString.Char8 as B
cnt :: Int -> B.ByteString -> Int
cnt !acc bs = if B.null bs
    then acc
    else cnt (if B.head bs == ' ' then acc+1 else acc) (B.tail bs)
main = do s <- B.getContents
          print (cnt 0 s)
------------------------------
hs/space-bs-c8-count.hs:
-- Don Stewart
import qualified Data.ByteString.Char8 as B
main = print . B.count ' ' =<< B.getContents
------------------------------
hs/space-bs-c8-foldlx-1.hs:
{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString.Char8 as B
cnt :: B.ByteString -> Int
cnt bs = B.foldl' (\sum c -> if c == ' ' then sum+1 else sum) 0 bs
main = do s <- B.getContents
          print (cnt s)
------------------------------
hs/space-bs-c8-foldlx-2.hs:
{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString.Char8 as B
main = do s <- B.getContents
          print $ B.foldl' (\v c -> if c == ' ' then v+1 else v :: Int) 0 s
------------------------------
hs/space-bs-c8-foldrx.hs:
{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString.Char8 as B
cnt :: B.ByteString -> Int
cnt bs = B.foldr' (\c sum -> if c == ' ' then sum+1 else sum) 0 bs
main = do s <- B.getContents
          print (cnt s)
------------------------------
hs/space-bs-c8-lenfil.hs:
{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString.Char8 as B
cnt :: B.ByteString -> Int
cnt bs = B.length (B.filter (== ' ') bs)
main = do s <- B.getContents
          print (cnt s)
------------------------------
hs/space-bslc8-acc-1.hs:
{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString.Lazy.Char8 as B
cnt :: Int -> B.ByteString -> Int
cnt !acc bs = if B.null bs
    then acc
    else cnt (if B.head bs == ' ' then acc+1 else acc) (B.tail bs)
main = do s <- B.getContents
          print (cnt 0 s)
------------------------------
hs/space-bslc8-acc-2.hs:
{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString.Lazy.Char8 as B
cnt :: Int -> B.ByteString -> Int
cnt !acc !bs = if B.null bs
    then acc
    else cnt (if B.head bs == ' ' then acc+1 else acc) (B.tail bs)
main = do s <- B.getContents
          print (cnt 0 s)
------------------------------
hs/space-bslc8-acc-3.hs:
{-# LANGUAGE BangPatterns #-}
-- this version by quicksilver
import qualified Data.ByteString.Lazy.Char8 as B
cnt :: Int -> B.ByteString -> Int
cnt !acc bs | B.null bs        = acc
            | B.head bs == ' ' = cnt (acc+1) (B.tail bs)
            | otherwise        = cnt acc (B.tail bs)
main = do s <- B.getContents
          print (cnt 0 s)
------------------------------
hs/space-bslc8-chunk-1.hs:
{-# LANGUAGE BangPatterns #-}
-- this version by int-e
import qualified Data.ByteString.Lazy.Char8 as B
import qualified Data.ByteString.Char8 as BS
import Data.List (foldl')
cntS :: Int -> BS.ByteString -> Int
cntS !acc !bs = case BS.uncons bs of
    Nothing -> acc
    Just (hd, tl) | hd == ' '  -> cntS (acc+1) tl
                  | otherwise -> cntS acc tl
cnt :: Int -> B.ByteString -> Int
cnt acc bs = foldl' cntS acc (B.toChunks bs)
main = do s <- B.getContents
          print $ cnt 0 s
------------------------------
hs/space-bslc8-chunk-2.hs:
{-# LANGUAGE BangPatterns #-}
-- this version by int-e
import qualified Data.ByteString.Lazy.Char8 as B
import qualified Data.ByteString.Char8 as BS
import Data.List (foldl')
cntS' :: Int -> BS.ByteString -> Int
cntS' !acc !bs | BS.null bs        = acc
               | BS.head bs == ' ' = cntS' (acc+1) (BS.tail bs)
               | otherwise         = cntS' acc (BS.tail bs)
cnt :: Int -> B.ByteString -> Int
cnt acc bs = foldl' cntS' acc (B.toChunks bs)
main = do s <- B.getContents
          print $ cnt 0 s
------------------------------
hs/space-bslc8-chunk-3.hs:
{-# LANGUAGE BangPatterns #-}
-- this version by int-e
import qualified Data.ByteString.Lazy.Char8 as B
import qualified Data.ByteString.Char8 as BS
import Data.List (foldl')
cntS'' :: Int -> BS.ByteString -> Int
cntS'' !acc !bs = BS.foldl' (\v c -> if c == ' ' then v+1 else v) acc bs
cnt :: Int -> B.ByteString -> Int
cnt acc bs = foldl' cntS'' acc (B.toChunks bs)
main = do s <- B.getContents
          print $ cnt 0 s
------------------------------
hs/space-bslc8-chunk-4.hs:
{-# LANGUAGE BangPatterns #-}
-- Don Stewart
import qualified Data.ByteString.Lazy.Char8 as BLC8
import qualified Data.ByteString.Lazy.Internal as BLI
import qualified Data.ByteString as B
import qualified Data.ByteString.Unsafe as BU
import qualified Data.ByteString.Internal as BI
cnt :: Int -> BLC8.ByteString -> Int
cnt n BLI.Empty        = n
cnt n (BLI.Chunk x xs) = cnt (n + cnt_strict 0 x) xs -- process lazy spine
  where -- now we can process a chunk without checking for Empty
    cnt_strict !i !s -- then strict chunk
        | B.null s  = i
        | c == ' '  = cnt_strict (i+1) t
        | otherwise = cnt_strict i t
      where
        (c,t) = (BI.w2c (BU.unsafeHead s), BU.unsafeTail s) -- no bounds check
main = do s <- BLC8.getContents; print (cnt 0 s)
------------------------------
hs/space-bslc8-count.hs:
-- Don Stewart
import qualified Data.ByteString.Lazy.Char8 as B
main = print . B.count ' ' =<< B.getContents
------------------------------
hs/space-bslc8-foldl.hs:
{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString.Lazy.Char8 as B
cnt :: B.ByteString -> Int
cnt !bs = B.foldl (\sum c -> if c == ' ' then sum+1 else sum) 0 bs
main = do s <- B.getContents
          print (cnt s)
------------------------------
hs/space-bslc8-foldlx-1.hs:
{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString.Lazy.Char8 as B
cnt :: B.ByteString -> Int
cnt bs = B.foldl' (\sum c -> if c == ' ' then sum+1 else sum) 0 bs
main = do s <- B.getContents
          print (cnt s)
------------------------------
hs/space-bslc8-foldlx-2.hs:
{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString.Lazy.Char8 as B
cnt :: B.ByteString -> Int
cnt !bs = B.foldl' (\sum c -> if c == ' ' then sum+1 else sum) 0 bs
main = do s <- B.getContents
          print (cnt s)
------------------------------
hs/space-bslc8-foldr-1.hs:
{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString.Lazy.Char8 as B
cnt :: B.ByteString -> Int
cnt bs = B.foldr (\c sum -> if c == ' ' then sum+1 else sum) 0 bs
main = do s <- B.getContents
          print (cnt s)
------------------------------
hs/space-bslc8-foldr-2.hs:
{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString.Lazy.Char8 as B
cnt :: B.ByteString -> Int
cnt !bs = B.foldr (\c sum -> if c == ' ' then sum+1 else sum) 0 bs
main = do s <- B.getContents
          print (cnt s)
------------------------------
hs/space-bslc8-lenfil-1.hs:
{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString.Lazy.Char8 as B
import GHC.Int (Int64)
-- note that D.BS.Lazy.Char8.length is ByteString -> Int64
-- D.BS.C8.length is ByteString -> Int
cnt :: B.ByteString -> Int64
cnt bs = B.length (B.filter (== ' ') bs)
main = do s <- B.getContents
          print (cnt s)
------------------------------
hs/space-bslc8-lenfil-2.hs:
{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString.Lazy.Char8 as B
import GHC.Int (Int64)
-- note that D.BS.Lazy.Char8.length is ByteString -> Int64
-- D.BS.C8.length is ByteString -> Int
cnt :: B.ByteString -> Int64
cnt !bs = B.length (B.filter (== ' ') bs)
main = do s <- B.getContents
          print (cnt s)
------------------------------
hs/space-bsl---foldlx.hs:
{-# LANGUAGE BangPatterns #-}
-- this version by sjanssen
import Data.ByteString.Lazy as B
cnt :: B.ByteString -> Int
cnt = B.foldl' f 0
  where
    f !n 32 = n+1
    f !n _  = n
main = do
    s <- B.getContents
    print $ cnt s
------------------------------
hs/space-xxxxx-acc-1.hs:
{-# LANGUAGE BangPatterns #-}
cnt :: Int -> String -> Int
cnt !acc bs = if null bs
    then acc
    else cnt (if head bs == ' ' then acc+1 else acc) (tail bs)
main = do s <- getContents
          print (cnt 0 s)
------------------------------
hs/space-xxxxx-acc-2.hs:
{-# LANGUAGE BangPatterns #-}
cnt :: Int -> String -> Int
cnt !acc !bs = if null bs
    then acc
    else cnt (if head bs == ' ' then acc+1 else acc) (tail bs)
main = do s <- getContents
          print (cnt 0 s)
------------------------------
hs/space-xxxxx-foldl.hs:
{-# LANGUAGE BangPatterns #-}
cnt :: String -> Int
cnt bs = foldl (\sum c -> if c == ' ' then sum+1 else sum) 0 bs
main = do s <- getContents
          print (cnt s)
------------------------------
hs/space-xxxxx-foldr-1.hs:
{-# LANGUAGE BangPatterns #-}
cnt :: String -> Int
cnt bs = foldr (\c sum -> if c == ' ' then sum+1 else sum) 0 bs
main = do s <- getContents
          print (cnt s)
------------------------------
hs/space-xxxxx-foldr-2.hs:
{-# LANGUAGE BangPatterns #-}
cnt :: String -> Int
cnt !bs = foldr (\c sum -> if c == ' ' then sum+1 else sum) 0 bs
main = do s <- getContents
          print (cnt s)
------------------------------
hs/space-xxxxx-lenfil.hs:
{-# LANGUAGE BangPatterns #-}
cnt :: String -> Int
cnt bs = length (filter (== ' ') bs)
main = do s <- getContents
          print (cnt s)
------------------------------
c/byte-4k.c:
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <errno.h>
int Main_cnt()
{
    int cnt = 0;
    ssize_t sze;
    char buf[4*1024];
    do {
again:
        sze = read(fileno(stdin), buf, sizeof(buf));
        if (sze < 0) {
            switch (errno) {
            case EAGAIN: goto again;
            default:
                perror("read() failed");
                exit(1);
            }
        }
        cnt += sze;
    } while (sze != 0);
    return cnt;
}
int main()
{
    printf("%d\n", Main_cnt());
    return EXIT_SUCCESS;
}
------------------------------
c/byte-getchar.c:
#include <stdio.h>
#include <stdlib.h>
int Main_cnt()
{
    int cnt = 0;
    int c;
    while ((c = getchar()) != EOF)
        cnt++;
    return cnt;
}
int main()
{
    printf("%d\n", Main_cnt());
    return EXIT_SUCCESS;
}
------------------------------
c/byte-getchar-u.c:
#include <stdio.h>
#include <stdlib.h>
int Main_cnt()
{
    int cnt = 0;
    int c;
    while ((c = getchar_unlocked()) != EOF)
        cnt++;
    return cnt;
}
int main()
{
    printf("%d\n", Main_cnt());
    return EXIT_SUCCESS;
}
------------------------------
c/space-32k-8.c:
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <errno.h>
int Main_cnt()
{
    int cnt = 0;
    ssize_t sze, left;
    char buf[32760];
    char *p;
    printf("using a buffer of %g KB\n", sizeof(buf) / 1024.0);
    do {
again:
        sze = read(fileno(stdin), buf, sizeof(buf));
        if (sze < 0) {
            switch (errno) {
            case EAGAIN: goto again;
            default:
                perror("read() failed");
                exit(1);
            }
        }
        for (p = buf, left=sze; left > 0; left--)
            if (*p++ == ' ')
                cnt++;
    } while (sze != 0);
    return cnt;
}
int main()
{
    printf("%d\n", Main_cnt());
    return EXIT_SUCCESS;
}
------------------------------
c/space-32k.c:
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <errno.h>
int Main_cnt()
{
    int cnt = 0;
    ssize_t sze, left;
    char buf[32*1024];
    char *p;
    printf("using a buffer of %g KB\n", sizeof(buf) / 1024.0);
    do {
again:
        sze = read(fileno(stdin), buf, sizeof(buf));
        if (sze < 0) {
            switch (errno) {
            case EAGAIN: goto again;
            default:
                perror("read() failed");
                exit(1);
            }
        }
        for (p = buf, left=sze; left > 0; left--)
            if (*p++ == ' ')
                cnt++;
    } while (sze != 0);
    return cnt;
}
int main()
{
    printf("%d\n", Main_cnt());
    return EXIT_SUCCESS;
}
------------------------------
c/space-4k.c:
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <errno.h>
int Main_cnt()
{
    int cnt = 0;
    ssize_t sze, left;
    char buf[4*1024];
    char *p;
    printf("using a buffer of %g KB\n", sizeof(buf) / 1024.0);
    do {
    again:
        sze = read(fileno(stdin), buf, sizeof(buf));
        if (sze < 0) {
            switch (errno) {
            case EAGAIN: goto again;
            default:
                perror("read() failed");   /* perror appends ": <reason>\n" itself */
                exit(1);
            }
        }
        for (p = buf, left=sze; left > 0; left--)
            if (*p++ == ' ')
                cnt++;
    } while (sze != 0);
    return cnt;
}
int main()
{
    printf("%d\n", Main_cnt());
    return EXIT_SUCCESS;
}
------------------------------
c/space-getchar.c:
#include <stdio.h>
#include <stdlib.h>
int Main_cnt()
{
    int cnt = 0;
    int c;
    while ((c = getchar()) != EOF)
        if (c == ' ')
            cnt++;
    return cnt;
}
int main()
{
    printf("%d\n", Main_cnt());
    return EXIT_SUCCESS;
}
------------------------------
c/space-getchar-u.c:
#include <stdio.h>
#include <stdlib.h>
int Main_cnt()
{
    int cnt = 0;
    int c;
    while ((c = getchar_unlocked()) != EOF)
        if (c == ' ')
            cnt++;
    return cnt;
}
int main()
{
    printf("%d\n", Main_cnt());
    return EXIT_SUCCESS;
}
------------------------------
c/space-getwchar.c:
#include <stdio.h>
#include <wchar.h>
#include <stdlib.h>
int Main_cnt()
{
    int cnt = 0;
    wint_t c;
    while ((c = getwchar()) != WEOF)
        if (c == ' ')
            cnt++;
    return cnt;
}
int main()
{
    printf("%d\n", Main_cnt());
    return EXIT_SUCCESS;
}
------------------------------
c/space-getwchar-u.c:
#define _GNU_SOURCE
#include <stdio.h>
#include <wchar.h>
#include <stdlib.h>
int Main_cnt()
{
    int cnt = 0;
    wint_t c;
    while ((c = getwchar_unlocked()) != WEOF)
        if (c == ' ')
            cnt++;
    return cnt;
}
int main()
{
    printf("%d\n", Main_cnt());
    return EXIT_SUCCESS;
}
------------------------------
c/space-megabuf.c:
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <sys/stat.h>
#include <unistd.h>
int isfile(int handle)
{
    struct stat buf;
    if (fstat(handle, &buf) == -1) {
        perror("fstat(stdin)");
        exit(1);
    }
    return S_ISREG(buf.st_mode);
}
ssize_t getbufsize()
{
    if (isfile(fileno(stdin))) {
        off_t x;
        x = lseek(fileno(stdin), 0, SEEK_END);
        if (x == -1) {
            perror("lseek(... SEEK_END)");
            exit(1);
        }
        if (lseek(fileno(stdin), 0, SEEK_SET) == -1) {
            perror("lseek(... SEEK_SET)");
            exit(1);
        }
        if (x > 1*1024*1024*1024LL) {
            x = 1024*1024*1024LL;
        }
        return x; /* file size for files */
    } else {
        return 10*1024*1024; /* 10M for non-files */
    }
}
int Main_cnt()
{
    int cnt = 0, reads=0, retries=0;
    ssize_t sze, left, bufsize;
    char *buf;
    char *p;
    bufsize = getbufsize();
    printf("using a buffer of %g MB\n", bufsize / (1024*1024.0));
    buf = malloc(bufsize);
    /* malloc(0) may legitimately return NULL for an empty file */
    if (!buf && bufsize > 0) {
        fprintf(stderr, "couldn't allocate %lld bytes\n",
                (long long) bufsize);
        exit(1);
    }
    do {
    again:
        sze = read(fileno(stdin), buf, bufsize);
        if (sze < 0) {
            switch (errno) {
            case EAGAIN: retries++; goto again;
            default:
                perror("read() failed");
                exit(1);
            }
        }
        reads++;
        for (p = buf, left=sze; left > 0; left--)
            if (*p++ == ' ')
                cnt++;
    } while (sze != 0);
    printf("%d reads, %d retries\n", reads, retries);
    return cnt;
}
int main()
{
    printf("%d\n", Main_cnt());
    return EXIT_SUCCESS;
}
==============================

[0/16] SBM: Simple Bytestring Microbenchmarks, Overview and Introduction
by Peter Firefly Brodersen Lund 22 Dec '07
Table of contents:
0/16 This email
1/16 The Haskell and C benchmarks
2/16 Inner loops of the hand-tweaked assembly benchmarks
3/16 The Makefile
4/16 How to use the Makefile (how to run benchmarks etc.)
5/16 Support scripts and scriptlets
6/16 6.9.20071124 Athlon Duron
7/16 6.9.20071208 Core Duo
8/16 6.9.20071119 Pentium III
9/16 6.9.20071119 Athlon64
10/16 Graphs for 6.9.x across four cpus
11/16 Graphs for hand-tweaked assembly benchmarks
12/16 Graphs for 7 ghc/bytestring combinations on a 2GHz Athlon64 3000+
13/16 Graphs that show the infidelity of -sstderr
14/16 Behind the measurements (rationale)
15/16 Predictions compared to the measurements
16/16 Discussion and Conclusion
Simple Bytestring Microbenchmarks
---------------------------------
Introduction
------------
I love parsers. I have been writing parsers for fun for over twenty years.
The nicest way to construct a parser used to be to write a recursive descent
parser by hand. If you had to work with people who'd had the misfortune of
a university education, you would resort to lex and yacc (flex and bison),
despite their many shortcomings.
Combinator parsers are the only real improvement over hand-written recursive
descent parsers that I know of. They do tend to require features that not all
languages provide. I don't know how to write a good one in C, for example.
They do work very well in Haskell, though.
So, I've started writing a parser in Haskell (ghc, really) for the programming
language X++. X++ is not a nice language but that's beside the point. The
challenge for me is to write an efficient compiler + provide good analysis
tools for X++. I think I stand a better chance of doing that in Haskell (ghc)
than in practically any other language.
There are a few drawbacks, though.
I love speed. And efficiency.
String handling in Haskell
--------------------------
Native strings are simple and generally work well, but they are slow, take up
too much memory and there's the whole encoding mess that still needs to be
sorted out.
People have worked on other string representations and libraries for quite a
while. I think packedstrings (as used in Darcs) was one of the first ones.
Bytestrings is the current incarnation. It seems to be just the right thing,
especially when combined with improved automatic fusion in the compiler so
higher-order functions don't have to be expensive.
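For comparison with the native-String benchmark at the top of this listing, here is a minimal sketch of the same space count over a strict ByteString. The names `countSpaces` and `countSpacesFold` are hypothetical. `B.count` and `B.foldl'` are functions exported by `Data.ByteString.Char8`. Whether the fold version compiles to a loop as tight as `B.count` depends on the GHC and bytestring versions, and that is the kind of question microbenchmarks like these are meant to answer.

```haskell
import qualified Data.ByteString.Char8 as B

-- Count spaces with the library's specialised count function.
countSpaces :: B.ByteString -> Int
countSpaces = B.count ' '

-- The same count written as a strict left fold over the characters,
-- the kind of higher-order code that fusion is supposed to make cheap.
countSpacesFold :: B.ByteString -> Int
countSpacesFold = B.foldl' step 0
  where
    step n c = if c == ' ' then n + 1 else n
```

A stdin-reading program would then be something like `main = B.getContents >>= print . countSpaces`.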
I use Parsec as my parser combinator library at the moment, which
uses native strings. I would dearly love Parsec to be faster and use less
memory. I think bytestrings will be part of any substantial improvement in
Parsec's resource consumption.
Other performance concerns
--------------------------
File I/O is also interesting for a compiler writer. I would like to have a
program that is as fast as possible both when the source files are already
cached by the operating system and when they are not. The former situation is
best handled with mmap() and the latter is best handled by read(), preferably
in combination with multi-threading so the compiler doesn't have to waste too
much time waiting for disk seeks. Haskell seems to be very close to ideal for
me because it has very good threading support and very accessible raw access
to the operating system.
File I/O is not my current bottleneck, though. I'll probably take a closer
look at file I/O when the other performance problems have been solved.
Then there's the general quality of the generated code. Having read just about
every paper on ghc that was available back in the late nineties (when I first
looked at Haskell), I'd thought that the quality was good and that the compiler
also had extremely good high-level optimizations, in other words, that
abstraction was free.
I also read the C-- papers and thought that it was a very interesting and
promising approach. I'd expected the C-- path to have matured and be well-
optimized by now.
Unfortunately, the backend is /the/ weak spot in ghc. The frontend is heroic,
the typesystems are (too) abundant and rich, the language itself is nice -- but
the backend is not. Looking at the generated code I'd say that it is slightly
better than Turbo Pascal 3.x and about on par with Turbo Pascal 4.0, a compiler
that didn't use any intermediate code at all, compiled each statement in
isolation, was single-pass, and had a compilation speed of about 27000 lines
per minute on an 8 MHz IBM PC AT.
Ecosystem and culture
---------------------
Haskell has a very good ecosystem. Probably the second best one amongst the
modern functional languages. Ten years ago, I'd thought that Standard ML would
win but the only MLish language with a good ecosystem and culture is OCaml,
which unfortunately isn't really Standard ML.
By ecosystem I mean things like access to raw operating system calls, access
to libraries written in other languages, readily available libraries for
graphical user interfaces, databases, XML processing, network I/O. Parsing is
nice, too, but practically all functional programming languages have that --
and not much else.
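As a small illustration of "access to libraries written in other languages", here is a sketch of GHC's foreign function interface binding the C library's `strlen`. The names `c_strlen` and `cLength` are hypothetical. `foreign import ccall` and `withCString` are standard parts of the Haskell FFI.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import Foreign.C.String (CString, withCString)
import Foreign.C.Types (CSize)

-- Bind C's strlen directly. "unsafe" makes the call cheaper, but the
-- C function must not call back into Haskell.
foreign import ccall unsafe "string.h strlen"
  c_strlen :: CString -> IO CSize

-- Marshal a Haskell String to a temporary NUL-terminated C string,
-- call strlen on it, and convert the result back to an Int.
cLength :: String -> IO Int
cLength s = withCString s (fmap fromIntegral . c_strlen)
```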
The culture is also good. People actually use this stuff. They care about it.
And they don't hang around waiting for somebody to tell them what to do, they
start on their own. And they actually seem to be interested in good
performance :)
Hackage and cabal are very promising already and may be what finally makes ghc
real-world useful for more people, because most people are not interested in
working with raw source packages and fiddling with compiler flags and weird
error messages. They don't like chasing dependencies, either.
So, what's the problem?
-----------------------
The major problem with Haskell (ghc) is that its performance (in terms of both
speed and memory use) is unpredictable. The second-worst problem is that the
actual performance is not good enough.
These benchmarks
----------------
I have written a bunch of microbenchmarks that either count all the bytes in
stdin or all the spaces in stdin. And some supporting benchmarks in C. And
I've also handtweaked the assembly of one byte-counting and one space-counting
microbenchmark to illustrate what difference it would make if the backend could
use registers in a less stupid^W^W more efficient manner.
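The C benchmarks share one loop: read a buffer, scan it, and repeat until read() returns 0. That loop can be sketched in Haskell with strict ByteString chunks. This is a hypothetical illustration, not one of the benchmark programs. `countSpacesChunked` and `countSpacesInFile` are made-up names, and the 32 KB chunk size is an assumption borrowed from c/space-32k.c.

```haskell
{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString.Char8 as B
import System.IO (Handle, IOMode (ReadMode), withBinaryFile)

-- Count spaces by reading fixed-size chunks, in the spirit of the C
-- read() loops. hGetSome returns an empty chunk only at end of file.
countSpacesChunked :: Int -> Handle -> IO Int
countSpacesChunked chunkSize h = go 0
  where
    go !acc = do
      chunk <- B.hGetSome h chunkSize
      if B.null chunk
        then return acc
        else go (acc + B.count ' ' chunk)

-- Convenience wrapper for a file on disk, using a 32 KB chunk size.
countSpacesInFile :: FilePath -> IO Int
countSpacesInFile path =
  withBinaryFile path ReadMode (countSpacesChunked (32 * 1024))
```

To read stdin instead, pass `stdin` from System.IO as the handle, for example `countSpacesChunked 32768 stdin >>= print`.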
Homepage + source code
----------------------
I have put up a homepage for the benchmarks at:
http://vax64.dk/ghc-bs-tests
The raw measurements are in tarballs on that page.
The source code for the benchmarks (+ support code) is in a mercurial
repository at:
http://vax64.dyndns.org/repo/hg/ghc-bs-tests
I used scripts to install the various versions of ghc and bytestring both to
avoid operator error and so you could look over my shoulder. The scripts
are in a mercurial repository at:
http://vax64.dyndns.org/repo/hg/ghc-installations
You can either follow the link and download any version you like as a tarball
or you can (preferably) clone the repositories with:
hg clone http://vax64.dyndns.org/repo/hg/ghc-bs-tests
hg clone http://vax64.dyndns.org/repo/hg/ghc-installations
All my code in those repositories is GPLv2.
The text file 'text.txt' in the ghc-bs-tests repository is unfortunately partly
in Danish and partly in very terse English.
Acknowledgements
----------------
Daniel Fischer, for running the benchmarks on his SuSE 8.2 Athlon Duron 1200
MHz machine and for being helpful and patient while I made the scripts work on
a 2.4 kernel and with unhelpful versions of GNU Make and strace.
Erik van der Meer, for letting me run the benchmarks (and install ghc) on his
Core Duo laptop. And for discussions on measurements over the years.
Don Stewart, for playing along and for fixing a bytestring problem. And for
contributing three benchmarks (one of which I had to change a bit before it
would compile).
Duncan Coutts, for playing along.
Jules Bean (quicksilver), for contributing a benchmark.
Bertram Felgenhauer (int-e), for contributing three benchmarks (in the form
of a single file, which I untangled to three files).
Spencer Janssen (sjanssen), for contributing a benchmark.
William Lee Irwin III (wli), for inspiring me to add the getwchar benchmarks.
-Peter

readline problems building GHC on Mac OS X (was: Re: [Haskell-cafe] Re: ANNOUNCE: GHC version 6.8.2)
by Thorkil Naur 21 Dec '07
Hello,
Although I have been building various GHC versions on various PPC Mac OS X
systems for a while now, I'm afraid that I don't really have a good answer
for your questions. However, your questions provide an excellent opportunity
to discuss this, so that is what I am going to do.
There are several questions here: (1) Which readline do we use? (2) Where do
we store it? (3) What do we call it? (4) How do we make the Haskell readline
library build process select the right one? And perhaps (5) How do we
persuade the GHC build process to make the Haskell readline build that
happens as part of building GHC select the right one?
One at a time:
1. Which readline do we use? GNU readline, of course. As opposed to the
readline installed as /usr/include/readline/*.h
and /usr/lib/libreadline.dylib on our PPC Mac OS X machines which are said to
be (and can even be observed to be) symbolic links to something called
libedit and which, to me, has never managed to provide something suitable for
use by GHC. But what is GNU readline, then? I don't exactly know, but my best
guess is something like ftp://ftp.cwru.edu/pub/bash/readline-5.2.tar.gz. I
never tried to install GNU readline directly from this file. On some
occasions, I have installed readline from mac ports. Although I am fairly
confident that what was installed was some version of the GNU readline, I am
not sure. On other occasions, I have installed GNU readline from various
sources related to GHC, sometimes known to me, at other times not.
2. Where do we store readline? I don't know where a readline based on the GNU
download ftp://ftp.cwru.edu/pub/bash/readline-5.2.tar.gz would become
installed (by default). The mac ports version installs by default
at /opt/local/include/readline/*.h and /opt/local/lib/libreadline.*. Various
readlines related to GHC have installed themselves (or were requested to
become installed) as frameworks, this new and different Mac OS X mechanism
for referring to a set of header files and corresponding library. So they have
gone into /Library/Frameworks.
3. What do we call it? Here is where the interesting things start to happen: A
central problem has been the ambiguity caused by Apple's decision to install
symbolic links to the "edit" headers and "edit" library called "readline".
And various mechanisms have been used to work around this problem: (a) If you
have installed a mac ports readline at /opt/local/..., with GHC 6.6 at least,
you were able to use the --with-readline-* options to direct GHC/the library
build process to look in these directories first and thereby avoid the "edit"
library; (b) At some point, a (possibly modified) version of the GNU readline
library appeared, intended to be installed as a framework by the name of
"GNUreadline" (as opposed to the bare "readline" name used earlier). This
avoids the name clash caused by the Apple linking of "readline" to "edit".
The problem that the Haskell readline library now needs to refer to a
framework "GNUreadline" rather than ... (whatever it is that it refers to in
a more Unix'y setting) is solvable. In addition, however, the readline
library (or rather: The GNUreadline library derived from the readline
library) refers to itself using the bare "readline" name, so that has to be
changed also, leading to a need to maintain a complete and slightly modified
version (GNUreadline) of the readline library.
It seems to me that this situation is less than ideal. I mean, in theory,
somebody may come along at some point with some library calling itself
GNUreadline and then we would have to adapt, doing the whole thing all over
again. This manner of avoiding the name clash problem does not seem tenable
in the long run.
Instead, what we should be able to do, is to specify, directly and to the
point, that "readline", wherever we stored it, is what we want.
That possibility does not exist, unfortunately, so we will have to make the
best use that we can of the existing mechanisms, as far as we can figure out
what they are, to get the desired effect. And if it turns out that the
existing mechanisms do not allow us to do what we want, we need to request
extensions and modifications of the mechanisms, until they are able to
support our requirements.
I am not quite sure that I am done with this subject, but let me go on with
4. How do we make the Haskell readline library build process select the right
one? This is where I believe we can do something useful, making the Haskell
readline library more capable in selecting its foundation readline library. I
haven't worked out the details, some discussion is at
http://hackage.haskell.org/trac/ghc/ticket/1395 and related tickets, but I am
quite sure that methods can be found to select the desired readline library,
without resorting to reissuing that library in a changed form and under a new
name. And if this turns out to be absolutely impossible, I would much prefer
pressing for the introduction of mechanisms that make it possible to select
the desired version of the library, removing this impossibility. Rather than
issuing the library under a different name.
Finally:
5. How do we persuade the GHC build process to make the Haskell readline build
that happens as part of building GHC select the right one? Answer: I don't
know. At some point, I did know, that was when the --with-readline-* options
were introduced for the GHC ./configure. Nowadays, I am not sure.
Generally, I believe that it is fine for the GHC build process (whatever
phase) to pass parameters to the build process of some library. But at the
same time, the fact that such passing of parameters takes place must be very
explicitly reported somewhere, in the output of the build process, probably.
Best regards
Thorkil
On Friday 21 December 2007 21:48, John Dorsey wrote:
> (Moving to the cafe)
>
> On a related topic, I've been trying to build 6.8.2 on Leopard lately.
> I've been running up against the infamous OS X readline issues. I know
> some builders here have hacked past it, but I'm looking for a good
> workaround... ideally one that works without changes outside the GHC
> build area (besides installing a real readline).
>
> Here's what I noticed before I started drowning in the build platform.
> (I'm no gnu-configure expert nor GHC insider.)
>
> I can get gnu-readline installed from Macports, no problem.
>
> The top-level configure in GHC doesn't respond to my various attempts:
>
> o using --with-readline-libraries and --with-readline-includes
> (Although it looks like the libraries/readline/configure script
> might recognize these, I can't get an option to pass through.)
> o setting LDFLAGS and CPPFLAGS environment variables (with
> -L/opt/local/lib and -I/opt/local/include resp.) in my shell
> before running configure
> o playing with the above settings and others in a mk/build.mk
>
> Until Apple fixes their broken-readline issue (maybe when the readline
> compatibility of libedit improves)... maybe the top-level configure can
> pass through flags or settings somehow?
>
> For those who've built with readline on OS X: have you had to resort to
> blasting the existing readline library link, or is there a configuration
> option within the GHC tree that you've gotten to work?
>
> Should I be filing a trac bug instead of asking here?
>
> Thanks for any help. There's no urgency for me; I'm just trying to get
> a working environment at home; I'd prefer to be able to bootstrap from
> the ground up; and I'd like to be able to contribute to testing/debugging
> on OSX.
>
> John
>
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe(a)haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
>
I'm back with another version of my cellular automata simulator. Since the
last iteration I discovered GHC's unlifted types and the primitive
operations that go with them. Using these types, rather than Ints, sped my
code up by 2x or more.
http://hpaste.org/4151#a2 -- first half of program
http://hpaste.org/4151#a3 -- remaining (note 3 lines or so from first half
are repeated)
The key observation came from looking at the Core files, which showed a lot
of int2word# and word2int# conversions going on. Figuring out how to remove
those led me to the unlifted types. Coding with these types is not easy (for
example, I couldn't see a way to write a Word# value directly - I had to
write stuff like "int2Word# 1#"), but because I had an existing algorithm to
guide me, combined with type checking, it was a fairly straightforward
implementation.
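On writing a Word# value directly: GHC's MagicHash extension also accepts a `##` suffix for Word# literals, so `int2Word# 1#` can be written as `1##`. This holds for current GHC versions; check the user's guide for the version you use. The sketch below is not taken from Justin's program. `oneWord`, `oneWordViaInt` and `setBitW` are hypothetical names.

```haskell
{-# LANGUAGE MagicHash #-}
import GHC.Exts (Int (I#), Word (W#), int2Word#, or#, uncheckedShiftL#)

-- With MagicHash, a non-negative integer literal followed by ## is a
-- Word# literal, so 1## means the same as int2Word# 1#.
oneWord :: Word
oneWord = W# 1##

oneWordViaInt :: Word
oneWordViaInt = W# (int2Word# 1#)

-- Set bit i of a word using unboxed primops only. The shift amount must
-- be in range (0 to word size - 1); uncheckedShiftL# does no checking.
setBitW :: Word -> Int -> Word
setBitW (W# w) (I# i) = W# (w `or#` (1## `uncheckedShiftL#` i))
```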
At first I felt kind of bad about using these operations, but then I noticed
they are used pretty heavily by GHC itself. If it's good enough for GHC,
it's good enough for me. The 2x performance gain didn't hurt either.
Finally, the safety that comes from using the ST monad is just awesome. I've
got low-level bit munging combined with high-level abstraction where I need
it. So cool!
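Mutable, unboxed state behind a pure interface can be illustrated with `runSTUArray`. This is a hypothetical sketch, not the hpaste code. It computes one step of an elementary (two-state, radius-1) cellular automaton on a ring, and it assumes the common Wolfram rule numbering. The mutable array never escapes `runSTUArray`, so `step` is an ordinary pure function. `step`, `fromCells` and `toCells` are made-up names.

```haskell
import Control.Monad (forM_)
import Data.Array.ST (newArray, runSTUArray, writeArray)
import Data.Array.Unboxed (UArray, bounds, elems, listArray, (!))
import Data.Bits (testBit)
import Data.Word (Word8)

-- One step of an elementary cellular automaton on a ring of cells.
-- The new state of cell i is bit (4*left + 2*centre + right) of the rule.
step :: Word8 -> UArray Int Bool -> UArray Int Bool
step rule cells = runSTUArray $ do
  let (lo, hi) = bounds cells
      n = hi - lo + 1
      -- wrap indices around the ring
      at j = cells ! (lo + (j - lo) `mod` n)
      b2i :: Bool -> Int
      b2i b = if b then 1 else 0
  out <- newArray (lo, hi) False
  forM_ [lo .. hi] $ \i -> do
    let idx = 4 * b2i (at (i - 1)) + 2 * b2i (at i) + b2i (at (i + 1))
    writeArray out i (testBit rule idx)
  return out

-- Helpers for building and inspecting a row from a list of cells.
fromCells :: [Bool] -> UArray Int Bool
fromCells cs = listArray (0, length cs - 1) cs

toCells :: UArray Int Bool -> [Bool]
toCells = elems
```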
I was disappointed to find early on that using higher-order functions in
tight loops was a performance killer. It's unfortunate because I started
with a very elegant, short implementation based on a simple Ring buffer and
map. The current version is certainly harder to understand and has some
weird limitations. However, having the simple implementation let me use
QuickCheck to compare their results on random rules and inputs, which gave
me high confidence that my complex implementation is correct.
One thing I absolutely love about this program is its memory performance. It
manages to stay within 1 - 10 MB of memory, depending on how much output is
produced. How cool is that?
On Dec 3, 2007 2:44 AM, Mirko Rahn <rahn(a)ira.uka.de > wrote:
> It is interesting, that the naive implementation
>
...
> is only 3 times slower than your quite complex, hard to follow and hard
> to debug implementation.
>
Now the naive implementation is 100x slower, so I don't feel so bad about
this comment any more.
>
> As always, I prefer to write most code in Haskell, quick, easy, nice,
> reasonable fast, ... If speed matters, I switch to some lower level
> language, as you did staying inside Haskell.
>
I have to take exception here - I *want* to write my code in Haskell. If
Haskell isn't fast enough to beat the C implementation, I'd rather find a
way to make my Haskell program faster than switch to some other language.
Justin
I'm trying to get a better handle on eager/strict eval in Haskell, and a
great way to do this is by building up from simple exercises to harder
exercises.
So far I have
exercise 1) add the integers [1..10^6]
(stack overflows if you do a naive fold, as described on wiki)
exercise 2) find the first integer such that average of [1..n] is > 10^6
(solution involves building an accum list of (average,listLength)
tuples. Again you can't do a naive fold due to stack overflow, but in this
case even strict foldl' from Data.List isn't "strict enough"; I had to
define my own custom fold to be strict on the tuples.)
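Here is a minimal sketch of both exercises, assuming GHC with the BangPatterns extension. The names `sumTo`, `SumLen`, `mean` and `firstMeanAbove` are made up for illustration. It shows two ways around foldl' evaluating its accumulator only to weak head normal form: a data type with strict fields, and an explicit loop with bang patterns.

```haskell
{-# LANGUAGE BangPatterns #-}
import Data.List (foldl')

-- Exercise 1: foldl' forces the running sum at each step, so no chain
-- of (+) thunks builds up the way it can with plain foldl.
sumTo :: Integer -> Integer
sumTo n = foldl' (+) 0 [1 .. n]

-- foldl' only evaluates the accumulator to weak head normal form. For a
-- lazy pair that is just the (,) constructor, so both fields would still
-- accumulate thunks. Strict fields force them at every step.
data SumLen = SumLen !Double !Int

mean :: [Double] -> Double
mean xs = total / fromIntegral len
  where
    SumLen total len = foldl' add (SumLen 0 0) xs
    add (SumLen s l) x = SumLen (s + x) (l + 1)

-- Exercise 2: first n such that the mean of [1..n] exceeds the limit,
-- as an explicit loop whose bang patterns keep sum and count evaluated.
firstMeanAbove :: Double -> Int
firstMeanAbove limit = go 0 0
  where
    go :: Double -> Int -> Int
    go !s !n
      | n > 0 && s / fromIntegral n > limit = n
      | otherwise = go (s + fromIntegral (n + 1)) (n + 1)
```

The mean of [1..n] is (n+1)/2. It exceeds 10^6 exactly when n > 1999999, so `firstMeanAbove (10^6)` should return 2000000.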
anybody got other suggestions, or links to places where eager eval is
required to solve simply stated problems? or exercises that demystify
doing eager IO/eager whatever monad, where that is required?
Also am I correct that the terms eager and strict can be used more or less
interchangeably in this problem space?
Tired of this folk wisdom that Haskell is only for the elite because
getting around stack overflow from lazy eval is impossible to teach to
newbies.
t.