Restricted commandline lengths

Hi,

Ticket #19 [1], and maybe some big projects in general, need to be able to invoke ar/ld so that we don't run into command-line length limits. (In theory the compiler as well, but that would require the dreaded deps analysis.) I think this is a major feature we should resolve before GHC 6.6 goes out. It significantly affects some libraries, at least gtk2hs, and I can see real size benefits on some of my own libraries as well.

I posted a patch to fix this in some cases for Windows [2], but upon talking about it with Duncan on IRC, there are a few extra points:

* Some unixy shells apparently have restrictions as well, so this shouldn't be just Windows.
* In my old implementation, I just split the command lines by file count, not according to the actual space the paths take. Is this a major point? It can affect the number of tool invocations quite radically.
* The algorithm should really be parametrisable because it is, I believe, heuristic in all cases.

As for how to proceed on implementing this, I'm a bit unsure.

* Because of Unicode conversion issues, I am not entirely sure we can accurately know the length of the command line, at least on Windows. Assuming the worst case of 4 bytes per character would cost us a lot in the common case.
* I think the common way to work around this is to use xargs with some constant number of params (much like my current algorithms).
* I think the complexity differs between the two tools: with ar we really want to append, but with ld we can do a tree-style build. I haven't benchmarked this.
* This might be useful to generalise into library function(s).

I could use some advice or experiences from other build systems. If nobody else steps up, I'll probably implement some choice (but I'm only prepared to test it on Windows.)

[1] http://hackage.haskell.org/trac/hackage/ticket/19
[2] http://www.haskell.org/pipermail/cabal-devel/2006-May/000006.html

Best regards,
--Esa Ilari Vuokko

On Fri, 2006-08-18 at 01:50 +0300, Esa Ilari Vuokko wrote:
> Hi,
>
> Ticket #19 [1], and maybe some big projects in general, need to be able to invoke ar/ld so that we don't run into command-line length limits. (In theory the compiler as well, but that would require the dreaded deps analysis.) I think this is a major feature we should resolve before GHC 6.6 goes out. It significantly affects some libraries, at least gtk2hs, and I can see real size benefits on some of my own libraries as well.
>
> I posted a patch to fix this in some cases for Windows [2], but upon talking about it with Duncan on IRC, there are a few extra points:
>
> * Some unixy shells apparently have restrictions as well, so this shouldn't be just Windows.
> * In my old implementation, I just split the command lines by file count, not according to the actual space the paths take. Is this a major point? It can affect the number of tool invocations quite radically.
> * The algorithm should really be parametrisable because it is, I believe, heuristic in all cases.
>
> As for how to proceed on implementing this, I'm a bit unsure.
>
> * Because of Unicode conversion issues, I am not entirely sure we can accurately know the length of the command line, at least on Windows. Assuming the worst case of 4 bytes per character would cost us a lot in the common case.
> * I think the common way to work around this is to use xargs with some constant number of params (much like my current algorithms).
> * I think the complexity differs between the two tools: with ar we really want to append, but with ld we can do a tree-style build. I haven't benchmarked this.
> * This might be useful to generalise into library function(s).
I've sent a patch to the list in another email. It takes an xargs approach: it adds an xargs function and uses it for both the ar and ld cases. It calculates the length of the command-line string and uses as many arguments as will fit, given a maximum size.

Doing a minimum number of invocations of ar is quite important, as at least GNU ar is shockingly bad when linking thousands of little .o files into an archive .a file. At the moment, linking GHC's libHSbase.a takes several invocations of ar via xargs and takes >500MB of memory. I did actually submit a patch to binutils to bring this down to a mere 100MB, but that's only in the very latest binutils versions and it's still quite a lot.

So I've put in an arbitrary 32k limit for unix systems, but I think many actually allow more than this, e.g. 128k, so it might be good to find the limit either statically or dynamically.

For ld, since it doesn't do append, I've made it output to a temp file and link in the previously accumulated .o file if it exists. Then it renames the temp file to the final target.

I've tested it on Linux, just with a very small maximum command-line size, and checked that it does invoke ar/ld multiple times and that the resulting binaries work.
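The length-based splitting described above could be sketched roughly as follows. This is not the code from the patch; `chunkArgs` and `xargsPlan` are hypothetical names chosen for illustration, and the sketch assumes that counting one unit per character plus one separating space per argument is a good enough estimate of the command-line length (which, as noted earlier in the thread, may not hold on Windows).

```haskell
-- Hypothetical sketch, not the actual patch.
-- Split the variable arguments into chunks so that the fixed part of the
-- command plus each chunk stays within maxLen characters.
-- Assumption: length is estimated as characters plus one space per argument.
chunkArgs :: Int -> Int -> [String] -> [[String]]
chunkArgs maxLen fixedLen = go
  where
    go []       = []
    -- Always take at least one argument per chunk, even if it alone
    -- exceeds the limit; otherwise we would never make progress.
    go (a : as) = (a : taken) : go rest
      where
        (taken, rest) = fill (fixedLen + 1 + length a) as

    -- Keep taking arguments while the running length fits.
    fill _ [] = ([], [])
    fill used (x : xs)
      | used' <= maxLen = let (t, r) = fill used' xs in (x : t, r)
      | otherwise       = ([], x : xs)
      where
        used' = used + 1 + length x

-- Build one full argument list per tool invocation: the fixed
-- arguments followed by as many variable arguments as fit.
xargsPlan :: Int -> [String] -> [String] -> [[String]]
xargsPlan maxLen fixed args =
  map (fixed ++) (chunkArgs maxLen (length (unwords fixed)) args)
```

With a limit of 15 characters, `xargsPlan 15 ["ar", "q"] ["a.o", "b.o", "c.o", "d.o"]` plans two invocations with two object files each. Splitting by a constant file count instead would ignore how long the individual paths are, which is why it can need noticeably more (or overly long) invocations.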
> I could use some advice or experiences from other build systems. If nobody else steps up, I'll probably implement some choice (but I'm only prepared to test it on Windows.)

It'd be great if you could test this patch on Windows.

Duncan
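The ld handling in this message (link each batch into a temp file together with the previously accumulated object, then rename the temp file to the target) might be planned along these lines. This is only a sketch under assumptions: `Step` and `ldPlan` are hypothetical names, the steps are merely described rather than executed, and the target is assumed not to exist before the first batch.

```haskell
-- Hypothetical sketch, not the actual patch.
-- A step of the incremental link, described as data rather than executed.
data Step
  = RunLd FilePath [FilePath]  -- output file, then input object files
  | Rename FilePath FilePath   -- source path, then destination path
  deriving (Eq, Show)

-- Plan the incremental link: each batch is linked into the temp file,
-- together with the accumulated target from the previous round (if any),
-- and the temp file is then renamed to the target.
-- Assumption: the target does not exist before the first batch.
ldPlan :: FilePath -> FilePath -> [[FilePath]] -> [Step]
ldPlan target tmp batches =
  concat (zipWith step (True : repeat False) batches)
  where
    step isFirst objs =
      [ RunLd tmp (if isFirst then objs else target : objs)
      , Rename tmp target
      ]
```

For two batches, `ldPlan "HSfoo.o" "HSfoo.o.tmp" [["a.o", "b.o"], ["c.o"]]` yields a link of `a.o` and `b.o`, a rename, then a link of `HSfoo.o` and `c.o`, and a final rename. Writing to a temp file first means ld never reads and writes the same file in one invocation.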

Hi Duncan,
On 8/18/06, Duncan Coutts wrote:
> It'd be great if you could test this patch on Windows.
I tested by building a few libs that require a few ar/ld invocations to get built, and used the libs in a program. I used a self-built ghc-6.5 from about last weekend.

If I change the limit to 30k for the command line, it works; 31k doesn't. I am not sure what that depends on. The paths don't contain special characters and my environment is over 3k. I have no idea how to make sense of this. Maybe the limit really should be configurable at the setup build prompt?

Because of some glitch with my ld/ghc linker, I couldn't test the ghci libs beyond confirming that they fail exactly as they do for objects that don't require multiple ld invocations.

Thanks,
--Esa