Still shrinking YHI for ARM7...question

Hi all, So far I've been able to build YHI under the ARM7 toolchain (with a target size of 148KB) after making several small changes: In particular I have removed all directory related primitive functions from Sytem.c and WrapPrimitive, commented out threading code for locks, mutexes, and semaphores, removed thread.c, and also removed all expensive printf statements (can't use printf on target). Also Platform.h - #define NO_LIBFFI and NO_LIBGMP have been selected and - #define NO_SHARED has been selected So far this all compiles fine, and will fit into ROM but I have yet to test it...I would like to figure out what else I can skim...I really want to be able to run it from RAM (64KB) eventually. My question at the moment is whether I can remove stuff like pretty.c, and sanity.c (and possibly jonkers) without having a major breakdown of the system. TARGET := yhi_nxt C_VM_SOURCES := \ $(VM_DIR)/main.c \ $(VM_DIR)/basepath.c \ $(VM_DIR)/foreign.c \ $(VM_DIR)/heap.c \ $(VM_DIR)/info.c \ $(VM_DIR)/iofuncs.c \ $(VM_DIR)/mark.c \ $(VM_DIR)/mutator.c \ $(VM_DIR)/primitive.c \ $(VM_DIR)/profile.c \ $(VM_DIR)/stable.c \ $(VM_DIR)/external.c \ $(VM_DIR)/hashtable.c \ $(VM_DIR)/hsffi.c \ $(VM_DIR)/integer.c \ $(VM_DIR)/jonkers.c \ $(VM_DIR)/make.c \ $(VM_DIR)/module.c \ $(VM_DIR)/pretty.c \ $(VM_DIR)/process.c \ $(VM_DIR)/sanity.c \ $(VM_DIR)/stopcopy.c \ $(VM_DIR)/Array.c \ $(VM_DIR)/Concurrent.c \ $(VM_DIR)/FFI.c \ $(VM_DIR)/IO.c \ $(VM_DIR)/IORef.c \ $(VM_DIR)/PackedString.c \ $(VM_DIR)/Prelude.c \ $(VM_DIR)/RuntimeAPI.c \ $(VM_DIR)/WrapPrimitive.c \ $(VM_DIR)/System.c \ Thanks, -- </Alexis>

Hi Alexis, Alexis Morris wrote:
Hi all, So far this all compiles fine, and will fit into ROM but I have yet to test it...I would like to figure out what else I can skim...I really want to be able to run it from RAM (64KB) eventually.
My question at the moment is whether I can remove stuff like pretty.c, and sanity.c (and possibly jonkers) without having a major breakdown of the system.
$(VM_DIR)/profile.c \ $(VM_DIR)/stable.c \ $(VM_DIR)/pretty.c \ $(VM_DIR)/sanity.c \ $(VM_DIR)/stopcopy.c \
These should be safe to remove, indeed stopcopy isn't used at all.
$(VM_DIR)/Concurrent.c \ $(VM_DIR)/IORef.c \ $(VM_DIR)/PackedString.c \ $(VM_DIR)/RuntimeAPI.c \
These should be possible to remove, Concurrent may be a bit more tricky (but I think it's all optional).
$(VM_DIR)/Array.c \
It is used in Numeric but you'll probably never used the function that uses it (hence you should be able to remove it).
$(VM_DIR)/WrapPrimitive.c \
You can remove any definitions made redundant by removing other things.
$(VM_DIR)/System.c \
Might be able to trim it down, primExitWith is essential but everything else is not. You might try removing jonkers (and mark) but of course you'll lose garbage collection. Since you've only got a small amount of memory and Haskell programs allocate memory like crazy I suspect the garbage collector is going to be essential. The other question is how are you loading the bytecode program? In standard Yhi it's loaded from the filesystem but I'm guessing the embedded processor doesn't have a file system? In which case I guess you must load it directly from memory some how. If that's true then you might be able to remove some of the loading code (but maybe not, depends on how you do it). After that it starts getting hard to remove much more. Make sure debug information is not added and change gcc from "optimise for speed" to "optimize for size". Hope that helps :-) Tom

Hi Tom,
Thanks alot for the information. Those are great places to start. As far as
loading the bytecode goes, for now I am still looking into that, but will
have to load directly from memory, as you mentioned.
Cheers,
</Alexis>
On 7/24/07, Tom Shackell
Hi Alexis,
Alexis Morris wrote:
Hi all, So far this all compiles fine, and will fit into ROM but I have yet to test it...I would like to figure out what else I can skim...I really want to be able to run it from RAM (64KB) eventually.
My question at the moment is whether I can remove stuff like pretty.c, and sanity.c (and possibly jonkers) without having a major breakdown of the system.
$(VM_DIR)/profile.c \ $(VM_DIR)/stable.c \ $(VM_DIR)/pretty.c \ $(VM_DIR)/sanity.c \ $(VM_DIR)/stopcopy.c \
These should be safe to remove, indeed stopcopy isn't used at all.
$(VM_DIR)/Concurrent.c \ $(VM_DIR)/IORef.c \ $(VM_DIR)/PackedString.c \ $(VM_DIR)/RuntimeAPI.c \
These should be possible to remove, Concurrent may be a bit more tricky (but I think it's all optional).
$(VM_DIR)/Array.c \
It is used in Numeric but you'll probably never used the function that uses it (hence you should be able to remove it).
$(VM_DIR)/WrapPrimitive.c \
You can remove any definitions made redundant by removing other things.
$(VM_DIR)/System.c \
Might be able to trim it down, primExitWith is essential but everything else is not.
You might try removing jonkers (and mark) but of course you'll lose garbage collection. Since you've only got a small amount of memory and Haskell programs allocate memory like crazy I suspect the garbage collector is going to be essential.
The other question is how are you loading the bytecode program? In standard Yhi it's loaded from the filesystem but I'm guessing the embedded processor doesn't have a file system? In which case I guess you must load it directly from memory some how. If that's true then you might be able to remove some of the loading code (but maybe not, depends on how you do it).
After that it starts getting hard to remove much more. Make sure debug information is not added and change gcc from "optimise for speed" to "optimize for size".
Hope that helps :-)
Tom
-- </Alexis>

Alexis Morris wrote:
Hi Tom,
Thanks alot for the information. Those are great places to start. As far as loading the bytecode goes, for now I am still looking into that, but will have to load directly from memory, as you mentioned.
Cheers, </Alexis>
Hmm yes the bytecode loading is going to be tricky I suspect ... You could just convert the bytecode files into C source code representations and then load them using the module loader as usual. The disadvantage of this is it's a bit space inefficient. You use the static memory for the bytecode files and then the bytecode loader will load those into dynamic memory. The method I would recommend would be to split yhi into two parts: the part that loads the bytecode from the file and the part that runs the bytecode. Essentially your 'loader' program would load the bytecode from the file into memory and then output that data as C source code. That C source code could then be compiled and linked in with the main interpreter (this is actually similar to how nhc98 does it). This is rather involved and it depends on quite a bit of knowledge of how Yhc works internally, so I'll describe how I'd do it in detail. You'll have to bear with me if I go over stuff you're already familiar with. Also if you have your own ideas and don't want to do it this way that's great too :-) I suggest you read http://haskell.org/haskellwiki/Yhc/RTS/Heap if you haven't already as that describes the heap layout of Yhc. Essentially the idea is to take the nodes loaded into the heap by the loader and convert them into C source code that does the same thing. Ok lets start in main.c:init /* initialize program */ void init(char* mainMod, Node** mainFunc, Node** _toplevel, FInfo** _driver){ /* inits */ sanity_init(); heap_init(G_options.heapSize); #ifdef HAT hgm_init(mainMod); #endif mod_init(); /* load all globals */ initGlobals(mainMod, mainFunc, _toplevel, _driver); /* initialize the threads system */ yhi_thread_init(); hsffi_init(); /* finished with the module system now */ /* mod_exit(); ... not any more, now we still need it! */ } After calling 'initGlobals' all the modules are stored in the G_modules hashtable in modules.c. You can walk over this hashtable to get a list of all module objects, with something like: void process_modules(){ for (int i = 0; i < G_modules->size; i++){ for (HashLink* p = G_modules->table[i]; p; p = p->next){ Module* mod = (Module*)p->value; process_module(mod); } } } Then you need to process the Objects in each module. The code to do this might be something like: void process_module(Module* mod){ for (int i = 0; i < mod->lookup->size; i++){ for (HashLink* p = mod->lookup->table[i]; p; p = p->next){ ObjectRef* ref = (ObjectRef*)p->value; Object* obj = ref->object; if (!obj){ // then this object was known about but never actually // used so skip it continue; } process_object(mod, p->key, obj); } } Each Object has an Info, or a Node or both. void process_object(Module* mod, Char* name, Object* obj){ if (obj->info){ process_info(mod, name, obj->info); } if (obj->node){ process_node(mod, name, obj->node); } } Looking in node.h we can see that an Info has a tag, this tag says what kind of Info this is. There are three sensible values that you will actually see: I_FINFO, I_XINFO and I_CINFO. void process_info(Module* mod, Char* name, Info* info){ switch(info->tag){ case I_FINFO: process_finfo(mod, name, (FInfo*)info); break; case I_XINFO: process_xinfo(mod, name, (XInfo*)info); break; case I_CINFO: process_cinfo(mod, name, (CInfo*)info); break; default: abort(); } } Okay so FInfo and XInfo are very similar, so much of what applies to one applies to the other. If we look at http://haskell.org/haskellwiki/Yhc/RTS/Heap we can see that an FInfo (or XInfo) is laid out in memory exactly as: +-----------------+ | PInfo 0 / n | size 0, need n +-----------------+ | PInfo 1 / n-1 | size 1, need n-1 +-----------------+ ... +-----------------+ | PInfo n / 0 | size n-1, need 0 +-----------------+ | FInfo .... | function info table i.e. The PInfos associated with that FInfo, directly followed by the FInfo itself. We need to generate C source code that does the same thing. For example for the C source code for the PInfos and FInfo for the Prelude.sum function might look like: PInfo pinfo_Prelude_sum[] = { (PInfo){ { P_INFO }, 0, 1 }, (PInfo){ { P_INFO }, 1, 0 } }; FInfo finfo_Prelude_sum = ... If my memory of C serves me correctly if we compiled and linked that source code it it would load them into memory consecutively (and without gaps). Our function to process finfos would thus look like void process_finfo(Module* mod, Char* name, FInfo* finfo){ // process pinfos process_pinfos(mod, name, finfo->arity, finfo->papTable); // write finfo structure fprintf(cSrcFile, "FInfo finfo_%s_2E%s = (FInfo){\n", mod->name, name); ... write all the finfo fields } Now here I've just printed the mod->name and name, but in actual fact you'd need to escape them first to ensure they were valid C identifiers. This is because the module names could contain '.' and ';'(another separator) and object names could refer to operators (like '+'). The escaping system used by nhc was that any non-alpha-numeric was converted to _XX where XX was the hexadecimal of the ASCII. 'Prelude.+' would thus be: Prelude__2B. I've used this system in my examples but any non-clashing system would do. process_finfo would write out the FInfo structure to the C source file. This would mostly just follow the definition in node.h, but there are a few tricky items in the structure. - papTable: this always points to the first pinfo associated with the finfo (i.e. PInfo 0 / n). - link: just use NULL it's only used inside the GC - module: again use NULL, only used in the reflection API - name: NULL is fine again, only needed for debugging etc. - code: you'll need to allocate a block with the instructions in it like static Byte bytecode_Prelude_2Esum[] = { 0xA4, 0x43, ... }; then output the pointer as &bytecode_Prelude_2Esum - constTable: described below The constant table is used to allow a FInfo to access other constants and functions. Each item in the table is a ConstItem and has an associated const type, you might process the constants as void process_constants(Module* mod, Char* name, FInfo* finfo){ for (int i = 0; i < finfo->numConsts; i++){ ConstType type = finfo->constTypes[i]; ConstItem item = finfo->constItem[i]; if (type == CI_INFO){ process_info_constant(mod, name, (Info*)item); }else if (type == CI_NODE){ process_node_constant(mod, name, (Node*)node); } } } A CI_INFO links to an Info structure of some sort, and a CI_NODE links to a heap node. The constant table needs to contain *links* to the structures in memory so you'll want your generated constant table to look something like: ConstItem consttable_Prelude_2Esum = { &finfo_Prelude_2E_2B, &finfo_Prelude_2Esum }; Fortunately you can do this because every Info that the constant will point to will be a FInfo, XInfo or CInfo. These all store their parent module and their name so you can work out what the reference to them is called. Okay so that does with process_finfo. process_pinfos would be something like void process_pinfos(Module* mod, Char* name, Int num, PInfo* pinfos){ fprintf(cSrcFile, "PInfo pinfo_%s_2E%s[] = {\n", mod->name, name); for (int i = 0; i < num; i++){ fprintf(cSrcFile, "(PInfo){ { I_PINFO }, %d, %d },\n", pinfos[i].size, pinfos[i].need); } fprintf(cSrcFile, "};\n"); } process_cinfo should be easy, but process_xinfo will be hard. The trickiest point is the ffiFunc field. Creating a FFIFunction structure is okay but the 'funcPtr' field in FFIFunction (see hsffi.h) is tricky. You can't output a function pointer as C source so you need the name of the function as it appears in the runtime source code. Your best option is probably to use primitive.c:G_primFuncs, this maps names to function pointers. Using this you could do a reverse lookup from pointers to function names. In the vast majority of cases the name of the function in the source code is the same as the one in G_primFuncs. If it isn't you can change the C function name so it is named the same, you can just use compiler errors from your generated C code to get it to tell you when the names are different. Okay Nodes are the next tricky point, the problem with them is that the garbage collector expects them to be in it's malloced heap area so if you allocate them in static memory it'll have a tantrum. Fortunately the changes to stop it having a tantrum are (I think) fairly small. In mark.c:mrk_isMarked, change if (!(p >= (Node*)G_hpStart && p < (Node*)G_hp)){ ASSERT(p >= (Node*)G_hpStart && p < (Node*)G_hp); } to if (!(p >= (Node*)G_hpStart && p < (Node*)G_hp)){ return true; } in mark.c:mrk_mark ASSERT(p >= (Node*)G_hpStart && p < (Node*)G_hp); to if (!(p >= (Node*)G_hpStart && p < (Node*)G_hp)){ return true; } These two changes make the mark process ignore nodes outside the heap, the final one is: jonkers.c:jonk_thread, change ASSERT(*p >= (Node*)G_hpStart && *p <= (Node*)G_hp); to if (!(*p >= (Node*)G_hpStart && *p <= (Node*)G_hp)){ return; } Now hopefully that should work but the GC is a bit temperamental, so if you can't get it to work then you could try loading the nodes into the heap in a pre-processing pass. As for how to name the generated nodes, I would suggest just using the printed hexadecimal of the pointer. i.e. a Node* might be Node node_0x34a82b = ... The extra complication is 'unusual nodes' such as INode, LongNode, FloatNode, DoubleNode, StringNode. Fortunately you can tell when you've got one of these by looking at the infos in primitive.h. INode's always have the info &G_infoInt, FloatNodes always &G_infoFloat, etc. You can test the info of the node you've got an generate the right one. void process_node(Node* node){ Info* info = NODE_INFO(node); if (info == &G_nodeInt){ process_int_node((INode*)node); }else if (info == &G_nodeFloat){ process_float_node((FloatNode*)node); } ... }else{ // normal zero arg node case fprintf(cSrcFile, ...); } } The only 'special' nodes you'll see are: G_infoInt, G_infoString, G_infoDouble, G_infoFloat and G_infoInteger. So that's pretty evil, but I believe if you do that then you can load bytecode, in very small amounts of memory. I've included a little example of the generated C source code for two simple functions (one FInfo and one XInfo) ... Enjoy ;-) Tom /* this is the C code code that might be generated for the Haskell source sum :: [Int] -> Int sum [] = 0 sum (x:xs) = x + sum xs Of course you'd also need a header file declaring all these items (to allow cyclic definitions) */ // the PInfo and FInfos PInfo pinfo_Prelude_2Esum[] = { (PInfo){ { I_PINFO }, 0, 1 }, // 0 arguments present, 1 remaining (PInfo){ { I_PINFO }, 1, 0 }, // 1 arguments present, 0 remaining }; FInfo finfo_Prelude_2Esum = (FInfo){ { I_FINFO }, pinfo_Prelude_2Esum, // papTable NULL, // link 2, // arity 0, // stack (unused) 0x00, // flags (FFL_NONE, unused for your application) NULL, // module name (ignored) NULL, // function name (ignored) 23, // code size in bytes &bytecode_Prelude_2Esum, // pointer to code 3, // number of constants in the const table &ctypes_Prelude_2Esum, // pointer to constant types &ctable_Prelude_2Esum // constant table items }; // The bytecode for the function UInt8 bytecode_Prelude_2Esum[] = { 0x43, 0xA2, 0x49 .... }; // the constant types UByte ctypes_Prelude_2Esum[] = { C_INFO, C_INFO, C_NODE }; // the constant table ConstItem ctable_Prelude_2Esum[] = { (ConstItem)&finfo_Prelude_2E_2B, // Prelude.+ (ConstItem)&finfo_Prelude_2Esum, // Prelude.sum (ConstItem)&node_0x43A22917, // a node representing the Int '0' }; // the application of sum to zero arguments, this is shared since // it is always the same. Word node_0x45BB1934[] = { (Word)&pinfo_Prelude_2Esum[0] }; // a node representing the Int '0' INode node_0x43A22917[] = (INode){ { (Word)&G_infoInt }, // node info 0 // node value }; /* This is the C code for a primitive function, like Yhc.Primitive.primExitWith */ PInfo pinfo_Yhc_2EPrimitive_2EprimExitWith[] = { (PInfo){ { I_PINFO }, 1, 0 }, (PInfo){ { I_PINFO }, 0, 1 } }; XInfo xinfo_Yhc_2EPrimitive_2EprimExitWith[] = { { I_XINFO }, // info tag pinfo_Yhc_2EPrimitive_2EprimExitWith, // pap table NULL, 1, 0, NULL, NULL, // link, arity, stack, module, name &ffi_Yhc_2EPrimitive_2EprimExitWith }; FFIFunction ffi_Yhc_2EPrimitive_2EprimExitWith[] = { CC_PRIMITIVE, // calling convention (CC_PRIMITIVE for primitive) 1, // arity { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, // arg types (can be ignored) 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, }, 0, // ret type (can also be ignored) &primExitWith, // pointer to C function in builtin/System.c }; // note you could save space here by trimming the FFIFunction to be the bare minimum that you need // in particular the arg types are especially wasteful. // indeed you could even trim the FInfo structure to remove the unneeded items: stack, flags, module and name // (link is necessary however).
participants (3)
-
Alexis Morris
-
Thomas Shackell
-
Tom Shackell