Types as first-class values
Hi, I have run into problems trying to implement a program, and think that I might be traipsing along the edge of ghc design. I have a NL system which uses a lexicon of words annotated with types and semantics. The grammer uses the type information to correctly structure the semantic nodes. The grammer uses simple combinators to manipulate the semantics, thus I have implemented my system as an untyped lambda calculus extended with a single data type. So I have
data Term a = Var String | Lam Var (Term a) | App (Term a) (App Term a) | Sem a -- Dynamic a
A correct sentence will yield a term with only App and Sem. Because I have used the (user-supplied) type information, I know that application of Sem to Sem will be correct (given correct type information). Thus I could have a term
App (Sem f) (Sem g)
where f is of type: s / np and g is of type: np I want to harness the power of haskell in my semantic terms, so currently I am implementing the member of (Sem a) in haskell and loading dynamically with hs-plugins. I have a type
data SemF a = ZSem (SST a) | USem (SST a -> SST a) etc...
This works well, except for the fact that in english some words take up to five arguments, and there are any number of possible combinations of argument types, so a particular word such as probably probably = (s\np)/(s\np) has a type (* -> *) -> (* -> *) so I need a separate data constructor
| U22Sem ((SST a -> SST a) -> (SST a -> SST a))
There are a huge number of possible types, and doing it this way requires instantiating a data constructor for each type. Modification 1: Someone suggested I use GADTs, so I tried implementing with:
data SemG where ZSemG :: SST SNode -> SemG USemG :: (Typeable b) => (b -> SST SNode) -> SemG
(so USem represents functions of type * -> *, (* -> *) -> *, ... but when I go to use the function I can't, because trying to cast gives an ambiguous type variable (I think the technical term is monomorphic restriction): applySem :: SemG -> SemG -> SemG applySem (USemG a) (ZSemG b) = ZSemG $ ((unJust $ cast a) :: SST SNode -> SST SNode) ((unJust $ cast b) :: SST SNode) applySem (BSemG a) (ZSemG b) = USemG $ ((unJust $ cast a) :: Typeable a => SST SNode -> a -> SST SNode) -- this a here is problematic ((unJust $ cast b) :: SST SNode) This code (applySem) is compiled and linked, whereas the functions which are the data in this case are compiled and loaded at run-time. This is frustrating, I want to harness ghcs power without reimplementing. (The reason I want to use haskell in the semantic terms is that I need access to IORefs and other features which would be non-trivial to implement by hand in a lambda interpreter). Why reinvent the wheel? First, am I missing something? If not... Suggestion. The problem occurs at the location of the cast, I am applying two terms, I know they are correct, given that the user supplied syntactic type is correct, but to give each a type annotation by hand in the applySem code would require, again, > 5^4 possibilities. The type information of the semantic term is available at compile time (which in this case is while the main program is running). What I propose is this: two additions, one to template haskell, the other to the language as a primitive. 1) The ability in Template Haskell to provide the type of an expression as a first class value. This would require a hook into the compiler and datatypes to express the type. 2) a castWithType function which, given a type expression and a (Typeable a) expression will attempt to perform the cast to (Maybe a). This would solve my problem, because then when I compile the semantic expressions (while the main program is running) I can include their types as part of the expression. Then, when I go to use the semantic term, I can give the castWithType function the type information and the function to cast. I realise that this is in essence what the typeable class is supposed to provide. My problem is that I need a 'general' casting code, (part of the main run-time) which is capable of applying semantic functions with many possible different types. In summary, in order to avoid hand coding thousands of possible function types, I want to use dynamic casts. I have tried GADTs (which I may not have understood correctly :-). It appears that what I want to do is not possible at the moment. I think that this modification is worthwhile. The reflection abilities it provides (along with Template Haskell) would allow extension of Haskell as opposed to reimplementation. Cheers, Vivian P.S. If I have missed something, please could you let me know, I'm stumped. In which case I apologise for the rant.
Hi, probably, I missed part of your problem, and there are quite a few concepts in your email, so I'll take it slowly. embedding untyped lambda into Haskell is fairly simple, as long as you don't want to print terms, or visualise reductions: data U = In { out :: U -> U } deriving Show this gives you a universal type, an embedding of unary functions into that type, and a projection of unary functions out of that type. so you can keep your haskell implementation busy evaluating embedded lambda terms like this one: (\x-> (out x) x) (In (\x-> (out x) x)) -- keep ready to hit ctrl-c as you say that you also want to represent the application structure, the type gets a little more complex (using the part of your Term type that is the image of correct sentences, implicitly encoding the types as well): -- just local quantification, no GADTs needed -- the type parameter encodes the type of the semantic object represented data SemTerm a = forall b . App (SemTerm (b->a)) (SemTerm b) | Sem a Unlike your Term type, this SemTerm type only holds type-correct applications and other semantic objects, such as primitive functions from your lexicon. there is no need to distinguish functions of different arities or to encode types explicitly (Z, U??, B??, ..), thanks to currying and embedded typing. so there is no need for an explosion of cases, either in the SemTerm type or in its application. just use Sem to inject your predefined functions/words, and App to apply them to arguments, and evaluation becomes simply: eval :: SemTerm a -> a eval (App f e) = (eval f) (eval e) eval (Sem e) = e so can can have your cake (representation) and eat it (embedded typing and evaluation): rv = Sem (reverse::String->String) tl = Sem tail pp = Sem (++) dot = Sem (.) hi = Sem "hi " there = Sem "there!" x = App rv (App (App (App dot tl) rv) (App (App pp hi) there)) main = print $ (x ,eval x) *Main> main ((<function> (((<function> <function>) <function>) ((<function> "hi ") "there!") )),"hi there") if you actually wanted to have representations of the function types as well, you could use Typeable, but typeOf only works for non-polymorphic types (so you can ask for 'typeOf x' or 'typeOf rv' in the attached code, but you would need to specialise the other function types to disambiguate the computation of type representations, as already done for rv). there are probably other requirements that stand against this representation (?), but at least it solves some of the issues you mentioned. and it uses fewer features (no GADTs, no template haskell). cheers, claus
Vivian McPhail
I have a NL system which uses a lexicon of words annotated with types and semantics. The grammer uses the type information to correctly structure the semantic nodes. The grammer uses simple combinators to manipulate the semantics, thus I have implemented my system as an untyped lambda calculus extended with a single data type.
I don't know if this helps any but... Epigram allows first class types - http://www.dur.ac.uk/CARG/epigram/ Types and functions are nearly the same thing in Epigram. GF (Grammatical Framework) by Aarne Ranta and Markus Forsberg implements typed natural language parsing. http://www.cs.chalmers.se/~aarne/GF/ GF has a large number of nifty features. For example, GF includes morphology and translation quiz commands, so I hacked up a simple WASH plus GF web app to help me learn Swedish. -- It seems I've been living two lives. One life is a self-employed web developer In the other life, I'm shapr, functional programmer. | www.ScannedInAvian.com One of these lives has futures (and subcontinuations!)| --Shae Matijs Erisson
participants (3)
-
Claus Reinke -
Shae Matijs Erisson -
Vivian McPhail