
On Fri, 28 Jan 2005, Jacques Carette wrote:
Henning Thielemann wrote:
This seems to be related to what I wrote yesterday: http://www.haskell.org/pipermail/haskell-cafe/2005-January/008893.html
Yes, very much. Except that rather than trying to tell mathematicians that they are *wrong*, I am trying to see which of their notations I can explain (in a typed way). There will be some 'leftovers', where the notation is simply bad.
I know it's problematic to say that a notation is "wrong", since no one is authorized to say what's "right". Still, if a notation is contradictory, I don't see what's wrong with calling it wrong.

Let me motivate a bit more why I'm interested in mathematical notation. I (and certainly every mathematician) learnt that, for instance, equations are a powerful tool for solving problems. You can model many problems with a mathematical equation, transform the equation using some accepted rules, and eventually you get a simplified equation like 'x=2'. The problem is abstracted: you do the transformations without thinking about an interpretation in the real world. You know that you can trust the rules. If someone shows you that the simplified equation is wrong, you know that you have applied some rule in the wrong way, or that you invented an improper rule.

Many tools have been introduced in the past to make mathematics more convenient, safer, somehow better. Arabic numerals largely replaced Roman numerals because they are more convenient for computation. Variables and formulas were introduced as a more precise alternative to phrases in natural language. Axioms replaced intuitive definitions. But there are still issues which are unsatisfying. E.g. on the one hand calculus and functional analysis "degrade" functions to ordinary objects of mathematics, thus reaching a new level of complexity, but on the other hand the notation is not as properly developed as the theory itself, in my opinion.

Let me give an example: The teacher says that solving a simple linear differential equation is rather easy using separation of variables. Then he transforms the equation formally by some rules he doesn't explain in detail, and I'm curious whether there _is_ a good explanation:

   y' = y
   y' / y = 1        | integrate
   C + ln y = x
   ln y = x - C
   y = exp (x - C)

The student thinks: That's nice, let's try it on some problem I couldn't solve recently:

   cos x = x^2       | differentiate
   -sin x = 2*x

Now, x=0 is an obvious solution of the second equation but not of the first one. The student concludes: if the teacher applies the method, it's right; if I apply it, it's wrong. :-( Then he asks the teacher why the method fails for cos x = x^2. The teacher says: of course what you did cannot work, because y' = y is a differential equation and cos x = x^2 is not. Does this answer help the student? Why doesn't the formula show this difference? What does the difference between 'differential equation' and 'normal equation' mean, essentially?

The technique of differentiation has led to lots of notations. I think this is because many of the existing notations don't model the idea of differentiation very well, so new notations are introduced whenever the known ones fail:

   d x, \partial x, ', \dot{x}, f_x, D

I wonder whether these notations can be unified, and thus simplified. I wonder what formalism I can trust. E.g. I'm sceptical about notations using variables: What is the value of d x for certain values of x? What is the difference between d x and d y? Are f_x(x,y) and f_x(y,x) the same?

My proposal: Consider derivation as a higher order function which maps a function to a function. In Haskell notation it would be

   derive :: (a -> a) -> (a -> a)

More generally, if we view functions (b -> a) as vectors and want to derive a function mapping from vector arguments to scalar values ((b -> a) -> a) (also called functionals), it would have the signature

   deriveVector :: ((b -> a) -> a) -> ((b -> a) -> (b -> a))

E.g.
   deriveVector f x 1

evaluates the partial derivative of f with respect to the first component at the point x, and

   deriveVector f x

is the total derivative, or gradient, at the point x. I know that after introducing higher dimensional differentiation using \partial x, \partial y, d x and d y, students are uncertain about what the total derivative actually is.

Let's turn back to the example and view it through the functional glasses. The differential equation y' = y now means that y is a function and (') is the differentiation operator. With liftF2 (a pointwise lifting, sketched below) I lift (/) to function values and divide both sides by y using the lifted (/), noting that this is only possible if y is nowhere zero:

   derive y = y
   liftF2 (/) (derive y) y = liftF2 (/) y y
   liftF2 (/) (derive y) y = const 1            | integrate c
      (integrate fulfills:  fint = integrate c f  =>  fint 0 = c,
       though this is problematic if the result has a singularity at 0)
   (c1+) . ln . y = (c+) . id                   (with c1 such that c1 + ln (y 0) = c)
   ln . y = ((c-c1)+)                           | (exp .)
   exp . ln . y = exp . ((c-c1)+)
   y = exp . ((c-c1)+)

This is certainly not intuitive if you are familiar with the common notation, but it is a method where I can explain every step; it is a formalism I can trust. And it is important to have such a formalism at hand when you run into a problem with the common abbreviated mathematical style. Now it is also clear why cos x = x^2 can't be solved using differentiation: it is an equation of scalar values, whereas differential equations are equations of functions.
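To make the pointwise lifting explicit, here is a minimal Haskell sketch; liftF2 is the name assumed above, not a standard library function:

   -- lift a binary operation pointwise to functions
   liftF2 :: (a -> b -> c) -> (x -> a) -> (x -> b) -> (x -> c)
   liftF2 op f g = \x -> op (f x) (g x)

   -- e.g. dividing both sides of  derive y = y  pointwise by y:
   --   liftF2 (/) (derive y) y  ==  const 1     (wherever y is nonzero)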
I've collected some examples of abuse and bad mathematical notation: http://www.math.uni-bremen.de/~thielema/Research/notation.pdf
Some of what you point out there is bad notation. Other bits point to some misunderstanding of the issues.
Starting from your original post:
f(x) \in L(\R) where f \in L(\R) is meant
F(x) = \int f(x) \dif x where x shouldn't be visible outside the integral
First, mathematicians like to write f(x) to indicate clearly that they are denoting a function.
f(x) \in L(\R) clearly denotes that f(x) is a function, that is, f(x)(y) is a scalar value. There are situations where it is sensible to claim f(x) \in L(\R), so why make them more complicated by sticking to that notation in the wrong cases? f \in L(\R) is one of the examples where the _right_ notation is shorter, and thus easier to read, than the _wrong_ one. Yes, I call f(x) \in L(\R) _wrong_, because it is commonly agreed that L(\R) is a set of functions and that each element of such a set is a function; thus if you write f(x) \in L(\R), you cannot treat f(x) as a scalar value later on.
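In Haskell types the distinction is immediate. A minimal sketch, where L(\R) is crudely modelled (an assumption of the sketch) by the type Double -> Double:

   type LR = Double -> Double     -- a crude stand-in for L(\R)

   f :: LR                        -- "f \in L(\R)": f itself is the function
   f = \x -> exp (-x^2)

   -- "f(x) \in L(\R)" would literally require
   --   f :: Double -> LR
   -- so that f x is a function and f x y is a scalar value.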
You also write
O(n) which should be O(\n -> n) (a remark by Simon Thompson in The Craft of Functional Programming)
but the only reason for this is that computer scientists don't like open terms. Since the argument to O must be a univariate function whose range is the Reals, whatever is written there must *denote* such a function. The term ``n'' can denote only one such function, \n -> n.
What about (\m -> n)? What does O(n^m) mean? O(\n -> n^m) or O(\m -> n^m)?
Russell's work in the early 1900s on "sense and denotation" is worth reading if you want to learn more about this.
Nice to know.
On your ``notation.pdf'' (link above), you have some other mis-interpretations. On p.10 you seem to think that Mathematica is a lazy language, when it is in fact an eager language.
I know that Mathematica is eager, and that it mixes functional and imperative programming.
However,
   subs(x=2,'diff'(ln(x),x)) ;
   diff(ln(2),2)

where the '' quotes mean to delay evaluation of the underlying function. On the other hand
   eval('diff'(ln(x),x),x=2) ;
   eval('diff'(ln(x),x),x=2)

because it makes no sense to evaluate an open term which introduces a (temporary) binding for one of its variables.
That's what I dislike: to overcome particular problems, patches like quotes, subs and eval are introduced. They are rather hard to combine and hard to nest. I have spent hours trying to express in Mathematica ideas that I had clearly in my mind. We can have a very clean solution without quotes, subs and eval by just considering functions as regular mathematical objects.
Note that without access to terms, it is not possible to write a function like diff (or derive as you phrase it, or D as Mathematica calls it). The mathematician's diff looks like it has signature diff: function -> function, but in fact it is treated more often as having signature diff: function_denotation -> function_denotation.
Fine, let's consider this example in detail. If I write + in Mathematica, I know that the value of the resulting expression will be the sum of the values of the operand expressions. This is true independently of the particular expressions of the operands, and this constraint makes working with algebraic formulas very handy. Whether I write f(4) or f(2+2) in mathematics doesn't matter; both denote the same value, independent of f, because 4 and 2+2 denote the same value. If x=4 then f(x) denotes the same value as f(4). In Haskell this constraint is fulfilled, and it is essential for the simplification rules that GHC applies.

I would like to have this behaviour for derivation, too. So what must the type of the parameter of 'derive' be: a scalar expression with a free variable, or a function expression? The designers of Mathematica and Maple decided for the first alternative. But then the derivation function D breaks the constraint of free combinability: if x=4 then D[x,x] is 0 or an error (I can't check it currently); if x is not bound to a value, it is 1.

What I suggest is that derivation should be limited to functions. If two expressions denote the same function, then D applied to them will yield two expressions that may not be syntactically equal but that denote the same function. Indeed, the implementation of D must know the expression of the function to give the result as a formula, but even if it only knew the set of argument-value pairs it would still work. This is the view of functions which Fourier brought to us; before Fourier, only functions given by analytical terms were considered functions. The definition of the differential quotient doesn't need the particular expression for a function; it doesn't even require that there is an explicit expression. This means that the derived function depends only on the function to be derived! Following this paradigm, Simplify must implement the identity function (identity with respect to the value, not with respect to the expression), and ReplaceAll should be replaced by function application.
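As a concrete illustration, here is a rough numerical sketch in Haskell (the step width h is an artifact of the sketch, and this is of course not Mathematica's symbolic D): it uses only argument-value pairs of the function, so two expressions denoting the same function give (approximately) the same derivative:

   -- central difference quotient; depends only on values of f,
   -- not on the expression that defines f
   derive :: Double -> (Double -> Double) -> (Double -> Double)
   derive h f x = (f (x + h) - f (x - h)) / (2 * h)

   -- derive 1e-6 (\x -> x^2) 3               ~> 6.0
   -- derive 1e-6 (\x -> (x+1)*(x-1) + 1) 3   ~> 6.0  (same function, same result)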
You also seem to assume that set theory is the only axiomatization of mathematics that counts (on p.31).
I'm also aware of Russell's type theory, but my impression is that set theory is the most popular one.
I do not see functions A -> B as somehow being subsets of powerset(A x B). That to me is one possible 'implementation' of functions. This identification is just as faulty as the one you point out on p.14 of the naturals not ``really'' being a subset of the rationals. In both cases, there is a natural embedding taking place, but it is not the identity.
I agree, though I'm not aware of any symbols to express that. :-(
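In Haskell one is forced to write the embedding explicitly; a minimal sketch (the name embedZQ is mine):

   import Data.Ratio ((%))

   -- the canonical embedding of the integers into the rationals:
   -- injective, but not the identity - source and target have different types
   embedZQ :: Integer -> Rational
   embedZQ n = n % 1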
You also have the signature of a number of functions not-quite-right. ``indefinite'' integration does not map functions to functions, but functions to equivalence classes of functions.
Yes, I should add a parameter for the integration constant. An explicit integration constant seems to be a better choice than a set-valued result, since then it is clear that for every real number there is one function.
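A rough numerical sketch of that convention (the trapezoid rule and the helper name are assumptions of the sketch): integrate c f is the antiderivative fint of f with fint 0 = c:

   integrate :: Double -> (Double -> Double) -> (Double -> Double)
   integrate c f x = c + trapezoid 1000 0 x f

   -- trapezoid rule on n subintervals of [a,b] (hypothetical helper)
   trapezoid :: Int -> Double -> Double -> (Double -> Double) -> Double
   trapezoid n a b f =
       h * ((f a + f b) / 2 + sum [f (a + fromIntegral i * h) | i <- [1 .. n-1]])
     where h = (b - a) / fromIntegral n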
Fourier transforms (and other integral transforms) map into functionals, not functions.
The Fourier transform I had in mind is defined as

   F(w) = \int_{\R} e^{-i w x} f(x) \dif x

So F is clearly a function, though I agree that the Fourier transform defined this way is not a total function.
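Viewed that way, the transform is an ordinary higher order function. A crude numerical sketch (truncating the integral to [-r, r] and using a Riemann sum, both assumptions of the sketch, not part of the definition above):

   import Data.Complex

   fourier :: Double -> Double -> (Double -> Complex Double) -> (Double -> Complex Double)
   fourier r step f w =
       sum [ exp (0 :+ (- w * x)) * f x * (step :+ 0)
           | x <- [-r, -r + step .. r] ]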
I hope your audience (for things like slide 35) was made of computer scientists - it is so amazingly condescending to thousands of mathematicians, it is amazing you would not get immediately booted out!
The audience consisted exclusively of mathematicians - my colleagues. Being booted out by people who never thought intensively about what they write, but who only defend their habits, wouldn't be a problem for me. The same colleagues have run into trouble before by trusting formalisms which are odd - though I believe not often enough, so far. But I can wait ... :-)
On p.37, you have polynomials backwards. Polynomials are formal objects, but they (uniquely) denote a function. So polynomials CANNOT be evaluated, but they denote a unique function which can.
Yes, I had better stick to the application homomorphism \varphi which maps a polynomial to a function. That is, p is a polynomial, \varphi(p) is a function, and \varphi(p)(x) is a scalar value.
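A minimal Haskell sketch of that distinction (Poly and evalPoly are names I made up for the sketch):

   newtype Poly a = Poly [a]                 -- formal coefficient list, lowest degree first

   evalPoly :: Num a => Poly a -> (a -> a)   -- this plays the role of \varphi
   evalPoly (Poly cs) x = foldr (\c acc -> c + x * acc) 0 cs

   -- Poly [1,2,1]                is the formal object 1 + 2*X + X^2
   -- evalPoly (Poly [1,2,1])     is the function it denotes
   -- evalPoly (Poly [1,2,1]) 3   == 16, a scalar value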
On p.51 where you speak of ``hidden'' quantifiers, you yourself omit the thereexists quantifiers that are implicit on the last 2 lines -- why?
What do you mean exactly? In the meantime I have found that it is better not to speak about equations but about their sets of solutions. That is, writing, or at least thinking,

   {x : 0 = x^2 + 2*x + 1}   and   {y : forall x . y'(x) = x + sin x}

makes clear what we are searching for, and it explains what care has to be taken with every equation transformation.
The above was meant to be constructive
Thanks for the comments!