Fwd: Can I use String without "" in ghci?

On Mon, Sep 2, 2013 at 10:43 AM, Richard A. O'Keefe wrote:
On 2/09/2013, at 3:55 PM, Rustom Mody wrote:
On Mon, Sep 2, 2013 at 5:43 AM, Richard A. O'Keefe wrote:
A slogan I have programmed by since I first met C and recognised how vastly superior to PL/I it was for text manipulation _because_ it didn't have a proper string type is "Strings are Wrong!".
C rode to fame on the back of Unix. And Unix's innovation – one of many – is that at the OS level the string type was made common fare – a universal type. So everything from file names to file contents to IPC is a string.
The idea of file names being strings was no innovation. Yes, in crippled monstrosities like TOPS-10 file names were weird records -- I can still remember too much of the details -- and every ruddy TOPS-10 program had to do its own file name parsing and it seemed as if they all did it differently. But the B6700 MCP interfaces treated file names as strings before UNIX was dreamed of.
File contents in UNIX are *not* strings and never have been -- NUL termination is no part of files and binary files have been commonplace since the beginning (an a.out file is not a string!). They are *byte arrays*.
As for IPC, since when have System V shared memory, semaphores, or message queues had anything to do with strings? (Hint: the 'name' of a System V shared memory segment is a key_t, and that's an integral type, not a string. Hint: the 'name' of a System V semaphore is also a key_t integer, not a string. Hint: the 'name' of a System V message queue is also a key_t integer, not a string. Hint: messages sent using msgsnd are not strings, they are byte arrays with a separate count parameter. )
Whoops! my bad -- I was *thinking* 'pipes' but ended up *writing* 'IPC'
:-)
So let me restate more explicitly what I intended -- pipes, FIFOs, sockets,
etc.
IOW read/write/send/recv calls and the mathematical model represented by
the (non-firstclass) pair of C data structures in those functions:
Classic UNIX uses strings for file names, and really, that's it. (The command line argv[] is not really an exception, because it was used for file names as well as options, and in fact mixing the two up caused endless problems.) Everything else in V7, S3, or SysV was identified by a *number*. Plan 9 has exit(string) but Unix has exit(byte).
From the perspective of someone who used UNIX v6 in 1979, *POSIX* IPC -- with its IPC objects (which *might* be in the file system but then again might *not* be, so their names are sorta-kinda-like file names but not really) -- and /proc are recent innovations.
The idea that 'string' was even remotely like a "universal type" in UNIX is bizarre.
Heck, UNIX never even used 'string' for *lines* in text files!
Of course when instructing a beginning programmer your basic premise 'Strings are Wrong!' is most likely right.
No, I'm talking about experienced programmers writing high performance programs.
However if programs are seen as entities interacting with an 'external' world, the currency at the portals is invariably string.
- The currency at the portals is *not* invariably string. Learn PowerShell.
- "Text" is one thing and "string" is another. This was the B6700 lesson (well, really the B5500 lesson): for many purposes you want a text *stream* not a text *string* at the interface. It's also the what-Smalltalk-got-right-and-Java-got-wrong lesson: the right way to convert objects to text is via a *stream* interface, not a *string* interface.
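The stream-versus-string point about converting objects to text has a loose analogue in Haskell's own Prelude: `ShowS` (a function `String -> String`) lets each value say what to emit ahead of "the rest of the output" instead of returning a finished string. The sketch below is only an illustration of that idea; the `Expr` type and the names `renderString` and `renderStream` are hypothetical and come from neither poster.

```haskell
-- Hypothetical expression type, used only for illustration.
data Expr = Lit Int | Add Expr Expr | Mul Expr Expr

-- String interface: every node returns a finished String, so the text of a
-- subexpression is copied again by (++) at each enclosing level.
renderString :: Expr -> String
renderString (Lit n)   = show n
renderString (Add a b) = "(" ++ renderString a ++ " + " ++ renderString b ++ ")"
renderString (Mul a b) = "(" ++ renderString a ++ " * " ++ renderString b ++ ")"

-- Stream-style interface: every node is a function that prepends its text
-- to whatever output follows (ShowS = String -> String).
renderStream :: Expr -> ShowS
renderStream (Lit n)   = shows n
renderStream (Add a b) =
  showChar '(' . renderStream a . showString " + " . renderStream b . showChar ')'
renderStream (Mul a b) =
  showChar '(' . renderStream a . showString " * " . renderStream b . showChar ')'
```

In ghci, `renderStream e ""` yields the same text as `renderString e`; the difference is in how the text is assembled, not in what comes out.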
I realize this is a terminology issue:
My usage of terminology like string/file are evidently more aligned to http://en.wikipedia.org/wiki/Vienna_Development_Method#Collections:_Sets.2C_... file(chap 4): http://red.cs.nott.ac.uk/~rxq/files/SpecificationCaseStudies.pdf
Contrariwise 'file' can mean http://en.wikipedia.org/wiki/Data_set_%28IBM_mainframe%29
So coming back from terminology to principles...
And more than just noob programmers have got this wrong – think of the precious one-byte opcodes that Intel wastes on ascii and decimal arithmetic.
Hang on, they are there in order to *support* the "numbers are text" model. You can't have it both ways.
So let me restate (actually I didn't state it earlier!) my point in this example:
When Intel introduced these instructions in 8008 (or whatever) decades ago, it seemed like a good idea to help programmers and reduce their burden by allowing them to do some minimal arithmetic on data without burdensome conversion-to-binary functions.
4 decades on and (Intel's very own Gordon) Moore's law ensuring our machines and networks are some 7 orders of magnitude larger, the cost-equations look different. printf and scanf are a basic given in any C library so optimizing them out does not optimize anything.
On the other hand having instructions -- that too 1-byte instructions -- that are almost never used is terribly inefficient:
- the extra transistors in the millions of CPUs that are never used
- the instructions that are used become fatter. Multiply by the GBs per installation multiplied by millions of installations.
And so Intel made (makes) the same mistake that the typical programmer-noob makes, which you capture pithily in your 'Strings are wrong!'
Put slightly more verbosely:
Strings (or byte-arrays if you prefer) are invariably what come into and go out of your program. Keep them that way and you reduce your work in the immediate term but increase it in the long term since it is almost always a bad fit for both problem-domain and machine-model.
Building appropriate models is the central business of programmers.
Brings me to the OP's question:
I want to know if it is possible that I use strings without "".
If I type *Prelude>foo bar* which actually I mean *Prelude>foo "bar"* However I don't want to type ""s.
I have noticed if *bar* is predefined or it is a number, it can be used as arguments. But can other strings be used this way? Like in bash, we can use *ping 127.0.0.1* where *127.0.0.1* is an argument.
If not, can *foo* be defined as a function so that it recognize arguments like *bar* as *"bar"*?
It's not clear what your use-case is.
1. In the simplest and most common case, a few data declarations will give you what you want. E.g. if in place of bar you have (a few) colors like red, green etc. you can do
    data Color = Red | Green...
and your strings like "red" "green" will disappear and become Red Green etc.
IOW if your "bar" and like strings are a small enumerable set then make a corresponding enumerated type. Thereafter Haskell will do the quoting for you.
For the IP example you don't even need a data; just type
    type IP = (Int,Int,Int,Int)
or if you prefer
    type IP = (Word8,Word8,Word8,Word8)
and then the shell 127.0.0.1 would be the Haskell (127,0,0,1).
Sometimes though one may prefer to be a little pedantic and write
    newtype IP = IP (Word8,Word8,Word8,Word8)
Note the exact fit of the Word8 to the IP spec. And the corresponding non-exact fit with the string form: what happens when you see the string "500.1000.1.2"? You need to decide...
Beyond that, as others have shown, you may want to consider building your own DSL, perhaps using Template Haskell.
Beyond that... are you sure shell is not what you want? I.e. if you need full Haskell power, and typical shell quoting behavior, and you have thought through a design that is sound and consistent (note Albert's refs above to $ usage in shell) then I (and I guess many others here) will want to hear of it!!
However for starters you probably don't want to go beyond just defining well-fitting datas and types, and many of the ".." will vanish from your code.
Rusi
--
http://www.the-magus.in
http://blog.languager.org
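The data/newtype suggestion in the message above can be sketched as follows. This is a minimal illustration, not the poster's code: the third constructor `Blue`, the function `paint` (a stand-in for the OP's `foo`), and the helpers `splitDots`, `octet` and `parseIP` are all hypothetical names. The parser shows one possible answer to the "you need to decide" question: a string such as "500.1000.1.2" is rejected rather than silently wrapped into Word8 range.

```haskell
import Data.Word (Word8)
import Text.Read (readMaybe)

-- A small, enumerable set of names becomes an enumerated type,
-- so ghci accepts `paint Red` with no quotes.
data Color = Red | Green | Blue
  deriving (Show, Read, Eq, Enum, Bounded)

-- Hypothetical stand-in for the OP's `foo`.
paint :: Color -> String
paint c = "painting " ++ show c

-- The "pedantic" form from the message: four octets, each exactly a Word8.
newtype IP = IP (Word8, Word8, Word8, Word8)
  deriving (Show, Eq)

-- Split on '.' by hand so that only the base library is needed.
splitDots :: String -> [String]
splitDots s = case break (== '.') s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitDots rest

-- Parse one octet; values outside 0..255 are rejected, not wrapped.
octet :: String -> Maybe Word8
octet s = do
  n <- readMaybe s :: Maybe Integer
  if n >= 0 && n <= 255 then Just (fromIntegral n) else Nothing

-- Exactly four valid octets are required.
parseIP :: String -> Maybe IP
parseIP s = case mapM octet (splitDots s) of
  Just [a, b, c, d] -> Just (IP (a, b, c, d))
  _                 -> Nothing
```

With these definitions loaded, `parseIP "127.0.0.1"` gives `Just (IP (127,0,0,1))` and `parseIP "500.1000.1.2"` gives `Nothing`.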

On 3/09/2013, at 10:44 PM, Rustom Mody wrote:
Whoops! my bad -- I was *thinking* 'pipes' but ended up *writing* 'IPC' :-)
So let me restate more explicitly what I intended -- pipes, FIFOs, sockets, etc. IOW read/write/send/recv calls and the mathematical model represented by the (non-firstclass) pair of C data structures in those functions:
(or count).
Yes, but none of these have anything to do with strings.
"string" has a precise meaning in C:
7.1.1#1
A string is a contiguous sequence of characters
terminated by and including the first null character. The
term multibyte string is sometimes used instead to emphasize
special processing given to multibyte characters contained
in the string or to avoid confusion with a wide string. A
pointer to a string is a pointer to its initial (lowest
addressed) character. The length of a string is the number
of characters preceding the null character and the value of
a string is the sequence of the values of the contained
characters, in order.
7.1.1#6
(same as #1 but string->wide string and character->wide character)
If you are going to claim Humpty-Dumpty's privilege,
we cannot have a meaningful discussion.
Let me propose a more general definition of "string" which is
consistent with the three kinds of string natively supported by
the C
As an aside: modern usage types the buf as void * . In the version 7 unix manuals on which I grew up (and the first edition of K&R), there was no void; buf would be just 'char *buf; '
Version 7 did have void but did not have void *. Since void * and char * are required to have identical representations, this is a distinction without a difference. The point of the change was simply that any object pointer type can be converted to or from void * **without a cast**; using void * here is just POSIX telling the C compiler not to do any serious type checking.
I realize this is a terminology issue:
My usage of terminology like string/file are evidently more aligned to http://en.wikipedia.org/wiki/Vienna_Development_Method#Collections:_Sets.2C_... file(chap 4): http://red.cs.nott.ac.uk/~rxq/files/SpecificationCaseStudies.pdf
No, your use of 'string' is *not* well aligned with VDM. "For example, the type definition String = seq of char defines a type String composed of all finite strings of characters." This is precisely the way I am using "string": a *completed* (hence *finite*) sequence of characters that can be traversed more than once and in more than one way (s(i) in VDM). pipes and FIFOs and sockets might or might not be finite. The output of the classic UNIX 'yes' command is not bounded, for example. The distinction between strings and streams is a very important one. I have seen a programming language standards committee try to demand that any "file" should be able to answer its length, apparently unaware that in UNIX /dev/tty is a "file" but has no definite "length" (and even the read()-returns-zero-at-EOF hack doesn't work; having received such a signal you can just keep on reading).
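The completed-versus-unbounded distinction can be seen in a few lines of Haskell. This is only a sketch: the names `greeting` and `yes` are hypothetical, and Haskell's lazy `String` type happens to admit both kinds of value, so the type itself does not record the difference being argued for here.

```haskell
-- A completed, finite sequence: it can be measured, indexed, and
-- traversed as many times and in as many ways as you like.
greeting :: String
greeting = "hello"

-- An unbounded sequence, like the output of the classic `yes` command.
-- A consumer can take a prefix of it, but it has no definite length.
yes :: String
yes = cycle "y\n"
```

`length greeting` and `greeting !! 1` both return promptly, and `take 4 yes` is fine too, but `length yes` never terminates, which is the same trouble as asking /dev/tty for its length.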
Contrariwise 'file' can mean http://en.wikipedia.org/wiki/Data_set_%28IBM_mainframe%29
I have been familiar with IBM data sets for enough years to prefer the spelling with the space in it. They aren't strings, but they are completed multi-traversable sequences of records. Records might or might not be strings. (I have seen that same programming language standards committee try to demand that any "file" should be positionable at an arbitrary byte, and this despite including members who habitually used VM/CMS and others who habitually used VMS.) So it is *true* that the UNIX innovation was to take "BYTE STREAM" as a lingua franca between programs, but it is *false* that it used strings.
So let me restate (actually I didn't state it earlier!) my point in this example:
When Intel introduced these instructions in 8008 (or whatever) decades ago, it seemed like a good idea to help programmers and reduce their burden by allowing them to do some minimal arithmetic on data without burdensome conversion-to-binary functions.
Conversion to binary is not burdensome, and is not the issue. The issue is getting the *flags* right for decimal arithmetic. Intel's 4004 was for calculators. Intel's 8008 was redesigned to be more useful for calculators. Intel's 8080 had actual hardware support for decimal arithmetic (the Auxiliary Carry flag). And the 8086 was intended to be compatible with the 8080 and 8085. Not binary compatible, but source to source assembler translation is supposed to be straightforward. The DAA instruction comes from the 8080.
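To see why the flags matter, here is a small Haskell model of adding two packed-BCD bytes (two decimal digits per byte) with a binary adder followed by a decimal adjust. It is a sketch loosely modelled on what a decimal-adjust step such as DAA does, not a cycle-accurate description of any Intel part; the function name `bcdAdd` and its result shape are assumptions made for illustration.

```haskell
import Data.Bits ((.&.))
import Data.Word (Word8)

-- Add two packed-BCD bytes: plain binary addition, then a decimal adjust
-- driven by the auxiliary carry (out of bit 3) and the carry (out of bit 7).
-- Returns (adjusted result byte, decimal carry out).
bcdAdd :: Word8 -> Word8 -> (Word8, Bool)
bcdAdd x y = (fromIntegral (adj2 .&. 0xFF), needHigh)
  where
    a, b :: Int
    a = fromIntegral x
    b = fromIntegral y
    raw      = a + b                                 -- binary sum
    auxCarry = (a .&. 0x0F) + (b .&. 0x0F) > 0x0F    -- carry out of the low digit
    carry    = raw > 0xFF                            -- carry out of the byte
    acc      = raw .&. 0xFF
    -- Low digit needs +6 if it exceeds 9 or if it overflowed past 15.
    needLow  = auxCarry || (acc .&. 0x0F) > 9
    adj1     = if needLow then acc + 0x06 else acc
    -- High digit needs +6 (i.e. +0x60) if it exceeds 9 or the byte overflowed.
    needHigh = carry || adj1 > 0x9F
    adj2     = if needHigh then adj1 + 0x60 else adj1
```

For example, 19 + 28 as BCD is 0x19 + 0x28 = 0x41 in binary. The low digit looks like a valid "1", so only the auxiliary carry reveals that 9 + 8 overflowed and that 6 must be added to reach 0x47.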
4 decades on and (Intel's very own Gordon) Moore's law ensuring our machines and networks some 7 orders of magnitude larger, the cost-equations look different. printf and scanf are a basic given in any C library so optimizing them out does not optimize anything.
That's a pretty massive non-sequitur. I speeded up my Smalltalk->C compiler by a factor of 2 by eliminating printf(). The reason why printf() is slow has of course *nothing* to do with number conversion: it has to do with run-time parsing of formats.
On the other hand having instructions --that too 1-byte instructions -- that are almost never used is terribly inefficient:
"Terribly" inefficient? I doubt it. I doubt it very much indeed. (Where is Andy Glew when you need him?) One-byte instructions are not really any more precious than others.
- the extra transistors in the millions of CPUs that are never used
The whole 8080 was implemented in about 6000 transistors. The 8086 had about 29000. A quad-core Intel Core i7 has 731,000,000. "The extra transistors" required to support decimal arithmetic on a modern CPU are presumably about 1/100,000th of the total. _This_ is to worry about?
- the instructions that are used become fatter. Multiply by the GBs per installation multiplied by millions of installations.
We have a wide range of techniques to deal with that. In any case, if your program is compiled for 64-bit execution, the decimal instructions aren't _there_ and _don't_ "fatten up" any other instructions.
you capture pithily in your 'Strings are wrong!' Put slightly more verbosely:
Strings (or byte-arrays if you prefer) are invariably what come into and go out of your program.
Nope. *Streams*. And like I said, PowerShell shows that from a practical programming point of view, it _could_ be objects.
Brings me to the OPs question:
I want to know if it is possible that I use strings without "".
If I type Prelude>foo bar which actually I mean Prelude>foo "bar" However I don't want to type ""s.
I have noticed if bar is predefined or it is a number, it can be used as arguments. But can other strings be used this way? Like in bash, we can use ping 127.0.0.1 where 127.0.0.1 is an argument.
If not, can foo be defined as a function so that it recognize arguments like bar as "bar"?
It's not clear what your use-case is.
Now we are in agreement. Alan Perlis, epigram 34: The string is a stark data structure and everywhere it is passed there is much duplication of process. It is a perfect vehicle for hiding information.