Convert String to List/Array of Numbers

Dear All,
I must be stuck on something pretty basic (I am struggling badly with I/O). Let us assume you have a rather simple file mydata.dat (3 columns of integer numbers), see below.

1246191122 1336 1337
1246191142 1336 1337
1246191162 1336 1337
1246191182 1336 1337
1246191202 1336 1337
1246191222 1336 1337
1246191242 1336 1337
1246191262 1336 1337
1246191282 1336 1337
1246191302 1336 1337
1246191322 1336 1337
1246191342 1336 1337
1246191362 1336 1337
1246191382 1336 1337
1246191402 1336 1337
1246191422 1336 1337

Now, my intended pipeline could be: read file as string --> convert to list of integers --> pass it to hmatrix (or try to convert it into a matrix/array). Leaving aside the last step, I can easily do something like

let dat=readFile "mydata.dat"

in the interactive shell and get a string, but I am having problems converting this to a list or anything more manageable (where every entry is an integer number, i.e. something which can be summed, subtracted etc.). Ideally even a list where every entry is a row (a list in itself) would do.
I found online this suggestion http://bit.ly/9jv1WG but I am not sure if it really applies to this case.
Many thanks
Lorenzo

On Wed, Sep 8, 2010 at 10:31 AM, Lorenzo Isella wrote:
in the interactive shell and get a string, but I am having problems in converting this to a list or anything more manageable (where every entry is an integer number i.e. something which can be summed, subtracted etc...). Ideally even a list where every entry is a row (a list in itself) would do.
Well, first of all you can split your input into lines using

lines :: String -> [String]

Then, you can split each line into columns by using

words :: String -> [String]

Now, each of these columns can be converted to an integer by using

read :: Read a => String -> a

So in the end, you'll end up with something of type [[Int]]. Does this help you to go in the right direction?

Cheers! =)

PS: Yes, that link's information sort of applies, but you'll be handling lists of lists (i.e. rows of columns).

--
Felipe.
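Felipe's three steps can be written out with one name per step. This is a minimal sketch. The name parseRows and the two-row sample are illustrative assumptions, and Int is assumed wide enough (1246191122 is below 2^31 - 1, so it fits even a 32-bit Int):

```haskell
-- lines, then words, then read: one binding per step.
parseRows :: String -> [[Int]]
parseRows txt =
  let rows  = lines txt        -- [String]: one string per line
      cells = map words rows   -- [[String]]: each line split on whitespace
  in  map (map read) cells     -- [[Int]]: each cell read as a number

main :: IO ()
main = do
  let sample = "1246191122 1336 1337\n1246191142 1336 1337\n"
      rows   = parseRows sample
  print rows
  print (map sum rows)   -- the entries can now be summed, subtracted, etc.
```

Once the result has type [[Int]], ordinary list functions (sum, zipWith, map) apply directly.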

On Wednesday 08 September 2010 15:31:19, Lorenzo Isella wrote:
Dear All, I must be stuck on something pretty basic (I am struggling badly with I/O). Let us assume you have a rather simple file mydata.dat (3 columns of integer numbers), see below.
1246191122 1336 1337
1246191142 1336 1337
1246191162 1336 1337
1246191182 1336 1337
1246191202 1336 1337
1246191222 1336 1337
1246191242 1336 1337
1246191262 1336 1337
1246191282 1336 1337
1246191302 1336 1337
1246191322 1336 1337
1246191342 1336 1337
1246191362 1336 1337
1246191382 1336 1337
1246191402 1336 1337
1246191422 1336 1337
Now, my intended pipeline could be
read file as string--> convert to list of integers-->pass it to hmatrix (or try to convert it into a matrix/array). Leaving aside the last step, I can easily do something like
let dat=readFile "mydata.dat"
in the interactive shell and get a string,
Not quite. `dat' is the IO action that reads the file; it has type (IO String), not String. In a programme, you'd do something like

main = do
    ... -- argument parsing perhaps
    txt <- readFile "mydata.dat"
    let dat = convert txt
    doSomething with dat
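A fuller version of that skeleton, as a sketch. The command-line handling stands in for "argument parsing perhaps" and is an assumption (the first argument is taken as the file name, falling back to mydata.dat); convert uses the list-of-rows definition Daniel gives below.

```haskell
import System.Environment (getArgs)

-- Pure conversion: one inner list of Integers per line.
convert :: String -> [[Integer]]
convert = map (map read . words) . lines

main :: IO ()
main = do
  -- Assumed argument handling: first argument is the file name.
  args <- getArgs
  let path = case args of
               (p : _) -> p
               []      -> "mydata.dat"
  txt <- readFile path    -- txt :: String, bound inside this do block
  let dat = convert txt   -- dat :: [[Integer]]
  print (length dat)      -- number of rows read
  print (take 3 dat)      -- first few rows
```

Only readFile and the prints touch the outside world; everything about the file format lives in the pure function convert.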
but I am having problems in converting this to a list or anything more manageable (where every entry is an integer number i.e. something which can be summed, subtracted etc...). Ideally even a list where every entry is a row (a list in itself) would do.
Depending on what the result type should be, different solutions are required. The simplest solutions for such a file format are built from

read -- to convert e.g. "135" to 135
lines :: String -> [String]
words :: String -> [String]
map :: (a -> b) -> [a] -> [b]

If you want a flat list of Integers from that file,

convert = map read . words

will do. First, `words' splits the String on whitespace (spaces and newlines), producing a list of digit-strings; those are then read as Integers.

If you want a list of lists, each line its own list inside the top-level list,

convert = map (map read . words) . lines

is what you want.

If you want to convert each line into a different data structure, say (Integer, Double, Int64), the general form would still be

convert = map parseLine . lines

and parseLine would depend on the structure you want. For the above,

parseLine str = case words str of
    (a : b : c : _) -> (read a, read b, read c)
    _ -> error "Bad line format"

would be a solution.

For any but the simplest formats, you should write a real parser to deal with possible bad formatting, though (writing parsers is fun in Haskell).
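The variants above, collected into one module as a sketch. parseLineMaybe and convertTuplesMaybe are not from the thread. They are one small step toward the "real parser" advice, using readMaybe from Text.Read (base 4.6 and later) so that a bad line yields Nothing instead of an error.

```haskell
import Data.Int (Int64)
import Text.Read (readMaybe)

-- Flat list: split on all whitespace (spaces and newlines), read each token.
convertFlat :: String -> [Integer]
convertFlat = map read . words

-- One inner list per line.
convertRows :: String -> [[Integer]]
convertRows = map (map read . words) . lines

-- Each line into a tuple; extra columns are ignored, too few is an error.
parseLine :: String -> (Integer, Double, Int64)
parseLine str = case words str of
  (a : b : c : _) -> (read a, read b, read c)
  _               -> error "Bad line format"

convertTuples :: String -> [(Integer, Double, Int64)]
convertTuples = map parseLine . lines

-- Gentler variant (an addition): exactly three columns, each must parse.
parseLineMaybe :: String -> Maybe (Integer, Double, Int64)
parseLineMaybe str = case words str of
  [a, b, c] -> (,,) <$> readMaybe a <*> readMaybe b <*> readMaybe c
  _         -> Nothing

-- traverse yields Nothing as soon as any line fails to parse.
convertTuplesMaybe :: String -> Maybe [(Integer, Double, Int64)]
convertTuplesMaybe = traverse parseLineMaybe . lines

main :: IO ()
main = do
  let sample = "1246191122 1336 1337\n1246191142 1336 1337\n"
  print (convertFlat sample)
  print (convertRows sample)
  print (convertTuples sample)
  print (convertTuplesMaybe "1 2.5 3\noops")
```

The Maybe version trades the crash for a result the caller must inspect; a parser library would additionally report where the input went wrong.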
I found online this suggestion http://bit.ly/9jv1WG but I am not sure if it really applies to this case. Many thanks
Lorenzo

Hi Daniel,
Thanks for your help. I have a couple of questions left.

(1) The first one is quite down to earth. The snippet below

---------------------------------------------------
main :: IO ()
main = do
    txt <- readFile "mydata.dat"
    let dat = convert txt
    print dat -- this prints out my chunk of data
    return ()

convert x = lines x
-----------------------------------------------

pretty much does what it is supposed to do, but if I use this definition of convert x

convert x = map (map read . words) . lines x

I bump into compilation errors. Is that the way I am supposed to deal with your function?

(2) This is a bit more about I/O in general. I start an action with "do" to read some files, and I define outside the action some functions which are supposed to operate (within the do action) on the read data. Is this the way it always has to be? I read something about monads but did not get very far (and hope that they are not badly needed for simple I/O). Is there a way in Haskell to have the action return to the outside world e.g. the value of dat and then work with it elsewhere? That is what I would do in Python or R, but I think I understood that Haskell's philosophy is different... Am I on the right track here? And what is the benefit of this?

Cheers
Lorenzo

You either need to write:
convert x = (map (map read . words) . lines) x
or you need to write:
convert x = map (map read . words) $ lines x
The original function was written as
convert = map (map read . words) . lines
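To check the claim that these are the same function, a small sketch with all three spellings side by side. The names are made up for the comparison, and the [[Integer]] signatures are an assumption that pins down read's result type.

```haskell
-- Point-free: no argument named.
convertPointFree :: String -> [[Integer]]
convertPointFree = map (map read . words) . lines

-- Whole composition in parentheses, then applied to x.
convertParens :: String -> [[Integer]]
convertParens x = (map (map read . words) . lines) x

-- ($) lets lines x be evaluated first, then fed to the map.
convertDollar :: String -> [[Integer]]
convertDollar x = map (map read . words) $ lines x

main :: IO ()
main = do
  let sample = "1 2 3\n4 5 6"
  print (convertPointFree sample)
  print (convertParens sample == convertPointFree sample
           && convertDollar sample == convertPointFree sample)
```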
The original is in what is called "point free" form. Values are called "points", so leaving out the value makes the function "point free". I think this is one of the most annoying "features" of Haskell, because you can't glance at a function and know how many parameters it takes unless you also know how many parameters each of the functions it uses needs. But that aside, it is very common. Real World Haskell covers it in Chapter 5.
http://book.realworldhaskell.org/read/writing-a-library-working-with-json-da...
Good luck,
Tim
----- Original Message ----
From: Lorenzo Isella
_______________________________________________ Beginners mailing list Beginners@haskell.org http://www.haskell.org/mailman/listinfo/beginners

On Wed, Sep 8, 2010 at 8:06 PM, Tim Perry wrote:
The original is in what is called "point free" form. Values are called "points" so you have left out the value making the function "point free". I think this is one of the most annoying "features" of Haskell because you can't glance at a function and know how many parameters it takes unless you also know how many parameters each of the functions it uses need. But that aside, it is very common. Real World Haskell covers it in Chapter 5. http://book.realworldhaskell.org/read/writing-a-library-working-with-json-da...
Given that the notion of argument number isn't quite right in Haskell and that you should put a type signature on all exported functions which provides more exact information on the function behaviour anyway... I would say that point-free is worth it for the clarity it affords to the accustomed Haskeller (all but the most twisted functions written in point-free style will only take "one" argument anyway). -- Jedaï

On Wednesday 08 September 2010 20:13:17, Chaddaï Fouché wrote:
Given that the notion of argument number isn't quite right in Haskell
Since, strictly speaking, a function always takes exactly one argument. Haskell is like mathematics in that respect. But since saying "a function which takes an argument of type a, returning a function which takes an argument of type b, returning a function which takes an argument of type c, returning ..." is much more cumbersome than saying "a function taking five arguments of types a, b, c, d, e respectively and returning a value of type f", we are using the more convenient, albeit inexact, language habitually. Haskell is like mathematics in that respect too. Be aware however, that the same function may be referred to as a function taking three, four or perhaps six arguments in other circumstances.
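The "one argument at a time" view, as a tiny illustration with a made-up add3:

```haskell
-- Written as if it took three arguments...
add3 :: Int -> Int -> Int -> Int
add3 a b c = a + b + c

-- ...but the arrow associates to the right: Int -> (Int -> (Int -> Int)).
-- Supplying one argument gives back a function of the remaining two.
addOneTo :: Int -> Int -> Int
addOneTo = add3 1

main :: IO ()
main = do
  print (add3 1 2 3)               -- all three at once
  print (addOneTo 2 3)             -- one now, two later
  print (map (add3 1 2) [10, 20])  -- two now, one per list element
```

Whether add3 "takes three arguments" or "takes one and returns a function" is only a matter of how you describe the same type.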
and that you should put a type signature on all exported functions
And also on nontrivial internal functions.
which provides more exact information on the function behaviour anyway... I would say that point-free is worth it for the clarity it affords to the accustomed Haskeller
It takes a bit to get used to (having a mathematical background helps). And point-freeing is not always a win in readability. Judge on a case-by-case basis.
(all but the most twisted functions written in point-free style will only take "one" argument anyway).
Possibly two.

foo = (sum .) . enumFromThenTo 0

hasn't yet clearly crossed the border.
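Unfolding Daniel's example: composing (sum .) after a two-argument function gives a two-argument function. A sketch comparing it with a pointful version (fooPointful is a hypothetical name, and the type signatures are added for clarity):

```haskell
-- Point-free: takes a step a and an upper bound b.
foo :: (Enum a, Num a) => a -> a -> a
foo = (sum .) . enumFromThenTo 0

-- Same function with both arguments named: sum of 0, a, 2a, ... up to b.
fooPointful :: (Enum a, Num a) => a -> a -> a
fooPointful a b = sum [0, a .. b]

main :: IO ()
main = do
  print (foo 2 10 :: Integer)          -- 0 + 2 + 4 + 6 + 8 + 10
  print (fooPointful 2 10 :: Integer)
```

Here the pointful version is arguably easier to read, which is Daniel's case-by-case point.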

On Wed, Sep 08, 2010 at 07:24:12PM +0200, Lorenzo Isella wrote:
Hi Daniel, Thanks for your help. I have a couple of questions left (1) The first one is quite down to earth. The snippet below
---------------------------------------------------
main :: IO ()
main = do
    txt <- readFile "mydata.dat"
    let dat = convert txt
    print dat -- this prints out my chunk of data
    return ()
convert x = lines x
-----------------------------------------------
Looks good. Note that the return () is not necessary, since 'print dat' already has type IO ().
pretty much does what it is supposed to do, but if I use this definition of convert x
convert x = map (map read . words) . lines x
That ought to be

convert = map (map read . words) . lines

or alternatively

convert x = map (map read . words) (lines x)

The dot (.) is function composition, which lets you make "pipelines" of functions. So the first one says "convert is the function obtained by first running 'lines' on the input, and then running 'map (map read . words)' on the output of 'lines'." You can also say explicitly what to do with the input x, as in the second definition. These two definitions are exactly equivalent.
(2) This is a bit more about I/O in general. I start an action with "do" to read some files and I define outside the action some functions which are supposed to operate (within the do action) on the read data. Is this the way it always has to be? I read something about monads but did not get very far (and hope that they are not badly needed for simple I/O).
When you do I/O you are using monads whether you know it or not! But no, you don't need a deep understanding of monads to do simple I/O. In any event, this has nothing to do with monads in general, but is particular to IO. And yes, this is the way it always has to be with I/O: there is no way to "escape", that is, there is no function* with the type

escapeIO :: IO a -> a

The problem is that because of Haskell's laziness, if there were such a function you would have no idea when all the effects (like reading a file, writing to disk, displaying something on the screen) would happen, or they might happen twice, or not at all! Because of Haskell's purity, the compiler is free to reorder and schedule computations however it likes, and throwing side effects into the mix would simply wreak havoc.
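What you do instead of escaping is push the pure function into the action. A sketch using (<$>) (that is, fmap), assuming the file mydata.dat from the thread; loadRows is a hypothetical name:

```haskell
-- Pure: knows nothing about files.
convert :: String -> [[Integer]]
convert = map (map read . words) . lines

-- (<$>) applies a pure function to the result an IO action will produce.
-- The result is still an IO action; nothing has "escaped".
loadRows :: FilePath -> IO [[Integer]]
loadRows path = convert <$> readFile path

main :: IO ()
main = do
  rows <- loadRows "mydata.dat"   -- rows :: [[Integer]] from here on
  print (length rows)
```

The type of loadRows says plainly that it does I/O, while convert's type says it cannot.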
Am I on the right track here? And what is the benefit of this?
The benefit is precise control of side effects, and what is known as "referential transparency": if you have a function of type Int -> Int then you know for certain that it only computes a numerical function. Calling it will never result in things getting written to disk or the screen or anything like that, and calling it with the same input will always give you the same result. This is a very strong guarantee that gives you powerful ways to reason about programs.

-Brent

* Actually, there is, but it is only for use in very special low-level sorts of situations by those who really know what they are doing.

On Wednesday 08 September 2010 19:24:12, Lorenzo Isella wrote:
Hi Daniel, Thanks for your help. I have a couple of questions left (1) The first one is quite down to earth. The snippet below
---------------------------------------------------
main :: IO ()
main = do
    txt <- readFile "mydata.dat"
    let dat = convert txt
    print dat -- this prints out my chunk of data
    return ()
That `return ()' is superfluous; print already has the appropriate type,

print :: Show a => a -> IO ()

return () is only needed to

- fill in do-nothing branches,

    if condition then doSomething else return ()

  or

    case expression of
        pat1 -> doSomething
        pat2 -> doSomethingElse
        _ -> return ()

- convert something to the appropriate type, e.g. if action :: IO ExitCode and you need an IO () in some place, then you use

    action >> return ()
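The three situations in one small program, as a sketch. label, describe, maybeGreet and action are hypothetical names chosen for illustration.

```haskell
import System.Exit (ExitCode (..))

-- Pure part: a label for some numbers, none for the rest.
label :: Int -> Maybe String
label 0 = Just "zero"
label 1 = Just "one"
label _ = Nothing

-- return () fills the do-nothing branch of an if...
maybeGreet :: Bool -> IO ()
maybeGreet condition =
  if condition then putStrLn "hello" else return ()

-- ...and the fall-through alternative of a case expression.
describe :: Int -> IO ()
describe n = case label n of
  Just s  -> putStrLn s
  Nothing -> return ()

-- A made-up action whose result is not ().
action :: IO ExitCode
action = putStrLn "running action" >> return ExitSuccess

-- action >> return () throws away the ExitCode, giving an IO ().
runAction :: IO ()
runAction = action >> return ()

main :: IO ()
main = do
  maybeGreet True
  maybeGreet False
  describe 0
  describe 5
  runAction
  print [1 :: Int, 2, 3]   -- last statement; print already has type IO ()
```

Control.Monad's void is a standard alternative to `>> return ()` for discarding a result.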
convert x = lines x
-----------------------------------------------
pretty much does what it is supposed to do, but if I use this definition of convert x
convert x = map (map read . words) . lines x
I bump into compilation errors. Is that the way I am supposed to deal with your function?
Yes and no. First of all, function application binds tighter than composition, so

convert x = map (map read . words) . lines x

is parsed as

convert x = (map ((map read) . words)) . (lines x)

which gives a type error because (lines x) :: [String], while the composition expects something of type (a -> b) as its second argument. The correct form of convert could be

convert x = (map (map read . words) . lines) x

or

convert x = map (map read . words) . lines $ x

or, point-free,

convert = map (map read . words) . lines

In the latter case, you have to give it a type signature,

convert :: Read a => String -> [[a]]

or disable the monomorphism restriction (the {-# LANGUAGE NoMonomorphismRestriction #-} pragma in the file, or the command-line flag -XNoMonomorphismRestriction); otherwise it'll likely give rise to other type errors.

Once that is fixed, your problems aren't over yet. Then you get compilation errors because the compiler has no way to infer at which type to use read: should it try to read Integers, Bools, ...? Usually, in real code the type can be inferred from the context, at least enough for the defaulting rules to apply. If you pass dat to something expecting [[Bool]], the compiler knows it should use Bool's Read instance; if it's expecting (Num a => [[a]]), it can be defaulted (and will be defaulted to Integer unless you have an explicit default declaration stating otherwise). In the code above, all the compiler can find out is that

dat :: (Read a, Show a) => [[a]]

GHC will compile it if you pass -XExtendedDefaultRules on the command line (or put {-# LANGUAGE ExtendedDefaultRules #-} at the top of the module); then the type variable a will be defaulted to () [which is rather useless]. More realistically, you need to tell the compiler the type of dat:

let dat :: [[Integer]] -- or ((Num a, Read a) => [[a]])
    dat = convert txt
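Putting both fixes together, a sketch of Lorenzo's program that should compile: the signature keeps convert polymorphic, and the local annotation picks Integer. Reading mydata.dat is assumed to succeed, and the first-column sum is just an example use.

```haskell
-- Point-free and polymorphic in the element type; the explicit
-- signature avoids the monomorphism restriction on this binding.
convert :: Read a => String -> [[a]]
convert = map (map read . words) . lines

main :: IO ()
main = do
  txt <- readFile "mydata.dat"
  -- The annotation picks the Read instance; without it a is ambiguous.
  let dat :: [[Integer]]
      dat = convert txt
  print dat
  -- Sum of the first column; take 1 skips blank lines instead of crashing.
  print (sum (concatMap (take 1) dat))
```

Because convert stays polymorphic, the same function can read other element types at other call sites.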
(2) This is a bit more about I/O in general. I start an action with "do" to read some files and I define outside the action some functions which are supposed to operate (within the do action) on the read data.
Yes, you define the functions that do the actual work as pure functions (mostly) and then bind them together in a (preferably small) main function doing the necessary I/O (reading data or configuration files, outputting results).
Is this the way it always has to be? I read something about monads but did not get very far (and hope that they are not badly needed for simple I/O).
To do basic I/O, you don't need to know anything about monads; all you need is a little knowledge of the do-notation.
Is there a way in Haskell to have the action return to the outside world e.g. the value of dat and then work with it elsewhere?
For the few cases where it's necessary, there is such a beast. Its name begins with the word `unsafe', for good reasons (the full name is unsafePerformIO, available from System.IO.Unsafe). When you're tempted to use it, ask yourself "Is this really a good idea?" (like if you're tempted to use goto in C, only more so - sometimes it is, but rarely).
That is what I would do in Python or R, but I think I understood that Haskell's philosophy is different...
Well, you pass it as a parameter to other functions and IO-actions.
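A sketch of that style, with hypothetical helpers columnSums and firstColumnSteps: main reads the file once and hands the parsed rows to a pure function and to an IO action as ordinary arguments.

```haskell
import Data.List (transpose)

convert :: String -> [[Integer]]
convert = map (map read . words) . lines

-- Pure: sum of each column (transpose turns rows into columns).
columnSums :: [[Integer]] -> [Integer]
columnSums = map sum . transpose

-- Pure: differences between consecutive values of the first column.
firstColumnSteps :: [[Integer]] -> [Integer]
firstColumnSteps rows = zipWith (-) (drop 1 firsts) firsts
  where
    firsts = concatMap (take 1) rows

-- An IO action that receives the data as an ordinary argument.
report :: [[Integer]] -> IO ()
report rows = do
  putStrLn ("rows: " ++ show (length rows))
  putStrLn ("column sums: " ++ show (columnSums rows))
  putStrLn ("first-column steps: " ++ show (firstColumnSteps rows))

main :: IO ()
main = do
  txt <- readFile "mydata.dat"
  report (convert txt)   -- the data leaves main only as a parameter
```

With the sample file, the first-column steps would all be 20, since the timestamps advance by 20.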
Am I on the right track here? And what is the benefit of this?
Purity allows some optimisations that can't be done for functions which might have side-effects. And it's much easier to reason about pure (side-effect-free) functions.
Cheers
Lorenzo

On Wed, Sep 08, 2010 at 03:31:19PM +0200, Lorenzo Isella wrote:
Now, my intended pipeline could be
read file as string--> convert to list of integers-->pass it to hmatrix (or try to convert it into a matrix/array). Leaving aside the last step, I can easily do something like
let dat=readFile "mydata.dat"
in the interactive shell and get a string, but I am having problems
Note, this may be a bit misleading! The interactive shell does some special handling of things involving I/O. The type of readFile "mydata.dat" is

readFile "mydata.dat" :: IO String

That is, an *I/O operation which, when performed*, will yield a String. This is not at all the same thing as having a String! In order to get your hands on the String, you will want to do something like this:

do dat <- readFile "mydata.dat"   -- dat :: String
   let mat = parseMat dat
   ... do other stuff with mat ...

parseMat :: String -> [[Integer]]
parseMat = ...

You may want to read http://www.haskell.org/haskellwiki/Introduction_to_IO or, really, any good Haskell tutorial (e.g. LYAH [1] or RWH [2]) will cover this.

-Brent

[1] http://learnyouahaskell.com/
[2] http://book.realworldhaskell.org/
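For the last step of Lorenzo's pipeline (matrix/array), a sketch using Data.Array from the array package, which ships with GHC, instead of hmatrix. toRows and toArray are hypothetical names; toRows reuses Daniel's list-of-rows conversion and additionally drops blank lines, and all rows are assumed to have the same length as the first.

```haskell
import Data.Array (Array, bounds, listArray, (!))

-- Daniel's list-of-rows conversion, skipping blank lines.
toRows :: String -> [[Integer]]
toRows = filter (not . null) . map (map read . words) . lines

-- Pack rows into a 2-D array indexed by (row, column), both from 0.
-- listArray fills in row-major order, matching concat rows.
toArray :: [[Integer]] -> Array (Int, Int) Integer
toArray rows = listArray ((0, 0), (nRows - 1, nCols - 1)) (concat rows)
  where
    nRows = length rows
    nCols = case rows of
              (r : _) -> length r
              []      -> 0

main :: IO ()
main = do
  dat <- readFile "mydata.dat"
  let arr = toArray (toRows dat)
  print (bounds arr)    -- ((0,0),(rows-1,cols-1))
  print (arr ! (0, 0))  -- first timestamp
```

Indexing with (!) is constant time, unlike repeated (!!) on nested lists.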
participants (6)
- Brent Yorgey
- Chaddaï Fouché
- Daniel Fischer
- Felipe Lessa
- Lorenzo Isella
- Tim Perry