
Hello all, Now I have an IO-problem, too. SPOJ problem 41 asks basically to determine whether a directed graph allows a path that uses every edge exactly once. The data from which the graphs are to be constructed are contained in a (huge) file, every item (number of test cases, size of test case, edges) on a single line. I've created my own test case file (much smaller, but already 217 MB, 2.2 million lines) to check my programme's performance. Now profiling reveals that over 90% of time and allocation are consumed by reading the file (line by line, reading the whole file as a lazy String and then chopping that to pieces is rather worse). So how do I quickly 1. read an Integer n from the file, 2. read the next n lines and 3. pass the first and last letter of those to the function that builds and checks the graph? When that is achieved, I could see whether my algorithm is good. Thanks for any help, Daniel -- "In My Egotistical Opinion, most people's C programs should be indented six feet downward and covered with dirt." -- Blair P. Houghton

Hello,
Try Don Stewart's ByteString library
(http://www.cse.unsw.edu.au/~dons/fps.html). It is much faster than
the standard Haskell IO and now has lazy.
-Jeff
On 9/9/06, Daniel Fischer
Hello all, Now I have an IO-problem, too. SPOJ problem 41 asks basically to determine whether a directed graph allows a path that uses every edge exactly once. The data from which the graphs are to be constructed are contained in a (huge) file, every item (number of test cases, size of test case, edges) on a single line. I've created my own test case file (much smaller, but already 217 MB, 2.2 million lines) to check my programme's performance. Now profiling reveals that over 90% of time and allocation are consumed by reading the file (line by line, reading the whole file as a lazy String and then chopping that to pieces is rather worse).
So how do I quickly 1. read an Integer n from the file, 2. read the next n lines and 3. pass the first and last letter of those to the function that builds and checks the graph?
When that is achieved, I could see whether my algorithm is good.
Thanks for any help, Daniel --
"In My Egotistical Opinion, most people's C programs should be indented six feet downward and covered with dirt." -- Blair P. Houghton
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Am Sonntag, 10. September 2006 02:29 schrieben Sie:
Hello,
Try Don Stewart's ByteString library (http://www.cse.unsw.edu.au/~dons/fps.html). It is much faster than the standard Haskell IO and now has lazy.
-Jeff
Yay, that's really an improvement! Depending on the size of the file/graph data, using ByteString.Lazy.Char8 reduces the run time by ranging from 'just a little' to a factor of over 30 -- and some data that made the programme segfault before now run in a couple of seconds. Thanks Bryan, Simon, David, Don & Duncan. However, the programme still spends the overwhelming part of the time doing IO, and I've started thinking about it. The problem spec states that the input file contains about 500 test cases, each given by between 1 and 100,000 lines, each line containing a single word of between 2 and 1000 letters. So the file should be about 12.5G on average. A time limit of 7s is given. So even if we just counted newlines, we would have to scan 1,700 million (1.7*10^9) chars per second. Could any ordinary computer of today do that, using whatever language? Cheers, Daniel
On 9/9/06, Daniel Fischer
wrote: Hello all, Now I have an IO-problem, too. SPOJ problem 41 asks basically to determine whether a directed graph allows a path that uses every edge exactly once. The data from which the graphs are to be constructed are contained in a (huge) file, every item (number of test cases, size of test case, edges) on a single line. I've created my own test case file (much smaller, but already 217 MB, 2.2 million lines) to check my programme's performance. Now profiling reveals that over 90% of time and allocation are consumed by reading the file (line by line, reading the whole file as a lazy String and then chopping that to pieces is rather worse).
So how do I quickly 1. read an Integer n from the file, 2. read the next n lines and 3. pass the first and last letter of those to the function that builds and checks the graph?
When that is achieved, I could see whether my algorithm is good.
Thanks for any help, Daniel --
-- "In My Egotistical Opinion, most people's C programs should be indented six feet downward and covered with dirt." -- Blair P. Houghton

Hello Daniel, Monday, September 11, 2006, 6:05:38 PM, you wrote:
The problem spec states that the input file contains about 500 test cases, each given by between 1 and 100,000 lines, each line containing a single word of between 2 and 1000 letters. So the file should be about 12.5G on average.
are you mean arithmetic or geometric average? ;) -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

On 9/11/06, Daniel Fischer
The problem spec states that the input file contains about 500 test cases, each given by between 1 and 100,000 lines, each line containing a single word of between 2 and 1000 letters. So the file should be about 12.5G on average.
I don't think that that necessarily follows. Although I've never seen the input file, of course, I imagine that many cases are fairly small, but designed to test the accuracy of your algorithm. A few are large (in one way or another) to test the extremes of your algorithm. But the overall size of the input file is probably much, much smaller than that estimate. (Maybe 1MB? Maybe 10MB?)
A time limit of 7s is given.
That's CPU time, and thus not including I/O, right? I couldn't find the answer on the SPOJ site. Their FAQ is a forum, but I don't see it there: http://www.spoj.pl/forum/viewforum.php?f=6&sid=6c8fb9c3216c3abd1e720f8b4b5682b3 In any case, the problem can't be too large; the top twenty programs all finished in under 0.35 (CPU seconds?). Even if yours is a tenth as fast as the C and C++ programs, that would be 3.5s -- twice as fast as it needs to be. http://www.spoj.pl/ranks/WORDS1/ Of course, you don't have access to these other programs for comparison; but I hope that this gives you a better idea of the size (and manageability) of the task. Good luck with it! --Tom Phoenix

Am Montag, 11. September 2006 18:22 schrieben Sie:
On 9/11/06, Daniel Fischer
wrote: The problem spec states that the input file contains about 500 test cases, each given by between 1 and 100,000 lines, each line containing a single word of between 2 and 1000 letters. So the file should be about 12.5G on average.
I don't think that that necessarily follows. Although I've never seen Not necessarily of course. But as the problem contains a warning about large input/output data, I assumed the worst reasonable case (uniform distribution).
the input file, of course, I imagine that many cases are fairly small, but designed to test the accuracy of your algorithm. A few are large (in one way or another) to test the extremes of your algorithm. But the overall size of the input file is probably much, much smaller than that estimate. (Maybe 1MB? Maybe 10MB?)
That'd be peanuts with ByteString, 1MB even without.
A time limit of 7s is given.
That's CPU time, and thus not including I/O, right? I couldn't find
Doesn't IO use the CPU? But seriously, I would have thought that's what 'time' lists under user and that includes IO-time as far as I can tell.
the answer on the SPOJ site. Their FAQ is a forum, but I don't see it there:
http://www.spoj.pl/forum/viewforum.php?f=6&sid=6c8fb9c3216c3abd1e720f8b4b56 82b3
In any case, the problem can't be too large; the top twenty programs all finished in under 0.35 (CPU seconds?). Even if yours is a tenth as fast as the C and C++ programs, that would be 3.5s -- twice as fast as it needs to be.
http://www.spoj.pl/ranks/WORDS1/
Of course, you don't have access to these other programs for comparison; but I hope that this gives you a better idea of the size (and manageability) of the task.
Good luck with it!
--Tom Phoenix
-- "In My Egotistical Opinion, most people's C programs should be indented six feet downward and covered with dirt." -- Blair P. Houghton
participants (4)
-
Bulat Ziganshin
-
Daniel Fischer
-
jeff p
-
Tom Phoenix