RE: [Haskell-cafe] Re: ocr'ed version of "The implementation of functional languages"

Matthew
Yes, I'm happy for you to OCR the book, but can I ask that whatever you get be made accessible from my web site, so there's one place people can go to find everything that's available?
What would OCR buy us? Searching, I guess, which is a fantastic plus. Anything else?
Thanks very much for offering to help. I've replied to haskell-café, so everyone knows what's up, but we can now save everyone's bandwidth by narrowing the thread to Ivan, Marnie (who did the original work), you, and me. If anyone else wants to join in, do yell.
Simon
| -----Original Message-----
| From: haskell-cafe-bounces@haskell.org [mailto:haskell-cafe-bounces@haskell.org] On Behalf Of Ivan Boldyrev
| Sent: 31 January 2005 06:48
| To: haskell-cafe@haskell.org
| Subject: [Haskell-cafe] Re: ocr'ed version of "The implementation of functional languages"
|
| On 9005 day of my life Matthew Roberts wrote:
| > I have just embarked on creating a ocr'ed version of the jpeg images
| > that have been made available for "The implementation of functional
| > languages".
|
| I have high-resolution scans of this books. If Simon permits, I will
| create OCR from these scans. But I think, we must obtain author's
| permission before OCR'ing.
|
| PNG version of scans will be released soon. While size of page is
| same, quality is much better, probably it will improve speed of
| scanning.
|
| And compact DjVu version will be released too. Perhaps I could use
| OCR'ed text for creating searchable version of DjVu?
|
| > However, my ocr software is clunky at best and (not surprisingly) it is
| > slow going.
|
| What kind of software do you use?
|
| --
| Ivan Boldyrev
|
| Assembly of a Japanese bicycle requires greatest peace of spirit.
|
| _______________________________________________
| Haskell-Cafe mailing list
| Haskell-Cafe@haskell.org
| http://www.haskell.org/mailman/listinfo/haskell-cafe

"Simon Peyton-Jones" writes:
> What would OCR buy us? Searching, I guess, which is a fantastic plus. Anything else?
- The ability to cut and paste passages into e.g. e-mail.
- Availability for text-only access - e.g. for the vision impaired, or people on low bandwidth connections.
- Possibility of re-typesetting it with e.g. LaTeX, to make nice-looking reprints on a laserprinter in your neighborhood.
Basically, making the text (and not just pictures of it) available empowers its users in a general way -- not unlike publishing the source code to e.g. GHC.
(I'm going to have to read it, just on principle :-)
-kzm
--
If I haven't seen further, it is by standing in the footprints of giants

"Simon Peyton-Jones" writes:
> Matthew
> Yes, I'm happy for you to OCR the book, but can I ask that whatever you get be made accessible from my web site, so there's one place people can go to find everything that's available?
> What would OCR buy us? Searching, I guess, which is a fantastic plus. Anything else?
It's just more convenient to read OCRed books than raster ones. Zoom without interpolation, possibility to convert to whatever format you like without much effort, etc.
> Thanks very much for offering to help. I've replied to haskell-café, so everyone knows what's up, but we can now save everyone's bandwidth by narrowing the thread to Ivan, Marnie (who did the original work), you, and me. If anyone else wants to join in, do yell
I would be happy to participate in the process too.
--
WBR, Victor V. Snezhko
e-mail: snezhko@indorsoft.ru

On 9006 day of my life Victor Snezhko wrote:
> It's just more convenient to read OCRed books than raster ones. Zoom without interpolation,
You haven't seen the book in DjVu format :) BTW, DjVu can contain text, but I haven't learned proper spell yet :) I use free tools, so it may be difficult or impossible.
PS/PDF generated from LaTeX is even better than DjVu, but it is very difficult to produce -- a lot of hand work. If we gather a large team, it will be possible.
--
Ivan Boldyrev
"Assembly of Japanese bicycle require great peace of mind."

Ivan Boldyrev writes:
>> It's just more convenient to read OCRed books than raster ones. Zoom without interpolation,
> You haven't seen the book in DjVu format :) BTW, DjVu can contain
I saw such books, but didn't have enough time to find good viewers. I viewed them with the IE plugin, and didn't like it.
> text, but I haven't learned proper spell yet :) I use free tools, so it may be difficult or impossible.
> PS/PDF generated from LaTeX is even better than DjVu, but it is very difficult to produce -- lot of hand work.
Do you mean creation of TeX sources?
> If we gather a large team, it will be possible.
I think even if we don't gather a team that is large enough, we should create the best version possible. It will be slower and more time-consuming, but who cares, the book is already 18 years old :)
--
WBR, Victor V. Snezhko
e-mail: snezhko@indorsoft.ru

On 9006 day of my life Victor Snezhko wrote:
>> You haven't seen the book in DjVu format :) BTW, DjVu can contain
> I saw such books, but didn't have enough time to find good viewers. I viewed them with IE plugin, and didn't like it.
http://sourceforge.net/projects/windjview
>> PS/PDF generated from LaTeX is even better than DjVu, but it is very difficult to produce -- lot of hand work.
> Do you mean creation of TeX sources?
LaTeX is a macro package for TeX, so yes :)
--
Ivan Boldyrev
"Assembly of Japanese bicycle require great peace of mind."

On Mon, Jan 31, 2005 at 05:44:05PM +0600, Victor Snezhko wrote:
> Ivan Boldyrev writes:
>>> It's just more convenient to read OCRed books than raster ones. Zoom without interpolation,
>> You haven't seen the book in DjVu format :) BTW, DjVu can contain
DjVu can contain the scanned image, ps or whatnot and a selectable invisible (at least by default) OCR version that will be used when selecting text.
> I saw such books, but didn't have enough time to find good viewers. I viewed them with IE plugin, and didn't like it.
The qt viewer (and plugin) is also quite horrible, I'm afraid.
>> text, but I haven't learned proper spell yet :) I use free tools, so it may be difficult or impossible.

Harri Haataja writes:
>> I saw such books, but didn't have enough time to find good viewers. I viewed them with IE plugin, and didn't like it.
> The qt viewer (and plugin) is also quite horrible, I'm afraid.
windjview is not quite so bad... At least I don't have to run IE in order to use it :) For Linux I have heard about djvulibre, but haven't tested it yet...
>>> text, but I haven't learned proper spell yet :) I use free tools, so it may be difficult or impossible.
I downloaded many books there a year and a half ago :) Have they opened it again? Didn't know, I will look there again... AFAIR, about a year ago they restricted downloading of djvu's to the IP address that submitted the original images. (The history of djvus was available, but for 24 hours only...)
--
WBR, Victor V. Snezhko
EMail: snezhko@indorsoft.ru

On 9007 day of my life Harri Haataja wrote:
>>> text, but I haven't learned proper spell yet :) I use free tools, so it may be difficult or impossible.
Unfortunately, creating a script for automated usage of this site (there are 400+ pages!) is too complicated. And my Internet connection is too limited. Creation of a LaTeX version is much better.
--
Ivan Boldyrev
Today is the first day of the rest of your life.

On Wed, Feb 02, 2005 at 01:27:30PM +0600, Ivan Boldyrev wrote:
> On 9007 day of my life Harri Haataja wrote:
>>>> text, but I haven't learned proper spell yet :) I use free tools, so it may be difficult or impossible.
> Unfortunately, creating script for automated (there are 400+ pages!) usage of this site is too complicated. And my Inet connection is too limited.
There is a script. First you would only have to turn that set of raster images into a ps(.gz) or pdf document. I believe the site will turn those to djvu. I have done that with a few odd manuals, so I thought that might be one way. It is not guaranteed that the OCR will work or work well. For some things it does and for some it doesn't. But that is a simple way to access the non-free encoders and do the (or a) conversion with relatively little pain. Something to start with maybe.
> Creation of LaTeX version is much better.
Naturally. But as already mentioned, that means much more manual writing work (and/or some other OCR).

On 9006 day of my life Simon Peyton-Jones wrote:
> Matthew
> Yes, I'm happy for you to OCR the book, but can I ask that whatever you get be made accessible from my web site, so there's one place people can go to find everything that's available?
Certainly. This is *your* book. I'm not going to put it somewhere else.
> What would OCR buy us? Searching, I guess, which is a fantastic plus. Anything else?
As others already suggested, we can re-typeset your book with LaTeX and produce an even more compact and good-looking PDF document with searching, copy-pasting, printing, and so on. And you can use the sources for a new improved edition of the book. :)
> Thanks very much for offering to help. I've replied to haskell-café, so everyone knows what's up, but we can now save everyone's bandwidth by narrowing the thread to Ivan, Marnie (who did the original work), you, and me. If anyone else wants to join in, do yell.
It's going to be a large project, so what about creating a new list? :) Can people at haskell.org help?
P.S. I will upload the DjVu version to Marnie's site on Monday.
P.P.S. Perhaps I can perform the OCR myself if I find a way to automate it. I will use the high-res scans, which are much better than JPEGs for OCRing. Give me some days for experiments. I need help with converting to LaTeX and proofreading.
--
Ivan Boldyrev

On Mon, Jan 31, 2005 at 09:56:45AM -0000, Simon Peyton-Jones wrote:
> Matthew
> Yes, I'm happy for you to OCR the book, but can I ask that whatever you get be made accessible from my web site, so there's one place people can go to find everything that's available?
> What would OCR buy us? Searching, I guess, which is a fantastic plus. Anything else?
> Thanks very much for offering to help. I've replied to haskell-café, so everyone knows what's up, but we can now save everyone's bandwidth by narrowing the thread to Ivan, Marnie (who did the original work), you, and me. If anyone else wants to join in, do yell.
An OCRed version might help with my publishing the book via cafepress. The basic problem is that in order to create a pdf from the tiffs, I end up embedding the raw bitmap data (at a very high resolution for decent printing) and end up with a pdf that is way too big for cafepress to handle (even with bitmap compression). I have had some luck with autotrace and other tools to turn bitmaps into outlines, but not any that produced readable output of a suitable size. If the text were OCRed, then I could use outline fonts and considerably improve the printed quality and keep the file size down. I am not sure how easy it will be to integrate the output of the OCR software into an appropriate pdf, but I can try.
John
--
John Meacham - ⑆repetae.net⑆john⑈
participants (6)
- Harri Haataja
- Ivan Boldyrev
- John Meacham
- Ketil Malde
- Simon Peyton-Jones
- Victor Snezhko