ANNOUNCE: text and text-icu, fast and comprehensive Unicode support using stream fusion

On behalf of the Data.Text team, I am delighted to announce the release of
preview versions of two new packages:
text 0.1
    Fast, packed Unicode text support, using a modern stream fusion framework.
    http://hackage.haskell.org/cgi-bin/hackage-scripts/package/text
text-icu 0.1
    Augments text with comprehensive character set conversion support and
    normalization (and soon more), via bindings to the ICU library.
    http://hackage.haskell.org/cgi-bin/hackage-scripts/package/text-icu
These packages fill out critical pieces of functionality for the Haskell
platform, without compromising on either performance or safety.
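
For readers unfamiliar with the term, here is a minimal sketch of the stream fusion idea on ordinary lists. It is not the text package's internal code: the names Step, Stream, stream, unstream, mapS and filterS are illustrative assumptions that follow the usual presentation of the technique. Each operation is a non-recursive step function over a hidden state, so a pipeline of operations can in principle be combined into a single loop without intermediate structures.

```haskell
{-# LANGUAGE ExistentialQuantification #-}
module Main where

import Data.Char (isAlpha, toUpper)

-- One step of a stream: finished, no element this time, or an element.
data Step s a = Done | Skip s | Yield a s

-- A stream is a step function together with a hidden current state.
data Stream a = forall s. Stream (s -> Step s a) s

-- Convert a list into a stream (illustrative; text would stream its array).
stream :: [a] -> Stream a
stream xs0 = Stream next xs0
  where
    next []       = Done
    next (x : xs) = Yield x xs

-- Run a stream back into a list. This is the only recursive function here.
unstream :: Stream a -> [a]
unstream (Stream next s0) = go s0
  where
    go s = case next s of
      Done       -> []
      Skip s'    -> go s'
      Yield x s' -> x : go s'

-- Map over a stream without recursion: only the step function changes.
mapS :: (a -> b) -> Stream a -> Stream b
mapS f (Stream next s0) = Stream next' s0
  where
    next' s = case next s of
      Done       -> Done
      Skip s'    -> Skip s'
      Yield x s' -> Yield (f x) s'

-- Filter a stream; rejected elements become Skip steps.
filterS :: (a -> Bool) -> Stream a -> Stream a
filterS p (Stream next s0) = Stream next' s0
  where
    next' s = case next s of
      Done       -> Done
      Skip s'    -> Skip s'
      Yield x s'
        | p x       -> Yield x s'
        | otherwise -> Skip s'

-- A pipeline written entirely in terms of stream operations.
upperLetters :: String -> String
upperLetters = unstream . mapS toUpper . filterS isAlpha . stream

main :: IO ()
main = putStrLn (upperLetters "text 0.1")
```

In a real library, rewrite rules remove adjacent unstream/stream pairs so that chained operations share one traversal; the sketch above shows only the data types that make this possible.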
We are referring to these as *preview* releases because although the text
package in particular has been quite heavily tested, it has not been
thoroughly tuned, and we have not yet implemented a chunked lazy text
representation suitable for streaming gigabytes of data. The APIs are pretty
conventional, but are still subject to change.
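
As a rough illustration of what "conventional" means here, the following sketch uses a handful of list-like functions through a qualified import. It assumes the API as it appears in later releases of text (pack, strip, toUpper, words, length); the 0.1 preview may differ in names or coverage. The helper shout is hypothetical.

```haskell
module Main where

import qualified Data.Text as T

-- Hypothetical helper: trim surrounding whitespace and upper-case the rest.
shout :: T.Text -> T.Text
shout = T.toUpper . T.strip

main :: IO ()
main = do
  let greeting = T.pack "  hello, text  "
  print (shout greeting)
  -- length counts Unicode characters, not bytes.
  print (T.length (T.pack "naïve"))
  -- words splits on whitespace, much like Data.List's words.
  print (length (T.words (T.pack "fast packed Unicode text")))
```

Importing the module qualified avoids clashes with Prelude functions of the same name.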
If you want to contribute, please get copies of the source trees from here:
darcs get http://code.haskell.org/text
darcs get http://darcs.serpentine.com/text-icu
Please send bug reports and patches to your friendly Data.Text team:
Tom Harper

On Fri, 2009-02-27 at 00:01 -0800, Bryan O'Sullivan wrote:
text-icu 0.1 Augments text with comprehensive character set conversion support and normalization (and soon more), via bindings to the ICU library. http://hackage.haskell.org/cgi-bin/hackage-scripts/package/text-icu
Excellent! I was just wishing for this two days ago :D

Unfortunately it doesn’t build for me. I have libicu-dev 3.8.1 installed.
$ cabal install text-icu
Resolving dependencies...
'text-icu-0.1' is cached.
Configuring text-icu-0.1...
Preprocessing library text-icu-0.1...
Error.hsc: In function ‘main’:
Error.hsc:229: error: ‘U_ARGUMENT_TYPE_MISMATCH’ undeclared (first use in this function)
Error.hsc:229: error: (Each undeclared identifier is reported only once
Error.hsc:229: error: for each function it appears in.)
Error.hsc:230: error: ‘U_DUPLICATE_KEYWORD’ undeclared (first use in this function)
Error.hsc:231: error: ‘U_UNDEFINED_KEYWORD’ undeclared (first use in this function)
Error.hsc:232: error: ‘U_DEFAULT_KEYWORD_MISSING’ undeclared (first use in this function)
Error.hsc:260: error: ‘U_REGEX_OCTAL_TOO_BIG’ undeclared (first use in this function)
Error.hsc:261: error: ‘U_REGEX_INVALID_RANGE’ undeclared (first use in this function)
Error.hsc:262: error: ‘U_REGEX_STACK_OVERFLOW’ undeclared (first use in this function)
Error.hsc:263: error: ‘U_REGEX_TIME_OUT’ undeclared (first use in this function)
Error.hsc:264: error: ‘U_REGEX_STOPPED_BY_CALLER’ undeclared (first use in this function)
compiling dist/build/Data/Text/ICU/Error_hsc_make.c failed
command was: /usr/bin/gcc -c -D__GLASGOW_HASKELL__=610 -I/usr/local/lib/ghc-6.10.1/bytestring-0.9.1.4/include -I/usr/local/lib/ghc-6.10.1/base-4.0.0.0/include -I/usr/local/lib/ghc-6.10.1/include -IPAPI_INCLUDE_DIR dist/build/Data/Text/ICU/Error_hsc_make.c -o dist/build/Data/Text/ICU/Error_hsc_make.o
cabal: Error: some packages failed to install:
text-icu-0.1 failed during the building phase. The exception was:
exit: ExitFailure 1

On Fri, Feb 27, 2009 at 12:57 AM, George Pollard wrote:
Unfortunately it doesn’t build for me. I have libicu-dev 3.8.1 installed.
Yes, as the README states, the text-icu package needs ICU 4.0. The basic text library has no such external dependencies.

On Fri, 2009-02-27 at 08:59 -0800, Bryan O'Sullivan wrote:
On Fri, Feb 27, 2009 at 12:57 AM, George Pollard wrote:
Unfortunately it doesn’t build for me. I have libicu-dev 3.8.1 installed.
Yes, as the README states, the text-icu package needs ICU 4.0. The basic text library has no such external dependencies.
Oops, sorry for the noise. I noticed that it hadn't built on Hackage either, so I thought there might be something missing.

2009/2/27 Bryan O'Sullivan
On behalf of the Data.Text team, I am delighted to announce the release of preview versions of two new packages:
text 0.1 Fast, packed Unicode text support, using a modern stream fusion framework. http://hackage.haskell.org/cgi-bin/hackage-scripts/package/text
This is nice news. What is the preferred way to parse some Text?
Thanks,
Thu

2009/2/27 minh thu
2009/2/27 Bryan O'Sullivan:
On behalf of the Data.Text team, I am delighted to announce the release of preview versions of two new packages:
text 0.1 Fast, packed Unicode text support, using a modern stream fusion framework. http://hackage.haskell.org/cgi-bin/hackage-scripts/package/text
This is nice news. What is the preferred way to parse some Text?
I mean, it is straightforward to rewrite attoparsec to use Text instead of (lazy and strict) ByteString, but is that the way to go?
Thanks,
Thu

On Fri, Feb 27, 2009 at 7:27 AM, minh thu wrote:
I mean, it is straightforward to rewrite attoparsec to use Text instead of (lazy and strict) ByteString, but is that the way to go?
My first priority is to write Data.Text.Lazy, since I don't think that it makes sense to layer a parsing library atop the current Data.Text module. After that, porting Parsec3 and attoparsec should just be a matter of some keyboarding.
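
To make the discussion concrete, here is a small hand-rolled sketch of what parsing over a lazy text type could look like. It assumes the Data.Text.Lazy module as it exists in later releases of text (it was not yet written at the time of this message), and it is not attoparsec's or Parsec's API: the Parser type and the functions takeWhile1, decimal, skipSpace and pair are hypothetical.

```haskell
module Main where

import Data.Char (isDigit, isSpace)
import qualified Data.Text.Lazy as TL

-- Hypothetical parser type: consume a prefix of the input and return
-- the result together with the unconsumed remainder, or fail.
newtype Parser a = Parser { runParser :: TL.Text -> Maybe (a, TL.Text) }

-- Consume one or more characters satisfying the predicate.
takeWhile1 :: (Char -> Bool) -> Parser TL.Text
takeWhile1 p = Parser $ \input ->
  let (matched, rest) = TL.span p input
  in if TL.null matched then Nothing else Just (matched, rest)

-- Parse a non-negative decimal number (ASCII digits only).
decimal :: Parser Int
decimal = Parser $ \input -> do
  (digits, rest) <- runParser (takeWhile1 isDigit) input
  pure (read (TL.unpack digits), rest)

-- Skip zero or more whitespace characters; never fails.
skipSpace :: Parser ()
skipSpace = Parser $ \input -> Just ((), TL.dropWhile isSpace input)

-- Parse two numbers separated by optional whitespace.
pair :: Parser (Int, Int)
pair = Parser $ \input -> do
  (a, r1) <- runParser decimal input
  (_, r2) <- runParser skipSpace r1
  (b, r3) <- runParser decimal r2
  pure ((a, b), r3)

main :: IO ()
main = print (runParser pair (TL.pack "12  34 rest"))
```

Because span and dropWhile work lazily on the lazy type, a parser of this shape only forces as much input as it inspects, which is one reason a lazy representation is a natural base for a parsing library.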

Bryan O'Sullivan wrote:
On behalf of the Data.Text team, I am delighted to announce the release of preview versions of two new packages: [...] text-icu 0.1 Augments text with comprehensive character set conversion support and normalization (and soon more), via bindings to the ICU library. http://hackage.haskell.org/cgi-bin/hackage-scripts/package/text-icu
This is interesting. Any plans to write a pure Haskell package, using Unicode CLDR data? http://unicode.org/cldr/ As an example, Python Babel does this, and it also has support for GNU gettext catalogs.
[...]
Regards Manlio Perillo

On Sun, Mar 1, 2009 at 5:05 AM, Manlio Perillo wrote:
This is interesting. Any plans to write a pure Haskell package, using Unicode CLDR data? http://unicode.org/cldr/
Foundational l10n work, however important it may be, is an unrewarding slog, so it's the kind of work I'd do under contract, but not for fun.