Re: [Haskell] ANNOUNCE: Haskell XML Toolbox Version 4.00

At 11:37 06/04/04 +0200, Uwe Schmidt wrote:
Haskell XML Toolbox 4.00
I would like to announce a new version of the Haskell XML Toolbox for processing XML including a validating parser and a new XPath module.
Testing the new kit using Hugs under Windows, with test program Examples/HUnit/HUnitExample.hs. Because I'm starting from a clean distribution, I'm hoping that this will show up all of the Windows/Hugs incompatibilities that I may have missed reporting previously. This message is constructed in a stream-of-consciousness style, as I wanted to capture every step I took to get the code running. (I *did* get the code running with Hugs under Windows, with just a little butchery ;-) My first step was to unpack the distribution kit into a directory: D:\Cvs\DEV\HaskellUtils\HXmlToolbox-4.00 I then created a batch file to run Hugs from the Examples directory: [[ rem Use Hugs version compiled with Unicode support set HUGS=C:\DEV\Hugs98\hugs-20040109.exe set LIBS=..\hdom;..\hparser;..\hvalidator;..\hxpath;..\http;..\popen %HUGS% -P"%LIBS%;" PAUSE ]] (I should check if the HXML Toolbox requires Haskell support for 'char's with codepoints >255. Later.) ... [[ Reading file "..\http\HTTP.hs": Parsing ERROR "..\http\HTTP.hs":132 - Syntax error in import declaration (unexpected symbol "partition") Network.Socket> ]] import Data.List ({- isPrefixOf -},partition,elemIndex) -- uwe changed to: import Data.List ({- isPrefixOf, -} partition,elemIndex) -- uwe ... I experienced a problem with http.hs, because my version of Network.URI has a different internal structure. It's not clear to me why there's a need to force a '/' at the front of the path (i.e. if a relative path is supplied here, isn't it an error in any case?): [[ alt_uri = show $ if null (path u) || head (path u) /= '/' then u { uriPath = '/' : path u } else u ]] (this is modified to use uriPath, so it works with my Network.URI) In the longer run, I'm hoping that the HTTP module will become part of the Network library module, based on work by Bjorn Bringert. ... [[ Reading file "..\http\Browser.hs": Parsing ERROR "..\http\Browser.hs":70 - Syntax error in import declaration (unexpected comma) HTTP> ]] import Control.Monad (foldM,filterM, {- liftMf -},when) changed to: import Control.Monad (foldM,filterM, {- liftMf, -} when) ... I also had a problem with functions trimHost and formToRequest in Browser.hs and my URI module, similar to that noted above. Again, I hope this problem can be avoided when http is part of the common library. ... [[ Reading file "..\hparser\ProtocolHandlerHttpCurl.hs": Parsing ERROR "..\hparser\ProtocolHandlerHttpCurl.hs" - Can't find imported module "Posix" ProtocolHandlerHttpNative> ]] This is the old well-known POpen problem. I've copied my version of POpen into the HXMLtoolbox code tree (it compiles, but still doesn't work), and changed the import Posix(popen) to import POPen(popen). ... With the above changes, the code runs but reports 8 failures: [[ Cases: 110 Tried: 110 Errors: 0 Failures: 8 Counts{cases=110,tried=110,errors=0,failures=8} ]] Failures are: [[ fatal error: illegal URI for input: "mini1.xml" : ### Failure in: 3:input tests:3 expected: "<x/>" but got: "" Cases: 110 Tried: 36 Errors: 0 Failures: 1 fatal error: illegal URI for input: "mini1.xml" ### Failure in: 3:input tests:4 expected: "0" but got: "3" Cases: 110 Tried: 37 Errors: 0 Failures: 2 fatal error: illegal URI for input: "mini1.xml" : ### Failure in: 3:input tests:5 expected: "4" but got: "0" Cases: 110 Tried: 38 Errors: 0 Failures: 3 fatal error: illegal URI for input: "mini2.xml" ### Failure in: 3:input tests:6 expected: "0" but got: "3" Cases: 110 Tried: 39 Errors: 0 Failures: 4 fatal error: illegal URI for input: "notThere.xml" : fatal error: illegal URI for input: "mini2.xml" : ### Failure in: 4:document transformation tests:0 expected: " ignore unknown important important " but got: "" Cases: 110 Tried: 42 Errors: 0 Failures: 5 fatal error: illegal URI for input: "mini2.xml" : ### Failure in: 4:document transformation tests:1 expected: "ignoreunknownimportantimportant" but got: "" Cases: 110 Tried: 43 Errors: 0 Failures: 6 fatal error: illegal URI for input: "mini2.xml" : ### Failure in: 4:document transformation tests:2 expected: " important important " but got: "" Cases: 110 Tried: 44 Errors: 0 Failures: 7 fatal error: illegal URI for input: "mini2.xml" : ### Failure in: 4:document transformation tests:3 expected: " ignore important " but got: "" Cases: 110 Tried: 110 Errors: 0 Failures: 8 Counts{cases=110,tried=110,errors=0,failures=8} Main> ]] Thus it looks as if all of the failures are due to an invalid URI for a local file. This is a re-run of an old problem, traced back to: [[ setDefaultURI :: XState state () setDefaultURI = do wd <- io getCurrentDirectory setSysParam transferDefaultURI ("file://localhost" ++ wd ++ "/") ]] -- XmlInput.hs The value returned by getCurrentDirectory on Windows is not valid as part of a URI. I previously fixed this by replacing the above with the following: [[ {- original: setDefaultURI :: XState state () setDefaultURI = do wd <- io getCurrentDirectory setSysParam transferDefaultURI ("file://localhost" ++ wd ++ "/") -} -- Revised version to allow Windows directory strings. [[[GK]]] -- -- If the current directory starts with 'd:', it is assumed to be a Windows -- directory, and all '\' characters are mapped to '/'. -- -- In any case, any non-URI or reserved character is escaped. setDefaultURI :: XState state () setDefaultURI = do wd <- io getCurrentDirectory let wd1 = case wd of d:':':_ | driveLetter d -> '/':concatMap win32ToUriChar wd otherwise -> concatMap escapeNonUriChar wd setSysParam transferDefaultURI ("file://localhost" ++ wd1 ++ "/") -- [[[I'd prefer to leave 'localhost' as null, but this raises -- awkward questions about whether it's OK to remove a -- null authority from a URI]]] where win32ToUriChar '\\' = "/" win32ToUriChar c = escapeNonUriChar c escapeNonUriChar c = escapeChar isUnescapedInURI c -- from Network.URI driveLetter d = d `elem` ['A'..'Z'] -- to test: -- run () $ do { uri <- getDefaultURI ; io (putStrLn uri) } {- Excerpt from: http://www.ietf.org/internet-drafts/draft-hoffman-rfc1738bis-01.txt 2.7 FILES The file URL scheme is used to designate files accessible on a particular host computer. This scheme, unlike most other URL schemes, does not designate a resource that is universally accessible over the Internet. A file URL takes the form: file://<host>/<path> where <host> is the fully qualified domain name of the system on which the <path> is accessible, and <path> is a hierarchical directory path of the form <directory>/<directory>/.../<name>. As a special case, <host> can be the string "localhost" or the empty string; this is interpreted as "the machine from which the URL is being interpreted". However, this part of the syntax has been ignored on many systems. That is, for some systems, the following are considered equal, while on others they are not: file://localhost/path/to/file.txt file:///path/to/file.txt Some systems allow URLs to point to directories. In this case, there is usually (but not always) a terminating "/" character, such as in: file://usr/local/bin/ On systems running some versions of Microsoft Windows, the local drive specification is preceded by a "/" character. Thus, for a file called "example.ini" in the "windows" directory on the "c:" drive, the URL would be: file:///c:/windows/example.ini For Windows shares, there is an additional "/" prepended to the name. Thus, the file "example.doc" on the shared directory "department" would have the URL: file:////department/example.doc The file URL scheme is unusual in that it does not specify an Internet protocol or access method for such files; as such, its utility in network protocols between hosts is limited. -} ]] I also needed to add an import to XmlInput.hs: [[ import Network.URI ( isUnescapedInURI, escapeChar ) ]] ... I also find I must modify ProtocolHandlerFile.hs to strip off the leading '/' from the path to work on Windows. The following change is made to function getFileContents: [[ {- Original: source = path uri readErr msg = addFatal msg n -} -- [[[GK]]] strip off leading '/' from Windows drive name source = fileuripath (path uri) fileuripath ('/':file@(d:':':more)) | driveLetter d = file fileuripath file = file driveLetter d = d `elem` ['A'..'Z'] readErr msg = addFatal msg n ]] ... Finally, I must copy mini1.xml and mini2.xml into the directory from which I'm running Hugs (i.e. HXmlToolbox-4.00/examples), and the test program runs with no errors: [[ Cases: 110 Tried: 110 Errors: 0 Failures: 0 Counts{cases=110,tried=110,errors=0,failures=0} ]] ... Well, that wasn't too hard, second time around. Maybe we should abstract out the URI<->filename mapping logic into another module that is specifically to handle the file: URI scheme. Dopes it make sense to add such a module to the network branch of the library hierarchy, to sit alongside the HTTP modules? #g ------------ Graham Klyne For email: http://www.ninebynine.org/#Contact
participants (1)
-
Graham Klyne