
Hi all, I'm playing with the TagSoup library trying to extract links to original pictures from my Flickr Sets page. This programs first loads the Sets page, open links to each set, get links to pictures and then search for original picture link (see steps in main function). It does the job, but for the tests I just wanted to take 10 links to reduce the time the program runs. Just hoping that haskell laziness would magically take the minimum amount of data required to get the first 10 links out of this set of pages. I did this replacing: (putStrLn . unlines . concat) origLinks with (putStrLn . unlines . take 10 . concat) origLinks in the main function. With the last version of that line, I effectively only get 10 links but the runtime is exactly the same for both main functions. As I'm a newbie haskell programmer I certainly missing something. By the way I know Flickr has an api I could use, but the purpose was playing with TagSoup. Thanks for any advice. Olivier. Here's the code: module Main where import Data.Html.TagSoup import Control.Monad (liftM) import Data.List (isPrefixOf, groupBy) import Data.Maybe (mapMaybe) import System (getArgs) import System.Time import IO (hPutStrLn, stderr) base = "http://www.flickr.com" setsUrl name = "/photos/" ++ name ++ "/sets/" main :: IO () main = do args <- getArgs tStart <- getClockTime setLinks <- getLinksByAttr ("class", "Seta") (base ++ setsUrl (args !! 0)) picLinks <- mapM (getLinksByAttr ("class", "image_link")) setLinks origLinks <- mapM (getLinksAfterImgByAttr ("src", "http://l.yimg.com/www.flickr.com/images/icon_download.gif")) $ (mapMaybe linkToOrigSize . concat) picLinks (putStrLn . unlines . concat) origLinks tEnd <- getClockTime hPutStrLn stderr ( timeDiffToString $ diffClockTimes tEnd tStart ) -- | extract all links from "a" tag types having given attribute getLinksByAttr :: (String, String) -> String -> IO [String] getLinksByAttr attr url = do sects <- getSectionsByTypeAndAttr "a" attr url return $ hrefs sects -- | get "a" tags following a "img" having a specific attribute getLinksAfterImgByAttr :: (String, String) -> String -> IO [String] getLinksAfterImgByAttr attr url = do sects <- getSectionsByTypeAndAttr "img" attr url return $ hrefs $ map (dropWhile (not . isTagOpen) . drop 1) sects -- | create sections from tag type and attribute getSectionsByTypeAndAttr :: String -> (String, String) -> String -> IO [[Tag]] getSectionsByTypeAndAttr tagType attr url = do tags <- liftM parseTags $ openURL $ url (return . filterByTypeAndAttr tagType attr) tags where filterByTypeAndAttr :: String -> (String, String) -> [Tag] -> [[Tag]] filterByTypeAndAttr t a = sections (~== TagOpen t [a]) -- | extract href values from sections of "a" tags hrefs :: [[Tag]] -> [String] hrefs = map (addBase . fromAttrib "href" . head) where addBase :: String -> String addBase s | "http://" `isPrefixOf` s = s addBase s | otherwise = base ++ s -- | transform a link to a picture into a link to the original size picture linkToOrigSize :: String -> Maybe String linkToOrigSize link = if parts !! 3 == "photos" then Just $ newUrl parts else Nothing where parts = map tail $ groupBy (const(/='/')) link newUrl p = "http://www.flickr.com/photo_zoom.gne?id=" ++ p !! 5 ++ "&size=o&context=" ++ p !! 7