Page rank and GHC docs directory organization

GHC docs seem to have the problem that newer versions only gradually overtake older ones in page rank, resulting in the effect that if one uses Google to find library documentation, they may accidentally look at an old version. For example, if I google "Data.Data Haskell" the first link brings me to:

http://www.haskell.org/ghc/docs/6.10.2/html/libraries/base/Data-Data.html

Oops, version 6.10.2!

I'm no web expert, but I think the problem is that the "latest" directory isn't used consistently by others and/or the fact that latest/ redirects to a concrete version number. Thus if you go to:

http://www.haskell.org/ghc/docs/latest/html/libraries/base/Data-Data.html

It redirects immediately to 6.12.2:

http://www.haskell.org/ghc/docs/6.12.2/html/libraries/base/Data-Data.html

So is the 6.12.2 target accruing pagerank rather than the latest one? Even if someone links the /latest/ URL? If that's the problem, would it fix things just to make latest/ a full directory structure in its own right (a clone rather than redirect)?

Cheers,
-Ryan

On 22/07/10 15:33, Ryan Newton wrote: [snip]
So is the 6.12.2 target accruing pagerank rather than the latest one? Even if someone links the /latest/ URL? If that's the problem, would it fix things just to make latest/ a full directory structure in its own right (a clone rather than redirect)?
For the URLs under 'latest/' the server returns a "301 Moved Permanently" response, which is used to indicate that the original URL is no longer in use and that references to it should be updated to the target URL [1]. Hence, search engines will only index the targets of the redirects and ignore the original URLs [2]. Using a "302 Found" redirect instead might produce better results, at least for Google [2].

[1] http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.3.2
[2] http://www.bigoakinc.com/blog/when-to-use-a-301-vs-302-redirect/

--
Robin KAY
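To make the 301 versus 302 distinction concrete, here is a minimal Python sketch of the response a server could build for a request under latest/. The function name `redirect_latest`, the `DOCS_ROOT` constant, and the choice of 302 as the default are assumptions made for illustration; this is not how haskell.org is actually configured.

```python
from http import HTTPStatus

# Path prefix taken from the URLs quoted in this thread (an assumption
# about layout, not the real server configuration).
DOCS_ROOT = "/ghc/docs"


def redirect_latest(path, latest_version, status=HTTPStatus.FOUND):
    """Return (status_code, headers) for a path under latest/, else None.

    301 tells clients the old URL is gone for good; 302 says the move is
    temporary, so the original URL remains the one to keep referring to.
    """
    prefix = DOCS_ROOT + "/latest/"
    if not path.startswith(prefix):
        return None  # not a latest/ URL, nothing to redirect
    if status not in (HTTPStatus.MOVED_PERMANENTLY, HTTPStatus.FOUND):
        raise ValueError("expected 301 or 302")
    # Keep everything after latest/ and splice in the concrete version.
    target = f"{DOCS_ROOT}/{latest_version}/{path[len(prefix):]}"
    return int(status), {"Location": target}
```

Only the status code differs between the two variants; the Location header is the same in both cases. Whether a given search engine then credits the original or the target URL is up to that engine.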

Robin KAY writes:
the redirects and ignore the original URLs [2]. Using a "302 Found" redirect instead might produce better results, at least for Google
But the page you point to suggests 302 is discouraged, and says they don't help for the other search engines. Perhaps 'latest' could just be a symlink to the latest version instead of a redirect?

In addition, all versions could have a blurb saying this is for version x.y.z, the latest version can be found (-> url with /latest). I believe this should boost the page rank of the 'latest' URL.

-k
--
If I haven't seen further, it is by standing in the footprints of giants
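The symlink idea could be sketched as follows in Python. The helper name `point_latest_at` and the directory layout (one subdirectory per version next to the `latest` link) are assumptions for illustration, and the sketch assumes a POSIX filesystem where renaming over an existing symlink is atomic.

```python
import os
import tempfile


def point_latest_at(docs_dir, version):
    """Make docs_dir/latest a symlink to docs_dir/<version>.

    An existing 'latest' symlink is replaced in one rename, so readers
    never see a moment where the link is missing.
    """
    target = os.path.join(docs_dir, version)
    if not os.path.isdir(target):
        raise FileNotFoundError(target)
    link = os.path.join(docs_dir, "latest")
    tmp = link + ".tmp"
    if os.path.lexists(tmp):
        os.remove(tmp)  # clean up a leftover from an interrupted run
    os.symlink(version, tmp)  # relative target, so the tree can be moved
    os.replace(tmp, link)  # rename over the old link
    return link


if __name__ == "__main__":
    # Try it in a throwaway directory with two fake version trees.
    root = tempfile.mkdtemp()
    for v in ("6.10.2", "6.12.2"):
        os.mkdir(os.path.join(root, v))
    point_latest_at(root, "6.12.2")
    print(os.readlink(os.path.join(root, "latest")))
```

With a symlink the server answers requests under latest/ directly with a 200 response, so no redirect is involved at all. The web server must be configured to follow symlinks for this to work.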

Ketil Malde wrote:
Robin KAY writes:
the redirects and ignore the original URLs [2]. Using a "302 Found" redirect instead might produce better results, at least for Google
But the page you point to suggests 302 is discouraged, and says they don't help for the other search engines. Perhaps 'latest' could just be a symlink to the latest version instead of a redirect?
In addition, all versions could have a blurb saying this is for version x.y.z, the latest version can be found (-> url with /latest). I believe this should boost the page rank of the 'latest' URL.
If you both implement 'latest' as a symlink and have this blurb, then the "latest" page will always have a rather silly looking link to itself. If the latest numbered version and latest could be generated separately, then I think this would work very well.

The quick-n-dirty thing to do would be to switch to the 302 redirects, which given the dominance of google, may well be good enough.

Ganesh
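One way to get the blurb without the self-link is to derive it from the page's own URL and emit nothing for pages that are already under latest/. The Python sketch below assumes URLs shaped like the ones in this thread; the function name `version_blurb` and the blurb wording are made up for illustration.

```python
import re

# Matches a dotted numeric version directly after /ghc/docs/, as in the
# URLs quoted in this thread. "latest" is not numeric, so it never matches.
_VERSIONED = re.compile(
    r"^(?P<head>.*/ghc/docs/)(?P<version>\d+(?:\.\d+)*)(?P<tail>/.*)$"
)


def version_blurb(url):
    """Return a blurb pointing at the latest/ copy of a versioned page.

    Returns None for URLs without a numeric version component, which
    includes pages under latest/ itself, so no page links to itself.
    """
    m = _VERSIONED.match(url)
    if m is None:
        return None
    latest = m.group("head") + "latest" + m.group("tail")
    return (
        f"This is the documentation for version {m.group('version')}. "
        f"The latest version can be found at {latest}"
    )
```

This only helps if the blurb is computed per request or per generated tree; with a plain symlink the files under latest/ are the same files as under the numbered directory, which is the problem described above.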

On Thu, Jul 22, 2010 at 4:33 PM, Ryan Newton wrote:
GHC docs seem to have the problem that newer versions only gradually overtake older ones in page rank, resulting in the effect that if one uses Google to find library documentation, they may accidentally look at an old version. For example, if I google "Data.Data Haskell" the first link brings me to:
http://www.haskell.org/ghc/docs/6.10.2/html/libraries/base/Data-Data.html
Oops, version 6.10.2!
I'm no web expert, but I think the problem is that the "latest" directory isn't used consistently by others and/or the fact that latest/ redirects to a concrete version number. Thus if you go to:
http://www.haskell.org/ghc/docs/latest/html/libraries/base/Data-Data.html
It redirects immediately to 6.12.2:
http://www.haskell.org/ghc/docs/6.12.2/html/libraries/base/Data-Data.html
So is the 6.12.2 target accruing pagerank rather than the latest one? Even if someone links the /latest/ URL? If that's the problem, would it fix things just to make latest/ a full directory structure in its own right (a clone rather than redirect)?
If we had a permanent entry page to the documentation, such as docs.haskell.org, that linked to the different kinds of documentation (and was attractive enough that people would link to it), that might help.

Johan
participants (5)
- Johan Tibell
- Ketil Malde
- Robin KAY
- Ryan Newton
- Sittampalam, Ganesh