
On Fri, Apr 30, 2010 at 11:51 AM, Jesper Louis Andersen wrote:
> On Fri, Apr 30, 2010 at 5:38 PM, Gwern Branwen wrote:
>> Nothing in http://develop.github.com/ seems especially useful for grabbing the git:// URLs of all repos by language - only by user.
>> The only real list of repos by language seems to be reachable via http://github.com/languages/Haskell/updated or http://github.com/languages/Haskell/created . (You might think http://github.com/languages/Haskell would be good, but no, it's just a few random repos by interest and not a full listing.)
> GitHub has a REST API for accessing data. Unfortunately it can't give you the breakdown you want, but I would ask them for it. It is much simpler for you,
You mean ask for a new feature? (Just a one-time list is no good since I intend to repeat it regularly to pick up new repos, just like with patch-tag.)
> and it does not put extra strain on their servers from the scraping.
Well, it'd only be about 2000 HTTP hits (98 + (20 * 98) = 2058). Downloading the repos would probably reduce that demand to insignificance, especially the first time around, when most of the repos would need to be downloaded.
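For anyone wanting to redo the estimate with other numbers, here is a minimal sketch of the arithmetic. It assumes that 98 is the number of listing pages and 20 the number of repos per page, with one extra request per repo; that reading is a guess, and the function and parameter names are made up for illustration.

```python
def estimated_requests(listing_pages: int, repos_per_page: int) -> int:
    """Estimate total HTTP hits for one scraping pass.

    Assumption (not stated in the thread): one request per listing
    page, plus one request for every repo listed on those pages.
    """
    listing_hits = listing_pages
    repo_hits = repos_per_page * listing_pages
    return listing_hits + repo_hits


if __name__ == "__main__":
    # 98 + (20 * 98), the figures quoted above
    print(estimated_requests(98, 20))  # prints 2058
```

The count grows linearly with the number of listing pages, so the estimate stays in the low thousands unless the listing itself grows a lot.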
> Usually, the GitHub guys are helpful when you have a question.
Any suggested method besides the obvious http://github.com/contact ?

--
gwern