Re: [Haskell-cafe] [Haskell] Spam on the Haskell wiki

On Mon, 16 Jul 2012 00:03:49 +0200, Henk-Jan van Tuyl
I am willing to do administrator tasks.
4. ReCAPTCHA enabled for 'edits adding new, unrecognized external links' - which is all of the spam.
This is already enabled.
The HaskellWiki is still flooded with spam; we should take some measure to reduce the stream severely. Most spam seems to be created (semi-)automatically; the pages do not contain links, and most of the time the usernames end with two digits. Some cures I have thought up:
- Verify new wiki accounts, before granting them rights, based on e-mails in the Haskell mailing lists (or subscription to a Haskell mailing list)
- Let new users only change pages, not create new pages
- Block creation of usernames
  o ending with two or more digits
  o with more than one x or q
  o starting with "buy"
  o longer than 20 characters
  o with more than 4 consonants in a row
- Block creation of pages with words in a certain list (Coach, Vuitton, Chanel, handbags, purses, outlet, luggage, "Nike Air Jordan")
Regards,
Henk-Jan van Tuyl
--
http://Van.Tuyl.eu/
http://members.chello.nl/hjgtuyl/tourdemonad.html
Haskell programming
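The last suggestion, refusing page creation when certain words appear, could look roughly like this. It is a minimal sketch, not anything the wiki actually runs: the word list is the one from the message, while `blockedTerms`, `matchedTerms` and `allowPageCreation` are hypothetical names. It matches whole words only, so "coaching" does not trip the "coach" rule.

```haskell
import Data.Char (isAlphaNum, toLower)
import Data.List (isInfixOf)

-- Hypothetical blocklist, taken from the words suggested above.
-- Entries are lower-case and single-spaced.
blockedTerms :: [String]
blockedTerms =
  [ "coach", "vuitton", "chanel", "handbags", "purses"
  , "outlet", "luggage", "nike air jordan" ]

-- Lower-case the text and turn every non-alphanumeric character into a
-- space, so that "Louis-Vuitton!" becomes "louis vuitton ".
normalise :: String -> String
normalise = map (\c -> if isAlphaNum c then toLower c else ' ')

-- Blocklist terms that occur as whole words (or whole phrases) in the page.
-- Padding with spaces on both sides keeps "coach" from matching "coaching".
matchedTerms :: String -> [String]
matchedTerms page = filter occurs blockedTerms
  where
    padded      = " " ++ unwords (words (normalise page)) ++ " "
    occurs term = (" " ++ term ++ " ") `isInfixOf` padded

-- Allow the new page only if no blocked term was found.
allowPageCreation :: String -> Bool
allowPageCreation = null . matchedTerms
```

For example, `matchedTerms "Cheap Louis-Vuitton handbags!"` yields `["vuitton","handbags"]`, while a page titled "Coaching notes on monad transformers" is allowed.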

Can we allow at least 5 consonants? There are enough people with names such
as "Srbský" in Eastern Europe... In fact, Czech can use as
many as 9 consonants in a row! http://ld.johanesville.net/perlicky/03-jazykova-nej-a-jine-hricky
On a side note, image-based CAPTCHAs can cause problems for blind people.
---------- Original message ----------
From: Henk-Jan van Tuyl
I am willing to do administrator tasks.
4. ReCAPTCHA enabled for 'edits adding new, unrecognized external links' - which is all of the spam.
This is already enabled.
The HaskellWiki is still flooded with spam; we should take some measure to reduce the stream severely. Most spam seems to be created (semi-)automatically; the pages do not contain links, and most of the time the usernames end with two digits. Some cures I have thought up:
- Verify new wiki accounts, before granting them rights, based on e-mails in the Haskell mailing lists (or subscription to a Haskell mailing list)
- Let new users only change pages, not create new pages
- Block creation of usernames
  o ending with two or more digits
  o with more than one x or q
  o starting with "buy"
  o longer than 20 characters
  o with more than 4 consonants in a row
- Block creation of pages with words in a certain list (Coach, Vuitton, Chanel, handbags, purses, outlet, luggage, "Nike Air Jordan")
Regards,
Henk-Jan van Tuyl
--
http://Van.Tuyl.eu/
http://members.chello.nl/hjgtuyl/tourdemonad.html
Haskell programming
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

On Tue, 31 Jul 2012 00:42:40 +0200,
On a side note, image-based CAPTCHAs can cause problems for blind people.
Google's ReCAPTCHA can pronounce the text to type.
Regards,
Henk-Jan van Tuyl

On Mon, Jul 30, 2012 at 2:35 PM, Henk-Jan van Tuyl
- Verify new wiki accounts, before granting them rights, based on e-mails in the Haskell mailing lists (or subscription of a Haskell mailing list)
This is a nice idea, but I think it will end up moving spam onto the mailing lists. There is hardly any policy in place to keep people out of the mailing lists. Mailing list spam is attractive to spammers, since it all gets mirrored to archive sites all over the place. Not to volunteer others, but how feasible would it be to require credentials from Haskellers.org?
- Let new users only change pages, not create new pages
This is good for stopping the creation of walled gardens full of spam. But it won't stop "vandalism" spam, where somebody goes to a page that isn't accessed much and changes it. Does anybody have statistics about how often pages are edited/added? If the numbers aren't too big, I'd volunteer to "moderate", insofar as that means scanning new edits and additions for spam. Maybe this role should just forward articles with spam on them to a "real" moderator to roll back. We could even have a "report spam" button on each page, and if enough users click on it (for a given revision), the revision gets forwarded to a moderator.
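The "report spam" threshold could be kept as a map from revision to the set of users who reported it. This is a minimal sketch under assumptions: `RevisionId`, `UserName`, `report` and `toForward` are hypothetical, and a real wiki would store this in its database rather than in memory. Counting distinct users means one person clicking repeatedly does not push a revision over the threshold.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- Hypothetical identifiers; a real wiki would use its own revision and user ids.
type RevisionId = Int
type UserName   = String

-- For each revision, the set of distinct users who clicked "report spam".
type Reports = Map.Map RevisionId (Set.Set UserName)

-- Record one click. Because the value is a Set, repeated clicks by the
-- same user on the same revision are counted once.
report :: UserName -> RevisionId -> Reports -> Reports
report user rev = Map.insertWith Set.union rev (Set.singleton user)

-- Revisions whose number of distinct reporters has reached the threshold;
-- these are the ones that would be forwarded to a moderator.
toForward :: Int -> Reports -> [RevisionId]
toForward threshold reports =
  [ rev | (rev, users) <- Map.toList reports, Set.size users >= threshold ]
```

With a hypothetical threshold of 3, three different users reporting revision 7 would forward it, while a single user reporting revision 9 several times would not.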
- Block creation of usernames
  o ending with two or more digits
  o with more than one x or q
  o starting with "buy"
  o longer than 20 characters
  o with more than 4 consonants in a row
I don't see this providing any security against spam, and I'm thinking it will take longer to implement than it will take for a spammer to fix his scripts in response.
- Block creation of pages with words in a certain list (Coach, Vuitton, Chanel, handbags, purses, outlet, luggage, "Nike Air Jordan")
Same.

On Tue, 31 Jul 2012 00:59:28 +0200, Alexander Solla
Does anybody have statistics about how often pages are edited/added?
In the last seven days, 251 new (user) pages were created; no spam was added to existing pages. I also discovered spam added to pages at http://hackage.haskell.org/trac/hackage/ as well. A search for "rio bouygues"[0] gave 118 results and "virgin mobile" gave 124 results; there are probably more.
Regards,
Henk-Jan van Tuyl
[0] http://hackage.haskell.org/trac/hackage/search?q=%22rio+bouygues%22&noquickjump=1&ticket=on&milestone=on&wiki=on

On Mon, Jul 30, 2012 at 6:59 PM, Alexander Solla
We could even have a "report spam" button on each page, and if enough users click on it (for a given revision), the revision gets forwarded to a moderator.
This would be useless. The problem is not detecting spam, since that's quite trivial: it's very hard to miss. The problem is that the moderator (ie. me) is already overworked. The spam needs to be reduced to begin with, not detected. -- gwern http://www.gwern.net

Hi Gwern,
First of all, thanks for your patience.
I am willing to do administrator tasks.
4. ReCAPTCHA enabled for 'edits adding new, unrecognized external links' - which is all of the spam.
This is already enabled.
I guess the problem may lie with ReCAPTCHA itself (http://www.google.com/recaptcha/learnmore), so you could choose a custom-built CAPTCHA that is more difficult to crack. You may find some open-source CAPTCHA systems better than ReCAPTCHA, e.g. http://jcaptcha.sourceforge.net/
To foil relay attacks on the CAPTCHA, you could try early timeouts and/or a longer CAPTCHA text. This may mean more trouble and nuisance for legitimate users, but I guess Haskellers will be willing to pay this "small" price for a better website experience. :)
Relay attacks: remember that there are human solvers employed in countries like India and China, so any CAPTCHA a human can solve will fail to work as desired. http://en.wikipedia.org/wiki/CAPTCHA#Human_solvers
The problem is not detecting spam, since that's quite trivial: it's very hard to miss.
Thanks for providing more info.
So, another doubt, if detecting spam is trivial, then why not just send the
detected spam to trash directly without any human inspection?
This may mean some trouble for the posters due to "false positives"; but
the moderator's job can be reduced to some extent.
I hope this is useful. If not, please forgive me for the extra reading.
Regards,
-Damodar
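The early-timeout idea against relay attacks can be sketched as a check on the age of the challenge. This is a minimal illustration under assumptions: `Challenge`, `Verdict` and `checkAnswer` are hypothetical names, and timestamps are plain POSIX seconds supplied by the caller. A short window narrows the time available to forward the challenge to a remote solver and back; it does not stop a fast solver.

```haskell
-- A hypothetical CAPTCHA challenge: the text the user must type and the
-- time (POSIX seconds) at which it was shown.
data Challenge = Challenge
  { expectedAnswer :: String
  , issuedAt       :: Integer
  }

data Verdict = Accepted | WrongAnswer | Expired
  deriving (Eq, Show)

-- Accept an answer only if it arrives within `timeoutSecs` of the
-- challenge being issued and matches the expected text exactly.
checkAnswer :: Integer -> Challenge -> Integer -> String -> Verdict
checkAnswer timeoutSecs ch now answer
  | now - issuedAt ch > timeoutSecs = Expired
  | answer /= expectedAnswer ch     = WrongAnswer
  | otherwise                       = Accepted
```

With a hypothetical 60-second limit, `checkAnswer 60 ch now answer` rejects an answer that arrives 90 seconds after the challenge even if it is correct.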
On Fri, Aug 3, 2012 at 7:29 PM, Gwern Branwen
On Mon, Jul 30, 2012 at 6:59 PM, Alexander Solla
wrote: We could even have a "report spam" button on each page, and if enough users click on it (for a given revision), the revision gets forwarded to a moderator.
This would be useless. The problem is not detecting spam, since that's quite trivial: it's very hard to miss. The problem is that the moderator (ie. me) is already overworked. The spam needs to be reduced to begin with, not detected.

On Fri, Aug 3, 2012 at 10:34 PM, damodar kulkarni
So, another doubt, if detecting spam is trivial, then why not just send the detected spam to trash directly without any human inspection? This may mean some trouble for the posters due to "false positives"; but the moderator's job can be reduced to some extent.
Which is pretty much what this whole thread is about: asking that the sysadmins Do Something about this trivial yet overwhelming spam. -- gwern http://www.gwern.net
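Sending detected spam straight to the trash while accepting some false positives amounts to a two-threshold policy on a spam score. A minimal sketch, assuming some detector yields a numeric score (higher means more spammy); the `Action` type, `triage` and the thresholds are hypothetical, not something the wiki is known to have.

```haskell
-- Hypothetical outcome for an incoming edit.
data Action = Publish | HoldForReview | Trash
  deriving (Eq, Show)

-- Edits scoring at or above `trashAt` are discarded without human
-- inspection; edits between `reviewAt` and `trashAt` go to a moderator;
-- everything else is published.  Raising `trashAt` means fewer false
-- positives but more edits left for the moderator.
triage :: Double -> Double -> Double -> Action
triage reviewAt trashAt score
  | score >= trashAt  = Trash
  | score >= reviewAt = HoldForReview
  | otherwise         = Publish
```

With hypothetical thresholds of 5 and 10, an edit scoring 12.3 is trashed, one scoring 7 waits for a moderator, and one scoring 1.5 is published.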

Has anyone run SpamAssassin on the offending content created by the spammers?
I've been using it on hpaste and it's been very effective at cutting out
the crap.
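How hpaste calls SpamAssassin isn't stated, so the following is only a guess at one possible setup: pipe the text through the `spamc` client of a running `spamd` daemon with `-c`, which prints the score and threshold as `score/threshold`. The mail header wrapped around the wiki text and the names `scoreEdit` and `parseScore` are assumptions.

```haskell
import System.Process (readProcessWithExitCode)
import Text.Read (readMaybe)

-- Parse spamc's "-c" output, assumed to look like "7.3/5.0".
-- Returns (score, threshold), or Nothing if the output is unexpected.
parseScore :: String -> Maybe (Double, Double)
parseScore out =
  case break (== '/') (takeWhile (/= '\n') out) of
    (s, '/' : t) -> (,) <$> readMaybe s <*> readMaybe t
    _            -> Nothing

-- Run spamc against a local spamd.  The wiki text is wrapped in a minimal
-- mail header because SpamAssassin expects an e-mail message; the
-- "Subject" line is an arbitrary, hypothetical choice.
scoreEdit :: String -> IO (Maybe (Double, Double))
scoreEdit body = do
  let message = "Subject: wiki edit\n\n" ++ body
  (_code, out, _err) <- readProcessWithExitCode "spamc" ["-c"] message
  pure (parseScore out)
```

An edit would then count as spam when the returned score is at or above the returned threshold.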
On 4 August 2012 19:15, Gwern Branwen
On Fri, Aug 3, 2012 at 10:34 PM, damodar kulkarni
wrote: So, another doubt, if detecting spam is trivial, then why not just send the detected spam to trash directly without any human inspection? This may mean some trouble for the posters due to "false positives"; but the moderator's job can be reduced to some extent.
Which is pretty much what this whole thread is about: asking that the sysadmins Do Something about this trivial yet overwhelming spam.

On 07/30/2012 05:35 PM, Henk-Jan van Tuyl wrote:
On Mon, 16 Jul 2012 00:03:49 +0200, Henk-Jan van Tuyl
wrote: I am willing to do administrator tasks.
4. ReCAPTCHA enabled for 'edits adding new, unrecognized external links' - which is all of the spam.
This is already enabled.
The HaskellWiki is still flooded with spam; we should take some measure to reduce the stream severely. Most spam seems to be created (semi-)automated; the pages do not contain links, the usernames end with two digits, most of the time. Some cures I have thought up:
There are two (easy) things that will make a huge dent in the automated stuff.
1. Add a fake field, hidden through CSS, labeled something like "You must leave this field blank to submit the form" (for non-visual browsers). Put it on every page with a submit button. If it isn't empty, don't process the submission. You can give it a /name/ that sounds tempting, though.
2. Force previews. If the bots are targeted at your wiki software and you modify it to preview all submissions, the bots will stop working.
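Both checks can be sketched as a single validation step on the submitted form. This is a minimal illustration, assuming the form arrives as name/value pairs; the field names `homepage` and `previewed` and the function `checkSubmission` are hypothetical and do not correspond to MediaWiki's real form fields.

```haskell
-- A submitted form, modelled as field name/value pairs.
type Form = [(String, String)]

-- Hypothetical field names.  The honeypot is hidden with CSS and labelled
-- "You must leave this field blank", but carries a tempting name.
honeypotField, previewField :: String
honeypotField = "homepage"
previewField  = "previewed"

data Rejection = HoneypotFilled | NotPreviewed
  deriving (Eq, Show)

-- 1. A non-empty honeypot suggests a bot filled in every field.
-- 2. A save is accepted only if the form came from a preview page,
--    which the (hypothetical) preview step marks with previewed=yes.
checkSubmission :: Form -> Either Rejection Form
checkSubmission form
  | maybe False (not . null) (lookup honeypotField form) = Left HoneypotFilled
  | lookup previewField form /= Just "yes"                = Left NotPreviewed
  | otherwise                                             = Right form
```

A bot aimed specifically at this wiki could simply send `previewed=yes`; the check relies on bots being written for the stock wiki software, so a real implementation would tie the preview to a server-side token.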

On 7/30/12 5:35 PM, Henk-Jan van Tuyl wrote:
- Block creation of usernames
  o ending with two or more digits
  o with more than one x or q
  o starting with "buy"
  o longer than 20 characters
  o with more than 4 consonants in a row
As others have mentioned, many of these constraints impose undue burden on users with linguistic heritage outside of western Europe. Creating a decent filter for recognizing legitimate names across the majority of languages is quite difficult. Though there's no reason this has to be a strong blacklisting of usernames. If there's a willing volunteer (as seems to have been implied), then something like this could serve as a filter requiring manual override. All usernames are available... but some take longer to activate. Of course, there's always the power-to-weight issue for this kind of solution. -- Live well, ~wren
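Used as a filter requiring manual override, the username heuristics would only flag names for a volunteer, never refuse them. A minimal sketch under assumptions: `classify`, `Activation` and the vowel set are hypothetical, "more than one x or q" is read as the combined count of both letters, and the allowed consonant run is a parameter so the limit of 5 requested earlier in the thread is easy to set.

```haskell
import Data.Char (isAlpha, isDigit, toLower)
import Data.List (isPrefixOf)

data Activation = ActivateNow | NeedsManualReview [String]
  deriving (Eq, Show)

-- Rough vowel set, including a few accented vowels so that e.g. Czech "ý"
-- is not counted as a consonant.  An assumption, not a linguistic rule.
isVowel :: Char -> Bool
isVowel c = toLower c `elem` "aeiouyáéíóúůýěäöü"

isConsonant :: Char -> Bool
isConsonant c = isAlpha c && not (isVowel c)

-- Length of the longest run of consecutive consonants.
longestConsonantRun :: String -> Int
longestConsonantRun = go 0 0
  where
    go best _ [] = best
    go best cur (c : cs)
      | isConsonant c = let cur' = cur + 1 in go (max best cur') cur' cs
      | otherwise     = go best 0 cs

-- Apply the heuristics; any hit sends the name to a volunteer instead of
-- blocking it.  `maxRun` is the longest consonant run allowed.
classify :: Int -> String -> Activation
classify maxRun name =
  case reasons of
    [] -> ActivateNow
    rs -> NeedsManualReview rs
  where
    lower          = map toLower name
    trailingDigits = length (takeWhile isDigit (reverse name))
    reasons =
      [ "ends with two or more digits" | trailingDigits >= 2 ]
        ++ [ "more than one x or q" | length (filter (`elem` "xq") lower) > 1 ]
        ++ [ "starts with \"buy\"" | "buy" `isPrefixOf` lower ]
        ++ [ "longer than 20 characters" | length name > 20 ]
        ++ [ "long consonant run" | longestConsonantRun name > maxRun ]
```

With `maxRun` set to 4, "Srbský" (a run of 5) would wait for review; with 5 it is activated at once.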

On Thu, Aug 2, 2012 at 4:46 PM, wren ng thornton
On 7/30/12 5:35 PM, Henk-Jan van Tuyl wrote:
- Block creation of usernames
  o ending with two or more digits
  o with more than one x or q
  o starting with "buy"
  o longer than 20 characters
  o with more than 4 consonants in a row
As others have mentioned, many of these constraints impose undue burden on users with linguistic heritage outside of western Europe. Creating a decent filter for recognizing legitimate names across the majority of languages is quite difficult.
Though there's no reason this has to be a strong blacklisting of usernames. If there's a willing volunteer (as seems to have been implied), then something like this could serve as a filter requiring manual override. All usernames are available... but some take longer to activate. Of course, there's always the power-to-weight issue for this kind of solution.
Yeah, I volunteered. I'd like to see some kind of random round-robin system to dispatch approval edits to a group of volunteers (i.e., so that I only had to scan 10 or so edits a day for spam; I don't feel inclined to read for correctness). It wouldn't be so bad if there were 10-20 volunteers. I suppose far fewer could manage it if it were just approving user requests (but I also think that would be less effective at stopping spam).
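For scale, Henk-Jan's figure of 251 new pages in seven days is about 36 a day, so ten volunteers would each see roughly 4 a day. The dispatch itself could be as simple as the sketch below; `assignRoundRobin` and its types are hypothetical, and the "random" part (for example, shuffling the volunteer list each day) is not shown.

```haskell
-- Hypothetical types: a volunteer's name and a pending edit's id.
type Volunteer = String
type EditId    = Int

-- Assign pending edits to volunteers in round-robin order, so the number
-- of edits per volunteer differs by at most one.
assignRoundRobin :: [Volunteer] -> [EditId] -> [(Volunteer, [EditId])]
assignRoundRobin []   _     = []
assignRoundRobin vols edits =
  [ (v, [ e | (i, e) <- indexed, i `mod` n == k ]) | (k, v) <- zip [0 ..] vols ]
  where
    n       = length vols
    indexed = zip [0 ..] edits
```

For instance, seven edits shared among three volunteers come out as three, two and two.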

We could even have a "report spam" button on each page, and if enough users click on it (for a given revision), the revision gets forwarded to a moderator.
I think this will be of real use, but it should be combined with a CAPTCHA,
because otherwise spammers may "report spam" on anything and everything on the
site.
With a CAPTCHA, it will be really helpful, as it means the moderation task
is more or less crowd-sourced.
regards,
Damodar
On Fri, Aug 3, 2012 at 7:36 AM, Alexander Solla
On Thu, Aug 2, 2012 at 4:46 PM, wren ng thornton
wrote: On 7/30/12 5:35 PM, Henk-Jan van Tuyl wrote:
- Block creation of usernames
  o ending with two or more digits
  o with more than one x or q
  o starting with "buy"
  o longer than 20 characters
  o with more than 4 consonants in a row
As others have mentioned, many of these constraints impose undue burden on users with linguistic heritage outside of western Europe. Creating a decent filter for recognizing legitimate names across the majority of languages is quite difficult.
Though there's no reason this has to be a strong blacklisting of usernames. If there's a willing volunteer (as seems to have been implied), then something like this could serve as a filter requiring manual override. All usernames are available... but some take longer to activate. Of course, there's always the power-to-weight issue for this kind of solution.
Yeah, I volunteered. I'd like to see some kind of random round-robin system to dispatch approval edits to a group of volunteers (i.e., so that I only had to scan 10 or so edits a day for spam; I don't feel inclined to read for correctness). It wouldn't be so bad if there were 10-20 volunteers. I suppose far fewer could manage it if it were just approving user requests (but I also think that would be less effective at stopping spam).
Participants (9):
- Alexander Solla
- Christopher Done
- damodar kulkarni
- Gwern Branwen
- Henk-Jan van Tuyl
- Michael Orlitzky
- Ricardo Wurmus
- timothyhobbs@seznam.cz
- wren ng thornton