Re: [Haskell-cafe] [Haskell] Spam on the Haskell wiki

Henk-Jan van Tuyl

31 Jul 2012

3:05 a.m.

On Mon, 16 Jul 2012 00:03:49 +0200, Henk-Jan van Tuyl wrote:

...

I am willing to do administrator tasks.

...
4. ReCAPTCHA enabled for 'edits adding new, unrecognized external links' - which is all of the spam.

This is already enabled.

The HaskellWiki is still flooded with spam; we should take some measure to reduce the stream severely. Most spam seems to be created (semi-)automated; the pages do not contain links, the usernames end with two digits, most of the time. Some cures I have thought up:

- Verify new wiki accounts, before granting them rights, based on e-mails in the Haskell mailing lists (or subscription of a Haskell mailing list)
- Let new users only change pages, not create new pages
- Block creation of usernames
  o ending with two or more digits
  o with more than one x or q
  o starting with "buy"
  o longer than 20 characters
  o with more than 4 consonants in a row
- Block creation of pages with words in a certain list (Coach, Vuitton, Chanel, handbags, purses, outlet, luggage, "Nike Air Jordan")

Regards,
Henk-Jan van Tuyl

--
http://Van.Tuyl.eu/
http://members.chello.nl/hjgtuyl/tourdemonad.html
Haskell programming
--
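To make the proposed username rules concrete, here is a minimal sketch in Python. It is not taken from any wiki software; the function name `suspicious_reasons` is hypothetical, "consonant" is assumed to mean an ASCII letter other than a, e, i, o, u, and "more than one x or q" is assumed to mean a combined count. It returns the reasons a name would be flagged, so the effect on real names is easy to inspect.

```python
import re

# Assumption: a "consonant" is any ASCII letter other than a, e, i, o, u.
# Five or more in a row means "more than 4 consonants in a row".
CONSONANT_RUN = re.compile(r"[b-df-hj-np-tv-z]{5,}")


def suspicious_reasons(username):
    """Return the list of proposed rules that a username trips."""
    name = username.lower()
    reasons = []
    if re.search(r"[0-9]{2,}\Z", name):
        reasons.append("ends with two or more digits")
    # Assumption: x and q are counted together.
    if name.count("x") + name.count("q") > 1:
        reasons.append("more than one x or q")
    if name.startswith("buy"):
        reasons.append('starts with "buy"')
    if len(username) > 20:
        reasons.append("longer than 20 characters")
    if CONSONANT_RUN.search(name):
        reasons.append("more than 4 consonants in a row")
    return reasons


if __name__ == "__main__":
    for candidate in ["alice", "buyhandbags42", "Srbský", "xiaoqing"]:
        print(candidate, suspicious_reasons(candidate))
```

Running it on a few names shows both the intended hits and the kind of false positives that plain pattern rules can produce, which is why the result is a list of reasons rather than a hard yes/no.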


timothyhobbs＠seznam.cz

31 Jul

4:12 a.m.

New subject: [Haskell] Spam on the Haskell wiki

Can we allow at least 5 consonants in a row? There are enough people with names such as "Srbský" in eastern Europe... In fact, the Czechs can make use of as many as 9 consonants in a row! http://ld.johanesville.net/perlicky/03-jazykova-nej-a-jine-hricky

On a side note, image-based CAPTCHAs can cause problems for blind people.

---------- Original message ----------
From: Henk-Jan van Tuyl
Date: 30 Jul 2012
Subject: Re: [Haskell-cafe] [Haskell] Spam on the Haskell wiki

...

Henk-Jan van Tuyl

5 p.m.

New subject: [Haskell] Spam on the Haskell wiki

On Tue, 31 Jul 2012 00:42:40 +0200, wrote:

...

On a side note, image-based CAPTCHAs can cause problems for blind people.

Google's ReCAPTCHA can read the text to type aloud.

Regards,
Henk-Jan van Tuyl

--
http://Van.Tuyl.eu/
http://members.chello.nl/hjgtuyl/tourdemonad.html
Haskell programming
--

Alexander Solla

4:29 a.m.

New subject: [Haskell] Spam on the Haskell wiki

On Mon, Jul 30, 2012 at 2:35 PM, Henk-Jan van Tuyl wrote:

...

- Verify new wiki accounts, before granting them rights, based on e-mails in the Haskell mailing lists (or subscription of a Haskell mailing list)

This is a nice idea, but I think it will end up moving spam onto the mailing lists. There is hardly any policy in place to keep people out of the mailing lists. Mailing list spam is attractive to spammers, since it all gets mirrored to archive sites all over the place. Not to volunteer others, but how feasible would it be to require credentials from Haskellers.org?

...

- Let new users only change pages, not create new pages

This is good for stopping the creation of walled gardens full of spam. But it won't stop "vandalism" spam, where somebody goes to a page that isn't accessed much and changes it. Does anybody have statistics about how often pages are edited/added? If the numbers aren't too big, I'd volunteer to "moderate" insofar as scanning new edits/adds for spam. Maybe this role should just forward articles with spam on them to a "real" moderator to roll-back. We could even have a "report spam" button on each page, and if enough users click on it (for a given revision), the revision gets forwarded to a moderator.
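The "report spam" idea can be sketched as a small counter keyed by revision. This is only an illustration in Python, not an existing wiki extension: the class name `SpamReports`, the default threshold of 3, and the `forwarded` list standing in for "send to a moderator" are all assumptions. Reports are stored per user so that one person clicking repeatedly counts once.

```python
from collections import defaultdict

# Hypothetical default: how many distinct users must report a revision.
REPORT_THRESHOLD = 3


class SpamReports:
    """Collect "report spam" clicks and forward a revision once enough arrive."""

    def __init__(self, threshold=REPORT_THRESHOLD):
        self.threshold = threshold
        self._reporters = defaultdict(set)  # revision id -> set of reporting users
        self.forwarded = []                 # stand-in for the moderator's queue

    def report(self, revision_id, user):
        """Record one report; return True only when this report triggers forwarding."""
        reporters = self._reporters[revision_id]
        reporters.add(user)  # a set ignores repeat clicks by the same user
        if len(reporters) >= self.threshold and revision_id not in self.forwarded:
            self.forwarded.append(revision_id)
            return True
        return False


if __name__ == "__main__":
    reports = SpamReports(threshold=2)
    for user in ["ann", "ann", "bob"]:
        print(user, reports.report(101, user))
    print(reports.forwarded)
```

A real deployment would also need to decide who may report (logged-in users only, for instance), since the threshold is only meaningful if reporters are hard to fake.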

...

- Block creation of usernames o ending with two or more digits o with more than one x or q o starting with "buy" o longer than 20 characters o with more than 4 consonants in a row

I don't see this providing any security against spam, and I'm thinking it will take longer to implement than it will take for a spammer to fix his scripts in response.

...

- Block creation of pages with words in a certain list (Coach, Vuitton, Chanel, handbags, purses, outlet, luggage, "Nike Air Jordan")

Same.

Henk-Jan van Tuyl

5:10 p.m.

New subject: [Haskell] Spam on the Haskell wiki

On Tue, 31 Jul 2012 00:59:28 +0200, Alexander Solla wrote:

...

Does anybody have statistics about how often pages are edited/added?

In the last seven days, there were 251 new (user)pages created; there was no spam added to existing pages.

I also discovered spam added to pages at http://hackage.haskell.org/trac/hackage/ A search for "rio bouygues"[0] gave 118 results, "virgin mobile" gave 124 results; there are probably more.

Regards,
Henk-Jan van Tuyl

[0] http://hackage.haskell.org/trac/hackage/search?q=%22rio+bouygues%22&noquickjump=1&ticket=on&milestone=on&wiki=on

--
http://Van.Tuyl.eu/
http://members.chello.nl/hjgtuyl/tourdemonad.html
Haskell programming
--

Gwern Branwen

3 Aug

7:29 p.m.

New subject: [Haskell] Spam on the Haskell wiki

On Mon, Jul 30, 2012 at 6:59 PM, Alexander Solla wrote:

...

We could even have a "report spam" button on each page, and if enough users click on it (for a given revision), the revision gets forwarded to a moderator.

This would be useless. The problem is not detecting spam, since that's quite trivial: it's very hard to miss. The problem is that the moderator (ie. me) is already overworked. The spam needs to be reduced to begin with, not detected.

-- gwern http://www.gwern.net

damodar kulkarni

4 Aug

8:04 a.m.

New subject: [Haskell] Spam on the Haskell wiki

Hi Gwern,

First of all, thanks for your patience.

I am willing to do administrator tasks.

...

4. ReCAPTCHA enabled for 'edits adding new, unrecognized external links' - which is all of the spam.

This is already enabled.

I guess the problem may be due to ReCAPTCHA (http://www.google.com/recaptcha/learnmore), so you could choose a custom-built CAPTCHA that is more difficult to crack. You may find some open source CAPTCHA systems better than ReCAPTCHA: http://jcaptcha.sourceforge.net/

To foil relay attacks on the CAPTCHA, you may try early timeouts and/or increasing the length of the CAPTCHA text. This may mean more trouble and nuisance for legitimate users, but I guess the Haskellers will be willing to pay this "small" price for a better web-site experience. :)

Relay attacks: remember that there are human solvers employed in countries like India and China, so any human-solvable CAPTCHA will fail to work as desired. http://en.wikipedia.org/wiki/CAPTCHA#Human_solvers

...

The problem is not detecting spam, since that's quite trivial: it's very hard to miss.

Thanks for providing more info. So, another question: if detecting spam is trivial, then why not just send the detected spam to the trash directly, without any human inspection? This may mean some trouble for posters due to "false positives", but the moderator's job can be reduced to some extent.

I hope this is useful. If not, please forgive me for causing more reading trouble for you.

Regards,
-Damodar

On Fri, Aug 3, 2012 at 7:29 PM, Gwern Branwen wrote:

...

On Mon, Jul 30, 2012 at 6:59 PM, Alexander Solla wrote:

...
We could even have a "report spam" button on each page, and if enough users click on it (for a given revision), the revision gets forwarded to a moderator.

This would be useless. The problem is not detecting spam, since that's quite trivial: it's very hard to miss. The problem is that the moderator (ie. me) is already overworked. The spam needs to be reduced to begin with, not detected.

-- gwern http://www.gwern.net

_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Gwern Branwen

10:45 p.m.

New subject: [Haskell] Spam on the Haskell wiki

On Fri, Aug 3, 2012 at 10:34 PM, damodar kulkarni wrote:

...

So, another doubt, if detecting spam is trivial, then why not just send the detected spam to trash directly without any human inspection? This may mean some trouble for the posters due to "false positives"; but the moderator's job can be reduced to some extent.

Which is pretty much what this whole thread is about: asking that the sysadmins Do Something about this trivial yet overwhelming spam.

-- gwern http://www.gwern.net

Christopher Done

3 Aug

9:56 p.m.

New subject: [Haskell] Spam on the Haskell wiki

Has anyone run SpamAssassin on the offending content created by the spammers? I've been using it on hpaste and it's been very effective at cutting out the crap.

On 4 August 2012 19:15, Gwern Branwen wrote:

...

On Fri, Aug 3, 2012 at 10:34 PM, damodar kulkarni wrote:

...

So, another doubt, if detecting spam is trivial, then why not just send the detected spam to trash directly without any human inspection? This may mean some trouble for the posters due to "false positives"; but the moderator's job can be reduced to some extent.

Which is pretty much what this whole thread is about: asking that the sysadmins Do Something about this trivial yet overwhelming spam.

-- gwern http://www.gwern.net


Michael Orlitzky

31 Jul

5:31 a.m.

New subject: [Haskell] Spam on the Haskell wiki

On 07/30/2012 05:35 PM, Henk-Jan van Tuyl wrote:

...

On Mon, 16 Jul 2012 00:03:49 +0200, Henk-Jan van Tuyl wrote:

...
I am willing to do administrator tasks.

...
4. ReCAPTCHA enabled for 'edits adding new, unrecognized external links' - which is all of the spam.

This is already enabled.

The HaskellWiki is still flooded with spam; we should take some measure to reduce the stream severely. Most spam seems to be created (semi-)automated; the pages do not contain links, the usernames end with two digits, most of the time. Some cures I have thought up:

There are two (easy) things that will make a huge dent in the automated stuff.

1. Add a fake field, hidden through CSS, labeled something like "You must leave this field blank to submit the form" (for non-visual browsers). Put it on every page with a submit button. If it isn't empty, don't process the submission. You can give it a /name/ that sounds tempting, though.

2. Force previews. If the bots are targeted at your wiki software and you modify it to preview all submissions, the bots will stop working.
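The server-side half of the hidden-field trick from point 1 could look roughly like the Python sketch below. The field name `website` and the function name `is_probably_bot` are hypothetical choices for illustration, and the form is assumed to arrive as a plain dictionary of submitted values; the real wiki software would have its own request object.

```python
# Hypothetical, tempting-sounding name for the field that is hidden via CSS.
HONEYPOT_FIELD = "website"


def is_probably_bot(form):
    """Return True when the hidden field was filled in.

    A human using a normal browser never sees the field and submits it empty;
    a script that fills in every input it finds does not.
    """
    value = form.get(HONEYPOT_FIELD, "")
    return bool(value.strip())


if __name__ == "__main__":
    human = {"title": "Monad tutorial", "body": "...", "website": ""}
    bot = {"title": "Cheap bags", "body": "...", "website": "http://spam.example"}
    print(is_probably_bot(human), is_probably_bot(bot))
```

The check treats a missing or whitespace-only field as empty, on the assumption that rejecting those would risk blocking legitimate clients; a stricter variant could require the field to be present.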

Ricardo Wurmus

5:32 a.m.

New subject: [Haskell] Spam on the Haskell wiki

On 31 July 2012 05:35, Henk-Jan van Tuyl wrote:

...

... with more than one x or q

This would exclude legitimate Chinese (pinyin) usernames for not much gain.

wren ng thornton

3 Aug

5:16 a.m.

New subject: [Haskell] Spam on the Haskell wiki

On 7/30/12 5:35 PM, Henk-Jan van Tuyl wrote:

...

- Block creation of usernames o ending with two or more digits o with more than one x or q o starting with "buy" o longer than 20 characters o with more than 4 consonants in a row

As others have mentioned, many of these constraints impose undue burden on users with linguistic heritage outside of western Europe. Creating a decent filter for recognizing legitimate names across the majority of languages is quite difficult.

Though there's no reason this has to be a strong blacklisting of usernames. If there's a willing volunteer (as seems to have been implied), then something like this could serve as a filter requiring manual override. All usernames are available... but some take longer to activate. Of course, there's always the power-to-weight issue for this kind of solution.

--
Live well,
~wren

Alexander Solla

7:36 a.m.

New subject: [Haskell] Spam on the Haskell wiki

On Thu, Aug 2, 2012 at 4:46 PM, wren ng thornton wrote:

...

On 7/30/12 5:35 PM, Henk-Jan van Tuyl wrote:

...
- Block creation of usernames o ending with two or more digits o with more than one x or q o starting with "buy" o longer than 20 characters o with more than 4 consonants in a row

As others have mentioned, many of these constraints impose undue burden on users with linguistic heritage outside of western Europe. Creating a decent filter for recognizing legitimate names across the majority of languages is quite difficult.

Though there's no reason this has to be a strong blacklisting of usernames. If there's a willing volunteer (as seems to have been implied), then something like this could serve as a filter requiring manual override. All usernames are available... but some take longer to activate. Of course, there's always the power-to-weight issue for this kind of solution.

Yeah, I volunteered. I'd like to see some kind of random round-robin system to dispatch edits for approval to a group of volunteers (i.e., if I only had to scan 10 or so edits for spam a day -- I don't feel inclined to read for correctness). It wouldn't be so bad if there were 10-20 volunteers. I suppose far fewer could do it if it were just approving user requests (but I also think that would be less effective at stopping spam).
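A "random round-robin" dispatch of this kind might be sketched as follows in Python. Everything here is an assumption made for illustration: the function name `assign_edits`, the idea that edits are identified by plain ids, and the choice to shuffle the volunteer order once and then deal edits out in turn so that nobody gets more than one edit above anyone else.

```python
import itertools
import random


def assign_edits(edit_ids, volunteers, rng=None):
    """Deal pending edits out to volunteers in a randomly chosen round-robin order."""
    if not volunteers:
        raise ValueError("at least one volunteer is needed")
    rng = rng or random.Random()
    order = list(volunteers)
    rng.shuffle(order)  # random starting order, so the same person is not always first
    queues = {volunteer: [] for volunteer in order}
    # cycle() repeats the shuffled order; zip() stops when the edits run out.
    for edit_id, volunteer in zip(edit_ids, itertools.cycle(order)):
        queues[volunteer].append(edit_id)
    return queues


if __name__ == "__main__":
    volunteers = ["volunteer%d" % i for i in range(1, 11)]
    for name, edits in assign_edits(range(25), volunteers).items():
        print(name, edits)
```

With 25 pending edits and 10 volunteers, each person ends up with two or three edits to scan, which is in the range of the "10 or so edits a day" workload described above.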

damodar kulkarni

8:10 a.m.

New subject: [Haskell] Spam on the Haskell wiki

...

We could even have a "report spam" button on each page, and if enough users click on it (for a given revision), the revision gets forwarded to a moderator.

I think this will be of real use, but it should be combined with a CAPTCHA, because otherwise spammers may "report spam" for everything and anything on the site. With a CAPTCHA it will be really helpful, as it means the moderation task is more or less crowd-sourced.

Regards,
Damodar

On Fri, Aug 3, 2012 at 7:36 AM, Alexander Solla wrote:

...

On Thu, Aug 2, 2012 at 4:46 PM, wren ng thornton wrote:

...
On 7/30/12 5:35 PM, Henk-Jan van Tuyl wrote:

...
- Block creation of usernames o ending with two or more digits o with more than one x or q o starting with "buy" o longer than 20 characters o with more than 4 consonants in a row

As others have mentioned, many of these constraints impose undue burden on users with linguistic heritage outside of western Europe. Creating a decent filter for recognizing legitimate names across the majority of languages is quite difficult.

Though there's no reason this has to be a strong blacklisting of usernames. If there's a willing volunteer (as seems to have been implied), then something like this could serve as a filter requiring manual override. All usernames are available... but some take longer to activate. Of course, there's always the power-to-weight issue for this kind of solution.

Yeah, I volunteered. I'd like to see some kind of random round-robin system to dispatch edits for approval to a group of volunteers (i.e., if I only had to scan 10 or so edits for spam a day -- I don't feel inclined to read for correctness). It wouldn't be so bad if there were 10-20 volunteers. I suppose far fewer could do it if it were just approving user requests (but I also think that would be less effective at stopping spam).



13 comments

9 participants

participants (9)

Alexander Solla
Christopher Done
damodar kulkarni
Gwern Branwen
Henk-Jan van Tuyl
Michael Orlitzky
Ricardo Wurmus
timothyhobbs＠seznam.cz
wren ng thornton