2018 state of Haskell survey results

Hello! It took a little longer than I expected, but I am nearly ready to announce the 2018 state of Haskell survey results. Some community members have expressed interest in seeing the announcement post before it's published. If you are one of those people, you can see the results here: https://github.com/tfausak/tfausak.github.io/blob/7e4937e284a3068add9e9af6b5...

If you would like to suggest changes to the announcement post, please respond to this email, send me an email directly, or reply to this pull request on GitHub: https://github.com/tfausak/tfausak.github.io/pull/148

I plan on publishing the results tomorrow. Once the results are published, the post is by no means set in stone. I will happily accept suggestions from anyone at any time.

Thank you!

This is interesting, but I’m thoroughly confused. Over 2,500 people said they took last year’s survey, but it only had roughly 1,300 respondents?

This isn't TOO surprising to me. There's been a lot of confusion lately
about different surveys by several different parties. It seems likely that
several hundred people took some other survey, and thought it was last
year's version of this one. Also, people have extremely bad memories of
what they did last year. Some hundreds more probably remember seeing last
year's survey and being interested, but they've forgotten that they never
took it.

I also noticed a number of other bizarre statistical anomalies when looking at the full results. I know this is a bit much to ask — but if you could rerun the statistics filtering out people who did not give demographic information (i.e. country of origin, education, etc.), I think the results will change drastically. By all statistical logic, this should _not_ be the case, and it points to a serious problem.
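To make the ask concrete, here is roughly the kind of split I mean over the raw CSV export. The file name and column names are placeholders, and the naive comma splitting ignores quoted fields, so treat it as a sketch only:

```haskell
import Data.List (isInfixOf, partition)
import Data.List.Split (splitOn)

main :: IO ()
main = do
  (header : rows) <- lines <$> readFile "haskell-survey-2018.csv"
  let cols  = splitOn "," header
      parse = zip cols . splitOn ","
      -- hypothetical column names; the real export's headers will differ
      demographicCols = ["country", "education"]
      hasDemo row =
        any (\c -> maybe False (not . null) (lookup c (parse row))) demographicCols
      (demo, noDemo) = partition hasDemo rows
      -- e.g. how many rows in a group mention "Stack" anywhere
      mentionsStack = length . filter ("Stack" `isInfixOf`)
  putStrLn ("with demographics:    " ++ show (length demo)
            ++ ", mentioning Stack: " ++ show (mentionsStack demo))
  putStrLn ("without demographics: " ++ show (length noDemo)
            ++ ", mentioning Stack: " ++ show (mentionsStack noDemo))
```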
In particular, this drops the results by a huge amount — only 1,200 or so remain. However, the remaining results tend to make a lot more sense. For example, of the “no demographics” group, there are 713 users who claim to develop with Notepad++, but all of these say they develop on Mac and Linux, and none on Windows — which is impossible, as Notepad++ is a Windows program. Further, if you drop the “no demographics” group, you find that almost everyone uses at least GHC 8.0.2, while in the “no demographics” group a stunning number of people claim to be on 7.8.3. Even more bizarrely, people claim to be using the 7.8 series while only having used Haskell for less than one year. And people claim to have used Haskell for “one week to one month” and also to be advanced and expert users!

The differences continue and defy all probability. Of the “no demographics” group, almost everyone dislikes the new release schedule. Of the “demographics” group, there are answers that like it, were not aware of it, or are indifferent, but almost nobody dislikes it. There is naturally a difference in the proportions of cabal/stack and hackage/stackage responses as well.

There are a lot of other things I could point to as well. But, bluntly put, I think that some disaffected party or parties wrote a crude script and submitted over 3,000 fake responses. Luckily for us, they were not very smart and made some obvious errors, so in this case we can weed out the bad responses (although, sadly, losing at least a few real ones as well). However, assuming this party isn’t entirely stupid, it doesn’t bode well for future surveys, as they may get at least slightly less dumb in the future if they decide to keep it up :-/
—Gershom

Finally, if anyone doubts some sort of scripted attack on the survey: is it really possible that over 200 people each want NPlusKPatterns, DatatypeContexts, or JavaScriptFFI enabled by default!? Even a bot should make _some_ sense!
—gershom

Good spot, Gershom. Maybe it would be revealing to look at the times that responses were received for the no-demographics group?

Sadly, it looks like a Cabal/Stack thing. Of the responses with a country
provided, 618 of 1226 claim to use Cabal, and 948 of 1226 claim to use
Stack. Of the responses with no country, only 35 of 3868 claim to use
Cabal, while 3781 of the 3868 claim to use Stack. Assuming independence,
you'd expect that last number to be about 50, meaning there are probably
around 3700 fake responses generated just to answer "Stack".
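Spelled out, the arithmetic behind that estimate (assuming the expectation comes from holding the with-country Stack-to-Cabal ratio fixed, which is the reading that matches the numbers above):

```haskell
-- Counts quoted above. If tool choice were independent of whether a country
-- was given, the no-country group's Stack:Cabal ratio would match 948:618.
expectedStack :: Double
expectedStack = 35 * (948 / 618)    -- roughly 53.7, i.e. "about 50"

excessStack :: Double
excessStack = 3781 - expectedStack  -- roughly 3727, i.e. "around 3700 fake responses"
```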
To partially answer Simon's question, the flood of no-demographics
responses started on November 2, around the 750-response point, and
continued unabated through the close of the survey. And, indeed, looking
at just the first 750 responses gives similar distributions to what we get
by ignoring the no-demographic responses. For example, of the first 750
responses, 359 claim to use Cabal, and 568 claim to use Stack.

Thanks for finding those anomalies, Gershom! I'm disappointed that someone submitted bogus responses, apparently to tip the scales of Cabal versus Stack. I intend to identify those responses and exclude them from the results. The work you've done so far will help a great deal in finding them.

You said that there are about 1,200 responses with demographic information. That makes sense considering the number of submissions I got last year. Also, there are 1,185 responses that included an answer to at least one of the free-response questions. So perhaps whoever wrote the script didn't bother to put an answer for those types of questions.

Unfortunately I do not have precise submission times or IP address information about submissions. Beyond what's in the CSV, the only other thing I have is (some) email addresses.

Fortunately I wrote a script to output all the charts and tables from the survey responses. Once I've identified the problematic responses, I should be able to update the script to ignore them and regenerate all the output.

I have filtered out the bogus responses and re-generated all the charts and tables. You can see the updated results here: https://github.com/tfausak/tfausak.github.io/blob/ee29da5bd8389c19763ac2b4db...

Note that until I post the results on my blog, they are not published. Please don't share the preliminary results on social media!

Hi Taylor. I think we're closer to the real results here, but I'm still pretty sure that there are a fair number of phony responses. In particular, looking at your filter function, I don't think that _all_ bogus responses said "I dislike it" with regard to the GHC release schedule. A fair number that hit all the other criteria also seem to have left it blank. I suspect this will be enough to do the trick, but can't be sure...
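Roughly the shape I have in mind, with a made-up Response type standing in for a CSV row and made-up criteria, just to show where the blank answer fits:

```haskell
-- Illustrative only: a hypothetical Response record and a hypothetical set of
-- criteria. The point is the last line: accept a blank release-schedule
-- answer as well as "I dislike it".
data Response = Response
  { country                :: String
  , education              :: String
  , freeFormAnswers        :: [String]
  , releaseScheduleOpinion :: String
  }

looksBogus :: Response -> Bool
looksBogus r =
     null (country r)
  && null (education r)
  && all null (freeFormAnswers r)
  && releaseScheduleOpinion r `elem` ["I dislike it", ""]
```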
This attempted sabotage of the survey is really frustrating and disappointing.
-g

If I could make a suggestion... although this is at the forefront of our
minds right now, I don't think that you want the attempted hack of survey
responses to be THE big news about the survey. I have no doubt it will
garner lots of attention anyway, and you are certainly right to explain
what happened and what your methodology was; but I think it would be better
to state the legitimate results first... i.e., by saying "This year we
received 1,679 [*] responses, which is quite an improvement.", and waiting
until later to explain about the bogus submissions. Hopefully, then, more
of the reaction will be around the data this provides, and less around ugly
drama with what seems like only ONE bad actor.

What if the announcement mentioned a large number of potentially bogus responses, explained the grounds for this conclusion, with a new survey conducted early next year? The next survey would then need to be done differently from this one somehow. To improve the reliability, some authentication may be necessary.

Maybe the Stack and Cabal questions could be split out into separate surveys, conducted by their maintainers through their own channels? I am not sure how much value there is in exact numbers of users of Stack or Cabal. Both groups are large enough, and the maintainers of both tools are aware of their usage stats. Is either tool likely to be influenced by this survey?

Great catch, Gershom! There are indeed about 300 responses that tick all the boxes except for disliking the new GHC release schedule. The main thing the attacker seemed to be interested in was over-representing Stack and Stackage. Also, bizarrely, Java.

That brings the number of bogus responses up to 3,735, which puts the number of legitimate responses at 1,361. For context, last year's survey asked far fewer questions and had 1,335 responses.

Just wanted to add in: good catch Gershom on identifying the problem, and thank you Taylor for working to remove them from the report.

Ok, I updated the function that checks for bad responses, re-ran the script, and updated the announcement along with all the assets (charts, tables, and CSV). Hopefully it's the last time, as I can't justify spending much more time on this.

https://github.com/tfausak/tfausak.github.io/blob/6f9991758ffeed085c45dd97e4...

The language extensions section doesn’t appear to be sorted properly.
Outside of that, I think that these results are looking much better and any
effort to find any additional outliers is probably not worth it for the
moment. Thanks for your work on this, and I appreciate you being responsive
and attentive when problems with the data were pointed out. There’s
certainly some interesting and helpful information to be gleaned from this
data.
Cheers,
Gershom

Oops, the ordering of the answer choices is manual because some questions have a natural order while others should just be most to least popular. I've made another run through to make sure everything is sorted properly. I'll probably hit publish in the next half hour or so unless there are any objections.

https://github.com/tfausak/tfausak.github.io/blob/fce97d07c369856d4c05b756c4...
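To illustrate what I mean by manual ordering (hypothetical types, not the actual script): each question carries either an explicit answer order or a flag meaning sort by popularity.

```haskell
import Data.List (sortOn)
import Data.Ord (Down (..))

-- Hypothetical per-question ordering spec: either an explicit, natural order
-- of answers (e.g. "Less than 1 year" .. "More than 10 years"), or sort the
-- answers by how many respondents picked them, most popular first.
data AnswerOrder
  = NaturalOrder [String]
  | ByPopularity

orderAnswers :: AnswerOrder -> [(String, Int)] -> [(String, Int)]
orderAnswers (NaturalOrder order) counts =
  [ (a, n) | a <- order, Just n <- [lookup a counts] ]
orderAnswers ByPopularity counts =
  sortOn (Down . snd) counts
```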

I have not analyzed the data myself, but I wonder how we jumped to the conclusion that the troll was trying to promote Stack. Is there statistical data that supports that conclusion? For example, just reading this thread, it sounds like the bogus responses also really don't like the new release schedule. Maybe the troll wants the old release schedule back and was just lazy about programming the tool to vary the stack/cabal question answers adequately.

Given the contention around cabal vs stack, I agree that sociological concerns suggest that the troll meant to tilt those scales. But I wouldn't want a public accusation without at least some statistical analysis that independently supports that conclusion.

In any case, thanks to all for putting this together!

Richard
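P.S. Concretely, the kind of independent check I have in mind, as a sketch only (the inputs here are made-up lists of the answers to one question, one list for the flagged responses and one for the rest): rank each answer by how much its share differs between the two groups, and see whether the Stack and Stackage answers really do top the list by a wide margin.

```haskell
import Data.List (sortOn)
import Data.Ord (Down (..))
import qualified Data.Map.Strict as Map

-- Share of responses in a group that chose each answer.
share :: [String] -> Map.Map String Double
share xs =
  let n = fromIntegral (length xs)
  in Map.map (/ n) (Map.fromListWith (+) [ (x, 1) | x <- xs ])

-- Answers sorted by how strongly the flagged group over- or under-represents
-- them relative to everyone else.
mostSkewedAnswers :: [String] -> [String] -> [(String, Double)]
mostSkewedAnswers flagged rest =
  let pf = share flagged
      pr = share rest
      answers = Map.keys (Map.union pf pr)
      diff a  = Map.findWithDefault 0 a pf - Map.findWithDefault 0 a pr
  in sortOn (Down . abs . snd) [ (a, diff a) | a <- answers ]
```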
On Nov 18, 2018, at 4:31 PM, Taylor Fausak
wrote: Oops, the ordering of the answer choices is manual because some questions have a natural order while others should just be most to least popular. I've made another run through to make sure everything is sorted properly. I'll probably hit publish in the next half hour or so unless there are any objections.
https://github.com/tfausak/tfausak.github.io/blob/fce97d07c369856d4c05b756c4...
On Sun, Nov 18, 2018, at 3:07 PM, Gershom B wrote:
The language extensions section doesn’t appear to be sorted properly. Outside of that, I think that these results are looking much better and any effort to find any additional outliers is probably not worth it for the moment. Thanks for your work on this, and I appreciate you being responsive and attentive when problems with the data were pointed out. There’s certainly some interesting and helpful information to be gleaned from this data.
Cheers, Gershom
On November 18, 2018 at 2:55:10 PM, Taylor Fausak (taylor@fausak.me) wrote:
Ok, I updated the function that checks for bad responses, re-ran the script, and updated the announcement along with all the assets (charts, tables, and CSV). Hopefully it's the last time, as I can't justify spending much more time on this.
https://github.com/tfausak/tfausak.github.io/blob/6f9991758ffeed085c45dd97e4...
On Sun, Nov 18, 2018, at 2:32 PM, Michael Snoyman wrote:
Just wanted to add in: good catch Gershom on identifying the problem, and thank you Taylor for working to remove them from the report.
On 18 Nov 2018, at 21:17, Taylor Fausak
wrote: Great catch, Gershom! There are indeed about 300 responses that tick all the boxes except for disliking the new GHC release schedule. The main thing the attacker seemed to be interested in was over-representing Stack and Stackage. Also, bizarrely, Java.
That brings the number of bogus responses up to 3,735, which puts the number of legitimate responses at 1,361. For context, last year's survey asked far fewer questions and had 1,335 responses.
On Sun, Nov 18, 2018, at 1:26 PM, Imants Cekusins wrote:
What if the announcement mentioned a large number of potentially bogus responses, explained the grounds for this conclusion, with a new survey conducted early next year?
The next survey would then need to be done differently from this one somehow. To improve the reliability, some authentication may be necessary.
Maybe Stack, Cabal questions could be grouped as separate distinct surveys, conducted by their maintainers through own channels?
Not sure how much value is in exact numbers of users of Stack or Cabal. Both groups are large enough. The maintainers of both groups are aware about usage stats.
Is either library likely to be influenced by this survey?

For example, just reading this thread, it sounds like the bogus responses also really don't like the new release schedule. Maybe the troll wants the old release schedule back and was just lazy about programming the tool to vary the stack/cabal question answers adequately.
There is another scenario, though, which should caution against making
official statements about motivation. There was a set of people who worked
very hard while the survey was open to preemptively cast doubt on its
motivation and goals. It may be that someone was mainly attempting to
sabotage the survey results themselves, rather than taking a side in any
specific dispute. Of course, had the results been published claiming that
a mere 12% of Haskellers use Cabal, it would have been immediately
dismissed by many people as obviously biased, which would have achieved
that goal, too.
I think Taylor's post handled this well, saying what we know to be true,
that the attack targeted divisive issues, but without drawing unnecessary
conclusions.
On Sun, Nov 18, 2018 at 11:21 PM Richard Eisenberg
I have not analyzed the data myself, but I wonder how we jumped to the conclusion that the troll was trying to promote Stack. Is there statistical data that supports that conclusion? For example, just reading this thread, it sounds like the bogus responses also really don't like the new release schedule. Maybe the troll wants the old release schedule back and was just lazy about programming the tool to vary the stack/cabal question answers adequately.
Given the contention around cabal vs stack, I agree that sociological concerns suggest that the troll meant to tilt those scales. But I wouldn't want a public accusation without at least some statistical analysis that independently supports that conclusion.
In any case, thanks to all for putting this together!
Richard
On Nov 18, 2018, at 4:31 PM, Taylor Fausak
wrote: Oops, the ordering of the answer choices is manual because some questions have a natural order while others should just be most to least popular. I've made another run through to make sure everything is sorted properly. I'll probably hit publish in the next half hour or so unless there are any objections.
https://github.com/tfausak/tfausak.github.io/blob/fce97d07c369856d4c05b756c4...
On Sun, Nov 18, 2018, at 3:07 PM, Gershom B wrote:
The language extensions section doesn’t appear to be sorted properly. Outside of that, I think that these results are looking much better and any effort to find any additional outliers is probably not worth it for the moment. Thanks for your work on this, and I appreciate you being responsive and attentive when problems with the data were pointed out. There’s certainly some interesting and helpful information to be gleaned from this data.
Cheers, Gershom
On November 18, 2018 at 2:55:10 PM, Taylor Fausak (taylor@fausak.me) wrote:
Ok, I updated the function that checks for bad responses, re-ran the script, and updated the announcement along with all the assets (charts, tables, and CSV). Hopefully it's the last time, as I can't justify spending much more time on this.
https://github.com/tfausak/tfausak.github.io/blob/6f9991758ffeed085c45dd97e4...
On Sun, Nov 18, 2018, at 2:32 PM, Michael Snoyman wrote:
Just wanted to add in: good catch Gershom on identifying the problem, and thank you Taylor for working to remove them from the report.
On 18 Nov 2018, at 21:17, Taylor Fausak
wrote: Great catch, Gershom! There are indeed about 300 responses that tick all the boxes except for disliking the new GHC release schedule. The main thing the attacker seemed to be interested in was over-representing Stack and Stackage. Also, bizarrely, Java.
That brings the number of bogus responses up to 3,735, which puts the number of legitimate responses at 1,361. For context, last year's survey asked far fewer questions and had 1,335 responses.
On Sun, Nov 18, 2018, at 1:26 PM, Imants Cekusins wrote:
What if the announcement mentioned a large number of potentially bogus responses, explained the grounds for this conclusion, with a new survey conducted early next year?
The next survey would then need to be done differently from this one somehow. To improve the reliability, some authentication may be necessary.
Maybe Stack, Cabal questions could be grouped as separate distinct surveys, conducted by their maintainers through own channels?
Not sure how much value is in exact numbers of users of Stack or Cabal. Both groups are large enough. The maintainers of both groups are aware about usage stats.
Is either library likely to be influenced by this survey?

On Nov 18, 2018, at 11:56 PM, Chris Smith
wrote: I think Taylor's post handled this well, saying what we know to be true, that the attack targeted divisive issues, but without drawing unnecessary conclusions.
I agree 100%. I hadn't read the post before writing my email on this thread. Thanks, Taylor. Richard

Hello Richard,
On Sun, Nov 18, 2018 at 11:20:52PM -0500, Richard Eisenberg wrote:
I have not analyzed the data myself, but I wonder how we jumped to the conclusion that the troll was trying to promote Stack. Is there statistical data that supports that conclusion? For example, just reading this thread, it sounds like the bogus responses also really don't like the new release schedule. Maybe the troll wants the old release schedule back and was just lazy about programming the tool to vary the stack/cabal question answers adequately.
If you filter the results for the (impossible) "linux/mac AND notepad++" combination, you can check the pattern-of-action of the troll. Every demographic question is skipped; every "write in" answer is skipped; all the other questions are filled in with a random value, bar the "build tools" one and the "release schedule" one, both having a constant value.
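(A minimal sketch of that filter, for anyone who wants to reproduce the check against the published CSV. It assumes a cassava-readable export; the file name and column names below are placeholders rather than the survey's actual headers.)

```haskell
{-# LANGUAGE OverloadedStrings #-}

module Main (main) where

import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as BL
import qualified Data.Csv as Csv
import qualified Data.HashMap.Strict as HM
import qualified Data.Vector as V
import Data.List (sortOn)
import Data.Ord (Down (..))

-- Each response is a map from column name to raw answer text.
type Row = Csv.NamedRecord

-- Flag responses that claim a Unix-like OS together with Notepad++,
-- a Windows-only editor. The column names are assumptions.
suspect :: Row -> Bool
suspect row =
  let get name = HM.lookupDefault "" name row
      os = get "Operating system" -- assumed column name
      editor = get "Editor"       -- assumed column name
  in (os == "Linux" || os == "macOS") && editor == "Notepad++"

-- Count how often each answer to one column appears in a set of rows.
tally :: B.ByteString -> [Row] -> [(B.ByteString, Int)]
tally column rows =
  sortOn (Down . snd) . HM.toList $
    HM.fromListWith (+) [ (HM.lookupDefault "" column r, 1) | r <- rows ]

main :: IO ()
main = do
  csv <- BL.readFile "survey-responses.csv" -- assumed file name
  case Csv.decodeByName csv of
    Left err -> putStrLn ("CSV parse error: " ++ err)
    Right (_header, rows) -> do
      let bad = filter suspect (V.toList rows)
      putStrLn (show (length bad) ++ " responses match the impossible combination")
      mapM_ print (tally "Build tools" bad) -- assumed column name
```

Swapping in the real headers from the published CSV should be the only change needed to inspect other columns the same way.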

On Sun, Nov 18, 2018 at 11:20 PM Richard Eisenberg
I have not analyzed the data myself, but I wonder how we jumped to the conclusion that the troll was trying to promote Stack. Is there statistical data that supports that conclusion? For example, just reading this thread, it sounds like the bogus responses also really don't like the new release schedule. Maybe the troll wants the old release schedule back and was just lazy about programming the tool to vary the stack/cabal question answers adequately.
Roughly 90% of the bogus responses disliked the new ghc schedule and 10% left the answer blank. As far as I know, 100% of the bogus responses said they used stack exclusively. The answers to almost every other question (except, I think, for targeted platform?) varied significantly (although according to either uniform, linear, or normal distributions for the most part). So as guesses go, this seems pretty strong.
I will also say, though there's speculation about "false flags" and other silliness floating around that I personally have a very good guess as to who did this. There's one well-known troll who has these preoccupations and is known for creating serial sockpuppet accounts, and is just the right amount of obsessed to do something like this. A few of the bogus responses actually had comments, and the comments were all written in a voice that was unmistakeable as this troll as well. Occam's razor seems to apply.
Finally, let me add why I don't think this was a "false flag" -- while there were enough telltale markers that the fake answers could seem to be detected, I don't think this was on purpose. There was _too much_ effort put into distributions of other choices, etc. If they had wanted the fakes to be detected they would have left much stronger evidence. Rather, from a forensic standpoint, this seems pretty clear to me that the pattern of data is of someone _trying_ to cover their tracks, but just making four or five errors which I could assemble into a pattern. If they hadn't made those errors -- likely based on bad priors about what the organic data would be that theirs would need to "mesh" into -- then I think the deception would have been much harder to detect.
--Gershom
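(A sketch of that kind of distribution check, under the same caveats as the earlier snippet: the flagging rule and column names are placeholders, not the filter actually used for the published results. It prints, for each question, the share of flagged responses that all gave the same answer; near-constant columns like the build-tool and release-schedule ones described above would stand out at the top.)

```haskell
{-# LANGUAGE OverloadedStrings #-}

module Main (main) where

import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as BL
import qualified Data.Csv as Csv
import qualified Data.HashMap.Strict as HM
import qualified Data.Vector as V
import Data.List (sortOn)
import Data.Ord (Down (..))

type Row = Csv.NamedRecord

-- Placeholder rule for flagging bogus rows; the real filter used for the
-- published results lives in the survey repository's scripts.
flagged :: Row -> Bool
flagged row =
  HM.lookupDefault "" "Build tools" row == "stack"              -- assumed column/answer
    && HM.lookupDefault "" "GHC release schedule" row /= "Like" -- assumed column/answer

-- Fraction of rows that share the single most common answer to a column.
-- Values near 1.0 mark the "constant" questions; organic answers vary more.
modalShare :: B.ByteString -> [Row] -> Double
modalShare _ [] = 0
modalShare column rows =
  let counts = HM.elems (HM.fromListWith (+)
        [ (HM.lookupDefault "" column r, 1 :: Int) | r <- rows ])
  in fromIntegral (maximum counts) / fromIntegral (length rows)

main :: IO ()
main = do
  csv <- BL.readFile "survey-responses.csv" -- assumed file name
  case Csv.decodeByName csv of
    Left err -> putStrLn ("CSV parse error: " ++ err)
    Right (header, rows) -> do
      let bad = filter flagged (V.toList rows)
          shares = [ (col, modalShare col bad) | col <- V.toList header ]
      mapM_ print (sortOn (Down . snd) shares)
```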
Given the contention around cabal vs stack, I agree that sociological concerns suggest that the troll meant to tilt those scales. But I wouldn't want a public accusation without at least some statistical analysis that independently supports that conclusion.
In any case, thanks to all for putting this together!
Richard
On Nov 18, 2018, at 4:31 PM, Taylor Fausak
wrote: Oops, the ordering of the answer choices is manual because some questions have a natural order while others should just be most to least popular. I've made another run through to make sure everything is sorted properly. I'll probably hit publish in the next half hour or so unless there are any objections.
https://github.com/tfausak/tfausak.github.io/blob/fce97d07c369856d4c05b756c4...
On Sun, Nov 18, 2018, at 3:07 PM, Gershom B wrote:
The language extensions section doesn’t appear to be sorted properly. Outside of that, I think that these results are looking much better and any effort to find any additional outliers is probably not worth it for the moment. Thanks for your work on this, and I appreciate you being responsive and attentive when problems with the data were pointed out. There’s certainly some interesting and helpful information to be gleaned from this data.
Cheers, Gershom
On November 18, 2018 at 2:55:10 PM, Taylor Fausak (taylor@fausak.me) wrote:
Ok, I updated the function that checks for bad responses, re-ran the script, and updated the announcement along with all the assets (charts, tables, and CSV). Hopefully it's the last time, as I can't justify spending much more time on this.
https://github.com/tfausak/tfausak.github.io/blob/6f9991758ffeed085c45dd97e4...
On Sun, Nov 18, 2018, at 2:32 PM, Michael Snoyman wrote:
Just wanted to add in: good catch Gershom on identifying the problem, and thank you Taylor for working to remove them from the report.
On 18 Nov 2018, at 21:17, Taylor Fausak
wrote: Great catch, Gershom! There are indeed about 300 responses that tick all the boxes except for disliking the new GHC release schedule. The main thing the attacker seemed to be interested in was over-representing Stack and Stackage. Also, bizarrely, Java.
That brings the number of bogus responses up to 3,735, which puts the number of legitimate responses at 1,361. For context, last year's survey asked far fewer questions and had 1,335 responses.
On Sun, Nov 18, 2018, at 1:26 PM, Imants Cekusins wrote:
What if the announcement mentioned a large number of potentially bogus responses, explained the grounds for this conclusion, with a new survey conducted early next year?
The next survey would then need to be done differently from this one somehow. To improve the reliability, some authentication may be necessary.
Maybe Stack, Cabal questions could be grouped as separate distinct surveys, conducted by their maintainers through own channels?
Not sure how much value is in exact numbers of users of Stack or Cabal. Both groups are large enough. The maintainers of both groups are aware about usage stats.
Is either library likely to be influenced by this survey?

OK. Thanks for sharing some statistics. I'm now convinced as to the characterization of the attack. I'm still glad for how the public post diplomatically handled this.
I will also say, though there's speculation about "false flags" and
Oof. That thought never crossed my mind. I can only imagine this is on some social media where I don't participate. Every day, I am more and more pleased with my non-presence on most social media. :) Besides, just keeping up with email is enough of a challenge. Thanks for the clarification. Richard
other silliness floating around that I personally have a very good guess as to who did this. There's one well-known troll who has these preoccupations and is known for creating serial sockpuppet accounts, and is just the right amount of obsessed to do something like this. A few of the bogus responses actually had comments, and the comments were all written in a voice that was unmistakeable as this troll as well. Occam's razor seems to apply.
Finally, let me add why I don't think this was a "false flag" -- while there were enough telltale markers that the fake answers could seem to be detected, I don't think this was on purpose. There was _too much_ effort put into distributions of other choices, etc. If they had wanted the fakes to be detected they would have left much stronger evidence. Rather, from a forensic standpoint, this seems pretty clear to me that the pattern of data is of someone _trying_ to cover their tracks, but just making four or five errors which I could assemble into a pattern. If they hadn't made those errors -- likely based on bad priors about what the organic data would be that theirs would need to "mesh" into -- then I think the deception would have been much harder to detect.
--Gershom
Given the contention around cabal vs stack, I agree that sociological concerns suggest that the troll meant to tilt those scales. But I wouldn't want a public accusation without at least some statistical analysis that independently supports that conclusion.
In any case, thanks to all for putting this together!
Richard
On Nov 18, 2018, at 4:31 PM, Taylor Fausak
wrote: Oops, the ordering of the answer choices is manual because some questions have a natural order while others should just be most to least popular. I've made another run through to make sure everything is sorted properly. I'll probably hit publish in the next half hour or so unless there are any objections.
https://github.com/tfausak/tfausak.github.io/blob/fce97d07c369856d4c05b756c4...
On Sun, Nov 18, 2018, at 3:07 PM, Gershom B wrote:
The language extensions section doesn’t appear to be sorted properly. Outside of that, I think that these results are looking much better and any effort to find any additional outliers is probably not worth it for the moment. Thanks for your work on this, and I appreciate you being responsive and attentive when problems with the data were pointed out. There’s certainly some interesting and helpful information to be gleaned from this data.
Cheers, Gershom
On November 18, 2018 at 2:55:10 PM, Taylor Fausak (taylor@fausak.me) wrote:
Ok, I updated the function that checks for bad responses, re-ran the script, and updated the announcement along with all the assets (charts, tables, and CSV). Hopefully it's the last time, as I can't justify spending much more time on this.
https://github.com/tfausak/tfausak.github.io/blob/6f9991758ffeed085c45dd97e4...
On Sun, Nov 18, 2018, at 2:32 PM, Michael Snoyman wrote:
Just wanted to add in: good catch Gershom on identifying the problem, and thank you Taylor for working to remove them from the report.
On 18 Nov 2018, at 21:17, Taylor Fausak
wrote: Great catch, Gershom! There are indeed about 300 responses that tick all the boxes except for disliking the new GHC release schedule. The main thing the attacker seemed to be interested in was over-representing Stack and Stackage. Also, bizarrely, Java.
That brings the number of bogus responses up to 3,735, which puts the number of legitimate responses at 1,361. For context, last year's survey asked far fewer questions and had 1,335 responses.
On Sun, Nov 18, 2018, at 1:26 PM, Imants Cekusins wrote:
What if the announcement mentioned a large number of potentially bogus responses, explained the grounds for this conclusion, with a new survey conducted early next year?
The next survey would then need to be done differently from this one somehow. To improve the reliability, some authentication may be necessary.
Maybe Stack, Cabal questions could be grouped as separate distinct surveys, conducted by their maintainers through own channels?
Not sure how much value is in exact numbers of users of Stack or Cabal. Both groups are large enough. The maintainers of both groups are aware about usage stats.
Is either library likely to be influenced by this survey?

Just wanted to add in: good catch Gershom on identifying the problem, and thank you Taylor for working to remove them from the report.
I'd like to add +1 to that.
It's a source of astonishment, and some dismay, to me that anyone would go to so much trouble to affect a survey about Haskell. (Brexit, perhaps, but Haskell??)
But many thanks to Gershom and Taylor for dealing with it so professionally.
Simon
participants (9)

- Chris Smith
- Francesco Ariis
- Gershom B
- Imants Cekusins
- Michael Snoyman
- Richard Eisenberg
- Simon Marlow
- Simon Peyton Jones
- Taylor Fausak