Nov 21, 1999, 3:00:00 AM11/21/99

to

The authors have agreed to let me release the full statistics for the

competition; you can find them at

http://www.textfire.com/comp99/results.html.

The statistics include the top and bottom scores and the standard

deviation of the scores that each game received, and the number of

votes cast for each game.

Stephen

--

Stephen Granade | Interested in adventure games?

sgra...@phy.duke.edu | Visit About.com's IF Page

Duke University, Physics Dept | http://interactfiction.about.com

Nov 22, 1999, 3:00:00 AM11/22/99

to

There looks to be some twinkish reviewing going on with every game receiving at

least one '1' and many of those at the bottom getting '9' or '10'. Would the

results have been much different throwing out the top and bottom 20% or so,

similar to gymnastics or diving judging? Maybe it's the middle 5/7 or something

to that effect. It's a good idea in general to cut off the ends if you have

sufficient votes to work with.
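
For anyone who wants to try this on the published numbers, here is a minimal sketch of a trimmed mean in Python. The function name `trimmed_mean`, the 20% default, and the sample ballot are illustrative assumptions, not the competition's actual method or data.

```python
def trimmed_mean(votes, trim_fraction=0.2):
    """Average the votes after dropping trim_fraction of them from each end."""
    if not 0 <= trim_fraction < 0.5:
        raise ValueError("trim_fraction must be in [0, 0.5)")
    ordered = sorted(votes)
    k = int(len(ordered) * trim_fraction)  # number of votes dropped at each end
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

# Hypothetical ballot: mostly 7s and 8s, plus one stray 1 and one stray 10.
votes = [1, 7, 7, 7, 8, 8, 8, 7, 8, 10]
plain = sum(votes) / len(votes)
trimmed = trimmed_mean(votes)
print(f"plain mean = {plain:.2f}, trimmed mean = {trimmed:.2f}")
```

With so few votes the two numbers differ noticeably; with around 100 votes per game the gap would be expected to shrink.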

Pax,

Ron

Nov 22, 1999, 3:00:00 AM11/22/99

to

Hmm. Convince me of why.

I don't have any problem with a few people being very generous or very

grumpy. That's what averaging is for.

(I wanted to add that if one person gives every game a 1, it doesn't

affect the final scores at all. That's not quite true; it

disproportionately affects games that fewer people voted on. But the

skewing is still small.)

--Z

"And Aholibamah bare Jeush, and Jaalam, and Korah: these were the

borogoves..."

Nov 22, 1999, 3:00:00 AM11/22/99

to

In article <81c6b2$tr5$1...@nntp6.atl.mindspring.net> Andrew Plotkin <erky...@netcom.com> writes:

>Ron Moore <hum...@aol.com> wrote:

>> There looks to be some twinkish reviewing going on with every game receiving

>> at least one '1' and many of those at the bottom getting '9' or 10'. Would

>> the results have been much different throwing out the top and bottom 20% or

>> so similar to gymnastics or diving judging? Maybe it's the middle 5/7

>> something to that effect. It's a good idea in general to cut off the ends

>> if you have sufficient votes to work with.

Yes, I believe that's how it's done in sports: they drop the one highest and

the one lowest score and then average the remaining ones together. I'm not

sure if this is always statistically valid, but there's probably some

justification for it (e.g., one judge rating all the competitors from his/her

country higher than anyone else).

>Hmm. Convince me of why.

Well, if you don't mind some book quoting, I'll give an example. :) The

book is _An_Introduction_To_Error_Analysis_ by John R. Taylor, chapter 6.

Actually, this will be a paraphrasing because I just have my notes on the

book, not the actual book (I don't own it). He uses Chauvenet's criterion

for rejection of data points and demonstrates its use by example. I'm going

to assume you understand the basics talked about in here and not explain

everything; if you don't and want more, tell me and I should be able to help.

Say you have six measurements: 3.8, 3.5, 3.9, 3.9, 3.4, 1.8 and all are

legitimate. Then the average/mean is 3.4 and the standard deviation (sigma)

is 0.8. The 1.8 measurement differs from the mean (3.4) by 1.6 or two

standard deviations. Using Gaussians, the probability of a measurement being

outside 2*sigma is P(outside 2*sigma) = 1 - P(inside 2*sigma) or 1-0.95=0.05;

i.e., 5% or 1 in 20 measurements. With only six measurements we expect

0.05*6=0.3 or 1/3 of a measurement as bad as the 1.8 observed. If 1/3 of a

measurement is considered "ridiculously improbable" then we can reject the

1.8 reading.

Chauvenet's criterion, as normally given, states that if the expected number

of measurements at least as bad as the suspect measurement is less than 1/2,

then the suspect measurement should be rejected. Obviously the choice of

1/2 is arbitrary; but it is also reasonable and can be defended.

No, I don't have how the 1/2 can be defended. :)
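
The worked example above can be checked with a short Python sketch. It assumes Gaussian errors and the sample standard deviation (dividing by N-1), which reproduces the quoted sigma of 0.8; the helper name `chauvenet_expected_count` is hypothetical.

```python
import math
import statistics

def chauvenet_expected_count(values, suspect):
    """Expected number of readings at least as far from the mean as `suspect`,
    assuming the readings follow a Gaussian distribution."""
    mean = statistics.mean(values)
    sigma = statistics.stdev(values)         # sample standard deviation (N-1)
    t = abs(suspect - mean) / sigma          # distance from the mean in sigmas
    p_outside = math.erfc(t / math.sqrt(2))  # two-sided Gaussian tail probability
    return len(values) * p_outside

measurements = [3.8, 3.5, 3.9, 3.9, 3.4, 1.8]
expected = chauvenet_expected_count(measurements, 1.8)
reject = expected < 0.5                      # Chauvenet's threshold of 1/2
print(f"expected count = {expected:.2f}, reject = {reject}")
```

The expected count comes out a little under 0.3, in line with the "1/3 of a measurement" estimate, so the criterion would reject the 1.8 reading.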

>I don't have any problem with a few people being very generous or very

>grumpy. That's what averaging is for.

True, but good statistics does more than just a straight average. I'm not

claiming to be a statistics master or anything, but if you've got some

suspect deviant points and a valid reason to reject them (e.g., Chauvenet's

criterion), it's probably better to toss them out. But you do need to have

a valid reason for rejecting the data, be it purely mathematical/statistical

as you might do for voting like this or some physical reason in the case of

say some scientific research. You can't just toss data you don't like :)

but if you have a good reason for rejecting it, you probably should.

>(I wanted to add that if one person gives every game a 1, it doesn't

>affect the final scores at all. That's not quite true; it

>disproportionately affects games that fewer people voted on. But the

>skewing is still small.)

I think that would depend on the number of votes you're talking about.

I don't know what normal is for comp games, but the smaller the number of

votes, the greater the skewing will be due to an effect like you mention.

Quickie example:

50 votes at 7 points and 1 vote at 1 point: average = 6.882

20 votes at 7 points and 1 vote at 1 point: average = 6.714

10 votes at 7 points and 1 vote at 1 point: average = 6.455

5 votes at 7 points and 1 vote at 1 point: average = 6.000

Ironically, the last choice has 6 data points and 1 suspect point 2*sigma

away from the mean (average) so it fits the example of Chauvenet's criterion

above and can be thrown out. Note that down at that small a number of

votes (6) the one deviant point has dropped the average by one full point

by being included; that could be highly significant in the voting. With high

enough numbers of votes like 51, that one 1 point vote is almost certainly

statistically acceptable and could be kept in that case. For the middle

ranges of votes, you'd probably have to check to see if the one 1 vote is

statistically acceptable. As you may have gathered, I don't know the

details of the competition voting, but hopefully this post is of some use

in the discussion about rejecting bad data. :)
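
The quickie example above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic; the function name `average_with_one_low_vote` is made up for illustration.

```python
def average_with_one_low_vote(n_sevens, low_vote=1):
    """Mean of n_sevens votes of 7 plus a single low vote."""
    return (7 * n_sevens + low_vote) / (n_sevens + 1)

for n in (50, 20, 10, 5):
    avg = average_with_one_low_vote(n)
    # "drop" is how far the single low vote pulls the average below 7
    print(f"{n:2d} votes at 7 and 1 vote at 1: "
          f"average = {avg:.3f}, drop = {7 - avg:.3f}")
```

The printed averages match the four listed above, and the drop grows from about 0.12 with 51 votes to a full point with 6 votes.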

--

. . . . -- James Marshall (CAS) . .

,. -- )-- , , . -- )-- , mars...@astro.umd.edu ., .

' ' http://www.astro.umd.edu/~marshall

"Equations are living things." .

Nov 22, 1999, 3:00:00 AM11/22/99

to

James Marshall wrote:

> 5 votes at 7 points and 1 vote at 1 point: average = 6.000

> Ironically, the last choice has 6 data points and 1 suspect point 2*sigma

> away from the mean (average) so it fits the example of Chauvenet's criterion

> above and can be thrown out. Note that down at that small a number of

> votes (6) the one deviant point has dropped the average by one full point

> by being included; that could be highly significant in the voting. With high

> enough numbers of votes like 51, that one 1 point vote is almost certainly

> statistically acceptable and could be kept in that case. For the middle

> ranges of votes, you'd probably have to check to see if the one 1 vote is

> statistically acceptable. As you may have gathered, I don't know the

> details of the competition voting, but hopefully this post is of some use

> in the discussion about rejecting bad data. :)

The problem I have with this analysis is that it confuses two completely

different domains.

In the domain of scientific measurement this sort of logic is perfectly

reasonable. There's some single, objective truth out there and you're

trying to find out what it is. Measurements that disagree with the

facts are just plain wrong, and should be discarded.

In this context though, we're talking about artistic judgement, and here

there is no single, objective truth. There is no objective basis for

saying "this opinion is wrong".

Instead, applying your analysis we would end up saying that the sixth

person's opinion is "bad" and not "acceptable" purely and simply because

he strongly disagrees with the other five people. This is a form of

argument that strikes me as being itself unacceptable.

--

"To summarize the summary of the summary: people are a problem."

Russell Wallace

mano...@iol.ie

Nov 22, 1999, 3:00:00 AM11/22/99

to

Your analysis is good for certain systems but it doesn't apply to

opinions. An opinion can not be an erroneous data point because it

is... an opinion.

-Jim Power

James Marshall wrote:

>

>

Lots of stuff on statistics.

Nov 22, 1999, 3:00:00 AM11/22/99

to

Andrew Plotkin wrote:

> (I wanted to add that if one person gives every game a 1, it doesn't

> affect the final scores at all. That's not quite true; it

> disproportionately affects games that fewer people voted on. But the

> skewing is still small.)

Even if all games have the same number of votes, higher rated games will

be proportionately affected more than lower rated games. The order of

the results won't change but the relative scoring will. A game scoring

an 8 is rated (appreciated) twice as high as a game scoring a 4. After

adding the *ones* to the calculation the first game will be rated *less*

than twice as high as the other one. (Naturally, the skewing will again

be small)
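
A small Python sketch of this effect, assuming 100 votes per game (a figure chosen only for illustration) and a hypothetical helper `mean_with_extra_one`:

```python
def mean_with_extra_one(score, n_votes):
    """Mean after one extra vote of 1 joins n_votes votes averaging `score`."""
    return (score * n_votes + 1) / (n_votes + 1)

n = 100  # assumed number of votes per game
high = mean_with_extra_one(8, n)
low = mean_with_extra_one(4, n)
ratio = high / low
print(f"8 -> {high:.3f}, 4 -> {low:.3f}, ratio = {ratio:.4f}")
```

The order is preserved, but the ratio slips just below 2, which is the small skew described above.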

Martijn

Nov 23, 1999, 3:00:00 AM11/23/99

to

>In this context though, we're talking about artistic judgement, and here

>there is no single, objective truth. There is no objective basis for

>saying "this opinion is wrong".

I agree with this, but. . .

>Instead, applying your analysis we would end up saying that the sixth

>person's opinion is "bad" and not "acceptable" purely and simply because

>he strongly disagrees with the other five people. This is a form of

>argument that strikes me as being itself unacceptable.

. . .we also have to consider that our goal is to rank the games based on

how much the judges, as a group, liked them. The issue is: how do we arrive at

the group decision, given only the individual decisions. The current answer

is: average them.

I'm still pondering over this myself.

From,

Brendan B. B. (Bren...@aol.com)

(Name in header has spam-blocker, use the address above instead.)

"Do not follow where the path may lead;

go, instead, where there is no path, and leave a trail."

--Author Unknown

Nov 23, 1999, 3:00:00 AM11/23/99

to

BrenBarn wrote:

> . . .we also have to consider that our goal is to rank the games based on

> how much the judges, as a group, liked them. The issue is: how do we arrive at

> the group decision, given only the individual decisions. The current answer

> is: average them.

Yep.

It seems to me that whatever method is used must give equal weight to

every judge's opinion; I can't think of any better way than averaging to

do that. (Doesn't necessarily mean there isn't one, of course.)

Nov 23, 1999, 3:00:00 AM11/23/99

to

Martijn <m.r.e...@palm.a2000.nl> wrote:

> Andrew Plotkin wrote:

>

>> (I wanted to add that if one person gives every game a 1, it doesn't

>> affect the final scores at all. That's not quite true; it

>> disproportionately affects games that fewer people voted on. But the

>> skewing is still small.)

>

> Even if all games have the same number of votes, higher rated games will

> be proportionately affected more than lower rated games. The order of

> the results won't change but the relative scoring will.

I don't think anything *matters* except the order of the results.

I haven't heard anyone say "Yay, I got a 6.43, that's nearly twice as high

as _Pass The Banana_!" Most years, in fact, the numbers haven't even been

released. Only the rankings.

Nov 23, 1999, 3:00:00 AM11/23/99

to

Andrew Plotkin wrote:

>

> Martijn <m.r.e...@palm.a2000.nl> wrote:

> > Andrew Plotkin wrote:

> >

> >> (I wanted to add that if one person gives every game a 1, it doesn't

> >> affect the final scores at all. That's not quite true; it

> >> disproportionately affects games that fewer people voted on. But the

> >> skewing is still small.)

> >

> > Even if all games have the same number of votes, higher rated games will

> > be proportionately affected more than lower rated games. The order of

> > the results won't change but the relative scoring will.

>

> I don't think anything *matters* except the order of the results.

>

> I haven't heard anyone say "Yay, I got a 6.43, that's nearly twice as high

> as _Pass The Banana_!" Most years, in fact, the numbers haven't even been

> released. Only the rankings.

Agree, it's more of a theoretical than a practical argument. On the

other hand, if such a 1 rater emerges during the 99 competition, the

games will rate too low relative to comp 98 games. It may reflect badly on

the quality of comp 99. Worse, it may also (partly) hide the progress in

the writing of a person who entered in both comps. Of course with about

100 voters a game it will be only a minor distortion so perhaps again a

mainly theoretical argument, but it will still yield an undesirable

result.

Martijn

Nov 23, 1999, 3:00:00 AM11/23/99

to

The reason you cut off the tails in subjective judging is not for fear of the

aggregate average scores being inflated or deflated - but to counter abuse and

bias with regard to specific competitors.

There's always the potential of a disgruntled author or someone who's just

pissed at some of the authors to sabotage their rankings with a 1 or the

opposite case handing out free 10's to friends. These charity scores have much

less effect on the top rated games than the first case, which is more typical

anyway.

Not that the sky is falling and there's a bunch of sociopathic reviewers out

there... but it's common practice with any subjective sample of decent size to

account for human nature. The farther a vote is from the mean, the more weight

it holds so the top rated games are the most vulnerable to losing places. To

finish with a 7 average you need three 9's or six 8's to make up for a 1 vote.

Either way, that's a lot of weight for the top contenders to have to give up.

It would take spamming to get a bad game into the top echelon, and that should be

very easy to spot.

It would be very interesting to see graphs of the top ten and compare them with

bell curves or the graph of all_votes.

Just because everyone can vote does not mean that all votes are equal.

Pax,

Ron

Nov 23, 1999, 3:00:00 AM11/23/99

to

Martijn <m.r.e...@palm.a2000.nl> wrote:

> Agree, it's more of a theoretical than a practical argument. On the

> other hand, if such a 1 rater emerges during the 99 competition, the

> games will rate too low relative to comp 98 games. It may reflect bad on

> the quality of comp 99.

I suspect most judges don't have an absolute judging scale that they use every

year exactly the same way. For example, I decided to boost several scores

after I'd finished playing this year because my highest score was low enough

that I could split up some of the games I'd played earlier in the judging

and had ranked as (say) 5's, since I didn't know I'd have room to give the

better ones 6. Nothing this year (for me) was competing against, say,

Photopia except that several made me think "Photopia did this sort of thing

better last year, and I wasn't that wild about it then. This isn't going to

score that well." However, comparisons to IF that I'd played in the past

occurred to me every year I've judged the comp, and not just with older comp

games. (I don't even recall what the exact score I gave Photopia was, which

would kill an absolute scale right there).

Ja, mata

--

Kevin Lighton lig...@bestweb.net or shin...@operamail.com

http://members.tripod.com/~shinma_kl/main.html

"Townsfolk can get downright touchy over the occasional earth-elemental in

the scullery. Can't imagine why..." Quenten _Winds of Fate_

Nov 23, 1999, 3:00:00 AM11/23/99

to

david lynch <dfly...@louisville.edu> wrote in message

news:81eb4...@news1.newsguy.com...

> least. It's probably because hardly anybody voted on it. The other

> entries tended to get around 100 votes apiece.

Except the MS-DOS ones (mine included) which received around 50 votes (less

than half of what looks like the "average" number of votes). This reinforces

something people have periodically stated: the PC-Only games reach a smaller

audience, at least within the scope of R.*.I-F and the competition.

Mike.

Nov 23, 1999, 3:00:00 AM11/23/99

to

Ron Moore <hum...@aol.com> wrote in message

news:19991123070359...@ng-cj1.aol.com...

> Just because everyone can vote does not mean that all votes are equal.

At cgi-resources.com (where a couple of my scripts are high on their

respective lists) they disregard the top 10% and bottom 10% of the votes.

This probably arose at some point from looking at the stats and making some

comparisons between the votes from different people -- or, it might have

just been a safeguard. I don't know what the end result is.

For the IF competition, I don't think it's necessary. I'm willing to accept

extreme high and low votes as part of the overall score, because I have to

think that people casting those votes honestly had that high or low of an

opinion of my game. I could be naive.

If I had wanted to win the competition, it would have been easy enough to

do. My online Lunatix game has 300 players (at present). By posting this

announcement:

"Hey guys! Get TWO FREE MONTHS of play time. All you have to do is to go

www.textfire.com and download the IF-Competition games, play at least 5 and

vote on them. Make sure you play "The Insanity Circle" and help support us!

Let us know after you've voted, and we'll add the free time to your

account."

I win.

But that would have been asinine and would have defeated the reason I was

entering the contest - to get honest feedback and see how well my game

compares overall to the others. I suspect this is important to the other

authors as well. Many entered under an alias or anonymous for this reason.

The IF competition isn't a Mr./Mrs. Popularity contest, and if it were, it

would be pointless.

I have to think that most (if not all) votes were honest. With that in mind,

I wouldn't want to ignore those opinions, however bad (or good) they might

be. Now, if there actually *was* vote-fixing going on, then that's another

problem entirely and I'm not sure disregarding the bottom/top extremes would

be the right solution (not sure what it *would* be though).

Mike.

Nov 24, 1999, 3:00:00 AM11/24/99

to

On Tue, 23 Nov 1999, Mike Snyder wrote:

> For the IF competition, I don't think it's necessary. I'm willing to accept

> extreme high and low votes as part of the overall score, because I have to

> think that people casting those votes honestly had that high or low of an

> opinion of my game. I could be naive.

I'd think, from an author's point of view, the most interesting statistics

would be the frequency distribution (i.e. how many people gave the game a

10, how many gave it a 9, etc.).
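
As a rough illustration, a frequency distribution is easy to tabulate in Python with `collections.Counter`. The two ballots below are invented: they share the same mean but have very different shapes.

```python
from collections import Counter
from statistics import mean, pstdev

consensus = [7] * 20           # everyone gave the game a 7
split = [10] * 10 + [4] * 10   # half gave it a 10, half gave it a 4

for name, votes in (("consensus", consensus), ("split", split)):
    histogram = Counter(votes)  # maps each score to the number of votes it got
    counts = ", ".join(f"{score}: {histogram[score]}"
                       for score in sorted(histogram, reverse=True))
    print(f"{name}: mean = {mean(votes)}, spread = {pstdev(votes)}, "
          f"votes per score -> {counts}")
```

Both ballots average 7, so only the per-score counts (or a spread figure such as the standard deviation) tell them apart.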

There's a big difference between "most people thought my game was a 7" and

"half the people thought it was a 10 and half thought it was a 4".

SMTIRCAHIAGEHLT

Nov 24, 1999, 3:00:00 AM11/24/99

to

That's what the standard deviation represents, if you're willing to

accept it condensed down to a single number.

Nov 24, 1999, 3:00:00 AM11/24/99

to

Andrew Plotkin <erky...@netcom.com> wrote in message

news:81gu8q$nik$3...@nntp5.atl.mindspring.net...

> That's what the standard deviation represents, if you're willing to

> accept it condensed down to a single number.

That's something I was trying to figure out. Math isn't my strong suit (at

all). If a game scored "6" and the Standard Deviation was "2" does that mean

the votes were typically 4 to 8 with "6" being the average (2 on either

side) or was it 5 to 7 with a range of 2?

I think it would be interesting (although, maybe not feasible or ethical) to

see detailed scores (30 1's, 10 2's, 15 3's, etc).

Mike.

Nov 24, 1999, 3:00:00 AM11/24/99

to

In article <3839BF...@iol.ie>, Russell Wallace <mano...@iol.ie> wrote:

>Instead, applying your analysis we would end up saying that the sixth

>person's opinion is "bad" and not "acceptable" purely and simply because

>he strongly disagrees with the other five people. This is a form of

>argument that strikes me as being itself unacceptable.

Hear, hear!

Throwing out outliers would be good practice if the distribution of

votes for a game were centred around an expectation value. But that is

an unwarranted assumption. In fact, some games are of the "either you

hate it or you love it" variety, and in that case you could expect a

distribution with two peaks. In the extreme case, a game would receive

only 1's and 10's.

--

Magnus Olsson (m...@df.lth.se, zeb...@pobox.com)

------ http://www.pobox.com/~zebulon ------

Nov 24, 1999, 3:00:00 AM11/24/99

to

Mike Snyder <mikes...@worldnet.att.net> wrote:

> Andrew Plotkin <erky...@netcom.com> wrote in message

> news:81gu8q$nik$3...@nntp5.atl.mindspring.net...

>

>> That's what the standard deviation represents, if you're willing to

>> accept it condensed down to a single number.

>

> That's something I was trying to figure out. Math isn't my strong suit (at

> all). If a game scored "6" and the Standard Deviation was "2" does that mean

> the votes were typically 4 to 8 with "6" being the average (2 on either

> side) or was it 5 to 7 with a range of 2?

The standard deviation is a bit more complicated than that. Roughly, it

means that 70% of the votes were within 2 of the average (the 4-8 range).

And 95% of the votes were within *4* of the average, and 99% were within 6

of the average. It recognizes that there are outliers.

Of course, this is a pretty rough summary. (For one thing, we know

perfectly well that every vote was in the 1-10 range.) The standard

deviation implicitly assumes that the votes lie on a smooth bell curve,

including fractional values and values outside 1-10.

When they don't, the standard deviation is really fitting the best bell

curve it can to the actual votes.

Nov 24, 1999, 3:00:00 AM11/24/99

to

In article <81h62i$bsd$2...@nntp3.atl.mindspring.net>,

Andrew Plotkin <erky...@netcom.com> wrote:

>The standard deviation is a bit more complicated than that. Roughly, it

>means that 70% of the votes were within 2 of the average (the 4-8 range).

>And 95% of the votes were within *4* of the average, and 99% were within 6

>of the average. It recognizes that there are outliers.

>

>Of course, this is a pretty rough summary. (For one thing, we know

>perfectly well that every vote was in the 1-10 range.) The standard

>deviation implicitly assumes that the votes lie on a smooth bell curve,

>including fractional values and values outside 1-10.

That's not true: the standard deviation doesn't assume anything about

the distribution, it's just a function of the votes. It's your

interpretation of the standard deviation which assumes a bell

curve. And since the votes aren't distributed on a bell curve, that

interpretation is misleading.

Let's just say that the standard deviation is a measure of how

spread-out the votes were; a standard deviation of zero means that all

the voters gave the game the same score, and a high standard deviation

means that they gave very different scores.

Nov 24, 1999, 3:00:00 AM11/24/99

to

In article <81hd6e$867$1...@bartlet.df.lth.se>,

Magnus Olsson <m...@bartlet.df.lth.se> wrote:

>Let's just say that the standard deviation is a measure of how

>spread-out the votes were; a standard deviation of zero means that all

>the voters gave the game the same score, and a high standard deviation

>means that they gave very different scores.

For those who don't mind *a little* maths, an explanation follows:

You know how to compute the mean score for a game, right? Now,

most of the votes probably differ from the mean (unless all

the voters agreed 100%). So let's consider how much the votes

differ from the mean.

Let's call the mean score M, and a particular vote v. Then the

difference between the vote and the mean is

v - M

But one problem with this is that this is a negative number if the

vote is lower than the mean. So let's square the number, so we

always get a positive result:

(v - M)^2

Now for the trick: take the average of this for all the votes. This is

called the variance, and is a measure of how much the voters disagreed

with each other. If all of them agreed, the variance is 0. If they

disagreed very much, it will be a large number (since most of the

(v - M)^2 numbers will be large).

But, I hear you say, what about the standard deviation? Well, the

variance has one problem, and that's to do with the squares. If,

for example, we measure the variance of the height of IF authors

(rather than the quality of their works), we'll get a variance which

is measured in square meters, i.e. an area. It's more convenient

to work with a number that has the same unit as the original numbers,

so we take the square root of the variance.

And that, dear reader, is the standard deviation.
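
The recipe above translates almost line for line into Python. This sketch divides by the number of votes, as described, which corresponds to `statistics.pstdev`; the sample ballot is invented for illustration.

```python
import math
import statistics

def standard_deviation(votes):
    """Square root of the average squared difference from the mean."""
    m = sum(votes) / len(votes)                 # the mean score M
    squared_diffs = [(v - m) ** 2 for v in votes]
    variance = sum(squared_diffs) / len(votes)  # average of (v - M)^2
    return math.sqrt(variance)

votes = [7, 7, 7, 7, 7, 1]  # hypothetical ballot
sd = standard_deviation(votes)
print(f"standard deviation = {sd:.3f}")
print(f"statistics.pstdev  = {statistics.pstdev(votes):.3f}")
```

Note that some texts divide by N-1 instead of N (the "sample" standard deviation, `statistics.stdev`), which gives a slightly larger number for small ballots.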

Nov 24, 1999, 3:00:00 AM11/24/99

to

Magnus Olsson <m...@bartlet.df.lth.se> wrote in message

news:81hduo$i88$1...@bartlet.df.lth.se...

> And that, dear reader, is the standard deviation.

Thanks Magnus & Andrew for helping answer this. :)

Mike.

Nov 24, 1999, 3:00:00 AM11/24/99

to

In article <81h22v$pvs$1...@bgtnsc01.worldnet.att.net>,

Mike Snyder <mikes...@worldnet.att.net> wrote:

}Andrew Plotkin <erky...@netcom.com> wrote in message

}news:81gu8q$nik$3...@nntp5.atl.mindspring.net...

}

}> That's what the standard deviation represents, if you're willing to

}> accept it condensed down to a single number.

}

}That's something I was trying to figure out. Math isn't my strong suit (at

}all). If a game scored "6" and the Standard Deviation was "2" does that mean

}the votes were typically 4 to 8 with "6" being the average (2 on either

}side) or was it 5 to 7 with a range of 2?

Neither. Standard deviation is the square root of the average of the

squares of the differences of the scores from the mean. Say that

three times fast.
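To see why the answer is "neither", here is a small sketch with two invented sets of votes. Both are assumptions made up for illustration, not competition data, and `statistics.pstdev` is the divide-by-N standard deviation from the Python standard library.

```python
import statistics

pile_at_4_and_8 = [4, 8, 4, 8, 4, 8, 4, 8]   # nobody actually voted 6
mostly_6 = [6, 6, 6, 6, 6, 6, 2, 10]         # six 6s and two far-out votes

for votes in (pile_at_4_and_8, mostly_6):
    print(statistics.mean(votes), statistics.pstdev(votes))
```

Both lists come out with an average of 6 and a standard deviation of 2, yet one has every vote exactly 2 away from the average and the other has most votes sitting right on it. The single number does not say which of these shapes you are looking at.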

--

Matthew T. Russotto russ...@pond.com

"Extremism in defense of liberty is no vice, and moderation in pursuit

of justice is no virtue."

Nov 24, 1999, 3:00:00 AM11/24/99

to

Magnus Olsson wrote:

> But one problem with this is that this is a negative number if the

> vote is lower than the mean. So let's square the number, so we

> always get a positive result:

>

> (v - M)^2

>

I always wondered, why not just calculate some value X which is the

average of the *absolute* values of [Vi - M], (i = 1,2,3...n). Squaring

(and rooting) isn't necessary to avoid negative values. The resulting

average will be different from the variance but it will say something

about the distribution of the votes.

Also, when calculating variance, as a result of the squaring very high

and low votes have a higher weight than votes close to the mean value.

The value X will avoid this problem. Isn't such a value more honest,

particularly when dealing with opinion values instead of scientific

observations? Is the variance figure more informative?
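A rough sketch of the comparison being asked about, assuming made-up vote lists and a helper called `mean_abs_dev` (a hypothetical name for the value X described above):

```python
import statistics

def mean_abs_dev(votes):
    """Average of the absolute differences |Vi - M|."""
    m = sum(votes) / len(votes)
    return sum(abs(v - m) for v in votes) / len(votes)

clustered = [5, 5, 5, 5, 7, 7, 7, 7]        # every vote 1 away from the mean of 6
with_outliers = [6, 6, 6, 6, 6, 6, 2, 10]   # mostly 6, plus two extreme votes

for votes in (clustered, with_outliers):
    print(mean_abs_dev(votes), statistics.pstdev(votes))
```

Both lists have X = 1, but the standard deviation is 1 for the first and 2 for the second. That is the extra weight on extreme votes in action; whether that counts as a flaw or a feature is the question being debated here.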

Martijn

Nov 24, 1999, 3:00:00 AM11/24/99

to

Magnus Olsson <m...@bartlet.df.lth.se> wrote:

> In article <81h62i$bsd$2...@nntp3.atl.mindspring.net>,

> Andrew Plotkin <erky...@netcom.com> wrote:

>>The standard deviation is a bit more complicated than that. Roughly, it

>>means that 70% of the votes were within 2 of the average (the 4-8 range).

>>And 95% of the votes were within *4* of the average, and 99% were within 6

>>of the average. It recognizes that there are outliers.

>>

>>Of course, this is a pretty rough summary. (For one thing, we know

>>perfectly well that every vote was in the 1-10 range.) The standard

>>deviation implicitly assumes that the votes lie on a smooth bell curve,

>>including fractional values and values outside 1-10.

>

> That's not true: the standard deviation doesn't assume anything about

> the distribution, it's just a function of the votes. It's your

> interpretation of the standard deviation which assumes a bell

> curve. And since the votes aren't distributed on a bell curve, that

> interpretation is misleading.

Mmf. I grant the point.

I was trying to get at the idea that for a bell curve, the average and

S.D. tell you *exactly* how spread-out the samples are -- in fact, they

tell you everything about the samples, because there's only one normal

curve with a given average and S.D.

For a distribution which isn't a normal curve, the average and S.D. tell

you less.
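As a sketch of how much less, one can count what fraction of the votes really falls within one, two and three standard deviations and compare with the rough bell-curve figures quoted above. The vote list and the helper name `fraction_within` are assumptions for the example.

```python
import statistics

def fraction_within(votes, k):
    """Fraction of votes no more than k standard deviations from the mean."""
    mean = statistics.mean(votes)
    sd = statistics.pstdev(votes)
    return sum(1 for v in votes if abs(v - mean) <= k * sd) / len(votes)

lopsided = [1] * 8 + [10]   # hypothetical votes: mean 2, SD about 2.83

for k in (1, 2, 3):
    print(k, round(fraction_within(lopsided, k), 2))
```

For this lopsided list the fractions are about 0.89, 0.89 and 1.0, which is nothing like 70%, 95% and 99%.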

But if that makes no sense to you, never mind. :-)

> Let's just say that the standard deviation is a measure of how

> spread-out the votes were; a standard deviation of zero means that all

> the voters gave the game the same score, and a high standard deviation

> means that they gave very different scores.

(Or, for more detail, Magnus's following post.)

Nov 25, 1999, 3:00:00 AM11/25/99

to

Matthew T. Russotto <russ...@wanda.vf.pond.com> wrote:

>Neither. Standard deviation is the square root of the average of the

>squares of the differences of the scores from the mean. Say that

>three times fast.

Gah. I'm having flashbacks to my Graph Theory class where the prof never

used the word "graph". It was "set of subsets of size 2 of a set".

Joe

Nov 25, 1999, 3:00:00 AM11/25/99

to

What did he say instead of "real number"?

Nov 25, 1999, 3:00:00 AM11/25/99

to

Andrew Plotkin <erky...@netcom.com> wrote:

>Joe Mason <jcm...@uwaterloo.ca> wrote:

>> Matthew T. Russotto <russ...@wanda.vf.pond.com> wrote:

>>>Neither. Standard deviation is the square root of the average of the

>>>squares of the differences of the scores from the mean. Say that

>>>three times fast.

>>

>> Gah. I'm having flashbacks to my Graph Theory class where the prof never

>> used the word "graph". It was "set of subsets of size 2 of a set".

>

>What did he say instead of "real number"?

She. And it was a theory course - we never dealt with real numbers at all.

Joe

Nov 25, 1999, 3:00:00 AM11/25/99

to

In article <383C652D...@palm.a2000.nl>,

Martijn <m.r.e...@palm.a2000.nl> wrote:

>> So let's square the number, so we always get a positive result: (v - M)^2

>

>I always wondered, why not just calculate some value X which is the

>average of the *absolute* values of [Vi - M], (i = 1,2,3...n).

You can, of course, do this though I forget what it's called. But it's

an absolute pig to make use of, arithmetically speaking.

regards, ct

Nov 25, 1999, 3:00:00 AM11/25/99

to

In article <81jg65$gb6$1...@xserver.sjc.ox.ac.uk>,

That's true. And the absolute value isn't differentiable at zero, so using

calculus is also a pig.

There are also more theoretical reasons. One is that the variance is just

one of an infinite set of "moments" where you consider the averages of

(vi - M), (vi - M)^2, (vi - M)^3, and so on.

Another is that the variance is additive: if you have two independent

stochastic variables X and Y, then V(X + Y) = V(X) + V(Y).
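The additivity can be checked exactly on a toy case. In this sketch X and Y are invented variables that each take a few equally likely values, and independence is modelled by treating every (x, y) pair as equally likely. The helper names are assumptions for the example.

```python
from fractions import Fraction
from itertools import product

def variance(values):
    """Variance of equally likely values, as an exact fraction."""
    n = len(values)
    mean = Fraction(sum(values), n)
    return sum((v - mean) ** 2 for v in values) / n

def mean_abs_dev(values):
    """Average absolute deviation from the mean, as an exact fraction."""
    n = len(values)
    mean = Fraction(sum(values), n)
    return sum(abs(v - mean) for v in values) / n

x = [1, 2, 3, 4]   # hypothetical X
y = [1, 10]        # hypothetical Y, independent of X

# Independence: every combination of an x and a y is equally likely.
sums = [a + b for a, b in product(x, y)]

print(variance(x) + variance(y), variance(sums))              # equal
print(mean_abs_dev(x) + mean_abs_dev(y), mean_abs_dev(sums))  # not equal
```

Here both variance figures are 43/2, while the absolute-deviation figures are 11/2 and 9/2, so the average of absolute deviations does not add up the same way.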

Nov 26, 1999, 3:00:00 AM11/26/99

to

}Magnus Olsson wrote:

}

}> But one problem with this is that this is a negative number if the

}> vote is lower than the mean. So let's square the number, so we

}> always get a positive result:

}>

}> (v - M)^2

}>

}

}I always wondered, why not just calculate some value X which is the

}average of the *absolute* values of [Vi - M], (i = 1,2,3...n). Squaring

}(and rooting) isn't necessary to avoid negative values. The resulting

}average will be different from the variance but it will say something

}about the distribution of the votes.

Probably because the absolute value isn't differentiable and thus

gives mathematicians the creeps :-)

Nov 27, 1999, 3:00:00 AM11/27/99

to

In article <81hd6e$867$1...@bartlet.df.lth.se>,

m...@bartlet.df.lth.se (Magnus Olsson) wrote:

> In article <81h62i$bsd$2...@nntp3.atl.mindspring.net>,

> Andrew Plotkin <erky...@netcom.com> wrote:

> >The standard deviation is a bit more complicated than that. Roughly, it

> >means that 70% of the votes were within 2 of the average (the 4-8 range).

> >And 95% of the votes were within *4* of the average, and 99% were within 6

> >of the average. It recognizes that there are outliers.

> >

> >Of course, this is a pretty rough summary. (For one thing, we know

> >perfectly well that every vote was in the 1-10 range.) The standard

> >deviation implicitly assumes that the votes lie on a smooth bell curve,

> >including fractional values and values outside 1-10.

>

> That's not true: the standard deviation doesn't assume anything about

> the distribution, it's just a function of the votes. It's your

> interpretation of the standard deviation which assumes a bell

> curve. And since the votes aren't distributed on a bell curve, that

> interpretation is misleading.

>

> Let's just say that the standard deviation is a measure of how

> spread-out the votes were; a standard deviation of zero means that all

> the voters gave the game the same score, and a high standard deviation

> means that they gave very different scores.

Out of curiosity, I just did a quick sort and compiled a list of the games

with the highest standard deviations. The highest SD, strangely, went to

"Hunter, In Darkness" with 2.42. Next was (ahem!) my very own "Halothane"

with 2.35, and "Erehwon" with 2.26. For those who're interested, the first

ten games with high SDs (in descending order) were: Hunter, Halothane,

Erehwon, Beal Street, Lunatix, Exhibition and Pass the Banana (a bizarre and

fortuitous tie..), and finally A Moment Of Hope, The HeBGB Horror! and The

Water Bird (the last three were also tied.) The _lowest_ SDs went to the

following games: Skyranch (a striking 1.10), Guard Duty and Outsided,

strangely enough games that received almost uniform low ratings (though I do

note that Guard Duty has scored at least one 9, and Outsided a whopping 7).

Does anyone want to extrapolate from this?

Quentin.D.Thompson. [The 'D' is a variable.]

Lord High Executioner Of Bleagh

(Formerly A Cheap Coder)

Sent via Deja.com http://www.deja.com/

Before you buy.

Nov 27, 1999, 3:00:00 AM11/27/99

to

[attribution accidentally trimmed]

>The _lowest_ SDs went to the

>following games: Skyranch (a striking 1.10), Guard Duty and Outsided,

>strangely enough games that received almost uniform low ratings (though I do

>note that Guard Duty has scored at least one 9, and Outsided a whopping 7).

>Does anyone want to extrapolate from this?

Items with lower averages are forced to have lower SDs.

If you want to compare SDs across multiple items, it may

make more sense to express the SD as a fraction (percentage)

of the total. (But give me a second, and I'll expose that lie.)

For example, the maximum SD for an item receiving a 5.5 average

comes from votes like:

1,1,1,1,1,10,10,10,10,10 (SD=4.5, dividing by N) [*]

The maximum SD for an item receiving a 2 average comes from votes like

1,1,1,1,1,1,1,1,10 (SD=3 dividing by N-1, about 2.83 dividing by N)

And of course an item receiving a 1 average must have an SD of 0.

Then again, the maximal SD for an item with a 9 average comes from

1,10,10,10,10,10,10,10,10 (SD=3 dividing by N-1, about 2.83 dividing by N)

So I guess the issue is really that they're near the edge of

the allowed range, not actually just relative to the average

(although that also seems relevant to me). If you think of SD

as the "width" of the bell curve, then scores near the edge of

the range of allowed votes have to have squished bell curves

since they can't trail off the edges of the range.
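The squeeze near the edges can be put into a formula. For votes confined to a range, the divide-by-N variance can never exceed (mean - lowest) * (highest - mean), a standard bound usually called the Bhatia-Davis inequality. The sketch below compares that bound with the three extreme vote lists above; the function name `max_sd` is an assumption for the example.

```python
import math
import statistics

LOW, HIGH = 1, 10   # allowed vote range

def max_sd(mean):
    """Largest possible divide-by-N SD for votes in [LOW, HIGH] with this mean."""
    return math.sqrt((mean - LOW) * (HIGH - mean))

examples = [
    [1] * 5 + [10] * 5,   # average 5.5
    [1] * 8 + [10],       # average 2
    [1] + [10] * 8,       # average 9
]
for votes in examples:
    m = statistics.mean(votes)
    print(m, round(statistics.pstdev(votes), 2), round(max_sd(m), 2))
```

Each list hits the bound exactly: 4.5 at an average of 5.5, and about 2.83 at averages of 2 and 9. The 3 quoted above for those two lists is what you get when dividing by N-1 instead. The bound drops to 0 at averages of 1 and 10, which is the "squished" effect in numbers.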

[*] I believe SD is square root of the variance and the variance

is technically defined not as the average of the squared errors,

but the sum of the squared errors divided by (number_of_samples - 1);

in the limit as the number of samples goes to infinity this is just

the average, but prior to that, it's just annoying.

However, I am not a statistician. If you feel you need

statistical advice, please get a consultation from a professional.

Sean Barrett

Nov 28, 1999, 3:00:00 AM11/28/99

to

In article <FLvoG...@world.std.com>,

Sean T Barrett <buz...@world.std.com> wrote:

>[*] I believe SD is square root of the variance and the variance

>is technically defined not as the average of the squared errors,

>but the sum of the squared errors divided by (number_of_samples - 1);

>in the limit as the number of samples goes to infinity this is just

>the average, but prior to that, it's just annoying.

OK, I lied about this before :-). Or, rather, I didn't mention

it because it would only be confusing.

There are two definitions of variance. One is simply the average of the

squared deviations from the mean. The other definition introduces a

correction factor N/(N-1) where N is the number of samples. The latter

definition is usually used when you take a sampling out of a large

set and want to estimate the variance of the whole set from that of

the sampling. The reasons for this are rather theoretical and I'd

like to refer anybody interested to some textbook in mathematical

statistics.

For the purposes of counting competition votes, I'd say it's

not a critical choice which definition to use.
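Python's standard library happens to offer both definitions, which makes the difference easy to look at. In this sketch the vote list is invented; `statistics.pvariance` divides by N and `statistics.variance` divides by N-1.

```python
import math
import statistics

votes = [3, 5, 6, 6, 7, 9]   # hypothetical votes for one game
n = len(votes)

plain = statistics.pvariance(votes)      # average of the squared deviations
corrected = statistics.variance(votes)   # same sum, divided by N - 1

# The two differ by exactly the factor N/(N-1).
assert math.isclose(corrected, plain * n / (n - 1))
print(plain, corrected)

# With ten times as many votes in the same proportions, the gap nearly closes.
many = votes * 10
print(statistics.pvariance(many), statistics.variance(many))
```

With six votes the two figures are about 3.33 and 4; with sixty votes they are about 3.33 and 3.39, which is one way to see why the choice matters little once a game has a decent number of votes.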

Nov 28, 1999, 3:00:00 AM11/28/99

to

In article <81r33f$vdu$1...@bartlet.df.lth.se>,

Magnus Olsson <m...@bartlet.df.lth.se> wrote:

}

}OK, I lied about this before :-). Or, rather, I didn't mention

}it because it would only be confusing.

}

}There are two definitions of variance. One is simply the average of the

}squared deviations from the mean. The other definition introduces a

}correction factor N/(N-1) where N is the number of samples. The latter

}definition is usually used when you take a sampling out of a large

}set and want to estimate the variance of the whole set from that of

}the sampling. The reasons for this are rather theoretical and I'd

}like to refer anybody interested to some textbook in mathematical

}statistics.

They wave their hands about degrees of freedom and such. I think it's

just a fudge factor :-)

}For the purposes of counting competition votes, I'd say it's

}not a critical choice which definition to use.

If you're counting all the competition votes, the ordinary uncorrected

population variance is the one to use, I think.

Nov 29, 1999, 3:00:00 AM11/29/99

to

In article <FFf04.125$5Z3....@monger.newsread.com>,

>In article <81r33f$vdu$1...@bartlet.df.lth.se>,

>Magnus Olsson <m...@bartlet.df.lth.se> wrote:

>}

>}OK, I lied about this before :-). Or, rather, I didn't mention

>}it because it would only be confusing.

>}

>}There are two definitions of variance. One is simply the average of the

>}squared deviations from the mean. The other definition introduces a

>}correction factor N/(N-1) where N is the number of samples. The latter

>}definition is usually used when you take a sampling out of a large

>}set and want to estimate the variance of the whole set from that of

>}the sampling. The reasons for this are rather theoretical and I'd

>}like to refer anybody interested to some textbook in mathematical

>}statistics.

(...)

>}For the purposes of counting competition votes, I'd say it's

>}not a critical choice which definition to use.

>

>If you're counting all the competition votes, the ordinary uncorrected

>population variance is the one to use, I think.

Well, the theoretical standpoint is this: if you're taking a sample out

of a large (possibly infinite) population, and want to estimate the

standard deviation of the whole population, then using the N/(N-1)

formula means that the variance of the sample is an unbiased

estimate of the variance of the whole population.
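What "unbiased" means can be checked exactly on a toy population, at the level of the variance. The sketch assumes a made-up population of three values and looks at every possible sample of two drawn with replacement; the helper name `var` and its `correction` parameter are inventions for the example.

```python
from fractions import Fraction
from itertools import product

def var(values, correction):
    """Sum of squared deviations divided by (N - correction), as a fraction."""
    n = len(values)
    mean = Fraction(sum(values), n)
    return sum((v - mean) ** 2 for v in values) / (n - correction)

population = [1, 4, 10]                          # hypothetical whole population
samples = list(product(population, repeat=2))    # all 9 ordered samples of size 2

true_variance = var(population, 0)
avg_corrected = sum(var(s, 1) for s in samples) / len(samples)
avg_plain = sum(var(s, 0) for s in samples) / len(samples)

print(true_variance, avg_corrected, avg_plain)
```

Averaged over all samples, the N-1 version gives 14, which is the variance of the whole population, while the divide-by-N version gives 7 and so underestimates it.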

In the comp votes example you could argue that we aren't sampling the

whole population of votes; we have the whole population of votes at

hand. On the other hand, you could argue that the votes are a sample of

the opinions of *all* IF players (not just the ones who voted).

But this is unnecessary. We aren't using the standard deviation to

estimate anything. We're using it in a purely descriptive way, as a

measure of how much the votes for each game differed from each

other. It's really immaterial which formula we use as long as

everybody uses the same formula. We could just as well use the mean of

the absolute of the deviations (rather than that of the squares of the

deviations), as somebody suggested.

And we could just as well use the median score instead of the average

score to determine how well a game fared. That would also address

the outlier problem to some extent. (The median is the "middle" vote -

the one that had just as many votes above it as below it).
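A quick sketch of the difference, using invented votes for a single game with and without a couple of very low scores:

```python
import statistics

votes = [7, 7, 8, 8, 8, 9, 9]    # hypothetical votes for one game
with_grumps = votes + [1, 1]     # the same votes plus two very low ones

for v in (votes, with_grumps):
    print(round(statistics.mean(v), 2), statistics.median(v))
```

The two extra votes pull the average down from 8 to about 6.44, while the median stays at 8.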

Nov 29, 1999, 3:00:00 AM11/29/99

to

alt.distingu...@world.std.com (Sean T

Barrett).wrote.posted.offered:

>However, I am not a statistician. If you feel you need

>statistical advice, please get a consultation from a professional.

Most likely a professional whose profession is psychology. :)

--

Ross Presser

ross_p...@imtek.com

"And if you're the kind of person who parties with a bathtub full of

pasta, I suspect you don't care much about cholesterol anyway."

Nov 29, 1999, 3:00:00 AM11/29/99

to

In article <8E8D8C40...@199.45.45.11>,

Ross Presser <rpre...@NOSPAMimtek.com.invalid> wrote:

>alt.distingu...@world.std.com (Sean T

>Barrett).wrote.posted.offered:

>

>>However, I am not a statistician. If you feel you need

>>statistical advice, please get a consultation from a professional.

>

>Most likely a professional whose profession is psychology. :)

I've seen psychologists 'doing' statistics, and I've seen the Revd

Ian Paisley doing politics - they both scar(r)ed me!

regards, ct
