|
|
Dave W.
Info Junkie
USA
26022 Posts |
Posted - 07/16/2012 : 19:02:26
|
Calling all statisticians and wanna-be statisticians!
Between points X and Y on my commute home from work, I have three main routes, A, B and C.
Route A has about two miles of two-lane (each way) divided turnpike (50 MPH), followed by four miles of superhighway (55 MPH). It's a generally fast 6.3 miles, but when there's a traffic jam, it can turn into a nightmare.
Route B is six miles of two-lane (each way, divided) roads, through residential neighborhoods and past strip malls. It's mostly 45 MPH, but has about a mile of 35 MPH limit, and perhaps 100 yards of 25 MPH road. Route B crosses several larger roads, and so there can be long waits at lights.
Route C is 5.3 miles long, and starts with a mile of the same turnpike as route A, but the rest is two-lane (each way, divided) road that is the next-largest artery through town compared to the superhighway, despite having ten traffic lights. It's mostly 45 MPH, with a mile or so at 40 MPH.
I'm looking for the "best" route to take home, "best" being a combination of speed, consistency and gas mileage. I used a stopwatch to time each route (rounding to the nearest second), and took each route 30 times. These routes are actually a small part of my 40-mile commute, but they are the biggest choice I face.
I know which route I prefer, but I would like some statistical support for my choice. I don't have the mathematical chops to demonstrate (to myself, even) that, for example, I have a 95% chance to complete route A in thus-and-such a time (or whatever other statistical tools would help me make this choice), so I'm looking for some math-whiz help.
Here are graphical representations of the data for the three routes:
The average for each route is marked in red. The median in blue.
Here's the raw data (in seconds):
Route A: 424, 439, 450, 456, 456, 458, 467, 469, 471, 472, 474, 475, 476, 482, 482, 494, 511, 536, 541, 612, 621, 668, 679, 823, 836, 849, 999, 1157, 1277, 2195
Route B: 725, 731, 740, 763, 771, 773, 780, 784, 785, 809, 812, 819, 827, 833, 835, 840, 846, 857, 860, 865, 874, 888, 916, 977, 977, 984, 1007, 1041, 1143, 1389
Route C: 516, 561, 565, 570, 572, 576, 584, 596, 597, 598, 605, 629, 633, 634, 636, 642, 642, 677, 688, 689, 700, 706, 743, 750, 768, 844, 866, 882, 886, 992
|
- Dave W. (Private Msg, EMail) Evidently, I rock! Why not question something for a change? Visit Dave's Psoriasis Info, too. |
|
H. Humbert
SFN Die Hard
USA
4574 Posts |
Posted - 07/16/2012 : 19:29:21
|
My gut tells me C.
|
"A man is his own easiest dupe, for what he wishes to be true he generally believes to be true." --Demosthenes
"The first principle is that you must not fool yourself - and you are the easiest person to fool." --Richard P. Feynman
"Face facts with dignity." --found inside a fortune cookie |
|
|
BigPapaSmurf
SFN Die Hard
3192 Posts |
Posted - 07/17/2012 : 05:18:49
|
Peak hours take C, non-peak take A. The lowest possibility of major slowdowns makes C the correct choice. Any quick days on route A are quickly ruined by one of those 2000's.
I had the same problem in golf, a 10% chance for an eagle isn't worth the increased risk of the triple-bogey. |
"...things I have neither seen nor experienced nor heard tell of from anybody else; things, what is more, that do not in fact exist and could not ever exist at all. So my readers must not believe a word I say." -Lucian on his book True History
"...They accept such things on faith alone, without any evidence. So if a fraudulent and cunning person who knows how to take advantage of a situation comes among them, he can make himself rich in a short time." -Lucian critical of early Christians c.166 AD From his book, De Morte Peregrini |
Edited by - BigPapaSmurf on 07/17/2012 05:20:07 |
|
|
Dave W.
Info Junkie
USA
26022 Posts |
Posted - 07/17/2012 : 07:27:14
|
Originally posted by BigPapaSmurf
Peak hours take C, non-peak take A. | My drive from work to home is always non-peak. The really long times on route A are due to peak-time wrecks creating miles-long backups that take hours to clear.
In contrast, my drive to work in the mornings is always peak, and routes A and B are both always hopelessly snarled. During the school year, route B gets even worse because it's residential and there's a school zone in the middle (so lots of bus traffic at 25 MPH). I take C in the mornings, always, and I won't even bother to try to time the others. |
|
|
Machi4velli
SFN Regular
USA
854 Posts |
Posted - 07/17/2012 : 23:31:36
|
Some quick data, some you already had:
Route A (Route A without the 1 outlier): average = 658.3 (605.3), median = 488 (482), min = 424 (424), max = 2195 (1277), st dev = 364.8 (224.9)
Route B: average = 875, median = 837.5, min = 725, max = 1389, st dev = 138.5
Route C: average = 678.2, median = 639, min = 516, max = 992, st dev = 116.8
You can reject Route B outright, I think, from just this little analysis. It was already clear that route A is a sort of high-risk, high-reward choice, but the standard deviation numbers quantify that in a sense, as it's much higher there than for route C. And from the skew to the right, it looks like the deviations, if large, will be toward longer times (there's a traffic jam for some reason).
Since the 2195 time was so far away from the others in route A, I also gave the numbers without it. I don't want to encourage ignoring it necessarily, but since it happened once out of 30 times, I don't know if you should consider it a realistic risk, it's not really enough data to tell, but from experience, judge whether it is and choose which values to look at.
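For anyone who wants to reproduce the summary figures quoted above, here is a minimal sketch using only Python's standard library. The function name `summarize` and the dictionary layout are illustrative choices, and it assumes the sample form of the standard deviation (dividing by n - 1), which is what matches the quoted values.

```python
import statistics

# Raw times in seconds, copied from the first post.
route_a = [424, 439, 450, 456, 456, 458, 467, 469, 471, 472, 474, 475, 476, 482, 482,
           494, 511, 536, 541, 612, 621, 668, 679, 823, 836, 849, 999, 1157, 1277, 2195]
route_b = [725, 731, 740, 763, 771, 773, 780, 784, 785, 809, 812, 819, 827, 833, 835,
           840, 846, 857, 860, 865, 874, 888, 916, 977, 977, 984, 1007, 1041, 1143, 1389]
route_c = [516, 561, 565, 570, 572, 576, 584, 596, 597, 598, 605, 629, 633, 634, 636,
           642, 642, 677, 688, 689, 700, 706, 743, 750, 768, 844, 866, 882, 886, 992]


def summarize(times):
    """Return the summary figures for one route (hypothetical helper)."""
    return {
        "average": round(statistics.mean(times), 1),
        "median": statistics.median(times),
        "min": min(times),
        "max": max(times),
        # statistics.stdev is the sample standard deviation (divides by n - 1).
        "st dev": round(statistics.stdev(times), 1),
    }


summaries = {
    "A": summarize(route_a),
    "A without 2195": summarize([t for t in route_a if t != 2195]),
    "B": summarize(route_b),
    "C": summarize(route_c),
}

for name, summary in summaries.items():
    print(name, summary)
```

Dropping the 2195 value is done here only to mirror the parenthesized numbers; whether to ignore it is still a judgment call.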
I'll see if I can fit probability distributions to them and give you probabilities as a function of time for each route. |
"Truth does not change because it is, or is not, believed by a majority of the people." -Giordano Bruno
"The greatest enemy of knowledge is not ignorance, but the illusion of knowledge." -Stephen Hawking
"Seeking what is true is not seeking what is desirable" -Albert Camus |
Edited by - Machi4velli on 07/18/2012 00:39:51 |
|
|
Dave W.
Info Junkie
USA
26022 Posts |
Posted - 07/18/2012 : 14:15:20
|
What do the standard deviations actually tell us, in the context of this data? I've always been iffy on the concept in general, but maybe getting specific with this example will make something "click" for me.
The "outlier" time isn't that much of an outlier. Before I started timing these routes, I sat on that road for at least 40 minutes one night because there was a plastic trash can rolling back-and-forth between two lanes of traffic. I was genuinely surprised I didn't get more 1800+ times from that route. Oh, and go figure this: one of the fastest times on that route was when it was pouring down rain. Perhaps the big jam happened before my entrance ramp that night. I dunno. |
|
|
Machi4velli
SFN Regular
USA
854 Posts |
Posted - 07/18/2012 : 19:35:14
|
Standard deviation measures how far spread out around the mean the data is -- the variance is the average squared deviation from the mean, so you take (value1 - mean)^2 + (value2 - mean)^2 + ... + (value n - mean)^2, divide that sum by n - 1 (for a sample), and the standard deviation is the square root of the result.
If you had data like this: 500, 500, 500 -- the st dev would be 0
If you had data like this: 400, 500, 600 -- the st dev would be 100
So given two data sets with the same mean, you can see the second is more spread out and so has a bigger st dev. This is what we find with Route A, most of them really aren't that close to 658 and we get a higher st dev.
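The calculation for the two small data sets can be sketched in a few lines of Python. This assumes the sample form of the standard deviation (sum of squared deviations divided by n - 1, then the square root); the function name `sample_stdev` is just an illustrative choice.

```python
import math


def sample_stdev(values):
    """Sample standard deviation: sqrt(sum of squared deviations / (n - 1))."""
    mean = sum(values) / len(values)
    squared_deviations = sum((v - mean) ** 2 for v in values)
    return math.sqrt(squared_deviations / (len(values) - 1))


tight = [500, 500, 500]   # every value equals the mean
spread = [400, 500, 600]  # same mean, but spread out around it

print(sample_stdev(tight))   # 0.0
print(sample_stdev(spread))  # 100.0
```

Both sets have a mean of 500, so the difference in the result comes entirely from how far the values sit from that mean.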
If something is normally distributed (bell curve, common assumption on lots of things), a standard deviation is still the same thing, but it has some special properties about the probability that values fall within a certain number of standard deviations of the mean (about 68% within one, about 95% within two), but this isn't quite true here.
About outliers, there are some standard ways to decide what counts as one (sometimes anything more than 3 times the difference between the 75th and 25th percentiles beyond those percentiles), but they're pretty arbitrary, and I'd be more apt to go with your opinion with respect to how rare those events are. |
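As a rough illustration of one such rule, the sketch below flags route A times that lie more than a chosen multiple of the interquartile range (75th minus 25th percentile) beyond the quartiles. The helper name `iqr_outliers` is hypothetical, the multipliers 3 and 1.5 are two common conventions rather than anything fixed, and the quartiles come from `statistics.quantiles` with its default method; other quartile conventions give slightly different fences.

```python
import statistics

# Route A times in seconds, copied from the first post.
route_a = [424, 439, 450, 456, 456, 458, 467, 469, 471, 472, 474, 475, 476, 482, 482,
           494, 511, 536, 541, 612, 621, 668, 679, 823, 836, 849, 999, 1157, 1277, 2195]


def iqr_outliers(times, multiplier):
    """Return values more than multiplier * IQR below Q1 or above Q3."""
    q1, _median, q3 = statistics.quantiles(times, n=4)
    iqr = q3 - q1
    low_fence = q1 - multiplier * iqr
    high_fence = q3 + multiplier * iqr
    return [t for t in times if t < low_fence or t > high_fence]


print(iqr_outliers(route_a, 3.0))  # [2195]
print(iqr_outliers(route_a, 1.5))  # [1157, 1277, 2195]
```

The answer changes with the multiplier, which is a fair picture of how arbitrary these cutoffs are.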
"Truth does not change because it is, or is not, believed by a majority of the people." -Giordano Bruno
"The greatest enemy of knowledge is not ignorance, but the illusion of knowledge." -Stephen Hawking
"Seeking what is true is not seeking what is desirable" -Albert Camus |
Edited by - Machi4velli on 07/18/2012 19:38:23 |
|
|
Dave W.
Info Junkie
USA
26022 Posts |
Posted - 07/18/2012 : 20:20:51
|
Okay, so the standard deviation can't be taken as more than a generalized measure of how varied a data set is without that set being normally distributed.
'Cause with route A, the mean minus the standard deviation is 293.5, which is much faster than I ever actually drove. That number would be an average of over 77 MPH, when my mean speed was 34.5 MPH (my fastest trip was 53.5 MPH, which is surprising all by itself). |
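The speed figures above can be checked with a short conversion, sketched here under the assumption that route A is the 6.3 miles stated in the first post. The names `ROUTE_A_MILES` and `mph` are illustrative.

```python
ROUTE_A_MILES = 6.3  # route A length, as given in the first post


def mph(seconds, miles=ROUTE_A_MILES):
    """Average speed in miles per hour for a trip of the given duration."""
    return miles / (seconds / 3600)


print(round(mph(293.5), 1))  # mean minus one st dev: 77.3
print(round(mph(658.3), 1))  # mean time: 34.5
print(round(mph(424), 1))    # fastest recorded trip: 53.5
```

Note that the speed at the mean time is not the same thing as the mean of the per-trip speeds; the sketch only reproduces the conversion used in the post.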
|
|
Machi4velli
SFN Regular
USA
854 Posts |
Posted - 07/18/2012 : 21:07:50
|
Well, different distributions have different relationships to standard deviations. What is common across all is that it's a generalized measure of variability of the data, as you said. Mean minus standard deviation doesn't really mean anything useful here. |
|
|
Hawks
SFN Regular
Canada
1383 Posts |
Posted - 07/18/2012 : 21:28:43
|
Using a two-tailed t-test with unequal variance, I get the following p-values:
A vs B, p=0.004; A vs C, p=0.777; B vs C, p<0.001
i.e. both A and C are significantly different from B, whereas there is no significant difference between A and C (at a significance level of 0.05).
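The post does not say which software produced these p-values. As a sketch of what a two-sample t-test with unequal variances (Welch's test) computes, the code below derives the t statistic and the Welch-Satterthwaite degrees of freedom for each pair using only the standard library. The function name `welch_t` is illustrative, and the final step of turning t and the degrees of freedom into a p-value is left out because it needs the t-distribution's CDF.

```python
import math
import statistics

# Raw times in seconds, copied from the first post.
route_a = [424, 439, 450, 456, 456, 458, 467, 469, 471, 472, 474, 475, 476, 482, 482,
           494, 511, 536, 541, 612, 621, 668, 679, 823, 836, 849, 999, 1157, 1277, 2195]
route_b = [725, 731, 740, 763, 771, 773, 780, 784, 785, 809, 812, 819, 827, 833, 835,
           840, 846, 857, 860, 865, 874, 888, 916, 977, 977, 984, 1007, 1041, 1143, 1389]
route_c = [516, 561, 565, 570, 572, 576, 584, 596, 597, 598, 605, 629, 633, 634, 636,
           642, 642, 677, 688, 689, 700, 706, 743, 750, 768, 844, 866, 882, 886, 992]


def welch_t(sample_1, sample_2):
    """Return (t statistic, degrees of freedom) for Welch's unequal-variance t-test."""
    n1, n2 = len(sample_1), len(sample_2)
    # Squared standard error of each sample mean.
    se1_sq = statistics.variance(sample_1) / n1
    se2_sq = statistics.variance(sample_2) / n2
    t = (statistics.mean(sample_1) - statistics.mean(sample_2)) / math.sqrt(se1_sq + se2_sq)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se1_sq + se2_sq) ** 2 / (se1_sq ** 2 / (n1 - 1) + se2_sq ** 2 / (n2 - 1))
    return t, df


results = {
    "A vs B": welch_t(route_a, route_b),
    "A vs C": welch_t(route_a, route_c),
    "B vs C": welch_t(route_b, route_c),
}

for pair, (t, df) in results.items():
    print(f"{pair}: t = {t:.2f}, df = {df:.1f}")
```

If SciPy happens to be installed, `scipy.stats.ttest_ind(route_a, route_b, equal_var=False)` performs the same test and also returns the two-tailed p-value.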
It is, strictly speaking, probably not right to use a t-test when comparing more than two data sets, but given how "clean" the results were here, I wouldn't worry. And, in any case, I can't remember which test to use since it's been a few years since I did any of this stuff... |
METHINKS IT IS LIKE A WEASEL It's a small, off-duty czechoslovakian traffic warden! |
|
|
Machi4velli
SFN Regular
USA
854 Posts |
Posted - 07/18/2012 : 22:35:56
|
Don't t-tests assume the data is normal (Gaussian)? |
|
|
Dr. Mabuse
Septic Fiend
Sweden
9688 Posts |
Posted - 07/19/2012 : 02:39:02
|
A low standard deviation indicates high predictability, which is an aspect I would value if the averages are close. Route A takes you home more quickly most of the time, sure, but the average differs little from Route C's, and because of this I would personally choose Route C.
What are you planning to do with the free time you get from choosing a different route? Route A will get you a few extra minutes each day, but one day you'll lose more than a half hour. Route C will not get you those extra few minutes, but you'll "never" lose an evening to sitting in the car waiting.
My own reasoning would be this: What if your wife was cooking something time-sensitive that doesn't do well being kept warm in the oven, or if she's waiting for you to come home and watch the kid while she goes to her yoga- (or martial arts-) class? Being predictable would be preferable: a cooking schedule is easily delayed a minute, but a late message that you'll be a half-hour late will mess things up.
|
Dr. Mabuse - "When the going gets tough, the tough get Duct-tape..." Dr. Mabuse whisper.mp3
"Equivocation is not just a job, for a creationist it's a way of life..." Dr. Mabuse
Support American Troops in Iraq: Send them unarmed civilians for target practice.. Collateralmurder. |
|
|
On fire for Christ
SFN Regular
Norway
1273 Posts |
Posted - 07/19/2012 : 03:46:12
|
Originally posted by Dr. Mabuse
My own reasoning would be this: What if your wife was cooking something time-sensitive that doesn't do well being kept warm in the oven, or if she's waiting for you to come home and watch the kid while she goes to her yoga- (or martial arts-) class? Being predictable would be preferable: a cooking schedule is easily delayed a minute, but a late message that you'll be a half-hour late will mess things up.
|
In this scenario you simply tell your wife that her duties are to cook and take care of the child, coming home late from work is your prerogative, if a man needs to relax after work, the wife's recreational activities are secondary. |
|
|
|
BigPapaSmurf
SFN Die Hard
3192 Posts |
Posted - 07/19/2012 : 05:00:50
|
Originally posted by On fire for Christ
Originally posted by Dr. Mabuse
My own reasoning would be this: What if your wife was cooking something time-sensitive that doesn't do well being kept warm in the oven, or if she's waiting for you to come home and watch the kid while she goes to her yoga- (or martial arts-) class? Being predictable would be preferable: a cooking schedule is easily delayed a minute, but a late message that you'll be a half-hour late will mess things up.
|
In this scenario you simply tell your wife that her duties are to cook and take care of the child, coming home late from work is your prerogative, if a man needs to relax after work, the wife's recreational activities are secondary.
|
Your Honor!! This man does not represent me! |
|
|
Valiant Dancer
Forum Goalie
USA
4826 Posts |
Posted - 07/19/2012 : 05:58:43
|
Originally posted by On fire for Christ
Originally posted by Dr. Mabuse
My own reasoning would be this: What if your wife was cooking something time-sensitive that doesn't do well being kept warm in the oven, or if she's waiting for you to come home and watch the kid while she goes to her yoga- (or martial arts-) class? Being predictable would be preferable: a cooking schedule is easily delayed a minute, but a late message that you'll be a half-hour late will mess things up.
|
In this scenario you simply tell your wife that her duties are to cook and take care of the child, coming home late from work is your prerogative, if a man needs to relax after work, the wife's recreational activities are secondary.
|
OK old joke time.
A man was telling his friends that he needed to go because his wife was expecting him at a certain time.
His single friends insisted that he was letting her run things and he needed to "lay down the law" to her or throw her out of the house. After all, he pays the bills and she just stays home all day.
Fortified with liquid courage, he decides to try this out.
He goes home and demands that a lavish meal be prepared and if she didn't like it, he didn't want to see her in the house for a week.
She asked him if he'd like to not see her for three weeks. He said that would be even better.
The first week, he didn't see her. Ditto for the second week. The third week he saw her, just a little, out of the right eye. |
Cthulhu/Asmodeus when you're tired of voting for the lesser of two evils
Brother Cutlass of Reasoned Discussion |
|
|
Machi4velli
SFN Regular
USA
854 Posts |
Posted - 07/20/2012 : 00:23:44
|
Originally posted by Machi4velli
Don't t-tests assume the data is normal (Gaussian)?
|
Looked this up, it does technically, but if we can assume the times are independent (surely they are) and identically distributed (a little harder assumption to make, but you can't do a lot without this one), central limit theorem says the test statistic approaches the appropriate distribution for the test to work (technically as sample size goes to infinity, but 30 is a heuristic used in some contexts as a minimum to typically work).
A sign test can be used to alleviate any normality assumptions, but it can't use information about the distribution of the data, making it a bit weaker and less good with smaller samples.
Otherwise, we can try to fit a distribution and deduce mathematically all manner of results from that, but it may over-fit the model to the sample.
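Short of fitting a distribution, the samples themselves give a rough, assumption-light answer to the original "95% chance to complete the route in such-and-such a time" question. The sketch below uses a nearest-rank empirical percentile: the smallest observed time that at least the given percentage of the 30 trips did not exceed. The helper name `empirical_percentile` is hypothetical, and with only 30 trips per route these are coarse estimates from the sample, not true probabilities.

```python
import math

# Raw times in seconds, copied from the first post.
route_a = [424, 439, 450, 456, 456, 458, 467, 469, 471, 472, 474, 475, 476, 482, 482,
           494, 511, 536, 541, 612, 621, 668, 679, 823, 836, 849, 999, 1157, 1277, 2195]
route_b = [725, 731, 740, 763, 771, 773, 780, 784, 785, 809, 812, 819, 827, 833, 835,
           840, 846, 857, 860, 865, 874, 888, 916, 977, 977, 984, 1007, 1041, 1143, 1389]
route_c = [516, 561, 565, 570, 572, 576, 584, 596, 597, 598, 605, 629, 633, 634, 636,
           642, 642, 677, 688, 689, 700, 706, 743, 750, 768, 844, 866, 882, 886, 992]


def empirical_percentile(times, percent):
    """Nearest-rank percentile: smallest time covering at least percent% of trips."""
    ordered = sorted(times)
    rank = math.ceil(percent * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


for name, times in (("A", route_a), ("B", route_b), ("C", route_c)):
    p90 = empirical_percentile(times, 90)
    p95 = empirical_percentile(times, 95)
    print(f"Route {name}: 90% of trips took <= {p90} s, 95% took <= {p95} s")
```

On these samples the 95% figures come out to 1277 seconds for A, 1143 for B and 886 for C, which lines up with the predictability argument for route C.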
It's a matter of picking a poison, and probably none of them are going to be more useful than Mab's reasoning to support Route C. |
Edited by - Machi4velli on 07/20/2012 00:37:51 |
|
|