Kit
Senior
Posts: 380
Loc: Central Massachusetts
Reg: 11-29-04
|
02-09-17 05:25 PM - Post#221144
Mike:
How do you make a simulation? Suppose there are X possessions per game. Each possession results in either a made basket, a missed basket, a foul, or a turnover. There is a percentage of 3-point shots taken, a percentage of offensive and defensive rebounds obtained, a percentage of free throws made, and a percentage of turnovers committed.
Where can one get the data for the ORat and DRat for all 351 D-1 schools? Are individual player stats included in the model?
Inquiring minds want to know...
|
mrjames
Professor
Posts: 6062
Loc: Montclair, NJ
Reg: 11-21-04
|
02-09-17 06:53 PM - Post#221159
In response to Kit
Ha - this question can get pretty deep pretty quickly.
Let's say there's Team A and Team B. The easiest way is to take Team A and Team B's ORAT and DRAT and apply the following formula:
((Team A ORAT / 100) * (Team B DRAT / 100)) ^ Exponent / (((Team A ORAT / 100) * (Team B DRAT / 100)) ^ Exponent + ((Team A DRAT / 100) * (Team B ORAT / 100)) ^ Exponent)
NOTE - you have to add/subtract home-court advantage (HCA) from the efficiency ratings before you start - I use 1.5 pts on each of the four ORAT and DRAT ratings.
That exponent was about 12.5 a decade ago, but KenPom has continued to backtest it, and it appears to be closer to 10.5 today. Another fun rule of thumb is that the std dev on line misses is about 8 pts, so if you're favored by 8 points, you should win every game that doesn't go more than one std dev against you (about 84%). If the line is 16 pts, then you win all the ones that don't go more than two std devs against you (about 97.7%).
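For anyone who wants to play with the numbers, here is a minimal Python sketch of the two calcs above. The function and parameter names are mine, not from any library, and it assumes the HCA adjustment has already been applied to the ratings you pass in. The second function assumes line misses are normally distributed, which is what the 8-pt rule of thumb implies.

```python
import math

def win_probability(a_ortg, a_drtg, b_ortg, b_drtg, exponent=10.5):
    """Pythagorean-style win probability for Team A.

    Ratings are points per 100 possessions and are assumed to be
    already adjusted for home-court advantage (hypothetical inputs).
    """
    # Team A's scoring strength: its offense against Team B's defense.
    a_strength = (a_ortg / 100) * (b_drtg / 100)
    # Team B's scoring strength: its offense against Team A's defense.
    b_strength = (b_ortg / 100) * (a_drtg / 100)
    a_pow = a_strength ** exponent
    b_pow = b_strength ** exponent
    return a_pow / (a_pow + b_pow)

def win_probability_from_line(line, std_dev=8.0):
    """Chance the favorite wins, assuming normally distributed line misses."""
    # Standard normal CDF evaluated at line / std_dev.
    return 0.5 * (1 + math.erf(line / (std_dev * math.sqrt(2))))

if __name__ == "__main__":
    # Made-up ratings, purely for illustration.
    print(win_probability(115, 95, 105, 100))
    print(win_probability_from_line(8))
    print(win_probability_from_line(16))
```

Two identical teams come out at exactly 0.5, which is a quick sanity check that the formula is wired up symmetrically.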
What you're asking about does indeed get more complicated. The simplest form is to run a loop for each team for roughly the number of possessions that you expect in the game. Within each loop, you run a random number generator. If it's above a certain number (say 0.8, if the team's turnover rate is 20%), then it's a turnover. If it's under that, you divide up the potential outcomes into free throws, 3PTers, 2PT Js, and layups. Once the number guides you to a shot type, you run another random number to determine make or miss. It's self-explanatory if it's a make, but if it's a miss, you run another random number to see if the offense got the rebound. And all of this is done based on the team's past style and rates factored against the opponent's ability to counter (mixed based on KenPom's offense vs defense control calcs).
Once you iterate that across all of the possessions in a game, you then loop those individual game sims thousands of times to get a sense of the distribution of potential outcomes.
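The loop described above can be sketched roughly like this in Python. Treat it as a toy under stated assumptions, not the actual model: every rate in TEAM_A and TEAM_B is made up, the dictionary keys are hypothetical names, a trip to the line is always two shots with no rebound, and the opponent's ability to counter is not modeled at all.

```python
import random

# Made-up illustrative rates, not real team data.
TEAM_A = {"turnover_rate": 0.18, "oreb_rate": 0.30, "ft_pct": 0.72,
          "shot_mix": {"ft": 0.15, "three": 0.30, "jumper": 0.25, "layup": 0.30},
          "make_pct": {"three": 0.36, "jumper": 0.40, "layup": 0.60}}
TEAM_B = {"turnover_rate": 0.22, "oreb_rate": 0.27, "ft_pct": 0.68,
          "shot_mix": {"ft": 0.15, "three": 0.30, "jumper": 0.25, "layup": 0.30},
          "make_pct": {"three": 0.33, "jumper": 0.37, "layup": 0.56}}
POINTS = {"three": 3, "jumper": 2, "layup": 2}

def simulate_possession(team, rng):
    """Return the points scored on one possession (0, 1, 2, or 3)."""
    while True:
        # Above the threshold (e.g. 0.8 for a 20% turnover rate) is a turnover.
        if rng.random() > 1 - team["turnover_rate"]:
            return 0
        kinds = list(team["shot_mix"])
        weights = [team["shot_mix"][k] for k in kinds]
        kind = rng.choices(kinds, weights=weights)[0]
        if kind == "ft":
            # Simplification: always two free throws, no rebound on a miss.
            return sum(1 for _ in range(2) if rng.random() < team["ft_pct"])
        if rng.random() < team["make_pct"][kind]:
            return POINTS[kind]
        # Missed shot: another random number decides the rebound.
        if rng.random() >= team["oreb_rate"]:
            return 0
        # Offensive rebound: the possession continues with a fresh draw.

def simulate_game(team_a, team_b, possessions, rng):
    """Play one game with the same possession count for both teams."""
    a = sum(simulate_possession(team_a, rng) for _ in range(possessions))
    b = sum(simulate_possession(team_b, rng) for _ in range(possessions))
    return a, b

def simulate_many(team_a, team_b, possessions=68, n_sims=10000, seed=0):
    """Return Team A's scoring margin in each of n_sims simulated games."""
    rng = random.Random(seed)
    margins = []
    for _ in range(n_sims):
        a, b = simulate_game(team_a, team_b, possessions, rng)
        margins.append(a - b)
    return margins

if __name__ == "__main__":
    margins = simulate_many(TEAM_A, TEAM_B)
    print("avg margin:", sum(margins) / len(margins))
    # Ties are not counted as wins here; a fuller sim would play overtime.
    print("win share:", sum(m > 0 for m in margins) / len(margins))
```

The list of margins is the useful output: its spread gives the sense of the distribution of outcomes, and the share of positive margins is a rough win probability.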
Now the model isn't exactly that simple. What happens on Team A's possession can impact Team B's likelihood to score on the next possession. What happens earlier in the sim could impact how a team would usually play later in the sim. The likely pace of the game is chosen from a distribution and isn't a static number. There's other stuff that factors in as well. I find the outputs here to be a bit more revealing than the top-level calcs in some instances, though the sim tends to reveal more about the potential variance of the result than the magnitude of it.
Hope that helps!
|
Kit
Senior
Posts: 380
Loc: Central Massachusetts
Reg: 11-29-04
|
02-10-17 12:56 AM - Post#221186
In response to mrjames
Thanks, Mike!
|