Username Post: Building realistic simulations
Kit
Senior
Posts 380
04-15-17 10:54 PM - Post#228749    

Hey Mike James,

Since our last communication, I have gotten a general idea of how to simulate the outcome of a college basketball game. Now that the season is over and the compilation of relevant statistics is complete, I would like to use the collected data to build my own model in C. For each individual player, I would like to get minutes played, shots attempted, shots made, 3-point shots attempted, 3-point shots made, free-throw shots attempted, free-throw shots made, fouls committed, turnovers committed, steals, shots blocked, offensive rebounds, defensive rebounds, assists, average number of possessions, height, and position(s) played. I would like to obtain these records for each of the 351 Division I teams that compete in the NCAA. Is such data readily available in a usable format (i.e. a raw text file with spaces, tabs, and new-line characters as delimiters)? Extracting this information from a worksheet file (i.e. an Excel file), a Word document, or some flavor of database file is a royal pain in the a--. Could you help me get such data sets? It would be nice to replay this season, especially since my favorite team (Princeton) performed remarkably well. Your assistance in this matter is greatly appreciated. Thanks in advance!

-- Kit

mrjames
Professor
Posts 6062
05-02-17 11:20 AM - Post#229001    

Apologies for the delay.

If you can do some data scraping, I would recommend the following websites as having some fantastic data sets for both analyzing players and pulling down box score data:

http://www.sports-reference.com/cbb/conferences/iv...

http://basketball.realgm.com/ncaa/conferences/Ivy- ...

http://stats.ncaa.org/team/275/12480?game_sport_ye...

I've got some stuff written in R to pull this down, but I'm sure you can pull those tables down in five seconds. Happy to help in terms of walking through what the traditional models look like here to simulate seasons as well!
rbg
Postdoc
Posts 3044
01-24-18 10:15 AM - Post#244506    

Luke Benz, President of the Yale Undergraduate Sports Analytics Group, has an article at the Yale Daily News regarding his club's predictions for this coming weekend, as well as the conference season.

https://yaledailynews.com/blog/2018/01/23/by-the-n...

Looking at the article and the group's twitter feed, their simulations show the chances for each team to make the Ivy Tournament:
Princeton 98.0%
Penn 97.9%
Harvard 78.2%
Yale 53.8%
Brown 37.0%
Columbia 32.5%
Dartmouth 1.6%
Cornell 0.2%

Looking at Mike's twitter account, he has similar percentages:
Princeton 98%
Penn 97%
Harvard 77%
Yale 53%
Brown 36%
Columbia 34%
Dartmouth 3%
Cornell 3%

I was looking at the IL info on Adjusted Offensive and Defensive Efficiency at Ken Pomeroy's site and saw the following rankings:

Team    AdjO    AdjD
Prin    #87     #175
Penn    #246    #90
Harv    #330    #46
Yale    #151    #264
Brown   #262    #194
Colum   #210    #246
Corn    #223    #320
Dart    #244    #321

I have a couple questions for the much more educated analytic experts on the board based on the above information. (I apologize if my premise is completely off base.)

Looking at the IL simulations and the adjusted efficiency rankings, is it too naive to state:

A team needs at, or around, a top 100 ranking in at least one metric to feel strongly confident to be in this year's top 4?

Does a team need to have a performance in the area of Top 150 in at least one metric to be fairly certain of getting into this year's Tournament?

Since Brown's defense and Columbia's offense are closer to top 150, will an improvement in those metrics give them the best chance of taking the last spot over Yale (assuming Yale's offense doesn't take a nose dive)? If so, how realistic is it that either team could improve that much?



mrjames
Professor
Posts 6062
01-24-18 10:49 AM - Post#244509    

I think your "rules of thumb" are all pretty solid for this year (could change across years depending on the strength of the league, especially in the middle).

The way the process works (and why YUSAG and I come up with essentially the same numbers) is this:

Start with a rating system. I use KenPom. You could use Sagarin, Bart Torvik, Massey, etc., or a blend of them. Derive a rating for each team. Then use the historically "best-fit" formulas to turn those ratings (plus home court advantage) into odds to win each game. (KenPom, Massey, Bart Torvik, etc. already have these odds calculated, so using theirs lets you skip a step.)
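
For anyone who wants to see the ratings-to-odds step spelled out, here is a minimal sketch. It is not the formula KenPom or any of the sites above actually use: it assumes ratings are expressed in points, that the final margin is roughly normally distributed around the rating difference, and that the home edge (3.5 points) and spread (11 points) are reasonable placeholders. The function name and both default values are illustrative assumptions.

```python
import math

def win_probability(rating_home, rating_away, home_edge=3.5, sigma=11.0):
    """Estimate the home team's chance of winning one game.

    Assumes ratings are on a points scale and that the final margin is
    normally distributed around the expected margin. home_edge and sigma
    are placeholder values, not taken from any rating site.
    """
    expected_margin = rating_home - rating_away + home_edge
    # Normal CDF evaluated at expected_margin / sigma, written with math.erf.
    return 0.5 * (1.0 + math.erf(expected_margin / (sigma * math.sqrt(2.0))))

# Two evenly rated teams: the home side is favored only by the home edge.
print(round(win_probability(10.0, 10.0), 3))
```

If the rating site already publishes a win probability for each game, you can skip this function entirely and read the odds off the site.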

Now you have a list of all 56 games with win pct odds. Put a column next to each with a random number between 0 and 1 in it. That will determine who wins each game in the sim (set all games that have been played to a flat 0 or 1 depending on who already won the game).
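
The "random number next to each game" idea can be sketched in a few lines. This is only an illustration: the function name, the tuple layout, and the sample odds are assumptions made up for the example, not anyone's actual schedule or numbers.

```python
import random

def simulate_season(games, rng):
    """Play out one simulated season and return wins per team.

    games is a list of (home, away, p_home) tuples, where p_home is the
    home team's win probability. Games already played carry a flat 1.0
    (home team won) or 0.0 (away team won), so their result never changes.
    """
    wins = {}
    for home, away, p_home in games:
        wins.setdefault(home, 0)
        wins.setdefault(away, 0)
        # rng.random() is in [0, 1), so p_home == 1.0 always picks the home
        # team and p_home == 0.0 always picks the away team.
        winner = home if rng.random() < p_home else away
        wins[winner] += 1
    return wins

# Hypothetical three-game slate: two finished games and one still to play.
sample_games = [
    ("Team A", "Team B", 1.0),   # already played, home team won
    ("Team B", "Team C", 0.0),   # already played, away team won
    ("Team A", "Team C", 0.62),  # not yet played
]
print(simulate_season(sample_games, random.Random(1)))
```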

From there, you write a for loop to run each individual season simulation 100, 1000, 10000, 100000 times depending on what strikes your fancy. (Technically, the reasoning behind how many sims you run should be grounded in what you're trying to figure out... if you're focused on longer odds events, you might want to run more sims to get more clarity, but if you're focused on the meat of the curve, you won't get very different answers even after 100 sims).
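
One rough way to put numbers on the "how many sims" question is the usual standard error of an estimated proportion, sqrt(p(1-p)/n). The sketch below just tabulates it; the probabilities chosen (50% and 3%) are illustrative, and the function name is made up for the example.

```python
import math

def monte_carlo_standard_error(p, n_sims):
    """Standard error of an estimated probability p after n_sims independent sims."""
    return math.sqrt(p * (1.0 - p) / n_sims)

for n_sims in (100, 1000, 10000, 100000):
    for p in (0.50, 0.03):
        se = monte_carlo_standard_error(p, n_sims)
        print(f"n={n_sims:>6}  p={p:.2f}  standard error={se:.4f}")
```

At 100 sims the error on a 3% event is about 1.7 percentage points, more than half the size of the estimate itself, which is why long-odds questions call for more runs.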

There's a LOT of code around the fakakta tie breakers the Ivies have put in place that comes into play when taking the Ivy wins from each sim and turning that into the rank order of finish. Then, there's more to simulate the Ivy tourney that would result. But let's leave that aside.

From there, it's just some simple math to figure out the odds of finishing in any spot, making the tourney, winning it, etc.
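
Here is a sketch of that last bit of math: run many seasons and count how often each team lands in the top four. It deliberately does NOT implement the real Ivy tiebreakers; tied teams are ordered at random, which is a placeholder assumption. The function name, team names, and games are all hypothetical.

```python
import random
from collections import Counter

def top_four_odds(games, teams, n_sims=10000, seed=0):
    """Estimate each team's chance of finishing in the top four.

    games is a list of (home, away, p_home) tuples. Ties in the standings
    are broken at random, which is a stand-in for the real tiebreak rules.
    """
    rng = random.Random(seed)
    made_top_four = Counter()
    for _ in range(n_sims):
        wins = dict.fromkeys(teams, 0)
        for home, away, p_home in games:
            winner = home if rng.random() < p_home else away
            wins[winner] += 1
        # Sort by wins, using a random number as the (placeholder) tiebreak.
        ranked = sorted(teams, key=lambda t: (wins[t], rng.random()), reverse=True)
        made_top_four.update(ranked[:4])
    return {team: made_top_four[team] / n_sims for team in teams}

# Hypothetical five-team league with a handful of games.
teams = ["A", "B", "C", "D", "E"]
games = [("A", "E", 0.9), ("B", "E", 0.8), ("C", "D", 0.5), ("D", "E", 0.7), ("A", "B", 0.6)]
for team, odds in top_four_odds(games, teams, n_sims=2000).items():
    print(team, f"{odds:.1%}")
```

The same tally works for any other event (finishing first, winning the tourney) once the sim produces it: count how often it happens and divide by the number of sims.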

The key point about the odds is that they represent the likelihood of qualification given that the team continues to be exactly what it is today. That is to say that a team doesn't necessarily HAVE to improve to make the tourney if it has, say, 33% odds. From here, game results will move the odds more dramatically than the associated changes in team rating.
rbg
Postdoc
Posts 3044
01-24-18 11:22 AM - Post#244515    

Thanks for the help. Greatly appreciated!
rbg
Postdoc
Posts 3044
02-07-18 10:10 PM - Post#246857    

Following last night's game, there is updated info from Yale Undergraduate Sports Analytics Group and Mike regarding the chances for each team to make the Ivy Tournament:

YUSAG -
Penn 99.9%
Harvard 96.5%
Princeton 83.4%
Brown 54.1%
Columbia 33.8%
Yale 30.4%
Cornell 1.9%
Dartmouth 0.1%

Mike -
Penn 99.9%
Harvard 96%
Princeton 84%
Brown 53%
Columbia 35%
Yale 29%
Cornell 3%
Dartmouth 0%

SRP
Postdoc
Posts 4894
02-08-18 12:05 AM - Post#246872    

Someone in Hanover is saying "Never tell me the odds!"
mrjames
Professor
Posts 6062
02-08-18 09:02 AM - Post#246883    

Dartmouth made the Ivy Tourney in mine in 26 of 10,000 sims (0.3%).

So, yes, I'm sayin' there's a chance.
T.P.F.K.A.D.W.
PhD Student
Posts 1169
02-08-18 09:32 AM - Post#246886    

  • rbg Said:

YUSAG -
Penn 99.9%
Harvard 96.5%
Princeton 83.4%
Brown 54.1%
Columbia 33.8%
Yale 30.4%
Cornell 1.9%
Dartmouth 0.1%

Mike -
Penn 99.9%
Harvard 96%
Princeton 84%
Brown 53%
Columbia 35%
Yale 29%
Cornell 3%
Dartmouth 0%




What accounts for the startling difference between Cornell and Yale? Both teams are 2-4. I mean, yeah, Yale's a better team I suppose (though they need to start—you know—winning some games) but 29% - 6%? Damn.
Silver Maple
Postdoc
Posts 3765
02-08-18 10:03 AM - Post#246889    

I'd guess SOS. Yale has played tougher opposition than Cornell.
mrjames
Professor
Posts 6062
02-08-18 10:36 AM - Post#246896    

SOS is pretty equal and will be very equal down the stretch (essentially only difference is Brown-Columbia swapping out and those two are very close in the rankings).

The crux of it is that Yale is a MUCH better team. The Bulldogs are not only favored over Cornell at home, they're also a 2-pt favorite in Ithaca. Then every weekend they have better odds than Cornell as well. Cornell is 9% to sweep the Ps at home, Yale is roughly 25% (key weekend for tiebreakers).

Very similar to why Penn last year still had decent odds at 0-5 and even at 0-6 still had better odds to make the Ivy tourney (7%) than 2-4 Cornell (3%), leading me to say this at the time:

Mike James (@ivybball), 5 Feb 2017:
"If Cornell wins, I'm not looking forward to having to explain how a 3-3 team has lower odds to make the Ivy tourney than an 0-5 team."

Retweeted from CornellSportsGameday (@CUBigRedGameday):
"MBKB | It's halftime and we're all tied at 36-36. Sophomore Stone Gettings with 16 points to lead the Big Red."
T.P.F.K.A.D.W.
PhD Student
Posts 1169
02-08-18 11:29 AM - Post#246903    

#ZombieBulldogs!
penn nation
Professor
Posts 21086
02-08-18 11:48 AM - Post#246907    

You can make a similar point about the rather large differences between Princeton and Brown.

Princeton has not looked particularly good in its last 3 home contests. I understand that you and others expect Princeton to "return to form" but frankly I'm not sure that "form" is so realistic anymore, especially with the preponderance of road games to come.

That Princeton-Brown game is so key--if Brown wins, Princeton is in for a ride.
mrjames
Professor
Posts 6062
02-08-18 12:02 PM - Post#246910    

Happy to run the sims with overrides on the efficiency ratings that are more representative of how a team is playing now.

The current sims pull the objective rating from Bart Torvik's site, so I don't really have any control over how a team is rated. I do have manual overrides that I *can* throw in (and would, potentially, if Aiken or Mason ever came back fully healthy and as a consistent positive contributor). Otherwise, it's best to stick with the ratings as they exist in the objective world.

One thing I will say is that Princeton's odds would be lower if I were using KenPom versus Bart Torvik's site. KenPom's ratings get super jumpy when the preseason weighting falls off, whereas Bart's move more throughout the year, leading to them being relatively less jumpy at this point.

Thus, Princeton didn't take as much of a hit over the past couple games at Bart's site as at KenPom.
PennFan10
Postdoc
Posts 3580
02-08-18 12:10 PM - Post#246911    

Isn't it more accurate without preseason weightings at this point?

"You are what your record says you are"
mrjames
Professor
Posts 6062
02-08-18 12:24 PM - Post#246914    

The funny/weird/slightly incomprehensible thing is no, it isn't. That makes little intuitive sense, but Ken did some great analysis with end-of-season performance as the target variable, and it showed that preseason rankings still added predictive value over where the team stood at that point in the season. So, technically, you could justify having some preseason weight in there the entire season.

He fades the weight out earlier than that (January) for ease of explanation, but technically, it would still be a (marginally?) better rating system to leave it in. His reasoning was that it would be a very, very small amount of weighting that he wouldn't want people latching on to in order to cast aspersions on his overall system. Sometimes the product you build has to sacrifice integrity to gain buy-in. Thus is life.
T.P.F.K.A.D.W.
PhD Student
Posts 1169
02-08-18 12:37 PM - Post#246915    

How would this work for a school like Kentucky that seems to replace (or at the very least, make significant changes to) their roster every year?
mrjames
Professor
Posts 6062
02-08-18 01:02 PM - Post#246917    

The preseason ratings take into account historical team and conference quality as well as recruiting class ratings. They work surprisingly well given the volatility of the inputs...


