Saturday, December 6, 2014

The New Algorithm

After an unusually poor performance by the College Women's Hoops' S-Factor last year, I decided to reevaluate the methodology behind the rankings published here. I started by sorting out which teams the old S-Factor seeded close to their actual tournament seeds, and which teams it had a problem with. The chart below shows the teams where the S-Factor's prediction and the actual tournament selection differed by two seeds or more.

Team             Real seed     Predicted seed   Diff.
Bowling Green    not picked    9                +∞
Rutgers          not picked    11               +∞
Southern Miss    not picked    8                +∞
BYU              12            7                +5
Gonzaga          6             4                +2
Iowa             6             4                +2
MTSU             8             6                +2
Oregon State     9             7                +2
James Madison    11            9                +2
Oklahoma State   5             7                -2
LSU              7             9                -2
Georgia          8             10               -2
Vanderbilt       8             10               -2
St. Joseph's     9             11               -2
UT-Martin        13            15               -2
Texas            5             8                -3
Iowa State       7             11               -4
Oklahoma         10            not picked       -∞
Florida State    10            not picked       -∞
Florida          11            not picked       -∞

The teams in the top half of this chart are the ones the S-Factor was too bullish on, while those in the bottom half are the ones it was too harsh on.

Generally speaking, the teams the S-Factor missed high on were teams from mid-major conferences, while the teams it missed low on were teams from the major conferences. Eight of the eleven teams the S-Factor underpredicted by two seeds or more came from either the Big 12 or the SEC.


I realized I needed to give greater weight to teams from elite conferences, but that raised the question of what constitutes an "elite conference". Does the new American Athletic Conference count just because it has Connecticut in it? Would I have to give the same weight to the Pac-12 that I give to the SEC? More concerning to me was that any definition of an "elite conference" would be subjective, not defined by wins and losses within a single season, which would shackle teams to historic expectations rather than let them define their own destinies on the basketball court.

The answer for me until this season had been to use conference RPI as the only method to distinguish teams in competitive conferences from teams in lagging conferences. Conference record, conference tournament record, and overall record are adjusted by conference RPI such that teams in the #1-RPI conference (SEC in 2014) get the full allotment of points in these categories, while teams in the #32-RPI conference (SWAC in 2014) get no points for these categories.
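
Roughly, in code, here's the idea. This is just a sketch: the straight linear scale and the helper names are my illustration, not the actual S-Factor code.

# A minimal sketch of the conference-RPI adjustment described above,
# assuming a straight linear scale from the #1 conference to the #32.

def conference_rpi_multiplier(conf_rpi_rank, num_conferences=32):
    """1.0 for the #1-RPI conference, 0.0 for the last-ranked one."""
    return (num_conferences - conf_rpi_rank) / (num_conferences - 1)

def adjusted_points(raw_points, conf_rpi_rank):
    """Scale a record-based category by the team's conference RPI rank."""
    return raw_points * conference_rpi_multiplier(conf_rpi_rank)

# The 2014 SEC (#1 in conference RPI) keeps its full allotment;
# the 2014 SWAC (#32) gets nothing for these categories.
print(adjusted_points(10.0, 1))   # 10.0
print(adjusted_points(10.0, 32))  # 0.0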

But many mid-major conferences have relatively decent conference RPIs. For instance, throughout last year the old S-Factor was ranking West Coast Conference teams higher than most other bracket projections did. The WCC's conference RPI was just a smidge lower than the Big East's, good for eighth in the country, because the WCC had a strong middle and bottom tier compared to many other mid-major conferences. But it had only one team in the RPI top 30 (Gonzaga). A strong bottom half of the conference didn't mean anything to the selection committee, so both BYU and Gonzaga got screwed. (BYU later avenged its poor seeding by becoming only the third 12-seed in tournament history to make it to the Sweet Sixteen.)

I decided I needed another way to quantify goodness of conference.

I defined an "elite" conference as one in which a certain percentage of the teams are good teams. I played around with this idea and came up with an "elite conference factor" that worked reasonably well: a weighted blend of the percentage of a conference's teams in the RPI top 100 (weighted 25%), the top 30 (50%), and the top 10 (25%). Top-30 RPI teams are almost always tournament-bound, and the top 10 is where superstars play, but I also wanted to include information about teams in the 30-100 range, the "challenging, but not tournament-bound" range.
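
In code, the factor looks something like this. The three weights come straight from the description above; the 0-to-1 scale and the function shape are my assumptions.

# A sketch of the "elite conference factor": a weighted share of a
# conference's teams in three RPI tiers.

def elite_conference_factor(team_rpi_ranks):
    """Weighted blend of a conference's teams in the RPI top 100/30/10."""
    n = len(team_rpi_ranks)
    pct_top_100 = sum(rank <= 100 for rank in team_rpi_ranks) / n
    pct_top_30 = sum(rank <= 30 for rank in team_rpi_ranks) / n
    pct_top_10 = sum(rank <= 10 for rank in team_rpi_ranks) / n
    return 0.25 * pct_top_100 + 0.50 * pct_top_30 + 0.25 * pct_top_10

# A hypothetical ten-team league with one top-10 team, two more in the
# top 30, and three more in the top 100:
print(elite_conference_factor([5, 22, 28, 45, 70, 95, 120, 150, 200, 250]))
# 0.325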

This "elite conference factor" ranks conferences in roughly the same order as conference RPI, but falls off much faster as one progresses from the best to the worst conferences. With the "elite conference factor", the difference between, say, the Ohio Valley Conference and the MAC is less pronounced than with conference RPI even though the MAC was 12th and the OVC was 29th last year. This better reflects the way the selection committee treats all mid-major conferences as single-bid conferences even though there is a marked difference in competition level between conferences like the MAC and conferences like the OVC.

Unlike the oblique way conference RPI is considered in my formula, I wanted to use this "elite conference factor" directly in the S-Factor formula, as a way to put a thumb on the scale in favor of teams from major conferences. But this ran the risk of favoring obviously-not-tournament-bound teams from major conferences (Alabama, say) over legitimately strong teams from mid-major conferences (BYU, say). I decided to further restrict the "elite conference factor" to teams that could legitimately be selected to the tournament. The tournament has never given an at-large bid to a team with a losing record. Nor has it ever selected a team with an RPI rank worse than 100 or a conference record more than two games below .500 (that I know of). For teams that pass these requirements, the "elite conference factor" kicks in gradually for teams with an RPI rank better than 100, until the RPI 40 team, above which the full allotment of "elite conference factor" points is granted. (There is also a small discount for teams that are one game below .500 in conference play.)
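
Here's a sketch of that gate and ramp. The cutoffs (losing record, RPI rank 100, two games under .500 in conference, full credit at RPI 40) are from the paragraph above; the linear shape of the ramp and the size of the one-game-under discount are my guesses.

# A sketch of the eligibility gate and the gradual kick-in.

def elite_factor_share(wins, losses, conf_wins, conf_losses, rpi_rank):
    """Fraction of the elite conference factor a team is eligible for."""
    if wins < losses:                  # never a losing-record at-large
        return 0.0
    if rpi_rank > 100:                 # never an at-large worse than RPI 100
        return 0.0
    if conf_wins - conf_losses < -2:   # never more than 2 games under .500
        return 0.0
    if rpi_rank <= 40:
        share = 1.0                    # full allotment at RPI 40 or better
    else:
        share = (100 - rpi_rank) / (100 - 40)  # gradual kick-in
    if conf_wins - conf_losses == -1:
        share *= 0.9                   # small discount (10% is my guess)
    return share

print(elite_factor_share(22, 8, 10, 6, rpi_rank=35))  # 1.0
print(elite_factor_share(18, 12, 7, 8, rpi_rank=70))  # 0.45 (0.5 * 0.9)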

The elite conference factor is multiplied by the correction factor and weighted to 15% of the S-Factor algorithm. This produced results that favored major-conference bubble teams over mid-major bubble teams, which more closely resembles the Selection Committee's choices over the past two years (the SEC's Florida getting in, but the MAC's Bowling Green being bounced, etc.). Under the new algorithm, the S-Factor would have predicted 64 of 64 teams in the 2014 field, and 63 of 64 in the 2013 field. The new S-Factor algorithm would have missed on Creighton in its last year of membership in the Missouri Valley Conference, but it would have been the only bracket model to correctly predict the inclusion of the Big 12's Kansas, which was Charlie Creme's only miss that year.
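
Putting it together, the blend might look like the sketch below. The 15% weight is from the text; the "correction factor" and the make-up of the other 85% aren't spelled out here, so both are placeholders.

# A sketch of folding the elite conference component into the final score.

def s_factor(other_components, elite_factor, share, correction=1.0):
    """Blend record-based scoring with the elite conference component."""
    return 0.85 * other_components + 0.15 * (elite_factor * share * correction)

# Two bubble teams with identical record-based scores, one from a major
# conference and one from a mid-major; the elite component breaks the tie.
print(s_factor(0.60, elite_factor=0.45, share=1.0))  # 0.5775
print(s_factor(0.60, elite_factor=0.10, share=1.0))  # 0.5250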

The new S-Factor algorithm stays true to the values it has always had: that tournament selection is based ultimately and exclusively on wins and losses, together with various ways to gauge the quality of each win and loss, just like the information the Selection Committee uses when it sculpts the field of 64.
