A Statistical Approach to Predicting Rookie Scale Extensions
Someone needs to take my TI-84 away, I'm watching too much basketball on it.
If you couldn’t tell from my blogs, I am not a salary cap wizard. I don’t really know the first thing about the financials of the NBA besides the basics - rookies have good value, they get extended, NBA players get paid a lot of money. In all honesty, the financials of the NBA are important in general, but not so important to me. Just like many other fans, I am pretty uneducated on contracts (although I try not to offer opinions on things before I know at least the basic amount).
Some of the bad discourse around contracts comes from a lack of general knowledge about how high the cap is - most people think we’re still living in 2014. For context, in the 2013-14 season, Kobe Bryant made $30,453,805. At the time, that was a massive amount of money. He made 51.90% of the league cap - a number that no one has eclipsed since then. With the inception of a new collective bargaining agreement in the mid-2010s (I am not smart enough to explain it), the league cap exploded. Two collective bargaining agreements later, NBA players make so much more than they did 10 years ago. While 30 million dollars isn’t insignificant, it isn’t 50% of the league cap anymore. Scottie Barnes recently signed a rookie scale extension that pays him roughly 8 million dollars more than Kobe made in 2014. His cap percentage is 24.93%. As most “nerds” would try to explain: percentages over totals (I will be a hypocrite later in this article).
For anyone who doesn’t know what a rookie scale extension is - I’ll explain. When a player gets drafted into the NBA, they are on a rookie deal - the higher the draft pick, the more money. But even if you’re the #1 pick, you are on a great value contract. To give an example, Victor Wembanyama, who was probably one of the top 15 players in the league this year, made only (bear with me) $12,160,680 this year, which equates to about 8.94% of the league cap. So rookies are on pretty good deals until they prove themselves, and then after three years, their team can choose to extend them, but at a high price. Scottie Barnes made only 5.89% of the league cap this year, but in 2025-26, he will make 24.93%, as stated before.
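The cap percentages quoted above are just salary divided by that season's cap. Here is a minimal sketch of the arithmetic. The two salaries come from this article, but the cap figures in `SALARY_CAP` are outside assumptions added for illustration, and `cap_share` is a made-up helper name.

```python
# Cap share = salary / salary cap for that season.
# Salaries are the ones quoted in the article; the cap figures below are
# assumptions added for illustration and are not stated in the article.
SALARY_CAP = {
    "2013-14": 58_679_000,   # assumed league cap
    "2023-24": 136_021_000,  # assumed league cap
}


def cap_share(salary: int, season: str) -> float:
    """Return a salary as a percentage of the assumed cap for that season."""
    return 100 * salary / SALARY_CAP[season]


print(f"Kobe Bryant, 2013-14: {cap_share(30_453_805, '2013-14'):.2f}%")
print(f"Victor Wembanyama, 2023-24: {cap_share(12_160_680, '2023-24'):.2f}%")
```

Comparing shares instead of raw dollars is what makes a 2014 contract and a 2025 contract comparable at all.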
Seeing the Barnes extension, I started to become curious about what the words “overpaid” and “underpaid” actually mean. What criteria are we using to determine whether someone is OVERpaid or UNDERpaid? Both of those are relative terms, so it should be possible to see whether a player is in fact either one of those compared to recent rookie scale extensions. Seeing the potential for comparison, I decided to build a model. Some of the methodology for this is derived from Nylon Calculus, who did a similar exercise - but they did the same level of gatekeeping I am going to do - so it wasn’t all that helpful.
Rather than calculate market value for all players - which requires way more data and significantly more time in regards to feature engineering - I decided to specifically focus on rookie scale extensions.
Whether a team decides to give that extension or let the player test restricted free agency is up to them - it’s not that the 10 best players are the ones who get the extensions, it’s just about figuring out what is good value. If you need an example, Santi Aldama is likely to get a rookie scale extension, but Jalen Green isn’t.
Methodology:
The model is relatively simple, but it’s effective. I’ll explain it below.
I decided to use an Ordinary Least Squares regression model for interpretability and efficiency, and I aggregated my data from Basketball Reference and Spotrac (not the best hours of my life).
My training dataset was players who signed their rookie scale extensions from 2018 to 2022. My test dataset was players who signed in 2023, and my prediction dataset was players who will be signing their extensions this year. I used the 23 eligible players.
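A minimal sketch of that year-based split, assuming each contract is stored as a record with a signing year. The `Extension` class, the placeholder players, and the choice of 2024 for "this year" are all assumptions for illustration, not the real dataset.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

PREDICTION_YEAR = 2024  # assumption: the "this year" class of extensions


@dataclass
class Extension:
    player: str
    signed_year: int
    cap_pct: Optional[float]  # None until the extension is actually signed


def split_by_year(
    rows: List[Extension],
) -> Tuple[List[Extension], List[Extension], List[Extension]]:
    """Split by signing year: 2018-2022 train, 2023 test, PREDICTION_YEAR predict."""
    train = [r for r in rows if 2018 <= r.signed_year <= 2022]
    test = [r for r in rows if r.signed_year == 2023]
    predict = [r for r in rows if r.signed_year == PREDICTION_YEAR]
    return train, test, predict


# Placeholder rows, not real contracts.
rows = [
    Extension("Player A", 2019, 25.0),
    Extension("Player B", 2022, 12.5),
    Extension("Player C", 2023, 18.0),
    Extension("Player D", 2024, None),
]
train, test, predict = split_by_year(rows)
print(len(train), len(test), len(predict))
```

Splitting by year rather than at random keeps the test set strictly later than the training set, which is closer to how the model would actually be used.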
The first difficult task was feature selection. The first go-around, I decided to choose the stats myself - I went with VORP, PPG, TS%, PER and BPM. Four of the five are advanced impact stats, but PPG isn’t.
I chose PPG because while teams are assessing current impact, they are also trying to project which players could potentially turn into stars - and stars have to carry a large share of the offense. If a player has already demonstrated the ability to handle that, they are more likely to inspire confidence in the owner and general manager. PPG, while not an advanced stat, was unsurprisingly extremely significant when it came to predicting the salary cap percentages of future players. I chose the other advanced stats partially to see which of them would be most valuable, but also because I wanted to encompass defense without leaning too heavily in that direction.
The model had an R^2 value of 0.736 and an adjusted R^2 value of 0.710, meaning that the independent variables I chose accounted for about 74% of the variance in cap percentage. That’s pretty good, but there’s room for improvement. TS%, PER, and BPM were less correlated with cap percentage than other features I could’ve selected. So while the gap between these values and the ones from the more optimized model may not look large, this version is definitely worse. Although it did a decent job (see below) projecting the extensions, I was missing some key aspects, mainly on the defensive end.
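For anyone who wants to see where R^2 and adjusted R^2 come from, here is a minimal sketch of an OLS fit using NumPy. The data is synthetic and the sample size, coefficients, and function names are all assumptions for illustration. It is not the real player dataset, and it won't reproduce the 0.736 and 0.710 above.

```python
import numpy as np


def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns [intercept, coefs...]."""
    X1 = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta


def r_squared(X, y, beta):
    """Share of the variance in y explained by the fitted model."""
    X1 = np.column_stack([np.ones(len(X)), X])
    resid = y - X1 @ beta
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n, p):
    """R^2 penalized for using p features on n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Synthetic stand-in data: 60 "players", 5 features, last two are pure noise.
rng = np.random.default_rng(0)
n, p = 60, 5
X = rng.normal(size=(n, p))
y = 10 + X @ np.array([3.0, 2.0, 0.5, 0.0, 0.0]) + rng.normal(scale=2.0, size=n)

beta = fit_ols(X, y)
r2 = r_squared(X, y, beta)
adj = adjusted_r_squared(r2, n, p)
print(f"R^2 = {r2:.3f}, adjusted R^2 = {adj:.3f}")
```

Adjusted R^2 always comes out below R^2 once there is at least one feature, and the gap widens as features are added without a matching gain in fit.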
To find the optimized features, I used recursive feature elimination and lasso regression on a list of advanced stats and PPG. I found 6 features to be statistically significant, and then a large drop-off. I decided to use VORP and Usage Rate along with the 6 features, just because I wanted an impact stat and I think a high usage rate suggests some trust in a player by his team. That would explain why my adjusted R^2 is lower than expected (adjusted R^2 penalizes every added feature, so ones that aren’t as significant drag it down).
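To show how lasso ends up dropping features, here is a minimal sketch of lasso regression solved by coordinate descent in NumPy, run on synthetic data. It is an illustration of the technique under stated assumptions, not the pipeline used for this article. The penalty `lam`, the data, and the function names are all hypothetical, and recursive feature elimination is not shown.

```python
import numpy as np


def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; values within [-lam, lam] become 0."""
    return float(np.sign(rho) * max(abs(rho) - lam, 0.0))


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimize (1/2n)*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent.

    Assumes the columns of X are standardized and y is centered.
    """
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            # Residual with feature j's current contribution added back in.
            partial_resid = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ partial_resid / n
            beta[j] = soft_threshold(rho, lam) / (X[:, j] @ X[:, j] / n)
    return beta


# Synthetic data: only columns 0 and 1 actually drive the target.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 6))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=n)

X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize features
y = y - y.mean()                          # center the target

beta = lasso_coordinate_descent(X, y, lam=0.3)
kept = [j for j in range(X.shape[1]) if beta[j] != 0.0]
print("kept feature columns:", kept)
```

The L1 penalty pushes weak coefficients to exactly zero, which is why lasso doubles as a feature selector. The surviving coefficients are shrunk too, so it is common to refit plain OLS on just the kept features afterward.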
I then aggregated new data, trained the model again on the new data, and found the new R^2 to be 0.819, while the adjusted R^2 was 0.777 (which could definitely be improved with the removal of VORP and Usage Rate). Overall, it was pretty effective, but it also gave some surprising results for certain players.
The findings are interesting, but also validating for NBA teams. Some players, like Scottie Barnes, do actually deserve the extensions they received (give or take a couple million) - and the numbers back it up. I’ll dive into some of the players’ projections below.
Analysis:
“Thank you for the test cases”:
Scottie Barnes (projection: 21.8641%, actual: 24.93%):
Scottie might be slightly overpaid, but the difference likely isn’t significant. Overall, his two-way play makes him an analytics darling and given his relatively decent volume as a scorer, he likely is worth the 24.93% of the cap.
Cade Cunningham (projection: 23.0092%, actual: roughly 25%):
Cade also inks a max extension (his contract is technically worth more than Scottie’s if Scottie doesn’t hit some of his Rose Rule eligibilities). Cade doesn’t have any of that attached. Besides that though, the projection is roughly accurate with Cade as well - his impact stats and all-around play, along with his high scoring load, make him worth it.
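One way to put a number on "overpaid" for these two test cases is the gap between the actual cap share and the projection. A quick sketch using the figures listed above (Cade's actual share is the rounded "roughly 25%", so his gap is approximate; the variable names are just for illustration):

```python
# (projected cap %, actual cap %) as listed in this section.
# Cade Cunningham's actual figure is the rounded "roughly 25%".
test_cases = {
    "Scottie Barnes": (21.8641, 24.93),
    "Cade Cunningham": (23.0092, 25.0),
}

gaps = {}
for player, (projected, actual) in test_cases.items():
    # Positive gap = paid more than the model projects.
    gaps[player] = actual - projected
    print(f"{player}: {gaps[player]:+.2f} percentage points vs. projection")
```

A positive gap means the contract came in above the projection. Whether a gap of two or three points counts as "overpaid" is a judgment call that the model's error on the test set would have to inform.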
“We should probably give him a max soon”:
Evan Mobley (projection: 21.9936%):
Mobley is going to get a max, no doubt. Significant, however, was the change from the first model to the second model. After getting rid of impact stat features and introducing more defensive features, Mobley shot up about five percentage points - a difference of $7,133,868 in his first year.
Alperen Sengun (projection: 22.2916%):
Sengun, the Turkish prodigy, has outplayed his draft slot and become a post-up force in the NBA, with playmaking and passing to match. His projection lines up with the contract he should be offered - although it looks like the Rockets won’t be giving him a max extension.
Franz Wagner (projection: 21.2288%):
Franz, despite his shooting woes this season, has been an effective defender and scorer in his three years in the NBA, justifying a large extension. This is unsurprising. To justify a second max contract, à la Jayson Tatum, he would need to improve his shooting, though.
“I would feel really scared about this”:
Jalen Green (projection: 20.1534%):
The result makes sense - Jalen has not been terribly inefficient given his scoring load - and taking on that much offensive responsibility in your first three years is pretty positive. But the model does tax him for his defensive shortcomings, and if they were to max both Sengun and Green (they could), there are questions about style. Are they an optimal pairing? I don’t subscribe to “the team played better with this guy, so that means he’s better” when it comes to small sample sizes. I’m not going to throw away months of uncertainty over a good three weeks.
Cam Thomas (projection: 20.8990%):
Similar to Green, Cam Thomas can take on offensive load and do relatively good things. The issue is just that he is bad at pretty much everything else. This group of players is interesting - normally, taking on high scoring loads is the part most players have to ease into - and they develop other skills alongside it. Neither of these two has ever had an issue with scoring; it’s the other things they struggle with. If you wanna get my opinion - no, I would not offer Cam Thomas 32 million dollars a year.
Josh Giddey (projection: 17.8887%):
I don’t worry about Giddey because of his role this past season. Yes, he was bad - but he played a completely different scheme than he was used to. Instead, I worry about the ceiling of a team he’s the lead initiator on. His impact stats even when he did have a high offensive load were not great, and while he can develop into a catch-and-shoot threat, will he ever be good enough off the dribble to validate paying him 28 million dollars a year? I’m not confident he could be a great player on a good team - but I do think he could be a good player on a middling one.
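The dollar figures attached to Cam Thomas and Giddey are just the projected cap share multiplied by a cap number. A minimal sketch, assuming a 2025-26 cap of $154,647,000. That cap figure is an assumption added for illustration and is not stated in this article, and `first_year_salary` is a made-up helper name.

```python
# Assumption: 2025-26 salary cap; this number is not given in the article.
ASSUMED_CAP_2025_26 = 154_647_000


def first_year_salary(cap_pct: float, cap: int = ASSUMED_CAP_2025_26) -> float:
    """Convert a projected cap percentage into first-year dollars."""
    return cap_pct / 100 * cap


# Projected cap percentages as listed in this section.
projections = {"Cam Thomas": 20.8990, "Josh Giddey": 17.8887}

for player, pct in projections.items():
    print(f"{player}: about ${first_year_salary(pct) / 1e6:.1f}M in year one")
```

Under that assumed cap, the projections land close to the 32 and 28 million dollar figures mentioned for these two players.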
“Let’s sign this guy before we owe him 35 million dollars a year”:
Trey Murphy III (projection: 17.5216%):
Trey Murphy gets “only” 17.5216% of the cap because he hasn’t demonstrated the ability to handle offensive burden yet. The model values points per game the highest, and while his numbers aren’t bad - they aren’t stellar either. He’s in the realm of ‘elite role player’ to ‘potential all-star’. He’s the opposite of Giddey - I’m confident he could be a very good player on a great team, but does that justify paying him a max contract? I personally think so, but that is based on recent signings: OG Anunoby, Mikal Bridges, etc. I think the Pelicans would be doing themselves a big favor by signing him soon.
Jonathan Kuminga (projection: 17.0949%):
Kuminga took a jump this year - and he should be poised to handle more scoring duty in the next season with Klay Thompson gone. His two-way potential is still massive and I have confidence he can put it all together. The Warriors would be getting a steal signing him for only 17.09% of the cap.
Jalen Johnson (projection: 15.1412%):
The Hawks need to make this deal happen right now. He is really, really good.
Jalen Suggs (projection: 15.2096%):
A really good defensive guard who has shown enough catch-and-shoot ability to validate an extension of this size. A steep overpay would be anything over 33 million dollars a year though.
Santi Aldama (projection: 8.6282%):
Aldama is an advanced stats god - it’s a good thing we have PPG to kinda level this thing out, otherwise he might have been making 25 million dollars a year or something. I’d expect him to get somewhere in this percentage range though.
Credit to Nylon Calculus for helping inspire the idea. Overall, I found this exercise pretty interesting - I plan on tweaking the model even more. I’ll try to host the projections online somewhere if I decide to move off Substack. I’ll be working on other models this month.