Introduction:
Last week, I wrote articles building models using various types of data: advanced, traditional, and even financial. However, the question remains - how do we get the data? Traditional stats are straightforward: scorekeepers have recorded points, rebounds, assists, steals, and blocks every game for decades. But in recent years, sports (along with various other industries) have turned to computers not just for logging data, but for tracking it in the first place. I might have lost many of the non-tech people already. They might be wondering,
“How can a computer track what a human eye sees?”
Being a Computer Science and Data Analytics student, I had a very basic idea about how computers do this. I had watched a Rajiv Maheswaran TED Talk and some Sloan Sports Analytics Conference panels, and I found it cool, but also very complex.
Regardless, the curiosity persisted. Last Thursday, I finally decided to jump into the project - and while it’s far (and I really mean far) from finished - I thought it would be cool to share some of my results and how I’ve gotten to this point. In addition, I’ll discuss how I plan on improving the project and what I am going to do with the project once it’s “optimized”.
If you’re still looking for an answer to the above question, I’ll explain my methodology below.
Methodology:
YOLO (You Only Look Once) is a state-of-the-art, real-time object detection model that is often used in sports analysis systems. Before training my own model, I wanted to see how a stock YOLO model would do without any custom dataset.
As you can see, it does a good job tracking players (although I suspect the low resolution and constant collisions in this video play a role in the low confidence score next to each person’s name). Overall though, not too bad.
Next, I imported a Roboflow dataset and trained the model, then took the checkpoint with the best weights (best.pt) and loaded it - that is the model we’ll be using.
As you can see, it does better now, identifying the refs, players, and ball (sometimes, which proved to be a problem that persisted).
Okay, but that is just detection. How about tracking? For anyone wondering what the difference is - detection can tell me which objects in a frame are players, while tracking can tell me which player is which from frame to frame.
For tracking, I implemented an algorithm called DeepSORT, which does the following:
Feature Extraction:
Uses a Convolutional Neural Network (CNN) to extract appearance features of detected objects, creating descriptors for tracking.
Association:
Kalman Filter: Predicts the future positions of objects.
Cost Matrix: Combines motion and appearance information to match detections to existing tracks.
Track Management:
Initializes new tracks, updates existing ones, and deletes tracks that remain unmatched beyond a threshold.
If you didn’t understand any of that, don’t worry - it’s not really important just yet. Basically, I chose DeepSORT because it extends SORT with appearance features, which makes it more robust to collisions and overlapping players. After applying DeepSORT, I got this video.
As you can see, there is a lot more consistency.
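To make the association step above a bit more concrete, here is a heavily simplified sketch - the positions are made up, the cost is plain Euclidean distance, and a real DeepSORT implementation would also fold in the CNN appearance features and a full Kalman filter rather than my constant-velocity shortcut:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Two existing tracks: track_id -> (last position, velocity), in pixels.
tracks = {
    1: (np.array([100.0, 200.0]), np.array([5.0, 0.0])),
    2: (np.array([400.0, 250.0]), np.array([-3.0, 2.0])),
}
# Two fresh detections from the current frame.
detections = [np.array([396.0, 253.0]), np.array([106.0, 199.0])]

# Predict where each track should be this frame (the Kalman filter's job).
predicted = {tid: pos + vel for tid, (pos, vel) in tracks.items()}

# Cost matrix: distance between each prediction and each detection.
track_ids = list(predicted)
cost = np.array([[np.linalg.norm(predicted[tid] - det) for det in detections]
                 for tid in track_ids])

# The Hungarian algorithm picks the cheapest one-to-one matching.
rows, cols = linear_sum_assignment(cost)
matches = {track_ids[r]: c for r, c in zip(rows, cols)}
print(matches)  # {1: 1, 2: 0} - each track keeps its ID across frames
```

Scale this up to ten players, mix appearance distance into the cost, and you have the skeleton of what DeepSORT’s association stage does every frame.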
If you want to know why I switched from the Ja Morant video to this one - it’s exactly what you’re thinking - Ja’s speed and willingness to dunk on people proved to be an issue for my very basic model, which was trained on only 160 images. The more balanced offenses of the Spurs and Nuggets were easier to track. Obviously, when I optimize this further, I will have to train my model on more images and tune the parameters.
After I got this result, I was fairly pleased - most players maintain consistent IDs, although the fairly small dataset made it hard to keep some players tracked for the whole clip. In addition, higher-resolution input would help - once I move this out of the test phase, I can afford to wait longer for a single video to process at full resolution.
So what now? Well, I always thought it would be cool if I could see the plays being run by a team from a top-down angle, like a coach was drawing it on a whiteboard in real time. I thought that would be a good use of time. However, this is a more complex process than one would think, even with the player’s positions being tracked. The reason this gets tricky is obvious once you think about it optically - a computer cannot easily translate a broadcast angle to a top-down angle without some more complex mathematics.
I am trying to map points from the camera’s perspective view onto a flat, top-down 2D plane - a transformation known as a homography. Luckily, OpenCV has a function that can compute this for us, given matching coordinates on both the court diagram and a frame from the video. As you will see though, it still struggles with elevated camera angles that capture jumpers.
Using the same clip from before, you will see that, generally speaking, the players are in the right positions (although some of them start from out of bounds). With a steadier camera angle and a compression factor, they will be in the right spots - so by my next update, I will have that.
More importantly though, the movement is captured correctly - and in the frames where the ball is being tracked, it shows as a red dot. You will see, though, that when the camera angle is elevated, the players all slide down the court. This will all be corrected before the next update.
I tested this again on a different video from the same series:
The tracking is much better in this one, with no major changes in camera elevation. The IDs still switch constantly, though - fixing that will require a dictionary mapping track IDs to player names and teams.
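The dictionary I have in mind is nothing fancy - something like the sketch below, where the roster entries are hypothetical and the initial track-to-name assignment would be manual:

```python
# Map tracker IDs to (name, team); unassigned tracks get a placeholder.
roster = {}

def label(track_id, name=None, team=None):
    """Attach a name/team to a track the first time it is identified."""
    if name is not None:
        roster[track_id] = (name, team)
    return roster.get(track_id, (f"Player {track_id}", "unknown"))

label(7, "Jamal Murray", "DEN")   # assign once...
print(label(7))                   # ...every later frame gets ('Jamal Murray', 'DEN')
print(label(12))                  # unassigned track: ('Player 12', 'unknown')
```

Keeping the mapping outside the tracker also means an ID switch only needs one re-assignment to fix, rather than touching every frame.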
So what next?
As you can tell, this project is far from finished. I blogged about it partially because I was excited, but also because I needed to take a break from it and get back to some more traditional predictive models.
What will I have by the next post about this project?
Better Homography Mapping
Offensive/Defensive Distinctions
Consistent IDs/Names
I hope to use this model to eventually track my own recreational basketball games and draw some stats from them, although I’ll tell the computer to cover its eyes when I brick my 8th three pointer.
Thanks to everyone that bothered to read! If you enjoyed the project - drop a sub. For the next few days we’ll take a break from the computer science and talk some NBA Draft/Olympics.
Brief shoutouts:
Nick Kalinowski (@kalidrafts on Twitter), who inspired me to create this model and is the best young analytics person out there.
Gaurav Mohan, who provided great insight on homography mapping.
And last but certainly not least, my lovely girlfriend, who may not have a deep understanding of basketball or computer science but always encourages me to chase my dreams.