Basketball In Game Interactions with Neo4j


Analytics, Business, Data Modeling, ONgDB

June 17, 2016

Ben Nussbaum

ONgDB NBA Game Analysis in Real-Time

The NBA has enjoyed explosive growth in recent years; so much so that its TV deal, currently fetching $930 million annually from ESPN and Turner, will rise to $2.6 billion annually beginning next season, an increase of roughly 180 percent. Alongside the league's globalization, nutritional advancement, and technological progress, the quality of play itself has been climbing season after season. Much of this trend can be attributed to team staffs making better decisions about personnel, playing time, play style, matchups, lineups, and the like. And as much as Barkley and other old-school players would like to minimize its impact, it is undeniable that the teams that make the best decisions share a common underlying focus: data.

Hard data, and how to interpret it (or “analytics”). Finding patterns and adjusting accordingly is crucial in any field. It is no less applicable in basketball, whether you are studying your own team, your opponents, or player prospects. All this data can be stored easily and efficiently in a graph database, where anything can be a node. Players, coaches, teams, games, stats, possessions, arenas, management, even gear: these are just some of the things that can relate to each other and have an impact on the ultimate goal of winning.

Below is a sample of a few Spurs possessions from one quarter of an old Suns game:
[Figure: Spurs Suns In Game Interactions]
Each Possession (green) is stamped with the number of seconds on the clock when the possession started and when it ended. Each Touch (blue) has an “OCCURRED_IN” relationship to its possession. Each Player (purple) has a “TOUCHED_BY” relationship to every touch that belongs to them, and each Opponent (pink) has a “DEFENDED_BY” relationship to the touches they were defending. Player and Opponent nodes can also carry any number of properties about the player (physical attributes, jersey number, position, college, salary, etc.). Finally, each Event (red) has a timestamp, as well as either a “STARTED_BY” or “ENDED_BY” relationship to its touch (for example, a Touch may begin with a “Received Pass” event and end with a “Made 3” event). These Event nodes also contain coordinates corresponding to where the player was on the court when the event happened. We could also have tracked the position of every player during every event, perhaps by giving each event ten “ON_COURT” relationships, each containing the coordinates of one player.
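
As a rough sketch of the model just described, the following Geequel creates a single possession containing one touch. Only the labels, the relationship types and their directions, and the name and type properties come from the description and queries in this post. The other property names (startClock, endClock, clock, x, y) and every value are made-up placeholders for illustration, not real tracking data. Both players carry the Player label here because that is what the queries in this post match on.

```cypher
// Illustrative sample only: property names other than `name` and `type`,
// and all values, are assumptions rather than part of the original model.
CREATE (p:Player {name: "Kawhi Leonard"})
CREATE (d:Player {name: "Marcus Morris"})
CREATE (pos:Possession {startClock: 412, endClock: 398})
CREATE (t:Touch)
CREATE (recv:Event {type: "Received Pass", clock: 405, x: 22.0, y: 24.5})
CREATE (shot:Event {type: "Made 3", clock: 398, x: 23.5, y: 26.0})
// Relationship directions follow the queries in this post.
CREATE (t)-[:OCCURRED_IN]->(pos)
CREATE (p)-[:TOUCHED_BY]->(t)
CREATE (d)-[:DEFENDED_BY]->(t)
CREATE (recv)-[:STARTED_BY]->(t)
CREATE (shot)-[:ENDED_BY]->(t);
```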

STATS LLC, an NBA partner, supplies the SportVU player-tracking technology to all NBA arenas and collects data that could benefit teams when connected with data from other sources in the Open Native Graph Database (ONgDB). That data includes each player’s speed, spacing, position on the court, and ball possessions. The interconnectivity that ONgDB affords over other kinds of data storage makes tracking advanced metrics as simple as writing a few lines of Geequel code. With all this data streaming into ONgDB in real time, analysis could become even more interactive courtside.

How often does Kawhi Leonard shoot the ball when he touches it while guarded by Marcus Morris?

MATCH (p:Player {name: "Kawhi Leonard"}), (d:Player {name: "Marcus Morris"})
WITH p, d
MATCH (p)-[:TOUCHED_BY]->(t:Touch)<-[:DEFENDED_BY]-(d)
WITH COLLECT(t) AS ts, COUNT(t) AS total
UNWIND ts AS t
MATCH (t)<-[:ENDED_BY]-(e:Event)
WHERE e.type IN ["Made 2", "Made 3", "Missed 2", "Missed 3"]
WITH total, COUNT(e) AS shots
RETURN 1.0 * shots / total;

How well does he shoot in those cases?

MATCH (p:Player {name: "Kawhi Leonard"}), (d:Player {name: "Marcus Morris"})
WITH p, d
MATCH (p)-[:TOUCHED_BY]->(t:Touch)<-[:DEFENDED_BY]-(d)
WITH t
MATCH (t)<-[:ENDED_BY]-(e:Event)
WHERE e.type IN ["Made 2", "Made 3", "Missed 2", "Missed 3"]
WITH COLLECT(t) AS ts, COUNT(t) AS total
UNWIND ts AS t
MATCH (t)<-[:ENDED_BY]-(e:Event)
WHERE e.type IN ["Made 2", "Made 3"]
WITH total, COUNT(t) AS made
RETURN 1.0 * made / total;

Whom does he shoot best against?

MATCH (p:Player {name: "Kawhi Leonard"}), (d:Player)
WITH p, d
MATCH (p)-[:TOUCHED_BY]->(t:Touch)<-[:DEFENDED_BY]-(d)
WITH d, t
MATCH (t)<-[:ENDED_BY]-(e:Event)
WHERE e.type IN ["Made 2", "Made 3", "Missed 2", "Missed 3"]
WITH d, COLLECT(t) AS ts, COUNT(t) AS total
UNWIND ts AS t
MATCH (t)<-[:ENDED_BY]-(e:Event)
WHERE e.type IN ["Made 2", "Made 3"]
WITH d, total, COUNT(t) AS made
RETURN d.name, 1.0 * made / total AS shootingPercentage
ORDER BY shootingPercentage DESC;
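
The ranking above can likely be computed in a single pass with conditional aggregation instead of collecting and unwinding the touches. The sketch below is an alternative formulation, not the query used for this post. It also returns the number of attempts, an added column that makes small sample sizes visible. One behavioral difference to be aware of: defenders against whom he has attempts but no makes appear here with a percentage of 0.0, whereas the collect-and-unwind version drops them.

```cypher
// Alternative single-pass sketch; the `attempts` column is an addition.
MATCH (p:Player {name: "Kawhi Leonard"})-[:TOUCHED_BY]->(t:Touch)<-[:DEFENDED_BY]-(d:Player)
MATCH (t)<-[:ENDED_BY]-(e:Event)
WHERE e.type IN ["Made 2", "Made 3", "Missed 2", "Missed 3"]
// Count every shot as an attempt and only the made shots as makes.
WITH d,
     count(e) AS attempts,
     sum(CASE WHEN e.type IN ["Made 2", "Made 3"] THEN 1 ELSE 0 END) AS made
RETURN d.name, attempts, 1.0 * made / attempts AS shootingPercentage
ORDER BY shootingPercentage DESC;
```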

These can all be run within a time period (a month, a season, etc.). Or against a certain team. Or on touches in the last five minutes of a game with a score difference of ten or less. Or only for touches in a nationally televised game on an odd day of the month against a team whose head coach has a mustache and a Netflix account, while being defended by a player born in February who went to a college with a jungle cat for a mascot. If you have the right data in a decent graph format, you can run anything you like. The queries get only minimally more complex, and they’re still performant.
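
To make the “last five minutes with a score difference of ten or less” case concrete, here is one way the first question (how often he shoots when guarded by Marcus Morris) might be restricted. It is a sketch under assumptions: the model described in this post only says that a possession records the clock at its start and end, so the property names quarter, startClock, and scoreDiff are hypothetical, and “last five minutes” is simplified to the fourth quarter, ignoring overtime.

```cypher
// Assumed Possession properties (hypothetical, not from the original model):
//   quarter    - period number of the possession
//   startClock - seconds left on the period clock when the possession started
//   scoreDiff  - point differential when the possession started
MATCH (p:Player {name: "Kawhi Leonard"})-[:TOUCHED_BY]->(t:Touch)<-[:DEFENDED_BY]-(d:Player {name: "Marcus Morris"})
MATCH (t)-[:OCCURRED_IN]->(pos:Possession)
WHERE pos.quarter = 4 AND pos.startClock <= 300 AND abs(pos.scoreDiff) <= 10
// OPTIONAL MATCH keeps touches that did not end in a shot (e is null for those).
OPTIONAL MATCH (t)<-[:ENDED_BY]-(e:Event)
WHERE e.type IN ["Made 2", "Made 3", "Missed 2", "Missed 3"]
WITH count(DISTINCT t) AS total, count(DISTINCT e) AS shots
WHERE total > 0
RETURN 1.0 * shots / total AS shotRate;
```

Only the extra MATCH on the possession and its WHERE clause are new relative to the unrestricted question; the other filters mentioned above would be added in the same way, given the corresponding nodes and properties.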

Enjoy Game 7!