Behind the Scenes: Building a Tennis Player Stats API for the Demi Schuurs WTA Project

Go behind the scenes of the Demi Schuurs WTA project and discover how a dedicated tennis player stats API was designed, built, and optimised for doubles-focused performance and reliability.

Introduction: Turning a Player Profile into a Data Product

Most tennis fans see a player website as a collection of photos, match results, and a bio page. Under the hood, the best modern player sites work more like data products: they aggregate live stats, structure them into an API, and power consistent experiences across web, mobile, and social channels.

This was exactly the challenge behind the Demi Schuurs WTA project: create a rich, data-driven profile for one of the world’s top doubles specialists, powered by a reliable tennis player stats API. In this behind-the-scenes breakdown, we’ll walk through how the API was planned, designed, and implemented, and which lessons you can apply to your own sports stats projects.

Why a Dedicated Tennis Player Stats API?

When you’re building a website for a specific athlete, it’s tempting to hardcode stats or embed third-party widgets. For the Demi Schuurs project, we quickly realised those approaches wouldn’t work.

We needed:

  • Accuracy: Doubles specialists like Demi compete in multiple tournaments per month. Results change fast.
  • Consistency: The same numbers had to appear on match history pages, season summaries, and career overview sections.
  • Reusability: The data should be usable for future mobile apps, dashboards, or even social media bots.
  • Performance: Player pages must feel instant, even when querying years of match history.

These goals essentially forced the project into an API-first direction. Instead of building a site that happens to have some data, we built a structured tennis player stats API and then layered the Demi Schuurs user experience on top of it.

Clarifying the Scope: What “Stats” Really Mean

Before touching any code, we had to decide what “stats” actually meant for this project. Tennis data can quickly become overwhelming, especially in doubles where you track two players per team, separate rankings, and different tours.

For Demi Schuurs, we prioritised:

  • Core player data: Name, nationality, handedness, date of birth, height.
  • Career overview: Career titles, best ranking, prize money, win–loss record (overall, by surface, and by tournament level).
  • Year-by-year stats: Annual win–loss, titles per year, finals reached.
  • Match-level data: Tournament, round, surface, scoreline, partner, opponents, and outcome.
  • Ranking history: Week-by-week doubles ranking, with the ability to chart trajectory over time.

We deliberately left out hyper-granular tracking like serve speeds or in-depth point-by-point stats for the first phase. That decision made the API lean, easier to maintain, and more focused on what fans and partners actually needed.

Designing the Data Model for a Doubles Specialist

Most generic tennis APIs are built around singles players and treat doubles as an afterthought. For a player like Demi Schuurs, the opposite is true: doubles is the main event. That shaped our data model in several ways.

Key Entities

At the heart of the system are a few core entities:

  • Player: Base information for every player, not only Demi, so match data remains relational.
  • Team: A doubles team, typically composed of two players. A single player can appear in many teams across seasons.
  • Match: The main container for results, storing participants, tournament, surface, score, and outcome.
  • Tournament: Tournament name, category (e.g. WTA 500, Grand Slam), location, and surface.
  • RankingSnapshot: Player ranking and points at specific dates.
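
As a rough illustration, the entities above could be expressed as Python dataclasses. Every field name and type here is an assumption made for the sketch, not the project's actual schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple


@dataclass(frozen=True)
class Player:
    id: str
    name: str
    nationality: str
    handedness: Optional[str] = None
    date_of_birth: Optional[date] = None
    height_cm: Optional[int] = None


@dataclass(frozen=True)
class Team:
    id: str
    player_ids: Tuple[str, str]  # the two partners


@dataclass(frozen=True)
class Tournament:
    id: str
    name: str
    category: str  # e.g. "WTA 500", "Grand Slam"
    location: str
    surface: str


@dataclass(frozen=True)
class Match:
    id: str
    tournament_id: str
    round: str
    team_a_id: str
    team_b_id: str
    score: str
    winner_team_id: Optional[str]  # None when no winner is recorded (assumed convention)
    played_on: date


@dataclass(frozen=True)
class RankingSnapshot:
    player_id: str
    as_of: date
    rank: int
    points: int
```

Keeping Match free of player names and pointing it at Team and Tournament IDs is what keeps the data relational.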

Handling Doubles Teams

Doubles adds some complexity. Demi may partner with different players throughout the season, and the same pairing can reappear years later. Instead of putting a “partner_name” field on the match, we modelled a Team entity:

  • Each Team is identified by the unordered combination of its player IDs, so the same pairing always resolves to the same Team regardless of listing order.
  • Matches reference two teams: team_a_id and team_b_id.
  • This makes it trivial to generate stats per partnership (e.g. “Demi & Desirae Krawczyk” record).

This structure also future-proofs the API in case we later add mixed doubles or other formats.
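
A minimal sketch of how a team identity and per-partnership records might be derived. The key format, the match dictionary shape, and the function names are all hypothetical, chosen only to illustrate the idea.

```python
from collections import defaultdict


def team_key(player_a: str, player_b: str) -> str:
    """Order-independent key: the same pairing always maps to the same team."""
    return "+".join(sorted((player_a, player_b)))


def partnership_records(matches, player_id):
    """Return {partner_id: [wins, losses]} for one player.

    Each match is assumed to be a dict with 'team_a' and 'team_b'
    (tuples of two player IDs) and 'winner' ('a' or 'b').
    """
    records = defaultdict(lambda: [0, 0])
    for match in matches:
        for side in ("a", "b"):
            team = match[f"team_{side}"]
            if player_id in team:
                partner = next(p for p in team if p != player_id)
                index = 0 if match["winner"] == side else 1
                records[partner][index] += 1
    return dict(records)
```

Because the key sorts the two IDs, a pairing that reappears years later, or is listed in the opposite order by a data source, still lands on the same Team record.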

Choosing an API Style: REST for Simplicity

GraphQL is popular for sports data, but for the Demi Schuurs project we chose a conventional REST API. The main reasons were:

  • Client simplicity: The front-end team could hit predictable endpoints like /players/{id}/matches without designing GraphQL queries.
  • Caching: RESTful URLs map nicely to HTTP caching at the CDN layer.
  • Future integrations: Media partners and smaller apps are often more comfortable consuming REST JSON APIs.

The structure ended up looking like this (simplified):

  • GET /players/{id} – Player profile and static info.
  • GET /players/{id}/stats – Aggregated stats (career + current season).
  • GET /players/{id}/matches – Paginated match history, with filters.
  • GET /players/{id}/rankings – Ranking timeline for graphs.
  • GET /teams/{id}/stats – Partnership-specific stats.
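
To make the endpoint layout concrete, here is a small, framework-free sketch of how those path templates could be matched to handlers. The handler names are placeholders, and a real service would more likely rely on its web framework's router.

```python
import re

# Path templates from the list above, mapped to hypothetical handler names.
ROUTES = [
    ("/players/{id}", "player_profile"),
    ("/players/{id}/stats", "player_stats"),
    ("/players/{id}/matches", "player_matches"),
    ("/players/{id}/rankings", "player_rankings"),
    ("/teams/{id}/stats", "team_stats"),
]


def _compile(template):
    # Turn "/players/{id}/stats" into a regex with a named group for {id}.
    pattern = re.sub(r"\{(\w+)\}", r"(?P<\1>[^/]+)", template)
    return re.compile(f"^{pattern}$")


COMPILED = [(_compile(template), name) for template, name in ROUTES]


def resolve(path):
    """Return (handler_name, path_params), or None if nothing matches."""
    for pattern, name in COMPILED:
        found = pattern.match(path)
        if found:
            return name, found.groupdict()
    return None
```

For example, resolve("/players/demi-schuurs/stats") yields the player_stats handler with id set to "demi-schuurs".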

Data Sources: Official Feeds vs. Custom Scraping

One of the most sensitive parts of any sports stats API is data sourcing. You must consider:

  • Licensing: Are you allowed to collect and redistribute the data?
  • Reliability: How often is the data updated? How stable is the format?
  • Coverage: Does the feed include doubles, qualifiers, and smaller events?

For the Demi Schuurs project we combined:

  • Official sources: Structured feeds and publicly documented endpoints for rankings and results, within license constraints.
  • Controlled scraping: Carefully built scrapers to fill in gaps where no official machine-readable format existed, respecting robots.txt and legal boundaries.

Incoming data passes through a normalization pipeline to map fields into our internal schema. For example, tournament names can appear with minor variations; we use canonical IDs to avoid treating “US Open” and “US Open Tennis Championships” as different events.
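
The canonical-ID idea can be sketched in a few lines. The alias table and the ID format below are illustrative assumptions; a production table would be much larger and probably live in the database.

```python
import re

# Hypothetical alias table: normalized name -> canonical tournament ID.
TOURNAMENT_ALIASES = {
    "us open": "us-open",
    "us open tennis championships": "us-open",
}


def normalize_name(raw: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", raw.lower())
    return " ".join(cleaned.split())


def canonical_tournament_id(raw: str):
    """Return the canonical ID, or None so unknown names can be reviewed."""
    return TOURNAMENT_ALIASES.get(normalize_name(raw))
```

Returning None for unknown names, rather than guessing, leaves room for a manual review step before a new alias is added.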

ETL Pipeline: From Raw Results to Clean Stats

Raw tennis data is messy. Withdrawals, walkovers, super tiebreaks, and ranking updates all introduce edge cases. We built an ETL (Extract, Transform, Load) pipeline dedicated to the Demi Schuurs API.

Extraction

Scripts fetch new results and ranking updates on a schedule:

  • Cron jobs run every few hours during tournaments and less frequently off-season.
  • Each job stores the raw payloads for auditing and reprocessing if needed.
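
One way to keep raw payloads for auditing is to store them under a content hash, so fetching the same response twice does not create duplicates. The directory layout, file naming, and metadata fields here are assumptions for the sketch.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def store_raw_payload(payload: bytes, source: str, root: Path) -> Path:
    """Write the payload to <root>/<source>/<sha256>.bin and return that path.

    An identical payload hashes to the same file name, so re-fetching
    unchanged data is a no-op.
    """
    digest = hashlib.sha256(payload).hexdigest()
    target = root / source / f"{digest}.bin"
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(payload)
        meta = {
            "source": source,
            "sha256": digest,
            "fetched_at": datetime.now(timezone.utc).isoformat(),
        }
        # Sidecar file with fetch metadata, next to the raw bytes.
        target.with_suffix(".json").write_text(json.dumps(meta))
    return target
```

A scheduled job would call this once per fetched response before handing the payload to the transformation step.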

Transformation

Transformation is where the “tennis logic” lives:

  • Map tournament codes to canonical tournament IDs.
  • Clean and parse scorelines, including match tiebreaks (e.g. 10–8 super tiebreaks).
  • Detect doubles matches and build / update Team records.
  • Resolve player identities, merging duplicates when minor spelling differences occur.
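
As an example of the scoreline step, the sketch below parses a score string into structured sets. It assumes a specific input format: sets separated by spaces, set tiebreak points in parentheses, and a match tiebreak in square brackets, as in "6-4 3-6 [10-8]". Walkovers and retirements are not handled and raise an error.

```python
import re

SET_RE = re.compile(r"^(\d+)-(\d+)(?:\((\d+)\))?$")   # e.g. 7-6(5)
MATCH_TB_RE = re.compile(r"^\[(\d+)-(\d+)\]$")         # e.g. [10-8]


def parse_scoreline(score: str):
    """Parse a score string into a list of set dicts (side a vs side b)."""
    sets = []
    # Accept en dashes as well as plain hyphens.
    for token in score.replace("–", "-").split():
        tiebreak = MATCH_TB_RE.match(token)
        if tiebreak:
            sets.append({"a": int(tiebreak[1]), "b": int(tiebreak[2]),
                         "match_tiebreak": True})
            continue
        regular = SET_RE.match(token)
        if not regular:
            raise ValueError(f"Unrecognised score token: {token!r}")
        sets.append({"a": int(regular[1]), "b": int(regular[2]),
                     "match_tiebreak": False})
    return sets


def sets_won(sets):
    """Count sets won by each side; a match tiebreak counts as one set."""
    won_a = sum(1 for s in sets if s["a"] > s["b"])
    won_b = sum(1 for s in sets if s["b"] > s["a"])
    return won_a, won_b
```

Flagging the match tiebreak explicitly keeps a 10-8 decider from being mistaken for a regular set when computing games won.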

Loading

Finally, cleaned data is loaded into the main database:

  • New Match rows for each result, with foreign keys to Player, Team, and Tournament.
  • Aggregated snapshots to speed up career stats queries.
  • Ranking snapshots for each new ranking list.

This pipeline keeps the API fast and stable, while allowing us to correct errors centrally and re-run transformations when needed.
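
The loading step might look something like the following SQLite sketch. The table and column names are assumptions, and the source does not say which database the project uses; SQLite is chosen here only because it ships with Python.

```python
import sqlite3

SCHEMA = """
CREATE TABLE tournaments (
    id TEXT PRIMARY KEY, name TEXT NOT NULL, category TEXT, surface TEXT
);
CREATE TABLE teams (
    id TEXT PRIMARY KEY, player_a_id TEXT NOT NULL, player_b_id TEXT NOT NULL
);
CREATE TABLE matches (
    id TEXT PRIMARY KEY,
    tournament_id TEXT NOT NULL REFERENCES tournaments(id),
    team_a_id TEXT NOT NULL REFERENCES teams(id),
    team_b_id TEXT NOT NULL REFERENCES teams(id),
    score TEXT,
    winner_team_id TEXT REFERENCES teams(id)
);
"""


def open_db(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # enforce the REFERENCES clauses
    conn.executescript(SCHEMA)
    return conn


def load_match(conn, match: dict) -> bool:
    """Insert a match row; return False if that match ID was already loaded."""
    with conn:  # commit on success, roll back on error
        cursor = conn.execute(
            "INSERT OR IGNORE INTO matches "
            "(id, tournament_id, team_a_id, team_b_id, score, winner_team_id) "
            "VALUES (:id, :tournament_id, :team_a_id, :team_b_id, "
            ":score, :winner_team_id)",
            match,
        )
    return cursor.rowcount == 1
```

Making the insert idempotent means a re-run of the transformation step can replay the same results without creating duplicate Match rows, while the foreign keys reject rows that point at an unknown tournament or team.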

Performance and Caching: Serving Stats at Speed

Tennis stats can become expensive to compute if you recalculate aggregates on every request. For the Demi Schuurs WTA project, the biggest risks were:

  • Calculating career totals from thousands of matches.
  • Filtering match history by year, surface, and tournament level in one query.
  • Generating ranking charts across many seasons.

Precomputed Aggregates

Instead of calculating everything on the fly, we maintain precomputed aggregates per player:

  • Career win–loss.
  • Win–loss by surface and season.
  • Titles and finals by year.

When new match results are ingested, we incrementally update these aggregates. That turns potentially heavy queries into simple lookups.
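
A minimal in-memory sketch of the incremental update, assuming each ingested match carries an ID, a season, a surface, and a win/loss flag. The class and attribute names are hypothetical, and a real implementation would persist these counters rather than hold them in memory.

```python
from collections import defaultdict


class PlayerAggregates:
    """Running win-loss counters, stored as [wins, losses] pairs."""

    def __init__(self):
        self.career = [0, 0]
        self.by_surface = defaultdict(lambda: [0, 0])
        self.by_season = defaultdict(lambda: [0, 0])
        self._seen = set()  # match IDs already counted

    def apply(self, match_id, season, surface, won) -> bool:
        """Fold one match into the aggregates; skip IDs seen before."""
        if match_id in self._seen:
            return False
        self._seen.add(match_id)
        index = 0 if won else 1
        self.career[index] += 1
        self.by_surface[surface][index] += 1
        self.by_season[season][index] += 1
        return True
```

Tracking which match IDs have already been counted guards against double counting when the same result is ingested twice.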

Response Caching

On top of that, we layer caching at different levels:

  • Query-result caching: Results of common database queries cached in a key-value store like Redis.
  • HTTP caching: Player stats endpoints served via a CDN with sensible cache headers.
  • Client caching: Front-end code that only refreshes data when needed, not on every page navigation.

Because Demi’s stats change mostly when she plays, not every minute, a cache lifetime of minutes to hours is often acceptable and dramatically improves perceived performance.
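
For the HTTP layer, a response helper along these lines could set the cache headers and support conditional requests. The five-minute default and the function name are assumptions; the sketch covers only HTTP caching, not the key-value store or the client-side layer.

```python
import hashlib
import json


def cacheable_response(payload: dict, max_age: int = 300, if_none_match=None):
    """Return (status, headers, body) for a JSON payload.

    The ETag is derived from the serialized body, so it only changes
    when the stats themselves change.
    """
    body = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    etag = '"' + hashlib.sha256(body).hexdigest()[:16] + '"'
    headers = {
        "Content-Type": "application/json",
        "Cache-Control": f"public, max-age={max_age}",
        "ETag": etag,
    }
    if if_none_match == etag:
        # The client's copy is still current: send headers only.
        return 304, headers, b""
    return 200, headers, body
```

A CDN or browser that revalidates with If-None-Match then receives a small 304 response whenever the underlying stats have not changed.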

API Design Details That Matter

Beyond the high-level architecture, a few small API decisions had big downstream effects on usability and reliability.

Consistent Identifiers

We standardised on UUIDs for internal objects but exposed stable, readable slugs for public consumption. For example:

  • /players/demi-schuurs instead of a URL built on an internal UUID.
  • /teams/demi-schuurs-desirae-krawczyk for partnership-based stats.

This makes URLs more user-friendly and resilient if internal IDs ever change.
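
Slugs like these can be generated with the standard library alone. This is one plausible approach, not necessarily the one the project used; in particular, sorting the two player slugs for a team is an assumption that happens to reproduce the example above.

```python
import re
import unicodedata


def slugify(name: str) -> str:
    """Lowercase ASCII slug: strip accents, replace other characters with '-'."""
    ascii_name = (
        unicodedata.normalize("NFKD", name)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    return re.sub(r"[^a-z0-9]+", "-", ascii_name.lower()).strip("-")


def team_slug(player_a: str, player_b: str) -> str:
    """Order-independent team slug built from the two player slugs."""
    return "-".join(sorted((slugify(player_a), slugify(player_b))))
```

Internally, each slug would still map to a UUID, so a slug can be corrected later without touching foreign keys.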

Flexible Filtering

The /players/{id}/matches endpoint supports filters like:

  • ?season=2024
  • ?surface=clay
  • ?tournament_level=wta_1000
  • ?partner_id=XYZ

These filters make it easy to power UI features like dropdowns for season selection or surface toggles without additional endpoints.
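
A sketch of how the filter parameters might be validated and applied. The in-memory filtering stands in for what would normally be a database query, and the match dictionary keys are assumed to mirror the parameter names.

```python
from urllib.parse import parse_qs

ALLOWED_FILTERS = {"season", "surface", "tournament_level", "partner_id"}


def parse_filters(query_string: str) -> dict:
    """Turn "?season=2024&surface=clay" into a validated filter dict."""
    params = parse_qs(query_string.lstrip("?"))
    unknown = set(params) - ALLOWED_FILTERS
    if unknown:
        raise ValueError(f"Unsupported filters: {sorted(unknown)}")
    # If a parameter is repeated, the last value wins.
    filters = {key: values[-1] for key, values in params.items()}
    if "season" in filters:
        filters["season"] = int(filters["season"])
    return filters


def filter_matches(matches, filters):
    """Keep only the matches whose fields equal every requested filter."""
    return [
        match for match in matches
        if all(match.get(key) == value for key, value in filters.items())
    ]
```

Rejecting unknown parameters up front gives front-end developers an immediate error for a typo such as ?surfce=clay instead of a silently unfiltered list.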

Versioning and Stability

To avoid breaking front-end code or future partners, we namespaced endpoints with an API version, e.g. /v1/players. Any breaking change in response structure or behaviour will be introduced under /v2, leaving existing consumers unaffected.

Integrating the API into the Demi Schuurs Website

Once the API stabilised, integrating it into the Demi Schuurs WTA site was mostly a matter of front-end design and UX.

Dynamic Player Profile

The main profile page pulls from multiple endpoints:

  • Basic bio info from /players/{id}.
  • Headline stats (titles, best ranking, win–loss) from /players/{id}/stats.
  • Current ranking and ranking trend indicators from /players/{id}/rankings.

Because the data is centralised, every part of the site always shows the same numbers, which is critical for trust and professional appearance.

Interactive Match History

The match history section uses the matches endpoint with infinite scroll and filters. Fans can:

  • Drill down into specific seasons.
  • Check performance on particular surfaces.
  • See how Demi performs with different partners.

All of this is powered by simple API queries, which makes the front-end code thinner and easier to maintain.

Key Lessons from the Demi Schuurs WTA Stats API

Building a dedicated tennis player stats API for the Demi Schuurs project surfaced several lessons that apply to similar sports and data initiatives.

1. Model the Sport, Not Just the Data

The data model only made sense once we leaned into what makes doubles unique: teams, partnerships, and tournaments where doubles follows different patterns than singles. Copying a generic tennis schema would have missed the nuances that fans care about.

2. Start Small, Then Iterate

We resisted the urge to track every micro-statistic from day one. Instead, we focused on reliable core stats and expanded only where there was a clear demand. That discipline kept the ETL pipeline manageable and the API responsive.

3. Own Your Data Quality

Even when consuming official sources, you need your own validation and normalization. Canonical tournament IDs, deduplicated player records, and a consistent scoring format were essential to delivering clean stats for Demi’s site.

4. Design for Reuse

By treating the stats layer as a standalone API rather than a byproduct of the website, we created a foundation that can power mobile apps, social graphics, automated reports, and more, without rewriting the logic every time.

Where This API Can Go Next

The current Demi Schuurs tennis stats API already covers most fan-facing use cases, but there’s plenty of room for growth:

  • Advanced analytics: Serve/return efficiency, clutch performance, and partner-by-partner breakdowns.
  • Live integrations: Real-time score updates during matches to power live dashboards.
  • Open partner access: Secure keys for media partners or tournament organisers to integrate Demi’s stats into their own platforms.

Because the core architecture is in place, these enhancements are incremental rather than foundational rewrites.

Conclusion

Behind the polished interface of the Demi Schuurs WTA project sits a carefully designed tennis player stats API. By focusing on data modelling, ETL reliability, performance, and a doubles-first perspective, the project delivers accurate, fast, and flexible stats that can evolve with Demi’s career.

If you’re considering building your own tennis stats platform, whether for a single player, an academy, or an entire tour, the principles behind this project are a solid starting point: clarify your scope, respect the specifics of the sport, build an API-first backbone, and treat data quality as a first-class feature.
