Now a reliable source of tennis statistics, **Tennismylife** wants to explain the **logic** behind the **records**. Given that numbers are now an integral part of the world of tennis, one may wonder what the “**theory**” behind **these numbers** is.

Everyone can understand and **find more or less important statistics**. Everyone knows about **Connors’ famous 109 titles**, and even more about **Federer’s 20 Slams**, but how can we frame them better? Let’s take a (perhaps very long) step back. Tennis is a sport played almost entirely in **knockout tournaments**. This means that each tournament has **x rounds and only the last one assigns the trophy**. Another peculiarity of tennis is the presence of **different surfaces: Hard, Clay, Grass, Carpet** (the last one until a few years ago). Then there are the **categories**: *Slam, Masters 1000, ATP 500, ATP 250*.

All these elements divide the data into **categories** and **sub-categories**, which can in turn be **combined and/or merged with one another**. We can isolate the victories on **clay** and find their subset in the **Slams**, which are specifically the matches of **Roland Garros**, and there are many other similar examples. Following the example given at the beginning, we can say that **Connors’ 109 titles are part of the overall tournament wins** and **Federer’s 20 Slams are a subset of the same**, i.e. those in which the data are **filtered** by the category **Slam**.
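The subset idea described above can be sketched in a few lines. The project itself is written in R; the Python below is only an illustration, and the record fields (`winner`, `surface`, `level`, `tournament`), the helper `filter_matches` and the toy data are all assumptions, not the project's actual schema.

```python
# Toy title records: every name and value here is invented for illustration.
TITLES = [
    {"winner": "Player A", "surface": "Clay", "level": "Slam", "tournament": "Roland Garros"},
    {"winner": "Player A", "surface": "Clay", "level": "Masters 1000", "tournament": "Rome"},
    {"winner": "Player B", "surface": "Grass", "level": "Slam", "tournament": "Wimbledon"},
    {"winner": "Player A", "surface": "Hard", "level": "ATP 250", "tournament": "Doha"},
]


def filter_matches(records, **criteria):
    """Keep only the records whose fields equal every given criterion."""
    return [r for r in records if all(r.get(k) == v for k, v in criteria.items())]


overall = filter_matches(TITLES, winner="Player A")   # the "total" set
on_clay = filter_matches(overall, surface="Clay")     # subset by surface
clay_slams = filter_matches(on_clay, level="Slam")    # subset of the subset

print(len(overall), len(on_clay), len(clay_slams))
print([r["tournament"] for r in clay_slams])
```

Each filter narrows the previous result, which mirrors how the Slam titles are a subset of the overall titles.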

Statistics that can seem so difficult to the uninitiated are in fact easily **framed**. To simplify the work, they have been **divided into the following rows:**

**Stats:**

1. Played
2. Won
3. Count
4. Percentage
5. Entry
6. Youngest
7. Oldest
8. Average Age
9. Timespan

**Round**: R128 – R64 – R32 – R16 – QF – SF – F – W

**Categories**: Overall – Surface – Level – Tournament

**Number 1** refers to the **matches played**. This is a **“total” datum**, that is, a **datum that is not a subset of any other**. From it emerges the famous **1558** of **Connors** (which may have changed in the meantime), followed by the **figure for Federer’s matches** (also constantly updated). From this statistic it is possible to extract a **subset** by **surface**, **category**, or **tournament**.

At **number 2** there are the **wins**. This count is a **subset of point 1**, but it was chosen as a **stand-alone case** because it is conceptually very different. Everyone wants to know how many matches a tennis player has won rather than how many he has played, hence the distinction.

The most substantial part is **number 3**, the **counter**. This is nothing more than the count of the **rounds reached by a tennis player.** These rounds range from **R128** to **F** (final) and extend to **victory** (W), which is a special case of the final. The rounds are crossed with the data for **surface, category and tournament** to create an **M x N matrix**, where **M** is the number of **rounds** taken into consideration and **N** the number of **subcategories**.
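A minimal sketch of such a round-by-subcategory matrix follows, in Python for illustration only (the project's own code is in R). It assumes that a player "reaches" a round by playing a match in it and that winning the final also counts as W; the field names, the helpers `rounds_reached` and `count_matrix`, and the toy matches are hypothetical.

```python
from collections import Counter

ROUNDS = ["R128", "R64", "R32", "R16", "QF", "SF", "F", "W"]

# Toy matches (invented): each record is one match played in a given round.
MATCHES = [
    {"winner": "Player A", "loser": "Player B", "round": "SF", "surface": "Clay"},
    {"winner": "Player A", "loser": "Player C", "round": "F", "surface": "Clay"},
    {"winner": "Player C", "loser": "Player A", "round": "SF", "surface": "Hard"},
]


def rounds_reached(matches, player):
    """Yield (round, surface) for every round the player appeared in.

    Winning the final additionally yields the special round "W".
    """
    for m in matches:
        if player in (m["winner"], m["loser"]):
            yield m["round"], m["surface"]
            if m["round"] == "F" and m["winner"] == player:
                yield "W", m["surface"]


def count_matrix(matches, player, surfaces):
    """Return an M x N list of lists: M rounds (rows) by N surfaces (columns)."""
    counts = Counter(rounds_reached(matches, player))
    return [[counts[(rnd, s)] for s in surfaces] for rnd in ROUNDS]


SURFACES = ["Clay", "Hard"]
matrix = count_matrix(MATCHES, "Player A", SURFACES)
for rnd, row in zip(ROUNDS, matrix):
    print(f"{rnd:>4}", row)
```

Swapping `surface` for a level or tournament field would give the other subcategory columns in the same way.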

At **number 4** there is a very particular statistic: the **percentage** of matches won. It is a peculiar figure in tennis, since in each tournament a player can have **at most one defeat and several victories**. It could be computed for every round, but it is better to isolate it by **category** only (Overall, Surface, Level, Tournament), hence only 4 statistics.
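As a rough illustration, the percentage can be computed as wins divided by matches played within each group. The Python sketch below is not the project's R code; the function `win_percentage`, the field names and the toy matches are assumptions made for the example.

```python
from collections import Counter

# Toy matches (invented for illustration).
MATCHES = [
    {"winner": "Player A", "loser": "Player B", "surface": "Clay"},
    {"winner": "Player A", "loser": "Player C", "surface": "Clay"},
    {"winner": "Player A", "loser": "Player B", "surface": "Hard"},
    {"winner": "Player C", "loser": "Player A", "surface": "Hard"},
]


def win_percentage(matches, player, key=None):
    """Return {group: percentage of matches won}.

    `key` names the field to group by (e.g. "surface"); with no key,
    everything falls into a single "Overall" group.
    """
    played, won = Counter(), Counter()
    for m in matches:
        if player not in (m["winner"], m["loser"]):
            continue  # the player was not involved in this match
        group = m[key] if key else "Overall"
        played[group] += 1
        if m["winner"] == player:
            won[group] += 1
    return {g: round(100 * won[g] / played[g], 1) for g in played}


overall_pct = win_percentage(MATCHES, "Player A")
surface_pct = win_percentage(MATCHES, "Player A", key="surface")
print(overall_pct)
print(surface_pct)
```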

Another statistic concerns **Entries** (**number 5**). These represent the number of **participations** of a tennis player, which can refer to a specific tournament, a category, or all tournaments in general.
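One way to count entries, sketched here in Python under the assumption that an entry is one distinct tournament edition (tournament plus year) in which the player appears in at least one match. The `entries` helper, the field names and the data are hypothetical and not taken from the project's R code.

```python
# Toy matches (invented for illustration).
MATCHES = [
    {"winner": "Player A", "loser": "Player B", "tournament": "Rome", "year": 2018, "level": "Masters 1000"},
    {"winner": "Player A", "loser": "Player C", "tournament": "Rome", "year": 2018, "level": "Masters 1000"},
    {"winner": "Player B", "loser": "Player A", "tournament": "Rome", "year": 2019, "level": "Masters 1000"},
    {"winner": "Player A", "loser": "Player C", "tournament": "Doha", "year": 2019, "level": "ATP 250"},
]


def entries(matches, player, **criteria):
    """Count distinct (tournament, year) editions the player took part in.

    Optional criteria (e.g. tournament="Rome" or level="ATP 250") restrict
    the count to a specific tournament or category.
    """
    editions = {
        (m["tournament"], m["year"])
        for m in matches
        if player in (m["winner"], m["loser"])
        and all(m.get(k) == v for k, v in criteria.items())
    }
    return len(editions)


print(entries(MATCHES, "Player A"))                     # all tournaments
print(entries(MATCHES, "Player A", tournament="Rome"))  # one tournament
print(entries(MATCHES, "Player A", level="ATP 250"))    # one category
```

Using a set of editions means that several matches in the same event count as a single participation.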

An increasingly important role in statistical analysis is played by the **ages** of the players (**numbers 6 and 7**). Somewhat complex to derive and therefore entered directly in our database, they concern the **youngest** or **oldest** player to reach a given result. This section can cover both **rounds** and **categories**, so the same matrix as in number 3 applies. Youngest and Oldest are computed in exactly the same way; what changes is the sort order: **ascending** in the first case, **descending** in the second.
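The point that Youngest and Oldest differ only in sort order can be shown with a short Python sketch (illustrative only; the project is in R and stores ages directly in its database). Here the age is derived from a birth date and an achievement date, and the records, field names and `rank_by_age` helper are invented for the example.

```python
from datetime import date

# Toy records (invented): when each player reached some given result.
ACHIEVEMENTS = [
    {"player": "Player A", "birth": date(2000, 1, 1), "date": date(2019, 6, 1)},
    {"player": "Player B", "birth": date(1990, 1, 1), "date": date(2020, 6, 1)},
    {"player": "Player C", "birth": date(1995, 1, 1), "date": date(2018, 6, 1)},
]


def age_in_days(record):
    """Age of the player, in days, on the date of the achievement."""
    return (record["date"] - record["birth"]).days


def rank_by_age(achievements, oldest=False):
    """Sort ascending for Youngest, descending for Oldest."""
    return sorted(achievements, key=age_in_days, reverse=oldest)


youngest = [a["player"] for a in rank_by_age(ACHIEVEMENTS)]
oldest = [a["player"] for a in rank_by_age(ACHIEVEMENTS, oldest=True)]
print(youngest)
print(oldest)
```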

A section of its own is the **average age** (**number 8**). While youngest and oldest each give a single value, this one calculates the **average age.** However, it only makes sense for a **single tournament**. It could, as always, be extended to rounds and categories, but the data would lack consistency.

Another statistic that is very fashionable today is the *Timespan* (**number 9**). It is a fundamental parameter for quantifying the **longevity** of a tennis player. This too is laid out in a matrix, as in the case of **Count, Youngest and Oldest.**
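As a hedged illustration, a timespan can be computed as the time between the first and the last date on which a player achieved a given result. That definition is an assumption for this Python sketch (the project's R code may define it differently), and the dates and the `timespan_days` helper are invented.

```python
from datetime import date

# Invented dates on which a hypothetical player reached the same result.
RESULT_DATES = [date(2012, 5, 20), date(2010, 1, 1), date(2020, 1, 1)]


def timespan_days(dates):
    """Days between the earliest and the latest date (0 if fewer than two)."""
    if not dates:
        return 0
    return (max(dates) - min(dates)).days


span = timespan_days(RESULT_DATES)
print(span, "days, about", round(span / 365.25, 1), "years")
```

Computing this per round and per subcategory would fill a matrix of the same shape as the one used for Count.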

**Source Code**

It all seems easy, but now **we need to calculate these statistics**. Thanks to the **R language**, which seems made especially for us, a project has been developed that calculates everything necessary. The code can be **downloaded from Github**. The folder already contains **HTML pages** that display all the statistics. Starting, as usual, from *Index.html*, you can navigate through the hundreds of pages produced by the R code. These aren’t updated automatically; however, just run the **Update.R** script to get fresh numbers, thanks to the constantly updated live TML Database. The methods of the code can be used as an **API** for a possible **Web App.**

**Additions**

The code will be continually updated and enriched with **new methods, each of which finds the data that interests us with a single call**. The ultimate goal is **to have a method for each statistic**. At the moment, the “**Same Tour**” section has also been added, which deals with statistics regarding a specific tournament, such as: who has won multiple editions of the same tournament, who has more finals in the same tournament, who has more quarterfinals. And this is only the beginning, guys!

https://twitter.com/TennisMyLife68/status/1232026325482950656