Pandas DataFrames

Pandas DataFrames#

image, alt = "An image of a panda working on some code with a smaller panda sat next to them."

Learning Objectives#

Understand the advantages of using Pandas DataFrames.
Be able to create a DataFrame by importing data or converting a dictionary.
Be able to present, locate, filter and modify data within a DataFrame.
Be able to use use DataFrame functions

Today we are going to look at a new Python library called Pandas. This library may be used when analysing more complex datasets.

Why use Pandas?#

Up to this point in your degree it is likely that you have only worked with small data sets and that analysing your data has not posed too many challanges beyond those associated with learning a new skill. For this sort of work the methods and tools we have used to this point are more than sufficient, however as we begin to work with larger and more complex datasets, these methods may become inefficient and labourious - enter pandas. The key advantages of learning to use pandas are:

Pandas is very useful for working with large amounts of data.
Unlike numpy data does not have to be numeric.
Data can be cleaned using pandas.
Data can be filtered using pandas.
Subsets of data can be extracted easily from a pandas dataframe.
Pandas is used in data science.

One key disadvantage of using numpy is that it is unable to use categorical data (e.g. names, places, types, etc.) and instead can only deal with numbers. Take a look at the example data below and what happens when we try to feed it into numpy from a csv file:

Name	Age	Eye Colour	Height (m)
Joan	54	Brown	1.54
Davide	36	Blue	1.85
Liam	29	Hazel	1.74
Emily	28	Green	1.64
Carlos	15	Green	1.77
Katie	11	Blue	1.73
Ross	26	Blue	1.79
Zack	35	Hazel	1.55
Daisy	78	Brown	1.50
Kim	7	Blue	1.10
Schmonathan	68	Brown	1.68
Jamantha	57	Green	1.43
Carlos	20	Amber	1.91

[[  nan   nan   nan   nan]
 [  nan 54.     nan  1.54]
 [  nan 36.     nan  1.85]
 [  nan 29.     nan  1.74]
 [  nan 28.     nan  1.64]
 [  nan 15.     nan  1.77]
 [  nan 12.     nan  1.73]
 [  nan 26.     nan  1.79]
 [  nan 35.     nan  1.55]
 [  nan 78.     nan  1.5 ]
 [  nan  7.     nan  1.1 ]
 [  nan 68.     nan  1.68]
 [  nan 57.     nan  1.43]
 [  nan 20.     nan  1.91]]

It is clear that numpy is unable to be used when processing this data as it returns “nan” values (nan = not a number). Now let’s try using a pandas to import the same data:

	Name	Age	Eye Colour	Height (m)
0	Joan	54	Blue	1.54
1	Davide	36	Brown	1.85
2	Liam	29	Hazel	1.74
3	Emily	28	Green	1.64
4	Carlos	15	Green	1.77
5	Katie	12	Blue	1.73
6	Ross	26	Blue	1.79
7	Zack	35	Hazel	1.55
8	Daisy	78	Brown	1.50
9	Kim	7	Blue	1.10
10	Schmonathan	68	Brown	1.68
11	Jamantha	57	Green	1.43
12	Carlos	20	Amber	1.91

Pandas has not only been able to read in all of the data from the csv file, but it has also presented the data much more clearly. This is what we refer to as a DataFrame.

Creating A DataFrame from Scratch#

It is often the case that rather than importing data into a DataFrame, we may just want to create one in our script. To do this is relatively simple, just take a look at the snippet of code below:

First I will create a dictionary containing information about various european countries.
Secondly, I will convert this into a pandas DataFrame.

Take a look at the lines of code below and ensure you understand how it is constructed.

# Creating my dictionary
euroData = {'Country': ['United Kingdom', 'Italy', 'France', 'Sweden', 'Finland', 'Malta', 'Germany'],
        'Capital City': ['London', 'Rome', 'Paris', 'Stockholm', 'Helsinki', 'Valletta', 'Berlin'],
        'Population': [67330000, 59110000, 67750000, 10420000, 5541000, 518536, 83200000],
        'Life Expectancy': [80.90, 82.34, 82.18, 82.41, 82.13, 82.65, 80.94]}

# Converting my dictionary into a pandas DataFrame
euroData = pd.DataFrame(data=euroData)
display(euroData)

	Country	Capital City	Population	Life Expectancy
0	United Kingdom	London	67330000	80.90
1	Italy	Rome	59110000	82.34
2	France	Paris	67750000	82.18
3	Sweden	Stockholm	10420000	82.41
4	Finland	Helsinki	5541000	82.13
5	Malta	Valletta	518536	82.65
6	Germany	Berlin	83200000	80.94

What can we do with a DataFrame?#

Importing Data#

Importing data using pandas is essentially the same as importing data using numpy:

First, we will import pandas into the script in the same way we did for numpy.
Next, use we use “read_csv” to import the data into a DataFrame.

import pandas as pd
Data = pd.read_csv("PeopleData.csv")

Getting to know our DataFrame#

In order to get a feel for the data that we are working with, there are a few pandas commands we can use. We can look at our enitre DataFrame or just the top/bottom 5 rows of the DataFrame using appropriate commands.

# This is how we look at the entire DataFrame
display(Data)

	Name	Age	Eye Colour	Height (m)
0	Joan	54	Blue	1.54
1	Davide	36	Brown	1.85
2	Liam	29	Hazel	1.74
3	Emily	28	Green	1.64
4	Carlos	15	Green	1.77
5	Katie	12	Blue	1.73
6	Ross	26	Blue	1.79
7	Zack	35	Hazel	1.55
8	Daisy	78	Brown	1.50
9	Kim	7	Blue	1.10
10	Schmonathan	68	Brown	1.68
11	Jamantha	57	Green	1.43
12	Carlos	20	Amber	1.91

# This is how we take a look at the top 5 rows of the DataFrame.
Data.head()

	Name	Age	Eye Colour	Height (m)
0	Joan	54	Blue	1.54
1	Davide	36	Brown	1.85
2	Liam	29	Hazel	1.74
3	Emily	28	Green	1.64
4	Carlos	15	Green	1.77

# This is how we take a look at only select columns from the top 5 rows of the DataFrame.
Data[["Name", "Age"]].head()

	Name	Age
0	Joan	54
1	Davide	36
2	Liam	29
3	Emily	28
4	Carlos	15

# This is how we take a look at the bottom 5 rows of the DataFrame.
Data.tail()

	Name	Age	Eye Colour	Height (m)
8	Daisy	78	Brown	1.50
9	Kim	7	Blue	1.10
10	Schmonathan	68	Brown	1.68
11	Jamantha	57	Green	1.43
12	Carlos	20	Amber	1.91

# This is how we take a look at only select columns from the bottom 5 rows of the DataFrame.
Data[["Name", "Age"]].tail()

	Name	Age
8	Daisy	78
9	Kim	7
10	Schmonathan	68
11	Jamantha	57
12	Carlos	20

We are also able to gauge properties of the DataFrame such as what the each column represents, the types of data that it contains, the shape of the DataFrame and information about the indexing:

An example of how to find out what information is contained in each column:

print(Data.columns)

Index(['Name', 'Age', 'Eye Colour', 'Height (m)'], dtype='object')

An example of how to find out what type of data is stored in the DataFrame:

print(Data.dtypes)

Name           object
Age             int64
Eye Colour     object
Height (m)    float64
dtype: object

An example of how to find out what the shape of the DataFrame is:

print(Data.shape)

(13, 4)

An example of how to find out the indexing of the DataFrame:

print(Data.index)

RangeIndex(start=0, stop=13, step=1)

To look at specific entries in a DataFrame we can use the “.loc” and “.iloc” functions.

When using loc we are specifying which entry in the DataFrame we want using the index labels, whereas iloc uses the index positions. If our index labels are just our index positions, there will be no difference when using loc or iloc bar the fact that loc gives an inclusive range of results whereas iloc gives an exlusive range of results.

Make sure this is clear to you by reviewing the difference in the outputs from the snippets of code below:

# Using loc
print(Data.loc[0], "\n \n")
print(Data.loc[0:0], "\n \n")
print(Data.loc[0:3], "\n \n")
print(Data.loc[0][2])

# If we had added an alternative index to the data frame that did not begin at zero
# e.g. (101, 102, 103, ...) we would use these values when using loc.

Name          Joan
Age             54
Eye Colour    Blue
Height (m)    1.54
Name: 0, dtype: object 
 

   Name  Age Eye Colour  Height (m)
0  Joan   54       Blue        1.54 
 

     Name  Age Eye Colour  Height (m)
0    Joan   54       Blue        1.54
1  Davide   36      Brown        1.85
2    Liam   29      Hazel        1.74
3   Emily   28      Green        1.64 
 

Blue

# Using iloc
print(Data.iloc[0], "\n \n")
print(Data.iloc[0:0], "\n \n")
print(Data.iloc[0:3], "\n \n")
print(Data.iloc[0][2])

# If we had added an alternative index to the data frame that did not begin at zero
# e.g. (101, 102, 103, ...) we would still use the index location (e.g. 0, 1, 2).

Name          Joan
Age             54
Eye Colour    Blue
Height (m)    1.54
Name: 0, dtype: object 
 

Empty DataFrame
Columns: [Name, Age, Eye Colour, Height (m)]
Index: [] 
 

     Name  Age Eye Colour  Height (m)
0    Joan   54       Blue        1.54
1  Davide   36      Brown        1.85
2    Liam   29      Hazel        1.74 
 

Blue

Filtering Data#

It is often the case when working with large datasets that we are only actually interested in a subset of the data. For example, using the “Data” DataFrame above I may only want to see the heights of the participants in this particular study. To this I can simply filter this information from the DataFrame:

Heights = Data["Height (m)"]
print(Heights)

   1.54
   1.85
   1.74
   1.64
   1.77
   1.73
   1.79
   1.55
   1.50
   1.10
  1.68
  1.43
  1.91
Name: Height (m), dtype: float64

In the example above I have created a new DataFrame containing just the heights that I may then use in my analysis moving forward. This saves me having to extracting the information from the larger DataFrame everytime that information is required. Either method is fine.

We can see that when the “Heights” DataFrame has been printed out that the index is also shown. If we would like to extract a particular value from the DataFrame we can do so using the index:

selectedHeight = Heights[6]
print(selectedHeight)

1.79

In the example above “selectedHeight” was extracted from the DataFrame “Heights” in the same way that we would extract information from a numpy array or list. If we wanted to select this Height from the original larger DataFrame “Data”, we would simply use two sets of square brackets:

selectedHeight = Data["Height (m)"][6]
print(selectedHeight)

1.79

In some instances it may be useful to know all of the unique values in your DataFrame. For example, perhaps from my sample of data I would like to know all of the unique eye colours of the people that were sampled:

UniqueEyeColours = Data["Eye Colour"].unique()
print(UniqueEyeColours)

['Blue' 'Brown' 'Hazel' 'Green' 'Amber']

Or perhaps I just want to the information relating to one of the particpants in my survey. Look carefully at the structure of the line of code below used to extract the information associated with Carlos:

CarlosInfo = Data[Data["Name"] == 'Carlos']
print(CarlosInfo)

      Name  Age Eye Colour  Height (m)
4   Carlos   15      Green        1.77
12  Carlos   20      Amber        1.91

As there are two people named Carlos in my data I have been presented with two rows of data. If I want to filter the data to be even more specific I simply add a second condition in the filtering command.

I am specifically interested in the data associated with the Carlos with amber eyes. I will add a second critereon to be met in the filtering using “&”. Note that each condition is enclosed in a pair of rounded brackets. Look at the code below closely:

CarlosInfo = Data[(Data["Name"] == 'Carlos') & (Data["Eye Colour"] == 'Amber')]
print(CarlosInfo)

      Name  Age Eye Colour  Height (m)
12  Carlos   20      Amber        1.91

If I want to get data for people called Carlos or Katie, I can simply modify the code above by adding an “or” statement using “|”:

NewData = Data[(Data["Name"] == 'Carlos') | (Data["Name"] == 'Katie')]
print(NewData)

      Name  Age Eye Colour  Height (m)
 Carlos   15      Green        1.77
  Katie   12       Blue        1.73
Carlos   20      Amber        1.91

If I want to filter my data to only show those who are taller than 1.8m I can do so by applying a condition:

TallPeople = Data[(Data["Height (m)"] > 1.8)]
print(TallPeople)

      Name  Age Eye Colour  Height (m)
1   Davide   36      Brown        1.85
12  Carlos   20      Amber        1.91

Replacing Data#

It might be the case that there is an incorrect data entry in a DataFrame that needs correcting, or perhaps a series of terms all need replacing. We can easily replace terms in a DataFrame using “.replace()”. For example, lets say that the age recorded for Davide was incorrect and needed replacing, we can do so easily. Here I am going to replace the number 36 with 38 in the age column.

Data['Age'] = Data["Age"].replace([36],38)
Data.head()

	Name	Age	Eye Colour	Height (m)
0	Joan	54	Blue	1.54
1	Davide	38	Brown	1.85
2	Liam	29	Hazel	1.74
3	Emily	28	Green	1.64
4	Carlos	15	Green	1.77

Replacing all values equal to something thing can be useful, but may also result in data being changed that should not be changed. For example, when correcting the age of Davide above, the method would actually have re-aged anybody who is 36 to 38.

If we want to change an individual value we can also do this fairly easily too. Take a look at the example below where I am going to change the eye colour associated with Liam from Hazel to Amber:

Data.iloc[2,2] = "Amber"
Data.head()

	Name	Age	Eye Colour	Height (m)
0	Joan	54	Blue	1.54
1	Davide	38	Brown	1.85
2	Liam	29	Amber	1.74
3	Emily	28	Green	1.64
4	Carlos	15	Green	1.77

Looping#

In addition to filtering data from a DataFrame, it is also possible to perform calculations/analysis by looping through it (e.g. using for and while loops).

To show an example of how we can loop through a DataFrame we are going to use a new set of data called “all_matches”. This dataset contains the results of all international football matches from 1872 to 2023. We are going to loop through this data to determine which country is the most successful of all time:

The first thing we should do is take a look at how the data in the DataFrame is formatted. To do this, we will use “.head()” to view the first five rows of the DataFrame.

import matplotlib.pyplot as plt
Football = pd.read_csv("all_matches.csv")

Football.head()

	date	home_team	away_team	home_score	away_score	tournament	country	neutral
0	1872-11-30	Scotland	England	0	0	Friendly	Scotland	False
1	1873-03-08	England	Scotland	4	2	Friendly	England	False
2	1874-03-07	Scotland	England	2	1	Friendly	Scotland	False
3	1875-03-06	England	Scotland	2	2	Friendly	England	False
4	1876-03-04	Scotland	England	3	0	Friendly	Scotland	False

Next, in order to determine which country is the most successful we need to know how many different countries are actually in this dataset. When doing this I will assume that every team has played at home and away. To do this, we can use “.unique()” as above:

countries = Football["home_team"].unique()
print(countries)

['Scotland' 'England' 'Wales' 'Ireland' 'Uruguay' 'Austria' 'Hungary'
 'Argentina' 'Belgium' 'France' 'Netherlands' 'British Guiana' 'Bohemia'
 'Switzerland' 'Germany' 'Sweden' 'Denmark' 'Italy' 'South Africa' 'Chile'
 'Norway' 'Finland' 'Luxembourg' 'Russia' 'Philippines' 'China' 'Suriname'
 'Brazil' 'Japan' 'Paraguay' 'Spain' 'Czechoslovakia' 'Egypt'
 'Northern Ireland' 'Estonia' 'Costa Rica' 'Guatemala' 'Poland'
 'Yugoslavia' 'New Zealand' 'Romania' 'Latvia' 'Portugal' 'Australia'
 'Lithuania' 'Trinidad and Tobago' 'Turkey' 'Mexico' 'Aruba'
 'United States' 'Soviet Union' 'Haiti' 'Bulgaria' 'Canada' 'Jamaica'
 'Kenya' 'Azerbaijan' 'Georgia' 'Peru' 'Honduras' 'British Honduras'
 'Dutch East Indies' 'El Salvador' 'Uganda' 'Barbados' 'Greece' 'Cuba'
 'Curaçao' 'Faroe Islands' 'Mauritius' 'Dominica' 'Reunion'
 'New Caledonia' 'Palestine' 'Grenada' 'Guadeloupe' 'French Guiana'
 'Martinique' 'St Vincent & Grenadines' 'Panama' 'Colombia' 'Venezuela'
 'Bolivia' 'Saint Kitts and Nevis' 'Ecuador' 'Slovakia'
 'Bohemia and Moravia' 'Croatia' 'Iran' 'Afghanistan' 'Korea' 'Lebanon'
 'Syria' 'Nicaragua' 'Iceland' 'Albania' 'Southern Rhodesia' 'Hong Kong'
 'Tanganyika' 'French Somaliland' 'Northern Rhodesia' 'Madagascar'
 'South Vietnam' 'Macao' 'Singapore' 'India' 'South Korea' 'Thailand'
 'Belgian Congo' 'Ethiopia' 'Puerto Rico' 'Israel' 'Sierra Leone'
 'Zanzibar' 'Antigua and Barbuda' 'Saint Kitts' 'Netherlands Antilles'
 'Gold Coast' 'West Germany' 'Indonesia' 'East Germany' 'Fiji' 'Nigeria'
 'Montserrat' 'Ceylon' 'Pakistan' 'Tahiti' 'Saar' 'Gambia'
 'Portuguese Guinea' 'Jordan' 'Libya' 'New Hebrides' 'Burma' 'Taiwan'
 'French Togoland' 'Ivory Coast' 'French Congo' 'Cameroon' 'Malaya'
 'Cambodia' 'Sudan' 'North Korea' 'Ubangi-Shari' 'Malta' 'Tunisia' 'Iraq'
 'Morocco' 'Saudi Arabia' 'Ghana' 'Nyasaland' 'Algeria'
 'United Arab Republic' 'Dahomey' 'North Vietnam' 'Togo' 'Congo'
 'Mali Federation' 'Upper Volta' 'Cyprus' 'Saint Lucia' 'Kuwait' 'Senegal'
 'Liberia' 'Central African Republic' 'Chad' 'Guinea' 'Mali' 'Gabon'
 'Northern Cyprus' 'Congo-Leopoldville' 'Solomon Islands' 'Malaysia'
 'Niger' 'DR Congo' 'Bermuda' 'Tanzania' 'Zambia' 'Bahrain' 'Malawi'
 'Papua New Guinea' 'Dominican Republic' 'Guyana' 'Mauritania' 'Rhodesia'
 'Swaziland' 'Laos' 'Qatar' 'Bonaire' 'Khmer Republic' 'Lesotho' 'Bahamas'
 'South Yemen' 'Somalia' 'Zaire' 'United Arab Emirates' 'Sri Lanka'
 'Nepal' 'Bangladesh' 'Burundi' 'Guinea-Bissau' 'Mozambique' 'Guam'
 'Angola' 'Oman' 'São Tomé and Príncipe' 'Botswana' 'Benin' 'Cape Verde'
 'Seychelles' 'Rwanda' 'Comoros' 'Tuvalu' 'Wallis and Futuna' 'Tonga'
 'Kiribati' 'Brunei' 'Zimbabwe' 'Liechtenstein' 'Vanuatu' 'Greenland'
 'Western Samoa' 'Eastern Samoa' 'Belize' 'Djibouti' 'Maldives'
 'Burkina Faso' 'North Yemen' 'Anguilla' 'Equatorial Guinea'
 'Cayman Islands' 'Sint Maarten' 'Namibia' 'Saint Martin'
 'British Virgin Islands' 'San Marino' 'Slovenia' 'Moldova'
 'Comm of Indep States' 'Ukraine' 'Kazakhstan' 'Tajikistan' 'Uzbekistan'
 'Eritrea' 'Turkmenistan' 'Kyrgyzstan' 'Armenia' 'Belarus' 'Vietnam'
 'Yemen' 'Myanmar' 'Macedonia' 'Czechia' 'Gibraltar' 'Cook Islands'
 'Bosnia and Herzegovina' 'Andorra' 'US Virgin Islands' 'Palau'
 'Northern Mariana Islands' 'Samoa' 'Turks and Caicos Islands'
 'FS Micronesia' 'Mayotte' 'Bhutan' 'Kosovo' 'Serbia and Montenegro'
 'Mongolia' 'Serbia' 'Montenegro' 'Kurdistan' 'South Sudan'
 'Saint Barthelemy' 'East Timor' 'Eswatini' 'North Macedonia']

Now we have a list off all the countries we need to extract data for, we can loop through the entire data set and determine how many matches each country has won. This is made a little more complicated as looking at the data we need to consider both the home and away games:

# First we create two lists - one for the country name and one for their percentage of wins.
teamNames = []
percentageWins = []

for country in countries:
    
    # First we will filter only the data associated with our current country from the entire dataset.
    # This first line of code narrows down the data to only the matches where they were either playing home or away.
    currentCountry = Football[(Football['home_team']==country) | (Football['away_team']==country)]
    
    # This line of code simply filters from the currentCountry DataFrame to determine how many home wins they had.
    HomeWins = currentCountry[(currentCountry['home_team'] == country) & (currentCountry['home_score'] > currentCountry['away_score'])]
    # This line of code simply filters from the currentCountry DataFrame to determine how many away wins they had.
    AwayWins = currentCountry[(currentCountry['away_team'] == country) & (currentCountry['away_score'] > currentCountry['home_score'])]
    
    # Now we calculate the number of wins as a percentage and append our data to our two lists.
    teamNames.append(country)
    percentageWins.append( ((HomeWins.shape[0] + AwayWins.shape[0]) / currentCountry.shape[0]) * 100 ) 
    # In the line above the number of home wins, away wins and games played is determined using .shape[0]

I will now create a new DataFrame containing the countries and their win percentage from the lists I made in the last step:

# Building my new DataFrame
winData = pd.DataFrame({'Country Name': teamNames, 'Percentage Wins': percentageWins})

# Printing the top 5 rows of my DataFrame
winData.head()

	Country Name	Percentage Wins
0	Scotland	47.738095
1	England	58.451957
2	Wales	31.926864
3	Ireland	35.104895
4	Uruguay	44.308943

I can sort our new DataFrame by the column “Percentage of Wins” to determine the country that is the most successful:

# Sorting the values by most sucessful
sortedWinData = winData.sort_values('Percentage Wins', ascending=False)
sortedWinData.head()

	Country Name	Percentage Wins
143	Ubangi-Shari	100.000000
137	French Congo	100.000000
90	Korea	100.000000
157	Mali Federation	75.000000
115	Saint Kitts	72.727273

Plotting Data from a DataFrame#

If I want to visualise this data in a better way that a simple table, we can also extract the data from a DataFrame for plotting. The process is exactly the same as if we were plotting data from a numpy array. For example the bar chart below shows the 20 most successful countries at football:

import matplotlib.pyplot as plt

plt.figure(figsize = (8,7))
plt.bar(sortedWinData['Country Name'][0:20], sortedWinData['Percentage Wins'][0:20])
plt.ylabel("Percentage Wins")
plt.xlabel("Country")
plt.title("Most Successful Countries @ Football")
plt.xticks(rotation = 90)
plt.show()

_images/658e83d21a4b8ac950b41db7d8d41fa98c383ba0377b1a2c2d2e075c192c1c70.png

There we have it! The most successful countries of all time at football are Ubangi-Shari (a former French colony in Africa), French Congo (former French colony) and Korea (before the seperation of states). Let’s take a look at their record by filtering our original Football DataFrame to show only these countries matches:

TopTeams = Football[(Football["country"] == "Ubangi-Shari") | 
                   (Football["country"] == "French Congo") | 
                   (Football["country"] == "Korea") ]
TopTeams.head()

	date	home_team	away_team	home_score	away_score	tournament	country	neutral
2812	1942-08-16	Korea	Japan	5	0	Friendly	Korea	False
4322	1954-12-31	French Congo	Cameroon	5	1	Friendly	French Congo	False
4715	1956-12-31	Ubangi-Shari	Cameroon	5	1	Friendly	Ubangi-Shari	False

Each team only played (and won) a single match before, but I am inclined to give Korea the crown as they had the largest goal difference… So there we have it - Korea is the greatest football nation on the planet!

Useful Pandas DataFrame Functions#

There are lots of very useful pandas specfic functions - those related to DataFrames can be found here.

Activities#

1. Brazil vs Portugal#

Using “all_matches.csv” find out the following:

How many football games have been played between Brazil and Portugal.
Who has won the most when playing each other.
Who has won the most games overall (i.e. not just against each other).

2. Learning = Earning?#

Using the “BankChurners.csv” data:

Use a for loop to determine the mean credit limit for each education level in the data.
Plot the mean credit limit for each education level in the data.
Plot the mean utilisation ration for each education level in the data.

You will want to use .mean() for this activity. If you use the .mean() method it will return mean for all the columns, you will need to select the column you are interested in.

3. “The price of greatness is responsibility.”#

Using the “politics_apr2020.csv” data:

Print the countries with the youngest leaders.
Print the countries with the oldest leaders.
Find the longest serving leader.
Plot the leaders against tenure.

4. Covid-19#

Using the “covid19_stats.csv” data:

Calculate the total number of excess deaths attributed to Covid-19 for each country in 2021.
Show the top and bottom 5 countries by excess deaths.
Plot the total number of deaths attributed to covid.

5. Global Statistics#

Using the “global_stats.csv” data:

Determine how many different countries can be found within each continental base in this dataset.
Print out the top/bottom 5 populated countries within each continental base.
Rank and plot the Human development Index (HDI) for countries with more than 50 million internet users.