title
My top 25 pandas tricks
description
You're about to learn 25 tricks that will help you to work faster, write better pandas code, and impress your friends. These are the BEST tricks I've learned from 5 years of teaching Python's pandas library.
Don't miss the BONUS at the end of this video!
TRICKS:
0:00 Introduction
0:43 1. Show installed versions
1:20 2. Create an example DataFrame
2:22 3. Rename columns
3:47 4. Reverse row order
4:36 5. Reverse column order
5:01 6. Select columns by data type
5:40 7. Convert strings to numbers
6:55 8. Reduce DataFrame size
8:15 9. Build a DataFrame from multiple files (row-wise)
10:00 10. Build a DataFrame from multiple files (column-wise)
10:45 11. Create a DataFrame from the clipboard
11:50 12. Split a DataFrame into two random subsets
12:57 13. Filter a DataFrame by multiple categories
13:52 14. Filter a DataFrame by largest categories
14:42 15. Handle missing values
15:57 16. Split a string into multiple columns
16:59 17. Expand a Series of lists into a DataFrame
17:39 18. Aggregate by multiple functions
18:41 19. Combine the output of an aggregation with a DataFrame
19:56 20. Select a slice of rows and columns
20:52 21. Reshape a MultiIndexed Series
22:04 22. Create a pivot table
23:01 23. Convert continuous data into categorical data
23:56 24. Change display options
24:47 25. Style a DataFrame
26:14 Bonus. Profile a DataFrame
DOWNLOAD the Jupyter notebook:
https://github.com/justmarkham/pandas-videos
WATCH my introductory series, Data Analysis with pandas:
https://www.youtube.com/playlist?list=PL5-da3qGB5ICCsgW1MxlZ0Hq8LL5U3u9y
JOIN the "Data School Insiders" community:
https://www.patreon.com/dataschool
LET'S CONNECT!
- Email Newsletter: https://www.dataschool.io/subscribe/
- LinkedIn: https://www.linkedin.com/in/justmarkham/
- Twitter: https://twitter.com/justmarkham
- Facebook: https://www.facebook.com/DataScienceSchool/
- YouTube: https://www.youtube.com/dataschool?sub_confirmation=1
detail
{'title': 'My top 25 pandas tricks', 'heatmap': [{'end': 836.962, 'start': 805.741, 'weight': 1}, {'end': 1183.389, 'start': 1158.508, 'weight': 0.773}], 'summary': 'Discover 25 pandas tricks for faster work and better code, including data frame manipulation, type conversion, memory optimization, and titanic dataset analysis with survival rate calculation and pandas styling.', 'chapters': [{'end': 79.675, 'segs': [{'end': 28.202, 'src': 'embed', 'start': 0.109, 'weight': 1, 'content': [{'end': 2.59, 'text': 'Hey everyone, this is Kevin from Data School.', 'start': 0.109, 'duration': 2.481}, {'end': 6.192, 'text': "Today you're going to learn my top 25 pandas tricks.", 'start': 3.05, 'duration': 3.142}, {'end': 10.614, 'text': "These are the best tricks I've learned from five years of teaching pandas.", 'start': 6.692, 'duration': 3.922}, {'end': 16.977, 'text': 'These tricks will help you to work faster, write better code, and impress your friends.', 'start': 11.094, 'duration': 5.883}, {'end': 19.118, 'text': "Here's everything you're going to learn.", 'start': 17.477, 'duration': 1.641}, {'end': 24.3, 'text': "There's also a bonus at the end of this video that you're not going to want to miss.", 'start': 20.398, 'duration': 3.902}, {'end': 28.202, 'text': "Now before we start, I'm going to import pandas and numpy.", 'start': 24.9, 'duration': 3.302}], 'summary': 'Kevin shares his top 25 pandas tricks after 5 years of teaching, promising faster work and better code.', 'duration': 28.093, 'max_score': 0.109, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RlIiVeig3hc/pics/RlIiVeig3hc109.jpg'}, {'end': 79.675, 'src': 'embed', 'start': 47.046, 'weight': 0, 'content': [{'end': 52.971, 'text': "Sometimes you need to know the Pandas version you're using, especially when reading the Pandas documentation.", 'start': 47.046, 'duration': 5.925}, {'end': 57.053, 'text': 'You can show the Pandas version by typing pd.', 'start': 53.491, 'duration': 3.562}, {'end': 70.443, 'text': ',version,, but if you also need to know the versions of pandas dependencies, you can use the show versions function.', 'start': 57.053, 'duration': 13.39}, {'end': 76.847, 'text': 'you can see the versions of python, pandas, numpy, matplotlib and more.', 'start': 70.443, 'duration': 6.404}, {'end': 79.675, 'text': 'Thanks to Harvey Summers for this trick.', 'start': 77.694, 'duration': 1.981}], 'summary': 'Knowing the pandas version is crucial for reading documentation; use pd.show_versions to see versions of pandas, python, numpy, matplotlib, and more.', 'duration': 32.629, 'max_score': 47.046, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RlIiVeig3hc/pics/RlIiVeig3hc47046.jpg'}], 'start': 0.109, 'title': 'Top 25 pandas tricks', 'summary': 'Unveils the top 25 pandas tricks, gathered from five years of teaching, to help work faster, write better code, and impress others, including the method to show installed versions and a bonus at the end.', 'chapters': [{'end': 79.675, 'start': 0.109, 'title': 'Top 25 pandas tricks by data school', 'summary': 'Unveils the top 25 pandas tricks, gathered from five years of teaching, to help work faster, write better code, and impress others, including the method to show installed versions and a bonus at the end.', 'duration': 79.566, 'highlights': ["Trick number one: show installed versions. It's important to know the Pandas version you're using, which can be done by typing 'pd.,version,'. Additionally, the trick to show versions of pandas dependencies, including python, pandas, numpy, and matplotlib, is shared.", 'The tricks shared in the chapter are the best from five years of teaching pandas, aimed at helping to work faster, write better code, and impress others.', 'The chapter concludes with a mention of a bonus at the end of the video, encouraging viewers not to miss it.']}], 'duration': 79.566, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RlIiVeig3hc/pics/RlIiVeig3hc109.jpg', 'highlights': ["Trick number one: show installed versions. It's important to know the Pandas version you're using, which can be done by typing 'pd.,version,'.", 'The tricks shared in the chapter are the best from five years of teaching pandas, aimed at helping to work faster, write better code, and impress others.', 'The chapter concludes with a mention of a bonus at the end of the video, encouraging viewers not to miss it.']}, {'end': 212.513, 'segs': [{'end': 134.399, 'src': 'embed', 'start': 104.825, 'weight': 1, 'content': [{'end': 110.671, 'text': 'Now, if you need a much larger data frame, the above method will require way too much typing.', 'start': 104.825, 'duration': 5.846}, {'end': 121.14, 'text': "In that case, you can use NumPy's random.rand function, tell it the number of rows and columns, and pass that to the DataFrame constructor.", 'start': 111.451, 'duration': 9.689}, {'end': 124.496, 'text': "Now that's pretty good.", 'start': 122.935, 'duration': 1.561}, {'end': 134.399, 'text': 'but if you also want non-numeric column names, you can coerce a string of letters to a list and then pass that list to the columns parameter.', 'start': 124.496, 'duration': 9.903}], 'summary': "Use numpy's random.rand function to generate larger data frames efficiently.", 'duration': 29.574, 'max_score': 104.825, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RlIiVeig3hc/pics/RlIiVeig3hc104825.jpg'}, {'end': 182.523, 'src': 'embed', 'start': 150.703, 'weight': 0, 'content': [{'end': 157.667, 'text': "I prefer to use dot notation to select pandas columns, but that won't work since the column names have spaces.", 'start': 150.703, 'duration': 6.964}, {'end': 158.967, 'text': "Let's fix this.", 'start': 158.127, 'duration': 0.84}, {'end': 164.25, 'text': 'The most flexible method for renaming columns is the rename method.', 'start': 159.808, 'duration': 4.442}, {'end': 173.435, 'text': 'You pass it a dictionary in which the keys are the old names and the values are the new names, and you also specify the axis.', 'start': 164.77, 'duration': 8.665}, {'end': 182.523, 'text': 'The best thing about this method is that you can use it to rename any number of columns, whether it be just one column or all columns.', 'start': 174.42, 'duration': 8.103}], 'summary': 'Use the rename method with a dictionary to rename columns in pandas.', 'duration': 31.82, 'max_score': 150.703, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RlIiVeig3hc/pics/RlIiVeig3hc150703.jpg'}, {'end': 225.502, 'src': 'embed', 'start': 192.734, 'weight': 2, 'content': [{'end': 201.582, 'text': "Now, if the only thing you're doing is replacing spaces with underscores, an even better method is to use the str.replace method,", 'start': 192.734, 'duration': 8.848}, {'end': 204.385, 'text': "since you don't have to type out all of the column names.", 'start': 201.582, 'duration': 2.803}, {'end': 212.513, 'text': "All three of these methods have the same result, which is to rename the columns so that they don't have any spaces.", 'start': 205.826, 'duration': 6.687}, {'end': 225.502, 'text': 'Finally, if you just need to add a prefix or suffix to all of your column names, you can use the add prefix method or the add suffix method.', 'start': 215.293, 'duration': 10.209}], 'summary': 'Methods for renaming columns and adding prefixes or suffixes are demonstrated in the transcript.', 'duration': 32.768, 'max_score': 192.734, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RlIiVeig3hc/pics/RlIiVeig3hc192734.jpg'}], 'start': 80.135, 'title': 'Pandas data frame tricks', 'summary': "Discusses creating example data frames using dictionaries and numpy's random.rand function, as well as renaming columns using the rename method, dot notation, and str.replace method.", 'chapters': [{'end': 212.513, 'start': 80.135, 'title': 'Pandas data frame tricks', 'summary': "Discusses creating example data frames using dictionaries and numpy's random.rand function, as well as renaming columns using the rename method, dot notation, and str.replace method.", 'duration': 132.378, 'highlights': ["Creating an example data frame using NumPy's random.rand function is a more efficient method for generating larger data frames than manually passing a dictionary.", 'The rename method in pandas is the most flexible for renaming columns, allowing for the renaming of any number of columns by passing a dictionary with old and new names, and specifying the axis.', 'Using the str.replace method for replacing spaces with underscores in column names is an even better and more efficient method than overwriting the columns attribute of the data frame.']}], 'duration': 132.378, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RlIiVeig3hc/pics/RlIiVeig3hc80135.jpg', 'highlights': ['The rename method in pandas is the most flexible for renaming columns, allowing for the renaming of any number of columns by passing a dictionary with old and new names, and specifying the axis.', "Creating an example data frame using NumPy's random.rand function is a more efficient method for generating larger data frames than manually passing a dictionary.", 'Using the str.replace method for replacing spaces with underscores in column names is an even better and more efficient method than overwriting the columns attribute of the data frame.']}, {'end': 404.061, 'segs': [{'end': 267.192, 'src': 'embed', 'start': 215.293, 'weight': 2, 'content': [{'end': 225.502, 'text': 'Finally, if you just need to add a prefix or suffix to all of your column names, you can use the add prefix method or the add suffix method.', 'start': 215.293, 'duration': 10.209}, {'end': 230.326, 'text': 'Trick number four, reverse row order.', 'start': 227.823, 'duration': 2.503}, {'end': 233.028, 'text': "Let's take a look at the drinks data frame.", 'start': 231.046, 'duration': 1.982}, {'end': 238.567, 'text': 'This is a dataset of average alcohol consumption by country.', 'start': 234.363, 'duration': 4.204}, {'end': 242.351, 'text': 'What if you wanted to reverse the order of the rows?', 'start': 239.208, 'duration': 3.143}, {'end': 250.278, 'text': 'The most straightforward method is to use the loc accessor and pass it colon colon negative one,', 'start': 243.071, 'duration': 7.207}, {'end': 254.643, 'text': 'which is the same slicing notation used to reverse a Python list.', 'start': 250.278, 'duration': 4.365}, {'end': 267.192, 'text': 'What if you also wanted to reset the index so that it starts at zero? You would use the resetIndex method and tell it to drop the old index entirely.', 'start': 256.687, 'duration': 10.505}], 'summary': 'Pandas methods for column prefix/suffix, reversing row order, and resetting index.', 'duration': 51.899, 'max_score': 215.293, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RlIiVeig3hc/pics/RlIiVeig3hc215293.jpg'}, {'end': 377.649, 'src': 'embed', 'start': 297.732, 'weight': 0, 'content': [{'end': 300.754, 'text': 'which is why country is now on the right side.', 'start': 297.732, 'duration': 3.022}, {'end': 304.775, 'text': 'Trick number six, select columns by data type.', 'start': 301.534, 'duration': 3.241}, {'end': 308.517, 'text': 'Here are the data types of the drinks data frame.', 'start': 305.436, 'duration': 3.081}, {'end': 312.178, 'text': "Now let's say that you need to select only the numeric columns.", 'start': 308.677, 'duration': 3.501}, {'end': 315.119, 'text': 'You can use the select D types method.', 'start': 312.638, 'duration': 2.481}, {'end': 319.581, 'text': 'This includes both the int and float columns.', 'start': 316.42, 'duration': 3.161}, {'end': 324.343, 'text': 'You could also use this method to select just the object columns.', 'start': 320.481, 'duration': 3.862}, {'end': 329.965, 'text': 'You can tell it to include multiple data types by passing a list.', 'start': 325.763, 'duration': 4.202}, {'end': 334.954, 'text': 'You can also tell it to exclude certain data types.', 'start': 331.653, 'duration': 3.301}, {'end': 338.796, 'text': 'Thanks to Vikram Lucky for this trick.', 'start': 336.695, 'duration': 2.101}, {'end': 343.317, 'text': 'Trick number seven, convert strings to numbers.', 'start': 340.176, 'duration': 3.141}, {'end': 346.338, 'text': "Let's create another example data frame.", 'start': 343.877, 'duration': 2.461}, {'end': 353.221, 'text': 'Now these numbers are actually stored as strings, which results in object columns.', 'start': 347.859, 'duration': 5.362}, {'end': 361.017, 'text': 'In order to do mathematical operations on these columns, we need to convert the data types to numeric.', 'start': 354.884, 'duration': 6.133}, {'end': 364.685, 'text': 'You can use the as type method on the first two columns.', 'start': 361.518, 'duration': 3.167}, {'end': 375.006, 'text': 'However, this would have resulted in an error if you tried to use it on the third column, because that column contains a dash to represent zero,', 'start': 366.636, 'duration': 8.37}, {'end': 377.649, 'text': "and pandas doesn't understand how to handle it.", 'start': 375.006, 'duration': 2.643}], 'summary': "Select and convert data types in pandas for analysis, thanks to vikram lucky's tricks.", 'duration': 79.917, 'max_score': 297.732, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RlIiVeig3hc/pics/RlIiVeig3hc297732.jpg'}], 'start': 215.293, 'title': 'Data frame manipulation and type conversion', 'summary': 'Covers tricks such as adding prefixes or suffixes to column names, reversing row and column orders, selecting columns by data type, converting strings to numbers, and handling nan values using methods like as_type, to_numeric, and fillna.', 'chapters': [{'end': 297.732, 'start': 215.293, 'title': 'Pandas tricks for data manipulation', 'summary': 'Covers five pandas tricks for data manipulation, including adding prefixes or suffixes to column names, reversing row order using loc accessor and resetindex method, and reversing column order using loc.', 'duration': 82.439, 'highlights': ['You can use the add prefix method or the add suffix method to add a prefix or suffix to all column names.', 'You can reverse the order of rows using the loc accessor and pass it colon colon negative one, and reset the index using the resetIndex method.', 'You can reverse the left to right order of your columns using loc with the colon colon negative one notation after the comma.']}, {'end': 353.221, 'start': 297.732, 'title': 'Data frame manipulation tricks', 'summary': 'Covers data frame manipulation tricks including selecting columns by data type, converting strings to numbers, and includes examples and tips for practical implementation.', 'duration': 55.489, 'highlights': ['The chapter covers data frame manipulation tricks including selecting columns by data type, converting strings to numbers, and includes examples and tips for practical implementation.', 'You can use the select D types method to select only the numeric columns, including both the int and float columns.', 'You can tell it to include multiple data types by passing a list, and also exclude certain data types.', 'Trick number seven involves converting strings to numbers, which is useful for cases where numbers are stored as strings resulting in object columns.']}, {'end': 404.061, 'start': 354.884, 'title': 'Data type conversion and handling nan values', 'summary': 'Discusses the need to convert data types to numeric for mathematical operations, using methods like as type, to_numeric, and fillna to handle invalid inputs and nan values, and how to apply these methods to the entire data frame.', 'duration': 49.177, 'highlights': ['Using the as type method on the first two columns for data type conversion.', 'Using the to_numeric function on the third column to handle invalid input and convert them into nan values.', 'Filling nan values with zeros using the fillna method.', 'Applying the function to the entire data frame using the apply method.']}], 'duration': 188.768, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RlIiVeig3hc/pics/RlIiVeig3hc215293.jpg', 'highlights': ['You can use the select D types method to select only the numeric columns, including both the int and float columns.', 'Trick number seven involves converting strings to numbers, which is useful for cases where numbers are stored as strings resulting in object columns.', 'You can use the add prefix method or the add suffix method to add a prefix or suffix to all column names.', 'You can reverse the order of rows using the loc accessor and pass it colon colon negative one, and reset the index using the resetIndex method.', 'Using the as type method on the first two columns for data type conversion.']}, {'end': 956.546, 'segs': [{'end': 484.272, 'src': 'embed', 'start': 458.257, 'weight': 0, 'content': [{'end': 464.566, 'text': "By only reading in those two columns, we've reduced the data frame size to 13.6 kilobytes.", 'start': 458.257, 'duration': 6.309}, {'end': 475.48, 'text': 'The second step is to convert any object columns containing categorical data to the category data type, which we specify with the dtype parameter.', 'start': 465.387, 'duration': 10.093}, {'end': 484.272, 'text': 'By reading in the continent column as the category data type, we further reduce the data frame size to 2.3 kilobytes.', 'start': 476.328, 'duration': 7.944}], 'summary': 'Reduced data frame size to 2.3kb by converting categorical data.', 'duration': 26.015, 'max_score': 458.257, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RlIiVeig3hc/pics/RlIiVeig3hc458257.jpg'}, {'end': 679.792, 'src': 'embed', 'start': 615.804, 'weight': 2, 'content': [{'end': 624.088, 'text': "Here's an example in which the drinks data set has been split into two CSV files and each file contains three columns.", 'start': 615.804, 'duration': 8.284}, {'end': 639.227, 'text': "Similar to the previous trick, we'll start by using glob and this time we'll tell the concat function to concatenate along the columns axis.", 'start': 628.286, 'duration': 10.941}, {'end': 644.368, 'text': 'Now our data frame has all six columns.', 'start': 642.068, 'duration': 2.3}, {'end': 648.929, 'text': 'Trick number 11, create a data frame from the clipboard.', 'start': 645.209, 'duration': 3.72}, {'end': 658.651, 'text': "Let's say that you have some data stored in an Excel spreadsheet or a Google sheet, and you want to get it into a data frame as quickly as possible.", 'start': 649.77, 'duration': 8.881}, {'end': 662.492, 'text': 'Just select the data and copy it to the clipboard.', 'start': 659.472, 'duration': 3.02}, {'end': 671.234, 'text': 'Then you can use the read clipboard function to read it into a data frame.', 'start': 666.793, 'duration': 4.441}, {'end': 679.792, 'text': 'Just like Read CSV, Read Clipboard automatically detects the correct data type for each column.', 'start': 673.126, 'duration': 6.666}], 'summary': 'Using concat function to combine csv files and create data frame from clipboard with read clipboard function.', 'duration': 63.988, 'max_score': 615.804, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RlIiVeig3hc/pics/RlIiVeig3hc615804.jpg'}, {'end': 836.962, 'src': 'heatmap', 'start': 805.741, 'weight': 1, 'content': [{'end': 813.389, 'text': 'However, you can actually rewrite this code more clearly by using the is in method and passing it a list of genres.', 'start': 805.741, 'duration': 7.648}, {'end': 824.88, 'text': 'And if you want to reverse the filter so that you are excluding rather including those three genres, you can put a tilde in front of the condition.', 'start': 815.768, 'duration': 9.112}, {'end': 831.91, 'text': 'This works because tilde is the not operator in Python.', 'start': 827.864, 'duration': 4.046}, {'end': 836.962, 'text': 'Trick number 14, filter a data frame by largest categories.', 'start': 832.478, 'duration': 4.484}], 'summary': "Rewrite code using 'is in' method, reverse filter with tilde, and filter data frame by largest categories.", 'duration': 31.221, 'max_score': 805.741, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RlIiVeig3hc/pics/RlIiVeig3hc805741.jpg'}, {'end': 873.421, 'src': 'embed', 'start': 845.569, 'weight': 4, 'content': [{'end': 851.014, 'text': "We'll start by taking the value counts of genre and saving it as a series called counts.", 'start': 845.569, 'duration': 5.445}, {'end': 858.4, 'text': 'The series method nLargest makes it easy to select the three largest values in the series.', 'start': 851.954, 'duration': 6.446}, {'end': 863.814, 'text': 'And all we actually need from the series is the index.', 'start': 860.612, 'duration': 3.202}, {'end': 873.421, 'text': 'Finally, we can pass the index object to isIn, and it will be treated like a list of genres.', 'start': 866.756, 'duration': 6.665}], 'summary': 'Taking value counts of genre, selecting top 3 genres.', 'duration': 27.852, 'max_score': 845.569, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RlIiVeig3hc/pics/RlIiVeig3hc845569.jpg'}], 'start': 405.387, 'title': 'Optimizing data frames and pandas techniques', 'summary': 'Discusses reducing data frame size resulting in a reduction from 30.4 to 2.3 kilobytes, and building a data frame row-wise for memory efficiency. additionally, it covers various techniques such as concatenating files, creating data frames from clipboard, and filtering by categories.', 'chapters': [{'end': 534.278, 'start': 405.387, 'title': 'Data frame optimization tricks', 'summary': 'Discusses two key tricks for optimizing data frames: reducing the data frame size by selectively reading columns and converting object columns to the category data type, resulting in a reduction from 30.4 to 2.3 kilobytes, and building a data frame from multiple files row-wise for memory efficiency and reduced code complexity.', 'duration': 128.891, 'highlights': ['The second step is to convert any object columns containing categorical data to the category data type, which we specify with the dtype parameter. By reading in the continent column as the category data type, we further reduce the data frame size to 2.3 kilobytes.', "The first step is to only read in the columns that you actually need, which we specify with the use calls parameter. By only reading in those two columns, we've reduced the data frame size to 13.6 kilobytes.", "Let's say that your data set is spread across multiple files, but you wanna read the data set into a single data frame."]}, {'end': 956.546, 'start': 535.119, 'title': 'Pandas dataframe tricks and techniques', 'summary': 'Covers various tricks for working with pandas data frames, including concatenating files, creating data frames from clipboard, splitting data frames, filtering by categories, handling missing values, and more, with notable tricks including building a data frame for multiple files column-wise, creating a data frame from the clipboard, and filtering a data frame by largest categories.', 'duration': 421.427, 'highlights': ['Trick number 10, build a data frame for multiple files column-wise. Using glob and the concat function to concatenate rows into a single data frame from multiple CSV files. (Relevance: 5)', 'Trick number 11, create a data frame from the clipboard. Utilizing the read clipboard function to quickly read data from Excel or Google sheets into a data frame. (Relevance: 4)', 'Trick number 14, filter a data frame by largest categories. Selecting the three largest genres from a movie data frame using value counts and nLargest method. (Relevance: 3)', 'Trick number 13, filter a data frame by multiple categories. Filtering a data frame by multiple genres using the is in method and the tilde operator for exclusion. (Relevance: 2)', 'Trick number 15, handle missing values. Demonstrating methods to identify and handle missing values in a data frame, including using isNA, dropNA, and setting a threshold for dropping columns. (Relevance: 1)']}], 'duration': 551.159, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RlIiVeig3hc/pics/RlIiVeig3hc405387.jpg', 'highlights': ['By reading in the continent column as the category data type, we further reduce the data frame size to 2.3 kilobytes.', "By only reading in those two columns, we've reduced the data frame size to 13.6 kilobytes.", 'Using glob and the concat function to concatenate rows into a single data frame from multiple CSV files.', 'Utilizing the read clipboard function to quickly read data from Excel or Google sheets into a data frame.', 'Selecting the three largest genres from a movie data frame using value counts and nLargest method.']}, {'end': 1256.283, 'segs': [{'end': 1045.752, 'src': 'embed', 'start': 957.126, 'weight': 0, 'content': [{'end': 960.969, 'text': 'Trick number 16, split a string into multiple columns.', 'start': 957.126, 'duration': 3.843}, {'end': 963.651, 'text': "Let's create another example data frame.", 'start': 961.749, 'duration': 1.902}, {'end': 971.133, 'text': 'What if we wanted to split the name column into three separate columns for first, middle and last name?', 'start': 964.909, 'duration': 6.224}, {'end': 980.419, 'text': 'We would use the str.split method and tell it to split on a space character and expand the results into a data frame.', 'start': 971.814, 'duration': 8.605}, {'end': 987.644, 'text': 'These three columns can actually be saved to the original data frame in a single assignment statement.', 'start': 981.44, 'duration': 6.204}, {'end': 1002.31, 'text': "Now what if we wanted to split a string, but only keep one of the resulting columns? For example, let's split the location column on comma space.", 'start': 991.963, 'duration': 10.347}, {'end': 1012.276, 'text': 'If we only cared about saving the city name in column zero, we can just select that column and save it to the data frame.', 'start': 1004.211, 'duration': 8.065}, {'end': 1018.6, 'text': 'Thanks to Daniel Kim and Emmanuel Amison for this trick.', 'start': 1015.058, 'duration': 3.542}, {'end': 1024.099, 'text': 'Trick number 17, expand a series of lists into a data frame.', 'start': 1019.477, 'duration': 4.622}, {'end': 1027.541, 'text': "Let's create another example data frame.", 'start': 1025.28, 'duration': 2.261}, {'end': 1034.345, 'text': 'There are two columns and the second column contains regular Python lists of integers.', 'start': 1028.982, 'duration': 5.363}, {'end': 1045.752, 'text': 'If we wanted to expand the second column into its own data frame, we can use the apply method on that column and pass it the series constructor.', 'start': 1035.566, 'duration': 10.186}], 'summary': 'Trick 16: split a string into multiple columns. trick 17: expand a series of lists into a data frame.', 'duration': 88.626, 'max_score': 957.126, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RlIiVeig3hc/pics/RlIiVeig3hc957126.jpg'}, {'end': 1141.707, 'src': 'embed', 'start': 1080.553, 'weight': 5, 'content': [{'end': 1084.154, 'text': "For example, here's the total price of order number one.", 'start': 1080.553, 'duration': 3.601}, {'end': 1095.997, 'text': 'If you wanted to calculate the total price of every order, you would group by order ID and then take the sum of item price for each group.', 'start': 1086.495, 'duration': 9.502}, {'end': 1104.444, 'text': "However, you're not actually limited to aggregating by a single function such as sum.", 'start': 1099.281, 'duration': 5.163}, {'end': 1113.368, 'text': 'To aggregate by multiple functions, you use the ag method and pass it a list of functions such as sum and count.', 'start': 1105.204, 'duration': 8.164}, {'end': 1120.632, 'text': 'This gives us the total price of each order as well as the number of items in each order.', 'start': 1115.109, 'duration': 5.523}, {'end': 1126.875, 'text': 'Trick number 19, combine the output of an aggregation with a data frame.', 'start': 1121.852, 'duration': 5.023}, {'end': 1129.96, 'text': "Let's take another look at the orders data frame.", 'start': 1127.799, 'duration': 2.161}, {'end': 1141.707, 'text': 'What if we wanted to create a new column listing the total price of each order? Recall that we calculated the total price using the sum method.', 'start': 1131.921, 'duration': 9.786}], 'summary': 'Use aggregation to calculate total price and number of items for each order.', 'duration': 61.154, 'max_score': 1080.553, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RlIiVeig3hc/pics/RlIiVeig3hc1080553.jpg'}, {'end': 1203.184, 'src': 'heatmap', 'start': 1158.508, 'weight': 4, 'content': [{'end': 1168.575, 'text': 'The solution is to use the transform method, which performs the same calculation but returns output data that is the same shape as the input data.', 'start': 1158.508, 'duration': 10.067}, {'end': 1174.74, 'text': "We'll store the results in a new DataFrame column called TotalPrice.", 'start': 1170.497, 'duration': 4.243}, {'end': 1183.389, 'text': 'As you can see, the total price of each order is now listed on every single line.', 'start': 1177.665, 'duration': 5.724}, {'end': 1190.434, 'text': 'That makes it easy to calculate the percentage of the total order price that each line represents.', 'start': 1184.19, 'duration': 6.244}, {'end': 1195.718, 'text': 'Thanks to Chris Moffitt for this trick.', 'start': 1193.737, 'duration': 1.981}, {'end': 1200.622, 'text': 'Trick number 20, select a slice of rows and columns.', 'start': 1196.679, 'duration': 3.943}, {'end': 1203.184, 'text': "Let's take a look at another data set.", 'start': 1201.423, 'duration': 1.761}], 'summary': 'Using the transform method to calculate totalprice for each order, facilitating percentage calculation. trick by chris moffitt.', 'duration': 32.687, 'max_score': 1158.508, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RlIiVeig3hc/pics/RlIiVeig3hc1158508.jpg'}, {'end': 1256.283, 'src': 'embed', 'start': 1229.823, 'weight': 3, 'content': [{'end': 1236.588, 'text': 'you can use the loc accessor and pass it a slice of the min through the max row labels.', 'start': 1229.823, 'duration': 6.765}, {'end': 1246.036, 'text': "And if you're not interested in all of the columns, you can also pass it a slice of column labels.", 'start': 1239.932, 'duration': 6.104}, {'end': 1250.919, 'text': 'Thanks to Alexandru Mircea for this trick.', 'start': 1248.097, 'duration': 2.822}, {'end': 1256.283, 'text': 'Trick number 21, reshape a multi-indexed series.', 'start': 1252.34, 'duration': 3.943}], 'summary': 'Use loc accessor to slice min and max row labels, and column labels. trick 21: reshape a multi-indexed series.', 'duration': 26.46, 'max_score': 1229.823, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RlIiVeig3hc/pics/RlIiVeig3hc1229823.jpg'}], 'start': 957.126, 'title': 'Dataframe manipulation', 'summary': 'Demonstrates how to efficiently split strings into multiple columns and explores various dataframe tricks including expanding lists, aggregating by multiple functions, combining the output of an aggregation, and selecting a slice of rows and columns.', 'chapters': [{'end': 1012.276, 'start': 957.126, 'title': 'Splitting strings into multiple columns', 'summary': 'Demonstrates how to split strings into multiple columns using the str.split method to efficiently organize data into separate columns, including examples of splitting names into first, middle, and last name columns and extracting specific parts of a string, such as city names, into new columns.', 'duration': 55.15, 'highlights': ['The chapter showcases the use of the str.split method to split a string into multiple columns, enhancing data organization and analysis.', 'It demonstrates the process of splitting the name column into three separate columns for first, middle, and last names, showcasing efficient data structuring.', 'The example of splitting the location column on a comma space and selectively saving the city name in a new column illustrates the practical application of string splitting for specific data extraction.']}, {'end': 1256.283, 'start': 1015.058, 'title': 'Pandas dataframe tricks and tips', 'summary': 'Explores various dataframe tricks including expanding lists into dataframes, aggregating by multiple functions, combining the output of an aggregation with a dataframe, and selecting a slice of rows and columns from different datasets.', 'duration': 241.225, 'highlights': ['Trick number 18, aggregate by multiple functions', 'Trick number 17, expand a series of lists into a data frame', 'Trick number 19, combine the output of an aggregation with a data frame', 'Trick number 20, select a slice of rows and columns', 'Trick number 21, reshape a multi-indexed series']}], 'duration': 299.157, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RlIiVeig3hc/pics/RlIiVeig3hc957126.jpg', 'highlights': ['The chapter showcases the use of the str.split method to split a string into multiple columns, enhancing data organization and analysis.', 'It demonstrates the process of splitting the name column into three separate columns for first, middle, and last names, showcasing efficient data structuring.', 'The example of splitting the location column on a comma space and selectively saving the city name in a new column illustrates the practical application of string splitting for specific data extraction.', 'Trick number 21, reshape a multi-indexed series', 'Trick number 20, select a slice of rows and columns', 'Trick number 19, combine the output of an aggregation with a data frame', 'Trick number 18, aggregate by multiple functions', 'Trick number 17, expand a series of lists into a data frame']}, {'end': 1656.835, 'segs': [{'end': 1285.678, 'src': 'embed', 'start': 1257.143, 'weight': 3, 'content': [{'end': 1267.65, 'text': 'The Titanic dataset has a survived column made up of ones and zeros, so you can calculate the overall survival rate by taking the mean of that column.', 'start': 1257.143, 'duration': 10.507}, {'end': 1275.614, 'text': 'If you wanted to calculate the survival rate by a single category, such as sex, you would use a group by.', 'start': 1269.491, 'duration': 6.123}, {'end': 1285.678, 'text': 'And if you wanted to calculate the survival rate across two different categories at once, you would group by both of those categories.', 'start': 1277.434, 'duration': 8.244}], 'summary': 'Analyzing titanic dataset: calculate overall survival rate, by single category using group by, and across two categories using group by.', 'duration': 28.535, 'max_score': 1257.143, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RlIiVeig3hc/pics/RlIiVeig3hc1257143.jpg'}, {'end': 1362.86, 'src': 'embed', 'start': 1337.438, 'weight': 0, 'content': [{'end': 1345.665, 'text': 'With a pivot table, you directly specify the index, the columns, the values, and the aggregation function.', 'start': 1337.438, 'duration': 8.227}, {'end': 1354.673, 'text': 'An added benefit of a pivot table is that you can easily add row and column totals by setting margins equal true.', 'start': 1346.886, 'duration': 7.787}, {'end': 1362.86, 'text': 'This shows the overall survival rate as well as the survival rate by sex and passenger class.', 'start': 1357.075, 'duration': 5.785}], 'summary': 'Pivot table allows easy analysis of survival rates by sex and passenger class.', 'duration': 25.422, 'max_score': 1337.438, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RlIiVeig3hc/pics/RlIiVeig3hc1337438.jpg'}, {'end': 1421.014, 'src': 'embed', 'start': 1391.829, 'weight': 4, 'content': [{'end': 1397.475, 'text': "It's currently continuous data, but what if you wanted to convert it into categorical data?", 'start': 1391.829, 'duration': 5.646}, {'end': 1404.742, 'text': 'One solution would be to label the age ranges such as child, young, adult and adult.', 'start': 1398.676, 'duration': 6.066}, {'end': 1408.746, 'text': 'The best way to do this is by using the cut function.', 'start': 1405.663, 'duration': 3.083}, {'end': 1413.167, 'text': 'This assigned each value to a bin with a label.', 'start': 1409.844, 'duration': 3.323}, {'end': 1416.951, 'text': 'Ages 0 to 18 were assigned the label child.', 'start': 1413.968, 'duration': 2.983}, {'end': 1421.014, 'text': 'Ages 18 to 25 were assigned the label young adult.', 'start': 1417.531, 'duration': 3.483}], 'summary': 'Data can be converted to categorical using the cut function, labeling age ranges as child, young adult, and adult.', 'duration': 29.185, 'max_score': 1391.829, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RlIiVeig3hc/pics/RlIiVeig3hc1391829.jpg'}, {'end': 1623.825, 'src': 'embed', 'start': 1567.391, 'weight': 1, 'content': [{'end': 1571.013, 'text': 'Note that there are many more options for how you can style your data frame.', 'start': 1567.391, 'duration': 3.622}, {'end': 1573.694, 'text': 'Thanks to Chris Moffitt for this trick.', 'start': 1571.893, 'duration': 1.801}, {'end': 1577.797, 'text': 'Now for a bonus trick, profile a data frame.', 'start': 1574.475, 'duration': 3.322}, {'end': 1583.822, 'text': "Let's say that you've got a new data set and you want to quickly explore it without too much work.", 'start': 1578.857, 'duration': 4.965}, {'end': 1589.367, 'text': "There's a separate package called Pandas Profiling that is designed for this purpose.", 'start': 1584.522, 'duration': 4.845}, {'end': 1592.61, 'text': 'First, you have to install it with conda or pip.', 'start': 1590.208, 'duration': 2.402}, {'end': 1596.393, 'text': "Once that's done, you import Pandas Profiling.", 'start': 1593.451, 'duration': 2.942}, {'end': 1602.399, 'text': 'Then simply run the profile report function and pass it any data frame.', 'start': 1597.234, 'duration': 5.165}, {'end': 1607.039, 'text': 'It returns an interactive HTML report.', 'start': 1604.238, 'duration': 2.801}, {'end': 1613.841, 'text': 'The first section is an overview of the dataset and a list of possible issues with the data.', 'start': 1608.139, 'duration': 5.702}, {'end': 1619.163, 'text': 'The next section gives a summary of each column.', 'start': 1616.122, 'duration': 3.041}, {'end': 1623.825, 'text': 'You can click toggle details for even more information.', 'start': 1619.843, 'duration': 3.982}], 'summary': 'Pandas profiling package creates interactive html report for data frame, providing overview and column summary.', 'duration': 56.434, 'max_score': 1567.391, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RlIiVeig3hc/pics/RlIiVeig3hc1567391.jpg'}], 'start': 1257.143, 'title': 'Titanic dataset analysis and pandas tricks', 'summary': 'Discusses calculating survival rates by different categories in the titanic dataset and styling a data frame using pandas tricks, including formatting columns, adding visual elements, and using pandas profiling to generate interactive html reports.', 'chapters': [{'end': 1475.508, 'start': 1257.143, 'title': 'Titanic dataset analysis', 'summary': 'Discusses how to calculate the survival rate by different categories, including sex and passenger class, using the titanic dataset. it also covers converting continuous data into categorical data and changing display options.', 'duration': 218.365, 'highlights': ["You can calculate the overall survival rate by taking the mean of the 'survived' column, which consists of ones and zeros.", "It's often more convenient to reshape a multi-indexed series into a data frame by using the unstack method for easier data interaction.", 'Creating a pivot table is a convenient method for analyzing data frames, allowing for easy specification of index, columns, values, and aggregation function, with the added benefit of easily adding row and column totals.', 'Converting continuous data into categorical data can be achieved by using the cut function to assign values to specific bins with labels, resulting in automatic ordering of categories.', 'Changing display options using the set option function allows for standardizing the display format without altering the underlying data.']}, {'end': 1656.835, 'start': 1476.389, 'title': 'Style and profile: pandas tricks', 'summary': 'Demonstrates how to style a data frame in different ways, such as formatting columns, adding visual elements, and hiding the index, as well as how to quickly explore a data frame using pandas profiling, generating an interactive html report with an overview, summary of each column, correlation heatmap, and dataset head.', 'duration': 180.446, 'highlights': ['The chapter demonstrates how to style a data frame by creating a dictionary of format strings to specify formatting for each column, applying additional styling methods, hiding the index, and adding visual elements like background gradients and bar charts within the data frame. Thanks to Jeff Hale and Chris Moffitt for these tricks.', 'The chapter also introduces Pandas Profiling as a tool for quickly exploring a data frame, which generates an interactive HTML report containing an overview, a summary of each column, a correlation heatmap, and the dataset head. Thanks to Siddhartha, Hasma Balazs, and Subham Biswas for this trick.', 'The chapter encourages viewers to share their excitement about using the presented tricks and to mention any favorite pandas tricks not covered in the episode.']}], 'duration': 399.692, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/RlIiVeig3hc/pics/RlIiVeig3hc1257143.jpg', 'highlights': ['Creating a pivot table is a convenient method for analyzing data frames, allowing for easy specification of index, columns, values, and aggregation function, with the added benefit of easily adding row and column totals.', 'The chapter demonstrates how to style a data frame by creating a dictionary of format strings to specify formatting for each column, applying additional styling methods, hiding the index, and adding visual elements like background gradients and bar charts within the data frame. Thanks to Jeff Hale and Chris Moffitt for these tricks.', 'The chapter also introduces Pandas Profiling as a tool for quickly exploring a data frame, which generates an interactive HTML report containing an overview, a summary of each column, a correlation heatmap, and the dataset head. Thanks to Siddhartha, Hasma Balazs, and Subham Biswas for this trick.', "You can calculate the overall survival rate by taking the mean of the 'survived' column, which consists of ones and zeros.", 'Converting continuous data into categorical data can be achieved by using the cut function to assign values to specific bins with labels, resulting in automatic ordering of categories.']}], 'highlights': ["Trick number one: show installed versions. It's important to know the Pandas version you're using, which can be done by typing 'pd.,version,'.", 'The tricks shared in the chapter are the best from five years of teaching pandas, aimed at helping to work faster, write better code, and impress others.', 'The rename method in pandas is the most flexible for renaming columns, allowing for the renaming of any number of columns by passing a dictionary with old and new names, and specifying the axis.', "Creating an example data frame using NumPy's random.rand function is a more efficient method for generating larger data frames than manually passing a dictionary.", 'Using the str.replace method for replacing spaces with underscores in column names is an even better and more efficient method than overwriting the columns attribute of the data frame.', 'You can use the select D types method to select only the numeric columns, including both the int and float columns.', 'Trick number seven involves converting strings to numbers, which is useful for cases where numbers are stored as strings resulting in object columns.', 'You can use the add prefix method or the add suffix method to add a prefix or suffix to all column names.', 'You can reverse the order of rows using the loc accessor and pass it colon colon negative one, and reset the index using the resetIndex method.', 'Using the as type method on the first two columns for data type conversion.', 'By reading in the continent column as the category data type, we further reduce the data frame size to 2.3 kilobytes.', "By only reading in those two columns, we've reduced the data frame size to 13.6 kilobytes.", 'Using glob and the concat function to concatenate rows into a single data frame from multiple CSV files.', 'Utilizing the read clipboard function to quickly read data from Excel or Google sheets into a data frame.', 'Selecting the three largest genres from a movie data frame using value counts and nLargest method.', 'The chapter showcases the use of the str.split method to split a string into multiple columns, enhancing data organization and analysis.', 'It demonstrates the process of splitting the name column into three separate columns for first, middle, and last names, showcasing efficient data structuring.', 'The example of splitting the location column on a comma space and selectively saving the city name in a new column illustrates the practical application of string splitting for specific data extraction.', 'Trick number 21, reshape a multi-indexed series', 'Trick number 20, select a slice of rows and columns', 'Trick number 19, combine the output of an aggregation with a data frame', 'Trick number 18, aggregate by multiple functions', 'Trick number 17, expand a series of lists into a data frame', 'Creating a pivot table is a convenient method for analyzing data frames, allowing for easy specification of index, columns, values, and aggregation function, with the added benefit of easily adding row and column totals.', 'The chapter demonstrates how to style a data frame by creating a dictionary of format strings to specify formatting for each column, applying additional styling methods, hiding the index, and adding visual elements like background gradients and bar charts within the data frame. Thanks to Jeff Hale and Chris Moffitt for these tricks.', 'The chapter also introduces Pandas Profiling as a tool for quickly exploring a data frame, which generates an interactive HTML report containing an overview, a summary of each column, a correlation heatmap, and the dataset head. Thanks to Siddhartha, Hasma Balazs, and Subham Biswas for this trick.', "You can calculate the overall survival rate by taking the mean of the 'survived' column, which consists of ones and zeros.", 'Converting continuous data into categorical data can be achieved by using the cut function to assign values to specific bins with labels, resulting in automatic ordering of categories.']}