title
Python Pandas Tutorial 5: Handle Missing Data: fillna, dropna, interpolate
description
In this tutorial we'll learn how to handle missing data in pandas using the fillna, interpolate, and dropna methods. You can fill missing values with a single value or a dict of per-column values, carry the previous or next valid value forward or backward, or use one of the interpolation methods.
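A minimal sketch of the two fillna() styles described above. It assumes a small hand-built frame whose column names (temperature, windspeed, event) mirror the weather CSV; the values are made up for illustration. fillna() returns a new DataFrame unless inplace=True is passed.

```python
import numpy as np
import pandas as pd

# Made-up rows standing in for the weather CSV (column names assumed).
df = pd.DataFrame(
    {
        "temperature": [32.0, np.nan, 28.0],
        "windspeed": [6.0, 7.0, np.nan],
        "event": ["Rain", np.nan, "Snow"],
    },
    index=pd.to_datetime(["2017-01-01", "2017-01-04", "2017-01-05"]),
)

# A single scalar: every NaN in every column becomes 0.
all_zero = df.fillna(0)

# A dict maps column name -> fill value, so each column gets its own default.
per_column = df.fillna({"temperature": 0, "windspeed": 0, "event": "no event"})

print(per_column)
# df itself is unchanged; both calls returned new DataFrames.
```

The dict form avoids the odd result of a literal 0 in the text-valued event column.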
Topics that are covered in this Python Pandas Video:
0:00 Introduction
2:30 Convert a string column into the date type
3:15 Use the date as the DataFrame index with the set_index() method
4:10 Use the fillna() method on a DataFrame
7:35 Use fillna(method="ffill") on a DataFrame
8:57 Use fillna(method="bfill") on a DataFrame
9:56 The "axis" parameter of fillna()
11:18 The "limit" parameter of fillna()
13:46 Use interpolate() to interpolate missing values in a DataFrame
15:34 interpolate() with method="time"
16:50 dropna(): drop all rows that contain NA values
17:50 The "how" parameter of dropna()
18:33 The "thresh" parameter of dropna()
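A small sketch of the forward-fill, backward-fill, limit, and interpolation options listed above, on a made-up temperature Series indexed by date. It uses Series.ffill() and Series.bfill(), which pandas 2.1 and later recommend over fillna(method=...) (that keyword is deprecated there); on older versions both spellings do the same thing. Note how method="time" weights by the actual gap in days, while the default linear method treats rows as evenly spaced.

```python
import numpy as np
import pandas as pd

# Hypothetical temperatures: Jan 2 and 3 are NaN, and Jan 4-6 are absent entirely.
temps = pd.Series(
    [32.0, np.nan, np.nan, 28.0],
    index=pd.to_datetime(["2017-01-01", "2017-01-02", "2017-01-03", "2017-01-07"]),
)

ffilled = temps.ffill()          # carry 32 forward into both gaps
limited = temps.ffill(limit=1)   # fill at most one consecutive NaN
bfilled = temps.bfill()          # pull 28 backward into both gaps

# Default method="linear" ignores the index and treats rows as evenly spaced.
linear = temps.interpolate()
# method="time" uses the real distance between dates (needs a DatetimeIndex).
by_time = temps.interpolate(method="time")

print(pd.DataFrame({"ffill": ffilled, "ffill_limit1": limited, "bfill": bfilled,
                    "linear": linear, "time": by_time}))
```

On a DataFrame, ffill(axis="columns") fills across the columns of each row instead of down the rows.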
Code link: https://github.com/codebasics/py/tree/master/pandas/5_handling_missing_data_fillna_dropna_interpolate
Do you want to learn technology from me? Check https://codebasics.io/?utm_source=description&utm_medium=yt&utm_campaign=description&utm_id=description for my affordable video courses.
Popular Playlist:
Complete python course: https://www.youtube.com/playlist?list=PLeo1K3hjS3uv5U-Lmlnucd7gqF-3ehIh0
Data science course: https://www.youtube.com/playlist?list=PLeo1K3hjS3us_ELKYSj_Fth2tIEkdKXvV
Machine learning tutorials: https://www.youtube.com/playlist?list=PLeo1K3hjS3uvCeTYTeyfe0-rN5r8zn9rw
Pandas tutorials: https://www.youtube.com/playlist?list=PLeo1K3hjS3uuASpe-1LjfG5f14Bnozjwy
Git github tutorials: https://www.youtube.com/playlist?list=PLeo1K3hjS3usJuxZZUBdjAcilgfQHkRzW
Matplotlib course: https://www.youtube.com/playlist?list=PLeo1K3hjS3uu4Lr8_kro2AqaO6CFYgKOl
Data structures course: https://www.youtube.com/playlist?list=PLeo1K3hjS3uu_n_a__MI_KktGTLYopZ12
Data Science Project - Real Estate Price Prediction: https://www.youtube.com/watch?v=rdfbcdP75KI&list=PLeo1K3hjS3uu7clOTtwsp94PcHbzqpAdg
To download the CSV and code for all tutorials, go to https://github.com/codebasics/py, click the green button to clone or download the entire repository, and then go to the relevant folder to access that specific file.
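Once the CSV is downloaded, loading it with the day column parsed as dates and used as the index might look like the sketch below. The inline CSV text is a stand-in so the example runs on its own; the real file's name and values are assumptions.

```python
import io
import pandas as pd

# Inline stand-in for the downloaded weather CSV; real values will differ.
csv_text = """day,temperature,windspeed,event
2017-01-01,32,6,Rain
2017-01-04,,9,Sunny
2017-01-05,28,,Snow
"""

# parse_dates turns the "day" strings into datetime64 values while reading.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["day"])
print(type(df["day"][0]))  # a pandas Timestamp rather than str

# inplace=True modifies df; without it, set_index returns a new DataFrame.
df.set_index("day", inplace=True)
print(df)
```

Empty fields in the CSV are read as NaN, which is what the fill and drop methods then operate on.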
detail
{'title': 'Python Pandas Tutorial 5: Handle Missing Data: fillna, dropna, interpolate', 'heatmap': [{'end': 199.629, 'start': 127.81, 'weight': 0.777}, {'end': 398.515, 'start': 383.406, 'weight': 0.885}, {'end': 519.38, 'start': 490.659, 'weight': 1}, {'end': 852.207, 'start': 821.931, 'weight': 0.756}, {'end': 1141.308, 'start': 1113.444, 'weight': 0.77}, {'end': 1243.397, 'start': 1202.053, 'weight': 0.798}], 'summary': "Tutorial covers handling missing data in a New York City weather-data CSV file using the fillna, interpolate, and dropna methods in pandas, including the missing second and third of January. It demonstrates converting a string column to a date column, setting it as the index, replacing NA values with meaningful guesses, filling specific values for specific columns using a dictionary, using the forward-fill and backward-fill methods, the 'limit' parameter, and the 'interpolate' method, and reindexing the data frame to insert missing dates.", 'chapters': [{'end': 56.012, 'segs': [{'end': 56.012, 'src': 'embed', 'start': 0.409, 'weight': 0, 'content': [{'end': 6.093, 'text': 'dear friends, in this tutorial, we are going to look at how to handle missing data in pandas.', 'start': 0.409, 'duration': 5.684}, {'end': 14.738, 'text': "now. 
often, when you are downloading data from internet or, let's say, getting it from any other source, it might have missing values,", 'start': 6.093, 'duration': 8.645}, {'end': 17.74, 'text': 'as shown in this csv file.', 'start': 14.738, 'duration': 3.002}, {'end': 26.306, 'text': "this file contains new york city's weather data and you can see that some of these cells are not having any value in it.", 'start': 17.74, 'duration': 8.566}, {'end': 33.31, 'text': 'also, it is missing the data for second and third January.', 'start': 27.046, 'duration': 6.264}, {'end': 45.539, 'text': "okay, so when you're processing this kind of information in pandas, we will see how you can deal with these missing values using fillna,", 'start': 33.31, 'duration': 12.229}, {'end': 48.943, 'text': 'interpolate and dropna methods.', 'start': 45.539, 'duration': 3.404}, {'end': 56.012, 'text': 'I have more tutorials on how to handle missing data, but this is just to start and we are only covering these three methods.', 'start': 48.943, 'duration': 7.069}], 'summary': "Learn how to handle missing data in pandas using the fillna, interpolate, and dropna methods on New York City's weather data.", 'duration': 55.603, 'max_score': 0.409, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/EaGbS7eWSs0/pics/EaGbS7eWSs0409.jpg'}], 'start': 0.409, 'title': 'Handling missing data in pandas', 'summary': 'Covers techniques for handling missing data in a New York City weather-data CSV file using the fillna, interpolate, and dropna methods in pandas, including the missing second and third of January.', 'chapters': [{'end': 56.012, 'start': 0.409, 'title': 'Handling missing data in pandas', 'summary': "Covers how to handle missing data in a CSV file containing New York City's weather data using the fillna, interpolate, and dropna methods in pandas, with a focus on the missing second and third of January.", 'duration': 55.603, 'highlights': ["The tutorial focuses on handling missing data in a 
CSV file containing New York City's weather data.", 'It demonstrates the use of the fillna, interpolate, and dropna methods in pandas to deal with missing values.', 'Specifically, it addresses the missing data for the second and third of January in the weather data CSV file.']}], 'duration': 55.603, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/EaGbS7eWSs0/pics/EaGbS7eWSs0409.jpg', 'highlights': ['It demonstrates the use of the fillna, interpolate, and dropna methods in pandas to deal with missing values.', "The tutorial focuses on handling missing data in a CSV file containing New York City's weather data.", 'Specifically, it addresses the missing data for the second and third of January in the weather data CSV file.']}, {'end': 327.617, 'segs': [{'end': 127.81, 'src': 'embed', 'start': 87.058, 'weight': 0, 'content': [{'end': 106.485, 'text': 'and the first thing we do, as usual, is import pandas as pd and then I will read the CSV file that I just showed you and print the data frame.', 'start': 87.058, 'duration': 19.427}, {'end': 111.426, 'text': 'the star that you are seeing here means it was processing it.', 'start': 106.485, 'duration': 4.941}, {'end': 117.448, 'text': 'so it read this CSV file successfully into a data frame.', 'start': 111.426, 'duration': 6.022}, {'end': 127.81, 'text': 'now, for the purpose of this tutorial, I want to make my day column a date column, so let me show you what I mean by that.', 'start': 117.448, 'duration': 10.362}], 'summary': 'Imported the CSV file into a pandas data frame and printed it.', 'duration': 40.752, 'max_score': 87.058, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/EaGbS7eWSs0/pics/EaGbS7eWSs087058.jpg'}, {'end': 223.675, 'src': 'heatmap', 'start': 127.81, 'weight': 1, 'content': [{'end': 139.945, 'text': "so when you normally read a CSV like this, what it's going to do is it's going to read day as a string column.", 'start': 127.81, 
'duration': 12.135}, {'end': 141.425, 'text': 'You can see it is a string.', 'start': 139.985, 'duration': 1.44}, {'end': 145.647, 'text': "So whatever you're seeing here, this is nothing but a string.", 'start': 141.625, 'duration': 4.022}, {'end': 146.868, 'text': "It's not an Excel file.", 'start': 145.767, 'duration': 1.101}, {'end': 148.128, 'text': "It's a CSV file.", 'start': 146.968, 'duration': 1.16}, {'end': 154.931, 'text': 'So I want to first convert that column into a date column.', 'start': 148.788, 'duration': 6.143}, {'end': 171.841, 'text': 'And for doing that you have to use the parse_dates argument, and in that you can say: parse the day column as a date type.', 'start': 155.691, 'duration': 16.15}, {'end': 177.843, 'text': "okay, and when you do that, let's first print it.", 'start': 171.841, 'duration': 6.002}, {'end': 179.684, 'text': 'you can see that it converted.', 'start': 177.843, 'duration': 1.841}, {'end': 183.066, 'text': 'now, by looking at it, you probably cannot figure out the type.', 'start': 179.684, 'duration': 3.382}, {'end': 188.484, 'text': 'so what I do usually is just So.', 'start': 183.066, 'duration': 5.418}, {'end': 192.426, 'text': 'you can see that now the type is timestamp.', 'start': 188.484, 'duration': 3.942}, {'end': 193.446, 'text': "OK so we're good.", 'start': 192.566, 'duration': 0.88}, {'end': 194.567, 'text': 'All right.', 'start': 193.466, 'duration': 1.101}, {'end': 199.629, 'text': 'So I got the day as a date time column.', 'start': 194.987, 'duration': 4.642}, {'end': 205.032, 'text': 'Now I want to make this an index for my data frame.', 'start': 200.109, 'duration': 4.923}, {'end': 215.785, 'text': 'and in order to do that, you can just say df.set_index with day as your index and inplace equal to true.', 'start': 205.872, 'duration': 9.913}, {'end': 221.432, 'text': "remember, you have to do inplace equal to true, otherwise it's not going to modify the original data frame,", 'start': 215.785, 'duration': 5.647}, 
{'end': 223.675, 'text': 'but instead it will return a new data frame.', 'start': 221.432, 'duration': 2.243}], 'summary': 'Convert string column to date column, then set as index.', 'duration': 28.688, 'max_score': 127.81, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/EaGbS7eWSs0/pics/EaGbS7eWSs0127810.jpg'}, {'end': 327.617, 'src': 'embed', 'start': 249.182, 'weight': 2, 'content': [{'end': 257.75, 'text': "so in this case let's say I want to replace all NA values with some other value.", 'start': 249.182, 'duration': 8.568}, {'end': 266.358, 'text': 'okay. so the first method that we are going to cover is fillna, okay.', 'start': 257.75, 'duration': 8.608}, {'end': 279.544, 'text': 'so what you can do is df.fillna, okay, and in the brackets you can pass the value that you want NA to be replaced with.', 'start': 266.358, 'duration': 13.186}, {'end': 289.588, 'text': "okay, and I'm not going to modify my original data frame, but instead get this back into a new data frame.", 'start': 279.544, 'duration': 10.044}, {'end': 300.389, 'text': 'and when I run it you can see that all these NaN values that it had, it replaced them with zero.', 'start': 289.588, 'duration': 10.801}, {'end': 304.874, 'text': 'you can see that everything that was NA is zero now.', 'start': 300.389, 'duration': 4.485}, {'end': 308.817, 'text': 'okay. 
um, so this is good now.', 'start': 304.874, 'duration': 3.943}, {'end': 315.984, 'text': 'uh, sometimes having zero is probably not the best guess.', 'start': 308.817, 'duration': 7.167}, {'end': 320.554, 'text': 'so you want to come up with a better guess.', 'start': 315.984, 'duration': 4.57}, {'end': 327.617, 'text': 'okay, for example, here, in the case of event, what does zero mean?', 'start': 320.554, 'duration': 7.063}], 'summary': 'Replace NA values with zero or a better guess in a data frame using the fillna method.', 'duration': 78.435, 'max_score': 249.182, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/EaGbS7eWSs0/pics/EaGbS7eWSs0249182.jpg'}], 'start': 56.012, 'title': 'Data analysis in Jupyter Notebook and handling NA values', 'summary': "Discusses reading and manipulating a CSV file in Jupyter Notebook, converting a string column to a date column, and setting it as the index for the data frame. It also covers replacing NA values with meaningful guesses in a data frame and demonstrates the 'fillna' method to replace NA values with zero.", 'chapters': [{'end': 223.675, 'start': 56.012, 'title': 'Data analysis in Jupyter Notebook', 'summary': 'Discusses the process of reading and manipulating a CSV file in Jupyter Notebook, including converting a string column to a date column and setting it as the index for the data frame.', 'duration': 167.663, 'highlights': ['The chapter emphasizes the use of Jupyter notebook for data visualization and demonstrates how to read and process a CSV file, importing it into a data frame.', "It explains the process of converting a string column into a date column using the 'parse_dates' argument and then setting it as the index for the data frame, ensuring the modification is done in place for the original data frame."]}, {'end': 327.617, 'start': 223.675, 'title': 'Handling NA values in data frames', 'summary': "Covers the method of replacing NA values with meaningful guesses in a data frame and 
demonstrates the 'fillna' method to replace NA values with zero.", 'duration': 103.942, 'highlights': ["The 'fillna' method is used to replace NA values in a data frame with a specific value, such as zero.", 'It is important to replace NA values with meaningful guesses, as zero may not always be the best guess for all cases.']}], 'duration': 271.605, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/EaGbS7eWSs0/pics/EaGbS7eWSs056012.jpg', 'highlights': ['The chapter emphasizes the use of Jupyter notebook for data visualization and demonstrates how to read and process a CSV file, importing it into a data frame.', "It explains the process of converting a string column into a date column using the 'parse_dates' argument and then setting it as the index for the data frame, ensuring the modification is done in place for the original data frame.", 'It is important to replace NA values with meaningful guesses, as zero may not always be the best guess for all cases.', "The 'fillna' method is used to replace NA values in a data frame with a specific value, such as zero."]}, {'end': 995.831, 'segs': [{'end': 431.07, 'src': 'heatmap', 'start': 383.406, 'weight': 0, 'content': [{'end': 398.515, 'text': 'I want to say no event, okay, and then print the new data frame. Now, as you can see here, the temperature and wind speed, it replaced with zero,', 'start': 383.406, 'duration': 15.109}, {'end': 399.395, 'text': 'as you can see here.', 'start': 398.515, 'duration': 0.88}, {'end': 403.277, 'text': 'But for the event, now I have no event.', 'start': 399.935, 'duration': 3.342}, {'end': 411.981, 'text': 'OK. So you can just use this dictionary to fill specific values for a specific column.', 'start': 403.717, 'duration': 8.264}, {'end': 417.498, 'text': "But still, I'm not happy with how I handle missing values here.", 'start': 413.334, 'duration': 4.164}, {'end': 425.525, 'text': "Because, see, if you're calculating a mean or something for this temperature, 
then the mean is going to become really horrible.", 'start': 418.058, 'duration': 7.467}, {'end': 431.07, 'text': "And if someone looks at data, he'll think, OK, on first January, it was 32 temperature.", 'start': 425.565, 'duration': 5.505}], 'summary': 'Replacing missing values with zero works, but it distorts calculations such as the mean.', 'duration': 69.941, 'max_score': 383.406, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/EaGbS7eWSs0/pics/EaGbS7eWSs0383406.jpg'}, {'end': 519.38, 'src': 'heatmap', 'start': 460.402, 'weight': 2, 'content': [{'end': 469.265, 'text': 'OK. So whatever was the temperature on the previous day you carry forward, and you do it in a similar way for the other two columns.', 'start': 460.402, 'duration': 8.863}, {'end': 477.388, 'text': 'OK. So for that you can use your fillna method again.', 'start': 469.465, 'duration': 7.923}, {'end': 487.417, 'text': 'OK, but here what you will do is you will say method equal to forward fill.', 'start': 477.868, 'duration': 9.549}, {'end': 490.659, 'text': 'you can specify it by typing ffill.', 'start': 487.417, 'duration': 3.242}, {'end': 500.005, 'text': "ffill means if I have an NA value, then just carry forward the previous day's value.", 'start': 490.659, 'duration': 9.346}, {'end': 505.608, 'text': "okay, so let's print that.", 'start': 500.005, 'duration': 5.603}, {'end': 512.273, 'text': 'okay, cool, now you can see that it just carried forward the value from the previous day.', 'start': 505.608, 'duration': 6.665}, {'end': 519.38, 'text': "So fourth January had an NA value, but now it carried forward first January's value here.", 'start': 512.293, 'duration': 7.087}], 'summary': "Carry forward the previous day's value for temperature and the other two columns using the 'ffill' method.", 'duration': 39.603, 'max_score': 460.402, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/EaGbS7eWSs0/pics/EaGbS7eWSs0460402.jpg'}, {'end': 728.605, 'src': 
'embed', 'start': 696.174, 'weight': 4, 'content': [{'end': 705.727, 'text': "let's say in the case of 7 January I had 32 and it will just copy 32 to both of these missing data points.", 'start': 696.174, 'duration': 9.553}, {'end': 713.534, 'text': "okay now let's say, due to some reason I want to carry forward this value only once.", 'start': 705.727, 'duration': 7.807}, {'end': 717.057, 'text': 'okay, so I want to copy it only here but not here.', 'start': 713.534, 'duration': 3.523}, {'end': 728.605, 'text': "in that case you can specify limit and you can say my limit is 1 as far as copying my valid value to missing values is concerned.", 'start': 717.057, 'duration': 11.548}], 'summary': 'On 7 January, 32 was copied to both missing data points; setting limit=1 carries the value forward only once.', 'duration': 32.431, 'max_score': 696.174, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/EaGbS7eWSs0/pics/EaGbS7eWSs0696174.jpg'}, {'end': 852.207, 'src': 'heatmap', 'start': 821.931, 'weight': 0.756, 'content': [{'end': 823.432, 'text': "so I'm here, I'm pressing B.", 'start': 821.931, 'duration': 1.501}, {'end': 824.953, 'text': "it's creating a new cell for me.", 'start': 823.432, 'duration': 1.521}, {'end': 832.259, 'text': 'okay, so here df.interpolate.', 'start': 824.953, 'duration': 7.306}, {'end': 839.964, 'text': "okay, when you do df.interpolate, it's gonna interpolate the values.", 'start': 832.259, 'duration': 7.705}, {'end': 849.226, 'text': 'so if you look at your new data frame here, you will notice that now for the fourth january, it came up with a better guess,', 'start': 839.964, 'duration': 9.262}, {'end': 852.207, 'text': 'which is a linear interpolation.', 'start': 849.226, 'duration': 2.981}], 'summary': 'df.interpolate() fills missing data using linear interpolation by default.', 'duration': 30.276, 'max_score': 821.931, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/EaGbS7eWSs0/pics/EaGbS7eWSs0821931.jpg'}, {'end': 949.429, 'src': 'embed', 'start': 881.655, 'weight': 3, 'content': [{'end': 883.717, 'text': "so it's somehow coming up with this.", 'start': 881.655, 'duration': 2.062}, {'end': 891.604, 'text': 'it was 33.33.', 'start': 890.003, 'duration': 1.601}, {'end': 898.865, 'text': "so it's using interpolation, linear interpolation, and coming up with these values, okay.", 'start': 891.604, 'duration': 7.261}, {'end': 903.466, 'text': "so again, I'm going to go ahead and check the documentation for interpolate.", 'start': 898.865, 'duration': 4.601}, {'end': 916.834, 'text': "so in the search bar you can type in interpolate and look at the DataFrame.interpolate documentation, and you will notice that if you don't specify anything for method,", 'start': 903.466, 'duration': 13.368}, {'end': 922.077, 'text': 'it is by default linear, but you can use so many other methods.', 'start': 916.834, 'duration': 5.243}, {'end': 927.76, 'text': 'you can use quadratic, cubic and piecewise polynomial.', 'start': 922.077, 'duration': 5.683}, {'end': 933.883, 'text': 'there are so many methods to specify as far as your interpolation is concerned.', 'start': 927.76, 'duration': 6.123}, {'end': 937.965, 'text': "okay, so I'm going to use time now.", 'start': 933.883, 'duration': 4.082}, {'end': 940.566, 'text': "So let's see what time can do for us.", 'start': 938.225, 'duration': 2.341}, {'end': 949.429, 'text': 'So here, before we do that, you will see that using linear interpolation, it came up with the middle value.', 'start': 941.046, 'duration': 8.383}], 'summary': 'Using linear interpolation, the value 33.33 was derived from the data, with options for other interpolation methods available.', 'duration': 67.774, 'max_score': 881.655, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/EaGbS7eWSs0/pics/EaGbS7eWSs0881655.jpg'}], 'start': 327.617, 'title': 
'Handling missing values in pandas dataframe', 'summary': "Explains filling specific values for specific columns using a dictionary in a pandas DataFrame, emphasizing the importance of handling missing values for accurate analysis and decision-making. It also discusses using the forward-fill and backward-fill methods, the 'limit' parameter, and the 'interpolate' method in pandas to make better guesses for missing values, with examples showing the impact of different methods and parameters.", 'chapters': [{'end': 460.201, 'start': 327.617, 'title': 'Pandas dataframe: filling missing values', 'summary': 'Explains how to fill specific values for specific columns using a dictionary in a pandas DataFrame, highlighting the importance of handling missing values for accurate analysis and decision-making.', 'duration': 132.584, 'highlights': ["You can use a dictionary to fill specific values for specific columns in a pandas dataframe, such as replacing NaN values with custom values like 0 for temperature and wind speed, and 'no event' for the event column.", 'Handling missing values is crucial for accurate analysis and decision-making, as incorrect values can significantly distort statistical measures like mean and lead to erroneous conclusions.']}, {'end': 995.831, 'start': 460.402, 'title': 'Data interpolation and fill methods', 'summary': "Discusses using forward fill and backward fill methods to fill missing data in a dataset, along with using the 'limit' parameter and the 'interpolate' method in pandas to make better guesses for missing values, with examples showing the impact of different methods and parameters.", 'duration': 535.429, 'highlights': ['Using forward fill and backward fill methods to fill missing data with examples of impacts on specific dates.', "Demonstrating the use of the 'limit' parameter to cap how many consecutive missing values a valid value is copied into.", "Introducing the 'interpolate' method in pandas to make better guesses for missing values, with a 
focus on linear interpolation and the impact on specific data points.", 'Discussing the use of different interpolation methods such as quadratic, cubic, and piecewise polynomial.']}], 'duration': 668.214, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/EaGbS7eWSs0/pics/EaGbS7eWSs0327617.jpg', 'highlights': ["You can use a dictionary to fill specific values for specific columns in a pandas dataframe, such as replacing NaN values with custom values like 0 for temperature and wind speed, and 'no event' for the event column.", 'Handling missing values is crucial for accurate analysis and decision-making, as incorrect values can significantly distort statistical measures like mean and lead to erroneous conclusions.', 'Using forward fill and backward fill methods to fill missing data with examples of impacts on specific dates.', "Introducing the 'interpolate' method in pandas to make better guesses for missing values, with a focus on linear interpolation and the impact on specific data points.", "Demonstrating the use of the 'limit' parameter to cap how many consecutive missing values a valid value is copied into.", 'Discussing the use of different interpolation methods such as quadratic, cubic, and piecewise polynomial.']}, {'end': 1325.645, 'segs': [{'end': 1087.148, 'src': 'embed', 'start': 1029.454, 'weight': 0, 'content': [{'end': 1041.606, 'text': "and I'm just printing the new data frame so you can see that in my excel sheet, whichever row had any NA value, okay, it dropped all of them.", 'start': 1029.454, 'duration': 12.152}, {'end': 1047.867, 'text': 'so now I got only three rows, which have valid content in all of the columns.', 'start': 1041.606, 'duration': 6.261}, {'end': 1059.09, 'text': 'okay, sometimes you want to drop the row if it has at least one NA, okay.', 'start': 1047.867, 'duration': 11.223}, {'end': 1062.411, 'text': 'so here what it is doing is actually it is doing that.', 'start': 1059.09, 'duration': 3.321}, {'end': 
1066.693, 'text': 'so here, if you have at least one NA, it is dropping it.', 'start': 1062.411, 'duration': 4.282}, {'end': 1070.774, 'text': "but let's say, I want to drop only if it has all NA.", 'start': 1066.693, 'duration': 4.081}, {'end': 1079.145, 'text': 'so for example, I want to drop this row but I still want to preserve these rows because they have at least some data.', 'start': 1070.774, 'duration': 8.371}, {'end': 1083.707, 'text': 'okay, so for that you can use the how parameter.', 'start': 1079.145, 'duration': 4.562}, {'end': 1087.148, 'text': 'you can say how is equal to all.', 'start': 1083.707, 'duration': 3.441}], 'summary': 'Dropped rows with any NA value, leaving only 3 fully valid rows.', 'duration': 57.694, 'max_score': 1029.454, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/EaGbS7eWSs0/pics/EaGbS7eWSs01029454.jpg'}, {'end': 1147.113, 'src': 'heatmap', 'start': 1113.444, 'weight': 2, 'content': [{'end': 1117.207, 'text': 'Okay, now what if I want to go by non-NA values.', 'start': 1113.444, 'duration': 3.763}, {'end': 1129.417, 'text': "so let's say I want to say that if I have at least one non-NA value, then keep that row and drop the other rows.", 'start': 1117.207, 'duration': 12.21}, {'end': 1132.68, 'text': 'so for that you can use the thresh (threshold) parameter.', 'start': 1129.417, 'duration': 3.263}, {'end': 1135.503, 'text': 'you can say thresh equal to one.', 'start': 1133.721, 'duration': 1.782}, {'end': 1141.308, 'text': 'thresh equal to one means if I have at least one non-NA value, then keep the row.', 'start': 1135.503, 'duration': 5.805}, {'end': 1147.113, 'text': 'okay. 
so when you run that, see what happens is again the same result.', 'start': 1141.308, 'duration': 5.805}], 'summary': 'Using thresh=1 keeps rows with at least one non-NA value, giving the same result as how="all".', 'duration': 29.906, 'max_score': 1113.444, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/EaGbS7eWSs0/pics/EaGbS7eWSs01113444.jpg'}, {'end': 1259.707, 'src': 'heatmap', 'start': 1202.053, 'weight': 3, 'content': [{'end': 1208.336, 'text': 'okay, last thing that we want to cover is how do you go about inserting the missing dates?', 'start': 1202.053, 'duration': 6.283}, {'end': 1215.48, 'text': "so i don't have second and third january here and i want to, let's say, insert, uh, those dates.", 'start': 1208.336, 'duration': 7.144}, {'end': 1218.461, 'text': 'so for that you will do something like this.', 'start': 1215.48, 'duration': 2.981}, {'end': 1227.507, 'text': "so here you will create a date range and using the date range, so let's say i have a date range from 1st january to 11 january.", 'start': 1218.461, 'duration': 9.046}, {'end': 1232.03, 'text': 'so 1st january to 11 january, i created a date range.', 'start': 1227.507, 'duration': 4.523}, {'end': 1243.397, 'text': 'so this is your date range and you pass that to date time index and create a date time index and then you do re-indexing in your data frame.', 'start': 1232.03, 'duration': 11.367}, {'end': 1259.707, 'text': "so i'm saying df.reindex, using that index and when you print your data frame again, so you have to do in place equal to true.", 'start': 1243.397, 'duration': 16.31}], 'summary': 'To insert missing dates, create a date range, use date time index, and re-index the data frame.', 'duration': 32.2, 'max_score': 1202.053, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/EaGbS7eWSs0/pics/EaGbS7eWSs01202053.jpg'}], 'start': 996.531, 'title': 'Handling missing data', 'summary': "Discusses techniques for handling missing values 
in dataframes, including dropping rows with missing values using 'dropna' and the 'how' parameter, dropping rows with NA values based on the 'thresh' parameter, reindexing the data frame to insert missing dates, and methods to fill missing values, resulting in a final dataframe with only valid content.", 'chapters': [{'end': 1087.148, 'start': 996.531, 'title': 'Handling missing values in dataframes', 'summary': "Discusses the powerful feature of dropping rows with missing values in a dataframe, demonstrating the use of 'dropna' and the 'how' parameter to handle missing values, resulting in a final dataframe with only valid content in all columns.", 'duration': 90.617, 'highlights': ["The 'dropna' method is used to remove rows with any missing values, resulting in a dataframe with only valid content in all columns.", "The 'how' parameter can be used to specify the condition for dropping rows with missing values, such as dropping rows with at least one missing value or dropping rows with all missing values."]}, {'end': 1325.645, 'start': 1087.148, 'title': 'Handling missing data in python', 'summary': "Covers handling missing data in a data frame, including dropping rows with NA values based on the 'thresh' parameter, reindexing the data frame to insert missing dates, and methods to fill missing values, with examples of how to implement these techniques.", 'duration': 238.497, 'highlights': ['Using the thresh parameter to keep only rows with at least a specified number of non-NA values, such as keeping rows with at least one non-NA value by setting thresh=1.', 'Reindexing the data frame to insert missing dates by creating a date range and passing it to a date time index, followed by reindexing the data frame using the created index.', 'Demonstrating the process of dropping rows based on the number of valid values using the thresh parameter, with examples of how changing the thresh value affects which rows are dropped based on the number of valid values required to 
keep a row.']}], 'duration': 329.114, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/EaGbS7eWSs0/pics/EaGbS7eWSs0996531.jpg', 'highlights': ["The 'dropna' method removes rows with any missing values, resulting in a dataframe with only valid content.", "The 'how' parameter specifies conditions for dropping rows with missing values, such as dropping rows with at least one missing value or all missing values.", 'Using the thresh parameter to keep only rows with at least a specified number of non-NA values, such as keeping rows with at least one non-NA value by setting thresh=1.', 'Reindexing the data frame to insert missing dates by creating a date range and passing it to a date time index, followed by reindexing the data frame using the created index.']}], 'highlights': ["The tutorial focuses on handling missing data in a CSV file containing New York City's weather data.", 'It demonstrates the use of the fillna, interpolate, and dropna methods in pandas to deal with missing values.', "You can use a dictionary to fill specific values for specific columns in a pandas dataframe, such as replacing NaN values with custom values like 0 for temperature and wind speed, and 'no event' for the event column.", 'Handling missing values is crucial for accurate analysis and decision-making, as incorrect values can significantly distort statistical measures like mean and lead to erroneous conclusions.', 'The chapter emphasizes the use of Jupyter notebook for data visualization and demonstrates how to read and process a CSV file, importing it into a data frame.']}
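A hedged sketch of the dropna() options (how, thresh) and of inserting missing dates with reindex(), on a hypothetical four-row frame whose rows hold 3, 2, 1 and 0 non-NaN values. pd.date_range() already returns a DatetimeIndex, so wrapping it in pd.DatetimeIndex() as described in the video is optional. reindex() returns a new DataFrame rather than modifying df, so the result is assigned to a new name.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: rows hold 3, 2, 1 and 0 non-NaN values respectively.
df = pd.DataFrame(
    {
        "temperature": [32.0, np.nan, np.nan, np.nan],
        "windspeed": [6.0, 9.0, 7.0, np.nan],
        "event": ["Rain", "Sunny", np.nan, np.nan],
    },
    index=pd.to_datetime(["2017-01-01", "2017-01-04", "2017-01-05", "2017-01-06"]),
)

any_dropped = df.dropna()            # how="any": drop rows with at least one NaN
all_dropped = df.dropna(how="all")   # drop only rows where every value is NaN
thresh_kept = df.dropna(thresh=2)    # keep rows with at least 2 non-NaN values

# Insert the missing calendar days (Jan 2 and 3) as all-NaN rows.
full_range = pd.date_range("2017-01-01", "2017-01-06")  # already a DatetimeIndex
with_all_dates = df.reindex(full_range)  # returns a new frame; df is unchanged

print(with_all_dates)
```

After reindexing, the new all-NaN rows can be filled with any of the fillna(), ffill(), or interpolate() calls shown earlier.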