title
Graphing/visualization - Data Analysis with Python and Pandas p.2
description
Doing some basic visualizations with our Pandas dataframe in Python with Matplotlib.
Text-based tutorial: https://pythonprogramming.net/graph-visualization-python3-pandas-data-analysis/
Channel membership: https://www.youtube.com/channel/UCfzlCWGWYyIQ0aLC5w48gBQ/join
Discord: https://discord.gg/sentdex
Support the content: https://pythonprogramming.net/support-donate/
Twitter: https://twitter.com/sentdex
Facebook: https://www.facebook.com/pythonprogramming.net/
Twitch: https://www.twitch.tv/sentdex
G+: https://plus.google.com/+sentdex
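The steps covered in the video (parsing dates, sorting, copying before modifying, a rolling mean, and joining one column per region) can be sketched roughly as below. This is a minimal sketch and not the author's exact code: the data is a small synthetic stand-in for the avocado CSV, the column names `Date`, `AveragePrice`, `type`, and `region` are assumed to match the dataset used in the video, and the window is shortened from 25 to 3 so the tiny sample produces output.

```python
import pandas as pd

# Synthetic stand-in for the avocado CSV (hypothetical values).
dates = pd.date_range("2015-01-04", periods=6, freq="7D").strftime("%Y-%m-%d")
rows = []
for region, base in [("Albany", 1.0), ("Boston", 2.0)]:
    for kind, bump in [("conventional", 0.0), ("organic", 0.5)]:
        # Newest rows first, mirroring the reverse order seen in the video.
        for i, date in enumerate(reversed(list(dates))):
            rows.append({"Date": date, "AveragePrice": base + bump + 0.1 * i,
                         "type": kind, "region": region})
df = pd.DataFrame(rows)

# Keep a single type so each region has one row per date; duplicate dates in
# the index are what made the join blow up in the video. Copy so later
# modifications do not raise SettingWithCopyWarning.
df = df[df["type"] == "organic"].copy()

# Parse the date strings so pandas treats them as dates, then sort by column.
df["Date"] = pd.to_datetime(df["Date"])
df.sort_values(by="Date", ascending=True, inplace=True)

WINDOW = 3  # the video uses 25; shortened here for the tiny sample

graph_df = pd.DataFrame()
for region in df["region"].unique():
    region_df = df[df["region"] == region].copy()
    region_df.set_index("Date", inplace=True)
    region_df.sort_index(inplace=True)
    col = f"{region}_price{WINDOW}ma"  # unique column name per region
    region_df[col] = region_df["AveragePrice"].rolling(WINDOW).mean()
    if graph_df.empty:
        graph_df = region_df[[col]]
    else:
        graph_df = graph_df.join(region_df[col])  # joins on the Date index

# The first WINDOW - 1 rows have no full window yet, so they are NaN.
graph_df = graph_df.dropna()
# graph_df.plot(figsize=(8, 5), legend=False)  # requires matplotlib
```

Filtering to one `type` before the loop is the design choice that matters most here: it keeps the `Date` index unique within each region, so `join` has exactly one match per row.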
detail
{'title': 'Graphing/visualization - Data Analysis with Python and Pandas p.2', 'heatmap': [{'end': 1524.691, 'start': 1460.421, 'weight': 1}], 'summary': 'Learn data analysis with python and pandas, focusing on the avocado dataset, addressing visualization issues, converting dates, and using moving average to smooth data. also, sort data frames chronologically, handle data frame modifications, graph multiple regions, reshape data frames, and enhance visual representation for faster data analysis.', 'chapters': [{'end': 239.372, 'segs': [{'end': 126.951, 'src': 'embed', 'start': 1.463, 'weight': 0, 'content': [{'end': 2.384, 'text': 'what is going on everybody.', 'start': 1.463, 'duration': 0.921}, {'end': 6.248, 'text': 'welcome back to another data analysis data science tutorial.', 'start': 2.384, 'duration': 3.864}, {'end': 15.958, 'text': "with python and pandas we're going to be continuing our work with the avocado data set before we leave it for a new and exciting data set in the next tutorial.", 'start': 6.248, 'duration': 9.71}, {'end': 22.024, 'text': 'just a few more things i want to show with the avocado data set, especially just issues you might run into over time.', 'start': 15.958, 'duration': 6.066}, {'end': 24.907, 'text': "so let's go ahead and jump in.", 'start': 22.024, 'duration': 2.883}, {'end': 32.811, 'text': "So the first thing that we're going to do is we're just going to basically recreate where we were import pandas as pd.", 'start': 26.628, 'duration': 6.183}, {'end': 35.172, 'text': 'df equals pd.', 'start': 32.811, 'duration': 2.361}, {'end': 40.734, 'text': 'read pd.read csv data sets.', 'start': 35.172, 'duration': 5.562}, {'end': 44.696, 'text': 'avocado.csv. 
Albany.', 'start': 40.734, 'duration': 3.962}, {'end': 46.517, 'text': 'df equals df.', 'start': 44.696, 'duration': 1.821}, {'end': 55.701, 'text': 'df equals df where df region is albany.', 'start': 47.398, 'duration': 8.303}, {'end': 58.262, 'text': 'not all pandy.', 'start': 55.701, 'duration': 2.561}, {'end': 68.106, 'text': 'albany. df dot set index as Date in place is equal to true.', 'start': 58.262, 'duration': 9.844}, {'end': 71.548, 'text': 'and then finally Albany ADF dot.', 'start': 68.106, 'duration': 3.442}, {'end': 81.154, 'text': "actually we're just gonna do average price Price Dot plot and I didn't look up the command.", 'start': 71.548, 'duration': 9.606}, {'end': 84.436, 'text': "So I'll just run this twice.", 'start': 81.154, 'duration': 3.282}, {'end': 85.497, 'text': 'And there we have it.', 'start': 84.436, 'duration': 1.061}, {'end': 88.799, 'text': "so pretty graph, but there's a few issues that we're having right out of the gate.", 'start': 85.497, 'duration': 3.302}, {'end': 94.83, 'text': 'So The first issue is these dates are running over each other.', 'start': 88.819, 'duration': 6.011}, {'end': 99.932, 'text': "And when you see something like this, it probably means Pandas doesn't actually realize it's a date.", 'start': 94.85, 'duration': 5.082}, {'end': 104.034, 'text': 'So the first thing that I would do is convert that to a date time.', 'start': 100.613, 'duration': 3.421}, {'end': 113.417, 'text': "So when we read in the CSV if you actually have a date, probably the big thing you'd want to do is go ahead and just say df, date,", 'start': 104.114, 'duration': 9.303}, {'end': 117.679, 'text': 'or whatever the column name is, equals pd.to date time.', 'start': 113.417, 'duration': 4.262}, {'end': 119.66, 'text': 'And then df date.', 'start': 119.2, 'duration': 0.46}, {'end': 126.951, 'text': "So this will probably be, I'm trying to think if we've done one in the first one, but I don't think so.", 'start': 122.03, 'duration': 
4.921}], 'summary': 'Continuing avocado data analysis, addressing issues with dates and plotting average prices.', 'duration': 125.488, 'max_score': 1.463, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/DamIIzp41Jg/pics/DamIIzp41Jg1463.jpg'}, {'end': 191.945, 'src': 'embed', 'start': 148.136, 'weight': 1, 'content': [{'end': 151.721, 'text': "And then we're just reassigning that value because this is actually just going to return a bunch of values for us.", 'start': 148.136, 'duration': 3.585}, {'end': 152.622, 'text': 'So pretty cool.', 'start': 151.821, 'duration': 0.801}, {'end': 163.324, 'text': "Anyway, now what if we go to graph that? we can see ah okay, they're nice and slanted, and it appears that Pandas actually knows.", 'start': 153.223, 'duration': 10.101}, {'end': 164.385, 'text': 'hey, these are dates.', 'start': 163.324, 'duration': 1.061}, {'end': 165.827, 'text': 'So, awesome.', 'start': 164.846, 'duration': 0.981}, {'end': 167.048, 'text': "So that's one thing.", 'start': 166.187, 'duration': 0.861}, {'end': 171.472, 'text': 'The next thing is this graph is crazy, crazy busy looking.', 'start': 167.208, 'duration': 4.264}, {'end': 176.176, 'text': 'So the first thing I would think of to smooth it out is to use some sort of moving average.', 'start': 171.852, 'duration': 4.324}, {'end': 182.421, 'text': "And if you don't know what a moving average is, it's like at any point along the way, let's say we're gonna do a 25 moving average,", 'start': 177.797, 'duration': 4.624}, {'end': 184.622, 'text': "So we'll take every point.", 'start': 183.662, 'duration': 0.96}, {'end': 191.945, 'text': "We're going to say, okay, at this point, so in this point and the previous 24 points, what's the average? 
So 25 point average.", 'start': 184.662, 'duration': 7.283}], 'summary': 'Pandas identifies dates, suggests using a 25-point moving average to smooth out a busy graph.', 'duration': 43.809, 'max_score': 148.136, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/DamIIzp41Jg/pics/DamIIzp41Jg148136.jpg'}, {'end': 249.508, 'src': 'embed', 'start': 216.161, 'weight': 2, 'content': [{'end': 218.443, 'text': 'Then you would pass a number for the window.', 'start': 216.161, 'duration': 2.282}, {'end': 226.229, 'text': "And then what kind of rolling? So you could say a rolling sum, or in this case, we're going to say a rolling mean, and then we're going to plot it.", 'start': 219.023, 'duration': 7.206}, {'end': 231.628, 'text': 'cool, except not cool, because this is pretty ugly.', 'start': 227.506, 'duration': 4.122}, {'end': 233.129, 'text': 'something went wrong.', 'start': 231.628, 'duration': 1.501}, {'end': 239.372, 'text': 'so the first thing i would think of when i would see a chart like this is that possibly things are out of order.', 'start': 233.129, 'duration': 6.243}, {'end': 249.508, 'text': "so what if we said albany df dot, let's do head uh 25.", 'start': 239.372, 'duration': 10.136}], 'summary': 'Using rolling mean for data visualization, encountering potential data disorder.', 'duration': 33.347, 'max_score': 216.161, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/DamIIzp41Jg/pics/DamIIzp41Jg216161.jpg'}], 'start': 1.463, 'title': 'Data analysis with python and pandas', 'summary': 'Explores data analysis and data science using python and pandas, focusing on the avocado data set, addressing visualization issues, converting date in pandas, and using moving average to smooth data.', 'chapters': [{'end': 99.932, 'start': 1.463, 'title': 'Avocado data analysis with python and pandas', 'summary': 'Explores data analysis and data science tutorial using python and pandas, focusing on the avocado data set, 
addressing issues with visualizations, specifically regarding the dates running over each other.', 'duration': 98.469, 'highlights': ["The first thing to do is to recreate the import, read the CSV file, filter the data for 'Albany', set the index as Date, and plot the average price, with a focus on addressing issues with the visualizations.", "The tutorial emphasizes the issues with the dates running over each other in the plotted graph, indicating potential problems with Pandas' date recognition.", 'The chapter aims to provide further insights into the avocado data set before transitioning to a new and exciting data set in the next tutorial.']}, {'end': 171.472, 'start': 100.613, 'title': 'Converting date in pandas', 'summary': 'Explains how to convert a column to date time using pd.to_datetime, and discusses reassigning values and graphing the data in pandas.', 'duration': 70.859, 'highlights': ['The chapter demonstrates converting a column to date time using pd.to_datetime and reassigning values in Pandas.', 'It also discusses graphing the data to visualize the dates and the busy appearance of the graph.']}, {'end': 239.372, 'start': 171.852, 'title': 'Using moving average to smooth data', 'summary': 'Discusses using a 25-point moving average to smooth out fluctuations in data, and identifies the process of using a rolling mean as a method for achieving this.', 'duration': 67.52, 'highlights': ['By using a 25-point moving average, we can condense fluctuations and smooth out the data.', 'The process involves using the rolling mean function on the average price data, passing a number for the window, and then plotting the results.', 'Identifying the need to check if the data is out of order when encountering a chart with unusual patterns.']}], 'duration': 237.909, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/DamIIzp41Jg/pics/DamIIzp41Jg1463.jpg', 'highlights': ['The chapter aims to provide further insights into the avocado data set 
before transitioning to a new and exciting data set in the next tutorial.', 'By using a 25-point moving average, we can condense fluctuations and smooth out the data.', 'The process involves using the rolling mean function on the average price data, passing a number for the window, and then plotting the results.', "The first thing to do is to recreate the import, read the CSV file, filter the data for 'Albany', set the index as Date, and plot the average price, with a focus on addressing issues with the visualizations.", "The tutorial emphasizes the issues with the dates running over each other in the plotted graph, indicating potential problems with Pandas' date recognition.", 'The chapter demonstrates converting a column to date time using pd.to_datetime and reassigning values in Pandas.', 'It also discusses graphing the data to visualize the dates and the busy appearance of the graph.', 'Identifying the need to check if the data is out of order when encountering a chart with unusual patterns.']}, {'end': 653.335, 'segs': [{'end': 331.623, 'src': 'embed', 'start': 305.745, 'weight': 0, 'content': [{'end': 311.086, 'text': 'You can also sort data frames by specific columns, and that will also adjust the index, just for the record.', 'start': 305.745, 'duration': 5.341}, {'end': 313.507, 'text': 'We do get a little bit of a warning here.', 'start': 311.967, 'duration': 1.54}, {'end': 316.888, 'text': 'I will talk about that warning in a little bit.', 'start': 314.887, 'duration': 2.001}, {'end': 322.129, 'text': 'So all I want to do now is actually just sort that and then graph that again.', 'start': 317.568, 'duration': 4.561}, {'end': 326.45, 'text': 'Okay, and now we get a pretty graph.', 'start': 322.909, 'duration': 3.541}, {'end': 327.951, 'text': "And I guess the warning doesn't show..", 'start': 326.57, 'duration': 1.381}, {'end': 331.623, 'text': "I don't know.", 'start': 331.363, 'duration': 0.26}], 'summary': 'Demonstrates sorting data frames, warning 
received, and successful graph creation.', 'duration': 25.878, 'max_score': 305.745, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/DamIIzp41Jg/pics/DamIIzp41Jg305745.jpg'}, {'end': 624.806, 'src': 'embed', 'start': 595.608, 'weight': 1, 'content': [{'end': 600.431, 'text': "Because later you might be doing things and you might be iterating, and then you'll see, like hundreds of these warnings.", 'start': 595.608, 'duration': 4.823}, {'end': 605.375, 'text': "So one way that you can do it is by telling pandas, I get it, it's a copy.", 'start': 600.672, 'duration': 4.703}, {'end': 609.061, 'text': 'so how can we do that?', 'start': 607.481, 'duration': 1.58}, {'end': 610.542, 'text': "well, i'm so glad.", 'start': 609.061, 'duration': 1.481}, {'end': 620.065, 'text': 'you asked john, albany df equals df.copy and now what that does is just return to us a new copy of a data frame,', 'start': 610.542, 'duration': 9.523}, {'end': 624.806, 'text': "because later we could modify that data frame and it won't impact albany df.", 'start': 620.065, 'duration': 4.741}], 'summary': 'Use df.copy in pandas to avoid modifying original data frame, preventing hundreds of warnings.', 'duration': 29.198, 'max_score': 595.608, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/DamIIzp41Jg/pics/DamIIzp41Jg595608.jpg'}], 'start': 239.372, 'title': 'Sorting and manipulating data frames', 'summary': "Covers sorting data frames using 'sort_index' to arrange dates chronologically, ensuring successful graph representation, and discusses copying columns, handling warnings about modifying data frames, and methods to avoid them, including using df.copy function.", 'chapters': [{'end': 352.221, 'start': 239.372, 'title': 'Sorting data frames for proper visualization', 'summary': "Explores the process of sorting data frames to ensure proper visualization, demonstrating the use of 'sort_index' to arrange the dates in chronological order and 
resolve issues with reverse chronological order, leading to a successful graph representation.", 'duration': 112.849, 'highlights': ["The process begins with a visual inspection of the data using 'head' to identify the issue of reverse order and then proceeds with sorting the index using 'albanydf.sort_index' to ensure proper chronological order.", 'Sorting the data frame results in a successful graph representation with a noticeable difference in the distribution of data compared to the previous incorrect graph representation.']}, {'end': 653.335, 'start': 352.741, 'title': 'Data frame manipulation and warnings', 'summary': 'Discusses copying a column, handling warnings about modifying data frames, and methods to avoid them, including using df.copy function to create a new copy of the data frame and reference it without impacting the original data frame.', 'duration': 300.594, 'highlights': ['The chapter discusses copying a column to the data frame and the need to handle warnings about modifying data frames, including the importance of avoiding potential issues when modifying data frames. The chapter covers copying a column to the data frame and the need to handle warnings about modifying data frames, including the importance of avoiding potential issues when modifying data frames.', 'The warning from Pandas advises to use df.copy to create a new copy of the data frame and reference it without impacting the original data frame, as modifying a forked data frame can lead to unexpected changes or non-changes in values. 
The warning from Pandas advises to use df.copy to create a new copy of the data frame and reference it without impacting the original data frame, as modifying a forked data frame can lead to unexpected changes or non-changes in values.', 'Methods to avoid the warning include using df.copy function to create a new copy of the data frame and reference it without impacting the original data frame, allowing for modifications without affecting the original data frame. Methods to avoid the warning include using df.copy function to create a new copy of the data frame and reference it without impacting the original data frame, allowing for modifications without affecting the original data frame.']}], 'duration': 413.963, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/DamIIzp41Jg/pics/DamIIzp41Jg239372.jpg', 'highlights': ['Sorting the data frame ensures proper chronological order for successful graph representation.', 'The warning from Pandas advises to use df.copy to create a new copy of the data frame and reference it without impacting the original data frame.']}, {'end': 1224.782, 'segs': [{'end': 829.694, 'src': 'embed', 'start': 792.903, 'weight': 0, 'content': [{'end': 794.804, 'text': 'And then we could convert it to a list.', 'start': 792.903, 'duration': 1.901}, {'end': 800.433, 'text': 'And then we could iterate over that list.', 'start': 798.752, 'duration': 1.681}, {'end': 802.255, 'text': 'So, okay.', 'start': 801.374, 'duration': 0.881}, {'end': 808.08, 'text': "The problem is, that's messy.", 'start': 805.218, 'duration': 2.862}, {'end': 810.842, 'text': "And there's a simpler way.", 'start': 809.501, 'duration': 1.341}, {'end': 815.146, 'text': "So a lot of times in pandas, there's just an easier way than whatever we're doing.", 'start': 811.363, 'duration': 3.783}, {'end': 816.487, 'text': 'So in this case, there is.', 'start': 815.206, 'duration': 1.281}, {'end': 820.01, 'text': 'It would be dfregion.unique.', 'start': 
816.507, 'duration': 3.503}, {'end': 829.694, 'text': 'Done. So in this way we can get all the unique values and then we could actually iterate over these.', 'start': 822.967, 'duration': 6.727}], 'summary': 'In pandas, df.region.unique() can retrieve all unique values for iteration.', 'duration': 36.791, 'max_score': 792.903, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/DamIIzp41Jg/pics/DamIIzp41Jg792903.jpg'}, {'end': 873.713, 'src': 'embed', 'start': 846.895, 'weight': 1, 'content': [{'end': 853.661, 'text': 'What you write may end up with the exact same outcome, but pandas is likely, pandas is using C++.', 'start': 846.895, 'duration': 6.766}, {'end': 855.683, 'text': "You're gonna be using Python, which is slow.", 'start': 854.102, 'duration': 1.581}, {'end': 857.104, 'text': 'So yeah.', 'start': 856.844, 'duration': 0.26}, {'end': 859.787, 'text': 'Okay, df region, unique, cool.', 'start': 857.625, 'duration': 2.162}, {'end': 860.948, 'text': 'So we have the unique values.', 'start': 859.827, 'duration': 1.121}, {'end': 870.016, 'text': "Now, how might we graph every single one? So on matplotlib, basically you've got this canvas in the background.", 'start': 861.088, 'duration': 8.928}, {'end': 873.713, 'text': 'and then anytime you plot something, it just kind of adds.', 'start': 871.172, 'duration': 2.541}], 'summary': 'Hand-written Python loops are likely slower than pandas built-ins, which run compiled code. 
graphing unique values on matplotlib adds to the canvas in the background.', 'duration': 26.818, 'max_score': 846.895, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/DamIIzp41Jg/pics/DamIIzp41Jg846895.jpg'}, {'end': 915.275, 'src': 'embed', 'start': 883.999, 'weight': 2, 'content': [{'end': 902.072, 'text': 'our task here is these regions are values in rows and instead what we wanna do is actually almost like reshape our data frame to be a data frame where the columns are the regions and the rows.', 'start': 883.999, 'duration': 18.073}, {'end': 912.435, 'text': "well, the column headers are the regions and the values of those columns are the, let's say, the 25 moving average, and then the index is date.", 'start': 902.072, 'duration': 10.363}, {'end': 915.275, 'text': 'So we just kinda wanna restructure it, basically.', 'start': 913.315, 'duration': 1.96}], 'summary': 'Restructure data frame with regions as columns and 25 moving average as values.', 'duration': 31.276, 'max_score': 883.999, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/DamIIzp41Jg/pics/DamIIzp41Jg883999.jpg'}, {'end': 1138.724, 'src': 'embed', 'start': 1110.857, 'weight': 3, 'content': [{'end': 1113.579, 'text': "you have two data frames and they're indexed in the exact same way.", 'start': 1110.857, 'duration': 2.722}, {'end': 1120.443, 'text': 'Then you can just call, you can say the first DF dot join the second DF, and then they will be joined on their index.', 'start': 1114.059, 'duration': 6.384}, {'end': 1129.749, 'text': "So, um, the one problem we're going to have is we've got many regions, like, I don't know, 50 or something.", 'start': 1121.624, 'duration': 8.125}, {'end': 1133.351, 'text': 'Uh, and they all have the same column name.', 'start': 1130.329, 'duration': 3.022}, {'end': 1135.123, 'text': "that's gonna be a problem.", 'start': 1134.102, 'duration': 1.021}, {'end': 1138.724, 'text': 'So, instead of calling them all 
the same,', 'start': 1135.683, 'duration': 3.041}], 'summary': 'Join two data frames on their index, facing issue with multiple regions having the same column name.', 'duration': 27.867, 'max_score': 1110.857, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/DamIIzp41Jg/pics/DamIIzp41Jg1110857.jpg'}, {'end': 1224.782, 'src': 'embed', 'start': 1190.763, 'weight': 4, 'content': [{'end': 1194.045, 'text': "Right And then it's slowly starts grinding.", 'start': 1190.763, 'duration': 3.282}, {'end': 1199.488, 'text': "So, um, so yeah, that's relatively interesting.", 'start': 1195.326, 'duration': 4.162}, {'end': 1203.351, 'text': 'So, um, I was trying to figure out what the heck is happening.', 'start': 1200.669, 'duration': 2.682}, {'end': 1205.332, 'text': "Cause this is a logic that I've used.", 'start': 1203.371, 'duration': 1.961}, {'end': 1208.415, 'text': "for years and I've never had this issue.", 'start': 1205.834, 'duration': 2.581}, {'end': 1216.04, 'text': 'And if I watch the processes, like if I actually open up like task manager or something, this is exploding RAM.', 'start': 1209.056, 'duration': 6.984}, {'end': 1224.782, 'text': "now, what i don't know is why it explodes ram.", 'start': 1220.14, 'duration': 4.642}], 'summary': 'Issue with logic causing ram explosion in the process.', 'duration': 34.019, 'max_score': 1190.763, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/DamIIzp41Jg/pics/DamIIzp41Jg1190763.jpg'}], 'start': 653.475, 'title': 'Graphing multiple regions using pandas, reshaping data frame for plotting, and data frame join and ram issue', 'summary': "Covers graphing multiple regions, setting index, sorting index, iterating over regions, obtaining unique values for graphing, and emphasizes using pandas for data manipulation. 
it also discusses reshaping a data frame to have regions as columns and values, using python's pandas library, and addressing ram issues during data frame joins.", 'chapters': [{'end': 873.713, 'start': 653.475, 'title': 'Graphing multiple regions using pandas', 'summary': 'Discusses graphing multiple regions using pandas, including setting index, sorting index, iterating over regions, and obtaining unique values for graphing, emphasizing the efficiency of using pandas over python for data manipulation.', 'duration': 220.238, 'highlights': ['The chapter emphasizes the ease of obtaining unique values for graphing using dfregion.unique, which simplifies the process of iterating over regions.', 'The transcript highlights the use of pandas for data manipulation due to its efficiency compared to Python, as pandas is likely using C++ while Python is slow.', 'The speaker demonstrates the process of obtaining unique values for graphing by converting the data to a set and then a list, emphasizing the messy nature of this approach compared to using dfregion.unique.', 'The chapter emphasizes the importance of referring to the Pandas documentation to explore available functionalities before resorting to Python, as keeping operations in Pandas is likely faster than writing custom Python code.']}, {'end': 1025.011, 'start': 873.713, 'title': 'Reshaping data frame for plotting', 'summary': "Discusses reshaping a data frame to have regions as columns and the 25 moving average as values, and iterating over the regions to accomplish this, using python's pandas library.", 'duration': 151.298, 'highlights': ['The task is to reshape the data frame so that the columns represent the regions and the values are the 25 moving average, with the date as the index.', 'The code iterates over the regions using the unique function and creates a new data frame for each region using the copy method.', "The chapter uses Python's pandas library to achieve the restructuring of the data frame for 
plotting."]}, {'end': 1224.782, 'start': 1025.09, 'title': 'Data frame join and ram issue', 'summary': 'Discusses creating a data frame, joining multiple data frames on an index, using f strings to give unique column names, and encountering a ram issue during the process.', 'duration': 199.692, 'highlights': ['The process involves creating a data frame and joining multiple data frames on an index, potentially with unique column names using F strings.', 'The speaker encounters a RAM issue during the process, observing the RAM usage exploding without understanding the cause.', 'The speaker runs the code and observes a slowdown in the process, indicating potential performance issues with the data frame join operation.', 'The speaker discusses the logic used for years and expresses confusion about the sudden RAM explosion, suggesting a need for troubleshooting and optimization.']}], 'duration': 571.307, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/DamIIzp41Jg/pics/DamIIzp41Jg653475.jpg', 'highlights': ['The chapter emphasizes the ease of obtaining unique values for graphing using dfregion.unique, simplifying the process of iterating over regions.', 'The chapter emphasizes using pandas for data manipulation due to its efficiency compared to Python, as pandas is likely using C++ while Python is slow.', 'The task is to reshape the data frame so that the columns represent the regions and the values are the 25 moving average, with the date as the index.', 'The process involves creating a data frame and joining multiple data frames on an index, potentially with unique column names using F strings.', 'The speaker encounters a RAM issue during the process, observing the RAM usage exploding without understanding the cause.']}, {'end': 1662.499, 'segs': [{'end': 1325.635, 'src': 'embed', 'start': 1263.822, 'weight': 0, 'content': [{'end': 1265.183, 'text': 'So you have two different types.', 'start': 1263.822, 'duration': 1.361}, {'end': 1276.89, 
'text': "So if we said DF, type, dot, let's just do unique again.", 'start': 1265.243, 'duration': 11.647}, {'end': 1281.151, 'text': "there's actually two types: conventional and organic.", 'start': 1276.89, 'duration': 4.261}, {'end': 1284.152, 'text': "this is what's causing a problem, because we've got duplicate dates.", 'start': 1281.151, 'duration': 3.001}, {'end': 1287.173, 'text': "so it's expected that our index is the same.", 'start': 1284.152, 'duration': 3.021}, {'end': 1292.415, 'text': 'but the problem is, will we wind up having multiple dates all the same?', 'start': 1287.173, 'duration': 5.242}, {'end': 1301.977, 'text': 'so, for example, if I say graph, DF, dot, tail and run that, Well these are all the same date.', 'start': 1292.415, 'duration': 9.562}, {'end': 1311.464, 'text': 'And so when we go to actually join, Pandas is looking for, okay, where should we join this? Well, it has many indexes that are identical.', 'start': 1302.538, 'duration': 8.926}, {'end': 1314.106, 'text': "So it's like, I don't know what to do.", 'start': 1312.565, 'duration': 1.541}, {'end': 1325.635, 'text': "So again, I don't know the underlying reason why RAM is exploding so much, but I know why, and it's because of the date.", 'start': 1316.628, 'duration': 9.007}], 'summary': 'The issue arises from duplicate dates causing problems with joining and indexing in pandas, leading to RAM explosion.', 'duration': 61.813, 'max_score': 1263.822, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/DamIIzp41Jg/pics/DamIIzp41Jg1263822.jpg'}, {'end': 1442.191, 'src': 'embed', 'start': 1375.877, 'weight': 3, 'content': [{'end': 1382.799, 'text': "but i'm just going to say df equals df.copy, otherwise it's going to be like that's exactly what it sounds like.", 'start': 1375.877, 'duration': 6.922}, {'end': 1390.402, 'text': "by the way, df type equals and then we'll just go with organic.", 'start': 1382.799, 'duration': 7.603}, {'end': 1399.165, 'text': "OK, 
then we set the date and then let's do a DF.", 'start': 1392.242, 'duration': 6.923}, {'end': 1403.066, 'text': 'So, for example, do you have to.', 'start': 1399.865, 'duration': 3.201}, {'end': 1407.528, 'text': 'So while while I have you guys, I can just say DF dot sort values.', 'start': 1403.086, 'duration': 4.442}, {'end': 1414.423, 'text': "And we're going to say df.sortValues by equals date.", 'start': 1408.72, 'duration': 5.703}, {'end': 1419.626, 'text': "And then we're going to say ascending equals true.", 'start': 1415.484, 'duration': 4.142}, {'end': 1423.729, 'text': "I believe by normally it's actually false, I think is the default.", 'start': 1419.686, 'duration': 4.043}, {'end': 1425.95, 'text': "And then we'll say in place equals true.", 'start': 1424.249, 'duration': 1.701}, {'end': 1430.693, 'text': 'So this is just how you would sort a data frame by a certain value.', 'start': 1426.43, 'duration': 4.263}, {'end': 1436.936, 'text': 'So rather than, because before I showed you sort index, but I said, hey, you can sort by columns too.', 'start': 1430.733, 'duration': 6.203}, {'end': 1438.197, 'text': "Here's how you do it.", 'start': 1437.537, 'duration': 0.66}, {'end': 1442.191, 'text': "And then let's do df.head.", 'start': 1439.349, 'duration': 2.842}], 'summary': 'Demonstrating data frame sorting and manipulation in python with examples.', 'duration': 66.314, 'max_score': 1375.877, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/DamIIzp41Jg/pics/DamIIzp41Jg1375877.jpg'}, {'end': 1524.691, 'src': 'heatmap', 'start': 1460.421, 'weight': 1, 'content': [{'end': 1466.145, 'text': 'So let me take this, come up here, copy, come down here.', 'start': 1460.421, 'duration': 5.724}, {'end': 1473.76, 'text': 'Do you have sort values? 
Should be good there.', 'start': 1468.086, 'duration': 5.674}, {'end': 1476.607, 'text': "Let's just run that real quick.", 'start': 1473.78, 'duration': 2.827}, {'end': 1477.369, 'text': 'Let me make sure.', 'start': 1476.948, 'duration': 0.421}, {'end': 1480.798, 'text': 'Okay I think so.', 'start': 1480.017, 'duration': 0.781}, {'end': 1488.463, 'text': "I mean, that was much quicker, right? Uh, so now I'm going to get rid of this, uh, thing that limits us graph DF.", 'start': 1481.098, 'duration': 7.365}, {'end': 1494.107, 'text': "And then let's just, let me, while at the end here, let me just say graph DF dot tail.", 'start': 1488.603, 'duration': 5.504}, {'end': 1496.989, 'text': 'So run those.', 'start': 1495.347, 'duration': 1.642}, {'end': 1499.33, 'text': "Yeah It's definitely going through that like really fast.", 'start': 1497.109, 'duration': 2.221}, {'end': 1505.308, 'text': 'Okay So now finally, we can plot this.', 'start': 1500.431, 'duration': 4.877}, {'end': 1510.396, 'text': 'So first, you know, the first thing that we might say or try to do is graph df.plot.', 'start': 1505.368, 'duration': 5.028}, {'end': 1524.691, 'text': "There we go Two issues one is the legend is just too crazy The other thing is the graph isn't quite yet big enough.", 'start': 1514.447, 'duration': 10.244}], 'summary': 'Data manipulation and visualization performed to improve graph appearance and processing speed.', 'duration': 64.27, 'max_score': 1460.421, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/DamIIzp41Jg/pics/DamIIzp41Jg1460421.jpg'}, {'end': 1568.169, 'src': 'embed', 'start': 1538.356, 'weight': 5, 'content': [{'end': 1541.738, 'text': "but with pandas like, for example, dot plot, that's some magic.", 'start': 1538.356, 'duration': 3.382}, {'end': 1544.1, 'text': 'we never even imported matplotlib.', 'start': 1541.738, 'duration': 2.362}, {'end': 1549.786, 'text': "so you can either just use what they give you, you don't have to 
put up with this,", 'start': 1544.1, 'duration': 5.686}, {'end': 1553.969, 'text': 'or you can make custom matplotlib graphs if you want to learn more about matplotlib.', 'start': 1549.786, 'duration': 4.183}, {'end': 1558.013, 'text': "you told i've got like a total super long series on.", 'start': 1553.969, 'duration': 4.044}, {'end': 1560.175, 'text': 'uh, do data analysis?', 'start': 1558.013, 'duration': 2.162}, {'end': 1560.756, 'text': 'data viz.', 'start': 1560.175, 'duration': 0.581}, {'end': 1563.848, 'text': 'Maybe I thought I clicked it.', 'start': 1562.628, 'duration': 1.22}, {'end': 1564.808, 'text': 'Anyway, there we go.', 'start': 1564.148, 'duration': 0.66}, {'end': 1568.169, 'text': 'And you can go through this tutorial.', 'start': 1565.569, 'duration': 2.6}], 'summary': 'Pandas offers magic like dot plot without importing matplotlib. options for default or custom matplotlib graphs are available for data visualization and analysis.', 'duration': 29.813, 'max_score': 1538.356, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/DamIIzp41Jg/pics/DamIIzp41Jg1538356.jpg'}, {'end': 1620.262, 'src': 'embed', 'start': 1590.899, 'weight': 6, 'content': [{'end': 1593.821, 'text': 'Beautiful So this is all of them compared to each other.', 'start': 1590.899, 'duration': 2.922}, {'end': 1598.483, 'text': 'Yes, it might be useful to have the legend actually, but it is what it is.', 'start': 1593.961, 'duration': 4.522}, {'end': 1601.985, 'text': 'Why do we have this empty gap? 
We talked about it already.', 'start': 1598.703, 'duration': 3.282}, {'end': 1605.947, 'text': "It's because of the NA values.", 'start': 1602.826, 'duration': 3.121}, {'end': 1611.09, 'text': 'So we can actually say dropna and then plot it.', 'start': 1606.588, 'duration': 4.502}, {'end': 1613.992, 'text': "And then now you don't have that gap there.", 'start': 1612.551, 'duration': 1.441}, {'end': 1620.262, 'text': 'Anyway, Um, I think that is a stopping point.', 'start': 1615.253, 'duration': 5.009}], 'summary': 'Comparing data, addressing NA values, and improving visualization.', 'duration': 29.363, 'max_score': 1590.899, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/DamIIzp41Jg/pics/DamIIzp41Jg1590899.jpg'}], 'start': 1224.782, 'title': 'Data frame date issue, sorting, iteration, and data visualization tutorial', 'summary': 'Covers identifying data frame date issue, sorting and iterating through a pandas dataframe, and exploring pandas data visualization, leading to enhanced visual representation and faster data analysis.', 'chapters': [{'end': 1325.635, 'start': 1224.782, 'title': 'Data frame date issue', 'summary': 'The chapter discusses identifying a data frame issue where multiple identical dates are causing problems with joining, leading to a RAM explosion, and the underlying reason for the issue is identified as the date.', 'duration': 100.853, 'highlights': ['Identifying the data frame issue with multiple identical dates causing problems with joining, leading to RAM explosion.', 'Explaining the underlying reason for the issue, which is identified as the date causing duplicate indexes.', 'Noting the presence of two different types (conventional and organic) in the data frame, leading to duplicate dates.']}, {'end': 1480.798, 'start': 1326.437, 'title': 'Pandas dataframe sorting and iteration', 'summary': 'The chapter demonstrates sorting and iterating through a pandas dataframe, including examples of using df.sort_values to sort by a specific 
column and iterating through the dataframe to perform operations.', 'duration': 154.361, 'highlights': ["The chapter covers sorting a Pandas DataFrame using df.sort_values by a specific column, with an example of sorting by the 'date' column in ascending order.", 'The speaker calls df.head to display the first few rows of the DataFrame.', 'The chapter includes examples of using df.copy to create a copy of the DataFrame and setting the index of the DataFrame.', "The speaker demonstrates the usage of df.sort_values to sort the DataFrame by a specific column, providing insights into the parameters like 'ascending' and 'inplace'."]}, {'end': 1662.499, 'start': 1481.098, 'title': 'Pandas data visualization tutorial', 'summary': 'The chapter explores pandas data visualization, showcasing how to plot graphs, customize charts, and handle NA values, leading to faster data analysis and enhanced visual representation.', 'duration': 181.401, 'highlights': ['Pandas data visualization tutorial covers plot customization, including adjusting graph size and removing legends to enhance visual representation and improve data analysis.', 'The tutorial demonstrates the handling of NA values in the graph by dropping them, resulting in a more accurate visual representation and improved data analysis.', 'The chapter introduces the efficient utilization of Pandas for data visualization, highlighting the speed of data processing and the ease of creating visualizations without importing matplotlib.']}], 'duration': 437.717, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/DamIIzp41Jg/pics/DamIIzp41Jg1224782.jpg', 'highlights': ['Identifying the data frame issue with multiple identical dates causing problems with joining, leading to RAM explosion.', 'Explaining the underlying reason for the issue, which is identified as the date causing duplicate indexes.', 'Noting the presence of two different types (conventional and organic) in the data 
frame, leading to duplicate dates.', "The chapter covers sorting a Pandas DataFrame using df.sort_values by a specific column, with an example of sorting by the 'date' column in ascending order.", "The speaker demonstrates the usage of df.sort_values to sort the DataFrame by a specific column, providing insights into the parameters like 'ascending' and 'inplace'.", 'Pandas data visualization tutorial covers plot customization, including adjusting graph size and removing legends to enhance visual representation and improve data analysis.', 'The tutorial demonstrates the handling of NA values in the graph by dropping them, resulting in a more accurate visual representation and improved data analysis.', 'The chapter introduces the efficient utilization of Pandas for data visualization, highlighting the speed of data processing and the ease of creating visualizations without importing matplotlib.', 'The speaker calls df.head to display the first few rows of the DataFrame.', 'The chapter includes examples of using df.copy to create a copy of the DataFrame and setting the index of the DataFrame.']}], 'highlights': ['The process involves using the rolling mean function on the average price data, passing a number for the window, and then plotting the results.', 'By using a 25-point moving average, we can condense fluctuations and smooth out the data.', "The tutorial emphasizes the issues with the dates running over each other in the plotted graph, indicating potential problems with Pandas' date recognition.", 'The chapter aims to provide further insights into the avocado data set before transitioning to a new and exciting data set in the next tutorial.', "The first thing to do is to recreate the import, read the CSV file, filter the data for 'Albany', set the index as Date, and plot the average price, with a focus on addressing issues with the visualizations.", 'The chapter demonstrates converting a column to datetime using pd.to_datetime 
and reassigning values in Pandas.', 'It also discusses graphing the data to visualize the dates and the busy appearance of the graph.', 'Identifying the need to check if the data is out of order when encountering a chart with unusual patterns.', 'Sorting the data frame ensures proper chronological order for successful graph representation.', 'To avoid the warning from Pandas, use df.copy to create a new copy of the data frame and modify it without impacting the original data frame.', 'The chapter emphasizes the ease of obtaining unique values for graphing using df["region"].unique(), simplifying the process of iterating over regions.', 'The chapter emphasizes using pandas for data manipulation due to its efficiency compared to Python, as pandas relies on compiled C/Cython code while pure Python loops are slow.', 'The task is to reshape the data frame so that the columns represent the regions and the values are the 25-point moving average, with the date as the index.', 'The process involves creating a data frame and joining multiple data frames on an index, potentially with unique column names using f-strings.', 'The speaker encounters a RAM issue during the process, observing the RAM usage exploding without understanding the cause.', 'Identifying the data frame issue with multiple identical dates causing problems with joining, leading to RAM explosion.', 'Explaining the underlying reason for the issue, which is identified as the date causing duplicate indexes.', 'Noting the presence of two different types (conventional and organic) in the data frame, leading to duplicate dates.', "The chapter covers sorting a Pandas DataFrame using df.sort_values by a specific column, with an example of sorting by the 'date' column in ascending order.", "The speaker demonstrates the usage of df.sort_values to sort the DataFrame by a specific column, providing insights into the parameters like 'ascending' and 'inplace'.", 'Pandas data visualization tutorial covers plot customization, including adjusting graph 
size and removing legends to enhance visual representation and improve data analysis.', 'The tutorial demonstrates the handling of NA values in the graph by dropping them, resulting in a more accurate visual representation and improved data analysis.', 'The chapter introduces the efficient utilization of Pandas for data visualization, highlighting the speed of data processing and the ease of creating visualizations without importing matplotlib.', 'The speaker calls df.head to display the first few rows of the DataFrame.', 'The chapter includes examples of using df.copy to create a copy of the DataFrame and setting the index of the DataFrame.']}
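As a rough sketch of the sorting and NA-handling steps described in the transcript above: the small frame below is a hypothetical stand-in for the avocado data, and the column names `Date` and `AveragePrice` are assumptions based on how the columns are referred to in the video. Note that in current pandas `ascending` already defaults to `True` in `sort_values`, so passing it only makes the intent explicit.

```python
import pandas as pd

# Hypothetical stand-in for the avocado data: dates out of order, one missing price.
df = pd.DataFrame(
    {
        "Date": pd.to_datetime(
            ["2018-01-21", "2018-01-07", "2018-01-14", "2018-01-28"]
        ),
        "AveragePrice": [1.20, 1.05, None, 1.31],
    }
)

# Sort by a column rather than by the index.
# ascending=True is the default; it is spelled out here for clarity.
df.sort_values(by="Date", ascending=True, inplace=True)

# Use the date as the index, then drop rows containing NA values
# so the plotted line has no empty gap.
graph_df = df.set_index("Date")
cleaned = graph_df.dropna()

print(cleaned)

# Plotting is left commented out because DataFrame.plot still needs
# matplotlib installed, even though the script never imports it directly:
# cleaned.plot(figsize=(8, 5), legend=False)
```

The `figsize` and `legend=False` arguments in the commented line are one way to address the two issues mentioned in the video (graph too small, legend too crowded); the exact values are illustrative.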