title
Introduction - Data Analysis and Data Science with Python and Pandas

description
Welcome to a data analysis tutorial with Python and the Pandas data analysis library. The field of data analytics is quite large and what you might be aiming to do with it is likely to never match up exactly to any tutorial. With that in mind, I think the best way for us to approach learning data analysis with Python is simply by example. My plan here is to find some datasets and do some of the common data analysis tasks, using the Pandas package, to hopefully get you familiar enough with the package to work with it on your own. Text-based tutorial: https://pythonprogramming.net/introduction-python3-pandas-data-analysis/
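
As a rough sketch of the kind of "common data analysis tasks" this first video walks through (read a CSV, peek at it with head and tail, reference a column, filter to one region, use the date as the index), something like the following should work. The sample rows here are made up so the snippet runs without the Kaggle download, and the column names (Date, AveragePrice, Total Volume, region) are assumed to match the avocado prices CSV; with the real file you would pass "datasets/avocado.csv" to read_csv instead.

```python
import io

import pandas as pd

# Tiny invented sample that mimics a few columns of the avocado CSV.
# These values are for illustration only, not real data.
SAMPLE_CSV = """Date,AveragePrice,Total Volume,region
2015-12-27,1.33,64236.62,Albany
2015-12-20,1.35,54876.98,Albany
2015-12-27,0.99,386100.49,Atlanta
2015-12-20,1.08,331377.53,Atlanta
"""

# With the downloaded dataset this would be pd.read_csv("datasets/avocado.csv").
df = pd.read_csv(io.StringIO(SAMPLE_CSV))

print(df.head(3))  # first 3 rows (the default is 5)
print(df.tail(2))  # last 2 rows

# Reference a single column with square brackets, like a dictionary key.
print(df["AveragePrice"].head())

# Keep only the rows for one region in a new data frame.
albany_df = df[df["region"] == "Albany"]

# Use the date column as the index instead of the default row numbers.
albany_df = albany_df.set_index("Date")
print(albany_df["AveragePrice"])
```

In a notebook with matplotlib installed, calling albany_df["AveragePrice"].plot() on the result should draw the price series right in the output cell.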

detail
{'title': 'Introduction - Data Analysis and Data Science with Python and Pandas', 'heatmap': [{'end': 458.048, 'start': 426.392, 'weight': 0.947}, {'end': 574.15, 'start': 543.417, 'weight': 0.744}], 'summary': 'Series covers tutorials on data analysis with python and pandas, setting up pandas and matplotlib, exploring datasets from kaggle, working with pandas data frames, referencing and manipulating data, and immediate plotting of data in jupyter notebooks, attracting an audience with a 10:0 gender split.', 'chapters': [{'end': 121.139, 'segs': [{'end': 59.962, 'src': 'embed', 'start': 20.953, 'weight': 0, 'content': [{'end': 26.477, 'text': "I think the best way to learn something like this is just to jump in to it, so I'm not really going to explain much more than that.", 'start': 20.953, 'duration': 5.524}, {'end': 28.759, 'text': "So let's go ahead and get into it.", 'start': 26.537, 'duration': 2.222}, {'end': 31.921, 'text': "What we're going to be doing in this series is just playing with a variety of datasets,", 'start': 28.839, 'duration': 3.082}, {'end': 36.625, 'text': "and I'm just going to hopefully get you guys comfortable and familiar with interacting with the Pandas library.", 'start': 31.921, 'duration': 4.704}, {'end': 44.21, 'text': 'By no means can I teach you everything there, so if you just go to Google and type like Pandas Python Docs or something like that, in fact..', 'start': 37.245, 'duration': 6.965}, {'end': 50.995, 'text': "you'll probably land, maybe even at the home page or something like that.", 'start': 46.031, 'duration': 4.964}, {'end': 54.537, 'text': 'but then, if you go to documentation and then go to the API,', 'start': 50.995, 'duration': 3.542}, {'end': 59.962, 'text': 'reference here is where basically all the things you can do in pandas is listed out for you.', 'start': 54.537, 'duration': 5.425}], 'summary': 'Series will cover playing with datasets using pandas library.', 'duration': 39.009, 'max_score': 20.953, 
'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/nLw1RNvfElg/pics/nLw1RNvfElg20953.jpg'}, {'end': 110.387, 'src': 'embed', 'start': 81.939, 'weight': 2, 'content': [{'end': 83.82, 'text': 'So hopefully 3.6 or greater.', 'start': 81.939, 'duration': 1.881}, {'end': 87.203, 'text': "I'm gonna be using Python 3.7 and then pandas.", 'start': 83.82, 'duration': 3.383}, {'end': 94.209, 'text': "the version will be 0.24.1, unless there's been an update in like the last day.", 'start': 87.203, 'duration': 7.006}, {'end': 102.416, 'text': "But if you're having some sort of deprecation or warning issues or whatever, check it out in 0.24.1 if you need to,", 'start': 94.209, 'duration': 8.207}, {'end': 110.387, 'text': "and if you don't know how to do that, you could always say pip install pandas==0.24.1 or something like that.", 'start': 102.416, 'duration': 7.971}], 'summary': 'Using python 3.7 and pandas version 0.24.1 for compatibility and updates.', 'duration': 28.448, 'max_score': 81.939, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/nLw1RNvfElg/pics/nLw1RNvfElg81939.jpg'}], 'start': 2.153, 'title': 'Pandas tutorial with python for data analysis', 'summary': 'Introduces a tutorial series on data analysis using python and pandas library, emphasizing interaction with row and column type data and the importance of referring to pandas documentation.', 'chapters': [{'end': 121.139, 'start': 2.153, 'title': 'Pandas tutorial with python for data analysis', 'summary': 'Introduces a tutorial series on data analysis using the python programming language and the pandas library, focusing on interacting with row and column type data, and emphasizes the importance of referring to the pandas documentation for comprehensive learning.', 'duration': 118.986, 'highlights': ['The tutorial series focuses on playing with various datasets to familiarize users with interacting with the Pandas library. 
The chapter emphasizes playing with a variety of datasets to help users get comfortable and familiar with interacting with the Pandas library.', 'The importance of referring to the Pandas documentation, specifically the API reference, is highlighted for comprehensive learning. The chapter emphasizes the importance of referring to the Pandas documentation, particularly the API reference, for comprehensive learning.', 'Instructions for installing the necessary tools, including Python 3.6 or greater and Pandas version 0.24.1, are provided. The chapter provides instructions for installing the necessary tools, including Python 3.6 or greater and Pandas version 0.24.1.']}], 'duration': 118.986, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/nLw1RNvfElg/pics/nLw1RNvfElg2153.jpg', 'highlights': ['The tutorial series focuses on playing with various datasets to familiarize users with interacting with the Pandas library.', 'The importance of referring to the Pandas documentation, specifically the API reference, is highlighted for comprehensive learning.', 'Instructions for installing the necessary tools, including Python 3.6 or greater and Pandas version 0.24.1, are provided.']}, {'end': 274.364, 'segs': [{'end': 158.052, 'src': 'embed', 'start': 121.619, 'weight': 1, 'content': [{'end': 126.744, 'text': "We're going to grab matplotlib, so pandas for data analysis stuff, matplotlib for some basic visualization.", 'start': 121.619, 'duration': 5.125}, {'end': 134.291, 'text': "We're not going to make this a matplotlib tutorial series by any means, but there's some cool quick things we can do with matplotlib, so we will.", 'start': 127.205, 'duration': 7.086}, {'end': 138.775, 'text': "And then I'm going to be using Jupyter Lab.", 'start': 134.972, 'duration': 3.803}, {'end': 142.118, 'text': "I've already got Jupyter Lab installed, so I'm not actually going to try to install it.", 'start': 139.356, 'duration': 2.762}, {'end': 143.039, 'text': "It 
wouldn't matter anyway.", 'start': 142.198, 'duration': 0.841}, {'end': 144.54, 'text': "But that's the editor I'm going to use.", 'start': 143.319, 'duration': 1.221}, {'end': 147.583, 'text': "It's just an interactive Python notebook type of editor.", 'start': 144.6, 'duration': 2.983}, {'end': 153.948, 'text': "If you want to use PyCharm or Sublime Text or whatever the heck IDE you want to use, it doesn't matter to me.", 'start': 147.963, 'duration': 5.985}, {'end': 158.052, 'text': 'I do like interactive Python notebooks for data analysis because it lets you kind of poke around.', 'start': 154.229, 'duration': 3.823}], 'summary': 'Using matplotlib and pandas for data analysis, with jupyter lab as the interactive python notebook editor.', 'duration': 36.433, 'max_score': 121.619, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/nLw1RNvfElg/pics/nLw1RNvfElg121619.jpg'}, {'end': 204.787, 'src': 'embed', 'start': 181.018, 'weight': 0, 'content': [{'end': 187.881, 'text': "While that's installing, we'll just talk about a couple of things involved here, and also let me check the version just to make sure.", 'start': 181.018, 'duration': 6.863}, {'end': 192.982, 'text': 'Pandas, yeah, there it is, 0.24.1.', 'start': 189.581, 'duration': 3.401}, {'end': 198.245, 'text': "Okay, so while that's installing, let's talk a little bit about how Pandas work.", 'start': 192.983, 'duration': 5.262}, {'end': 204.787, 'text': "So, basically, you're gonna read in anything, anything that has a sort of anything that has a rows and columns sort of structure,", 'start': 198.305, 'duration': 6.482}], 'summary': 'Discussion on installing pandas version 0.24.1 and an overview of how pandas works.', 'duration': 23.769, 'max_score': 181.018, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/nLw1RNvfElg/pics/nLw1RNvfElg181018.jpg'}], 'start': 121.619, 'title': 'Setting up pandas and matplotlib', 'summary': 'Discusses setting up pandas 
and matplotlib, including the use of jupyter lab as the editor and how pandas work, covering data input and output formats.', 'chapters': [{'end': 274.364, 'start': 121.619, 'title': 'Setting up pandas and matplotlib', 'summary': 'Discusses setting up pandas and matplotlib for data analysis, mentioning the use of jupyter lab as the editor. it also covers how pandas work, including reading in and pushing out data from various formats.', 'duration': 152.745, 'highlights': ['The chapter discusses setting up Pandas and Matplotlib for data analysis The speaker mentions grabbing matplotlib and pandas for data analysis.', 'Mentioning the use of Jupyter Lab as the editor The speaker mentions using Jupyter Lab as the editor for interactive Python notebooks.', 'The chapter covers how Pandas work, including reading in and pushing out data from various formats The speaker explains how Pandas can read in different data formats such as JSON, Excel, CSV, SQL, HTML, and HDF5, and also push out data from one format to another, such as SQL to JSON or HTML to SQL.']}], 'duration': 152.745, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/nLw1RNvfElg/pics/nLw1RNvfElg121619.jpg', 'highlights': ['The chapter covers how Pandas work, including reading in and pushing out data from various formats', 'The chapter discusses setting up Pandas and Matplotlib for data analysis', 'Mentioning the use of Jupyter Lab as the editor']}, {'end': 458.048, 'segs': [{'end': 302.762, 'src': 'embed', 'start': 274.364, 'weight': 5, 'content': [{'end': 277.487, 'text': "Google BigQuery is also pretty cool, Because that's new.", 'start': 274.364, 'duration': 3.123}, {'end': 278.528, 'text': "I haven't actually seen that.", 'start': 277.487, 'duration': 1.041}, {'end': 285.934, 'text': "I've never actually used BigQuery here, and I wonder how good It is like if it'll load in batches or not.", 'start': 278.528, 'duration': 7.406}, {'end': 287.875, 'text': "I actually don't even know 
because that's really challenging.", 'start': 286.014, 'duration': 1.861}, {'end': 291.176, 'text': "And I'm just finding out for the first time that this supports BigQuery.", 'start': 288.375, 'duration': 2.801}, {'end': 291.977, 'text': 'So cool.', 'start': 291.316, 'duration': 0.661}, {'end': 292.777, 'text': 'All right.', 'start': 291.997, 'duration': 0.78}, {'end': 295.358, 'text': 'It looks like we got everything installed so I can quit yapping.', 'start': 292.797, 'duration': 2.561}, {'end': 299.12, 'text': 'We are going to use data sets here from Kaggle.', 'start': 295.998, 'duration': 3.122}, {'end': 302.762, 'text': "It's not sponsored or supported in any way by Kaggle.", 'start': 299.74, 'duration': 3.022}], 'summary': 'Excited about using google bigquery for the first time, for kaggle data sets.', 'duration': 28.398, 'max_score': 274.364, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/nLw1RNvfElg/pics/nLw1RNvfElg274364.jpg'}, {'end': 349.709, 'src': 'embed', 'start': 308.905, 'weight': 0, 'content': [{'end': 313.967, 'text': 'If you just go to Kaggle.com, make an account, come to datasets, and then you can just kind of poke around if you want.', 'start': 308.905, 'duration': 5.062}, {'end': 319.63, 'text': "There's 14,590 datasets right now, but there's always new ones coming in.", 'start': 314.327, 'duration': 5.303}, {'end': 322.931, 'text': "We're going to be using the avocado dataset first.", 'start': 320.27, 'duration': 2.661}, {'end': 325.252, 'text': "If you didn't know, avocado is actually a fruit.", 'start': 322.951, 'duration': 2.301}, {'end': 329.4, 'text': 'Fascinating Also, most closely, a berry.', 'start': 325.733, 'duration': 3.667}, {'end': 330.943, 'text': 'I would not expect that.', 'start': 329.821, 'duration': 1.122}, {'end': 336.496, 'text': "Imagine getting like a fruit salad and it's like avocado or fruit flavored something and it's avocado.", 'start': 330.963, 'duration': 5.533}, {'end': 341.744, 
'text': "So anyway, we're going to be using the avocado prices data sets.", 'start': 338.482, 'duration': 3.262}, {'end': 342.945, 'text': 'Go ahead and grab that download.', 'start': 341.764, 'duration': 1.181}, {'end': 344.446, 'text': 'You have to have an account to download it.', 'start': 342.985, 'duration': 1.461}, {'end': 345.487, 'text': 'So make an account.', 'start': 344.806, 'duration': 0.681}, {'end': 348.188, 'text': 'And then a lot of times on these data sets, kind of cool.', 'start': 346.387, 'duration': 1.801}, {'end': 349.709, 'text': 'They have like some explanation.', 'start': 348.228, 'duration': 1.481}], 'summary': 'Kaggle offers 14,590 datasets; using avocado dataset; account required for download.', 'duration': 40.804, 'max_score': 308.905, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/nLw1RNvfElg/pics/nLw1RNvfElg308905.jpg'}, {'end': 414.904, 'src': 'embed', 'start': 369.662, 'weight': 2, 'content': [{'end': 370.543, 'text': "Um, so that's kind of neat.", 'start': 369.662, 'duration': 0.881}, {'end': 372.304, 'text': 'Um, let me think here.', 'start': 371.143, 'duration': 1.161}, {'end': 374.625, 'text': 'Oh, the other thing I would want to show you guys is the kernels.', 'start': 372.424, 'duration': 2.201}, {'end': 378.628, 'text': 'A lot of data sets, people can write kernels, which is basically a Python notebook.', 'start': 374.645, 'duration': 3.983}, {'end': 384.871, 'text': "So you can see what other people are doing with those data sets and learn even more because most of the time they're going to be using pandas.", 'start': 379.248, 'duration': 5.623}, {'end': 390.935, 'text': "Uh, and then other things on top of pandas, maybe at scikit-learn, which is what we'll use scikit-learn towards the end.", 'start': 384.891, 'duration': 6.044}, {'end': 395.037, 'text': 'Uh, XGBoost is really popular or maybe TensorFlow, Keras, that kind of stuff.', 'start': 390.955, 'duration': 4.082}, {'end': 399.814, 'text': 
'So once you feel comfortable and you really wanna see, like how good are you?', 'start': 395.911, 'duration': 3.903}, {'end': 405.798, 'text': 'or what kind of things are you missing in terms of doing data analytics and then maybe later doing like predictions, stuff like that,', 'start': 399.814, 'duration': 5.984}, {'end': 407.779, 'text': 'you can always go to the competitions on Kaggle.', 'start': 405.798, 'duration': 1.981}, {'end': 413.963, 'text': 'I think this is the best online like in terms of online coding competitions being applicable to the real world.', 'start': 407.799, 'duration': 6.164}, {'end': 414.904, 'text': 'this is the best place,', 'start': 413.963, 'duration': 0.941}], 'summary': 'Kaggle offers kernels for data analysis, competitions, and learning with popular tools like pandas and scikit-learn.', 'duration': 45.242, 'max_score': 369.662, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/nLw1RNvfElg/pics/nLw1RNvfElg369662.jpg'}, {'end': 458.048, 'src': 'heatmap', 'start': 426.392, 'weight': 0.947, 'content': [{'end': 429.275, 'text': 'Okay, so download the Kaggle Avocado dataset.', 'start': 426.392, 'duration': 2.883}, {'end': 433.019, 'text': "I'm going to put it in a directory literally called datasets.", 'start': 429.675, 'duration': 3.344}, {'end': 436.742, 'text': "So datasets, there's my avocado.", 'start': 433.699, 'duration': 3.043}, {'end': 439.205, 'text': "So now what I'm going to do is go ahead and run JupyterLab.", 'start': 437.163, 'duration': 2.042}, {'end': 442.368, 'text': 'To run JupyterLab, you literally just open up a command prompt.', 'start': 439.285, 'duration': 3.083}, {'end': 447.019, 'text': "go to the directory that you're working in, then just type jupiter lab.", 'start': 442.368, 'duration': 4.651}, {'end': 448.16, 'text': "that'll open up.", 'start': 447.019, 'duration': 1.141}, {'end': 455.546, 'text': "uh, in your browser, pick, i'm gonna go with notebook again, though if you want 
to use your own editor, whatever, go for it.", 'start': 448.16, 'duration': 7.386}, {'end': 458.048, 'text': "man, i don't care, or lady, i'm not judging.", 'start': 455.546, 'duration': 2.502}], 'summary': "Download kaggle avocado dataset, store in 'datasets' directory, run jupyterlab.", 'duration': 31.656, 'max_score': 426.392, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/nLw1RNvfElg/pics/nLw1RNvfElg426392.jpg'}], 'start': 274.364, 'title': 'Exploring avocado prices dataset and introduction to data analysis and kaggle', 'summary': 'Covers accessing datasets from kaggle, particularly the avocado prices dataset, and also introduces the use of python libraries like pandas, scikit-learn, xgboost, tensorflow, and keras for practical data analysis and machine learning applied in kaggle competitions.', 'chapters': [{'end': 349.709, 'start': 274.364, 'title': 'Exploring avocado prices dataset from kaggle', 'summary': 'Explores the usage of google bigquery and the process of accessing datasets from kaggle, which offers over 14,590 datasets, focusing on the avocado prices dataset.', 'duration': 75.345, 'highlights': ['The chapter explores the usage of Google BigQuery. 
The speaker expresses excitement about the usage of Google BigQuery and wonders about its performance in loading data.', 'Kaggle offers over 14,590 datasets Kaggle provides access to a vast collection of datasets, with over 14,590 available for exploration, and continuously introduces new ones.', 'The chapter focuses on the avocado prices dataset The speaker introduces the avocado prices dataset and encourages the audience to download it from Kaggle after creating an account.']}, {'end': 458.048, 'start': 349.729, 'title': 'Introduction to data analysis and kaggle', 'summary': 'Introduces the use of python libraries such as pandas, scikit-learn, xgboost, tensorflow, and keras for data analysis and machine learning, and emphasizes the practical application of these skills through kaggle competitions.', 'duration': 108.319, 'highlights': ['The chapter introduces the use of Python libraries such as pandas, scikit-learn, XGBoost, TensorFlow, and Keras for data analysis and machine learning.', 'Kaggle competitions are highlighted as practical applications for data analysis and machine learning skills, emphasizing their relevance to real-world problem-solving and industry demand.', "The concept of kernels in Kaggle is mentioned, which are essentially Python notebooks used for sharing and learning from others' data analysis and machine learning approaches."]}], 'duration': 183.684, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/nLw1RNvfElg/pics/nLw1RNvfElg274364.jpg', 'highlights': ['Kaggle provides access to a vast collection of datasets, with over 14,590 available for exploration, and continuously introduces new ones.', 'The chapter focuses on the avocado prices dataset and encourages the audience to download it from Kaggle after creating an account.', 'The chapter introduces the use of Python libraries such as pandas, scikit-learn, XGBoost, TensorFlow, and Keras for data analysis and machine learning.', 'Kaggle competitions are highlighted 
as practical applications for data analysis and machine learning skills, emphasizing their relevance to real-world problem-solving and industry demand.', "The concept of kernels in Kaggle is mentioned, which are essentially Python notebooks used for sharing and learning from others' data analysis and machine learning approaches.", 'The chapter explores the usage of Google BigQuery and expresses excitement about its performance in loading data.']}, {'end': 654.842, 'segs': [{'end': 502.09, 'src': 'embed', 'start': 458.048, 'weight': 0, 'content': [{'end': 460.51, 'text': "i'm actually at 10.", 'start': 458.048, 'duration': 2.462}, {'end': 463.372, 'text': "uh, gender split, so that's pretty exciting.", 'start': 460.51, 'duration': 2.862}, {'end': 465.714, 'text': 'or sex split anyways.', 'start': 463.372, 'duration': 2.342}, {'end': 468.717, 'text': 'um, watch out comment.', 'start': 465.714, 'duration': 3.003}, {'end': 471.119, 'text': "i'm not reading the comment section anymore after my statements.", 'start': 468.717, 'duration': 2.402}, {'end': 474.2, 'text': 'anyway, import pandas as pd.', 'start': 471.119, 'duration': 3.081}, {'end': 478.541, 'text': 'uh, this is just kind of convention that we denote pandas as pd.', 'start': 474.2, 'duration': 4.341}, {'end': 483.182, 'text': "that's just a standard that people do, and you're going to see this in a lot of people's code.", 'start': 478.541, 'duration': 4.641}, {'end': 485.042, 'text': 'the next standard is df.', 'start': 483.182, 'duration': 1.86}, {'end': 489.443, 'text': "generally, if you're going to define a data frame, it's going to be called df.", 'start': 485.042, 'duration': 4.401}, {'end': 494.844, 'text': "now, if you're going to have conflicting names or something like that, you might say avocado, underscore df or something like that,", 'start': 489.443, 'duration': 5.401}, {'end': 498.726, 'text': "but most of the time you'll just see df Short for DataFrame.", 'start': 494.844, 'duration': 3.882}, 
{'end': 502.09, 'text': 'And DataFrame is just the object that is your columns and rows.', 'start': 498.967, 'duration': 3.123}], 'summary': 'Gender split at 10, using pandas convention, df for dataframe.', 'duration': 44.042, 'max_score': 458.048, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/nLw1RNvfElg/pics/nLw1RNvfElg458048.jpg'}, {'end': 577.332, 'src': 'heatmap', 'start': 523.674, 'weight': 2, 'content': [{'end': 525.396, 'text': 'but panel is just like multiple data frames.', 'start': 523.674, 'duration': 1.722}, {'end': 530.1, 'text': "So it's like a three dimensional, right? So rather it's like columns, rows, and then depth.", 'start': 525.836, 'duration': 4.264}, {'end': 535.571, 'text': 'Anyway, df equals pd dot read underscore csv.', 'start': 531.147, 'duration': 4.424}, {'end': 543.397, 'text': 'So this is a function we can read csv and then later we could do something like df dot to csv and save it out as a csv.', 'start': 535.691, 'duration': 7.706}, {'end': 545.719, 'text': "And we'll do stuff like that in a little bit.", 'start': 543.417, 'duration': 2.302}, {'end': 547.461, 'text': "But for now, we're just going to read csv.", 'start': 545.899, 'duration': 1.562}, {'end': 556.784, 'text': "and data sets, slash avocado dot csv and we'll just read that in first.", 'start': 548.241, 'duration': 8.543}, {'end': 559.825, 'text': 'so that works and we could just like print out a data frame.', 'start': 556.784, 'duration': 3.041}, {'end': 567.768, 'text': 'but as you can see, it always prints out just like gobs of information and like as, as we edit and modify this data frame and work with it,', 'start': 559.825, 'duration': 7.943}, {'end': 569.108, 'text': 'we want to be able to debug.', 'start': 567.768, 'duration': 1.34}, {'end': 574.15, 'text': 'often, like you might say, like you might print something out in python, but this is a lot.', 'start': 569.108, 'duration': 5.042}, {'end': 577.332, 'text': 'so so what you can 
instead do is use dot head.', 'start': 574.15, 'duration': 3.182}], 'summary': 'Introduction to working with data frames in python for csv files.', 'duration': 53.658, 'max_score': 523.674, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/nLw1RNvfElg/pics/nLw1RNvfElg523674.jpg'}, {'end': 606.314, 'src': 'embed', 'start': 574.15, 'weight': 3, 'content': [{'end': 577.332, 'text': 'so so what you can instead do is use dot head.', 'start': 574.15, 'duration': 3.182}, {'end': 581.915, 'text': "you'll see people using this a lot and all that is is just four, just so we can kind of see a quick snippet.", 'start': 577.332, 'duration': 4.583}, {'end': 585.137, 'text': 'uh, and this will be the first n rows.', 'start': 581.915, 'duration': 3.222}, {'end': 589.66, 'text': 'default is five, but you can always change that, so we can say three and then, boom, you get that.', 'start': 585.137, 'duration': 4.523}, {'end': 596.326, 'text': "Alternatively, sometimes you'll do some calculations, especially on like chronological data, where you'll use like a moving window,", 'start': 590.06, 'duration': 6.266}, {'end': 597.366, 'text': "and head won't be good enough.", 'start': 596.326, 'duration': 1.04}, {'end': 599.508, 'text': "You'll actually want to see like maybe the end.", 'start': 597.387, 'duration': 2.121}, {'end': 606.314, 'text': 'So you like the foot maybe, but instead we call this DF dot tail two or something.', 'start': 599.989, 'duration': 6.325}], 'summary': 'Use .head() to quickly view first n rows; use .tail() for end rows.', 'duration': 32.164, 'max_score': 574.15, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/nLw1RNvfElg/pics/nLw1RNvfElg574150.jpg'}, {'end': 637.745, 'src': 'embed', 'start': 608.877, 'weight': 4, 'content': [{'end': 610.378, 'text': 'And you can go bigger than five too, if you want.', 'start': 608.877, 'duration': 1.501}, {'end': 618.076, 'text': 'So that is just a way for you to 
visualize.', 'start': 614.835, 'duration': 3.241}, {'end': 620.357, 'text': "So as you can see, we've got quite a few columns here.", 'start': 618.096, 'duration': 2.261}, {'end': 623.178, 'text': "We've got date, average price, total volume.", 'start': 620.877, 'duration': 2.301}, {'end': 625.819, 'text': 'This is like how many avocados were sold that day.', 'start': 623.198, 'duration': 2.621}, {'end': 629.941, 'text': 'These are specific PLUs, I wanna say.', 'start': 626.82, 'duration': 3.121}, {'end': 632.722, 'text': "And then, is that right? Hopefully that's the right term.", 'start': 630.341, 'duration': 2.381}, {'end': 635.384, 'text': 'Anyway, total bags.', 'start': 633.383, 'duration': 2.001}, {'end': 637.745, 'text': "I'm not really like, I guess bags, I don't know.", 'start': 635.444, 'duration': 2.301}], 'summary': 'Visualize data with columns like date, average price, total volume, and specific PLUs.', 'duration': 28.868, 'max_score': 608.877, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/nLw1RNvfElg/pics/nLw1RNvfElg608877.jpg'}], 'start': 458.048, 'title': 'Working with pandas data frames', 'summary': "Covers the basics of working with pandas data frames, emphasizing conventions for denoting pandas as pd and data frames as df. it also highlights a 10:0 gender split in the audience. 
additionally, it introduces reading data from a csv file using pandas in python, showcasing the use of functions like 'read_csv' and 'head' to manipulate and visualize the data frame, and provides a brief overview of the dataset structure and columns.", 'chapters': [{'end': 502.09, 'start': 458.048, 'title': 'Pandas data frame basics', 'summary': 'Explains the basics of working with pandas data frames, with a focus on conventions for denoting pandas as pd and data frames as df, also highlighting a 10:0 gender split in the audience.', 'duration': 44.042, 'highlights': ['The audience has a 10:0 gender split, indicating a lack of gender diversity.', 'Conventions for denoting pandas as pd and data frames as df are widely used in coding practices.', 'Data frames are objects consisting of columns and rows, with the standard convention of denoting them as df.']}, {'end': 654.842, 'start': 502.451, 'title': 'Reading and visualizing data', 'summary': "Introduces reading data from a csv file using pandas in python, illustrating the use of functions like 'read_csv' and 'head' to manipulate and visualize the data frame, with a brief overview of the dataset structure and columns.", 'duration': 152.391, 'highlights': ["The function 'pd.read_csv' is used to read data from a CSV file, and 'df.to_csv' can be used to save the data as a CSV, providing a practical approach to working with data.", "The method 'head' is utilized to display a quick preview of the first few rows of the data frame, facilitating easy debugging and data visualization.", "The concept of using 'tail' to view the end of the data frame, especially useful for chronological data and moving window calculations, is explained, offering a comprehensive understanding of data visualization techniques.", "The dataset's columns, including 'date,' 'average price,' 'total volume,' and 'region,' are mentioned, providing insights into the structure and content of the dataset."]}], 'duration': 196.794, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/nLw1RNvfElg/pics/nLw1RNvfElg458048.jpg', 'highlights': ['Conventions for denoting pandas as pd and data frames as df are widely used in coding practices.', 'The audience has a 10:0 gender split, indicating a lack of gender diversity.', "The function 'pd.read_csv' is used to read data from a CSV file, and 'df.to_csv' can be used to save the data as a CSV, providing a practical approach to working with data.", "The method 'head' is utilized to display a quick preview of the first few rows of the data frame, facilitating easy debugging and data visualization.", "The dataset's columns, including 'date,' 'average price,' 'total volume,' and 'region,' are mentioned, providing insights into the structure and content of the dataset."]}, {'end': 1010.862, 'segs': [{'end': 684.985, 'src': 'embed', 'start': 654.942, 'weight': 0, 'content': [{'end': 661.487, 'text': "But anyway, what we're going to do is now like, if you wanted to reference a specific column,", 'start': 654.942, 'duration': 6.545}, {'end': 666.03, 'text': 'you could say something like this DF and then use square brackets and then in quotes', 'start': 661.487, 'duration': 4.543}, {'end': 668.852, 'text': 'So kind of like you would a dictionary.', 'start': 666.471, 'duration': 2.381}, {'end': 673.076, 'text': "DF, we'll go average price.", 'start': 669.353, 'duration': 3.723}, {'end': 677.399, 'text': 'So as you can see, this just kind of prints out to us the average price.', 'start': 673.636, 'duration': 3.763}, {'end': 684.985, 'text': "And again, the same thing we can do is .head, just to get a few, just to get a snippet of what we're working with.", 'start': 677.459, 'duration': 7.526}], 'summary': 'Reference specific column using df and square brackets, like a dictionary. 
print average price and use .head to get a snippet.', 'duration': 30.043, 'max_score': 654.942, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/nLw1RNvfElg/pics/nLw1RNvfElg654942.jpg'}, {'end': 749.057, 'src': 'embed', 'start': 704.847, 'weight': 3, 'content': [{'end': 706.728, 'text': "But nobody does this, don't do this.", 'start': 704.847, 'duration': 1.881}, {'end': 710.972, 'text': 'I might have seen someone use this syntax one time.', 'start': 707.149, 'duration': 3.823}, {'end': 712.973, 'text': 'Just know it exists.', 'start': 711.392, 'duration': 1.581}, {'end': 721.199, 'text': "So sometimes a column name might actually be something that you think is a pandas method or functionality, and it's not.", 'start': 713.433, 'duration': 7.766}, {'end': 724.722, 'text': "It's actually the person's column name, and you just were none the wiser.", 'start': 721.279, 'duration': 3.443}, {'end': 729.185, 'text': "And that's kind of why I think it's better to use this terminology instead.", 'start': 724.882, 'duration': 4.303}, {'end': 729.986, 'text': "I'll leave that there.", 'start': 729.305, 'duration': 0.681}, {'end': 731.567, 'text': "why it's better to use this?", 'start': 730.706, 'duration': 0.861}, {'end': 732.707, 'text': 'because this is totally clear.', 'start': 731.567, 'duration': 1.14}, {'end': 738.711, 'text': 'anyone reading your code understands average price is um, is a column name, because imagine,', 'start': 732.707, 'duration': 6.004}, {'end': 745.275, 'text': "you could have this called like you could have it called mean or something like that, where people are like thinking oh, it's an average.", 'start': 738.711, 'duration': 6.564}, {'end': 746.675, 'text': 'it would be very confusing.', 'start': 745.275, 'duration': 1.4}, {'end': 749.057, 'text': 'so anyway, use that one.', 'start': 746.675, 'duration': 2.382}], 'summary': 'Reference columns with square brackets so a column name is not mistaken for a pandas method or attribute.', 
'duration': 44.21, 'max_score': 704.847, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/nLw1RNvfElg/pics/nLw1RNvfElg704847.jpg'}, {'end': 805.88, 'src': 'embed', 'start': 772.435, 'weight': 1, 'content': [{'end': 775.477, 'text': "we could get a list of, like all the regions and we'll talk about that.", 'start': 772.435, 'duration': 3.042}, {'end': 779.54, 'text': "but let's say we just wanted just the albany region only.", 'start': 775.477, 'duration': 4.063}, {'end': 780.801, 'text': 'how could we do that?', 'start': 779.54, 'duration': 1.261}, {'end': 787.106, 'text': "well, we could say instead and so now we're gonna have like the main df and then we're gonna make a new data frame.", 'start': 780.801, 'duration': 6.305}, {'end': 791.429, 'text': "so now we'd probably like to say something a little more like albany df.", 'start': 787.106, 'duration': 4.323}, {'end': 793.33, 'text': "okay, So that'll be our new data frame.", 'start': 791.429, 'duration': 1.901}, {'end': 805.88, 'text': "We're gonna say that's the data frame where the data frame region is equal to Albany, Albany.", 'start': 793.35, 'duration': 12.53}], 'summary': 'Creating a new data frame for the albany region.', 'duration': 33.445, 'max_score': 772.435, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/nLw1RNvfElg/pics/nLw1RNvfElg772435.jpg'}, {'end': 981.276, 'src': 'embed', 'start': 933.887, 'weight': 2, 'content': [{'end': 940.049, 'text': 'Now, it does have, apparently, this meaningless column number, which is pointless.', 'start': 933.887, 'duration': 6.162}, {'end': 944.13, 'text': "And we'll talk later about how we can handle for that.", 'start': 942.01, 'duration': 2.12}, {'end': 946.611, 'text': 'But mostly, we just want the date to be the index.', 'start': 944.19, 'duration': 2.421}, {'end': 948.292, 'text': "We're not really worried about extra columns.", 'start': 946.651, 'duration': 1.641}, {'end': 953.994, 'text': "There's a 
bunch of different ways, as we like load data in and later we can drop columns blah, blah, blah, blah blah,", 'start': 949.532, 'duration': 4.462}, {'end': 955.515, 'text': "But for now we're not worried about that.", 'start': 953.994, 'duration': 1.521}, {'end': 956.515, 'text': "We'll handle that later.", 'start': 955.655, 'duration': 0.86}, {'end': 962.058, 'text': 'I just kind of want to show you guys a quick rundown of Ways we can manipulate data with pandas.', 'start': 956.795, 'duration': 5.263}, {'end': 964.619, 'text': 'So anyway, we want a meaningful index.', 'start': 962.138, 'duration': 2.481}, {'end': 981.276, 'text': "So what are we gonna say? We're gonna say Albany underscore DF dot set index as date Did I type Albendy again? Okay, so now it's indexed by date.", 'start': 964.679, 'duration': 16.597}], 'summary': 'Demonstrating how to set a meaningful date index with pandas for data manipulation.', 'duration': 47.389, 'max_score': 933.887, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/nLw1RNvfElg/pics/nLw1RNvfElg933887.jpg'}], 'start': 654.942, 'title': 'Referencing and manipulating data in pandas', 'summary': 'Explains referencing specific columns in a dataframe in pandas using square brackets or dot notation, highlighting the importance of clear terminology for column names. 
it also covers manipulating and filtering data, creating new data frames, setting meaningful indexes, and handling unnecessary columns in a csv file.', 'chapters': [{'end': 749.057, 'start': 654.942, 'title': 'Referencing data in pandas', 'summary': 'Explains how to reference specific columns in a dataframe in pandas using square brackets or dot notation, highlighting the importance of using clear terminology for column names to avoid confusion and improve code readability.', 'duration': 94.115, 'highlights': ['The chapter demonstrates referencing specific columns in a DataFrame in Pandas using square brackets or dot notation.', 'It shows how to print out the average price from a specific column, providing an example of data manipulation in Pandas.', 'Importance of using clear terminology for column names is emphasized to improve code readability and avoid confusion among readers.', 'The transcript emphasizes the potential confusion when using ambiguous column names, advocating for the use of clear and understandable terminology.']}, {'end': 1010.862, 'start': 749.057, 'title': 'Using pandas to manipulate data', 'summary': 'Covers the process of using pandas to manipulate and filter data, including creating a new data frame for a specific region, setting a meaningful index, and handling unnecessary columns in a csv file.', 'duration': 261.805, 'highlights': ['Creating a new data frame for a specific region, such as Albany, by filtering the main data frame based on the region. Filtering the data frame to only include the Albany region.', "Setting a meaningful index by using the 'set_index' function to index the data by date. Demonstrating the process of setting the index of the data frame as the date.", 'Identifying and addressing unnecessary columns in a CSV file, such as a meaningless column number, to focus on the relevant data. 
Discussing the need to handle unnecessary columns in a CSV file to focus on essential data.']}], 'duration': 355.92, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/nLw1RNvfElg/pics/nLw1RNvfElg654942.jpg', 'highlights': ['The chapter demonstrates referencing specific columns in a DataFrame in Pandas using square brackets or dot notation.', 'Creating a new data frame for a specific region, such as Albany, by filtering the main data frame based on the region.', "Setting a meaningful index by using the 'set_index' function to index the data by date.", 'Importance of using clear terminology for column names is emphasized to improve code readability and avoid confusion among readers.', 'Identifying and addressing unnecessary columns in a CSV file, such as a meaningless column number, to focus on the relevant data.', 'It shows how to print out the average price from a specific column, providing an example of data manipulation in Pandas.', 'The transcript emphasizes the potential confusion when using ambiguous column names, advocating for the use of clear and understandable terminology.', 'Filtering the data frame to only include the Albany region.', 'Discussing the need to handle unnecessary columns in a CSV file to focus on essential data.']}, {'end': 1295.011, 'segs': [{'end': 1060.085, 'src': 'embed', 'start': 1011.984, 'weight': 0, 'content': [{'end': 1018.992, 'text': 'So this is like your first exposure to pandas in its nonsense.', 'start': 1011.984, 'duration': 7.008}, {'end': 1023.597, 'text': "Sometimes it makes total sense, but it's confusing the first few times you deal with it.", 'start': 1019.092, 'duration': 4.505}, {'end': 1031.465, 'text': 'So with pandas, a lot of these operations, what they do is actually return a new data frame.', 'start': 1024.278, 'duration': 7.187}, {'end': 1034.089, 'text': "so uh, let's go back up here.", 'start': 1031.465, 'duration': 2.624}, {'end': 1036.551, 'text': 'so so you can even see, it 
happened right here.', 'start': 1034.089, 'duration': 2.462}, {'end': 1043.358, 'text': "we just said dot set index date, and then we didn't ask it to display albany data frame.", 'start': 1036.551, 'duration': 6.807}, {'end': 1046.502, 'text': 'it just um, it just print.', 'start': 1043.358, 'duration': 3.144}, {'end': 1050.843, 'text': "it didn't print it out, but it just displayed it, because that's what this set index does.", 'start': 1046.502, 'duration': 4.341}, {'end': 1053.744, 'text': 'it returns a data frame.', 'start': 1050.843, 'duration': 2.901}, {'end': 1060.085, 'text': 'so um, so we have two options when we want to perform most things.', 'start': 1053.744, 'duration': 6.341}], 'summary': 'Introduction to pandas, operations return new data frame, set index displays data.', 'duration': 48.101, 'max_score': 1011.984, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/nLw1RNvfElg/pics/nLw1RNvfElg1011984.jpg'}, {'end': 1233.488, 'src': 'embed', 'start': 1204.206, 'weight': 1, 'content': [{'end': 1206.066, 'text': "it's like the percent thing.", 'start': 1204.206, 'duration': 1.86}, {'end': 1207.487, 'text': "i'll get it right for the next tutorial.", 'start': 1206.066, 'duration': 1.421}, {'end': 1211.788, 'text': "anyway, there's a thing you can do in Jupyter notebooks that will just display the graph immediately.", 'start': 1207.487, 'duration': 4.301}, {'end': 1217.093, 'text': "in that case, if you just see something like this, just like, run it again and it'll pop up anyway.", 'start': 1212.328, 'duration': 4.765}, {'end': 1220.256, 'text': 'so, as you can see here, we can plot stuff like immediately.', 'start': 1217.093, 'duration': 3.163}, {'end': 1223.74, 'text': 'uh, or you can plot really specific columns.', 'start': 1220.256, 'duration': 3.484}, {'end': 1229.043, 'text': "so you could say hey, i want to plot average price, And it'll plot average price.", 'start': 1223.74, 'duration': 5.303}, {'end': 1233.488, 'text': 
'I will just say right out of the gate, this is not a correct plot.', 'start': 1229.464, 'duration': 4.024}], 'summary': 'Jupyter notebooks can display graphs immediately, allowing for quick visualization of data.', 'duration': 29.282, 'max_score': 1204.206, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/nLw1RNvfElg/pics/nLw1RNvfElg1204206.jpg'}], 'start': 1011.984, 'title': 'Working with pandas dataframes', 'summary': 'Introduces working with pandas dataframes, featuring the concept of returning a new dataframe from operations, options for performing operations in place or by reassigning, and immediate plotting of data in jupyter notebooks.', 'chapters': [{'end': 1295.011, 'start': 1011.984, 'title': 'Working with pandas dataframes', 'summary': 'Introduces working with pandas dataframes, including the concept of returning a new dataframe from operations, options for performing operations in place or by reassigning, and immediate plotting of data in jupyter notebooks.', 'duration': 283.027, 'highlights': ['Returning a new dataframe from operations, such as set_index, and the option to perform most operations by reassigning or in place.', 'Immediate plotting of data, including the ability to plot specific columns, in Jupyter notebooks.', 'Introduction to working with pandas dataframes, covering the confusion in initial exposure and the need for redefining commands in some cases.']}], 'duration': 283.027, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/nLw1RNvfElg/pics/nLw1RNvfElg1011984.jpg', 'highlights': ['Returning a new dataframe from operations, such as set_index, and the option to perform most operations by reassigning or in place.', 'Immediate plotting of data, including the ability to plot specific columns, in Jupyter notebooks.', 'Introduction to working with pandas dataframes, covering the confusion in initial exposure and the need for redefining commands in some cases.']}], 'highlights': ['Kaggle 
provides access to over 14,590 datasets for exploration, continuously introducing new ones.', 'Kaggle competitions are highlighted as practical applications for data analysis and machine learning skills.', 'The tutorial series focuses on playing with various datasets to familiarize users with interacting with the Pandas library.', 'The chapter covers how Pandas works, including reading in and pushing out data from various formats.', 'Instructions for installing the necessary tools, including Python 3.6 or greater and Pandas version 0.24.1, are provided.', 'The importance of referring to the Pandas documentation, specifically the API reference, is highlighted for comprehensive learning.', 'The chapter introduces the use of Python libraries such as pandas, scikit-learn, XGBoost, TensorFlow, and Keras for data analysis and machine learning.', 'The chapter explores the usage of Google BigQuery and expresses excitement about its performance in loading data.', 'The audience has a 10:0 gender split, indicating a lack of gender diversity.', 'Conventions for denoting pandas as pd and data frames as df are widely used in coding practices.', "The method 'head' is utilized to display a quick preview of the first few rows of the data frame, facilitating easy debugging and data visualization.", 'The chapter demonstrates referencing specific columns in a DataFrame in Pandas using square brackets or dot notation.', 'Creating a new data frame for a specific region, such as Albany, by filtering the main data frame based on the region.', "Setting a meaningful index by using the 'set_index' function to index the data by date.", 'Immediate plotting of data, including the ability to plot specific columns, in Jupyter notebooks.']}
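
The steps described in the segments above (referencing a column, filtering to one region, and setting a date index) can be sketched roughly as follows. This is only an illustration under assumptions: the column names "Date", "AveragePrice", and "region" are guesses based on how they are spoken in the transcript, and the rows are made-up placeholder values, not data from the CSV used in the video.

```python
import pandas as pd

# Tiny made-up stand-in for the CSV used in the video. The column names
# ("Date", "AveragePrice", "region") and every value are assumptions.
df = pd.DataFrame(
    {
        "Date": ["2015-01-04", "2015-01-11", "2015-01-04", "2015-01-11"],
        "AveragePrice": [1.22, 1.24, 1.00, 1.10],
        "region": ["Albany", "Albany", "Atlanta", "Atlanta"],
    }
)

# Bracket notation makes it obvious that "AveragePrice" is a column name.
# df.AveragePrice also works, but can be mistaken for a method or attribute.
prices = df["AveragePrice"]
print(prices.head())

# Boolean filtering: keep only the rows whose region is Albany.
# .copy() makes albany_df an independent DataFrame.
albany_df = df[df["region"] == "Albany"].copy()

# set_index returns a new DataFrame; albany_df itself is unchanged
# until the result is assigned back.
indexed = albany_df.set_index("Date")
print(albany_df.index.name)  # the original frame still has an unnamed index

# Option 1: reassign the returned DataFrame.
albany_df = indexed
# Option 2 (alternative to reassigning): albany_df.set_index("Date", inplace=True)

print(albany_df.head())
```

With matplotlib installed, calling `albany_df["AveragePrice"].plot()` on the date-indexed frame should draw that single column, which is the kind of immediate plot the later segments refer to.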