Coursnap

title
Python Pandas Tutorial (Part 1): Getting Started with Data Analysis - Installation and Loading Data

description
In this video, we will be learning how to get started with Pandas using Python. This video is sponsored by Brilliant. Pandas is a Data Analysis Library that allows us to easily read, analyze, and modify data. Pandas is a fundamental tool to learn in the growing field of Data Science. So we'll start by learning how to install Pandas, how to load data into a Jupyter Notebook, and how to see basic information about the data we've loaded in. Let's get started... The code for this video can be found at: http://bit.ly/Pandas-01 Virtual Environment Tutorial - https://youtu.be/Kg1Yvry_Ydk Jupyter Tutorial - https://youtu.be/HW29067qVWk StackOverflow Survey Download Page - http://bit.ly/SO-Survey-Download #Python #Pandas
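A minimal sketch of the steps described above (install, load, inspect), assuming pandas is already installed. The video reads the real file at data/survey_results_public.csv; here a tiny in-memory CSV is a hypothetical stand-in so the example runs without the download. Its column names echo a few survey columns, but the values are invented for illustration.

```python
# Install first (in a terminal, ideally inside a virtual environment):
#   pip install pandas jupyter
import io

import pandas as pd  # "pd" is the conventional alias

# Hypothetical stand-in for data/survey_results_public.csv (values invented).
# With the real download you would call pd.read_csv("data/survey_results_public.csv").
csv_text = """Respondent,MainBranch,Hobbyist,Age
1,I am a developer by profession,Yes,22.0
2,I am a student who is learning to code,No,
3,I am a developer by profession,Yes,35.5
"""

# read_csv accepts a file path or any file-like object and returns a DataFrame.
df = pd.read_csv(io.StringIO(csv_text))

# shape is an attribute (no parentheses): a (rows, columns) tuple.
print(df.shape)

# info() is a method: it prints each column's non-null count and dtype.
df.info()

# Let printed frames show up to 85 columns and 85 rows instead of truncating with "...".
pd.set_option("display.max_columns", 85)
pd.set_option("display.max_rows", 85)

# head() returns the first 5 rows by default; pass a number to change that.
print(df.head())
print(df.head(2))
```

Note that `shape` is read without parentheses while `info()` and `head()` are called; the missing Age value in the second row is read as NaN, which is why that column comes out as a float dtype.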

detail
{'title': 'Python Pandas Tutorial (Part 1): Getting Started with Data Analysis - Installation and Loading Data', 'heatmap': [{'end': 436.691, 'start': 397.566, 'weight': 0.729}, {'end': 928.784, 'start': 890.492, 'weight': 1}, {'end': 1024.568, 'start': 988.405, 'weight': 0.871}, {'end': 1122.41, 'start': 1100.559, 'weight': 0.835}], 'summary': 'The tutorial covers the importance of learning pandas in Python for data science, focusing on installation, data analysis, setting up pandas and Jupyter, reading CSV files, visualizing data frames, analyzing CSV data, adjusting data display, and a pandas overview. It includes guidance on using a virtual environment and installing Python packages, and it demonstrates data analysis with real-world data from the Stack Overflow Developer Survey, using attributes and methods like shape and info to analyze a large data frame with 88,883 rows and 85 columns.', 'chapters': [{'end': 67.06, 'segs': [{'end': 51.48, 'src': 'embed', 'start': 0.149, 'weight': 0, 'content': [{'end': 5.452, 'text': "Hey there, how's it going everybody? 
In this series of videos, we're going to be learning how to use the Pandas library in Python.", 'start': 0.149, 'duration': 5.303}, {'end': 11.634, 'text': 'So Pandas is a data analysis library that allows us to easily read in and work with different types of data.', 'start': 5.832, 'duration': 5.802}, {'end': 16.737, 'text': 'So we can use this to analyze CSV files, Excel files, and other similar formats.', 'start': 12.055, 'duration': 4.682}, {'end': 21.782, 'text': "So if you're getting into the data science field, then this library is going to be essential to learn.", 'start': 17.137, 'duration': 4.645}, {'end': 25.706, 'text': "It's one of the most downloaded packages for Python, and that's for a great reason.", 'start': 21.922, 'duration': 3.784}, {'end': 33.013, 'text': "So not only does it allow us to easily read in and analyze data, but it also has great performance since it's built on top of NumPy.", 'start': 26.086, 'duration': 6.927}, {'end': 37.775, 'text': "And we'll be learning how to do different types of data analysis in this series.", 'start': 33.573, 'duration': 4.202}, {'end': 42.356, 'text': "So in this video, we're going to be going over how to get pandas installed,", 'start': 38.255, 'duration': 4.101}, {'end': 51.48, 'text': "how to download the data that I'll be using for most of this series, and also how to get all of this open in a Jupyter notebook so that we're ready to do some coding and analysis.", 'start': 42.356, 'duration': 9.124}], 'summary': 'Learn the pandas library for data analysis in Python: it is essential for data science, performs well, and is widely used.', 'duration': 51.331, 'max_score': 0.149, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ZyhVh-qRZPA/pics/ZyhVh-qRZPA149.jpg'}], 'start': 0.149, 'title': 'Pandas library in Python', 'summary': 'Covers the importance of learning pandas in Python, its relevance to data science, and its performance. 
It is one of the most downloaded packages for Python, focusing on installation and data analysis.', 'chapters': [{'end': 67.06, 'start': 0.149, 'title': 'Pandas library in Python', 'summary': 'Covers the importance of learning pandas, its relevance to data science, and its performance, being one of the most downloaded packages for Python, with a focus on installation and data analysis.', 'duration': 66.911, 'highlights': ['Pandas is a data analysis library in Python that is essential for data science, being one of the most downloaded packages for Python and built on top of NumPy, ensuring great performance.', 'The series will cover how to install Pandas, download data, and open it in a Jupyter notebook for coding and analysis.', 'The importance of learning Pandas for data science is emphasized, as it allows easy reading and analysis of different types of data such as CSV files and Excel files.']}], 'duration': 66.911, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ZyhVh-qRZPA/pics/ZyhVh-qRZPA149.jpg', 'highlights': ['Pandas is a data analysis library in Python, essential for data science, one of the most downloaded packages for Python, built on top of NumPy, ensuring great performance.', 'The importance of learning Pandas for data science is emphasized, allowing easy reading and analysis of different types of data such as CSV files and Excel files.', 'The series will cover how to install Pandas, download data, and open it in a Jupyter notebook for coding and analysis.']}, {'end': 529.692, 'segs': [{'end': 112.549, 'src': 'embed', 'start': 82.507, 'weight': 1, 'content': [{'end': 88.468, 'text': "then I'll be sure to leave a link to my video on that topic in the description section below, if anyone is interested.", 'start': 82.507, 'duration': 5.961}, {'end': 91.529, 'text': "So it's really easy to install pandas here.", 'start': 88.888, 'duration': 2.641}, {'end': 102.439, 'text': 'All we need to do is say pip install pandas, and we will let this 
run through, and once we have pandas installed,', 'start': 91.929, 'duration': 10.51}, {'end': 106.484, 'text': "then let's also install Jupyter so that we can use Jupyter notebooks.", 'start': 102.439, 'duration': 4.045}, {'end': 112.549, 'text': 'Now I was a bit hesitant to use Jupyter for this series because some people find it difficult to get the hang of.', 'start': 107.426, 'duration': 5.123}], 'summary': 'Installing pandas and Jupyter using pip for data analysis tutorials.', 'duration': 30.042, 'max_score': 82.507, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ZyhVh-qRZPA/pics/ZyhVh-qRZPA82507.jpg'}, {'end': 200.102, 'src': 'embed', 'start': 168.515, 'weight': 0, 'content': [{'end': 174.822, 'text': "And I'll leave a link to that video in the description section below if anyone would like to learn more about the details of using that.", 'start': 168.515, 'duration': 6.307}, {'end': 178.685, 'text': 'Okay, so now we have pandas and Jupyter notebooks installed.', 'start': 175.342, 'duration': 3.343}, {'end': 182.788, 'text': "Now we're going to need to download the data that I'll be using for most of this series.", 'start': 178.685, 'duration': 4.103}, {'end': 185.53, 'text': "Now, for anyone who's been watching my latest videos,", 'start': 182.788, 'duration': 2.742}, {'end': 190.734, 'text': 'you know that I like to use the Stack Overflow Developer Survey for different kinds of data analysis.', 'start': 185.53, 'duration': 5.204}, {'end': 200.102, 'text': "Now, the reason I like to use this data is that it's real-world data and it has a lot of data in there that I think would be interesting to most people who are watching these types of videos.", 'start': 190.734, 'duration': 9.368}], 'summary': 'Using the Stack Overflow Developer Survey for real-world data analysis in the series.', 'duration': 31.587, 'max_score': 168.515, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ZyhVh-qRZPA/pics/ZyhVh-qRZPA168515.jpg'}, {'end': 242.063, 'src': 'embed', 'start': 206.385, 'weight': 2, 'content': [{'end': 207.285, 'text': 'So hopefully,', 'start': 206.385, 'duration': 0.9}, {'end': 217.61, 'text': "using this data will keep people interested and also give you a good idea of what it's like to actually download real data from a source and start analyzing it with pandas.", 'start': 207.285, 'duration': 10.325}, {'end': 222.652, 'text': 'So to download this data, I have this pulled up here in the browser.', 'start': 218.01, 'duration': 4.642}, {'end': 225.934, 'text': 'We can go over to the Stack Overflow survey results page.', 'start': 222.712, 'duration': 3.222}, {'end': 229.096, 'text': 'Now this is easy to find if you just Google it.', 'start': 226.374, 'duration': 2.722}, {'end': 234.379, 'text': "But just to keep things easy, I'll have a link to this download page in the description section as well.", 'start': 229.096, 'duration': 5.283}, {'end': 242.063, 'text': 'Okay, now on this page, you can download the data in CSV form for any year that they have available.', 'start': 235.279, 'duration': 6.784}], 'summary': 'Learn to download and analyze real data from the Stack Overflow survey results using pandas.', 'duration': 35.678, 'max_score': 206.385, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ZyhVh-qRZPA/pics/ZyhVh-qRZPA206385.jpg'}, {'end': 283.628, 'src': 'embed', 'start': 255.834, 'weight': 3, 'content': [{'end': 259.036, 'text': 'And this should go ahead and download this for us.', 'start': 255.834, 'duration': 3.202}, {'end': 259.997, 'text': 'OK, it did.', 'start': 259.116, 'duration': 0.881}, {'end': 266.903, 'text': "And now I'm going to open this in my Finder here and I'm going to unzip this data.", 'start': 260.017, 'duration': 6.886}, {'end': 268.284, 'text': 'It comes in a zip file.', 'start': 267.103, 'duration': 1.181}, {'end': 
276.226, 'text': "And once that data is downloaded and unzipped, I'm going to go ahead and drag that folder to a folder here on my desktop.", 'start': 268.664, 'duration': 7.562}, {'end': 280.347, 'text': "And that's where we'll also create a notebook and analyze this data.", 'start': 276.566, 'duration': 3.781}, {'end': 283.628, 'text': "So real quick, I don't have this open.", 'start': 280.887, 'duration': 2.741}], 'summary': 'Downloaded and unzipped the data and moved it into the project folder where the notebook will be created.', 'duration': 27.794, 'max_score': 255.834, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ZyhVh-qRZPA/pics/ZyhVh-qRZPA255834.jpg'}, {'end': 436.691, 'src': 'heatmap', 'start': 391.824, 'weight': 4, 'content': [{'end': 396.986, 'text': "And now within here, I'm going to navigate to my folder where I placed that data.", 'start': 391.824, 'duration': 5.162}, {'end': 400.847, 'text': 'And this should be the same command on Mac, Linux and Windows.', 'start': 397.566, 'duration': 3.281}, {'end': 407.889, 'text': "So I'm going to say cd, and I'm going to go to my desktop; this is going to be wherever your project directory is.", 'start': 400.867, 'duration': 7.022}, {'end': 410.871, 'text': 'But mine is in this pandas demo folder on my desktop.', 'start': 408.269, 'duration': 2.602}, {'end': 420.338, 'text': 'And once I have navigated to that directory, to start up a Jupyter notebook, we just need to say jupyter notebook and run that.', 'start': 411.532, 'duration': 8.806}, {'end': 422.94, 'text': 'And we should see a server start up here.', 'start': 420.659, 'duration': 2.281}, {'end': 425.062, 'text': "And it seems like it's taking a second.", 'start': 423.601, 'duration': 1.461}, {'end': 425.863, 'text': 'Okay, there we go.', 'start': 425.162, 'duration': 0.701}, {'end': 436.691, 'text': "Now back in our terminal here, this will run a Jupyter server and you will need to leave that terminal open while you're working in Jupyter.", 'start': 427.104, 'duration': 
9.587}], 'summary': 'Navigating to the project folder and starting a Jupyter notebook, with the same commands on Mac, Linux, and Windows.', 'duration': 44.867, 'max_score': 391.824, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ZyhVh-qRZPA/pics/ZyhVh-qRZPA391824.jpg'}, {'end': 509.138, 'src': 'embed', 'start': 460.188, 'weight': 5, 'content': [{'end': 467.073, 'text': 'Okay, so we can see our data folder here that we downloaded and placed in our Jupyter demo folder a little bit ago.', 'start': 460.188, 'duration': 6.885}, {'end': 469.674, 'text': "But now let's create a new notebook.", 'start': 467.773, 'duration': 1.901}, {'end': 473.637, 'text': "So, to create a new notebook, I'm going to click on New up here at the top right.", 'start': 470.034, 'duration': 3.603}, {'end': 480.903, 'text': "Then I'm going to use Python 3, and now we can name our notebook.", 'start': 474.197, 'duration': 6.706}, {'end': 489.57, 'text': "So up here, where it says Untitled, I'm going to click here and I'm just going to call this pandas demo and rename that. Okay.", 'start': 480.903, 'duration': 8.667}, {'end': 491.872, 'text': "So now we're ready to start using pandas.", 'start': 489.57, 'duration': 2.302}, {'end': 498.676, 'text': 'So we can import this by saying import pandas as pd.', 'start': 491.872, 'duration': 6.804}, {'end': 503.077, 'text': 'Now, importing pandas as pd is just a common convention when using pandas.', 'start': 498.676, 'duration': 4.401}, {'end': 504.617, 'text': "So let's run that.", 'start': 503.477, 'duration': 1.14}, {'end': 509.138, 'text': 'And I ran that cell by pressing Shift and Enter.', 'start': 505.457, 'duration': 3.681}], 'summary': "Creating a new notebook named 'pandas demo' and importing pandas as pd for use.", 'duration': 48.95, 'max_score': 460.188, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ZyhVh-qRZPA/pics/ZyhVh-qRZPA460188.jpg'}], 'start': 67.481, 'title': 'Setting up pandas and Jupyter for data analysis', 
'summary': 'Covers the installation of pandas and Jupyter for data analysis, including guidance on using a virtual environment, installing Python packages, and downloading real-world data from the Stack Overflow Developer Survey for analysis with pandas.', 'chapters': [{'end': 372.074, 'start': 67.481, 'title': 'Setting up pandas and Jupyter for data analysis', 'summary': 'Covers the installation of pandas and Jupyter for data analysis, including guidance on using a virtual environment, installing Python packages, and downloading real-world data from the Stack Overflow Developer Survey for analysis with pandas.', 'duration': 304.593, 'highlights': ['The chapter emphasizes the installation of pandas and Jupyter for data analysis, providing guidance on using a virtual environment and installing Python packages, including a demonstration of downloading real-world data from the Stack Overflow Developer Survey (2019) for analysis with pandas.', "The speaker demonstrates the installation process of pandas and Jupyter, highlighting the use of 'pip' to install the required packages, with a focus on using Jupyter for visualizing data and tables, emphasizing its ease of use and benefits for data visualization.", 'The speaker provides practical guidance on downloading real-world data from the Stack Overflow Developer Survey, emphasizing the relevance of using real-world data for analysis and providing a link to the survey results page for easy access to the data.', 'The chapter includes practical demonstrations of unzipping the downloaded data, organizing it into a project folder, and renaming the files for easy accessibility within the script, providing a comprehensive overview of the data directory and its contents for better understanding during analysis.']}, {'end': 529.692, 'start': 372.674, 'title': 'Opening and using Jupyter Notebook for data analysis', 'summary': 'Demonstrates how to open and use Jupyter Notebook for data analysis, including navigating to the project 
directory, starting a Jupyter notebook server, creating a new notebook, and importing pandas for data manipulation.', 'duration': 157.018, 'highlights': ['Starting a Jupyter notebook server', 'Creating a new notebook', 'Importing pandas for data manipulation', 'Navigating to the project directory']}], 'duration': 462.211, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ZyhVh-qRZPA/pics/ZyhVh-qRZPA67481.jpg', 'highlights': ['The chapter emphasizes the installation of pandas and Jupyter for data analysis, providing guidance on using a virtual environment and installing Python packages, including a demonstration of downloading real-world data from the Stack Overflow Developer Survey (2019) for analysis with pandas.', "The speaker demonstrates the installation process of pandas and Jupyter, highlighting the use of 'pip' to install the required packages, with a focus on using Jupyter for visualizing data and tables, emphasizing its ease of use and benefits for data visualization.", 'The speaker provides practical guidance on downloading real-world data from the Stack Overflow Developer Survey, emphasizing the relevance of using real-world data for analysis and providing a link to the survey results page for easy access to the data.', 'The chapter includes practical demonstrations of unzipping the downloaded data, organizing it into a project folder, and renaming the files for easy accessibility within the script, providing a comprehensive overview of the data directory and its contents for better understanding during analysis.', 'Starting a Jupyter notebook server', 'Creating a new notebook', 'Importing pandas for data manipulation', 'Navigating to the project directory']}, {'end': 766.609, 'segs': [{'end': 611.768, 'src': 'embed', 'start': 529.692, 'weight': 0, 'content': [{'end': 532.696, 'text': 'So our data is in CSV format.', 'start': 529.692, 'duration': 3.004}, {'end': 538.984, 'text': 'So in order to read in that CSV, we can simply 
say df, which is going to stand for data frame.', 'start': 532.696, 'duration': 6.288}, {'end': 541.085, 'text': "We'll learn all about data frames here in a bit.", 'start': 539.024, 'duration': 2.061}, {'end': 547.408, 'text': "We're going to say df is equal to pd.read_csv.", 'start': 541.425, 'duration': 5.983}, {'end': 551.75, 'text': "We're going to use the read_csv function from pandas here.", 'start': 547.808, 'duration': 3.942}, {'end': 555.452, 'text': 'And now we just want to pass in a path to our CSV file.', 'start': 552.211, 'duration': 3.241}, {'end': 564.677, 'text': 'Now, mine was within that data folder, in the file survey_results_public.csv.', 'start': 555.932, 'duration': 8.745}, {'end': 571.459, 'text': 'So now, if I hit Shift+Enter, then that will run that cell.', 'start': 565.937, 'duration': 5.522}, {'end': 575.34, 'text': 'So right off the bat, we can see that this is pretty simple to work with.', 'start': 571.939, 'duration': 3.401}, {'end': 584.563, 'text': 'So when using native Python in order to read in a CSV file, we need to use the csv module to create a CSV reader and things like that.', 'start': 575.72, 'duration': 8.843}, {'end': 587.384, 'text': "But here we're just doing this all in one line.", 'start': 584.943, 'duration': 2.441}, {'end': 591.748, 'text': "So when it reads this in, it's going to read it in as a data frame.", 'start': 587.924, 'duration': 3.824}, {'end': 595.411, 'text': 'So data frames are pretty much the backbone of pandas.', 'start': 592.168, 'duration': 3.243}, {'end': 602.478, 'text': "And we'll go over data frames and series objects in depth in the next video.", 'start': 595.732, 'duration': 6.746}, {'end': 607.824, 'text': 'But for the basics, a data frame is basically just rows and columns of data.', 'start': 602.899, 'duration': 4.925}, {'end': 611.768, 'text': 'We can see what a data frame looks like just by printing it out.', 'start': 608.264, 'duration': 
3.504}], 'summary': 'Using pandas, df = pd.read_csv(...) simplifies reading a CSV file into a data frame.', 'duration': 82.076, 'max_score': 529.692, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ZyhVh-qRZPA/pics/ZyhVh-qRZPA529692.jpg'}, {'end': 740.556, 'src': 'embed', 'start': 688.332, 'weight': 3, 'content': [{'end': 694.276, 'text': 'And shape gives us the number of rows and columns in tuple form.', 'start': 688.332, 'duration': 5.944}, {'end': 696.077, 'text': "So let's look at this.", 'start': 694.957, 'duration': 1.12}, {'end': 702.902, 'text': "So in our next cell down here, I'm going to say df.shape, and I will run that.", 'start': 696.438, 'duration': 6.464}, {'end': 705.384, 'text': 'Now, this is an attribute here.', 'start': 703.322, 'duration': 2.062}, {'end': 708.065, 'text': "It's not a method, so you don't want to put parentheses.", 'start': 705.584, 'duration': 2.481}, {'end': 714.63, 'text': 'So df.shape, and we can see that we have 88,000 rows and 85 columns.', 'start': 708.506, 'duration': 6.124}, {'end': 723.338, 'text': 'Now if you wanted a bit more information, then we can use the info method.', 'start': 718.252, 'duration': 5.086}, {'end': 730.226, 'text': 'The info method will give us the number of rows and columns and also all of the data types of all the columns as well.', 'start': 723.639, 'duration': 6.587}, {'end': 736.654, 'text': 'Now, before I run that, it looks like my text is getting cut off here a little bit.', 'start': 731.007, 'duration': 5.647}, {'end': 740.556, 'text': "Sometimes this happens whenever I'm within Jupyter.", 'start': 737.574, 'duration': 2.982}], 'summary': 'The dataset contains 88,000 rows and 85 columns. 
The info method also provides the column data types.', 'duration': 52.224, 'max_score': 688.332, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ZyhVh-qRZPA/pics/ZyhVh-qRZPA688332.jpg'}], 'start': 529.692, 'title': 'Reading CSV files with pandas and visualizing data frames in Jupyter', 'summary': 'Covers the process of reading CSV files into a data frame using pandas, highlighting the comparison with using native Python and the csv module. It also introduces the concept of data frames as the backbone of pandas, discusses the advantages of visualizing data frames in Jupyter notebooks, and demonstrates the use of attributes and methods like shape and info to analyze data, including an example of a large data frame with 88,000 rows and 85 columns.', 'chapters': [{'end': 584.563, 'start': 529.692, 'title': 'Reading CSV files with pandas', 'summary': 'Covers the process of reading CSV files into a data frame using pandas, demonstrating the simplicity of this method and highlighting the comparison with using native Python and the csv module.', 'duration': 54.871, 'highlights': ["Pandas' pd.read_csv function simplifies reading CSV files into a data frame, offering efficiency and ease of use.", 'Using native Python to read CSV files requires the csv module, involving more complex steps compared to the straightforward approach with Pandas.']}, {'end': 766.609, 'start': 584.943, 'title': 'Visualizing data frames in Jupyter', 'summary': 'Introduces the concept of data frames as the backbone of pandas, discusses the advantages of visualizing data frames in Jupyter notebooks, and demonstrates the use of attributes and methods like shape and info to analyze data, including an example of a data frame with 88,000 rows and 85 columns.', 'duration': 181.666, 'highlights': ['The chapter introduces the concept of data frames as the backbone of pandas, discussing the advantages of visualizing data frames in Jupyter notebooks.', 'The shape 
attribute provides the number of rows and columns in tuple form, as demonstrated with an example of a data frame with 88,000 rows and 85 columns.', 'The info method gives the number of rows and columns and also all of the data types of all the columns, providing comprehensive information about the data frame.']}], 'duration': 236.917, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ZyhVh-qRZPA/pics/ZyhVh-qRZPA529692.jpg', 'highlights': ["Pandas' pd.read_csv function simplifies reading CSV files into a data frame, offering efficiency and ease of use.", 'Using native Python to read CSV files requires the csv module, involving more complex steps compared to the straightforward approach with Pandas.', 'The chapter introduces the concept of data frames as the backbone of pandas, discussing the advantages of visualizing data frames in Jupyter notebooks.', 'The info method gives the number of rows and columns and also all of the data types of all the columns, providing comprehensive information about the data frame.', 'The shape attribute provides the number of rows and columns in tuple form, as demonstrated with an example of a data frame with 88,000 rows and 85 columns.']}, {'end': 1057.71, 'segs': [{'end': 793.183, 'src': 'embed', 'start': 766.749, 'weight': 0, 'content': [{'end': 770.63, 'text': "So it's kind of messing with how these look, but now we can see these just fine.", 'start': 766.749, 'duration': 3.881}, {'end': 775.973, 'text': 'Okay, so like I was saying, we can see here that we have 88,883 rows and 85 columns.', 'start': 771.511, 'duration': 4.462}, {'end': 778.634, 'text': 'Now, if you wanted more information, then we can use the info method.', 'start': 775.993, 'duration': 2.641}, {'end': 791.662, 'text': 'And that will give us the number of rows and the number of columns, but also all of the data types of the columns.', 'start': 785.157, 'duration': 6.505}, {'end': 793.183, 'text': "So let's run that.", 
'start': 792.223, 'duration': 0.96}], 'summary': 'Dataset contains 88,883 rows and 85 columns. The info method provides the data types of the columns.', 'duration': 26.434, 'max_score': 766.749, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ZyhVh-qRZPA/pics/ZyhVh-qRZPA766749.jpg'}, {'end': 852.554, 'src': 'embed', 'start': 825.588, 'weight': 2, 'content': [{'end': 829.051, 'text': 'Now it also gives us the data types of each of these columns.', 'start': 825.588, 'duration': 3.463}, {'end': 832.095, 'text': "And we're going to go over data types in a future video.", 'start': 829.372, 'duration': 2.723}, {'end': 837, 'text': 'But for the most part, objects usually mean strings.', 'start': 832.955, 'duration': 4.045}, {'end': 839.863, 'text': 'And then we have other things as well.', 'start': 837.8, 'duration': 2.063}, {'end': 842.225, 'text': 'So int64 is just an integer.', 'start': 839.923, 'duration': 2.302}, {'end': 844.587, 'text': 'Float is a float.', 'start': 842.886, 'duration': 1.701}, {'end': 846.99, 'text': 'So probably a decimal number.', 'start': 845.088, 'duration': 1.902}, {'end': 852.554, 'text': 'and there are no other data types in this dataset.', 'start': 848.191, 'duration': 4.363}], 'summary': 'Data types of columns: object (strings), int64 (integers), float64 (decimal numbers)', 'duration': 26.966, 'max_score': 825.588, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ZyhVh-qRZPA/pics/ZyhVh-qRZPA825588.jpg'}, {'end': 928.784, 'src': 'heatmap', 'start': 877.796, 'weight': 3, 'content': [{'end': 884.064, 'text': "So to do this, we can change a setting, and I'm going to come down here to the bottom here,", 'start': 877.796, 'duration': 6.268}, {'end': 888.55, 'text': "and I'm going to change a setting by saying pd.set_option,", 'start': 884.064, 'duration': 4.486}, {'end': 890.492, 'text': 'and within', 'start': 889.691, 'duration': 0.801}, {'end': 903.423, 'text': 'here I will say display.max_columns 
and I will set that equal to 85 so that we can see all of our columns.', 'start': 890.492, 'duration': 12.931}, {'end': 905.164, 'text': 'And I will run that.', 'start': 903.423, 'duration': 1.741}, {'end': 915.633, 'text': "And now, if we print out our data frame, I'm going to go back up here to where we printed out this data frame and I will rerun that cell.", 'start': 905.164, 'duration': 10.469}, {'end': 923.48, 'text': 'And now, if I scroll through these columns, then we can see that we actually have these 85 different columns here.', 'start': 915.633, 'duration': 7.847}, {'end': 928.784, 'text': "So I can keep scrolling and keep scrolling and it didn't just chop us off at that 20 like it was before.", 'start': 923.56, 'duration': 5.224}], 'summary': 'Changed a display setting to show all 85 columns of the data frame.', 'duration': 27.368, 'max_score': 877.796, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ZyhVh-qRZPA/pics/ZyhVh-qRZPA877796.jpg'}, {'end': 975.622, 'src': 'embed', 'start': 949.796, 'weight': 4, 'content': [{'end': 957.418, 'text': 'file that was included in our download gives the matching questions for all of these column names here.', 'start': 949.796, 'duration': 7.622}, {'end': 967.22, 'text': 'So if we wanted to see what these column names here mean for this data, then we can load in that schema CSV file as well.', 'start': 957.958, 'duration': 9.262}, {'end': 968.66, 'text': 'So let me do this.', 'start': 967.88, 'duration': 0.78}, {'end': 975.622, 'text': "I'll go down to the bottom of our notebook and I will just load this in by saying schema_df.", 'start': 968.7, 'duration': 6.922}], 'summary': 'The provided file gives matching questions for all column names, allowing for easy interpretation of the data.', 'duration': 25.826, 'max_score': 949.796, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ZyhVh-qRZPA/pics/ZyhVh-qRZPA949796.jpg'}, 
{'end': 1024.568, 'src': 'heatmap', 'start': 988.405, 'weight': 0.871, 'content': [{'end': 998.827, 'text': 'And this is within the data folder, and this was called survey_results_schema.csv.', 'start': 988.405, 'duration': 10.422}, {'end': 1000.507, 'text': 'So I will run this.', 'start': 998.827, 'duration': 1.68}, {'end': 1007.372, 'text': "And now let's look at this schema data frame that we just loaded in.", 'start': 1000.507, 'duration': 6.865}, {'end': 1012.336, 'text': 'So here, in this Column column,', 'start': 1007.372, 'duration': 4.964}, {'end': 1016.921, 'text': 'this gives us all of the columns in our other data frame.', 'start': 1012.336, 'duration': 4.585}, {'end': 1024.568, 'text': "So we have Respondent, MainBranch, Hobbyist, and if I scroll up to that data frame here, I'm going to delete this info here,", 'start': 1016.921, 'duration': 7.647}], 'summary': 'Loaded survey_results_schema.csv and examined the schema data frame, which lists columns like Respondent, MainBranch, and Hobbyist.', 'duration': 36.163, 'max_score': 988.405, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ZyhVh-qRZPA/pics/ZyhVh-qRZPA988405.jpg'}], 'start': 766.749, 'title': 'Analyzing CSV data and data types in data frames', 'summary': 'Covers analyzing a CSV file with 88,883 rows and 85 columns using the df.info method, and discusses changing settings to view all 85 columns in a data frame and loading a schema data frame to understand what specific column names mean.', 'chapters': [{'end': 852.554, 'start': 766.749, 'title': 'Analyzing CSV data', 'summary': 'Covers analyzing a CSV file with 88,883 rows and 85 columns, demonstrating the use of the df.info method to obtain data type information.', 'duration': 85.805, 'highlights': ['The dataset contains 88,883 rows and 85 columns.', 'Demonstrates the use of the df.info method to obtain data type information.', 'Introduces the concept of data types and their 
representation in the dataset, including object, int64, and float64.']}, {'end': 1057.71, 'start': 852.935, 'title': 'Data types and columns in data frames', 'summary': 'Discusses changing the pandas display settings in Jupyter to view all 85 columns in a data frame, as well as loading a schema data frame to understand the meanings of specific column names and examples of certain data sets.', 'duration': 204.775, 'highlights': ['Changing the pandas display settings in Jupyter to view all 85 columns in a data frame', 'Loading a schema data frame to understand the meanings of specific column names']}], 'duration': 290.961, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ZyhVh-qRZPA/pics/ZyhVh-qRZPA766749.jpg', 'highlights': ['The dataset contains 88,883 rows and 85 columns.', 'Demonstrates the use of the df.info method to obtain data type information.', 'Introduces the concept of data types and their representation in the dataset.', 'Changing the pandas display settings in Jupyter to view all 85 columns in a data frame.', 'Loading a schema data frame to understand the meanings of specific column names.']}, {'end': 1367.979, 'segs': [{'end': 1122.41, 'src': 'heatmap', 'start': 1057.71, 'weight': 0, 'content': [{'end': 1060.132, 'text': 'and I will be showing you how to do that in the next video.', 'start': 1057.71, 'duration': 2.422}, {'end': 1069.298, 'text': "but for now we can see that we can't see all of the rows to the questions that correlate to each column name.", 'start': 1060.972, 'duration': 8.326}, {'end': 1080.266, 'text': 'here remember, we have 85 columns, but for here we can only see the first five, and then we get this ellipsis here and then we can see the last five.', 'start': 1069.298, 'duration': 10.968}, {'end': 1087.812, 'text': "so let's set this up so that we can view 85 rows and then reprint this so that we can see all of these.", 'start': 1080.266, 'duration': 7.546}, {'end': 1098.558, 'text': "So, back in the same cell where 
we set our max columns now let's also add one for rows as well.", 'start': 1088.772, 'duration': 9.786}, {'end': 1100.179, 'text': "So I'm just going to copy and paste that.", 'start': 1098.738, 'duration': 1.441}, {'end': 1106.402, 'text': "But instead of max columns here, I'm going to have this be max rows and I will run that.", 'start': 1100.559, 'duration': 5.843}, {'end': 1109.704, 'text': 'And now we will rerun this schema.', 'start': 1106.982, 'duration': 2.722}, {'end': 1112.565, 'text': 'here and now we can see that.', 'start': 1110.524, 'duration': 2.041}, {'end': 1116.987, 'text': 'we can see all of the columns and the corresponding question text.', 'start': 1112.565, 'duration': 4.422}, {'end': 1122.41, 'text': 'so if you wanted to know what any of these columns mean, then this is how we do it.', 'start': 1116.987, 'duration': 5.423}], 'summary': 'Set display.max_rows to 85 so that all 85 rows of the schema data frame (one per survey column) are shown.', 'duration': 54.855, 'max_score': 1057.71, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ZyhVh-qRZPA/pics/ZyhVh-qRZPA1057710.jpg'}, {'end': 1204.69, 'src': 'embed', 'start': 1175.391, 'weight': 2, 'content': [{'end': 1179.713, 'text': 'But there are a couple of methods that we can use to only see a certain number of rows.', 'start': 1175.391, 'duration': 4.322}, {'end': 1187.478, 'text': "which you'll most likely use a lot just to get an idea that your filters and data frames seem to be working correctly.", 'start': 1180.613, 'duration': 6.865}, {'end': 1195.364, 'text': 'So we can see the first five rows by saying, instead of doing a df here, we can say df.head.', 'start': 1188.018, 'duration': 7.346}, {'end': 1199.887, 'text': 'And if I run that, then we just get the first five rows here.', 'start': 1195.764, 'duration': 4.123}, {'end': 1204.69, 'text': 'Okay, and you can pass in a value if you want to see a certain number of values.', 'start': 1200.507, 'duration': 4.183}], 'summary': 'Methods like df.head can 
be used to see a certain number of rows, e.g. the first five rows, in a data frame.', 'duration': 29.299, 'max_score': 1175.391, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ZyhVh-qRZPA/pics/ZyhVh-qRZPA1175391.jpg'}, {'end': 1278.7, 'src': 'embed', 'start': 1253.408, 'weight': 3, 'content': [{'end': 1258.65, 'text': "Now before we end here, I'd like to mention the sponsor of this video and that is Brilliant.org.", 'start': 1253.408, 'duration': 5.242}, {'end': 1264.012, 'text': "So in this series, we've been learning about pandas and how to analyze data in Python.", 'start': 1259.89, 'duration': 4.122}, {'end': 1268.435, 'text': 'And Brilliant would be an excellent way to supplement what you learn here with their hands-on courses.', 'start': 1264.393, 'duration': 4.042}, {'end': 1273.998, 'text': 'They have some excellent courses and lessons that do a deep dive on how to think about and analyze data correctly.', 'start': 1268.855, 'duration': 5.143}, {'end': 1278.7, 'text': 'For data analysis fundamentals, I would really recommend checking out their statistics course,', 'start': 1274.358, 'duration': 4.342}], 'summary': "Learn data analysis with Brilliant.org's statistics course.", 'duration': 25.292, 'max_score': 1253.408, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ZyhVh-qRZPA/pics/ZyhVh-qRZPA1253408.jpg'}], 'start': 1057.71, 'title': 'Adjusting data display and pandas overview', 'summary': 'Covers adjusting the display of rows and columns in a dataset, including an example of showing all 85 columns of the survey data and all 85 rows of the schema, and provides an overview of using pandas for data analysis, mentioning methods such as df.head and df.tail for viewing a limited number of rows, as well as sponsor details.', 'chapters': [{'end': 1112.565, 'start': 1057.71, 'title': 'Adjusting rows and columns in data display', 'summary': 'Demonstrates how to adjust the display of rows and columns in a dataset, with an example of setting the display to show all 
85 columns of the survey data and all 85 rows of the schema.', 'duration': 54.855, 'highlights': ['By setting display.max_columns and display.max_rows to 85, the chapter shows how to display all 85 columns of the survey data frame and all 85 rows of the schema data frame, instead of only the first and last five.', 'The demonstration applies these display settings in practice, making all 85 survey columns and all 85 schema rows visible during data exploration.']}, {'end': 1367.979, 'start': 1112.565, 'title': 'Pandas data analysis overview', 'summary': 'Provides an introduction to using pandas for data analysis, including methods for viewing a limited number of rows, and also includes a mention of a sponsor and how to support the channel.', 'duration': 255.414, 'highlights': ['The chapter provides an introduction to using pandas for data analysis.', 'The methods for viewing a limited number of rows include df.head and df.tail.', 'A mention of the sponsor, Brilliant.org, is included with a recommendation for their statistics and machine learning courses.']}], 'duration': 310.269, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/ZyhVh-qRZPA/pics/ZyhVh-qRZPA1057710.jpg', 'highlights': ['By setting display.max_columns and display.max_rows to 85, the chapter shows how to display all 85 columns of the survey data frame and all 85 rows of the schema data frame.', 'The demonstration applies these display settings in practice, making all 85 survey columns and all 85 schema rows visible.', 'The methods for viewing a limited number of rows include df.head and df.tail.', 'The chapter provides an introduction to using pandas for data analysis.', 'A mention of the sponsor, Brilliant.org, is included with a recommendation for their statistics and machine learning courses.']}], 'highlights': ['Pandas is a data analysis library in Python, 
essential for data science, most downloaded package for Python, built on top of NumPy, ensuring great performance.', 'The importance of learning Pandas for data science is emphasized, allowing easy reading and analysis of different types of data such as CSV files and Excel files.', 'The series will cover how to install Pandas, download data, and open it in a Jupyter notebook for coding and analysis.', 'The chapter emphasizes the installation of pandas and Jupyter for data analysis, providing guidance on using a virtual environment and installing Python packages, including a demonstration of downloading real-world data from the Stack Overflow developer survey (2019) for analysis with pandas.', "The speaker demonstrates the installation process of pandas and Jupyter, highlighting the use of 'pip' to install the required packages, with a focus on using Jupyter for visualizing data and tables, emphasizing its ease of use and benefits for data visualization.", 'The speaker provides practical guidance on downloading real-world data from the Stack Overflow developer survey, emphasizing the relevance of using real-world data for analysis and providing a link to the survey results page for easy access to the data.', 'The chapter includes practical demonstrations of unzipping the downloaded data, organizing it into a project folder, and renaming the files for easy accessibility within the script, providing a comprehensive overview of the data directory and its contents for better understanding during analysis.', "Pandas' pd.read_csv function simplifies reading CSV files into a data frame, offering efficiency and ease of use.", 'Using native Python to read CSV files requires the csv module and involves more steps than the straightforward approach with Pandas.', 'The chapter introduces the concept of data frames as the backbone of pandas, discussing the advantages of visualizing data frames in Jupyter notebooks.', 'The info method gives the number of rows 
and columns and the data type of every column, providing comprehensive information about the data frame.', 'The shape attribute provides the number of rows and columns as a tuple, as demonstrated with a data frame of 88,883 rows and 85 columns.', 'The dataset contains 88,883 rows and 85 columns.', 'Demonstrates the use of the df.info method to obtain data type information.', 'Introduces the concept of data types and their representation in the dataset.', 'Changing the pandas display settings in Jupyter to view all 85 columns in a data frame.', 'By setting display.max_columns and display.max_rows to 85, the chapter shows how to display all 85 columns of the survey data frame and all 85 rows of the schema data frame.', 'The demonstration applies these display settings in practice, making all 85 survey columns and all 85 schema rows visible.', 'The methods for viewing a limited number of rows include df.head and df.tail.', 'The chapter provides an introduction to using pandas for data analysis.', 'A mention of the sponsor, Brilliant.org, is included with a recommendation for their statistics and machine learning courses.']}
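The display options and preview methods covered in this segment can be sketched as follows. This is a minimal sketch and assumes pandas is installed. The option names display.max_columns and display.max_rows and the head and tail methods are real pandas APIs. However, schema_df below is a hypothetical stand-in built in code, with placeholder column names and question text, so the example runs without the Stack Overflow survey download. With the real files, you would load data/survey_results_schema.csv with pd.read_csv, as in the transcript.

```python
import pandas as pd

# Raise pandas' display limits so wide or long frames are not cut off with "...".
# 85 matches the survey's column count and the schema's row count described above.
pd.set_option('display.max_columns', 85)
pd.set_option('display.max_rows', 85)

# Hypothetical stand-in for the schema file: one row per survey column name
# plus its question text. With the real download you would instead use:
# schema_df = pd.read_csv('data/survey_results_schema.csv')
schema_df = pd.DataFrame({
    'Column': [f'Col{i}' for i in range(85)],
    'QuestionText': [f'Question text for column {i}' for i in range(85)],
})

# Preview only part of a frame, e.g. to sanity-check that a filter worked.
first_five = schema_df.head()    # first 5 rows by default
first_ten = schema_df.head(10)   # pass n to choose how many rows
last_five = schema_df.tail()     # last 5 rows by default

print(schema_df.shape)  # (85, 2)
print(schema_df)        # 85 rows do not exceed max_rows, so nothing is truncated
```

These options only change how pandas prints a data frame, not the data itself. Because the limit is 85, the 85-row schema frame prints in full. The 88,883-row survey frame would still be truncated to its first and last few rows.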