title
Intro To Web Scraping With Python

description
In this video we will look at web scraping using Python and the BeautifulSoup library. This is an introductory level tutorial. All beginners welcome Final Code Gist: https://gist.github.com/bradtraversy/f2014a236646ff62dccfc9fe5d469ed5 💖 Become a Patron: Show support & get perks! http://www.patreon.com/traversymedia Website & Udemy Courses http://www.traversymedia.com Follow Traversy Media: http://www.facebook.com/traversymedia http://www.twitter.com/traversymedia http://www.instagram.com/traversymedia

detail
{'title': 'Intro To Web Scraping With Python', 'heatmap': [{'end': 141.129, 'start': 119.617, 'weight': 0.821}, {'end': 238.591, 'start': 186.127, 'weight': 0.832}, {'end': 1072.173, 'start': 1052.173, 'weight': 0.768}, {'end': 1194.682, 'start': 1175.366, 'weight': 0.702}], 'summary': "'intro to web scraping with python' covers web scraping basics, using python and beautiful soup, with methods like find, find all, and select for extracting titles, links, and dates from a sample blog website, and saving them into a csv file.", 'chapters': [{'end': 167.375, 'segs': [{'end': 43.614, 'src': 'embed', 'start': 7.116, 'weight': 0, 'content': [{'end': 8.037, 'text': "Hey, what's going on, guys?", 'start': 7.116, 'duration': 0.921}, {'end': 15.441, 'text': "in this video, we're going to take a look at web scraping and we're going to be using Python, along with a library called Beautiful Soup,", 'start': 8.037, 'duration': 7.404}, {'end': 16.842, 'text': "and we're going to do two things.", 'start': 15.441, 'duration': 1.401}, {'end': 25.427, 'text': "First, we're going to put just a very simple HTML page or HTML structure directly into a variable, like they're doing here in the documentation,", 'start': 16.942, 'duration': 8.485}, {'end': 31.031, 'text': "and we're going to go through some of the methods and some of the ways to basically pull things out of that structure.", 'start': 25.427, 'duration': 5.604}, {'end': 36.032, 'text': 'So they have methods like find and find all, select, find parent.', 'start': 31.451, 'duration': 4.581}, {'end': 43.614, 'text': "And a lot of this stuff, even if you're not familiar with Python, I know a lot of you guys are JavaScript developers, a lot of my subscribers.", 'start': 36.512, 'duration': 7.102}], 'summary': 'Using python and beautiful soup library for web scraping, demonstrating methods like find, find all, select, and find parent.', 'duration': 36.498, 'max_score': 7.116, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/4UcqECQe5Kc/pics/4UcqECQe5Kc7116.jpg'}, {'end': 119.117, 'src': 'embed', 'start': 86.785, 'weight': 1, 'content': [{'end': 87.805, 'text': 'I want to loop through them.', 'start': 86.785, 'duration': 1.02}, {'end': 92.908, 'text': 'I want to get the title, the link that the title goes to, as well as the date.', 'start': 87.825, 'duration': 5.083}, {'end': 95.829, 'text': 'And I want to put them into a CSV file.', 'start': 93.368, 'duration': 2.461}, {'end': 99.611, 'text': "OK, so we're going to do that using the Python CSV module.", 'start': 95.849, 'duration': 3.762}, {'end': 102.672, 'text': 'And then you could take that file and you could just.', 'start': 100.531, 'duration': 2.141}, {'end': 109.394, 'text': 'you could view it with a spreadsheet, with like Excel or something, or you could import it into a database or you could do whatever you want with it.', 'start': 102.672, 'duration': 6.722}, {'end': 111.354, 'text': "So that's what we'll be doing.", 'start': 110.254, 'duration': 1.1}, {'end': 119.117, 'text': "So I'm going to close or minimize this and I have VS code open with a file called web scraping dot pi.", 'start': 111.474, 'duration': 7.643}], 'summary': 'Using python csv module to extract and store title, link, and date into a csv file for further manipulation.', 'duration': 32.332, 'max_score': 86.785, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/4UcqECQe5Kc/pics/4UcqECQe5Kc86785.jpg'}, {'end': 154.412, 'src': 'heatmap', 'start': 119.617, 'weight': 2, 'content': [{'end': 126.559, 'text': 'And as far as Python just install Python 3 as you can see I have it installed for say Python 3.', 'start': 119.617, 'duration': 6.942}, {'end': 127.84, 'text': 'dash dash version.', 'start': 126.559, 'duration': 1.281}, {'end': 136.486, 'text': 'dive three, point seven, point zero, and then i have to pick three dash dash version, and i have that installed as well.', 'start': 127.84, 'duration': 8.646}, {'end': 141.129, 'text': "so just go to python dot org, or if you're on a mac, you can use homebrew to install it.", 'start': 136.486, 'duration': 4.643}, {'end': 143.509, 'text': 'uh, whatever you want to do.', 'start': 142.029, 'duration': 1.48}, {'end': 149.011, 'text': 'so in this file i have an html underscore doc variable and i have it set to just a very,', 'start': 143.509, 'duration': 5.502}, {'end': 154.412, 'text': 'very simple web page with head body tags and then two divs in the body.', 'start': 149.011, 'duration': 5.401}], 'summary': "Install python 3.7.0 and 'dash' version for mac using homebrew or python.org.", 'duration': 34.795, 'max_score': 119.617, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/4UcqECQe5Kc/pics/4UcqECQe5Kc119617.jpg'}], 'start': 7.116, 'title': 'Web scraping with python', 'summary': 'Covers web scraping using python and beautiful soup, with methods like find, find all, select, and find parent. it also involves scraping a sample blog website to extract titles, links, and dates, and saving them into a csv file with python csv module.', 'chapters': [{'end': 167.375, 'start': 7.116, 'title': 'Web scraping with python', 'summary': 'Covers web scraping using python and beautiful soup, including methods like find, find all, select, and find parent, as well as scraping a sample blog website to extract titles, links, and dates and saving them into a csv file with python csv module.', 'duration': 160.259, 'highlights': ['The chapter covers web scraping using Python and Beautiful Soup, including methods like find, find all, select, and find parent. The video discusses web scraping using Python and Beautiful Soup, highlighting the methods like find, find all, select, and find parent for pulling data from an HTML structure.', 'Scraping a sample blog website to extract titles, links, and dates and saving them into a CSV file with Python CSV module. The chapter demonstrates scraping a sample blog website to extract titles, links, and dates, and then saving them into a CSV file using the Python CSV module.', 'Using Python 3 and installing it from python.org or Homebrew on Mac. The author mentions using Python 3, installing it from python.org, or using Homebrew on Mac for Python installation.']}], 'duration': 160.259, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/4UcqECQe5Kc/pics/4UcqECQe5Kc7116.jpg', 'highlights': ['The chapter covers web scraping using Python and Beautiful Soup, including methods like find, find all, select, and find parent.', 'Scraping a sample blog website to extract titles, links, and dates and saving them into a CSV file with Python CSV module.', 'Using Python 3 and installing it from python.org or Homebrew on Mac.']}, {'end': 459.994, 'segs': [{'end': 242.352, 'src': 'heatmap', 'start': 168.155, 'weight': 1, 'content': [{'end': 173.459, 'text': "So this is what we're going to use to just kind of experiment with the different methods and the way of selecting things.", 'start': 168.155, 'duration': 5.304}, {'end': 176.701, 'text': 'So first things first, we need to install Beautiful Soup.', 'start': 174.059, 'duration': 2.642}, {'end': 177.662, 'text': "So we're going to use pip.", 'start': 176.721, 'duration': 0.941}, {'end': 179.723, 'text': "So I'm going to say pip3 install.", 'start': 177.802, 'duration': 1.921}, {'end': 185.207, 'text': 'It may just be pip on your system, depending on how you installed Python.', 'start': 180.023, 'duration': 5.184}, {'end': 188.589, 'text': "So let's say install bs4.", 'start': 186.127, 'duration': 2.462}, {'end': 189.85, 'text': "That's the name of the package.", 'start': 188.629, 'duration': 1.221}, {'end': 193.512, 'text': "And mine's going to go really quick because I already have it installed.", 'start': 190.39, 'duration': 3.122}, {'end': 195.654, 'text': 'Whoops, let me just clear that out.', 'start': 193.532, 'duration': 2.122}, {'end': 198.458, 'text': 'So we need to bring that in.', 'start': 196.977, 'duration': 1.481}, {'end': 210.102, 'text': "So at the top of the file, let's say from BS4, we want to import beautiful soup and then such a weird name.", 'start': 198.918, 'duration': 11.184}, {'end': 213.383, 'text': 'And then down here we want to initialize it.', 'start': 210.762, 'duration': 2.621}, {'end': 216.984, 'text': "So I'm going to create a variable called soup and set it to beautiful soup.", 'start': 213.403, 'duration': 3.581}, {'end': 223.386, 'text': "then the first parameter is going to be what to scrape, and in this case it's just a local variable.", 'start': 217.784, 'duration': 5.602}, {'end': 228.648, 'text': "when we get to scraping the website, it will make a request and we'll get a response, and that's what we'll put in.", 'start': 223.386, 'duration': 5.262}, {'end': 232.189, 'text': "but for now it's just going to be this variable.", 'start': 228.648, 'duration': 3.541}, {'end': 238.591, 'text': 'and then, second parameter, is going to just be HTML dot parser inside of a string.', 'start': 232.189, 'duration': 6.402}, {'end': 242.352, 'text': 'alright, now we can select things directly.', 'start': 238.591, 'duration': 3.761}], 'summary': 'Experimenting with beautiful soup for selecting and scraping content from websites.', 'duration': 74.197, 'max_score': 168.155, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/4UcqECQe5Kc/pics/4UcqECQe5Kc168155.jpg'}, {'end': 270.011, 'src': 'embed', 'start': 217.784, 'weight': 0, 'content': [{'end': 223.386, 'text': "then the first parameter is going to be what to scrape, and in this case it's just a local variable.", 'start': 217.784, 'duration': 5.602}, {'end': 228.648, 'text': "when we get to scraping the website, it will make a request and we'll get a response, and that's what we'll put in.", 'start': 223.386, 'duration': 5.262}, {'end': 232.189, 'text': "but for now it's just going to be this variable.", 'start': 228.648, 'duration': 3.541}, {'end': 238.591, 'text': 'and then, second parameter, is going to just be HTML dot parser inside of a string.', 'start': 232.189, 'duration': 6.402}, {'end': 242.352, 'text': 'alright, now we can select things directly.', 'start': 238.591, 'duration': 3.761}, {'end': 251.492, 'text': 'so if I were to print out soup dot body and go over here and run the file now to run it,', 'start': 242.352, 'duration': 9.14}, {'end': 259.363, 'text': 'you just want to do run python in my case python three and then the name of the file, which is web scraping.', 'start': 251.492, 'duration': 7.871}, {'end': 263.047, 'text': "that's what i called it dot pi, And it gives us the body.", 'start': 259.363, 'duration': 3.684}, {'end': 267.249, 'text': 'OK, so soup dot body just gives us the opening body to the closing.', 'start': 263.067, 'duration': 4.182}, {'end': 270.011, 'text': 'Now we can also grab like the head.', 'start': 267.829, 'duration': 2.182}], 'summary': 'Introduction to web scraping using python, with example code demonstrating how to scrape and parse html content.', 'duration': 52.227, 'max_score': 217.784, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/4UcqECQe5Kc/pics/4UcqECQe5Kc217784.jpg'}, {'end': 437.599, 'src': 'embed', 'start': 401.167, 'weight': 3, 'content': [{'end': 417.278, 'text': "we want to select stuff by like IDs and classes, so we could do L equals, soup dot, find ID equals, let's do section dash, section one,", 'start': 401.167, 'duration': 16.111}, {'end': 418.319, 'text': 'and if we run that.', 'start': 417.278, 'duration': 1.041}, {'end': 420.821, 'text': 'We get section one.', 'start': 419.66, 'duration': 1.161}, {'end': 423.823, 'text': "Alright, now let's say we want to do it by a class.", 'start': 420.841, 'duration': 2.982}, {'end': 431.969, 'text': "So I'm going to say class, now the ul has a class of items, so I'm going to grab that, class equals items.", 'start': 423.843, 'duration': 8.126}, {'end': 437.599, 'text': "so let's go ahead and let's run that.", 'start': 433.576, 'duration': 4.023}], 'summary': 'Selecting elements by ids and classes using python beautiful soup.', 'duration': 36.432, 'max_score': 401.167, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/4UcqECQe5Kc/pics/4UcqECQe5Kc401167.jpg'}], 'start': 168.155, 'title': 'Web scraping basics', 'summary': 'Covers web scraping basics using python, including beautiful soup installation, package import, and initialization for web scraping. additionally, it covers selecting elements directly, using find and find all methods, and selecting elements by ids, classes, and attribute values.', 'chapters': [{'end': 216.984, 'start': 168.155, 'title': 'Web scraping with beautiful soup', 'summary': 'Covers the installation of beautiful soup using pip, importing the package, and initializing it for web scraping.', 'duration': 48.829, 'highlights': ['The chapter covers the installation of Beautiful Soup using pip, importing the package, and initializing it for web scraping.', 'The installation of Beautiful Soup using pip, with a mention of using pip3 or pip depending on the system, is explained.', 'The process of importing the Beautiful Soup package and initializing it for web scraping is demonstrated.']}, {'end': 459.994, 'start': 217.784, 'title': 'Web scraping basics', 'summary': 'Covers the basics of web scraping using python, including selecting elements directly, using methods like find and find all, and selecting elements by ids, classes, and attribute values.', 'duration': 242.21, 'highlights': ['The chapter covers the basics of web scraping using Python It introduces the concept of web scraping and highlights the use of Python for this purpose.', 'Selecting elements directly using methods like find and find all The chapter demonstrates selecting elements directly from a webpage using methods like find and find all, providing a practical approach to web scraping.', 'Selecting elements by IDs, classes, and attribute values The chapter explains how to select elements by IDs, classes, and attribute values, showcasing the versatility of web scraping techniques in Python.']}], 'duration': 291.839, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/4UcqECQe5Kc/pics/4UcqECQe5Kc168155.jpg', 'highlights': ['The chapter covers the basics of web scraping using Python, introducing the concept and highlighting the use of Python for this purpose.', 'The process of importing the Beautiful Soup package and initializing it for web scraping is demonstrated, including the installation using pip and mentioning the use of pip3 or pip depending on the system.', 'Selecting elements directly from a webpage using methods like find and find all is demonstrated, providing a practical approach to web scraping.', 'The chapter explains how to select elements by IDs, classes, and attribute values, showcasing the versatility of web scraping techniques in Python.']}, {'end': 678.6, 'segs': [{'end': 612.162, 'src': 'embed', 'start': 459.994, 'weight': 0, 'content': [{'end': 469.642, 'text': "so, for instance, our h3 up here has a an html5 data attribute of data, hello, and then it's equal to height.", 'start': 459.994, 'duration': 9.648}, {'end': 484.213, 'text': 'so if we wanted to select by that, we could say l equals, soup, dot, find, and we could say adders, attrs, equals, And then adjacent object,', 'start': 469.642, 'duration': 14.571}, {'end': 492.219, 'text': 'where we say data, dash hello, which was the attribute, and then a colon and then the value of high.', 'start': 484.213, 'duration': 8.006}, {'end': 496.922, 'text': 'OK, so if we go ahead and run that, we get our H3.', 'start': 492.239, 'duration': 4.683}, {'end': 504.967, 'text': "Now, if this was something different, like high one and we try to run it, we're going to get back none because there is no this doesn't exist.", 'start': 497.502, 'duration': 7.465}, {'end': 507.189, 'text': 'OK, but we can do that as well.', 'start': 505.728, 'duration': 1.461}, {'end': 509.652, 'text': 'All right.', 'start': 509.232, 'duration': 0.42}, {'end': 512.274, 'text': 'so we also have select.', 'start': 509.652, 'duration': 2.622}, {'end': 515.996, 'text': 'now select allows us to select things by CSS selectors.', 'start': 512.274, 'duration': 3.722}, {'end': 519.018, 'text': "it's actually very similar to jQuery.", 'start': 515.996, 'duration': 3.022}, {'end': 523.001, 'text': "so let's say we want to grab the idea of section one.", 'start': 519.018, 'duration': 3.983}, {'end': 527.604, 'text': "we could say L equals and let's do soup dots.", 'start': 523.001, 'duration': 4.603}, {'end': 536.209, 'text': 'select and simply pass in a number sign and then section dash one.', 'start': 529.202, 'duration': 7.007}, {'end': 539.432, 'text': "OK so we'll run that see what we get.", 'start': 536.229, 'duration': 3.203}, {'end': 546.038, 'text': "Now it gives us section one the div but it's in a list select is always going to return it inside of a list.", 'start': 540.052, 'duration': 5.986}, {'end': 548.08, 'text': "So remember that even if there's only one.", 'start': 546.218, 'duration': 1.862}, {'end': 552.504, 'text': 'So if you wanted to get just the one you would add on an index.', 'start': 548.601, 'duration': 3.903}, {'end': 554.106, 'text': 'of zero.', 'start': 553.465, 'duration': 0.641}, {'end': 561.612, 'text': "OK So if I do that and we run it now it's not within a list because we chose the first index.", 'start': 554.126, 'duration': 7.486}, {'end': 576.319, 'text': "OK And if we wanted to select like let's say a class so we could say dot item and all the allies have the class of items.", 'start': 563.331, 'duration': 12.988}, {'end': 579.201, 'text': 'So if we select the zero index that will be item one.', 'start': 576.379, 'duration': 2.822}, {'end': 584.084, 'text': "So let's run that and you can see we get the ally with the item one.", 'start': 579.721, 'duration': 4.363}, {'end': 586.039, 'text': 'All right.', 'start': 585.779, 'duration': 0.26}, {'end': 589.342, 'text': "Now, usually you're going to want to get the data within these tags.", 'start': 586.099, 'duration': 3.243}, {'end': 593.686, 'text': "You're not really going to care about the HTML like this ally in class and stuff.", 'start': 589.362, 'duration': 4.324}, {'end': 595.268, 'text': "What you're going to want is this.", 'start': 593.746, 'duration': 1.522}, {'end': 600.853, 'text': 'OK, so for that, we have a method called get text.', 'start': 595.288, 'duration': 5.565}, {'end': 612.162, 'text': "so let's say L equals soup dot find, and I'm sorry, not find.", 'start': 602.454, 'duration': 9.708}], 'summary': 'Using beautiful soup to select and extract data from html using css selectors and methods like get_text.', 'duration': 152.168, 'max_score': 459.994, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/4UcqECQe5Kc/pics/4UcqECQe5Kc459994.jpg'}, {'end': 678.6, 'src': 'embed', 'start': 644.071, 'weight': 1, 'content': [{'end': 663.648, 'text': "so if we say let's do four item, we'll do for item in soup, dot select, we use select um dot, item Okay, which, remember, select gives us a list.", 'start': 644.071, 'duration': 19.577}, {'end': 672.275, 'text': "It'll give us a list of the items which are looping through and then let's print out the item, but let's add on to it, get text.", 'start': 663.708, 'duration': 8.567}, {'end': 678.6, 'text': "so we only get the text or the data and we'll run that and we get item one through item five.", 'start': 672.275, 'duration': 6.325}], 'summary': 'Using soup.select(), we extracted text data from four items, resulting in items one through five.', 'duration': 34.529, 'max_score': 644.071, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/4UcqECQe5Kc/pics/4UcqECQe5Kc644071.jpg'}], 'start': 459.994, 'title': 'Using beautiful soup for web scraping', 'summary': 'Covers using beautiful soup to select html elements by html5 data attribute and css selectors, utilizing soup.select() to grab specific elements, and extracting text data using beautifulsoup, showcasing methods like get_text() and select().', 'chapters': [{'end': 519.018, 'start': 459.994, 'title': 'Using beautiful soup to select html elements', 'summary': 'Explains how to use beautiful soup to select html elements by html5 data attribute and css selectors, demonstrating it with examples.', 'duration': 59.024, 'highlights': ["Beautiful Soup can be used to select HTML elements by HTML5 data attribute, such as selecting by 'data-hello' attribute and value 'height'.", "The 'select' method in Beautiful Soup allows selecting elements by CSS selectors, similar to jQuery."]}, {'end': 579.201, 'start': 519.018, 'title': 'Web scraping: selecting elements', 'summary': 'Explains how to use soup.select() to grab specific elements from a webpage, highlighting the method to select by id or class and the use of index to retrieve specific elements.', 'duration': 60.183, 'highlights': ['Using soup.select() with a number sign and section ID returns the div inside a list.', 'Adding an index to soup.select() allows for selecting specific elements, demonstrated by selecting the first index to avoid returning the element inside a list.', 'Demonstrating the selection of elements by class using soup.select() and retrieving a specific element by selecting the zero index.']}, {'end': 678.6, 'start': 579.721, 'title': 'Web scraping text extraction', 'summary': 'Demonstrates the use of beautifulsoup to extract text data from html tags, showcasing methods like get_text() and select(), resulting in the retrieval of specific data and the ability to loop through and extract text from multiple items.', 'duration': 98.879, 'highlights': ["The chapter showcases the usage of BeautifulSoup's get_text() method to extract specific text data from HTML tags, providing a clear example of retrieving 'item one' from the HTML content.", "It demonstrates the implementation of BeautifulSoup's select() method to retrieve a list of items matching a specific class, enabling the extraction of text data from multiple items such as 'item one' through 'item five'."]}], 'duration': 218.606, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/4UcqECQe5Kc/pics/4UcqECQe5Kc459994.jpg', 'highlights': ["The chapter showcases the usage of BeautifulSoup's get_text() method to extract specific text data from HTML tags, providing a clear example of retrieving 'item one' from the HTML content.", "It demonstrates the implementation of BeautifulSoup's select() method to retrieve a list of items matching a specific class, enabling the extraction of text data from multiple items such as 'item one' through 'item five'.", 'Adding an index to soup.select() allows for selecting specific elements, demonstrated by selecting the first index to avoid returning the element inside a list.', 'Using soup.select() with a number sign and section ID returns the div inside a list.', "The 'select' method in Beautiful Soup allows selecting elements by CSS selectors, similar to jQuery.", "Beautiful Soup can be used to select HTML elements by HTML5 data attribute, such as selecting by 'data-hello' attribute and value 'height'."]}, {'end': 959.096, 'segs': [{'end': 835.755, 'src': 'embed', 'start': 806.283, 'weight': 1, 'content': [{'end': 809.345, 'text': 'What we could do is add another next sibling.', 'start': 806.283, 'duration': 3.062}, {'end': 816.698, 'text': 'Oops, next sibling and run that and then we get the image.', 'start': 809.365, 'duration': 7.333}, {'end': 829.049, 'text': 'but instead of doing this double next sibling, we can actually use a method called find underscore, next underscore sibling with parentheses,', 'start': 816.698, 'duration': 12.351}, {'end': 835.755, 'text': "and then, if we run that, we get the image, because that's going to find the next actual element, not not counting the line break.", 'start': 829.049, 'duration': 6.706}], 'summary': "Using 'find_next_sibling' method to efficiently locate the next actual element and avoid counting line breaks.", 'duration': 29.472, 'max_score': 806.283, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/4UcqECQe5Kc/pics/4UcqECQe5Kc806283.jpg'}, {'end': 959.096, 'src': 'embed', 'start': 909.388, 'weight': 0, 'content': [{'end': 914.309, 'text': 'OK, one more thing I want to show you is that if we want to get the sibling, for instance,', 'start': 909.388, 'duration': 4.921}, {'end': 924.693, 'text': 'if if we are at the age three and we want to get the next paragraph not just the next sibling but the next paragraph what we could do is say soup,', 'start': 914.309, 'duration': 10.384}, {'end': 927.394, 'text': "dot, find, and let's just get the age three.", 'start': 924.693, 'duration': 2.701}, {'end': 932.187, 'text': 'and we could do dot, find next sibling.', 'start': 928.826, 'duration': 3.361}, {'end': 940.41, 'text': 'but we could pass in a parameter of paragraph or p, and if we run that, we get the paragraph okay.', 'start': 932.187, 'duration': 8.223}, {'end': 943.351, 'text': 'so you can, you can specify what you want.', 'start': 940.41, 'duration': 2.941}, {'end': 945.512, 'text': "So that's it.", 'start': 944.851, 'duration': 0.661}, {'end': 946.472, 'text': 'All right.', 'start': 946.152, 'duration': 0.32}, {'end': 952.214, 'text': "So now that we know I mean there's a lot more that that is in the library, but these are the, I guess,", 'start': 946.612, 'duration': 5.602}, {'end': 956.115, 'text': 'the main methods and stuff like that that you would use.', 'start': 952.214, 'duration': 3.901}, {'end': 959.096, 'text': "So now we're going to actually do some scraping.", 'start': 956.636, 'duration': 2.46}], 'summary': 'Demonstrated using beautifulsoup to extract specific sibling elements and next paragraphs, highlighting the flexibility and control offered by the library.', 'duration': 49.708, 'max_score': 909.388, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/4UcqECQe5Kc/pics/4UcqECQe5Kc909388.jpg'}], 'start': 679.621, 'title': 'Web scraping basics and dom navigation', 'summary': 'Covers the basics of web scraping navigation including accessing webpage contents, identifying elements, and dealing with line breaks using python and beautifulsoup. it also discusses navigating dom elements using methods like next_sibling and find_next_sibling, and provides an overview of web scraping methods such as find previous sibling, find parent, and find next sibling with examples.', 'chapters': [{'end': 773.577, 'start': 679.621, 'title': 'Web scraping navigation basics', 'summary': 'Discusses the basics of web scraping navigation, including accessing the contents of a webpage, identifying elements, and dealing with line breaks within the structure, utilizing python and beautifulsoup.', 'duration': 93.956, 'highlights': ['The chapter focuses on understanding web scraping navigation basics, including accessing and manipulating webpage contents using Python and BeautifulSoup. It emphasizes the importance of understanding how to access and manipulate webpage contents using Python and BeautifulSoup.', 'The chapter explores the concept of line breaks within the webpage structure and how they are treated as items in a list, affecting access to desired elements. It highlights the challenge of dealing with line breaks within the webpage structure and their impact on accessing specific elements.', 'The chapter emphasizes the need to use appropriate indexing, such as the one index, to access the desired elements within the webpage contents. It stresses the importance of using correct indexing, like the one index, to access the intended elements within the webpage contents.']}, {'end': 835.755, 'start': 774.097, 'title': 'Navigating dom elements', 'summary': 'Discusses how to navigate through dom elements using methods like next_sibling and find_next_sibling, demonstrating an example of how to access a specific element with these methods.', 'duration': 61.658, 'highlights': ["Using 'find_next_sibling' method is more efficient than chaining 'next_sibling' multiple times, as demonstrated by the example with accessing the image element.", "Explaining how to access the next sibling element in the DOM using 'next_sibling' method and the issue with line breaks not being counted as elements."]}, {'end': 959.096, 'start': 836.776, 'title': 'Web scraping methods overview', 'summary': 'Covers methods in web scraping with beautifulsoup including find previous sibling, find parent, and find next sibling, with examples demonstrating their functionality and output.', 'duration': 122.32, 'highlights': ['The chapter covers methods in web scraping with BeautifulSoup, including find previous sibling, find parent, and find next sibling, with examples demonstrating their functionality and output.', 'Demonstrated the use of find_previous_sibling to retrieve the previous sibling section, showcasing its practical application in web scraping.', 'Illustrated the functionality of find_parent to retrieve the parent element, exemplified with the retrieval of the UL element.', 'Explained the use of find_next_sibling to specify and retrieve the next paragraph, showcasing the flexibility of specifying the element type to be retrieved.']}], 'duration': 279.475, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/4UcqECQe5Kc/pics/4UcqECQe5Kc679621.jpg', 'highlights': ['The chapter covers methods in web scraping with BeautifulSoup, including find previous sibling, find parent, and find next sibling, with examples demonstrating their functionality and output.', "Using 'find_next_sibling' method is more efficient than chaining 'next_sibling' multiple times, as demonstrated by the example with accessing the image element.", "Explaining how to access the next sibling element in the DOM using 'next_sibling' method and the issue with line breaks not being counted as elements.", 'The chapter focuses on understanding web scraping navigation basics, including accessing and manipulating webpage contents using Python and BeautifulSoup. It emphasizes the importance of understanding how to access and manipulate webpage contents using Python and BeautifulSoup.']}, {'end': 1121.142, 'segs': [{'end': 995.688, 'src': 'embed', 'start': 959.156, 'weight': 1, 'content': [{'end': 966.259, 'text': "So I'm going to create a new file and I'm going to call this blog scraping dot pie.", 'start': 959.156, 'duration': 7.103}, {'end': 974.871, 'text': "OK now for this we're going to need to import requests.", 'start': 968.906, 'duration': 5.965}, {'end': 981.556, 'text': "OK now if you get a message that says like you don't have requests or whatever just go ahead and use pip.", 'start': 974.891, 'duration': 6.665}, {'end': 987.962, 'text': 'So pip install requests and that should take care of that.', 'start': 981.777, 'duration': 6.185}, {'end': 991.344, 'text': 'Next thing we need to do is bring in from B.S.', 'start': 989.022, 'duration': 2.322}, {'end': 995.688, 'text': 'for we need to import beautiful suit.', 'start': 992.605, 'duration': 3.083}], 'summary': "Creating a new file 'blog scraping.py' requires importing 'requests' and 'beautiful soup'.", 'duration': 36.532, 'max_score': 959.156, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/4UcqECQe5Kc/pics/4UcqECQe5Kc959156.jpg'}, {'end': 1076.055, 'src': 'heatmap', 'start': 1052.173, 'weight': 0, 'content': [{'end': 1054.315, 'text': "So that'll get the web page and put it in here.", 'start': 1052.173, 'duration': 2.142}, {'end': 1058.438, 'text': 'And then we also need to just pass an HTML dot parser.', 'start': 1054.835, 'duration': 3.603}, {'end': 1060.488, 'text': 'All right.', 'start': 1060.188, 'duration': 0.3}, {'end': 1067.831, 'text': "So now let's let's take a look at the structure, because whenever you scrape a web page,", 'start': 1060.848, 'duration': 6.983}, {'end': 1072.173, 'text': "you need to get familiar with the structure of the of what you're scraping.", 'start': 1067.831, 'duration': 4.342}, {'end': 1074.134, 'text': 'So we want these posts.', 'start': 1072.733, 'duration': 1.401}, {'end': 1076.055, 'text': 'Right So open Chrome tools.', 'start': 1074.234, 'duration': 1.821}], 'summary': 'Scraping web page for posts using html parser.', 'duration': 49.26, 'max_score': 1052.173, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/4UcqECQe5Kc/pics/4UcqECQe5Kc1052173.jpg'}], 'start': 959.156, 'title': 'Web scraping with python', 'summary': 'Explains how to use python for web scraping, covering import of libraries, making a get request, using beautiful soup for html parsing, and understanding web page structure for scraping.', 'chapters': [{'end': 1121.142, 'start': 959.156, 'title': 'Web scraping with python', 'summary': 'Explains how to use python for web scraping, including importing libraries, making a get request, using beautiful soup to parse html, and understanding the structure of a web page for scraping.', 'duration': 161.986, 'highlights': ['The chapter explains the process of web scraping using Python, emphasizing the use of requests and Beautiful Soup libraries. It discusses the process of importing the requests library and using pip for installation, as well as importing the Beautiful Soup library for parsing HTML.', 'The chapter covers the steps for making a GET request and parsing HTML using Beautiful Soup. It outlines the process of setting a response variable equal to requests and making a GET request with a specific URL, as well as initializing Beautiful Soup to parse the retrieved web page.', 'The chapter emphasizes the importance of understanding the structure of the web page for effective scraping. It details the process of using Chrome tools to analyze the structure of a web page, identifying the classes and tags needed for scraping posts, titles, links, author information, dates, and post text.']}], 'duration': 161.986, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/4UcqECQe5Kc/pics/4UcqECQe5Kc959156.jpg', 'highlights': ['The chapter emphasizes the importance of understanding the structure of the web page for effective scraping.', 'The chapter explains the process of web scraping using Python, emphasizing the use of requests and Beautiful Soup libraries.', 'The chapter covers the steps for making a GET request and parsing HTML using Beautiful Soup.']}, {'end': 1547.468, 'segs': [{'end': 1194.682, 'src': 'heatmap', 'start': 1121.142, 'weight': 4, 'content': [{'end': 1132.451, 'text': "let's go back to VS code and let's create a variable called posts and let's say soup dot and we're going to, we're going to use find all.", 'start': 1121.142, 'duration': 11.309}, {'end': 1142.566, 'text': "OK so fine we could use select as well but I'm just going to use find all where class is equal to post dash preview.", 'start': 1133.903, 'duration': 8.663}, {'end': 1151.349, 'text': 'OK because each one has each one has a class of post dash preview.', 'start': 1143.166, 'duration': 8.183}, {'end': 1155.111, 'text': "So we want to get a list of posts and that's what we're doing.", 'start': 1151.87, 'duration': 3.241}, {'end': 1159.672, 'text': 'We could use select soup dot select and then dot post preview.', 'start': 1155.151, 'duration': 4.521}, {'end': 1160.873, 'text': "It doesn't really matter though.", 'start': 1159.712, 'duration': 1.161}, {'end': 1175.146, 'text': "And then down here let's do a for loop, let's say for post in posts and then let's just print out post and see what we get.", 'start': 1161.753, 'duration': 13.393}, {'end': 1180.831, 'text': "So we'll go ahead and minimize this and let's run the file.", 'start': 1175.366, 'duration': 5.465}, {'end': 1186.316, 'text': "This time we're not running web scraping.py, we're running blog scraping.", 'start': 1181.311, 'duration': 5.005}, {'end': 1189.54, 'text': 'And there we go.', 'start': 1188.94, 'duration': 0.6}, {'end': 1191.801, 'text': 'So it just basically gives us all the posts.', 'start': 1189.56, 'duration': 2.241}, {'end': 1194.682, 'text': 'OK, all those dibs with the class of post preview.', 'start': 1192.221, 'duration': 2.461}], 'summary': 'Using beautifulsoup in python to extract and print a list of posts from a webpage.', 'duration': 68.398, 'max_score': 1121.142, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/4UcqECQe5Kc/pics/4UcqECQe5Kc1121142.jpg'}, {'end': 1427.998, 'src': 'embed', 'start': 1396.984, 'weight': 1, 'content': [{'end': 1399.685, 'text': 'Now we need to create headers in a CSV file.', 'start': 1396.984, 'duration': 2.701}, {'end': 1401.906, 'text': 'You have the headers up top like title.', 'start': 1399.725, 'duration': 2.181}, {'end': 1404.387, 'text': 'What do we have title link and date.', 'start': 1402.366, 'duration': 2.021}, {'end': 1411.05, 'text': "So I'm actually going to create a variable called headers and set it to a list of strings.", 'start': 1404.927, 'duration': 6.123}, {'end': 1417.573, 'text': 'So first will be title link and date.', 'start': 1411.15, 'duration': 6.423}, {'end': 1421.772, 'text': 'Now we just want to write a row using the writer.', 'start': 1419.069, 'duration': 2.703}, {'end': 1425.275, 'text': 'So we can say csv underscore writer.', 'start': 1421.872, 'duration': 3.403}, {'end': 1427.998, 'text': 'And then it has a method called write row.', 'start': 1425.776, 'duration': 2.222}], 'summary': 'Creating csv file headers: title, link, date using csv_writer.write_row', 'duration': 31.014, 'max_score': 1396.984, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/4UcqECQe5Kc/pics/4UcqECQe5Kc1396984.jpg'}, {'end': 1525.309, 'src': 'embed', 'start': 1501.857, 'weight': 0, 'content': [{'end': 1510.523, 'text': 'So we wrote the headers right here title, link, date and then we went through each post and we wrote a row for each one with the title,', 'start': 1501.857, 'duration': 8.666}, {'end': 1511.443, 'text': 'the link and the date.', 'start': 1510.523, 'duration': 0.92}, {'end': 1515.166, 'text': 'And we could import this to a database or do whatever we want with it.', 'start': 1511.764, 'duration': 3.402}, {'end': 1518.427, 'text': 'So we have successfully scraped a website.', 'start': 1515.606, 'duration': 2.821}, {'end': 1519.847, 'text': 'All right.', 'start': 1518.447, 'duration': 1.4}, {'end': 1521.428, 'text': "So that's it, guys.", 'start': 1519.987, 'duration': 1.441}, {'end': 1525.309, 'text': 'Hopefully you enjoyed this and, you know, experiment with it.', 'start': 1521.468, 'duration': 3.841}], 'summary': 'Successfully scraped website data for database import.', 'duration': 23.452, 'max_score': 1501.857, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/4UcqECQe5Kc/pics/4UcqECQe5Kc1501857.jpg'}], 'start': 1121.142, 'title': 'Python web scraping', 'summary': 'Covers python web scraping on a blog with beautifulsoup, including extracting post titles, links, and dates, and writing them into a csv file, achieving successful scraping.', 'chapters': [{'end': 1189.54, 'start': 1121.142, 'title': 'Blog scraping with python', 'summary': "Focuses on using python to perform web scraping on a blog, specifically creating a variable to store posts, using beautifulsoup's find_all method to retrieve a list of posts, and iterating through the list using a for loop to print out each post.", 'duration': 68.398, 'highlights': ["Creating a variable 'posts' and using BeautifulSoup's find_all method to retrieve a list of posts, by specifying the class as 'post-preview'.", 'Utilizing a for loop to iterate through the list of posts and print out each post.']}, {'end': 1547.468, 'start': 1189.56, 'title': 'Web scraping with python', 'summary': 'Covers web scraping using python, demonstrating the process of extracting post titles, links, and dates, and writing them into a csv file, ultimately achieving successful scraping of a website.', 'duration': 357.908, 'highlights': ['The chapter covers web scraping using Python Demonstrates the process of web scraping using Python to extract data from a website.', 'Extracting post titles, links, and dates Details the process of extracting post titles, links, and dates from a website, showcasing the specific steps involved in the extraction process.', 'Writing data into a CSV file Illustrates the steps involved in writing the extracted data into a CSV file, including the creation of headers and writing rows for each post.', 'Successful scraping of a website The demonstration concludes with successful scraping of a website, showcasing the practical application of web scraping with Python.']}], 'duration': 426.326, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/4UcqECQe5Kc/pics/4UcqECQe5Kc1121142.jpg', 'highlights': ['Successful scraping of a website The demonstration concludes with successful scraping of a website, showcasing the practical application of web scraping with Python.', 'Writing data into a CSV file Illustrates the steps involved in writing the extracted data into a CSV file, including the creation of headers and writing rows for each post.', 'Extracting post titles, links, and dates Details the process of extracting post titles, links, and dates from a website, showcasing the specific steps involved in the extraction process.', 'The chapter covers web scraping using Python Demonstrates the process of web scraping using Python to extract data from a website.', "Creating a variable 'posts' and using BeautifulSoup's find_all method to retrieve a list of posts, by specifying the class as 'post-preview'.", 'Utilizing a for loop to iterate through the list of posts and print out each post.']}], 'highlights': ['The chapter covers web scraping using Python and Beautiful Soup, including methods like find, find all, select, and find parent.', 'The chapter covers methods in web scraping with BeautifulSoup, including find previous sibling, find parent, and find next sibling, with examples demonstrating their functionality and output.', "The chapter showcases the usage of BeautifulSoup's get_text() method to extract specific text data from HTML tags, providing a clear example of retrieving 'item one' from the HTML content.", 'The chapter emphasizes the importance of understanding the structure of the web page for effective scraping.', 'The chapter explains the process of web scraping using Python, emphasizing the use of requests and Beautiful Soup libraries.', 'Successful scraping of a website The demonstration concludes with successful scraping of a website, showcasing the practical application of web scraping with Python.']}