title
Intro To Web Scraping With Node.js & Cheerio
description
In this video we will take a look at the Node.js library Cheerio, a jQuery-like tool for the server that is used in web scraping. This is similar to the Python video that I did on web scraping.
Sponsor:
DevMountain Bootcamp - https://goo.gl/6q0dEa
Code For This Project:
https://gist.github.com/bradtraversy/7b68f19b0c502f1fbd0aa9bc4cfe793d#file-node_cheerio_scraping-js
💖 Become a Patron: Show support & get perks!
http://www.patreon.com/traversymedia
Website & Udemy Courses
http://www.traversymedia.com
Follow Traversy Media:
https://www.facebook.com/traversymedia
https://www.twitter.com/traversymedia
detail
{'title': 'Intro To Web Scraping With Node.js & Cheerio', 'heatmap': [{'end': 174.948, 'start': 131.713, 'weight': 0.967}, {'end': 271.845, 'start': 249.522, 'weight': 0.764}, {'end': 354.113, 'start': 339.319, 'weight': 0.73}], 'summary': 'Tutorial series provides a comprehensive introduction to web scraping with node.js and cheerio, covering processes such as data extraction, csv file storage, error checking, html retrieval, element selection, jquery methods, data processing, and writing to csv files with ethical considerations.', 'chapters': [{'end': 219.296, 'segs': [{'end': 45.144, 'src': 'embed', 'start': 7.059, 'weight': 0, 'content': [{'end': 8.921, 'text': 'this video is sponsored by devmountain.', 'start': 7.059, 'duration': 1.862}, {'end': 12.985, 'text': "if you're interested in learning web development, ios or ux design,", 'start': 8.921, 'duration': 4.064}, {'end': 18.972, 'text': 'devmountain is a 12-week design and development boot camp intended to get you a full-time position in the industry.', 'start': 12.985, 'duration': 5.987}, {'end': 22.576, 'text': 'to learn more, visit devmountain.com or click the link in the description below.', 'start': 18.972, 'duration': 3.604}, {'end': 23.905, 'text': "Hey, what's going on, guys?", 'start': 23.004, 'duration': 0.901}, {'end': 30.07, 'text': 'So about a week ago I did a video on web scraping with Python and a library called Beautiful Soup.', 'start': 24.025, 'duration': 6.045}, {'end': 31.852, 'text': 'And a lot of you guys like that.', 'start': 30.591, 'duration': 1.261}, {'end': 35.976, 'text': 'You like the fact that I used Python and I will be doing more Python tutorials.', 'start': 31.872, 'duration': 4.104}, {'end': 42.021, 'text': 'But I also got a bunch of comments asking about Node.js and, in particular, a library called Cheerio,', 'start': 36.256, 'duration': 5.765}, {'end': 45.144, 'text': 'which is used for web scraping using Node and JavaScript.', 'start': 42.021, 'duration': 3.123}], 
'summary': 'Devmountain offers 12-week boot camp for web development, ios, and ux design with the goal of securing full-time positions in the industry.', 'duration': 38.085, 'max_score': 7.059, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/LoziivfAAjE/pics/LoziivfAAjE7059.jpg'}, {'end': 109.754, 'src': 'embed', 'start': 67.834, 'weight': 1, 'content': [{'end': 76.637, 'text': 'um, dot children, dot parent, all that stuff to kind of traverse the dom and and pick and choose what we want and get that data.', 'start': 67.834, 'duration': 8.803}, {'end': 85.381, 'text': 'so what i want to do in this video is i want to scrape the sample blog, just like we did in the python video, and loop through the post,', 'start': 76.637, 'duration': 8.744}, {'end': 91.103, 'text': 'get the title, the link and the date and put them all into a csv file.', 'start': 85.381, 'duration': 5.722}, {'end': 96.685, 'text': "okay, and we'll be using the fs or the file system module that comes with node.js for that,", 'start': 91.103, 'duration': 5.582}, {'end': 100.607, 'text': "And we'll just kind of experiment a little bit and look at some of the methods and so on.", 'start': 97.245, 'duration': 3.362}, {'end': 105.711, 'text': "I'm not going to spend too much time on it because it's basically just jQuery and it's really late.", 'start': 100.627, 'duration': 5.084}, {'end': 109.754, 'text': "And I'm actually going on vacation tomorrow for a couple of days with my family.", 'start': 105.891, 'duration': 3.863}], 'summary': 'Scraping sample blog to extract post title, link, and date into a csv file using node.js and the file system module.', 'duration': 41.92, 'max_score': 67.834, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/LoziivfAAjE/pics/LoziivfAAjE67834.jpg'}, {'end': 185.292, 'src': 'heatmap', 'start': 131.713, 'weight': 4, 'content': [{'end': 134.816, 'text': "First thing I'm going to do is just create a package.json.", 
'start': 131.713, 'duration': 3.103}, {'end': 136.078, 'text': 'So npm init -y.', 'start': 134.916, 'duration': 1.162}, {'end': 139.681, 'text': "And let's clear this up.", 'start': 138.14, 'duration': 1.541}, {'end': 142.464, 'text': 'So that created a package.json file.', 'start': 140.202, 'duration': 2.262}, {'end': 144.487, 'text': 'Now we need to install two things.', 'start': 142.825, 'duration': 1.662}, {'end': 147.13, 'text': "We're going to install Cheerio.", 'start': 144.527, 'duration': 2.603}, {'end': 155.92, 'text': 'And we also want to install something called a request, which is a very lightweight HTTP module to make requests.', 'start': 148.031, 'duration': 7.889}, {'end': 157.442, 'text': 'All right.', 'start': 157.082, 'duration': 0.36}, {'end': 163.824, 'text': 'You could use Axios or Fetch or something like that, but I think that this is a good choice for this type of thing.', 'start': 157.782, 'duration': 6.042}, {'end': 164.965, 'text': 'All right.', 'start': 164.645, 'duration': 0.32}, {'end': 168.166, 'text': "So now that those are installed, let's create our file.", 'start': 165.085, 'duration': 3.081}, {'end': 174.948, 'text': "So I'm going to create a file called scrape.js and we want to bring in our stuff.", 'start': 168.206, 'duration': 6.742}, {'end': 176.449, 'text': "So let's bring in request.", 'start': 175.008, 'duration': 1.441}, {'end': 185.292, 'text': "Set that to require request and then let's bring in Cheerio.", 'start': 177.329, 'duration': 7.963}], 'summary': 'Created package.json, installed cheerio and request modules, and set up scrape.js for web scraping.', 'duration': 53.579, 'max_score': 131.713, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/LoziivfAAjE/pics/LoziivfAAjE131713.jpg'}], 'start': 7.059, 'title': 'Web scraping with node.js and cheerio', 'summary': 'Introduces web scraping using node.js and cheerio, demonstrating how to extract data from a sample blog and store it into a csv 
file. it also covers the process of setting up a web scraping tool using cheerio and request.', 'chapters': [{'end': 91.103, 'start': 7.059, 'title': 'Web scraping with node.js and cheerio', 'summary': 'Introduces web scraping using node.js and cheerio, a library designed for the server, to extract data from a sample blog, demonstrating how to utilize jquery methods to traverse the dom and obtain information, with the goal of storing the extracted data into a csv file.', 'duration': 84.044, 'highlights': ['Cheerio is a library designed for the server, providing a fast, flexible, and lean implementation of core jQuery for web scraping using Node.js. Cheerio is described as a fast, flexible, and lean implementation of core jQuery designed specifically for the server, enabling efficient web scraping using Node.js.', 'The tutorial aims to scrape a sample blog, extract the title, link, and date of the posts, and store the data into a CSV file. The objective of the tutorial is to scrape a sample blog, extract the title, link, and date of the posts using Cheerio and Node.js, and then save the obtained data into a CSV file.', 'Devmountain is promoted as a 12-week design and development boot camp, offering courses in web development, iOS, and UX design to facilitate full-time job placement in the industry. 
Devmountain is highlighted as a 12-week design and development boot camp that provides courses in web development, iOS, and UX design, with the goal of assisting students in obtaining full-time positions in the industry.']}, {'end': 131.373, 'start': 91.103, 'title': 'Using node.js fs module for web scraping', 'summary': 'Involves using the fs module in node.js for web scraping, with a brief overview of methods and an upcoming vacation, with the instructor going on vacation for a few days.', 'duration': 40.27, 'highlights': ['The chapter covers using the fs module in node.js for web scraping and briefly explores its methods.', 'The instructor is going on vacation for a couple of days with family.']}, {'end': 219.296, 'start': 131.713, 'title': 'Web scraping with cheerio and request', 'summary': 'Covers the process of setting up a web scraping tool using cheerio and request, beginning with creating a package.json file, installing cheerio and request, and making a request to a url for scraping data.', 'duration': 87.583, 'highlights': ['Setting up package.json using npm init -y The process begins with creating a package.json file using npm init -y.', 'Installing Cheerio and Request for web scraping The chapter emphasizes the installation of Cheerio and Request, with a preference for Request as a lightweight HTTP module for making requests.', 'Creating a file for web scraping and bringing in dependencies The next step involves creating a file named scrape.js and importing request and Cheerio as dependencies for web scraping.', 'Making a request to a URL for scraping data The chapter details the process of making a request to a URL for scraping data, demonstrating the practical application of the setup.']}], 'duration': 212.237, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/LoziivfAAjE/pics/LoziivfAAjE7059.jpg', 'highlights': ['Cheerio is a library designed for the server, providing a fast, flexible, and lean implementation of core jQuery 
for web scraping using Node.js.', 'The tutorial aims to scrape a sample blog, extract the title, link, and date of the posts, and store the data into a CSV file.', 'Devmountain is highlighted as a 12-week design and development boot camp that provides courses in web development, iOS, and UX design, with the goal of assisting students in obtaining full-time positions in the industry.', 'The chapter covers using the fs module in node.js for web scraping and briefly explores its methods.', 'Setting up package.json using npm init -y The process begins with creating a package.json file using npm init -y.', 'Installing Cheerio and Request for web scraping The chapter emphasizes the installation of Cheerio and Request, with a preference for Request as a lightweight HTTP module for making requests.', 'Creating a file for web scraping and bringing in dependencies The next step involves creating a file named scrape.js and importing request and Cheerio as dependencies for web scraping.', 'Making a request to a URL for scraping data The chapter details the process of making a request to a URL for scraping data, demonstrating the practical application of the setup.']}, {'end': 632.349, 'segs': [{'end': 278.219, 'src': 'heatmap', 'start': 249.522, 'weight': 0, 'content': [{'end': 256.524, 'text': "so if that's true, then let's go ahead and just console, log the html and see what we get all right.", 'start': 249.522, 'duration': 7.002}, {'end': 258.466, 'text': "so i'm just going to go to my terminal.", 'start': 256.524, 'duration': 1.942}, {'end': 271.845, 'text': "Let's minimize this and let's run node and the file name of scrape and we just get the entire page, all the HTML, which is, which is what is expected.", 'start': 259.195, 'duration': 12.65}, {'end': 278.219, 'text': 'So this HTML, we want to run this through a load method from Cheerio.', 'start': 272.697, 'duration': 5.522}], 'summary': 'The html page was successfully scraped and will be processed using cheerio.', 
'duration': 28.697, 'max_score': 249.522, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/LoziivfAAjE/pics/LoziivfAAjE249522.jpg'}, {'end': 378.994, 'src': 'heatmap', 'start': 339.319, 'weight': 4, 'content': [{'end': 340.705, 'text': 'So site heading.', 'start': 339.319, 'duration': 1.386}, {'end': 344.624, 'text': 'And we want our terminal.', 'start': 343.383, 'duration': 1.241}, {'end': 346.086, 'text': "Let's go ahead and run it.", 'start': 344.644, 'duration': 1.442}, {'end': 350.65, 'text': 'And it just gives us this giant object which has a whole bunch of stuff in it.', 'start': 346.706, 'duration': 3.944}, {'end': 354.113, 'text': 'You can see all these methods all these arrays and objects and stuff.', 'start': 350.71, 'duration': 3.403}, {'end': 357.596, 'text': 'This is a very basic tutorial.', 'start': 355.774, 'duration': 1.822}, {'end': 362.76, 'text': 'So I just want to get show you how to get the HTML and how to get the text inside.', 'start': 357.736, 'duration': 5.024}, {'end': 367.064, 'text': 'So if you want the HTML we can simply tack on dot HTML.', 'start': 362.88, 'duration': 4.184}, {'end': 378.994, 'text': "And let's go ahead and run that and it gives us the H1 and the span because that's what's inside of this this site heading.", 'start': 369.769, 'duration': 9.225}], 'summary': 'Basic tutorial on accessing html and text from a website using terminal commands.', 'duration': 39.675, 'max_score': 339.319, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/LoziivfAAjE/pics/LoziivfAAjE339319.jpg'}, {'end': 522.148, 'src': 'embed', 'start': 490.205, 'weight': 2, 'content': [{'end': 493.027, 'text': 'And of course, the parent is the actual site heading.', 'start': 490.205, 'duration': 2.822}, {'end': 498.409, 'text': 'So if we get the text of the site heading, we should get the text from the H1 and the span.', 'start': 493.447, 'duration': 4.962}, {'end': 499.809, 'text': "And that's what we 
get.", 'start': 499.029, 'duration': 0.78}, {'end': 502.07, 'text': "So I'm not going to go too deep into this.", 'start': 500.55, 'duration': 1.52}, {'end': 504.551, 'text': "It's basically just jQuery stuff.", 'start': 502.13, 'duration': 2.421}, {'end': 508.533, 'text': "So let's take a quick look at how to loop over things.", 'start': 505.432, 'duration': 3.101}, {'end': 522.148, 'text': "so if we look at the um, the navigation, which is, let's see nav and should be in here ul, so each li right here in the nav.", 'start': 510.282, 'duration': 11.866}], 'summary': 'Using jquery to extract text from site heading and loop over navigation elements.', 'duration': 31.943, 'max_score': 490.205, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/LoziivfAAjE/pics/LoziivfAAjE490205.jpg'}], 'start': 219.376, 'title': 'Web scraping and jquery methods', 'summary': 'Covers the process of web scraping using cheerio, including error checking, html retrieval, and element selection, as well as demonstrating jquery methods such as find, children, next, and parent for extracting content and looping through elements.', 'chapters': [{'end': 357.596, 'start': 219.376, 'title': 'Web scraping with cheerio', 'summary': "Details the process of web scraping using cheerio, which involves checking for errors, retrieving html, and using cheerio's load method to select elements from the dom, ultimately demonstrating a basic tutorial.", 'duration': 138.22, 'highlights': ['The chapter details the process of web scraping using Cheerio The tutorial focuses on web scraping using Cheerio, providing a step-by-step guide.', "involves checking for errors, retrieving HTML, and using Cheerio's load method to select elements from the DOM The process involves checking for errors and ensuring a successful HTTP response with a status code of 200, retrieving HTML, and utilizing Cheerio's load method to select elements from the DOM for further manipulation.", 'demonstrating a basic 
tutorial The tutorial serves as a basic introduction to web scraping, showcasing the fundamental concepts and processes involved.']}, {'end': 632.349, 'start': 357.736, 'title': 'Jquery methods and looping', 'summary': 'Demonstrates how to extract html and text content using jquery methods like find, children, next, and parent, and then showcases how to loop through elements to retrieve their text and attributes.', 'duration': 274.613, 'highlights': ['The chapter demonstrates how to extract HTML and text content using jQuery methods like find, children, next, and parent. The speaker explains how to use jQuery methods such as find, children, next, and parent to extract HTML and text content from elements.', 'The chapter showcases how to loop through elements to retrieve their text and attributes. The speaker demonstrates how to loop through elements using the jQuery each method to retrieve text and attributes, such as href, from each element.']}], 'duration': 412.973, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/LoziivfAAjE/pics/LoziivfAAjE219376.jpg', 'highlights': ['The chapter details the process of web scraping using Cheerio The tutorial focuses on web scraping using Cheerio, providing a step-by-step guide.', "involves checking for errors, retrieving HTML, and using Cheerio's load method to select elements from the DOM The process involves checking for errors and ensuring a successful HTTP response with a status code of 200, retrieving HTML, and utilizing Cheerio's load method to select elements from the DOM for further manipulation.", 'The chapter demonstrates how to extract HTML and text content using jQuery methods like find, children, next, and parent. The speaker explains how to use jQuery methods such as find, children, next, and parent to extract HTML and text content from elements.', 'The chapter showcases how to loop through elements to retrieve their text and attributes. 
The speaker demonstrates how to loop through elements using the jQuery each method to retrieve text and attributes, such as href, from each element.', 'demonstrating a basic tutorial The tutorial serves as a basic introduction to web scraping, showcasing the fundamental concepts and processes involved.']}, {'end': 845.788, 'segs': [{'end': 710.777, 'src': 'embed', 'start': 633.23, 'weight': 0, 'content': [{'end': 638.894, 'text': "Now what I'm going to do is create a new file so that we can get the posts and we can loop through them,", 'start': 633.23, 'duration': 5.664}, {'end': 642.657, 'text': 'get the stuff that we need and put them into a CSV file.', 'start': 638.894, 'duration': 3.763}, {'end': 644.118, 'text': "So I'm going to create a new file.", 'start': 642.677, 'duration': 1.441}, {'end': 645.739, 'text': "We'll call this one scrape2.js.", 'start': 644.158, 'duration': 1.581}, {'end': 650.715, 'text': "And let's see.", 'start': 649.534, 'duration': 1.181}, {'end': 651.356, 'text': 'In here.', 'start': 650.855, 'duration': 0.501}, {'end': 655.3, 'text': "I'm going to just copy, scrape the initial file we just created.", 'start': 651.356, 'duration': 3.944}, {'end': 659.886, 'text': 'because we want to do all this stuff, make the request, bring in the dependencies.', 'start': 655.3, 'duration': 4.586}, {'end': 663.169, 'text': 'We just want to get rid of all this stuff inside of this.', 'start': 660.326, 'duration': 2.843}, {'end': 668.636, 'text': 'Actually, we want to keep this, the Cheerio.load, but get rid of everything else.', 'start': 664.451, 'duration': 4.185}, {'end': 672.956, 'text': 'Get rid of that.', 'start': 672.295, 'duration': 0.661}, {'end': 673.757, 'text': 'All right.', 'start': 673.496, 'duration': 0.261}, {'end': 683.384, 'text': "So let's grab the actually let's take a look at the Dom real quick and see for each post.", 'start': 675.018, 'duration': 8.366}, {'end': 685.546, 'text': 'It has a class.', 'start': 684.665, 'duration': 
0.881}, {'end': 687.787, 'text': 'Each post is a class of post preview.', 'start': 685.646, 'duration': 2.141}, {'end': 691.29, 'text': "So that's the selector that we're going to want to use to loop through.", 'start': 687.827, 'duration': 3.463}, {'end': 695.994, 'text': 'And then the title is inside an H2 with the class of post title.', 'start': 691.831, 'duration': 4.163}, {'end': 699.055, 'text': 'the link is inside an a tag.', 'start': 696.614, 'duration': 2.441}, {'end': 704.176, 'text': 'in there, inside post meta, we have a span with the class of post date.', 'start': 699.055, 'duration': 5.121}, {'end': 707.076, 'text': 'so that makes it pretty easy.', 'start': 704.176, 'duration': 2.9}, {'end': 710.777, 'text': 'so yeah, so this should be simple.', 'start': 707.076, 'duration': 3.701}], 'summary': 'Creating new file to scrape and loop through posts, extracting data for csv file.', 'duration': 77.547, 'max_score': 633.23, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/LoziivfAAjE/pics/LoziivfAAjE633230.jpg'}, {'end': 845.788, 'src': 'embed', 'start': 797.35, 'weight': 2, 'content': [{'end': 806.813, 'text': "like the space in between here, and then we're just going to do g for global so that it gets everything, and we want to replace it with just nothing,", 'start': 797.35, 'duration': 9.463}, {'end': 808.313, 'text': 'an empty string.', 'start': 806.813, 'duration': 1.5}, {'end': 809.693, 'text': "so let's try that out.", 'start': 808.313, 'duration': 1.38}, {'end': 812.434, 'text': 'so if we go and run that, there we go.', 'start': 809.693, 'duration': 2.741}, {'end': 815.235, 'text': 'so now we have all the titles without the, the white space.', 'start': 812.434, 'duration': 2.801}, {'end': 817.841, 'text': 'All right.', 'start': 817.321, 'duration': 0.52}, {'end': 819.301, 'text': "so let's see.", 'start': 817.841, 'duration': 1.46}, {'end': 821.382, 'text': 'Next, we want the link.', 'start': 819.422, 'duration': 1.96}, 
{'end': 823.903, 'text': "So let's say const.", 'start': 822.462, 'duration': 1.441}, {'end': 833.705, 'text': "And if you watch the Python one, notice how close this is to that, even though it's a completely different language, different library.", 'start': 825.303, 'duration': 8.402}, {'end': 841.067, 'text': "It just shows that if you learn one language, it's easy to pick up others because you do basically the same thing.", 'start': 834.145, 'duration': 6.922}, {'end': 845.788, 'text': "It's just a bit of a different syntax, unless you're dealing with like..", 'start': 841.107, 'duration': 4.681}], 'summary': "Using 'g' for global to replace white space, easy to pick up other languages.", 'duration': 48.438, 'max_score': 797.35, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/LoziivfAAjE/pics/LoziivfAAjE797350.jpg'}], 'start': 633.23, 'title': 'Web scraping and data processing', 'summary': 'Involves creating a new file to extract specific data and outputting them into a csv file, using cheerio and regular expressions in javascript to remove white space from titles and extract links.', 'chapters': [{'end': 774.142, 'start': 633.23, 'title': 'Creating csv file from web scraped data', 'summary': 'Involves creating a new file, scrape2.js, to loop through posts and extract specific data such as post titles, links, and post dates, and then outputting them into a csv file, with an emphasis on using cheerio and finding specific classes and tags within the document.', 'duration': 140.912, 'highlights': ['The process involves creating a new file, scrape2.js, to loop through posts and extract specific data such as post titles, links, and post dates, and then outputting them into a CSV file.', 'The chapter emphasizes the use of Cheerio and finding specific classes and tags within the document.', "The selector for looping through each post is identified as 'post preview', while the title is located inside an H2 with the class of 'post title' and 
the link is inside an a tag.", "The method 'find' is used to extract the post title from the document, with a focus on obtaining the text.", "The process of running 'scrape2.js' successfully retrieves all the titles from the posts."]}, {'end': 845.788, 'start': 774.882, 'title': 'Removing white space and extracting links', 'summary': 'Demonstrates using regular expressions in javascript to remove white space from titles and extract links, while highlighting the similarities in syntax between different programming languages.', 'duration': 70.906, 'highlights': ["The chapter showcases the use of regular expressions in JavaScript to remove white space from titles, demonstrating the effectiveness of using 'dot replace' with a regular expression to replace all white spaces with an empty string, resulting in clean titles without white space.", 'It highlights the similarity in syntax between different programming languages, emphasizing that learning one language facilitates the understanding and application of similar concepts in others.']}], 'duration': 212.558, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/LoziivfAAjE/pics/LoziivfAAjE633230.jpg', 'highlights': ['The process involves creating a new file, scrape2.js, to loop through posts and extract specific data such as post titles, links, and post dates, and then outputting them into a CSV file.', 'The chapter emphasizes the use of Cheerio and finding specific classes and tags within the document.', "The chapter showcases the use of regular expressions in JavaScript to remove white space from titles, demonstrating the effectiveness of using 'dot replace' with a regular expression to replace all white spaces with an empty string, resulting in clean titles without white space.", "The selector for looping through each post is identified as 'post preview', while the title is located inside an H2 with the class of 'post title' and the link is inside an a tag.", "The method 'find' is used to 
extract the post title from the document, with a focus on obtaining the text.", "The process of running 'scrape2.js' successfully retrieves all the titles from the posts.", 'It highlights the similarity in syntax between different programming languages, emphasizing that learning one language facilitates the understanding and application of similar concepts in others.']}, {'end': 1214.027, 'segs': [{'end': 882.873, 'src': 'embed', 'start': 847.014, 'weight': 1, 'content': [{'end': 851.118, 'text': "you know, really low level languages where you're doing memory management, stuff like that.", 'start': 847.014, 'duration': 4.104}, {'end': 854.22, 'text': "But for web development, it's pretty easy.", 'start': 851.198, 'duration': 3.022}, {'end': 856.922, 'text': "So let's do L dot.", 'start': 854.961, 'duration': 1.961}, {'end': 858.544, 'text': 'I lost my train of thought.', 'start': 857.463, 'duration': 1.081}, {'end': 867.932, 'text': "We're getting all the links, so let's do find link and we want the actual href.", 'start': 860.245, 'duration': 7.687}, {'end': 872.456, 'text': 'So adder dot adder href.', 'start': 867.972, 'duration': 4.484}, {'end': 874.848, 'text': 'All right.', 'start': 874.588, 'duration': 0.26}, {'end': 881.532, 'text': "And then let's console log title and link and run that.", 'start': 875.088, 'duration': 6.444}, {'end': 882.873, 'text': 'And there we go.', 'start': 882.393, 'duration': 0.48}], 'summary': 'Discussion on web development and code execution.', 'duration': 35.859, 'max_score': 847.014, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/LoziivfAAjE/pics/LoziivfAAjE847014.jpg'}, {'end': 959.119, 'src': 'embed', 'start': 920.942, 'weight': 0, 'content': [{'end': 924.325, 'text': "It's giving us a date, but notice the date is a comma inside of it.", 'start': 920.942, 'duration': 3.383}, {'end': 929.869, 'text': "So that's going to kind of mess things up for us since this is a comma separated value file.", 
'start': 924.605, 'duration': 5.264}, {'end': 944.636, 'text': 'So what we could do is tack on to this and use a regular expression and just put a literal comma.', 'start': 930.99, 'duration': 13.646}, {'end': 949.157, 'text': "So we want to replace a comma with, let's do a space.", 'start': 944.676, 'duration': 4.481}, {'end': 955.118, 'text': "So let's see what that gives us.", 'start': 950.677, 'duration': 4.441}, {'end': 959.119, 'text': "Okay, so we'll just do January 4th.", 'start': 955.438, 'duration': 3.681}], 'summary': 'Using regular expressions to replace comma with space in date data.', 'duration': 38.177, 'max_score': 920.942, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/LoziivfAAjE/pics/LoziivfAAjE920942.jpg'}, {'end': 1109.456, 'src': 'embed', 'start': 1003.21, 'weight': 4, 'content': [{'end': 1015.493, 'text': "we want to open up a stream to write and we're going to set this to the fs module and it has this create right stream that's what we want to use here and then we pass in the file name that we want to use,", 'start': 1003.21, 'duration': 12.283}, {'end': 1016.514, 'text': 'which is going to be post.csv.', 'start': 1015.493, 'duration': 1.021}, {'end': 1020.987, 'text': 'Okay, so we have that to work with.', 'start': 1019.224, 'duration': 1.763}, {'end': 1025.253, 'text': 'Now, just like in the Python video, we have to write the headers.', 'start': 1021.388, 'duration': 3.865}, {'end': 1034.188, 'text': "Okay, so let's say write stream dot write.", 'start': 1025.273, 'duration': 8.915}, {'end': 1039.642, 'text': "And for our headers, I'm actually going to put in back ticks here.", 'start': 1036.679, 'duration': 2.963}, {'end': 1042.204, 'text': "And I'm going to say title.", 'start': 1040.662, 'duration': 1.542}, {'end': 1045.126, 'text': "So we don't have to do any concatenation or anything like that.", 'start': 1042.223, 'duration': 2.903}, {'end': 1047.228, 'text': "So let's say title.", 'start': 1046.186, 
'duration': 1.042}, {'end': 1051.812, 'text': 'What else was it? The link and the date.', 'start': 1048.008, 'duration': 3.804}, {'end': 1056.856, 'text': "And then we're just going to put a new line like that.", 'start': 1052.572, 'duration': 4.284}, {'end': 1057.897, 'text': 'All right.', 'start': 1056.876, 'duration': 1.021}, {'end': 1059.098, 'text': "So that'll write the headers.", 'start': 1057.917, 'duration': 1.181}, {'end': 1064.062, 'text': "Now down here, we're going to just copy this.", 'start': 1059.158, 'duration': 4.904}, {'end': 1085.917, 'text': "going to replace this console log and let's say write to csv or write row to csv and instead of this stuff here we want to write the actual Values.", 'start': 1067.472, 'duration': 18.445}, {'end': 1086.917, 'text': 'so this is a.', 'start': 1085.917, 'duration': 1}, {'end': 1094.364, 'text': 'since we use back ticks, we can use this syntax where we just put variables inside of a money sign and curly braces.', 'start': 1086.917, 'duration': 7.447}, {'end': 1109.456, 'text': "so title and link and Date all right, and then we'll just do New line like that and we should be good.", 'start': 1094.364, 'duration': 15.092}], 'summary': 'Opening a stream to write to post.csv, writing headers and values for title, link, and date.', 'duration': 106.246, 'max_score': 1003.21, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/LoziivfAAjE/pics/LoziivfAAjE1003210.jpg'}, {'end': 1202.3, 'src': 'embed', 'start': 1172.878, 'weight': 3, 'content': [{'end': 1174.479, 'text': "so i think that's it, guys.", 'start': 1172.878, 'duration': 1.601}, {'end': 1182.586, 'text': 'now i know this is very basic, but it gives you a start on um how to scrape websites and, like i said in the python video,', 'start': 1174.479, 'duration': 8.107}, {'end': 1186.77, 'text': "there's a lot of ethics that goes in that go into, uh, web scraping,", 'start': 1182.586, 'duration': 4.184}, {'end': 1194.395, 'text': "because 
you know there's a lot of sites that don't want you to to scrape their data,", 'start': 1187.47, 'duration': 6.925}, {'end': 1202.3, 'text': "so you have to always look into it before you you do any kind of scraping on a public site, and that's why I didn't use a public site,", 'start': 1194.395, 'duration': 7.905}], 'summary': 'Basic introduction to web scraping and ethical considerations in python.', 'duration': 29.422, 'max_score': 1172.878, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/LoziivfAAjE/pics/LoziivfAAjE1172878.jpg'}], 'start': 847.014, 'title': 'Web development and writing to csv file', 'summary': 'Covers extracting titles, links, and date from a web page using javascript for web development. it also explains writing to a csv file with the node fs module, including creating a write stream, writing headers, and values, and ethical considerations in web scraping.', 'chapters': [{'end': 959.119, 'start': 847.014, 'title': 'Web development link extraction', 'summary': 'Covers extracting titles, links, and date from a web page using javascript, resulting in successful retrieval of the necessary data for web development.', 'duration': 112.105, 'highlights': ['Using JavaScript to extract titles, links, and date from a web page for web development.', 'Demonstrating the process of finding links and retrieving the href attribute using JavaScript.', 'Utilizing regular expressions to replace commas with spaces for improving data handling.']}, {'end': 1214.027, 'start': 959.199, 'title': 'Writing to csv file with node fs module', 'summary': 'Covers writing to a csv file using the node fs module, creating a write stream, writing headers, and values to the csv file, with emphasis on avoiding data manipulation issues, ending with a reminder about ethical considerations in web scraping.', 'duration': 254.828, 'highlights': ['Creating a write stream using the fs module and specifying the file name as post.csv. 
The speaker explains creating a write stream using the fs module and specifies the file name as post.csv.', 'Writing headers to the CSV file and using backticks (template literals) to avoid string concatenation. The transcript details writing headers to the CSV file and using backticks to avoid concatenation.', 'Demonstrating the process of writing values to the CSV file and replacing commas so they do not break the CSV columns. The speaker demonstrates the process of writing values to the CSV file and emphasizes replacing commas before the values are written.', 'Emphasizing ethical considerations in web scraping and the importance of respecting website data policies. The chapter ends with a reminder about ethical considerations in web scraping and the importance of respecting website data policies.']}], 'duration': 367.013, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/LoziivfAAjE/pics/LoziivfAAjE847014.jpg', 'highlights': ['Using regular expressions to replace commas with spaces so the values are safe to write to a CSV file.', 'Demonstrating the process of finding links and retrieving the href attribute using JavaScript.', 'Using JavaScript to extract titles, links, and dates from a web page.', 'Emphasizing ethical considerations in web scraping and the importance of respecting website data policies.', 'Creating a write stream using the fs module and specifying the file name as post.csv.', 'Writing headers to the CSV file and using backticks (template literals) to avoid string concatenation.', 'Demonstrating the process of writing values to the CSV file and replacing commas so they do not break the CSV columns.']}], 'highlights': ['Cheerio is a library designed for the server, providing a fast, flexible, and lean implementation of core jQuery for web scraping using Node.js.', 'The tutorial aims to scrape a sample blog, extract the title, link, and date of the posts, and store the data into a CSV file.', 'The chapter details the process of web 
scraping using Cheerio. The tutorial focuses on web scraping using Cheerio, providing a step-by-step guide.', 'The process involves creating a new file, scrape2.js, to loop through posts and extract specific data such as post titles, links, and post dates, and then outputting them into a CSV file.', 'Using regular expressions to replace commas with spaces so the values are safe to write to a CSV file.', 'Emphasizing ethical considerations in web scraping and the importance of respecting website data policies.']}