title
Data Analyst Portfolio Project | Data Cleaning in SQL | Project 3/4
description
Take my Full MySQL Course Here: https://bit.ly/3tqOipr
Today we continue our Data Analyst Portfolio Project Series. In this project we will be cleaning data in SQL. Data Cleaning is a super underrated skill in the Data Analyst community so I hope this can be helpful to start you on your journey to learning that skill.
Please remember to save this project and add it to your GitHub once you are done!
LINKS:
Project Dataset: https://github.com/AlexTheAnalyst/PortfolioProjects/blob/main/Nashville%20Housing%20Data%20for%20Data%20Cleaning.xlsx
Link to SQL Code: https://github.com/AlexTheAnalyst/PortfolioProjects/blob/main/Data%20Cleaning%20Portfolio%20Project%20Queries.sql
Link to Install SQL Server: https://www.youtube.com/watch?v=RSlqWnP-Dy8
____________________________________________
SUBSCRIBE!
Do you want to become a Data Analyst? That's what this channel is all about! My goal is to help you learn everything you need in order to start your career or even switch your career into Data Analytics. Be sure to subscribe to not miss out on any content!
____________________________________________
RESOURCES:
Coursera Courses:
Google Data Analyst Certification: https://coursera.pxf.io/5bBd62
Data Analysis with Python - https://coursera.pxf.io/BXY3Wy
IBM Data Analysis Specialization - https://coursera.pxf.io/AoYOdR
Tableau Data Visualization - https://coursera.pxf.io/MXYqaN
Udemy Courses:
Python for Data Analysis and Visualization- https://bit.ly/3hhX4LX
Statistics for Data Science - https://bit.ly/37jqDbq
SQL for Data Analysts (SSMS) - https://bit.ly/3fkqEij
Tableau A-Z - http://bit.ly/385lYvN
*Please note I may earn a small commission for any purchase through these links - Thanks for supporting the channel!*
____________________________________________
SUPPORT MY CHANNEL - PATREON/MERCH
Patreon Page - https://www.patreon.com/AlexTheAnalyst
Alex The Analyst Shop - https://teespring.com/stores/alex-the-analyst-shop
____________________________________________
Websites:
GitHub: https://github.com/AlexTheAnalyst
____________________________________________
*All opinions or statements in this video are my own and do not reflect the opinion of the company I work for or have ever worked for*
0:00 Introduction
1:02 Download Dataset
1:56 Import Data into SQL
7:35 DATA CLEANING
50:32 Recap
detail
{'title': 'Data Analyst Portfolio Project | Data Cleaning in SQL | Project 3/4', 'heatmap': [{'end': 2266.315, 'start': 2230.967, 'weight': 0.708}, {'end': 2826.705, 'start': 2788.467, 'weight': 1}], 'summary': 'Covers a sql portfolio project on data cleaning using nashville housing data, addressing challenges in etl processes, database transformation, property address and data transformation, removal of duplicates, and data management with sql, achieving successful updates and deletion of 104 rows.', 'chapters': [{'end': 184.901, 'segs': [{'end': 39.533, 'src': 'embed', 'start': 0.129, 'weight': 1, 'content': [{'end': 2.332, 'text': "What's going on, everybody? Welcome back to another video.", 'start': 0.129, 'duration': 2.203}, {'end': 5.556, 'text': 'Today, we will be heading back into SQL for our third portfolio project.', 'start': 2.532, 'duration': 3.024}, {'end': 15.895, 'text': 'Now I am extremely excited for this project in particular for a few reasons.', 'start': 11.892, 'duration': 4.003}, {'end': 18.837, 'text': "One, we're getting back into SQL and I really like SQL.", 'start': 16.155, 'duration': 2.682}, {'end': 22.14, 'text': 'And two, we are finally focusing on data cleaning.', 'start': 19.338, 'duration': 2.802}, {'end': 28.224, 'text': 'And I have talked so much about why data cleaning is important and that you really need to learn how to clean data,', 'start': 22.58, 'duration': 5.644}, {'end': 33.048, 'text': "and that that's a big part of what a data analyst does, but I haven't actually showed you how to do it yet.", 'start': 28.224, 'duration': 4.824}, {'end': 36.15, 'text': 'And so that is what this whole project is going to be.', 'start': 33.468, 'duration': 2.682}, {'end': 38.492, 'text': "And then at the end, you'll get to add it to your portfolio.", 'start': 36.23, 'duration': 2.262}, {'end': 39.533, 'text': "So it's really a win-win.", 'start': 38.532, 'duration': 1.001}], 'summary': 'Exciting third portfolio project in sql, focusing on data cleaning and portfolio addition.', 'duration': 39.404, 'max_score': 0.129, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8rO7ztF4NtU/pics/8rO7ztF4NtU129.jpg'}, {'end': 97.64, 'src': 'embed', 'start': 68.167, 'weight': 0, 'content': [{'end': 69.829, 'text': 'So I will put this link in the description.', 'start': 68.167, 'duration': 1.662}, {'end': 74.074, 'text': "We're going to go right over here to the Nashville housing data for data cleaning.", 'start': 69.849, 'duration': 4.225}, {'end': 76.677, 'text': 'All you have to do is click download.', 'start': 74.094, 'duration': 2.583}, {'end': 80.631, 'text': "and it's going to download it and you can open it up if you want to.", 'start': 77.449, 'duration': 3.182}, {'end': 86.853, 'text': "We're not gonna do anything to this data at all, but really quick, I'm just gonna show you what it does look like.", 'start': 80.651, 'duration': 6.202}, {'end': 92.096, 'text': "And we'll, of course, look at this in SQL in just a little bit, but we have a unique ID, parcel ID.", 'start': 87.194, 'duration': 4.902}, {'end': 93.577, 'text': 'We have this address.', 'start': 92.656, 'duration': 0.921}, {'end': 97.64, 'text': 'a sales date, the price of the home.', 'start': 94.857, 'duration': 2.783}], 'summary': 'Using nashville housing data for data cleaning, including unique id, address, sales date, and home prices.', 'duration': 29.473, 'max_score': 68.167, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8rO7ztF4NtU/pics/8rO7ztF4NtU68167.jpg'}], 'start': 0.129, 'title': 'Sql data cleaning project', 'summary': 'Covers a new sql portfolio project focusing on data cleaning using nashville housing data, emphasizing the importance of data cleaning and sql skills, and providing step-by-step guidance on importing data into sql server.', 'chapters': [{'end': 184.901, 'start': 0.129, 'title': 'Sql data cleaning project', 'summary': 'Covers a new sql portfolio project focusing on data cleaning using nashville housing data, emphasizing the importance of data cleaning and sql skills, and providing step-by-step guidance on importing data into sql server.', 'duration': 184.772, 'highlights': ['Importing Data Into SQL Server The chapter provides step-by-step guidance on importing Nashville housing data into SQL Server, emphasizing the importance of this process for the data cleaning project.', "Focus on Data Cleaning Emphasizing the significance of data cleaning, the chapter highlights the need for data analysts to learn data cleaning and its essential role in a data analyst's work.", 'Excitement for SQL Project The speaker expresses enthusiasm for the SQL project, underlining the significance of the project in reinforcing SQL skills and knowledge.']}], 'duration': 184.772, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8rO7ztF4NtU/pics/8rO7ztF4NtU129.jpg', 'highlights': ['The chapter provides step-by-step guidance on importing Nashville housing data into SQL Server, emphasizing the importance of this process for the data cleaning project.', "Emphasizing the significance of data cleaning, the chapter highlights the need for data analysts to learn data cleaning and its essential role in a data analyst's work.", 'The speaker expresses enthusiasm for the SQL project, underlining the significance of the project in reinforcing SQL skills and knowledge.']}, {'end': 567.742, 'segs': [{'end': 294.959, 'src': 'embed', 'start': 264.835, 'weight': 1, 'content': [{'end': 267.176, 'text': "It's going to be this project walkthrough.", 'start': 264.835, 'duration': 2.341}, {'end': 270.458, 'text': "Here are the things that I'm going to show you this really quickly.", 'start': 268.297, 'duration': 2.161}, {'end': 272.96, 'text': "Here are the things that we're going to be walking through.", 'start': 270.558, 'duration': 2.402}, {'end': 274.541, 'text': "So we're going to standardize the date format.", 'start': 272.98, 'duration': 1.561}, {'end': 277.842, 'text': "we're going to populate the property address data.", 'start': 275.299, 'duration': 2.543}, {'end': 279.543, 'text': "Um, that's referring to this right here.", 'start': 278.182, 'duration': 1.361}, {'end': 284.508, 'text': "If you notice there's the address and there's also the city that it's in.", 'start': 279.784, 'duration': 4.724}, {'end': 287.251, 'text': 'So we want to be able to separate that out.', 'start': 285.369, 'duration': 1.882}, {'end': 290.735, 'text': 'Um, and that is actually right over here.', 'start': 288.032, 'duration': 2.703}, {'end': 294.959, 'text': "We're going to be doing the same thing, same thing to the owner address, except that has an address.", 'start': 290.795, 'duration': 4.164}], 'summary': 'Project walkthrough to standardize date format, populate property address data, and separate city from address.', 'duration': 30.124, 'max_score': 264.835, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8rO7ztF4NtU/pics/8rO7ztF4NtU264835.jpg'}, {'end': 342.061, 'src': 'embed', 'start': 306.546, 'weight': 0, 'content': [{'end': 308.667, 'text': "That's what this one is, breaking it out into individual columns.", 'start': 306.546, 'duration': 2.121}, {'end': 309.467, 'text': "That's what we're gonna do for that.", 'start': 308.687, 'duration': 0.78}, {'end': 312.648, 'text': 'This is populating the property address.', 'start': 309.907, 'duration': 2.741}, {'end': 317.75, 'text': "If you notice, and we'll go into this a little bit, there's actually some values in the property address that are blank.", 'start': 312.688, 'duration': 5.062}, {'end': 326.699, 'text': "But I'm going to show you how you can actually populate that, which is just a cool trick that I've used a few times, and it does work.", 'start': 318.41, 'duration': 8.289}, {'end': 328.962, 'text': "I think you'll find that one interesting.", 'start': 327.78, 'duration': 1.182}, {'end': 333.547, 'text': "In the sold as vacant field, we're going to be doing some..", 'start': 330.143, 'duration': 3.404}, {'end': 335.61, 'text': 'some case statements.', 'start': 334.688, 'duration': 0.922}, {'end': 342.061, 'text': "if, then then we're going to be removing duplicates and then deleting unused columns, so we have a lot to get through.", 'start': 335.61, 'duration': 6.451}], 'summary': 'Demonstrating data manipulation and cleaning techniques with examples and tricks.', 'duration': 35.515, 'max_score': 306.546, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8rO7ztF4NtU/pics/8rO7ztF4NtU306546.jpg'}, {'end': 387.401, 'src': 'embed', 'start': 358.138, 'weight': 4, 'content': [{'end': 367.882, 'text': 'but I ran into not issues on my side but issues in the fact that the vast majority of people who are gonna be watching this are not gonna be able to do what I did to configure my server.', 'start': 358.138, 'duration': 9.744}, {'end': 370.524, 'text': 'But I left it in here anyways.', 'start': 369.043, 'duration': 1.481}, {'end': 378.252, 'text': "When I think ETL is an automated process in order to extract the data from somewhere, we're gonna transform it and then put it somewhere.", 'start': 371.025, 'duration': 7.227}, {'end': 387.401, 'text': 'This was gonna be the extraction method and I was gonna put it in a stored procedure so that you could run the stored procedure or run the job.', 'start': 378.832, 'duration': 8.569}], 'summary': 'Etl process involves extracting, transforming, and storing data. challenges with server configuration.', 'duration': 29.263, 'max_score': 358.138, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8rO7ztF4NtU/pics/8rO7ztF4NtU358138.jpg'}, {'end': 448.026, 'src': 'embed', 'start': 419.272, 'weight': 3, 'content': [{'end': 423.395, 'text': "This is just for, if you want to try it, it's a little more advanced.", 'start': 419.272, 'duration': 4.123}, {'end': 430.98, 'text': "And so you don't have to just important when this will be a data cleaning project instead of an ETL project.", 'start': 424.656, 'duration': 6.324}, {'end': 433.682, 'text': 'but data cleaning is what 90% of it was going to be anyways.', 'start': 430.98, 'duration': 2.702}, {'end': 438.644, 'text': "Anyways, let's go back up to the very top really quickly.", 'start': 435.023, 'duration': 3.621}, {'end': 442.985, 'text': 'I have a whole nother laptop right here as I did in the first video.', 'start': 438.664, 'duration': 4.321}, {'end': 448.026, 'text': "I didn't show it to you last time, but I have all of my queries written out over here.", 'start': 443.005, 'duration': 5.021}], 'summary': 'The transcript covers advanced data cleaning with 90% focus.', 'duration': 28.754, 'max_score': 419.272, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8rO7ztF4NtU/pics/8rO7ztF4NtU419272.jpg'}, {'end': 523.315, 'src': 'embed', 'start': 490.144, 'weight': 5, 'content': [{'end': 491.786, 'text': "So let's copy this really quick.", 'start': 490.144, 'duration': 1.642}, {'end': 496.49, 'text': "And let's look at just sale date.", 'start': 493.788, 'duration': 2.702}, {'end': 502.605, 'text': 'And it has this time on the end and it serves absolutely no purpose.', 'start': 499.563, 'duration': 3.042}, {'end': 503.865, 'text': 'And it just annoys me.', 'start': 502.665, 'duration': 1.2}, {'end': 505.186, 'text': 'I want to take that off.', 'start': 503.965, 'duration': 1.221}, {'end': 514.971, 'text': "And so right now it's a date time format, but we're going to convert and we're going to do date and we're going to take sale date.", 'start': 505.306, 'duration': 9.665}, {'end': 519.413, 'text': 'Sale date.', 'start': 517.491, 'duration': 1.922}, {'end': 520.754, 'text': "And we're going to go like that.", 'start': 519.794, 'duration': 0.96}, {'end': 523.315, 'text': "And let's run this really quick.", 'start': 522.135, 'duration': 1.18}], 'summary': 'Converting date-time format to date for sale date.', 'duration': 33.171, 'max_score': 490.144, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8rO7ztF4NtU/pics/8rO7ztF4NtU490144.jpg'}], 'start': 184.921, 'title': 'Etl process and data cleaning project', 'summary': 'Discusses challenges in setting up automated etl process using sql, difficulties in server configuration, transition to data cleaning project due to complexity, and focus on cleaning sale date field with 56,477 rows dataset.', 'chapters': [{'end': 342.061, 'start': 184.921, 'title': 'Data copy and sql walkthrough', 'summary': 'Details the process of copying data, performing sql operations, and highlighting the tasks involved in standardizing date format, populating property and owner addresses, and removing duplicates from a dataset containing 56,477 rows.', 'duration': 157.14, 'highlights': ['The process of standardizing date format and populating property and owner addresses is detailed, with a focus on handling over 56,000 rows of data.', "The chapter discusses populating blank values in property addresses and implementing case statements for the 'sold as vacant' field.", 'The walkthrough covers the tasks of breaking down columns, removing duplicates, and deleting unused columns to streamline the dataset.']}, {'end': 567.742, 'start': 342.061, 'title': 'Etl process and data cleaning project', 'summary': 'Discusses the challenges faced in setting up an automated etl process using sql, the difficulties in configuring the server, and the transition to a data cleaning project due to the complexity, with a focus on cleaning the sale date field and encountering issues with the update operation.', 'duration': 225.681, 'highlights': ['Setting up an automated ETL process in SQL The chapter discusses the initial intention of setting up an automated ETL process in SQL, highlighting the complexities and challenges faced in configuring the server for the majority of viewers.', 'Transition to a data cleaning project The chapter highlights the transition to a data cleaning project due to the complexities of the ETL process, emphasizing that data cleaning comprises 90% of the project.', 'Challenges with updating the sale date field The chapter details the challenges encountered in updating the sale date field, indicating that the update operation works only 80% of the time with no clear explanation for the failure in the remaining 20%.']}], 'duration': 382.821, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8rO7ztF4NtU/pics/8rO7ztF4NtU184921.jpg', 'highlights': ['The walkthrough covers the tasks of breaking down columns, removing duplicates, and deleting unused columns to streamline the dataset.', 'The process of standardizing date format and populating property and owner addresses is detailed, with a focus on handling over 56,000 rows of data.', "The chapter discusses populating blank values in property addresses and implementing case statements for the 'sold as vacant' field.", 'Transition to a data cleaning project due to the complexities of the ETL process, emphasizing that data cleaning comprises 90% of the project.', 'Setting up an automated ETL process in SQL, highlighting the complexities and challenges faced in configuring the server for the majority of viewers.', 'Challenges with updating the sale date field, indicating that the update operation works only 80% of the time with no clear explanation for the failure in the remaining 20%.']}, {'end': 1142.488, 'segs': [{'end': 636.069, 'src': 'embed', 'start': 602.94, 'weight': 0, 'content': [{'end': 611.341, 'text': 'And then we can say, like this and set sale date converted.', 'start': 602.94, 'duration': 8.401}, {'end': 614.002, 'text': "Let's try this and see what happens.", 'start': 612.501, 'duration': 1.501}, {'end': 620.804, 'text': "So I'm going to add this column and then I'm going to update this and it says it's affected.", 'start': 614.202, 'duration': 6.602}, {'end': 622.145, 'text': "Let's see what happened.", 'start': 621.224, 'duration': 0.921}, {'end': 631.328, 'text': "So let's write sale date, convert sale date converted.", 'start': 623.045, 'duration': 8.283}, {'end': 632.929, 'text': "Let's see what happened.", 'start': 631.348, 'duration': 1.581}, {'end': 634.169, 'text': "Let's see if it actually worked.", 'start': 633.169, 'duration': 1}, {'end': 636.069, 'text': 'And it worked.', 'start': 635.649, 'duration': 0.42}], 'summary': 'Successfully converted sale date and updated affected column.', 'duration': 33.129, 'max_score': 602.94, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8rO7ztF4NtU/pics/8rO7ztF4NtU602940.jpg'}, {'end': 725.327, 'src': 'embed', 'start': 693.464, 'weight': 4, 'content': [{'end': 696.885, 'text': "Why there are null values? I really don't know.", 'start': 693.464, 'duration': 3.421}, {'end': 699.572, 'text': 'I really am not sure.', 'start': 697.911, 'duration': 1.661}, {'end': 707.935, 'text': "But let's look at everything where this is, where it's null.", 'start': 700.532, 'duration': 7.403}, {'end': 710.176, 'text': 'So we have this property address.', 'start': 708.596, 'duration': 1.58}, {'end': 713.418, 'text': 'We have a sale date, a price, legal reference.', 'start': 710.216, 'duration': 3.202}, {'end': 716.159, 'text': "There's this parcel ID and there's this unique ID.", 'start': 713.758, 'duration': 2.401}, {'end': 725.327, 'text': 'So we have a lot of information and When you have something like this, something like an address, an address is.', 'start': 717.039, 'duration': 8.288}], 'summary': 'Data includes null values for property address, sale date, price, legal reference, parcel id, and unique id.', 'duration': 31.863, 'max_score': 693.464, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8rO7ztF4NtU/pics/8rO7ztF4NtU693464.jpg'}, {'end': 840.415, 'src': 'embed', 'start': 783.091, 'weight': 1, 'content': [{'end': 784.372, 'text': "So 015, and that's the exact same address.", 'start': 783.091, 'duration': 1.281}, {'end': 789.151, 'text': "And we'll find this a lot of times.", 'start': 787.509, 'duration': 1.642}, {'end': 793.494, 'text': 'And I look through the data and it is pretty much accurate.', 'start': 789.211, 'duration': 4.283}, {'end': 798.639, 'text': 'When it does have it, it is the exact same address.', 'start': 794.795, 'duration': 3.844}, {'end': 803.583, 'text': 'So this parcel ID is going to be the same as the property address.', 'start': 798.939, 'duration': 4.644}, {'end': 808.247, 'text': 'So something that we can do is basically say if.', 'start': 805.204, 'duration': 3.043}, {'end': 815.749, 'text': 'This parcel ID has an address and this parcel ID does not have an address.', 'start': 810.265, 'duration': 5.484}, {'end': 821.693, 'text': "Let's populate it with this address that's already populated because we know these are going to be the same.", 'start': 815.789, 'duration': 5.904}, {'end': 824.155, 'text': 'That is basically what we are about to do.', 'start': 822.694, 'duration': 1.461}, {'end': 831.84, 'text': "And it's not super complicated, but let's get started writing it.", 'start': 826.156, 'duration': 5.684}, {'end': 835.492, 'text': 'Copy that down there.', 'start': 834.431, 'duration': 1.061}, {'end': 840.415, 'text': 'One thing we are going to have to do with this is do a self join.', 'start': 836.993, 'duration': 3.422}], 'summary': 'Matching parcel id and property address for accuracy.', 'duration': 57.324, 'max_score': 783.091, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8rO7ztF4NtU/pics/8rO7ztF4NtU783091.jpg'}, {'end': 1077.592, 'src': 'embed', 'start': 1050.852, 'weight': 2, 'content': [{'end': 1056.034, 'text': "And just so you know, when you're doing joins in an update statement, you're not going to say Nashville housing.", 'start': 1050.852, 'duration': 5.182}, {'end': 1057.514, 'text': "Okay, that's going to give you an error.", 'start': 1056.334, 'duration': 1.18}, {'end': 1059.435, 'text': 'You need to use it by its alias.', 'start': 1057.954, 'duration': 1.481}, {'end': 1061.336, 'text': "So let's put A.", 'start': 1059.595, 'duration': 1.741}, {'end': 1067.698, 'text': "So now we're going to say property address is going to be equal to, and now we're just going to copy this is null.", 'start': 1061.336, 'duration': 6.362}, {'end': 1069.988, 'text': 'And put it right here.', 'start': 1069.068, 'duration': 0.92}, {'end': 1073.77, 'text': 'And we only want to update.', 'start': 1072.41, 'duration': 1.36}, {'end': 1075.611, 'text': "Let's see if it does take this.", 'start': 1073.79, 'duration': 1.821}, {'end': 1077.592, 'text': 'So I think this should be correct.', 'start': 1076.351, 'duration': 1.241}], 'summary': 'Using aliases in update joins to avoid errors in nashville housing data.', 'duration': 26.74, 'max_score': 1050.852, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8rO7ztF4NtU/pics/8rO7ztF4NtU1050852.jpg'}], 'start': 568.182, 'title': 'Database transformation and sql self-join', 'summary': "Involves altering a table by adding a new column 'sale date converted', updating it, and identifying null values in 'property address' by comparing it with 'parcel id' to populate the address. it also discusses using a self-join to update property addresses in a sql table, ensuring unique values and populating null values, resulting in successful updates for all 35 entries.", 'chapters': [{'end': 815.749, 'start': 568.182, 'title': 'Database column transformation', 'summary': "Involves altering a table by adding a new column 'sale date converted', updating it, and identifying null values in 'property address' by comparing it with 'parcel id' to populate the address.", 'duration': 247.567, 'highlights': ["A new column 'sale date converted' is added to the table and updated, resulting in successful transformation.", "Identifying null values in 'property address' and exploring the relationship with 'parcel ID' to potentially populate the address based on existing references.", "Observation of frequent occurrence of the same address for a particular 'parcel ID', indicating a potential correlation between 'parcel ID' and 'property address'."]}, {'end': 1142.488, 'start': 815.789, 'title': 'Sql self-join for updating address', 'summary': 'Discusses using a self-join to update property addresses in a sql table, ensuring unique values and populating null values, resulting in successful updates for all 35 entries.', 'duration': 326.699, 'highlights': ['The self-join is used to compare and update property addresses in the same table, ensuring unique values by using a unique ID and populating null values.', "Utilizing the 'is null' function, the SQL query successfully updates all 35 entries with the appropriate property addresses.", "The process involves setting up the self-join condition and using the 'is null' function to populate the property addresses, resulting in the successful update of the entries."]}], 'duration': 574.306, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8rO7ztF4NtU/pics/8rO7ztF4NtU568182.jpg', 'highlights': ["A new column 'sale date converted' is added to the table and updated, resulting in successful transformation.", 'The self-join is used to compare and update property addresses in the same table, ensuring unique values by using a unique ID and populating null values.', "Utilizing the 'is null' function, the SQL query successfully updates all 35 entries with the appropriate property addresses.", "Observation of frequent occurrence of the same address for a particular 'parcel ID', indicating a potential correlation between 'parcel ID' and 'property address'.", "Identifying null values in 'property address' and exploring the relationship with 'parcel ID' to potentially populate the address based on existing references.", "The process involves setting up the self-join condition and using the 'is null' function to populate the property addresses, resulting in the successful update of the entries."]}, {'end': 2242.596, 'segs': [{'end': 1175.132, 'src': 'embed', 'start': 1143.308, 'weight': 4, 'content': [{'end': 1149.65, 'text': "And let's take a look at, uh, what are we doing? The property address, the property address.", 'start': 1143.308, 'duration': 6.342}, {'end': 1151.811, 'text': 'Um, and we can get rid of this as well.', 'start': 1150.01, 'duration': 1.801}, {'end': 1158.472, 'text': 'So, if you notice, we have two things here.', 'start': 1154.247, 'duration': 4.225}, {'end': 1165.521, 'text': "We have both the address, and then there's this comma after all of them, and there is the city.", 'start': 1158.973, 'duration': 6.548}, {'end': 1175.132, 'text': "Now you don't know that, or maybe you haven't looked into this, but I have, and there are no other commas anywhere,", 'start': 1166.623, 'duration': 8.509}], 'summary': 'Analyzing property addresses, identifying address and city components, no additional commas found.', 'duration': 31.824, 'max_score': 1143.308, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8rO7ztF4NtU/pics/8rO7ztF4NtU1143308.jpg'}, {'end': 1227.203, 'src': 'embed', 'start': 1199.321, 'weight': 0, 'content': [{'end': 1207.507, 'text': "For this one we're gonna be using something called a substring and we're also gonna be using something called a character index or a char index.", 'start': 1199.321, 'duration': 8.186}, {'end': 1215.392, 'text': "So let's start writing that out and let's do select and let's say substring.", 'start': 1208.547, 'duration': 6.845}, {'end': 1223.558, 'text': 'Now the substring that we want to take, we of course wanna be looking at, oops, let me put this down here so it helps us out a little bit.', 'start': 1216.133, 'duration': 7.425}, {'end': 1227.203, 'text': "And I'll get like that.", 'start': 1225.402, 'duration': 1.801}], 'summary': 'Using substring and character index for selecting data.', 'duration': 27.882, 'max_score': 1199.321, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8rO7ztF4NtU/pics/8rO7ztF4NtU1199321.jpg'}, {'end': 1658.537, 'src': 'embed', 'start': 1630.336, 'weight': 3, 'content': [{'end': 1633.538, 'text': 'So property split address and property split city.', 'start': 1630.336, 'duration': 3.202}, {'end': 1637.039, 'text': "It's much more usable than this.", 'start': 1634.418, 'duration': 2.621}, {'end': 1639.62, 'text': 'I mean, this would be a nightmare, not a nightmare.', 'start': 1637.939, 'duration': 1.681}, {'end': 1641.581, 'text': "It'd just be annoying to use this column.", 'start': 1639.64, 'duration': 1.941}, {'end': 1646.403, 'text': "I mean, now that it's separated on the address and the city, it's so much more usable of data.", 'start': 1641.701, 'duration': 4.702}, {'end': 1648.004, 'text': 'I really, really is.', 'start': 1646.964, 'duration': 1.04}, {'end': 1651.946, 'text': "The next thing we're going to be looking at is this owner address.", 'start': 1649.345, 'duration': 2.601}, {'end': 1658.537, 'text': 'Now, it was hard enough or it was tough enough to do this.', 'start': 1652.046, 'duration': 6.491}], 'summary': 'Data is now more usable with split address and city, making processing easier.', 'duration': 28.201, 'max_score': 1630.336, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8rO7ztF4NtU/pics/8rO7ztF4NtU1630336.jpg'}, {'end': 2123.73, 'src': 'embed', 'start': 2098.769, 'weight': 2, 'content': [{'end': 2104.694, 'text': "So let's change them to yes and no, because these are obviously the vastly more populated ones.", 'start': 2098.769, 'duration': 5.925}, {'end': 2107.656, 'text': "And we're just going to do this through a case statement.", 'start': 2105.715, 'duration': 1.941}, {'end': 2111.96, 'text': "So we're going to say, oh, yeah, let me get this ready before we start.", 'start': 2108.077, 'duration': 3.883}, {'end': 2113.922, 'text': "Oh, yeah, I'm ahead of the game now.", 'start': 2112.801, 'duration': 1.121}, {'end': 2117.605, 'text': "Let's do select and we'll do sold as vacant.", 'start': 2114.502, 'duration': 3.103}, {'end': 2119.646, 'text': "And then we'll start our case statement.", 'start': 2118.505, 'duration': 1.141}, {'end': 2122.128, 'text': "Yeah, let's do right here.", 'start': 2121.028, 'duration': 1.1}, {'end': 2123.73, 'text': "So we'll do case.", 'start': 2122.148, 'duration': 1.582}], 'summary': "Changing 'yes' and 'no' to 'sold' and 'vacant' using a case statement.", 'duration': 24.961, 'max_score': 2098.769, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8rO7ztF4NtU/pics/8rO7ztF4NtU2098769.jpg'}, {'end': 2242.596, 'src': 'embed', 'start': 2215.063, 'weight': 1, 'content': [{'end': 2221.936, 'text': "And we'll say, Set, sorry, I'm talking faster than I'm going.", 'start': 2215.063, 'duration': 6.873}, {'end': 2226.702, 'text': 'Set sold as vacant equal to, and we can just literally put in this case statement.', 'start': 2222.437, 'duration': 4.265}, {'end': 2229.205, 'text': "It's not pretty, but let's try it.", 'start': 2227.523, 'duration': 1.682}, {'end': 2233.31, 'text': "Okay, now let's go look at this again and see if it made the update.", 'start': 2230.967, 'duration': 2.343}, {'end': 2234.813, 'text': 'There we go.', 'start': 2234.093, 'duration': 0.72}, {'end': 2236.334, 'text': 'The update statement worked.', 'start': 2234.833, 'duration': 1.501}, {'end': 2237.394, 'text': 'Oh, fantastic.', 'start': 2236.534, 'duration': 0.86}, {'end': 2238.174, 'text': "It's a beautiful thing.", 'start': 2237.414, 'duration': 0.76}, {'end': 2241.316, 'text': 'Okay Great.', 'start': 2239.675, 'duration': 1.641}, {'end': 2242.596, 'text': "I'm glad that one worked.", 'start': 2241.476, 'duration': 1.12}], 'summary': 'Successful update statement executed for set sold as vacant.', 'duration': 27.533, 'max_score': 2215.063, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8rO7ztF4NtU/pics/8rO7ztF4NtU2215063.jpg'}], 'start': 1143.308, 'title': 'Property address and data transformation', 'summary': "Delves into the significance of using commas as delimiters for property address components, while also discussing the process of splitting and transforming data columns in a database table, resulting in more usable data for analysis, with 'no' and 'yes' being the most populated values for sold as vacant.", 'chapters': [{'end': 1197.271, 'start': 1143.308, 'title': 'Property address and delimiters', 'summary': 'Discusses the use of commas as delimiters to separate the property address components, highlighting their significance in data organization and the absence of additional commas within the address. it also touches upon the concept of delimiters in data processing.', 'duration': 53.963, 'highlights': ['The significance of commas as delimiters in separating property address components and their absence elsewhere in the address is highlighted for data organization. (Importance of delimiters in data organization)', 'The concept of delimiters and their role in separating different columns or values is briefly explained. (Introduction to delimiters in data processing)']}, {'end': 1510.746, 'start': 1199.321, 'title': 'Substring and char index in sql', 'summary': 'Covers the usage of substring and char index in sql to extract specific portions of a string, using examples and explanations to illustrate the process.', 'duration': 311.425, 'highlights': ['The usage of substring and char index in SQL to extract specific portions of a string is explained using examples and detailed step-by-step explanations, providing a comprehensive understanding of the process.', 'The process involves using the char index to search for a specific value within the string and then utilizing the substring function to extract the desired portion, with the demonstration of adjusting the position and length parameters for different string lengths.', 'An example of adjusting the position and length parameters for different string lengths is provided, showcasing the practical application of the substring and char index functions in SQL.']}, {'end': 2242.596, 'start': 1510.806, 'title': 'Data column splitting and transformation', 'summary': "Discusses the process of splitting and transforming data columns, such as property address, city, and state, and sold as vacant, in a database table, resulting in more usable data for analysis, with the most populated values being 'no' and 'yes' for sold as vacant.", 'duration': 731.79, 'highlights': ['The process of splitting and transforming data columns such as property address, city, and state, and sold as vacant in a database table is discussed. The speaker explains the process of splitting and transforming data columns, such as property address, city, and state, and sold as vacant, to make the data more usable for analysis.', "The most populated values in the sold as vacant column are 'no' and 'yes'. The sold as vacant column contains 51,000 instances of 'no' and almost 5,000 instances of 'yes', making them the most populated values in the column.", "An update statement successfully transforms the values in the sold as vacant column using a case statement. The update statement successfully transforms the values in the sold as vacant column using a case statement, resulting in a unique update and the most populated values 'no' and 'yes'."]}], 'duration': 1099.288, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8rO7ztF4NtU/pics/8rO7ztF4NtU1143308.jpg', 'highlights': ['The process involves using the char index to search for a specific value within the string and then utilizing the substring function to extract the desired portion, with the demonstration of adjusting the position and length parameters for different string lengths.', "An update statement successfully transforms the values in the sold as vacant column using a case statement, resulting in a unique update and the most populated values 'no' and 'yes'.", "The most populated values in the sold as vacant column are 'no' and 'yes'. The sold as vacant column contains 51,000 instances of 'no' and almost 5,000 instances of 'yes', making them the most populated values in the column.", 'The process of splitting and transforming data columns such as property address, city, and state, and sold as vacant in a database table is discussed. The speaker explains the process of splitting and transforming data columns, such as property address, city, and state, and sold as vacant, to make the data more usable for analysis.', 'The significance of commas as delimiters in separating property address components and their absence elsewhere in the address is highlighted for data organization.']}, {'end': 2614.886, 'segs': [{'end': 2351.093, 'src': 'embed', 'start': 2320.451, 'weight': 0, 'content': [{'end': 2323.034, 'text': 'And oh my gosh, I was about to do it.', 'start': 2320.451, 'duration': 2.583}, {'end': 2328.76, 'text': "Somebody's out there just like waiting for me to make that mistake again.", 'start': 2324.335, 'duration': 4.425}, {'end': 2335.124, 'text': 'So we want to partition our data.', 'start': 2330.461, 'duration': 4.663}, {'end': 2339.526, 'text': "When you're doing removing duplicates.", 'start': 2336.845, 'duration': 2.681}, {'end': 2344.689, 'text': "we're going to have duplicate rows and we need to be able to have a way to identify those rows right?", 'start': 2339.526, 'duration': 5.163}, {'end': 2349.592, 'text': 'So you can use things like rank, order, rank row number.', 'start': 2345.329, 'duration': 4.263}, {'end': 2351.093, 'text': 'There are a few different options.', 'start': 2349.892, 'duration': 1.201}], 'summary': 'Data partitioning is essential for identifying and removing duplicates using methods like rank, order, and row number.', 'duration': 30.642, 'max_score': 2320.451, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8rO7ztF4NtU/pics/8rO7ztF4NtU2320451.jpg'}], 'start': 2242.716, 'title': 'Removing duplicates in sql', 'summary': 'Provides a tutorial on using row number and partitioning by multiple columns to remove duplicates in sql, with a caution against deleting actual data.', 'chapters': [{'end': 2614.886, 'start': 2242.716, 'title': 'Removing duplicates in sql', 'summary': 'Provides a tutorial on removing duplicates in sql using row number and partitioning by multiple columns, with a caution against deleting actual data.', 'duration': 372.17, 'highlights': ['The chapter provides a tutorial on removing duplicates in SQL using row number and partitioning by multiple columns. The tutorial covers the use of row number and partitioning by multiple columns to identify and remove duplicate rows in SQL.', 'A caution is given against deleting actual data, but provides a demonstration on how to find and remove duplicate values. The speaker emphasizes that deleting actual data is not a standard practice, but demonstrates the process of identifying and removing duplicate values using row number and partitioning.', 'The speaker mentions using row number for its simplicity and effectiveness in identifying and removing duplicate rows. The speaker explains the use of row number as the chosen method for identifying and removing duplicate rows due to its simplicity and effectiveness.']}], 'duration': 372.17, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8rO7ztF4NtU/pics/8rO7ztF4NtU2242716.jpg', 'highlights': ['The tutorial covers the use of row number and partitioning by multiple columns to identify and remove duplicate rows in SQL.', 'The speaker emphasizes that deleting actual data is not a standard practice, but demonstrates the process of identifying and removing duplicate values using row number and partitioning.', 'The speaker explains the use of row number as the chosen method for identifying and removing duplicate rows due to its simplicity and effectiveness.']}, {'end': 3271.22, 'segs': [{'end': 2647.159, 'src': 'embed', 'start': 2615.386, 'weight': 1, 'content': [{'end': 2617.006, 'text': "I don't know why it's okay.", 'start': 2615.386, 'duration': 1.62}, {'end': 2623.729, 'text': "So let's see, let's look at these two and let's see if I did something wrong.", 'start': 2617.246, 'duration': 6.483}, {'end': 2628.173, 'text': "Oops Don't need to pull that up.", 'start': 2623.749, 'duration': 4.424}, {'end': 2633.035, 'text': "I was doing some research when I, when the convert by wasn't working.", 'start': 2630.414, 'duration': 2.621}, {'end': 2635.535, 'text': 'Um, okay.', 'start': 2634.055, 'duration': 1.48}, {'end': 2638.816, 'text': "So this one in this one, it's giving different row numbers.", 'start': 2635.575, 'duration': 3.241}, {'end': 2644.298, 'text': "So let's look at the actual data, ignore the unique ID, but the data itself.", 'start': 2640.237, 'duration': 4.061}, {'end': 2647.159, 'text': 'So the, the sale date is the same.', 'start': 2645.178, 'duration': 1.981}], 'summary': 'Troubleshooting research issue with convert by function and data consistency.', 'duration': 31.773, 'max_score': 2615.386, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8rO7ztF4NtU/pics/8rO7ztF4NtU2615386.jpg'}, {'end': 2826.705, 'src': 'heatmap', 'start': 2769.355, 'weight': 3, 'content': [{'end': 2772.097, 'text': 'Okay, so all of these are duplicates.', 'start': 2769.355, 'duration': 2.742}, {'end': 2773.898, 'text': 'We have 104 of them, it looks like.', 'start': 2772.117, 'duration': 1.781}, {'end': 2779.301, 'text': "So there's not many, but there's twos, I think threes, no threes.", 'start': 2774.158, 'duration': 5.143}, {'end': 2787.967, 'text': "There's multiple of these rows or columns that are basically duplicates and we want to delete them.", 'start': 2780.56, 'duration': 7.407}, {'end': 2797.815, 'text': "So all we're going to say is we're going to select instead of saying select everything from row, we're just going to say delete.", 'start': 2788.467, 'duration': 9.348}, {'end': 2802.58, 'text': "And yeah, I got to get rid of that order by that doesn't work.", 'start': 2799.417, 'duration': 3.163}, {'end': 2804.001, 'text': "And let's do this.", 'start': 2803.32, 'duration': 0.681}, {'end': 2808.791, 'text': "There's 104, let's see if it worked.", 'start': 2807.25, 'duration': 1.541}, {'end': 2814.756, 'text': "So now let's do, let's go back and we'll say select everything and let's see if there's any more duplicates in there.", 'start': 2809.312, 'duration': 5.444}, {'end': 2816.497, 'text': 'There are none.', 'start': 2815.877, 'duration': 0.62}, {'end': 2817.678, 'text': 'That is fantastic.', 'start': 2816.817, 'duration': 0.861}, {'end': 2820.48, 'text': "I'm like biting my nails now to see if each one of these works.", 'start': 2817.758, 'duration': 2.722}, {'end': 2823.743, 'text': "Cause I, that first one didn't work.", 'start': 2821.941, 'duration': 1.802}, {'end': 2826.705, 'text': 'So yeah, so it worked.', 'start': 2825.564, 'duration': 1.141}], 'summary': 'Identified and deleted 104 duplicates, ensuring no remaining duplicates.', 'duration': 28.46, 'max_score': 2769.355, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8rO7ztF4NtU/pics/8rO7ztF4NtU2769355.jpg'}, {'end': 2858.987, 'src': 'embed', 'start': 2825.564, 'weight': 0, 'content': [{'end': 2826.705, 'text': 'So yeah, so it worked.', 'start': 2825.564, 'duration': 1.141}, {'end': 2827.766, 'text': 'We got rid of the duplicates.', 'start': 2826.865, 'duration': 0.901}, {'end': 2828.747, 'text': 'That is fantastic.', 'start': 2827.846, 'duration': 0.901}, {'end': 2835.352, 'text': "And now it's smooth sailing from here cause we're just gonna delete some unused columns that we don't care about.", 'start': 2828.907, 'duration': 6.445}, {'end': 2837.042, 'text': "This doesn't happen often.", 'start': 2835.762, 'duration': 1.28}, {'end': 2841.843, 'text': 'This, I would say actually happens more in like views.', 'start': 2838.703, 'duration': 3.14}, {'end': 2846.224, 'text': "When I'm creating views and I have a view and I'm like, oh, I didn't mean to add that column.", 'start': 2842.463, 'duration': 3.761}, {'end': 2848.305, 'text': "Let me just remove it because it's a, I don't need it.", 'start': 2846.264, 'duration': 2.041}, {'end': 2853.206, 'text': "You don't do this to like the raw data that you import.", 'start': 2848.805, 'duration': 4.401}, {'end': 2858.987, 'text': "Usually this is, I mean, again, best practices, please don't do this to your raw data that comes into your database.", 'start': 2853.866, 'duration': 5.121}], 'summary': 'Duplicates removed, unused columns deleted, improving data quality.', 'duration': 33.423, 'max_score': 2825.564, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8rO7ztF4NtU/pics/8rO7ztF4NtU2825564.jpg'}, {'end': 3030.865, 'src': 'embed', 'start': 3002.623, 'weight': 2, 'content': [{'end': 3008.728, 'text': "And it may not have felt like that as we were going through, because I wasn't really looking at the cleaning data.", 'start': 3002.623, 'duration': 6.105}, {'end': 3015.697, 'text': 'We were cleaning it, but what was the purpose of it? I may not have highlighted that too much.', 'start': 3010.269, 'duration': 5.428}, {'end': 3022.281, 'text': "All these other columns that we created are just, it's much more usable, much more friendly.", 'start': 3016.177, 'duration': 6.104}, {'end': 3024.622, 'text': 'This is standardized now.', 'start': 3023.321, 'duration': 1.301}, {'end': 3030.865, 'text': 'And we did that through quite a few various methods.', 'start': 3025.542, 'duration': 5.323}], 'summary': 'Data cleaning improved usability and standardization using various methods.', 'duration': 28.242, 'max_score': 3002.623, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8rO7ztF4NtU/pics/8rO7ztF4NtU3002623.jpg'}], 'start': 2615.386, 'title': 'Data management with sql', 'summary': 'Covers data comparison and deletion, involving identifying and resolving discrepancies, and cleaning data using sql. it includes removing duplicates resulting in 104 rows being deleted and altering table columns to streamline and standardize the dataset.', 'chapters': [{'end': 2674.32, 'start': 2615.386, 'title': 'Data comparison and deletion', 'summary': 'Involves identifying and resolving data discrepancies, where a query will be written to delete redundant data based on multiple matching fields, ensuring only unique records are retained.', 'duration': 58.934, 'highlights': ['A query will be written to delete redundant data based on multiple matching fields, ensuring only unique records are retained.', "Identifying data discrepancies based on matching fields such as sale date, sale price, legal reference, and owner's information."]}, {'end': 3271.22, 'start': 2674.34, 'title': 'Data cleaning with sql', 'summary': 'Demonstrates data cleaning using sql, including removing duplicates resulting in 104 rows being deleted, and the alteration of table columns to streamline data, leading to a more usable and standardized dataset.', 'duration': 596.88, 'highlights': ['The chapter demonstrates data cleaning using SQL, including removing duplicates resulting in 104 rows being deleted.', 'The alteration of table columns to streamline data, leading to a more usable and standardized dataset.', 'The use of various SQL functions and methods such as CTEs, row number, Windows function, and alter table to achieve data cleaning and standardization.']}], 'duration': 655.834, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/8rO7ztF4NtU/pics/8rO7ztF4NtU2615386.jpg', 'highlights': ['A query will be written to delete redundant data based on multiple matching fields, ensuring only unique records are retained.', "Identifying data discrepancies based on matching fields such as sale date, sale price, legal reference, and owner's information.", 'The use of various SQL functions and methods such as CTEs, row number, Windows function, and alter table to achieve data cleaning and standardization.', 'The chapter demonstrates data cleaning using SQL, including removing duplicates resulting in 104 rows being deleted.', 'The alteration of table columns to streamline data, leading to a more usable and standardized dataset.']}], 'highlights': ['The chapter provides step-by-step guidance on importing Nashville housing data into SQL Server, emphasizing the importance of this process for the data cleaning project.', 'The process of standardizing date format and populating property and owner addresses is detailed, with a focus on handling over 56,000 rows of data.', 'The tutorial covers the use of row number and partitioning by multiple columns to identify and remove duplicate rows in SQL.', 'A query will be written to delete redundant data based on multiple matching fields, ensuring only unique records are retained.', 'The chapter demonstrates data cleaning using SQL, including removing duplicates resulting in 104 rows being deleted.']}