title
I analyzed 2,050,302 jobs to solve THIS

description
📲 The Data Nerd App 👉 https://datanerd.tech
Courses for Data Nerds
==================================
📜 Google Data Analytics Certificate (START HERE) 👉🏼 https://lukeb.co/GoogleCert
🐍 Python for Everybody 👉🏼 https://lukeb.co/PythonForEverybody
💿 SQL for Data Science 👉🏼 https://lukeb.co/SQLdataScience
🧾 Excel Skills for Business 👉🏼 https://lukeb.co/ExcelBusinessAnalyst
📊 Data Visualization with Tableau 👉🏼 https://lukeb.co/Tableau_UCDavis
🏴‍☠️ Data Science: Foundations using R 👉🏼 https://lukeb.co/RforDataScienceJH
➕ Coursera Plus Subscription (7-day free trial) 👉🏼 https://lukeb.co/CourseraPlus
👨🏼‍🏫 All courses 👉🏼 https://kit.co/lukebarousse/data-analytics-courses
Books for Data Nerds
==================================
📚 Books I’ve read 👉🏼 https://kit.co/lukebarousse/book-recommendations
📗 Data Analyst Must Read 👉🏼 https://geni.us/StorytellingWithData
Tech for Data Nerds
==================================
⚙️ Tech I use 👉🏼 https://kit.co/lukebarousse/computer-accessories
🪟 Windows Virtual Machine for Mac (Parallels) 👉🏼 https://lukeb.co/ParallelsFreeTrial
Social Media / Contact Me
======================
👾 r/DataNerd 👉🏼 https://www.reddit.com/r/DataNerd/
🌄 Instagram: https://www.instagram.com/lukebarousse/
⏰ TikTok: https://www.tiktok.com/@lukebarousse
📘 Facebook: https://www.facebook.com/datavizbyluke
🙋🏼‍♂️ Newsletter: https://www.lukebarousse.com/
As an Amazon, Coursera, and Parallels Affiliate Programs member, I earn a commission from qualifying purchases on the links above. It costs you nothing but helps me with content creation.
#datanerd #dataanalyst #datascience
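The detail transcript describes how skills were found: each job description was scanned for a list of about 250 data science terms. The share of postings that mention each term was then reported (for example, about 17,000 of 22,000 data scientist postings list Python). The video runs this on a Spark cluster. The plain-Python sketch below only illustrates the matching and counting idea. The keyword set, the tokenizer regex, and the helpers `extract_skills` and `skill_share` are hypothetical, not the creator's code.

```python
import re
from collections import Counter

# Hypothetical skill list: the video mentions roughly 250 terms,
# which are not reproduced here.
SKILLS = {"sql", "python", "excel", "tableau", "r", "spark", "airflow", "bigquery"}

# Lowercase word-like tokens; '+' and '#' are kept so terms like "c++" or "c#" survive.
TOKEN = re.compile(r"[a-z0-9+#]+")


def extract_skills(description: str) -> set[str]:
    """Return the skills from SKILLS that appear as whole tokens in one posting.

    Single-token matching only: multi-word skills (e.g. "power bi")
    would need n-gram matching, which this sketch does not do.
    """
    tokens = TOKEN.findall(description.lower())
    return SKILLS.intersection(tokens)


def skill_share(descriptions: list[str]) -> dict[str, float]:
    """Fraction of postings that mention each skill at least once."""
    if not descriptions:
        return {}
    counts = Counter()
    for text in descriptions:
        # extract_skills returns a set, so a posting counts each skill once
        counts.update(extract_skills(text))
    return {skill: n / len(descriptions) for skill, n in counts.most_common()}


postings = [
    "Strong SQL and Python required; Tableau a plus.",
    "Python, R, and Spark experience preferred.",
    "Excel and SQL reporting.",
    "Build pipelines in Airflow with Python and BigQuery.",
]
print(skill_share(postings))
```

Matching whole tokens keeps `r` from firing inside words like "rest". It still misfires on text such as "R&D", which is one reason plain keyword matching is fragile. The video itself expects to move to a machine learning approach later.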

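The transcript also notes that posted salaries come in mixed forms: yearly or hourly, sometimes a range, sometimes a single number. The summary switches from averages to medians because of wide salary ranges. Below is a minimal Python sketch of that cleanup, assuming simple strings like "$40-$50 an hour". A range collapses to its midpoint, and hourly pay is annualized at 2,080 hours (40 hours a week for 52 weeks). The string format, the regex, the 2,080-hour factor, and the helper `annual_salary` are all assumptions for illustration, not the method used in the video.

```python
import re
from statistics import mean, median
from typing import Optional

HOURS_PER_YEAR = 2080  # assumption: 40 hours/week x 52 weeks

# A number such as "45", "25.50", or "120,000", optionally followed by a "K" suffix
NUM = re.compile(r"(\d[\d,]*(?:\.\d+)?)\s*([kK])?")


def annual_salary(text: str) -> Optional[float]:
    """Convert a hypothetical salary string into a single annual figure.

    A range ("$85K-$95K") collapses to its midpoint, and hourly pay
    (text containing "hour") is multiplied by HOURS_PER_YEAR.
    Returns None when the text contains no number.
    """
    values = [
        float(digits.replace(",", "")) * (1000 if k else 1)
        for digits, k in NUM.findall(text)
    ]
    if not values:
        return None
    low_high = values[:2]  # only the first two numbers are treated as a range
    midpoint = sum(low_high) / len(low_high)
    return midpoint * HOURS_PER_YEAR if "hour" in text.lower() else midpoint


postings = [
    "$90,000 a year",
    "$85K-$95K a year",
    "$40-$50 an hour",
    "$100,000 - $120,000 a year",
    "$400,000 a year",  # one unusually high posting
]
salaries = [s for s in map(annual_salary, postings) if s is not None]
print(f"mean:   {mean(salaries):,.0f}")    # mean:   156,720
print(f"median: {median(salaries):,.0f}")  # median: 93,600
```

With one unusually high posting, the mean (156,720) lands above four of the five salaries. The median (93,600) stays with the typical posting, matching the summary's rationale for reporting medians.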
detail
{'title': 'I analyzed 2,050,302 jobs to solve THIS', 'heatmap': [{'end': 112.519, 'start': 94.283, 'weight': 1}, {'end': 506.935, 'start': 483.381, 'weight': 0.915}, {'end': 838.554, 'start': 782.887, 'weight': 0.777}, {'end': 865.456, 'start': 850.784, 'weight': 0.815}], 'summary': 'Analyzing 2,050,302 jobs, the video uncovers the misalignment between recommended data analyst skills and job postings, emphasizes the need for an updated dataset beyond the US, explores data engineering trends, and discusses the use of NLP and Apache Spark for data analysis.', 'chapters': [{'end': 241.871, 'segs': [{'end': 55.512, 'src': 'embed', 'start': 0.429, 'weight': 0, 'content': [{'end': 4.072, 'text': 'Data nerds, I found a pretty big problem in the data science industry.', 'start': 0.429, 'duration': 3.643}, {'end': 5.793, 'text': 'And well, let me show you.', 'start': 4.312, 'duration': 1.481}, {'end': 12.738, 'text': 'In my last video, I built an app that analyzed data analyst job posts in the United States for top skills and salary.', 'start': 6.433, 'duration': 6.305}, {'end': 18.362, 'text': 'With the app, data analysts can focus on learning top skills like SQL and Excel, since these are the most common in job postings.', 'start': 12.758, 'duration': 5.604}, {'end': 25.166, 'text': 'So after building this, I was curious: how do these skills stack up to what the internet is suggesting? 
And well, I was in for an awakening.', 'start': 18.582, 'duration': 6.584}, {'end': 30.129, 'text': "Some sites were recommending outdated skills that weren't even close to being in my top 10.", 'start': 25.326, 'duration': 4.803}, {'end': 34.592, 'text': 'Others were suggesting a skill was number one, while conveniently also selling you this skill.', 'start': 30.129, 'duration': 4.463}, {'end': 39.775, 'text': 'Those with access to the most valuable insights provided skills that could be applied to any job.', 'start': 34.912, 'duration': 4.863}, {'end': 44.24, 'text': 'And although a lot of the sites did have skills that matched up with the job posting data,', 'start': 39.955, 'duration': 4.285}, {'end': 48.324, 'text': 'none of these sites had any sort of data to back up their claims.', 'start': 44.24, 'duration': 4.084}, {'end': 48.965, 'text': 'Hold up.', 'start': 48.624, 'duration': 0.341}, {'end': 49.906, 'text': 'Stop the music.', 'start': 49.305, 'duration': 0.601}, {'end': 55.512, 'text': 'How can a site recommend a top skill to a data analyst without providing any evidence for that claim?', 'start': 50.146, 'duration': 5.366}], 'summary': 'Analyzing top skills for data analysts in the US reveals discrepancies in skill recommendations and a lack of evidence to support them.', 'duration': 55.083, 'max_score': 0.429, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/7G_Kz5MOqps/pics/7G_Kz5MOqps429.jpg'}, {'end': 118.744, 'src': 'heatmap', 'start': 90.379, 'weight': 3, 'content': [{'end': 94.243, 'text': "that's a survey done by the popular developer site Stack Overflow.", 'start': 90.379, 'duration': 3.864}, {'end': 101.009, 'text': "Now, if you're not familiar with Stack Overflow, it's a site that's primarily used to get you help with popular tools like Python, SQL, and, surprisingly,", 'start': 94.283, 'duration': 6.726}, {'end': 101.489, 'text': 'even Excel.', 'start': 101.009, 'duration': 0.48}, {'end': 107.815, 'text': 'And 
they poll their users annually to find the most popular skills like programming, SQL databases, and cloud platforms.', 'start': 101.529, 'duration': 6.286}, {'end': 112.519, 'text': 'along with going as far as telling you salaries for top jobs in the developer industry.', 'start': 107.975, 'duration': 4.544}, {'end': 118.744, 'text': 'And this survey is extremely valuable for aspiring developers who want to identify the skills they need to know.', 'start': 112.719, 'duration': 6.025}], 'summary': 'Stack Overflow conducts an annual survey on popular skills and salaries for developers, including programming, SQL, and cloud platforms.', 'duration': 28.365, 'max_score': 90.379, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/7G_Kz5MOqps/pics/7G_Kz5MOqps90379.jpg'}, {'end': 151.047, 'src': 'embed', 'start': 125.99, 'weight': 4, 'content': [{'end': 133.757, 'text': "But as great as the survey is for developers, it's not so good for data nerds, who make up less than 15% of its respondents.", 'start': 125.99, 'duration': 7.767}, {'end': 139.3, 'text': "And because of this low percentage, it's hard for data nerds to work out which skills they should be learning.", 'start': 133.857, 'duration': 5.443}, {'end': 142.142, 'text': 'So what about that previous app that I built for my subscribers?', 'start': 139.38, 'duration': 2.762}, {'end': 146.964, 'text': 'Well, the first major problem is that this data is only for data analysts in the United States.', 'start': 142.342, 'duration': 4.622}, {'end': 151.047, 'text': "And my subscribers aren't just data analysts, and they are from around the world.", 'start': 147.425, 'duration': 3.622}], 'summary': 'Data nerds make up less than 15% of Stack Overflow survey respondents, which limits its value for them, and the earlier app covered only US data analysts.', 'duration': 25.057, 'max_score': 125.99, 'thumbnail': ''}, {'end': 205.752, 'src': 'embed', 'start': 179.006, 'weight': 5, 'content': [{'end': 184.628, 'text': "So let's get into 
solving that first problem of collecting data beyond just data analysts in the United States.", 'start': 179.006, 'duration': 5.622}, {'end': 187.508, 'text': "And we're still going to follow a similar approach to what I did before,", 'start': 184.668, 'duration': 2.84}, {'end': 193.609, 'text': 'which is using Python to connect to an API in order to extract this data into a database,', 'start': 187.508, 'duration': 6.101}, {'end': 196.85, 'text': 'and specifically using a service called SERP API to handle this.', 'start': 193.609, 'duration': 3.241}, {'end': 200.651, 'text': "Typically, if you try to scrape this data directly from a website, the site doesn't like it.", 'start': 196.97, 'duration': 3.681}, {'end': 205.752, 'text': "They're going to use methods to block you, such as those CAPTCHAs that even humans can't solve sometimes.", 'start': 201.071, 'duration': 4.681}], 'summary': 'Using Python and SERP API to collect job data while avoiding the blocking that direct website scraping runs into.', 'duration': 26.746, 'max_score': 179.006, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/7G_Kz5MOqps/pics/7G_Kz5MOqps179006.jpg'}], 'start': 0.429, 'title': 'Data science industry skills', 'summary': 'Reveals the discrepancy between recommended data analyst skills and actual job postings, emphasizing the potential harm caused by outdated recommendations. 
It also discusses the annual developer survey by Stack Overflow, highlighting popular skills and the need for a new solution to collect a larger dataset beyond the United States.', 'chapters': [{'end': 90.379, 'start': 0.429, 'title': 'Data science industry skill discrepancy', 'summary': 'Reveals the discrepancy between recommended data analyst skills and actual job postings, highlighting the lack of evidence and potential harm caused by outdated recommendations.', 'duration': 89.95, 'highlights': ['The app analyzed data analyst job posts in the United States for top skills and found that SQL and Excel are the most common skills in job postings.', "Some sites were recommending outdated skills that weren't even close to being in the top 10, potentially leading learners to spend time on unnecessary tools.", 'The lack of evidence to support recommended skills is highlighted, questioning how a site can recommend a top skill to a data analyst without providing any data to back up the claim.']}, {'end': 241.871, 'start': 90.379, 'title': 'Developer survey and data analysis', 'summary': "Discusses the annual developer survey by Stack Overflow, highlighting the most popular skills like programming, SQL databases, and cloud platforms, and emphasizes the importance of the survey for aspiring developers. However, the survey's limited representation of data nerds poses challenges, prompting the need for a new solution involving Python and SERP API to collect a larger dataset beyond the United States.", 'duration': 151.492, 'highlights': ['The annual developer survey by Stack Overflow identifies the most popular skills like programming, SQL databases, and cloud platforms, and is valuable for aspiring developers. 
The survey conducted by Stack Overflow polls its users annually to find the most popular skills like programming, SQL databases, and cloud platforms.', "The survey's limited representation of data nerds, who make up less than 15% of the respondents, makes it hard for them to tell which skills they should be learning, prompting the need for a new solution.", 'The need to collect data beyond just data analysts in the United States prompts the use of Python to connect to an API, specifically a service called SERP API, to extract a larger dataset.']}], 'duration': 241.442, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/7G_Kz5MOqps/pics/7G_Kz5MOqps429.jpg', 'highlights': ['SQL and Excel are the most common skills in data analyst job postings in the United States.', 'Outdated skill recommendations can lead learners to spend time on unnecessary tools.', 'The lack of evidence to support recommended skills calls the credibility of the recommendations into question.', 'The annual developer survey by Stack Overflow identifies popular skills like programming, SQL databases, and cloud platforms.', "The survey's limited representation of data nerds, who make up less than 15% of the respondents, makes it hard for them to tell which skills to learn.", 'The need to collect data beyond just data analysts in the United States prompts the use of Python and SERP API to extract a larger dataset.']}, {'end': 511.836, 'segs': [{'end': 312.33, 'src': 'embed', 'start': 284.147, 'weight': 3, 'content': [{'end': 289.171, 'text': 'I would continue to use Python to call SERP API and get this data into our 
database.', 'start': 284.147, 'duration': 5.024}, {'end': 293.254, 'text': "For this, I'm using Google's popular cloud-based data warehouse, BigQuery.", 'start': 289.231, 'duration': 4.023}, {'end': 297.658, 'text': "Now these results from SERP API come back as JSON, which isn't very usable.", 'start': 293.314, 'duration': 4.344}, {'end': 300.26, 'text': 'So then I use SQL to unpack all of this data.', 'start': 297.718, 'duration': 2.542}, {'end': 305.264, 'text': 'Once we have this cleaned up in our data warehouse, we can then use our app to connect to this.', 'start': 300.8, 'duration': 4.464}, {'end': 312.33, 'text': "Now, to make sure we were collecting job data daily and also cleaning it up, I used a popular data pipeline scheduler called Airflow.", 'start': 305.584, 'duration': 6.746}], 'summary': 'Python calls SERP API, the JSON results land in BigQuery and are unpacked with SQL, and Airflow schedules daily job data collection.', 'duration': 28.183, 'max_score': 284.147, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/7G_Kz5MOqps/pics/7G_Kz5MOqps284147.jpg'}, {'end': 405.037, 'src': 'embed', 'start': 379.047, 'weight': 0, 'content': [{'end': 385.41, 'text': 'Querying it, we can see we have around 380,000 jobs, which may be different from what you see down here in the title.', 'start': 379.047, 'duration': 6.363}, {'end': 391.052, 'text': 'Future Luke is supposed to be automating this title update so that it matches the number of jobs in our database.', 'start': 385.45, 'duration': 5.602}, {'end': 393.473, 'text': "Next, let's look at where all these job postings are coming from.", 'start': 391.132, 'duration': 2.341}, {'end': 400.015, 'text': 'And it looks like an overwhelming majority are from LinkedIn, which also matches what the majority of my subscribers say they use.', 'start': 393.553, 'duration': 6.462}, {'end': 401.836, 'text': "So let's do some comparisons of these postings.", 'start': 400.115, 'duration': 1.721}, {'end': 405.037, 'text': 
"I'm curious to find out what the most in-demand job is right now.", 'start': 402.356, 'duration': 2.681}], 'summary': 'Around 380,000 jobs are in the database, the majority from LinkedIn, and the video title is meant to update automatically to match the job count.', 'duration': 25.99, 'max_score': 379.047, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/7G_Kz5MOqps/pics/7G_Kz5MOqps379047.jpg'}, {'end': 506.935, 'src': 'heatmap', 'start': 464.388, 'weight': 1, 'content': [{'end': 469.149, 'text': 'IT and data processing is one of the most common industries for career transitioners.', 'start': 464.388, 'duration': 4.761}, {'end': 474.173, 'text': 'Lucky for us, the data set includes information on whether a degree is mentioned in the job posting.', 'start': 469.349, 'duration': 4.824}, {'end': 481.499, 'text': 'And looking at it right here, we can see that one in three job postings for data engineers and analysts makes no mention of a degree.', 'start': 474.653, 'duration': 6.846}, {'end': 483.14, 'text': 'Which I think is a pretty high number.', 'start': 481.519, 'duration': 1.621}, {'end': 486.944, 'text': 'Unfortunately for data scientists, only a severely low 7% of postings made no mention of a degree.', 'start': 483.381, 'duration': 3.563}, {'end': 488.605, 'text': 'All right, speed round.', 'start': 486.944, 'duration': 1.661}, {'end': 490.066, 'text': 'Here are some other interesting insights.', 'start': 488.725, 'duration': 1.341}, {'end': 496.111, 'text': 'Less than 10% of job postings are flagged for remote work, with data analysts the lowest at around 6%.', 'start': 490.146, 'duration': 5.965}, {'end': 498.453, 'text': 'For job locations, we have an assortment from around the world,', 'start': 496.111, 'duration': 2.342}, {'end': 503.154, 'text': 'with Anywhere being the highest, which correlates with all those remote work jobs that we previously found.', 'start': 498.653, 'duration': 4.501}, {'end': 506.935, 'text': "Finally, for the different types of jobs offered, it seems 
like it's not even close,", 'start': 503.194, 'duration': 3.741}], 'summary': "Around 33% of data engineering and analyst job postings don't mention a degree, while only 7% of data scientist postings omit one. Less than 10% of job postings are flagged for remote work, with data analysts the lowest at around 6%.", 'duration': 34.065, 'max_score': 464.388, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/7G_Kz5MOqps/pics/7G_Kz5MOqps464388.jpg'}], 'start': 242.111, 'title': 'Data engineering and job trends', 'summary': 'Covers building a data pipeline to collect and clean job data from various locations using Python, SERP API, BigQuery, and Airflow, resulting in a collection of around 380,000 job postings. It also explores the current trends in data job postings, revealing the shift in demand from data scientists to data engineers, how often postings mention a degree, and insights on remote work and job types.', 'chapters': [{'end': 393.473, 'start': 242.111, 'title': 'Data engineering for job data pipeline', 'summary': 'Covers building a data pipeline to collect and clean job data from various locations using Python, SERP API, BigQuery, and Airflow, resulting in a collection of around 380,000 job postings.', 'duration': 151.362, 'highlights': ['Building a data pipeline to collect and clean job data: the project uses Python, SERP API, BigQuery, and Airflow to collect and clean job data from various locations.', 'Collecting around 380,000 job postings: the pipeline gathers approximately 380,000 job postings from different search locations and job titles.', 'Using Python, SERP API, BigQuery, and Airflow for the pipeline: Python for scripting, SERP API for data extraction, BigQuery for storage, and Airflow for scheduling.']}, {'end': 511.836, 'start': 393.553, 'title': 'Data job trends and insights', 
'summary': 'Explores the current trends in data job postings, revealing the shift in demand from data scientists to data engineers, how often postings mention a degree, and insights on remote work and job types.', 'duration': 118.283, 'highlights': ['Data engineers make up an overwhelming share of job postings, surpassing data scientists, and approximately 1 in 3 data engineer and data analyst postings do not mention a degree.', 'Remote work is relatively rare in data job postings: less than 10% are flagged for remote work, with data analysts the lowest at around 6%, indicating limited remote opportunities in this field.', 'Most job opportunities in the data industry are full-time positions, indicating a predominant trend towards traditional employment in this field.']}], 'duration': 269.725, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/7G_Kz5MOqps/pics/7G_Kz5MOqps242111.jpg', 'highlights': ['Collecting around 380,000 job postings from different search locations and job titles.', 'Data engineers make up an overwhelming share of job postings, indicating high demand for the role, and approximately 1 in 3 data engineer and data analyst postings do not mention a degree.', 'Most job opportunities in the data industry are full-time positions, indicating a predominant trend towards traditional employment in this field.', 'Building a data pipeline using Python, SERP API, BigQuery, and Airflow to collect and clean job data from various locations.', 'Python, SERP API, BigQuery, and Airflow are used to collect and clean job data, with Python for scripting, SERP API for data extraction, BigQuery for storage, and Airflow for scheduling.', 'Remote work is relatively rare in data job postings: less than 10% are flagged for remote work, with data analysts the lowest at around 6%, indicating limited remote opportunities in this field.']}, {'end': 917.778, 'segs': [{'end': 555.665, 'src': 'embed', 'start': 525.962, 'weight': 0, 'content': [{'end': 529.925, 'text': "Sometimes it's yearly, other times it's hourly, sometimes it's a range, sometimes it's not.", 'start': 525.962, 'duration': 3.963}, {'end': 532.487, 'text': "As for the skills, they're buried deep in the job descriptions.", 'start': 530.005, 'duration': 2.482}, {'end': 535.029, 'text': 'So we need to develop a way to extract these keywords.', 'start': 532.787, 'duration': 2.242}, {'end': 540.694, 'text': 'Fixing both of these issues falls under NLP, or natural language processing.', 'start': 535.229, 'duration': 5.465}, 
{'end': 543.58, 'text': 'Wrong way, natural language processing.', 'start': 541.479, 'duration': 2.101}, {'end': 544.56, 'text': 'I never know which way to go.', 'start': 543.7, 'duration': 0.86}, {'end': 547.321, 'text': "Simply put, it's a way for computers to process human language.", 'start': 544.64, 'duration': 2.681}, {'end': 550.143, 'text': "We've come a long way with NLP, as ChatGPT shows.", 'start': 547.341, 'duration': 2.802}, {'end': 555.665, 'text': 'So what tools should we use for this processing? Well, SQL is a little too structured for what we need.', 'start': 550.223, 'duration': 5.442}], 'summary': 'Salary formats vary and skills are buried in job descriptions, so an NLP approach is needed to extract keywords, using tools other than SQL.', 'duration': 29.703, 'max_score': 525.962, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/7G_Kz5MOqps/pics/7G_Kz5MOqps525962.jpg'}, {'end': 589.821, 'src': 'embed', 'start': 564.509, 'weight': 4, 'content': [{'end': 570.592, 'text': "What happens if we use multiple computers? 
It turns out there's a tool specifically designed for this called Apache Spark.", 'start': 564.509, 'duration': 6.083}, {'end': 574.934, 'text': 'Now, when a pack of wild computers band together, they form a Spark cluster.', 'start': 570.652, 'duration': 4.282}, {'end': 576.374, 'text': 'Look how cute this little guy is.', 'start': 574.974, 'duration': 1.4}, {'end': 579.036, 'text': 'And these guys are great at fighting against big data.', 'start': 576.414, 'duration': 2.622}, {'end': 581.977, 'text': 'Now, how and where do you even run these Spark clusters?', 'start': 579.436, 'duration': 2.541}, {'end': 587.52, 'text': "Well, most cloud providers offer this, and they're more than happy to take your money for the service.", 'start': 582.057, 'duration': 5.463}, {'end': 589.821, 'text': 'Now, the framework for Apache Spark is written in Scala.', 'start': 587.56, 'duration': 2.261}], 'summary': 'Apache Spark lets a cluster of computers work together to process big data and is offered by most cloud providers.', 'duration': 25.312, 'max_score': 564.509, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/7G_Kz5MOqps/pics/7G_Kz5MOqps564509.jpg'}, {'end': 681.606, 'src': 'embed', 'start': 647.74, 'weight': 1, 'content': [{'end': 654.366, 'text': "but in the future we're probably going to have to develop some sort of machine learning algorithm to extract these keywords more reliably.", 'start': 647.74, 'duration': 6.626}, {'end': 656.248, 'text': "So now we're finally done with the cleaning.", 'start': 654.406, 'duration': 1.842}, {'end': 662.754, 'text': 'We use a Spark cluster not only to generate all that clean salary data, but also to extract all those skills for each job posting.', 'start': 656.268, 'duration': 6.486}, {'end': 663.074, 'text': 'All right.', 'start': 662.834, 'duration': 0.24}, {'end': 667.097, 'text': "So now let's jump into exploring those newly cleaned salary and skill columns.", 'start': 663.154, 'duration': 3.943}, 
{'end': 670.519, 'text': "First, let's look at the average salary by the different job search terms.", 'start': 667.177, 'duration': 3.342}, {'end': 673.261, 'text': 'Now I have a bad feeling about using an average for this.', 'start': 670.599, 'duration': 2.662}, {'end': 680.285, 'text': 'You see, states like New York and California have recently passed laws that require salaries to be listed on job postings.', 'start': 673.381, 'duration': 6.904}, {'end': 681.606, 'text': "There's just one big problem.", 'start': 680.365, 'duration': 1.241}], 'summary': 'A Spark cluster produced clean salary data and extracted skills from job postings; exploring the salaries then raises a problem tied to new salary-disclosure laws in states like New York and California.', 'duration': 33.866, 'max_score': 647.74, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/7G_Kz5MOqps/pics/7G_Kz5MOqps647740.jpg'}, {'end': 789.608, 'src': 'embed', 'start': 760.595, 'weight': 2, 'content': [{'end': 764.899, 'text': 'But instead of using Python on a single computer to aggregate all this data on the front end,', 'start': 760.595, 'duration': 4.304}, {'end': 770.423, 'text': "we're now using the power of multiple computers to use SQL and PySpark to aggregate this data on the back end.", 'start': 764.899, 'duration': 5.524}, {'end': 771.363, 'text': 'Basically, that long,', 'start': 770.523, 'duration': 0.84}, {'end': 777.685, 'text': 'extravagant storyline of how we cleaned up the data was all to make it easily accessible through this app,', 'start': 771.363, 'duration': 6.322}, {'end': 780.186, 'text': 'which you can access at datanerd.tech.', 'start': 777.685, 'duration': 2.501}, {'end': 782.826, 'text': 'You can access it via your phone or even a web browser.', 'start': 780.586, 'duration': 2.24}, {'end': 789.608, 'text': "So how should this be used? 
Well, let's say you're an aspiring data nerd and curious about which skills you should focus on first to land a job.", 'start': 782.887, 'duration': 6.721}], 'summary': 'Data is aggregated on the back end with SQL and PySpark across multiple computers and served through the app at datanerd.tech.', 'duration': 29.013, 'max_score': 760.595, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/7G_Kz5MOqps/pics/7G_Kz5MOqps760595.jpg'}, {'end': 838.554, 'src': 'heatmap', 'start': 782.887, 'weight': 0.777, 'content': [{'end': 789.608, 'text': "So how should this be used? Well, let's say you're an aspiring data nerd and curious about which skills you should focus on first to land a job.", 'start': 782.887, 'duration': 6.721}, {'end': 795.21, 'text': 'With the app, you can get real-time insights into which top skills are being requested in job postings today.', 'start': 789.668, 'duration': 5.542}, {'end': 796.43, 'text': 'But this is for all data nerds.', 'start': 795.27, 'duration': 1.16}, {'end': 799.012, 'text': 'We can actually filter down further by job title.', 'start': 796.551, 'duration': 2.461}, {'end': 800.672, 'text': "Let's look at data engineers first.", 'start': 799.132, 'duration': 1.54}, {'end': 805.114, 'text': 'For them, we can see SQL and Python are most important, along with cloud technologies.', 'start': 800.732, 'duration': 4.382}, {'end': 811.376, 'text': 'Looking at data scientists next, we can see that Python and SQL are still important, but so are other tools such as R and Tableau.', 'start': 805.234, 'duration': 6.142}, {'end': 812.897, 'text': 'Oh yeah, about this percentage.', 'start': 811.536, 'duration': 1.361}, {'end': 819.2, 'text': 'So out of 22,000 job postings for data scientists, 17,000 list Python as a skill.', 'start': 812.977, 'duration': 6.223}, {'end': 822.881, 'text': 'Or basically, three out of four data scientist jobs are requesting this.', 'start': 819.34, 'duration': 3.541}, {'end': 829.366, 'text': 'Finally, with 
data analysts, we can see that SQL is clearly the most important, followed by spreadsheets, programming languages, and visualization tools.', 'start': 823.021, 'duration': 6.345}, {'end': 834.29, 'text': 'We can even filter this further to see how things like languages or even cloud technologies compare.', 'start': 829.446, 'duration': 4.844}, {'end': 838.554, 'text': "Now remember, we do have all of that salary data, and because it's linked to these skills,", 'start': 834.39, 'duration': 4.164}], 'summary': 'The app gives real-time insights on top skills for data jobs; about 17,000 of 22,000 data scientist postings list Python.', 'duration': 55.667, 'max_score': 782.887, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/7G_Kz5MOqps/pics/7G_Kz5MOqps782887.jpg'}, {'end': 878.666, 'src': 'heatmap', 'start': 850.784, 'weight': 0.815, 'content': [{'end': 854.046, 'text': 'Looking at languages, SQL and Python, which we use, pay around $90,000.', 'start': 850.784, 'duration': 3.262}, {'end': 860.691, 'text': 'For cloud technologies, we use Google Cloud and specifically BigQuery, which pays around $111,000.', 'start': 854.047, 'duration': 6.644}, {'end': 865.456, 'text': 'For libraries, we use both Airflow and Spark, which have some of the highest and also lowest salaries in this category.', 'start': 860.692, 'duration': 4.764}, {'end': 870.56, 'text': 'So based on the skills that I know and use, I feel I can get a better representation of what my potential salary could be.', 'start': 865.496, 'duration': 5.064}, {'end': 876.084, 'text': "Now, I obviously didn't leave out a salary comparator between all the different job titles, so that's available as well.", 'start': 870.7, 'duration': 5.384}, {'end': 878.666, 'text': 'And you can see both the annual and hourly rates.', 'start': 876.124, 'duration': 2.542}], 'summary': 'Salaries by skill for the tools the creator uses: SQL and Python around $90,000, Google Cloud BigQuery around $111,000, with Airflow and Spark among both the highest and lowest.', 'duration': 27.882, 'max_score': 
850.784, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/7G_Kz5MOqps/pics/7G_Kz5MOqps850784.jpg'}], 'start': 511.836, 'title': 'Data analysis and nlp', 'summary': 'Discusses challenges in extracting salary and skills data, introduces nlp as a solution, compares limitations of sql and python, and explores apache spark for data analysis, including setting up a spark cluster, extracting and analyzing job skills and salaries, and providing real-time insights for data engineers, scientists, and analysts.', 'chapters': [{'end': 564.269, 'start': 511.836, 'title': 'Data nerds: salary, skills, and nlp', 'summary': 'Discusses the challenges of extracting salary and skills data, proposes nlp as the solution, and compares the limitations of sql and python for data processing.', 'duration': 52.433, 'highlights': ['NLP is proposed as a solution for extracting keywords from job descriptions, addressing the challenge of buried skills data.', 'Challenges with salary data include varied formats such as yearly, hourly, and ranges, creating complexity in analysis.', "Comparison of processing tools reveals Python's limitations, taking nearly an hour to generate a visualization on a single computer."]}, {'end': 917.778, 'start': 564.509, 'title': 'Using apache spark for data analysis', 'summary': 'Discusses the use of apache spark for data analysis, including setting up a spark cluster, extracting salary and skills data, and building a web app to provide real-time insights into job skills and salaries, with specific insights for data engineers, data scientists, and data analysts.', 'duration': 353.269, 'highlights': ['Apache Spark is used to form a Spark cluster for fighting against big data, and can be run on most cloud providers. Apache Spark is a tool designed for using multiple computers to form a Spark cluster, which is great for fighting against big data. 
It can be run on most cloud providers.', 'PySpark, an API for Apache Spark, is used to run the Spark cluster and interact with the framework using Python. It is also used for the daily import of new job entries and for data cleaning.', 'A Spark cluster is used to generate clean salary data and to extract skills from job postings, based on a list of 250 specific words related to data science.', 'Median salary values are used instead of averages because job postings list wide salary ranges. Specific examples of high and low salaries for different job titles are provided.', 'A web app is built using Python and PySpark to provide real-time insights into job skills and salaries, with specific insights for data engineers, data scientists, and data analysts. 
']}], 'duration': 405.942, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/7G_Kz5MOqps/pics/7G_Kz5MOqps511836.jpg', 'highlights': ['NLP is proposed as a solution for extracting keywords from job descriptions, addressing the challenge of buried skills data.', 'A Spark cluster is used to generate clean salary data and to extract skills from job postings, based on a list of 250 specific words related to data science.', 'A web app is built using Python and PySpark to provide real-time insights into job skills and salaries, with specific insights for data engineers, data scientists, and data analysts.', 'Median salary values are used instead of averages because job postings list wide salary ranges, with specific examples of high and low salaries for different job titles provided.', 'Apache Spark is used to form a Spark cluster for handling big data, and it can run on most cloud providers.']}], 'highlights': ['The lack of evidence supporting the recommended skills calls the credibility of the recommendations into question.', 'Outdated skill recommendations can harm people by leading them to learn unnecessary tools.', 'SQL and Excel are the most common skills in data analyst job postings in the United States.', 'The annual developer survey by Stack Overflow identifies popular skills like programming, SQL databases, and cloud platforms.', "The survey's limited representation of data analysts, who make up less than 15% of respondents, makes it difficult for data nerds to extract valuable skills data from it.", 'The need to collect data beyond just data analysts in the United States prompts the use of Python and SERP API to extract a larger dataset.', 'Data engineers have an overwhelming majority in job postings, with approximately 1 in 3 job postings not 
mentioning a degree requirement, indicating a high demand for this role.', 'Most job opportunities in the data industry are full-time positions, indicating a predominant trend toward traditional employment in this field.', 'A data pipeline is built using Python, SERP API, BigQuery, and Airflow to collect and clean job data from various locations.', 'Python, SERP API, BigQuery, and Airflow are used to collect and clean job data, with Python for scripting, SERP API for data extraction, BigQuery for storage, and Airflow for scheduling.', 'NLP is proposed as a solution for extracting keywords from job descriptions, addressing the challenge of buried skills data.', 'A Spark cluster is used to generate clean salary data and to extract skills from job postings, based on a list of 250 specific words related to data science.', 'A web app is built using Python and PySpark to provide real-time insights into job skills and salaries, with specific insights for data engineers, data scientists, and data analysts.', 'Median salary values are used instead of averages because job postings list wide salary ranges, with specific examples of high and low salaries for different job titles provided.', 'Apache Spark is used to form a Spark cluster for handling big data, and it can run on most cloud providers.']}
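The highlights above describe three processing steps: matching job descriptions against a fixed list of skill keywords, normalizing salaries posted as yearly, hourly, or range values, and summarizing pay with medians instead of averages. Below is a minimal plain-Python sketch of those steps under stated assumptions; it is not the author's PySpark code. The tiny keyword tuple (standing in for the 250-word list), the salary-parsing rules in `annual_salary`, the 2,080-hour work year, and using the midpoint of a posted range are all illustrative assumptions.

```python
from __future__ import annotations

import re
from statistics import median

# Hypothetical stand-in for the 250-word skill list; single-word keywords only.
SKILL_KEYWORDS = ("sql", "python", "excel", "tableau", "spark", "airflow", "bigquery")

# Assumption: 40 hours/week x 52 weeks, used to annualize hourly pay.
HOURS_PER_YEAR = 2080


def extract_skills(description: str, keywords: tuple[str, ...] = SKILL_KEYWORDS) -> set[str]:
    """Return the keywords that appear as whole tokens in a job description."""
    tokens = set(re.findall(r"[a-z0-9+#]+", description.lower()))
    return {kw for kw in keywords if kw in tokens}


def annual_salary(salary_text: str) -> float | None:
    """Convert a salary string such as '$90,000 a year', '45 an hour' or
    '120K-150K a year' into an annual figure.

    Assumes the string holds only the pay figure(s): a range becomes its
    midpoint, a 'k' suffix means thousands, and hourly pay is annualized.
    Returns None when no number is found.
    """
    text = salary_text.lower().replace(",", "")
    matches = re.findall(r"(\d+(?:\.\d+)?)(k?)\b", text)
    # Use at most the first two numbers (the low and high end of a range).
    values = [float(num) * (1000 if suffix else 1) for num, suffix in matches[:2]]
    if not values:
        return None
    salary = sum(values) / len(values)  # midpoint of a range, or the single value
    if "hour" in text:
        salary *= HOURS_PER_YEAR
    return salary


def median_salary_by_skill(postings: list[tuple[str, str]]) -> dict[str, float]:
    """Group annualized salaries by skill and take the median of each group."""
    salaries_by_skill: dict[str, list[float]] = {}
    for description, salary_text in postings:
        salary = annual_salary(salary_text)
        if salary is None:
            continue  # skip postings without a usable salary
        for skill in extract_skills(description):
            salaries_by_skill.setdefault(skill, []).append(salary)
    # Median rather than mean, so a few very wide or extreme values skew the result less.
    return {skill: median(values) for skill, values in salaries_by_skill.items()}


if __name__ == "__main__":
    sample_postings = [
        ("Need SQL and Python; Airflow a plus", "$90,000 a year"),
        ("SQL, Excel, Tableau dashboards", "45 an hour"),
        ("Python and Spark on BigQuery", "120K-150K a year"),
    ]
    # Prints one line per skill, sorted alphabetically.
    for skill, value in sorted(median_salary_by_skill(sample_postings).items()):
        print(skill, value)
```

The single-token lookup keeps the sketch short; multi-word skills such as "power bi" would need phrase matching instead. In the setup the video describes, this logic would run as a distributed Spark job over the full dataset rather than as a Python loop.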