title
How I use Python as a Data Analyst
description
Job Data App: https://datanerd.tech/
Python For Everybody: https://lukeb.co/PythonForEverybody
Python for Data Science: https://lukeb.co/PythonDataScience
Python App Resources:
==================================
EDA Python Notebook: https://www.kaggle.com/code/lukebarousse/eda-of-job-posting-data
Kaggle Dataset: https://www.kaggle.com/datasets/lukebarousse/data-analyst-job-postings-google-search
GitHub Repo: https://github.com/lukebarousse/Data_Analyst_Streamlit_App_V1
SerpAPI to BigQuery Python Code: https://gist.github.com/lukebarousse/ded1fc3dbde6e0050d45635140480aee
SerpAPI: https://serpapi.com/ (20% OFF if you tell them Luke sent you)
r/DataNerd Subreddit: https://www.reddit.com/r/DataNerd/
* NOTE: The Streamlit app I built in this video keeps crashing due to heavy traffic, which is a good thing! So I built a new one: https://datanerd.tech/
Courses for Data Nerds
==================================
Google Data Analytics Certificate (START HERE): https://lukeb.co/GoogleCert
Google Cloud Fundamentals: https://lukeb.co/GoogleCloud
SQL for Data Science: https://lukeb.co/SQLdataScience
Excel Skills for Business: https://lukeb.co/ExcelBusinessAnalyst
Python for Everybody: https://lukeb.co/PythonForEverybody
Data Visualization with Tableau: https://lukeb.co/Tableau_UCDavis
Data Science: Foundations using R: https://lukeb.co/RforDataScienceJH
Coursera Plus Subscription (7-day free trial): https://lukeb.co/CourseraPlus
All courses: https://kit.co/lukebarousse/data-analytics-courses
Build a Portfolio
==================================
Build portfolio here: http://hostinger.com/luke
Rebate Code: "LUKE"
My Portfolio: https://lukebarousse.tech/
Books for Data Nerds
==================================
Books I've read: https://kit.co/lukebarousse/book-recommendations
Data Analyst Must Read: https://geni.us/StorytellingWithData
Tableau: https://geni.us/tableau
Power BI: https://geni.us/powerbi
Python: https://geni.us/pythontricks
Tech for Data Nerds
==================================
Tech I use: https://kit.co/lukebarousse/computer-accessories
Windows on a Mac (Parallels VM): https://lukeb.co/ParallelsFreeTrial
M1 Macbook Air (Mac of choice): https://geni.us/M1macAir8GB
Dell XPS 13 (PC of choice): https://geni.us/DellNewXPS13
Asus Vivo Book (Lowest Cost PC): https://geni.us/AsusVivoBook15
Lenovo IdeaPad (Best Value PC): https://geni.us/LenovoIdeaPad15
Social Media / Contact Me
======================
r/DataNerd: https://www.reddit.com/r/DataNerd/
Instagram: https://www.instagram.com/lukebarousse/
TikTok: https://www.tiktok.com/@lukebarousse
Facebook: https://www.facebook.com/datavizbyluke
Newsletter: https://www.lukebarousse.com/
00:00 Intro
01:31 How to collect data
02:33 APIs
03:19 Getting the Goods
04:58 What is a Database?
07:32 Cleaning the Goods
08:56 EDA of the Goods
09:31 Salary Analysis
10:01 Keyword Analysis
11:55 App Build... for the Goods
13:14 I NEED YOUR HELP!!!
As a member of the Amazon, Coursera, Hostinger, and Parallels Affiliate Programs, I earn a commission from qualifying purchases on the links above. It costs you nothing but helps me with content creation.
#datanerd #dataanalyst #datascience
detail
{'title': 'How I use Python as a Data Analyst', 'heatmap': [{'end': 143.425, 'start': 132.943, 'weight': 0.735}, {'end': 245.892, 'start': 225.095, 'weight': 0.857}, {'end': 738.073, 'start': 725.31, 'weight': 0.851}], 'summary': "Showcases python's application in data analysis, covering poll result collection, automation for identifying top data analyst skills, data collection methods like web scraping and api access, transitioning to 10,000 jobs/day using bigquery, real-time data collection automation, insights on cost-saving, job analysis of 1300 data analyst jobs with python, and creating a dashboard using streamlit for open sourcing.", 'chapters': [{'end': 299.092, 'segs': [{'end': 29.108, 'src': 'embed', 'start': 0.029, 'weight': 0, 'content': [{'end': 3.731, 'text': "What up, data nerds? Let's see how I use this as a data analyst.", 'start': 0.029, 'duration': 3.702}, {'end': 6.633, 'text': 'Now, Python is a popular tool you can install for free.', 'start': 3.811, 'duration': 2.822}, {'end': 9.354, 'text': 'And here are some recent poll results from my YouTube channel.', 'start': 6.713, 'duration': 2.641}, {'end': 10.235, 'text': "Let's analyze them.", 'start': 9.394, 'duration': 0.841}, {'end': 12.496, 'text': 'First, I collect these results into this script.', 'start': 10.275, 'duration': 2.221}, {'end': 15.117, 'text': 'Next, I perform a simple calculation to analyze it.', 'start': 12.636, 'duration': 2.481}, {'end': 16.838, 'text': 'Then I plot these findings to share.', 'start': 15.217, 'duration': 1.621}, {'end': 22.201, 'text': 'In three simple lines of code, we just performed the three major steps of any data analytics project.', 'start': 16.958, 'duration': 5.243}, {'end': 26.245, 'text': 'And this multi-purpose programming language is nearly limitless in what it can do.', 'start': 22.321, 'duration': 3.924}, {'end': 29.108, 'text': 'From building bots to scrape the internet for more data,', 'start': 26.285, 'duration': 2.823}], 'summary': 'Using 
python, i analyzed recent poll results from my youtube channel and performed three major steps in data analytics in three simple lines of code.', 'duration': 29.079, 'max_score': 0.029, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/iNEwkaYmPqY/pics/iNEwkaYmPqY29.jpg'}, {'end': 62.278, 'src': 'embed', 'start': 35.656, 'weight': 1, 'content': [{'end': 42.123, 'text': 'Speaking of content, my subscribers have been relentless in asking about top skills they should be learning as a data analyst.', 'start': 35.656, 'duration': 6.467}, {'end': 49.328, 'text': "so, like any lazy data nerd, i'm going to automate this data analysis and build an app to tell them just this.", 'start': 42.643, 'duration': 6.685}, {'end': 51.15, 'text': 'first, we need real-time data.', 'start': 49.328, 'duration': 1.822}, {'end': 54.832, 'text': "whenever a subscriber accesses the app, it needs today's results.", 'start': 51.15, 'duration': 3.682}, {'end': 58.936, 'text': "next, we'll have to perform data analysis to find the top skills within this data.", 'start': 54.832, 'duration': 4.104}, {'end': 62.278, 'text': "and last, we'll share these results via an easy, accessible app.", 'start': 58.936, 'duration': 3.342}], 'summary': 'Automating data analysis to determine top skills for data analysts using real-time data and sharing results via an app.', 'duration': 26.622, 'max_score': 35.656, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/iNEwkaYmPqY/pics/iNEwkaYmPqY35656.jpg'}, {'end': 118.715, 'src': 'embed', 'start': 91.741, 'weight': 3, 'content': [{'end': 95.607, 'text': "All right, so let's get in the first phase of this project of collecting the data that we need.", 'start': 91.741, 'duration': 3.866}, {'end': 97.25, 'text': 'Now, there are a number of ways to get data.', 'start': 95.687, 'duration': 1.563}, {'end': 103.539, 'text': "If you work for a company, they'll typically house their data in databases, or if they're less 
advanced, they'll store them in Excel sheets.", 'start': 97.49, 'duration': 6.049}, {'end': 111.73, 'text': 'No! God, please, no! No! For both of these scenarios, Python can connect to these data sources and a lot more.', 'start': 104.1, 'duration': 7.63}, {'end': 118.715, 'text': "But unfortunately, I don't have any access to any company databases, so we need to search for publicly available data.", 'start': 112.51, 'duration': 6.205}], 'summary': 'Project phase: collecting data from company databases or excel sheets using python. need to search for publicly available data.', 'duration': 26.974, 'max_score': 91.741, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/iNEwkaYmPqY/pics/iNEwkaYmPqY91741.jpg'}, {'end': 163.795, 'src': 'heatmap', 'start': 132.943, 'weight': 0.735, 'content': [{'end': 136.024, 'text': 'With this data collected, I then analyzed it for skill requirements.', 'start': 132.943, 'duration': 3.081}, {'end': 139.104, 'text': 'Unfortunately, I ended up getting banned on LinkedIn from doing this.', 'start': 136.104, 'duration': 3}, {'end': 143.425, 'text': "So this wouldn't satisfy our rule number one of collecting real-time data.", 'start': 139.244, 'duration': 4.181}, {'end': 147.266, 'text': 'Now, I will admit, web scraping is great if you need something once or even twice.', 'start': 143.525, 'duration': 3.741}, {'end': 153.147, 'text': "But most likely, if you're not getting actively blocked by websites, your code will eventually break due to web pages changing.", 'start': 147.306, 'duration': 5.841}, {'end': 155.007, 'text': 'So where the heck are we going to get this data??', 'start': 153.187, 'duration': 1.82}, {'end': 163.795, 'text': "well, there's actually a more reliable source than web scraping, and that is apis or application programming interface.", 'start': 155.447, 'duration': 8.348}], 'summary': 'Linkedin ban hinders real-time data collection, suggests apis as reliable source.', 'duration': 30.852, 
'max_score': 132.943, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/iNEwkaYmPqY/pics/iNEwkaYmPqY132943.jpg'}, {'end': 174.505, 'src': 'embed', 'start': 143.525, 'weight': 4, 'content': [{'end': 147.266, 'text': 'Now, I will admit, web scraping is great if you need something once or even twice.', 'start': 143.525, 'duration': 3.741}, {'end': 153.147, 'text': "But most likely, if you're not getting actively blocked by websites, your code will eventually break due to web pages changing.", 'start': 147.306, 'duration': 5.841}, {'end': 155.007, 'text': 'So where the heck are we going to get this data??', 'start': 153.187, 'duration': 1.82}, {'end': 163.795, 'text': "well, there's actually a more reliable source than web scraping, and that is apis or application programming interface.", 'start': 155.447, 'duration': 8.348}, {'end': 167.819, 'text': 'wrong application programming interface.', 'start': 163.795, 'duration': 4.024}, {'end': 174.505, 'text': 'these apis allow you to send code from your computer, the client, to another computer called the server,', 'start': 167.819, 'duration': 6.686}], 'summary': 'Web scraping is less reliable than using apis for data retrieval.', 'duration': 30.98, 'max_score': 143.525, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/iNEwkaYmPqY/pics/iNEwkaYmPqY143525.jpg'}, {'end': 253.379, 'src': 'embed', 'start': 225.095, 'weight': 2, 'content': [{'end': 228.097, 'text': 'So we can use SERP API to get all these different job postings.', 'start': 225.095, 'duration': 3.002}, {'end': 228.958, 'text': 'So check this out.', 'start': 228.297, 'duration': 0.661}, {'end': 230.639, 'text': 'First, I can install SERP API.', 'start': 229.098, 'duration': 1.541}, {'end': 234.302, 'text': 'Then I can specify things like job title and search location.', 'start': 230.859, 'duration': 3.443}, {'end': 236.624, 'text': 'From there, I call the API with this information.', 'start': 234.382, 
'duration': 2.242}, {'end': 238.225, 'text': 'In less than a second, we have it.', 'start': 236.844, 'duration': 1.381}, {'end': 240.707, 'text': 'And from there, I can actually print out these results.', 'start': 238.365, 'duration': 2.342}, {'end': 245.892, 'text': 'This has all the information we need such as title, location, description, and salary.', 'start': 241.287, 'duration': 4.605}, {'end': 253.379, 'text': 'If we actually went back to that web page itself, we can see that this job right here is requesting things like SQL, Python and Power BI,', 'start': 246.072, 'duration': 7.307}], 'summary': 'Serp api retrieves job postings in less than a second, providing details on title, location, description, and required skills.', 'duration': 28.284, 'max_score': 225.095, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/iNEwkaYmPqY/pics/iNEwkaYmPqY225095.jpg'}, {'end': 253.379, 'src': 'heatmap', 'start': 225.095, 'weight': 0.857, 'content': [{'end': 228.097, 'text': 'So we can use SERP API to get all these different job postings.', 'start': 225.095, 'duration': 3.002}, {'end': 228.958, 'text': 'So check this out.', 'start': 228.297, 'duration': 0.661}, {'end': 230.639, 'text': 'First, I can install SERP API.', 'start': 229.098, 'duration': 1.541}, {'end': 234.302, 'text': 'Then I can specify things like job title and search location.', 'start': 230.859, 'duration': 3.443}, {'end': 236.624, 'text': 'From there, I call the API with this information.', 'start': 234.382, 'duration': 2.242}, {'end': 238.225, 'text': 'In less than a second, we have it.', 'start': 236.844, 'duration': 1.381}, {'end': 240.707, 'text': 'And from there, I can actually print out these results.', 'start': 238.365, 'duration': 2.342}, {'end': 245.892, 'text': 'This has all the information we need such as title, location, description, and salary.', 'start': 241.287, 'duration': 4.605}, {'end': 253.379, 'text': 'If we actually went back to that web page itself, we can 
see that this job right here is requesting things like SQL, Python and Power BI,', 'start': 246.072, 'duration': 7.307}], 'summary': 'Using serp api, we can retrieve job postings with details such as title, location, and required skills in less than a second.', 'duration': 28.284, 'max_score': 225.095, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/iNEwkaYmPqY/pics/iNEwkaYmPqY225095.jpg'}], 'start': 0.029, 'title': 'Python for data analysis and data collection in python', 'summary': 'Demonstrates the use of python for data analysis, including the collection and analysis of poll results from a youtube channel, automation of data analysis for identifying top skills for data analysts, and the use of python for various data-related tasks. it also discusses various methods for data collection in python, including web scraping, accessing apis, and using python to call the serp api for job postings, highlighting the advantages and challenges of each method and recommending further learning resources.', 'chapters': [{'end': 91.601, 'start': 0.029, 'title': 'Python for data analysis', 'summary': 'Demonstrates the use of python for data analysis, including the collection and analysis of poll results from a youtube channel, automation of data analysis for identifying top skills for data analysts, and the use of python for various data-related tasks.', 'duration': 91.572, 'highlights': ['Python is used to collect and analyze recent poll results from the YouTube channel, demonstrating the simplicity and versatility of the programming language.', 'The chapter highlights the automation of data analysis to identify top skills for data analysts, emphasizing the use of real-time data and sharing results via an accessible app.', 'The speaker emphasizes the accessibility and ease of sharing code using Python, and also mentions promoting Python skill improvement through recommended courses from Coursera.']}, {'end': 299.092, 'start': 91.741, 'title': 'Data 
collection and apis in python', 'summary': 'Discusses various methods for data collection in python, including web scraping, accessing apis, and using python to call the serp api for job postings, highlighting the advantages and challenges of each method and recommending further learning resources.', 'duration': 207.351, 'highlights': ['Python can connect to company databases or Excel sheets for data collection. Python can connect to company databases or Excel sheets for data collection, providing flexibility in data source connectivity.', 'Web scraping is unreliable for real-time data collection due to potential bans and website layout changes. Web scraping is unreliable for real-time data collection due to potential bans and website layout changes, emphasizing its limitations for continuous data retrieval.', 'Usage of APIs, such as the SERP API, for sustainable and reliable data collection. Usage of APIs, such as the SERP API, for sustainable and reliable data collection, highlighting the advantages of APIs over web scraping for consistent data retrieval.', 'Demonstration of using Python to call the SERP API for job postings, emphasizing the ease and speed of obtaining desired information. Demonstration of using Python to call the SERP API for job postings, emphasizing the ease and speed of obtaining desired information, showcasing the efficient usage of Python for API integration.', "Recommendation of 'Python for Everybody' specialization for beginners to gain comprehensive Python skills. 
Recommendation of 'Python for Everybody' specialization for beginners to gain comprehensive Python skills, suggesting a learning resource for foundational Python knowledge."]}], 'duration': 299.063, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/iNEwkaYmPqY/pics/iNEwkaYmPqY29.jpg', 'highlights': ['Python is used to collect and analyze recent poll results from the YouTube channel, demonstrating the simplicity and versatility of the programming language.', 'The chapter highlights the automation of data analysis to identify top skills for data analysts, emphasizing the use of real-time data and sharing results via an accessible app.', 'Demonstration of using Python to call the SERP API for job postings, emphasizing the ease and speed of obtaining desired information, showcasing the efficient usage of Python for API integration.', 'Python can connect to company databases or Excel sheets for data collection, providing flexibility in data source connectivity.', 'Usage of APIs, such as the SERP API, for sustainable and reliable data collection, highlighting the advantages of APIs over web scraping for consistent data retrieval.']}, {'end': 543.67, 'segs': [{'end': 327.787, 'src': 'embed', 'start': 299.232, 'weight': 0, 'content': [{'end': 305.735, 'text': "So right now, we're collecting around 100 jobs per day, but I'm planning to up-ramp this to around 10,000 jobs a day.", 'start': 299.232, 'duration': 6.503}, {'end': 309.137, 'text': "If I do the math for that, that's around 3 million jobs per year.", 'start': 305.855, 'duration': 3.282}, {'end': 314.86, 'text': 'Wow So storing this data in something like a CSV or Excel file is not an option.', 'start': 309.657, 'duration': 5.203}, {'end': 319.962, 'text': "So we need to store this data in a sustainable solution, something that's designed for large amounts of data.", 'start': 314.94, 'duration': 5.022}, {'end': 322.284, 'text': "So we're going to use a SQL database.", 'start': 320.243, 
'duration': 2.041}, {'end': 327.787, 'text': "Wow, Now, because I want this data to be accessible by everybody, we're going to use a cloud-based solution,", 'start': 322.664, 'duration': 5.123}], 'summary': 'Collecting 100 jobs per day, planning to scale to 10,000 jobs/day, totaling 3 million jobs per year. data to be stored in a sql database for accessibility via a cloud-based solution.', 'duration': 28.555, 'max_score': 299.232, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/iNEwkaYmPqY/pics/iNEwkaYmPqY299232.jpg'}, {'end': 393.56, 'src': 'embed', 'start': 357.135, 'weight': 1, 'content': [{'end': 358.536, 'text': 'Now, remember the first rule of this project.', 'start': 357.135, 'duration': 1.401}, {'end': 360.456, 'text': 'We need to have real-time data.', 'start': 358.636, 'duration': 1.82}, {'end': 367.521, 'text': 'Because of this, I need to execute this Python script daily in order to grab the jobs from SERP API and insert them into BigQuery.', 'start': 361.016, 'duration': 6.505}, {'end': 372.525, 'text': "Now, if we have to rely on me to run this Python code daily, we're going to be in big trouble, as I'm going to probably forget.", 'start': 367.581, 'duration': 4.944}, {'end': 378.73, 'text': 'Instead, I took all this Python code and put it into Google Cloud to be automated and run daily on its own.', 'start': 372.685, 'duration': 6.045}, {'end': 383.353, 'text': 'So now this data pipeline is fully automated and will continue to collect in the future.', 'start': 378.87, 'duration': 4.483}, {'end': 385.055, 'text': "But that's not even the best part.", 'start': 383.714, 'duration': 1.341}, {'end': 393.56, 'text': "I've taken it a step further to export this data from BigQuery into Kaggle, so that way all my subscribers have access to this real-time data,", 'start': 385.095, 'duration': 8.465}], 'summary': 'Automated python script collects job data from serp api, inserts into bigquery, then exports to kaggle for real-time 
access.', 'duration': 36.425, 'max_score': 357.135, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/iNEwkaYmPqY/pics/iNEwkaYmPqY357135.jpg'}, {'end': 477.198, 'src': 'embed', 'start': 431.796, 'weight': 3, 'content': [{'end': 435.359, 'text': 'So if this would have ran all year, it would have cost me over 3000 bucks.', 'start': 431.796, 'duration': 3.563}, {'end': 441.023, 'text': "And Google Cloud doesn't have any alerts or notifications to let you know when you're using more of their service.", 'start': 435.839, 'duration': 5.184}, {'end': 445.386, 'text': 'Luckily for my studies, I learned the basics of monitoring services and checking balances.', 'start': 441.163, 'duration': 4.223}, {'end': 448.248, 'text': 'So I was able to find this issue before escalating further.', 'start': 445.506, 'duration': 2.742}, {'end': 452.531, 'text': "Now my bill is back to normal and it's only costing about a few cents a day, which isn't that bad.", 'start': 448.428, 'duration': 4.103}, {'end': 455.995, 'text': "All right, so let's actually get into cleaning and analyzing the data.", 'start': 452.691, 'duration': 3.304}, {'end': 458.819, 'text': 'And this is where I spend the majority of my time.', 'start': 456.436, 'duration': 2.383}, {'end': 464.126, 'text': "When I first started as a data analyst, my expectation was that I'd spend the majority of my time analyzing data.", 'start': 458.879, 'duration': 5.247}, {'end': 467.11, 'text': 'In reality, I spend most of my time trying to clean up the data.', 'start': 464.226, 'duration': 2.884}, {'end': 468.051, 'text': 'What the..', 'start': 467.25, 'duration': 0.801}, {'end': 471.394, 'text': "Remember our goal, we're trying to find what are the top skills of a data analyst.", 'start': 468.351, 'duration': 3.043}, {'end': 477.198, 'text': 'Inspecting one of the job descriptions, we can see buried inside of it is a list of tools that are required for this job.', 'start': 471.514, 'duration': 5.684}], 
'summary': 'Avoided $3000 cost by monitoring google cloud; spends most time cleaning data.', 'duration': 45.402, 'max_score': 431.796, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/iNEwkaYmPqY/pics/iNEwkaYmPqY431796.jpg'}, {'end': 519.438, 'src': 'embed', 'start': 490.247, 'weight': 5, 'content': [{'end': 497.57, 'text': 'With these lists of keywords, I then created a loop to go through each one of the job descriptions and extract these keywords out of it.', 'start': 490.247, 'duration': 7.323}, {'end': 499.751, 'text': 'So now, looking at this updated data set,', 'start': 497.71, 'duration': 2.041}, {'end': 505.173, 'text': 'we can see that Python went through and pulled out those keywords that were in each of those job descriptions.', 'start': 499.751, 'duration': 5.422}, {'end': 507.314, 'text': "But we're not done with data cleaning just yet.", 'start': 505.333, 'duration': 1.981}, {'end': 510.335, 'text': "like to clean one more column and that's salary.", 'start': 507.734, 'duration': 2.601}, {'end': 512.176, 'text': "right now it's in an unusable format.", 'start': 510.335, 'duration': 1.841}, {'end': 519.438, 'text': "sometimes it's hourly or yearly, sometimes there's ranges i can go through and create rules to clean up all of those different salary columns.", 'start': 512.176, 'duration': 7.262}], 'summary': 'Python extracted keywords from job descriptions for data cleaning, including salaries.', 'duration': 29.191, 'max_score': 490.247, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/iNEwkaYmPqY/pics/iNEwkaYmPqY490247.jpg'}], 'start': 299.232, 'title': 'Data pipeline and analyst skills', 'summary': 'Covers the transition from 100 to 10,000 jobs/day using bigquery, automation of python script for real-time data collection, and insights on cost-saving. 
it also discusses the challenges of data cleaning, emphasizing the time spent on it, use of python for keyword extraction, and effort in cleaning salary data, taking a few days to complete.', 'chapters': [{'end': 448.248, 'start': 299.232, 'title': 'Automated data pipeline with bigquery', 'summary': 'Discusses the transition from collecting 100 jobs per day to up-ramping to 10,000 jobs a day, utilizing a sql database and bigquery from the google cloud platform, automating the python script for real-time data collection, and the cost-saving knowledge gained from learning google cloud services.', 'duration': 149.016, 'highlights': ['Transition from collecting 100 jobs per day to up-ramping to 10,000 jobs a day Planning to increase job collection to 10,000 jobs per day, totaling around 3 million jobs per year.', 'Utilizing a SQL database and BigQuery from the Google Cloud Platform Setting up a blank database in BigQuery for importing query results from SERP API and automating the data pipeline for future collection.', 'Automating the Python script for real-time data collection Automating the Python code in Google Cloud to execute daily for grabbing jobs from SERP API and inserting them into BigQuery.', 'Cost-saving knowledge gained from learning Google Cloud services Learning to monitor services and check balances, preventing a potential cost of $3,000 by identifying and rectifying an unnecessary expensive service.']}, {'end': 543.67, 'start': 448.428, 'title': 'Data analyst skills and data cleaning', 'summary': 'Discusses the challenges of data cleaning, emphasizing that despite expectations, a data analyst spends most of their time cleaning data rather than analyzing it. 
it also highlights the use of python to extract keywords from job descriptions and the effort required to clean salary data, taking a few days to complete.', 'duration': 95.242, 'highlights': ["The majority of a data analyst's time is spent on data cleaning rather than data analysis, contrary to initial expectations.", 'Python is used to extract keywords from job descriptions, aiding in the identification of required skills for a data analyst role.', 'Cleaning salary data is a time-consuming process, taking a few days to complete.']}], 'duration': 244.438, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/iNEwkaYmPqY/pics/iNEwkaYmPqY299232.jpg', 'highlights': ['Transitioning from collecting 100 jobs per day to up-ramping to 10,000 jobs a day, totaling around 3 million jobs per year.', 'Utilizing a SQL database and BigQuery from the Google Cloud Platform for automating the data pipeline for future collection.', 'Automating the Python script in Google Cloud for real-time data collection from SERP API and inserting them into BigQuery.', 'Learning to monitor services and check balances, preventing a potential cost of $3,000 by identifying and rectifying an unnecessary expensive service.', "The majority of a data analyst's time is spent on data cleaning rather than data analysis, contrary to initial expectations.", 'Python is used to extract keywords from job descriptions, aiding in the identification of required skills for a data analyst role.', 'Cleaning salary data is a time-consuming process, taking a few days to complete.']}, {'end': 834.371, 'segs': [{'end': 568.128, 'src': 'embed', 'start': 543.67, 'weight': 0, 'content': [{'end': 550.196, 'text': "it looks like we've collected around 1300 jobs already and are averaging about 100 jobs a day since we started this.", 'start': 543.67, 'duration': 6.526}, {'end': 554.759, 'text': "Let's actually go in now and visualize a lot of the different columns and see what values they have in them.", 
'start': 550.236, 'duration': 4.523}, {'end': 558.322, 'text': 'We can see that data analyst is one of the most frequent titles for the job type.', 'start': 554.839, 'duration': 3.483}, {'end': 562.225, 'text': "It looks like there's mostly full time jobs available with some contractor.", 'start': 558.442, 'duration': 3.783}, {'end': 568.128, 'text': 'But what I think is most interesting is that most of these jobs are coming from LinkedIn, the place that I wanted to scrape originally,', 'start': 562.266, 'duration': 5.862}], 'summary': 'Collected 1300 jobs, averaging 100 per day. data analyst is frequent title. most jobs from linkedin.', 'duration': 24.458, 'max_score': 543.67, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/iNEwkaYmPqY/pics/iNEwkaYmPqY543670.jpg'}, {'end': 631.782, 'src': 'embed', 'start': 579.813, 'weight': 1, 'content': [{'end': 583.393, 'text': 'It looks like we have a high clumping around $100,000.', 'start': 579.813, 'duration': 3.58}, {'end': 585.315, 'text': "We'll probably need to investigate this at some point.", 'start': 583.394, 'duration': 1.921}, {'end': 586.335, 'text': 'So next is the hourly pay.', 'start': 585.355, 'duration': 0.98}, {'end': 587.836, 'text': "And it looks like it's ranging from 20 to 80 bucks.", 'start': 586.435, 'duration': 1.401}, {'end': 594.079, 'text': 'I actually dove into this further and found that the majority of these postings came from Upwork, which is a freelance site.', 'start': 588.976, 'duration': 5.103}, {'end': 599.322, 'text': 'So for both of these salary plots, this includes not only entry level, but also those that have more experience.', 'start': 594.099, 'duration': 5.223}, {'end': 602.424, 'text': 'So this is why this is such a big range for both of these.', 'start': 599.803, 'duration': 2.621}, {'end': 609.949, 'text': "All right, so let's actually dive into that final analytical question of what is the top skill of data analysts? 
For this, we need a metric.", 'start': 602.625, 'duration': 7.324}, {'end': 615.632, 'text': 'Because we have skills in a job posting, we can evaluate the likelihood of a skill appearing in a job posting.', 'start': 610.089, 'duration': 5.543}, {'end': 619.695, 'text': 'If Python is in two of three job postings, its likelihood is 66%.', 'start': 615.652, 'duration': 4.043}, {'end': 622.416, 'text': 'So once again, Python does this really conveniently.', 'start': 619.695, 'duration': 2.721}, {'end': 629.32, 'text': 'I can go through each of those keywords and calculate a percentage for each one to determine their likelihood to be in a job posting.', 'start': 622.516, 'duration': 6.804}, {'end': 631.782, 'text': 'Which, drum roll please, is..', 'start': 629.52, 'duration': 2.262}], 'summary': 'Data analysis: salary clumps around $100,000, hourly pay ranges from $20 to $80. python likely top skill in job postings.', 'duration': 51.969, 'max_score': 579.813, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/iNEwkaYmPqY/pics/iNEwkaYmPqY579813.jpg'}, {'end': 715.767, 'src': 'embed', 'start': 686.648, 'weight': 5, 'content': [{'end': 689.229, 'text': 'I now have this script in an automated fashion.', 'start': 686.648, 'duration': 2.581}, {'end': 693.011, 'text': 'So say I wanted updated results tomorrow, I could just go and run the script.', 'start': 689.609, 'duration': 3.402}, {'end': 696.453, 'text': "Actually, let's go see future Luke to see what the results are tomorrow.", 'start': 693.431, 'duration': 3.022}, {'end': 699.458, 'text': "Future Luke here, and I guess I'll just stop editing to answer this question.", 'start': 696.693, 'duration': 2.765}, {'end': 706.13, 'text': 'Anyway, I ran this Python script and it looks like SQL is still in the lead, but that we also have Python catching up.', 'start': 699.799, 'duration': 6.331}, {'end': 707.693, 'text': 'Nice Thanks, Future Luke.', 'start': 706.29, 'duration': 1.403}, {'end': 708.655, 
'text': 'Cool story, bro.', 'start': 707.913, 'duration': 0.742}, {'end': 709.697, 'text': "I'm going to get back to editing.", 'start': 708.835, 'duration': 0.862}, {'end': 715.767, 'text': 'Okay, so this is good for me and for future me, but how the heck do I get these results to my subscribers?', 'start': 710.305, 'duration': 5.462}], 'summary': 'The automated script shows SQL in the lead with Python catching up. Rerunning the script gives updated results.', 'duration': 29.119, 'max_score': 686.648, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/iNEwkaYmPqY/pics/iNEwkaYmPqY686648.jpg'}, {'end': 756.436, 'src': 'heatmap', 'start': 725.31, 'weight': 0.851, 'content': [{'end': 728.711, 'text': 'Now, Python has a lot of different support and packages to use for this.', 'start': 725.31, 'duration': 3.401}, {'end': 733.272, 'text': 'I decided to just go with the most popular, most user-friendly option, which was Streamlit.', 'start': 729.091, 'duration': 4.181}, {'end': 738.073, 'text': 'I was able to reuse a lot of the work I had done already from importing and cleaning the data,', 'start': 733.512, 'duration': 4.561}, {'end': 741.133, 'text': 'performing all those calculations and then visualizing it all.', 'start': 738.073, 'duration': 3.06}, {'end': 744.054, 'text': 'The best part is that Streamlit also deploys your app for free.', 'start': 741.193, 'duration': 2.861}, {'end': 744.914, 'text': "So let's check it out.", 'start': 744.214, 'duration': 0.7}, {'end': 751.495, 'text': 'On the first page, you can see the top skills for data analysts, and you also have options to sort by different languages.', 'start': 745.014, 'duration': 6.481}, {'end': 756.436, 'text': 'And then also you can go in to even see a daily trend analysis of each of these skills.', 'start': 751.655, 'duration': 4.781}], 'summary': 'Used Streamlit to build the dashboard, reusing the existing import, cleaning, and visualization code. Streamlit also deploys the app for free.', 'duration': 31.126, 'max_score': 725.31, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/iNEwkaYmPqY/pics/iNEwkaYmPqY725310.jpg'}, {'end': 808.402, 'src': 'embed', 'start': 779.228, 'weight': 6, 'content': [{'end': 780.709, 'text': 'This is a really crazy solution.', 'start': 779.228, 'duration': 1.481}, {'end': 791.494, 'text': "We're using Python for everything from gathering data from Google Jobs to putting it into a database in BigQuery and then extracting it and providing it in this dashboard that anybody can access.", 'start': 781.189, 'duration': 10.305}, {'end': 794.676, 'text': 'And this is hopefully only the beginning of this project.', 'start': 791.594, 'duration': 3.082}, {'end': 799.498, 'text': "Right now, we're only extracting about a hundred jobs a day for data analysts in the United States.", 'start': 794.776, 'duration': 4.722}, {'end': 804.7, 'text': 'I want to grow this further to those beyond data analysts, such as data scientists and data engineers.', 'start': 799.798, 'duration': 4.902}, {'end': 808.402, 'text': 'And thankfully, SERP API has agreed to help with this by providing some free credits.', 'start': 804.82, 'duration': 3.582}], 'summary': 'Python handles the whole pipeline: about 100 data analyst jobs a day are pulled from Google Jobs into BigQuery and served in a publicly accessible dashboard. Plans are to expand to data scientists and data engineers, supported by free SERP API credits.', 'duration': 29.174, 'max_score': 779.228, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/iNEwkaYmPqY/pics/iNEwkaYmPqY779228.jpg'}, {'end': 834.371, 'src': 'embed', 'start': 816.425, 'weight': 7, 'content': [{'end': 824.608, 'text': "I'm hoping that by open sourcing this project, we can get more contributions and make it easier for those that are aspiring to become data analysts,", 'start': 816.425, 'duration': 8.183}, {'end': 828.169, 'text': 'engineers or scientists to land their jobs in this field.', 'start': 824.608, 'duration': 3.561}, {'end': 829.049, 'text': 'All right.', 'start': 828.689, 
'duration': 0.36}, {'end': 832.63, 'text': 'As always, if you got value out of this video, smash that like button.', 'start': 829.609, 'duration': 3.021}, {'end': 834.371, 'text': "With that, I'll see you in the next one.", 'start': 832.87, 'duration': 1.501}], 'summary': 'Open sourcing project to help aspiring data professionals land jobs.', 'duration': 17.946, 'max_score': 816.425, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/iNEwkaYmPqY/pics/iNEwkaYmPqY816425.jpg'}], 'start': 543.67, 'title': 'Data analyst job analysis and python for job posting analysis', 'summary': 'Provides insights from analyzing 1300 data analyst jobs, with an average of 100 jobs per day. it also discusses the use of python to analyze job postings, showcasing the likelihood of specific skills appearing, building an automated script, and creating a dashboard using streamlit. the aim is to expand the project beyond data analysts to data scientists and engineers, encouraging contributions through open sourcing.', 'chapters': [{'end': 609.949, 'start': 543.67, 'title': 'Data analyst job analysis', 'summary': 'Reveals insights from analyzing 1300 data analyst jobs, with an average of 100 jobs per day, including the most frequent job title, job sources, salary distribution, and the top skill of data analysts.', 'duration': 66.279, 'highlights': ['The majority of the 1300 data analyst jobs analyzed came from LinkedIn, Upwork, Monster, and Talent, showcasing a diverse source distribution.', 'The salary distribution for data analyst jobs ranges from $50,000 to $200,000, with a high clumping around $100,000, indicating the need for further investigation.', 'The hourly pay for data analyst jobs ranges from $20 to $80, with the majority of postings coming from Upwork, a freelance site, highlighting the prevalence of freelance opportunities in this field.', 'The analysis also aims to identify the top skill of data analysts, emphasizing the need for a metric to address 
this analytical question.']}, {'end': 834.371, 'start': 610.089, 'title': 'Python for job posting analysis', 'summary': 'Discusses using Python to analyze job postings: measuring how likely specific skills are to appear, building an automated script, and creating a Streamlit dashboard to visualize skill and salary data. The project aims to expand beyond data analysts to data scientists and engineers and encourages contributions through open sourcing.', 'duration': 224.282, 'highlights': ['Python is used to analyze the likelihood of specific skills appearing in job postings, with Python catching up to SQL in updated results.', 'Creation of an automated script using Python to obtain updated results for job postings.', 'Development of a dashboard using Streamlit to visualize data and salary information, with the aim to expand the project beyond data analysts to data scientists and engineers.', 'Encouraging contributions through open sourcing the project, aiming to make it easier for aspiring data analysts, engineers, and scientists to land jobs in the field.']}], 
'duration': 290.701, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/iNEwkaYmPqY/pics/iNEwkaYmPqY543670.jpg', 'highlights': ['The majority of the 1300 data analyst jobs analyzed came from LinkedIn, Upwork, Monster, and Talent, showcasing a diverse source distribution.', 'The salary distribution for data analyst jobs ranges from $50,000 to $200,000, with a high clumping around $100,000, indicating the need for further investigation.', 'The hourly pay for data analyst jobs ranges from $20 to $80, with the majority of postings coming from Upwork, a freelance site, highlighting the prevalence of freelance opportunities in this field.', 'The analysis also aims to identify the top skill of data analysts, emphasizing the need for a metric to address this analytical question.', 'Python is used to analyze the likelihood of specific skills appearing in job postings, with Python catching up to SQL in updated results.', 'Creation of an automated script using Python to obtain updated results for job postings.', 'Development of a dashboard using Streamlit to visualize data and salary information, with the aim to expand the project beyond data analysts to data scientists and engineers.', 'Encouraging contributions through open sourcing the project, aiming to make it easier for aspiring data analysts, engineers, and scientists to land jobs in the field.']}], 'highlights': ['Python is used to collect and analyze recent poll results from the YouTube channel, demonstrating the simplicity and versatility of the programming language.', 'The chapter highlights the automation of data analysis to identify top skills for data analysts, emphasizing the use of real-time data and sharing results via an accessible app.', 'Demonstration of using Python to call the SERP API for job postings, emphasizing the ease and 
speed of obtaining desired information, showcasing the efficient usage of Python for API integration.', 'Python can connect to company databases or Excel sheets for data collection, providing flexibility in data source connectivity.', 'Usage of APIs, such as the SERP API, for sustainable and reliable data collection, highlighting the advantages of APIs over web scraping for consistent data retrieval.', 'Ramping up from collecting 100 jobs per day to 10,000 jobs a day, totaling around 3 million jobs per year.', 'Utilizing a SQL database and BigQuery from the Google Cloud Platform for automating the data pipeline for future collection.', 'Automating the Python script in Google Cloud for real-time data collection from the SERP API and inserting the results into BigQuery.', 'Learning to monitor services and check balances, preventing a potential cost of $3,000 by identifying and rectifying an unnecessarily expensive service.', "The majority of a data analyst's time is spent on data cleaning rather than data analysis, contrary to initial expectations.", 'Python is used to extract keywords from job descriptions, aiding in the identification of required skills for a data analyst role.', 'Cleaning salary data is a time-consuming process, taking a few days to complete.', 'The majority of the 1300 data analyst jobs analyzed came from LinkedIn, Upwork, Monster, and Talent, showcasing a diverse source distribution.', 'The salary distribution for data analyst jobs ranges from $50,000 to $200,000, with a high clumping around $100,000, indicating the need for further investigation.', 'The hourly pay for data analyst jobs ranges from $20 to $80, with the majority of postings coming from Upwork, a freelance site, highlighting the prevalence of freelance opportunities in this field.', 'The analysis also aims to identify the top skill of data analysts, emphasizing the need for a metric to address this analytical question.', 'Python is used to analyze the likelihood of specific 
skills appearing in job postings, with Python catching up to SQL in updated results.', 'Creation of an automated script using Python to obtain updated results for job postings.', 'Development of a dashboard using Streamlit to visualize data and salary information, with the aim to expand the project beyond data analysts to data scientists and engineers.', 'Encouraging contributions through open sourcing the project, aiming to make it easier for aspiring data analysts, engineers, and scientists to land jobs in the field.']}
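
The transcript describes the top-skill metric as the share of job postings that mention each skill (Python in two of three postings gives roughly 66%). A minimal sketch of that calculation follows. The function name `skill_likelihood`, the sample postings, and the simple whitespace tokenizer are all assumptions made for illustration; they are not taken from the video's actual script.

```python
from collections import Counter


def skill_likelihood(postings, skills):
    """Return {skill: percent of postings that mention it at least once}.

    Hypothetical helper: matching is a naive lowercase word lookup, which
    would miss multi-word skills and punctuation-attached tokens.
    """
    if not postings:
        return {skill: 0.0 for skill in skills}
    counts = Counter()
    for text in postings:
        # Treat commas as separators, then compare whole lowercase words.
        tokens = set(text.lower().replace(",", " ").split())
        for skill in skills:
            if skill.lower() in tokens:
                counts[skill] += 1
    return {skill: 100 * counts[skill] / len(postings) for skill in skills}


# Made-up sample data, only to exercise the function.
postings = [
    "Data analyst with SQL and Python experience",
    "Analyst role: Excel, SQL, Tableau",
    "Python developer for reporting dashboards",
]
likelihood = skill_likelihood(postings, ["sql", "python", "excel"])
for skill, pct in sorted(likelihood.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{skill}: {pct:.0f}%")
```

Each posting counts a skill at most once, so the result is the fraction of postings mentioning the skill rather than the number of mentions. A real pipeline would likely need a more careful keyword extractor than this word lookup.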