title

Python for Data Science | Data Science With Python | Python Data Science Tutorial | Intellipaat

description

Intellipaat Python for Data Science Course: https://intellipaat.com/python-for-data-science-training/
In this Python for Data Science video you will learn data science with Python end to end. This Python data science tutorial covers the Python concepts and machine learning algorithms you need to get started in this technology.
#PythonforDataScience #DataScienceWithPython #PythonDataScienceTutorial #PythonforDataScienceTraining #PythonforDataScienceCourse #LearnPython #PythonDataScience #PythonforDataAnalysis
Do subscribe to the Intellipaat channel & get regular updates on videos: http://bit.ly/Intellipaat
Watch complete Data Science tutorials here: https://www.youtube.com/watch?v=LRcIJHHESaY&list=PLVHgQku8Z934OCWXhq5YsfiMGvStaFB1i
Read the complete Data Science tutorial here: https://intellipaat.com/tutorial/data-science-tutorial/
Interested in learning even more about Data Science? Check out our What is Data Science blog here: https://intellipaat.com/blog/what-is-data-science/
Read our insightful blog on Python for data science: https://intellipaat.com/blog/python-for-data-science/
Learn about various data science certifications in this detailed blog: https://intellipaat.com/blog/data-science-certification/
The following topics are covered in this tutorial:
0:00 - Python for Data Science
1:03 - Introduction to Pandas
4:24 - Pandas vs NumPy
5:20 - How to Import Pandas in Python
6:04 - Datasets in Pandas
6:41 - What is a Series Object?
10:54 - DataFrame in Pandas
11:56 - How to Create a DataFrame
18:07 - Merge, Join & Concatenate in Pandas
32:15 - Importing & Analyzing the Dataset
43:00 - Manipulating the Dataset
49:46 - Introduction to Machine Learning
53:23 - How Does a Machine Learn?
55:07 - A Popular Machine Learning Myth!
57:21 - Types of Machine Learning
1:13:31 - What is Regression?
1:21:17 - Types of Regression
1:26:54 - What is Linear Regression?
1:30:00 - Understanding Linear Regression
1:41:41 - Mean Squared Error
2:33:33 - Logistic Regression Algorithm
2:40:43 - Linear Regression (Recap)
2:45:59 - Introduction to Logistic Regression
2:52:57 - Why Logistic Regression?
2:54:19 - Spam Email Classifier
3:37:57 - Demo: Logistic Regression
4:05:48 - What is Classification?
4:06:20 - Classification vs Regression
4:06:36 - Types of Classification
4:18:02 - Visualizing a Decision Tree
4:24:26 - Creating a Decision Tree
4:27:46 - Calculating Entropy
4:43:45 - Understanding the Confusion Matrix
4:45:48 - Understanding the Naive Bayes Classifier
5:13:30 - What is Clustering?
5:18:13 - Types of Clustering
5:22:54 - What is K-Means Clustering?
5:26:27 - Understanding the K-Means Algorithm
5:44:25 - Quiz 1
5:44:47 - Quiz 2
5:47:25 - Quiz 3
5:48:27 - Quiz 4
5:48:44 - Quiz 5
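As a quick taste of the pandas topics listed above (Series objects, DataFrames, and merge operations), here is a minimal sketch; the column names and values are invented purely for illustration:

```python
import pandas as pd

# A Series is a one-dimensional labelled array that can hold any data type;
# a custom index replaces the default 0..n-1 labels
s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

# A DataFrame is two-dimensional labelled data, built here from dictionaries
left = pd.DataFrame({'player': [1, 2, 3], 'score': [50, 60, 70]})
right = pd.DataFrame({'player': [2, 3, 4], 'team': ['X', 'Y', 'Z']})

# An inner merge keeps only the 'player' values present in both frames;
# how='left', 'right', or 'outer' change which rows survive
merged = pd.merge(left, right, on='player', how='inner')
print(merged)  # rows for players 2 and 3 only
```

The same ideas (plus concatenation with pd.concat and joins on index values) are walked through step by step in the video.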
If you've enjoyed this Python data analysis tutorial, like the video and subscribe to our channel for more similar informative tutorials.
Got any questions about this Python for Data Science tutorial? Ask us in the comments section below.
----------------------------
Intellipaat Edge
1. 24/7 Lifetime Access & Support
2. Flexible Class Schedule
3. Job Assistance
4. Mentors with 14+ Years of Industry Experience
5. Industry-Oriented Courseware
6. Lifetime Free Course Upgrades
------------------------------
Why should you watch this Python for Data Science tutorial?
You can learn Data Science faster than most other technologies, and this Data Science tutorial helps you do just that. Data Science is one of the most significant technological advances today, finding ever more applications in machine learning and across many industry domains. We are offering this top Data Science tutorial to help you gain knowledge in Data Science.
Who should watch this Python for Data Science tutorial video?
If you want to learn what Data Science is and become a Data Scientist, then this Intellipaat Data Science tutorial is for you. This Intellipaat Data Science video is your first step toward learning Data Science. Since this tutorial can be taken by anybody, even complete beginners in technology can follow it and then enroll in Data Science training to take their skills to the next level.
Why should you opt for a Python for Data Science career?
If you want to fast-track your career, then you should strongly consider Data Science. It is one of the fastest-growing technologies, there is huge demand for Data Scientists, and their salaries are fantastic. There is huge growth opportunity in this domain as well. Hence, this Intellipaat Data Science tutorial is your stepping stone to a successful career!
------------------------------
For more Information:
Please write to us at sales@intellipaat.com, or call us at +91-7847955955
Website: https://intellipaat.com/python-for-data-science-training/
Facebook: https://www.facebook.com/intellipaatonline
LinkedIn: https://www.linkedin.com/in/intellipaat/
Twitter: https://twitter.com/Intellipaat

detail

{'title': 'Python for Data Science | Data Science With Python | Python Data Science Tutorial | Intellipaat', 'heatmap': [{'end': 2819.126, 'start': 2600.356, 'weight': 1}], 'summary': 'This tutorial series on python for data science covers topics like pandas and numpy, data manipulation, machine learning, linear regression, logistic regression, decision tree analysis, and clustering with implementations achieving mean accuracies of 97.299% and 77.124% on various datasets.', 'chapters': [{'end': 854.549, 'segs': [{'end': 78.425, 'src': 'embed', 'start': 53.316, 'weight': 0, 'content': [{'end': 58.258, 'text': "Going ahead, we'll work with two classification algorithms, which are decision tree and Naive Bayes.", 'start': 53.316, 'duration': 4.942}, {'end': 62.339, 'text': "And finally, we'll work with an unsupervised learning algorithm, which is K-Means.", 'start': 58.638, 'duration': 3.701}, {'end': 65.18, 'text': "So let's start with what exactly is Panda?", 'start': 62.639, 'duration': 2.541}, {'end': 71.782, 'text': 'Well, Panda is an open source, simple and powerful Python library which is mainly used for data manipulation and data analysis.', 'start': 65.26, 'duration': 6.522}, {'end': 75.564, 'text': 'Okay, now you might be wondering where does this name, Panda, come from?', 'start': 72.063, 'duration': 3.501}, {'end': 78.425, 'text': 'so what do you think is the name of an animal?', 'start': 76.104, 'duration': 2.321}], 'summary': 'Working with decision tree, naive bayes, and k-means algorithms in data analysis using panda, a python library.', 'duration': 25.109, 'max_score': 53.316, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN453316.jpg'}, {'end': 139.498, 'src': 'embed', 'start': 113.197, 'weight': 1, 'content': [{'end': 117.221, 'text': 'so on, number one we have is series object and data frame.', 'start': 113.197, 'duration': 4.024}, {'end': 121.004, 'text': 'well, this is the primary and one of 
the most important features of panda.', 'start': 117.221, 'duration': 3.783}, {'end': 127.549, 'text': 'it allows it to deal with one dimensional and two dimensional label data in the form of series object and data frame.', 'start': 121.004, 'duration': 6.545}, {'end': 130.612, 'text': "don't worry, we'll discuss about them in detail later in our session.", 'start': 127.549, 'duration': 3.063}, {'end': 135.135, 'text': 'okay, next, on number two we have is handling of missing data.', 'start': 130.612, 'duration': 4.523}, {'end': 139.498, 'text': 'well, whenever you have a missing data, panda represented as nan.', 'start': 135.135, 'duration': 4.363}], 'summary': 'Pandas can handle 1d and 2d label data with series object and data frame, and represent missing data as nan.', 'duration': 26.301, 'max_score': 113.197, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN4113197.jpg'}, {'end': 308.011, 'src': 'embed', 'start': 280.865, 'weight': 2, 'content': [{'end': 287.888, 'text': 'So the first point that we have is Panda works better with NumPy while dealing with data which has more than 500, 000 rows.', 'start': 280.865, 'duration': 7.023}, {'end': 293.251, 'text': 'And when you are using NumPy, so it performs better for 50, 000 rows or less.', 'start': 288.649, 'duration': 4.602}, {'end': 297.633, 'text': 'So in working with large real world data, you would need Panda for computation.', 'start': 293.971, 'duration': 3.662}, {'end': 308.011, 'text': 'Next we have is panda series object is more flexible as you can use it to define your own label index to index and access elements of an array.', 'start': 299.187, 'duration': 8.824}], 'summary': 'Pandas is better for data with over 500,000 rows, while numpy is suitable for 50,000 rows or less. 
pandas series object offers flexibility for defining label indexes.', 'duration': 27.146, 'max_score': 280.865, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN4280865.jpg'}, {'end': 354.874, 'src': 'embed', 'start': 323.959, 'weight': 3, 'content': [{'end': 327.361, 'text': 'So for that, we have a command import pandas as pd.', 'start': 323.959, 'duration': 3.402}, {'end': 328.142, 'text': "That's it.", 'start': 327.761, 'duration': 0.381}, {'end': 337.459, 'text': 'Okay Next is what kind of data does suit Pandas the most? So we have tabular data, time series data and arbitrary matrix data.', 'start': 328.702, 'duration': 8.757}, {'end': 340.322, 'text': 'All of these kind of data you can feed it to the Pandas.', 'start': 337.68, 'duration': 2.642}, {'end': 348.629, 'text': 'So now that we have learned the history features and how to import Panda, let us dive deep and see what kind of data suits Pandas the most.', 'start': 341.062, 'duration': 7.567}, {'end': 354.874, 'text': 'So deals with tabular data with heterogeneously typed columns as an SQL table or Excel spreadsheet.', 'start': 348.849, 'duration': 6.025}], 'summary': 'Pandas is well-suited for tabular data, time series data, and arbitrary matrix data, handling heterogeneously typed columns like an sql table or excel spreadsheet.', 'duration': 30.915, 'max_score': 323.959, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN4323959.jpg'}, {'end': 453.035, 'src': 'embed', 'start': 371.725, 'weight': 4, 'content': [{'end': 377.268, 'text': 'so you can work on one dimensional, two dimensional or any multi-dimensional data set in pandas.', 'start': 371.725, 'duration': 5.543}, {'end': 381.03, 'text': "if you are working on a one dimensional data set, it's known as a series object.", 'start': 377.268, 'duration': 3.762}, {'end': 385.433, 'text': 'if you are working on a two dimensional, the panda will 
create a data frame for it.', 'start': 381.03, 'duration': 4.403}, {'end': 389.956, 'text': 'if you are working on more than two dimensional data, panda would create a panel data for it.', 'start': 385.433, 'duration': 4.523}, {'end': 393.278, 'text': 'okay, now you might be wondering what are these terminologies right?', 'start': 389.956, 'duration': 3.322}, {'end': 395.46, 'text': "so let's have a look at them one by one.", 'start': 393.658, 'duration': 1.802}, {'end': 398.664, 'text': "so for this session we'll be mainly focusing on series object.", 'start': 395.46, 'duration': 3.204}, {'end': 402.709, 'text': 'so what exactly is the series object?', 'start': 398.664, 'duration': 4.045}, {'end': 411.28, 'text': 'well, a series object in panda is a one labeled array which is capable of holding any data type, like integers, strings, floating point number,', 'start': 402.709, 'duration': 8.571}, {'end': 412.281, 'text': 'python object, etc.', 'start': 411.28, 'duration': 1.001}, {'end': 416.748, 'text': 'Okay, it can contain data of similar or any mixed data type.', 'start': 413.024, 'duration': 3.724}, {'end': 423.575, 'text': "You can also say that a series in Panda is quite similar to a list or an array in Python, but yet it's more powerful.", 'start': 417.048, 'duration': 6.527}, {'end': 430.382, 'text': 'It represents a series of both numeric and non-numeric values as a column of data containing any type of mixed data.', 'start': 424.035, 'duration': 6.347}, {'end': 435.986, 'text': "So you must remember the fact that a series can contain any type of data, even if it's of a mixed data type.", 'start': 430.822, 'duration': 5.164}, {'end': 440.029, 'text': "Okay Now let's see how we can create a series object in pandas.", 'start': 436.466, 'duration': 3.563}, {'end': 444.673, 'text': 'So here in this example, we have a list of numeric data stored in data.', 'start': 440.609, 'duration': 4.064}, {'end': 448.291, 'text': 'Okay And now we want to create a series 
object out of it.', 'start': 445.153, 'duration': 3.138}, {'end': 453.035, 'text': 'So for this, Panda has a function series with capital S.', 'start': 448.671, 'duration': 4.364}], 'summary': 'Pandas allows working with one, two, or multi-dimensional data sets, creating series, data frames, and panel data, with a focus on series objects in this session.', 'duration': 81.31, 'max_score': 371.725, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN4371725.jpg'}, {'end': 563.366, 'src': 'embed', 'start': 532.609, 'weight': 7, 'content': [{'end': 536.112, 'text': 'Now, what if I want to check if it is a series object or not?', 'start': 532.609, 'duration': 3.503}, {'end': 541.956, 'text': 'So you can do it simply by typing the command type and, inside that, pass the name of the object.', 'start': 536.272, 'duration': 5.684}, {'end': 545.418, 'text': 'So here we have passed series one as the name of the object.', 'start': 542.176, 'duration': 3.242}, {'end': 549.163, 'text': "Let's check it in our Jupyter notebook, what it has to say.", 'start': 545.782, 'duration': 3.381}, {'end': 558.225, 'text': 'So in order to check the type, all you have to do is mention type and inside this, write the name of the series object, which is here, series one.', 'start': 549.823, 'duration': 8.402}, {'end': 563.366, 'text': "So if you execute it, you'll get the output as pandas.co.series.series.", 'start': 558.465, 'duration': 4.901}], 'summary': 'Check if a given object is a series by using the type command and passing the object name.', 'duration': 30.757, 'max_score': 532.609, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN4532609.jpg'}, {'end': 595.761, 'src': 'embed', 'start': 572.149, 'weight': 8, 'content': [{'end': 578.612, 'text': "Now, what if I want to change the index name from 0123 to ABCD? So how will I do it? 
So let's see.", 'start': 572.149, 'duration': 6.463}, {'end': 584.895, 'text': 'So for that, all you have to do is add a new parameter index over there and define the index inside it.', 'start': 579.072, 'duration': 5.823}, {'end': 585.676, 'text': "That's it.", 'start': 585.295, 'duration': 0.381}, {'end': 586.516, 'text': "Let's see how.", 'start': 585.976, 'duration': 0.54}, {'end': 595.761, 'text': 'So what we are doing up here, changing the index of a series object.', 'start': 587.216, 'duration': 8.545}], 'summary': "Changing index name from 0123 to abcd by adding a new parameter 'index' and defining the index inside it.", 'duration': 23.612, 'max_score': 572.149, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN4572149.jpg'}, {'end': 699.223, 'src': 'embed', 'start': 650.492, 'weight': 9, 'content': [{'end': 653.534, 'text': 'Okay I hope the concept of series object is clear to you.', 'start': 650.492, 'duration': 3.042}, {'end': 660.644, 'text': 'Well, as I told you earlier, in Pandas, two-dimensional data is found in the form of data frames.', 'start': 655.322, 'duration': 5.322}, {'end': 664.705, 'text': 'So now the question arises what exactly is a data frame?', 'start': 661.244, 'duration': 3.461}, {'end': 672.228, 'text': 'Well, a data frame is a two-dimensional label data with columns which can contain data of different types.', 'start': 665.425, 'duration': 6.803}, {'end': 674.869, 'text': 'Correct? Let me show you an example.', 'start': 672.908, 'duration': 1.961}, {'end': 684.052, 'text': 'So here in this example, you can see we have a two-dimensional data with the column label on the x-axis and the index on the y-axis.', 'start': 675.389, 'duration': 8.663}, {'end': 686.957, 'text': "Correct? 
So there's an example of a data frame.", 'start': 684.532, 'duration': 2.425}, {'end': 693.14, 'text': "So now that you know what exactly is a data frame, let's move ahead and see what are the different features of a data frame.", 'start': 687.277, 'duration': 5.863}, {'end': 699.223, 'text': 'So a data frame contains column of different type.', 'start': 696.482, 'duration': 2.741}], 'summary': 'Pandas has two-dimensional data in data frames with columns of different types.', 'duration': 48.731, 'max_score': 650.492, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN4650492.jpg'}, {'end': 822.656, 'src': 'embed', 'start': 793.398, 'weight': 10, 'content': [{'end': 801.284, 'text': 'so, as you can see, on our x axis we got the label as 0 and on the y axis we got the index as 0, 1, 2, 3 and 4..', 'start': 793.398, 'duration': 7.886}, {'end': 805.126, 'text': "okay. so there's a simple example of a data frame.", 'start': 801.284, 'duration': 3.842}, {'end': 807.708, 'text': 'so this data frame was created using list.', 'start': 805.126, 'duration': 2.582}, {'end': 817.293, 'text': 'so let me just add this Creating a data frame using a list, okay.', 'start': 807.708, 'duration': 9.585}, {'end': 818.354, 'text': "So next what I'll do?", 'start': 817.293, 'duration': 1.061}, {'end': 822.656, 'text': "I'll show you an example of creating a data frame using a dictionary.", 'start': 818.354, 'duration': 4.302}], 'summary': 'Creating data frames using list and dictionary examples shown.', 'duration': 29.258, 'max_score': 793.398, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN4793398.jpg'}], 'start': 2.961, 'title': 'Data science with python: pandas and numpy', 'summary': 'Introduces the relevance of data science, compares pandas with numpy, and explains series object in pandas. 
it also covers data frame creation, demonstrating list and dictionary usage.', 'chapters': [{'end': 623.71, 'start': 2.961, 'title': 'Data science with python: pandas and numpy', 'summary': 'Introduces the relevance of data science in everyday technologies, outlines the agenda for the session, discusses the features and comparison of pandas with numpy, and provides a detailed explanation of the series object in pandas, with examples of creating and modifying a series object.', 'duration': 620.749, 'highlights': ['The chapter introduces the relevance of data science in everyday technologies, outlines the agenda for the session, discusses the features and comparison of Pandas with Numpy, and provides a detailed explanation of the series object in Pandas, with examples of creating and modifying a series object. Relevance of data science in everyday technologies, agenda for the session, features and comparison of Pandas with Numpy, detailed explanation of the series object in Pandas', 'Pandas is an open source, simple and powerful Python library primarily used for data manipulation and analysis, created by American software developer and businessman Wes McKinney in 2015. Pandas as an open source, simple and powerful Python library, created by Wes McKinney in 2015', 'Pandas features include series object and data frame, handling of missing data, data alignment, group by functionality, slicing, indexing, subsetting, merging and joining, reshaping, hierarchical labeling of axis, robust input/output tool, and time series specific functionality. Features of Pandas: series object, data frame, handling missing data, data alignment, group by functionality, slicing, indexing, subsetting, merging, joining, reshaping, hierarchical labeling of axis, input/output tool, time series specific functionality', 'Pandas works better than NumPy for data with more than 500,000 rows, and it is more flexible in defining label index to access elements of an array. 
Pandas superiority over NumPy for data with more than 500,000 rows, flexibility in defining label index', 'Pandas is suitable for tabular data, time series data, and arbitrary matrix data. Suitability of Pandas for tabular data, time series data, arbitrary matrix data', 'Pandas can work with one-dimensional, two-dimensional, or multi-dimensional data sets, creating series object, data frame, or panel data accordingly. Pandas capability to work with one-dimensional, two-dimensional, or multi-dimensional data sets', 'A series object in Pandas is a one-labeled array capable of holding any data type and is more powerful than a list or array in Python. Definition of series object in Pandas, its capability and comparison with list or array in Python', "The process of creating a series object involves using the 'series' function in Pandas with the given data. Process of creating a series object using the 'series' function in Pandas", "The type of a series object can be checked using the 'type' command, which confirms its status as a series object. Checking the type of a series object using the 'type' command", "The index name of a series object can be changed by adding a new parameter 'index' and defining the desired index names. Process of changing the index name of a series object by adding a new parameter 'index'"]}, {'end': 854.549, 'start': 624.434, 'title': 'Pandas data frame creation', 'summary': 'Explains the creation of a data frame in pandas, including the concept of series objects and two-dimensional label data with columns containing different types of data. 
it also demonstrates how to create a data frame using a list and a dictionary.', 'duration': 230.115, 'highlights': ['The chapter explains the concept of series objects and two-dimensional label data with columns containing different types of data.', 'It demonstrates the creation of a data frame using a list and a dictionary.', 'A data frame is a two-dimensional label data with columns containing data of different types.', 'Pandas allows for the creation of a data frame using different arrays, dictionaries, or any scalar or constant value.']}], 'duration': 851.588, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN42961.jpg', 'highlights': ['Pandas as an open source, simple and powerful Python library, created by Wes McKinney in 2015', 'Features of Pandas: series object, data frame, handling missing data, data alignment, group by functionality, slicing, indexing, subsetting, merging, joining, reshaping, hierarchical labeling of axis, input/output tool, time series specific functionality', 'Pandas superiority over NumPy for data with more than 500,000 rows, flexibility in defining label index', 'Suitability of Pandas for tabular data, time series data, arbitrary matrix data', 'Pandas capability to work with one-dimensional, two-dimensional, or multi-dimensional data sets', 'Definition of series object in Pandas, its capability and comparison with list or array in Python', "Process of creating a series object using the 'series' function in Pandas", "Checking the type of a series object using the 'type' command", "Process of changing the index name of a series object by adding a new parameter 'index'", 'The chapter explains the concept of series objects and two-dimensional label data with columns containing different types of data', 'It demonstrates the creation of a data frame using a list and a dictionary', 'A data frame is a two-dimensional label data with columns containing data of different types', 'Pandas 
allows for the creation of a data frame using different arrays, dictionaries, or any scalar or constant value']}, {'end': 2460.309, 'segs': [{'end': 1112.367, 'src': 'embed', 'start': 1086.405, 'weight': 0, 'content': [{'end': 1093.713, 'text': 'okay. so, moving on ahead, our session will be on merge, join and concatenate operations that can be done on the data frame.', 'start': 1086.405, 'duration': 7.308}, {'end': 1099.358, 'text': 'you will see how you can perform merge operation on two different data frames using pandas,', 'start': 1093.713, 'duration': 5.645}, {'end': 1105.485, 'text': 'how you can perform join operation between them and how you can perform concatenation between them.', 'start': 1099.358, 'duration': 6.127}, {'end': 1112.367, 'text': 'okay, So, without delaying any further, let me just switch back to my Jupyter notebook and let me show you how you can perform these operations.', 'start': 1105.485, 'duration': 6.882}], 'summary': 'Session on merge, join, and concatenate operations in data frames using pandas.', 'duration': 25.962, 'max_score': 1086.405, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN41086405.jpg'}, {'end': 1546.649, 'src': 'embed', 'start': 1517.596, 'weight': 1, 'content': [{'end': 1520.117, 'text': 'So that is why it mentioned NAN over here.', 'start': 1517.596, 'duration': 2.521}, {'end': 1523.479, 'text': 'Okay So this is how you perform a left merge.', 'start': 1520.437, 'duration': 3.042}, {'end': 1526.501, 'text': "Now next, let's see how to do a right merge.", 'start': 1524.14, 'duration': 2.361}, {'end': 1530.561, 'text': "So, again, you don't have to do anything.", 'start': 1528.68, 'duration': 1.881}, {'end': 1533.142, 'text': 'Just change left to right.', 'start': 1531.081, 'duration': 2.061}, {'end': 1535.703, 'text': 'So, left becomes right.', 'start': 1534.223, 'duration': 1.48}, {'end': 1537.424, 'text': 'So, you got this value.', 'start': 1536.244, 
'duration': 1.18}, {'end': 1542.167, 'text': "So, why I'm getting this value? So, similar to left merge, now it's performing right merge.", 'start': 1537.704, 'duration': 4.463}, {'end': 1544.848, 'text': 'So, all the values from your right table would be here.', 'start': 1542.207, 'duration': 2.641}, {'end': 1546.649, 'text': 'Okay So, player 1, 5, 6.', 'start': 1545.248, 'duration': 1.401}], 'summary': 'Demonstrating left and right merge with examples, including 3 players', 'duration': 29.053, 'max_score': 1517.596, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN41517596.jpg'}, {'end': 1975.763, 'src': 'embed', 'start': 1949.576, 'weight': 3, 'content': [{'end': 1954.2, 'text': 'So for importing the data set, the very first thing that you will be doing is importing the pandas library.', 'start': 1949.576, 'duration': 4.624}, {'end': 1955.761, 'text': "So I'm importing pandas as PD.", 'start': 1954.28, 'duration': 1.481}, {'end': 1959.458, 'text': 'So next, we are going to read the data set and store it in a data frame.', 'start': 1956.257, 'duration': 3.201}, {'end': 1961.879, 'text': "So I'm defining a data frame named cars.", 'start': 1960.058, 'duration': 1.821}, {'end': 1966.98, 'text': "And inside this, I'm passing the path of my CSV file, which is empty cars to dot CSV.", 'start': 1962.239, 'duration': 4.741}, {'end': 1970.281, 'text': 'At this point, you can specify the path to your CSV file.', 'start': 1967.42, 'duration': 2.861}, {'end': 1972.462, 'text': 'Okay And finally, we are printing it.', 'start': 1970.641, 'duration': 1.821}, {'end': 1973.742, 'text': "Let's see the output.", 'start': 1973.002, 'duration': 0.74}, {'end': 1975.763, 'text': 'So this is my entire data.', 'start': 1974.243, 'duration': 1.52}], 'summary': 'Import data using pandas library, read csv file into data frame, and print it.', 'duration': 26.187, 'max_score': 1949.576, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN41949576.jpg'}, {'end': 2332.405, 'src': 'embed', 'start': 2306.677, 'weight': 2, 'content': [{'end': 2315.919, 'text': 'we took the mean of the entire 29 and whatever value we got for the mean, we replaced all the null values with that particular value.', 'start': 2306.677, 'duration': 9.242}, {'end': 2321.321, 'text': 'okay, so replacing the missing data with the mean value will help analyzing the data in a better way.', 'start': 2315.919, 'duration': 5.402}, {'end': 2324.801, 'text': "okay. So there's how we can replace the null values with the mean of the column.", 'start': 2321.321, 'duration': 3.48}, {'end': 2327.123, 'text': 'Next is drop the unwanted column.', 'start': 2325.282, 'duration': 1.841}, {'end': 2332.405, 'text': 'So here we can drop all those unwanted column which do not add any value in analyzing the data.', 'start': 2327.563, 'duration': 4.842}], 'summary': 'Replaced null values with mean, dropped unwanted columns.', 'duration': 25.728, 'max_score': 2306.677, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN42306677.jpg'}], 'start': 854.549, 'title': 'Pandas data operations', 'summary': 'Covers creating data frames and series using pandas, merging and joining data frames, and importing and analyzing datasets. 
it includes demonstrations of merge, join, and concatenate operations, statistical calculations, and data cleaning.', 'chapters': [{'end': 1086.405, 'start': 854.549, 'title': 'Creating data frames and series', 'summary': 'Demonstrates creating data frames and series using pandas, including creating a data frame from a dictionary and a series, and creating a data frame from a numpy array, with examples and code demonstrations.', 'duration': 231.856, 'highlights': ["Creating a data frame using a series, where a one-dimensional series is converted to a two-dimensional data frame with index as 'a' and 'b'.", "Creating a data frame using a dictionary, showcasing the labels as 'fruits' and 'count' and the index as 0, 1, and 2.", "Creating a data frame out of a numpy array with column names 'name' and 'salary', with index numbers 0 and 1."]}, {'end': 1542.167, 'start': 1086.405, 'title': 'Dataframe merge operations', 'summary': 'Covers merge, join, and concatenate operations on data frames using pandas, demonstrating inner, left, and right merge operations, with a focus on common attributes and resulting values.', 'duration': 455.762, 'highlights': ['The chapter covers merge, join, and concatenate operations on data frames using Pandas. The chapter introduces merge, join, and concatenate operations on data frames using Pandas library.', 'Demonstrating inner, left, and right merge operations. The transcript demonstrates the inner, left, and right merge operations on data frames, showcasing the differences in resulting values.', 'Focus on common attributes and resulting values. 
The chapter emphasizes the identification of common attributes and the resulting values after merge operations on data frames.']}, {'end': 1882.154, 'start': 1542.207, 'title': 'Data frame merging and joining', 'summary': 'Covers merging data frames using inner, left, right, and outer joins, demonstrating the use of these operations in pandas, while explaining the differences between merge and join, and how to perform join operations based on index values.', 'duration': 339.947, 'highlights': ['The chapter covers merging data frames using inner, left, right, and outer joins The chapter provides a comprehensive overview of merging data frames using inner, left, right, and outer joins in pandas, showcasing the versatility of these operations.', 'Explaining the differences between merge and join The chapter elucidates the differences between merge and join operations, highlighting that merge is based on attribute names while join is based on index values, and clarifying the usage of inner, left, right, and outer joins in both.', 'Demonstrating how to perform join operations based on index values The chapter demonstrates the process of performing join operations based on index values in pandas, emphasizing the necessity of specifying the set index while performing join and providing a clear comparison between inner, left, right, and outer joins based on index values.']}, {'end': 2460.309, 'start': 1882.874, 'title': 'Pandas data analysis', 'summary': 'Covers how to import and analyze a dataset using pandas, including concatenating dataframes, importing a dataset, analyzing data attributes, calculating statistics, data cleaning, and finding the correlation matrix.', 'duration': 577.435, 'highlights': ['Executing pd.concat to concatenate data frame 3 with data frame 4, resulting in a concatenated data frame. 
Performed concatenation using pd.concat to merge data frame 3 and data frame 4, obtaining the concatenated result.', "Importing a dataset using pandas, defining a data frame 'cars', reading the CSV file 'empty_cars.csv', and printing the entire dataset. Imported a dataset by defining a data frame 'cars', reading the CSV file 'empty_cars.csv', and displaying the complete dataset.", 'Analyzing the dataset by checking type, viewing first and last records, checking data frame shape, getting column summary, calculating mean, median, standard deviation, maximum and minimum values, counting non-null records, and obtaining descriptive statistics summary using data frame functions. Analyzed the dataset by checking type, viewing first and last records, data frame shape, column summary, calculating mean, median, standard deviation, maximum and minimum values, counting non-null records, and obtaining descriptive statistics summary.', 'Explaining the process of data cleaning, including renaming columns, filling null values with the mean, dropping unwanted columns, and finding the correlation matrix to identify variable relationships. 
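The cleaning steps listed above (renaming a column, filling nulls with the mean, dropping an unwanted column, then taking the correlation matrix) can be sketched on a toy frame; the column names and values here are assumptions, not the actual empty_cars.csv contents:

```python
import pandas as pd
import numpy as np

# A tiny stand-in for the video's cars dataset (values invented).
cars = pd.DataFrame({
    "mpg ": ["21.0", "22.8", None, "18.1"],  # numbers stored as strings, with a stray space in the name
    "qsec": [16.5, np.nan, 17.0, 19.4],
    "unnamed": [0, 1, 2, 3],                 # an unwanted column
})

cars = cars.rename(columns={"mpg ": "mpg"})       # rename a badly named column
cars["mpg"] = cars["mpg"].astype(float)           # string -> float so it can enter the correlation
cars = cars.fillna(cars.mean(numeric_only=True))  # fill null values with the column means
cars = cars.drop(columns=["unnamed"])             # drop the unwanted column

print(cars.isnull().sum().sum())  # 0
print(cars.corr().shape)          # (2, 2)
```

Until the string-to-float conversion, `corr()` would silently skip `mpg`, which matches the point made in the transcript that both attributes must be numerical.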
Explained the data cleaning process, involving renaming columns, filling null values with the mean, dropping unwanted columns, and finding the correlation matrix to identify variable relationships.']}], 'duration': 1605.76, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN4854549.jpg', 'highlights': ['The chapter covers merging data frames using inner, left, right, and outer joins in pandas, showcasing the versatility of these operations.', 'Demonstrating inner, left, and right merge operations, showcasing the differences in resulting values.', 'Explaining the process of data cleaning, including renaming columns, filling null values with the mean, dropping unwanted columns, and finding the correlation matrix to identify variable relationships.', "Importing a dataset using pandas, defining a data frame 'cars', reading the CSV file 'empty_cars.csv', and printing the entire dataset."]}, {'end': 3731.663, 'segs': [{'end': 2548.351, 'src': 'embed', 'start': 2521.176, 'weight': 0, 'content': [{'end': 2526.418, 'text': 'and even for the qsec attribute, we have replaced all the null values with its mean, right.', 'start': 2521.176, 'duration': 5.242}, {'end': 2528.858, 'text': "so here it's giving me a count of 32 non-null values.", 'start': 2526.418, 'duration': 2.44}, {'end': 2532.179, 'text': 'that is, it no longer has any null values in it.', 'start': 2528.858, 'duration': 3.321}, {'end': 2534.36, 'text': "okay, let's move ahead.", 'start': 2532.179, 'duration': 2.181}, {'end': 2536.602, 'text': 'so again, we are finding the correlation.', 'start': 2534.36, 'duration': 2.242}, {'end': 2541.265, 'text': 'this time the mpg part is included, as it has become a floating type.', 'start': 2536.602, 'duration': 4.663}, {'end': 2548.351, 'text': 'remember, while calculating the correlation between two attributes, both the attributes should be numerical in nature.', 'start': 2541.265, 'duration': 7.086}], 'summary': 'Replaced null 
values with the mean for the qsec attribute, leaving 32 non-null values and no nulls remaining. Calculated correlation with mpg as a floating-point attribute.', 'duration': 27.175, 'max_score': 2521.176, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN42521176.jpg'}, {'end': 2599.666, 'src': 'embed', 'start': 2560.817, 'weight': 1, 'content': [{'end': 2564.939, 'text': 'right now, since we have changed the data type of mpg to float over here.', 'start': 2560.817, 'duration': 4.122}, {'end': 2569.561, 'text': 'so next, while finding the correlation, we are getting the mpg value.', 'start': 2564.939, 'duration': 4.622}, {'end': 2576.183, 'text': 'okay, now python is treating it as a numerical attribute and it is using it to find the correlation.', 'start': 2569.561, 'duration': 6.622}, {'end': 2579.945, 'text': 'okay, so this was all about how you can clean your data set.', 'start': 2576.183, 'duration': 3.762}, {'end': 2587.295, 'text': "okay. So in this session you'll learn about how to perform indexing on the data set, how to set a value for a column,", 'start': 2579.945, 'duration': 7.35}, {'end': 2593.501, 'text': "how to manipulate the data using a lambda function, how to sort a column and, finally, you'll see how to filter your records.", 'start': 2587.295, 'duration': 6.206}, {'end': 2599.666, 'text': "So without delaying any further, let's start with our first topic, Manipulating the Data Set: Indexing by Position.", 'start': 2594.121, 'duration': 5.545}], 'summary': 'Data set is cleaned, and indexing, value setting, manipulation, sorting, and filtering are shown in python.', 'duration': 38.849, 'max_score': 2560.817, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN42560817.jpg'}, {'end': 2819.126, 'src': 'heatmap', 'start': 2600.356, 'weight': 1, 'content': [{'end': 2602.718, 'text': 'Well, what if you want to view a single column?', 'start': 2600.356, 'duration': 2.362}, {'end': 
2603.699, 'text': 'So how will you do that?', 'start': 2602.838, 'duration': 0.861}, {'end': 2610.704, 'text': 'Well, Pandas gives you the iloc function, in which you can specify the parameters to view a single column.', 'start': 2604.039, 'duration': 6.665}, {'end': 2613.887, 'text': 'So, for example, I want to view the HP column only.', 'start': 2611.244, 'duration': 2.643}, {'end': 2621.452, 'text': 'So, my HP column is the fourth column, right? So, I mentioned in iloc a colon, which means every row.', 'start': 2614.347, 'duration': 7.105}, {'end': 2624.455, 'text': 'So, all rows comma fourth column.', 'start': 2621.753, 'duration': 2.702}, {'end': 2630.441, 'text': 'Okay, so it will give me the fourth column of the data set, which is nothing but the HP column.', 'start': 2624.995, 'duration': 5.446}, {'end': 2632.283, 'text': "Okay, let's move ahead.", 'start': 2631.082, 'duration': 1.201}, {'end': 2636.567, 'text': 'Next is: what if I want to view the first five records of the HP column?', 'start': 2632.823, 'duration': 3.744}, {'end': 2640.071, 'text': "So in it, I'll mention the number of rows.", 'start': 2637.248, 'duration': 2.823}, {'end': 2643.495, 'text': 'So iloc rows starting from zero till five.', 'start': 2640.211, 'duration': 3.284}, {'end': 2647.379, 'text': 'Okay, actually zero till four, just before five.', 'start': 2644.235, 'duration': 3.144}, {'end': 2650.282, 'text': 'and the fourth column.', 'start': 2648.28, 'duration': 2.002}, {'end': 2653.964, 'text': 'so it will give me the first five results from the column HP.', 'start': 2650.282, 'duration': 3.682}, {'end': 2657.707, 'text': 'okay, next is: what if I want to view all the rows and all the columns?', 'start': 2653.964, 'duration': 3.743}, {'end': 2661.13, 'text': 'so all you have to do is iloc colon comma colon.', 'start': 2657.707, 'duration': 3.423}, {'end': 2663.575, 'text': 'okay, the first colon means all the rows.', 'start': 2661.13, 'duration': 2.445}, {'end': 2665.337, 
'text': 'The second colon means all the columns.', 'start': 2663.935, 'duration': 1.402}, {'end': 2667.098, 'text': 'So it will give you the entire data set.', 'start': 2665.677, 'duration': 1.421}, {'end': 2668.939, 'text': "Next, let's see.", 'start': 2667.678, 'duration': 1.261}, {'end': 2674.283, 'text': 'I want to see all the columns from HP to car and all the records from index number six.', 'start': 2668.999, 'duration': 5.284}, {'end': 2680.688, 'text': 'So how will I do that? So for that I have rows starting from six till the end.', 'start': 2674.764, 'duration': 5.924}, {'end': 2684.432, 'text': 'and columns starting from the fourth till the end.', 'start': 2681.249, 'duration': 3.183}, {'end': 2689.376, 'text': 'so the fourth column till the end, and rows from the sixth row till the end.', 'start': 2684.432, 'duration': 4.944}, {'end': 2691.057, 'text': "okay, let's move ahead.", 'start': 2689.376, 'duration': 1.681}, {'end': 2694.16, 'text': 'so now we want to look at all the rows from the first column.', 'start': 2691.057, 'duration': 3.103}, {'end': 2697.723, 'text': 'so here inside iloc, i have mentioned a colon.', 'start': 2694.16, 'duration': 3.563}, {'end': 2699.524, 'text': 'so this represents the rows.', 'start': 2697.723, 'duration': 1.801}, {'end': 2702.186, 'text': 'so we need all the rows.', 'start': 2699.524, 'duration': 2.662}, {'end': 2705.129, 'text': 'so for representing all the rows, we have the colon.', 'start': 2702.186, 'duration': 2.943}, {'end': 2708.131, 'text': 'okay, and the next parameter is the column number.', 'start': 2705.129, 'duration': 3.002}, {'end': 2709.813, 'text': 'so our column number is one.', 'start': 2708.131, 'duration': 1.682}, {'end': 2715.157, 'text': 'so, as you can see in the output, we got all the rows from our first column.', 'start': 2710.593, 'duration': 4.564}, {'end': 2716.938, 'text': "okay, let's move ahead.", 'start': 2715.157, 'duration': 1.781}, {'end': 2721.202, 'text': 'well, you can even view your 
records on the basis of the name of the attribute.', 'start': 2716.938, 'duration': 4.264}, {'end': 2724.625, 'text': 'okay. so, for example, i want to see all the records of the mpg column.', 'start': 2721.202, 'duration': 3.423}, {'end': 2725.705, 'text': 'so how will i do that?', 'start': 2724.625, 'duration': 1.08}, {'end': 2729.288, 'text': "so, instead of iloc, i'll be using loc over here.", 'start': 2725.705, 'duration': 3.583}, {'end': 2730.669, 'text': 'that is, location.', 'start': 2729.288, 'duration': 1.381}, {'end': 2733.732, 'text': 'okay. then inside that, the rows and the columns.', 'start': 2730.669, 'duration': 3.063}, {'end': 2738.876, 'text': 'so the colon includes all the rows, and the column is mpg.', 'start': 2733.732, 'duration': 5.144}, {'end': 2739.797, 'text': 'so what will it give?', 'start': 2738.876, 'duration': 0.921}, {'end': 2743.399, 'text': 'it will give me all the rows from the mpg column.', 'start': 2740.217, 'duration': 3.182}, {'end': 2748.981, 'text': 'okay, next is: display the records from index number 0 to index 6 from the mpg column.', 'start': 2743.399, 'duration': 5.582}, {'end': 2750.182, 'text': 'so similarly.', 'start': 2748.981, 'duration': 1.201}, {'end': 2756.404, 'text': 'so loc starting from 0 until 6, and from which column? the mpg column.', 'start': 2750.182, 'duration': 6.222}, {'end': 2761.487, 'text': 'so it gave me records from index number 0 to index number 6 from the mpg column.', 'start': 2756.404, 'duration': 5.083}, {'end': 2767.031, 'text': 'okay, next is to see the first seven records from the mpg to qsec columns.', 'start': 2761.487, 'duration': 5.544}, {'end': 2768.573, 'text': "so here is what i'll do.", 'start': 2767.031, 'duration': 1.542}, {'end': 2772.396, 'text': 'so inside loc, i have mentioned the rows.', 'start': 2768.573, 'duration': 3.823}, {'end': 2774.618, 'text': 'so starting from zero till six.', 'start': 2772.396, 'duration': 2.222}, {'end': 2776.22, 'text': 'so how many records are over 
there?', 'start': 2774.618, 'duration': 1.602}, {'end': 2779.302, 'text': "it's like zero, one, two, three, four, five and six.", 'start': 2776.22, 'duration': 3.082}, {'end': 2782.566, 'text': 'so in total there are seven records. From which columns?', 'start': 2779.302, 'duration': 3.264}, {'end': 2787.011, 'text': 'so the columns should start from mpg and end at qsec.', 'start': 2782.566, 'duration': 4.445}, {'end': 2788.794, 'text': 'So from mpg to qsec.', 'start': 2787.011, 'duration': 1.783}, {'end': 2791.277, 'text': 'I need seven rows over here, fine.', 'start': 2788.794, 'duration': 2.483}, {'end': 2795.522, 'text': 'So this was all about how you can play with index values while manipulating your data set.', 'start': 2791.277, 'duration': 4.245}, {'end': 2798.754, 'text': 'Okay, so next is setting the value.', 'start': 2796.023, 'duration': 2.731}, {'end': 2804.598, 'text': 'well, if you want to set a specific value for an entire column, you can do that easily using pandas.', 'start': 2798.754, 'duration': 5.844}, {'end': 2806.118, 'text': 'so here, cars is our data frame.', 'start': 2804.598, 'duration': 1.52}, {'end': 2808.8, 'text': 'so what you are mentioning is cars of am.', 'start': 2806.118, 'duration': 2.682}, {'end': 2811.401, 'text': "so what is am? it's a column name.", 'start': 2808.8, 'duration': 2.601}, {'end': 2814.463, 'text': 'okay, so data frame of am equals one.', 'start': 2811.401, 'duration': 3.062}, {'end': 2815.384, 'text': 'so what will it do?', 'start': 2814.463, 'duration': 0.921}, {'end': 2819.126, 'text': 'so it will assign the value one to the entire am column.', 'start': 2815.384, 'duration': 3.742}], 'summary': 'Pandas iloc and loc functions allow for easy manipulation of data, such as viewing single columns or setting specific values in columns.', 'duration': 218.77, 'max_score': 2600.356, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN42600356.jpg'}, {'end': 
3128.321, 'src': 'embed', 'start': 3103.407, 'weight': 3, 'content': [{'end': 3109.729, 'text': 'Machine learning is a sub-branch of artificial intelligence which focuses mainly on machines learning from their experience.', 'start': 3103.407, 'duration': 6.322}, {'end': 3116.716, 'text': 'It is a subset of artificial intelligence which gives the machine the ability to learn without being explicitly programmed.', 'start': 3110.293, 'duration': 6.423}, {'end': 3119.597, 'text': 'Here the data is the key for the learning algorithm.', 'start': 3117.076, 'duration': 2.521}, {'end': 3122.659, 'text': 'The data is the most important part of the machine learning algorithms.', 'start': 3119.737, 'duration': 2.922}, {'end': 3126.06, 'text': 'The machine trains itself according to the data provided to it.', 'start': 3123.059, 'duration': 3.001}, {'end': 3128.321, 'text': 'Let me explain this to you with an analogy.', 'start': 3126.521, 'duration': 1.8}], 'summary': 'Machine learning is a subset of ai, focusing on learning from experience and training itself based on data.', 'duration': 24.914, 'max_score': 3103.407, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN43103407.jpg'}, {'end': 3550.109, 'src': 'embed', 'start': 3517.662, 'weight': 4, 'content': [{'end': 3519.063, 'text': 'So this is supervised learning.', 'start': 3517.662, 'duration': 1.401}, {'end': 3522.485, 'text': "So let's see some of the use cases of supervised learning.", 'start': 3519.963, 'duration': 2.522}, {'end': 3524.386, 'text': 'So first we have a spam classifier.', 'start': 3522.625, 'duration': 1.761}, {'end': 3529.37, 'text': 'How do you think your mail is getting classified as spam or not spam?', 'start': 3524.747, 'duration': 4.623}, {'end': 3532.713, 'text': 'Well, spam detection basically works on the basis of filters.', 'start': 3529.711, 'duration': 3.002}, {'end': 3536.578, 'text': 'These are settings that are 
constantly updated based on new technologies,', 'start': 3533.035, 'duration': 3.543}, {'end': 3541.322, 'text': 'new spam identification and the feedback given by Gmail user about potential spammers.', 'start': 3536.578, 'duration': 4.744}, {'end': 3550.109, 'text': 'Spam filter uses either text filter or it eliminates the thread based on sender and their history, whether the sender was previously reported or not.', 'start': 3542.042, 'duration': 8.067}], 'summary': 'Supervised learning use cases include spam classification based on filters and user feedback.', 'duration': 32.447, 'max_score': 3517.662, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN43517662.jpg'}], 'start': 2460.809, 'title': 'Data manipulation and machine learning', 'summary': 'Covers data cleaning, correlation analysis, data set indexing, value setting, sorting, and filtering. it also introduces machine learning, distinguishing it from ai and deep learning, and discusses the impact of robots and machine learning on the job market, highlighting examples of supervised and unsupervised learning applications.', 'chapters': [{'end': 2579.945, 'start': 2460.809, 'title': 'Data cleaning and correlation analysis', 'summary': "Explains the process of converting data types for manipulation, handling null values, and finding correlations, with a focus on the conversion of the 'mpg' attribute from string to float and its impact on correlation analysis.", 'duration': 119.136, 'highlights': ["The process of converting the 'mpg' attribute from string to float is explained, enabling data manipulation and calculation on the attribute, which is essential for further analysis.", "The removal of null values from the 'qsec' attribute by replacing them with the mean is highlighted, resulting in the elimination of 32 null values from the dataset.", "The significance of converting the 'mpg' attribute from string to float is emphasized, as it enables the 
inclusion of 'mpg' in correlation analysis, enhancing the overall analysis of the dataset.", "The importance of numerical data type for calculating correlation is underscored, with the demonstration of how the change in data type of 'mpg' attribute enables its consideration in correlation analysis, unlike when it was of string data type."]}, {'end': 3081.626, 'start': 2579.945, 'title': 'Data set indexing and manipulation, value setting, sorting, and filtering', 'summary': 'Covers manipulating data set indexing using iloc and loc functions, setting values for a column, applying lambda function to double records, sorting data in ascending and descending order, and filtering records based on conditions. the chapter also discusses the applications of machine learning such as product recommendation, amazon alexa, movie recommendation by netflix, and google traffic prediction.', 'duration': 501.681, 'highlights': ['The chapter covers manipulating data set indexing using iloc and loc functions, setting values for a column, applying lambda function to double records, sorting data in ascending and descending order, and filtering records based on conditions.', 'Amazon is using the technique of machine learning to build a recommendation engine which recommends product to you, and it is one of the application of machine learning used in Amazon.', 'The chapter discusses the applications of machine learning such as product recommendation, Amazon Alexa, movie recommendation by Netflix, and Google traffic prediction.', 'Google traffic prediction is an application of machine learning which predicts the traffic on a route when using Google maps for navigation.']}, {'end': 3394.851, 'start': 3082.182, 'title': 'Introduction to machine learning', 'summary': 'Introduces machine learning as a sub branch of artificial intelligence, emphasizing its ability to learn from experience and the process of training a machine with data, including the use of training and test data sets, and the 
importance of providing more data for accuracy. it also distinguishes between artificial intelligence, machine learning, and deep learning, highlighting their differences and relationships.', 'duration': 312.669, 'highlights': ['Machine learning is a subset of artificial intelligence which focuses mainly on machine learning from their experience. Emphasizes the focus of machine learning on learning from experience.', 'The machine trains itself according to the data provided to it. Explains the process of the machine training itself based on provided data.', 'The importance of providing more data for accuracy, as more data provided leads to a more accurate machine. Emphasizes the correlation between data quantity and machine accuracy.', 'Distinguishing between artificial intelligence, machine learning, and deep learning, highlighting their differences and relationships. Clear distinction and relationship between artificial intelligence, machine learning, and deep learning.', 'The process of training a machine with data, including the use of training and test data sets. 
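The training/test split described above can be sketched with plain NumPy (scikit-learn's `train_test_split` does the same job; the 25% holdout fraction and toy arrays here are arbitrary choices, not values from the video):

```python
import numpy as np

# Hold out 25% of the rows so the model can be scored on data it never trained on.
rng = np.random.default_rng(seed=0)

X = np.arange(100).reshape(50, 2)  # 50 toy samples with 2 features
y = np.arange(50)                  # toy targets

indices = rng.permutation(len(X))  # shuffle so the split is random
test_size = len(X) // 4            # 25% held out
test_idx, train_idx = indices[:test_size], indices[test_size:]

X_train, X_test = X[train_idx], X[test_idx]
y_train, y_test = y[train_idx], y[test_idx]

print(len(X_train), len(X_test))  # 38 12
```

More training data generally improves accuracy, which is the point the summary makes about providing more data to the machine.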
Describes the process of training a machine with data, utilizing training and test data sets.']}, {'end': 3731.663, 'start': 3395.371, 'title': 'Impact of robots on job market', 'summary': 'Explains that robots and machine learning algorithms will create more job opportunities rather than taking away jobs, with examples of how supervised learning is used in spam detection and fingerprint analysis, and unsupervised learning in clustering images.', 'duration': 336.292, 'highlights': ['Supervised learning in spam detection involves text filters and client filters to identify and block spam emails, with modern algorithms capable of detecting misspellings and character substitutions.', "Supervised learning is also applied in fingerprint analysis, where the machine verifies a person's identity by comparing the fingerprint data stored in its system.", 'Unsupervised learning involves the machine identifying clusters of similar images without the need for labeled data, demonstrating its ability to group images based on similarities.', 'The chapter emphasizes that robots and machine learning algorithms will open up numerous job opportunities, as showcased by the example of Uber creating jobs through its use of machine learning algorithms.']}], 'duration': 1270.854, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN42460809.jpg', 'highlights': ["The removal of null values from the 'qsec' attribute by replacing them with the mean is highlighted, resulting in the elimination of 32 null values from the dataset.", "The process of converting the 'mpg' attribute from string to float is explained, enabling data manipulation and calculation on the attribute, which is essential for further analysis.", 'The chapter covers manipulating data set indexing using iloc and loc functions, setting values for a column, applying lambda function to double records, sorting data in ascending and descending order, and filtering records based on 
conditions.', 'Machine learning is a subset of artificial intelligence which focuses mainly on machine learning from their experience. Emphasizes the focus of machine learning on learning from experience.', 'Supervised learning in spam detection involves text filters and client filters to identify and block spam emails, with modern algorithms capable of detecting misspellings and character substitutions.']}, {'end': 4400.465, 'segs': [{'end': 3926.399, 'src': 'embed', 'start': 3882.563, 'weight': 0, 'content': [{'end': 3886.687, 'text': 'So it means that you can use Alexa to control things that are not even on the list.', 'start': 3882.563, 'duration': 4.124}, {'end': 3891.396, 'text': 'OK Well however Alexa needs internet connectivity and AVS to work.', 'start': 3887.087, 'duration': 4.309}, {'end': 3898.064, 'text': "If your internet connection is slow or it's not working then Alexa won't be available and your Echo will be useless.", 'start': 3891.757, 'duration': 6.307}, {'end': 3904.652, 'text': "So if Amazon decides to charge for a service or just close it down then in that case you'll be just left with a useless device.", 'start': 3898.525, 'duration': 6.127}, {'end': 3907.209, 'text': 'Well, Amazon is not the only one in town.', 'start': 3905.148, 'duration': 2.061}, {'end': 3915.513, 'text': 'We have Google, Apple and Microsoft, which offers similar services that can perform tasks by voice command in the form of Ok, Google,', 'start': 3907.609, 'duration': 7.904}, {'end': 3916.914, 'text': 'Siri and Cortana.', 'start': 3915.513, 'duration': 1.401}, {'end': 3921.296, 'text': 'They use similar approach voice commands that are processed in a cloud service,', 'start': 3917.254, 'duration': 4.042}, {'end': 3926.399, 'text': 'but most of them are not as flexible or as integrated with other service as Alexa is.', 'start': 3921.296, 'duration': 5.103}], 'summary': 'Alexa requires internet and avs to function; other competitors offer similar voice command services 
but are not as flexible or integrated.', 'duration': 43.836, 'max_score': 3882.563, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN43882563.jpg'}, {'end': 3976.803, 'src': 'embed', 'start': 3947.76, 'weight': 3, 'content': [{'end': 3952.943, 'text': 'So another use case of unsupervised learning is Netflix recommendation.', 'start': 3947.76, 'duration': 5.183}, {'end': 3959.368, 'text': "So more than 80% of the TV shows that you guys watch on Netflix are discovered through the platform's recommendation system.", 'start': 3953.344, 'duration': 6.024}, {'end': 3966.668, 'text': 'That means the majority of what you decide to watch on Netflix is the result of decisions made by a mysterious black box of an algorithm.', 'start': 3959.687, 'duration': 6.981}, {'end': 3969.3, 'text': 'Let me just give you a brief of how it works.', 'start': 3967.179, 'duration': 2.121}, {'end': 3976.803, 'text': 'Well, Netflix uses machine learning algorithm to recommend you the list of movies and find shows that you might not have initially chosen.', 'start': 3969.8, 'duration': 7.003}], 'summary': "Netflix's recommendation system drives over 80% of tv show discovery.", 'duration': 29.043, 'max_score': 3947.76, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN43947760.jpg'}, {'end': 4076.313, 'src': 'embed', 'start': 4047.21, 'weight': 4, 'content': [{'end': 4054.474, 'text': 'the data is generated from dozens of in-house and freelance staff watch every minute or every show on Netflix and tag it.', 'start': 4047.21, 'duration': 7.264}, {'end': 4065.804, 'text': "Now all these tags and the user behavior data are taken and is fed to a very sophisticated machine learning algorithm that figures out what's most important and what should it weigh like,", 'start': 4054.935, 'duration': 10.869}, {'end': 4069.607, 'text': 'or how much should it matter if a consumer has watched something 
yesterday.', 'start': 4065.804, 'duration': 3.803}, {'end': 4076.313, 'text': 'Should that count twice as much or 10 times as much compared to what they have watched a whole year ago.', 'start': 4069.987, 'duration': 6.326}], 'summary': 'In-house and freelance staff tag netflix content, fed into sophisticated ml algorithm for personalized recommendations.', 'duration': 29.103, 'max_score': 4047.21, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN44047210.jpg'}, {'end': 4329.113, 'src': 'embed', 'start': 4296.277, 'weight': 5, 'content': [{'end': 4298.779, 'text': 'So this is where self-driving cars comes into picture.', 'start': 4296.277, 'duration': 2.502}, {'end': 4303.322, 'text': 'The autonomous car or the self-driving cars are much safer than human driven car.', 'start': 4299.179, 'duration': 4.143}, {'end': 4307.225, 'text': 'They are unaffected by factors like driver fatigue, emotions or illness.', 'start': 4303.542, 'duration': 3.683}, {'end': 4308.826, 'text': 'This makes them very safe.', 'start': 4307.665, 'duration': 1.161}, {'end': 4315.19, 'text': 'Self-driving cars are always active and attentive, observing the environments and scanning multiple directions.', 'start': 4309.326, 'duration': 5.864}, {'end': 4318.863, 'text': 'it would be difficult to make a move that car has not anticipated.', 'start': 4315.56, 'duration': 3.303}, {'end': 4329.113, 'text': 'So how does a self-driving car work?. 
Well, self-driving cars mainly rely on three technologies IoT sensors, IoT connectivity and software algorithm.', 'start': 4319.484, 'duration': 9.629}], 'summary': 'Self-driving cars are safer, unaffected by fatigue or emotions, relying on iot sensors, connectivity, and software algorithm.', 'duration': 32.836, 'max_score': 4296.277, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN44296277.jpg'}], 'start': 3731.663, 'title': 'Unsupervised learning: amazon alexa and netflix recommendation system & reinforcement learning', 'summary': "Explores amazon alexa's unsupervised learning functionalities and potential challenges, as well as netflix's machine learning-based recommendation system which influences 80% of tv show selections and the application of reinforcement learning in reducing road accidents by over 90%.", 'chapters': [{'end': 3947.74, 'start': 3731.663, 'title': 'Unsupervised learning: amazon alexa', 'summary': "Explores the functionality of amazon alexa, an unsupervised learning-based voice assistant, capable of performing various tasks, connecting with other technologies, and integrating with online services, showcasing amazon's strategy to expand its usage and potential challenges in case of internet connectivity issues or service changes.", 'duration': 216.077, 'highlights': ['Amazon Alexa, an unsupervised learning-based voice assistant, can perform various tasks, connect with other technologies, and integrate with online services. Amazon Alexa serves as a voice assistant, capable of performing tasks, connecting with other technologies, and integrating with online services.', "Amazon's strategy is to expand the usage of Alexa and potential challenges in case of internet connectivity issues or service changes. 
Amazon aims to expand the usage of Alexa and may face potential challenges in case of internet connectivity issues or service changes.", 'Comparison with other voice assistant services such as Google, Apple, and Microsoft, where Alexa is highlighted for its flexibility and integration with other services. Alexa is compared with other voice assistant services, highlighting its flexibility and integration with other services compared to Google, Apple, and Microsoft.']}, {'end': 4400.465, 'start': 3947.76, 'title': 'Netflix recommendation system & reinforcement learning', 'summary': 'Discusses how netflix uses machine learning to recommend 80% of tv shows, the three-legged working model of netflix, and how reinforcement learning is applied to train a self-driving car, emphasizing its potential to reduce road accidents by over 90%.', 'duration': 452.705, 'highlights': ["Netflix uses machine learning to recommend 80% of TV shows watched on the platform More than 80% of the TV shows watched on Netflix are discovered through the platform's recommendation system.", "Netflix's three-legged working model involves Netflix members, content taggers, and machine learning algorithms Netflix's working model involves three legs: Netflix members, content taggers, and machine learning algorithms, combining data from over 250 million active profiles to understand user behavior and content.", 'Reinforcement learning has the potential to reduce road accidents by over 90% A recent study has shown that over 90% of road accidents are caused by human error, emphasizing the potential of self-driving cars, unaffected by factors like driver fatigue, emotions, or illness, to reduce accidents.']}], 'duration': 668.802, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN43731663.jpg', 'highlights': ['Amazon Alexa serves as a voice assistant, capable of performing tasks, connecting with other technologies, and integrating with online services.', 
'Amazon aims to expand the usage of Alexa and may face potential challenges in case of internet connectivity issues or service changes.', 'Alexa is compared with other voice assistant services, highlighting its flexibility and integration with other services compared to Google, Apple, and Microsoft.', "More than 80% of the TV shows watched on Netflix are discovered through the platform's recommendation system.", "Netflix's working model involves three legs: Netflix members, content taggers, and machine learning algorithms, combining data from over 250 million active profiles to understand user behavior and content.", 'A recent study has shown that over 90% of road accidents are caused by human error, emphasizing the potential of self-driving cars, unaffected by factors like driver fatigue, emotions, or illness, to reduce accidents.']}, {'end': 5330.1, 'segs': [{'end': 4568.99, 'src': 'embed', 'start': 4524.272, 'weight': 1, 'content': [{'end': 4528.834, 'text': 'okay, so the entire thing is based on this concept only y equals mx plus c.', 'start': 4524.272, 'duration': 4.562}, {'end': 4530.415, 'text': 'it is as simple as that.', 'start': 4528.834, 'duration': 1.581}, {'end': 4536.778, 'text': 'so we will have some kind of a equation which will be in the form of y equals mx plus c.', 'start': 4530.415, 'duration': 6.363}, {'end': 4539.72, 'text': 'we will predict the values of the data points.', 'start': 4536.778, 'duration': 2.942}, {'end': 4547.759, 'text': 'uh, that is, that is, that is being sent to our data set when it is deployed, when it is deployed, and then we will get the values of y, okay.', 'start': 4539.72, 'duration': 8.039}, {'end': 4551.421, 'text': "so that's what is the whole concept of machine learning, linear regression.", 'start': 4547.759, 'duration': 3.662}, {'end': 4555.243, 'text': 'so linear regression is a technique to find relationship between two or more variables.', 'start': 4551.421, 'duration': 3.822}, {'end': 4560.845, 'text': 'yes, 
right, so if we have more than two variables, then how the equation will look like it will be.', 'start': 4555.243, 'duration': 5.602}, {'end': 4562.946, 'text': "let's say, y depends on more than one, right.", 'start': 4560.845, 'duration': 2.101}, {'end': 4568.99, 'text': 'so it will be mx plus ny, sorry nz, something like that.', 'start': 4562.946, 'duration': 6.044}], 'summary': 'Introduction to linear regression for predicting data points and finding relationships between variables.', 'duration': 44.718, 'max_score': 4524.272, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN44524272.jpg'}, {'end': 4785.343, 'src': 'embed', 'start': 4754.079, 'weight': 2, 'content': [{'end': 4759.541, 'text': 'So this is the regression is a technique that displays the relationship between Y and X, as you already know.', 'start': 4754.079, 'duration': 5.462}, {'end': 4763.102, 'text': 'So if temperature drops down, the guys will start putting more jackets.', 'start': 4759.961, 'duration': 3.141}, {'end': 4768.824, 'text': 'I mean, more jackets to avoid the cold, right? To keep them warm.', 'start': 4763.542, 'duration': 5.282}, {'end': 4771.544, 'text': "So that's a straightforward relation between two variables.", 'start': 4768.964, 'duration': 2.58}, {'end': 4773.125, 'text': 'This is an example of linear regression.', 'start': 4771.604, 'duration': 1.521}, {'end': 4776.566, 'text': 'Okay Temperature versus number of cones sold at the ice cream store.', 'start': 4773.405, 'duration': 3.161}, {'end': 4778.086, 'text': 'If it gets hot outside.', 'start': 4776.786, 'duration': 1.3}, {'end': 4785.343, 'text': 'then the number of ice cream sold will be more right inches of rain versus new car sold.', 'start': 4780.157, 'duration': 5.186}], 'summary': 'Regression shows relation between variables, e.g. 
more ice cream sold in hot weather.', 'duration': 31.264, 'max_score': 4754.079, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN44754079.jpg'}, {'end': 4975.723, 'src': 'embed', 'start': 4947.349, 'weight': 4, 'content': [{'end': 4952.393, 'text': 'Okay These are a few types or a few factors which will help us deciding that.', 'start': 4947.349, 'duration': 5.044}, {'end': 4957.476, 'text': 'Okay So see here, first of all, it will be continuous lead for linear regression.', 'start': 4952.734, 'duration': 4.742}, {'end': 4959.036, 'text': 'The variables has to be continuous.', 'start': 4957.496, 'duration': 1.54}, {'end': 4965.319, 'text': "That means it shouldn't be categorical or skewed at any two points, right? So it will solve regression issues.", 'start': 4959.517, 'duration': 5.802}, {'end': 4967.26, 'text': 'That means it should be a good fit.', 'start': 4965.359, 'duration': 1.901}, {'end': 4970.821, 'text': 'Okay, good fit to a point to some data points.', 'start': 4967.56, 'duration': 3.261}, {'end': 4975.723, 'text': 'Okay, and it should be some somewhat it should be represented by a straight line curve.', 'start': 4971.181, 'duration': 4.542}], 'summary': 'Factors for linear regression: continuous variables, good fit, represented by straight line curve.', 'duration': 28.374, 'max_score': 4947.349, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN44947349.jpg'}, {'end': 5079.056, 'src': 'embed', 'start': 5054.277, 'weight': 0, 'content': [{'end': 5061.243, 'text': 'if you build a model of linear I mean housing price predictor there has to be some kind of relation between all the variables.', 'start': 5054.277, 'duration': 6.966}, {'end': 5065.165, 'text': 'right, but for classification it has to meet some criteria.', 'start': 5061.243, 'duration': 3.922}, {'end': 5069.048, 'text': "there won't be any relation between those criterias 
individually.", 'start': 5065.165, 'duration': 3.883}, {'end': 5070.61, 'text': 'but there should be a crowd.', 'start': 5069.048, 'duration': 1.562}, {'end': 5074.533, 'text': 'I mean there should be a skewed value and the variables will be categorical.', 'start': 5070.61, 'duration': 3.923}, {'end': 5079.056, 'text': 'okay, when I say variables, so can anyone tell what is a variable?', 'start': 5074.533, 'duration': 4.523}], 'summary': 'Discussion on building a linear housing price predictor and criteria for classification.', 'duration': 24.779, 'max_score': 5054.277, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN45054277.jpg'}, {'end': 5138.4, 'src': 'embed', 'start': 5115.747, 'weight': 3, 'content': [{'end': 5123.469, 'text': 'right when we say continuous and categorical, that is, that is the dependent variables, that is the y values, that is the outcome values.', 'start': 5115.747, 'duration': 7.722}, {'end': 5126.55, 'text': 'right, because independent variables can be anywhere.', 'start': 5123.469, 'duration': 3.081}, {'end': 5128.747, 'text': 'It can be anywhere in the plane.', 'start': 5127.01, 'duration': 1.737}, {'end': 5135.378, 'text': "Correct? 
So that's why when we consider fitting a model, we shouldn't be looking at the feature first.", 'start': 5129.009, 'duration': 6.369}, {'end': 5138.4, 'text': 'We should look at what kind of output we are expecting.', 'start': 5135.459, 'duration': 2.941}], 'summary': 'Consider dependent variable type when fitting a model.', 'duration': 22.653, 'max_score': 5115.747, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN45115747.jpg'}], 'start': 4400.985, 'title': 'Understanding linear regression and classification', 'summary': "Explains the concept of linear regression and its application in machine learning, emphasizing the equation y=mx+c and the use of multiple variables, while also providing an overview of r square's significance in model performance evaluation and the distinction between continuous and categorical variables in predicting outcomes.", 'chapters': [{'end': 4614.296, 'start': 4400.985, 'title': 'Understanding linear regression', 'summary': 'Explains the concept of linear regression, emphasizing the equation y=mx+c and its application in machine learning to find the relationship between variables, with a mention of using multiple variables in the equation.', 'duration': 213.311, 'highlights': ['Linear regression is a technique to find relationship between two or more variables. Linear regression is used to find the relationship between variables, and it can be extended to involve more than two variables in the equation.', 'The equation y=mx+c is fundamental in machine learning for predicting values of data points. The equation y=mx+c is used to predict the values of data points in machine learning when the model is deployed.', 'The relationship between variables can be expressed with an equation involving multiple independent variables. 
The equation for the relationship between variables can involve multiple independent variables, extending beyond the basic y=mx+c form.']}, {'end': 5039.842, 'start': 4614.336, 'title': 'Understanding linear regression', 'summary': 'Provides an overview of linear regression, including the concept of r square, its significance in model performance evaluation, and examples illustrating its application in different scenarios.', 'duration': 425.506, 'highlights': ['Linear regression explained with examples such as temperature versus number of ice creams sold, inches of rain versus new car sold, and daily snowfall versus number of visitors at the skiing park. The examples include straightforward relations between two variables, demonstrating the application of linear regression in real-world scenarios.', 'Explanation of the concept of R square as a measure of goodness of fit, with the example of a model having a small R square value of 0.06 indicating a poor representation of the data. The discussion emphasizes the significance of R square in assessing model performance, with the specific example of a low R square value indicating a lack of fit in the linear regression model.', 'Criteria for using linear regression, including the requirement for continuous variables, even spread of data, and a straight line curve representation, and differentiation from logistic regression. 
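The straight-line fit described in these segments, y = mx + c, can be sketched in a few lines of NumPy; the temperature-versus-cones numbers below are invented purely for illustration:

```python
import numpy as np

# Toy data: temperature (x) vs. ice cream cones sold (y) -- invented values
x = np.array([20.0, 24.0, 28.0, 32.0, 36.0])
y = np.array([30.0, 44.0, 62.0, 75.0, 91.0])

# np.polyfit with degree 1 returns the slope m and intercept c of y = mx + c
m, c = np.polyfit(x, y, 1)

# Once "deployed", the model predicts y for new x values
y_pred = m * np.array([25.0, 30.0]) + c
print(m, c, y_pred)
```

Here the fitted slope is positive, matching the intuition in the transcript that hotter days mean more cones sold.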
The criteria for utilizing linear regression are detailed, highlighting the need for continuous variables, even data spread, and a straight line curve representation, and distinguishing it from logistic regression.']}, {'end': 5330.1, 'start': 5039.842, 'title': 'Understanding linear regression and classification', 'summary': 'Discusses the concepts of linear regression and classification, emphasizing the distinction between continuous and categorical variables and their significance in predicting outcomes.', 'duration': 290.258, 'highlights': ['Linear regression involves establishing relationships between variables to predict continuous outcomes such as housing prices, while classification deals with categorical outcomes like spam or not spam, with emphasis on understanding the difference between independent and dependent variables.', 'The distinction between continuous and categorical variables is crucial, where continuous variables are independent and can take on any value, while categorical variables are dependent and limited to specific categories, as illustrated by examples of housing price and survival data sets.', 'The importance of features in building models is highlighted, with the example of predicting housing prices using realistic features like area, garden space, and number of amenities, and the relevance of categorical variables in predicting outcomes such as cancer growth.']}], 'duration': 929.115, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN44400985.jpg', 'highlights': ['Linear regression involves establishing relationships between variables to predict continuous outcomes such as housing prices, while classification deals with categorical outcomes like spam or not spam, with emphasis on understanding the difference between independent and dependent variables.', 'The equation y=mx+c is fundamental in machine learning for predicting values of data points. 
The equation y=mx+c is used to predict the values of data points in machine learning when the model is deployed.', 'Linear regression explained with examples such as temperature versus number of ice creams sold, inches of rain versus new car sold, and daily snowfall versus number of visitors at the skiing park. The examples include straightforward relations between two variables, demonstrating the application of linear regression in real-world scenarios.', 'The distinction between continuous and categorical variables is crucial, where continuous variables are independent and can take on any value, while categorical variables are dependent and limited to specific categories, as illustrated by examples of housing price and survival data sets.', 'Criteria for using linear regression, including the requirement for continuous variables, even spread of data, and a straight line curve representation, and differentiation from logistic regression. The criteria for utilizing linear regression are detailed, highlighting the need for continuous variables, even data spread, and a straight line curve representation, and distinguishing it from logistic regression.', 'The relationship between variables can be expressed with an equation involving multiple independent variables. 
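For the multivariate case just mentioned, where y depends on several independent variables, the same idea extends to y = m1*x1 + m2*x2 + c. One minimal sketch solves it with ordinary least squares; the feature values and prices below are made up:

```python
import numpy as np

# Two independent variables (e.g. area in sq ft, number of amenities) -- invented
X = np.array([[1000.0, 2.0],
              [1500.0, 3.0],
              [2000.0, 3.0],
              [2500.0, 4.0],
              [3000.0, 5.0]])
y = np.array([150.0, 210.0, 250.0, 320.0, 380.0])  # price in thousands

# Append a column of ones so the intercept c is estimated alongside the slopes
A = np.column_stack([X, np.ones(len(X))])
coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
m1, m2, c = coeffs
y_hat = A @ coeffs  # predicted values for the training points
print(m1, m2, c)
```

Because the design matrix includes the ones column, the residuals average to zero, so the fitted plane passes through the mean of the data.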
The equation for the relationship between variables can involve multiple independent variables, extending beyond the basic y=mx+c form.']}, {'end': 7220.276, 'segs': [{'end': 5384.137, 'src': 'embed', 'start': 5353.161, 'weight': 0, 'content': [{'end': 5355.764, 'text': 'okay, as you are rightly told, it will be fixed.', 'start': 5353.161, 'duration': 2.603}, {'end': 5359.548, 'text': 'but for continuous, it will be, it will vary, it will be in a range, okay.', 'start': 5355.764, 'duration': 3.784}, {'end': 5364.314, 'text': "so that's how we decide if the variable data set that we are having is categorical or continuous.", 'start': 5359.548, 'duration': 4.766}, {'end': 5371.727, 'text': 'okay, so the based on that we decide, is it a regression problem or is it a classification problem?', 'start': 5364.901, 'duration': 6.826}, {'end': 5374.329, 'text': 'and based on that, we are going to choose our models.', 'start': 5371.727, 'duration': 2.602}, {'end': 5379.353, 'text': "if it's a regression, then we are going to go with this linear regression.", 'start': 5374.329, 'duration': 5.024}, {'end': 5384.137, 'text': 'otherwise we are going to go with logistic regression, naive bias or xyz, xyz, xyz.', 'start': 5379.353, 'duration': 4.784}], 'summary': 'Data type determines regression or classification; models chosen accordingly.', 'duration': 30.976, 'max_score': 5353.161, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN45353161.jpg'}, {'end': 5815.384, 'src': 'embed', 'start': 5768.059, 'weight': 1, 'content': [{'end': 5777.063, 'text': 'for statisticians named it as r squared, or it is also known as least.', 'start': 5768.059, 'duration': 9.004}, {'end': 5788.147, 'text': 'this is also known as least squared error error method.', 'start': 5777.063, 'duration': 11.084}, {'end': 5793.195, 'text': 'okay, so This up here denotes goodness of fit.', 'start': 5788.147, 'duration': 5.048}, {'end': 5796.036, 'text': 
'That means how good our model is.', 'start': 5793.495, 'duration': 2.541}, {'end': 5798.838, 'text': 'Now see what happens to this.', 'start': 5796.557, 'duration': 2.281}, {'end': 5804.02, 'text': 'Okay After this I will pause for questions.', 'start': 5798.858, 'duration': 5.162}, {'end': 5806.461, 'text': 'So the value that we have right.', 'start': 5804.76, 'duration': 1.701}, {'end': 5815.384, 'text': 'So the value that we have is somewhere like 1 minus SE line.', 'start': 5806.881, 'duration': 8.503}], 'summary': 'R-squared, also known as the coefficient of determination, measures the goodness of fit of a statistical model.', 'duration': 47.325, 'max_score': 5768.059, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN45768059.jpg'}, {'end': 6621.919, 'src': 'embed', 'start': 6590.619, 'weight': 2, 'content': [{'end': 6592.26, 'text': "Okay That's what this works.", 'start': 6590.619, 'duration': 1.641}, {'end': 6594.061, 'text': "That's what this linear equation works.", 'start': 6592.5, 'duration': 1.561}, {'end': 6597.344, 'text': 'And for that to change, you need to change the M and C values.', 'start': 6594.121, 'duration': 3.223}, {'end': 6599.025, 'text': 'The hyper parameters will get changed.', 'start': 6597.364, 'duration': 1.661}, {'end': 6602.33, 'text': 'Then you apply the goodness of fit.', 'start': 6599.485, 'duration': 2.845}, {'end': 6603.09, 'text': 'you will see.', 'start': 6602.33, 'duration': 0.76}, {'end': 6607.432, 'text': 'if the value is coming near equals one, then it would be a good fit.', 'start': 6603.09, 'duration': 4.342}, {'end': 6609.413, 'text': "if it doesn't, then it wouldn't be a good fit.", 'start': 6607.432, 'duration': 1.981}, {'end': 6611.634, 'text': 'you would need to go for another iteration.', 'start': 6609.413, 'duration': 2.221}, {'end': 6613.955, 'text': "okay, so that's what it got and that's what.", 'start': 6611.634, 'duration': 2.321}, {'end': 6615.596, 
'text': "that's how it goes in.", 'start': 6613.955, 'duration': 1.641}, {'end': 6618.798, 'text': "okay, that's what the concept of this is, okay.", 'start': 6615.596, 'duration': 3.202}, {'end': 6621.919, 'text': 'so is this clear now, guys?', 'start': 6618.798, 'duration': 3.121}], 'summary': 'Changing m and c values impact linear equation fit; aim for goodness of fit near 1.', 'duration': 31.3, 'max_score': 6590.619, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN46590619.jpg'}, {'end': 6768.332, 'src': 'embed', 'start': 6727.489, 'weight': 3, 'content': [{'end': 6729.81, 'text': 'okay, and hypothesis will be.', 'start': 6727.489, 'duration': 2.321}, {'end': 6733.772, 'text': 'hypothesis will be h, theta x.', 'start': 6729.81, 'duration': 3.962}, {'end': 6741.704, 'text': 'okay, and when we write the equation for linear regression, it should be theta 0 plus theta 1 x.', 'start': 6733.772, 'duration': 7.932}, {'end': 6746.465, 'text': 'this theta 1 is called is equivalent to m.', 'start': 6741.704, 'duration': 4.761}, {'end': 6749.306, 'text': 'this theta 0 is equivalent to c.', 'start': 6746.465, 'duration': 2.841}, {'end': 6754.628, 'text': 'together, this theta 0 and theta 1 are called hyper parameters.', 'start': 6749.306, 'duration': 5.322}, {'end': 6768.332, 'text': 'with trial and error, we can change these values of theta 0 and theta 1 to get a good h theta, that is This h theta value will be the predicted value.', 'start': 6754.628, 'duration': 13.704}], 'summary': 'Linear regression equation: h(theta) = theta0 + theta1x. theta1 is equivalent to m, theta0 to c. 
hyper parameters can be changed via trial and error to get a good predicted value.', 'duration': 40.843, 'max_score': 6727.489, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN46727489.jpg'}, {'end': 6946.082, 'src': 'embed', 'start': 6916.431, 'weight': 4, 'content': [{'end': 6920.691, 'text': 'so this theta vector is also called weight weightage vector.', 'start': 6916.431, 'duration': 4.26}, {'end': 6925.312, 'text': 'okay, what do we mean by weightage vector?', 'start': 6920.691, 'duration': 4.621}, {'end': 6929.173, 'text': 'think about survival data set.', 'start': 6925.312, 'duration': 3.861}, {'end': 6933.214, 'text': 'okay, think about house price, house market.', 'start': 6929.173, 'duration': 4.041}, {'end': 6936.295, 'text': 'i mean, uh, what should we think about?', 'start': 6933.214, 'duration': 3.081}, {'end': 6937.915, 'text': 'what should we think about?', 'start': 6936.295, 'duration': 1.62}, {'end': 6942.46, 'text': "let's say, let's say, I mean housing price calculator.", 'start': 6937.915, 'duration': 4.545}, {'end': 6946.082, 'text': "let's go with the classic example on housing price.", 'start': 6942.46, 'duration': 3.622}], 'summary': 'Exploring the concept of weightage vector in relation to survival data and housing prices.', 'duration': 29.651, 'max_score': 6916.431, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN46916431.jpg'}], 'start': 5330.1, 'title': 'Linear regression concepts', 'summary': 'Covers distinguishing between categorical and continuous data, linear regression, r-squared calculation, goodness of fit, model tuning, and its application in machine learning, with an emphasis on feature weighting and real-world significance.', 'chapters': [{'end': 5390.682, 'start': 5330.1, 'title': 'Distinguishing categorical and continuous data', 'summary': 'Explains how to differentiate between categorical and continuous data, 
emphasizing the impact on model selection and identifying regression and classification problems, with a focus on linear and logistic regression.', 'duration': 60.582, 'highlights': ['The distinction between categorical and continuous data is crucial for determining regression or classification problems, influencing model selection. (Relevance: 5)', 'For categorical data, the options may include yes, no, or moderate, whereas for continuous data, the values vary within a range. (Relevance: 4)', 'Model selection is influenced by the nature of the data, with linear regression chosen for regression problems and logistic regression, naive Bayes, or other models for classification problems. (Relevance: 3)']}, {'end': 5913.785, 'start': 5391.244, 'title': 'Linear regression and r-squared', 'summary': 'Discusses the concept of linear regression, including the calculation of the r-squared value to determine the goodness of fit, by comparing the squared errors of the regression line and the mean of the dataset, with smaller values indicating a better fit and approaching an r-squared value of 1, while larger values indicate a poorer fit approaching an r-squared value of 0.', 'duration': 522.541, 'highlights': ['The concept of linear regression and the calculation of R-squared to determine the goodness of fit is explained. The chapter provides an explanation of linear regression and the calculation of R-squared to assess the goodness of fit of the model.', 'The method of calculating the squared error from the regression line and the mean of the dataset is detailed. The process of calculating the squared error from the regression line and the mean of the dataset is elaborated, demonstrating the comparison of the errors to evaluate the fit of the model.', "The significance of smaller squared errors indicating a better fit and approaching an R-squared value of 1, while larger errors indicate a poorer fit approaching an R-squared value of 0 is highlighted. 
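The R-squared recipe summarized here (one minus the squared error of the regression line divided by the squared error of the mean) can be written out directly; the sample arrays are invented:

```python
import numpy as np

def r_squared(y, y_pred):
    # SE_line: squared error of the regression line's predictions
    se_line = np.sum((y - y_pred) ** 2)
    # SE_mean: squared error of simply predicting the mean of y everywhere
    se_mean = np.sum((y - y.mean()) ** 2)
    # R^2 = 1 - SE_line / SE_mean; near 1 => good fit, near 0 => poor fit
    return 1 - se_line / se_mean

y = np.array([2.0, 4.0, 6.0, 8.0])
good = np.array([2.1, 3.9, 6.2, 7.8])   # close to y -> R^2 near 1
poor = np.full(4, y.mean())             # mean-only model -> R^2 of 0
print(r_squared(y, good), r_squared(y, poor))
```

This makes the transcript's point concrete: when the line's squared error is small relative to the mean's, R-squared approaches 1.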
The importance of smaller squared errors, indicating a better fit with an R-squared value approaching 1, and larger errors indicating a poorer fit with an R-squared value approaching 0, is emphasized in assessing the model's performance."]}, {'end': 6471.711, 'start': 5914.025, 'title': 'Goodness of fit in linear regression', 'summary': "Explains the concept of goodness of fit in linear regression, including the calculation of r square value, testing the model's performance, and the significance of statistics in machine learning.", 'duration': 557.686, 'highlights': ['The chapter explains the concept of goodness of fit in linear regression It discusses how the R square value determines the fit of the model, with a larger R square indicating a better fit.', "Calculation of R square value and testing the model's performance The process involves calculating the R square value to assess the model's fit, with a resulting value of 0.36 indicating a poor fit.", 'Significance of statistics in machine learning The speaker emphasizes the importance of having a clear understanding of statistics, stating that it constitutes a significant portion of knowledge required for machine learning, along with algorithms and programming.']}, {'end': 6700.629, 'start': 6471.711, 'title': 'Linear regression model tuning', 'summary': 'Explains how to tune the parameters m and c in the linear regression model to improve the fit, reduce squared line error, and minimize the error between the mean and data points, ultimately aiming to maximize the r square value.', 'duration': 228.918, 'highlights': ["Tuning M and C values to find a good fit for Y By changing the parameters M and C in the linear regression model, the aim is to find a good fit for the predicted values of Y, ultimately improving the model's performance and reducing errors.", 'Effect of changing M and C on reducing errors and improving fit Changing the hyperparameters M and C in the linear regression model helps in reducing the 
squared line error and minimizing the error between the mean and data points, ultimately leading to a better fit for the model.', 'Application of parameter tuning to maximize R square value The process of tuning the parameters M and C in the linear regression model aims to maximize the R square value, indicating the goodness of fit for the model and its ability to accurately represent the data.']}, {'end': 6980.645, 'start': 6700.969, 'title': 'Linear regression for machine learning', 'summary': 'Discusses the key concepts of linear regression for machine learning, including the hypothesis equation h(theta x), the calculation of predicted values, and the significance of the weightage vector in real-world applications.', 'duration': 279.676, 'highlights': ['The hypothesis equation h(theta x) is defined as h theta of x equals theta 0 plus theta 1 x, where theta 0 and theta 1 are hyper parameters, and can be changed through trial and error to obtain the predicted value. hypothesis equation, hyper parameters, trial and error', 'The calculated h theta of x yields the values of predicted y, with the equation representing the relationship between the feature vector X and the predicted values. calculated predicted values, relationship between feature vector and predicted values', 'The weightage vector, denoted as the theta vector, plays a crucial role in real-world applications and is significant in scenarios such as housing price prediction. 
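The trial-and-error tuning of the hyperparameters theta0 and theta1 described in these segments is typically automated with gradient descent on the squared-error cost; below is a minimal sketch, with an arbitrarily chosen learning rate and invented data roughly following y = 2x + 1:

```python
import numpy as np

# Invented data approximately on the line y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

theta0, theta1 = 0.0, 0.0   # start from an arbitrary guess
lr = 0.01                    # learning rate (hand-picked for this sketch)

for _ in range(5000):
    h = theta0 + theta1 * x              # hypothesis h_theta(x)
    error = h - y
    # Gradients of the mean squared error cost w.r.t. each parameter
    theta0 -= lr * error.mean()
    theta1 -= lr * (error * x).mean()

print(theta0, theta1)  # theta0 plays the role of c, theta1 of m
```

After enough iterations theta1 approaches the true slope and theta0 the true intercept, which is exactly the "keep adjusting M and C until the fit is good" loop the transcript describes.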
importance of weightage vector in housing price prediction']}, {'end': 7220.276, 'start': 6980.645, 'title': 'Linear regression model tuning', 'summary': 'Discusses the importance of feature weighting in linear regression model tuning, emphasizing the impact of features like area and amenities on house prices, and the necessity to avoid unnecessary features like color to improve model performance and fit.', 'duration': 239.631, 'highlights': ["The importance of giving a good weightage to the 'area' feature in the linear regression model due to its significant impact on house prices. Highlighting the significance of the 'area' feature and the necessity to assign it a substantial weightage in the model to accurately predict house prices.", "The role of hyperparameter tuning in adjusting the theta parameters to achieve a good fit for the model and avoid errors caused by unusual or arbitrary feature values. Explaining the process of hyperparameter tuning to optimize the model's performance by adjusting theta parameters and avoiding errors caused by unusual feature values.", "The impact of amenities on house prices, indicating that a higher number of amenities leads to higher house prices, highlighting the crucial role of amenities in determining the property's value. Emphasizing the influence of amenities on house prices and the necessity to consider amenities as a significant factor in determining property value."]}], 'duration': 1890.176, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN45330100.jpg', 'highlights': ['The distinction between categorical and continuous data is crucial for determining regression or classification problems, influencing model selection. (Relevance: 5)', 'The concept of linear regression and the calculation of R-squared to determine the goodness of fit is explained. 
(Relevance: 4)', 'Tuning M and C values to find a good fit for Y by changing the parameters M and C in the linear regression model. (Relevance: 3)', 'The hypothesis equation h(theta x) is defined as h theta of x equals theta 0 plus theta 1 x, where theta 0 and theta 1 are hyper parameters. (Relevance: 2)', 'The weightage vector, denoted as the theta vector, plays a crucial role in real-world applications and is significant in scenarios such as housing price prediction. (Relevance: 1)']}, {'end': 8328.406, 'segs': [{'end': 7290.404, 'src': 'embed', 'start': 7260.328, 'weight': 1, 'content': [{'end': 7262.249, 'text': 'you can introduce features to the data set.', 'start': 7260.328, 'duration': 1.921}, {'end': 7266.492, 'text': 'also right, we have seen like we can do that i have shown you, right.', 'start': 7262.249, 'duration': 4.243}, {'end': 7268.473, 'text': 'so these things will take into account.', 'start': 7266.492, 'duration': 1.981}, {'end': 7270.955, 'text': 'these things will help you when deciding all these things.', 'start': 7268.473, 'duration': 2.482}, {'end': 7275.758, 'text': "that's how you will decide the h theta that that you are going to take.", 'start': 7270.955, 'duration': 4.803}, {'end': 7278.04, 'text': "you will, and that's how your machine will help you.", 'start': 7275.758, 'duration': 2.282}, {'end': 7281.422, 'text': "that's how your algorithm will help you to decide this.", 'start': 7278.04, 'duration': 3.382}, {'end': 7284.724, 'text': 'okay, we can change the based alone based on our analogy.', 'start': 7281.422, 'duration': 3.302}, {'end': 7287.266, 'text': 'yes, right, but this will be taken care by the model itself.', 'start': 7284.724, 'duration': 2.542}, {'end': 7290.404, 'text': 'okay, the model will return you the best fit.', 'start': 7287.642, 'duration': 2.762}], 'summary': 'Introduce features to dataset for better decision-making, model returns best fit.', 'duration': 30.076, 'max_score': 7260.328, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN47260328.jpg'}, {'end': 7428.475, 'src': 'embed', 'start': 7385.379, 'weight': 0, 'content': [{'end': 7390.583, 'text': 'What this correlation means, how these machines can be treated, how this R-square value can be calculated like that.', 'start': 7385.379, 'duration': 5.204}, {'end': 7399.416, 'text': 'okay. so one thing here before starting anything so we have to first check is to go to the check for R square correct,', 'start': 7391.374, 'duration': 8.042}, {'end': 7403.457, 'text': 'whether data is good or something is there right.', 'start': 7399.416, 'duration': 4.041}, {'end': 7405.617, 'text': "if you don't have a model right, how can you check it?", 'start': 7403.457, 'duration': 2.16}, {'end': 7407.718, 'text': 'so first task will be to build a model.', 'start': 7405.617, 'duration': 2.101}, {'end': 7410.278, 'text': 'then you can check for the R square value right.', 'start': 7407.718, 'duration': 2.56}, {'end': 7412.139, 'text': 'related values should be passed as dependent.', 'start': 7410.278, 'duration': 1.861}, {'end': 7413.599, 'text': "yes, right, Ravi, so that's how.", 'start': 7412.139, 'duration': 1.46}, {'end': 7415.2, 'text': 'so you need to find out which values.', 'start': 7413.599, 'duration': 1.601}, {'end': 7417, 'text': 'so see, this will come in two ways.', 'start': 7415.2, 'duration': 1.8}, {'end': 7419.021, 'text': 'first, stats will help you.', 'start': 7417, 'duration': 2.021}, {'end': 7420.401, 'text': 'next, your domain knowledge will help you.', 'start': 7419.021, 'duration': 1.38}, {'end': 7425.954, 'text': 'okay, Once you have this two, this will help you to get the get the good model.', 'start': 7420.401, 'duration': 5.553}, {'end': 7427.555, 'text': 'Okay Okay.', 'start': 7426.314, 'duration': 1.241}, {'end': 7428.475, 'text': "That's okay.", 'start': 7427.835, 'duration': 0.64}], 'summary': 'Explains calculating r-square value, 
building model, and using stats/domain knowledge for good model.', 'duration': 43.096, 'max_score': 7385.379, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN47385379.jpg'}, {'end': 7486.121, 'src': 'embed', 'start': 7454.221, 'weight': 7, 'content': [{'end': 7455.902, 'text': 'then we will prove it for n minus one.', 'start': 7454.221, 'duration': 1.681}, {'end': 7460.624, 'text': 'then we will say it is true for n also right, like that we used to do mathematical induction.', 'start': 7455.902, 'duration': 4.722}, {'end': 7464.326, 'text': 'So all these things that we take into account is a hypothesis.', 'start': 7461.224, 'duration': 3.102}, {'end': 7465.787, 'text': "So that's just a wild guess.", 'start': 7464.346, 'duration': 1.441}, {'end': 7472.472, 'text': 'So when you are saying H theta X equals theta 0 plus theta 1 X 1 theta 2 X 2 like that.', 'start': 7466.108, 'duration': 6.364}, {'end': 7480.318, 'text': "So that becomes our hypothesis because we don't know the values of theta 0 theta 1 yet and we are going to predict something.", 'start': 7472.853, 'duration': 7.465}, {'end': 7486.121, 'text': 'So now based on our testing, the goodness of it testing or how we do it, I will show you right now.', 'start': 7480.798, 'duration': 5.323}], 'summary': 'Using mathematical induction, we form a hypothesis for predicting values based on testing and goodness of fit.', 'duration': 31.9, 'max_score': 7454.221, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN47454221.jpg'}, {'end': 7693.429, 'src': 'embed', 'start': 7667.638, 'weight': 3, 'content': [{'end': 7672.361, 'text': 'so that will yield you the machine learning model that you are trying to build.', 'start': 7667.638, 'duration': 4.723}, {'end': 7675.823, 'text': 'so the concept is using gradient descent.', 'start': 7672.361, 'duration': 3.462}, {'end': 7686.327, 'text': 'we will minimize the cost 
function for the, for the data set, and after that, when, once we get the hyper parameter values, that is, the theta values,', 'start': 7675.823, 'duration': 10.504}, {'end': 7693.429, 'text': 'we would replace those in the main equation, which will yield you the best possible values,', 'start': 7686.327, 'duration': 7.102}], 'summary': 'Using gradient descent to yield the best machine learning model.', 'duration': 25.791, 'max_score': 7667.638, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN47667638.jpg'}, {'end': 7976.399, 'src': 'embed', 'start': 7951.935, 'weight': 4, 'content': [{'end': 7961.043, 'text': 'okay, so if i, if i, if i calculate this value, this r square has to be somewhere near around one to get a good fit.', 'start': 7951.935, 'duration': 9.108}, {'end': 7964.657, 'text': "okay. So that's why we are when I say minimizing the cost.", 'start': 7961.043, 'duration': 3.614}, {'end': 7966.077, 'text': 'function means not the value.', 'start': 7964.657, 'duration': 1.42}, {'end': 7970.798, 'text': "Okay It's the whole idea that R square has to be near equals one.", 'start': 7966.297, 'duration': 4.501}, {'end': 7974.159, 'text': "So that's what is referred to as minimizing the cost function.", 'start': 7971.238, 'duration': 2.921}, {'end': 7976.399, 'text': 'Okay It is not like that.', 'start': 7974.359, 'duration': 2.04}], 'summary': 'In order to achieve a good fit, the r square value needs to be close to one, which is why minimizing the cost function is important.', 'duration': 24.464, 'max_score': 7951.935, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN47951935.jpg'}, {'end': 8235.138, 'src': 'embed', 'start': 8204.425, 'weight': 5, 'content': [{'end': 8205.887, 'text': "Okay That's how your model will look like.", 'start': 8204.425, 'duration': 1.462}, {'end': 8210.429, 'text': 'and and we have printed the values of estimated 
coefficients as well.', 'start': 8206.428, 'duration': 4.001}, {'end': 8214.271, 'text': "okay, so that's how you, by yourself, can build this.", 'start': 8210.429, 'duration': 3.842}, {'end': 8222.634, 'text': 'now, in the next step, what you can do for getting linear regression, you can try adding the r square to it and looping it through.', 'start': 8214.271, 'duration': 8.363}, {'end': 8229.516, 'text': "okay, looping it through all the steps, but that won't be required as we already have this functionality, uh, available in scikit-learn package.", 'start': 8222.634, 'duration': 6.882}, {'end': 8235.138, 'text': 'okay, in the scikit-learn package, for when we see examples of logistic regression, i will show you this.', 'start': 8229.516, 'duration': 5.622}], 'summary': 'Demonstrated model building with estimated coefficients and noted the availability of r square functionality in scikit-learn package.', 'duration': 30.713, 'max_score': 8204.425, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN48204424.jpg'}, {'end': 8316.94, 'src': 'embed', 'start': 8287.887, 'weight': 6, 'content': [{'end': 8290.709, 'text': 'those things right, those what, what are those called?', 'start': 8287.887, 'duration': 2.822}, {'end': 8291.83, 'text': 'those are independent variables.', 'start': 8290.709, 'duration': 1.121}, {'end': 8294.052, 'text': 'more than one individual variables.', 'start': 8291.83, 'duration': 2.222}, {'end': 8302.897, 'text': 'yeah, so more than one independent variable, multivariate, a single, only one, one value we are taking, that is a univariate.', 'start': 8294.052, 'duration': 8.845}, {'end': 8306.361, 'text': 'okay, now we are looking at this fancy boston house data set.', 'start': 8302.897, 'duration': 3.464}, {'end': 8308.635, 'text': 'okay, So this is for calculating house prices.', 'start': 8306.361, 'duration': 2.274}, {'end': 8311.777, 'text': 'So if you see here, these are the 
description.', 'start': 8309.055, 'duration': 2.722}, {'end': 8316.94, 'text': 'So see, this is our target variable median value of owner occupied home in thousands.', 'start': 8312.156, 'duration': 4.784}], 'summary': 'Discussion on independent and target variables in the context of boston house data set for calculating house prices.', 'duration': 29.053, 'max_score': 8287.887, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN48287887.jpg'}], 'start': 7220.276, 'title': 'Model validation and linear regression', 'summary': 'Covers model validation, model tuning, feature introduction in datasets, and the importance of r-squared value. it also delves into understanding linear regression, hypothesis, cost function, gradient descent, and achieving the best model for predicting unseen data.', 'chapters': [{'end': 7428.475, 'start': 7220.276, 'title': 'Machine learning model validation', 'summary': 'Discusses the validation and tuning of machine learning models, including selecting the proper model, tuning the algorithm, introducing features to the dataset, and checking for the r-squared value to ensure a good fit.', 'duration': 208.199, 'highlights': ['Tuning the machine learning algorithm and checking for the R-squared value to ensure a good fit. The task involves tuning the model and checking the R-squared value to ensure a good fit, which is essential for validating the machine learning model.', 'Selecting a proper model and introducing features to the dataset. The process involves selecting a suitable model and introducing features to the dataset, which are crucial steps in the validation and tuning of machine learning models.', 'Utilizing stats and domain knowledge to build a good model. 
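The highlights here stress checking the R-squared value to validate the fit. A minimal sketch of that check (this code is not shown in the video; the toy numbers are assumptions for illustration):

```python
import numpy as np

def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot; a value near 1 indicates a good fit."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return 1.0 - ss_res / ss_tot

# Perfect predictions give R^2 exactly 1; noisy predictions fall below it.
print(r_squared([1, 2, 3], [1, 2, 3]))               # -> 1.0
print(r_squared([0, 1, 2, 3], [0.1, 0.9, 2.1, 2.9]))
```

scikit-learn ships the same computation as `sklearn.metrics.r2_score`, which is what one would normally use in practice.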
The combination of statistical analysis and domain knowledge is emphasized as crucial for building a good machine learning model.']}, {'end': 7951.935, 'start': 7429.756, 'title': 'Understanding linear regression hypothesis and cost function', 'summary': 'Discusses the concept of hypothesis in linear regression, the role of mathematical induction, the formulation of the hypothesis equation, the importance of the cost function in evaluating model goodness, and the use of gradient descent to minimize the cost function for achieving the best possible model, ultimately leading to the prediction of unseen data.', 'duration': 522.179, 'highlights': ['The chapter explains the concept of hypothesis in linear regression, using mathematical induction to prove the validity of the hypothesis for different values of n. It provides an understanding of the hypothesis in the context of linear regression, highlighting the use of mathematical induction for validation.', 'It delves into the formulation of the hypothesis equation H theta X = theta 0 + theta 1 X 1 + theta 2 X 2, emphasizing that it is a wild guess due to the unknown values of theta 0 and theta 1. The discussion focuses on the formulation of the hypothesis equation and highlights the uncertainty stemming from unknown theta values.', 'The importance of the cost function in evaluating model goodness is explained, with the goal being the minimization of the cost function to obtain the best model. It emphasizes the significance of the cost function in assessing model goodness and the objective of minimizing it to achieve the best model.', 'The use of gradient descent to minimize the cost function for achieving the best possible model is detailed, along with the subsequent prediction of unseen data using the derived model. 
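The gradient-descent step summarized here, minimizing the cost function to obtain the theta values, can be sketched for the univariate hypothesis h(x) = theta0 + theta1 * x. The toy data and learning rate below are assumptions for illustration, not values from the video:

```python
import numpy as np

# Toy data following y = 2x + 1, so the true theta values are known in advance.
X = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * X + 1.0

theta0, theta1 = 0.0, 0.0   # initial guess for h(x) = theta0 + theta1 * x
alpha = 0.05                # learning rate (an assumed value)

for _ in range(5000):
    error = (theta0 + theta1 * X) - y
    # Gradient of the mean-squared-error cost with respect to each theta
    theta0 -= alpha * error.mean()
    theta1 -= alpha * (error * X).mean()

print(round(theta0, 3), round(theta1, 3))   # approaches 1.0 and 2.0
```

Once the loop converges, the theta values are substituted back into the hypothesis, which is the "replace those in the main equation" step the transcript describes.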
It provides insights into the practical application of gradient descent for minimizing the cost function and the utilization of the derived model for predicting unseen data.']}, {'end': 8328.406, 'start': 7951.935, 'title': 'Minimizing cost function for best fit', 'summary': 'Explains the concept of minimizing the cost function to achieve an r square value near equals one for getting the best fit, and the process of calculating coefficients and building a model for linear regression.', 'duration': 376.471, 'highlights': ['The concept of minimizing the cost function to achieve an R square value near equals one for getting the best fit is explained. Emphasizes the importance of R square value near equals one for a good fit.', 'The process of calculating coefficients and building a model for linear regression is detailed, including the mathematical formulas and plotting the predicted line. Describes the steps for calculating coefficients and plotting the predicted line for linear regression modeling.', 'Distinguishing between single variable and multivariate linear regression is explained. Clarifies the difference between single variable and multivariate linear regression equations.', 'Introduction to the Boston house data set for calculating house prices is provided. 
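The "estimated coefficients" calculation referenced above, building the univariate model by hand before handing over to scikit-learn, can be sketched with the closed-form least-squares formulas (the toy data below is an assumption for illustration):

```python
import numpy as np

def estimate_coefficients(x, y):
    """Closed-form least squares for the line y = b0 + b1 * x."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    return b0, b1

# Data generated from y = 2x + 1, so the recovered coefficients are known.
b0, b1 = estimate_coefficients([1, 2, 3, 4], [3, 5, 7, 9])
print(b0, b1)   # -> 1.0 2.0
```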
Introduces the target variable and features of the Boston house data set for calculating house prices.']}], 'duration': 1108.13, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN47220276.jpg', 'highlights': ['Utilizing stats and domain knowledge to build a good model.', 'Selecting a proper model and introducing features to the dataset.', 'Tuning the machine learning algorithm and checking for the R-squared value to ensure a good fit.', 'The use of gradient descent to minimize the cost function for achieving the best possible model is detailed.', 'The concept of minimizing the cost function to achieve an R square value near equals one for getting the best fit is explained.', 'The process of calculating coefficients and building a model for linear regression is detailed.', 'Introduction to the Boston house data set for calculating house prices is provided.', 'The chapter explains the concept of hypothesis in linear regression, using mathematical induction to prove the validity of the hypothesis for different values of n.']}, {'end': 9209.398, 'segs': [{'end': 8461.245, 'src': 'embed', 'start': 8435.614, 'weight': 0, 'content': [{'end': 8442.859, 'text': 'so we have taken out lstat and maedv, these two points, from the data set that we have, from the data set that we have.', 'start': 8435.614, 'duration': 7.245}, {'end': 8444.88, 'text': 'okay, then what we are doing?', 'start': 8442.859, 'duration': 2.021}, {'end': 8453.601, 'text': 'we are now doing a train states test split of 80, 20, as i, as i shown you in the last class, we are taking out the split now.', 'start': 8445.757, 'duration': 7.844}, {'end': 8454.782, 'text': 'see the size.', 'start': 8453.601, 'duration': 1.181}, {'end': 8455.442, 'text': 'it is like that.', 'start': 8454.782, 'duration': 0.66}, {'end': 8461.245, 'text': 'so we have 404 values for training and 102 values for testing.', 'start': 8455.442, 'duration': 5.803}], 'summary': 'Removed 
all but LSTAT and MEDV from the dataset. Split 80/20 for training/testing: 404 values for training and 102 for testing.', 'duration': 25.631, 'max_score': 8435.614, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN48435614.jpg'}, {'end': 8516.27, 'src': 'embed', 'start': 8486.457, 'weight': 2, 'content': [{'end': 8487.258, 'text': 'X and Y values.', 'start': 8486.457, 'duration': 0.801}, {'end': 8488.498, 'text': 'So X train.', 'start': 8487.678, 'duration': 0.82}, {'end': 8491.58, 'text': 'So next line what we are doing? We are fitting the line.', 'start': 8489.158, 'duration': 2.422}, {'end': 8493.201, 'text': 'Right Regressor dot fit.', 'start': 8491.64, 'duration': 1.561}, {'end': 8494.321, 'text': 'X train.', 'start': 8493.861, 'duration': 0.46}, {'end': 8495.161, 'text': 'Y train.', 'start': 8494.761, 'duration': 0.4}, {'end': 8499.304, 'text': 'Right So now we have a model ready.', 'start': 8495.562, 'duration': 3.742}, {'end': 8501.705, 'text': 'That is the theta parameters and everything.', 'start': 8499.704, 'duration': 2.001}, {'end': 8504.066, 'text': 'Those things are done already.', 'start': 8501.785, 'duration': 2.281}, {'end': 8507.666, 'text': 'and we have a linear regression model in hand.', 'start': 8504.565, 'duration': 3.101}, {'end': 8511.788, 'text': 'so now we can use this to predict the outputs.', 'start': 8507.666, 'duration': 4.122}, {'end': 8516.27, 'text': 'so see, these are the coefficients and intercept values that is being.', 'start': 8511.788, 'duration': 4.482}], 'summary': 'Fitting a linear regression model to x and y values for prediction.', 'duration': 29.813, 'max_score': 8486.457, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN48486457.jpg'}, {'end': 9048.127, 'src': 'embed', 'start': 9017.362, 'weight': 1, 'content': [{'end': 9018.242, 'text': 'root, mean squared error.', 'start': 9017.362, 'duration': 0.88}, {'end': 9019.963, 
'text': 'it is too large, right.', 'start': 9018.242, 'duration': 1.721}, {'end': 9022.425, 'text': 'so this is a good fit for our model.', 'start': 9019.963, 'duration': 2.462}, {'end': 9025.111, 'text': 'okay, so this is a good fit for our model.', 'start': 9022.99, 'duration': 2.121}, {'end': 9027.773, 'text': "so that's how these things are calculated.", 'start': 9025.111, 'duration': 2.662}, {'end': 9031.136, 'text': "okay, so that's how these things are calculated.", 'start': 9027.773, 'duration': 3.363}, {'end': 9033.717, 'text': 'okay, next is linear regression.', 'start': 9031.136, 'duration': 2.581}, {'end': 9038.56, 'text': 'one question like see, we are plotting with between the two like l stat and medv.', 'start': 9033.717, 'duration': 4.843}, {'end': 9048.127, 'text': 'right, l stat is our, l stat is our x independent independent variable, and mvd is a dependent variable, correct, right.', 'start': 9038.56, 'duration': 9.567}], 'summary': 'Discussing model fit and calculation of variables for linear regression.', 'duration': 30.765, 'max_score': 9017.362, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN49017362.jpg'}], 'start': 8328.967, 'title': 'Univariate linear regression analysis', 'summary': "Covers the process of univariate linear regression analysis, including splitting the dataset into training and testing sets, fitting a linear regression model, and evaluating the model's performance with mean squared error, demonstrating a good fit for the model.", 'chapters': [{'end': 9209.398, 'start': 8328.967, 'title': 'Univariate linear regression analysis', 'summary': "Covers the process of univariate linear regression analysis, including splitting the dataset into training and testing sets, fitting a linear regression model, and evaluating the model's performance with mean squared error, demonstrating a good fit for the model.", 'duration': 880.431, 'highlights': ["The process of splitting the 
dataset into 80% for training and 20% for testing is essential, with 404 values for training and 102 values for testing, ensuring a robust evaluation of the model's performance.", "Fitting the linear regression model using scikit-learn's 'linear regression' function simplifies the process, reducing the complexity of the code and providing the model with the necessary theta parameters and intercept values for prediction.", "The evaluation of the model's performance using mean squared error demonstrates a good fit for the model, indicating a successful application of univariate linear regression analysis in predicting the dependent variable based on the independent variable."]}], 'duration': 880.431, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN48328967.jpg', 'highlights': ["The process of splitting the dataset into 80% for training and 20% for testing is essential, with 404 values for training and 102 values for testing, ensuring a robust evaluation of the model's performance.", "The evaluation of the model's performance using mean squared error demonstrates a good fit for the model, indicating a successful application of univariate linear regression analysis in predicting the dependent variable based on the independent variable.", "Fitting the linear regression model using scikit-learn's 'linear regression' function simplifies the process, reducing the complexity of the code and providing the model with the necessary theta parameters and intercept values for prediction."]}, {'end': 10521.464, 'segs': [{'end': 9331.49, 'src': 'embed', 'start': 9304.729, 'weight': 0, 'content': [{'end': 9308.834, 'text': 'so typical examples of logistic regression would include typically.', 'start': 9304.729, 'duration': 4.105}, {'end': 9312.81, 'text': 'typical examples would include like, or tumor prediction.', 'start': 9308.834, 'duration': 3.976}, {'end': 9314.572, 'text': 'is it a malignant or benign one?', 'start': 
9312.81, 'duration': 1.762}, {'end': 9317.034, 'text': 'okay, it would be spam classifier.', 'start': 9314.572, 'duration': 2.462}, {'end': 9319.777, 'text': 'it will be fraudulent transaction detection.', 'start': 9317.034, 'duration': 2.743}, {'end': 9322.461, 'text': 'all those kind of things will come under logistic regression.', 'start': 9319.777, 'duration': 2.684}, {'end': 9327.266, 'text': 'so, as you can understand from the name itself here, the data will be too skewed, right.', 'start': 9322.461, 'duration': 4.805}, {'end': 9331.49, 'text': 'it will either have a yes or no value, or you can say it like 0 and 1..', 'start': 9327.266, 'duration': 4.224}], 'summary': 'Logistic regression is used for tumor prediction, spam classification, and fraud detection with skewed data having binary values (0 and 1).', 'duration': 26.761, 'max_score': 9304.729, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN49304729.jpg'}, {'end': 9521.314, 'src': 'embed', 'start': 9491.543, 'weight': 1, 'content': [{'end': 9497.466, 'text': 'see, now, this fits in perfectly, right, this fits in perfectly in the under training example.', 'start': 9491.543, 'duration': 5.923}, {'end': 9502.268, 'text': 'so this curve is again called.', 'start': 9497.466, 'duration': 4.802}, {'end': 9505.389, 'text': 'this curve is called sigmoid function.', 'start': 9502.268, 'duration': 3.121}, {'end': 9509.731, 'text': 'right, this x of x shift curve s shaped curve.', 'start': 9505.389, 'duration': 4.342}, {'end': 9511.612, 'text': 'so what is the features of it?', 'start': 9509.731, 'duration': 1.881}, {'end': 9515.491, 'text': 'it will, it is an asymptotic curve, right, what?', 'start': 9511.612, 'duration': 3.879}, {'end': 9516.772, 'text': 'when do we say asymptotic?', 'start': 9515.491, 'duration': 1.281}, {'end': 9521.314, 'text': 'that means it will never reach one, it will never reach zero, okay.', 'start': 9516.772, 'duration': 4.542}], 
'summary': 'The sigmoid function is an asymptotic curve that never reaches one or zero.', 'duration': 29.771, 'max_score': 9491.543, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN49491543.jpg'}, {'end': 9965.229, 'src': 'embed', 'start': 9939.258, 'weight': 2, 'content': [{'end': 9943.921, 'text': "So this kind of problems won't be, you wouldn't be able to solve using linear regression models.", 'start': 9939.258, 'duration': 4.663}, {'end': 9948.765, 'text': "Okay This kind of problems you won't be able to solve using linear regression model.", 'start': 9944.321, 'duration': 4.444}, {'end': 9954.869, 'text': 'So in this case, you need that funky S-shaped curve, that sigmoid function curve that we shown here.', 'start': 9948.805, 'duration': 6.064}, {'end': 9957.011, 'text': "Okay So that's the whole idea.", 'start': 9955.209, 'duration': 1.802}, {'end': 9960.814, 'text': 'So now we will see how we can have it.', 'start': 9957.231, 'duration': 3.583}, {'end': 9965.229, 'text': "Okay So this kind of a classification problem, maybe a regression won't be able to answer.", 'start': 9961.094, 'duration': 4.135}], 'summary': 'Linear regression not suitable for solving classification problems, sigmoid function curve needed.', 'duration': 25.971, 'max_score': 9939.258, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN49939258.jpg'}, {'end': 10011.97, 'src': 'embed', 'start': 9980.956, 'weight': 4, 'content': [{'end': 9986.038, 'text': "This kind of small, small questions, this kind of small, small classification questions it can't answer.", 'start': 9980.956, 'duration': 5.082}, {'end': 9989.5, 'text': "Okay So let's move, go ahead with this PPT.", 'start': 9986.358, 'duration': 3.142}, {'end': 9996.934, 'text': 'Okay So see machine learning supervised unsupervised supervised is again divided into two broader categories regression classification regression.', 
'start': 9989.82, 'duration': 7.114}, {'end': 9998.416, 'text': 'We already covered linear regression.', 'start': 9996.954, 'duration': 1.462}, {'end': 10000.277, 'text': 'Today we are going to cover logistic regression.', 'start': 9998.476, 'duration': 1.801}, {'end': 10001.739, 'text': "That's another classification problem.", 'start': 10000.318, 'duration': 1.421}, {'end': 10002.56, 'text': 'There are many others.', 'start': 10001.779, 'duration': 0.781}, {'end': 10004.302, 'text': 'We will come to those one by one.', 'start': 10003.06, 'duration': 1.242}, {'end': 10005.623, 'text': 'But these are some major chunks.', 'start': 10004.382, 'duration': 1.241}, {'end': 10011.97, 'text': 'OK So a statistical classification model as you already know this machine learning heart of machine learning is stats.', 'start': 10005.964, 'duration': 6.006}], 'summary': 'The transcript covers machine learning, including supervised and unsupervised learning, and discusses logistic regression as a classification problem.', 'duration': 31.014, 'max_score': 9980.956, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN49980956.jpg'}], 'start': 9209.738, 'title': 'Logistic regression in supervised learning', 'summary': 'Introduces logistic regression, emphasizing its importance in handling categorical data and skewed output, and highlights its application in machine learning with examples including tumor prediction and spam classification. it discusses the limitations of linear regression for logistic regression problems and introduces the sigmoid function as a better fit for classification problems. 
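The asymptotic behaviour of the sigmoid described in this summary is easy to verify numerically (a minimal sketch, not code from the video):

```python
import math

def sigmoid(z):
    """The S-shaped curve: tends to 1 as z -> +inf and to 0 as z -> -inf,
    but never actually reaches either bound (it is asymptotic)."""
    return 1.0 / (1.0 + math.exp(-z))

print(sigmoid(0))      # -> 0.5, the midpoint of the curve
print(sigmoid(10))     # close to, but strictly below, 1
print(sigmoid(-10))    # close to, but strictly above, 0
```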
it also explains the use of logistic regression instead of linear regression, emphasizing its application for predicting categorical values and its role in handling categorical dependent variables in machine learning.', 'chapters': [{'end': 9327.266, 'start': 9209.738, 'title': 'Introduction to supervised learning and logistic regression', 'summary': 'Introduces the concepts of supervised and unsupervised learning, emphasizing the need for logistic regression in handling categorical data and skewed output, with examples including tumor prediction and spam classification.', 'duration': 117.528, 'highlights': ['The chapter introduces the concepts of supervised and unsupervised learning, emphasizing the need for logistic regression in handling categorical data and skewed output, with examples including tumor prediction and spam classification.', 'In supervised learning, there is always a definite objective, which can be a continuous set of variables or categorical variables.', 'The data for supervised learning algorithms will always be definitive and provide a specific output, while unsupervised learning algorithms will have an indefinite output, such as cluster of points.', 'Logistic regression is necessary for handling categorical problems where the data is skewed, such as tumor prediction and spam classification.', 'If the data is continuous and spread across a good range, linear regression can be used for prediction, while logistic regression is suitable for categorical problems with skewed data.', 'Examples of problems suitable for logistic regression include tumor prediction, spam classification, and fraudulent transaction detection.']}, {'end': 9615.807, 'start': 9327.266, 'title': 'Logistic regression and sigmoid function', 'summary': 'Discusses the limitations of linear regression for logistic regression problems, and introduces the sigmoid function, an asymptotic curve that provides a good fit for classification problems.', 'duration': 288.541, 'highlights': 
['The limitations of linear regression for logistic regression problems Linear regression fails for logistic regression problems due to skewed data, leading to a significant error percentage and an inadequate model fit.', 'Introduction of the sigmoid function The sigmoid function, an asymptotic curve, is introduced as a suitable model for classification problems, asymptoting to one at positive infinity and to zero at negative infinity.', "Explanation of the sigmoid function's equation and characteristics The equation for the sigmoid function is detailed, along with its characteristics of asymptoting to one at positive infinity and to zero at negative infinity."]}, {'end': 9938.817, 'start': 9615.807, 'title': 'Understanding logistic regression and linear regression', 'summary': 'Explains the concept of logistic regression, emphasizing the use of sigmoid function instead of linear regression. it also highlights the limitations of linear regression in solving categorical problems and the application of logistic regression for predicting categorical values.', 'duration': 323.01, 'highlights': ['The concept of logistic regression using the sigmoid function is explained as an alternative to linear regression for predicting categorical values. Emphasizes the use of sigmoid function instead of linear regression for logistic regression.', 'Limitations of linear regression in solving categorical problems are discussed, emphasizing the need for logistic regression for such scenarios. Highlights the limitations of linear regression in solving categorical problems.', 'The application of logistic regression for predicting categorical values and the challenges of using linear regression for such predictions are illustrated. 
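Plugging the linear hypothesis into the sigmoid gives the logistic model discussed here: the output is read as a probability and thresholded at 0.5 to get a class label. The theta values below are assumptions for illustration (roughly matching a "spam if about five or more spam words" rule), not fitted parameters:

```python
import math

def predict_class(x, theta0=-4.5, theta1=1.0):
    """Logistic hypothesis h(x) = sigmoid(theta0 + theta1 * x), thresholded at 0.5.
    The theta values are assumed for illustration, not fitted to real data."""
    p = 1.0 / (1.0 + math.exp(-(theta0 + theta1 * x)))
    return p, (1 if p >= 0.5 else 0)

# e.g. number of spam words in a mail -> probability of spam, class label
for words in (2, 5, 8):
    p, label = predict_class(words)
    print(words, round(p, 3), label)
```

In practice the thetas would come from `sklearn.linear_model.LogisticRegression`, which the transcript turns to later.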
Illustrates the application of logistic regression for predicting categorical values and the challenges of using linear regression for such predictions.']}, {'end': 10521.464, 'start': 9939.258, 'title': 'Logistic regression in machine learning', 'summary': 'Introduces logistic regression as a solution for classification problems, emphasizing its application in machine learning, its ability to handle categorical dependent variables, and the role of probability theory in predicting outcomes.', 'duration': 582.206, 'highlights': ['Logistic regression as a solution for classification problems It emphasizes the importance of logistic regression in solving classification problems, indicating the need to apply it to answer small classification questions and mentioning its role in the supervised learning category.', 'Handling categorical dependent variables It explains the ability of logistic regression to deal with categorical dependent variables, highlighting its capability to handle binary or dichotomous variables and its potential for multinomial and univariate scenarios.', 'Role of probability theory in predicting outcomes It discusses the role of probability theory in logistic regression, emphasizing its significance in predicting outcomes and deriving probabilities for events, such as the probability of an email being classified as spam.']}], 'duration': 1311.726, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN49209738.jpg', 'highlights': ['Logistic regression is necessary for handling categorical problems where the data is skewed, such as tumor prediction and spam classification.', 'The sigmoid function, an asymptotic curve, is introduced as a suitable model for classification problems, asymptoting to one at positive infinity and to zero at negative infinity.', 'Emphasizes the use of sigmoid function instead of linear regression for logistic regression.', 'It discusses the limitations of linear regression 
for logistic regression problems and introduces the sigmoid function as a better fit for classification problems.', 'It emphasizes the importance of logistic regression in solving classification problems, indicating the need to apply it to answer small classification questions and mentioning its role in the supervised learning category.']}, {'end': 12474.965, 'segs': [{'end': 10573.162, 'src': 'embed', 'start': 10541.017, 'weight': 1, 'content': [{'end': 10544.318, 'text': 'we find out log of odds to predict the value.', 'start': 10541.017, 'duration': 3.301}, {'end': 10546.719, 'text': 'that is what is maximum likelihood estimator.', 'start': 10544.318, 'duration': 2.401}, {'end': 10547.779, 'text': 'so math behind it.', 'start': 10546.719, 'duration': 1.06}, {'end': 10551.7, 'text': "it is a bit complex, so i won't go to that, so just take my words for it.", 'start': 10547.779, 'duration': 3.921}, {'end': 10554.941, 'text': 'if you want to go through it, you can check it.', 'start': 10551.7, 'duration': 3.241}, {'end': 10559.443, 'text': 'so maximum likelihood estimator, as the name suggests it, has likelihood.', 'start': 10554.941, 'duration': 4.502}, {'end': 10560.343, 'text': 'this name right.', 'start': 10559.443, 'duration': 0.9}, {'end': 10562.203, 'text': 'so what is the likelihood of an event?', 'start': 10560.343, 'duration': 1.86}, {'end': 10563.904, 'text': 'i will tell you just a short story.', 'start': 10562.203, 'duration': 1.701}, {'end': 10573.162, 'text': 'so we all know, right, there are two terms with probabilistic probability theories odds and events right.', 'start': 10564.64, 'duration': 8.522}], 'summary': 'Maximum likelihood estimator predicts value based on log of odds and likelihood of an event.', 'duration': 32.145, 'max_score': 10541.017, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN410541017.jpg'}, {'end': 10627.657, 'src': 'embed', 'start': 10595.064, 'weight': 0, 'content': 
[{'end': 10597.205, 'text': 'Right That is what is probability theory.', 'start': 10595.064, 'duration': 2.141}, {'end': 10598.926, 'text': 'That is what is maximum likelihood.', 'start': 10597.225, 'duration': 1.701}, {'end': 10602.209, 'text': 'That is that is what is odd or just call.', 'start': 10599.827, 'duration': 2.382}, {'end': 10603.99, 'text': 'OK That is what odd is called.', 'start': 10602.549, 'duration': 1.441}, {'end': 10609.754, 'text': 'So this thing, this maximum likelihood estimator is is entirely based on that.', 'start': 10604.39, 'duration': 5.364}, {'end': 10615.978, 'text': 'OK So we take log of it to to minimize the impact.', 'start': 10610.074, 'duration': 5.904}, {'end': 10617.419, 'text': 'So we will see why we take log.', 'start': 10615.998, 'duration': 1.421}, {'end': 10618.099, 'text': 'We will see later.', 'start': 10617.439, 'duration': 0.66}, {'end': 10620.901, 'text': 'OK For logistic regression, why we take log.', 'start': 10618.52, 'duration': 2.381}, {'end': 10621.662, 'text': 'We will see that.', 'start': 10621.121, 'duration': 0.541}, {'end': 10627.657, 'text': 'For now, just think that it is log of this kind of an equation, log of probability of odds.', 'start': 10622.435, 'duration': 5.222}], 'summary': 'Maximum likelihood estimator is based on probability theory, and involves taking the log to minimize the impact.', 'duration': 32.593, 'max_score': 10595.064, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN410595064.jpg'}, {'end': 10726.382, 'src': 'embed', 'start': 10700.629, 'weight': 3, 'content': [{'end': 10705.513, 'text': 'our estimate is if the value is greater than four, then it will be a spam.', 'start': 10700.629, 'duration': 4.884}, {'end': 10710.456, 'text': "if it's not greater than less than if it's greater than or equals five, It will be a spam.", 'start': 10705.513, 'duration': 4.943}, {'end': 10714.138, 'text': 'Otherwise, it will be a not spam, 
right? So this zone is safe.', 'start': 10710.616, 'duration': 3.522}, {'end': 10715.018, 'text': 'This zone is red.', 'start': 10714.178, 'duration': 0.84}, {'end': 10722.821, 'text': 'Okay Now what we will do, we will take 1013 to 89 mails where so for the first one, we have two spam words.', 'start': 10715.418, 'duration': 7.403}, {'end': 10724.561, 'text': 'So the probability will be zero.', 'start': 10723.281, 'duration': 1.28}, {'end': 10726.382, 'text': 'This is our label data that we are preparing.', 'start': 10724.581, 'duration': 1.801}], 'summary': 'Using a threshold of 4, spam will be identified, with 1013 to 89 mails analyzed for spam words.', 'duration': 25.753, 'max_score': 10700.629, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN410700629.jpg'}, {'end': 10792.798, 'src': 'embed', 'start': 10764.022, 'weight': 2, 'content': [{'end': 10765.743, 'text': "So that's what our label data looks like.", 'start': 10764.022, 'duration': 1.721}, {'end': 10767.684, 'text': 'Okay Now we will plot this.', 'start': 10766.063, 'duration': 1.621}, {'end': 10774.468, 'text': 'So how will we plot? We will plot the probability of the mail being spam on the Y axis and the count of words on the X axis.', 'start': 10767.924, 'duration': 6.544}, {'end': 10777.069, 'text': 'Okay The count of words will be on the X axis.', 'start': 10774.948, 'duration': 2.121}, {'end': 10780.931, 'text': 'Right So how will the graph look? 
The graph will look like something like this.', 'start': 10777.329, 'duration': 3.602}, {'end': 10785.394, 'text': "Okay So let's say for one word, the mail is not a spam.", 'start': 10781.492, 'duration': 3.902}, {'end': 10788.115, 'text': 'Okay Say two words, mail is not a spam.', 'start': 10785.714, 'duration': 2.401}, {'end': 10789.756, 'text': 'Three words, mail is not a spam.', 'start': 10788.315, 'duration': 1.441}, {'end': 10792.798, 'text': 'So all of these have Y values fixed at zero.', 'start': 10790.156, 'duration': 2.642}], 'summary': 'Plot probability of mail being spam vs word count', 'duration': 28.776, 'max_score': 10764.022, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN410764022.jpg'}, {'end': 11013.191, 'src': 'embed', 'start': 10956.484, 'weight': 4, 'content': [{'end': 10962.671, 'text': 'the probability of not getting a fish is, like, 3 by 5, right, for not getting a fish.', 'start': 10956.484, 'duration': 6.187}, {'end': 10967.653, 'text': 'so the odds are the chances for it.', 'start': 10962.671, 'duration': 4.982}, {'end': 10975.618, 'text': "so the chances that the guy will catch a fish are 2 and the chances that the guy doesn't catch a fish are 3.", 'start': 10967.653, 'duration': 7.965}, {'end': 10983.728, 'text': 'so the odds of that event happening are 2 to 3, right.', 'start': 10975.618, 'duration': 8.11}, {'end': 10986.569, 'text': "So that's how it works for a betting site.", 'start': 10983.748, 'duration': 2.821}, {'end': 10987.869, 'text': "That's a good example.", 'start': 10986.929, 'duration': 0.94}, {'end': 10990.03, 'text': 'Betting is a good example of this regression.', 'start': 10987.929, 'duration': 2.101}, {'end': 10992.57, 'text': 'But I am not encouraging you to bet, as it is illegal.', 'start': 10990.05, 'duration': 2.52}, {'end': 10995.971, 'text': 'Okay So here it is just for an example.', 'start': 10992.91, 
'duration': 3.061}, {'end': 10997.512, 'text': 'So we have odds for everything.', 'start': 10996.011, 'duration': 1.501}, {'end': 10998.812, 'text': 'Every team winning or not.', 'start': 10997.712, 'duration': 1.1}, {'end': 11001.433, 'text': 'Right So how do they do it?', 'start': 10998.872, 'duration': 2.561}, {'end': 11005.974, 'text': "They do it using the team's chances.", 'start': 11001.533, 'duration': 4.441}, {'end': 11011.529, 'text': "the team's chances of winning and the team's chances of not winning.", 'start': 11006.621, 'duration': 4.908}, {'end': 11013.191, 'text': "Okay So that's how.", 'start': 11012.25, 'duration': 0.941}], 'summary': 'Probability and odds explained with examples, 3/5 chance of not catching a fish, 2/3 odds for fishing event.', 'duration': 56.707, 'max_score': 10956.484, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN410956484.jpg'}, {'end': 11457.592, 'src': 'embed', 'start': 11420.571, 'weight': 7, 'content': [{'end': 11433.633, 'text': 'so if the prediction is correct, okay, if the prediction is correct, that is, y i is zero.', 'start': 11420.571, 'duration': 13.062}, {'end': 11436.434, 'text': 'h theta x is also zero.', 'start': 11433.633, 'duration': 2.801}, {'end': 11438.474, 'text': 'that is a true prediction.', 'start': 11436.434, 'duration': 2.04}, {'end': 11442.655, 'text': 'the cost function becomes zero.', 'start': 11438.474, 'duration': 4.181}, {'end': 11444.141, 'text': 'So no penalties.', 'start': 11443.32, 'duration': 0.821}, {'end': 11448.845, 'text': 'We are encouraging our model to use that because it has made the right prediction.', 'start': 11444.501, 'duration': 4.344}, {'end': 11457.592, 'text': 'If you substitute the values for 1 as well, it also turns out to be 0.', 'start': 11449.325, 'duration': 8.267}], 'summary': 'Encouraging correct predictions with zero cost to avoid penalties.', 'duration': 37.021, 'max_score': 
11420.571, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN411420571.jpg'}, {'end': 11620.728, 'src': 'embed', 'start': 11592.013, 'weight': 6, 'content': [{'end': 11594.896, 'text': 'okay, to build a cost function for logistic regression.', 'start': 11592.013, 'duration': 2.883}, {'end': 11599.821, 'text': 'and this whole concept that i explained is the cost function of logistic regression.', 'start': 11594.896, 'duration': 4.925}, {'end': 11609.55, 'text': 'so if our model is correct and the prediction it makes is correct, the cost function will be zero.', 'start': 11599.821, 'duration': 9.729}, {'end': 11618.547, 'text': 'if it is wrong, then the log term goes to negative infinity, which will yield a very, very high cost value for the function,', 'start': 11609.55, 'duration': 8.997}, {'end': 11620.728, 'text': 'and that will be dangerous for the model.', 'start': 11618.547, 'duration': 2.181}], 'summary': 'Logistic regression cost function penalizes wrong predictions with an infinitely high cost and incentivizes correct predictions with zero cost.', 'duration': 28.715, 'max_score': 11592.013, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN411592013.jpg'}, {'end': 11902.403, 'src': 'embed', 'start': 11878.485, 'weight': 8, 'content': [{'end': 11885.351, 'text': 'So first, whatever values of X you give it can be continuous or categorical, but it will give you a value between 0 and 1.', 'start': 11878.485, 'duration': 6.866}, {'end': 11893.797, 'text': "Second, linear regression can't fit this kind of a curve because it won't always predict 0 and 1, right? 
It will predict a continuous set of data.', 'start': 11885.351, 'duration': 8.446}, {'end': 11896.639, 'text': "So that's why we use logistic regression here instead.", 'start': 11894.237, 'duration': 2.402}, {'end': 11897.96, 'text': 'In other words.', 'start': 11897.059, 'duration': 0.901}, {'end': 11902.403, 'text': 'In other words, this is how the sigmoid function works.', 'start': 11898.74, 'duration': 3.663}], 'summary': 'The sigmoid function maps varied x values into the range between 0 and 1, making it suitable for logistic regression.', 'duration': 23.918, 'max_score': 11878.485, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN411878485.jpg'}, {'end': 12013.492, 'src': 'embed', 'start': 11982.979, 'weight': 9, 'content': [{'end': 11984.019, 'text': 'our model will fail.', 'start': 11982.979, 'duration': 1.04}, {'end': 11986.821, 'text': 'so see, here we have a wrong prediction.', 'start': 11984.019, 'duration': 2.802}, {'end': 11991.283, 'text': 'so this log of odds line that we have drawn is not correct, right.', 'start': 11986.821, 'duration': 4.462}, {'end': 11995.224, 'text': 'so the theta parameters need to be adjusted now.', 'start': 11991.283, 'duration': 3.941}, {'end': 11996.245, 'text': 'okay, so the.', 'start': 11995.224, 'duration': 1.021}, {'end': 12000.047, 'text': 'so we need to rotate the log of odds line now.', 'start': 11996.245, 'duration': 3.802}, {'end': 12006.488, 'text': 'okay, see, for this one, for whether this mail is spam or not, it still predicts this mail.', 'start': 12000.047, 'duration': 6.441}, {'end': 12008.989, 'text': 'actually it still predicts correctly.', 'start': 12006.488, 'duration': 2.501}, {'end': 12011.491, 'text': 'so it is correct.', 'start': 12008.989, 'duration': 2.502}, {'end': 12013.492, 'text': 'but this one is a wrong prediction.', 'start': 12011.491, 'duration': 2.001}], 'summary': 'Model needs adjustment for more accurate predictions.', 'duration': 30.513, 'max_score': 
11982.979, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN411982979.jpg'}, {'end': 12450.821, 'src': 'embed', 'start': 12423.038, 'weight': 10, 'content': [{'end': 12425.82, 'text': 'i have shown how we can put this into this value.', 'start': 12423.038, 'duration': 2.782}, {'end': 12426.721, 'text': 'i have shown that.', 'start': 12425.82, 'duration': 0.901}, {'end': 12427.681, 'text': "so that's what", 'start': 12426.721, 'duration': 0.96}, {'end': 12435.787, 'text': 'the theoretical, mathematical concept of the hypothesis of logistic regression looks like, okay.', 'start': 12427.681, 'duration': 8.106}, {'end': 12436.627, 'text': "that's what it is about.", 'start': 12435.787, 'duration': 0.84}, {'end': 12439.289, 'text': "that's what the concept of logistic regression looks like, okay.", 'start': 12436.627, 'duration': 2.662}, {'end': 12448.8, 'text': 'so the next thing that we will do is we will check the demo of logistic regression.', 'start': 12439.289, 'duration': 9.511}, {'end': 12450.821, 'text': 'okay, after this i will.', 'start': 12448.8, 'duration': 2.021}], 'summary': 'Demonstrating the application of logistic regression in theoretical and mathematical concepts.', 'duration': 27.783, 'max_score': 12423.038, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN412423038.jpg'}], 'start': 10521.872, 'title': 'Logistic regression fundamentals', 'summary': 'Covers key concepts of logistic regression including maximum likelihood estimator, univariate logistic regression for spam detection, logistic regression basics, understanding logistic regression cost function, and logistic regression with the sigmoid function. 
it emphasizes practical applications and techniques for model prediction and curve fitting.', 'chapters': [{'end': 10641.264, 'start': 10521.872, 'title': 'Maximum likelihood estimator in machine learning', 'summary': 'Discusses the concept of maximum likelihood estimator in machine learning, explaining its relevance in finding the best fitted curve using log odds to predict values for logistic regression.', 'duration': 119.392, 'highlights': ['The maximum likelihood estimator is used to find the best fitted curve using log odds to predict values for logistic regression.', 'It is based on the likelihood of an event, which is the chance of the event happening divided by the chance of the event not happening.', 'The estimator uses the log of the likelihood to minimize its impact, which is essential for logistic regression.', 'Understanding the concept of maximum likelihood estimator is crucial for machine learning problems, particularly in logistic regression.']}, {'end': 10935.732, 'start': 10641.844, 'title': 'Univariate logistic regression for spam detection', 'summary': 'Discusses the use of univariate logistic regression for spam detection, using the count of spam words as the independent variable and probability of mail being spam as the dependent variable, with a threshold of four for classification, and the process involves plotting the probability of mail being spam against the count of words and drawing the regression curve, log of odds line, and sigmoid function.', 'duration': 293.888, 'highlights': ['The process involves plotting the probability of mail being spam against the count of words and drawing the regression curve, log of odds line, and sigmoid function. 
The chapter discusses the process of plotting the probability of mail being spam against the count of words and drawing the regression curve, log of odds line, and sigmoid function, illustrating the logistic regression analysis.', 'Using the count of spam words as the independent variable and probability of mail being spam as the dependent variable, with a threshold of four for classification. The analysis involves using the count of spam words as the independent variable and probability of mail being spam as the dependent variable, with a threshold of four for classification, indicating the criteria for classifying mails as spam or not.', 'The estimate is: if the value is greater than or equal to five, it will be spam; otherwise, not spam. The estimation criteria for classifying mails as spam or not is based on a threshold of greater than or equal to five for classification as spam, providing a clear classification rule.']}, {'end': 11271.077, 'start': 10935.732, 'title': 'Logistic regression basics', 'summary': 'Explains the basics of logistic regression with examples and details on probability, odds, odds ratio, log of odds, and log likelihood, emphasizing the application in betting and model prediction, and the concept of cost function.', 'duration': 335.345, 'highlights': ['The probability of getting a fish for dinner is 2/5 and the probability of not getting a fish is 3/5, with corresponding odds of 2 and 3, demonstrating the concept of probability and odds in a tangible example. Probability of getting a fish is 2/5, probability of not getting a fish is 3/5, odds of getting a fish is 2, and odds of not getting a fish is 3.', 'The explanation of betting as an example for logistic regression, highlighting the odds for every team winning or not, and the application of logit function and odds ratio in predicting outcomes. 
Betting is used as an example for logistic regression, illustrating the odds for every team winning or not, and the application of logit function and odds ratio in predicting outcomes.', 'The concept of log of odds and log likelihood, with insights into how log of odds is calculated and its application in model prediction and cost function of logistic regression. Explanation of log of odds and log likelihood, and its application in model prediction and the cost function of logistic regression.']}, {'end': 11736.618, 'start': 11271.63, 'title': 'Understanding logistic regression cost function', 'summary': 'Explains the cost function for logistic regression, highlighting how correct predictions yield zero cost, while incorrect predictions yield an infinitely high cost, encouraging model adjustments for healthy curve fitting.', 'duration': 464.988, 'highlights': ['Correct predictions yield zero cost, encouraging the model to use the same prediction. If the prediction is correct, i.e., yi is zero and h theta x is also zero, the cost function becomes zero, encouraging the model to use the same prediction.', 'Incorrect predictions yield an infinitely high cost, discouraging that prediction. If yi is 1 but h theta x is predicted to be 0, the cost becomes infinitely high, penalizing the wrong prediction.', 'Explanation of sigmoid function and logistic regression function in stats. 
The transcript provides an explanation of the sigmoid function and logistic regression function in stats, including the parameters L, k, and x0.']}, {'end': 12474.965, 'start': 11736.618, 'title': 'Logistic regression and the sigmoid function', 'summary': 'Introduces the concept of logistic regression and the sigmoid function, explaining how the sigmoid function maps real-valued numbers to a range between 0 and 1, facilitating classification problems, and details the process of optimizing the theta parameters for accurate predictions.', 'duration': 738.347, 'highlights': ['The sigmoid function maps real-valued numbers to a range between 0 and 1, facilitating classification problems. The sigmoid function maps any real-valued number to a value between 0 and 1, allowing easy classification, as it yields a value between 0 and 1 based on the input value.', 'Process of optimizing the theta parameters for accurate predictions in logistic regression. The chapter details the process of optimizing the theta parameters, explaining that the model needs to be adjusted by rotating the log of odds line and finding the best fit to improve the accuracy of predictions.', 'Explanation of the hypothesis for logistic regression, outlining the mathematical concept and theoretical model. 
The chapter explains the hypothesis for logistic regression, detailing the mathematical concept and theoretical model of logistic regression and how the theta parameters are applied to the equation for machine learning.']}], 'duration': 1953.093, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN410521872.jpg', 'highlights': ['The maximum likelihood estimator is used to find the best fitted curve using log odds to predict values for logistic regression.', 'Understanding the concept of maximum likelihood estimator is crucial for machine learning problems, particularly in logistic regression.', 'The process involves plotting the probability of mail being spam against the count of words and drawing the regression curve, log of odds line, and sigmoid function.', 'The estimate is: if the value is greater than or equal to five, it will be spam; otherwise, not spam.', 'Probability of getting a fish is 2/5, probability of not getting a fish is 3/5, odds of getting a fish is 2, and odds of not getting a fish is 3.', 'Betting is used as an example for logistic regression, illustrating the odds for every team winning or not, and the application of logit function and odds ratio in predicting outcomes.', 'Explanation of log of odds and log likelihood, and its application in model prediction and the cost function of logistic regression.', 'If the prediction is correct, i.e., yi is zero and h theta x is also zero, the cost function becomes zero, encouraging the model to use the same prediction.', 'The sigmoid function maps any real-valued number to a value between 0 and 1, allowing easy classification, as it yields a value between 0 and 1 based on the input value.', 'The chapter details the process of optimizing the theta parameters, explaining that the model needs to be adjusted by rotating the log of odds line and finding the best fit to improve the accuracy of predictions.', 'The chapter explains the hypothesis 
for logistic regression, detailing the mathematical concept and theoretical model of logistic regression and how the theta parameters are applied to the equation for machine learning.']}, {'end': 13774.158, 'segs': [{'end': 12562.066, 'src': 'embed', 'start': 12506.788, 'weight': 5, 'content': [{'end': 12514.753, 'text': 'okay, now for the, for all the features, we are calculating this y hat plus equals coefficients i plus 1 star rho 1.', 'start': 12506.788, 'duration': 7.965}, {'end': 12517.775, 'text': 'okay, so then we are substituting this value.', 'start': 12514.753, 'duration': 3.022}, {'end': 12522.084, 'text': 'okay, because this is theta transpose x into x.', 'start': 12518.382, 'duration': 3.702}, {'end': 12523.825, 'text': 'right, this this line.', 'start': 12522.084, 'duration': 1.741}, {'end': 12524.725, 'text': 'we are calculating.', 'start': 12523.825, 'duration': 0.9}, {'end': 12526.006, 'text': 'theta transpose x.', 'start': 12524.725, 'duration': 1.281}, {'end': 12528.347, 'text': 'right. 
so theta 0 plus theta 1 x 1.', 'start': 12526.006, 'duration': 2.341}, {'end': 12529.188, 'text': 'theta 2 x 2.', 'start': 12528.347, 'duration': 0.841}, {'end': 12529.948, 'text': 'theta 3 x 3.', 'start': 12529.188, 'duration': 0.76}, {'end': 12532.429, 'text': 'like that, we are calculating here in this line.', 'start': 12529.948, 'duration': 2.481}, {'end': 12534.33, 'text': 'right, this is the y hat.', 'start': 12532.429, 'duration': 1.901}, {'end': 12536.331, 'text': 'that is the theta 0.', 'start': 12534.33, 'duration': 2.001}, {'end': 12538.092, 'text': 'okay, this is the theta 0.', 'start': 12536.331, 'duration': 1.761}, {'end': 12544.152, 'text': 'now we are returning this function right, 1 by 1 plus e, to the power minus.', 'start': 12538.092, 'duration': 6.06}, {'end': 12545.413, 'text': 'theta transpose x.', 'start': 12544.152, 'duration': 1.261}, {'end': 12547.815, 'text': 'so we have theta transpose x in y hat.', 'start': 12545.413, 'duration': 2.402}, {'end': 12550.437, 'text': 'so e to the power minus y hat.', 'start': 12547.815, 'duration': 2.622}, {'end': 12552.278, 'text': 'okay, we are returning this function.', 'start': 12550.437, 'duration': 1.841}, {'end': 12554.22, 'text': 'so this is our predictor function.', 'start': 12552.278, 'duration': 1.942}, {'end': 12560.124, 'text': 'okay, now, we are anyway.', 'start': 12554.22, 'duration': 5.904}, {'end': 12562.066, 'text': 'so this is our theta transpose x.', 'start': 12560.124, 'duration': 1.942}], 'summary': 'Calculating predictor function theta transpose x for features and returning function.', 'duration': 55.278, 'max_score': 12506.788, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN412506788.jpg'}, {'end': 12658.112, 'src': 'embed', 'start': 12631.265, 'weight': 7, 'content': [{'end': 12639.791, 'text': 'Okay So the cost function that we have for linear regression is mean squared error.', 'start': 12631.265, 'duration': 8.526}, 
{'end': 12647.757, 'text': 'Right. So for categorical data, for this kind of data, where the values are categorical, if you draw that cost function,', 'start': 12640.111, 'duration': 7.646}, {'end': 12651.179, 'text': 'then it will yield some kind of a curve like this', 'start': 12647.757, 'duration': 3.422}, {'end': 12655.091, 'text': 'This kind of a zigzag curve you will get.', 'start': 12652.629, 'duration': 2.462}, {'end': 12658.112, 'text': 'For the mean squared error function.', 'start': 12656.211, 'duration': 1.901}], 'summary': 'Linear regression cost function is mean squared error, creating zigzag curve for categorical data.', 'duration': 26.847, 'max_score': 12631.265, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN412631265.jpg'}, {'end': 12808.629, 'src': 'embed', 'start': 12783.023, 'weight': 8, 'content': [{'end': 12794.241, 'text': 'okay, so if you use the cost function, for if you use cost function of linear regression for this logistic regression,', 'start': 12783.023, 'duration': 11.218}, {'end': 12800.564, 'text': 'your function will get stuck into local minimas, which will be dangerous for you.', 'start': 12794.241, 'duration': 6.323}, {'end': 12802.805, 'text': 'it will yield you wrong values.', 'start': 12800.564, 'duration': 2.241}, {'end': 12808.629, 'text': 'you will never get the values that were truly correct, okay, that were truly correct.', 'start': 12802.805, 'duration': 5.824}], 'summary': 'Using linear regression cost function for logistic regression leads to local minimas and incorrect values.', 'duration': 25.606, 'max_score': 12783.023, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN412783023.jpg'}, {'end': 13017.89, 'src': 'embed', 'start': 12989.608, 'weight': 4, 'content': [{'end': 12993.869, 'text': 'So if you see, for predictions we have this kind of logistic regression function.', 'start': 12989.608, 'duration': 
4.261}, {'end': 13001.451, 'text': 'Like what we had yesterday, I mean, the sigmoid function; then we have calculating coefficients also.', 'start': 12995.049, 'duration': 6.402}, {'end': 13003.252, 'text': 'Right Coefficients also we have.', 'start': 13001.791, 'duration': 1.461}, {'end': 13006.773, 'text': 'Then for prediction, we have the sigmoid function again.', 'start': 13003.712, 'duration': 3.061}, {'end': 13008.633, 'text': "OK, so that's how this works.", 'start': 13007.153, 'duration': 1.48}, {'end': 13015.335, 'text': 'So no need to go into much detail about it because we are not going to write this algorithm by hand.', 'start': 13008.993, 'duration': 6.342}, {'end': 13017.89, 'text': 'So we will basically do one thing.', 'start': 13015.888, 'duration': 2.002}], 'summary': 'Logistic regression used for predictions with coefficients calculated. Sigmoid function applied for prediction.', 'duration': 28.282, 'max_score': 12989.608, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN412989608.jpg'}, {'end': 13085.224, 'src': 'embed', 'start': 13029.96, 'weight': 0, 'content': [{'end': 13032.902, 'text': 'And how do we use scikit-learn? This one actually.', 'start': 13029.96, 'duration': 2.942}, {'end': 13034.823, 'text': 'This example.', 'start': 13033.623, 'duration': 1.2}, {'end': 13042.129, 'text': 'So how you can use the scikit-learn machine learning package to work with logistic regression.', 'start': 13035.164, 'duration': 6.965}, {'end': 13048.304, 'text': 'Okay So now what we will do, we will see how you can apply logistic regression to this.', 'start': 13042.45, 'duration': 5.854}, {'end': 13052.145, 'text': 'You will see it getting done in one minute, right? 
One line of code.', 'start': 13048.744, 'duration': 3.401}, {'end': 13054.246, 'text': "So that's the main funny part of machine learning.", 'start': 13052.485, 'duration': 1.761}, {'end': 13060.228, 'text': "Even if you skip it, you can still write the code, but you won't get the basics of it.", 'start': 13054.846, 'duration': 5.382}, {'end': 13060.768, 'text': "So that's right.", 'start': 13060.248, 'duration': 0.52}, {'end': 13062.889, 'text': "Anyways So let's see.", 'start': 13061.488, 'duration': 1.401}, {'end': 13066.15, 'text': 'We are going to work with the heart disease data set.', 'start': 13062.989, 'duration': 3.161}, {'end': 13068.771, 'text': 'Okay I will open that up for you.', 'start': 13066.71, 'duration': 2.061}, {'end': 13069.791, 'text': 'Come on.', 'start': 13068.791, 'duration': 1}, {'end': 13076.615, 'text': 'yeah, i have this, i have this.', 'start': 13072.611, 'duration': 4.004}, {'end': 13085.224, 'text': 'okay. so we, uh, for this example, right, we will be using a heart disease data set which will have 303 rows and 13 attributes.', 'start': 13076.615, 'duration': 8.609}], 'summary': 'Using scikit-learn for logistic regression on a heart disease dataset with 303 rows and 13 attributes.', 'duration': 55.264, 'max_score': 13029.96, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN413029960.jpg'}, {'end': 13175.495, 'src': 'embed', 'start': 13148.042, 'weight': 2, 'content': [{'end': 13151.504, 'text': 'okay, we will divide the data into independent and dependent variables.', 'start': 13148.042, 'duration': 3.462}, {'end': 13154.925, 'text': 'then we will split it into train and test sets with an 80/20 percentage split.', 'start': 13151.504, 'duration': 3.421}, {'end': 13156.546, 'text': 'right, we will do that.', 'start': 13154.925, 'duration': 1.621}, {'end': 13159.287, 'text': 'then we will train our model using these lines.', 'start': 13156.546, 'duration': 2.741}, {'end': 
13164.149, 'text': 'so, using scikit-learn logistic regression model, we are going to train our output model.', 'start': 13159.287, 'duration': 4.862}, {'end': 13167.969, 'text': 'okay, then we are going to predict our values.', 'start': 13164.149, 'duration': 3.82}, {'end': 13170.211, 'text': 'okay, we are going to predict our values.', 'start': 13167.969, 'duration': 2.242}, {'end': 13172.292, 'text': 'so next will be calculating accuracy.', 'start': 13170.211, 'duration': 2.081}, {'end': 13175.495, 'text': 'we will use some precision, recall and confusion matrix.', 'start': 13172.292, 'duration': 3.203}], 'summary': 'Data divided into independent and dependent variables. trained model using logistic regression. predicted values and calculated accuracy.', 'duration': 27.453, 'max_score': 13148.042, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN413148042.jpg'}, {'end': 13547.08, 'src': 'embed', 'start': 13524.853, 'weight': 3, 'content': [{'end': 13535.677, 'text': 'But think of a situation when that patient originally has a heart disease but you fail to recognize that that will be a huge penalty.', 'start': 13524.853, 'duration': 10.824}, {'end': 13536.317, 'text': 'kind of thing, right?', 'start': 13535.677, 'duration': 0.64}, {'end': 13539.918, 'text': "That will be a huge loss for the patient's family.", 'start': 13536.337, 'duration': 3.581}, {'end': 13547.08, 'text': "So that's why this F1 score is into picture, which will yield you the weighted average of precision and recall,", 'start': 13540.278, 'duration': 6.802}], 'summary': 'F1 score measures precision and recall, critical for patient outcomes.', 'duration': 22.227, 'max_score': 13524.853, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN413524853.jpg'}, {'end': 13723.043, 'src': 'embed', 'start': 13689.779, 'weight': 9, 'content': [{'end': 13692.3, 'text': 'This has a value of 165.', 
'start': 13689.779, 'duration': 2.521}, {'end': 13694.861, 'text': 'So just to visualize the data.', 'start': 13692.3, 'duration': 2.561}, {'end': 13697.781, 'text': 'Okay Just to visualize the data, what the data looks like.', 'start': 13695.141, 'duration': 2.64}, {'end': 13699.282, 'text': 'Okay What the data looks like.', 'start': 13698.221, 'duration': 1.061}, {'end': 13699.882, 'text': 'We will do that.', 'start': 13699.322, 'duration': 0.56}, {'end': 13705.563, 'text': 'Okay Next, we will split the data into features X and target Y label sets.', 'start': 13700.162, 'duration': 5.401}, {'end': 13708.244, 'text': 'Right So we are creating a feature vector here.', 'start': 13705.923, 'duration': 2.321}, {'end': 13711.465, 'text': 'Yes, seaborn is also a type of plot.', 'start': 13708.824, 'duration': 2.641}, {'end': 13716.586, 'text': 'Okay You can use matplotlib or seaborn plots, but seaborn is not much used.', 'start': 13711.805, 'duration': 4.781}, {'end': 13723.043, 'text': 'so seaborn is a plotting library which was built on top of matplotlib.', 'start': 13717.316, 'duration': 5.727}], 'summary': 'Data has a value of 165, split into features x and target y label sets.', 'duration': 33.264, 'max_score': 13689.779, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN413689779.jpg'}, {'end': 13776.1, 'src': 'embed', 'start': 13747.604, 'weight': 11, 'content': [{'end': 13750.506, 'text': 'uh, sorry, what is this plotting library?', 'start': 13747.604, 'duration': 2.902}, {'end': 13751.607, 'text': 'it is a separate graphing library.', 'start': 13750.506, 'duration': 1.101}, {'end': 13756.633, 'text': 'It also has the same kind of features as Matplotlib, but it is not much used.', 'start': 13752.071, 'duration': 4.562}, {'end': 13761.695, 'text': 'So see, one main thing in Python is that Python is open source, right?', 'start': 13756.933, 'duration': 4.762}, {'end': 13771.379, 'text': 'So you also can go ahead, take the Matplotlib 
out and you can build your own like some kind of a library and name it Revy library that you can do.', 'start': 13762.115, 'duration': 9.264}, {'end': 13774.158, 'text': 'Okay, so that you can do so.', 'start': 13771.919, 'duration': 2.239}, {'end': 13776.1, 'text': "that's why this kind of things came out.", 'start': 13774.158, 'duration': 1.942}], 'summary': "Python's open-source nature allows for building custom graphing libraries like revy to extend functionalities.", 'duration': 28.496, 'max_score': 13747.604, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN413747604.jpg'}], 'start': 12474.965, 'title': 'Logistic regression and heart disease classifier', 'summary': "Covers the creation of logistic regression function using feature vectors and coefficients, logistic regression's limitations and gradient descent, and the building of a heart disease classifier achieving a mean accuracy of 77.124% on a dataset with 303 rows and 13 attributes.", 'chapters': [{'end': 12562.066, 'start': 12474.965, 'title': 'Logistic regression function', 'summary': 'Discusses the creation of a logistic regression function using feature vectors and coefficients, calculating predictions, and returning the predictor function.', 'duration': 87.101, 'highlights': ['The chapter discusses the creation of a logistic regression function using feature vectors and coefficients, calculating predictions, and returning the predictor function.', 'The function calculates theta transpose x for predictions, where theta 0 plus theta 1 x 1, theta 2 x 2, theta 3 x 3 are calculated for all features.', 'The predictor function returns 1 by 1 plus e to the power of minus theta transpose x, and this is our predictor function.']}, {'end': 13054.246, 'start': 12562.066, 'title': 'Logistic regression and gradient descent', 'summary': 'Explains the limitations of using mean squared error in logistic regression, demonstrates the concept of gradient descent 
and its potential to get stuck in local minima, and showcases the implementation and evaluation of logistic regression using scikit-learn, achieving a mean accuracy of 77.124%.', 'duration': 492.18, 'highlights': ['The chapter explains the limitations of using mean squared error in logistic regression Mean squared error cannot be used in logistic regression for categorical data as it yields a zigzag curve due to the nature of the hypothesis function, leading to potential issues with finding the minimum value and getting stuck in local minima.', 'Demonstrates the concept of gradient descent and its potential to get stuck in local minima The chapter elaborates on the concept of gradient descent and how it can get stuck in local minima when used with the cost function of linear regression in logistic regression, emphasizing the need for a suitable cost function to avoid incorrect values.', 'Showcases the implementation and evaluation of logistic regression using scikit-learn, achieving a mean accuracy of 77.124% The implementation and evaluation of logistic regression using scikit-learn resulted in a mean accuracy of 77.124% on a diabetes dataset, demonstrating the practical application and effectiveness of the logistic regression model.']}, {'end': 13570.136, 'start': 13054.846, 'title': 'Building heart disease classifier', 'summary': 'Covers building a heart disease classifier using logistic regression on a data set with 303 rows and 13 attributes, training the model with an 80-20 train-test split, and evaluating its accuracy, precision, recall, and f1 score.', 'duration': 515.29, 'highlights': ['The data set consists of 303 rows and 13 attributes, and the goal is to predict the presence of heart disease in individuals. 
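The heart-disease workflow summarised above (303 rows, 13 attributes, an 80-20 train-test split, logistic regression, then accuracy, precision, recall, and F1) can be sketched end to end. This is a minimal sketch only: synthetic data from `make_classification` stands in for the video's heart-disease CSV, and every parameter value here is illustrative, not taken from the lecture.

```python
# A minimal sketch of the classifier described above (303 rows, 13 attributes,
# 80-20 split, logistic regression). Synthetic data stands in for the heart CSV
# used in the video, so the scores will differ from the lecture's numbers.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

X, y = make_classification(n_samples=303, n_features=13, random_state=0)

# 80-20 train-test split, as in the walkthrough
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

acc = accuracy_score(y_test, y_pred)
prec = precision_score(y_test, y_pred)
rec = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
print(f"accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f} f1={f1:.3f}")
```

With a real CSV you would replace `make_classification` by loading the file and dropping the target column to form the feature vector, exactly as the lecture does.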
The dataset contains 303 rows and 13 attributes, with the objective of predicting the presence of heart disease in individuals, representing the foundation of the classifier.', 'Model training involves using logistic regression and an 80-20 train-test split. The model is trained using logistic regression and a train-test split of 80-20, depicting the key steps in building the heart disease classifier.', "Evaluation metrics include accuracy, precision, recall, and F1 score to measure the model's performance in predicting heart disease. The evaluation encompasses metrics such as accuracy, precision, recall, and F1 score to gauge the model's effectiveness in predicting heart disease."]}, {'end': 13774.158, 'start': 13570.136, 'title': 'Machine learning example with heart dataset', 'summary': 'Covers an example using a heart dataset with 303 records and 13 features, demonstrating data visualization, feature splitting, and an insight into a plotting library.', 'duration': 204.022, 'highlights': ['The dataset contains 165 records of one category and 138 records of another category, demonstrating class imbalance.', 'The chapter includes visualizations such as bar plots to understand the data distribution and features like splitting data into X and Y label sets.', "The discussion provides insights into the usage and comparison of plotting libraries, highlighting the significance of Python's open-source nature for creating custom libraries."]}], 'duration': 1299.193, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN412474965.jpg', 'highlights': ['The implementation and evaluation of logistic regression using scikit-learn resulted in a mean accuracy of 77.124% on a diabetes dataset, demonstrating the practical application and effectiveness of the logistic regression model.', 'The data set consists of 303 rows and 13 attributes, with the objective of predicting the presence of heart disease in individuals, 
representing the foundation of the classifier.', 'The model is trained using logistic regression and a train-test split of 80-20, depicting the key steps in building the heart disease classifier.', "The evaluation encompasses metrics such as accuracy, precision, recall, and F1 score to gauge the model's effectiveness in predicting heart disease.", 'The chapter discusses the creation of a logistic regression function using feature vectors and coefficients, calculating predictions, and returning the predictor function.', 'The function calculates theta transpose x for predictions, where theta 0 plus theta 1 x 1, theta 2 x 2, theta 3 x 3 are calculated for all features.', 'The predictor function returns 1 over (1 plus e to the power of minus theta transpose x), and this is our predictor function.', 'The chapter explains the limitations of using mean squared error in logistic regression Mean squared error cannot be used in logistic regression for categorical data as it yields a zigzag curve due to the nature of the hypothesis function, leading to potential issues with finding the minimum value and getting stuck in local minima.', 'The chapter elaborates on the concept of gradient descent and how it can get stuck in local minima when used with the cost function of linear regression in logistic regression, emphasizing the need for a suitable cost function to avoid incorrect values.', 'The dataset contains 165 records of one category and 138 records of another category, demonstrating class imbalance.', 'The chapter includes visualizations such as bar plots to understand the data distribution and features like splitting data into X and Y label sets.', "The discussion provides insights into the usage and comparison of plotting libraries, highlighting the significance of Python's open-source nature for creating custom libraries."]}, {'end': 15546.408, 'segs': [{'end': 13990.352, 'src': 'embed', 'start': 13955.093, 'weight': 0, 'content': [{'end': 13956.495, 'text': 'has it 
okay?', 'start': 13955.093, 'duration': 1.402}, {'end': 13958.617, 'text': 'so now, using the model, we will predict x test.', 'start': 13956.495, 'duration': 2.122}, {'end': 13960.417, 'text': 'We will predict X test.', 'start': 13959.316, 'duration': 1.101}, {'end': 13962.858, 'text': 'So Y prediction will be our prediction on X test.', 'start': 13960.857, 'duration': 2.001}, {'end': 13969.262, 'text': 'Okay So for finding the accuracy, we have a function called score in the logistic model.', 'start': 13963.278, 'duration': 5.984}, {'end': 13977.987, 'text': 'So when we pass X test and Y test, it will automatically calculate the predicted values that is Y predicted and it will yield you the result.', 'start': 13969.642, 'duration': 8.345}, {'end': 13983.21, 'text': 'Okay So what it will do, it will take the X test that is a feature vector into account.', 'start': 13978.307, 'duration': 4.903}, {'end': 13990.352, 'text': 'Okay, then it will apply the logistic regression function that you have already created or fitted.', 'start': 13983.57, 'duration': 6.782}], 'summary': 'Using the model to predict x test and calculate accuracy with the score function.', 'duration': 35.259, 'max_score': 13955.093, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN413955093.jpg'}, {'end': 14159.632, 'src': 'embed', 'start': 14133.243, 'weight': 4, 'content': [{'end': 14137.044, 'text': 'Okay Because here the program is data driven.', 'start': 14133.243, 'duration': 3.801}, {'end': 14147.266, 'text': "Okay So if you don't give it good data to eat, it won't be able to eat the data and it won't be able to give you the outputs.", 'start': 14137.064, 'duration': 10.202}, {'end': 14148.426, 'text': "Right So that's how.", 'start': 14147.326, 'duration': 1.1}, {'end': 14153.308, 'text': 'So the petrol that you supply to the machine learning model has to be clean.', 'start': 14149.046, 'duration': 4.262}, {'end': 14155.149, 'text': 'Okay That has to be 
clean.', 'start': 14153.328, 'duration': 1.821}, {'end': 14157.871, 'text': "Okay That shouldn't be having any kind of a dust.", 'start': 14155.449, 'duration': 2.422}, {'end': 14159.632, 'text': "That's why we have petrol pumps for our cars.", 'start': 14157.891, 'duration': 1.741}], 'summary': 'Machine learning model requires clean data for accurate outputs.', 'duration': 26.389, 'max_score': 14133.243, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN414133243.jpg'}, {'end': 14376.801, 'src': 'embed', 'start': 14331.188, 'weight': 1, 'content': [{'end': 14333.029, 'text': 'so this is the value under the curve.', 'start': 14331.188, 'duration': 1.841}, {'end': 14343.357, 'text': 'if you go ahead and calculate, it should be 0.74, as this curve tells. so our model that we have built has an area under the curve of 0.74,', 'start': 14333.029, 'duration': 10.328}, {'end': 14344.518, 'text': 'which is good enough for this.', 'start': 14343.357, 'duration': 1.161}, {'end': 14347.379, 'text': "okay, so that's how we have.", 'start': 14345.098, 'duration': 2.281}, {'end': 14348.74, 'text': "okay, that's how we have.", 'start': 14347.379, 'duration': 1.361}, {'end': 14352.301, 'text': 'then this is the best accuracy that we can achieve from our model.', 'start': 14348.74, 'duration': 3.561}, {'end': 14356.243, 'text': "okay, so that's how this logistic regression works, from scikit-learn.", 'start': 14352.301, 'duration': 3.942}, {'end': 14360.504, 'text': "from scikit-learn, that's how you can use logistic regression to work.", 'start': 14356.243, 'duration': 4.261}, {'end': 14367.007, 'text': 'okay. 
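The "value under the curve" quoted above is the area under the ROC curve (AUC). A small sketch shows how it is read off with scikit-learn; the labels and probabilities below are made up for illustration and are not the video's data.

```python
# Reading the area under the ROC curve with scikit-learn. The labels and
# predicted probabilities below are made up for illustration.
from sklearn.metrics import roc_curve, roc_auc_score

y_true = [0, 0, 1, 1, 0, 1, 0, 1]                    # actual classes
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.6, 0.3]  # model probabilities

fpr, tpr, thresholds = roc_curve(y_true, y_score)    # points of the ROC curve
auc = roc_auc_score(y_true, y_score)                 # area under that curve
print(auc)  # 0.75 for this toy data; the lecture's model scored about 0.74
```

An AUC of 0.5 is chance level and 1.0 is a perfect ranking, so a value around 0.74 is a reasonable but not outstanding model.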
so see, here we have some Pima Indians diabetes data set, so this will be this one.', 'start': 14360.504, 'duration': 6.503}, {'end': 14370.699, 'text': 'so this is this data set.', 'start': 14369.238, 'duration': 1.461}, {'end': 14372.96, 'text': 'okay, this is the target value of one and zero.', 'start': 14370.699, 'duration': 2.261}, {'end': 14374.84, 'text': 'these are some fancy column names.', 'start': 14372.96, 'duration': 1.88}, {'end': 14376.801, 'text': 'okay, so we have this data set.', 'start': 14374.84, 'duration': 1.961}], 'summary': 'Logistic regression model has 74% accuracy on the Pima Indians diabetes dataset.', 'duration': 45.613, 'max_score': 14331.188, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN414331188.jpg'}, {'end': 15151.92, 'src': 'embed', 'start': 15123.325, 'weight': 3, 'content': [{'end': 15128.649, 'text': "so whether he eats some junk foods daily, that's what we are checking: if he eats it, then unfit.", 'start': 15123.325, 'duration': 5.324}, {'end': 15129.289, 'text': "if he doesn't.", 'start': 15128.649, 'duration': 0.64}, {'end': 15135.823, 'text': "he is fit, right? But if it's greater than 30, then you need to exercise.", 'start': 15129.289, 'duration': 6.534}, {'end': 15137.905, 'text': 'Because after 30, your health starts degrading.', 'start': 15136.043, 'duration': 1.862}, {'end': 15139.727, 'text': 'If he does, he is fit.', 'start': 15138.306, 'duration': 1.421}, {'end': 15141.389, 'text': 'If not, he is unfit.', 'start': 15139.807, 'duration': 1.582}, {'end': 15143.171, 'text': "So that's how our decision tree grows.", 'start': 15141.409, 'duration': 1.762}, {'end': 15145.994, 'text': "That's how our decision tree works actually.", 'start': 15143.752, 'duration': 2.242}, {'end': 15151.92, 'text': 'So, for random forest, what we have, random forest will be having multiple decision trees.', 'start': 15146.395, 'duration': 5.525}], 'summary': 'Decision tree 
determines fitness based on junk food intake and age, with a threshold of 30 for exercise. random forest uses multiple decision trees.', 'duration': 28.595, 'max_score': 15123.325, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN415123325.jpg'}, {'end': 15195.739, 'src': 'embed', 'start': 15168.528, 'weight': 5, 'content': [{'end': 15171.889, 'text': "the problem with random forest is why we don't prefer it?", 'start': 15168.528, 'duration': 3.361}, {'end': 15178.042, 'text': 'because random forest, as we have multiple decision trees, we will be having a problem of overfitting,', 'start': 15171.889, 'duration': 6.153}, {'end': 15181.685, 'text': 'because we have n number of decision trees based on number of features.', 'start': 15178.042, 'duration': 3.643}, {'end': 15187.211, 'text': 'so if we create a decision tree for each of the features that we have, it might overfit the training data.', 'start': 15181.685, 'duration': 5.526}, {'end': 15188.672, 'text': 'so it is a good algorithm.', 'start': 15187.211, 'duration': 1.461}, {'end': 15192.836, 'text': 'but for data science or machine learning anywhere, we need to use it carefully.', 'start': 15188.672, 'duration': 4.164}, {'end': 15195.739, 'text': "so that's why we kind of try to avoid this.", 'start': 15192.836, 'duration': 2.903}], 'summary': 'Random forest can lead to overfitting due to multiple decision trees, requiring careful use in data science and machine learning.', 'duration': 27.211, 'max_score': 15168.528, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN415168528.jpg'}], 'start': 13774.158, 'title': 'Logistic regression and model evaluation', 'summary': 'Covers logistic regression model training with an accuracy score, data preprocessing, and model evaluation using confusion matrix, classification report, and roc curve with a 74% accuracy. 
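The random-forest caution raised above (many fully grown trees tend to overfit the training data) is usually handled by constraining the ensemble. A sketch on synthetic data; every parameter value here is illustrative and not from the video.

```python
# Sketch: an unconstrained random forest typically memorises the training set
# (near-perfect train accuracy), which is the overfitting symptom described
# here. Capping max_depth is one common way to rein it in. Synthetic data only.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=300, n_features=13, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1)

deep = RandomForestClassifier(random_state=0).fit(X_train, y_train)
shallow = RandomForestClassifier(max_depth=3, n_estimators=50,
                                 random_state=0).fit(X_train, y_train)

print("unconstrained:", deep.score(X_train, y_train), deep.score(X_test, y_test))
print("max_depth=3  :", shallow.score(X_train, y_train), shallow.score(X_test, y_test))
```

A large gap between train and test accuracy for the unconstrained forest is the overfitting signal; the depth-limited forest trades a little training accuracy for better generalisation.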
it also discusses logistic regression and decision tree application on a diabetes dataset, and explains the decision tree and random forest algorithms.', 'chapters': [{'end': 14000.054, 'start': 13774.158, 'title': 'Logistic regression model training', 'summary': 'Covers the process of creating and training a logistic regression model, including separating feature and target vectors, splitting the dataset into 80-20 for training and testing, fitting the model and calculating the accuracy, achieving a percentage score of the predicted values.', 'duration': 225.896, 'highlights': ['The process involves separating the last column, the target value column, from the dataset to create the feature vector, achieving a clear division of feature and target vectors.', 'Splitting the dataset into 80-20 for training and testing, enabling the model to be trained using X train and Y train, and then fitting the logistic regression model, demonstrating a clear training process.', 'Utilizing the logistic regression model to predict X test and calculate the accuracy using the score function, resulting in a percentage score for the predicted values, providing a measure of model performance.']}, {'end': 14352.301, 'start': 14000.054, 'title': 'Data preprocessing and model evaluation', 'summary': 'Emphasizes the importance of data preprocessing for successful model prediction, including handling of na values and categorical data, and explains the process of model evaluation using confusion matrix, classification report, and roc curve, yielding an accuracy of 74%.', 'duration': 352.247, 'highlights': ['The importance of data preprocessing for successful model prediction, including handling of NA values and categorical data, is emphasized, with the process of converting string values into categorical values explained.', 'Model evaluation is demonstrated through the use of confusion matrix, classification report, and ROC curve, with precision, recall, and F1 score calculations provided, 
yielding an accuracy of 74% for the model.', 'The process of handling NA values and their impact on model prediction is explained, showcasing the significance of clean data for accurate model outputs.', "The significance of clean data for accurate model outputs is highlighted, emphasizing that building a fancy model won't yield good results, but good data will.", 'The importance of providing clean data to the model for accurate outputs is emphasized, comparing it to supplying clean petrol to a car engine for optimal performance.']}, {'end': 15099.202, 'start': 14352.301, 'title': 'Logistic regression and decision tree', 'summary': 'Discusses the application of logistic regression using scikit-learn on a pima indians diabetes dataset and explains the working of the logistic regression algorithm including the calculation of coefficients using gradient descent. it also provides an overview of decision tree and its structure, illustrating the branching and leaf nodes.', 'duration': 746.901, 'highlights': ['The chapter discusses the application of logistic regression using scikit-learn on a Pima Indians diabetes dataset The logistic regression application using scikit-learn on a Pima Indians diabetes dataset is explained.', 'Explains the working of the logistic regression algorithm including the calculation of coefficients using gradient descent The working of the logistic regression algorithm, including the calculation of coefficients using gradient descent, is explained.', 'Provides an overview of decision tree and its structure, illustrating the branching and leaf nodes An overview of decision tree and its structure, illustrating the branching and leaf nodes, is provided.']}, {'end': 15546.408, 'start': 15099.202, 'title': 'Decision tree and random forest', 'summary': 'Explains the decision tree and random forest algorithms, highlighting the process of decision tree growth, the problem of overfitting in random forest due to multiple decision trees, and the theory and 
application of naive Bayes classification algorithm.', 'duration': 447.206, 'highlights': ["The process of decision tree growth involves checking the person's age to determine fitness, with individuals under 30 being fitter and those over 30 needing exercise due to health degradation. Fitness determination based on age, problem of health degradation after 30, decision tree growth process", 'The problem of overfitting in random forest is due to the creation of multiple decision trees, potentially leading to overfitting of training data and the need for cautious use in data science and machine learning. Overfitting issue in random forest, caution needed in usage for data science and machine learning', 'Naive Bayes classification algorithm is based on conditional probability theory and historical knowledge, applicable for scenarios such as cancer prediction, provided the features in the dataset are unrelated to each other. Theory and application of naive Bayes classification algorithm, relevance in cancer prediction, importance of unrelated features in dataset']}], 'duration': 1772.25, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN413774158.jpg', 'highlights': ['Utilizing logistic regression model to predict X test and calculate accuracy using score function, resulting in a percentage score for predicted values, providing a measure of model performance.', 'Model evaluation demonstrated through confusion matrix, classification report, and ROC curve, with precision, recall, and F1 score calculations, yielding an accuracy of 74% for the model.', 'The logistic regression application using scikit-learn on a Pima Indians diabetes dataset is explained, including the working of the logistic regression algorithm and the calculation of coefficients using gradient descent.', "The process of decision tree growth involves checking the person's age to determine fitness, with individuals under 30 being fitter and those over 30 
needing exercise due to health degradation.', 'The importance of providing clean data to the model for accurate outputs is emphasized, comparing it to supplying clean petrol to a car engine for optimal performance.', 'The problem of overfitting in random forest is due to the creation of multiple decision trees, potentially leading to overfitting of training data and the need for cautious use in data science and machine learning.']}, {'end': 17097.853, 'segs': [{'end': 15607.318, 'src': 'embed', 'start': 15566.457, 'weight': 0, 'content': [{'end': 15568.899, 'text': "then if it's less than three, then we have only cherry.", 'start': 15566.457, 'duration': 2.442}, {'end': 15570.72, 'text': 'right, because it is a r1 and cherry.', 'start': 15568.899, 'duration': 1.821}, {'end': 15573.722, 'text': 'so diameter of cherry is one, so it will be cherry.', 'start': 15570.72, 'duration': 3.002}, {'end': 15580.207, 'text': "if it's greater than three, then we have two options, like mango and lemon, right mango and lemon.", 'start': 15573.722, 'duration': 6.485}, {'end': 15582.128, 'text': 'so now we will be checking color.', 'start': 15580.207, 'duration': 1.921}, {'end': 15584.47, 'text': "if the color is yellow, then we will say it's lemon.", 'start': 15582.128, 'duration': 2.342}, {'end': 15586.011, 'text': 'if not, then we will say it is a mango.', 'start': 15584.47, 'duration': 1.541}, {'end': 15588.892, 'text': "okay, so that's how this decision tree works.", 'start': 15586.391, 'duration': 2.501}, {'end': 15590.172, 'text': "that's how a decision tree is built.", 'start': 15588.892, 'duration': 1.28}, {'end': 15591.633, 'text': "let's see with the example now.", 'start': 15590.172, 'duration': 1.461}, {'end': 15598.355, 'text': 'okay, before that, these are a few terminologies: root node, leaf node, parent, child and branch and sub trees.', 'start': 15591.633, 'duration': 6.722}, {'end': 15603.717, 'text': 'you can see like these are all related to the, to the, to the like 
present world only.', 'start': 15598.355, 'duration': 5.362}, {'end': 15605.057, 'text': 'so root node, leaf node.', 'start': 15603.717, 'duration': 1.34}, {'end': 15607.318, 'text': 'i have already covered parent and child node.', 'start': 15605.057, 'duration': 2.261}], 'summary': 'Decision tree classifies fruits based on diameter and color.', 'duration': 40.861, 'max_score': 15566.457, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN415566457.jpg'}, {'end': 15702.626, 'src': 'embed', 'start': 15655.884, 'weight': 3, 'content': [{'end': 15664.407, 'text': "so splitting is the heart of decision tree, because if you, if you don't split it wisely, then your decision tree can be a mess.", 'start': 15655.884, 'duration': 8.523}, {'end': 15666.488, 'text': 'okay. so splitting is a part of it.', 'start': 15664.407, 'duration': 2.081}, {'end': 15671.35, 'text': 'so splitting means splitting the root node into into.', 'start': 15666.488, 'duration': 4.862}, {'end': 15675.352, 'text': 'splitting the root nodes into child nodes, okay.', 'start': 15671.35, 'duration': 4.002}, {'end': 15680.416, 'text': 'so pruning it will be, and random forest we would be covering in that much detail.', 'start': 15675.352, 'duration': 5.064}, {'end': 15682.377, 'text': "so let's not worry about it for now.", 'start': 15680.416, 'duration': 1.961}, {'end': 15684.659, 'text': 'for now, just have the decision tree into mind.', 'start': 15682.377, 'duration': 2.282}, {'end': 15687.361, 'text': 'you will see you are getting confused with the decision tree alone.', 'start': 15684.659, 'duration': 2.702}, {'end': 15689.683, 'text': "so let's just grab this concept first.", 'start': 15687.361, 'duration': 2.322}, {'end': 15692.965, 'text': "okay, pruning, that's a very important thing, okay.", 'start': 15689.683, 'duration': 3.282}, {'end': 15697.849, 'text': 'so if you are trying to build a decision tree, you can run into one of the two 
problems,', 'start': 15692.965, 'duration': 4.884}, {'end': 15702.626, 'text': 'like you can either under fit the training data or you can overfit the training data.', 'start': 15697.849, 'duration': 4.777}], 'summary': 'Understanding the importance of splitting in decision trees and the issues of underfitting and overfitting.', 'duration': 46.742, 'max_score': 15655.884, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN415655884.jpg'}, {'end': 15915.406, 'src': 'embed', 'start': 15885.584, 'weight': 5, 'content': [{'end': 15887.725, 'text': 'how do we create a decision tree manually?', 'start': 15885.584, 'duration': 2.141}, {'end': 15890.086, 'text': 'okay, i mean mathematically, okay.', 'start': 15887.725, 'duration': 2.361}, {'end': 15897.715, 'text': 'so see, this is our data set outlook, temperature, humidity, windy and play, so These all are features from outlook till windy.', 'start': 15890.086, 'duration': 7.629}, {'end': 15903.769, 'text': 'These are all our features and play is the binary variable that we are going to going to predict.', 'start': 15897.776, 'duration': 5.993}, {'end': 15909.183, 'text': "if the outlook is sunny, temperature is hot, humidity high, windy it's not windy.", 'start': 15904.921, 'duration': 4.262}, {'end': 15910.264, 'text': 'should we play or not?', 'start': 15909.183, 'duration': 1.081}, {'end': 15911.224, 'text': "so it's a no.", 'start': 15910.264, 'duration': 0.96}, {'end': 15913.405, 'text': 'so this is our label data set that we have.', 'start': 15911.224, 'duration': 2.181}, {'end': 15915.406, 'text': "so now let's go ahead and see.", 'start': 15913.405, 'duration': 2.001}], 'summary': 'Creating decision tree based on given data set and features.', 'duration': 29.822, 'max_score': 15885.584, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN415885584.jpg'}, {'end': 16041.265, 'src': 'embed', 'start': 16011.915, 
'weight': 6, 'content': [{'end': 16015.476, 'text': 'then there is information gain that is calculated based on entropy.', 'start': 16011.915, 'duration': 3.561}, {'end': 16022.678, 'text': 'actually, and if you see where the information gain is highest, from there we will start splitting the data.', 'start': 16015.476, 'duration': 7.202}, {'end': 16024.679, 'text': 'Gini index is also the same thing.', 'start': 16023.058, 'duration': 1.621}, {'end': 16033.522, 'text': 'So that is again used in the CART model of decision tree. CART means classification and regression trees.', 'start': 16025.039, 'duration': 8.483}, {'end': 16041.265, 'text': 'Okay So when we use the CART model of decision tree, then we do it that way.', 'start': 16033.922, 'duration': 7.343}], 'summary': 'Information gain and gini index are used in decision tree models for data splitting and classification.', 'duration': 29.35, 'max_score': 16011.915, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN416011915.jpg'}, {'end': 16244.756, 'src': 'embed', 'start': 16208.386, 'weight': 7, 'content': [{'end': 16212.93, 'text': 'So that is the perfect equilibrium scenario, right? 
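The entropy that information gain is calculated from can be written down directly. This is a small sketch of the binary case the lecture discusses: entropy is 0 for a pure set and peaks at 1 when the two classes split 50/50.

```python
# Binary entropy as described in the lecture: H = -p*log2(p) - (1-p)*log2(1-p).
# It is 0 for a pure set and attains its maximum of 1 at a 50/50 split.
from math import log2

def entropy(p):
    """Entropy of a binary split where one class has probability p."""
    if p in (0.0, 1.0):      # a pure set has no impurity
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

print(entropy(0.5))              # 1.0, the maximum impurity
print(entropy(1.0))              # 0.0, a pure set
print(round(entropy(9 / 14), 3)) # 0.94, e.g. a sample with 9 yes and 5 no
```

The 9/14 example matches the play-or-not data set used later in the lecture, where 9 of the 14 days are "yes".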
That is the perfect equilibrium scenario.', 'start': 16208.386, 'duration': 4.544}, {'end': 16214.971, 'text': "So that's how entropy works.", 'start': 16213.37, 'duration': 1.601}, {'end': 16217.933, 'text': 'So entropy is a measure of impurity in the data set.', 'start': 16215.011, 'duration': 2.922}, {'end': 16225.78, 'text': 'So, as the impurity lowers, the value of entropy will be lowered; as it gets higher, the value of entropy will get higher; and at 0.5,', 'start': 16218.313, 'duration': 7.467}, {'end': 16228.682, 'text': 'it will attain its maximum value of 1.', 'start': 16225.78, 'duration': 2.902}, {'end': 16231.224, 'text': "okay, so that's what entropy is about.", 'start': 16228.682, 'duration': 2.542}, {'end': 16232.105, 'text': 'and how do we calculate?', 'start': 16231.224, 'duration': 0.881}, {'end': 16233.166, 'text': 'we calculate like that.', 'start': 16232.105, 'duration': 1.061}, {'end': 16244.756, 'text': 'so minus p of event multiplied by log of p of event, minus p of another event multiplied by log of p of that event, like that, okay,', 'start': 16233.166, 'duration': 11.59}], 'summary': 'Entropy measures impurity in data set, ranging from 0 to 1.', 'duration': 36.37, 'max_score': 16208.386, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN416208386.jpg'}, {'end': 16558.345, 'src': 'embed', 'start': 16499.11, 'weight': 1, 'content': [{'end': 16502.529, 'text': 'okay, this is the weighted average multiplied by this.', 'start': 16499.11, 'duration': 3.419}, {'end': 16505.591, 'text': 'so let me write it down to clear your doubts here.', 'start': 16502.529, 'duration': 3.062}, {'end': 16513.773, 'text': 'so, for information gain, right, we have an equation like e of s, that is,', 'start': 16505.591, 'duration': 8.182}, {'end': 16520.674, 'text': 'entropy of the sample minus weighted average multiplied by entropy of feature.', 'start': 16513.773, 'duration': 6.901}, {'end': 16526.597, 'text': 
'right. so this part is called information.', 'start': 16521.874, 'duration': 4.723}, {'end': 16528.379, 'text': 'right, this part is called information.', 'start': 16526.597, 'duration': 1.782}, {'end': 16529.779, 'text': 'why information?', 'start': 16528.379, 'duration': 1.4}, {'end': 16533.722, 'text': 'because if you see this number, what does that mean?', 'start': 16529.779, 'duration': 3.943}, {'end': 16544.93, 'text': 'that this outlook feature in the data set has a weight of 0.693 in deciding the outcome of the data set.', 'start': 16533.722, 'duration': 11.208}, {'end': 16553.722, 'text': 'so this is called information that we have from the amount of information that we can get from this outlook value.', 'start': 16544.93, 'duration': 8.792}, {'end': 16555.582, 'text': 'now, what is information gain?', 'start': 16553.722, 'duration': 1.86}, {'end': 16558.345, 'text': 'it will be when we subtract it from entropy.', 'start': 16555.582, 'duration': 2.763}], 'summary': 'Information gain calculated as entropy of the sample minus the weighted average multiplied by entropy of feature, with a weight of 0.693 for the outlook feature.', 'duration': 59.235, 'max_score': 16499.11, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN416499110.jpg'}, {'end': 17059.209, 'src': 'embed', 'start': 17030.39, 'weight': 9, 'content': [{'end': 17033.051, 'text': 'This we have seen in the last class.', 'start': 17030.39, 'duration': 2.661}, {'end': 17037.832, 'text': 'Right So not going to explain this here also because it will be a repetitive thing.', 'start': 17033.091, 'duration': 4.741}, {'end': 17041.013, 'text': 'Right So here just a calculation I will show you.', 'start': 17038.112, 'duration': 2.901}, {'end': 17044.034, 'text': 'So see here we have an expected sheet.', 'start': 17041.413, 'duration': 2.621}, {'end': 17046.415, 'text': 'Everyone remembers confusion matrix.', 'start': 17044.074, 'duration': 2.341}, {'end': 
17049.256, 'text': 'Right True positive, false positive, false negative, true negative.', 'start': 17046.455, 'duration': 2.801}, {'end': 17050.436, 'text': 'Everyone remembers these terms.', 'start': 17049.316, 'duration': 1.12}, {'end': 17053.067, 'text': "Right. so let's go ahead and see an example here.", 'start': 17050.476, 'duration': 2.591}, {'end': 17059.209, 'text': 'okay, so we have an expected and predicted sheet where we have man and woman as predicted.', 'start': 17053.067, 'duration': 6.142}], 'summary': 'Demonstrating confusion matrix with expected and predicted gender data.', 'duration': 28.819, 'max_score': 17030.39, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN417030390.jpg'}], 'start': 15546.408, 'title': 'Decision tree analysis', 'summary': "Discusses decision tree analysis, emphasizing the importance of information gain and providing a scenario-based explanation for the decision-making process, ultimately leading to the classification of fruits based on their diameter and color. 
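The cherry/mango/lemon walkthrough summarised here is literally a two-level decision tree, and can be sketched as plain conditionals. The thresholds follow the lecture's example; the function name is ours.

```python
# The fruit decision tree described in the lecture: split on diameter first,
# then on colour. Threshold values follow the lecture's example.
def classify_fruit(diameter, colour):
    if diameter < 3:          # root-node split
        return "cherry"
    if colour == "yellow":    # child-node split for the larger fruits
        return "lemon"
    return "mango"

print(classify_fruit(1, "red"))      # cherry
print(classify_fruit(5, "yellow"))   # lemon
print(classify_fruit(6, "green"))    # mango
```

What a decision-tree learner automates is exactly the choice of which feature to test first and at what threshold, which is where entropy and information gain come in.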
it also covers decision tree basics, understanding decision tree algorithm, and entropy's impact on values with a detailed calculation of information gain for decision tree construction.", 'chapters': [{'end': 15586.011, 'start': 15546.408, 'title': 'Decision tree analysis', 'summary': 'Discusses decision tree analysis, emphasizing the importance of information gain and providing a scenario-based explanation for the decision-making process, ultimately leading to the classification of fruits based on their diameter and color.', 'duration': 39.603, 'highlights': ['The importance of information gain is highlighted, emphasizing its role in determining when to stop splitting the tree, thus optimizing the decision-making process.', 'A scenario-based explanation is provided, illustrating the decision-making process based on the diameter and color of fruits, with specific criteria for classifying cherries, mangoes, and lemons.']}, {'end': 15852.191, 'start': 15586.391, 'title': 'Decision tree basics', 'summary': 'Covers the basics of a decision tree, including terminologies such as root node, leaf node, parent, child, and branch, as well as important concepts like splitting and pruning to avoid underfitting or overfitting the training data.', 'duration': 265.8, 'highlights': ['The chapter emphasizes the terminologies of a decision tree, including root node, leaf node, parent, child, and branch, which are essential for understanding the structure and functioning of a decision tree.', 'It highlights the significance of splitting, the heart of a decision tree, as it determines the division of root nodes into child nodes, crucial for effective decision tree construction and avoiding a messy outcome.', 'The concept of pruning is explained, emphasizing its importance in preventing underfitting or overfitting of training data by selectively trimming branches based on information gain and entropy to ensure optimal decision tree construction.']}, {'end': 16139.793, 'start': 
15852.191, 'title': 'Understanding decision tree algorithm', 'summary': 'Explains the process of manually creating a decision tree, discussing key metrics such as entropy, reduction in variance, information gain, and Gini index, which are crucial in determining attribute selection for classification, with a focus on implications and formulae.', 'duration': 287.602, 'highlights': ['The process of manually creating a decision tree is explained, focusing on attribute selection and branching strategies, providing a practical understanding of how to decide whether to play or not given certain conditions.', 'The key metrics for attribute selection in decision tree algorithm, including entropy, reduction in variance, information gain, and Gini index, are discussed, highlighting their roles in determining the best classifying attribute and the implications of using them.']}, {'end': 17097.853, 'start': 16139.793, 'title': 'Understanding entropy and information gain', 'summary': 'Explains the concept of entropy in a data set, demonstrating how it measures impurity and its impact on values, with a detailed calculation of information gain for decision tree construction, and a demonstration of a confusion matrix for model accuracy assessment.', 'duration': 958.06, 'highlights': ['Entropy measures the impurity in a data set, with a value of 1 representing the highest impurity (an even 50/50 split) and 0 representing a perfectly pure set, and is calculated using the formula -p(event) * log(p(event)) - p(another event) * log(p(another event)). Entropy concept and calculation, with values ranging from 0 to 1, demonstrating the measure of impurity in a data set.', 'Information gain is computed as the difference between the entropy of the sample and the weighted average of the entropy of each feature, aiming to decide the best feature for decision tree construction. 
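The entropy and information-gain formulas described above can be checked with a short sketch; the helper names `entropy` and `information_gain` are mine, not from the video:

```python
import math

def entropy(labels):
    """Entropy = -sum(p * log2(p)) over each class present in the sample."""
    n = len(labels)
    probs = [labels.count(c) / n for c in set(labels)]
    return -sum(p * math.log2(p) for p in probs)

# A 50/50 split has the highest impurity (entropy 1); a pure set has entropy 0.
mixed = ["yes", "no", "yes", "no"]
pure = ["yes", "yes", "yes", "yes"]
print(entropy(mixed), entropy(pure))

def information_gain(parent_labels, splits):
    """Parent entropy minus the weighted average entropy of each split."""
    n = len(parent_labels)
    weighted = sum(len(s) / n * entropy(s) for s in splits)
    return entropy(parent_labels) - weighted

# Splitting the mixed sample into two pure halves yields the maximum gain of 1,
# so this split would be chosen as the best feature for the tree.
gain = information_gain(mixed, [["yes", "yes"], ["no", "no"]])
print(gain)
```

This mirrors the two-event formula in the summary: for the 50/50 sample, -0.5*log2(0.5) - 0.5*log2(0.5) = 1.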
Explanation of information gain calculation and its role in selecting the best feature for decision tree construction based on entropy values.', 'Demonstration of a confusion matrix for model accuracy assessment, showcasing the classification of predicted and expected values, and the calculation of accuracy based on the correctly predicted instances. Explanation and demonstration of confusion matrix for assessing the accuracy of a model based on predicted and expected values, with an example showcasing the calculation of accuracy.']}], 'duration': 1551.445, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN415546408.jpg', 'highlights': ['A scenario-based explanation illustrates the decision-making process based on fruit diameter and color.', 'The importance of information gain is highlighted for optimizing the decision-making process.', 'The terminologies of a decision tree, including root node, leaf node, parent, child, and branch, are emphasized for understanding the structure and functioning of a decision tree.', 'The significance of splitting, which determines the division of root nodes into child nodes, is crucial for effective decision tree construction.', 'The concept of pruning is explained, emphasizing its importance in preventing underfitting or overfitting of training data.', 'The process of manually creating a decision tree is explained, focusing on attribute selection and branching strategies.', 'Key metrics for attribute selection in decision tree algorithm, including entropy, reduction in variance, information gain, and Gini index, are discussed.', 'Entropy measures the impurity in a data set, with a value of 1 representing the highest impurity and 0 representing a perfectly pure set.', 'Information gain is computed as the difference between the entropy of the sample and the weighted average of the entropy of each feature.', 'Demonstration of a confusion matrix for model accuracy 
assessment, showcasing the classification of predicted and expected values.']}, {'end': 18741.965, 'segs': [{'end': 17179.412, 'src': 'embed', 'start': 17150.012, 'weight': 0, 'content': [{'end': 17153.236, 'text': 'Before that we will see the example of the decision tree hands on.', 'start': 17150.012, 'duration': 3.224}, {'end': 17163.559, 'text': 'Okay, so we have a banknote data set.', 'start': 17161.758, 'duration': 1.801}, {'end': 17167.002, 'text': 'Okay, it will be classifying the banknotes as true, false like that.', 'start': 17163.86, 'duration': 3.142}, {'end': 17169.184, 'text': 'Okay, this is a banknote data set.', 'start': 17167.322, 'duration': 1.862}, {'end': 17173.247, 'text': 'Okay, now we are going to define a function to load the data.', 'start': 17169.584, 'duration': 3.663}, {'end': 17175.328, 'text': 'So this is nothing but pd.read_csv.', 'start': 17173.267, 'duration': 2.061}, {'end': 17177.53, 'text': 'Okay, this is our load function we have defined.', 'start': 17175.769, 'duration': 1.761}, {'end': 17179.412, 'text': 'You can go for pd.read_csv also.', 'start': 17177.57, 'duration': 1.842}], 'summary': 'Hands-on decision tree example using banknote dataset for classification.', 'duration': 29.4, 'max_score': 17150.012, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN417150012.jpg'}, {'end': 17248.347, 'src': 'embed', 'start': 17220.647, 'weight': 3, 'content': [{'end': 17224.368, 'text': 'Okay, this is for creating cross validation set.', 'start': 17220.647, 'duration': 3.721}, {'end': 17228.73, 'text': "What is cross validation set? 
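The load-the-data step just described is, as the speaker says, nothing but pd.read_csv. Since the banknote file itself is not bundled here, a small in-memory CSV stands in, and the string-to-float conversion the example needs is shown with .astype(float):

```python
import io
import pandas as pd

# Stand-in for the banknote CSV; the real file path isn't shown in the video,
# so a tiny in-memory sample with the same structure is used instead.
csv_data = io.StringIO("variance,skewness,class\n3.62,8.66,0\n-1.39,-4.87,1\n")

def load_csv(source):
    """Our load function -- this is nothing but pd.read_csv.

    dtype=str forces every column in as a string, mimicking the situation in
    the video where string columns must be converted to float afterwards.
    """
    return pd.read_csv(source, dtype=str)

df = load_csv(csv_data)

# Convert the string columns to float so the decision tree can use them.
df["variance"] = df["variance"].astype(float)
df["skewness"] = df["skewness"].astype(float)
print(df.dtypes)
```

The column names here are the usual banknote-authentication features, but they are assumptions for the sketch rather than a transcription of the video's file.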
Let's say you have thousand rows of data to process.", 'start': 17224.428, 'duration': 4.302}, {'end': 17232.152, 'text': "You don't feed the entire data to the model at once.", 'start': 17229.15, 'duration': 3.002}, {'end': 17237.697, 'text': 'Okay, you just provide the data to the model in folds, multiple folds.', 'start': 17232.512, 'duration': 5.185}, {'end': 17238.838, 'text': 'what that would do?', 'start': 17237.697, 'duration': 1.141}, {'end': 17246.446, 'text': 'that would help you in getting the most out of the model, because in the first fold, whatever mistakes it makes, it will learn,', 'start': 17238.838, 'duration': 7.608}, {'end': 17248.347, 'text': 'it will go for the second step, like that.', 'start': 17246.446, 'duration': 1.901}], 'summary': 'Cross validation involves feeding data in folds to gain the most from the model.', 'duration': 27.7, 'max_score': 17220.647, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN417220647.jpg'}, {'end': 17641.987, 'src': 'embed', 'start': 17597.098, 'weight': 2, 'content': [{'end': 17597.918, 'text': 'Okay All the folds.', 'start': 17597.098, 'duration': 0.82}, {'end': 17599.398, 'text': 'All the 5, 7, 5, 6 folds.', 'start': 17597.978, 'duration': 1.42}, {'end': 17605.16, 'text': 'So see how the accuracy started to slowly improve throughout the folds.', 'start': 17599.818, 'duration': 5.342}, {'end': 17606.981, 'text': 'Right How it slowly improved.', 'start': 17605.58, 'duration': 1.401}, {'end': 17609.982, 'text': 'And the mean accuracy that we are getting is 97.299.', 'start': 17607.361, 'duration': 2.621}, {'end': 17614.103, 'text': "Okay So that's how this decision tree works in the real world.", 'start': 17609.982, 'duration': 4.121}, {'end': 17616.284, 'text': 'So again this is not how you gonna use it.', 'start': 17614.423, 'duration': 1.861}, {'end': 17617.644, 'text': 'You gonna use it with scikit-learn.', 'start': 
17616.324, 'duration': 1.32}, {'end': 17618.984, 'text': 'So just try one thing.', 'start': 17617.984, 'duration': 1}, {'end': 17620.685, 'text': "So that's how you can do it.", 'start': 17619.565, 'duration': 1.12}, {'end': 17623.26, 'text': 'Now, moving on to the Gini index.', 'start': 17621.599, 'duration': 1.661}, {'end': 17629.562, 'text': 'Gini index is a measure, again a measure that is used, like in CART models; that is a metric that is used.', 'start': 17623.26, 'duration': 6.302}, {'end': 17631.143, 'text': 'it is similar to information gain.', 'start': 17629.562, 'duration': 1.581}, {'end': 17635.264, 'text': 'so information gain and Gini index, either one you can use.', 'start': 17631.143, 'duration': 4.121}, {'end': 17641.267, 'text': 'but for CART models, basically we use both, but in some places we use Gini index only.', 'start': 17635.264, 'duration': 6.003}, {'end': 17641.987, 'text': 'so it is.', 'start': 17641.267, 'duration': 0.72}], 'summary': 'Decision tree achieved mean accuracy of 97.299, improving throughout the folds.', 'duration': 44.889, 'max_score': 17597.098, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN417597098.jpg'}, {'end': 18409.794, 'src': 'embed', 'start': 18382.045, 'weight': 5, 'content': [{'end': 18387.909, 'text': 'so, according to Bayes theorem, 0.07 into 0.1, divided by 0.05, so that is 0.14.', 'start': 18382.045, 'duration': 5.864}, {'end': 18394.733, 'text': 'that means so, if the patient is alcoholic, then their chances of having liver disease is 14%.', 'start': 18387.909, 'duration': 6.824}, {'end': 18397.075, 'text': "okay. 
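As the speaker notes, in practice you would do this fold-by-fold evaluation with scikit-learn rather than by hand. A minimal sketch, using the built-in iris data as a stand-in for the banknote CSV:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# The banknote CSV isn't bundled here, so the bundled iris data stands in
# to show the same fold-by-fold evaluation idea.
X, y = load_iris(return_X_y=True)

# Evaluate a CART decision tree (which uses the Gini index criterion)
# across 5 cross-validation folds, then average the fold accuracies --
# the same "mean accuracy over all folds" figure the video reports.
model = DecisionTreeClassifier(criterion="gini", random_state=0)
scores = cross_val_score(model, X, y, cv=5)
print(scores, scores.mean())
```

Each entry of `scores` is one fold's accuracy; the exact numbers depend on the data set, so the 97.299 figure from the banknote example will not be reproduced here.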
so that's what it is.", 'start': 18394.733, 'duration': 2.342}, {'end': 18401.069, 'text': "that's what we find out using Bayes theorem.", 'start': 18398.127, 'duration': 2.942}, {'end': 18403.97, 'text': "okay, that's what we find out using Bayes theorem.", 'start': 18401.069, 'duration': 2.901}, {'end': 18405.491, 'text': 'so next, slide.', 'start': 18403.97, 'duration': 1.521}, {'end': 18408.273, 'text': 'it shows you already the thing that I have shown.', 'start': 18405.491, 'duration': 2.782}, {'end': 18409.794, 'text': 'so probability of A by B.', 'start': 18408.273, 'duration': 1.521}], 'summary': 'Using Bayes theorem, the chance of an alcoholic patient having liver disease is 14%.', 'duration': 27.749, 'max_score': 18382.045, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN418382045.jpg'}], 'start': 17097.853, 'title': 'Implementing machine learning algorithms', 'summary': 'Covers confusion matrix creation, decision tree implementation, and cross validation sets, achieving a mean accuracy of 97.299, also discussing naive Bayes classifier, Bayes theorem, and its application with a 14% result in determining liver disease probability.', 'chapters': [{'end': 17220.207, 'start': 17097.853, 'title': 'Confusion matrix and decision tree', 'summary': 'Discusses the creation of confusion matrices for evaluating model performance, with a specific example of classifying banknotes using a decision tree and converting string columns to float for analysis, using a function to load the data and calculating Gini index.', 'duration': 122.354, 'highlights': ['The creation of confusion matrices for evaluating model performance, with a specific example of classifying banknotes using a decision tree and converting string columns to float for analysis.', 'Demonstration of loading banknote dataset using a function, pd.read_csv, and the necessity of converting string columns to float for decision tree analysis.', 'Explanation of 
Gini index calculations and the use of scikit-learn for working with CSV files.']}, {'end': 17618.984, 'start': 17220.647, 'title': 'Implementing cross validation sets', 'summary': 'Explains the concept of cross validation sets and the process of creating, evaluating, and predicting with decision tree algorithm, achieving a mean accuracy of 97.299.', 'duration': 398.337, 'highlights': ['The mean accuracy achieved is 97.299 The mean accuracy of the decision tree algorithm achieved after evaluating all folds.', 'Explaining the process of creating cross validation sets The process of splitting the data into cross validation sets to feed to the model in folds, allowing the model to learn from its mistakes and improve.', 'The manual implementation of the decision tree algorithm Demonstrating the manual implementation of the decision tree algorithm and its components, emphasizing the option of using scikit-learn for simplicity.']}, {'end': 18310.177, 'start': 17619.565, 'title': 'Decision tree, naive Bayes, and Bayes theorem', 'summary': 'Covers the concepts of decision trees, naive Bayes classifier, including its limitations and use cases, and the fundamentals of Bayes theorem with examples, emphasizing conditional probability and its application in data science.', 'duration': 690.612, 'highlights': ['The Gini index is a metric used in decision trees, similar to information gain, and is based on weighted average, with both metrics being utilized in CART models, albeit the Gini index being preferred in some cases.', 'The naive Bayes classifier assumes independence of features and is used in cases with prior knowledge, but may fall short in feature engineering and prediction accuracy, although it can be effective in scenarios with historical data and limitations of sophistication and experience.', 'Bayes theorem is applied to calculate conditional probabilities, enabling the calculation of reverse probabilities and coming into play when events are unrelated and independent, 
requiring historical knowledge for its application in algorithms.']}, {'end': 18741.965, 'start': 18310.497, 'title': 'Bayes theorem application', 'summary': 'Discusses the application of Bayes theorem in determining the probability of a patient having liver disease if they are alcoholic, yielding a 14% result.', 'duration': 431.468, 'highlights': ['The chapter discusses the application of Bayes theorem in determining the probability of a patient having liver disease if they are alcoholic, yielding a 14% result. Application of Bayes theorem in medical scenario, calculation yielding a 14% probability result.', 'Probability of a patient who is alcoholic having a liver disease is calculated using Bayes theorem as 0.07 into 0.1, divided by 0.05, resulting in 0.14, indicating a 14% chance of having liver disease if the patient is alcoholic. Calculation of probability of a patient being alcoholic and having liver disease using Bayes theorem, resulting in a 14% chance of having liver disease if the patient is alcoholic.', 'Calculation of probability of playing a game in ideal condition by multiplying values, yielding a value of 0.033, indicating a 3.3% probability. 
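The liver-disease calculation above is plain Bayes theorem arithmetic; the mapping of 0.07, 0.1, and 0.05 onto the individual probabilities is my reading of the transcript, not something the video states explicitly:

```python
# Bayes theorem: P(A|B) = P(B|A) * P(A) / P(B).
# Assumed mapping of the transcript's numbers onto the probabilities:
p_alcoholic_given_disease = 0.07  # P(alcoholic | liver disease), i.e. P(B|A)
p_disease = 0.10                  # P(liver disease), i.e. P(A)
p_alcoholic = 0.05                # P(alcoholic), i.e. P(B)

# "0.07 into 0.1, divided by 0.05" -> 0.14, i.e. a 14% chance of liver
# disease given that the patient is alcoholic.
p_disease_given_alcoholic = p_alcoholic_given_disease * p_disease / p_alcoholic
print(p_disease_given_alcoholic)
```

Working it through: 0.07 × 0.1 = 0.007, and 0.007 / 0.05 = 0.14, so the reverse probability comes out to 14%.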
Calculation of probability of playing a game in ideal condition, resulting in a 3.3% probability.']}], 'duration': 1644.112, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN417097853.jpg', 'highlights': ['Demonstration of loading banknote dataset using a function, pd.read_csv, and the necessity of converting string columns to float for decision tree analysis.', 'The creation of confusion matrices for evaluating model performance, with a specific example of classifying banknotes using a decision tree and converting string columns to float for analysis.', 'The mean accuracy achieved is 97.299 The mean accuracy of the decision tree algorithm achieved after evaluating all folds.', 'Explaining the process of creating cross validation sets The process of splitting the data into cross validation sets to feed to the model in folds, allowing the model to learn from its mistakes and improve.', 'The Gini index is a metric used in decision trees, similar to information gain, and is based on weighted average, with both metrics being utilized in CART models, albeit the Gini index being preferred in some cases.', 'The chapter discusses the application of Bayes theorem in determining the probability of a patient having liver disease if they are alcoholic, yielding a 14% result.']}, {'end': 21669.328, 'segs': [{'end': 18845.832, 'src': 'embed', 'start': 18805.757, 'weight': 0, 'content': [{'end': 18808.739, 'text': 'so cluster means what are the cluster?', 'start': 18805.757, 'duration': 2.982}, {'end': 18809.8, 'text': 'what is clustering?', 'start': 18808.739, 'duration': 1.061}, {'end': 18816.884, 'text': 'clustering means to divide a data set into groups consisting of similar data points.', 'start': 18809.8, 'duration': 7.084}, {'end': 18826.164, 'text': 'so that means cricketers, footballers, hockey players, lawn tennis players.', 'start': 18816.884, 'duration': 9.28}, {'end': 18828.385, 'text': 'they are all players.', 
'start': 18826.164, 'duration': 2.221}, {'end': 18830.346, 'text': 'but we can cluster them like that.', 'start': 18828.385, 'duration': 1.961}, {'end': 18845.832, 'text': 'So all the cricketers will have the same kind of attributes, same kind of nature that they will be playing cricket for their country.', 'start': 18831.586, 'duration': 14.246}], 'summary': 'Clustering divides data set into similar groups, like categorizing cricketers, footballers, hockey players, and lawn tennis players based on their attributes and nature.', 'duration': 40.075, 'max_score': 18805.757, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN418805757.jpg'}, {'end': 19097.073, 'src': 'embed', 'start': 19066.387, 'weight': 1, 'content': [{'end': 19068.208, 'text': 'So that they might buy that.', 'start': 19066.387, 'duration': 1.821}, {'end': 19072.73, 'text': 'So content filtering based on the content, collaborative filtering based on the customer or target group.', 'start': 19068.268, 'duration': 4.462}, {'end': 19074.311, 'text': "So that's how you can think of.", 'start': 19073.15, 'duration': 1.161}, {'end': 19078.873, 'text': 'So these are also a few examples of clustering that is being used in the real world.', 'start': 19074.631, 'duration': 4.242}, {'end': 19082.706, 'text': 'This kind of clustering is being used in the real world.', 'start': 19079.965, 'duration': 2.741}, {'end': 19089.73, 'text': 'So Amazon, Netflix, Flickr and all those places this clustering is used.', 'start': 19084.607, 'duration': 5.123}, {'end': 19093.611, 'text': 'Now we can see there are three types of clustering.', 'start': 19090.53, 'duration': 3.081}, {'end': 19097.073, 'text': 'Exclusive, overlapping and hierarchical.', 'start': 19094.492, 'duration': 2.581}], 'summary': 'Real-world examples of content and collaborative filtering used by amazon, netflix, and flickr, employing three types of clustering.', 'duration': 30.686, 'max_score': 
19066.387, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN419066387.jpg'}, {'end': 19588.79, 'src': 'embed', 'start': 19562.807, 'weight': 2, 'content': [{'end': 19568.791, 'text': 'So how much of which item is needed and in what quantity, we can take into account all those things.', 'start': 19562.807, 'duration': 5.984}, {'end': 19570.433, 'text': 'And bots and anomalies.', 'start': 19569.332, 'duration': 1.101}, {'end': 19579.725, 'text': "we can directly segregate the human behavior and the bot's behavior if a bot enters our system.", 'start': 19570.433, 'duration': 9.292}, {'end': 19584.228, 'text': 'so these, in these cases, k-means algorithm is really very helpful.', 'start': 19579.725, 'duration': 4.503}, {'end': 19588.79, 'text': "to cluster these things, these clusters mean a lot, okay, so that's how it works.", 'start': 19584.228, 'duration': 4.562}], 'summary': 'K-means algorithm helps segregate human and bot behavior, aiding in item quantity analysis.', 'duration': 25.983, 'max_score': 19562.807, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN419562807.jpg'}, {'end': 20366.121, 'src': 'embed', 'start': 20339.328, 'weight': 3, 'content': [{'end': 20345.99, 'text': 'okay, so the best number of clusters that can exist in a data set is near this elbow point.', 'start': 20339.328, 'duration': 6.662}, {'end': 20349.191, 'text': "okay, so that's where, when you see.", 'start': 20345.99, 'duration': 3.201}, {'end': 20350.771, 'text': 'so what do we plot?', 'start': 20349.191, 'duration': 1.58}, {'end': 20351.591, 'text': 'well,', 'start': 20350.771, 'duration': 0.82}, {'end': 20358.614, 'text': 'we just plot the total distortion or the total variation in the data set with the number of clusters.', 'start': 20351.591, 'duration': 7.023}, {'end': 20366.121, 'text': 'now, if we keep on plotting this, it 
will bend at some point, and this bend is the ideal number of clusters.', 'start': 20358.614, 'duration': 7.507}], 'summary': 'The ideal number of clusters is determined by plotting total distortion with number of clusters at the elbow point.', 'duration': 26.793, 'max_score': 20339.328, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN420339328.jpg'}, {'end': 21581.989, 'src': 'embed', 'start': 21555.326, 'weight': 4, 'content': [{'end': 21559.248, 'text': "okay, so that's the idea, and this is how you can.", 'start': 21555.326, 'duration': 3.922}, {'end': 21559.708, 'text': 'you can.', 'start': 21559.248, 'duration': 0.46}, {'end': 21562.537, 'text': 'you can use k-means from scikit-learn.', 'start': 21559.708, 'duration': 2.829}, {'end': 21570.662, 'text': 'so import that k-means module from scikit-learn, call a k-means function with the number of intended clusters that you want to create.', 'start': 21562.537, 'duration': 8.125}, {'end': 21573.904, 'text': 'then you just fit the model into k-means.', 'start': 21570.662, 'duration': 3.242}, {'end': 21575.885, 'text': 'okay, with the x data.', 'start': 21573.904, 'duration': 1.981}, {'end': 21581.989, 'text': 'then you create the labels for those, like you predict for the x data, and then you print them.', 'start': 21575.885, 'duration': 6.104}], 'summary': 'Use k-means from scikit-learn to create clusters.', 'duration': 26.663, 'max_score': 21555.326, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN421555326.jpg'}], 'start': 18742.425, 'title': 'Clustering in machine learning', 'summary': 'Covers the concept of conditional probability and bayes theorem with a 73% chance of playing under ideal conditions. it discusses unsupervised learning algorithms, particularly clustering, with a focus on exclusive clustering and its real-world applications. 
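The scikit-learn steps just described (import KMeans, call it with the intended number of clusters, fit on the X data, predict the labels) look roughly like this; the toy points and the elbow check via inertia_ are illustrative stand-ins for the video's data:

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy 2-D data with three obvious groups; the video's data set isn't
# reproduced here, so these points are illustrative.
X = np.array([[1, 1], [1.5, 1], [1, 1.5],
              [8, 8], [8.5, 8], [8, 8.5],
              [1, 8], [1.5, 8], [1, 8.5]])

# The steps described above: call KMeans with the intended number of
# clusters, fit it on the X data, then predict (create) the labels.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
kmeans.fit(X)
labels = kmeans.predict(X)
print(labels)

# Elbow idea: the total distortion (inertia_) drops as k grows, and the
# curve bends near the true number of clusters.
distortions = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
               for k in range(1, 6)]
print(distortions)
```

Plotting `distortions` against k would show the bend (the elbow) at k = 3 for this data, matching the three groups we built in.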
it introduces the k-means algorithm, explaining its steps and application, as well as its implementation in python using custom and scikit-learn algorithms.', 'chapters': [{'end': 18862.937, 'start': 18742.425, 'title': 'Bayes theorem and unsupervised learning', 'summary': 'Discusses the concept of conditional probability, with a 73% chance of playing under ideal conditions, and explains the workings of bayes theorem. it then delves into unsupervised learning algorithms, particularly clustering, which involves dividing a dataset into groups of similar data points, such as cricket players, footballers, hockey players, and lawn tennis players.', 'duration': 120.512, 'highlights': ['The concept of conditional probability is explained, with a 73% chance of playing under ideal conditions.', 'The workings of Bayes theorem are discussed, providing insight into its application.', 'Unsupervised learning algorithms, particularly clustering, are detailed, emphasizing the division of datasets into groups of similar data points like cricket players, footballers, hockey players, and lawn tennis players.']}, {'end': 19562.467, 'start': 18862.937, 'title': 'Clustering in machine learning', 'summary': 'Discusses the concept of clustering in machine learning, including use cases in real-world applications like amazon and netflix, and the types of clustering including exclusive, overlapping, and hierarchical, with a focus on exclusive clustering in data science.', 'duration': 699.53, 'highlights': ['Clustering is used in real-world applications like Amazon and Netflix for recommendation engines, which categorize products or movies into specific groups based on customer preferences. 
Real-world applications: Amazon, Netflix; Use case: Recommendation engines; Categorization of products/movies based on customer preferences', 'Exclusive clustering is a type of clustering where each data point exists in only one of the clusters, and it is commonly used in data science, unlike overlapping and hierarchical clustering. Type of clustering: Exclusive; Data point existence in one cluster; Commonly used in data science', "K-means clustering is an example of exclusive clustering algorithm, which focuses on grouping similar data points into a specific number of clusters, with 'K' representing the number of clusters. Clustering algorithm: K-means; Focus on grouping similar data points; 'K' represents the number of clusters", 'Real-world applications of K-means clustering include behavioral segmentation, sorting sensor measurements, detecting bots, and inventory categorization. Real-world applications: Behavioral segmentation, sorting sensor measurements, detecting bots, inventory categorization']}, {'end': 20220.816, 'start': 19562.807, 'title': 'Understanding k-means algorithm', 'summary': 'Introduces the k-means algorithm, explaining its steps and application through an example, highlighting the process of identifying the number of clusters and determining the value of k through hit and trial.', 'duration': 658.009, 'highlights': ["The K-means algorithm is used to segregate human behavior and bot behavior in a system by clustering data, with the example demonstrating the process of identifying clusters and determining the number of clusters, which is crucial for the algorithm's application.", 'The process of K-means algorithm involves selecting a random number of clusters, calculating distances and assigning data points to the nearest cluster, and iteratively updating the cluster centroids until convergence, with the example emphasizing the step-by-step process of clustering data points and updating the cluster centroids.', 'Determining the value of k in the 
K-means algorithm is described as an interesting challenge, addressed through hit and trial method by starting with k=1 and gradually increasing the value to observe the clustering results, showcasing the iterative approach to identifying the optimal number of clusters.']}, {'end': 20975.19, 'start': 20220.816, 'title': 'Understanding k-means clustering', 'summary': 'Discusses the k-means algorithm for clustering, emphasizing the iterative process of calculating total variation and ideal cluster number using the elbow point in the cost function graph, as well as the convergence criteria. it also covers the challenges k-means faces and the concept of scree plot in factor analysis.', 'duration': 754.374, 'highlights': ['The chapter discusses the iterative process of calculating total variation and ideal cluster number using the elbow point in the cost function graph. Iterative process, total variation calculation, ideal cluster number determination, elbow point in the cost function graph', 'It covers the convergence criteria of the K-means algorithm and the challenges it faces, such as outliers causing distorted clusters and non-convex data distributions impairing clustering. Convergence criteria, challenges (outliers, non-convex data distributions)', 'The concept of scree plot in factor analysis was briefly introduced, highlighting its use in multivariate statistics for plotting eigenvalues of factors against the factors or principles. 
Scree plot, factor analysis, multivariate statistics, eigenvalues plotting']}, {'end': 21669.328, 'start': 20975.19, 'title': 'K-means clustering in python', 'summary': 'Covers the implementation of k-means clustering in python using a custom algorithm and the scikit-learn library, demonstrating the clustering of data points into three distinct clusters and the comparison of results with hand-built and scikit-learn algorithms.', 'duration': 694.138, 'highlights': ['The chapter covers the implementation of K-means clustering in Python using a custom algorithm and the scikit-learn library. Demonstrates the use of both a custom algorithm and the scikit-learn library for K-means clustering in Python.', 'Data points are clustered into three distinct clusters, as demonstrated using a scatter plot. Shows the clustering of data points into three distinct clusters using a scatter plot.', 'Demonstrates the comparison of results between hand-built and scikit-learn algorithms for K-means clustering. 
Compares the results obtained from the custom algorithm and the scikit-learn library for K-means clustering.']}], 'duration': 2926.903, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/mNdbcHECGN4/pics/mNdbcHECGN418742425.jpg', 'highlights': ['Unsupervised learning algorithms, particularly clustering, are detailed, emphasizing the division of datasets into groups of similar data points like cricket players, footballers, hockey players, and lawn tennis players.', 'Clustering is used in real-world applications like Amazon and Netflix for recommendation engines, which categorize products or movies into specific groups based on customer preferences.', "The K-means algorithm is used to segregate human behavior and bot behavior in a system by clustering data, with the example demonstrating the process of identifying clusters and determining the number of clusters, which is crucial for the algorithm's application.", 'The chapter discusses the iterative process of calculating total variation and ideal cluster number using the elbow point in the cost function graph.', 'The chapter covers the implementation of K-means clustering in Python using a custom algorithm and the scikit-learn library.']}], 'highlights': ['Pandas as an open source, simple and powerful Python library, created by Wes McKinney in 2008', 'Features of Pandas: series object, data frame, handling missing data, data alignment, group by functionality, slicing, indexing, subsetting, merging, joining, reshaping, hierarchical labeling of axis, input/output tool, time series specific functionality', 'Pandas superiority over NumPy for data with more than 500,000 rows, flexibility in defining label index', 'Suitability of Pandas for tabular data, time series data, arbitrary matrix data', 'Pandas capability to work with one-dimensional, two-dimensional, or multi-dimensional data sets', 'Demonstrating inner, left, and right merge operations, showcasing the differences in resulting values', 
'Explaining the process of data cleaning, including renaming columns, filling null values with the mean, dropping unwanted columns, and finding the correlation matrix to identify variable relationships', "The removal of null values from the 'qsec' attribute by replacing them with the mean is highlighted, resulting in the elimination of 32 null values from the dataset", "The process of converting the 'mpg' attribute from string to float is explained, enabling data manipulation and calculation on the attribute, which is essential for further analysis", 'Machine learning is a subset of artificial intelligence which focuses mainly on machines learning from their experience. Emphasizes the focus of machine learning on learning from experience', 'Linear regression involves establishing relationships between variables to predict continuous outcomes such as housing prices, while classification deals with categorical outcomes like spam or not spam, with emphasis on understanding the difference between independent and dependent variables', 'The equation y=mx+c is fundamental in machine learning for predicting values of data points. 
'The equation y=mx+c is used to predict the values of data points in machine learning when the model is deployed', 'The distinction between continuous and categorical variables is crucial, where continuous variables can take on any value, while categorical variables are limited to specific categories, as illustrated by examples of housing price and survival data sets', "The process of splitting the dataset into 80% for training and 20% for testing is essential, with 404 values for training and 102 values for testing, ensuring a robust evaluation of the model's performance", 'Logistic regression is necessary for handling categorical problems where the data is skewed, such as tumor prediction and spam classification', 'The sigmoid function, an asymptotic curve, is introduced as a suitable model for classification problems, asymptoting to one at positive infinity and to zero at negative infinity', 'The maximum likelihood estimator is used to find the best fitted curve using log odds to predict values for logistic regression', 'Understanding the concept of maximum likelihood estimator is crucial for machine learning problems, particularly in logistic regression', 'A scenario-based explanation illustrates the decision-making process based on fruit diameter and color', 'Unsupervised learning algorithms, particularly clustering, are detailed, emphasizing the division of datasets into groups of similar data points like cricket players, footballers, hockey players, and lawn tennis players', 'Clustering is used in real-world applications like Amazon and Netflix for recommendation engines, which categorize products or movies into specific groups based on customer preferences']}
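The y=mx+c and 80/20-split highlights above can be tied together in one sketch. The synthetic data (m=3, c=2, and 506 rows so the split reproduces the 404/102 figures, which match a Boston-housing-sized data set) is my stand-in, not the video's actual data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data following y = m*x + c with m=3, c=2 (illustrative values),
# plus a little noise; 506 rows mimics the data-set size implied by the
# 404 training / 102 test split mentioned in the highlights.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(506, 1))
y = 3 * X[:, 0] + 2 + rng.normal(0, 0.1, size=506)

# 80% of the rows train the model, 20% test it.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Fitting linear regression recovers m (coef_) and c (intercept_), which the
# deployed model then uses to predict values for new data points.
model = LinearRegression().fit(X_train, y_train)
print(model.coef_[0], model.intercept_, model.score(X_test, y_test))
```

Because the data was generated from y = 3x + 2, the fitted slope and intercept land very close to 3 and 2, and the R² score on the held-out 20% is near 1.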