title
AWS Glue Tutorial | Getting Started with AWS Glue ETL | AWS Tutorial for Beginners | Edureka

description
🔥Edureka AWS Architect Certification Training: https://www.edureka.co/aws-certification-training This Edureka video on AWS Glue talks about the features and benefits of AWS Glue. It shows how AWS Glue is a simple and cost-effective ETL service for data analytics. The topics that we will cover in this session are as follows: · What is AWS Glue? · When do you use AWS Glue? · AWS Glue Benefits · AWS Glue Concepts · AWS Glue Terminologies · How does AWS Glue work? Subscribe to our channel to get video updates. Hit the subscribe button above: https://goo.gl/6ohpTV ------------------------------------------------------------------------------------------------ Edureka Community: https://bit.ly/EdurekaCommunity Instagram: https://www.instagram.com/edureka_learning/ Facebook: https://www.facebook.com/edurekaIN/ Twitter: https://twitter.com/edurekain LinkedIn: https://www.linkedin.com/company/edureka Telegram: https://t.me/edurekaupdates SlideShare: https://www.slideshare.net/EdurekaIN #edureka #EdurekaAWS #AWSGlue #awsgluetutorial #awstraining #cloudcomputing ------------------------------------------------------------------------------------------------ How does it work? 1. This is a 5 Week Instructor-led Online Course. 2. The course consists of 30 hours of online classes, 30 hours of assignments, 20 hours of project work. 3. We have a 24x7 One-on-One LIVE Technical Support to help you with any problems you might face or any clarifications you may require during the course. 4. You will get Lifetime Access to the recordings in the LMS. 5. At the end of the training, you will have to complete the project, based on which we will provide you a Verifiable Certificate! ------------------------------------------------------------------------------------------------ About the Course AWS Architect Certification Training from Edureka is designed to provide in-depth knowledge about Amazon AWS architectural principles and its components. 
The sessions will be conducted by industry practitioners who will train you to leverage AWS services to make the AWS cloud infrastructure scalable, reliable, and highly available. This course is completely aligned to the AWS Architect Certification - Associate Level exam conducted by Amazon Web Services. During this AWS Architect Online training, you'll learn: 1. AWS Architecture and different models of Cloud Computing 2. Compute Services: Amazon EC2, Auto Scaling, and Load Balancing, AWS Lambda, Elastic Beanstalk 3. Amazon Storage Services: EBS, S3, Glacier, CloudFront, Snowball, Storage Gateway 4. Database Services: RDS, DynamoDB, ElastiCache, Redshift 5. Security and Identity Services: IAM, KMS 6. Networking Services: Amazon VPC, Route 53, Direct Connect 7. Management Tools: CloudTrail, CloudWatch, CloudFormation, OpsWorks, Trusted Advisor 8. Application Services: SES, SNS, SQS ------------------------------------------------------------------------------------------------ Course Objectives On completion of the AWS Architect Certification Training, the learner will be able to: 1. Design and deploy scalable, highly available, and fault-tolerant systems on AWS 2. Understand the lift and shift of an existing on-premises application to AWS 3. Manage ingress and egress of data to and from AWS 4. Identify appropriate use of AWS architectural best practices 5. Estimate AWS costs and identify cost control mechanisms ------------------------------------------------------------------------------------------------ Prerequisites There are no specific prerequisites for this course. Any professional who has an understanding of IT Service Management can join this training. There is no programming knowledge needed and no prior AWS experience required. 
------------------------------------------------------------------------------------------------ If you are looking for live online training, write back to us at sales@edureka.in or call us at the US: + 18338555775 (Toll-Free) or India: +91 9606058406 for more information.
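The video's core topic is the extract, transform, load (ETL) pattern that AWS Glue automates. As a minimal, self-contained illustration, here is a plain-Python sketch of an ETL pass over a small movie table like the IMDb-style sample used in the video's demo. The sample rows and function names are made up for illustration; nothing below calls AWS Glue itself.

```python
import csv
import io

# Hypothetical sample mirroring the demo's data set: rank, title, year, rating.
RAW_CSV = """rank,title,year,rating
1,The Shawshank Redemption,1994,9.3
2,The Godfather,1972,9.2
3,The Dark Knight,2008,9.0
"""

def extract(text):
    """Extract: parse CSV text into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: cast string fields to numeric types and keep highly rated titles."""
    out = []
    for row in rows:
        row = {**row, "year": int(row["year"]), "rating": float(row["rating"])}
        if row["rating"] >= 9.0:
            out.append(row)
    return out

def load(rows):
    """Load: serialize back to CSV, standing in for a write to a target data store."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["rank", "title", "year", "rating"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

result = load(transform(extract(RAW_CSV)))
```

In AWS Glue, the analogous transform runs as a generated PySpark or Scala script against sources and targets registered in the Data Catalog, rather than in-process like this sketch.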

detail
{'title': 'AWS Glue Tutorial | Getting Started with AWS Glue ETL | AWS Tutorial for Beginners | Edureka', 'heatmap': [{'end': 120.981, 'start': 102.412, 'weight': 0.737}, {'end': 394.852, 'start': 350.783, 'weight': 0.747}, {'end': 541.813, 'start': 446.692, 'weight': 0.822}, {'end': 872.474, 'start': 850.006, 'weight': 0.862}, {'end': 926.807, 'start': 891.79, 'weight': 0.861}, {'end': 988.21, 'start': 970.699, 'weight': 0.718}, {'end': 1132.443, 'start': 1010.911, 'weight': 0.751}], 'summary': 'This tutorial provides an in-depth introduction to aws glue, covering its etl solution for data management, key terminologies, and a practical demo showcasing the etl process using aws glue, including data organization and transformation.', 'chapters': [{'end': 90.149, 'segs': [{'end': 48.941, 'src': 'embed', 'start': 11.533, 'weight': 0, 'content': [{'end': 18.578, 'text': 'The ETL process has been designed specifically for the purposes of transferring data from its source database into a data warehouse.', 'start': 11.533, 'duration': 7.045}, {'end': 27.163, 'text': 'However, the challenges and complexities of the ETL can make it hard to implement successfully for all your Enterprise data.', 'start': 19.178, 'duration': 7.985}, {'end': 30.205, 'text': 'for this reason, Amazon has introduced AWS glue.', 'start': 27.163, 'duration': 3.042}, {'end': 31.326, 'text': 'So hello guys.', 'start': 30.786, 'duration': 0.54}, {'end': 35.869, 'text': 'This is Arvind here from Edureka and I welcome you all to this amazing session on AWS glue.', 'start': 31.366, 'duration': 4.503}, {'end': 40.372, 'text': "So before we move any further, let us have a quick look at the agenda for today's session.", 'start': 36.369, 'duration': 4.003}, {'end': 44.237, 'text': 'First will be starting with what exactly is AWS glue.', 'start': 41.215, 'duration': 3.022}, {'end': 48.941, 'text': 'Then we will talk about when do you use AWS glue moving on.', 'start': 44.758, 'duration': 4.183}], 'summary': 
'The etl process can be challenging, leading amazon to introduce aws glue for data transfer.', 'duration': 37.408, 'max_score': 11.533, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Qpv7BzOM-UI/pics/Qpv7BzOM-UI11533.jpg'}, {'end': 102.412, 'src': 'embed', 'start': 70.98, 'weight': 2, 'content': [{'end': 75.622, 'text': 'guys, pretty simple, just a reminder in case you have not yet subscribed to our YouTube channel,', 'start': 70.98, 'duration': 4.642}, {'end': 80.024, 'text': 'please do subscribe and also hit the Bell icon so that you never miss an update from edureka.', 'start': 75.622, 'duration': 4.402}, {'end': 86.708, 'text': "And also if you're someone who's looking for a course in AWS, then you can find the link for that course in the description box below.", 'start': 80.544, 'duration': 6.164}, {'end': 90.149, 'text': 'So without any further ado, let us begin with our first topic.', 'start': 87.388, 'duration': 2.761}, {'end': 91.59, 'text': 'What is AWS glue?', 'start': 90.469, 'duration': 1.121}, {'end': 102.412, 'text': 'AWS glue is a fully managed ETL that is extract, transform and load service that makes it simple and cost-effective to categorize your data,', 'start': 92.907, 'duration': 9.505}], 'summary': 'Reminder to subscribe to youtube channel, find aws course link, and start aws glue topic.', 'duration': 31.432, 'max_score': 70.98, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Qpv7BzOM-UI/pics/Qpv7BzOM-UI70980.jpg'}], 'start': 11.533, 'title': 'Introduction to aws glue', 'summary': 'Introduces aws glue as a solution for etl challenges, outlining its agenda, benefits, and concepts, and emphasizing a demo for practical understanding.', 'chapters': [{'end': 90.149, 'start': 11.533, 'title': 'Introduction to aws glue', 'summary': 'Introduces aws glue as a solution for etl challenges, outlining its agenda, benefits, and concepts, and emphasizing a demo for practical understanding.', 
'duration': 78.616, 'highlights': ['The chapter discusses the challenges and complexities of ETL processes and introduces AWS Glue as a solution, emphasizing its purpose of transferring data from source databases into a data warehouse.', 'Arvind from Edureka presents the agenda for the session, which includes an introduction to AWS Glue, its benefits, concepts, and a practical demo to enhance understanding.', 'The session emphasizes the importance of subscribing to the Edureka YouTube channel and provides a link for an AWS course in the description box below.']}], 'duration': 78.616, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Qpv7BzOM-UI/pics/Qpv7BzOM-UI11533.jpg', 'highlights': ['Introduces AWS Glue as a solution for ETL challenges, emphasizing its purpose of transferring data from source databases into a data warehouse.', 'Arvind from Edureka presents the agenda for the session, including an introduction to AWS Glue, its benefits, concepts, and a practical demo.', 'Emphasizes the importance of subscribing to the Edureka YouTube channel and provides a link for an AWS course in the description box below.']}, {'end': 434.881, 'segs': [{'end': 122.582, 'src': 'heatmap', 'start': 90.469, 'weight': 0, 'content': [{'end': 91.59, 'text': 'What is AWS glue?', 'start': 90.469, 'duration': 1.121}, {'end': 102.412, 'text': 'AWS glue is a fully managed ETL that is extract, transform and load service that makes it simple and cost-effective to categorize your data,', 'start': 92.907, 'duration': 9.505}, {'end': 106.074, 'text': 'clean it enrich it and move it reliably between various data stores.', 'start': 102.412, 'duration': 3.662}, {'end': 112.577, 'text': 'It consists of a central metadata repository known as the AWS glue data catalog,', 'start': 106.814, 'duration': 5.763}, {'end': 120.981, 'text': 'an ETL engine that automatically generates python or Scala code and a flexible scheduler that handles dependency resolution,', 'start': 
112.577, 'duration': 8.404}, {'end': 122.582, 'text': 'job monitoring and retries.', 'start': 120.981, 'duration': 1.601}], 'summary': 'Aws glue is a fully managed etl service for data categorization, cleaning, enrichment, and movement between data stores.', 'duration': 32.113, 'max_score': 90.469, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Qpv7BzOM-UI/pics/Qpv7BzOM-UI90469.jpg'}, {'end': 326.17, 'src': 'embed', 'start': 277.907, 'weight': 1, 'content': [{'end': 287.616, 'text': 'AWS glue natively supports data stored in Amazon Aurora and other Amazon RDS engines, Amazon redshift and Amazon s3,', 'start': 277.907, 'duration': 9.709}, {'end': 294.002, 'text': 'along with common database engines and databases in your virtual private cloud running on Amazon ec2..', 'start': 287.616, 'duration': 6.386}, {'end': 299.147, 'text': 'The next benefit here is the cost-effectiveness AWS glue is serverless.', 'start': 294.743, 'duration': 4.404}, {'end': 301.99, 'text': 'There is no infrastructure to provision or manage.', 'start': 299.648, 'duration': 2.342}, {'end': 310.477, 'text': 'AWS glue handles provisioning, configuration and scaling of the resources required to run your ETL jobs on a fully managed,', 'start': 302.771, 'duration': 7.706}, {'end': 312.499, 'text': 'scaled out Apache spark environment.', 'start': 310.477, 'duration': 2.022}, {'end': 317.383, 'text': 'You pay only for the resources that you use while your jobs are running.', 'start': 313.179, 'duration': 4.204}, {'end': 320.565, 'text': 'and the third benefit here is the increased power.', 'start': 317.383, 'duration': 3.182}, {'end': 326.17, 'text': 'AWS glue automates much of the effort in building, maintaining and running ETL jobs.', 'start': 320.565, 'duration': 5.605}], 'summary': 'Aws glue supports various data sources, is cost-effective, and automates etl job efforts.', 'duration': 48.263, 'max_score': 277.907, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Qpv7BzOM-UI/pics/Qpv7BzOM-UI277907.jpg'}, {'end': 377.465, 'src': 'embed', 'start': 350.783, 'weight': 5, 'content': [{'end': 355.104, 'text': 'Let us discuss the architecture of AWS glue as you can see on the screen.', 'start': 350.783, 'duration': 4.321}, {'end': 357.505, 'text': 'This is the diagram that represents the architecture.', 'start': 355.124, 'duration': 2.381}, {'end': 367.559, 'text': 'So you define jobs in AWS glue to accomplish the work that is required to extract, transform and load data from a data source to a data target.', 'start': 358.553, 'duration': 9.006}, {'end': 377.465, 'text': 'So, if you talk about the workflow, the first step here is you define a crawler to populate your AWS data catalog with metadata table definitions.', 'start': 368.399, 'duration': 9.066}], 'summary': 'Aws glue architecture defines jobs to extract, transform, and load data from a source to a target, starting with defining a crawler to populate the data catalog.', 'duration': 26.682, 'max_score': 350.783, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Qpv7BzOM-UI/pics/Qpv7BzOM-UI350783.jpg'}, {'end': 394.852, 'src': 'heatmap', 'start': 350.783, 'weight': 0.747, 'content': [{'end': 355.104, 'text': 'Let us discuss the architecture of AWS glue as you can see on the screen.', 'start': 350.783, 'duration': 4.321}, {'end': 357.505, 'text': 'This is the diagram that represents the architecture.', 'start': 355.124, 'duration': 2.381}, {'end': 367.559, 'text': 'So you define jobs in AWS glue to accomplish the work that is required to extract, transform and load data from a data source to a data target.', 'start': 358.553, 'duration': 9.006}, {'end': 377.465, 'text': 'So, if you talk about the workflow, the first step here is you define a crawler to populate your AWS data catalog with metadata table definitions.', 'start': 368.399, 'duration': 9.066}, {'end': 382.648, 'text': 'you 
point your crawler at a data source and the crawler creates table definitions in the data catalog.', 'start': 377.465, 'duration': 5.183}, {'end': 389.91, 'text': 'In addition to table definitions the data catalog contains other metadata that is required to define ETL jobs.', 'start': 383.228, 'duration': 6.682}, {'end': 394.852, 'text': 'You use this metadata when you define a job to transform your data.', 'start': 390.53, 'duration': 4.322}], 'summary': 'Aws glue architecture defines jobs to etl data, using metadata from a data catalog.', 'duration': 44.069, 'max_score': 350.783, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Qpv7BzOM-UI/pics/Qpv7BzOM-UI350783.jpg'}], 'start': 90.469, 'title': 'Aws glue for data management', 'summary': 'Introduces aws glue, a managed etl service for simplifying data management with features such as central metadata repository, automatic code generation, and serverless architecture. it also discusses use cases, benefits, integration with aws services, cost-effectiveness, and automation of etl jobs.', 'chapters': [{'end': 133.646, 'start': 90.469, 'title': 'Aws glue: managed etl service', 'summary': 'Introduces aws glue, a fully managed etl service that simplifies data categorization, cleaning, enrichment, and movement between data stores, with a central metadata repository, automatic code generation, and a serverless architecture.', 'duration': 43.177, 'highlights': ['AWS Glue is a fully managed ETL service that simplifies data categorization, cleaning, enrichment, and movement between data stores.', 'It consists of a central metadata repository known as the AWS Glue Data Catalog, an ETL engine that automatically generates Python or Scala code, and a flexible scheduler.', 'AWS Glue is serverless, requiring no infrastructure setup or management.']}, {'end': 434.881, 'start': 134.187, 'title': 'Using aws glue for data management', 'summary': 'Discusses the various use cases of aws glue, such as building 
data warehouses, running serverless queries against amazon s3 data, creating event-driven etl pipelines, and understanding data assets. it also highlights the benefits of aws glue, including its integration with a wide range of aws services, cost-effectiveness, and automation of etl jobs, along with the core architecture and workflow.', 'duration': 300.694, 'highlights': ['Benefits of AWS Glue AWS Glue provides integration across a wide range of AWS services, such as Amazon Aurora, Amazon RDS, Amazon Redshift, and Amazon S3, making it less hassle for users. It is also cost-effective as it is serverless, with no infrastructure to provision or manage. Additionally, AWS Glue automates much of the effort in building, maintaining, and running ETL jobs, leading to increased power and efficiency.', 'Use Cases of AWS Glue AWS Glue is utilized for building data warehouses to organize, clean, validate, and format data, transforming and moving AWS cloud data into data stores, loading data from disparate sources into data warehouses for reporting and analysis, running serverless queries against Amazon S3 data, creating event-driven ETL pipelines, and understanding data assets by maintaining a unified view of data across various AWS services.', 'Architecture and Workflow of AWS Glue The architecture of AWS Glue involves defining jobs to extract, transform, and load data from a source to a target. This includes the use of crawlers to populate the AWS data catalog with metadata table definitions, generating scripts to transform data, running jobs on demand or based on triggers, and executing scripts in an Apache Spark environment. 
The workflow showcases the process of defining crawlers, generating scripts, and running jobs to extract, transform, and load data.']}], 'duration': 344.412, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Qpv7BzOM-UI/pics/Qpv7BzOM-UI90469.jpg', 'highlights': ['AWS Glue is a fully managed ETL service that simplifies data categorization, cleaning, enrichment, and movement between data stores.', 'AWS Glue provides integration across a wide range of AWS services, making it less hassle for users and is cost-effective as it is serverless.', 'AWS Glue automates much of the effort in building, maintaining, and running ETL jobs, leading to increased power and efficiency.', 'AWS Glue is serverless, requiring no infrastructure setup or management.', 'AWS Glue is utilized for building data warehouses, transforming and moving AWS cloud data into data stores, loading data from disparate sources into data warehouses for reporting and analysis, running serverless queries against Amazon S3 data, creating event-driven ETL pipelines, and maintaining a unified view of data across various AWS services.', 'The architecture of AWS Glue involves defining jobs to extract, transform, and load data from a source to a target, using crawlers to populate the AWS data catalog with metadata table definitions, generating scripts to transform data, running jobs on demand or based on triggers, and executing scripts in an Apache Spark environment.']}, {'end': 658.125, 'segs': [{'end': 541.813, 'src': 'heatmap', 'start': 446.692, 'weight': 0.822, 'content': [{'end': 449.333, 'text': 'So let us start with the first one that is the data catalog.', 'start': 446.692, 'duration': 2.641}, {'end': 454.275, 'text': 'So data catalog is nothing but the persistent metadata store in AWS glue.', 'start': 450.113, 'duration': 4.162}, {'end': 461.737, 'text': 'It contains table definitions job definitions and other control information to manage your AWS glue environment.', 'start': 
454.955, 'duration': 6.782}, {'end': 465.478, 'text': 'Each AWS account has one data catalog per region.', 'start': 462.317, 'duration': 3.161}, {'end': 471.26, 'text': 'The next term here is the classifier a classifier determines the schema of your data.', 'start': 466.238, 'duration': 5.022}, {'end': 479.376, 'text': 'AWS glue provides classifiers for common file types such as CSV, JSON, XML, Avro and so on.', 'start': 472.071, 'duration': 7.305}, {'end': 486.08, 'text': 'It also provides classifiers for common relational database management systems using a JDBC connection.', 'start': 480.176, 'duration': 5.904}, {'end': 493.245, 'text': 'You can write your own classifier by using a grok pattern or by specifying a row tag in an XML document.', 'start': 486.861, 'duration': 6.384}, {'end': 495.566, 'text': 'The third term here is the connection.', 'start': 494.025, 'duration': 1.541}, {'end': 501.51, 'text': 'So connection contains the properties that are required to connect to your data store moving on.', 'start': 496.247, 'duration': 5.263}, {'end': 507.739, 'text': 'Let us talk about the term crawler a crawler is nothing but a program that connects to a data store.', 'start': 501.934, 'duration': 5.805}, {'end': 515.845, 'text': 'Maybe a source or a target progresses through a prioritized list of classifiers to determine the schema for your data,', 'start': 508.439, 'duration': 7.406}, {'end': 519.288, 'text': 'and then it creates metadata tables in the data catalog.', 'start': 515.845, 'duration': 3.443}, {'end': 521.749, 'text': 'The next term here is the database.', 'start': 519.967, 'duration': 1.782}, {'end': 530.156, 'text': 'So database is a set of associated data catalog table definitions organized into a logical group in AWS Glue.', 'start': 522.49, 'duration': 7.666}, {'end': 537.069, 'text': 'So the next term here is the data store a data store is a repository for persistently storing your data.', 'start': 531.004, 'duration': 6.065}, {'end': 541.813,
'text': 'The examples include Amazon s3 buckets and relational databases.', 'start': 537.79, 'duration': 4.023}], 'summary': 'Aws glue includes data catalog, classifiers, connections, crawlers, databases, and data stores for managing metadata and data storage, with one data catalog per aws account per region.', 'duration': 95.121, 'max_score': 446.692, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Qpv7BzOM-UI/pics/Qpv7BzOM-UI446692.jpg'}, {'end': 479.376, 'src': 'embed', 'start': 454.955, 'weight': 0, 'content': [{'end': 461.737, 'text': 'It contains table definitions job definitions and other control information to manage your AWS glue environment.', 'start': 454.955, 'duration': 6.782}, {'end': 465.478, 'text': 'Each AWS account has one data catalog per region.', 'start': 462.317, 'duration': 3.161}, {'end': 471.26, 'text': 'The next term here is the classifier a classifier determines the schema of your data.', 'start': 466.238, 'duration': 5.022}, {'end': 479.376, 'text': 'AWS glue provides classifiers for common file types such as CSV Jason XML Avro and so on.', 'start': 472.071, 'duration': 7.305}], 'summary': "Aws glue manages data catalog, job definitions, and classifiers for common file types in each aws account's data catalog per region.", 'duration': 24.421, 'max_score': 454.955, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Qpv7BzOM-UI/pics/Qpv7BzOM-UI454955.jpg'}, {'end': 541.813, 'src': 'embed', 'start': 501.934, 'weight': 2, 'content': [{'end': 507.739, 'text': 'Let us talk about the term crawler a crawler is nothing but a program that connects to a data store.', 'start': 501.934, 'duration': 5.805}, {'end': 515.845, 'text': 'Maybe a source or a target progresses through a prioritized list of classifiers to determine the schema for your data,', 'start': 508.439, 'duration': 7.406}, {'end': 519.288, 'text': 'and then it creates metadata tables in the data catalog.', 'start': 515.845, 
'duration': 3.443}, {'end': 521.749, 'text': 'The next term here is the database.', 'start': 519.967, 'duration': 1.782}, {'end': 530.156, 'text': 'So database is a set of associated data catalog table definitions organized into a logical group in AWS Glue.', 'start': 522.49, 'duration': 7.666}, {'end': 537.069, 'text': 'So the next term here is the data store a data store is a repository for persistently storing your data.', 'start': 531.004, 'duration': 6.065}, {'end': 541.813, 'text': 'The examples include Amazon s3 buckets and relational databases.', 'start': 537.79, 'duration': 4.023}], 'summary': 'A crawler connects to a data store, while a database is a set of data catalog table definitions in aws. a data store is a repository for persistently storing data, such as amazon s3 buckets and relational databases.', 'duration': 39.879, 'max_score': 501.934, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Qpv7BzOM-UI/pics/Qpv7BzOM-UI501934.jpg'}, {'end': 658.125, 'src': 'embed', 'start': 609.703, 'weight': 4, 'content': [{'end': 611.164, 'text': 'The next term here is the script.', 'start': 609.703, 'duration': 1.461}, {'end': 618.466, 'text': 'So script contains the code that extracts data from sources transforms it and loads it into the targets.', 'start': 611.904, 'duration': 6.562}, {'end': 622.551, 'text': 'AWS glue generates PySpark or Scala scripts.', 'start': 619.309, 'duration': 3.242}, {'end': 624.912, 'text': 'So the next term here is the table.', 'start': 623.371, 'duration': 1.541}, {'end': 632.516, 'text': 'So table in AWS glue data catalog consists of the names of columns, data type definitions,', 'start': 625.612, 'duration': 6.904}, {'end': 637.338, 'text': 'partition information and other metadata about a base data set.', 'start': 632.516, 'duration': 4.822}, {'end': 643.481, 'text': 'the schema of your data is represented in AWS glue table definition and the next term here is the transform.', 'start': 637.338,
'duration': 6.143}, {'end': 652.002, 'text': 'You use the code logic to manipulate your data into a different format using the transform, and the last term here is the trigger.', 'start': 644.178, 'duration': 7.824}, {'end': 653.863, 'text': 'a trigger initiates an ETL job.', 'start': 652.002, 'duration': 1.861}, {'end': 658.125, 'text': 'You can define triggers based on a scheduled time or an event.', 'start': 654.783, 'duration': 3.342}], 'summary': 'Aws glue uses scripts to extract, transform, and load data into tables, allowing for data manipulation and etl job triggers.', 'duration': 48.422, 'max_score': 609.703, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Qpv7BzOM-UI/pics/Qpv7BzOM-UI609703.jpg'}], 'start': 435.681, 'title': 'Aws glue terminology overview', 'summary': 'Provides an overview of key aws glue terminologies, including data catalog, classifier, crawler, database, data store, data source, data target, development endpoint, job, notebook server, script, table, transform, and trigger, with emphasis on their roles and functions within the aws glue environment.', 'chapters': [{'end': 479.376, 'start': 435.681, 'title': 'Aws glue terminologies', 'summary': 'Introduces key terminologies in aws glue, including data catalog and classifier, which houses metadata and determines data schemas, with each aws account having one data catalog per region, and aws glue providing classifiers for common file types such as csv, json, xml, and avro.', 'duration': 43.695, 'highlights': ['The data catalog is the persistent metadata store in AWS glue, containing table definitions, job definitions, and other control information to manage the AWS glue environment, with each AWS account having one data catalog per region.', 'A classifier in AWS glue determines the schema of the data and provides classifiers for common file types such as CSV, JSON, XML, and Avro.']}, {'end': 658.125, 'start': 480.176, 'title': 'Aws glue terminology overview', 
'summary': 'Provides an overview of aws glue terminology, including definitions for terms such as crawler, database, data store, data source, data target, development endpoint, job, notebook server, script, table, transform, and trigger, emphasizing the roles and functions of each term within the aws glue environment.', 'duration': 177.949, 'highlights': ['crawler A program that connects to a data store, progresses through a prioritized list of classifiers, and creates metadata tables in the data catalog.', 'table Consists of column names, data type definitions, partition information, and other metadata about a base data set in the AWS Glue data catalog.', 'job Composed of a transformation script, data sources, and data targets, with job runs initiated by triggers that can be scheduled or triggered by events.', 'data store A repository for persistently storing data, including examples such as Amazon S3 buckets and relational databases.', 'trigger Initiates an ETL job, with triggers defined based on scheduled time or an event.']}], 'duration': 222.444, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Qpv7BzOM-UI/pics/Qpv7BzOM-UI435681.jpg', 'highlights': ['The data catalog is the persistent metadata store in AWS glue, containing table definitions, job definitions, and other control information to manage the AWS glue environment, with each AWS account having one data catalog per region.', 'A classifier in AWS glue determines the schema of the data and provides classifiers for common file types such as CSV, JSON, XML, and Avro.', 'A program that connects to a data store, progresses through a prioritized list of classifiers, and creates metadata tables in the data catalog.', 'Consists of column names, data type definitions, partition information, and other metadata about a base data set in the AWS Glue data catalog.', 'Composed of a transformation script, data sources, and data targets, with job runs initiated by triggers that can be 
scheduled or triggered by events.', 'A repository for persistently storing data, including examples such as Amazon S3 buckets and relational databases.', 'Initiates an ETL job, with triggers defined based on scheduled time or an event.']}, {'end': 823.641, 'segs': [{'end': 682.891, 'src': 'embed', 'start': 658.966, 'weight': 0, 'content': [{'end': 666.65, 'text': 'So these were a few terminologies that you must be aware of that are related to AWS glue now coming to the main part of this session.', 'start': 658.966, 'duration': 7.684}, {'end': 669.291, 'text': 'That is how does AWS glue works.', 'start': 667.03, 'duration': 2.261}, {'end': 674.628, 'text': 'So guys as you can see on the screen, these are the steps that we are going to perform in this demo.', 'start': 670.646, 'duration': 3.982}, {'end': 679.65, 'text': 'So let me just quickly show you how are we going to deal with this demo part here.', 'start': 675.528, 'duration': 4.122}, {'end': 682.891, 'text': 'So guys this one prerequisite here.', 'start': 680.89, 'duration': 2.001}], 'summary': 'Exploring aws glue terminologies and demo steps.', 'duration': 23.925, 'max_score': 658.966, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Qpv7BzOM-UI/pics/Qpv7BzOM-UI658966.jpg'}, {'end': 745.526, 'src': 'embed', 'start': 716.26, 'weight': 1, 'content': [{'end': 724.125, 'text': 'so for creating a bucket, you must click on this button here create bucket, so you can name your bucket whatever you want here.', 'start': 716.26, 'duration': 7.865}, {'end': 734.037, 'text': 'so say, for example glue demo bucket, edureka, okay, and just select the region here and click on create.', 'start': 724.125, 'duration': 9.912}, {'end': 741.303, 'text': 'So this will create one Amazon s3 bucket here as you can see this bucket has been created here.', 'start': 735.058, 'duration': 6.245}, {'end': 745.526, 'text': 'And in this glue demo bucket Edureka, we will create two folders.', 'start': 741.323, 
'duration': 4.203}], 'summary': "To create an amazon s3 bucket, click on 'create bucket', name it, select the region, and create. two folders will be created in the bucket.", 'duration': 29.266, 'max_score': 716.26, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Qpv7BzOM-UI/pics/Qpv7BzOM-UI716260.jpg'}, {'end': 829.523, 'src': 'embed', 'start': 803.586, 'weight': 2, 'content': [{'end': 809.491, 'text': 'So, as you can see here, there are only 10 rows in which the first column is the rank of the movie, movie title,', 'start': 803.586, 'duration': 5.905}, {'end': 813.154, 'text': 'year of release and the rating for that movie, just for the demo purpose.', 'start': 809.491, 'duration': 3.663}, {'end': 820.32, 'text': 'I have purposely taken small data set and what we are going to do is we will perform an ETL operation on this data set.', 'start': 813.174, 'duration': 7.146}, {'end': 823.641, 'text': "I'll talk about the ETL operation in the later part of the session.", 'start': 820.881, 'duration': 2.76}, {'end': 829.523, 'text': 'So this was a data set here and we just have to upload this on the.', 'start': 824.482, 'duration': 5.041}], 'summary': '10 rows of movie data for etl operation', 'duration': 25.937, 'max_score': 803.586, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Qpv7BzOM-UI/pics/Qpv7BzOM-UI803586.jpg'}], 'start': 658.966, 'title': 'Aws glue demo and etl operation', 'summary': 'Covers the steps to perform an etl operation using aws glue, including creating an s3 bucket, organizing data, and uploading a small csv file from imdb website for demonstration purposes.', 'chapters': [{'end': 823.641, 'start': 658.966, 'title': 'Aws glue demo and etl operation', 'summary': 'Covers the steps to perform an etl operation using aws glue, such as creating an s3 bucket, organizing data into folders, and uploading a small csv file from the imdb website for demonstration purposes.', 'duration': 164.675, 
'highlights': ['The chapter covers the steps to perform an ETL operation using AWS Glue. The main focus of the session.', 'Creating an S3 bucket and organizing data into folders. Necessary steps for storing and organizing data.', 'Uploading a small CSV file from the IMDb website for demonstration purposes. Demonstration of using actual data for ETL operations.']}], 'duration': 164.675, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Qpv7BzOM-UI/pics/Qpv7BzOM-UI658966.jpg', 'highlights': ['The chapter covers the steps to perform an ETL operation using AWS Glue. The main focus of the session.', 'Creating an S3 bucket and organizing data into folders. Necessary steps for storing and organizing data.', 'Uploading a small CSV file from the IMDb website for demonstration purposes. Demonstration of using actual data for ETL operations.']}, {'end': 1299.783, 'segs': [{'end': 872.474, 'src': 'heatmap', 'start': 824.482, 'weight': 0, 'content': [{'end': 829.523, 'text': 'So this was a data set here and we just have to upload this on the.', 'start': 824.482, 'duration': 5.041}, {'end': 834.405, 'text': 'so this was the data set here and we have to upload it on the Amazon S3 bucket here.', 'start': 829.523, 'duration': 4.882}, {'end': 842.127, 'text': 'Okay, so for that, click on upload here, click on add files, just navigate to the folder and click here.', 'start': 835.105, 'duration': 7.022}, {'end': 848.225, 'text': 'So once you do that, click on upload. As you can see here, this file has been uploaded here.', 'start': 843.487, 'duration': 4.738}, {'end': 854.487, 'text': 'Now, let us move to the next step and click on AWS glue here.', 'start': 850.006, 'duration': 4.481}, {'end': 861.17, 'text': 'So now as per our steps here, we will create a crawler here.', 'start': 856.868, 'duration': 4.302}, {'end': 868.853, 'text': 'So as you can see, AWS has provided a definition of a crawler: a crawler connects to a data store, in our case.', 'start': 862.33, 'duration': 
6.523}, {'end': 872.474, 'text': 'It is an Amazon S3 bucket here and in that we have a CSV file.', 'start': 868.913, 'duration': 3.561}], 'summary': 'Uploaded dataset to amazon s3 bucket, created crawler in aws glue for csv file.', 'duration': 47.992, 'max_score': 824.482, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Qpv7BzOM-UI/pics/Qpv7BzOM-UI824482.jpg'}, {'end': 926.807, 'src': 'heatmap', 'start': 891.79, 'weight': 0.861, 'content': [{'end': 898.995, 'text': 'You can name the crawler whatever you want, say for example, glue demo.', 'start': 891.79, 'duration': 7.205}, {'end': 901.177, 'text': 'Click on next here.', 'start': 900.376, 'duration': 0.801}, {'end': 911.216, 'text': 'for crawler data source type, click on data stores, click on next here, and we just have to navigate to the read folder here.', 'start': 902.15, 'duration': 9.066}, {'end': 917.321, 'text': 'Okay, here it says choose S3 path and we just have to select this folder that we created in the previous step.', 'start': 911.396, 'duration': 5.925}, {'end': 926.807, 'text': 'Okay, read folder, and it has our movie data CSV file. Click on select here, click on next. Add another data store? As of now, it is not required.', 'start': 917.841, 'duration': 8.966}], 'summary': 'Configuring crawler for data source, selecting s3 path, adding data store', 'duration': 35.017, 'max_score': 891.79, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Qpv7BzOM-UI/pics/Qpv7BzOM-UI891790.jpg'}, {'end': 993.377, 'src': 'heatmap', 'start': 970.699, 'weight': 0.718, 'content': [{'end': 978.244, 'text': 'So say for example, you can name it whatever you want as per your convenience, glue demo DB.', 'start': 970.699, 'duration': 7.545}, {'end': 988.21, 'text': 'Okay, click on create here, click on next, quickly review the information if you want, just check if all the details are correct and click on finish.', 'start': 978.624, 'duration': 9.586}, {'end': 993.377, 'text': 'So 
here you can see, crawler was created to run on demand. Run it now?', 'start': 989.434, 'duration': 3.943}], 'summary': "Creating a crawler named 'glue demo db' and running it on demand.", 'duration': 22.678, 'max_score': 970.699, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Qpv7BzOM-UI/pics/Qpv7BzOM-UI970699.jpg'}, {'end': 1132.443, 'src': 'heatmap', 'start': 1010.911, 'weight': 0.751, 'content': [{'end': 1014.974, 'text': 'So the crawler was successful and now let us just click on databases here.', 'start': 1010.911, 'duration': 4.063}, {'end': 1020.744, 'text': 'So, as you can see here, this was the database that we created, and in this database, if you click on tables here,', 'start': 1015.294, 'duration': 5.45}, {'end': 1026.285, 'text': 'you will see a table has been generated by the name read. This means that the crawler was successful.', 'start': 1020.744, 'duration': 5.541}, {'end': 1029.526, 'text': 'So the next step here is to create a job here.', 'start': 1027.065, 'duration': 2.461}, {'end': 1034.146, 'text': 'Okay, So for that, in this ETL section, click on jobs.', 'start': 1030.406, 'duration': 3.74}, {'end': 1035.847, 'text': 'and what is the job?', 'start': 1034.146, 'duration': 1.701}, {'end': 1041.348, 'text': 'so, as discussed earlier, a job is your business logic that is required to perform the ETL operation.', 'start': 1035.847, 'duration': 5.501}, {'end': 1042.569, 'text': 'for that, click on add job.', 'start': 1041.348, 'duration': 1.221}, {'end': 1045.79, 'text': 'here again, name the job as per your convenience.', 'start': 1042.569, 'duration': 3.221}, {'end': 1048.306, 'text': 'So let me just name it glue.', 'start': 1046.604, 'duration': 1.702}, {'end': 1051.068, 'text': 'demo job, and choose the.', 'start': 1048.306, 'duration': 2.762}, {'end': 1053.01, 'text': 'IAM role here by default.', 'start': 1051.068, 'duration': 1.942}, {'end': 1057.433, 'text': 'the type is Spark here and the glue version. As you 
can see here in this section.', 'start': 1053.01, 'duration': 4.423}, {'end': 1061.277, 'text': 'This job runs a proposed script generated by AWS glue.', 'start': 1057.453, 'duration': 3.824}, {'end': 1063.939, 'text': 'So AWS glue can generate a script for you.', 'start': 1061.737, 'duration': 2.202}, {'end': 1068.783, 'text': 'Also, you can have your own script in which you have your logic of the ETL operation.', 'start': 1064.559, 'duration': 4.224}, {'end': 1073.267, 'text': "Let's just click on this option here, a proposed script generated by AWS glue.", 'start': 1069.524, 'duration': 3.743}, {'end': 1083.099, 'text': 'Now you have to click on this security configuration, and in this section you have to enter the maximum capacity as 2, which is the minimum,', 'start': 1074.355, 'duration': 8.744}, {'end': 1084.9, 'text': 'and job timeout as 10 minutes.', 'start': 1083.099, 'duration': 1.801}, {'end': 1095.885, 'text': 'Okay. So once you do that, click on next here, choose a data source, click on this read table here, click on next, choose a transform type.', 'start': 1085.14, 'duration': 10.745}, {'end': 1101.268, 'text': 'By default, it is change schema. Click on next here, choose a data target, read in our case.', 'start': 1095.905, 'duration': 5.363}, {'end': 1109.129, 'text': 'So as you can see here, in this section map the source columns to the target columns. By default,', 'start': 1103.786, 'duration': 5.343}, {'end': 1110.73, 'text': 'this has been mapped here.', 'start': 1109.509, 'duration': 1.221}, {'end': 1115.413, 'text': 'You can change the mapping if you want, and you can add a new column as per your requirements.', 'start': 1110.87, 'duration': 4.543}, {'end': 1119.916, 'text': "So as of now in our example, we don't have to change anything.", 'start': 1116.413, 'duration': 3.503}, {'end': 1122.657, 'text': 'Click on save job and edit script.', 'start': 1121.096, 'duration': 1.561}, {'end': 1127.94, 'text': 'So as you can see here, AWS has generated a script for us.', 
'start': 1125.019, 'duration': 2.921}, {'end': 1132.443, 'text': 'But if you want your own script here, you can also do that.', 'start': 1129.281, 'duration': 3.162}], 'summary': 'Successful crawler generated table, created aws glue job with proposed script and configured security settings.', 'duration': 121.532, 'max_score': 1010.911, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Qpv7BzOM-UI/pics/Qpv7BzOM-UI1010911.jpg'}, {'end': 1045.79, 'src': 'embed', 'start': 1020.744, 'weight': 1, 'content': [{'end': 1026.285, 'text': 'you will see a table has been generated by the name read this means that crawler was successful.', 'start': 1020.744, 'duration': 5.541}, {'end': 1029.526, 'text': 'So the next step here is to create a job here.', 'start': 1027.065, 'duration': 2.461}, {'end': 1034.146, 'text': 'Okay, So for that, in this ETL section, click on jobs.', 'start': 1030.406, 'duration': 3.74}, {'end': 1035.847, 'text': 'and what is the job?', 'start': 1034.146, 'duration': 1.701}, {'end': 1041.348, 'text': 'so, as discussed earlier, a job is your business logic that is required to perform ETL operation.', 'start': 1035.847, 'duration': 5.501}, {'end': 1042.569, 'text': 'for that, click on add job.', 'start': 1041.348, 'duration': 1.221}, {'end': 1045.79, 'text': 'here again the name of the job as per your convenience.', 'start': 1042.569, 'duration': 3.221}], 'summary': 'Successful crawler generated table. 
now create etl job for business logic.', 'duration': 25.046, 'max_score': 1020.744, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Qpv7BzOM-UI/pics/Qpv7BzOM-UI1020744.jpg'}, {'end': 1247.998, 'src': 'embed', 'start': 1221.584, 'weight': 2, 'content': [{'end': 1228.728, 'text': 'as you can see, this file has been generated here, and if you click on open here, this file gets downloaded on your system.', 'start': 1221.584, 'duration': 7.144}, {'end': 1231.989, 'text': 'So let me just open this file here.', 'start': 1229.968, 'duration': 2.021}, {'end': 1242.195, 'text': 'So guys, as you can see here, this is the output that has been generated as per our ETL script.', 'start': 1235.871, 'duration': 6.324}, {'end': 1245.236, 'text': 'The columns are the decade, movie count and rating mean.', 'start': 1242.235, 'duration': 3.001}, {'end': 1247.998, 'text': 'So this also is a CSV file here.', 'start': 1246.337, 'duration': 1.661}], 'summary': 'Etl script generated a csv file with decade, movie count and rating mean.', 'duration': 26.414, 'max_score': 1221.584, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Qpv7BzOM-UI/pics/Qpv7BzOM-UI1221584.jpg'}], 'start': 824.482, 'title': 'Aws glue etl process', 'summary': 'Details the process of creating and running an aws glue etl job, including configuring crawlers, creating a database, running a crawler, generating tables, and creating and running an etl job to perform data transformation, resulting in a csv file with movie data.', 'chapters': [{'end': 949.752, 'start': 824.482, 'title': 'Amazon s3 data upload and aws glue crawler', 'summary': 'Covers the process of uploading a data set to amazon s3 and creating a crawler in aws glue to connect to the data store and progress through a list of classifiers to determine the schema for the data.', 'duration': 125.27, 'highlights': ['The process involves uploading a data set to Amazon S3, with the file being 
successfully uploaded after following the steps.', 'Creating a crawler in AWS Glue is discussed, which involves connecting to a data store, progressing through a list of classifiers to determine schema, and creating metadata tables in the data catalog.', 'Instructions for creating a crawler in AWS Glue involve selecting data source type, choosing an S3 path, and selecting an IAM role, with the requirement of having create role, create policy, and attach role policy permissions.', 'The existing IAM role is chosen for the crawler creation process, with a mention of the necessary permissions for creating a new IAM role.']}, {'end': 1299.783, 'start': 949.752, 'title': 'Aws glue etl process', 'summary': 'Details the process of creating and running an aws glue etl job, including configuring crawlers, creating a database, running a crawler, generating tables, and creating and running an etl job to perform data transformation, resulting in a csv file with movie data.', 'duration': 350.031, 'highlights': ['The process includes configuring crawlers, creating a database, running a crawler, generating tables, and creating an ETL job to perform data transformation, resulting in a CSV file with movie data. CSV file with movie data', 'The ETL job involved performing operations to obtain the decade, count of movies, and average rating for movies, resulting in a CSV file containing the output data. Decade, movie count, and average rating for movies in a CSV file', 'The ETL job was successfully executed, resulting in the generation of a CSV file with the output data. 
Successful execution of ETL job and generation of CSV file']}], 'duration': 475.301, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/Qpv7BzOM-UI/pics/Qpv7BzOM-UI824482.jpg', 'highlights': ['Creating a crawler in AWS Glue involves connecting to a data store and creating metadata tables in the data catalog', 'The process includes configuring crawlers, creating a database, running a crawler, generating tables, and creating an ETL job to perform data transformation', 'The ETL job involved obtaining the decade, count of movies, and average rating for movies, resulting in a CSV file with the output data', 'The process involves uploading a data set to Amazon S3, with the file being successfully uploaded after following the steps']}], 'highlights': ['AWS Glue is a fully managed ETL service that simplifies data categorization, cleaning, enrichment, and movement between data stores.', 'AWS Glue provides integration across a wide range of AWS services, making it less hassle for users and is cost-effective as it is serverless.', 'AWS Glue automates much of the effort in building, maintaining, and running ETL jobs, leading to increased power and efficiency.', 'The architecture of AWS Glue involves defining jobs to extract, transform, and load data from a source to a target, using crawlers to populate the AWS data catalog with metadata table definitions, generating scripts to transform data, running jobs on demand or based on triggers, and executing scripts in an Apache Spark environment.', 'The data catalog is the persistent metadata store in AWS glue, containing table definitions, job definitions, and other control information to manage the AWS glue environment, with each AWS account having one data catalog per region.', 'A classifier in AWS glue determines the schema of the data and provides classifiers for common file types such as CSV, JSON, XML, and Avro.', 'Creating an S3 bucket and organizing data into folders Necessary steps for storing and 
organizing data.', 'The ETL job involved obtaining the decade, count of movies, and average rating for movies, resulting in a CSV file with the output data']}
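As a note on the transform described in the transcript above: the demo's ETL job groups the 10-row IMDb-style movie CSV (rank, title, year, rating) into decade, movie count, and rating mean, and the actual job runs as a Glue-generated PySpark script. A minimal pure-Python sketch of the same aggregation, using hypothetical column names and made-up sample rows since the real file is not shown in the transcript:

```python
import csv
import io
from collections import defaultdict

# Hypothetical IMDb-style sample rows (the video's actual 10-row file is not shown).
SAMPLE = """rank,title,year,rating
1,Movie A,1994,9.2
2,Movie B,1972,9.1
3,Movie C,2008,9.0
4,Movie D,1974,8.9
5,Movie E,1957,8.9
"""

def decade_stats(csv_text):
    """Group movies by decade and compute (movie count, mean rating) per decade,
    mirroring the transform the Glue job in the demo performs."""
    buckets = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        decade = (int(row["year"]) // 10) * 10  # e.g. 1972 -> 1970
        buckets[decade].append(float(row["rating"]))
    return {d: (len(r), round(sum(r) / len(r), 2)) for d, r in sorted(buckets.items())}
```

On the sample rows this yields two movies in the 1970s with a mean rating of 9.0; the Glue job writes the same three columns (decade, movie count, rating mean) out as a CSV to the target S3 folder.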