title
Hive Tutorial 1 | Hive Tutorial for Beginners | Understanding Hive In Depth | Edureka

description
🔥 Edureka Hadoop Training (Use Code "YOUTUBE20"): https://www.edureka.co/big-data-hadoop-training-certification Check out our Hive Tutorial blog series: https://goo.gl/2N440M This Hive tutorial gives in-depth knowledge of Apache Hive. Hive is a data warehouse system for Hadoop that facilitates easy data summarization, ad-hoc queries, and the analysis of large datasets stored in Hadoop-compatible file systems. Hive structures data into well-understood database concepts such as tables, rows, columns and partitions. Check our complete Hadoop playlist here: https://goo.gl/ExJdZs --------------------Edureka Big Data Training and Certifications------------------------ 🔵 Edureka Hadoop Training: http://bit.ly/2YBlw29 🔵 Edureka Spark Training: http://bit.ly/2PeHvc9 🔵 Edureka Kafka Training: http://bit.ly/34e7Riy 🔵 Edureka Cassandra Training: http://bit.ly/2E9AK54 🔵 Edureka Talend Training: http://bit.ly/2YzYIjg 🔵 Edureka Hadoop Administration Training: http://bit.ly/2YE8Nf9 #edureka #edurekaHive #HiveTutorial #ApacheHiveTutorial #HiveTutorialForBeginners The video covers the following points: 1. What is Hive? 2. Why use Hive? 3. Where to use Hive and not Pig? 4. Hive Architecture 5. Hive Components 6. How Facebook uses Hive 7. Hive vs RDBMS 8. Limitations of Hive 9. Hive Types 10. Hive commands and Hive queries Instagram: https://www.instagram.com/edureka_learning/ Facebook: https://www.facebook.com/edurekaIN/ Twitter: https://twitter.com/edurekain LinkedIn: https://www.linkedin.com/company/edureka How it Works? 1. This is a 5-week instructor-led online course with 40 hours of assignments and 30 hours of project work. 2. We have 24x7 one-on-one LIVE technical support to help you with any problems you might face or any clarifications you may require during the course. 3. At the end of the training you will undergo a 2-hour LIVE practical exam, based on which we will provide you a grade and a verifiable certificate! - - - - - - - - - - - - - - About the Course Edureka's Big Data and Hadoop online training is designed to help you become a top Hadoop developer. During this course, our expert Hadoop instructors will help you: 1. Master the concepts of the HDFS and MapReduce framework 2. Understand Hadoop 2.x architecture 3. Set up a Hadoop cluster and write complex MapReduce programs 4. Learn data loading techniques using Sqoop and Flume 5. Perform data analytics using Pig, Hive and YARN 6. Implement HBase and MapReduce integration 7. Implement advanced usage and indexing 8. Schedule jobs using Oozie 9. Implement best practices for Hadoop development 10. Work on a real-life project on Big Data analytics 11. Understand Spark and its ecosystem 12. Learn how to work with RDDs in Spark - - - - - - - - - - - - - - Who should go for this course? If you belong to any of the following groups, knowledge of Big Data and Hadoop is crucial for you if you want to progress in your career: 1. Analytics professionals 2. BI/ETL/DW professionals 3. Project managers 4. Testing professionals 5. Mainframe professionals 6. Software developers and architects 7. Recent graduates passionate about building a successful career in Big Data - - - - - - - - - - - - - - Why Learn Hadoop? Big Data! A Worldwide Problem? According to Wikipedia, "Big data is a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications." 
The problem lies in the use of traditional systems to store enormous amounts of data. Though these systems were a success a few years ago, with the increasing amount and complexity of data they are fast becoming obsolete. The good news is that Hadoop has become an integral part of storing, handling, evaluating and retrieving hundreds of terabytes, and even petabytes, of data. - - - - - - - - - - - - - - Opportunities for Hadoopers! Opportunities for Hadoopers are infinite - from a Hadoop Developer, to a Hadoop Tester or a Hadoop Architect, and so on. If cracking and managing Big Data is your passion in life, then think no more: join Edureka's Hadoop online course and carve a niche for yourself! For more information, please write back to us at sales@edureka.in or call us at IND: 9606058406 / US: 18338555775 (toll-free). Customer Review: Michael Harkins, System Architect, Hortonworks says: "The courses are top rate. The best part is live instruction, with playback. But my favorite feature is viewing a previous class. Also, they are always there to answer questions, and prompt when you open an issue if you are having any trouble. Added bonus ~ you get lifetime access to the course you took!!! ~ This is the killer education app... I've taken two courses, and I'm taking two more."
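To make the description's point about "tables, rows, columns and partitions" concrete, here is a minimal HiveQL sketch of such a structure; the table, columns, and file path are hypothetical examples, not taken from the video.

```sql
-- Hypothetical table illustrating tables, rows, columns and partitions in Hive.
CREATE TABLE page_views (
  user_id   BIGINT,                               -- columns with Hive data types
  url       STRING,
  view_time TIMESTAMP
)
PARTITIONED BY (view_date STRING)                 -- one HDFS subdirectory per partition value
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- Every record in the loaded file becomes a row inside the chosen partition.
LOAD DATA LOCAL INPATH '/tmp/page_views_2014-10-18.csv'
INTO TABLE page_views PARTITION (view_date = '2014-10-18');
```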

detail
{'title': 'Hive Tutorial 1 | Hive Tutorial for Beginners | Understanding Hive In Depth | Edureka', 'heatmap': [{'end': 3092.785, 'start': 3002.424, 'weight': 0.765}, {'end': 4121.642, 'start': 4016.306, 'weight': 0.779}, {'end': 6015.528, 'start': 5915.35, 'weight': 0.854}, {'end': 6862.932, 'start': 6771.823, 'weight': 1}, {'end': 7298.987, 'start': 7202.393, 'weight': 0.882}], 'summary': "This tutorial covers introduction to hive, facebook's transition to hive, hive and pig data management, hive data loading and database management, creating and managing hive tables, hive data processing, data organization and processing in hadoop and hive, sql partition modes, and hadoop data management, providing in-depth insights into hive's architecture, capabilities, and efficient data handling.", 'chapters': [{'end': 53.666, 'segs': [{'end': 53.666, 'src': 'embed', 'start': 4.218, 'weight': 0, 'content': [{'end': 12.243, 'text': "Okay guys, so let's get started with Hive and we will see how this whole Hive stuff goes.", 'start': 4.218, 'duration': 8.025}, {'end': 21.73, 'text': 'So what we are going to cover in terms of Hive is, we are going to cover what Hive is and where to use Hive.', 'start': 15.165, 'duration': 6.565}, {'end': 30.015, 'text': "So we will understand what Hive is, what is the Hive architecture, that's what we are going to understand.", 'start': 24.111, 'duration': 5.904}, {'end': 32.557, 'text': 'And we will understand also scenario where we can use that.', 'start': 30.215, 'duration': 2.342}, {'end': 39.832, 'text': 'Then what we will next understand is like why to go for Hive when we have the pig that basically makes our life easier.', 'start': 33.786, 'duration': 6.046}, {'end': 42.595, 'text': 'Then why again we are using Hive.', 'start': 40.253, 'duration': 2.342}, {'end': 44.136, 'text': 'So we will see the difference between that.', 'start': 42.615, 'duration': 1.521}, {'end': 51.984, 'text': 'And then, as to like Hive architecture, different components of Hive, little bit background about Hive, like from where it is started,', 'start': 45.358, 'duration': 6.626}, {'end': 53.666, 'text': 'the whole journey of Hive.', 'start': 51.984, 'duration': 1.682}], 'summary': 'Introduction to hive covering its architecture, use cases, and comparison with pig.', 'duration': 49.448, 'max_score': 4.218, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/tKNGB5IZPFE/pics/tKNGB5IZPFE4218.jpg'}], 'start': 4.218, 'title': 'Introduction to hive', 'summary': "Covers the introduction to hive, its architecture, use cases, and comparison with pig, along with a brief history of hive's development.", 'chapters': [{'end': 53.666, 'start': 4.218, 'title': 'Introduction to hive', 'summary': "Covers the introduction to hive, its architecture, use cases, and comparison with pig, along with a brief history of hive's development.", 'duration': 49.448, 'highlights': ['The chapter covers what Hive is, its architecture, and use cases, along with a comparison with Pig in terms of ease of use.', "Explains the different components of Hive architecture and provides a brief history of Hive's development.", 'Discusses the reasons for using Hive despite the availability of Pig and compares their respective advantages.']}], 'duration': 49.448, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/tKNGB5IZPFE/pics/tKNGB5IZPFE4218.jpg', 'highlights': ['Covers the introduction to Hive, its architecture, use cases, and comparison with Pig.', "Explains the different 
components of Hive architecture and provides a brief history of Hive's development.", 'Discusses the reasons for using Hive despite the availability of Pig and compares their respective advantages.']}, {'end': 1121.878, 'segs': [{'end': 533.005, 'src': 'embed', 'start': 504.179, 'weight': 2, 'content': [{'end': 510.921, 'text': 'So at any point, if you have a requirement to write your own custom code, you can easily write that and integrate with Hive.', 'start': 504.179, 'duration': 6.742}, {'end': 517.592, 'text': 'And the next thing what you have with the Hive is the JDBC ODBC driver.', 'start': 513.208, 'duration': 4.384}, {'end': 524.598, 'text': 'So nowadays, all of these vendors, with any reporting tool you know MicroStrategy Tableau, like whichever you are using,', 'start': 517.633, 'duration': 6.965}, {'end': 526.54, 'text': 'they have connectors available to Hive.', 'start': 524.598, 'duration': 1.942}, {'end': 533.005, 'text': 'So by using this JDBC ODBC connection, you can easily read the data which is available in Hive.', 'start': 527.02, 'duration': 5.985}], 'summary': 'Easily write custom code, integrate with hive, and access data via jdbc odbc connection.', 'duration': 28.826, 'max_score': 504.179, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/tKNGB5IZPFE/pics/tKNGB5IZPFE504179.jpg'}, {'end': 705.745, 'src': 'embed', 'start': 677.805, 'weight': 1, 'content': [{'end': 684.195, 'text': 'So they prefer to go with Hive because you can easily use the existing SQL expertise to start interacting with Hive.', 'start': 677.805, 'duration': 6.39}, {'end': 692.637, 'text': 'And mainly targeted towards the user who are more comfortable with the SQL kind of interface so they can easily use that.', 'start': 686.354, 'duration': 6.283}, {'end': 701.282, 'text': "And similar to kind of the SQL and call, it's generally like you have the SQL, it supports a lot of functionality.", 'start': 693.198, 'duration': 8.084}, {'end': 705.745, 'text': "In case of Hive, it's not everything what you have in the SQL it supports in Hive query language.", 'start': 701.402, 'duration': 4.343}], 'summary': 'Hive is preferred for its sql interface, supporting most sql functionality.', 'duration': 27.94, 'max_score': 677.805, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/tKNGB5IZPFE/pics/tKNGB5IZPFE677805.jpg'}, {'end': 799.251, 'src': 'embed', 'start': 775.046, 'weight': 0, 'content': [{'end': 785.591, 'text': "And it's one of the, you know, the tool that's widely used inside the Hadoop stack for, you know, accessing the data, analyzing the data.", 'start': 775.046, 'duration': 10.545}, {'end': 793.365, 'text': 'And as the Facebook was the main contributor for this particular whole project,', 'start': 788.519, 'duration': 4.846}, {'end': 799.251, 'text': 'Facebook claims that they analyze several terabytes of the data every day with the help of the Hive interface.', 'start': 793.365, 'duration': 5.886}], 'summary': 'Facebook analyzes several terabytes of data daily using hive in hadoop stack.', 'duration': 24.205, 'max_score': 775.046, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/tKNGB5IZPFE/pics/tKNGB5IZPFE775046.jpg'}, {'end': 909.4, 'src': 'embed', 'start': 887.067, 'weight': 3, 'content': [{'end': 896.538, 'text': 'you can basically always enhance this whole high functionality in terms of you know the processing or to meeting any kind of a particular business requirement which you have.', 'start': 887.067, 
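As a hedged illustration of the "write your own custom code and integrate with Hive" point made in this chapter, a custom Java UDF is typically wired in from the Hive shell roughly as below; the jar path, class name, function name, and table are assumptions made for the example, not details from the video.

```sql
-- Hypothetical names throughout: the jar, class, function and table are illustrative only.
ADD JAR /tmp/my_hive_udfs.jar;                                     -- ship the compiled UDF jar to Hive
CREATE TEMPORARY FUNCTION mask_msisdn AS 'com.example.hive.udf.MaskMsisdn';
SELECT mask_msisdn(phone_number) FROM call_records LIMIT 10;       -- use it like a built-in function
```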
'duration': 9.471}, {'end': 904.878, 'text': 'So where all you can use Hive? So as like one of the things, Hive is generally suitable for the structured data.', 'start': 899.215, 'duration': 5.663}, {'end': 909.4, 'text': 'So whenever you have a structured data set, you can easily use Hive for that.', 'start': 904.918, 'duration': 4.482}], 'summary': 'Hive is suitable for enhancing high functionality and processing structured data sets.', 'duration': 22.333, 'max_score': 887.067, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/tKNGB5IZPFE/pics/tKNGB5IZPFE887067.jpg'}], 'start': 55.439, 'title': "Facebook's transition to hive", 'summary': "Discusses facebook's transition from traditional rdbms to hive due to data influx, challenges serving a huge user base, dealing with terabytes of data, and analyzing various types of data. it also covers challenges with hadoop, emergence of hive, hive data warehousing capabilities, and a comparative analysis of hive vs pig.", 'chapters': [{'end': 253.343, 'start': 55.439, 'title': "Facebook's transition to hive", 'summary': 'Discusses the transition of facebook from using traditional rdbms to adopting hive due to the influx of data, with challenges such as serving a huge user base, dealing with terabytes of data per day, and analyzing various types of data.', 'duration': 197.904, 'highlights': ["Facebook's transition to Hive was driven by the influx of data, including terabytes of data per day and various types of data, leading to the adoption of a big data solution like Hadoop.", 'The challenges faced by Facebook included serving a huge user base, running numerous SQL queries, and analyzing diverse data types, such as user activity and image data.', 'Initially, Facebook used Oracle for data capture and analysis, but it was unable to scale to meet the data requirements, prompting the move towards a big data solution like Hadoop.']}, {'end': 466.319, 'start': 253.343, 'title': 'Challenges with hadoop and the emergence of hive', 'summary': 'Discusses the challenge of processing large data sets in hadoop using mapreduce due to the expertise of sql users, leading to the development of hive which provides a sql interface for writing processing logic and translates it into mapreduce jobs, addressing the gap in expertise and tools.', 'duration': 212.976, 'highlights': ['Hive was developed at Facebook to address the challenge of converting SQL requirements into MapReduce jobs, serving as a tool to bridge the expertise gap and enable the use of SQL interface on top of Hadoop.', 'Pig, developed at Yahoo, also aimed to address the challenge of converting requirements into MapReduce jobs, providing an alternative approach to Hive for users to utilize in handling data processing.']}, {'end': 887.067, 'start': 468.281, 'title': 'Hive data warehousing on hadoop', 'summary': "Highlights the capabilities of hive as a data warehousing package built on top of hadoop, including its schema flexibility, support for custom code integration, jdbc odbc driver for data access, and limitations in processing only structured data, while also mentioning facebook's contribution and extensive use for analyzing terabytes of data daily.", 'duration': 418.786, 'highlights': ['Facebook claims to analyze several terabytes of data every day with the help of the Hive interface. 
Facebook extensively uses Hive for analyzing terabytes of data daily, showcasing its capability for large-scale data processing.', 'Hive provides schema flexibility, allowing alteration of tables and columns without reloading the entire dataset. Hive offers schema flexibility, enabling alteration of tables and columns without the need to reload the entire dataset, simplifying data management.', 'Hive supports integration with custom mapper and reducer code, allowing the creation of custom code for specific requirements. Hive allows integration with custom mapper and reducer code, facilitating the creation of custom code to address specific data processing requirements.', 'Hive supports JDBC ODBC driver for data access, enabling connectivity with various reporting tools like MicroStrategy and Tableau. Hive supports JDBC ODBC driver for data access, facilitating connectivity with reporting tools such as MicroStrategy and Tableau.', 'Hive can only process structured data and is not suitable for processing unstructured or semi-structured data. Hive is limited to processing only structured data, making it unsuitable for unstructured or semi-structured data processing.']}, {'end': 1121.878, 'start': 887.067, 'title': 'Hive vs pig: a comparative analysis', 'summary': 'Discusses the applications of hive for structured data processing, including customer-facing bi solutions and data mining, and compares it with pig, highlighting their differences in data representation and interaction, sql vs pig latin, schema handling, partitioning support, and thrift api.', 'duration': 234.811, 'highlights': ['Hive is suitable for processing structured data sets, including logs and customer-facing BI solutions, allowing users to run SQL queries and perform data mining, enabling analysis of huge data sets and building indexing and predictive modeling.', 'Hive provides a SQL interface and represents data in table and column format, making it suitable for analysts to easily analyze the data, while PIG uses a PIG Latin language for code interaction and directly interacts with the data files, preferred by programmers and researchers.', 'Hive requires providing a schema for reading data and supports partitioning for faster processing, while PIG uses an implicit schema and does not support partitioning, but has the support of Thrift for API calls.']}], 'duration': 1066.439, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/tKNGB5IZPFE/pics/tKNGB5IZPFE55439.jpg', 'highlights': ['Facebook transitioned to Hive due to data influx, serving a huge user base, and analyzing diverse data types.', 'Hive was developed at Facebook to bridge the expertise gap and enable the use of SQL interface on top of Hadoop.', 'Hive provides schema flexibility, supports integration with custom mapper and reducer code, and JDBC ODBC driver for data access.', 'Hive is suitable for processing structured data sets, provides a SQL interface, and requires providing a schema for reading data.']}, {'end': 1956.888, 'segs': [{'end': 1188.739, 'src': 'embed', 'start': 1121.878, 'weight': 0, 'content': [{'end': 1130.113, 'text': 'you can query this Hive meta store and you can see all the details about this Hive tables and all from this Thrift API call.', 'start': 1121.878, 'duration': 8.235}, {'end': 1137.878, 'text': 'In terms of the UDF, user defined function, both of them support and most of the UDF will be in Java.', 'start': 1132.914, 'duration': 4.964}, {'end': 1138.978, 'text': "So that's what they support.", 'start': 
1137.898, 'duration': 1.08}, {'end': 1144.542, 'text': 'CDLIS and DCLIS and both Hive and the PIG supports the CDLIS and DCLIS.', 'start': 1139.939, 'duration': 4.603}, {'end': 1149.926, 'text': 'DFS direct access, yes, you can read the data directly from the Hive as well as the PIG.', 'start': 1145.403, 'duration': 4.523}, {'end': 1154.829, 'text': 'You can both, the data set can be stored inside the Hadoop file system, it can be done.', 'start': 1150.026, 'duration': 4.803}, {'end': 1159.421, 'text': 'In terms of the shell.', 'start': 1157.858, 'duration': 1.563}, {'end': 1161.464, 'text': 'you have this both Hive and the Piction,', 'start': 1159.421, 'duration': 2.043}, {'end': 1166.353, 'text': 'so both have their own interactive shell where which you can use it to interact with the data set which you have.', 'start': 1161.464, 'duration': 4.889}, {'end': 1170.369, 'text': 'And both have this streaming support as well.', 'start': 1168.208, 'duration': 2.161}, {'end': 1177.513, 'text': 'And the interface, Hive has a web interface where you can see the data set, see the table, all those kind of stuff.', 'start': 1170.509, 'duration': 7.004}, {'end': 1179.955, 'text': "But then PIC doesn't have any such interface.", 'start': 1177.553, 'duration': 2.402}, {'end': 1185.518, 'text': 'In terms of JDBC, ODBC connectivity, in Hive, yes, you have this ODBC connectivity.', 'start': 1180.135, 'duration': 5.383}, {'end': 1188.739, 'text': 'But in case PIC, it will just write the data inside the output file.', 'start': 1185.538, 'duration': 3.201}], 'summary': 'Hive and pig support udfs, cdlis, dclis, and dfs direct access, with hive having a web interface and jdbc/odbc connectivity.', 'duration': 66.861, 'max_score': 1121.878, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/tKNGB5IZPFE/pics/tKNGB5IZPFE1121878.jpg'}, {'end': 1237.116, 'src': 'embed', 'start': 1208.734, 'weight': 2, 'content': [{'end': 1216.3, 'text': "so that's the location where, uh, your data will be either stored or you are going to store as part of your hive table load.", 'start': 1208.734, 'duration': 7.566}, {'end': 1222.164, 'text': 'And the reason why you store your data inside the Hadoop file system,', 'start': 1218.681, 'duration': 3.483}, {'end': 1226.648, 'text': "because that's the place you get basically the reliability and the scalability stuff.", 'start': 1222.164, 'duration': 4.484}, {'end': 1228.489, 'text': 'So, by using this Hadoop file system,', 'start': 1226.668, 'duration': 1.821}, {'end': 1237.116, 'text': 'you will be able to have all these multiple copies of the data that you can replicate across the server and you can consume that.', 'start': 1228.489, 'duration': 8.627}], 'summary': 'Data is stored in hadoop file system for reliability and scalability, allowing multiple copies and replication.', 'duration': 28.382, 'max_score': 1208.734, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/tKNGB5IZPFE/pics/tKNGB5IZPFE1208734.jpg'}, {'end': 1291.154, 'src': 'embed', 'start': 1269.46, 'weight': 1, 'content': [{'end': 1279.066, 'text': 'And on top of this Metastore only you have this Thrift API call running and that basically enables you to do browse query with the help of the JDBC and ODBC connectivity.', 'start': 1269.46, 'duration': 9.606}, {'end': 1286.871, 'text': 'So you are all table definition, column definition, database, everything will be stored in something called the Metastore.', 'start': 1281.247, 'duration': 5.624}, {'end': 1291.154, 'text': 
'Now when it comes to the Metastore you can implement Metastore in different way.', 'start': 1287.631, 'duration': 3.523}], 'summary': 'Metastore stores table, column, and database definitions for browse query using jdbc and odbc.', 'duration': 21.694, 'max_score': 1269.46, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/tKNGB5IZPFE/pics/tKNGB5IZPFE1269460.jpg'}, {'end': 1343.827, 'src': 'embed', 'start': 1314.865, 'weight': 9, 'content': [{'end': 1318.606, 'text': "what you will have with the Derby database, guys, is it's limited to one user only.", 'start': 1314.865, 'duration': 3.741}, {'end': 1323.327, 'text': 'so at one point, if you have moved the data, only one shell you can open and then you can interact with that.', 'start': 1318.606, 'duration': 4.721}, {'end': 1329.798, 'text': 'Apart from that, what you have is like all these user-defined function support.', 'start': 1325.875, 'duration': 3.923}, {'end': 1331.639, 'text': 'So, it supports all these MapReduce script.', 'start': 1329.818, 'duration': 1.821}, {'end': 1332.64, 'text': 'You can integrate that.', 'start': 1331.679, 'duration': 0.961}, {'end': 1337.823, 'text': 'All these, you know, the UDF function, string, sum, average, those kind of things you can do.', 'start': 1333.26, 'duration': 4.563}, {'end': 1343.827, 'text': 'If you want to basically read serialization and deserialization data, all those kind of the data set, it can support.', 'start': 1338.323, 'duration': 5.504}], 'summary': 'Derby database limited to one user, supports user-defined functions, mapreduce script, and data serialization.', 'duration': 28.962, 'max_score': 1314.865, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/tKNGB5IZPFE/pics/tKNGB5IZPFE1314865.jpg'}, {'end': 1375.612, 'src': 'embed', 'start': 1350.211, 'weight': 6, 'content': [{'end': 1357.417, 'text': "Then in terms of the storage, what a high budget, it's supposed text file like you can store your backend data in the text file itself.", 'start': 1350.211, 'duration': 7.206}, {'end': 1359.298, 'text': 'So you can read the data set.', 'start': 1357.457, 'duration': 1.841}, {'end': 1363.321, 'text': 'If you want to store your data as a sequence file, you can do that as well.', 'start': 1359.899, 'duration': 3.422}, {'end': 1370.728, 'text': 'If you want to store data in RC file, so RC file is basically, you guys, a columnar way to store the data inside the hive.', 'start': 1363.381, 'duration': 7.347}, {'end': 1375.612, 'text': 'so if you are using just selective columns in the rest of the hive processing,', 'start': 1370.728, 'duration': 4.884}], 'summary': 'High budget allows storing backend data in different file formats like text, sequence, or rc files for efficient processing in hive.', 'duration': 25.401, 'max_score': 1350.211, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/tKNGB5IZPFE/pics/tKNGB5IZPFE1350211.jpg'}, {'end': 1444.326, 'src': 'embed', 'start': 1413.664, 'weight': 5, 'content': [{'end': 1415.787, 'text': 'Then you have the hive cell.', 'start': 1413.664, 'duration': 2.123}, {'end': 1417.408, 'text': 'We talked like where we submit the query.', 'start': 1415.887, 'duration': 1.521}, {'end': 1418.99, 'text': 'Then you have something called the driver.', 'start': 1417.448, 'duration': 1.542}, {'end': 1426.919, 'text': 'So, with the help of the hive driver, it will take your code and it will convert into the you know,', 'start': 1419.07, 'duration': 7.849}, {'end': 1432.078, 'text': 
'into something which Hadoop can understand in terms of execution.', 'start': 1428.436, 'duration': 3.642}, {'end': 1436.361, 'text': 'And then you have the Hive compiler, execution engine, optimizer.', 'start': 1433.039, 'duration': 3.322}, {'end': 1441.504, 'text': 'So all these basically whenever you submit your query, all these steps you know.', 'start': 1436.381, 'duration': 5.123}, {'end': 1444.326, 'text': 'go through this, the compilers and execution, and then, finally,', 'start': 1441.504, 'duration': 2.822}], 'summary': 'Hive uses a driver and compiler for query execution in hadoop.', 'duration': 30.662, 'max_score': 1413.664, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/tKNGB5IZPFE/pics/tKNGB5IZPFE1413664.jpg'}, {'end': 1648.446, 'src': 'embed', 'start': 1622.51, 'weight': 7, 'content': [{'end': 1631.772, 'text': 'So you can store your data into a Hadoop file system like from the Hive table if you run certain SQL and you want to store in Hadoop you can do that.', 'start': 1622.51, 'duration': 9.262}, {'end': 1637.777, 'text': 'And it also supports something called the partitioning guide.', 'start': 1634.674, 'duration': 3.103}, {'end': 1648.446, 'text': 'So with the help of the partitioning, what you will see is it basically speed up the process of how you create the data and consume the data.', 'start': 1637.817, 'duration': 10.629}], 'summary': 'Hadoop fs stores data from hive table, supports partitioning for faster data processing.', 'duration': 25.936, 'max_score': 1622.51, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/tKNGB5IZPFE/pics/tKNGB5IZPFE1622510.jpg'}], 'start': 1121.878, 'title': 'Hive and pig data management', 'summary': "Delves into the data management capabilities of hive and pig, covering udf support, data storage in hadoop file system, metastore, thrift api, jdbc, and odbc connectivity. it also explores hive's functions, storage, and components, including data processing functions, storage options, metastore, shell, driver, compiler, execution engine, and optimizer, along with its limitations and capabilities for filtering, joining, and partitioning data.", 'chapters': [{'end': 1332.64, 'start': 1121.878, 'title': 'Hive and pig data management', 'summary': 'Discusses the data management capabilities of hive and pig, including support for udfs, data storage in hadoop file system, metastore, thrift api, and connectivity options like jdbc and odbc.', 'duration': 210.762, 'highlights': ['Hive and Pig support user-defined functions (UDFs), most of which are in Java. UDFs are supported in both Hive and Pig, primarily in Java.', 'Hive and Pig support CDLIS and DCLIS. Both Hive and Pig support CDLIS and DCLIS.', 'Both Hive and Pig allow direct data access and storage in the Hadoop file system. Both Hive and Pig enable direct data access and storage within the Hadoop file system.', 'Hive and Pig provide interactive shells for interacting with datasets. Both Hive and Pig offer interactive shells for dataset interaction.', 'Hive has a web interface for dataset and table visualization, while Pig lacks such an interface. Hive offers a web interface for dataset and table visualization, unlike Pig.', 'Hive supports JDBC and ODBC connectivity, while Pig only writes data to an output file, requiring separate data movement and integration. 
Hive supports JDBC and ODBC connectivity, whereas Pig only writes data to an output file, necessitating separate data movement and integration.', 'Data is stored in the Hadoop file system for reliability and scalability, with the ability to replicate data across servers. Data is stored in the Hadoop file system for reliability and scalability, allowing data replication across servers.', 'Metastore stores all database, table, and column definitions, enabling browse query through JDBC and ODBC connectivity. Metastore stores database, table, and column definitions, facilitating browse query through JDBC and ODBC connectivity.', 'Metastore can be implemented using different databases, such as Derby or MySQL. Metastore can be implemented with various databases, including Derby or MySQL.', 'Derby database, used for Metastore in Hive, is limited to one user, posing a challenge for concurrent interactions. Derby database, used for Metastore in Hive, is limited to one user, posing a challenge for concurrent interactions.', 'Hive and Pig support user-defined functions (UDFs), with integration of MapReduce scripts. Both Hive and Pig support user-defined functions (UDFs) and integrate MapReduce scripts.']}, {'end': 1956.888, 'start': 1333.26, 'title': 'Hive: functions, storage, and components', 'summary': 'Discusses the capabilities of hive, including data processing functions, storage options such as text, sequence, and rc files, and the different components of hive including the metastore, shell, driver, compiler, execution engine, and optimizer. it highlights the limitation of selective update and delete operations, the lack of low latency performance, and the ability to filter, join, and partition data using hive.', 'duration': 623.628, 'highlights': ['Components of Hive The different components of Hive include the hive shell for table creation, the metastore for storing metadata, the driver for code execution, and the compiler, execution engine, and optimizer for query processing.', 'Storage Options Hive supports storing data in text, sequence, and RC file formats, providing flexibility for backend data storage and selective column storage in the RC file format.', 'Hive Limitations Hive lacks the capability for selective update and delete operations, and it does not provide low latency performance for online transaction processing systems.', 'Data Processing Functions Hive enables data processing functions such as filtering, joining, and partitioning data, allowing for efficient data creation and consumption.']}], 'duration': 835.01, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/tKNGB5IZPFE/pics/tKNGB5IZPFE1121878.jpg', 'highlights': ['Hive and Pig support user-defined functions (UDFs) and integrate MapReduce scripts.', 'Metastore stores database, table, and column definitions, facilitating browse query through JDBC and ODBC connectivity.', 'Data is stored in the Hadoop file system for reliability and scalability, allowing data replication across servers.', 'Hive and Pig support CDLIS and DCLIS.', 'Both Hive and Pig enable direct data access and storage within the Hadoop file system.', 'Components of Hive include the hive shell for table creation, the metastore for storing metadata, the driver for code execution, and the compiler, execution engine, and optimizer for query processing.', 'Hive supports storing data in text, sequence, and RC file formats, providing flexibility for backend data storage and selective column storage in the RC file format.', 'Hive enables 
data processing functions such as filtering, joining, and partitioning data, allowing for efficient data creation and consumption.', 'Hive offers a web interface for dataset and table visualization, unlike Pig.', 'Derby database, used for Metastore in Hive, is limited to one user, posing a challenge for concurrent interactions.', 'Hive supports JDBC and ODBC connectivity, whereas Pig only writes data to an output file, necessitating separate data movement and integration.']}, {'end': 2672.994, 'segs': [{'end': 2088.12, 'src': 'embed', 'start': 1983.141, 'weight': 0, 'content': [{'end': 1992.283, 'text': 'so In case of Hive, like whenever you load a data inside Hive, so in Hive you will see the loading of the data will be very fast.', 'start': 1983.141, 'duration': 9.142}, {'end': 2003.006, 'text': 'So you know, it makes basically what it does is like whenever you want to write the data inside Hive.', 'start': 1992.863, 'duration': 10.143}, {'end': 2008.048, 'text': 'so whenever you have to perform the load, it will just take the data set and move the data inside Hive name space.', 'start': 2003.006, 'duration': 5.042}, {'end': 2011.149, 'text': 'So that way the initial load is very, very fast.', 'start': 2008.508, 'duration': 2.641}, {'end': 2018.171, 'text': 'While it comes to the processing of the data, it takes a certain time because it converts into the MapReduce and all.', 'start': 2011.229, 'duration': 6.942}, {'end': 2020.971, 'text': "So that's where it takes time to do that.", 'start': 2018.531, 'duration': 2.44}, {'end': 2028.233, 'text': 'And when it comes to doing any update or any kind of transaction.', 'start': 2022.792, 'duration': 5.441}, {'end': 2034.715, 'text': 'so all those stuff it is not supported in Hive because it cannot do the selective update and insert on a dataset.', 'start': 2028.233, 'duration': 6.482}, {'end': 2042.923, 'text': 'So in case of the Hive data type what all we have.', 'start': 2037.821, 'duration': 5.102}, {'end': 2046.524, 'text': 'So what all we have is like you have all this Boolean data type.', 'start': 2043.323, 'duration': 3.201}, {'end': 2048.725, 'text': 'You have int, small int, big int.', 'start': 2046.624, 'duration': 2.101}, {'end': 2052.447, 'text': 'So all these kind of data type is supported with Hive.', 'start': 2048.764, 'duration': 3.683}, {'end': 2060.973, 'text': "Apart from that, you don't have the char or varchar types here, guys, which you would normally define in a database.", 'start': 2053.907, 'duration': 7.066}, {'end': 2069.96, 'text': 'So what you can do is in case of a hive you can always define that as a string like if you have to define and then it will be able to do that.', 'start': 2062.333, 'duration': 7.627}, {'end': 2074.945, 'text': 'In case of floating-point or decimal data, you can define float and double.', 'start': 2070.481, 'duration': 4.464}, {'end': 2077.107, 'text': 'So these are the basic data type we have.', 'start': 2075.344, 'duration': 1.763}, {'end': 2088.12, 'text': "Then in Hive, what it does is it also supports the complex data type that's called basically the struct map and the array.", 'start': 2077.147, 'duration': 10.973}], 'summary': 'Hive allows fast data loading, supports various data types, but has limitations in updates and transactions.', 'duration': 104.979, 'max_score': 1983.141, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/tKNGB5IZPFE/pics/tKNGB5IZPFE1983141.jpg'}, {'end': 2169.511, 'src': 'embed', 'start': 2140.467, 
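A rough HiveQL sketch of the primitive and complex types listed above (boolean, int/smallint/bigint, string, float/double, and struct/map/array); the table and column names are invented for illustration.

```sql
-- Illustrative table combining Hive primitive and complex types.
CREATE TABLE subscriber_profile (
  id         BIGINT,
  name       STRING,                               -- used instead of char/varchar
  active     BOOLEAN,
  balance    DOUBLE,
  address    STRUCT<city:STRING, country:STRING>,  -- struct
  recharges  ARRAY<INT>,                           -- array
  attributes MAP<STRING, STRING>                   -- map
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  COLLECTION ITEMS TERMINATED BY '|'
  MAP KEYS TERMINATED BY ':';
```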
'weight': 3, 'content': [{'end': 2144.477, 'text': 'so other thing, what you have with the hive guys is the hive namespace.', 'start': 2140.467, 'duration': 4.01}, {'end': 2154.042, 'text': 'so basically whenever you, Whenever you want to you know, store the data inside Hive.', 'start': 2144.477, 'duration': 9.565}, {'end': 2156.203, 'text': 'So you have the two stuff.', 'start': 2154.482, 'duration': 1.721}, {'end': 2161.686, 'text': 'You can create a database and then inside the database, whatever table you create, it will be stored.', 'start': 2156.243, 'duration': 5.443}, {'end': 2166.289, 'text': 'Other way is like, you know, you leave it everything in the default.', 'start': 2162.987, 'duration': 3.302}, {'end': 2169.511, 'text': "So that's the default database, and that's where your table will get created.", 'start': 2166.309, 'duration': 3.202}], 'summary': 'Hive allows data storage through databases and tables.', 'duration': 29.044, 'max_score': 2140.467, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/tKNGB5IZPFE/pics/tKNGB5IZPFE2140467.jpg'}, {'end': 2409.168, 'src': 'embed', 'start': 2375.565, 'weight': 4, 'content': [{'end': 2379.328, 'text': 'so hive right, so you do create database.', 'start': 2375.565, 'duration': 3.763}, {'end': 2385.4, 'text': 'Okay, so let me pick this data name telecom.', 'start': 2382.259, 'duration': 3.141}, {'end': 2387.581, 'text': 'So we are going to create a database called telecom.', 'start': 2385.44, 'duration': 2.141}, {'end': 2392.863, 'text': 'Okay, so how do you create the database? Create database telecom.', 'start': 2388.601, 'duration': 4.262}, {'end': 2395.824, 'text': 'All the Hive commands you have to terminate with a semicolon.', 'start': 2393.303, 'duration': 2.521}, {'end': 2397.264, 'text': "So that's why you have to do that.", 'start': 2395.844, 'duration': 1.42}, {'end': 2398.905, 'text': 'So you terminate it with a semicolon.', 'start': 2397.304, 'duration': 1.601}, {'end': 2409.168, 'text': 'Okay, once your Hive shell returns again, you know, gracefully, it means that this particular command executed successfully.', 'start': 2399.965, 'duration': 9.203}], 'summary': "Creating a database named telecom in Hive using the 'create database' command.", 'duration': 33.603, 'max_score': 2375.565, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/tKNGB5IZPFE/pics/tKNGB5IZPFE2375565.jpg'}, {'end': 2532.698, 'src': 'embed', 'start': 2508.228, 'weight': 5, 'content': [{'end': 2514.279, 'text': 'So now, if you do show databases, you will see two databases telecom and telecom underscore backup.', 'start': 2508.228, 'duration': 6.051}, {'end': 2520.19, 'text': 'So you can add the comment as well so that next time whenever you are reading the dataset, you can undo that.', 'start': 2515.001, 'duration': 5.189}, {'end': 2522.953, 'text': 'now up.', 'start': 2522.253, 'duration': 0.7}, {'end': 2524.254, 'text': 'you know the couple multiple things.', 'start': 2522.953, 'duration': 1.301}, {'end': 2524.934, 'text': 'you can do it.', 'start': 2524.254, 'duration': 0.68}, {'end': 2529.857, 'text': "I'm not sure how much admin can affect you, but you, like you can add the DB properties as well.", 'start': 2524.934, 'duration': 4.923}, {'end': 2532.698, 'text': "that you can say well, who is the creator, what date you're getting.", 'start': 2529.857, 'duration': 2.841}], 'summary': "Two databases 'telecom' and 'telecom_backup' exist, with added comments for future reference and ability to 
specify db properties like creator and date.", 'duration': 24.47, 'max_score': 2508.228, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/tKNGB5IZPFE/pics/tKNGB5IZPFE2508228.jpg'}, {'end': 2634.504, 'src': 'embed', 'start': 2563.984, 'weight': 6, 'content': [{'end': 2576.553, 'text': 'you can add something called db and then here you can add like, who is the creator?', 'start': 2563.984, 'duration': 12.569}, {'end': 2602.382, 'text': 'creator. you can add like okay, you can give something, date, maybe something like 18-10-14, whatever you want to give, like,', 'start': 2576.553, 'duration': 25.829}, {'end': 2605.924, 'text': 'you can add these kind of stuff as well.', 'start': 2602.382, 'duration': 3.542}, {'end': 2625.441, 'text': 'okay, just a second, guys should be going.', 'start': 2605.924, 'duration': 19.517}, {'end': 2626.461, 'text': "what's the problem here?", 'start': 2625.441, 'duration': 1.02}, {'end': 2634.504, 'text': 'create database telecom with tv properties, creator.', 'start': 2626.461, 'duration': 8.043}], 'summary': 'The transcript discusses adding properties to a database, including the creator and date, and creating a database with specific properties.', 'duration': 70.52, 'max_score': 2563.984, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/tKNGB5IZPFE/pics/tKNGB5IZPFE2563984.jpg'}], 'start': 1957.068, 'title': 'Hive data loading and database management in hive', 'summary': 'Explains the efficiency of data loading in hive, emphasizing its fast initial load and support for various data types and complex data structures. it also covers the process of creating and managing databases in hive, enabling users to handle data efficiently within hive.', 'chapters': [{'end': 2204.815, 'start': 1957.068, 'title': 'Hive data loading and data types', 'summary': 'Explains the efficiency of data loading in hive, emphasizing its fast initial load and support for various data types and complex data structures, while also highlighting limitations in updates and transactions, and the options for database and table creation.', 'duration': 247.747, 'highlights': ['In Hive, the loading of data is very fast, as it moves the data inside Hive namespace, resulting in efficient initial load.', 'Hive supports various data types such as Boolean, int, small int, big int, string, float, and decimal.', 'Hive also supports complex data types including struct, map, and array, allowing for flexible storage of data.', 'Hive does not support selective update and insert on a dataset, leading to limitations in performing updates and transactions.', 'Users can create a database and tables for data storage in Hive, with the option to customize and organize namespaces for efficient data management.']}, {'end': 2672.994, 'start': 2204.895, 'title': 'Working with hive database', 'summary': 'Covers the process of creating and managing databases in hive, including creating a database, adding comments, and managing database properties, enabling users to handle data efficiently within hive.', 'duration': 468.099, 'highlights': ["The process of creating a database in Hive is explained, including the command 'create database' followed by the desired database name, such as 'telecom', and the command to show the created database.", 'The demonstration of adding comments to databases in Hive is illustrated, showcasing the use of comments to provide additional context and information about the database for future reference.', 'The concept of managing database 
properties in Hive is discussed, enabling the addition of properties such as creator and creation date to the database for administrative and organizational purposes.']}], 'duration': 715.926, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/tKNGB5IZPFE/pics/tKNGB5IZPFE1957068.jpg', 'highlights': ['In Hive, the loading of data is very fast, as it moves the data inside Hive namespace, resulting in efficient initial load.', 'Hive supports various data types such as Boolean, int, small int, big int, string, float, and decimal.', 'Hive also supports complex data types including struct, map, and array, allowing for flexible storage of data.', 'Users can create a database and tables for data storage in Hive, with the option to customize and organize namespaces for efficient data management.', "The process of creating a database in Hive is explained, including the command 'create database' followed by the desired database name, such as 'telecom', and the command to show the created database.", 'The demonstration of adding comments to databases in Hive is illustrated, showcasing the use of comments to provide additional context and information about the database for future reference.', 'The concept of managing database properties in Hive is discussed, enabling the addition of properties such as creator and creation date to the database for administrative and organizational purposes.', 'Hive does not support selective update and insert on a dataset, leading to limitations in performing updates and transactions.']}, {'end': 5032.758, 'segs': [{'end': 2950.798, 'src': 'embed', 'start': 2920.834, 'weight': 0, 'content': [{'end': 2923.436, 'text': 'And now, we are inside this database and we are creating the table.', 'start': 2920.834, 'duration': 2.602}, {'end': 2925.717, 'text': 'So, we do create table recharge.', 'start': 2923.956, 'duration': 1.761}, {'end': 2931.661, 'text': 'Then, we give the column name and the data type, city and the data type, name and the data type.', 'start': 2925.817, 'duration': 5.844}, {'end': 2938.395, 'text': "once you're done with that, you have to specify what will be the schema of the data you read.", 'start': 2933.214, 'duration': 5.181}, {'end': 2946.917, 'text': 'now, when it comes to schema all we know that in hive you can only process the structure data.', 'start': 2938.395, 'duration': 8.522}, {'end': 2950.798, 'text': 'so once you have the structure data, you have to provide what kind of the schema it is going to use.', 'start': 2946.917, 'duration': 3.881}], 'summary': 'Creating a table named recharge with columns city and name in the database. specifying schema for structured data in hive.', 'duration': 29.964, 'max_score': 2920.834, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/tKNGB5IZPFE/pics/tKNGB5IZPFE2920834.jpg'}, {'end': 3092.785, 'src': 'heatmap', 'start': 3002.424, 'weight': 0.765, 'content': [{'end': 3007.947, 'text': 'Sorry, inside this database called telecom, you have a table got created called HR.', 'start': 3002.424, 'duration': 5.523}, {'end': 3012.189, 'text': 'Are you guys with me till now? 
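The database-level commands walked through in this chapter can be sketched in HiveQL as follows; the comment text is invented, and the DBPROPERTIES values only echo the creator/date idea mentioned in the session rather than reproducing it verbatim.

```sql
-- Sketch of the CREATE DATABASE flow described above (statements end with a semicolon).
CREATE DATABASE IF NOT EXISTS telecom
  COMMENT 'Telecom recharge demo database'
  WITH DBPROPERTIES ('creator' = 'edureka', 'date' = '18-10-14');

SHOW DATABASES;                       -- e.g. default, telecom, telecom_backup
DESCRIBE DATABASE EXTENDED telecom;   -- shows location, comment and DBPROPERTIES
USE telecom;                          -- work inside this namespace from now on
```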
It should be simple to understand this.', 'start': 3009.067, 'duration': 3.122}, {'end': 3014.01, 'text': "I don't think there is any complexity.", 'start': 3012.389, 'duration': 1.621}, {'end': 3021.394, 'text': 'Now, once you are able to create a database, once you are able to create table, the next thing is like how to load the data inside this.', 'start': 3015.511, 'duration': 5.883}, {'end': 3022.334, 'text': "So, that's what next.", 'start': 3021.494, 'duration': 0.84}, {'end': 3025.816, 'text': "So, for that, let's first go ahead and create a data set here.", 'start': 3022.894, 'duration': 2.922}, {'end': 3033.524, 'text': "So let's do gedit and go ahead and insert the data in this table.", 'start': 3027.64, 'duration': 5.884}, {'end': 3034.965, 'text': 'So recharge is the table name.', 'start': 3033.564, 'duration': 1.401}, {'end': 3036.385, 'text': 'Okay Gedit.', 'start': 3035.065, 'duration': 1.32}, {'end': 3038.367, 'text': 'There is no similar to recharge again.', 'start': 3036.746, 'duration': 1.621}, {'end': 3040.868, 'text': 'You can have anything recharge dot input.', 'start': 3038.407, 'duration': 2.461}, {'end': 3041.769, 'text': 'I will just take it.', 'start': 3040.988, 'duration': 0.781}, {'end': 3048.904, 'text': 'Okay, now, inside this, I have to add certain you know the detail, the fields which will.', 'start': 3042.069, 'duration': 6.835}, {'end': 3051.544, 'text': 'it will read it and insert that.', 'start': 3048.904, 'duration': 2.64}, {'end': 3053.125, 'text': 'so let me have that type.', 'start': 3051.544, 'duration': 1.581}, {'end': 3062.228, 'text': 'you can have 999 as one number, then it can be something India, and then it can be something name ABC.', 'start': 3053.125, 'duration': 9.103}, {'end': 3069.294, 'text': 'you can have 69, another number.', 'start': 3062.228, 'duration': 7.066}, {'end': 3074.457, 'text': 'then you can have USA, and then you can have name with M and B.', 'start': 3069.294, 'duration': 5.163}, {'end': 3081.443, 'text': 'okay, then you can have nine, zero, eight, seven, something you know.', 'start': 3074.457, 'duration': 6.986}, {'end': 3087.227, 'text': 'okay, save it, close it.', 'start': 3081.443, 'duration': 5.784}, {'end': 3092.785, 'text': 'So now, what we did?', 'start': 3091.604, 'duration': 1.181}], 'summary': "Creating a table 'hr' inside the 'telecom' database and inserting data into it via a data set named 'recharge' with specific details.", 'duration': 90.361, 'max_score': 3002.424, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/tKNGB5IZPFE/pics/tKNGB5IZPFE3002424.jpg'}, {'end': 3466.123, 'src': 'embed', 'start': 3432.621, 'weight': 2, 'content': [{'end': 3436.527, 'text': "that's altogether separate from the what you have in this particular database.", 'start': 3432.621, 'duration': 3.906}, {'end': 3454.115, 'text': "So now, once we are good with this one right, let's move on to one more concept, right?", 'start': 3450.293, 'duration': 3.822}, {'end': 3460.579, 'text': 'So this is something we call it managed tables, where you create a table, that ownership of the data.', 'start': 3454.516, 'duration': 6.063}, {'end': 3463.621, 'text': 'okay, it goes with the hive and it keeps that.', 'start': 3460.579, 'duration': 3.042}, {'end': 3466.123, 'text': 'Then there is a concept or something called the external table.', 'start': 3463.641, 'duration': 2.482}], 'summary': 'Introduction to managed and external tables in database management.', 'duration': 33.502, 'max_score': 3432.621, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/tKNGB5IZPFE/pics/tKNGB5IZPFE3432621.jpg'}, {'end': 4121.642, 'src': 'heatmap', 'start': 4016.306, 'weight': 0.779, 'content': [{'end': 4052.908, 'text': 'now if i move this dataset hadoop fs hyphen put no i have to create a dataset first mkdir So we have moved this data set.', 'start': 4016.306, 'duration': 36.602}, {'end': 4055.509, 'text': 'So now this is the location we will give to our hive table.', 'start': 4052.928, 'duration': 2.581}, {'end': 4060.111, 'text': 'So this is the location from where it has to read the data.', 'start': 4057.529, 'duration': 2.582}, {'end': 4068.574, 'text': 'So now what you can think of, guys, is this location will be like coming from your like.', 'start': 4063.272, 'duration': 5.302}, {'end': 4071.275, 'text': 'you have the upstream job that the MapReduce job,', 'start': 4068.574, 'duration': 2.701}, {'end': 4077.878, 'text': 'that basically are writing the data and then rest of the consumption of the data you can do with the help of the hive table.', 'start': 4071.275, 'duration': 6.603}, {'end': 4088.891, 'text': 'So now, if you do select a star from recharge external, you will be able to see the data.', 'start': 4078.518, 'duration': 10.373}, {'end': 4091.032, 'text': "so in this case we didn't run the load statement.", 'start': 4088.891, 'duration': 2.141}, {'end': 4095.355, 'text': 'all we did is like whatever the data is available inside HDFS.', 'start': 4091.032, 'duration': 4.323}, {'end': 4100.178, 'text': 'we took that data and we just linked it here with the keyword called location,', 'start': 4095.355, 'duration': 4.823}, {'end': 4103.58, 'text': 'and the data is available inside this table called recharge underscore external.', 'start': 4100.178, 'duration': 3.402}, {'end': 4110.118, 'text': "Now let's see what happened in the backend when I created this table.", 'start': 4105.658, 'duration': 4.46}, {'end': 4115.359, 'text': 'Did it read the data directly from the location we gave, or it has created a copy of that one?', 'start': 4110.158, 'duration': 5.201}, {'end': 4116.441, 'text': 'So that one.', 'start': 4115.96, 'duration': 0.481}, {'end': 4117.22, 'text': 'how you can do that?', 'start': 4116.441, 'duration': 0.779}, {'end': 4121.642, 'text': 'You can come back here and same command.', 'start': 4117.481, 'duration': 4.161}], 'summary': 'Data moved to hdfs and linked to hive table for analysis.', 'duration': 105.336, 'max_score': 4016.306, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/tKNGB5IZPFE/pics/tKNGB5IZPFE4016306.jpg'}, {'end': 4618.734, 'src': 'embed', 'start': 4585.394, 'weight': 4, 'content': [{'end': 4591.259, 'text': "So it's automatically just starting that and it's telling you that it's having three number of the records right?", 'start': 4585.394, 'duration': 5.865}, {'end': 4592.576, 'text': "So that's the way.", 'start': 4591.755, 'duration': 0.821}, {'end': 4599.32, 'text': 'Hive internally, start the MapReduce job and do the processing on behalf of you.', 'start': 4592.576, 'duration': 6.744}, {'end': 4606.305, 'text': 'So always you have to see the location where the data is.', 'start': 4603.743, 'duration': 2.562}, {'end': 4607.886, 'text': 'It is from the location.', 'start': 4606.365, 'duration': 1.521}, {'end': 4618.734, 'text': 'You will basically look for the file which is getting delivered there and based on that you will know the schema of the external table.', 'start': 4608.467, 'duration': 10.267}], 'summary': 
'Hive automatically processes 3 records using mapreduce, requires data location for schema.', 'duration': 33.34, 'max_score': 4585.394, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/tKNGB5IZPFE/pics/tKNGB5IZPFE4585394.jpg'}, {'end': 4702.127, 'src': 'embed', 'start': 4670.904, 'weight': 5, 'content': [{'end': 4676.086, 'text': 'But when you scale that particular database with the large volume of the data,', 'start': 4670.904, 'duration': 5.182}, {'end': 4682.128, 'text': 'you will see that although it has some overhead in terms of starting this MapReduce and getting the count,', 'start': 4676.086, 'duration': 6.042}, {'end': 4686.41, 'text': 'but then it will be able to scale much better than the normal DBMS kind of stuff.', 'start': 4682.128, 'duration': 4.282}, {'end': 4697.944, 'text': 'So we saw what is hive, where to use hive.', 'start': 4695.202, 'duration': 2.742}, {'end': 4702.127, 'text': 'So basically for the structured data, why to go with hive when pig is there.', 'start': 4698.824, 'duration': 3.303}], 'summary': 'Hive scales better than traditional dbms for large data volumes, making it suitable for structured data.', 'duration': 31.223, 'max_score': 4670.904, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/tKNGB5IZPFE/pics/tKNGB5IZPFE4670904.jpg'}], 'start': 2672.994, 'title': 'Creating and managing hive tables', 'summary': "Covers creating and managing database tables in a telecom environment, creating table 'recharge' in the 'telecom' database in hive, loading data, accessing table details, understanding internal, external, and managed tables, and discussing hive's architecture and concepts.", 'chapters': [{'end': 2902.183, 'start': 2672.994, 'title': 'Creating database and table in telecom', 'summary': "Explains the process of creating a database and table in a telecom environment, including commands like 'create database' and 'create table', and the importance of providing a schema for the table.", 'duration': 229.189, 'highlights': ["Creating a database using the command 'create database' and specifying the database name like 'telecom' simplifies the process. The 'create database' command simplifies the process of database creation by providing a straightforward method to specify the database name, such as 'telecom'.", "Describing the database using the 'describe database' command with 'extended' option provides comprehensive details about the database and its creation. Using the 'describe database' command with the 'extended' option offers a way to obtain comprehensive details about the database, including its creation and added details.", 'Creating a table involves specifying the structure and schema, and then loading data into it. The process of creating a table involves specifying its structure and schema and subsequently loading data into it.']}, {'end': 3127.018, 'start': 2903.004, 'title': 'Creating table and loading data in hive', 'summary': "Explains the process of creating a table 'recharge' in the 'telecom' database in hive, including defining the schema and loading data into the table using a data set, specifying the delimiter as comma.", 'duration': 224.014, 'highlights': ["The process of creating a table 'recharge' inside the 'telecom' database in Hive is detailed, including defining the schema and specifying the delimiter as comma. 
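Pulling the table-creation and data-loading steps above together, a minimal managed-table sketch might look like the following; the column names follow the walkthrough loosely and the local file path is an assumption.

```sql
-- Managed (internal) table: Hive owns the data under its warehouse directory.
CREATE TABLE recharge (
  phone_number STRING,
  city         STRING,
  name         STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','   -- the input file is comma-delimited
STORED AS TEXTFILE;

LOAD DATA LOCAL INPATH '/home/edureka/recharge.input' INTO TABLE recharge;

SELECT * FROM recharge;            -- a simple scan; may not need a full MapReduce job
SELECT COUNT(*) FROM recharge;     -- aggregations make Hive launch a MapReduce job, as shown in the session
```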
It covers the step-by-step process of creating a table 'recharge' inside the 'telecom' database in Hive, including defining the schema and specifying the delimiter as comma for storing the data.", "The method of creating a data set and inserting data into the 'recharge' table is explained, with an example of data insertion and the delimiter specified as comma. It explains the creation of a data set and inserting data into the 'recharge' table, including an example of data insertion with fields separated by comma as the specified delimiter."]}, {'end': 3659.872, 'start': 3127.359, 'title': 'Hive table data loading and table creation', 'summary': "Covers the process of loading data into a hive table, including the use of 'load data' command, and accessing table details using 'describe' and 'describe extended' commands. it also explains the concept of managed tables and external tables.", 'duration': 532.513, 'highlights': ["You can load data into a table using the 'load data local inpath' command, specifying the data set and the path, and then verify the data using 'select star' command.", "The 'describe' command provides information about the schema of a table, while 'describe extended' command gives details about the table name, database name, owner, and creation time.", 'When creating a database inside Hive, it creates a folder structure with user hive warehouse and the specified database name, and when a table is created inside the database, it creates a table inside that particular database, maintaining the namespace in Hive.', 'The concept of managed tables involves creating a table where the ownership of the data goes with Hive, while the external table creates a reference for the data set without directly loading the data into the table.', 'When creating an external table, a reference is created for the particular data set, and the data gets exposed to the external table, without directly loading the data into the table.']}, {'end': 4227.302, 'start': 3660.372, 'title': 'Creating and managing external tables in hive', 'summary': 'Explains the process of creating and managing external tables in hive, including the steps to load data, create an external table, and the differences between internal, external, and managed tables, emphasizing the concepts of data ownership and location references.', 'duration': 566.93, 'highlights': ['Creating an external table in Hive allows for referencing data in a location without owning the data, providing metadata but not data ownership. The external table in Hive enables referencing data in a location without owning the data, allowing metadata creation without data ownership.', 'Distinguishing between external, internal, and managed tables in Hive, emphasizing the differences in data ownership and location references. The chapter explains the differences between external, internal, and managed tables in Hive, focusing on data ownership and location references.', 'Loading data into Hive can create a copy of the file, and creating an external table does not lead to the creation of a folder in the backend. Loading data into Hive can create a copy of the file, while creating an external table does not result in the creation of a folder in the backend.', 'The external table concept in Hive allows for reading and referencing data from a specified location, enabling processing without owning the data. 
The external table concept in Hive enables reading and referencing data from a specified location, allowing processing without owning the data.', "Dropping an external table in Hive does not affect the data's presence in the Hadoop file system, emphasizing the lack of data ownership by the external table. Dropping an external table in Hive does not affect the data's presence in the Hadoop file system, highlighting the lack of data ownership by the external table."]}, {'end': 4646.516, 'start': 4227.302, 'title': 'Managing hive tables and understanding ownership', 'summary': 'Explains the process of dropping tables in hive, understanding the difference between manage and external tables, and how hive internally processes data using mapreduce, with a focus on understanding table creation and data loading.', 'duration': 419.214, 'highlights': ["Hive internally processes data using MapReduce and automatically starts processing jobs when executing queries, as demonstrated by a simple 'select count' query which resulted in three records being processed using MapReduce. Hive internally processes data using MapReduce and automatically starts processing jobs when executing queries.", 'Understanding the difference between manage and external tables, where dropping a manage table also deletes the associated data, while dropping an external table does not affect the stored data. Explaining the difference between manage and external tables, where dropping a manage table also deletes the associated data.', "Dropping tables in Hive involves using the 'drop table' command, and using 'if not exist, drop table' when unsure about the action, to prevent errors. Dropping tables in Hive involves using the 'drop table' command and using 'if not exist, drop table' when unsure about the action."]}, {'end': 5032.758, 'start': 4646.516, 'title': 'Understanding hive and its concepts', 'summary': 'Discusses the architecture, limitations, and usage of hive, emphasizing the advantages of scaling with large volumes of data and the concepts of partitioning and bucketing.', 'duration': 386.242, 'highlights': ["Hive's ability to scale better than normal DBMS with large data volumes Hive can scale much better than normal DBMS when dealing with large volumes of data, despite having some overhead in starting MapReduce and getting the count.", 'Limitations of Hive in processing unstructured data and providing selective update and delete Hive has limitations in processing unstructured data and providing selective update and delete operations, which are important points to consider when working with Hive.', 'Explanation of partitioning and bucketing concepts in Hive The chapter explains the important concepts of partitioning and bucketing in Hive, emphasizing their relevance in interviews and data processing scenarios.']}], 'duration': 2359.764, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/tKNGB5IZPFE/pics/tKNGB5IZPFE2672994.jpg', 'highlights': ['Creating a table involves specifying the structure and schema, and then loading data into it.', "The process of creating a table 'recharge' inside the 'telecom' database in Hive is detailed, including defining the schema and specifying the delimiter as comma.", 'The concept of managed tables involves creating a table where the ownership of the data goes with Hive, while the external table creates a reference for the data set without directly loading the data into the table.', 'Distinguishing between external, internal, and managed tables in Hive, 
emphasizing the differences in data ownership and location references.', 'Hive internally processes data using MapReduce and automatically starts processing jobs when executing queries.', "Hive's ability to scale better than normal DBMS with large data volumes Hive can scale much better than normal DBMS when dealing with large volumes of data, despite having some overhead in starting MapReduce and getting the count."]}, {'end': 5631.22, 'segs': [{'end': 5085.764, 'src': 'embed', 'start': 5056.414, 'weight': 0, 'content': [{'end': 5059.975, 'text': "Yeah, correct, right? So that's the way Hive is going to read that data.", 'start': 5056.414, 'duration': 3.561}, {'end': 5063.796, 'text': 'Now assume that these files are like huge files, like you know.', 'start': 5060.035, 'duration': 3.761}, {'end': 5066.718, 'text': "suppose, let's say, scan all yeah?", 'start': 5063.796, 'duration': 2.922}, {'end': 5071.86, 'text': 'So, like, suppose, if these files are like 10 GB, or you know, 12 GB files, huge files are there right?', 'start': 5067.118, 'duration': 4.742}, {'end': 5075.041, 'text': 'Going through all the files, you will get the output.', 'start': 5072.58, 'duration': 2.461}, {'end': 5078.182, 'text': "There's no doubt about that, but then it's going to take the time.", 'start': 5075.061, 'duration': 3.121}, {'end': 5085.764, 'text': 'now, instead of that, you know, storing the file in this particular way,', 'start': 5079.562, 'duration': 6.202}], 'summary': 'Hive reads huge 10-12 gb files, but it takes time.', 'duration': 29.35, 'max_score': 5056.414, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/tKNGB5IZPFE/pics/tKNGB5IZPFE5056414.jpg'}, {'end': 5204.794, 'src': 'embed', 'start': 5171.574, 'weight': 1, 'content': [{'end': 5176.377, 'text': 'Now, if we know this one there, that this is the way how the table is going to carry,', 'start': 5171.574, 'duration': 4.803}, {'end': 5187.944, 'text': 'what we can do is we can use this particular way of storing the data.', 'start': 5176.377, 'duration': 11.567}, {'end': 5194.688, 'text': 'So we can create a folder for each month, and inside that folder, I only store the January data, February data, or the March data.', 'start': 5187.984, 'duration': 6.704}, {'end': 5195.928, 'text': 'So this is called partitioning.', 'start': 5194.728, 'duration': 1.2}, {'end': 5204.794, 'text': 'Now, if I go and run the same SQL, what we have run here, if you run here in this particular stuff, what it will do is it see that, okay,', 'start': 5196.569, 'duration': 8.225}], 'summary': 'Using partitioning for data storage by creating a folder for each month.', 'duration': 33.22, 'max_score': 5171.574, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/tKNGB5IZPFE/pics/tKNGB5IZPFE5171574.jpg'}, {'end': 5378.376, 'src': 'embed', 'start': 5350.884, 'weight': 3, 'content': [{'end': 5359.026, 'text': 'But then that may not be optimal if the users are, you know, using all this kind of like, you know, other columns for filtering the data.', 'start': 5350.884, 'duration': 8.142}, {'end': 5365.008, 'text': "So that's the concept we have related to the partitioning.", 'start': 5362.847, 'duration': 2.161}, {'end': 5370.451, 'text': 'Now, since we understood about the partitioning, let me understand one more related concept here.', 'start': 5365.928, 'duration': 4.523}, {'end': 5372.112, 'text': "That's basically called about the bucketing.", 'start': 5370.511, 'duration': 1.601}, {'end': 5378.376, 'text': 'So now the 
next question what we have is about the next concept we have is about the bucketing.', 'start': 5373.433, 'duration': 4.943}], 'summary': 'Discussion about optimizing data filtering using partitioning and bucketing.', 'duration': 27.492, 'max_score': 5350.884, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/tKNGB5IZPFE/pics/tKNGB5IZPFE5350884.jpg'}, {'end': 5563.492, 'src': 'embed', 'start': 5535.367, 'weight': 2, 'content': [{'end': 5539.429, 'text': 'so instead of storing all the data in different table, you will tell that.', 'start': 5535.367, 'duration': 4.062}, {'end': 5542.85, 'text': "okay, I'm going to create 10 different buckets.", 'start': 5539.429, 'duration': 3.421}, {'end': 5544.751, 'text': 'so you will tell how many buckets.', 'start': 5542.85, 'duration': 1.901}, {'end': 5548.973, 'text': 'so bucketing is just like you know how many of the two different storage you are going to create.', 'start': 5544.751, 'duration': 4.222}, {'end': 5550.153, 'text': 'so you will identify that.', 'start': 5548.973, 'duration': 1.18}, {'end': 5554.935, 'text': "okay, I'm going to cluster by my table by 10 buckets.", 'start': 5550.153, 'duration': 4.782}, {'end': 5559.257, 'text': 'once you identify that, what you will do is you will create for each transaction ID.', 'start': 5554.935, 'duration': 4.322}, {'end': 5563.492, 'text': 'you will create something called hash value.', 'start': 5561.55, 'duration': 1.942}], 'summary': 'Data will be stored in 10 buckets with hash values for each transaction id.', 'duration': 28.125, 'max_score': 5535.367, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/tKNGB5IZPFE/pics/tKNGB5IZPFE5535367.jpg'}], 'start': 5032.758, 'title': 'Hive data processing', 'summary': 'Discusses how hive processes data by scanning large files (e.g. 10-12 gb) to provide output and emphasizes the concept of data partitioning and bucketing for optimizing sql queries and faster query execution.', 'chapters': [{'end': 5075.041, 'start': 5032.758, 'title': 'Hive data processing', 'summary': 'Discusses how hive processes data by scanning through all files, which could be large (e.g. 10-12 gb), to provide the output.', 'duration': 42.283, 'highlights': ['Hive processes data by scanning through all files, which could be large (e.g. 
10-12 GB), to provide the output.', 'The intelligence of Hive involves scanning all the data from a table to generate the output.', 'The chapter discusses the process of Hive scanning through all files to provide the output.']}, {'end': 5631.22, 'start': 5075.061, 'title': 'Data partitioning and bucketing', 'summary': 'Discusses the concept of data partitioning and bucketing, emphasizing how arranging data in folders for each month can optimize sql queries by reducing input data size, and how bucketing can be used to cluster data into multiple buckets for faster query execution.', 'duration': 556.159, 'highlights': ['Arranging data in folders for each month can optimize SQL queries by reducing input data size By creating a folder for each month and storing only the data for that specific month, SQL queries can run faster by considering reduced input data size.', 'Bucketing can be used to cluster data into multiple buckets for faster query execution Using bucketing, data can be clustered into multiple buckets based on hash values, which helps in faster query execution by distributing the data into smaller, manageable units.', 'Understanding how users will consume the table is crucial for effective data partitioning and bucketing It is important to understand how users will consume the table and what kind of SQL queries they will run to effectively implement data partitioning and bucketing for optimizing query performance.']}], 'duration': 598.462, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/tKNGB5IZPFE/pics/tKNGB5IZPFE5032758.jpg', 'highlights': ['Hive processes data by scanning through all files, which could be large (e.g. 10-12 GB), to provide the output.', 'Arranging data in folders for each month can optimize SQL queries by reducing input data size.', 'Bucketing can be used to cluster data into multiple buckets for faster query execution.', 'Understanding how users will consume the table is crucial for effective data partitioning and bucketing.']}, {'end': 7160.906, 'segs': [{'end': 5666.257, 'src': 'embed', 'start': 5631.707, 'weight': 0, 'content': [{'end': 5633.789, 'text': 'So like that way you are going to arrange the data.', 'start': 5631.707, 'duration': 2.082}, {'end': 5637.251, 'text': "Now suppose if the fourth value which you tell, it's again emit the value as 2.", 'start': 5633.849, 'duration': 3.402}, {'end': 5639.052, 'text': "So it's going to come into the same table.", 'start': 5637.251, 'duration': 1.801}, {'end': 5642.255, 'text': 'So that way now you arrange the data.', 'start': 5640.333, 'duration': 1.922}, {'end': 5645.617, 'text': "You don't arrange all the input record into separate folder.", 'start': 5642.275, 'duration': 3.342}, {'end': 5653.503, 'text': 'But instead of one folder, whoever have the same hash value, it will go inside the same folder structure.', 'start': 5645.637, 'duration': 7.866}, {'end': 5655.864, 'text': 'So that way again you rearrange the data.', 'start': 5654.003, 'duration': 1.861}, {'end': 5657.886, 'text': 'Now, whenever you will give the transaction ID.', 'start': 5655.924, 'duration': 1.962}, {'end': 5666.257, 'text': 'Again, you are basically going to arrange your data in this particular format where user hives database name, table name,', 'start': 5659.511, 'duration': 6.746}], 'summary': 'Data is rearranged based on hash values to optimize storage and retrieval.', 'duration': 34.55, 'max_score': 5631.707, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/tKNGB5IZPFE/pics/tKNGB5IZPFE5631707.jpg'}, {'end': 5810.698, 'src': 'embed', 'start': 5786.88, 'weight': 1, 'content': [{'end': 5796.371, 'text': 'So partition means dividing the table into different parts based on the value of the column on which you are partitioning.', 'start': 5786.88, 'duration': 9.491}, {'end': 5797.992, 'text': 'Example is like the date column.', 'start': 5796.551, 'duration': 1.441}, {'end': 5799.334, 'text': "That's like one of the widely used.", 'start': 5798.093, 'duration': 1.241}, {'end': 5804.572, 'text': 'So this is going to make it faster to do the query on the slice of the data,', 'start': 5801.089, 'duration': 3.483}, {'end': 5810.698, 'text': 'because you are basically reducing the data set which you are going to process to get the output.', 'start': 5804.572, 'duration': 6.126}], 'summary': 'Partitioning the table by date column can make queries faster by reducing the data set.', 'duration': 23.818, 'max_score': 5786.88, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/tKNGB5IZPFE/pics/tKNGB5IZPFE5786880.jpg'}, {'end': 6024.104, 'src': 'heatmap', 'start': 5915.35, 'weight': 2, 'content': [{'end': 5923.973, 'text': 'you have these buckets that will get created and sometimes it becomes like you know if you have like user id and if you want to carry on the user id.', 'start': 5915.35, 'duration': 8.623}, {'end': 5931.195, 'text': 'so in that case, like you can always create the buckets and your different user id will get grouped together as one bucket,', 'start': 5923.973, 'duration': 7.222}, {'end': 5936.177, 'text': 'as in the next in another bucket, and you can just carry that subset of the data and get the output.', 'start': 5931.195, 'duration': 4.982}, {'end': 5945.534, 'text': "If you have the joints on the table and you're including those columns in those cases also, it helps to basically get your data faster.", 'start': 5936.983, 'duration': 8.551}, {'end': 5955.901, 'text': 'okay. 
so now, uh, we have to see about, like um, some of the examples here about more about the hands-on sessions.', 'start': 5948.797, 'duration': 7.104}, {'end': 5957.142, 'text': 'i think rest of the slides.', 'start': 5955.901, 'duration': 1.241}, {'end': 5960.143, 'text': 'what you have here is more towards the hands-on slide.', 'start': 5957.142, 'duration': 3.001}, {'end': 5963.585, 'text': "i don't think we have, okay, all this answer.", 'start': 5960.143, 'duration': 3.442}, {'end': 5968.628, 'text': "so, uh, what i'm going to do, guys, here, is i'm going to show you all this stuff right, joining the two tables.", 'start': 5963.585, 'duration': 5.043}, {'end': 5969.328, 'text': 'i will just quickly.', 'start': 5968.628, 'duration': 0.7}, {'end': 5973.671, 'text': 'okay, so i will show all these hands-on exercises.', 'start': 5969.328, 'duration': 4.343}, {'end': 5977.154, 'text': "so we'll create a database, uh, which is the database retail.", 'start': 5973.671, 'duration': 3.483}, {'end': 5980.776, 'text': 'so we will create one database here and inside this we will see all these.', 'start': 5977.154, 'duration': 3.622}, {'end': 5992.765, 'text': 'you know the transaction data getting stored, partition and bucketing.', 'start': 5980.776, 'duration': 11.989}, {'end': 5994.747, 'text': 'let me switch over to the virtual machine which we have.', 'start': 5992.765, 'duration': 1.982}, {'end': 6004.452, 'text': 'If I want to create this database, myretail, tell me what should be the command I should use.', 'start': 5998.624, 'duration': 5.828}, {'end': 6015.528, 'text': "If I want to create this database, myretail, what should be the command? So now I'm going to use the command create database.", 'start': 6004.813, 'duration': 10.715}, {'end': 6019.904, 'text': 'yeah. so always, guys, you have to terminate it with a semicolon.', 'start': 6017.103, 'duration': 2.801}, {'end': 6022.044, 'text': "right.
if you don't do that, it will not accept that.", 'start': 6019.904, 'duration': 2.14}, {'end': 6024.104, 'text': 'like if i do this, create database myretail.', 'start': 6022.044, 'duration': 2.06}], 'summary': 'Creating buckets and using table joins for faster data retrieval in hands-on database exercise.', 'duration': 25.48, 'max_score': 5915.35, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/tKNGB5IZPFE/pics/tKNGB5IZPFE5915350.jpg'}, {'end': 6603.538, 'src': 'embed', 'start': 6529.716, 'weight': 3, 'content': [{'end': 6532.258, 'text': 'If you want more extended and formatted, you can do anything like that.', 'start': 6529.716, 'duration': 2.542}, {'end': 6538.387, 'text': 'Now, once the table is created, you can do all sort of operations,', 'start': 6534.645, 'duration': 3.742}, {'end': 6545.39, 'text': 'generally what you do in the SQL you can do in this particular table and the thing which you will observe here,', 'start': 6538.387, 'duration': 7.003}, {'end': 6552.973, 'text': 'compared to what we saw or like the same data set, if, we you know, process using the MapReduce, it becomes, you know, challenging.', 'start': 6545.39, 'duration': 7.583}, {'end': 6557.435, 'text': 'now, with the help of the hive or the SQL interface which we use, it will be much easier.', 'start': 6552.973, 'duration': 4.462}, {'end': 6566.799, 'text': 'like, suppose, if you want to count how many number of records, you just select count star from the transaction records table.', 'start': 6557.435, 'duration': 9.364}, {'end': 6571.934, 'text': 'It will launch a MapReduce job and it will show you the output.', 'start': 6569.407, 'duration': 2.527}, {'end': 6599.977, 'text': 'It has around 50,000 records.', 'start': 6597.136, 'duration': 2.841}, {'end': 6602.358, 'text': 'So it has around 50,000 entries in this table.', 'start': 6600.677, 'duration': 1.681}, {'end': 6603.538, 'text': 'So it shows you that.', 'start': 6602.378, 'duration': 1.16}], 'summary': 'Using sql interface in hive makes processing and querying data easier, as seen with 50,000 records.', 'duration': 73.822, 'max_score': 6529.716, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/tKNGB5IZPFE/pics/tKNGB5IZPFE6529716.jpg'}, {'end': 6862.932, 'src': 'heatmap', 'start': 6771.823, 'weight': 1, 'content': [{'end': 6779.469, 'text': 'Instead of all the thing, you want to do something like limit, something like seven records, you want to see that.', 'start': 6771.823, 'duration': 7.646}, {'end': 6788.095, 'text': 'The reason why I am showing you the limited stuff is because always you will have a huge table.', 'start': 6783.853, 'duration': 4.242}, {'end': 6791.677, 'text': "So always don't forget to use the limit, otherwise you have to terminate.", 'start': 6788.636, 'duration': 3.041}, {'end': 6797.02, 'text': 'So you can do limit 1, 2 and those so that you can see the sample record and based on that you can do the further processing.', 'start': 6791.697, 'duration': 5.323}, {'end': 6809.936, 'text': 'So you can see that now, instead of dumping everything, it will just dump you the limited, like 6,', 'start': 6805.144, 'duration': 4.792}, {'end': 6813.128, 'text': '7 records and that you can take it to basically the further processing.', 'start': 6809.936, 'duration': 3.192}, {'end': 6821.066, 'text': 'okay, okay, so now let me tell you, like, how to create a partition table.', 'start': 6815.62, 'duration': 5.446}, {'end': 6826.731, 'text': 'so this was like one of the tables which we created and
this is the table which we created now.', 'start': 6821.066, 'duration': 5.665}, {'end': 6832.377, 'text': 'suppose, if you want to create a partition table here, so what we can do is like we can, based on the category.', 'start': 6826.731, 'duration': 5.646}, {'end': 6837.562, 'text': 'just now we saw right, like suppose, if you want to create a partition on this particular column,', 'start': 6832.377, 'duration': 5.185}, {'end': 6848.122, 'text': 'so in that case what you will do is so you have this category like suppose, if you pick this column that you are going to partition your data.', 'start': 6839.676, 'duration': 8.446}, {'end': 6849.683, 'text': 'so when will you create a table?', 'start': 6848.122, 'duration': 1.561}, {'end': 6851.024, 'text': 'you tell that okay, this is.', 'start': 6849.683, 'duration': 1.341}, {'end': 6853.425, 'text': 'so. there are two way you can create the partitioning guys.', 'start': 6851.024, 'duration': 2.401}, {'end': 6855.246, 'text': 'one is like dynamic partitioning.', 'start': 6853.425, 'duration': 1.821}, {'end': 6857.968, 'text': 'one is static partitioning.', 'start': 6855.246, 'duration': 2.722}, {'end': 6862.932, 'text': 'in case of dynamic partitioning, when you load the data, the table data will get partitioned.', 'start': 6857.968, 'duration': 4.964}], 'summary': 'Use limit to view sample records and consider partitioning for data organization.', 'duration': 91.109, 'max_score': 6771.823, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/tKNGB5IZPFE/pics/tKNGB5IZPFE6771823.jpg'}], 'start': 5631.707, 'title': 'Data organization and processing in hadoop and hive', 'summary': 'Covers hash-based data organization, data partitioning, bucketing, database creation, table loading, and mapreduce usage in hadoop and hive, emphasizing improved efficiency, query response, and data handling of large datasets, including an example of processing 50,000 records using a mapreduce job.', 'chapters': [{'end': 5676.526, 'start': 5631.707, 'title': 'Hash-based data organization', 'summary': 'Explains a method of organizing data using hash values to group similar data into the same folder structure, resulting in improved data arrangement and efficiency.', 'duration': 44.819, 'highlights': ['Organizing data using hash values to group similar data into the same folder structure improves data arrangement and efficiency.', 'Using the hash value to arrange data allows for the grouping of data with the same hash value into the same folder structure, leading to efficient data organization.', 'Arranging data based on hash values results in improved efficiency and organization by grouping similar data into the same folder structure.']}, {'end': 5945.534, 'start': 5676.526, 'title': 'Data partitioning and bucketing', 'summary': 'Discusses the concepts of data partitioning and bucketing in hadoop and hive, where partitioning divides the table into different parts based on a column value, increasing query response, and bucketing further divides data into buckets for data sampling or quick retrieval, all while maintaining the three-way replication of data in hadoop.', 'duration': 269.008, 'highlights': ['Data partitioning divides the table into different parts based on the value of the partition key, such as date, leading to faster query response and efficient data processing in Hive. 
Partitioning increases query response in Hive by reducing the dataset to be processed and stored, with each unique value of the partition key defining a partition of the table, resulting in faster query processing.', 'Bucketing divides data into buckets based on a column value, allowing for data sampling and quick retrieval, and it supports the maintenance of three-way replication of data in Hadoop. Bucketing further divides data into buckets based on a specific column, enabling data sampling and efficient retrieval, while maintaining the three-way replication of data in Hadoop.', 'The three-way replication of data is maintained in Hadoop, where the data divided into a partition column is replicated internally by the name node into three different machines. The three-way replication of data is upheld in Hadoop, as data divided into a partition column is internally replicated by the name node across three different machines, ensuring fault tolerance and data availability.']}, {'end': 6171.507, 'start': 5948.797, 'title': 'Hands-on database creation and table loading', 'summary': "Covers the creation of a database 'myretail' and loading of a transaction dataset into a table, emphasizing the use of sql commands and data structure explanation.", 'duration': 222.71, 'highlights': ["The instructor demonstrates the creation of a database 'myretail' and emphasizes the use of proper SQL command syntax, ensuring the command termination with a semicolon for successful execution.", 'The dataset used for demonstration is the transaction data set, comprising columns such as transaction ID, date, customer number, transaction amount, category, product, city, and state, providing a comprehensive overview of the data structure and content.', "The chapter focuses on the hands-on session of creating a database 'myretail' and loading a transaction dataset into a table, showcasing practical implementation and utilization of SQL commands and data understanding for analysis."]}, {'end': 6625.532, 'start': 6175.552, 'title': 'Data set table creation and loading', 'summary': 'Explains the process of creating a table in hive, loading data into the table, and performing sql operations, highlighting the ease of use and the ability to handle large datasets, with an example of processing 50,000 records using a mapreduce job.', 'duration': 449.98, 'highlights': ['Processing 50,000 records using a MapReduce job The SQL interface in Hive allows for the processing of large datasets, as demonstrated by the ease of counting 50,000 records from a table.', "Creating a table in Hive and loading data The process of creating a table in Hive and loading data into it involves defining the table structure, specifying the data type, and loading data using the 'load data' command.", 'Ability to perform SQL operations on the created table Once the table is created, various SQL operations can be performed on it, similar to traditional SQL, providing ease of use and flexibility for data processing.']}, {'end': 7160.906, 'start': 6625.532, 'title': 'Understanding mapreduce and partitioning tables', 'summary': 'Explains how to use mapreduce jobs to process sql queries and demonstrates the process of creating a partition table with dynamic partitioning and bucketing in hive.', 'duration': 535.374, 'highlights': ['The chapter explains how to use MapReduce jobs to process SQL queries. 
Demonstrates how a MapReduce job is launched to process SQL queries, showcasing the practical application of MapReduce in data processing.', 'The process of creating a partition table with dynamic partitioning and bucketing in Hive is demonstrated. Detailed explanation of creating a partition table with dynamic partitioning and bucketing in Hive, including setting parameters and loading data.', "Demonstrates the use of dynamic partitioning and the process of setting parameters for dynamic partitioning in Hive. Explains the concept of dynamic partitioning and setting parameters like 'hive.exec.dynamic.partition.mode' and 'hive.enforce.bucketing' for dynamic partitioning in Hive."]}], 'duration': 1529.199, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/tKNGB5IZPFE/pics/tKNGB5IZPFE5631707.jpg', 'highlights': ['Using hash values to group similar data into the same folder structure improves data arrangement and efficiency.', 'Data partitioning divides the table into different parts based on the value of the partition key, such as date, leading to faster query response and efficient data processing in Hive.', "The instructor demonstrates the creation of a database 'myretail' and emphasizes the use of proper SQL command syntax, ensuring the command termination with a semicolon for successful execution.", 'Processing 50,000 records using a MapReduce job The SQL interface in Hive allows for the processing of large datasets, as demonstrated by the ease of counting 50,000 records from a table.', 'The chapter explains how to use MapReduce jobs to process SQL queries. Demonstrates how a MapReduce job is launched to process SQL queries, showcasing the practical application of MapReduce in data processing.']}, {'end': 8545.648, 'segs': [{'end': 7349.212, 'src': 'heatmap', 'start': 7202.393, 'weight': 2, 'content': [{'end': 7206.676, 'text': 'So in case of the bigger table you can just avoid to have this full table join.', 'start': 7202.393, 'duration': 4.283}, {'end': 7209.779, 'text': 'In our case we are not putting it as strict, so you can run any kind of SQL here.', 'start': 7206.697, 'duration': 3.082}, {'end': 7217.485, 'text': 'Once you do all this setting what you can do is you can insert the data into this table.', 'start': 7212.601, 'duration': 4.884}, {'end': 7224.839, 'text': 'from the table which you have created, the transaction records, the table which you have created from there.', 'start': 7219.218, 'duration': 5.621}, {'end': 7228.42, 'text': 'you can now insert the data inside this table because, like now,', 'start': 7224.839, 'duration': 3.581}, {'end': 7233.182, 'text': 'you are reading the data from one table and inserting the data in transaction record by category table.', 'start': 7228.42, 'duration': 4.762}, {'end': 7246.145, 'text': 'so, by using the simple sql stuff, you will be able to insert the data inside this table.', 'start': 7233.182, 'duration': 12.963}, {'end': 7259.379, 'text': 'okay, So what you do here now is you now insert the data in this table.', 'start': 7246.145, 'duration': 13.234}, {'end': 7266.121, 'text': 'So you tell, like read the data from the transaction record table and then insert into transaction by category,', 'start': 7259.919, 'duration': 6.202}, {'end': 7268.962, 'text': 'where we have done the partitioning based on the category.', 'start': 7266.121, 'duration': 2.841}, {'end': 7272.623, 'text': 'And we are inserting the data in these columns.', 'start': 7270.282, 'duration': 2.341}, {'end': 7274.583, 'text': 'Hit
enter.', 'start': 7274.003, 'duration': 0.58}, {'end': 7278.464, 'text': 'It will start dumping the data inside this table.', 'start': 7275.783, 'duration': 2.681}, {'end': 7290.613, 'text': 'The transaction by category table is already a partitioned table,', 'start': 7286.786, 'duration': 3.827}, {'end': 7298.987, 'text': 'so it will keep rearranging the data in such a way that your data will be stored inside different buckets as well as the partition.', 'start': 7290.613, 'duration': 8.374}, {'end': 7336.925, 'text': 'cool. so a lot of stuff happened here.', 'start': 7334.962, 'duration': 1.963}, {'end': 7339.269, 'text': "so let's see what we are interested in.", 'start': 7336.925, 'duration': 2.344}, {'end': 7344.889, 'text': "let's see what happened over there, right.", 'start': 7343.288, 'duration': 1.601}, {'end': 7349.212, 'text': 'so now you see here, earlier we had this transaction by category table.', 'start': 7344.889, 'duration': 4.323}], 'summary': 'Optimized sql queries allow for efficient data insertion and storage into a partitioned table.', 'duration': 58.599, 'max_score': 7202.393, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/tKNGB5IZPFE/pics/tKNGB5IZPFE7202393.jpg'}, {'end': 7544.73, 'src': 'embed', 'start': 7510.122, 'weight': 3, 'content': [{'end': 7514.024, 'text': 'So let me show you how to perform the join.', 'start': 7510.122, 'duration': 3.902}, {'end': 7517.165, 'text': 'So let me show you the join.', 'start': 7515.744, 'duration': 1.421}, {'end': 7519.945, 'text': 'So I have to get the data set for that.', 'start': 7517.365, 'duration': 2.58}, {'end': 7524.707, 'text': 'Let me see here.', 'start': 7520.285, 'duration': 4.422}, {'end': 7535.12, 'text': 'Okay, so suppose if we have a data set something like this.', 'start': 7527.493, 'duration': 7.627}, {'end': 7539.585, 'text': "So I'm just going to demo you how the join works in Hive.", 'start': 7535.641, 'duration': 3.944}, {'end': 7544.73, 'text': "So suppose if you have a data set that's basically the employee data set and that's what you have here.", 'start': 7540.326, 'duration': 4.404}], 'summary': 'Demonstration of performing a join in Hive using the employee dataset.', 'duration': 34.608, 'max_score': 7510.122, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/tKNGB5IZPFE/pics/tKNGB5IZPFE7510122.jpg'}, {'end': 8207.141, 'src': 'embed', 'start': 8182.733, 'weight': 1, 'content': [{'end': 8191.736, 'text': 'you can do all those group by, partitioning, clustering, join, like select, aggregate function, sum and average, all those kind of stuff.', 'start': 8182.733, 'duration': 9.003}, {'end': 8196.898, 'text': 'you can run on the hive and it will be able to give you the output.', 'start': 8191.736, 'duration': 5.162}, {'end': 8207.141, 'text': "okay, So now let's go back to the slide and see how we are progressing on the slide, and then we'll come back.", 'start': 8196.898, 'duration': 10.243}], 'summary': 'Hive allows group by, partitioning, clustering, joins, select, and aggregate functions for data processing.', 'duration': 24.408, 'max_score': 8182.733, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/tKNGB5IZPFE/pics/tKNGB5IZPFE8182733.jpg'}, {'end': 8271.734, 'src': 'embed', 'start': 8240.141, 'weight': 4, 'content': [{'end': 8245.545, 'text': 'and based on that it will start exposing this particular data which is in this location inside this table.', 'start': 8240.141, 'duration': 5.404}, {'end': 8256.884, 'text':
'so the good thing with the external table, guys, is that hive will not delete the data or the HDFS file when the table is dropped.', 'start': 8250.339, 'duration': 6.545}, {'end': 8261.066, 'text': 'so this will leave the data untouched and only the metadata about the table will be deleted.', 'start': 8256.884, 'duration': 4.182}, {'end': 8262.628, 'text': "so that's the big thing.", 'start': 8261.066, 'duration': 1.562}, {'end': 8271.734, 'text': "with the external table, you basically save the file, or you don't give the ownership of the file to hive, so it can only read the data.", 'start': 8262.628, 'duration': 9.106}], 'summary': 'External tables in hive do not delete data files when dropped, preserving ownership and data access.', 'duration': 31.593, 'max_score': 8240.141, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/tKNGB5IZPFE/pics/tKNGB5IZPFE8240141.jpg'}, {'end': 8469.103, 'src': 'embed', 'start': 8438.771, 'weight': 0, 'content': [{'end': 8442.236, 'text': 'right, you should have a fair understanding about the whole Hadoop environment,', 'start': 8438.771, 'duration': 3.465}, {'end': 8449.305, 'text': 'where you have seen the multiple tools and you have seen the uses of each of these tools, and based on this,', 'start': 8442.236, 'duration': 7.069}, {'end': 8452.769, 'text': 'you should be able to now take a call like how to design the whole.', 'start': 8449.305, 'duration': 3.464}, {'end': 8460.639, 'text': 'you know which particular tool you should select to do this kind of data processing stuff inside Hive.', 'start': 8452.769, 'duration': 7.87}, {'end': 8469.103, 'text': 'Are you guys getting all this confidence, guys, now that you know your knowledge based on, about the Hadoop different tools just now,', 'start': 8461.92, 'duration': 7.183}], 'summary': 'Fair understanding of hadoop tools for data processing inside hive', 'duration': 30.332, 'max_score': 8438.771, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/tKNGB5IZPFE/pics/tKNGB5IZPFE8438771.jpg'}], 'start': 7161.026, 'title': 'Sql partition modes and hadoop data management', 'summary': "Delves into sql partition modes, highlighting the impact of non-strict mode on sql performance and the process of data insertion, storage, table creation, and join operations in hadoop, showcasing high-performance joins and comprehensive understanding of hive's capabilities and sql-like interface.", 'chapters': [{'end': 7201.032, 'start': 7161.026, 'title': 'Partition modes in sql', 'summary': 'Explains the concept of partition modes in sql, highlighting the impact of non-strict mode on sql performance and the option to restrict sql queries based on partitioned columns, emphasizing the importance of using the correct column for querying to optimize sql performance.', 'duration': 40.006, 'highlights': ['The non-strict partition mode in SQL impacts SQL performance by allowing queries to be run using columns not designated for partitioning, leading to decreased performance.', 'The option to restrict SQL queries based on partitioned columns can be used to optimize SQL performance by ensuring that queries are run using the correct column for carrying the data.']}, {'end': 7544.73, 'start': 7202.393, 'title': 'Data insertion and storage in hadoop', 'summary': 'Explains the process of inserting data into a partitioned table in hadoop, showcasing the creation of folders and buckets for data storage and retrieval, as well as demonstrating high-performance joins.', 'duration': 342.337,
'highlights': ['When inserting data into a partitioned table, folders and buckets are created for data storage, with each category having a dedicated folder and multiple buckets, facilitating efficient data organization and retrieval.', 'The transcript demonstrates the process of high-performing joins in Hadoop, showcasing an example of joining employee datasets to illustrate how joins work in the Hadoop environment.', 'The chapter explains the concepts of partitioning and bucketing in Hadoop, emphasizing how data is organized and stored within different buckets and partitions for efficient data management and retrieval.']}, {'end': 7890.133, 'start': 7546.377, 'title': 'Creating and loading tables in hadoop', 'summary': 'Discusses creating and loading tables in hadoop, including creating tables for employee details and email addresses, loading data into the tables, and granting access permissions to users.', 'duration': 343.756, 'highlights': ['The chapter explains the process of creating a table called employee with columns for name, salary, and city, and loading data into it.', 'The chapter details the creation of a table for email addresses and the process of loading data into it.', 'The chapter discusses granting access permissions to users for accessing the created tables, including the use of the hadoop fs -chmod command to change permissions.']}, {'end': 8207.141, 'start': 7892.695, 'title': 'Hive join operations', 'summary': "Illustrates hive equijoin, inner join, and left outer join operations, demonstrating the capability to perform various join types and sql-like functionalities in hive, thus providing a comprehensive understanding of hive's capabilities and its sql-like interface.", 'duration': 314.446, 'highlights': ['Hive supports Equijoin, inner join, and left outer join operations, providing flexibility in performing different types of join operations.', 'The demonstration includes selecting specific columns from joined tables and explaining the inner join and left outer join functionalities.', 'Hive enables SQL-like functionalities such as group by, partitioning, clustering, and aggregate functions, demonstrating its comprehensive capabilities.
']}, {'end': 8545.648, 'start': 8207.402, 'title': 'Hive sql and table operations', 'summary': 'Covers the use of external tables, including the benefits of maintaining the table data in hdfs, various sql operations like select, insert, and join, and the ability to use hive for data encryption and analysis, providing a comprehensive understanding of hive and hadoop ecosystem tools.', 'duration': 338.246, 'highlights': ['The concept of external tables in Hive allows for data to be maintained in HDFS, preventing deletion of table data when the table is dropped, leaving only the metadata about the table to be deleted.', 'Various SQL operations such as select count, star, aggregation, grouping, and insert overwrite table are available in Hive for data manipulation and analysis.', 'Hive can be used for data encryption and analysis, enabling the storage and analysis of data in an encrypted format.', 'The chapter provides a comprehensive understanding of Hive and Hadoop ecosystem tools, empowering users to make informed decisions on tool selection for data processing tasks.']}], 'duration': 1384.622, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/tKNGB5IZPFE/pics/tKNGB5IZPFE7161026.jpg', 'highlights': ['The chapter provides a comprehensive understanding of Hive and Hadoop ecosystem tools, empowering users to make informed decisions on tool selection for data processing tasks.', 'Hive enables SQL-like functionalities such as group by, partitioning, clustering, and aggregate functions, demonstrating its comprehensive capabilities.', 'When inserting data into a partitioned table, folders and buckets are created for data storage, with each category having a dedicated folder and multiple buckets, facilitating efficient data organization and retrieval.', 'The transcript demonstrates the process of high-performing joins in Hadoop, showcasing an example of joining employee datasets to illustrate how joins work in the Hadoop environment.', 'The concept of external tables in Hive allows for data to be maintained in HDFS, preventing deletion of table data when the table is dropped, leaving only the metadata about the table to be deleted.']}], 'highlights': ['Hive and Pig support user-defined functions (UDFs) and integrate MapReduce scripts.', 'Hive and Pig support CDLIS and DCLIS.', 'Hive and Pig enable direct data access and storage within the Hadoop file system.', 'Hive supports storing data in text, sequence, and RC file formats, providing flexibility for backend data storage and selective column storage in the RC file format.', 'Hive enables data processing functions such as filtering, joining, and partitioning data, allowing for
efficient data creation and consumption.', 'Hive offers a web interface for dataset and table visualization, unlike Pig.', 'Hive supports JDBC and ODBC connectivity, whereas Pig only writes data to an output file, necessitating separate data movement and integration.', 'Hive supports various data types such as Boolean, int, small int, big int, string, float, and decimal.', 'Hive also supports complex data types including struct, map, and array, allowing for flexible storage of data.', 'Hive does not support selective update and insert on a dataset, leading to limitations in performing updates and transactions.', "The process of creating a table 'recharge' inside the 'telecom' database in Hive is detailed, including defining the schema and specifying the delimiter as comma.", 'The concept of managed tables involves creating a table where the ownership of the data goes with Hive, while the external table creates a reference for the data set without directly loading the data into the table.', 'Distinguishing between external, internal, and managed tables in Hive, emphasizing the differences in data ownership and location references.', 'Hive internally processes data using MapReduce and automatically starts processing jobs when executing queries.', "Hive's ability to scale better than normal DBMS with large data volumes Hive can scale much better than normal DBMS when dealing with large volumes of data, despite having some overhead in starting MapReduce and getting the count.", 'Arranging data in folders for each month can optimize SQL queries by reducing input data size.', 'Bucketing can be used to cluster data into multiple buckets for faster query execution.', 'Understanding how users will consume the table is crucial for effective data partitioning and bucketing.', 'Using hash values to group similar data into the same folder structure improves data arrangement and efficiency.', 'Data partitioning divides the table into different parts based on the value of the partition key, such as date, leading to faster query response and efficient data processing in Hive.', "The instructor demonstrates the creation of a database 'myretail' and emphasizes the use of proper SQL command syntax, ensuring the command termination with a semicolon for successful execution.", 'Processing 50,000 records using a MapReduce job The SQL interface in Hive allows for the processing of large datasets, as demonstrated by the ease of counting 50,000 records from a table.', 'The chapter provides a comprehensive understanding of Hive and Hadoop ecosystem tools, empowering users to make informed decisions on tool selection for data processing tasks.', 'Hive enables SQL-like functionalities such as group by, partitioning, clustering, and aggregate functions, demonstrating its comprehensive capabilities.', 'Creating folders and buckets for data storage When inserting data into a partitioned table, folders and buckets are created for data storage, with each category having a dedicated folder and multiple buckets, facilitating efficient data organization and retrieval.', 'Demonstration of high-performance joins in Hadoop The transcript demonstrates the process of high-performing joins in Hadoop, showcasing an example of joining employee datasets to illustrate how joins work in the Hadoop environment.', 'The concept of external tables in Hive allows for data to be maintained in HDFS, preventing deletion of table data when the table is dropped, leaving only the metadata about the table to be deleted.']}
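
The transcript above only speaks the Hive commands; the short sketches that follow restate the main ones as plain HiveQL so they can be tried in a Hive shell. All database, table, column, and path names in these sketches are illustrative assumptions, not the exact identifiers typed in the video. First, creating a database and a managed table and loading a local comma-delimited file into it, as in the telecom/recharge walkthrough:

CREATE DATABASE telecom;
DESCRIBE DATABASE EXTENDED telecom;   -- shows owner, HDFS location and other creation details
USE telecom;

-- a managed table: Hive owns the files under /user/hive/warehouse/telecom.db/recharge
CREATE TABLE recharge (
  phone_no STRING,
  amount   INT,
  city     STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- copy a local comma-delimited file into the table, then verify
LOAD DATA LOCAL INPATH '/home/edureka/recharge.txt' INTO TABLE recharge;
SELECT * FROM recharge;
DESCRIBE EXTENDED recharge;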
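
For the external-table discussion (pointing Hive at data an upstream MapReduce job has already written to HDFS), a minimal sketch looks like this; the HDFS path and column names are assumptions:

-- from the shell: stage the file in HDFS first (hypothetical path)
--   hadoop fs -mkdir -p /user/edureka/recharge_data
--   hadoop fs -put recharge.txt /user/edureka/recharge_data/

-- the table only records metadata plus a location; no LOAD statement is needed
CREATE EXTERNAL TABLE recharge_external (
  phone_no STRING,
  amount   INT,
  city     STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/edureka/recharge_data';

SELECT * FROM recharge_external;   -- reads the files in place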
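
The managed-versus-external ownership point in the transcript comes down to what DROP TABLE removes; continuing the two sketch tables above:

DROP TABLE IF EXISTS recharge;            -- managed: metadata and the warehouse files are both removed
DROP TABLE IF EXISTS recharge_external;   -- external: only the metadata goes; the HDFS files stay put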
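
The partitioning idea (one folder per month so a query only scans the slice it needs) could be sketched as below; the table layout and the month format are assumptions:

-- Hive creates one sub-directory per partition value, e.g. .../txn_month=2019-01/
CREATE TABLE recharge_by_month (
  phone_no STRING,
  amount   INT
)
PARTITIONED BY (txn_month STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- filtering on the partition column prunes the scan to that one folder
SELECT SUM(amount) FROM recharge_by_month WHERE txn_month = '2019-01';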
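
Bucketing, as described (hash each transaction ID into a fixed number of files), might look like this; the bucket count of 10 follows the transcript, the column names are assumed, and hive.enforce.bucketing is only needed on older Hive releases:

SET hive.enforce.bucketing = true;   -- pre-Hive-2.x: make inserts honour the declared bucket count

-- rows land in one of 10 files according to hash(txn_id) % 10
CREATE TABLE txn_bucketed (
  txn_id  INT,
  cust_no INT,
  amount  DOUBLE
)
CLUSTERED BY (txn_id) INTO 10 BUCKETS;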
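
The myretail hands-on section lists the transaction columns (transaction ID, date, customer number, amount, category, product, city, state); a sketch of that flow, with assumed identifiers and file path:

CREATE DATABASE myretail;   -- note the terminating semicolon the instructor insists on
USE myretail;

CREATE TABLE txnrecords (
  txn_id   INT,
  txn_date STRING,
  cust_no  INT,
  amount   DOUBLE,
  category STRING,
  product  STRING,
  city     STRING,
  state    STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

LOAD DATA LOCAL INPATH '/home/edureka/txns.csv' INTO TABLE txnrecords;

SELECT COUNT(*) FROM txnrecords;   -- runs as a MapReduce job; about 50,000 rows in the demo
SELECT * FROM txnrecords LIMIT 7;  -- sample a huge table instead of dumping it all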
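
The dynamic-partitioning demo sets a couple of session properties and then lets an INSERT ... SELECT route each row to its category partition and bucket; the standard property names are hive.exec.dynamic.partition.mode and hive.enforce.bucketing, and the table layout below is an assumption:

SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;   -- non-strict: all partitions may be decided at load time
SET hive.enforce.bucketing = true;

CREATE TABLE txn_by_category (
  txn_id   INT,
  txn_date STRING,
  cust_no  INT,
  amount   DOUBLE,
  product  STRING,
  city     STRING,
  state    STRING
)
PARTITIONED BY (category STRING)
CLUSTERED BY (txn_id) INTO 10 BUCKETS;

-- the dynamic partition column must come last in the SELECT list
INSERT OVERWRITE TABLE txn_by_category PARTITION (category)
SELECT txn_id, txn_date, cust_no, amount, product, city, state, category
FROM txnrecords;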
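
The join demo uses an employee table and an email table; a sketch of the inner and left outer joins shown, with assumed column names:

CREATE TABLE emp   (name STRING, salary DOUBLE, city STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
CREATE TABLE email (name STRING, email_id STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- inner (equi) join: only names present in both tables
SELECT e.name, e.salary, m.email_id
FROM emp e JOIN email m ON (e.name = m.name);

-- left outer join: every employee, with NULL email where there is no match
SELECT e.name, e.salary, m.email_id
FROM emp e LEFT OUTER JOIN email m ON (e.name = m.name);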
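
Finally, the group-by and aggregate functions mentioned near the end map onto ordinary HiveQL; for example, against the assumed txnrecords table:

SELECT category, COUNT(*) AS txns, SUM(amount) AS total_amount
FROM txnrecords
GROUP BY category;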