title
Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Hadoop | Simplilearn
description
🔥Post Graduate Program In Data Engineering: https://www.simplilearn.com/pgp-data-engineering-certification-training-course?utm_campaign=Hadoop-rr17cbPGWGA&utm_medium=DescriptionFirstFold&utm_source=youtube
🔥Big Data Engineer Masters Program (Discount Code - YTBE15): https://www.simplilearn.com/big-data-engineer-masters-program?utm_campaign=Hadoop-rr17cbPGWGA&utm_medium=DescriptionFirstFold&utm_source=youtube
This Simplilearn Hive tutorial video covers Hive architecture and Apache Hive in general. You will learn what Hive in Hadoop is, the data flow in Hive, Hive vs. RDBMS, Hive features, and more. Finally, you will see a hands-on demo of HiveQL commands; a short HiveQL sketch of those commands appears at the end of this description. So, let's get started with this Hive Tutorial For Beginners!
The following topics are explained in this Hive tutorial:
1. History of Hive 00:00
2. What is Hive? 01:57
3. Architecture of Hive 02:23
4. Data flow in Hive 05:33
5. Hive data modeling 07:07
6. Hive data types 08:45
7. Different modes of Hive 11:47
8. Difference between Hive and RDBMS 13:05
9. Features of Hive 16:28
10. Demo on HiveQL 18:04
To learn more about Hadoop, subscribe to our YouTube channel: https://www.youtube.com/user/Simplilearn?sub_confirmation=1
To access slides, click here: https://www.slideshare.net/Simplilearn/hive-tutorial-hive-architecture-hive-tutorial-for-beginners-hive-in-hadoop-simplilearn
Watch more videos on Hadoop training: https://www.youtube.com/watch?v=CKLzDWMsQGM&list=PLEiEAq2VkUUJqp1k-g5W1mo37urJQOdCZ
#HiveTutorial #HadoopHive #Hadoop #HBaseArchitecture #HadoopTutorialForBeginners #LearnHadoop #HadoopTraining #HadoopCertification #SimplilearnHadoop #Simplilearn
🔥 Enroll for FREE Big Data Hadoop Spark Course & Get your Completion Certificate: https://www.simplilearn.com/learn-hadoop-spark-basics-skillup?utm_campaign=Hadoop-rr17cbPGWGA&utm_medium=Description&utm_source=youtube
➡️ About Post Graduate Program In Data Engineering
This Data Engineering course is ideal for professionals and covers critical topics like the Hadoop framework, data processing using Spark, data pipelines with Kafka, and big data on AWS and Azure cloud infrastructures. The program is delivered via live sessions, industry projects, IBM hackathons, and Ask Me Anything sessions.
✅
Key Features
- Post Graduate Program Certificate and Alumni Association membership
- Exclusive Master Classes and Ask Me Anything sessions by IBM
- 8X higher live interaction in online Data Engineering classes led by industry experts
- Capstone projects from 3 domains and 14+ projects with industry datasets from YouTube, Glassdoor, Facebook, etc.
- Simplilearn's JobAssist helps you get noticed by top hiring companies
✅
Skills Covered
- Real-Time Data Processing
- Data Pipelining
- Big Data Analytics
- Data Visualization
- Provisioning data storage services
- Apache Hadoop
- Ingesting Streaming and Batch Data
- Transforming Data
- Implementing Security Requirements
- Data Protection
- Encryption Techniques
- Data Governance and Compliance Controls
👉 Learn More At: https://www.simplilearn.com/pgp-data-engineering-certification-training-course?utm_campaign=Hadoop-rr17cbPGWGA&utm_medium=Description&utm_source=youtube
🔥🔥 Interested in Attending Live Classes? Call Us: IN - 18002127688 / US - +18445327688
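For quick reference, here is a minimal HiveQL sketch of the kind of commands the demo walks through. The database name, column names, and file path are illustrative assumptions, not taken from the video; the video only describes an employee.csv file with an integer, string, string, integer, integer schema and comma-separated fields.

-- Create a database and a table matching a comma-separated CSV file
CREATE DATABASE IF NOT EXISTS office;
USE office;
CREATE TABLE employee (
  emp_id     INT,     -- hypothetical column names for the
  first_name STRING,  -- int, string, string, int, int schema
  last_name  STRING,
  salary     INT,
  age        INT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Load a local CSV into the table, then query it; a plain SELECT returns
-- directly, while COUNT runs as a full MapReduce job
LOAD DATA LOCAL INPATH '/home/cloudera/employee.csv' INTO TABLE employee;
SELECT * FROM employee LIMIT 5;
SELECT COUNT(*) FROM employee;

In the Hive shell, each statement executes when it ends with a semicolon, so a multi-line CREATE TABLE like the one above can be pasted in as a single block, as the demo shows.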
detail
{'title': 'Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Hadoop | Simplilearn', 'heatmap': [{'end': 190.793, 'start': 134.451, 'weight': 1}, {'end': 354.926, 'start': 245.136, 'weight': 0.724}, {'end': 436.706, 'start': 379.435, 'weight': 0.759}], 'summary': "This hive tutorial covers the history, architecture, and features of hive, including data flow, data modeling, data types, and differences from rdbms, with a focus on facebook's use of hadoop, as well as hiveql for querying and analyzing large data sets. it also delves into data modeling basics, data types, complex data types, hive advantages, hiveql features, setting up hive query and hadoop, managing data in linux, and sql, etl, and hadoop hive processes.", 'chapters': [{'end': 415.528, 'segs': [{'end': 71.023, 'src': 'embed', 'start': 28.588, 'weight': 0, 'content': [{'end': 31.369, 'text': 'And difference between Hive and RDBMS.', 'start': 28.588, 'duration': 2.781}, {'end': 39.097, 'text': "Finally, we're going to look into the features of Hive and do a quick hands-on demo on Hive in the Cloudera Hadoop file system.", 'start': 31.869, 'duration': 7.228}, {'end': 41.68, 'text': "Let's dive in with a brief history of Hive.", 'start': 39.217, 'duration': 2.463}, {'end': 44.583, 'text': 'So the history of Hive begins with Facebook.', 'start': 42.06, 'duration': 2.523}, {'end': 49.188, 'text': 'Facebook began using Hadoop as a solution to handle the growing big data.', 'start': 44.783, 'duration': 4.405}, {'end': 53.45, 'text': "And we're not talking about a data that fits on one or two or even five computers.", 'start': 49.308, 'duration': 4.142}, {'end': 60.655, 'text': "We're talking data that fits on, if you've looked at any of our other Hadoop tutorials, you'll know we're talking about very big data and data pools.", 'start': 53.911, 'duration': 6.744}, {'end': 63.517, 'text': 'And Facebook certainly has a lot of data it tracks.', 'start': 60.795, 'duration': 2.722}, {'end': 68.301, 'text': 'As we know, the Hadoop uses MapReduce for processing data.', 'start': 63.997, 'duration': 4.304}, {'end': 71.023, 'text': 'MapReduce required users to write long codes.', 'start': 68.481, 'duration': 2.542}], 'summary': 'Hive, developed by facebook for big data, uses hadoop and cloudera for demo. it handles large data pools efficiently.', 'duration': 42.435, 'max_score': 28.588, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rr17cbPGWGA/pics/rr17cbPGWGA28588.jpg'}, {'end': 126.225, 'src': 'embed', 'start': 104.57, 'weight': 2, 'content': [{'end': 113.014, 'text': 'You have your processing, you have your analyzing, and so the solution was required a language similar to SQL, which was well known to all the users.', 'start': 104.57, 'duration': 8.444}, {'end': 117.077, 'text': 'and thus the Hive or HQL language evolved.', 'start': 113.414, 'duration': 3.663}, {'end': 126.225, 'text': 'What is Hive? 
Hive is a data warehouse system which is used for querying and analyzing large data sets stored in the HDFS or the Hadoop file system.', 'start': 117.437, 'duration': 8.788}], 'summary': 'Hive evolved as a sql-like language for querying and analyzing large datasets in hdfs.', 'duration': 21.655, 'max_score': 104.57, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rr17cbPGWGA/pics/rr17cbPGWGA104570.jpg'}, {'end': 190.793, 'src': 'heatmap', 'start': 134.451, 'weight': 1, 'content': [{'end': 143.138, 'text': 'the user sends out their Hive queries and then that is converted into a MapReduce tasks and then accesses the Hadoop MapReduce system.', 'start': 134.451, 'duration': 8.687}, {'end': 145.259, 'text': "Let's take a look at the architecture of Hive.", 'start': 143.278, 'duration': 1.981}, {'end': 146.56, 'text': 'Architecture of Hive.', 'start': 145.56, 'duration': 1}, {'end': 150.203, 'text': 'We have the Hive client, so that could be the programmer,', 'start': 146.681, 'duration': 3.522}, {'end': 155.227, 'text': "or maybe it's the manager who knows enough SQL to do a basic query to look up the data they need.", 'start': 150.203, 'duration': 5.024}, {'end': 161.702, 'text': 'The Hive client supports different types of client applications in different languages for performing queries.', 'start': 155.66, 'duration': 6.042}, {'end': 165.284, 'text': 'And so we have our Thrift application and the Hive Thrift client.', 'start': 162.082, 'duration': 3.202}, {'end': 167.484, 'text': 'Thrift is a software framework.', 'start': 165.724, 'duration': 1.76}, {'end': 174.107, 'text': 'Hive server is based on Thrift, so it can serve the request from all programming languages that support Thrift.', 'start': 167.825, 'duration': 6.282}, {'end': 179.149, 'text': 'And then we have our JDBC application and the Hive JDBC driver.', 'start': 174.447, 'duration': 4.702}, {'end': 186.032, 'text': 'JDBC, Java Database Connectivity, JDBC application is connected through the JDBC driver.', 'start': 179.309, 'duration': 6.723}, {'end': 190.793, 'text': 'And then you have the ODBC application, or the Hive ODBC driver.', 'start': 186.212, 'duration': 4.581}], 'summary': 'Hive uses thrift and jdbc drivers for various client applications and supports different programming languages.', 'duration': 56.342, 'max_score': 134.451, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rr17cbPGWGA/pics/rr17cbPGWGA134451.jpg'}, {'end': 240.914, 'src': 'embed', 'start': 216.624, 'weight': 3, 'content': [{'end': 224.106, 'text': 'so you have your hive server, basically your thrift application or your hive thrift client, or your jdbc or your hive jdbc driver,', 'start': 216.624, 'duration': 7.482}, {'end': 227.128, 'text': 'your odbc application or your hive odbc driver.', 'start': 224.106, 'duration': 3.022}, {'end': 231.269, 'text': 'they all connect into the hive server and you have your hive web interface.', 'start': 227.128, 'duration': 4.141}, {'end': 233.07, 'text': 'you also have your cli.', 'start': 231.269, 'duration': 1.801}, {'end': 238.333, 'text': 'Now the Hive web interface is a GUI is provided to execute Hive queries.', 'start': 233.51, 'duration': 4.823}, {'end': 240.914, 'text': "And we'll actually be using that later on today.", 'start': 238.773, 'duration': 2.141}], 'summary': 'Hive server connects to various clients and provides web interface for executing queries.', 'duration': 24.29, 'max_score': 216.624, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rr17cbPGWGA/pics/rr17cbPGWGA216624.jpg'}, {'end': 354.926, 'src': 'heatmap', 'start': 245.136, 'weight': 0.724, 'content': [{'end': 247.818, 'text': 'Commands are executed directly in CLI.', 'start': 245.136, 'duration': 2.682}, {'end': 250.799, 'text': 'And then the CLI is a direct terminal window.', 'start': 248.178, 'duration': 2.621}, {'end': 252.9, 'text': "And I'll also show you that too.", 'start': 251.3, 'duration': 1.6}, {'end': 255.382, 'text': 'So you can see how those two different interfaces work.', 'start': 252.92, 'duration': 2.462}, {'end': 258.123, 'text': 'These then push the code into the Hive driver.', 'start': 255.742, 'duration': 2.381}, {'end': 261.345, 'text': 'Hive driver is responsible for all the queries submitted.', 'start': 258.363, 'duration': 2.982}, {'end': 263.066, 'text': 'So everything goes through that driver.', 'start': 261.584, 'duration': 1.482}, {'end': 265.107, 'text': "Let's take a closer look at the HiveDriver.", 'start': 263.266, 'duration': 1.841}, {'end': 268.127, 'text': 'The HiveDriver now performs three steps internally.', 'start': 265.247, 'duration': 2.88}, {'end': 269.688, 'text': 'One is a compiler.', 'start': 268.547, 'duration': 1.141}, {'end': 273.509, 'text': 'HiveDriver passes query to compiler where it is checked and analyzed.', 'start': 270.028, 'duration': 3.481}, {'end': 281.211, 'text': 'Then the optimizer kicks in and the optimized logical plan in the form of a graph of MapReduce and HDFS tasks is obtained.', 'start': 273.829, 'duration': 7.382}, {'end': 285.812, 'text': 'And then finally in the executor, in the final step, the tasks are executed.', 'start': 281.511, 'duration': 4.301}, {'end': 289.095, 'text': 'When we look at the architecture, we also have to note the Metastore.', 'start': 286.152, 'duration': 2.943}, {'end': 292.097, 'text': 'Metastore is a repository for Hive metadata.', 'start': 289.295, 'duration': 2.802}, {'end': 294.039, 'text': 'It stores metadata for Hive tables.', 'start': 292.117, 'duration': 1.922}, {'end': 297.262, 'text': 'And you can think of this as your schema and where is it located.', 'start': 294.359, 'duration': 2.903}, {'end': 299.644, 'text': "And it's stored on the Apache Derby DB.", 'start': 297.502, 'duration': 2.142}, {'end': 304.628, 'text': 'Processing and resource management is all handled by the MapReduce v1.', 'start': 299.944, 'duration': 4.684}, {'end': 308.011, 'text': "You'll see MapReduce v2, the YARN, and the TEZ.", 'start': 304.768, 'duration': 3.243}, {'end': 312.755, 'text': "These are all different ways of managing these resources, depending on what version of Hadoop you're in.", 'start': 308.391, 'duration': 4.364}, {'end': 315.537, 'text': 'Hive uses MapReduce framework to process queries.', 'start': 312.995, 'duration': 2.542}, {'end': 319.099, 'text': 'And then we have our distributed storage, which is the HDFS.', 'start': 315.957, 'duration': 3.142}, {'end': 326.244, 'text': "And if you looked at our Hadoop tutorials, you'll know that these are on commodity machines and are linearly scalable.", 'start': 319.539, 'duration': 6.705}, {'end': 327.865, 'text': "That means they're very affordable.", 'start': 326.464, 'duration': 1.401}, {'end': 333.449, 'text': "A lot of time when you're talking about big data, you're talking about a tenth of the price of storing it on enterprise computers.", 'start': 328.085, 'duration': 5.364}, {'end': 335.891, 'text': 'And then we look at the data flow in Hive.', 
'start': 333.909, 'duration': 1.982}, {'end': 339.874, 'text': 'So in our data flow in Hive, we have our Hive and the Hadoop system.', 'start': 336.571, 'duration': 3.303}, {'end': 346.859, 'text': 'And underneath the user interface, or the UI, we have our driver, our compiler, our execution engine, and our metastore.', 'start': 340.134, 'duration': 6.725}, {'end': 350.402, 'text': 'That all goes into the MapReduce and the Hadoop file system.', 'start': 347.119, 'duration': 3.283}, {'end': 354.926, 'text': 'So when we execute a query, you can see it coming in here, it goes into the driver, step one.', 'start': 350.662, 'duration': 4.264}], 'summary': "Hive's architecture includes cli, hive driver, metastore, mapreduce v1, hdfs, and data flow components, with query processing and resource management handled by mapreduce framework and hdfs on commodity machines.", 'duration': 109.79, 'max_score': 245.136, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rr17cbPGWGA/pics/rr17cbPGWGA245136.jpg'}, {'end': 285.812, 'src': 'embed', 'start': 258.363, 'weight': 4, 'content': [{'end': 261.345, 'text': 'Hive driver is responsible for all the queries submitted.', 'start': 258.363, 'duration': 2.982}, {'end': 263.066, 'text': 'So everything goes through that driver.', 'start': 261.584, 'duration': 1.482}, {'end': 265.107, 'text': "Let's take a closer look at the HiveDriver.", 'start': 263.266, 'duration': 1.841}, {'end': 268.127, 'text': 'The HiveDriver now performs three steps internally.', 'start': 265.247, 'duration': 2.88}, {'end': 269.688, 'text': 'One is a compiler.', 'start': 268.547, 'duration': 1.141}, {'end': 273.509, 'text': 'HiveDriver passes query to compiler where it is checked and analyzed.', 'start': 270.028, 'duration': 3.481}, {'end': 281.211, 'text': 'Then the optimizer kicks in and the optimized logical plan in the form of a graph of MapReduce and HDFS tasks is obtained.', 'start': 273.829, 'duration': 7.382}, {'end': 285.812, 'text': 'And then finally in the executor, in the final step, the tasks are executed.', 'start': 281.511, 'duration': 4.301}], 'summary': 'Hivedriver performs 3 internal steps: compiler, optimizer, and executor.', 'duration': 27.449, 'max_score': 258.363, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rr17cbPGWGA/pics/rr17cbPGWGA258363.jpg'}], 'start': 3.237, 'title': 'Hive and hiveql', 'summary': "Covers the history, architecture, and features of hive, including data flow, data modeling, data types, and differences from rdbms, with a focus on facebook's use of hadoop for handling big data, and introduces hive and hiveql as a data warehouse system for querying and analyzing large data sets using hiveql similar to sql.", 'chapters': [{'end': 71.023, 'start': 3.237, 'title': 'Hive tutorial overview', 'summary': "Covers the history, architecture, data flow, data modeling, data types, different modes, differences from rdbms, and features of hive, with a focus on the history of hive starting with facebook's use of hadoop for handling big data.", 'duration': 67.786, 'highlights': ["The history of Hive begins with Facebook's use of Hadoop to handle growing big data, which required MapReduce for processing data.", "The tutorial covers the history, architecture, data flow, data modeling, data types, different modes, differences from RDBMS, and features of Hive, with a focus on the history of Hive starting with Facebook's use of Hadoop for handling big data."]}, {'end': 415.528, 'start': 71.504, 'title': 
'Introduction to hive and hiveql', 'summary': 'Introduces hive and hiveql as a data warehouse system for querying and analyzing large data sets, using hiveql, similar to sql, to process queries, with an architecture encompassing the hive client, server, driver, metastore, mapreduce framework, and hdfs.', 'duration': 344.024, 'highlights': ['Hive is a data warehouse system used for querying and analyzing large data sets stored in the HDFS or the Hadoop file system, employing HiveQL, similar to SQL, to process queries. Hive is a data warehouse system used for querying and analyzing large data sets stored in the HDFS or the Hadoop file system. It employs HiveQL, similar to SQL, to process queries.', 'The Hive architecture includes the Hive client, server, driver, Metastore, MapReduce framework, and HDFS, with the Hive web interface providing a GUI for executing queries and the CLI for direct terminal window commands. The Hive architecture encompasses the Hive client, server, driver, Metastore, MapReduce framework, and HDFS. The Hive web interface provides a GUI for executing queries, and the CLI offers direct terminal window commands.', 'The Hive driver is responsible for processing queries, internally performing steps involving compiler, optimizer, and executor, while the Metastore serves as a repository for Hive metadata, storing metadata for Hive tables. The Hive driver is responsible for processing queries, internally performing steps involving compiler, optimizer, and executor. The Metastore serves as a repository for Hive metadata, storing metadata for Hive tables.']}], 'duration': 412.291, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rr17cbPGWGA/pics/rr17cbPGWGA3237.jpg', 'highlights': ["The history of Hive begins with Facebook's use of Hadoop to handle growing big data, which required MapReduce for processing data.", "The tutorial covers the history, architecture, data flow, data modeling, data types, different modes, differences from RDBMS, and features of Hive, with a focus on the history of Hive starting with Facebook's use of Hadoop for handling big data.", 'Hive is a data warehouse system used for querying and analyzing large data sets stored in the HDFS or the Hadoop file system, employing HiveQL, similar to SQL, to process queries.', 'The Hive architecture includes the Hive client, server, driver, Metastore, MapReduce framework, and HDFS. The Hive web interface provides a GUI for executing queries, and the CLI offers direct terminal window commands.', 'The Hive driver is responsible for processing queries, internally performing steps involving compiler, optimizer, and executor. 
The Metastore serves as a repository for Hive metadata, storing metadata for Hive tables.']}, {'end': 633.979, 'segs': [{'end': 453.114, 'src': 'embed', 'start': 415.848, 'weight': 0, 'content': [{'end': 418.309, 'text': "So again, we're talking about the schema of your database.", 'start': 415.848, 'duration': 2.461}, {'end': 423.825, 'text': 'And once we have that, we have a bi-directional send results communication back into the driver.', 'start': 418.717, 'duration': 5.108}, {'end': 426.87, 'text': 'And then we have the fetch results, which goes back to the client.', 'start': 424.065, 'duration': 2.805}, {'end': 430.664, 'text': "So let's take a little bit look at the Hive data modeling.", 'start': 427.182, 'duration': 3.482}, {'end': 432.024, 'text': 'Hive data modeling.', 'start': 430.924, 'duration': 1.1}, {'end': 433.685, 'text': 'So you have your Hive data modeling.', 'start': 432.064, 'duration': 1.621}, {'end': 436.706, 'text': 'You have your tables, you have your partitions, and you have buckets.', 'start': 433.705, 'duration': 3.001}, {'end': 441.589, 'text': 'The tables in Hive are created the same way it is done in RDBMS.', 'start': 437.127, 'duration': 4.462}, {'end': 445.691, 'text': "So when you're looking at your traditional SQL Server or MySQL Server,", 'start': 441.869, 'duration': 3.822}, {'end': 453.114, 'text': 'where you might have enterprise equipment and a lot of people pulling and moving stuff off of there, the tables are going to look very similar.', 'start': 445.691, 'duration': 7.423}], 'summary': 'Discussion on hive data modeling, including tables, partitions, and buckets, with comparison to rdbms.', 'duration': 37.266, 'max_score': 415.848, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rr17cbPGWGA/pics/rr17cbPGWGA415848.jpg'}, {'end': 528.876, 'src': 'embed', 'start': 493.21, 'weight': 3, 'content': [{'end': 497.155, 'text': "because that might be one of the key things that you're always looking up as far as employees are concerned.", 'start': 493.21, 'duration': 3.945}, {'end': 499.136, 'text': 'And finally, we have buckets.', 'start': 497.495, 'duration': 1.641}, {'end': 503.778, 'text': 'Data present in partitions can be further divided into buckets for efficient querying.', 'start': 499.236, 'duration': 4.542}, {'end': 505.338, 'text': "Again, there's that efficiency.", 'start': 504.078, 'duration': 1.26}, {'end': 506.879, 'text': 'At this level.', 'start': 505.879, 'duration': 1}, {'end': 516.623, 'text': "a lot of times you're working with the programmer and the admin of your Hadoop file system to maximize the efficiency of that file system.", 'start': 506.879, 'duration': 9.744}, {'end': 520.664, 'text': "So it's usually a two-person job, and we're talking about high data modeling.", 'start': 516.823, 'duration': 3.841}, {'end': 525.431, 'text': "You want to make sure that they work together and you're maximizing your resources.", 'start': 520.845, 'duration': 4.586}, {'end': 526.673, 'text': 'Hive data types.', 'start': 525.792, 'duration': 0.881}, {'end': 528.876, 'text': "So we're talking about hive data types.", 'start': 526.993, 'duration': 1.883}], 'summary': 'Efficiently manage hive data types to maximize hadoop file system efficiency.', 'duration': 35.666, 'max_score': 493.21, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rr17cbPGWGA/pics/rr17cbPGWGA493210.jpg'}, {'end': 593.979, 'src': 'embed', 'start': 569.056, 'weight': 5, 'content': [{'end': 574.682, 'text': 'So you can store 
arrays, you can store maps, you can store structures, and even units in there.', 'start': 569.056, 'duration': 5.626}, {'end': 580.608, 'text': 'As we dig into Hive data types and we have the primitive data types and the complex data types.', 'start': 574.862, 'duration': 5.746}, {'end': 583.951, 'text': "so we look at primitive data types and we're looking at numeric data types.", 'start': 580.608, 'duration': 3.343}, {'end': 587.514, 'text': 'Data types like an integer float, a decimal.', 'start': 584.211, 'duration': 3.303}, {'end': 591.877, 'text': 'those are all stored as numbers in the hive data system a string data type.', 'start': 587.514, 'duration': 4.363}, {'end': 593.979, 'text': 'data types like characters and strings.', 'start': 591.877, 'duration': 2.102}], 'summary': 'Hive data types include arrays, maps, structures, and numeric data types like integers, floats, and decimals, as well as string data types.', 'duration': 24.923, 'max_score': 569.056, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rr17cbPGWGA/pics/rr17cbPGWGA569056.jpg'}], 'start': 415.848, 'title': 'Hive data modeling and types', 'summary': 'Covers hive data modeling basics, including table structure, partitions, and buckets, facilitating the import and storage of data. it also delves into data partitioning, hive data types, and their significance in efficient querying and resource optimization, highlighting the collaboration between programmers and hadoop file system admins. furthermore, it provides an overview of primitive and complex data types in hive, along with their storage and examples.', 'chapters': [{'end': 473.99, 'start': 415.848, 'title': 'Hive data modeling basics', 'summary': 'Discusses the basics of hive data modeling, including the structure of tables, partitions, and buckets, and how it simplifies importing and storing data from traditional sql databases into the hadoop hive system.', 'duration': 58.142, 'highlights': ['The tables in Hive are created similarly to RDBMS, making it easy to import and store data from traditional SQL databases into the Hive system.', 'Hive data modeling includes tables, partitions, and buckets, providing a structured approach to organizing and storing data within the Hive system.', 'The schema of the database facilitates bi-directional communication between the driver and fetch results, enhancing data retrieval and processing efficiency.']}, {'end': 528.876, 'start': 474.09, 'title': 'Data partitioning and hive data types', 'summary': 'Explains the importance of partitioning tables for efficient querying and maximizing resources, and also discusses the concept of buckets within partitions for further efficiency, emphasizing the collaboration between programmers and hadoop file system admins. 
it also touches upon the significance of hive data types.', 'duration': 54.786, 'highlights': ['The importance of partitioning tables for efficient querying and maximizing resources, emphasizing the collaboration between programmers and Hadoop file system admins.', 'The concept of buckets within partitions for further efficiency, stressing the collaboration between programmers and Hadoop file system admins.', 'The significance of Hive data types.']}, {'end': 633.979, 'start': 529.096, 'title': 'Hive data types overview', 'summary': 'Explains the primitive and complex data types in hive, including numerical, string, date time, and miscellaneous data types, and highlights the storage and examples of each type.', 'duration': 104.883, 'highlights': ['Hive data types include primitive data types such as numerical, string, date time, and miscellaneous data types, as well as complex data types like arrays, maps, structures, and units.', 'Numeric data types in Hive encompass floats, integers, and short integers, all stored as numbers in the hive data system.', 'String data types in Hive store characters and strings, such as names, cities, and states, and are stored as string characters.', 'Date time data types in Hive, including timestamp and day interval, are commonly used for tracking sales and measuring time intervals in tasks.', 'Miscellaneous data types in Hive encompass boolean and binary data.']}], 'duration': 218.131, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rr17cbPGWGA/pics/rr17cbPGWGA415848.jpg', 'highlights': ['The tables in Hive are created similarly to RDBMS, making it easy to import and store data from traditional SQL databases into the Hive system.', 'The schema of the database facilitates bi-directional communication between the driver and fetch results, enhancing data retrieval and processing efficiency.', 'Hive data modeling includes tables, partitions, and buckets, providing a structured approach to organizing and storing data within the Hive system.', 'The importance of partitioning tables for efficient querying and maximizing resources, emphasizing the collaboration between programmers and Hadoop file system admins.', 'The concept of buckets within partitions for further efficiency, stressing the collaboration between programmers and Hadoop file system admins.', 'Hive data types include primitive data types such as numerical, string, date time, and miscellaneous data types, as well as complex data types like arrays, maps, structures, and units.']}, {'end': 975.428, 'segs': [{'end': 713.839, 'src': 'embed', 'start': 655.438, 'weight': 0, 'content': [{'end': 657.6, 'text': 'This is a collection of key-value pairs.', 'start': 655.438, 'duration': 2.162}, {'end': 661.883, 'text': 'So understanding maps is so central to Hadoop.', 'start': 658.4, 'duration': 3.483}, {'end': 665.086, 'text': 'So when we store maps, you have a key, which is a set.', 'start': 662.343, 'duration': 2.743}, {'end': 667.648, 'text': 'You can only have one key per mapped value.', 'start': 665.106, 'duration': 2.542}, {'end': 674.493, 'text': 'And so in Hadoop, of course, you collect the same keys, and you can add them all up or do something with all the contents of the same key.', 'start': 667.928, 'duration': 6.565}, {'end': 679.857, 'text': 'But this is our map as a primitive type data type in our collection of key-value pairs.', 'start': 674.593, 'duration': 5.264}, {'end': 682.98, 'text': 'And then collection of complex data with comment.', 'start': 680.238, 
'duration': 2.742}, {'end': 688.643, 'text': 'So we can have a structure where you have a column name, data type, comment, column comment.', 'start': 683.32, 'duration': 5.323}, {'end': 693.946, 'text': 'So you can get very complicated structures in here with your collection of data and your commented setup.', 'start': 689.003, 'duration': 4.943}, {'end': 698.049, 'text': 'And then we have units, and this is a collection of heterogeneous data types.', 'start': 694.246, 'duration': 3.803}, {'end': 702.931, 'text': 'So the syntax for this is union type, data type, data type, and so on.', 'start': 698.729, 'duration': 4.202}, {'end': 704.172, 'text': "So it's all going to be the same.", 'start': 702.971, 'duration': 1.201}, {'end': 707.294, 'text': 'A little bit different than the arrays where you can actually mix and match.', 'start': 704.192, 'duration': 3.102}, {'end': 709.055, 'text': 'Different modes of Hive.', 'start': 707.594, 'duration': 1.461}, {'end': 713.839, 'text': 'Hive operates in two modes depending on the number and size of data nodes.', 'start': 709.456, 'duration': 4.383}], 'summary': 'Understanding maps is central to hadoop for collecting and processing key-value pairs, as well as dealing with complex data structures and heterogeneous data types.', 'duration': 58.401, 'max_score': 655.438, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rr17cbPGWGA/pics/rr17cbPGWGA655438.jpg'}, {'end': 816.704, 'src': 'embed', 'start': 771.588, 'weight': 3, 'content': [{'end': 778.473, 'text': "And this you can think of instead of it being one, two, three, or even five computers, we're usually talking with the Hadoop file system.", 'start': 771.588, 'duration': 6.885}, {'end': 785.059, 'text': '10 computers, 15, 100, where this data is spread across all those different Hadoop nodes.', 'start': 779.293, 'duration': 5.766}, {'end': 788.582, 'text': 'Difference between Hive and RDBMS.', 'start': 785.379, 'duration': 3.203}, {'end': 793.666, 'text': 'Remember, RDBMS stands for the Relational Database Management System.', 'start': 789.042, 'duration': 4.624}, {'end': 796.989, 'text': "Let's take a look at the difference between Hive and the RDBMS.", 'start': 793.746, 'duration': 3.243}, {'end': 800.793, 'text': 'With Hive, Hive enforces schema on read.', 'start': 797.23, 'duration': 3.563}, {'end': 807.497, 'text': "it's very important that whatever is coming in, that's when hives looking at it and making sure that it fits the model.", 'start': 801.193, 'duration': 6.304}, {'end': 812.921, 'text': 'the RDBMS enforces a schema when it actually writes the data into the database.', 'start': 807.497, 'duration': 5.424}, {'end': 816.704, 'text': "so it's read the data and then, once it starts to write it, that's where it's going to give you the error.", 'start': 812.921, 'duration': 3.783}], 'summary': 'Hadoop system can handle data across 10, 15, 100 nodes, unlike rdbms. 
hive enforces schema on read, while rdbms enforces it on write.', 'duration': 45.116, 'max_score': 771.588, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rr17cbPGWGA/pics/rr17cbPGWGA771588.jpg'}, {'end': 886.594, 'src': 'embed', 'start': 856.375, 'weight': 4, 'content': [{'end': 858.556, 'text': 'With Hive, you can take it as big as you want.', 'start': 856.375, 'duration': 2.181}, {'end': 862.918, 'text': 'Hive is based on the notion of write once and read many times.', 'start': 858.936, 'duration': 3.982}, {'end': 864.579, 'text': 'This is so important.', 'start': 863.359, 'duration': 1.22}, {'end': 870.422, 'text': 'They call it WORM, which is write, W, once, O, read, R, many times, M.', 'start': 865.039, 'duration': 5.383}, {'end': 873.984, 'text': "They refer to it as WORM, and that's true of a lot of your Hadoop setup.", 'start': 870.422, 'duration': 3.562}, {'end': 879.607, 'text': "It's altered a little bit, but in general, we're looking at archiving data that you want to do data analysis on.", 'start': 874.244, 'duration': 5.363}, {'end': 886.594, 'text': "We're looking at pulling all that stuff off your RDBMS from years and years and years of business or whatever your company does,", 'start': 879.847, 'duration': 6.747}], 'summary': 'Hive allows scalable data archiving for analysis, based on worm principle.', 'duration': 30.219, 'max_score': 856.375, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rr17cbPGWGA/pics/rr17cbPGWGA856375.jpg'}, {'end': 988.795, 'src': 'embed', 'start': 958.976, 'weight': 6, 'content': [{'end': 961.838, 'text': 'The RDBMS is not scalable at a low cost.', 'start': 958.976, 'duration': 2.862}, {'end': 966.742, 'text': "When you first start on the lower end, you're talking about $10,000 per terabyte of data,", 'start': 962.058, 'duration': 4.684}, {'end': 971.626, 'text': 'including all the backup on the models and all the added necessities to support it.', 'start': 966.742, 'duration': 4.884}, {'end': 975.428, 'text': 'As you scale it up, you have to scale those computers and hardware up.', 'start': 971.886, 'duration': 3.542}, {'end': 984.933, 'text': 'So you might start off with a basic server and then you upgrade to a Sun computer to run it and you spend tens of thousands of dollars for that hardware upgrade.', 'start': 976.069, 'duration': 8.864}, {'end': 988.795, 'text': 'With Hive, you just put another computer into your Hadoop file system.', 'start': 985.193, 'duration': 3.602}], 'summary': 'Rdbms not scalable, costs $10,000/tb, hive scalable with lower cost hardware.', 'duration': 29.819, 'max_score': 958.976, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rr17cbPGWGA/pics/rr17cbPGWGA958976.jpg'}], 'start': 634.199, 'title': 'Understanding data types and comparing hive vs rdbms data modes', 'summary': 'Delves into complex data types in hadoop including arrays, maps, structures, and unions, emphasizing the significance of maps for key-value pair storage. it also compares hive and rdbms data handling, focusing on local mode, map reduce mode, data sizes, scalability, and schema enforcement on read vs.
write.', 'chapters': [{'end': 713.839, 'start': 634.199, 'title': 'Understanding data types in hadoop', 'summary': 'Discusses complex data types such as arrays, maps, structures, and unions in hadoop, emphasizing the importance of maps in hadoop for key-value pair storage and operations.', 'duration': 79.64, 'highlights': ['Maps are central to Hadoop for key-value pair storage and operations, allowing only one key per mapped value. Understanding the significance of maps in Hadoop for key-value pair storage and operations, with a restriction of one key per mapped value.', 'Discussion of arrays as a collection of same entities and units as a collection of heterogeneous data types. Explanation of arrays as a collection of same entities and units as a collection of heterogeneous data types with corresponding syntax.', 'Explanation of structures as a collection of complex data with a column name, data type, and comment. Detailed explanation of structures as a collection of complex data with a column name, data type, and comment, enabling the creation of intricate data structures.']}, {'end': 975.428, 'start': 714.16, 'title': 'Hive vs rdbms data modes', 'summary': 'Discusses the local mode and map reduce mode in hadoop, highlighting the differences between hive and rdbms data handling, with emphasis on data sizes and scalability, and the enforcement of schema on read vs. write.', 'duration': 261.268, 'highlights': ['Hive enforces schema on read, unlike RDBMS which enforces schema on write, and supports data sizes in petabytes compared to terabytes in RDBMS. Hive enforces schema on read, supports petabyte data sizes, while RDBMS enforces schema on write and supports terabyte data sizes.', 'Hive is based on write once and read many times (WORM) concept, allowing archiving of historical data for analysis, while RDBMS is based on read and write many times, for continually updating and changing data. Hive is based on write once and read many times (WORM), allowing archiving of historical data, while RDBMS is based on read and write many times for continually updating data.', 'Hive is a data warehouse supporting SQL, not a traditional database, while RDBMS is a type of database management system based on the relational model of data. Hive is a data warehouse supporting SQL, not a traditional database, while RDBMS is based on the relational model of data.', 'Hive is easily scalable at a low cost, approximately $1,000 per terabyte, while RDBMS is not scalable at a low cost, starting at $10,000 per terabyte and requiring hardware scaling.
Hive is easily scalable at a low cost, approximately $1,000 per terabyte, while RDBMS is not scalable at a low cost, starting at $10,000 per terabyte and requiring hardware scaling.']}], 'duration': 341.229, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rr17cbPGWGA/pics/rr17cbPGWGA634199.jpg', 'highlights': ['Understanding the significance of maps in Hadoop for key-value pair storage and operations, with a restriction of one key per mapped value.', 'Explanation of arrays as a collection of same entities and units as a collection of heterogeneous data types with corresponding syntax.', 'Detailed explanation of structures as a collection of complex data with a column name, data type, and comment, enabling the creation of intricate data structures.', 'Hive enforces schema on read, supports petabyte data sizes, while RDBMS enforces schema on write and supports terabyte data sizes.', 'Hive is based on write once and read many times (WORM), allowing archiving of historical data, while RDBMS is based on read and write many times for continually updating data.', 'Hive is a data warehouse supporting SQL, not a traditional database, while RDBMS is based on the relational model of data.', 'Hive is easily scalable at a low cost, approximately $1,000 per terabyte, while RDBMS is not scalable at a low cost, starting at $10,000 per terabyte and requiring hardware scaling.']}, {'end': 1175.275, 'segs': [{'end': 1073.753, 'src': 'embed', 'start': 1015.234, 'weight': 0, 'content': [{'end': 1020.438, 'text': "They actually now can easily do that if they're not skilled in programming and script writing.", 'start': 1015.234, 'duration': 5.204}, {'end': 1024.922, 'text': 'Tables are used, which are similar to the RDBMS, hence easier to understand.', 'start': 1020.538, 'duration': 4.384}, {'end': 1030.986, 'text': "And one of the things I like about this is when I'm bringing tables in from a MySQL server or SQL server,", 'start': 1025.281, 'duration': 5.705}, {'end': 1032.967, 'text': "there's almost a direct reflection between the two.", 'start': 1030.986, 'duration': 1.981}, {'end': 1038.231, 'text': "So when you're looking at one which is a data which is continually changing, and then you're going into the archive database,", 'start': 1033.107, 'duration': 5.124}, {'end': 1041.733, 'text': "it's not this huge jump where you have to learn a whole new language.", 'start': 1038.231, 'duration': 3.502}, {'end': 1050.619, 'text': 'mirror that same schema into the HDFS, into the hive, making it very easy to go between the two, and then, using hive QL,', 'start': 1042.834, 'duration': 7.785}, {'end': 1054.201, 'text': 'multiple users can simultaneously query data.', 'start': 1050.619, 'duration': 3.582}, {'end': 1057.202, 'text': 'so again you have multiple clients in there and they send in their query.', 'start': 1054.201, 'duration': 3.001}, {'end': 1063.566, 'text': "that's also true with the RDBMS, which kind of queues them up because it's running so fast, you don't notice the lag time.", 'start': 1057.202, 'duration': 6.364}, {'end': 1065.507, 'text': 'Well, you get that also with the HQL.', 'start': 1063.586, 'duration': 1.921}, {'end': 1067.188, 'text': 'As you add more computers in.', 'start': 1065.808, 'duration': 1.38}, {'end': 1073.753, 'text': 'the query can go very quickly, depending on how many computers and how much resources each machine has to pull the information.', 'start': 1067.188, 'duration': 6.565}], 'summary': 'Tables in hdfs mirror rdbms, making it easier to
switch between the two with almost direct reflection and allowing for simultaneous queries by multiple users.', 'duration': 58.519, 'max_score': 1015.234, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rr17cbPGWGA/pics/rr17cbPGWGA1015234.jpg'}, {'end': 1139.27, 'src': 'embed', 'start': 1102.787, 'weight': 6, 'content': [{'end': 1105.648, 'text': 'This is the main software or the main site for the build.', 'start': 1102.787, 'duration': 2.861}, {'end': 1110.331, 'text': "And if you go in here, you'll see that they're slowly migrating Hive into Beehive.", 'start': 1105.908, 'duration': 4.423}, {'end': 1114.994, 'text': "And so if you see Beehive versus Hive, note that Beehive is a new release that's coming out.", 'start': 1110.511, 'duration': 4.483}, {'end': 1115.915, 'text': "That's all it is.", 'start': 1115.354, 'duration': 0.561}, {'end': 1118.776, 'text': 'It reflects a lot of the same functionality of Hive.', 'start': 1116.055, 'duration': 2.721}, {'end': 1119.697, 'text': "It's the same thing.", 'start': 1118.856, 'duration': 0.841}, {'end': 1123.78, 'text': 'And then we like to pull up some kind of documentation on commands.', 'start': 1119.897, 'duration': 3.883}, {'end': 1128.563, 'text': "And for this, I'm actually going to go to Hortonworks Hive cheat sheet.", 'start': 1124.02, 'duration': 4.543}, {'end': 1139.27, 'text': "And that's because Hortonworks and Cloudera are two of the most common used builds for Hadoop and for which include Hive and all the different tools in there.", 'start': 1128.883, 'duration': 10.387}], 'summary': 'Transition from hive to beehive, a new release with similar functionality. referring to hortonworks hive cheat sheet for commands.', 'duration': 36.483, 'max_score': 1102.787, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rr17cbPGWGA/pics/rr17cbPGWGA1102787.jpg'}, {'end': 1175.275, 'src': 'embed', 'start': 1145.015, 'weight': 4, 'content': [{'end': 1148.257, 'text': "But we'll go ahead and just look at the Horton one because it's the one that comes up really good.", 'start': 1145.015, 'duration': 3.242}, {'end': 1154.622, 'text': 'And you can see when we look at the query language, it compares MySQL Server to HiveQL or HQL.', 'start': 1148.277, 'duration': 6.345}, {'end': 1156.183, 'text': 'And you can see the basic select.', 'start': 1154.702, 'duration': 1.481}, {'end': 1160.466, 'text': 'We select from columns, from table where conditions exist.', 'start': 1156.283, 'duration': 4.183}, {'end': 1162.588, 'text': 'You know, most basic command on there.', 'start': 1160.626, 'duration': 1.962}, {'end': 1166.77, 'text': 'And they have different things you can do with it, just like you do with your SQL.', 'start': 1162.788, 'duration': 3.982}, {'end': 1169.672, 'text': "And if you scroll down, you'll see data types.", 'start': 1166.911, 'duration': 2.761}, {'end': 1175.275, 'text': "So here's your integer, your float, your binary, double, string, timestamp, and all the different data types you can use.", 'start': 1169.872, 'duration': 5.403}], 'summary': 'Comparison of mysql server to hiveql, covering basic select commands and available data types.', 'duration': 30.26, 'max_score': 1145.015, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rr17cbPGWGA/pics/rr17cbPGWGA1145015.jpg'}], 'start': 976.069, 'title': 'Hive advantages and hiveql features', 'summary': 'Discusses the benefits of using hive, including easy integration with existing databases, use of sql-like
language hiveql, and efficient handling of multiple users and large data queries, as well as hive data types, compatibility with hadoop, migration into beehive, and availability of documentation on hive.apache.org. it also covers the use of hortonworks hive cheat sheet, which compares mysql server to hiveql and provides information on basic sql commands and data types available in hiveql.', 'chapters': [{'end': 1073.753, 'start': 976.069, 'title': 'Benefits of hive and hiveql', 'summary': 'Discusses the advantages of using hive, such as easy integration with existing databases, use of sql-like language hiveql, and the ability to handle multiple users and large data queries efficiently.', 'duration': 97.684, 'highlights': ['HiveQL allows easier querying for non-programmers, enabling shareholders to perform basic SQL queries without programmer intervention.', 'Tables in Hive are similar to RDBMS, making it easier for users to understand and work with the data.', 'Hive provides easy integration with existing databases, allowing a direct reflection of schema between MySQL/SQL server and HDFS, reducing the need to learn a new language.', 'HiveQL allows multiple users to simultaneously query data, with the capability to handle large data queries efficiently by adding more resources.', 'The addition of more computers in Hive can significantly enhance the speed of queries, depending on the resources allocated to each machine.']}, {'end': 1123.78, 'start': 1073.993, 'title': 'Hive data types and hiveql demo', 'summary': 'Explains hive data types, its compatibility with hadoop, the migration of hive into beehive, and the availability of documentation on hive.apache.org.', 'duration': 49.787, 'highlights': ['Hive is designed to be on the Hadoop system, supporting a variety of data types.', 'Beehive is a new release coming out, reflecting a lot of the same functionality of Hive.', "The main website for Hive is hive.apache.org and it's slowly migrating Hive into Beehive.", 'Documentation on commands for Hive can be found on hive.apache.org.']}, {'end': 1175.275, 'start': 1124.02, 'title': 'Hortonworks hive cheat sheet', 'summary': 'Discusses the use of hortonworks hive cheat sheet, which compares mysql server to hiveql, and provides information on basic sql commands and data types available in hiveql.', 'duration': 51.255, 'highlights': ['The Hortonworks Hive cheat sheet compares MySQL Server to HiveQL and provides information on basic SQL commands and data types available in HiveQL.', 'The cheat sheet includes basic select commands and different things that can be done with HiveQL, similar to SQL.', 'It also lists various data types such as integer, float, binary, double, string, timestamp, and others available in HiveQL.']}], 'duration': 199.206, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rr17cbPGWGA/pics/rr17cbPGWGA976069.jpg', 'highlights': ['HiveQL allows multiple users to simultaneously query data, with the capability to handle large data queries efficiently by adding more resources.', 'Tables in Hive are similar to RDBMS, making it easier for users to understand and work with the data.', 'Hive provides easy integration with existing databases, allowing a direct reflection of schema between MySQL/SQL server and HDFS, reducing the need to learn a new language.', 'The addition of more computers in Hive can significantly enhance the speed of queries, depending on the resources allocated to each machine.', 'The Hortonworks Hive cheat sheet compares MySQL Server to
HiveQL and provides information on basic SQL commands and data types available in HiveQL.', 'The cheat sheet includes basic select commands and different things that can be done with HiveQL, similar to SQL.', 'Beehive is a new release coming out, reflecting a lot of the same functionality of Hive.', 'Documentation on commands for Hive can be found on hive.apache.org.']}, {'end': 1702.809, 'segs': [{'end': 1255.943, 'src': 'embed', 'start': 1175.496, 'weight': 0, 'content': [{'end': 1185.862, 'text': 'Some different semantics, different keys, features, functions for running a Hive query, command line setup and of course the Hive shell setup in here.', 'start': 1175.496, 'duration': 10.366}, {'end': 1189.644, 'text': 'So you can see right here, if we loop through it, it has a lot of your basic stuff in it.', 'start': 1186.402, 'duration': 3.242}, {'end': 1193.566, 'text': "We're basically looking at SQL across a Horton database.", 'start': 1189.964, 'duration': 3.602}, {'end': 1200.929, 'text': "We're going to go ahead and run our Hadoop cluster hive demo and I'm going to go ahead and use the Cloudera quick start.", 'start': 1193.786, 'duration': 7.143}, {'end': 1202.89, 'text': 'This is in the virtual box.', 'start': 1201.21, 'duration': 1.68}, {'end': 1212.315, 'text': 'So again we have an Oracle virtual box which is open source and then we have our Cloudera quick start which is the Hadoop setup on a single node.', 'start': 1203.09, 'duration': 9.225}, {'end': 1217.079, 'text': 'Now, obviously Hadoop and Hive are designed to run across a cluster of computers.', 'start': 1212.535, 'duration': 4.544}, {'end': 1221.483, 'text': "So when we talk about a single node, it's for education, testing, that kind of thing.", 'start': 1217.099, 'duration': 4.384}, {'end': 1229.91, 'text': 'And if you have a chance, you can always go back and look at our demo we had on setting up a Hadoop system in a single cluster.', 'start': 1221.583, 'duration': 8.327}, {'end': 1236.416, 'text': "Just set a note down below in the YouTube video, and our team will get in contact with you and send you that link if you don't already have it.", 'start': 1230.05, 'duration': 6.366}, {'end': 1239.759, 'text': 'Or you can contact us at the www.simplylearn.com.', 'start': 1236.596, 'duration': 3.163}, {'end': 1240.759, 'text': 'Now in here.', 'start': 1240.259, 'duration': 0.5}, {'end': 1247.641, 'text': "it's always important to note that you do need on your computer if you're running on Windows because I'm on a Windows machine,", 'start': 1240.759, 'duration': 6.882}, {'end': 1250.661, 'text': "you're going to need probably about 12 gigabytes to actually run this.", 'start': 1247.641, 'duration': 3.02}, {'end': 1255.943, 'text': 'It used to be you could buy with a lot less, but as things have evolved, they take up more and more resources.', 'start': 1251.021, 'duration': 4.922}], 'summary': 'Demonstrates setting up hive in cloudera quickstart for hadoop on a single node virtual box, requiring 12 gigabytes on a windows machine.', 'duration': 80.447, 'max_score': 1175.496, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rr17cbPGWGA/pics/rr17cbPGWGA1175496.jpg'}, {'end': 1337.472, 'src': 'embed', 'start': 1311.004, 'weight': 4, 'content': [{'end': 1315.285, 'text': 'And a quick note, because I mentioned like the restrictions on getting set up on your own computer.', 'start': 1311.004, 'duration': 4.281}, {'end': 1319.406, 'text': "If you have a home edition computer and you're worried about setting 
it up on there,', 'start': 1315.425, 'duration': 3.981}, {'end': 1325.528, 'text': 'you can also go in there and spin up a one month free service on Amazon Web Service to play with this.', 'start': 1319.406, 'duration': 6.122}, {'end': 1327.069, 'text': "So there's other options.", 'start': 1325.548, 'duration': 1.521}, {'end': 1330.209, 'text': "You're not stuck with just doing it on the quick start menu.", 'start': 1327.149, 'duration': 3.06}, {'end': 1331.85, 'text': 'You can spin this up in many other ways.', 'start': 1330.37, 'duration': 1.48}, {'end': 1337.472, 'text': "Now, the first thing we want to note is that we've come in here into Cloudera, and I'm going to access this in two ways.", 'start': 1332.15, 'duration': 5.322}], 'summary': 'Options to set up on home edition computer or use one month free service on amazon web service. can access cloudera in multiple ways.', 'duration': 26.468, 'max_score': 1311.004, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rr17cbPGWGA/pics/rr17cbPGWGA1311004.jpg'}, {'end': 1386.361, 'src': 'embed', 'start': 1346.954, 'weight': 1, 'content': [{'end': 1356.117, 'text': "If I go in and use Hue as an editor into Hive or into the Hadoop setup, usually I'm doing it from an admin side.", 'start': 1346.954, 'duration': 9.163}, {'end': 1363.303, 'text': 'because it has a lot more information, a lot of visuals, less to do with, you know actually diving in there and just executing code,', 'start': 1356.337, 'duration': 6.966}, {'end': 1367.706, 'text': "and you can also write this code into files and scripts and there's other things you can, other ways.", 'start': 1363.303, 'duration': 4.403}, {'end': 1369.968, 'text': 'you can upload it into hive,', 'start': 1367.706, 'duration': 2.262}, {'end': 1377.594, 'text': "but today we're going to look at the command lines and we'll upload it into hue and then we'll go into and actually do our work in a terminal window under the hive shell.", 'start': 1369.968, 'duration': 7.626}, {'end': 1385.24, 'text': 'now in the hue browser window, if you go under query and click on the pull down menu and then you go under editor and you'll see hive, there we go.', 'start': 1377.594, 'duration': 7.646}, {'end': 1386.361, 'text': "there's our hive setup.", 'start': 1385.24, 'duration': 1.121}], 'summary': 'Using hue editor to access hive for admin tasks and executing code, with visual aids and command line usage.', 'duration': 39.407, 'max_score': 1346.954, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rr17cbPGWGA/pics/rr17cbPGWGA1346954.jpg'}, {'end': 1686.314, 'src': 'embed', 'start': 1662.481, 'weight': 3, 'content': [{'end': 1671.863, 'text': "And for those of you who have studied data transformation, that's the ETL, where you extract, transform, and then load the data.", 'start': 1662.481, 'duration': 9.382}, {'end': 1675.224, 'text': 'So you really want to extract and transform before putting it into the hive.', 'start': 1672.083, 'duration': 3.141}, {'end': 1677.704, 'text': 'Then you load it into the hive with the transformed data.', 'start': 1675.384, 'duration': 2.32}, {'end': 1679.726, 'text': 'And of course, we also want to note the schema.', 'start': 1677.924, 'duration': 1.802}, {'end': 1682.67, 'text': 'We have an integer, string, string, integer, integer.', 'start': 1679.826, 'duration': 2.844}, {'end': 1686.314, 'text': 'So we kept it pretty simple in here as far as the way the data is set up.', 'start': 1683.13, 'duration': 3.184}], 'summary':
'Etl involves extracting, transforming, and loading data into hive. data has integer and string fields.', 'duration': 23.833, 'max_score': 1662.481, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rr17cbPGWGA/pics/rr17cbPGWGA1662481.jpg'}], 'start': 1175.496, 'title': 'Setting up hive query and hadoop', 'summary': 'Covers setting up hive query, command line and shell setup, using cloudera quick start for hadoop on a single node, and computer resource requirement of 12 gigabytes for windows. it also explains the process of setting up cloudera on a virtual machine, including using hue and the hive shell, and highlights the option to use amazon web services for setup.', 'chapters': [{'end': 1255.943, 'start': 1175.496, 'title': 'Hive query and hadoop setup', 'summary': 'Covers setting up hive query, command line and shell setup, using cloudera quick start for hadoop on a single node, and computer resource requirement of 12 gigabytes for windows.', 'duration': 80.447, 'highlights': ['Setting up Hive query, command line and shell setup The chapter discusses different semantics, features, and functions for running a Hive query, command line setup, and Hive shell setup.', 'Using Cloudera quick start for Hadoop on a single node The speaker mentions using Cloudera quick start for setting up Hadoop on a single node, highlighting its use for education and testing purposes.', "Computer resource requirement of 12 gigabytes for Windows It is mentioned that a computer running on Windows, particularly the speaker's Windows machine, would require approximately 12 gigabytes of resources to run the setup, indicating the increasing resource demand as technology evolves."]}, {'end': 1702.809, 'start': 1256.223, 'title': 'Setting up cloudera on virtual machine', 'summary': 'Explains the process of setting up cloudera on a virtual machine, including using hue and the hive shell, and highlights the option to use amazon web services for setup.', 'duration': 446.586, 'highlights': ['The process of setting up Cloudera on a virtual machine using Linux CentOS and Cloudera Quick Start is explained. The virtual machine with Linux CentOS runs Cloudera Quick Start, which uses Thunderbird browser and opens multiple tabs by default.', 'Using Hue as an editor to access Hive and Hadoop setups is discussed, emphasizing its admin-side features and visual capabilities. Hue is used as an editor to access Hive and Hadoop setups, providing admin-side features, visuals, and the ability to write code into files and scripts.', "The process of accessing Hive through the Hue browser window and the command line interface is detailed. Accessing Hive through the Hue browser window and the command line interface is explained, including the execution of simple HQL commands like 'show databases' and 'create database.'", 'The steps for loading data into the Hive table from a CSV file and understanding the data format are explained. The process of loading data into the Hive table from a CSV file is detailed, including understanding the data format, handling comma-separated values, and identifying the schema.', 'The option to use Amazon Web Services for setting up Cloudera is highlighted as an alternative for users with home edition computers. 
'duration': 527.313, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rr17cbPGWGA/pics/rr17cbPGWGA1175496.jpg', 'highlights': ['Setting up Cloudera on a virtual machine using Linux CentOS and Cloudera Quick Start is explained.', 'Using Hue as an editor to access Hive and Hadoop setups is discussed, emphasizing its admin-side features and visual capabilities.', 'The process of accessing Hive through the Hue browser window and the command line interface is detailed.', 'The steps for loading data into the Hive table from a CSV file and understanding the data format are explained.', 'The option to use Amazon Web Services for setting up Cloudera is highlighted as an alternative for users with home edition computers.', 'Computer resource requirement of 12 gigabytes for Windows is mentioned, indicating the increasing resource demand as technology evolves.', 'Using Cloudera Quick Start for Hadoop on a single node is mentioned, highlighting its use for education and testing purposes.', 'Covers setting up a Hive query, command line and shell setup, discussing different semantics, features, and functions for running a Hive query, command line setup, and Hive shell setup.']}, {'end': 2015.313, 'segs': [{'end': 1753.806, 'src': 'embed', 'start': 1702.969, 'weight': 1, 'content': [{'end': 1706.791, 'text': 'So we can do a simple gedit employee.csv.', 'start': 1702.969, 'duration': 3.822}, {'end': 1708.311, 'text': "And you'll see it comes up here.", 'start': 1707.011, 'duration': 1.3}, {'end': 1712.272, 'text': "It's just a text document, so I can easily remove these added spaces.", 'start': 1708.651, 'duration': 3.621}, {'end': 1713.113, 'text': 'There we go.', 'start': 1712.592, 'duration': 0.521}, {'end': 1714.793, 'text': 'And then we go ahead and just save it.', 'start': 1713.473, 'duration': 1.32}, {'end': 1716.794, 'text': 'And so now it has the new setup in there.', 'start': 1715.053, 'duration': 1.741}, {'end': 1717.494, 'text': "We've edited it.", 'start': 1716.834, 'duration': 0.66}, {'end': 1721.575, 'text': 'gedit is usually one of the default editors that loads with Linux.', 'start': 1717.814, 'duration': 3.761}, {'end': 1723.456, 'text': 'So any text editor will do.', 'start': 1722.175, 'duration': 1.281}, {'end': 1724.756, 'text': 'Back to the Hive shell.', 'start': 1723.656, 'duration': 1.1}, {'end': 1728.117, 'text': "So let's go ahead and create a table employee.", 'start': 1724.896, 'duration': 3.221}, {'end': 1732.439, 'text': 'And what I want you to note here is that I did not put the semicolon on the end here.', 'start': 1728.197, 'duration': 4.242}, {'end': 1735.019, 'text': 'The semicolon tells it to execute that line.', 'start': 1732.659, 'duration': 2.36}, {'end': 1739.241, 'text': 'So this is kind of nice: you can actually just paste it in if you have it written on another sheet.', 'start': 1735.179, 'duration': 4.062}, {'end': 1746.443, 'text': 'And you can see right here where I have create table employee, and it goes onto the next line, so I can do all of my commands at once.', 'start': 1739.421, 'duration': 7.022}, {'end': 1750.865, 'text': "Now, just so I don't have any typo errors, I went ahead and just pasted the next three lines in.", 'start': 1746.883, 'duration': 3.982}, {'end': 1753.806, 'text': 'And the next one is our schema.', 'start': 1751.085, 'duration': 2.721}], 'summary': 'Using gedit to edit employee.csv and creating a table employee in the Hive shell.', 'duration': 50.837, 'max_score': 1702.969, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rr17cbPGWGA/pics/rr17cbPGWGA1702969.jpg'},
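A minimal sketch of that create-table step follows. The int/string/string/int/int types come straight from the schema the video describes, but the column names here are hypothetical stand-ins, and the delimited row format matches a comma-separated file. Note there is no semicolon until the very last line, so the whole pasted block executes as one statement.

    -- Hypothetical column names; types follow the demo's schema.
    CREATE TABLE employee (
      empid      INT,
      name       STRING,
      department STRING,
      age        INT,
      salary     INT
    )
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY ','
    STORED AS TEXTFILE;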
{'end': 1822.159, 'src': 'embed', 'start': 1795.895, 'weight': 3, 'content': [{'end': 1799.68, 'text': "It's kind of goofy that when you're uploading a single file you have to put all this in here.", 'start': 1795.895, 'duration': 3.785}, {'end': 1805.808, 'text': 'But keep in mind, Hive and Hadoop are designed for writing many files into the database.', 'start': 1799.8, 'duration': 6.008}, {'end': 1808.292, 'text': "You write them all in there, and then they're saved.", 'start': 1805.988, 'duration': 2.304}, {'end': 1808.932, 'text': "It's an archive.", 'start': 1808.332, 'duration': 0.6}, {'end': 1809.914, 'text': "It's a data warehouse.", 'start': 1808.972, 'duration': 0.942}, {'end': 1811.995, 'text': "And then you're able to do all your queries on them.", 'start': 1810.154, 'duration': 1.841}, {'end': 1815.596, 'text': "So a lot of times we're not looking at just the one file coming up.", 'start': 1812.175, 'duration': 3.421}, {'end': 1817.477, 'text': "We're loading hundreds of files.", 'start': 1815.656, 'duration': 1.821}, {'end': 1820.378, 'text': 'You have your reports coming off of your main database.', 'start': 1817.737, 'duration': 2.641}, {'end': 1822.159, 'text': 'All those reports are being loaded.', 'start': 1820.618, 'duration': 1.541}], 'summary': 'Hive and Hadoop are designed for writing many files into the database, enabling queries on an archive of hundreds of files for reporting.', 'duration': 26.264, 'max_score': 1795.895, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rr17cbPGWGA/pics/rr17cbPGWGA1795895.jpg'},
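Here is a sketch of the load step the narration walks through; the local path is a hypothetical example. LOCAL tells Hive to read from the Linux filesystem rather than HDFS, and giving the full path avoids ambiguity about where the file lives.

    -- Load the edited CSV into the table created above (path is illustrative).
    LOAD DATA LOCAL INPATH '/home/cloudera/employee.csv'
    INTO TABLE employee;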
{'end': 2001.26, 'src': 'embed', 'start': 1972.893, 'weight': 0, 'content': [{'end': 1975.615, 'text': "We'll just look at a couple of these different select options you can do.", 'start': 1972.893, 'duration': 2.722}, {'end': 1978.478, 'text': "We're going to count everything from employee.", 'start': 1975.716, 'duration': 2.762}, {'end': 1987.446, 'text': "Now this is kind of interesting, because the first one just pops up with a basic select, since it doesn't need to go through the full MapReduce phase.", 'start': 1978.678, 'duration': 8.768}, {'end': 1993.612, 'text': 'But when you start doing a count, it does go through the full MapReduce setup in Hive and Hadoop,', 'start': 1987.626, 'duration': 5.986}, {'end': 2001.26, 'text': "and because I'm doing this demo on a single-node Cloudera VirtualBox on top of a Windows 10,", 'start': 1993.872, 'duration': 7.388}], 'summary': 'Demonstrating select options and count in Hive and Hadoop, discussing the MapReduce setup.', 'duration': 28.367, 'max_score': 1972.893, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rr17cbPGWGA/pics/rr17cbPGWGA1972893.jpg'}], 'start': 1702.969, 'title': 'Editing and managing data in linux and hive', 'summary': 'Covers editing text files using gedit, creating tables in Hive, defining table schemas, uploading files into Hive, specifying formats and paths, managing multiple file operations, executing SQL and HQL commands, and emphasizing the efficiency of count operations in Hive and Hadoop.', 'chapters': [{'end': 1773.253, 'start': 1702.969, 'title': 'Editing and creating tables in linux', 'summary': 'Explains how to edit a text file using gedit in Linux, create a table in the Hive shell without using a semicolon, and define a schema for the table with specific data types.', 'duration': 70.284, 'highlights': ['Using gedit to edit a text document in Linux, removing added spaces and saving it.', 'Creating a table in the Hive shell without using a semicolon, allowing multiple commands to be executed at once.', 'Defining a schema for the table with specific data types such as integer and string.']}, {'end': 2015.313, 'start': 1773.473, 'title': 'Managing data in hive', 'summary': "Discusses the process of uploading a single file into Hive, highlighting the necessity of specifying the format, path, and destination for loading data, emphasizing Hive's design for managing multiple files and performing queries, and demonstrating basic SQL and HQL commands, with a focus on the efficiency of count operations in Hive and Hadoop.", 'duration': 241.84, 'highlights': ['The necessity of specifying the format, path, and destination for loading data into Hive, emphasizing its design for managing multiple files and performing queries.', 'Demonstrating basic SQL and HQL commands, including the use of select and count operations in Hive, with an emphasis on the efficiency of count operations in Hive and Hadoop.', 'Highlighting the importance of providing the full path for data loads to avoid potential issues and ensure accuracy in the loading process.']}], 'duration': 312.344, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rr17cbPGWGA/pics/rr17cbPGWGA1702969.jpg', 'highlights': ['Emphasizing the efficiency of count operations in Hive and Hadoop.', 'Defining a schema for the table with specific data types such as integer and string.', 'Demonstrating basic SQL and HQL commands, including the use of select and count operations in Hive.', 'The necessity of specifying the format, path, and destination for loading data into Hive, emphasizing its design for managing multiple files and performing queries.', 'Creating a table in the Hive shell without using a semicolon, allowing multiple commands to be executed at once.', 'Using gedit to edit a text document in Linux, removing added spaces and saving it.']}, {'end': 2718.23, 'segs': [{'end': 2081.469, 'src': 'embed', 'start': 2054.445, 'weight': 0, 'content': [{'end': 2057.967, 'text': 'because we have office set as the default on there.', 'start': 2054.445, 'duration': 3.522}, {'end': 2068.293, 'text': 'So from office.employee, and then the command where creates a subset; in this case we want to know where the salary is greater than 25,000.', 'start': 2057.967, 'duration': 10.326}, {'end': 2069.554, 'text': 'There we go.', 'start': 2068.293, 'duration': 1.261}, {'end': 2071.657, 'text': 'And of course, we end with our semicolon.', 'start': 2069.815, 'duration': 1.842}, {'end': 2076.342, 'text': "And if we run this query, you can see it pops up, and there are our salaries of people, top earners.", 'start': 2071.757, 'duration': 4.585}, {'end': 2079.167, 'text': 'We have Rose in IT and Mike in HR.', 'start': 2076.704, 'duration': 2.463}, {'end': 2080.248, 'text': 'Kudos to them.', 'start': 2079.647, 'duration': 0.601}, {'end': 2081.469, 'text': "Of course, they're fictional.", 'start': 2080.268, 'duration': 1.201}], 'summary': 'Query identifies top earners with salaries above $25,000: Rose in IT and Mike in HR.', 'duration': 27.024, 'max_score': 2054.445, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rr17cbPGWGA/pics/rr17cbPGWGA2054445.jpg'},
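To make those queries concrete, here is a minimal sketch; the table and column names follow the demo, and office.employee is the fully qualified name since office is the current database.

    SELECT * FROM employee;         -- plain select, returned without a MapReduce job
    SELECT COUNT(*) FROM employee;  -- aggregation, runs as a full MapReduce job

    -- The subset query from the demo: employees earning more than 25,000.
    SELECT * FROM office.employee WHERE salary > 25000;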
{'end': 2150.256, 'src': 'embed', 'start': 2120.931, 'weight': 5, 'content': [{'end': 2122.232, 'text': 'He wants employees,', 'start': 2120.931, 'duration': 1.301}, {'end': 2124.093, 'text': "plural. It's a big deal to him.", 'start': 2122.452, 'duration': 1.641}, {'end': 2126.616, 'text': "So let's go ahead and change that name for the table.", 'start': 2124.093, 'duration': 2.523}, {'end': 2129.398, 'text': "It's that easy, because it's just changing the metadata on there.", 'start': 2126.616, 'duration': 2.782}, {'end': 2137.405, 'text': "And now, if we do show tables, you'll see we now have employees, not employee. At this point maybe we're doing some house cleaning,", 'start': 2129.398, 'duration': 8.007}, {'end': 2138.967, 'text': 'because this is all practice,', 'start': 2137.405, 'duration': 1.562}, {'end': 2144.832, 'text': "so we're going to go ahead and drop table, and we'll drop table employees, because we changed the name in there.", 'start': 2138.967, 'duration': 5.865}, {'end': 2146.834, 'text': 'So if we did employee, it would just give us an error.', 'start': 2144.832, 'duration': 2.002}, {'end': 2150.256, 'text': "And now, if we do show tables, you'll see all the tables are gone.", 'start': 2147.094, 'duration': 3.162}], 'summary': "Changed the table name from 'employee' to 'employees' and dropped all tables.", 'duration': 29.325, 'max_score': 2120.931, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rr17cbPGWGA/pics/rr17cbPGWGA2120931.jpg'},
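A sketch of that rename-and-cleanup sequence in HiveQL; the rename only rewrites table metadata, which is why it is effectively instant.

    ALTER TABLE employee RENAME TO employees;
    SHOW TABLES;           -- now lists employees, not employee
    DROP TABLE employees;  -- dropping the old name employee would raise an error
    SHOW TABLES;           -- the practice table is gone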
"Here's our schema, the integer, name, age, address, salary.", 'start': 2284.341, 'duration': 4.101}], 'summary': 'Standard date format is important for data import in hive. etl process involves extract, transform, and load. joining different datasets is common and important.', 'duration': 61.049, 'max_score': 2227.393, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rr17cbPGWGA/pics/rr17cbPGWGA2227393.jpg'}, {'end': 2478.999, 'src': 'embed', 'start': 2434.16, 'weight': 4, 'content': [{'end': 2435.321, 'text': 'We have four orders.', 'start': 2434.16, 'duration': 1.161}, {'end': 2441.584, 'text': 'And as it processes, we should get a return of four different names joined together.', 'start': 2435.801, 'duration': 5.783}, {'end': 2444.325, 'text': "And they're joined based on, of course, the orders on there.", 'start': 2441.704, 'duration': 2.621}, {'end': 2450.015, 'text': "And once we're done, we now have the order number, the person who made the order,", 'start': 2444.89, 'duration': 5.125}, {'end': 2453.558, 'text': 'their age and the amount of the order which came from the order table.', 'start': 2450.015, 'duration': 3.543}, {'end': 2455.519, 'text': 'So you have your different information.', 'start': 2453.578, 'duration': 1.941}, {'end': 2457.021, 'text': 'And you can see how the join works here.', 'start': 2455.539, 'duration': 1.482}, {'end': 2461.344, 'text': 'Very common use of tables and HQL and SQL.', 'start': 2457.241, 'duration': 4.103}, {'end': 2464.287, 'text': "And let's do one more thing with our database.", 'start': 2461.464, 'duration': 2.823}, {'end': 2467.309, 'text': "And then I'll show you a couple other Hive commands.", 'start': 2464.627, 'duration': 2.682}, {'end': 2469.191, 'text': "And let's go ahead and do a drop.", 'start': 2467.55, 'duration': 1.641}, {'end': 2471.913, 'text': "And we're going to drop database office.", 'start': 2469.511, 'duration': 2.402}, {'end': 2477.437, 'text': "And if you're looking at this and you remember from earlier, this will give me an error.", 'start': 2472.694, 'duration': 4.743}, {'end': 2478.999, 'text': "And let's just see what that looks like.", 'start': 2477.778, 'duration': 1.221}], 'summary': 'Processing four orders to join names, ages, and amounts, demonstrating table joins and hive commands.', 'duration': 44.839, 'max_score': 2434.16, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rr17cbPGWGA/pics/rr17cbPGWGA2434160.jpg'}, {'end': 2607.511, 'src': 'embed', 'start': 2563.434, 'weight': 1, 'content': [{'end': 2573.844, 'text': 'So we took a look at the history of Hive and how it evolved from the Hadoop file system to an HQL similar to SQL layer on top of Hadoop,', 'start': 2563.434, 'duration': 10.41}, {'end': 2579.37, 'text': 'with the full meta store and all that information connected to the Hadoop file system.', 'start': 2573.844, 'duration': 5.526}, {'end': 2586.837, 'text': 'That way you can easily scale it up and still have underneath the Hadoop setup while still having the SQL query language available.', 'start': 2579.49, 'duration': 7.347}, {'end': 2593.942, 'text': 'We looked at what is Hive, and we looked at the Hive queries going in through the MapReduce into the Hadoop MapReduce system.', 'start': 2586.977, 'duration': 6.965}, {'end': 2599.326, 'text': 'We dug a little deeper to look at the architecture of Hive and all the different pieces and how they fit together,', 'start': 2594.102, 'duration': 5.224}, {'end': 2607.511, 'text': 'including the fact 
{'end': 2607.511, 'src': 'embed', 'start': 2563.434, 'weight': 1, 'content': [{'end': 2573.844, 'text': 'So we took a look at the history of Hive and how it evolved from the Hadoop file system to an HQL similar to SQL layer on top of Hadoop,', 'start': 2563.434, 'duration': 10.41}, {'end': 2579.37, 'text': 'with the full meta store and all that information connected to the Hadoop file system.', 'start': 2573.844, 'duration': 5.526}, {'end': 2586.837, 'text': 'That way you can easily scale it up and still have the Hadoop setup underneath while still having the SQL query language available.', 'start': 2579.49, 'duration': 7.347}, {'end': 2593.942, 'text': 'We looked at what Hive is, and we looked at the Hive queries going in through MapReduce into the Hadoop MapReduce system.', 'start': 2586.977, 'duration': 6.965}, {'end': 2599.326, 'text': 'We dug a little deeper to look at the architecture of Hive and all the different pieces and how they fit together,', 'start': 2594.102, 'duration': 5.224}, {'end': 2607.511, 'text': 'including the fact that it has the Hive client, and you have your Thrift applications and your JDBC applications and your ODBC applications.', 'start': 2599.326, 'duration': 8.185}], 'summary': 'Hive evolved from the Hadoop file system to a SQL-like layer, enabling easy scaling with MapReduce integration and various client applications.', 'duration': 44.077, 'max_score': 2563.434, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rr17cbPGWGA/pics/rr17cbPGWGA2563434.jpg'}, {'end': 2671.21, 'src': 'embed', 'start': 2644.433, 'weight': 7, 'content': [{'end': 2650.997, 'text': 'All the major scripting languages usually have their own plugin for sending that information over to our Hadoop file system.', 'start': 2644.433, 'duration': 6.564}, {'end': 2653.318, 'text': 'We dug in deeper into the data flow in Hive,', 'start': 2650.997, 'duration': 2.321}, {'end': 2659.342, 'text': 'so your user interface and the different steps it takes to go through and get in and out of the MapReduce system.', 'start': 2653.939, 'duration': 5.403}, {'end': 2664.466, 'text': 'We also took a glance at Hive data types, and certainly these are always evolving,', 'start': 2659.542, 'duration': 4.924}, {'end': 2671.21, 'text': "so it's good to look at the Apache website, especially with the new stuff coming up underneath Beehive, which is also Hive.", 'start': 2664.466, 'duration': 6.744}], 'summary': 'Exploring data flow, Hive, and scripting languages for the Hadoop file system.', 'duration': 26.777, 'max_score': 2644.433, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rr17cbPGWGA/pics/rr17cbPGWGA2644433.jpg'}], 'start': 2015.313, 'title': 'Sql, etl, and hadoop hive', 'summary': 'Covers SQL sub-queries, table management, the ETL process, data joining, and a Hadoop Hive overview. It includes examples such as filtering employees with salaries over $25,000, joining data sets resulting in 4 orders from 7 customers, and the basics of Hadoop Hive. It also emphasizes the importance of transforming data before loading and provides a detailed step-by-step guide.', 'chapters': [{'end': 2248.883, 'start': 2015.313, 'title': 'Sql sub-queries and table management', 'summary': 'Covers the use of sub-queries in SQL, showcasing a query to filter employees with salaries over $25,000, and demonstrates table management including renaming and dropping tables in a data warehouse environment.', 'duration': 233.57, 'highlights': ['Demonstrating a sub-query to filter employees with salaries over $25,000, yielding top earners Rose in IT and Mike in HR. The query showcases the use of sub-queries to filter employees with salaries greater than $25,000, resulting in the identification of the top earners, Rose in IT and Mike in HR.', "Explanation of renaming a table from 'employee' to 'employees', emphasizing the ease of changing metadata for table names. The process of altering the table name from 'employee' to 'employees' is explained, highlighting the simplicity of changing metadata for table names.", 'Discussion on the standard date format for importing data in Hive, emphasizing the year-month-day format as the standard.']},
{'end': 2455.519, 'start': 2248.963, 'title': 'Etl process and data joining', 'summary': 'Introduces the ETL process and demonstrates the joining of two data sets, resulting in 4 orders from 7 customers, emphasizing the importance of transforming data before loading and providing a detailed step-by-step guide.', 'duration': 206.556, 'highlights': ['The importance of transforming data before loading is emphasized in the ETL process.', 'Demonstrates the joining of two data sets, resulting in 4 orders from 7 customers.', 'Provides a detailed step-by-step guide on loading and joining data sets.']}, {'end': 2718.23, 'start': 2455.539, 'title': 'Hadoop hive overview', 'summary': 'Covers the basics of Hadoop Hive, from database operations like dropping a database (see the sketch below), to the architecture, Hive queries, commands, and the Hive web interface, highlighting the key takeaways and the role of Hive in the Hadoop ecosystem.', 'duration': 262.691, 'highlights': ['Covered the basics of Hadoop Hive, including dropping a database and common Hive commands like SELECT ROUND(2.3) AS rounded_value, along with FLOOR and CEILING.', 'Explored the history and evolution of Hive from the Hadoop file system to an HQL similar to SQL layer on top of Hadoop, emphasizing the scalability and availability of the SQL query language.', 'Discussed the architecture of Hive, its components, and the Hive web interface, along with the role of Thrift, JDBC, and ODBC applications in the Hive ecosystem.', 'Examined the data flow in Hive, including the user interface and the steps involved in MapReduce, as well as the evolving data types and features of Hive, advising to stay updated with the Apache website for new updates and features.']}],
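The drop the demo sets up is worth sketching, because the error it anticipates is Hive refusing to drop a database that still contains tables. This sketch assumes the customer and orders tables are still in office; CASCADE is the standard HiveQL way to drop the tables along with the database.

    DROP DATABASE office;          -- fails while the database still holds tables
    DROP DATABASE office CASCADE;  -- drops the database and its tables together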
'duration': 702.917, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rr17cbPGWGA/pics/rr17cbPGWGA2015313.jpg', 'highlights': ['Demonstrating a sub-query to filter employees with salaries over $25,000, yielding top earners Rose in IT and Mike in HR.', 'Exploration of the history and evolution of Hive from the Hadoop file system to an HQL similar to SQL layer on top of Hadoop, emphasizing the scalability and availability of the SQL query language.', 'Discussion on the standard date format for importing data in Hive, emphasizing the year-month-day format as the standard.', 'Provides a detailed step-by-step guide on loading and joining data sets.', 'Demonstrates the joining of two data sets, resulting in 4 orders from 7 customers.', "Explained the process of altering the table name from 'employee' to 'employees', highlighting the simplicity of changing metadata for table names.", 'Covered the basics of Hadoop Hive, including dropping a database and common Hive commands like SELECT ROUND(2.3) AS rounded_value, along with FLOOR and CEILING.', 'Examined the data flow in Hive, including the user interface and the steps involved in MapReduce, as well as the evolving data types and features of Hive, advising to stay updated with the Apache website for new updates and features.', 'Explored the architecture of Hive, its components, and the Hive web interface, along with the role of Thrift, JDBC, and ODBC applications in the Hive ecosystem.', 'The importance of transforming data before loading is emphasized in the ETL process.']}], 'highlights': ["The history of Hive begins with Facebook's use of Hadoop to handle growing big data, which required MapReduce for processing data.", "The tutorial covers the history, architecture, data flow, data modeling, data types, different modes, differences from RDBMS, and features of Hive, with a focus on the history of Hive starting with Facebook's use of Hadoop for handling big data.", 'Hive is a data warehouse system used for querying and analyzing large data sets stored in HDFS, the Hadoop file system, employing HiveQL, similar to SQL, to process queries.', 'The Hive architecture includes the Hive client, server, driver, Metastore, MapReduce framework, and HDFS. The Hive web interface provides a GUI for executing queries, and the CLI offers direct terminal window commands.', 'The Hive driver is responsible for processing queries, internally performing steps involving the compiler, optimizer, and executor. The Metastore serves as a repository for Hive metadata, storing metadata for Hive tables.', 'The tables in Hive are created similarly to RDBMS, making it easy to import and store data from traditional SQL databases into the Hive system.', 'Hive data modeling includes tables, partitions, and buckets, providing a structured approach to organizing and storing data within the Hive system.', 'Hive data types include primitive data types such as numerical, string, date-time, and miscellaneous data types, as well as complex data types like arrays, maps, structs, and unions.', 'Hive enforces schema on read and supports petabyte data sizes, while RDBMS enforces schema on write and supports terabyte data sizes.', 'Hive is based on write once, read many times (WORM), allowing archiving of historical data, while RDBMS is based on reading and writing many times for continually updating data.', 'Hive is a data warehouse supporting SQL, not a traditional database, while RDBMS is based on the relational model of data.', 'HiveQL allows multiple users to simultaneously query data, with the capability to handle large data queries efficiently by adding more resources.', 'Tables in Hive are similar to RDBMS, making it easier for users to understand and work with the data.', 'Hive provides easy integration with existing databases, allowing a direct reflection of schema between MySQL/SQL Server and HDFS, reducing the need to learn a new language.', 'The addition of more computers in Hive can significantly enhance the speed of queries, depending on the resources allocated to each machine.', 'Setting up Cloudera on a virtual machine using Linux CentOS and Cloudera Quick Start is explained.', 'Using Hue as an editor to access Hive and Hadoop setups is discussed, emphasizing its admin-side features and visual capabilities.',
'The process of accessing Hive through the Hue browser window and the command line interface is detailed.', 'The steps for loading data into the Hive table from a CSV file and understanding the data format are explained.', 'Emphasizing the efficiency of count operations in Hive and Hadoop.', 'Demonstrating a sub-query to filter employees with salaries over $25,000, yielding top earners Rose in IT and Mike in HR.', 'Exploration of the history and evolution of Hive from the Hadoop file system to an HQL similar to SQL layer on top of Hadoop, emphasizing the scalability and availability of the SQL query language.', 'Discussion on the standard date format for importing data in Hive, emphasizing the year-month-day format as the standard.', 'Provides a detailed step-by-step guide on loading and joining data sets.', 'Demonstrates the joining of two data sets, resulting in 4 orders from 7 customers.', "Explained the process of altering the table name from 'employee' to 'employees', highlighting the simplicity of changing metadata for table names.", 'Covered the basics of Hadoop Hive, including dropping a database and common Hive commands like SELECT ROUND(2.3) AS rounded_value, along with FLOOR and CEILING (sketched below).', 'Examined the data flow in Hive, including the user interface and the steps involved in MapReduce, as well as the evolving data types and features of Hive, advising to stay updated with the Apache website for new updates and features.', 'Explored the architecture of Hive, its components, and the Hive web interface, along with the role of Thrift, JDBC, and ODBC applications in the Hive ecosystem.', 'The importance of transforming data before loading is emphasized in the ETL process.']}
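To close, a minimal sketch of the rounding functions named in the recap; the literal 2.3 comes from the demo, the column aliases are illustrative, and recent Hive versions let you run a SELECT like this without referencing a table.

    SELECT ROUND(2.3)   AS rounded_value,   -- 2
           FLOOR(2.3)   AS floor_value,     -- 2
           CEILING(2.3) AS ceiling_value;   -- 3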