title
Hadoop MapReduce Example | MapReduce Programming | Hadoop Tutorial For Beginners | Edureka
description
🔥 Edureka Hadoop Training: https://www.edureka.co/big-data-hadoop-training-certification
This Hadoop tutorial on MapReduce Example ( MapReduce Tutorial Blog Series: https://goo.gl/cZmvLS ) will help you understand how to write a MapReduce program in Java. You will also get to see multiple MapReduce examples on analytics and testing.
Check our complete Hadoop playlist here: https://goo.gl/ExJdZs
Below are the topics covered in this tutorial:
1) MapReduce Way
2) Classes and Packages in MapReduce
3) Explanation of a Complete MapReduce Program
4) MapReduce Examples on Analytics
5) MapReduce Example on Testing - MRUnit
Subscribe to our channel to get video updates. Hit the subscribe button above.
--------------------Edureka Big Data Training and Certifications------------------------
🔵 Edureka Hadoop Training: http://bit.ly/2YBlw29
🔵 Edureka Spark Training: http://bit.ly/2PeHvc9
🔵 Edureka Kafka Training: http://bit.ly/34e7Riy
🔵 Edureka Cassandra Training: http://bit.ly/2E9AK54
🔵 Edureka Talend Training: http://bit.ly/2YzYIjg
🔵 Edureka Hadoop Administration Training: http://bit.ly/2YE8Nf9
Instagram: https://www.instagram.com/edureka_learning/
Facebook: https://www.facebook.com/edurekaIN/
Twitter: https://twitter.com/edurekain
LinkedIn: https://www.linkedin.com/company/edureka
#edureka #edurekaMapreduce #MapReduceExample #MapReduceAnalytics #MapReduceTesting
How it Works?
1. This is a 5-week instructor-led online course with 40 hours of assignments and 30 hours of project work.
2. We provide 24x7 one-on-one LIVE technical support to help you with any problems you might face or any clarifications you may require during the course.
3. At the end of the training, you will undergo a 2-hour LIVE practical exam, based on which we will provide you with a grade and a verifiable certificate!
- - - - - - - - - - - - - -
About the Course
Edureka’s Big Data and Hadoop online training is designed to help you become a top Hadoop developer. During this course, our expert Hadoop instructors will help you:
1. Master the concepts of HDFS and the MapReduce framework
2. Understand Hadoop 2.x Architecture
3. Set up a Hadoop cluster and write complex MapReduce programs
4. Learn data loading techniques using Sqoop and Flume
5. Perform data analytics using Pig, Hive and YARN
6. Implement HBase and MapReduce integration
7. Implement Advanced Usage and Indexing
8. Schedule jobs using Oozie
9. Implement best practices for Hadoop development
10. Work on a real-life project on Big Data Analytics
11. Understand Spark and its Ecosystem
12. Learn how to work with RDDs in Spark
- - - - - - - - - - - - - -
Who should go for this course?
If you belong to any of the following groups, knowledge of Big Data and Hadoop is crucial if you want to progress in your career:
1. Analytics professionals
2. BI /ETL/DW professionals
3. Project managers
4. Testing professionals
5. Mainframe professionals
6. Software developers and architects
7. Recent graduates passionate about building a successful career in Big Data
- - - - - - - - - - - - - -
Why Learn Hadoop?
Big Data! A Worldwide Problem?
According to Wikipedia, "Big data is a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications." In simpler terms, Big Data is a term given to the large volumes of data that organizations store and process. However, it is becoming very difficult for companies to store, retrieve and process this ever-increasing data. If a company manages its data well, nothing can stop it from becoming the next BIG success!
The problem lies in using traditional systems to store enormous data. Though these systems were a success a few years ago, with the increasing amount and complexity of data they are fast becoming obsolete. The good news is Hadoop, which is nothing less than a panacea for companies working with BIG DATA in a variety of applications; it has become an integral part of storing, handling, evaluating and retrieving hundreds of terabytes, and even petabytes, of data.
For more information, please write back to us at sales@edureka.in or call us at IND: 9606058406 / US: 18338555775 (toll-free).
Customer Review:
Michael Harkins, System Architect, Hortonworks says: “The courses are top rate. The best part is live instruction, with playback. But my favorite feature is viewing a previous class. Also, they are always there to answer questions, and prompt when you open an issue if you are having any trouble. Added bonus ~ you get lifetime access to the course you took!!! Edureka lets you go back later, when your boss says "I want this ASAP!" ~ This is the killer education app... I've taken two courses, and I'm taking two more.”
detail
{'title': 'Hadoop MapReduce Example | MapReduce Programming | Hadoop Tutorial For Beginners | Edureka', 'heatmap': [{'end': 1457.76, 'start': 1414.22, 'weight': 1}], 'summary': 'Provides a comprehensive tutorial on hadoop mapreduce, covering practical examples such as word count, input/output formats, java programming, string tokenization, reduce functions, weather data analysis, and data processing for song popularity, demonstrating various mapreduce applications and usage scenarios.', 'chapters': [{'end': 154.962, 'segs': [{'end': 71.539, 'src': 'embed', 'start': 44.755, 'weight': 2, 'content': [{'end': 48.937, 'text': 'Omar says yes, Matt says yes, Himanshu says yes, Jessica says yes.', 'start': 44.755, 'duration': 4.182}, {'end': 51.638, 'text': 'Himanshu says how many practicals are there?', 'start': 49.717, 'duration': 1.921}, {'end': 55.706, 'text': "Himanshu, don't you worry, the entire session is based on practicals, okay?", 'start': 51.923, 'duration': 3.783}, {'end': 58.749, 'text': "So you'll have at least two, three practicals for today's session.", 'start': 55.926, 'duration': 2.823}, {'end': 62.352, 'text': 'Does that sound interesting, guys? Himanshu says great.', 'start': 59.489, 'duration': 2.863}, {'end': 63.973, 'text': "So let's move on.", 'start': 63.212, 'duration': 0.761}, {'end': 68.276, 'text': 'And we start off with MapReduce example on word count.', 'start': 65.254, 'duration': 3.022}, {'end': 71.539, 'text': 'We already saw this example in our previous session.', 'start': 68.917, 'duration': 2.622}], 'summary': "At least 2-3 practicals planned for today's session on mapreduce example on word count.", 'duration': 26.784, 'max_score': 44.755, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/x-PCNX4prLA/pics/x-PCNX4prLA44755.jpg'}, {'end': 134.707, 'src': 'embed', 'start': 91.472, 'weight': 0, 'content': [{'end': 94.854, 'text': 'Finally the complete output was sent back to the client, okay.', 'start': 91.472, 'duration': 3.382}, {'end': 98.416, 'text': 'Now this was the example that we looked at.', 'start': 96.655, 'duration': 1.761}, {'end': 103.92, 'text': 'The entire input was Deer Bear River, Car Car River, Deer Car Bear.', 'start': 99.096, 'duration': 4.824}, {'end': 111.314, 'text': 'We had three input splits, Deer Bear River, that is one, Car Car River that is second, the third one was Deer Car Bear.', 'start': 104.5, 'duration': 6.814}, {'end': 117.958, 'text': 'There was a mapping phase in which the count of one was assigned against every word that was there in the input split.', 'start': 111.954, 'duration': 6.004}, {'end': 121.5, 'text': 'So deer comma one, bear comma one, river comma one.', 'start': 118.538, 'duration': 2.962}, {'end': 125.202, 'text': 'Similarly for input split two and for input split three.', 'start': 121.62, 'duration': 3.582}, {'end': 129.464, 'text': 'In the shuffling phase, all the keys had a list of values.', 'start': 126.062, 'duration': 3.402}, {'end': 134.707, 'text': 'that was assigned by going through all the results that was given out by the mapping function.', 'start': 129.464, 'duration': 5.243}], 'summary': 'Output sent: bear(2), car(3), deer(2), river(2)', 'duration': 43.235, 'max_score': 91.472, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/x-PCNX4prLA/pics/x-PCNX4prLA91472.jpg'}], 'start': 0.069, 'title': 'Mapreduce: word count example', 'summary': 'Focuses on mapreduce examples, specifically the word count example, explaining the process of
input splits, mapping, shuffling, and reducing, with at least two to three practicals included in the session.', 'chapters': [{'end': 154.962, 'start': 0.069, 'title': 'Mapreduce: word count example', 'summary': 'Focuses on mapreduce examples, specifically the word count example, explaining the process of input splits, mapping, shuffling, and reducing, with at least two to three practicals included in the session.', 'duration': 154.893, 'highlights': ["The entire session is based on practicals, with at least two to three practicals for today's session. The session includes at least two to three practicals for today's session.", 'The process of input splits, mapping, shuffling, and reducing was explained using the word count example. The chapter explains the process of input splits, mapping, shuffling, and reducing using the word count example.', 'The example included three input splits: Deer Bear River, Car Car River, Deer Car Bear, with the count of one assigned to every word in the mapping phase. The example provided three input splits and assigned a count of one to every word in the mapping phase.']}], 'duration': 154.893, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/x-PCNX4prLA/pics/x-PCNX4prLA69.jpg', 'highlights': ['The process of input splits, mapping, shuffling, and reducing was explained using the word count example.', 'The example included three input splits: Deer Bear River, Car Car River, Deer Car Bear, with the count of one assigned to every word in the mapping phase.', "The entire session is based on practicals, with at least two to three practicals for today's session."]}, {'end': 418.72, 'segs': [{'end': 180.885, 'src': 'embed', 'start': 155.563, 'weight': 0, 'content': [{'end': 164.255, 'text': 'In the reducing phase for every key, the list of values were summed up and the result was bear comma two, car comma three, deer comma two,', 'start': 155.563, 'duration': 8.692}, {'end': 165.015, 'text': 'river comma two.', 'start': 164.255, 'duration': 0.76}, {'end': 169.038, 'text': 'And finally, this was the final output on the right, which was sent back to the client.', 'start': 165.375, 'duration': 3.663}, {'end': 172.04, 'text': 'So this was the word count process using MapReduce.', 'start': 169.418, 'duration': 2.622}, {'end': 176.542, 'text': 'Guys, I hope you can clearly recall what we learned in the last session before we move on.', 'start': 172.46, 'duration': 4.082}, {'end': 180.885, 'text': 'Can you give me a quick confirmation? Omar says yes.', 'start': 176.863, 'duration': 4.022}], 'summary': 'In the reducing phase, the list of values were summed up resulting in bear: 2, car: 3, deer: 2, river: 2.
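To make the split → map → shuffle → reduce flow above concrete, here is a minimal, self-contained sketch that simulates the word count phases with plain Java collections. This is illustrative only, not Hadoop API code; the actual Mapper and Reducer classes are walked through later in the session.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WordCountFlow {
    public static void main(String[] args) {
        // The three input splits from the example above.
        String[] splits = {"Deer Bear River", "Car Car River", "Deer Car Bear"};

        // Map phase: emit a (word, 1) pair for every word in every split.
        List<String[]> mapped = new ArrayList<>();
        for (String split : splits) {
            for (String word : split.split(" ")) {
                mapped.add(new String[] {word, "1"});
            }
        }

        // Shuffle phase: group all emitted 1s under their key (the word).
        Map<String, List<Integer>> shuffled = new TreeMap<>();
        for (String[] pair : mapped) {
            shuffled.computeIfAbsent(pair[0], k -> new ArrayList<>())
                    .add(Integer.parseInt(pair[1]));
        }

        // Reduce phase: sum the list of values for every key.
        for (Map.Entry<String, List<Integer>> entry : shuffled.entrySet()) {
            int sum = 0;
            for (int one : entry.getValue()) sum += one;
            System.out.println(entry.getKey() + "\t" + sum);
        }
        // Prints: Bear 2, Car 3, Deer 2, River 2
    }
}
```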
this was the final output in the word count process using mapreduce.', 'duration': 25.322, 'max_score': 155.563, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/x-PCNX4prLA/pics/x-PCNX4prLA155563.jpg'}, {'end': 270.791, 'src': 'embed', 'start': 241.035, 'weight': 1, 'content': [{'end': 245.196, 'text': 'We have composable input format and then finally we have DB input format.', 'start': 241.035, 'duration': 4.161}, {'end': 253.858, 'text': 'Okay file input format is the most commonly used input format that is there which has a list of input formats under it.', 'start': 245.676, 'duration': 8.182}, {'end': 259.279, 'text': 'That is like combined file input format text input format key value text input format.', 'start': 254.339, 'duration': 4.94}, {'end': 262.287, 'text': 'N line input format, sequence file input format.', 'start': 259.685, 'duration': 2.602}, {'end': 266.229, 'text': 'Sequence file input format is something which takes sequence files as an input.', 'start': 262.367, 'duration': 3.862}, {'end': 270.791, 'text': 'N line input format takes N number of lines as an input altogether.', 'start': 266.729, 'duration': 4.062}], 'summary': 'File input format is the most commonly used, with various sub-formats and capabilities.', 'duration': 29.756, 'max_score': 241.035, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/x-PCNX4prLA/pics/x-PCNX4prLA241035.jpg'}, {'end': 349.869, 'src': 'embed', 'start': 325.448, 'weight': 2, 'content': [{'end': 331.991, 'text': 'that is K and V, whereas DB input format has only one attribute, that is T.', 'start': 325.448, 'duration': 6.543}, {'end': 340.215, 'text': 'That is a very good question, Matt, and let me tell you at this moment that is composable input format takes two attributes,', 'start': 331.991, 'duration': 8.224}, {'end': 342.757, 'text': 'which is nothing but the key and value.', 'start': 340.215, 'duration': 2.542}, {'end': 342.977, 'text': 'okay,', 'start': 342.757, 'duration': 0.22}, {'end': 349.869, 'text': 'And this could be either of the data types that are supported by Hadoop, like intWriteable, longWriteable and so on.', 'start': 343.385, 'duration': 6.484}], 'summary': 'K and v in composable input format, supported by hadoop.', 'duration': 24.421, 'max_score': 325.448, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/x-PCNX4prLA/pics/x-PCNX4prLA325448.jpg'}, {'end': 418.72, 'src': 'embed', 'start': 371.36, 'weight': 3, 'content': [{'end': 377.702, 'text': 'Now this is the superclass that is output format which is present in org Apache Hadoop dot MapReduce package.', 'start': 371.36, 'duration': 6.342}, {'end': 381.023, 'text': 'Now there are basically four types of output formats.', 'start': 378.382, 'duration': 2.641}, {'end': 383.684, 'text': 'File output format the most commonly used.', 'start': 381.483, 'duration': 2.201}, {'end': 387.326, 'text': 'Null output format in which there is no output that has to be given.', 'start': 384.124, 'duration': 3.202}, {'end': 391.607, 'text': 'DB output format which gives out DB writable kind of an object.', 'start': 387.826, 'duration': 3.781}, {'end': 393.408, 'text': 'Filter output format.', 'start': 392.207, 'duration': 1.201}, {'end': 394.948, 'text': 'Okay that is the fourth one.', 'start': 393.728, 'duration': 1.22}, {'end': 403.355, 'text': 'File output format can be divided into two types that is text output format which basically dumps out the output as text.', 'start': 395.928, 'duration': 
7.427}, {'end': 412.058, 'text': "Second is the sequence file output format, and I'm sure you can clearly tell me what will be the output from sequence file output format.", 'start': 404.075, 'duration': 7.983}, {'end': 415.539, 'text': 'Can you please tell me? Omar says binary files.', 'start': 412.098, 'duration': 3.441}, {'end': 418.72, 'text': 'Absolutely, binary files are stored as sequence files.', 'start': 415.639, 'duration': 3.081}], 'summary': 'Apache hadoop mapreduce package has 4 output formats, with file output format being the most commonly used, and it can be divided into text and sequence file output formats.', 'duration': 47.36, 'max_score': 371.36, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/x-PCNX4prLA/pics/x-PCNX4prLA371360.jpg'}], 'start': 155.563, 'title': 'Mapreduce input and output formats', 'summary': 'Covers word count process using mapreduce, reducing phase result, and overview of input formats in mapreduce. it emphasizes common usage of file input format and explains differences between composable input format and db input format, highlighting supported attributes. the chapter also discusses various types of output formats including file output format, null output format, db output format, and filter output format, with a specific focus on text output format and sequence file output format.', 'chapters': [{'end': 303.452, 'start': 155.563, 'title': 'Mapreduce input formats', 'summary': 'Covers the word count process using mapreduce, including the result of the reducing phase and an overview of input formats in mapreduce, consisting of file input format, composable input format, and db input format, with emphasis on the common usage of file input format.', 'duration': 147.889, 'highlights': ["The result of the reducing phase was bear: 2, car: 3, deer: 2, river: 2, sent back to the client as the final output. The result of the reducing phase yielded specific word counts, with 'car' having the highest count of 3, which was then sent back to the client as the final output.", 'An overview of input formats in MapReduce, including file input format, composable input format, and DB input format, with emphasis on the common usage of file input format. The chapter provides an overview of input formats in MapReduce, highlighting file input format as the most commonly used format, with various sub-formats such as text input format and sequence file input format.', "Umar asks about the possibility of having custom input format, to which the instructor confirms the feasibility but states it won't be covered in the current session. 
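As a quick illustration of where these format classes plug in, a driver can swap them on the Job object. The sketch below assumes the Hadoop 2.x new-API classes under org.apache.hadoop.mapreduce.lib; the commented alternatives mirror the formats named in the session.

```java
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class FormatConfig {
    // Wires input/output formats into a job; text in, text out is the
    // common default discussed above.
    static void configureFormats(Job job) {
        job.setInputFormatClass(TextInputFormat.class);   // plain text lines in
        job.setOutputFormatClass(TextOutputFormat.class); // plain text pairs out

        // Alternatives mentioned in the session:
        // job.setInputFormatClass(NLineInputFormat.class);          // N lines per split
        // job.setOutputFormatClass(SequenceFileOutputFormat.class); // binary sequence files
    }
}
```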
Umar inquires about the option of custom input format, and the instructor acknowledges its possibility but mentions that it won't be discussed in the current session."]}, {'end': 418.72, 'start': 303.952, 'title': 'Input and output formats in mapreduce', 'summary': 'Explains the differences between composable input format and db input format, highlighting the attributes they support, and the various types of output formats including file output format, null output format, db output format, and filter output format, with a specific focus on text output format and sequence file output format.', 'duration': 114.768, 'highlights': ['The composable input format has two attributes, key and value, which can be of data types supported by Hadoop, whereas DB input format has only one attribute, of type dbWritable, for reading data from a database.', 'There are four types of output formats in MapReduce: File output format, Null output format, DB output format, and Filter output format, with File output format being the most commonly used.', 'Under File output format, there are two types: text output format for dumping output as text, and sequence file output format for storing binary files as sequence files.']}], 'duration': 263.157, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/x-PCNX4prLA/pics/x-PCNX4prLA155563.jpg', 'highlights': ["The result of the reducing phase yielded specific word counts, with 'car' having the highest count of 3, which was then sent back to the client as the final output.", 'The chapter provides an overview of input formats in MapReduce, highlighting file input format as the most commonly used format, with various sub-formats such as text input format and sequence file input format.', 'The composable input format has two attributes, key and value, which can be of data types supported by Hadoop, whereas DB input format has only one attribute, of type dbWritable, for reading data from a database.', 'There are four types of output formats in MapReduce: File output format, Null output format, DB output format, and Filter output format, with File output format being the most commonly used.', 'Under File output format, there are two types: text output format for dumping output as text, and sequence file output format for storing binary files as sequence files.']}, {'end': 1075.401, 'segs': [{'end': 471.698, 'src': 'embed', 'start': 440.452, 'weight': 0, 'content': [{'end': 446.094, 'text': "However, in this session, I'll tell you how to write a MapReduce program of your own using Java,", 'start': 440.452, 'duration': 5.642}, {'end': 450.175, 'text': "as well as I'll tell you what are the packages and classes that are required to do that.", 'start': 446.094, 'duration': 4.081}, {'end': 456.737, 'text': "So the very first thing is to explore the classes that are used for writing a MapReduce program, right? 
Let's move on.", 'start': 450.435, 'duration': 6.302}, {'end': 463.134, 'text': 'Now very first thing is the list of the packages that needs to be imported which is there in the Hadoop jar file.', 'start': 457.591, 'duration': 5.543}, {'end': 471.698, 'text': 'Now these are the packages or libraries that needs to be imported from the Hadoop package so that we get the required classes for writing the MapReduce program.', 'start': 463.374, 'duration': 8.324}], 'summary': 'Learn to write a mapreduce program in java and import required packages from hadoop.', 'duration': 31.246, 'max_score': 440.452, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/x-PCNX4prLA/pics/x-PCNX4prLA440452.jpg'}, {'end': 722.302, 'src': 'embed', 'start': 692.856, 'weight': 3, 'content': [{'end': 694.437, 'text': 'In this case, it is word count.', 'start': 692.856, 'duration': 1.581}, {'end': 696, 'text': "So let's move on.", 'start': 695.179, 'duration': 0.821}, {'end': 698.662, 'text': "So it's time to see some MapReduce example.", 'start': 696.54, 'duration': 2.122}, {'end': 705.908, 'text': "To start with, let me take you through the word count example itself, okay? So it's time for some practical, guys.", 'start': 698.702, 'duration': 7.206}, {'end': 708.15, 'text': "So it's time for some practicals.", 'start': 706.749, 'duration': 1.401}, {'end': 710.412, 'text': "I'll take you to the Dureka VM now.", 'start': 708.39, 'duration': 2.022}, {'end': 712.153, 'text': 'So this is Eclipse, guys.', 'start': 710.912, 'duration': 1.241}, {'end': 715.156, 'text': 'This is the ID for writing a Java MapReduce program.', 'start': 712.293, 'duration': 2.863}, {'end': 718.459, 'text': 'So the very first thing is you need to create a Java project.', 'start': 715.736, 'duration': 2.723}, {'end': 722.302, 'text': 'So you can go to New, click on Project.', 'start': 718.619, 'duration': 3.683}], 'summary': 'The transcript covers a discussion on word count and using eclipse for java mapreduce program.', 'duration': 29.446, 'max_score': 692.856, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/x-PCNX4prLA/pics/x-PCNX4prLA692856.jpg'}, {'end': 767.641, 'src': 'embed', 'start': 740.13, 'weight': 1, 'content': [{'end': 747.664, 'text': 'OK So the very first thing that we saw in the presentation was We were importing certain packages from two jar files.', 'start': 740.13, 'duration': 7.534}, {'end': 754.889, 'text': 'Okay, so as you can see, I have added two jar files that is Hadoop MapReduce client core and Hadoop common.', 'start': 748.064, 'duration': 6.825}, {'end': 760.053, 'text': 'Okay, these jar files can easily be found in the Hadoop package itself.', 'start': 755.47, 'duration': 4.583}, {'end': 763.836, 'text': 'Okay, and the path for these files are right here.', 'start': 760.553, 'duration': 3.283}, {'end': 767.641, 'text': 'that is user lib Hadoop 2.2.', 'start': 764.52, 'duration': 3.121}], 'summary': 'Presentation covered importing packages from hadoop jar files: mapreduce and common, found in hadoop 2.2 package.', 'duration': 27.511, 'max_score': 740.13, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/x-PCNX4prLA/pics/x-PCNX4prLA740130.jpg'}, {'end': 881.399, 'src': 'embed', 'start': 842.223, 'weight': 2, 'content': [{'end': 845.69, 'text': 'OK So this is how you need to add both the jar files in your class path.', 'start': 842.223, 'duration': 3.467}, {'end': 851.493, 'text': 'Once you are done with that, you can come here and you can 
import all these packages right here.', 'start': 846.37, 'duration': 5.123}, {'end': 855.515, 'text': 'These are the same packages that we saw in the presentation.', 'start': 852.613, 'duration': 2.902}, {'end': 861.057, 'text': 'That is file output format, input format, text input format, text output format.', 'start': 855.995, 'duration': 5.062}, {'end': 865.8, 'text': 'We have got mapper, we have got reducer class, which is nothing but the super classes.', 'start': 861.858, 'duration': 3.942}, {'end': 871.863, 'text': "Okay, so let's see the major classes that are present in this word count.java file.", 'start': 866.92, 'duration': 4.943}, {'end': 879.839, 'text': 'If you divide the entire program, we have got one main class that is called word count, which has got one class, that is map,', 'start': 872.935, 'duration': 6.904}, {'end': 881.399, 'text': 'and the second class that is reduce.', 'start': 879.839, 'duration': 1.56}], 'summary': 'Instructions on adding jar files, importing packages, and examining classes in word count.java.', 'duration': 39.176, 'max_score': 842.223, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/x-PCNX4prLA/pics/x-PCNX4prLA842223.jpg'}], 'start': 419.681, 'title': 'Mapreduce in java and hadoop jar files', 'summary': 'Discusses writing a mapreduce program in java, covering required packages, classes, and setup in eclipse. it also explains importing hadoop jar files, configuring build path, and the word count java program logic.', 'chapters': [{'end': 738.809, 'start': 419.681, 'title': 'Understanding mapreduce in java', 'summary': 'Discusses the packages and classes required for writing a mapreduce program in java, including the necessary imports and the declaration of mapper and reducer classes with their input and output data types. 
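For reference, the imports such a word count program typically pulls from the two jars just mentioned (hadoop-common and hadoop-mapreduce-client-core) look roughly like this sketch:

```java
// From hadoop-common: configuration, paths, and the writable data types.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;

// From hadoop-mapreduce-client-core: the MapReduce framework classes.
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

// Plus standard Java classes used by the word count logic itself.
import java.io.IOException;
import java.util.StringTokenizer;
```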
it also covers the setup of a java project in eclipse for writing a mapreduce program.', 'duration': 319.128, 'highlights': ['The chapter discusses the necessary packages and classes for writing a MapReduce program in Java, including the required imports from Hadoop packages.', 'It explains the declaration of the mapper and reducer classes, specifying their input and output data types for a word count example in a MapReduce program.', 'The setup of a Java project in Eclipse for writing a MapReduce program is demonstrated, including creating a new project and the word count Java file.']}, {'end': 1075.401, 'start': 740.13, 'title': 'Importing and configuring hadoop jar files', 'summary': 'Covers the process of importing hadoop mapreduce client core and hadoop common jar files, configuring the build path, importing required packages, and explaining the logic and classes within a word count java program.', 'duration': 335.271, 'highlights': ['The process of importing Hadoop MapReduce client core and Hadoop common jar files and configuring the build path is detailed, with the specific path for locating these files within the Hadoop package.', 'The step-by-step process of adding the jar files to the build path, including locating the files, navigating through the folder structure, and adding them to the classpath, is explained thoroughly.', 'The explanation of the classes within the Word Count Java program, including the main class, map class, and reduce class, along with the attributes and input/output types for each class, is provided.', "The map function logic, input attributes, and the role of the 'context' class in writing the output of the mapper are explained with a detailed example and clarification of the input key type as long writable.", 'The connection between the output key value types of the mapper and the input key value types of the reducer is clarified, emphasizing that the types of key value pairs of the mapper output are exactly the same as the input key value types of the reducer.']}], 'duration': 655.72, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/x-PCNX4prLA/pics/x-PCNX4prLA419681.jpg', 'highlights': ['The chapter discusses the necessary packages and classes for writing a MapReduce program in Java, including the required imports from Hadoop packages.', 'The process of importing Hadoop MapReduce client core and Hadoop common jar files and configuring the build path is detailed, with the specific path for locating these files within the Hadoop package.', 'The step-by-step process of adding the jar files to the build path, including locating the files, navigating through the folder structure, and adding them to the classpath, is explained thoroughly.', 'The setup of a Java project in Eclipse for writing a MapReduce program is demonstrated, including creating a new project and the word count Java file.', 'The explanation of the classes within the Word Count Java program, including the main class, map class, and reduce class, along with the attributes and input/output types for each class, is provided.']}, {'end': 1387.154, 'segs': [{'end': 1146.217, 'src': 'embed', 'start': 1076.122, 'weight': 1, 'content': [{'end': 1078.503, 'text': 'The entire line would go into the value.', 'start': 1076.122, 'duration': 2.381}, {'end': 1083.006, 'text': 'Okay, which is, this is Edureka class three.', 'start': 1079.103, 'duration': 3.903}, {'end': 1087.468, 'text': 'Okay, so which is nothing but of text type.', 'start': 1084.686, 'duration': 2.782}, {'end': 
1095.036, 'text': 'So that is why what we are doing is we are converting this to string and storing within a variable that is called line.', 'start': 1089.155, 'duration': 5.881}, {'end': 1100.358, 'text': 'Okay Guys are we clear till now? Good.', 'start': 1095.056, 'duration': 5.302}, {'end': 1104.399, 'text': 'Post that will be using a string tokenizer.', 'start': 1101.538, 'duration': 2.861}, {'end': 1111.5, 'text': 'Now this line variable is now this line variable will be passed to an object of string tokenizer class.', 'start': 1104.859, 'duration': 6.641}, {'end': 1118.322, 'text': 'Now string tokenizer is used to extract the words on the basis of spaces that are present between them.', 'start': 1111.88, 'duration': 6.442}, {'end': 1126.302, 'text': "Okay, now what I'm doing here is I'm creating this object tokenizer and I'm passing the variable line which contains the entire line.", 'start': 1118.897, 'duration': 7.405}, {'end': 1129.805, 'text': 'Okay, the value now once it is passed.', 'start': 1126.643, 'duration': 3.162}, {'end': 1134.548, 'text': 'We need to run a loop and iterator which is a while loop here.', 'start': 1131.166, 'duration': 3.382}, {'end': 1140.273, 'text': 'Now this particular statement is checking whether the tokenizer has got any more words or not.', 'start': 1135.189, 'duration': 5.084}, {'end': 1146.217, 'text': 'till the time the tokenizer has got tokens or words within it, the while loop will keep on running.', 'start': 1140.273, 'duration': 5.944}], 'summary': 'Edureka class three: converting text to string, using string tokenizer to extract words based on spaces.', 'duration': 70.095, 'max_score': 1076.122, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/x-PCNX4prLA/pics/x-PCNX4prLA1076122.jpg'}, {'end': 1222.459, 'src': 'embed', 'start': 1193.094, 'weight': 3, 'content': [{'end': 1200.935, 'text': 'If you recall the explanation that I gave you previously, we need to assign one against every word to find the word count right?', 'start': 1193.094, 'duration': 7.841}, {'end': 1205.996, 'text': 'So what is going to happen is the very first value will come.', 'start': 1202.315, 'duration': 3.681}, {'end': 1208.116, 'text': 'will be this okay?', 'start': 1205.996, 'duration': 2.12}, {'end': 1212.737, 'text': 'and what will be written would be this comma one.', 'start': 1208.116, 'duration': 4.621}, {'end': 1222.459, 'text': 'Similarly the second value would be is and what will be written from the mapper as output would be is comma one.', 'start': 1213.957, 'duration': 8.502}], 'summary': 'Assign one against every word to find the word count.', 'duration': 29.365, 'max_score': 1193.094, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/x-PCNX4prLA/pics/x-PCNX4prLA1193094.jpg'}, {'end': 1351.808, 'src': 'embed', 'start': 1256.446, 'weight': 0, 'content': [{'end': 1260.329, 'text': 'If you can give me a quick confirmation and if you have any questions, you can ask me right now.', 'start': 1256.446, 'duration': 3.883}, {'end': 1262.786, 'text': 'Omar says yes.', 'start': 1261.946, 'duration': 0.84}, {'end': 1265.187, 'text': 'Matt says yes.', 'start': 1264.387, 'duration': 0.8}, {'end': 1269.208, 'text': 'What about the rest of you? 
Himanshu says yes, great.', 'start': 1265.287, 'duration': 3.921}, {'end': 1271.928, 'text': 'Just remember the output here.', 'start': 1270.608, 'duration': 1.32}, {'end': 1275.609, 'text': 'This one is one, Edureka one, class one and three one.', 'start': 1271.948, 'duration': 3.661}, {'end': 1278.99, 'text': "Now we'll move into the reducer implementation.", 'start': 1276.149, 'duration': 2.841}, {'end': 1288.232, 'text': 'Just like the mapper, we have a reducer class which extends the main class or the superclass that is called reducer and these are the attributes.', 'start': 1279.95, 'duration': 8.282}, {'end': 1298.32, 'text': 'So we have input key as text and input value as int writable, which is nothing but the output that was given by the mapper right?', 'start': 1288.692, 'duration': 9.628}, {'end': 1303.722, 'text': 'The mapper output was of type text key and int writable value.', 'start': 1298.56, 'duration': 5.162}, {'end': 1309.866, 'text': 'Now the key value output from the reducer would be of type text and int writable respectively.', 'start': 1304.303, 'duration': 5.563}, {'end': 1317.89, 'text': 'right?. Now, as soon as you extend the reducer class, next thing is, you need to implement the reduce function, okay?', 'start': 1309.866, 'duration': 8.024}, {'end': 1322.816, 'text': 'Now, the reduce function will take three attributes as input.', 'start': 1318.655, 'duration': 4.161}, {'end': 1328.657, 'text': 'key text second would be values, which is of type iterable, which has values of type int writable.', 'start': 1322.816, 'duration': 5.841}, {'end': 1336.319, 'text': 'Okay And then finally the context context is you as you know is used to write the final output from the reducer or the mapper.', 'start': 1328.957, 'duration': 7.362}, {'end': 1340.339, 'text': 'Okay Now we need to understand what is iterable here.', 'start': 1337.479, 'duration': 2.86}, {'end': 1343.18, 'text': 'Now iterable is nothing but the list of values.', 'start': 1340.92, 'duration': 2.26}, {'end': 1347.761, 'text': 'So if you recall in the reducer phase the input is key and a list of values.', 'start': 1343.32, 'duration': 4.441}, {'end': 1351.808, 'text': 'So this is nothing but the list of values that are against a particular key.', 'start': 1348.046, 'duration': 3.762}], 'summary': 'The transcript covers discussion on mapper and reducer implementation, with participants confirming understanding.', 'duration': 95.362, 'max_score': 1256.446, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/x-PCNX4prLA/pics/x-PCNX4prLA1256446.jpg'}], 'start': 1076.122, 'title': 'String tokenization and mapreduce', 'summary': 'Covers string tokenization, word counting, and mapreduce implementation in hadoop. 
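Written out as code, the tokenizing map function being described is roughly the following: a sketch of the classic word count mapper, assuming the imports listed earlier and that it sits as an inner class of the WordCount program.

```java
public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // The key is the byte offset of the line; the value is the whole line,
        // e.g. "this is edureka class three". Convert the Text value to a String.
        String line = value.toString();
        // StringTokenizer splits the line into words on whitespace.
        StringTokenizer tokenizer = new StringTokenizer(line);
        // While the tokenizer still has tokens, emit (word, 1) for each one.
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            context.write(word, one);
        }
    }
}
```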
it includes the process of converting text to string, using a tokenizer, word count functionality, and an explanation of the reducer class in hadoop mapreduce.', 'chapters': [{'end': 1192.714, 'start': 1076.122, 'title': 'String tokenization and word counting', 'summary': 'Covers the process of converting text to string, using a string tokenizer to extract words based on spaces, and implementing a word count functionality, including the usage of a while loop for iterating through the tokens and assigning a value of one to each word, as demonstrated through an example.', 'duration': 116.592, 'highlights': ['The process involves converting the text to string and storing it in a variable called line.', 'Using a string tokenizer to extract words based on spaces, creating a tokenizer object, and passing the line variable to it.', 'Utilizing a while loop to iterate through the tokens and assigning a value of one to each word, demonstrated through an example.']}, {'end': 1278.99, 'start': 1193.094, 'title': 'Word count mapper explanation', 'summary': "Explains the process of assigning a value to each word for word count, utilizing examples and seeking confirmation from the audience, with the final output being 'one' for each word occurrence.", 'duration': 85.896, 'highlights': ["The process of assigning one value against every word to find the word count is explained, with examples like 'edureka' and 'class' resulting in 'edureka one' and 'class one' respectively.", "The chapter seeks confirmation from the audience, receiving responses from individuals like 'Omar' and 'Matt', and emphasizing the output as 'one' for each word occurrence.", 'The chapter concludes with a transition to discussing the reducer implementation for the word count.']}, {'end': 1387.154, 'start': 1279.95, 'title': 'Understanding mapreduce in hadoop', 'summary': 'Explains the implementation of the reducer class in hadoop mapreduce, including the input and output data types, the reduce function, and the usage of iterable to handle key-value pairs in the reducer phase.', 'duration': 107.204, 'highlights': ['The reducer class extends the main class or the superclass, with input key as text and input value as int writable, with the output being of type text key and int writable value.', 'The reduce function takes three attributes as input: key text, values of type iterable with values of type int writable, and context, which is used to write the final output from the reducer or the mapper.', 'Iterable is used to handle the list of values against a particular key in the reducer phase, and the data types in Hadoop are specifically designed for a distributed file system and as an input format to a MapReduce program.']}], 'duration': 311.032, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/x-PCNX4prLA/pics/x-PCNX4prLA1076122.jpg', 'highlights': ['The reducer class extends the main class or the superclass, with input key as text and input value as int writable, with the output being of type text key and int writable value.', 'The process involves converting the text to string and storing it in a variable called line.', 'Using a string tokenizer to extract words based on spaces, creating a tokenizer object, and passing the line variable to it.', "The process of assigning one value against every word to find the word count is explained, with examples like 'edureka' and 'class' resulting in 'edureka one' and 'class one' respectively.", "The chapter seeks confirmation from the audience, receiving responses 
from individuals like 'Omar' and 'Matt', and emphasizing the output as 'one' for each word occurrence.", 'Utilizing a while loop to iterate through the tokens and assigning a value of one to each word, demonstrated through an example.', 'The reduce function takes three attributes as input: key text, values of type iterable with values of type int writable, and context, which is used to write the final output from the reducer or the mapper.', 'Iterable is used to handle the list of values against a particular key in the reducer phase, and the data types in Hadoop are specifically designed for a distributed file system and as an input format to a MapReduce program.', 'The chapter concludes with a transition to discussing the reducer implementation for the word count.']}, {'end': 2027.562, 'segs': [{'end': 1477.267, 'src': 'heatmap', 'start': 1414.22, 'weight': 2, 'content': [{'end': 1416.982, 'text': 'Okay, so we have values of int writable type.', 'start': 1414.22, 'duration': 2.762}, {'end': 1420.024, 'text': 'So we are defining a variable int writable X.', 'start': 1417.142, 'duration': 2.882}, {'end': 1422.206, 'text': 'We are passing on the iterable that is values.', 'start': 1420.024, 'duration': 2.182}, {'end': 1426.068, 'text': "Okay, so we'll read every value that is present within this iterable.", 'start': 1422.226, 'duration': 3.842}, {'end': 1430.952, 'text': "Okay, and we'll keep on adding this using this particular Java statement.", 'start': 1426.469, 'duration': 4.483}, {'end': 1435.575, 'text': 'Okay, so the very first value is read from the values iterable and gets stored in X.', 'start': 1431.232, 'duration': 4.343}, {'end': 1437.977, 'text': 'It comes here and it gets added to sum.', 'start': 1435.575, 'duration': 2.402}, {'end': 1440.934, 'text': 'Then again, the for loop is repeated.', 'start': 1438.953, 'duration': 1.981}, {'end': 1446.196, 'text': 'The second value comes into x, it comes here and it gets added to the previous value of sum.', 'start': 1440.954, 'duration': 5.242}, {'end': 1451.798, 'text': 'And this loop will keep on executing till it reaches the end of the values iterable.', 'start': 1446.756, 'duration': 5.042}, {'end': 1457.76, 'text': 'So once all the values are read from the values iterable, it exits the for loop.', 'start': 1452.118, 'duration': 5.642}, {'end': 1460.561, 'text': 'Finally, what we are doing is we are writing down the key,', 'start': 1458.28, 'duration': 2.281}, {'end': 1469.862, 'text': 'which is nothing but the word and the sum of the values that was received from the for loop as the output from the reducer function right?', 'start': 1461.516, 'duration': 8.346}, {'end': 1475.526, 'text': "Now let's try and take up the same example that we were dealing while understanding the MapReduce function.", 'start': 1470.462, 'duration': 5.064}, {'end': 1477.267, 'text': 'So this was the output.', 'start': 1476.086, 'duration': 1.181}], 'summary': 'Using java, the program reads and adds values iteratively to calculate the sum, then outputs the key and sum.', 'duration': 63.047, 'max_score': 1414.22, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/x-PCNX4prLA/pics/x-PCNX4prLA1414220.jpg'}, {'end': 1650.938, 'src': 'embed', 'start': 1624.91, 'weight': 0, 'content': [{'end': 1632.437, 'text': 'So using this object, we can define the entire configuration of our word count example or any MapReduce example, okay.', 'start': 1624.91, 'duration': 7.527}, {'end': 1634.868, 'text': 'Now next you define the job.', 'start': 
1633.527, 'duration': 1.341}, {'end': 1639.791, 'text': 'Now right here we are defining the job that needs to get executed on the Hadoop cluster.', 'start': 1635.308, 'duration': 4.483}, {'end': 1646.435, 'text': 'To this job we need to pass on the configuration of our MapReduce program along with the name of the MapReduce program.', 'start': 1640.451, 'duration': 5.984}, {'end': 1650.938, 'text': 'This could be anything but under double quotes which is nothing but the string.', 'start': 1646.555, 'duration': 4.383}], 'summary': 'Using an object, we can define the configuration of mapreduce example and job execution on hadoop cluster.', 'duration': 26.028, 'max_score': 1624.91, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/x-PCNX4prLA/pics/x-PCNX4prLA1624910.jpg'}, {'end': 1760.628, 'src': 'embed', 'start': 1737.213, 'weight': 1, 'content': [{'end': 1745.158, 'text': "So what I'm telling the MapReduce program is I'm going to pass the arguments from the command line and the very first argument is the input path.", 'start': 1737.213, 'duration': 7.945}, {'end': 1749.02, 'text': 'Similarly goes with the output path right here, argument one.', 'start': 1745.678, 'duration': 3.342}, {'end': 1752.623, 'text': 'So let me tell you what is this argument zero and argument one.', 'start': 1749.821, 'duration': 2.802}, {'end': 1759.227, 'text': "So while executing the word count example, what I need to do is I'll be executing this command.", 'start': 1752.943, 'duration': 6.284}, {'end': 1760.628, 'text': 'So that is Hadoop jar.', 'start': 1759.347, 'duration': 1.281}], 'summary': 'Explaining mapreduce program arguments, input and output paths, and hadoop jar command for word count example.', 'duration': 23.415, 'max_score': 1737.213, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/x-PCNX4prLA/pics/x-PCNX4prLA1737213.jpg'}], 'start': 1387.915, 'title': 'Understanding and executing mapreduce functions', 'summary': 'Delves into understanding the reduce function in mapreduce, covering logic, aggregation, and frequency calculation, and also details mapreduce program configuration and execution, resulting in successful word count program completion on a hadoop cluster.', 'chapters': [{'end': 1600.116, 'start': 1387.915, 'title': 'Understanding reduce function in mapreduce', 'summary': 'Explains the logic and aggregation done in the reducer function, including the process of calculating frequency of words, iterating through values, and the output generated, using a specific example to illustrate the process.', 'duration': 212.201, 'highlights': ['The reducer function calculates the frequency of words by iterating through the values and adding them to the sum variable, producing the word and sum of values as the output. The reducer function calculates the frequency of words by iterating through the values and adding them to the sum variable, producing the word and sum of values as the output.', 'Illustration of the process with a specific example, showing the input and output for different words, demonstrating the logic and output of the reducer function. Illustration of the process with a specific example, showing the input and output for different words, demonstrating the logic and output of the reducer function.', "Explanation of the process with the example of 'is' and the resulting sum value of 2, showcasing the iterative addition of values in the reducer function. 
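The summing reduce logic just described corresponds to a reduce method along these lines. This is a sketch of the standard word count reducer, with the variable names (x, sum) following the session's explanation and the earlier imports assumed.

```java
public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        // Each x is one of the 1s emitted by the mapper for this word; the loop
        // keeps adding to sum until the values iterable is exhausted.
        for (IntWritable x : values) {
            sum += x.get();
        }
        // Emit (word, total count), e.g. ("is", 2).
        context.write(key, new IntWritable(sum));
    }
}
```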
Explanation of the process with the example of 'is' and the resulting sum value of 2, showcasing the iterative addition of values in the reducer function."]}, {'end': 2027.562, 'start': 1600.116, 'title': 'Mapreduce configuration and execution', 'summary': 'Explains the process of configuring and executing a mapreduce program, including setting up the main function, defining job configurations, specifying input and output paths, and running the program, resulting in the successful completion of the word count program on the hadoop cluster.', 'duration': 427.446, 'highlights': ['The process involves defining a configuration object to specify the entire configuration of the MapReduce program, setting up the job to be executed on the Hadoop cluster, and specifying the main class, mapper class, and reducer class for the job.', "Setting the input and output key classes, input format class, and output format class, such as 'text' and 'inwritable' types, to define the data format for the MapReduce program.", 'Specifying the input and output paths for the MapReduce program, using command-line arguments, and executing the word count program on the Hadoop cluster, resulting in the successful completion of the job and displaying the word count output.']}], 'duration': 639.647, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/x-PCNX4prLA/pics/x-PCNX4prLA1387915.jpg', 'highlights': ['The process involves defining a configuration object to specify the entire configuration of the MapReduce program, setting up the job to be executed on the Hadoop cluster, and specifying the main class, mapper class, and reducer class for the job.', 'Specifying the input and output paths for the MapReduce program, using command-line arguments, and executing the word count program on the Hadoop cluster, resulting in the successful completion of the job and displaying the word count output.', 'The reducer function calculates the frequency of words by iterating through the values and adding them to the sum variable, producing the word and sum of values as the output.']}, {'end': 2602.933, 'segs': [{'end': 2100.839, 'src': 'embed', 'start': 2074.315, 'weight': 0, 'content': [{'end': 2077.838, 'text': 'The very first example is MapReduce temperature example, okay.', 'start': 2074.315, 'duration': 3.523}, {'end': 2080.139, 'text': "Now let's understand what it is.", 'start': 2078.819, 'duration': 1.32}, {'end': 2089.789, 'text': 'The problem statement of this example is we need to analyze the weather data of Austin to determine the hot and cold days that were there, okay.', 'start': 2081.181, 'duration': 8.608}, {'end': 2096.655, 'text': 'Now what we are doing is we are using the data set that is present at NCEI okay,', 'start': 2090.989, 'duration': 5.666}, {'end': 2100.839, 'text': 'so which is nothing but National Centers for Environmental Information, okay.', 'start': 2096.655, 'duration': 4.184}], 'summary': 'Analyzing austin weather data using mapreduce to find hot and cold days from ncei dataset.', 'duration': 26.524, 'max_score': 2074.315, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/x-PCNX4prLA/pics/x-PCNX4prLA2074315.jpg'}, {'end': 2422.631, 'src': 'embed', 'start': 2399.189, 'weight': 1, 'content': [{'end': 2408.095, 'text': 'so what we are doing is if the max temperature is greater than 35, we are classifying it as a hot day, and if the minimum temperature is less than 10,', 'start': 2399.189, 'duration': 8.906}, {'end': 2410.236, 'text': 'we are classifying 
that day as a cold day.', 'start': 2408.095, 'duration': 2.141}, {'end': 2415.28, 'text': "Okay, so let's go back to the terminal and execute this.", 'start': 2410.997, 'duration': 4.283}, {'end': 2419.45, 'text': 'So before we go and execute, let me just quickly show you the driver class.', 'start': 2416.189, 'duration': 3.261}, {'end': 2422.631, 'text': "I am sure that you're pretty much clear with the driver class already.", 'start': 2419.73, 'duration': 2.901}], 'summary': 'Classify hot day if max temp > 35, cold day if min temp < 10', 'duration': 23.442, 'max_score': 2399.189, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/x-PCNX4prLA/pics/x-PCNX4prLA2399189.jpg'}, {'end': 2574.917, 'src': 'embed', 'start': 2543.227, 'weight': 3, 'content': [{'end': 2546.789, 'text': 'Even the reducer has finished now and the job has also executed.', 'start': 2543.227, 'duration': 3.562}, {'end': 2550.592, 'text': "Let's quickly go and check out the output.", 'start': 2548.811, 'duration': 1.781}, {'end': 2559.637, 'text': "Let's browse the file system which is nothing but the HDFS file system and look for my weather right here.", 'start': 2553.113, 'duration': 6.524}, {'end': 2561.999, 'text': "I'll click on that.", 'start': 2559.657, 'duration': 2.342}, {'end': 2564.34, 'text': "Let's see the output.", 'start': 2563.379, 'duration': 0.961}, {'end': 2571.634, 'text': 'So as you can see, each day is classified as cold day or hot day.', 'start': 2567.25, 'duration': 4.384}, {'end': 2574.917, 'text': 'And against that, you have the temperature value written.', 'start': 2571.794, 'duration': 3.123}], 'summary': 'Reducer finished, job executed. hdfs shows weather data categorized as cold or hot with temperature values.', 'duration': 31.69, 'max_score': 2543.227, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/x-PCNX4prLA/pics/x-PCNX4prLA2543227.jpg'}], 'start': 2028.402, 'title': 'Mapreduce applications in weather data analysis', 'summary': 'Delves into using mapreduce to analyze weather data from austin, identifying hot and cold days, and processing daily temperature data to classify the days, covering mapper logic and execution in hadoop.', 'chapters': [{'end': 2274.981, 'start': 2028.402, 'title': 'Mapreduce: analyzing weather data', 'summary': 'Explores the application of mapreduce in analyzing weather data from austin using the national centers for environmental information dataset to determine hot and cold days, and uses the daily maximum and minimum temperatures along with the date to classify the days.', 'duration': 246.579, 'highlights': ['MapReduce used to analyze weather data to determine hot and cold days in Austin The example demonstrates the application of MapReduce to analyze weather data from Austin to determine hot and cold days, leveraging the National Centers for Environmental Information dataset.', 'Utilization of daily maximum and minimum temperatures along with the date to classify hot and cold days The analysis involves the use of daily maximum and minimum temperatures along with the date to classify whether a particular day in Austin was hot or cold.', 'Significance of reading the readme file for understanding the dataset Emphasizes the importance of reading the readme file to understand the dataset, particularly focusing on the date, daily maximum temperature, and daily minimum temperature.']}, {'end': 2602.933, 'start': 2275.362, 'title': 'Mapper logic for weather data', 'summary': 'Covers the mapper logic for processing 
weather data, including identifying inconsistent data, extracting date and temperature values, classifying hot and cold days, executing the program, and checking the output in hadoop.', 'duration': 327.571, 'highlights': ['The mapper logic involves identifying inconsistent data, extracting date and temperature values, and classifying hot and cold days, with temperatures greater than 35 being classified as hot and temperatures less than 10 as cold.', 'The process includes converting temperature values to float, executing the program using Hadoop, and checking the output to classify each day as hot or cold, with most values being cold due to the dataset belonging to a cold place.', 'The process also includes moving the dataset to the Hadoop distributed file system, exporting the jar file, and setting up the configuration and job properties for the program.']}], 'duration': 574.531, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/x-PCNX4prLA/pics/x-PCNX4prLA2028402.jpg', 'highlights': ['MapReduce used to analyze weather data to determine hot and cold days in Austin', 'Utilization of daily maximum and minimum temperatures along with the date to classify hot and cold days', 'The mapper logic involves identifying inconsistent data, extracting date and temperature values, and classifying hot and cold days, with temperatures greater than 35 being classified as hot and temperatures less than 10 as cold', 'The process includes converting temperature values to float, executing the program using Hadoop, and checking the output to classify each day as hot or cold, with most values being cold due to the dataset belonging to a cold place']}, {'end': 2956.759, 'segs': [{'end': 2705.666, 'src': 'embed', 'start': 2677.748, 'weight': 1, 'content': [{'end': 2681.19, 'text': 'So there could be n number of analysis that could be done over this data.', 'start': 2677.748, 'duration': 3.442}, {'end': 2689.074, 'text': 'However, in this particular session, we will try to find out the unique users for every track ID.', 'start': 2682.15, 'duration': 6.924}, {'end': 2693.836, 'text': 'So essentially what we are trying to find out is the popularity of a particular song.', 'start': 2689.714, 'duration': 4.122}, {'end': 2701.343, 'text': 'So if you have multiple unique users for a particular song and the number is high, that means the song has picked up.', 'start': 2694.397, 'duration': 6.946}, {'end': 2702.344, 'text': 'very well, right?', 'start': 2701.343, 'duration': 1.001}, {'end': 2705.666, 'text': 'Guys, did you understand the problem statement?', 'start': 2703.184, 'duration': 2.482}], 'summary': 'Analyzing unique users for each track id to determine song popularity.', 'duration': 27.918, 'max_score': 2677.748, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/x-PCNX4prLA/pics/x-PCNX4prLA2677748.jpg'}, {'end': 2746.865, 'src': 'embed', 'start': 2719.498, 'weight': 0, 'content': [{'end': 2728.278, 'text': 'right?. 
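Before the last.fm program walkthrough below, here is what the hot/cold classification for the weather example can look like in mapper code. This is a sketch only: the substring offsets for the date and temperature fields are illustrative assumptions, and the real column positions must be taken from the NCEI dataset's readme.

```java
public static class WeatherMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        // Skip blank or inconsistent records.
        if (line.trim().isEmpty()) {
            return;
        }
        // Illustrative field positions for date, max and min temperature.
        String date = line.substring(6, 14);
        float tMax = Float.parseFloat(line.substring(39, 45).trim());
        float tMin = Float.parseFloat(line.substring(47, 53).trim());
        // Classify: max above 35 is a hot day, min below 10 is a cold day.
        if (tMax > 35.0f) {
            context.write(new Text("Hot day " + date), new Text(String.valueOf(tMax)));
        } else if (tMin < 10.0f) {
            context.write(new Text("Cold day " + date), new Text(String.valueOf(tMin)));
        }
    }
}
```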
So this is the MapReduce program that has been written to find out the unique users for every song that is there at last.fm, okay?', 'start': 2719.498, 'duration': 8.78}, {'end': 2731.659, 'text': 'So, right here, this is the data set.', 'start': 2729.859, 'duration': 1.8}, {'end': 2732.88, 'text': "I'll quickly open it.", 'start': 2731.919, 'duration': 0.961}, {'end': 2736.961, 'text': 'So this is our last.fm store.', 'start': 2734.26, 'duration': 2.701}, {'end': 2737.922, 'text': 'set data right?', 'start': 2736.961, 'duration': 0.961}, {'end': 2746.865, 'text': 'We have the user ID, we have the track ID, whether this track was shared or not, whether this track was heard at radio or not and, finally,', 'start': 2738.102, 'duration': 8.763}], 'summary': 'Mapreduce program finds unique users for songs in last.fm dataset.', 'duration': 27.367, 'max_score': 2719.498, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/x-PCNX4prLA/pics/x-PCNX4prLA2719498.jpg'}, {'end': 2797.492, 'src': 'embed', 'start': 2770.328, 'weight': 2, 'content': [{'end': 2776.67, 'text': "It contains nothing but a list of constants using which we'll try to traverse through every record.", 'start': 2770.328, 'duration': 6.342}, {'end': 2783.43, 'text': 'But before moving on, Let me quickly show you the significance of these constants that we are declaring for that.', 'start': 2777.41, 'duration': 6.02}, {'end': 2784.73, 'text': "I'll go to the data set.", 'start': 2783.51, 'duration': 1.22}, {'end': 2788.611, 'text': "And I'll copy one record.", 'start': 2787.311, 'duration': 1.3}, {'end': 2797.492, 'text': 'Okay So if you see the pipe is the delimiter right here, right every value is separated by the pipe.', 'start': 2788.651, 'duration': 8.841}], 'summary': 'Constants are used to traverse records with pipe delimiter in data set.', 'duration': 27.164, 'max_score': 2770.328, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/x-PCNX4prLA/pics/x-PCNX4prLA2770328.jpg'}, {'end': 2885.752, 'src': 'embed', 'start': 2848.183, 'weight': 4, 'content': [{'end': 2848.803, 'text': "We'll move on.", 'start': 2848.183, 'duration': 0.62}, {'end': 2855.305, 'text': "So this is our mapper class, as you can see right here, in which I'm defining a variable track ID, user ID,", 'start': 2849.323, 'duration': 5.982}, {'end': 2859.966, 'text': 'because this is the final output that I would like to give from my mapper function, correct?', 'start': 2855.305, 'duration': 4.661}, {'end': 2867.328, 'text': 'So what we are doing is we are splitting the string or the input value on the basis of pipe right?, Because it was the delimiter,', 'start': 2860.486, 'duration': 6.842}, {'end': 2874.35, 'text': 'and then again we are picking up the track ID and the user ID and finally dumping it out from the mapper function.', 'start': 2867.328, 'duration': 7.022}, {'end': 2875.731, 'text': 'So very simple, right?', 'start': 2874.69, 'duration': 1.041}, {'end': 2877.171, 'text': 'Guys, are you clear with this?', 'start': 2876.091, 'duration': 1.08}, {'end': 2885.752, 'text': "What we are doing is we're dividing the input on the basis of pipe and we are picking up the track ID and the user ID.", 'start': 2878.447, 'duration': 7.305}], 'summary': 'Mapper class defines track id and user id as final output by splitting input on pipe.', 'duration': 37.569, 'max_score': 2848.183, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/x-PCNX4prLA/pics/x-PCNX4prLA2848183.jpg'}, {'end': 
2956.759, 'src': 'embed', 'start': 2934.227, 'weight': 3, 'content': [{'end': 2943.932, 'text': 'So what we are doing is we are traversing through every user ID that is there for a particular track ID and then we are adding it to the hash set,', 'start': 2934.227, 'duration': 9.705}, {'end': 2945.433, 'text': 'that is, the user ID set.', 'start': 2943.932, 'duration': 1.501}, {'end': 2950.656, 'text': 'Even though this user ID iterable may have duplicate user IDs,', 'start': 2946.033, 'duration': 4.623}, {'end': 2956.759, 'text': 'since a hash set cannot contain duplicates, it is going to store only one copy of every user.', 'start': 2950.656, 'duration': 6.103}], 'summary': 'Traversing user ids for a track, adding to hash set to store unique users.', 'duration': 22.532, 'max_score': 2934.227, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/x-PCNX4prLA/pics/x-PCNX4prLA2934227.jpg'}], 'start': 2603.993, 'title': 'Analyzing last.fm data for song popularity', 'summary': 'Discusses the data stored at last.fm, focusing on finding unique users for each track ID to determine song popularity using a MapReduce program. It also covers the declaration of constants for indexing and the delimiter, the process of splitting input data, and the use of a hash set to find unique users for a specific track ID in the context of MapReduce data processing.', 'chapters': [{'end': 2770.168, 'start': 2603.993, 'title': 'Analyzing last.fm data for song popularity', 'summary': 'Discusses the data stored at last.fm, specifically focusing on finding unique users for each track ID to determine song popularity using a MapReduce program.', 'duration': 166.175, 'highlights': ['Last.fm stores data for every song, including user ID, track ID, sharing status, radio listening status, and skipping status.', 'The goal is to find unique users for each track ID to determine song popularity.', 'A MapReduce program is used to find unique users for every song at last.fm.
']}, {'end': 2956.759, 'start': 2770.328, 'title': 'Mapreduce constants and data processing', 'summary': 'Discusses the declaration of constants for indexing and the delimiter, the process of splitting input data, and the use of a hash set to find unique users for a specific track ID in the context of MapReduce data processing.', 'duration': 186.431, 'highlights': ['The chapter discusses the declaration of constants for indexing and the delimiter: the transcript explains the significance of these constants in identifying values in the dataset.', 'The use of a hash set to find unique users for a specific track ID: it details the use of a hash set to store unique user IDs for a particular track ID, ensuring no duplicate values are retained.', 'The process of splitting input data: the transcript describes splitting input data based on a delimiter and extracting specific values such as track ID and user ID.']}], 'duration': 352.766, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/x-PCNX4prLA/pics/x-PCNX4prLA2603993.jpg', 'highlights': ['A MapReduce program is used to find unique users for every song at last.fm.', 'The goal is to find unique users for each track ID to determine song popularity.', 'The chapter discusses the declaration of constants for indexing and the delimiter.', 'The use of a hash set to find unique users for a specific track ID.', 'The process of splitting input data based on a delimiter and extracting specific values.']}, {'end': 3628.094, 'segs': [{'end': 3078.173, 'src': 'embed', 'start': 3019.268, 'weight': 0, 'content': [{'end': 3025.949, 'text': 'You can always download this code and execute it, okay? Now let us go and check out the jar file.', 'start': 3019.268, 'duration': 6.681}, {'end': 3029.07, 'text': 'So this is the jar file that is created.', 'start': 3027.21, 'duration': 1.86}, {'end': 3030.81, 'text': 'That is lastfm.jar.', 'start': 3029.11, 'duration': 1.7}, {'end': 3040.012, 'text': "The very first task is to move the data set to the Hadoop cluster, right? 
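Pulling together the pieces described above, here is a minimal sketch of the unique-listeners mapper and reducer, assuming pipe-delimited records of the form userId|trackId|shared|radio|skip; the class names and field indexes are illustrative, not the exact code shown in the video.

// Illustrative sketch of the unique-listeners job described above.
// Assumes pipe-delimited records: userId|trackId|shared|radio|skip.
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class UniqueListeners {

    // Constants recording the index of each field in a record (illustrative).
    public static final int USER_ID = 0;
    public static final int TRACK_ID = 1;

    public static class UniqueListenersMapper
            extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Split the record on the pipe delimiter and emit trackId -> userId.
            String[] parts = value.toString().split("\\|");
            if (parts.length > TRACK_ID) {
                context.write(new Text(parts[TRACK_ID]), new Text(parts[USER_ID]));
            }
        }
    }

    public static class UniqueListenersReducer
            extends Reducer<Text, Text, Text, IntWritable> {
        @Override
        public void reduce(Text trackId, Iterable<Text> userIds, Context context)
                throws IOException, InterruptedException {
            // A hash set cannot contain duplicates, so it keeps only one copy
            // of every user even if the iterable repeats a user ID.
            Set<String> uniqueUsers = new HashSet<>();
            for (Text userId : userIds) {
                uniqueUsers.add(userId.toString());
            }
            context.write(trackId, new IntWritable(uniqueUsers.size()));
        }
    }
}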
So I'll write hadoop dfs -put.", 'start': 3031.81, 'duration': 8.202}, {'end': 3049.406, 'text': "I'll give the path to the input file, that is, desktop, data set and then lastfm_sample.", 'start': 3041.172, 'duration': 8.234}, {'end': 3055.369, 'text': 'Finally, I want to move it to the root directory of Hadoop and enter.', 'start': 3050.246, 'duration': 5.123}, {'end': 3058.29, 'text': 'The file has copied now.', 'start': 3057.13, 'duration': 1.16}, {'end': 3062.312, 'text': "Now it's time we execute the lastfm MapReduce program.", 'start': 3059.371, 'duration': 2.941}, {'end': 3073.252, 'text': "So it's hadoop jar, desktop, lastfm.jar. I need to mention the name of the input file, which is lastfm_sample.", 'start': 3062.972, 'duration': 10.28}, {'end': 3078.173, 'text': 'Next, I need to mention the output directory where it will store the final result.', 'start': 3074.012, 'duration': 4.161}], 'summary': 'Moving the data set to the Hadoop cluster and executing the MapReduce program.', 'duration': 58.905, 'max_score': 3019.268, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/x-PCNX4prLA/pics/x-PCNX4prLA3019268.jpg'}, {'end': 3166.449, 'src': 'embed', 'start': 3120.158, 'weight': 1, 'content': [{'end': 3124.301, 'text': 'Track 202 was played by 16 unique users and so on.', 'start': 3120.158, 'duration': 4.143}, {'end': 3128.772, 'text': 'There are many other analyses that you can do over this data set.', 'start': 3125.509, 'duration': 3.263}, {'end': 3134.418, 'text': "okay? That is something I'll leave up to you to imagine and finally write a MapReduce program on that.", 'start': 3128.772, 'duration': 5.646}, {'end': 3139.903, 'text': 'In case you need help, guys, you can always reach out to me or the support team, okay?', 'start': 3134.618, 'duration': 5.285}, {'end': 3142.406, 'text': 'So, guys, are you clear with this?', 'start': 3141.145, 'duration': 1.261}, {'end': 3143.247, 'text': 'Can we move on?', 'start': 3142.586, 'duration': 0.661}, {'end': 3145.588, 'text': 'Great, thanks.', 'start': 3144.847, 'duration': 0.741}, {'end': 3153.336, 'text': "So now we look into the MapReduce example for testing, okay? In this we'll be using MRUnit.", 'start': 3146.589, 'duration': 6.747}, {'end': 3160.863, 'text': 'MRUnit is nothing but a framework using which you can get the advantages of JUnit on a MapReduce program.', 'start': 3153.756, 'duration': 7.107}, {'end': 3166.449, 'text': 'So what happens is you are basically executing unit tests over a MapReduce program.', 'start': 3161.404, 'duration': 5.045}], 'summary': 'Track 202 was played by 16 unique users. Further analyses and MapReduce programs are discussed.', 'duration': 46.291, 'max_score': 3120.158, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/x-PCNX4prLA/pics/x-PCNX4prLA3120158.jpg'}], 'start': 2957.359, 'title': 'Mapreduce for data analysis', 'summary': 'Covers the process of finding unique listeners for each track using MapReduce, including steps for exporting the jar file, moving data to the Hadoop cluster, executing the program, and analyzing the final output.
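For completeness, here is a hypothetical driver sketching the configuration and job properties that the hadoop jar invocation above assumes; the class name and the argument order (input file, then output directory) are illustrative, not the exact code from the video.

// Hypothetical driver for the unique-listeners sketch above, showing the
// configuration and job properties set up before running "hadoop jar".
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class UniqueListenersDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "unique listeners per track");
        job.setJarByClass(UniqueListenersDriver.class);
        job.setMapperClass(UniqueListeners.UniqueListenersMapper.class);
        job.setReducerClass(UniqueListeners.UniqueListenersReducer.class);
        // The mapper emits Text/Text; the reducer emits Text/IntWritable.
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Input file and output directory come from the command line,
        // e.g. hadoop jar lastfm.jar UniqueListenersDriver lastfm_sample mylastfm
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}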
It also explains the use of MR Unit for unit testing MapReduce programs and demonstrates the importance of unit testing in MapReduce programming.', 'chapters': [{'end': 3145.588, 'start': 2957.359, 'title': 'Mapreduce for finding unique listeners', 'summary': 'Covers the process of finding unique listeners for each track using MapReduce, including steps for exporting the jar file, moving data to the Hadoop cluster, executing the program, and analyzing the final output, which shows the number of unique users for each track.', 'duration': 188.229, 'highlights': ['The final output shows that Track 202 was played by 16 unique users, and Track 201 was played by 14 unique users, demonstrating the number of unique listeners for each track.', 'The process involves exporting the jar file, moving the data set to the Hadoop cluster, executing the MapReduce program, and analyzing the final output, which displays the number of unique users for each track.', "To execute the lastfm MapReduce program, the input file 'lastfm_sample' is mentioned, and the output directory 'my lastfm' is specified, where the final result is stored.", 'The chapter also emphasizes that additional analysis can be performed on the dataset and encourages writing MapReduce programs for such analyses, offering support if needed.']}, {'end': 3628.094, 'start': 3146.589, 'title': 'Mapreduce unit testing with mr unit', 'summary': 'Explains the use of MR Unit for unit testing MapReduce programs, utilizing drivers for testing map and reduce functions, specifying inputs and expected outputs, and demonstrates a test case with a sample MapReduce program, ultimately emphasizing the importance of unit testing in MapReduce programming.', 'duration': 481.505, 'highlights': ['The chapter explains the purpose of MR Unit for unit testing MapReduce programs.', 'Utilizing drivers for testing map and reduce functions: discusses the use of drivers for testing map and reduce functions in MapReduce programs.', 'Specifying inputs and expected outputs: emphasizes the importance of specifying inputs and expected outputs for unit testing MapReduce programs.', 'Demonstrates a test case with a sample MapReduce program: illustrates a test case with a sample MapReduce program using MR Unit for unit testing.', 'Emphasizing the importance of unit testing in MapReduce programming: highlights the significance of unit testing in MapReduce programming for ensuring correct logic and functionality.']}], 'duration': 670.735, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/x-PCNX4prLA/pics/x-PCNX4prLA2957359.jpg', 'highlights': ['The process involves exporting the jar file, moving the data set to the Hadoop cluster, executing the MapReduce program, and analyzing the final output, which displays the number of unique users for each track.', 'The final output shows that Track 202 was played by 16 unique users, and Track 201 was played by 14 unique users, demonstrating the number of unique listeners for each track.', 'The chapter explains the purpose of MR Unit for unit testing MapReduce programs.', "To execute the lastfm MapReduce program, the input file 'lastfm_sample' is mentioned, and the output directory 'my lastfm' is specified, where the final result is stored.']}], 'highlights': ['The process of input splits, mapping, shuffling, and reducing was explained using the word count example.', 'The example included three 
input splits: Deer Bear River, Car Car River, Deer Car Bear, with the count of one assigned to every word in the mapping phase.', "The entire session is based on practicals, with at least two to three practicals for today's session.", "The result of the reducing phase yielded specific word counts, with 'car' having the highest count of 3, which was then sent back to the client as the final output.", 'The chapter provides an overview of input formats in MapReduce, highlighting file input format as the most commonly used format, with various sub-formats such as text input format and sequence file input format.', 'The composable input format has two attributes, key and value, which can be of data types supported by Hadoop, whereas DB input format has only one attribute, of type DBWritable, for reading data from a database.', 'There are four types of output formats in MapReduce: File output format, Null output format, DB output format, and Filter output format, with File output format being the most commonly used.', 'Under File output format, there are two types: text output format for dumping output as text, and sequence file output format for storing binary files as sequence files.', 'The chapter discusses the necessary packages and classes for writing a MapReduce program in Java, including the required imports from Hadoop packages.', 'The process of importing the Hadoop MapReduce client core and Hadoop common jar files and configuring the build path is detailed, with the specific path for locating these files within the Hadoop package.', 'The step-by-step process of adding the jar files to the build path, including locating the files, navigating through the folder structure, and adding them to the classpath, is explained thoroughly.', 'The setup of a Java project in Eclipse for writing a MapReduce program is demonstrated, including creating a new project and the word count Java file.', 'The explanation of the classes within the Word Count Java program, including the main class, map class, and reduce class, along with the attributes and input/output types for each class, is provided.', 'The reducer class extends the main class or the superclass, with input key as text and input value as int writable, with the output being of type text key and int writable value.', 'The process involves converting the text to string and storing it in a variable called line.', 'Using a string tokenizer to extract words based on spaces, creating a tokenizer object, and passing the line variable to it.', "The process of assigning one value against every word to find the word count is explained, with examples like 'edureka' and 'class' resulting in 'edureka one' and 'class one' respectively.", "The chapter seeks confirmation from the audience, receiving responses from individuals like 'Omar' and 'Matt', and emphasizes that the output is 'one' for each word occurrence.", 'Utilizing a while loop to iterate through the tokens and assigning a value of one to each word, demonstrated through an example.', 'The reduce function takes three attributes as input: key text, values of type iterable with values of type int writable, and context, which is used to write the final output from the reducer or the mapper.', 'Iterable is used to handle the list of values against a particular key in the reducer phase, and the data types in Hadoop are specifically designed for a distributed file system and as an input format to a MapReduce program.', 'The chapter concludes with a transition to discussing the reducer implementation for the 
word count.', 'The process involves defining a configuration object to specify the entire configuration of the MapReduce program, setting up the job to be executed on the Hadoop cluster, and specifying the main class, mapper class, and reducer class for the job.', 'Specifying the input and output paths for the MapReduce program, using command-line arguments, and executing the word count program on the Hadoop cluster, resulting in the successful completion of the job and displaying the word count output.', 'The reducer function calculates the frequency of words by iterating through the values and adding them to the sum variable, producing the word and sum of values as the output.', 'MapReduce used to analyze weather data to determine hot and cold days in Austin', 'Utilization of daily maximum and minimum temperatures along with the date to classify hot and cold days', 'The mapper logic involves identifying inconsistent data, extracting date and temperature values, and classifying hot and cold days, with temperatures greater than 35 being classified as hot and temperatures less than 10 as cold', 'The process includes converting temperature values to float, executing the program using Hadoop, and checking the output to classify each day as hot or cold, with most days classified as cold because the dataset comes from a cold place', 'A MapReduce program is used to find unique users for every song at last.fm.', 'The goal is to find unique users for each track ID to determine song popularity.', 'The chapter discusses the declaration of constants for indexing and the delimiter.', 'The use of a hash set to find unique users for a specific track ID.', 'The process of splitting input data based on a delimiter and extracting specific values.', 'The process involves exporting the jar file, moving the data set to the Hadoop cluster, executing the MapReduce program, and analyzing the final output, which displays the number of unique users for each track.', 'The final output shows that Track 202 was played by 16 unique users, and Track 201 was played by 14 unique users, demonstrating the number of unique listeners for each track.', 'The chapter explains the purpose of MR Unit for unit testing MapReduce programs.', "To execute the lastfm MapReduce program, the input file 'lastfm_sample' is mentioned, and the output directory 'my lastfm' is specified, where the final result is stored.']}
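Finally, to make the MRUnit idea concrete, here is a minimal test sketch, assuming MRUnit 1.x with JUnit 4 and the hypothetical UniqueListenersMapper from the sketch earlier; the sample record is made up for illustration.

// Minimal MRUnit sketch (assumes MRUnit 1.x and JUnit 4): feed one record
// to the hypothetical UniqueListenersMapper and assert the expected output.
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.junit.Before;
import org.junit.Test;

public class UniqueListenersMapperTest {

    private MapDriver<LongWritable, Text, Text, Text> mapDriver;

    @Before
    public void setUp() {
        // The driver wires the mapper under test to declared inputs and outputs.
        mapDriver = MapDriver.newMapDriver(new UniqueListeners.UniqueListenersMapper());
    }

    @Test
    public void mapperEmitsTrackIdAndUserId() throws Exception {
        // One made-up pipe-delimited record: userId|trackId|shared|radio|skip.
        mapDriver.withInput(new LongWritable(0), new Text("111115|222|0|1|0"));
        // Expected output: track ID as the key, user ID as the value.
        mapDriver.withOutput(new Text("222"), new Text("111115"));
        mapDriver.runTest(); // fails the test if the actual output differs
    }
}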