title
All about [What is Big Data] ?
description
Big Data?
Big data is a field concerned with ways to analyze, systematically extract information from, or otherwise deal with data sets that are too large or complex for traditional data-processing application software.
In this video, I explain what big data is and walk through its use cases.
Video Playlist
-----------------------
Hadoop in Tamil - https://bit.ly/32k6mBD
Hadoop in English - https://bit.ly/32jle3t
Spark in Tamil - https://bit.ly/2ZzWAJN
Spark in English - https://bit.ly/3mmc0eu
Hive in Tamil - https://bit.ly/2UQVUgv
Hive in English - https://bit.ly/372nCwj
Data Engineering in Tamil - https://bit.ly/2x0r0tG
Data Engineering in English - https://bit.ly/2UIU2ac
Batch vs Stream processing Tamil - https://youtu.be/2txiL17Jer8
Batch vs Stream processing English - https://youtu.be/41VHGrTnFrU
NOSQL in English - https://bit.ly/2XtU07B
NOSQL in Tamil - https://bit.ly/2XVLLjP
Scala in Tamil: https://goo.gl/VfAp6d
Scala in English: https://goo.gl/7l2USl
Email : atozknowledge.com@gmail.com
LinkedIn : https://www.linkedin.com/in/sbgowtham/
Instagram : https://www.instagram.com/bigdata.in/
YouTube channel link
www.youtube.com/atozknowledgevideos
Website
http://atozknowledge.com/
Technology in Tamil & English
#bigdata #dataengineering
detail
{'title': 'All about [What is Big Data] ?', 'heatmap': [{'end': 996.692, 'start': 971.64, 'weight': 1}, {'end': 1117.528, 'start': 1084.077, 'weight': 0.794}], 'summary': 'Delves into big data fundamentals, myths, interview scenarios, technology, open source, hadoop impact, and distribution, emphasizing the significance of data processing, addressing common myths, and demonstrating big data volume understanding through interview scenarios with leading companies. it also covers the emergence of over 10,000 big data solutions and the commercial hadoop services provided by companies like cloudera and hortonworks.', 'chapters': [{'end': 218.304, 'segs': [{'end': 137.905, 'src': 'embed', 'start': 90.433, 'weight': 0, 'content': [{'end': 93.936, 'text': 'Some people are saying Hadoop and some people they are saying they are big data developer.', 'start': 90.433, 'duration': 3.503}, {'end': 98.459, 'text': 'So we are hearing these two terms very frequently in the big data market.', 'start': 94.296, 'duration': 4.163}, {'end': 102.422, 'text': 'So what exactly the difference between these two? So I will tell you that.', 'start': 98.479, 'duration': 3.943}, {'end': 107.466, 'text': 'I will start the session with this difference between first of all.', 'start': 103.683, 'duration': 3.783}, {'end': 111.088, 'text': 'So here Hadoop is a solution.', 'start': 108.626, 'duration': 2.462}, {'end': 116.072, 'text': 'Solution of what? 
So since you are new, you may get this question.', 'start': 112.069, 'duration': 4.003}, {'end': 120.459, 'text': 'So Hadoop is one solution for data-related problems.', 'start': 116.738, 'duration': 3.721}, {'end': 123.88, 'text': 'For now, you can keep this only in your mind is fine, is wide enough.', 'start': 120.539, 'duration': 3.341}, {'end': 125.341, 'text': 'And big data.', 'start': 124.521, 'duration': 0.82}, {'end': 130.362, 'text': 'as you know, even you are very new to this, but you may aware of it, right?', 'start': 125.341, 'duration': 5.021}, {'end': 133.283, 'text': 'So big data is a problem, okay?', 'start': 130.382, 'duration': 2.901}, {'end': 136.224, 'text': "So we don't want to jump into the solution.", 'start': 133.863, 'duration': 2.361}, {'end': 137.905, 'text': 'We will first start with problem.', 'start': 136.344, 'duration': 1.561}], 'summary': 'Hadoop is a solution for data problems, while big data is the problem itself.', 'duration': 47.472, 'max_score': 90.433, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rsOSrEbK7sU/pics/rsOSrEbK7sU90433.jpg'}, {'end': 191.902, 'src': 'embed', 'start': 163.802, 'weight': 3, 'content': [{'end': 167.203, 'text': 'so i have a data which is a collection of information.', 'start': 163.802, 'duration': 3.401}, {'end': 169.964, 'text': 'so what you will be doing with this data?', 'start': 167.203, 'duration': 2.761}, {'end': 173.365, 'text': 'as a human, you can say what, what you will be doing with the data.', 'start': 169.964, 'duration': 3.401}, {'end': 177.995, 'text': "okay, so with the data, I'm just giving my information to you.", 'start': 173.365, 'duration': 4.63}, {'end': 179.156, 'text': "I'm saying that I'm Gautam.", 'start': 178.055, 'duration': 1.101}, {'end': 181.097, 'text': 'I work for so-and-so company.', 'start': 179.736, 'duration': 1.361}, {'end': 182.277, 'text': 'So this is my information.', 'start': 181.137, 'duration': 1.14}, {'end': 184.659, 'text': 
"I'm just giving it to you and what immediately you will do?", 'start': 182.317, 'duration': 2.342}, {'end': 188.981, 'text': "You will store my information and that's what the mission will do, and that's what the applications will do.", 'start': 184.699, 'duration': 4.282}, {'end': 191.902, 'text': 'And after storing the data, what you will do?', 'start': 189.721, 'duration': 2.181}], 'summary': 'Data collection and storage for personal information', 'duration': 28.1, 'max_score': 163.802, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rsOSrEbK7sU/pics/rsOSrEbK7sU163802.jpg'}], 'start': 1.601, 'title': 'Big data fundamentals', 'summary': 'Provides an overview of big data and its difference from hadoop, emphasizing its role as a collection of information. it also discusses data processing, highlighting its significance in recognizing and interacting with individuals.', 'chapters': [{'end': 163.802, 'start': 1.601, 'title': 'Understanding big data basics', 'summary': 'Provides an introductory overview of big data, including the difference between hadoop and big data, as well as the problem-solution paradigm, emphasizing the fundamental concept of big data as a collection of information.', 'duration': 162.201, 'highlights': ['The difference between Hadoop and big data is explained, with Hadoop being a solution for data-related problems, while big data itself is the problem. The session starts with explaining the difference between Hadoop and big data, emphasizing Hadoop as a solution for data-related problems and big data as the problem.', 'The fundamental concept of big data as a collection of information is emphasized, setting the foundation for understanding the problem-solution paradigm. 
The chapter emphasizes the fundamental concept of big data as a collection of information, laying the groundwork for understanding the problem-solution paradigm.', 'The session introduces big data as a problem and emphasizes the need to first understand the problem before seeking a solution. The session introduces big data as a problem and emphasizes the need to understand the problem before moving on to the solution.']}, {'end': 218.304, 'start': 163.802, 'title': 'Data processing and use', 'summary': 'Discusses the process of storing and using data, illustrating how it is utilized to recognize and interact with individuals, emphasizing the importance of data processing in human interaction.', 'duration': 54.502, 'highlights': ['The process of storing and using data is illustrated through the example of recognizing and interacting with individuals, emphasizing the importance of data processing in human interaction.', 'Human interaction is enabled through the storage and processing of data, as exemplified by recognizing individuals and engaging with them based on stored information.', 'Data is utilized to recognize and interact with individuals, highlighting the significance of data processing in facilitating human interaction.']}], 'duration': 216.703, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rsOSrEbK7sU/pics/rsOSrEbK7sU1601.jpg', 'highlights': ['The difference between Hadoop and big data is explained, with Hadoop being a solution for data-related problems, while big data itself is the problem.', 'The fundamental concept of big data as a collection of information is emphasized, setting the foundation for understanding the problem-solution paradigm.', 'The session introduces big data as a problem and emphasizes the need to first understand the problem before seeking a solution.', 'The process of storing and using data is illustrated through the example of recognizing and interacting with individuals, emphasizing the importance of 
data processing in human interaction.', 'Human interaction is enabled through the storage and processing of data, as exemplified by recognizing individuals and engaging with them based on stored information.', 'Data is utilized to recognize and interact with individuals, highlighting the significance of data processing in facilitating human interaction.']}, {'end': 472.757, 'segs': [{'end': 295.22, 'src': 'embed', 'start': 268.466, 'weight': 1, 'content': [{'end': 271.769, 'text': "So that also I'm going to clarify.", 'start': 268.466, 'duration': 3.303}, {'end': 277.694, 'text': 'Okay, So I will tell you what are all the wrong understanding that people used to have when they come into big data,', 'start': 272.049, 'duration': 5.645}, {'end': 284.637, 'text': 'and even people who has some experience in big data still they used to have these kind of wrong understanding and myths about big data.', 'start': 277.694, 'duration': 6.943}, {'end': 286.317, 'text': 'So I just will clear you all those stuff.', 'start': 284.657, 'duration': 1.66}, {'end': 291.059, 'text': "So one such thing a myth is like I'm going to tell you now.", 'start': 286.877, 'duration': 4.182}, {'end': 295.22, 'text': 'So here when I say infinity the problems can be anything with the data.', 'start': 291.939, 'duration': 3.281}], 'summary': 'Clarifying common myths and misunderstandings about big data to address misconceptions and improve understanding.', 'duration': 26.754, 'max_score': 268.466, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rsOSrEbK7sU/pics/rsOSrEbK7sU268466.jpg'}, {'end': 373.545, 'src': 'embed', 'start': 344.408, 'weight': 2, 'content': [{'end': 347.009, 'text': 'So volume is not the only problem we have.', 'start': 344.408, 'duration': 2.601}, {'end': 352.991, 'text': 'And if you take any leading use cases in any big companies to small companies where they use big data, go and ask them.', 'start': 347.489, 'duration': 5.502}, {'end': 354.594, 
'text': 'tell your use case.', 'start': 353.513, 'duration': 1.081}, {'end': 356.195, 'text': 'So they will be telling the use case.', 'start': 354.614, 'duration': 1.581}, {'end': 362.458, 'text': 'Only out of 10 companies, two companies can tell you or two use cases can explain you that the problem is volume.', 'start': 356.615, 'duration': 5.843}, {'end': 366.321, 'text': 'But the remaining eight companies or eight use cases will be different problems.', 'start': 362.598, 'duration': 3.723}, {'end': 369.603, 'text': 'You can state the statement like this.', 'start': 367.321, 'duration': 2.282}, {'end': 373.545, 'text': 'Big data can solve any problems.', 'start': 370.743, 'duration': 2.802}], 'summary': 'Only 20% of big data use cases are related to volume, while 80% address other problems.', 'duration': 29.137, 'max_score': 344.408, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rsOSrEbK7sU/pics/rsOSrEbK7sU344408.jpg'}, {'end': 419.625, 'src': 'embed', 'start': 386.578, 'weight': 0, 'content': [{'end': 388.68, 'text': 'So volume is one out of n problems.', 'start': 386.578, 'duration': 2.102}, {'end': 392.643, 'text': 'So what problems we do have with the big data?', 'start': 389.2, 'duration': 3.443}, {'end': 402.751, 'text': 'So we have a problem with value, the quality of data, and we have the problem with visualization and we do have problem with velocity.', 'start': 392.663, 'duration': 10.088}, {'end': 406.053, 'text': 'The speed in transaction and speed in processing is called velocity.', 'start': 402.811, 'duration': 3.242}, {'end': 411.618, 'text': 'And then we do have problem with variety, structure, semi-structure, and unstructured data processing.', 'start': 406.554, 'duration': 5.064}, {'end': 414.262, 'text': 'And then we do have volatile viability.', 'start': 412.02, 'duration': 2.242}, {'end': 419.625, 'text': "So all problem statements has been, means they used to, it's some trending terms.", 'start': 414.502, 
'duration': 5.123}], 'summary': 'Challenges with big data include volume, quality, velocity, variety, and volatile viability.', 'duration': 33.047, 'max_score': 386.578, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rsOSrEbK7sU/pics/rsOSrEbK7sU386578.jpg'}], 'start': 219.057, 'title': 'Big data myths and solutions', 'summary': 'Defines big data and addresses common myths, aiming to clarify misconceptions. it discusses how big data can address a wide range of issues beyond volume, such as value, quality, visualization, velocity, and variety.', 'chapters': [{'end': 291.059, 'start': 219.057, 'title': 'Definition of big data and common myths', 'summary': 'Defines big data as the problem with storage and processing, and addresses common myths and misunderstandings, aiming to clarify misconceptions about big data.', 'duration': 72.002, 'highlights': ['The definition of big data is the problem with storage and processing, which arises due to problematic data, leading to various misconceptions and myths.', 'The chapter aims to clarify the wrong understanding and myths about big data that even experienced individuals tend to have.', 'The problems related to big data are infinite, and the chapter addresses common myths and misunderstandings about big data.']}, {'end': 472.757, 'start': 291.939, 'title': 'Big data myths and solutions', 'summary': 'Discusses the misconception that big data only solves volume problems and highlights the various other issues like value, quality, visualization, velocity, and variety, emphasizing that big data can address a wide range of data problems, not just volume.', 'duration': 180.818, 'highlights': ['Big data can solve a wide range of data problems, not just volume, including value, quality, visualization, velocity, and variety. 
Big data is not limited to solving volume problems but also addresses issues like value, quality, visualization, velocity, and variety, showcasing its capability to handle diverse data challenges.', 'Only 20% of use cases in companies are related to volume problems, while the rest involve different data challenges. Out of 10 companies, only 2 use cases are related to volume problems, indicating that the majority, 80%, involve a variety of different data challenges.', 'Big data solutions are not limited to volume problems and can address issues such as velocity and viability. Apart from volume, big data solutions also cater to problems related to velocity and viability, showcasing its versatility in handling various data challenges.']}], 'duration': 253.7, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rsOSrEbK7sU/pics/rsOSrEbK7sU219057.jpg', 'highlights': ['Big data addresses a wide range of issues beyond volume, including value, quality, visualization, velocity, and variety.', 'The chapter aims to clarify misconceptions and myths about big data, even among experienced individuals.', 'Only 20% of use cases in companies are related to volume problems, while the rest involve different data challenges.']}, {'end': 817.828, 'segs': [{'end': 569.243, 'src': 'embed', 'start': 521.792, 'weight': 0, 'content': [{'end': 522.472, 'text': "I'll tell you that.", 'start': 521.792, 'duration': 0.68}, {'end': 526.095, 'text': "I asked him like, he said, I'm not satisfied.", 'start': 523.332, 'duration': 2.763}, {'end': 526.675, 'text': 'I asked why.', 'start': 526.115, 'duration': 0.56}, {'end': 532.419, 'text': 'He said, 25 GB of data is what you are processing for one country by EOD.', 'start': 527.235, 'duration': 5.184}, {'end': 539.224, 'text': 'This 25 GB seems to be very less, right? So why then you are coming for big data? 
This is what the question he asked.', 'start': 533.46, 'duration': 5.764}, {'end': 543.987, 'text': 'Then I told him, the problem is not with volume in my project.', 'start': 540.244, 'duration': 3.743}, {'end': 551.239, 'text': 'The problem is the existing technology, the processing speed of the data was slow.', 'start': 544.437, 'duration': 6.802}, {'end': 554.839, 'text': 'So the data storage was not a problem in the previous technology.', 'start': 551.719, 'duration': 3.12}, {'end': 557.62, 'text': 'The data storage is still fine with the previous technology.', 'start': 554.919, 'duration': 2.701}, {'end': 561.601, 'text': 'We migrated to big data only for the processing speed.', 'start': 557.68, 'duration': 3.921}, {'end': 565.762, 'text': 'So I told him volume was not the use case in my project.', 'start': 562.601, 'duration': 3.161}, {'end': 569.243, 'text': 'The problem is velocity, that speed in processing.', 'start': 566.642, 'duration': 2.601}], 'summary': 'Migrating to big data for faster processing, not volume; 25 gb insufficient for one country.', 'duration': 47.451, 'max_score': 521.792, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rsOSrEbK7sU/pics/rsOSrEbK7sU521792.jpg'}, {'end': 762.342, 'src': 'embed', 'start': 731.703, 'weight': 2, 'content': [{'end': 736.204, 'text': 'if you cross, then you can bring big data into it.', 'start': 731.703, 'duration': 4.501}, {'end': 739.845, 'text': 'so if you, if you see the use cases of different companies, go to their engineering blog,', 'start': 736.204, 'duration': 3.641}, {'end': 744.106, 'text': 'if you see facebook will say i use big data and my data size is petabytes.', 'start': 739.845, 'duration': 4.261}, {'end': 746.374, 'text': 'and And there is one more small startup company.', 'start': 744.106, 'duration': 2.268}, {'end': 749.415, 'text': 'They say we use big data and my data volume is just gigabyte.', 'start': 746.394, 'duration': 3.021}, {'end': 750.436, 'text': 
"That's fine.", 'start': 749.956, 'duration': 0.48}, {'end': 756.699, 'text': 'So their existing technology has given the problem with that particular size.', 'start': 751.076, 'duration': 5.623}, {'end': 757.139, 'text': "That's it.", 'start': 756.799, 'duration': 0.34}, {'end': 762.342, 'text': 'So the answer for the question what is the use case of the volume?', 'start': 757.799, 'duration': 4.543}], 'summary': 'Big data use cases vary from gigabytes to petabytes in size.', 'duration': 30.639, 'max_score': 731.703, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rsOSrEbK7sU/pics/rsOSrEbK7sU731703.jpg'}, {'end': 824.5, 'src': 'embed', 'start': 796.815, 'weight': 3, 'content': [{'end': 799.296, 'text': 'your 1 gb is your use case, actually.', 'start': 796.815, 'duration': 2.481}, {'end': 803.939, 'text': "so that means like you have 5 gb pen drive and i'm asking you to store 6 gb data is not possible, right?", 'start': 799.296, 'duration': 4.643}, {'end': 807.802, 'text': 'so for that 5 gb pen drive, the balance 1 gb data is big data.', 'start': 803.939, 'duration': 3.863}, {'end': 810.123, 'text': "that's it, okay?", 'start': 807.802, 'duration': 2.321}, {'end': 812.564, 'text': 'so the scenario i just explained you the reason.', 'start': 810.123, 'duration': 2.441}, {'end': 815.566, 'text': 'i wanted to build you the confidence that volume is not the only problem.', 'start': 812.564, 'duration': 3.002}, {'end': 817.828, 'text': 'we do have a lot of a lot of other problems.', 'start': 815.566, 'duration': 2.262}, {'end': 818.228, 'text': 'okay, fine.', 'start': 817.828, 'duration': 0.4}, {'end': 824.5, 'text': 'So now we kind of understood the problems, what we face with data.', 'start': 818.959, 'duration': 5.541}], 'summary': '1 gb is the use case, 5 gb pen drive, 6 gb data, balance 1 gb is big data, volume is not the only problem', 'duration': 27.685, 'max_score': 796.815, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rsOSrEbK7sU/pics/rsOSrEbK7sU796815.jpg'}], 'start': 472.757, 'title': 'Big data interview scenario and understanding big data volume', 'summary': 'Delves into an interview scenario demonstrating the use of big data for processing speed with 25 gb of data, and emphasizes the significance of understanding big data volume beyond sheer size, drawing examples from leading companies like google and facebook.', 'chapters': [{'end': 569.243, 'start': 472.757, 'title': 'Big data interview scenario', 'summary': 'Highlights an interview scenario where the interviewee explains the use of big data for processing speed despite processing only 25 gb of data for one country by eod.', 'duration': 96.486, 'highlights': ['The interviewee explained a scenario from a past interview where the interviewer questioned the need for big data when processing only 25 GB of data for one country by EOD, emphasizing the importance of processing speed over data volume.', 'The interviewee clarified that the use of big data in their project was not due to data volume but rather the need for improved processing speed, highlighting the significance of velocity in data processing over sheer volume.']}, {'end': 817.828, 'start': 569.883, 'title': 'Understanding big data volume', 'summary': 'Highlights the importance of understanding big data volume through a conversation with an interviewer, emphasizing that volume is not the only problem and that even processing a small amount of data can be considered big data in certain contexts, as illustrated by examples from companies like google and facebook.', 'duration': 247.945, 'highlights': ['The importance of understanding big data volume through a conversation with an interviewer, emphasizing that volume is not the only problem and that processing a small amount of data can be considered big data in certain contexts. Emphasizes the importance of understanding big data volume. 
Illustrates that volume is not the only problem and that processing a small amount of data can be considered big data in certain contexts.', 'Illustrates examples from companies like Google and Facebook, emphasizing that even processing a small amount of data can be considered big data in certain contexts. Provides examples from companies like Google and Facebook to illustrate that even processing a small amount of data can be considered big data in certain contexts.', 'Explains the concept of big data volume using the analogy of a pen drive, emphasizing that volume is not the only problem in big data and that there are other challenges to consider. Uses the analogy of a pen drive to explain that volume is not the only problem in big data and there are other challenges to consider.']}], 'duration': 345.071, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rsOSrEbK7sU/pics/rsOSrEbK7sU472757.jpg', 'highlights': ['The interviewee emphasized the importance of processing speed over data volume when dealing with 25 GB of data.', 'Understanding big data volume is crucial, as processing a small amount of data can still be considered big data in certain contexts.', 'Examples from leading companies like Google and Facebook were used to illustrate that even processing a small amount of data can be considered big data.', 'The analogy of a pen drive was used to explain that big data volume is not the only challenge to consider.']}, {'end': 1244.35, 'segs': [{'end': 862.706, 'src': 'embed', 'start': 817.828, 'weight': 0, 'content': [{'end': 818.228, 'text': 'okay, fine.', 'start': 817.828, 'duration': 0.4}, {'end': 824.5, 'text': 'So now we kind of understood the problems, what we face with data.', 'start': 818.959, 'duration': 5.541}, {'end': 828.141, 'text': 'And we are facing all these data problems.', 'start': 824.96, 'duration': 3.181}, {'end': 836.562, 'text': 'The reason is like the applications usage, right? 
So we got the same application in mobile and in desktop and in the laptop, tablet.', 'start': 828.481, 'duration': 8.081}, {'end': 841.503, 'text': 'So people usage means the usage of having the data usage got created.', 'start': 836.842, 'duration': 4.661}, {'end': 847.284, 'text': 'So both in front and back end, we got a lot of problems and people invented such technologies to solve it.', 'start': 841.523, 'duration': 5.761}, {'end': 856.383, 'text': 'And if you take, we do have more than 10,000 plus solutions for data under the big data technology.', 'start': 847.664, 'duration': 8.719}, {'end': 857.464, 'text': 'Okay, so we do have 10,000.', 'start': 856.383, 'duration': 1.081}, {'end': 859.905, 'text': 'Hadoop is one such technology like that.', 'start': 857.464, 'duration': 2.441}, {'end': 862.706, 'text': 'We have Spark, Kafka, Storm, Flume.', 'start': 859.905, 'duration': 2.801}], 'summary': 'Data usage across multiple devices leads to 10,000+ solutions under big data technology.', 'duration': 44.878, 'max_score': 817.828, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rsOSrEbK7sU/pics/rsOSrEbK7sU817828.jpg'}, {'end': 935.803, 'src': 'embed', 'start': 908.325, 'weight': 2, 'content': [{'end': 913.369, 'text': 'Like, we have C, C++, Java and we call them as programming languages right?', 'start': 908.325, 'duration': 5.044}, {'end': 917.552, 'text': 'So, similar to that, we need to give a unified term to all these solutions.', 'start': 913.409, 'duration': 4.143}, {'end': 921.135, 'text': 'So they came up with the problem name itself as a market name.', 'start': 917.612, 'duration': 3.523}, {'end': 923.056, 'text': 'Big data is a problem name.', 'start': 921.955, 'duration': 1.101}, {'end': 925.818, 'text': "But the thing is they don't have any other names.", 'start': 923.897, 'duration': 1.921}, {'end': 931.823, 'text': 'So they just gave this big data as a technology name, the market name, the designations name.', 'start': 
925.898, 'duration': 5.925}, {'end': 935.803, 'text': "So, they just gave the problem name as a solution name and that's it.", 'start': 932.301, 'duration': 3.502}], 'summary': 'Big data was given a unified term as a market, technology, and solution name.', 'duration': 27.478, 'max_score': 908.325, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rsOSrEbK7sU/pics/rsOSrEbK7sU908325.jpg'}, {'end': 1000.154, 'src': 'heatmap', 'start': 971.64, 'weight': 1, 'content': [{'end': 974.723, 'text': 'So in big data, you have technology for storage.', 'start': 971.64, 'duration': 3.083}, {'end': 977.045, 'text': 'That means databases and file systems.', 'start': 975.043, 'duration': 2.002}, {'end': 982.009, 'text': 'And in big data, you have technology for processing, for data processing.', 'start': 977.365, 'duration': 4.644}, {'end': 986.993, 'text': 'And these are called, so this is one data layer, storage layer, processing layer.', 'start': 982.529, 'duration': 4.464}, {'end': 996.692, 'text': 'we do have technology for data testing and we do have for visualization, and we do have technology for data science,', 'start': 987.546, 'duration': 9.146}, {'end': 1000.154, 'text': 'machine learning and artificial intelligence, etc.', 'start': 996.692, 'duration': 3.462}], 'summary': 'Big data technology includes storage, processing, testing, visualization, data science, machine learning, and artificial intelligence.', 'duration': 28.514, 'max_score': 971.64, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rsOSrEbK7sU/pics/rsOSrEbK7sU971640.jpg'}, {'end': 1042.451, 'src': 'embed', 'start': 1014.736, 'weight': 3, 'content': [{'end': 1019.498, 'text': 'So these layers are already there, right? 
So we do have for storage, we have Oracle and MySQL.', 'start': 1014.736, 'duration': 4.762}, {'end': 1023.66, 'text': 'For processing, we do have some processing frameworks like Informatica and ETL.', 'start': 1019.538, 'duration': 4.122}, {'end': 1030.684, 'text': 'But still, these layers also supported in big data as well, but with the different technology names.', 'start': 1023.941, 'duration': 6.743}, {'end': 1032.204, 'text': 'Okay, fine.', 'start': 1030.904, 'duration': 1.3}, {'end': 1037.667, 'text': 'And I have a separate video like explaining you the different layers of data.', 'start': 1032.605, 'duration': 5.062}, {'end': 1039.608, 'text': 'Okay, so I have a separate video.', 'start': 1038.167, 'duration': 1.441}, {'end': 1042.451, 'text': "I've just given that video link in the description box.", 'start': 1039.628, 'duration': 2.823}], 'summary': 'Current data architecture includes oracle and mysql for storage, informatica and etl for processing, with big data support using different technologies.', 'duration': 27.715, 'max_score': 1014.736, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rsOSrEbK7sU/pics/rsOSrEbK7sU1014736.jpg'}, {'end': 1131.431, 'src': 'heatmap', 'start': 1084.077, 'weight': 4, 'content': [{'end': 1089.96, 'text': 'So Hadoop, like in 2002, Google released a paper called GFS, Google File System.', 'start': 1084.077, 'duration': 5.883}, {'end': 1091.721, 'text': "It's to distribute the data.", 'start': 1090.3, 'duration': 1.421}, {'end': 1096.764, 'text': 'And in 2004, Google released another base paper called Google Map Reduce.', 'start': 1092.341, 'duration': 4.423}, {'end': 1098.925, 'text': 'This is to process the distributed data.', 'start': 1096.804, 'duration': 2.121}, {'end': 1102.403, 'text': 'The data is already distributed in GFS.', 'start': 1099.982, 'duration': 2.421}, {'end': 1110.826, 'text': 'so to process that data in a distributed in a parallel processing, they got GMR, and Google 
invented these two base papers in these two years.', 'start': 1102.403, 'duration': 8.423}, {'end': 1117.528, 'text': 'So in mid of 2005-06 Hadoop has been invented by Doug Cutting.', 'start': 1111.366, 'duration': 6.162}, {'end': 1120.229, 'text': 'So he invented this.', 'start': 1119.169, 'duration': 1.06}, {'end': 1122.75, 'text': "So he's an ex-employee of Yahoo.", 'start': 1121.07, 'duration': 1.68}, {'end': 1124.291, 'text': 'He has his own company.', 'start': 1122.91, 'duration': 1.381}, {'end': 1125.531, 'text': 'I will tell you the company name.', 'start': 1124.331, 'duration': 1.2}, {'end': 1127.592, 'text': 'So Hadoop has two projects.', 'start': 1126.111, 'duration': 1.481}, {'end': 1131.431, 'text': 'HDFS and then MapReduce.', 'start': 1128.427, 'duration': 3.004}], 'summary': "Hadoop, invented by doug cutting in 2005-06, is based on google's gfs and mapreduce papers, with two main projects: hdfs and mapreduce.", 'duration': 47.354, 'max_score': 1084.077, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rsOSrEbK7sU/pics/rsOSrEbK7sU1084077.jpg'}, {'end': 1226.143, 'src': 'embed', 'start': 1196.27, 'weight': 6, 'content': [{'end': 1201.354, 'text': "so that is a generation technology, generation gap between big companies and to us it's always happened.", 'start': 1196.27, 'duration': 5.084}, {'end': 1210.572, 'text': 'One more example is, I think, like Android application and a solid application, a phone developed by 2012 or 13, I think so.', 'start': 1201.846, 'duration': 8.726}, {'end': 1219.778, 'text': "But by the time Google said it is 10 year old project, that means it's like 10 years old from 2012 is something again, it's touching 90s.", 'start': 1211.172, 'duration': 8.606}, {'end': 1226.143, 'text': "So by the time only we started using this, the Nokia black phone, right? 
It's like a brick.", 'start': 1220.259, 'duration': 5.884}], 'summary': 'Generation technology creates a gap between big companies and individuals, illustrated by the 10-year-old android phone in 2012 resembling 90s technology.', 'duration': 29.873, 'max_score': 1196.27, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rsOSrEbK7sU/pics/rsOSrEbK7sU1196270.jpg'}], 'start': 817.828, 'title': 'Big data technology and hadoop impact', 'summary': 'Covers the challenges of handling diverse data, emergence of over 10,000 big data solutions, layers and technologies including specific ones like oracle and mysql, and the evolution of hadoop from 2002 to its open-source announcement in 2005.', 'chapters': [{'end': 956.518, 'start': 817.828, 'title': 'Understanding big data technology', 'summary': 'Introduces the challenges in handling data due to diverse application usage, the emergence of over 10,000 solutions under big data technology, and the naming history of big data as a unified term for diverse solutions.', 'duration': 138.69, 'highlights': ['The emergence of over 10,000 solutions under big data technology. The speaker mentions that there are more than 10,000 solutions available for data under big data technology, showcasing the extensive range of solutions developed to address data challenges.', 'Challenges in handling data due to diverse application usage. The problems in handling data are attributed to the usage of the same application across multiple devices, leading to the creation of data usage issues, both at the front and back end.', "Naming history of big data as a unified term for diverse solutions. 
The speaker explains that the term 'big data' was chosen as a unified name for diverse solutions in the absence of any other names, likening it to the naming of programming languages like C, C++, and Java."]}, {'end': 1083.096, 'start': 958.809, 'title': 'Layers and technologies in big data', 'summary': 'Discusses the layers and technologies in big data, including storage, processing, analytics, and automation, and mentions specific technologies like oracle, mysql, informatica, and etl.', 'duration': 124.287, 'highlights': ['The chapter explains the various layers in big data, including storage, processing, analytics, and automation, and mentions specific technologies like Oracle, MySQL, Informatica, and ETL.', 'It describes the technology layers in big data, such as storage, processing, analytics, and automation, and mentions specific examples like Oracle, MySQL, Informatica, and ETL.', 'The chapter emphasizes the different technology layers in big data, including storage, processing, analytics, and automation, and mentions specific technologies such as Oracle, MySQL, Informatica, and ETL.']}, {'end': 1244.35, 'start': 1084.077, 'title': 'Evolution of hadoop and its impact', 'summary': "Explores the evolution of hadoop, starting from google's release of gfs and mapreduce in 2002 and 2004, to the invention of hadoop by doug cutting in mid-2005, and its subsequent open-source announcement.", 'duration': 160.273, 'highlights': ["Google released GFS and MapReduce in 2002 and 2004, respectively, which laid the foundation for distributed data processing. Google's release of GFS and MapReduce set the groundwork for distributed data management and processing, influencing the development of Hadoop.", 'Hadoop was invented by Doug Cutting in mid-2005, following the concepts of GFS and MapReduce, and he announced it as open source. 
Doug Cutting, an ex-employee of Yahoo, invented Hadoop, building upon the ideas from GFS and MapReduce, and subsequently declared it as open source, marking a significant milestone in big data technology.', 'The evolution of technology, such as GFS and Hadoop, from the late 1990s to the 2000s, showcases the generational gap in the adoption of modern innovations. The transition of technology from the late 1990s to the 2000s, as seen with GFS and Hadoop, highlights the generational gap in the adoption of modern innovations, providing an intriguing perspective on the pace of technological advancements.']}], 'duration': 426.522, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rsOSrEbK7sU/pics/rsOSrEbK7sU817828.jpg', 'highlights': ['The emergence of over 10,000 solutions under big data technology.', 'Challenges in handling data due to diverse application usage.', 'Naming history of big data as a unified term for diverse solutions.', 'The chapter explains the various layers in big data, including storage, processing, analytics, and automation, and mentions specific technologies like Oracle, MySQL, Informatica, and ETL.', 'Google released GFS and MapReduce in 2002 and 2004, respectively, which laid the foundation for distributed data processing.', 'Hadoop was invented by Doug Cutting in mid-2005, following the concepts of GFS and MapReduce, and he announced it as open source.', 'The evolution of technology, such as GFS and Hadoop, from the late 1990s to the 2000s, showcases the generational gap in the adoption of modern innovations.']}, {'end': 1777.172, 'segs': [{'end': 1321.156, 'src': 'embed', 'start': 1298.424, 'weight': 1, 'content': [{'end': 1305.989, 'text': "But it's like you have to go to the doorstep of each and every company and you have to explain them, right? 
But open source will work in reverse.", 'start': 1298.424, 'duration': 7.565}, {'end': 1307.888, 'text': 'So you have your own website.', 'start': 1306.607, 'duration': 1.281}, {'end': 1315.532, 'text': 'Imagine Gautam.com and I release my code and company started looking into my website and they love and they like my code and they download it and they will use it.', 'start': 1307.928, 'duration': 7.604}, {'end': 1321.156, 'text': 'And then they will come to me and they will say like the code, what you have in your website is really good.', 'start': 1316.213, 'duration': 4.943}], 'summary': 'Open source allows companies to discover and use code, increasing visibility and adoption.', 'duration': 22.732, 'max_score': 1298.424, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rsOSrEbK7sU/pics/rsOSrEbK7sU1298424.jpg'}, {'end': 1384.08, 'src': 'embed', 'start': 1359.048, 'weight': 0, 'content': [{'end': 1364.411, 'text': 'So one such community, very famous, You might have heard about it, Apache Software Foundation.', 'start': 1359.048, 'duration': 5.363}, {'end': 1366.072, 'text': "So it's like ISO.", 'start': 1365.151, 'duration': 0.921}, {'end': 1370.536, 'text': 'In India, we used to get that ISO, right, for the company and the product.', 'start': 1366.993, 'duration': 3.543}, {'end': 1373.118, 'text': 'So similar to that, these guys will give you the license.', 'start': 1370.556, 'duration': 2.562}, {'end': 1378.402, 'text': 'And this Apache license was trusted by the whole world of IT giant companies.', 'start': 1373.218, 'duration': 5.184}, {'end': 1379.243, 'text': 'They will trust this.', 'start': 1378.462, 'duration': 0.781}, {'end': 1384.08, 'text': 'And each and every company they have their own research and development team.', 'start': 1379.716, 'duration': 4.364}], 'summary': 'The apache software foundation provides a trusted license, relied upon by many it giants worldwide.', 'duration': 25.032, 'max_score': 1359.048, 
'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rsOSrEbK7sU/pics/rsOSrEbK7sU1359048.jpg'}, {'end': 1454.109, 'src': 'embed', 'start': 1426.051, 'weight': 3, 'content': [{'end': 1428.953, 'text': 'recognition for you if your project is in Apache.', 'start': 1426.051, 'duration': 2.902}, {'end': 1430.274, 'text': 'you can add that in your resume.', 'start': 1428.953, 'duration': 1.321}, {'end': 1434.076, 'text': 'that will give you like more opportunity with respect to job as well.', 'start': 1430.274, 'duration': 3.802}, {'end': 1436.818, 'text': 'fine, and the people who release their project in Apache?', 'start': 1434.076, 'duration': 2.742}, {'end': 1438.639, 'text': "it's not like one person.", 'start': 1436.818, 'duration': 1.821}, {'end': 1441.521, 'text': 'it could be a group of team or it could be a company as well.', 'start': 1438.639, 'duration': 2.882}, {'end': 1442.842, 'text': 'okay, fine.', 'start': 1441.521, 'duration': 1.321}, {'end': 1444.843, 'text': 'so now now comes.', 'start': 1442.842, 'duration': 2.001}, {'end': 1450.907, 'text': "so i told you, right, i'm explaining the history, not for really to know what is this Hadoop and when they invented it.", 'start': 1444.843, 'duration': 6.064}, {'end': 1454.109, 'text': 'right. 
so there is a different thing i want to explain based on this.', 'start': 1450.907, 'duration': 3.202}], 'summary': 'Releasing a project in Apache can enhance job opportunities and is usually a team effort.', 'duration': 28.058, 'max_score': 1426.051, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rsOSrEbK7sU/pics/rsOSrEbK7sU1426051.jpg'}, {'end': 1700.586, 'src': 'embed', 'start': 1675.725, 'weight': 2, 'content': [{'end': 1682.619, 'text': 'Okay, there are some companies who are doing this. Cloudera. So Cloudera is the company that was formed by Doug Cutting, who is the father of Hadoop.', 'start': 1675.725, 'duration': 6.894}, {'end': 1685.761, 'text': 'Okay, with some group of engineers as partners.', 'start': 1683, 'duration': 2.761}, {'end': 1690.582, 'text': 'And the second leading company in providing big data as a service in market Hortonworks.', 'start': 1686.361, 'duration': 4.221}, {'end': 1691.963, 'text': 'First is Cloudera.', 'start': 1691.042, 'duration': 0.921}, {'end': 1694.364, 'text': 'And recently these two companies have been merged.', 'start': 1692.283, 'duration': 2.081}, {'end': 1696.484, 'text': 'And the next company is Amazon.', 'start': 1694.944, 'duration': 1.54}, {'end': 1700.586, 'text': 'And there is a company like Google is also doing this.', 'start': 1697.685, 'duration': 2.901}], 'summary': 'Cloudera and Hortonworks merged as leading big data providers, with Amazon and Google also in the market.', 'duration': 24.861, 'max_score': 1675.725, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rsOSrEbK7sU/pics/rsOSrEbK7sU1675725.jpg'}], 'start': 1244.45, 'title': 'Open source, apache license, hadoop, and big data services', 'summary': "Covers the concept of open source, benefits in business model and community trust, apache software foundation's role, limitations of apache hadoop, and commercial hadoop services provided by companies like cloudera and hortonworks, with cloudera as the 
leading provider and recent merger of cloudera and hortonworks.", 'chapters': [{'end': 1444.843, 'start': 1244.45, 'title': 'Understanding open source and apache license', 'summary': 'Explains the concept of open source, highlighting its benefits in terms of business model, community trust, and recognition, as well as the role of apache software foundation in ensuring trust and support for open source projects.', 'duration': 200.393, 'highlights': ['Open source allows creators to offer their code for free, leading to potential business opportunities as companies may approach them for support and funding.', 'Apache Software Foundation provides a trusted license for open source code, gaining the trust of IT giant companies and offering recognition for projects hosted on its platform.', 'Companies monitor the Apache website for new source code, potentially leading to funding and support opportunities for creators.', "Having a project in Apache Software Foundation can enhance a creator's job opportunities and serve as a valuable addition to their resume."]}, {'end': 1777.172, 'start': 1444.843, 'title': 'Commercial enterprise hadoop and big data services', 'summary': 'Explains the limitations of using apache hadoop due to lack of support, and how companies like cloudera, hortonworks, amazon, microsoft, and others provide commercial hadoop and big data services to address this issue, with cloudera being the leading provider and cloudera and hortonworks recently merging.', 'duration': 332.329, 'highlights': ['Companies like Cloudera, Hortonworks, Amazon, Microsoft, and others provide commercial Hadoop and big data services to address the lack of support from Apache, with Cloudera being the leading provider and Cloudera and Hortonworks recently merging. 
Cloudera, Hortonworks, Amazon, Microsoft, and others provide commercial Hadoop and big data services, addressing the lack of support from Apache, with Cloudera being the leading provider and Cloudera and Hortonworks recently merging.', 'The limitations of using Apache Hadoop due to lack of support from Apache Software Foundation are highlighted, leading to the need for commercial Hadoop and big data services. The limitations of using Apache Hadoop due to lack of support from Apache Software Foundation are highlighted, leading to the need for commercial Hadoop and big data services.', 'Cloudera, formed by Doug Cutting, and Hortonworks are identified as companies providing commercial Hadoop and big data services, with Cloudera being the leading provider and Cloudera and Hortonworks recently merging. Cloudera, formed by Doug Cutting, and Hortonworks are identified as companies providing commercial Hadoop and big data services, with Cloudera being the leading provider and Cloudera and Hortonworks recently merging.']}], 'duration': 532.722, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rsOSrEbK7sU/pics/rsOSrEbK7sU1244450.jpg', 'highlights': ['Apache Software Foundation provides a trusted license for open source code, gaining the trust of IT giant companies and offering recognition for projects hosted on its platform.', 'Open source allows creators to offer their code for free, leading to potential business opportunities as companies may approach them for support and funding.', 'Companies like Cloudera, Hortonworks, Amazon, Microsoft, and others provide commercial Hadoop and big data services to address the lack of support from Apache, with Cloudera being the leading provider and Cloudera and Hortonworks recently merging.', "Having a project in Apache Software Foundation can enhance a creator's job opportunities and serve as a valuable addition to their resume."]}, {'end': 2360.645, 'segs': [{'end': 1844.552, 'src': 'embed', 'start': 
1818.74, 'weight': 0, 'content': [{'end': 1824.883, 'text': "So it's like Apache is always a common one, and even if you can do freelancing as well, but these are enterprise editions.", 'start': 1818.74, 'duration': 6.143}, {'end': 1830.746, 'text': "So it's not that much easy and your laptop requirement, hardware requirement is also important.", 'start': 1825.343, 'duration': 5.403}, {'end': 1835.808, 'text': 'You need at least more than 8 GB of RAM at least to deploy all these enterprise editions.', 'start': 1830.766, 'duration': 5.042}, {'end': 1839.79, 'text': 'But what these things are doing, the same thing you can do with Apache as well.', 'start': 1835.868, 'duration': 3.922}, {'end': 1844.552, 'text': 'But the problem here is you have to install, you have to configure everything you have to do.', 'start': 1839.85, 'duration': 4.702}], 'summary': 'Apache is common, but enterprise editions need at least 8 gb of ram to deploy, with similar capabilities as apache.', 'duration': 25.812, 'max_score': 1818.74, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rsOSrEbK7sU/pics/rsOSrEbK7sU1818740.jpg'}, {'end': 1938.735, 'src': 'embed', 'start': 1912.12, 'weight': 1, 'content': [{'end': 1915.921, 'text': 'people used to say we need to know java, only then we can jump into big data.', 'start': 1912.12, 'duration': 3.801}, {'end': 1917.081, 'text': "it's not like that.", 'start': 1915.921, 'duration': 1.16}, {'end': 1918.482, 'text': "it's like 50, 50.", 'start': 1917.081, 'duration': 1.401}, {'end': 1922.844, 'text': 'so sql 50 and then programming language 50.', 'start': 1918.482, 'duration': 4.362}, {'end': 1924.924, 'text': 'and then linux also plays an important role.', 'start': 1922.844, 'duration': 2.08}, {'end': 1926.225, 'text': "so it's a mixed set of thing.", 'start': 1924.924, 'duration': 1.301}, {'end': 1928.391, 'text': "okay, it's not like You need to know only SQL.", 'start': 1926.225, 'duration': 2.166}, {'end': 1929.371, 
'text': 'You need to know only Java.', 'start': 1928.411, 'duration': 0.96}, {'end': 1930.091, 'text': "It's not like that.", 'start': 1929.431, 'duration': 0.66}, {'end': 1934.433, 'text': "And even in programming language, it's not like you need to know only Java.", 'start': 1930.211, 'duration': 4.222}, {'end': 1936.794, 'text': 'Any programming language is wide enough.', 'start': 1934.833, 'duration': 1.961}, {'end': 1938.735, 'text': "It's not always to be Java.", 'start': 1936.994, 'duration': 1.741}], 'summary': 'In big data, 50% sql, 50% programming language. linux also important.', 'duration': 26.615, 'max_score': 1912.12, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rsOSrEbK7sU/pics/rsOSrEbK7sU1912120.jpg'}, {'end': 2112.682, 'src': 'embed', 'start': 2083.929, 'weight': 3, 'content': [{'end': 2087.331, 'text': 'But but still you need general IT experience at least one year or two year.', 'start': 2083.929, 'duration': 3.402}, {'end': 2088.742, 'text': 'general it.', 'start': 2088.021, 'duration': 0.721}, {'end': 2092.083, 'text': 'you can be from testing support or data side or mobile computing.', 'start': 2088.742, 'duration': 3.341}, {'end': 2093.386, 'text': 'cloud web is fine,', 'start': 2092.083, 'duration': 1.303}, {'end': 2100.071, 'text': 'but you should have some one to two years of general it experience and then you can learn big data and you can move on to this field.', 'start': 2093.386, 'duration': 6.685}, {'end': 2102.874, 'text': 'next, for experienced people, yeah, the same.', 'start': 2100.071, 'duration': 2.803}, {'end': 2112.682, 'text': 'see. 
so far you have been in mainframes for 10 years, or you have been in ETL for 10 years or five years, or you have been in testing for 10 years, it is still fine.', 'start': 2102.874, 'duration': 9.808}], 'summary': 'Minimum 1-2 years of general IT experience required; experienced individuals with 10+ years in specific areas also considered.', 'duration': 28.753, 'max_score': 2083.929, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rsOSrEbK7sU/pics/rsOSrEbK7sU2083929.jpg'}, {'end': 2206.774, 'src': 'embed', 'start': 2176.736, 'weight': 2, 'content': [{'end': 2178.836, 'text': 'see, you can be from any technology.', 'start': 2176.736, 'duration': 2.1}, {'end': 2184.617, 'text': "it's all about you learn big data, you do some POCs and then you prove them in the interview that you have the knowledge on it,", 'start': 2178.836, 'duration': 5.781}, {'end': 2186.418, 'text': "and that's what you have to do.", 'start': 2184.617, 'duration': 1.801}, {'end': 2187.338, 'text': "so don't hesitate.", 'start': 2186.418, 'duration': 0.92}, {'end': 2193.579, 'text': 'so if you are trying to move for some different technology, like data science or big data or cloud computing, you can pick this.', 'start': 2187.338, 'duration': 6.241}, {'end': 2194.799, 'text': "that's my option.", 'start': 2193.579, 'duration': 1.22}, {'end': 2197.34, 'text': "means that's my suggestions for you.", 'start': 2194.799, 'duration': 2.541}, {'end': 2199.88, 'text': 'okay, so this is what i want to tell you with the job perspective.', 'start': 2197.34, 'duration': 2.54}, {'end': 2204.411, 'text': 'And when it comes to Resume, yeah, I want to tell you one more thing.', 'start': 2200.646, 'duration': 3.765}, {'end': 2206.774, 'text': 'So they can ask you this question in the interview.', 'start': 2204.451, 'duration': 2.323}], 'summary': 'Transition to big data or data science is recommended for job prospects.', 'duration': 30.038, 'max_score': 2176.736, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rsOSrEbK7sU/pics/rsOSrEbK7sU2176736.jpg'}, {'end': 2246.206, 'src': 'embed', 'start': 2216.346, 'weight': 4, 'content': [{'end': 2219.169, 'text': "Okay, that's what the distribution, that's what the environment.", 'start': 2216.346, 'duration': 2.823}, {'end': 2224.783, 'text': 'So when you say Apache, right, Apache is what I told you to use when you are in the learning phase.', 'start': 2219.878, 'duration': 4.905}, {'end': 2232.01, 'text': 'But when you say Apache, people will think Apache Hadoop or Apache big data services are not used in real-time companies.', 'start': 2224.843, 'duration': 7.167}, {'end': 2234.031, 'text': "Right? And that means he's not having knowledge.", 'start': 2232.05, 'duration': 1.981}, {'end': 2235.173, 'text': "And that's what people will think.", 'start': 2234.051, 'duration': 1.122}, {'end': 2239.917, 'text': 'So you have to say the Cloudera, Hortonworks and such environment.', 'start': 2235.773, 'duration': 4.144}, {'end': 2241.278, 'text': 'So the environment is different.', 'start': 2239.957, 'duration': 1.321}, {'end': 2243.48, 'text': 'The content inside the environment is still same.', 'start': 2241.318, 'duration': 2.162}, {'end': 2246.206, 'text': 'Okay, so you can download Cloudera, Hortonworks.', 'start': 2243.861, 'duration': 2.345}], 'summary': 'Apache is advised for the learning phase, but in interviews name an enterprise environment such as Cloudera or Hortonworks; only the environment differs, not the content.', 'duration': 29.86, 'max_score': 2216.346, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rsOSrEbK7sU/pics/rsOSrEbK7sU2216346.jpg'}], 'start': 1777.272, 'title': 'Hadoop distribution and big data prerequisites', 'summary': 'Discusses choosing hadoop distribution for personal use, recommending apache due to its ease of use and flexibility, and explores the prerequisites for entering the field of big data, emphasizing knowledge in linux, sql, and a programming language for 
both freshers and experienced professionals.', 'chapters': [{'end': 1860.365, 'start': 1777.272, 'title': 'Choosing hadoop distribution for personal use', 'summary': 'Discusses the choice of hadoop distribution for personal use, recommending apache as the preferred option due to its ease of use and flexibility, while highlighting the hardware requirements and the consideration of enterprise editions for freelancing. the speaker also briefly touches on how to project this experience on a resume.', 'duration': 83.093, 'highlights': ['The speaker recommends using Apache as the preferred Hadoop distribution for personal use, due to its ease of use and flexibility, and briefly explains the hardware requirements of at least 8 GB of RAM for deploying enterprise editions.', 'The speaker mentions having worked with various Hadoop distributions such as Cloudera, Hortonworks, and EMR, but still recommends Apache as the common choice.']}, {'end': 2360.645, 'start': 1860.365, 'title': 'Prerequisites for big data', 'summary': 'Discusses the prerequisites for entering the field of big data, emphasizing the importance of knowledge in linux, sql, and a programming language, as well as providing insights for both freshers and experienced professionals.', 'duration': 500.28, 'highlights': ['The importance of knowledge in Linux, SQL, and a programming language is emphasized as prerequisites for entering the field of big data. The chapter underscores the significance of understanding Linux, SQL, and a programming language as essential prerequisites for entering the big data field.', 'Freshers are advised to have at least one to two years of general IT experience before delving into big data. Freshers are recommended to possess one to two years of general IT experience before venturing into the realm of big data.', 'Experienced professionals from diverse technological backgrounds are encouraged to transition into big data by showcasing their knowledge and skills in the field. 
The chapter encourages experienced professionals from various technological backgrounds to transition into big data by demonstrating their expertise and aptitude in the field.']}], 'duration': 583.373, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/rsOSrEbK7sU/pics/rsOSrEbK7sU1777272.jpg', 'highlights': ['The speaker recommends using Apache as the preferred Hadoop distribution for personal use, due to its ease of use and flexibility, and briefly explains the hardware requirements of at least 8 GB of RAM for deploying enterprise editions.', 'The importance of knowledge in Linux, SQL, and a programming language is emphasized as prerequisites for entering the field of big data. The chapter underscores the significance of understanding Linux, SQL, and a programming language as essential prerequisites for entering the big data field.', 'Experienced professionals from diverse technological backgrounds are encouraged to transition into big data by showcasing their knowledge and skills in the field. The chapter encourages experienced professionals from various technological backgrounds to transition into big data by demonstrating their expertise and aptitude in the field.', 'Freshers are advised to have at least one to two years of general IT experience before delving into big data. 
Freshers are recommended to possess one to two years of general IT experience before venturing into the realm of big data.', 'The speaker mentions having worked with various Hadoop distributions such as Cloudera, Hortonworks, and EMR, but still recommends Apache as the common choice.']}], 'highlights': ['The emergence of over 10,000 solutions under big data technology.', 'The difference between Hadoop and big data is explained, with Hadoop being a solution for data-related problems, while big data itself is the problem.', 'The fundamental concept of big data as a collection of information is emphasized, setting the foundation for understanding the problem-solution paradigm.', 'Big data addresses a wide range of issues beyond volume, including value, quality, visualization, velocity, and variety.', 'The interviewee emphasized the importance of processing speed over data volume when dealing with 25 GB of data.', 'Apache Software Foundation provides a trusted license for open source code, gaining the trust of IT giant companies and offering recognition for projects hosted on its platform.', 'The speaker recommends using Apache as the preferred Hadoop distribution for personal use, due to its ease of use and flexibility, and briefly explains the hardware requirements of at least 8 GB of RAM for deploying enterprise editions.']}