title
Apache Kafka Tutorial - 1 | What is Apache Kafka? | Kafka Tutorial for Beginners - 1 | Edureka

description
( Apache Kafka Training: https://www.edureka.co/apache-kafka ) This Kafka tutorial video gives an introduction to Kafka, Kafka architecture, Kafka cluster setup and a hands-on session. This Kafka video is ideal for beginners. To attend a live class, click here: http://goo.gl/BWpYsE This video will help you learn: • What is Apache Kafka? • Architecture of Kafka • Multiple ways of setting up a Kafka cluster • Comparing Kafka with other messaging systems • Hands-On: Getting started with Kafka The topics related to ‘Apache Kafka’ have been widely covered in our course. For more information, please write back to us at sales@edureka.co or call us at IND: 9606058406 / US: 18338555775 (toll-free).

detail
{'title': 'Apache Kafka Tutorial - 1 | What is Apache Kafka? | Kafka Tutorial for Beginners - 1 | Edureka', 'heatmap': [{'end': 1332.516, 'start': 1239.297, 'weight': 1}], 'summary': "Provides an overview of apache kafka, discussing challenges in data aggregation and architecture, zookeeper's role, cluster modes, linkedin's usage, and a comparison with activemq and rabbitmq, highlighting kafka's scalability and throughput.", 'chapters': [{'end': 146.383, 'segs': [{'end': 146.383, 'src': 'embed', 'start': 1.783, 'weight': 0, 'content': [{'end': 2.383, 'text': 'Hello, everyone.', 'start': 1.783, 'duration': 0.6}, {'end': 4.485, 'text': 'Welcome, everyone.', 'start': 3.704, 'duration': 0.781}, {'end': 7.867, 'text': 'Good morning, good evening, those who are joining from West Coast.', 'start': 4.925, 'duration': 2.942}, {'end': 9.848, 'text': "And I'm Madan Mohan.", 'start': 8.127, 'duration': 1.721}, {'end': 13.471, 'text': 'And just brief about me.', 'start': 10.749, 'duration': 2.722}, {'end': 25.719, 'text': 'So recently I started working on a platform which works on predictive analytics and provide a lot of recommendations based on the consumer,', 'start': 14.231, 'duration': 11.488}, {'end': 27.02, 'text': 'personas and activities.', 'start': 25.719, 'duration': 1.301}, {'end': 30.182, 'text': 'And the platform, we call it as a ProCovie.', 'start': 27.68, 'duration': 2.502}, {'end': 32.391, 'text': 'So, basically,', 'start': 31.01, 'duration': 1.381}, {'end': 48.783, 'text': 'we started exploring for a reliable and distributed connecting system or a messaging platform eight months back and after having a lot of discussions and after having a lot of insightful analysis,', 'start': 32.391, 'duration': 16.392}, {'end': 66.204, 'text': 'we felt Apache Kafka was a natural fit and we just tried to work on Kafka for the last eight months and we are super excited with the kind of features and the kind of scale capabilities it was offering,', 'start': 48.783, 'duration': 
17.421}, {'end': 67.985, 'text': 'and it is still going good for us.', 'start': 66.204, 'duration': 1.781}, {'end': 76.992, 'text': "So what we're gonna go today is I'm gonna briefly talk about Apache Kafka and our learnings in the last six to eight months.", 'start': 69.086, 'duration': 7.906}, {'end': 87.337, 'text': "And I should thank Edureka to providing the medium to share our knowledge, and we'll all just briefly touch base about in the next hour or so, okay?", 'start': 77.853, 'duration': 9.484}, {'end': 95.061, 'text': 'So if time permits, I may just quickly run through a demo in the last five minutes.', 'start': 87.897, 'duration': 7.164}, {'end': 106.387, 'text': "So briefly, what we're gonna cover today, in the next one hour or so, is basically get a basic understanding of what Apache Kafka is all about,", 'start': 96.282, 'duration': 10.105}, {'end': 109.473, 'text': "what's the buzz about this messaging platform,", 'start': 106.387, 'duration': 3.086}, {'end': 120.758, 'text': 'and briefly talk about the high level architecture and the components of the messaging platform and what are the different ways you could use Kafka for development or for production use.', 'start': 109.473, 'duration': 11.285}, {'end': 130.023, 'text': 'And also, we may also try to contrast Kafka today with the most top contenders like ActiveMQ or RabbitMQ.', 'start': 121.399, 'duration': 8.624}, {'end': 137.547, 'text': "And also, as I mentioned, we're gonna just briefly run through a small demo, if time permits, at the end of this session.", 'start': 130.683, 'duration': 6.864}, {'end': 146.383, 'text': 'okay?. 
Great so, as you guys might have just watching, for the last few years,', 'start': 137.547, 'duration': 8.836}], 'summary': 'Madan mohan presents on apache kafka and its benefits with procovie platform, emphasizing 8 months of successful usage and potential demo.', 'duration': 144.6, 'max_score': 1.783, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BGhlHsFBhLE/pics/BGhlHsFBhLE1783.jpg'}], 'start': 1.783, 'title': 'Apache kafka overview', 'summary': 'Provides an overview of apache kafka, focusing on its adoption for procovie, decision-making process, feature exploration over the last eight months, and plan to cover architecture, usage, and a possible comparison with other messaging platforms.', 'chapters': [{'end': 146.383, 'start': 1.783, 'title': 'Apache kafka overview', 'summary': 'Provides an overview of apache kafka, including its adoption for predictive analytics platform called procovie, the decision-making process, and the exploration of its features over the last eight months, with a plan to cover its architecture, usage, and a possible comparison with other messaging platforms.', 'duration': 144.6, 'highlights': ['The speaker has been working on a predictive analytics platform called ProCovie, which utilizes Apache Kafka for providing recommendations based on consumer personas and activities, over the last eight months.', 'After extensive discussions and insightful analysis, the team found Apache Kafka to be a natural fit for their requirements and has been working with it for the last eight months, being excited about its features and scale capabilities.', 'The session aims to cover the basic understanding of Apache Kafka, its high-level architecture, its components, and the various ways it can be used for development or production.', 'There is a plan to contrast Apache Kafka with other messaging platforms like ActiveMQ or RabbitMQ, and a possibility of a brief demo at the end of the session.', 'The speaker expresses 
gratitude to Edureka for providing a platform to share knowledge and hints at the potential inclusion of a demo in the last five minutes of the session.']}], 'duration': 144.6, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BGhlHsFBhLE/pics/BGhlHsFBhLE1783.jpg', 'highlights': ['The speaker has been working on a predictive analytics platform called ProCovie, which utilizes Apache Kafka for providing recommendations based on consumer personas and activities, over the last eight months.', 'After extensive discussions and insightful analysis, the team found Apache Kafka to be a natural fit for their requirements and has been working with it for the last eight months, being excited about its features and scale capabilities.', 'The session aims to cover the basic understanding of Apache Kafka, its high-level architecture, its components, and the various ways it can be used for development or production.', 'There is a plan to contrast Apache Kafka with other messaging platforms like ActiveMQ or RabbitMQ, and a possibility of a brief demo at the end of the session.', 'The speaker expresses gratitude to Edureka for providing a platform to share knowledge and hints at the potential inclusion of a demo in the last five minutes of the session.']}, {'end': 671.997, 'segs': [{'end': 191.827, 'src': 'embed', 'start': 167.606, 'weight': 0, 'content': [{'end': 178.776, 'text': 'So today we have the capability of capturing every human interaction with computers or every human interactions with electronics or the change of the state in the electronics or in the computers.', 'start': 167.606, 'duration': 11.17}, {'end': 191.827, 'text': 'So we have the capability today to monitor and capture and aggregate all the activities which is being changed in the computer systems or in the electronic systems.', 'start': 179.496, 'duration': 12.331}], 'summary': 'We can capture and monitor all human-computer interactions and changes in electronic systems.', 
'duration': 24.221, 'max_score': 167.606, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BGhlHsFBhLE/pics/BGhlHsFBhLE167606.jpg'}, {'end': 317.099, 'src': 'embed', 'start': 219.715, 'weight': 1, 'content': [{'end': 227.459, 'text': 'Just imagine if there are, say, 10 million users who are visiting on a particular internet applications, just for a hypothetical case right?', 'start': 219.715, 'duration': 7.744}, {'end': 232.862, 'text': 'So you would have say, for example, roughly say 100 to 150 pages.', 'start': 227.779, 'duration': 5.083}, {'end': 242.755, 'text': 'so just imagine the kind of the exponential data, what you can gather on the activity of a particular user and the patterns of a particular user.', 'start': 232.862, 'duration': 9.893}, {'end': 248.92, 'text': "So the data activity and the data growth is enormous in a day's time.", 'start': 243.115, 'duration': 5.805}, {'end': 255.846, 'text': 'Imagine, if you have, say, multiple systems and multiple components and all the users interacting with all those things,', 'start': 249.521, 'duration': 6.325}, {'end': 258.188, 'text': 'the enormous data and the activity history.', 'start': 255.846, 'duration': 2.342}, {'end': 261.271, 'text': "what you could capture is it's limitless.", 'start': 258.188, 'duration': 3.083}, {'end': 266.179, 'text': 'So these are all one of the scales of the data today, how it is growing into the world,', 'start': 261.711, 'duration': 4.468}, {'end': 274.321, 'text': 'where you could just get lots of lots of data activity from a user to a system or a computer or an electronic device.', 'start': 266.179, 'duration': 8.142}, {'end': 279.543, 'text': 'Apart from that, as I mentioned, you could say, for example,', 'start': 275.421, 'duration': 4.122}, {'end': 289.546, 'text': 'today we have the capability to go and capture every phone record or every call records of a mobile user, from messaging to voice calls.', 'start': 279.543, 'duration': 10.003}, 
{'end': 291.126, 'text': 'So all those things are possible.', 'start': 289.626, 'duration': 1.5}, {'end': 301.852, 'text': 'So most of the mobile network operators today, they do capture all this information of the user activity on a phone or on a messaging platform.', 'start': 291.567, 'duration': 10.285}, {'end': 310.156, 'text': 'Apart from that, you could also have noticed so when you have a DTH connection or a set-top box connection.', 'start': 302.332, 'duration': 7.824}, {'end': 317.099, 'text': 'even today, every event, every channel you watch, every channel you change from which channel to which channel it is being jumped,', 'start': 310.156, 'duration': 6.943}], 'summary': 'Enormous data growth from millions of users, capturing diverse activity across devices and networks.', 'duration': 97.384, 'max_score': 219.715, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BGhlHsFBhLE/pics/BGhlHsFBhLE219715.jpg'}, {'end': 479.54, 'src': 'embed', 'start': 455.663, 'weight': 5, 'content': [{'end': 472.368, 'text': 'the challenge has come where we need effective messaging systems or platform which can capture the big data generating sources and try to analyze and present all the rightful information to the rightful sources at the right time.', 'start': 455.663, 'duration': 16.705}, {'end': 474.649, 'text': "So that's the primary challenge.", 'start': 472.888, 'duration': 1.761}, {'end': 479.54, 'text': "That's what has been people trying to do with various systems.", 'start': 476.297, 'duration': 3.243}], 'summary': 'The challenge is to create an effective messaging platform for big data analysis and presentation.', 'duration': 23.877, 'max_score': 455.663, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BGhlHsFBhLE/pics/BGhlHsFBhLE455663.jpg'}, {'end': 541.942, 'src': 'embed', 'start': 506.653, 'weight': 7, 'content': [{'end': 512.015, 'text': 'So all these aspects has been challenged and people used to manage 
it in one way or the other way.', 'start': 506.653, 'duration': 5.362}, {'end': 525.565, 'text': "Okay So what I'm gonna do is maybe one or two questions, I would take it in between, but most of the questions I'm gonna park it at the end.", 'start': 515.379, 'duration': 10.186}, {'end': 529.407, 'text': 'Kafka is not just for instrumentation and capturing the data.', 'start': 525.965, 'duration': 3.442}, {'end': 541.942, 'text': 'It is a messaging platform where it instruments and collects data from various sources and acts as a message broker to giving it to various other consumers.', 'start': 529.948, 'duration': 11.994}], 'summary': 'Kafka is a messaging platform that collects and distributes data from various sources.', 'duration': 35.289, 'max_score': 506.653, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BGhlHsFBhLE/pics/BGhlHsFBhLE506653.jpg'}, {'end': 649.809, 'src': 'embed', 'start': 627.884, 'weight': 6, 'content': [{'end': 637.969, 'text': 'These are all the kind of activity which is being really captured at real time and they are trying to put restrictions based on your usage or based on your pattern.', 'start': 627.884, 'duration': 10.085}, {'end': 649.809, 'text': 'And also you have noticed where there are few such relevant sites where they do go ahead and make auto-complete recommendations.', 'start': 638.823, 'duration': 10.986}], 'summary': 'Real-time activity captured with usage restrictions and auto-complete recommendations.', 'duration': 21.925, 'max_score': 627.884, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BGhlHsFBhLE/pics/BGhlHsFBhLE627884.jpg'}], 'start': 146.383, 'title': 'Data activity and aggregation challenges', 'summary': 'Discusses the exponential growth of data activity in the smart world, as well as the challenges of managing increasing data aggregation, impacting revenue, quality, and security optimizations, with examples of real-time analytics and data usage 
restrictions.', 'chapters': [{'end': 364.875, 'start': 146.383, 'title': 'Rise of data activity in the smart world', 'summary': 'Discusses the exponential growth of data activity and the capability to capture and aggregate human interactions in the smart world, exemplifying the massive data potential from user interactions with internet applications and mobile devices, as well as the monitoring of various systems and networks.', 'duration': 218.492, 'highlights': ['The capability to capture and aggregate human interactions with computers and electronics, enabling the monitoring of every human interaction and change of state. The data capability allows for monitoring and capturing every human interaction with computers and electronics, enabling the comprehensive analysis of user behavior and system changes.', 'The potential to gather exponential data from user interactions with internet applications, exemplifying the massive data potential from user interactions with internet applications and the patterns of a particular user. The exponential data potential is evident from the example of 10 million users visiting an internet application, showcasing the enormous data gathering potential and user behavior analysis.', "The enormous data activity and growth from multiple systems, components, and user interactions, highlighting the limitless data potential in today's world. The data activity and growth are immense, with multiple systems and user interactions leading to enormous data potential and activity history, showcasing the scale of data growth in today's world.", 'The capability to capture every phone record and call records of mobile users, demonstrating the extensive data capture capabilities of mobile network operators. 
The extensive data capture capabilities of mobile network operators are evident from the ability to capture every phone record, call records, and user activity on mobile and messaging platforms.', 'The capturing of user activity history from DTH and set-top box connections, showcasing the extensive data capture in monitoring user behavior. The capturing of user activity history from DTH and set-top box connections exemplifies the extensive data capture in monitoring user behavior and system interactions.']}, {'end': 671.997, 'start': 364.916, 'title': 'Challenges with data aggregation', 'summary': 'Discusses the challenges of managing increasing data aggregation and the need for effective messaging systems to process and utilize the data for business needs, impacting revenue, quality, and security optimizations, with examples of real-time analytics and data usage restrictions.', 'duration': 307.081, 'highlights': ['The need for effective messaging systems to process and utilize increasing data for business needs. The chapter emphasizes the challenge of connecting increasing data from various systems to the right channels or places for processing and utilization, impacting revenue, quality, and security optimizations.', 'Examples of real-time analytics and data usage restrictions. The transcript provides examples of real-time data usage, such as targeted ads based on user interests, restrictions on data consumption patterns, and auto-complete recommendations on online platforms.', 'The role of Kafka as a messaging platform for data collection and distribution. 
Kafka is highlighted as a messaging platform that collects data from various sources and acts as a message broker to distribute data to various consumers, emphasizing its role in data aggregation and distribution.']}], 'duration': 525.614, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BGhlHsFBhLE/pics/BGhlHsFBhLE146383.jpg', 'highlights': ['The capability to capture and aggregate human interactions with computers and electronics, enabling the monitoring of every human interaction and change of state.', 'The potential to gather exponential data from user interactions with internet applications, exemplifying the massive data potential from user interactions with internet applications and the patterns of a particular user.', "The enormous data activity and growth from multiple systems, components, and user interactions, highlighting the limitless data potential in today's world.", 'The capability to capture every phone record and call records of mobile users, demonstrating the extensive data capture capabilities of mobile network operators.', 'The capturing of user activity history from DTH and set-top box connections, showcasing the extensive data capture in monitoring user behavior.', 'The need for effective messaging systems to process and utilize increasing data for business needs.', 'Examples of real-time analytics and data usage restrictions.', 'The role of Kafka as a messaging platform for data collection and distribution.']}, {'end': 1458.325, 'segs': [{'end': 785.759, 'src': 'embed', 'start': 739.435, 'weight': 1, 'content': [{'end': 751.582, 'text': 'there will be heterogeneous distributed systems which are trying to consume the events or messages from various sources and trying to analyze that and trying to make the business decision or business information.', 'start': 739.435, 'duration': 12.147}, {'end': 754.203, 'text': 'So the world has changed drastically.', 'start': 752.042, 'duration': 2.161}, {'end': 759.944, 'text': 
'from a point to point messaging, the world has evolved into a multi point to point,', 'start': 754.203, 'duration': 5.741}, {'end': 765.726, 'text': 'where the data has to be routed or managed through different systems.', 'start': 759.944, 'duration': 5.782}, {'end': 777.295, 'text': 'Because of all these things, biggest need to have a proper scalable or a highly scalable distributed messaging system was the requirement.', 'start': 766.326, 'duration': 10.969}, {'end': 784.279, 'text': 'Okay, so if that is the problem, so it has to be highly scalable.', 'start': 777.315, 'duration': 6.964}, {'end': 785.759, 'text': 'it has to be properly distributed,', 'start': 784.279, 'duration': 1.48}], 'summary': 'Need for highly scalable distributed messaging system due to evolution from point to point messaging to multi point to point.', 'duration': 46.324, 'max_score': 739.435, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BGhlHsFBhLE/pics/BGhlHsFBhLE739435.jpg'}, {'end': 838.463, 'src': 'embed', 'start': 812.129, 'weight': 0, 'content': [{'end': 818.874, 'text': 'So the answer, at least for this session, seems to be pretty natural.', 'start': 812.129, 'duration': 6.745}, {'end': 820.916, 'text': "It's going to be the Apache Kafka.", 'start': 819.054, 'duration': 1.862}, {'end': 829.218, 'text': "So, with so much of data coming in with all these kind of challenges, We're gonna make a recommendation.", 'start': 820.956, 'duration': 8.262}, {'end': 833.5, 'text': 'Apache Kafka fits the bill in this kind of scenarios.', 'start': 829.218, 'duration': 4.282}, {'end': 838.463, 'text': "So let's see what exactly Apache Kafka is all about and what's the buzz going around this.", 'start': 833.92, 'duration': 4.543}], 'summary': 'Apache kafka is recommended for handling large volumes of data and challenges.', 'duration': 26.334, 'max_score': 812.129, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BGhlHsFBhLE/pics/BGhlHsFBhLE812129.jpg'}, {'end': 1008.733, 'src': 'embed', 'start': 982.172, 'weight': 3, 'content': [{'end': 992.057, 'text': 'the project intention was to have a unified distributed messaging platform which has a very high throughput for, say,', 'start': 982.172, 'duration': 9.885}, {'end': 995.418, 'text': 'millions of or billions of messages per day.', 'start': 992.057, 'duration': 3.361}, {'end': 1008.733, 'text': 'and also had a very should have a very low latency for all publishing or consumption and it should really really handle the real time activity data stream from different systems.', 'start': 996.247, 'duration': 12.486}], 'summary': 'Project aimed for high throughput, low latency messaging platform for millions of messages daily.', 'duration': 26.561, 'max_score': 982.172, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BGhlHsFBhLE/pics/BGhlHsFBhLE982172.jpg'}, {'end': 1063.321, 'src': 'embed', 'start': 1039.558, 'weight': 4, 'content': [{'end': 1056.354, 'text': 'So the commit ahead logs or the transaction logs are primarily used in most of the transaction or OLTP based system to record all the changes or events which is happening in the OLTP databases.', 'start': 1039.558, 'duration': 16.796}, {'end': 1063.321, 'text': 'So this was the technique which most of the databases or NoSQLs use it to make their data consistent.', 'start': 1056.394, 'duration': 6.927}], 'summary': 'Transaction logs record changes in oltp databases for consistency.', 'duration': 23.763, 'max_score': 1039.558, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BGhlHsFBhLE/pics/BGhlHsFBhLE1039558.jpg'}, {'end': 1113.943, 'src': 'embed', 'start': 1089.674, 'weight': 5, 'content': [{'end': 1098.418, 'text': 'but potentially it should run on multiple machines with a cluster-centric design, where, if you run on distributed machines,', 'start': 
1089.674, 'duration': 8.744}, {'end': 1103.18, 'text': 'then eventually the partitioning challenges comes into the play.', 'start': 1098.418, 'duration': 4.762}, {'end': 1113.943, 'text': 'So the partition is if you have a huge data set In your cluster, you cannot keep that entire dataset cluster in one machine or in a one node.', 'start': 1103.74, 'duration': 10.203}], 'summary': 'Cluster-centric design enables running on multiple machines, addressing partitioning challenges for huge datasets.', 'duration': 24.269, 'max_score': 1089.674, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BGhlHsFBhLE/pics/BGhlHsFBhLE1089674.jpg'}, {'end': 1332.516, 'src': 'heatmap', 'start': 1239.297, 'weight': 1, 'content': [{'end': 1252.257, 'text': "Okay? Fine, so let's briefly also talk about the little bit design or the architectural aspects of Kafka.", 'start': 1239.297, 'duration': 12.96}, {'end': 1256.859, 'text': 'So Kafka primarily has five components in the cluster.', 'start': 1253.418, 'duration': 3.441}, {'end': 1259.76, 'text': 'So the first one is Zookeeper.', 'start': 1257.439, 'duration': 2.321}, {'end': 1267.502, 'text': 'Zookeeper primarily is used as a configuration and the registry index kind of a service.', 'start': 1259.8, 'duration': 7.702}, {'end': 1270.443, 'text': "I'll just briefly talk about Zookeeper as well.", 'start': 1268.322, 'duration': 2.121}, {'end': 1274.824, 'text': 'And the second one is the broker and the third one is the topic.', 'start': 1271.103, 'duration': 3.721}, {'end': 1277.645, 'text': 'The fourth one is the producer and the consumer.', 'start': 1275.324, 'duration': 2.321}, {'end': 1291.151, 'text': 'So a topic is nothing but, like many other publish, subscribe messaging system, it maintains the messages or the bunch of messages in its repository.', 'start': 1278.686, 'duration': 12.465}, {'end': 1295.093, 'text': 'So topic, I think you guys are all very familiar about it.', 'start': 1292.252, 'duration': 
2.841}, {'end': 1302.996, 'text': 'So Kafka also uses the concept of topics, but the topics can be partitioned and distributed in multiple machines.', 'start': 1295.473, 'duration': 7.523}, {'end': 1304.457, 'text': "That's the difference.", 'start': 1303.176, 'duration': 1.281}, {'end': 1317.589, 'text': 'And the producers, the producers are the ones which processes and publishes the incoming message or the activity data to the broker or to the cluster.', 'start': 1305.437, 'duration': 12.152}, {'end': 1332.516, 'text': 'And consumers are the processes which subscribe to the topics and pull all the messages or the items or the data from the topics.', 'start': 1320.51, 'duration': 12.006}], 'summary': 'Kafka has five components: zookeeper, broker, topic, producer, and consumer.', 'duration': 93.219, 'max_score': 1239.297, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BGhlHsFBhLE/pics/BGhlHsFBhLE1239297.jpg'}, {'end': 1291.151, 'src': 'embed', 'start': 1253.418, 'weight': 6, 'content': [{'end': 1256.859, 'text': 'So Kafka primarily has five components in the cluster.', 'start': 1253.418, 'duration': 3.441}, {'end': 1259.76, 'text': 'So the first one is Zookeeper.', 'start': 1257.439, 'duration': 2.321}, {'end': 1267.502, 'text': 'Zookeeper primarily is used as a configuration and the registry index kind of a service.', 'start': 1259.8, 'duration': 7.702}, {'end': 1270.443, 'text': "I'll just briefly talk about Zookeeper as well.", 'start': 1268.322, 'duration': 2.121}, {'end': 1274.824, 'text': 'And the second one is the broker and the third one is the topic.', 'start': 1271.103, 'duration': 3.721}, {'end': 1277.645, 'text': 'The fourth one is the producer and the consumer.', 'start': 1275.324, 'duration': 2.321}, {'end': 1291.151, 'text': 'So a topic is nothing but, like many other publish, subscribe messaging system, it maintains the messages or the bunch of messages in its repository.', 'start': 1278.686, 'duration': 12.465}], 
'summary': 'Kafka cluster has 5 components: zookeeper, broker, topic, producer, and consumer.', 'duration': 37.733, 'max_score': 1253.418, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BGhlHsFBhLE/pics/BGhlHsFBhLE1253418.jpg'}], 'start': 672.077, 'title': 'Challenges and architecture of apache kafka', 'summary': "Discusses the challenges of distributed messaging systems in today's world, emphasizing the need for a highly scalable system, and recommends apache kafka as a solution. it also explains the architecture of apache kafka, highlighting its features including high throughput, low latency, real-time data stream handling, transaction logs, scalability, and durability.", 'chapters': [{'end': 867.347, 'start': 672.077, 'title': 'Challenges of distributed messaging systems', 'summary': "Discusses the evolution from point-to-point messaging to multi point-to-point systems in today's world, highlighting the need for a highly scalable distributed messaging system and recommending apache kafka as a solution for managing real-time activities and data from multiple sources.", 'duration': 195.27, 'highlights': ['Apache Kafka recommended as a solution for managing real-time activities from multiple sources The chapter recommends Apache Kafka as a solution for managing real-time activities with a high volume of data and challenges of multiple point-to-point systems.', "Evolution from point-to-point messaging to multi point-to-point systems in today's world The chapter emphasizes the shift from point-to-point messaging to multi point-to-point systems in today's world, where multiple systems generate and consume events or messages.", 'Need for a highly scalable distributed messaging system highlighted The chapter highlights the need for a highly scalable distributed messaging system to manage a big amount of data coming in from various sources and to handle failovers and other aspects.']}, {'end': 1458.325, 'start': 867.347, 'title': 
'Understanding apache kafka architecture', 'summary': 'Discusses apache kafka, an open-source message broker and publisher-subscriber system developed to handle high throughput, low latency, and real-time data stream from different systems, primarily designed with the concept of transaction logs and distributed aspect for partitioning challenges, offering persistent messaging, high throughput, scalability, and durability.', 'duration': 590.978, 'highlights': ["Apache Kafka's Objective Apache Kafka was built to provide a unified distributed messaging platform with very high throughput for millions or billions of messages per day, low latency for all publishing or consumption, and handling real-time activity data stream from different systems.", "Utilization of Transaction Logs Apache Kafka's design approach heavily relied on the concept of transaction logs, also known as commit logs or write-ahead logs, for efficient disk utilization and data consistency, similar to the techniques used in most transaction or OLTP-based systems.", 'Distributed Aspect and Partitioning Challenges Apache Kafka focused on the distributed aspect with a cluster-centric design, utilizing partitioning to address the challenge of handling huge datasets across multiple machines for elastic growth, high availability, and high throughput, while persisting data into transaction logs.', 'Components of Kafka Architecture Kafka primarily comprises five components: Zookeeper for configuration and registry index, broker for message storage, topic for maintaining messages, producer for processing and publishing messages, and consumer for subscribing to topics and pulling data, with communication happening over the TCP protocol.']}], 'duration': 786.248, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BGhlHsFBhLE/pics/BGhlHsFBhLE672077.jpg', 'highlights': ['Apache Kafka recommended as a solution for managing real-time activities from multiple sources', "Evolution from 
point-to-point messaging to multi point-to-point systems in today's world", 'Need for a highly scalable distributed messaging system highlighted', "Apache Kafka's Objective: built to provide a unified distributed messaging platform with very high throughput", 'Utilization of Transaction Logs for efficient disk utilization and data consistency', "Distributed Aspect and Partitioning Challenges addressed by Kafka's cluster-centric design", 'Components of Kafka Architecture: Zookeeper, broker, topic, producer, and consumer']}, {'end': 1927.788, 'segs': [{'end': 1505.83, 'src': 'embed', 'start': 1479.145, 'weight': 0, 'content': [{'end': 1491.212, 'text': 'So how is Zookeeper related to Kafka? So Zookeeper is an independent Apache project which Kafka recommends and incorporates for its internal use.', 'start': 1479.145, 'duration': 12.067}, {'end': 1503.209, 'text': 'So again, as I mentioned, it is an open source project which is highly available system, primarily used for coordination and lookup service,', 'start': 1491.952, 'duration': 11.257}, {'end': 1505.83, 'text': 'as a registry index in a distributed system.', 'start': 1503.209, 'duration': 2.621}], 'summary': 'Zookeeper is an independent apache project recommended by kafka for coordination and lookup service in a distributed system.', 'duration': 26.685, 'max_score': 1479.145, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BGhlHsFBhLE/pics/BGhlHsFBhLE1479145.jpg'}, {'end': 1624.826, 'src': 'embed', 'start': 1589.185, 'weight': 2, 'content': [{'end': 1597.447, 'text': 'it may be efficient if you store it in a distributed platform where that can be dynamically changed or where that can be programmatically changed.', 'start': 1589.185, 'duration': 8.262}, {'end': 1610.918, 'text': 'So ZooKeeper offers you this kind of flexibility where you can read the configurations or the service lookup from a distributed system where the ZooKeeper offers all those things.', 'start': 1597.787, 
'duration': 13.131}, {'end': 1624.826, 'text': 'So ZooKeeper is also like a very quick index or a quick access, cache access where you can get all these things.', 'start': 1611.218, 'duration': 13.608}], 'summary': 'Zookeeper provides flexible and efficient distributed storage and quick access for configurations and service lookup.', 'duration': 35.641, 'max_score': 1589.185, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BGhlHsFBhLE/pics/BGhlHsFBhLE1589185.jpg'}, {'end': 1673.498, 'src': 'embed', 'start': 1647.153, 'weight': 3, 'content': [{'end': 1651.737, 'text': 'So all those things say the naming service or the registry lookup.', 'start': 1647.153, 'duration': 4.584}, {'end': 1660.344, 'text': 'So what Zookeeper plays here in Kafka is basically it serves as a coordination between the nodes in the entire cluster.', 'start': 1652.858, 'duration': 7.486}, {'end': 1664.591, 'text': 'Even a publisher can interact with Zookeeper.', 'start': 1661.869, 'duration': 2.722}, {'end': 1673.498, 'text': 'The brokers or the cluster nodes can also interact with the Zookeeper and at times even the consumer can interact with the Zookeeper.', 'start': 1665.252, 'duration': 8.246}], 'summary': 'Zookeeper coordinates nodes in kafka cluster for registry and interaction', 'duration': 26.345, 'max_score': 1647.153, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BGhlHsFBhLE/pics/BGhlHsFBhLE1647153.jpg'}, {'end': 1756.16, 'src': 'embed', 'start': 1727.726, 'weight': 1, 'content': [{'end': 1730.187, 'text': 'So that is one of the reasons why we need the zookeeper.', 'start': 1727.726, 'duration': 2.461}, {'end': 1738.104, 'text': 'And when it comes to the brokers, what brokers does is when there is a lead broker identified for a topic.', 'start': 1731.058, 'duration': 7.046}, {'end': 1742.048, 'text': 'OK And, as I mentioned,', 'start': 1738.705, 'duration': 3.343}, {'end': 1751.356, 'text': 'Apache Kafka also provides high 
availability by duplicating the data or replicating the data into multiple nodes or multiple brokers.', 'start': 1742.048, 'duration': 9.308}, {'end': 1756.16, 'text': 'So in that case, what happens is there is a lead broker and there are following brokers.', 'start': 1751.676, 'duration': 4.484}], 'summary': 'Apache kafka ensures high availability by duplicating data into multiple nodes or brokers.', 'duration': 28.434, 'max_score': 1727.726, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BGhlHsFBhLE/pics/BGhlHsFBhLE1727726.jpg'}, {'end': 1869.901, 'src': 'embed', 'start': 1837.347, 'weight': 4, 'content': [{'end': 1849.646, 'text': "So let's briefly talk about what are all the different ways you could potentially, potentially,", 'start': 1837.347, 'duration': 12.299}, {'end': 1858.973, 'text': 'set up the Kafka cluster and how you can leverage the Kafka for various use cases.', 'start': 1849.646, 'duration': 9.327}, {'end': 1865.457, 'text': 'So primarily there are three modes which you can use it based on your use case and your requirement.', 'start': 1859.573, 'duration': 5.884}, {'end': 1869.901, 'text': 'So the first one is the single node, single broker cluster.', 'start': 1865.998, 'duration': 3.903}], 'summary': 'Kafka cluster can be set up in three modes, including single node, single broker cluster.', 'duration': 32.554, 'max_score': 1837.347, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BGhlHsFBhLE/pics/BGhlHsFBhLE1837347.jpg'}], 'start': 1458.865, 'title': 'Zookeeper in kafka architecture', 'summary': "Explains the role of zookeeper in kafka, highlighting its recommendation by kafka, its use as a coordination and lookup service, and its role as a highly available system. 
it also discusses zookeeper's use for configuration management, service lookup, coordination between nodes, and identifying lead brokers, with a focus on three potential cluster modes for usage.", 'chapters': [{'end': 1505.83, 'start': 1458.865, 'title': 'Zookeeper in kafka architecture', 'summary': 'Explains the role of zookeeper in kafka, highlighting its recommendation by kafka, its use as a coordination and lookup service, and its role as a highly available system. zookeeper is an independent apache project recommended and incorporated by kafka, primarily used for coordination and lookup service in a distributed system.', 'duration': 46.965, 'highlights': ['Zookeeper is an independent Apache project which Kafka recommends and incorporates for its internal use.', 'It is an open source project which is highly available system, primarily used for coordination and lookup service, as a registry index in a distributed system.']}, {'end': 1927.788, 'start': 1506.19, 'title': 'Zookeeper in apache kafka', 'summary': 'Discusses the role of zookeeper in apache kafka, including its use for configuration management, service lookup, coordination between nodes, and identifying lead brokers, with a focus on three potential cluster modes for usage.', 'duration': 421.598, 'highlights': ["ZooKeeper's role in Apache Kafka ZooKeeper's role in Apache Kafka involves configuration management, service lookup, and coordination between nodes, with a focus on identifying lead brokers.", 'Usage of ZooKeeper for configuration management and service lookup ZooKeeper offers flexibility in storing and dynamically changing configurations in a distributed platform, providing quick access to configurations and acting as a registry lookup service.', "ZooKeeper's coordination between nodes in the entire cluster ZooKeeper serves as a coordination mechanism between nodes in the entire cluster, enabling interaction with publishers, brokers, and consumers, with a focus on identifying lead brokers and 
persisting message state.", 'Cluster modes for Kafka usage The chapter introduces three potential cluster modes for Kafka usage: single node, single broker cluster; multiple brokers in the same node; and multiple nodes, multiple brokers cluster, with a brief overview of their potential use cases.']}], 'duration': 468.923, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BGhlHsFBhLE/pics/BGhlHsFBhLE1458865.jpg', 'highlights': ['Zookeeper is an independent Apache project which Kafka recommends and incorporates for its internal use.', "ZooKeeper's role in Apache Kafka involves configuration management, service lookup, and coordination between nodes, with a focus on identifying lead brokers.", 'Usage of ZooKeeper for configuration management and service lookup ZooKeeper offers flexibility in storing and dynamically changing configurations in a distributed platform, providing quick access to configurations and acting as a registry lookup service.', "ZooKeeper's coordination between nodes in the entire cluster ZooKeeper serves as a coordination mechanism between nodes in the entire cluster, enabling interaction with publishers, brokers, and consumers, with a focus on identifying lead brokers and persisting message state.", 'Cluster modes for Kafka usage The chapter introduces three potential cluster modes for Kafka usage: single node, single broker cluster; multiple brokers in the same node; and multiple nodes, multiple brokers cluster, with a brief overview of their potential use cases.']}, {'end': 2651.162, 'segs': [{'end': 2007.451, 'src': 'embed', 'start': 1974.97, 'weight': 6, 'content': [{'end': 1984.858, 'text': 'So Apache Kafka also adapts the same technique or the principle where all the components or all the processes run in the single node.', 'start': 1974.97, 'duration': 9.888}, {'end': 1992.219, 'text': 'Okay, Apache Kafka also comes with very simple and clean configuration management,', 'start': 1985.559, 'duration': 6.66}, 
{'end': 2005.429, 'text': 'where it does support all the configurations or all the properties required for us to run the entire cluster in a single node,', 'start': 1992.219, 'duration': 13.21}, {'end': 2007.451, 'text': 'single broker cluster architecture.', 'start': 2005.429, 'duration': 2.022}], 'summary': 'Apache kafka uses single broker cluster architecture with simple configuration management.', 'duration': 32.481, 'max_score': 1974.97, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BGhlHsFBhLE/pics/BGhlHsFBhLE1974970.jpg'}, {'end': 2217.475, 'src': 'embed', 'start': 2185.615, 'weight': 4, 'content': [{'end': 2195.125, 'text': "So let's assume we have created a topic, and that topic potentially is partitioned onto broker one and two.", 'start': 2185.615, 'duration': 9.51}, {'end': 2203.354, 'text': 'And based on the partition, the partitions most of the time work on a key.', 'start': 2196.187, 'duration': 7.167}, {'end': 2205.517, 'text': 'So, based on the key,', 'start': 2203.955, 'duration': 1.562}, {'end': 2217.475, 'text': 'the producer either it can interact with broker one as a lead for that particular replica set or broker two as a lead for a different partition or a replica set.', 'start': 2205.517, 'duration': 11.958}], 'summary': 'Topic partitioned onto broker one and two, producer interacts based on key', 'duration': 31.86, 'max_score': 2185.615, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BGhlHsFBhLE/pics/BGhlHsFBhLE2185615.jpg'}, {'end': 2356.571, 'src': 'embed', 'start': 2332.627, 'weight': 3, 'content': [{'end': 2343.234, 'text': 'The consumer interacts with the Zookeeper and tries to get the state of that particular topic and thereafter it tries to reach to the brokers to fetch the messages.', 'start': 2332.627, 'duration': 10.607}, {'end': 2347.437, 'text': 'It pulls the messages from the brokers or the cluster.', 'start': 2343.394, 'duration': 4.043}, {'end': 2356.571, 'text': 
'So another one good difference from traditional publish-subscribe model is in Kafka.', 'start': 2348.004, 'duration': 8.567}], 'summary': 'Consumer interacts with zookeeper to get topic state and fetch messages from brokers in kafka.', 'duration': 23.944, 'max_score': 2332.627, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BGhlHsFBhLE/pics/BGhlHsFBhLE2332627.jpg'}, {'end': 2470.993, 'src': 'embed', 'start': 2403.475, 'weight': 2, 'content': [{'end': 2410.943, 'text': "Since this is just the introductory session, we're gonna touch base very at the high level, okay,", 'start': 2403.475, 'duration': 7.468}, {'end': 2417.687, 'text': 'and also the producers can send the messages in batches.', 'start': 2410.943, 'duration': 6.744}, {'end': 2427.672, 'text': 'most of the systems, or most of the messaging platform lack this capability where they cannot send the messages or the data in batches.', 'start': 2417.687, 'duration': 9.985}, {'end': 2436.382, 'text': 'Kafka is really good in bundling the message batches when there is real time data coming in,', 'start': 2427.672, 'duration': 8.71}, {'end': 2441.607, 'text': 'which helps in lot of network latencies or lot of round trips.', 'start': 2436.382, 'duration': 5.225}, {'end': 2448.312, 'text': "Okay, so that's how the single node multiple cluster happens.", 'start': 2442.788, 'duration': 5.524}, {'end': 2452.416, 'text': 'Okay, great.', 'start': 2451.935, 'duration': 0.481}, {'end': 2458.501, 'text': "So let's briefly also touch base on the multiple node multiple brokers model.", 'start': 2452.696, 'duration': 5.805}, {'end': 2470.993, 'text': 'So, primarily, this kind of setup is recommended for production use and this is the way how you could have any number of partitions,', 'start': 2459.266, 'duration': 11.727}], 'summary': "Introduction to kafka's capability to send messages in batches and its benefits in reducing network latencies and round trips.", 'duration': 67.518, 'max_score': 
2403.475, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BGhlHsFBhLE/pics/BGhlHsFBhLE2403475.jpg'}, {'end': 2525.358, 'src': 'embed', 'start': 2498.811, 'weight': 0, 'content': [{'end': 2506.597, 'text': 'with this kind of a model where you could read the same topic from multiple brokers by the same consumer,', 'start': 2498.811, 'duration': 7.786}, {'end': 2515.211, 'text': 'it increases the throughput of your in incoming items or your processing capability, Okay.', 'start': 2506.597, 'duration': 8.614}, {'end': 2525.358, 'text': 'and also, since it also persists the data in a transaction log format, your entire messages are durable, even during crash,', 'start': 2515.211, 'duration': 10.147}], 'summary': 'Model increases throughput, processing capability, and ensures message durability.', 'duration': 26.547, 'max_score': 2498.811, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BGhlHsFBhLE/pics/BGhlHsFBhLE2498811.jpg'}, {'end': 2585.513, 'src': 'embed', 'start': 2558.412, 'weight': 1, 'content': [{'end': 2563.175, 'text': 'So there are few techniques or few configurations where you fine tune.', 'start': 2558.412, 'duration': 4.763}, {'end': 2569.539, 'text': 'the entire model of Kafka itself can be fine tuned or it can be augmented in a very small way.', 'start': 2563.175, 'duration': 6.364}, {'end': 2579.928, 'text': 'So, in this model, most of the time it is recommended that you have lot many machines and run all these processes,', 'start': 2571.061, 'duration': 8.867}, {'end': 2585.513, 'text': 'or all these primarily the brokers on different machines or on different nodes.', 'start': 2579.928, 'duration': 5.585}], 'summary': 'Kafka model can be fine-tuned for performance, recommended to run brokers on many machines or nodes.', 'duration': 27.101, 'max_score': 2558.412, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BGhlHsFBhLE/pics/BGhlHsFBhLE2558412.jpg'}], 'start': 
1928.268, 'title': 'Kafka cluster modes and producer-broker interaction', 'summary': 'Explains the setup and configuration of a single node, single broker cluster and briefly touches on the single node, multiple brokers mode, and explores how producers and consumers interact with multiple brokers in a kafka cluster, including the role of zookeeper, interaction patterns, and the benefits of batch message sending and multiple broker setup for high availability and scalability.', 'chapters': [{'end': 2183.849, 'start': 1928.268, 'title': 'Kafka cluster modes', 'summary': 'Explains the setup and configuration of a single node, single broker cluster and briefly touches on the single node, multiple brokers mode, highlighting the steps and differences between the two modes.', 'duration': 255.581, 'highlights': ["Apache Kafka supports a single node, single broker cluster mode, ideal for learning or building applications, with simple and clean configuration management. This mode allows all components to run in a single virtual machine, similar to Hadoop's standalone or pseudo distributed mode, with straightforward configuration management.", 'In the single node, single broker cluster mode, the process involves starting zookeeper, then the server or broker, creating a topic, and starting a producer and consumer individually. The setup process includes starting zookeeper, server/broker, creating a topic, and then starting a producer and consumer, enabling the publishing and subscribing of messages.', 'The single node, multiple brokers mode allows for the configuration of multiple brokers, enabling replication and fault tolerance, with the same setup process as the single node, single broker cluster mode. 
This mode permits the configuration of multiple brokers in a single machine, providing options for replication and fault tolerance, while the setup process remains similar to the single node, single broker cluster mode.']}, {'end': 2651.162, 'start': 2185.615, 'title': 'Kafka: producer-broker interaction', 'summary': 'Explores how producers and consumers interact with multiple brokers in a kafka cluster, including the role of zookeeper, the interaction patterns, and the benefits of batch message sending and multiple broker setup for high availability and scalability.', 'duration': 465.547, 'highlights': ['The producer interacts with brokers based on the partition and key of the topic, allowing for interaction with different brokers depending on the lead for a particular replica set. Producers interact with different brokers based on the partition and key of the topic, allowing for interaction with different brokers depending on the lead for a particular replica set.', 'The consumers interact with Zookeeper to get the state of a topic and then pull messages from the brokers, using topic offsets to fetch messages and avoiding continuous pulling for overhead reduction. Consumers interact with Zookeeper to get the state of a topic and then pull messages from brokers, using topic offsets to fetch messages and avoiding continuous pulling for overhead reduction.', 'Kafka allows producers to send messages in batches, which reduces network latencies and round trips, enhancing efficiency in real-time data processing. Kafka allows producers to send messages in batches, reducing network latencies and round trips, enhancing efficiency in real-time data processing.', 'The multiple broker model is recommended for production use, offering scalability, high availability, failover, and durability of messages even during a crash. 
The multiple broker model is recommended for production use, offering scalability, high availability, failover, and durability of messages even during a crash.', 'Running Kafka on different configurable modes and fine-tuning the model to run brokers on different machines enhances the overall performance and scalability. Running Kafka on different configurable modes and fine-tuning the model to run brokers on different machines enhances the overall performance and scalability.']}], 'duration': 722.894, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BGhlHsFBhLE/pics/BGhlHsFBhLE1928268.jpg', 'highlights': ['The multiple broker model is recommended for production use, offering scalability, high availability, failover, and durability of messages even during a crash.', 'Running Kafka on different configurable modes and fine-tuning the model to run brokers on different machines enhances the overall performance and scalability.', 'Kafka allows producers to send messages in batches, reducing network latencies and round trips, enhancing efficiency in real-time data processing.', 'The consumers interact with Zookeeper to get the state of a topic and then pull messages from brokers, using topic offsets to fetch messages and avoiding continuous pulling for overhead reduction.', 'The producer interacts with brokers based on the partition and key of the topic, allowing for interaction with different brokers depending on the lead for a particular replica set.', 'The single node, multiple brokers mode allows for the configuration of multiple brokers, enabling replication and fault tolerance, with the same setup process as the single node, single broker cluster mode.', 'Apache Kafka supports a single node, single broker cluster mode, ideal for learning or building applications, with simple and clean configuration management.']}, {'end': 3184.249, 'segs': [{'end': 2694.549, 'src': 'embed', 'start': 2651.162, 'weight': 2, 'content': [{'end': 2666.268, 
'text': 'the consumer follows the same technique of interacting with the zookeeper and identifying to which broker it has to interact to get the data from the broker or the clusters.', 'start': 2651.162, 'duration': 15.106}, {'end': 2678.973, 'text': 'So primarily in this model, replications supports the failover and all the lead broker management, everything is handled by the zookeeper.', 'start': 2667.408, 'duration': 11.565}, {'end': 2694.549, 'text': "Great So let's briefly talk about one of the interesting use cases.", 'start': 2683.302, 'duration': 11.247}], 'summary': 'Zookeeper manages broker interactions for data retrieval and failover in a model supporting replications.', 'duration': 43.387, 'max_score': 2651.162, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BGhlHsFBhLE/pics/BGhlHsFBhLE2651162.jpg'}, {'end': 2750.461, 'src': 'embed', 'start': 2718.933, 'weight': 1, 'content': [{'end': 2728.54, 'text': 'LinkedIn uses internally Apache Kafka to stream all the activity history and provide all these insights in near real time.', 'start': 2718.933, 'duration': 9.607}, {'end': 2742.837, 'text': 'So LinkedIn trains their machine learning models against all the activity and the user interactions with the system and it tries to predict the connections of,', 'start': 2729.12, 'duration': 13.717}, {'end': 2750.461, 'text': 'say, for example, if you are doing this kind of activity, maybe you should try to say look at these people.', 'start': 2742.837, 'duration': 7.624}], 'summary': 'Linkedin uses apache kafka to stream activity history and train machine learning models for real-time insights.', 'duration': 31.528, 'max_score': 2718.933, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BGhlHsFBhLE/pics/BGhlHsFBhLE2718933.jpg'}, {'end': 2799.082, 'src': 'embed', 'start': 2774.818, 'weight': 0, 'content': [{'end': 2782.562, 'text': 'it relies heavily on Kafka to connect all these systems for processing and 
analyze and present this information.', 'start': 2774.818, 'duration': 7.744}, {'end': 2789.465, 'text': 'And also there is another very popular feature on LinkedIn,', 'start': 2784.963, 'duration': 4.502}, {'end': 2799.082, 'text': 'where Kafka is heavily used is on the activity-driven newsfeed based on the relevant occurrences in the social network.', 'start': 2789.465, 'duration': 9.617}], 'summary': 'Kafka is heavily relied upon to connect systems and process information, especially for the activity-driven newsfeed feature on linkedin.', 'duration': 24.264, 'max_score': 2774.818, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BGhlHsFBhLE/pics/BGhlHsFBhLE2774818.jpg'}, {'end': 2985.583, 'src': 'embed', 'start': 2860.48, 'weight': 3, 'content': [{'end': 2873.687, 'text': 'And they also store this entire activity history in Hadoop for later analysis or later processing required for various other aspects or various other needs.', 'start': 2860.48, 'duration': 13.207}, {'end': 2879.293, 'text': 'Great Fine.', 'start': 2874.847, 'duration': 4.446}, {'end': 2889.878, 'text': 'so apart from using it for their functional needs, LinkedIn also uses Kafka primarily to do a lot of system monitoring,', 'start': 2879.293, 'duration': 10.585}, {'end': 2898.641, 'text': 'application monitoring and providing all the application and system metrics to their businesses.', 'start': 2889.878, 'duration': 8.763}, {'end': 2907.585, 'text': "Okay So I think we spoke a lot about LinkedIn, so I'm just gonna briefly touch base.", 'start': 2899.962, 'duration': 7.623}, {'end': 2918.685, 'text': 'So apart from LinkedIn, there are a bunch of people, I think, including my own company, who started adopting Kafka.', 'start': 2908.323, 'duration': 10.362}, {'end': 2921.945, 'text': 'There are a bunch of companies which you guys can go through.', 'start': 2919.665, 'duration': 2.28}, {'end': 2930.827, 'text': 'So there are a few people like DataSift, which primarily uses it to 
collect all the social behavior or the social activity of the users,', 'start': 2922.485, 'duration': 8.342}, {'end': 2937.488, 'text': 'and they do provide a platform or an API access for businesses to analyze and consume that kind of data.', 'start': 2930.827, 'duration': 6.661}, {'end': 2946.583, 'text': 'Similarly, Wooga uses it for all their games hosted on Facebook, to track how things are going and all that kind of stuff.', 'start': 2937.915, 'duration': 8.668}, {'end': 2960.597, 'text': "I'm not gonna touch a lot of details about it, but I think you guys can always go and check the link where you could find loads and loads of use cases.", 'start': 2947.064, 'duration': 13.533}, {'end': 2964.306, 'text': 'And one important one I would like to touch on is Loggly.', 'start': 2961.263, 'duration': 3.043}, {'end': 2966.808, 'text': "I think I've used Loggly a while back.", 'start': 2964.386, 'duration': 2.422}, {'end': 2975.735, 'text': "It's a beautiful platform where it provides a neat framework for all your log management and reporting needs.", 'start': 2966.868, 'duration': 8.867}, {'end': 2983.521, 'text': 'So even Loggly uses Kafka for backend log collection and log movement to the processing center.', 'start': 2976.595, 'duration': 6.926}, {'end': 2985.583, 'text': 'So these are all some of these things.', 'start': 2984.362, 'duration': 1.221}], 'summary': 'Linkedin, datasift, wooga, and loggly use kafka for various purposes such as system monitoring, social behavior analysis, game tracking, and log management.', 'duration': 125.103, 'max_score': 2860.48, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BGhlHsFBhLE/pics/BGhlHsFBhLE2860480.jpg'}, {'end': 3087.223, 'src': 'embed', 'start': 3062.852, 'weight': 7, 'content': [{'end': 3074.718, 'text': 'So there is no comparison, but when it comes to the scale and when it comes to the throughput, Kafka really outperforms, say, ActiveMQ in some aspects.', 'start': 3062.852, 
'duration': 11.866}, {'end': 3082.701, 'text': 'And also, I would also state, ActiveMQ is a real contender or competitor to RabbitMQ,', 'start': 3075.578, 'duration': 7.123}, {'end': 3087.223, 'text': "rather than Kafka; I wouldn't put it head-on against Kafka at this point in time.", 'start': 3082.701, 'duration': 4.522}], 'summary': 'Kafka outperforms activemq in scale and throughput, while activemq competes with rabbitmq.', 'duration': 24.371, 'max_score': 3062.852, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BGhlHsFBhLE/pics/BGhlHsFBhLE3062852.jpg'}], 'start': 2651.162, 'title': "Linkedin's kafka usage and use cases", 'summary': "Explores linkedin's use of apache kafka to process 40-50 billion messages in real time, enabling activity-driven newsfeeds and personalized recommendations. additionally, it discusses various kafka use cases including those of linkedin, datasift, wooga, and loggly, and compares kafka with activemq, emphasizing its superior scalability and throughput.", 'chapters': [{'end': 2859.74, 'start': 2651.162, 'title': "Linkedin's kafka usage", 'summary': 'Discusses how linkedin utilizes apache kafka to process over 40 to 50 billion messages in real time, enabling features like activity-driven newsfeeds and personalized recommendations through machine learning models.', 'duration': 208.578, 'highlights': ['LinkedIn processes over 40 to 50 billion messages in real time using Kafka for features like activity-driven newsfeeds and personalized recommendations. LinkedIn handles more than 200K to 300K messages per second from various sources, delivering over 40 to 50 billion messages in real time for activities like activity-driven newsfeeds and personalized recommendations.', 'LinkedIn uses Kafka to stream all activity history and provide insights in near real time, training machine learning models against user interactions. 
LinkedIn leverages Kafka to stream activity history and train machine learning models for predicting connections and providing personalized content based on user interactions in near real time.', 'Zookeeper handles lead broker management and failover, enabling replication support in the Kafka model. Zookeeper manages lead broker and failover, supporting replication in the Kafka model.']}, {'end': 3184.249, 'start': 2860.48, 'title': 'Kafka use cases and activemq comparison', 'summary': "Discusses various use cases of kafka, including linkedin, datasift, wooga, and loggly, and briefly contrasts kafka with activemq, highlighting kafka's superior scalability and throughput.", 'duration': 323.769, 'highlights': ['LinkedIn uses Kafka for system and application monitoring, providing application and system metrics to their businesses. LinkedIn uses Kafka for system monitoring, application monitoring, and providing application and system metrics to their businesses.', 'DataSift primarily uses Kafka to collect social behavior and activity of users, providing a platform or API access for businesses to analyze and consume the data. DataSift uses Kafka to collect social behavior and activity of users, providing a platform or API access for businesses to analyze and consume the data.', 'Wooga uses Kafka for monitoring their games hosted on Facebook and other related activities. Wooga uses Kafka for monitoring their games hosted on Facebook and other related activities.', 'Loggly uses Kafka for backend log collection and movement to the processing center, providing a neat framework for log management and reporting needs. Loggly uses Kafka for backend log collection and movement to the processing center, providing a neat framework for log management and reporting needs.', "Kafka outperforms ActiveMQ in terms of scale and throughput, despite ActiveMQ's feature-rich platform. 
Kafka outperforms ActiveMQ in terms of scale and throughput, despite ActiveMQ's feature-rich platform."]}], 'duration': 533.087, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BGhlHsFBhLE/pics/BGhlHsFBhLE2651162.jpg', 'highlights': ['LinkedIn processes over 40 to 50 billion messages in real time using Kafka for features like activity-driven newsfeeds and personalized recommendations.', 'LinkedIn uses Kafka to stream all activity history and provide insights in near real time, training machine learning models against user interactions.', 'Zookeeper handles lead broker management and failover, enabling replication support in the Kafka model.', 'LinkedIn uses Kafka for system and application monitoring, providing application and system metrics to their businesses.', 'DataSift primarily uses Kafka to collect social behavior and activity of users, providing a platform or API access for businesses to analyze and consume the data.', 'Wooga uses Kafka for monitoring their games hosted on Facebook and other related activities.', 'Loggly uses Kafka for backend log collection and movement to the processing center, providing a neat framework for log management and reporting needs.', "Kafka outperforms ActiveMQ in terms of scale and throughput, despite ActiveMQ's feature-rich platform."]}, {'end': 4006.584, 'segs': [{'end': 3235.975, 'src': 'embed', 'start': 3207.682, 'weight': 2, 'content': [{'end': 3221.428, 'text': 'I would say RabbitMQ is still a very popular and preferred solution if you have a message pattern which has to be routed based on some conditions.', 'start': 3207.682, 'duration': 13.746}, {'end': 3227.231, 'text': 'So in those cases, yes, RabbitMQ plays a very good tool.', 'start': 3222.209, 'duration': 5.022}, {'end': 3235.975, 'text': "So I wouldn't prefer Kafka if you have to choose that based on some conditional routing.", 'start': 3227.811, 'duration': 8.164}], 'summary': 'Rabbitmq is preferred for conditional routing, kafka not 
recommended.', 'duration': 28.293, 'max_score': 3207.682, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BGhlHsFBhLE/pics/BGhlHsFBhLE3207682.jpg'}, {'end': 3331.296, 'src': 'embed', 'start': 3296.051, 'weight': 0, 'content': [{'end': 3300.176, 'text': "that's where you could see exponential growth on the throughput from the producers.", 'start': 3296.051, 'duration': 4.125}, {'end': 3305.341, 'text': 'So what it means is basically it is recommended.', 'start': 3300.876, 'duration': 4.465}, {'end': 3313.824, 'text': 'if you do the batching in the messaging from the producer, you would have the throughput, which can be exponentially increased.', 'start': 3305.341, 'duration': 8.483}, {'end': 3320.688, 'text': 'But being said that, we cannot just ignore the fact that the other systems does not provide the batching.', 'start': 3314.264, 'duration': 6.424}, {'end': 3331.296, 'text': 'So if you compare one item or one message versus ActiveMQ, Kafka, RabbitMQ, you would see Kafka still stays ahead, but not that significant.', 'start': 3321.029, 'duration': 10.267}], 'summary': 'Batching in messaging can exponentially increase throughput, with kafka leading among similar systems.', 'duration': 35.245, 'max_score': 3296.051, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BGhlHsFBhLE/pics/BGhlHsFBhLE3296051.jpg'}, {'end': 3404.809, 'src': 'embed', 'start': 3345.638, 'weight': 1, 'content': [{'end': 3349.021, 'text': 'Kafka seems to be a little bit more efficient on the storage format.', 'start': 3345.638, 'duration': 3.383}, {'end': 3360.832, 'text': 'On an average, it saves around, say, 140 bytes, so that seems to be very little, but when it comes to the volume of, say,', 'start': 3349.742, 'duration': 11.09}, {'end': 3369.299, 'text': 'millions of millions of messages per day or per hour, that could be your significant overhead or the significant cost factor.', 'start': 3360.832, 'duration': 8.467}, {'end': 
3375.262, 'text': "Okay. So let's quickly jump to the consumer test.", 'start': 3370.42, 'duration': 4.842}, {'end': 3378.765, 'text': "Similarly, I'm just briefly gonna talk about the consumer test.", 'start': 3375.782, 'duration': 2.983}, {'end': 3391.575, 'text': 'So in the consumer test, I think you should definitely see that Kafka just outperforms everything, because it reads the topic from various different brokers.', 'start': 3379.545, 'duration': 12.03}, {'end': 3400.442, 'text': 'So that is one of the reasons why, on throughput or the consumption aspect, Kafka outperforms any other system.', 'start': 3392.296, 'duration': 8.146}, {'end': 3404.809, 'text': 'There is no apples-to-apples comparison here.', 'start': 3402.148, 'duration': 2.661}], 'summary': "Kafka's storage efficiency saves about 140 bytes per message, a significant cost saving at millions of messages per day or hour; Kafka outperforms other systems in throughput and consumption.", 'duration': 59.171, 'max_score': 3345.638, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BGhlHsFBhLE/pics/BGhlHsFBhLE3345638.jpg'}], 'start': 3184.469, 'title': 'Messaging system comparison and benchmarking', 'summary': "Compares ActiveMQ, Kafka, and RabbitMQ, favoring RabbitMQ for conditional message routing, and presents benchmark results highlighting Kafka's superior throughput and consumption efficiency.", 'chapters': [{'end': 3235.975, 'start': 3184.469, 'title': 'Comparison of ActiveMQ, Kafka, and RabbitMQ', 'summary': 'Discusses the differences between ActiveMQ, Kafka, and RabbitMQ, highlighting RabbitMQ as a preferred solution for message routing based on conditions, while emphasizing that the features of RabbitMQ and ActiveMQ cannot be compared to Kafka.', 'duration': 51.506, 'highlights': ['RabbitMQ is a popular and preferred solution for message routing based on conditions, compared to Kafka.', 'The features of RabbitMQ and ActiveMQ cannot be compared to Kafka.', 'RabbitMQ is feature rich 
and closest to Kafka in terms of features.']}, {'end': 4006.584, 'start': 3236.288, 'title': 'Kafka vs other messaging systems', 'summary': "Presents benchmark testing results comparing Kafka with other messaging systems, highlighting Kafka's superior throughput with batching and storage efficiency, as well as its stronger consumption performance compared to other systems.", 'duration': 770.296, 'highlights': ["Kafka's throughput increases exponentially when the producer batches messages in batches of 50, outperforming other systems; batching is recommended for increased throughput.", "Kafka's storage format is more efficient, saving around 140 bytes on average, which is significant for high volumes of messages per day or hour.", 'Kafka outperforms other systems in the consumption test, reading topics from various brokers.']}], 'duration': 822.115, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/BGhlHsFBhLE/pics/BGhlHsFBhLE3184469.jpg', 'highlights': ["Kafka's throughput increases exponentially when the producer batches messages in batches of 50, outperforming other systems; batching is recommended for increased throughput.", 'Kafka outperforms other systems in consumption, reading topics from various brokers and standing out in the consumption test. 
Kafka outperforms other systems in consumption, reading topics from various brokers and excelling in consumption tests.', 'RabbitMQ is a popular and preferred solution for message routing based on conditions, compared to Kafka.', "Kafka's storage format is more efficient, saving around 140 bytes on average, which is significant for high volumes of messages per day or hour."]}], 'highlights': ['LinkedIn processes 40 to 50 billion messages in real time using Kafka for features like activity-driven newsfeeds and personalized recommendations.', "Kafka's throughput increases exponentially when the producer batches messages in batches of 50, outperforming other systems; batching is recommended for increased throughput.", 'The multiple broker model is recommended for production use, offering scalability, high availability, failover, and durability of messages even during a crash.', 'Apache Kafka supports a single node, single broker cluster mode, ideal for learning or building applications, with simple and clean configuration management.', "ZooKeeper's role in Apache Kafka involves configuration management, service lookup, and coordination between nodes, with a focus on identifying lead brokers."]}