title
Apache Kafka Tutorial | What is Apache Kafka? | Kafka Tutorial for Beginners | Edureka

description
🔥 Apache Kafka Training (Use Code "YOUTUBE20"): https://www.edureka.co/kafka-certification-training This Apache Kafka Tutorial video will help you understand what Apache Kafka is & its features. It covers the different components of Apache Kafka & its architecture. The topics which we will be discussing in this Apache Kafka Tutorial are: 1. Need for a Messaging System 2. What is Kafka? 3. Kafka Features 4. Kafka Components 5. Kafka Architecture 6. Installing Kafka 7. Working with a Single Node Single Broker Cluster Subscribe to our channel to get video updates. Hit the subscribe button above. Check our complete Hadoop playlist here: https://goo.gl/hzUO0m - - - - - - - - - - - - - - How it Works? 1. This is a 5-week instructor-led online course, with assignments and project work. 2. We have 24x7 one-on-one LIVE technical support to help you with any problems you might face or any clarifications you may require during the course. 3. Edureka certifies you as an Apache Kafka expert based on the project reviewed by our expert panel. - - - - - - - - - - - - - - About the Course Apache Kafka Certification Training is designed to provide you with the knowledge and skills to become a successful Kafka Big Data Developer. The training encompasses the fundamental concepts of Kafka (such as Kafka Cluster and Kafka API) and covers advanced topics (such as Kafka Connect, Kafka Streams, and Kafka integration with Hadoop, Storm and Spark), thereby enabling you to gain expertise in Apache Kafka. After the completion of the Real-Time Analytics with Apache Kafka course at Edureka, you should be able to: Learn Kafka and its components Set up an end-to-end Kafka cluster along with Hadoop and YARN clusters Integrate Kafka with real-time streaming systems like Spark & Storm Describe the basic and advanced features involved in designing and developing a high-throughput messaging system Use Kafka to produce and consume messages from various sources, including real-time streaming sources like Twitter Get insights into the Kafka API Understand the Kafka Streams API Work on a real-life project, ‘Implementing Twitter Streaming with Kafka, Flume, Hadoop & Storm’ - - - - - - - - - - - - - - Who should go for this course? This course is designed for professionals who want to learn Kafka techniques and wish to apply them to Big Data. It is highly recommended for: Developers, who want to gain acceleration in their career as a "Kafka Big Data Developer" Testing professionals, who are currently involved in queuing and messaging systems Big Data architects, who would like to include Kafka in their ecosystem Project managers, who are working on projects related to messaging systems Admins, who want to gain acceleration in their careers as an "Apache Kafka Administrator" - - - - - - - - - - - - - - Why Learn Apache Kafka? Kafka training helps you gain expertise in Kafka Architecture, Installation, Configuration, Performance Tuning, Kafka Client APIs like Producer, Consumer and Streams APIs, Kafka Administration, Kafka Connect API and Kafka integration with Hadoop, Storm and Spark using a Twitter streaming use case. - - - - - - - - - - - - - - For more information, please write back to us at sales@edureka.co or call us at IND: 9606058406 / US: 18338555775 (toll-free). 
Instagram: https://www.instagram.com/edureka_learning/ Facebook: https://www.facebook.com/edurekaIN/ Twitter: https://twitter.com/edurekain LinkedIn: https://www.linkedin.com/company/edureka Customer Review: Michael Harkins, System Architect, Hortonworks says: “The courses are top rate. The best part is live instruction, with playback. But my favourite feature is viewing a previous class. Also, they are always there to answer questions, and prompt when you open an issue if you are having any trouble. Added bonus ~ you get lifetime access to the course you took!!! ~ This is the killer education app... I've taken two courses, and I'm taking two more.”

detail
{'title': 'Apache Kafka Tutorial | What is Apache Kafka? | Kafka Tutorial for Beginners | Edureka', 'heatmap': [{'end': 334.526, 'start': 304.374, 'weight': 0.782}, {'end': 516.1, 'start': 440.562, 'weight': 0.898}, {'end': 567.115, 'start': 535.49, 'weight': 0.745}, {'end': 751.183, 'start': 699.463, 'weight': 0.705}, {'end': 1085.372, 'start': 1053.461, 'weight': 0.721}], 'summary': 'Provides an in-depth understanding of apache kafka, covering its features, data pipeline, durability, scalability, replica leader, setup process, and cluster architecture, emphasizing practical applications and real-time scenarios across various topics.', 'chapters': [{'end': 218.744, 'segs': [{'end': 33.158, 'src': 'embed', 'start': 0.24, 'weight': 4, 'content': [{'end': 1.961, 'text': 'Hello everyone, this is Shubham from Edureka.', 'start': 0.24, 'duration': 1.721}, {'end': 4.681, 'text': "The topic of our today's session is Kafka tutorial.", 'start': 2.321, 'duration': 2.36}, {'end': 8.502, 'text': "So without any further delay, let's move ahead and look at the agenda for today's session.", 'start': 5.121, 'duration': 3.381}, {'end': 12.083, 'text': 'I believe that it is important to understand the need of technology.', 'start': 9.143, 'duration': 2.94}, {'end': 14.444, 'text': "So we'll start with the need of messaging systems.", 'start': 12.263, 'duration': 2.181}, {'end': 17.505, 'text': "Then we'll understand what is Apache Kafka and its features.", 'start': 15.004, 'duration': 2.501}, {'end': 23.387, 'text': "Further advancing in our Kafka tutorial, we'll learn about the different components of Apache Kafka and its architecture.", 'start': 18.105, 'duration': 5.282}, {'end': 28.008, 'text': "At last, we'll install Apache Kafka and deploy a single node, single broker cluster.", 'start': 24.007, 'duration': 4.001}, {'end': 33.158, 'text': "So let's move ahead to our first topic that is need of messaging systems first.", 'start': 28.675, 'duration': 4.483}], 'summary': 'Shubham discusses apache kafka tutorial, covering messaging systems and deploying a single node, single broker cluster.', 'duration': 32.918, 'max_score': 0.24, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/hyJZP-rgooc/pics/hyJZP-rgooc240.jpg'}, {'end': 128.336, 'src': 'embed', 'start': 53.391, 'weight': 0, 'content': [{'end': 57.076, 'text': 'Now, you can have different types of server instead of chat server or a database server.', 'start': 53.391, 'duration': 3.685}, {'end': 61.581, 'text': 'It could be a web server, application server, FTP server, mail server, et cetera.', 'start': 57.496, 'duration': 4.085}, {'end': 69.55, 'text': 'Now, if it would have been 90s or early 21st century, your organization can go ahead with two or three servers and fulfill their requirements.', 'start': 62.021, 'duration': 7.529}, {'end': 73.915, 'text': 'But now in 2018, your organization cannot survive with two or three servers.', 'start': 70.11, 'duration': 3.805}, {'end': 77.457, 'text': 'Moving ahead with time the server also started increasing.', 'start': 74.495, 'duration': 2.962}, {'end': 83.66, 'text': 'Let us take an example of e-commerce scenario where it can have multiple servers at front end like web or application server.', 'start': 77.717, 'duration': 5.943}, {'end': 85.561, 'text': 'It can have a Hadoop server.', 'start': 84.18, 'duration': 1.381}, {'end': 91.124, 'text': 'It can have a chat server for the customers to provide chat facilities, a separate payment server, etc.', 'start': 86.081, 'duration': 
5.043}, {'end': 94.778, 'text': 'Now all the servers want to communicate with database server.', 'start': 92.074, 'duration': 2.704}, {'end': 98.943, 'text': "So we'll have multiple data pipelines connecting all of them to database server.", 'start': 95.298, 'duration': 3.645}, {'end': 106.352, 'text': 'Similarly organization can have servers at the back end which will be receiving messages from different servers based on the requirements.', 'start': 99.403, 'duration': 6.949}, {'end': 111.018, 'text': 'They can have security systems For the user, authentication and authorization.', 'start': 106.873, 'duration': 4.145}, {'end': 117.605, 'text': 'they can have real-time monitoring systems which will gather data from various servers in real time and then show predictions to the users.', 'start': 111.018, 'duration': 6.587}, {'end': 126.954, 'text': 'Then they can also have a data warehouse where they can be dumping all their data for further analysis using various ETL or BI tools like Informatica,', 'start': 117.985, 'duration': 8.969}, {'end': 128.336, 'text': 'Pentaho, Power BI, et cetera.', 'start': 126.954, 'duration': 1.382}], 'summary': 'In 2018, organizations cannot survive with 2-3 servers, e-commerce scenarios require multiple servers for different functions and data analytics tools like informatica, pentaho, and power bi are used for analysis.', 'duration': 74.945, 'max_score': 53.391, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/hyJZP-rgooc/pics/hyJZP-rgooc53391.jpg'}], 'start': 0.24, 'title': "Kafka's features and data pipelines", 'summary': 'Covers the need for messaging systems, features of apache kafka, its components and architecture, and includes deployment of a single node, single broker cluster. it also discusses the challenges of managing complex data pipelines and emphasizes the need for messaging systems to ensure reliable data transfer across networks.', 'chapters': [{'end': 36.3, 'start': 0.24, 'title': 'Kafka tutorial overview', 'summary': 'Covers the need of messaging systems, features of apache kafka, different components and architecture of kafka, and includes deployment of a single node, single broker cluster in a session led by shubham from edureka.', 'duration': 36.06, 'highlights': ['Understanding the need of messaging systems in a real-time scenario is the first topic covered, emphasizing the importance of data pipelines.', 'Exploring the features of Apache Kafka is the next focus, providing insights into its functionalities and capabilities.', 'Learning about the different components and architecture of Apache Kafka is another key aspect of the session.', 'The session concludes with the installation of Apache Kafka and deployment of a single node, single broker cluster, providing practical hands-on experience.']}, {'end': 218.744, 'start': 36.4, 'title': 'Data pipelines and messaging systems', 'summary': 'Highlights the challenges of managing complex data pipelines when connecting multiple systems or servers, emphasizing the need for messaging systems to simplify communication and ensure reliable data transfer across networks.', 'duration': 182.344, 'highlights': ['The increase in the number of systems and servers has led to complex data pipelines, making it difficult to manage the flow of data and adding new systems or servers (Relevance: 5)', 'Different types of servers such as web server, application server, FTP server, mail server, etc., require multiple data pipelines to connect with the database server (Relevance: 4)', 
'Organizations in 2018 cannot survive with just two or three servers, and with time, the number of servers has significantly increased (Relevance: 3)', 'Messaging systems reduce the complexity of data pipelines and enable simpler and manageable communication between systems, providing a common paradigm independent of platforms and languages (Relevance: 2)', 'The need for reliable communication across networks has led to the origin of messaging systems, allowing for asynchronous communication and ensuring message reliability even in adverse network conditions (Relevance: 1)']}], 'duration': 218.504, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/hyJZP-rgooc/pics/hyJZP-rgooc240.jpg', 'highlights': ['Exploring the features of Apache Kafka is the next focus, providing insights into its functionalities and capabilities.', 'Learning about the different components and architecture of Apache Kafka is another key aspect of the session.', 'Understanding the need of messaging systems in a real-time scenario is the first topic covered, emphasizing the importance of data pipelines.', 'The session concludes with the installation of Apache Kafka and deployment of a single node, single broker cluster, providing practical hands-on experience.', 'The increase in the number of systems and servers has led to complex data pipelines, making it difficult to manage the flow of data and adding new systems or servers (Relevance: 5)', 'Different types of servers such as web server, application server, FTP server, mail server, etc., require multiple data pipelines to connect with the database server (Relevance: 4)', 'Organizations in 2018 cannot survive with just two or three servers, and with time, the number of servers has significantly increased (Relevance: 3)', 'Messaging systems reduce the complexity of data pipelines and enable simpler and manageable communication between systems, providing a common paradigm independent of platforms and languages (Relevance: 2)', 'The need for reliable communication across networks has led to the origin of messaging systems, allowing for asynchronous communication and ensuring message reliability even in adverse network conditions (Relevance: 1)']}, {'end': 705.085, 'segs': [{'end': 334.526, 'src': 'heatmap', 'start': 304.374, 'weight': 0.782, 'content': [{'end': 307.976, 'text': 'messaging traditionally has two models queuing and publish subscribe.', 'start': 304.374, 'duration': 3.602}, {'end': 313.198, 'text': 'in a queue, a pool of consumers may read from a server and each record only goes to one of them.', 'start': 307.976, 'duration': 5.222}, {'end': 317.56, 'text': 'Whereas in publish subscribe the record is broadcasted to all the consumers.', 'start': 313.558, 'duration': 4.002}, {'end': 319.641, 'text': 'So multiple consumer can get the record.', 'start': 317.88, 'duration': 1.761}, {'end': 323.842, 'text': 'The Kafka cluster is distributed and have multiple machines running in parallel.', 'start': 320.061, 'duration': 3.781}, {'end': 327.344, 'text': 'This is the reason why Kafka is fast scalable and fault-tolerant.', 'start': 324.202, 'duration': 3.142}, {'end': 329.804, 'text': 'We will understand this briefly in upcoming slides guys.', 'start': 327.684, 'duration': 2.12}, {'end': 334.526, 'text': 'At last Kafka was developed at LinkedIn and later it became a part of Apache project.', 'start': 330.325, 'duration': 4.201}], 'summary': 'Kafka utilizes queuing and publish-subscribe models, is fast, scalable, and fault-tolerant due to its 
distributed cluster with multiple machines, and was developed at linkedin before becoming an apache project.', 'duration': 30.152, 'max_score': 304.374, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/hyJZP-rgooc/pics/hyJZP-rgooc304374.jpg'}, {'end': 516.1, 'src': 'heatmap', 'start': 440.562, 'weight': 0.898, 'content': [{'end': 446.956, 'text': '20% reported that the number is growing a lot and 52% of organizations have at least six systems running Kafka.', 'start': 440.562, 'duration': 6.394}, {'end': 448.622, 'text': 'Now the ground is all set.', 'start': 447.541, 'duration': 1.081}, {'end': 451.185, 'text': 'Let us understand the basic terminologies of Apache Kafka.', 'start': 448.743, 'duration': 2.442}, {'end': 453.288, 'text': 'Let us start with topic first.', 'start': 451.686, 'duration': 1.602}, {'end': 457.112, 'text': 'topic is a category or feed name to which records are published.', 'start': 453.288, 'duration': 3.824}, {'end': 460.516, 'text': 'topics in Kafka are always multi-subscriber, that is,', 'start': 457.112, 'duration': 3.404}, {'end': 466.763, 'text': 'topic can have zero, one, or many consumers that can subscribe to the topic and consume the data written to Kafka.', 'start': 460.516, 'duration': 6.247}, {'end': 472.808, 'text': 'You can have sales record getting published to sales topic, product records getting published to product topic and so on.', 'start': 467.103, 'duration': 5.705}, {'end': 478.473, 'text': 'this will actually segregate your messages and the consumer will only subscribe to topic which they need.', 'start': 472.808, 'duration': 5.665}, {'end': 481.376, 'text': 'Kafka topics are divided into number of partitions.', 'start': 478.473, 'duration': 2.903}, {'end': 487.621, 'text': 'partitions allow you to parallelize a topic by splitting the data in a particular topic across multiple brokers.', 'start': 481.376, 'duration': 6.245}, {'end': 493.927, 'text': 'each partition can be placed on a separate machine to allow multiple consumers to read from a topic parallelly.', 'start': 487.621, 'duration': 6.306}, {'end': 499.992, 'text': 'So, in case of sales topic, you can have three partitions from where three consumers can read data parallelly.', 'start': 494.347, 'duration': 5.645}, {'end': 503.134, 'text': 'as we already discussed, producers are used to publish data to Kafka.', 'start': 499.992, 'duration': 3.142}, {'end': 506.597, 'text': 'producer publishes the data to the topics of their own choice.', 'start': 503.134, 'duration': 3.463}, {'end': 512.799, 'text': 'The next is consumer so consumer can subscribe to one topic and consumes data from that topic.', 'start': 507.276, 'duration': 5.523}, {'end': 516.1, 'text': 'You can have multiple consumers in a consumer group.', 'start': 513.039, 'duration': 3.061}], 'summary': '52% of organizations run at least 6 kafka systems, with topics allowing multi-subscriber consumption and partitioning for parallel data reading.', 'duration': 75.538, 'max_score': 440.562, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/hyJZP-rgooc/pics/hyJZP-rgooc440562.jpg'}, {'end': 595.355, 'src': 'heatmap', 'start': 535.49, 'weight': 0, 'content': [{'end': 539.453, 'text': 'Consumer instances can be a separate processes or can be separate machines.', 'start': 535.49, 'duration': 3.963}, {'end': 541.215, 'text': 'Now next we have brokers.', 'start': 539.754, 'duration': 1.461}, {'end': 544.878, 'text': 'So brokers are a single machine in a Kafka cluster at last.', 'start': 
541.595, 'duration': 3.283}, {'end': 548.481, 'text': 'We have zookeeper. Zookeeper is another Apache open source project.', 'start': 544.938, 'duration': 3.543}, {'end': 554.446, 'text': 'It stores the metadata information related to Kafka cluster like broker information topic details, etc.', 'start': 548.821, 'duration': 5.625}, {'end': 556.267, 'text': "We'll be talking about zookeeper in a while guys.", 'start': 554.586, 'duration': 1.681}, {'end': 558.549, 'text': 'Let us move ahead and look at Kafka cluster.', 'start': 556.687, 'duration': 1.862}, {'end': 567.115, 'text': 'Now here we can see that we have multiple producers producing data to Kafka broker and those Kafka broker reside inside a Kafka cluster again.', 'start': 558.929, 'duration': 8.186}, {'end': 571.979, 'text': 'We have multiple consumers consuming the data from the Kafka broker, or you can say Kafka cluster,', 'start': 567.155, 'duration': 4.824}, {'end': 574.701, 'text': 'and this Kafka cluster is getting managed by zookeeper.', 'start': 571.979, 'duration': 2.722}, {'end': 578.244, 'text': 'So zookeeper is basically maintaining the metadata of your Kafka cluster.', 'start': 574.941, 'duration': 3.303}, {'end': 580.605, 'text': "Now, let's move and take a look at Kafka features.", 'start': 578.644, 'duration': 1.961}, {'end': 582.547, 'text': 'The first feature is high throughput.', 'start': 581.186, 'duration': 1.361}, {'end': 587.09, 'text': 'So, basically, throughput is the amount of data passing through a system or a process.', 'start': 582.727, 'duration': 4.363}, {'end': 591.973, 'text': 'in terms of Kafka, producer throughput is the number of messages getting produced at Kafka,', 'start': 587.09, 'duration': 4.883}, {'end': 595.355, 'text': 'and consumer throughput is the number of messages that is getting consumed.', 'start': 591.973, 'duration': 3.382}], 'summary': 'Kafka cluster with multiple producers and consumers, managed by zookeeper, offers high throughput.', 'duration': 50.417, 'max_score': 535.49, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/hyJZP-rgooc/pics/hyJZP-rgooc535490.jpg'}, {'end': 635.208, 'src': 'embed', 'start': 611.628, 'weight': 1, 'content': [{'end': 619.675, 'text': 'but let me tell you one thing never try exceeding the number of consumers in a consumer group from the number of partitions of a topic that the consumer is consuming,', 'start': 611.628, 'duration': 8.047}, {'end': 623.259, 'text': 'because the consumer needs to consume the messages sequentially from the partition.', 'start': 619.675, 'duration': 3.584}, {'end': 630.045, 'text': 'and if we have multiple consumers consuming messages from a single partition, then the consumer group cannot figure out the sequence.', 'start': 623.259, 'duration': 6.786}, {'end': 635.208, 'text': 'So we can only have a single consumer consuming from a partition. Next is scalability.', 'start': 630.605, 'duration': 4.603}], 'summary': 'Avoid exceeding consumer group size compared to partitions for sequential message consumption.', 'duration': 23.58, 'max_score': 611.628, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/hyJZP-rgooc/pics/hyJZP-rgooc611628.jpg'}, {'end': 679.894, 'src': 'embed', 'start': 650.817, 'weight': 2, 'content': [{'end': 652.778, 'text': 'You can see that the last feature is replication.', 'start': 650.817, 'duration': 1.961}, {'end': 654.499, 'text': 'When you create a topic,', 'start': 653.178, 'duration': 1.321}, {'end': 660.842, 'text': 'you can specify the
replication factor over there and Kafka will replicate the topic into multiple brokers in your Kafka cluster.', 'start': 654.499, 'duration': 6.343}, {'end': 667.565, 'text': 'For a topic with replication factor n, it can tolerate up to n-1 server failures without losing any record committed to the log.', 'start': 661.182, 'duration': 6.383}, {'end': 670.037, 'text': 'The next is stream processing.', 'start': 668.254, 'duration': 1.783}, {'end': 674.224, 'text': 'Kafka stream processor takes continual stream of data from input topics,', 'start': 670.037, 'duration': 4.187}, {'end': 679.894, 'text': 'perform some processing operations on this input and produces continual stream of data to output topics.', 'start': 674.224, 'duration': 5.67}], 'summary': 'Kafka allows specifying replication factor for topics, tolerating up to n-1 server failures. it enables stream processing for continual data streams.', 'duration': 29.077, 'max_score': 650.817, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/hyJZP-rgooc/pics/hyJZP-rgooc650817.jpg'}], 'start': 219.325, 'title': "understanding apache kafka's data pipeline and its overview", 'summary': "delves into how kafka de-couples the data pipeline, facilitating message production and consumption, and discusses its adoption by fortune 500 companies, with key statistics from linkedin's implementation. it also covers basic terminologies and features like high throughput, scalability, data loss prevention, replication, and stream processing.", 'chapters': [{'end': 329.804, 'start': 219.325, 'title': "understanding kafka's data pipeline", 'summary': 'explains how kafka solves the problem by de-coupling the data pipeline, enabling easy message production and consumption for various applications, and its similarity to a distributed publish-subscribe messaging system with fast, scalable, and fault-tolerant characteristics.', 'duration': 110.479, 'highlights': ['Kafka de-couples the data pipeline, enabling easy message production and consumption for various applications. Kafka solves the complexity problem by allowing applications to produce and consume messages, with front-end, Hadoop, and database servers as producers and database, security systems, and real-time monitoring systems as consumers.', 'Kafka operates similar to a distributed publish-subscribe messaging system with fast, scalable, and fault-tolerant characteristics. Apache Kafka is a distributed publish-subscribe messaging system, broadcasting messages to multiple consumers, running on parallel machines to achieve fast, scalable, and fault-tolerant operations.', "Kafka's analogy to radio stations and consumers simplifies the understanding of its message broadcasting and consumption. 
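As a quick illustration of the replication factor just described, here is a minimal sketch using the kafka-topics.sh script that ships with Kafka (1.0.x syntax, matching the version installed later in this tutorial; the 'sales' topic name is an invented example and a cluster of at least three brokers is assumed):

```bash
# Create a topic whose partitions are each replicated to 3 brokers; with
# replication factor n = 3, up to n-1 = 2 brokers can fail without losing
# any record committed to the log. Requires a cluster with >= 3 brokers.
bin/kafka-topics.sh --create --zookeeper localhost:2181 \
  --replication-factor 3 --partitions 3 --topic sales

# Show which broker leads each partition and which replicas are in sync (Isr).
bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic sales
```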
The analogy of Kafka to radio stations, broadcasting towers, and consumers simplifies the understanding of message broadcasting and consumption, similar to different radio stations broadcasting messages and multiple listeners tuning in."]}, {'end': 705.085, 'start': 330.325, 'title': 'Apache kafka overview', 'summary': "Discusses the usage and growth of apache kafka, including its adoption by fortune 500 companies and key statistics from linkedin's implementation, as well as an overview of basic terminologies and features, such as high throughput, scalability, data loss prevention, replication, and stream processing.", 'duration': 374.76, 'highlights': ["LinkedIn has 1,100+ commodity machines running Kafka, with 31,000+ topics, 350,000+ partitions, and 675 billion messages per day, handling peak loads of 10.5 million messages per second and 18.5 GB per second inbound, demonstrating the scale and performance of Kafka at LinkedIn. LinkedIn's implementation of Kafka includes 1,100+ machines, 31,000+ topics, 350,000+ partitions, and 675 billion messages per day, with peak loads of 10.5 million messages per second and 18.5 GB per second inbound.", 'More than one third of Fortune 500 companies use Apache Kafka, including top travel, banking, insurance, and telecom companies, along with major tech players like LinkedIn, Microsoft, and Netflix, processing billions of messages per day, reflecting the widespread adoption of Kafka in diverse industries. Apache Kafka is utilized by over one third of Fortune 500 companies, including top players in travel, banking, insurance, telecom, and tech, processing billions of messages per day.', '86% of respondents reported an increasing number of systems using Kafka, with 20% experiencing significant growth, and 52% of organizations running at least six systems on Kafka, indicating the rising popularity and usage of Kafka in various organizational setups. The survey indicates that 86% of respondents are witnessing a growth in Kafka usage, with 20% reporting significant increases, and 52% of organizations running at least six systems on Kafka.', "Kafka's features include high throughput, scalability, data loss prevention, replication, and stream processing, enabling distributed, scalable, fault-tolerant, and real-time processing of data for a wide range of use cases and industries. 
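To put a rough number on the high-throughput claim behind statistics like LinkedIn's, Kafka ships with a perf-test script; a minimal sketch, assuming a local broker on localhost:9092 and an existing topic named 'perf' (both placeholders for the example):

```bash
# Fire one million 100-byte records as fast as possible (--throughput -1
# disables rate limiting) and print the achieved records/sec and MB/sec.
bin/kafka-producer-perf-test.sh --topic perf \
  --num-records 1000000 --record-size 100 --throughput -1 \
  --producer-props bootstrap.servers=localhost:9092
```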
Key features of Kafka include high throughput, scalability, data loss prevention, replication, and stream processing, catering to diverse data processing needs."]}], 'duration': 485.76, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/hyJZP-rgooc/pics/hyJZP-rgooc219325.jpg', 'highlights': ["LinkedIn's implementation of Kafka includes 1,100+ machines, 31,000+ topics, 350,000+ partitions, and 675 billion messages per day, with peak loads of 10.5 million messages per second and 18.5 GB per second inbound.", 'Apache Kafka is utilized by over one third of Fortune 500 companies, including top players in travel, banking, insurance, telecom, and tech, processing billions of messages per day.', "Kafka's features include high throughput, scalability, data loss prevention, replication, and stream processing, catering to diverse data processing needs.", 'Kafka solves the complexity problem by allowing applications to produce and consume messages, with front-end, Hadoop, and database servers as producers and database, security systems, and real-time monitoring systems as consumers.', 'The analogy of Kafka to radio stations, broadcasting towers, and consumers simplifies the understanding of message broadcasting and consumption, similar to different radio stations broadcasting messages and multiple listeners tuning in.']}, {'end': 912.169, 'segs': [{'end': 848.679, 'src': 'embed', 'start': 820.927, 'weight': 4, 'content': [{'end': 824.729, 'text': 'the partition in the log server basically serves two important purposes first.', 'start': 820.927, 'duration': 3.802}, {'end': 832.533, 'text': 'They allow the log to scale beyond a size that will fit into a single server; only an individual partition must fit on the server that hosts it.', 'start': 825.169, 'duration': 7.364}, {'end': 838.595, 'text': 'and a topic may have many partitions so it can handle an arbitrary amount of data for an example.', 'start': 833.053, 'duration': 5.542}, {'end': 844.757, 'text': 'You can have 1 TB of data and you can store those logs using three brokers having 500 GB of capacity,', 'start': 838.655, 'duration': 6.102}, {'end': 848.679, 'text': 'as you can break your topic in three partitions and you can store them on different brokers.', 'start': 844.757, 'duration': 3.922}], 'summary': 'Log server partitions enable scaling and distribute data across brokers. 
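A sketch of the scaling arithmetic above: 1 TB spread over three partitions is roughly 333 GB per broker, which fits on brokers with 500 GB of disk. If a topic later outgrows its brokers, the partition count can be raised (topic name 'sales' is again a placeholder):

```bash
# Increase the topic from 3 to 6 partitions so the data and the consumer
# load can spread across more brokers. Note: this only adds partitions;
# existing records are not rebalanced, and key-to-partition mapping changes.
bin/kafka-topics.sh --alter --zookeeper localhost:2181 \
  --topic sales --partitions 6
```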
example: 1 tb split into 3 partitions on 3 brokers with 500 gb capacity.', 'duration': 27.752, 'max_score': 820.927, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/hyJZP-rgooc/pics/hyJZP-rgooc820927.jpg'}, {'end': 887.917, 'src': 'embed', 'start': 857.179, 'weight': 0, 'content': [{'end': 862.743, 'text': 'So, as you have three partitions over here partition 0, partition 1, partition 2 so you can have three consumers.', 'start': 857.179, 'duration': 5.564}, {'end': 865.305, 'text': 'consumer 0 will be consuming data from partition 0.', 'start': 862.743, 'duration': 2.562}, {'end': 869.749, 'text': 'consumer 1 may consume data from partition 1 and consumer 2 may consume data from partition 2..', 'start': 865.305, 'duration': 4.444}, {'end': 876.615, 'text': 'So this will give a parallelism to your topic or you can say it will give you a parallelism for processing the data that is stored in your topic.', 'start': 869.749, 'duration': 6.866}, {'end': 882.153, 'text': 'So now we have a topic over here which have four partitions 0 1 2 3.', 'start': 877.532, 'duration': 4.621}, {'end': 885.356, 'text': 'Now we are replicating our first partition three times.', 'start': 882.154, 'duration': 3.202}, {'end': 887.917, 'text': 'So we have three replication created over here.', 'start': 885.796, 'duration': 2.121}], 'summary': 'Three consumers can parallelly process data from four partitions, with three partitions being replicated.', 'duration': 30.738, 'max_score': 857.179, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/hyJZP-rgooc/pics/hyJZP-rgooc857179.jpg'}], 'start': 705.505, 'title': "kafka's durability and scalability", 'summary': "delves into kafka's durability, focusing on its persistency, replication, ordered sequence, and append-only structure. 
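The one-consumer-per-partition parallelism just described can be tried directly with the console consumer; a minimal sketch, assuming the hypothetical 'sales' topic from the earlier sketches and a broker on localhost:9092 (the group name is made up):

```bash
# Run this same command in as many terminals as the topic has partitions.
# Because all the consumers share group.id=sales-readers, Kafka assigns each
# partition to exactly one of them; an extra member beyond the partition
# count would sit idle, since one partition never feeds two consumers of
# the same group.
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
  --topic sales --consumer-property group.id=sales-readers
```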
it also explores scalability, retention policy, unordered consumption, and benefits of partitioning, offering insights into fault tolerance and parallelism.", 'chapters': [{'end': 759.948, 'start': 705.505, 'title': 'Kafka durability and components overview', 'summary': "Discusses kafka's durability feature, stating that data written to kafka is persisted to disk and replicated for fault tolerance, while also delving into the ordered, immutable sequence of records in partitions and the append-only data structure, and highlighting the specifics of partitions and offsets.", 'duration': 54.443, 'highlights': ["Kafka allows producer to wait on acknowledgement so that right isn't considered complete until it is fully replicated and guaranteed.", 'Each partition is an ordered, immutable sequence of records that is continually appended, with records assigned a sequential ID called offset.', 'Partition has append-only data structure, where only new records can be added and cannot be removed or changed.']}, {'end': 912.169, 'start': 759.948, 'title': 'Kafka cluster durability and scalability', 'summary': 'Discusses the durability and scalability features of a kafka cluster, including the retention policy of seven days, the ability to consume records in any order, and the benefits of partitioning for scaling and parallelism.', 'duration': 152.221, 'highlights': ['Kafka cluster retains all published records with a default retention policy of seven days The Kafka cluster retains all published records, with a default retention policy of seven days, providing durability and ensuring that the records are available for a specified period.', 'Ability for consumers to consume records in any order and reset to older offsets Consumers can consume records in any order and have the flexibility to reset to older offsets for reprocessing data, providing a high degree of control over data consumption.', 'Partitioning allows scaling beyond single server capacity and provides parallelism for processing data Partitioning enables scaling beyond single server capacity and provides parallelism for processing data, allowing for arbitrary amounts of data to be handled and multiple consumers to consume data from different partitions.']}], 'duration': 206.664, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/hyJZP-rgooc/pics/hyJZP-rgooc705505.jpg', 'highlights': ['Kafka allows producer to wait on acknowledgement for full replication.', 'Each partition is an ordered, immutable sequence of records with sequential IDs.', 'Partition has an append-only data structure, allowing only new records to be added.', 'Kafka cluster retains all published records with a default retention policy of seven days.', 'Consumers can consume records in any order and reset to older offsets for reprocessing data.', 'Partitioning enables scaling beyond single server capacity and provides parallelism for processing data.']}, {'end': 1337.412, 'segs': [{'end': 1085.372, 'src': 'heatmap', 'start': 1053.461, 'weight': 0.721, 'content': [{'end': 1061.856, 'text': 'So if a consumer goes down, It can store the current state or current offset that it is consuming from a partition and when it comes back,', 'start': 1053.461, 'duration': 8.395}, {'end': 1064.098, 'text': 'it can again start consuming from the same offset.', 'start': 1061.856, 'duration': 2.242}, {'end': 1069.401, 'text': 'So it can just store the current state and it can restore back to the current state.', 'start': 1064.718, 'duration': 4.683}, {'end': 1071.282, 
'text': 'Next we have Zookeeper.', 'start': 1070.282, 'duration': 1}, {'end': 1074.124, 'text': 'Zookeeper basically performs three major functions.', 'start': 1072.043, 'duration': 2.081}, {'end': 1077.847, 'text': 'Electing a controller, cluster membership and topic configuration.', 'start': 1074.645, 'duration': 3.202}, {'end': 1085.372, 'text': 'So what is a controller? The controller is one of the broker and is responsible for maintaining leader-follower relationship for all partitions.', 'start': 1078.327, 'duration': 7.045}], 'summary': 'Consumers can store and restore current state or offset; zookeeper handles controller election, cluster membership, and topic configuration.', 'duration': 31.911, 'max_score': 1053.461, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/hyJZP-rgooc/pics/hyJZP-rgooc1053461.jpg'}, {'end': 1091.952, 'src': 'embed', 'start': 1061.856, 'weight': 2, 'content': [{'end': 1064.098, 'text': 'it can again start consuming from the same offset.', 'start': 1061.856, 'duration': 2.242}, {'end': 1069.401, 'text': 'So it can just store the current state and it can restore back to the current state.', 'start': 1064.718, 'duration': 4.683}, {'end': 1071.282, 'text': 'Next we have Zookeeper.', 'start': 1070.282, 'duration': 1}, {'end': 1074.124, 'text': 'Zookeeper basically performs three major functions.', 'start': 1072.043, 'duration': 2.081}, {'end': 1077.847, 'text': 'Electing a controller, cluster membership and topic configuration.', 'start': 1074.645, 'duration': 3.202}, {'end': 1085.372, 'text': 'So what is a controller? The controller is one of the broker and is responsible for maintaining leader-follower relationship for all partitions.', 'start': 1078.327, 'duration': 7.045}, {'end': 1091.952, 'text': 'when a node shuts down, it is the controller that tells other replicas to become partition leaders.', 'start': 1086.167, 'duration': 5.785}], 'summary': 'Systems can restore to previous state and zookeeper elects controller, handles cluster membership, and configures topics.', 'duration': 30.096, 'max_score': 1061.856, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/hyJZP-rgooc/pics/hyJZP-rgooc1061856.jpg'}, {'end': 1132.79, 'src': 'embed', 'start': 1098.818, 'weight': 3, 'content': [{'end': 1100.999, 'text': 'It also elects a new one if it crashes.', 'start': 1098.818, 'duration': 2.181}, {'end': 1106.484, 'text': 'So in cluster membership zookeeper checks which broker are alive and are part of the cluster.', 'start': 1101.84, 'duration': 4.644}, {'end': 1115.727, 'text': 'Then again, at last, in topic configurations, zookeeper keeps a track of which topic exists, how many partitions each topic has?', 'start': 1107.145, 'duration': 8.582}, {'end': 1116.907, 'text': 'where are the replicas?', 'start': 1115.727, 'duration': 1.18}, {'end': 1118.527, 'text': 'who is the preferred leader Etc?', 'start': 1116.907, 'duration': 1.62}, {'end': 1124.829, 'text': 'So you can see that zookeeper is doing so much coordination between the brokers in the Kafka cluster,', 'start': 1119.407, 'duration': 5.422}, {'end': 1132.79, 'text': 'and it also stores a lot of information about the Kafka cluster so that it helps in providing a reliable system whenever a node fails.', 'start': 1124.829, 'duration': 7.961}], 'summary': 'Zookeeper coordinates broker in kafka cluster, tracks topics, and ensures reliability.', 'duration': 33.972, 'max_score': 1098.818, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/hyJZP-rgooc/pics/hyJZP-rgooc1098818.jpg'}, {'end': 1165.474, 'src': 'embed', 'start': 1138.998, 'weight': 4, 'content': [{'end': 1143.721, 'text': 'We have a producer producing messages to a topic and this topic has three partitions.', 'start': 1138.998, 'duration': 4.723}, {'end': 1147.883, 'text': 'Now, there are three consumers which are consuming from these partitions.', 'start': 1144.581, 'duration': 3.302}, {'end': 1156.428, 'text': 'So we have two producers: the first is producing to partition A and partition B, and then we have producer B, which is producing to partition 3.', 'start': 1148.684, 'duration': 7.744}, {'end': 1160.271, 'text': 'then we can see here that we have one consumer for each partition.', 'start': 1156.428, 'duration': 3.843}, {'end': 1165.474, 'text': 'So this is the best case scenario where you have total parallelism in processing those messages.', 'start': 1160.791, 'duration': 4.683}], 'summary': 'Three producers and three consumers achieve full parallelism in processing messages across three partitions.', 'duration': 26.476, 'max_score': 1138.998, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/hyJZP-rgooc/pics/hyJZP-rgooc1138998.jpg'}, {'end': 1246.66, 'src': 'embed', 'start': 1223.3, 'weight': 0, 'content': [{'end': 1232.788, 'text': 'This means site activity like pageviews, searches or other actions a user may take is published into a central topic with one topic per activity type.', 'start': 1223.3, 'duration': 9.488}, {'end': 1238.433, 'text': 'These feeds are available for subscription for a range of use cases, including real-time processing,', 'start': 1233.208, 'duration': 5.225}, {'end': 1244.318, 'text': 'real-time monitoring and loading into various Hadoop or offline data warehousing systems for further reporting.', 'start': 1238.433, 'duration': 5.885}, {'end': 1246.66, 'text': 'So next we have metrics and logging.', 'start': 1245.159, 'duration': 1.501}], 'summary': 'Site activities are published into a central topic for subscriptions, including real-time processing and loading into data warehousing systems.', 'duration': 23.36, 'max_score': 1223.3, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/hyJZP-rgooc/pics/hyJZP-rgooc1223300.jpg'}, {'end': 1301.68, 'src': 'embed', 'start': 1268.65, 'weight': 1, 'content': [{'end': 1270.911, 'text': 'It could be HDFS perhaps for processing.', 'start': 1268.65, 'duration': 2.261}, {'end': 1276.293, 'text': 'Kafka abstracts away the details of files and gives a cleaner abstraction of logs.', 'start': 1271.511, 'duration': 4.782}, {'end': 1279.207, 'text': 'or even data as a stream of messages.', 'start': 1277.086, 'duration': 2.121}, {'end': 1290.234, 'text': 'This allows lower latency processing and easy support for multiple data sources and distributed data consumption in comparison to log centric systems like scribe or flume.', 'start': 1279.848, 'duration': 10.386}, {'end': 1295.997, 'text': 'Kafka offers equally good performance, stronger durability guarantees and much lower end-to-end latency.', 'start': 1290.234, 'duration': 5.763}, {'end': 1301.68, 'text': 'Next we have commit logs. Kafka can serve as a kind of external commit log for a distributed system.', 'start': 1296.737, 'duration': 4.943}], 'summary': 'Kafka enables lower latency processing, supports multiple data sources, and offers stronger durability guarantees.', 'duration': 33.03, 'max_score': 1268.65, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/hyJZP-rgooc/pics/hyJZP-rgooc1268650.jpg'}], 'start': 912.865, 'title': "Kafka's replica leader and use cases", 'summary': "Delves into the election of a replica as a leader, message handling, partitioning, and zookeeper's role, along with diverse use cases such as messaging, activity tracking, operational monitoring, and serving as external commit logs.", 'chapters': [{'end': 1160.271, 'start': 912.865, 'title': 'Kafka replica leader and message handling', 'summary': 'Explains the election of a replica as a leader, the handling of messages including partitioning, and the functions of zookeeper in ensuring a reliable kafka cluster.', 'duration': 247.406, 'highlights': ['The replica in broker 5 is elected as a leader for a particular partition, serving until failure when another replica becomes leader, ensuring continuity and fault tolerance.', 'Messages are directed to partitions using a message key, allowing for load balancing and segregation of data based on partitions, ensuring efficient message handling.', 'Zookeeper performs key functions such as electing a controller, managing cluster membership, and tracking topic configurations, ensuring a reliable and coordinated Kafka cluster.']}, {'end': 1337.412, 'start': 1160.791, 'title': 'Kafka use cases overview', 'summary': 'Explores various use cases of kafka, including messaging with better throughput and fault tolerance, activity tracking for real-time pub sub feeds, operational monitoring data for log aggregation, and serving as external commit logs for a distributed system.', 'duration': 176.621, 'highlights': ['Kafka is a good solution for large-scale message processing applications with better throughput, built-in partitioning, replication, and fault tolerance. Kafka offers better throughput, built-in partitioning, replication, and fault tolerance, making it suitable for large-scale message processing applications.', 'Kafka was originally used at LinkedIn to rebuild user activity tracking pipelines as a set of real-time pub sub feeds for site activity tracking. LinkedIn used Kafka to rebuild user activity tracking pipelines as real-time pub sub feeds for tracking site activity like pageviews, searches, and other user actions.', 'Kafka is often used for operational monitoring data and can act as a replacement for log aggregation solutions, offering lower end-to-end latency and stronger durability guarantees. Kafka is utilized for operational monitoring and can replace log aggregation solutions, providing lower latency and stronger durability guarantees.', 'Kafka can serve as external commit logs for a distributed system, with the log compaction feature supporting this use case. 
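Since the highlights above note that messages are directed to partitions using a message key, here is a hedged sketch of keyed produce and consume with the console tools (the 'page-views' topic and the sample key are invented for the example; --broker-list is the 1.0.x producer flag):

```bash
# Produce keyed records: everything before ':' is the key, and records with
# the same key always hash to the same partition, preserving per-key order.
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic page-views \
  --property parse.key=true --property key.separator=:
# then type lines such as:   user42:clicked_home

# Read the records back with their keys printed.
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
  --topic page-views --from-beginning --property print.key=true
```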
Kafka can act as external commit logs for distributed systems, supported by the log compaction feature.']}], 'duration': 424.547, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/hyJZP-rgooc/pics/hyJZP-rgooc912865.jpg', 'highlights': ['Kafka offers better throughput, built-in partitioning, replication, and fault tolerance, making it suitable for large-scale message processing applications.', 'Kafka was originally used at LinkedIn to rebuild user activity tracking pipelines as real-time pub sub feeds for tracking site activity like pageviews, searches, and other user actions.', 'Kafka can act as external commit logs for distributed systems, supported by the log compaction feature.', 'Messages are directed to partitions using a message key, allowing for load balancing and segregation of data based on partitions, ensuring efficient message handling.', 'The replica in broker 5 is elected as a leader for a particular partition, serving until failure when another replica becomes leader, ensuring continuity and fault tolerance.', 'Zookeeper performs key functions such as electing a controller, managing cluster membership, and tracking topic configurations, ensuring a reliable and coordinated Kafka cluster.', 'Kafka is often used for operational monitoring data and can act as a replacement for log aggregation solutions, offering lower end-to-end latency and stronger durability guarantees.']}, {'end': 1783.855, 'segs': [{'end': 1422.197, 'src': 'embed', 'start': 1385.363, 'weight': 2, 'content': [{'end': 1391.867, 'text': 'You can see we have two versions that is for Scala 2.11 and we also have for Scala 2.12.', 'start': 1385.363, 'duration': 6.504}, {'end': 1394.948, 'text': 'So you can go ahead and download it as per your needs.', 'start': 1391.867, 'duration': 3.081}, {'end': 1401.492, 'text': 'I would suggest go with Scala 2.12.', 'start': 1395.449, 'duration': 6.043}, {'end': 1402.272, 'text': 'Click on this link.', 'start': 1401.492, 'duration': 0.78}, {'end': 1405.634, 'text': 'And now you can go ahead and save this tar.gz file.', 'start': 1403.493, 'duration': 2.141}, {'end': 1414.234, 'text': "Now Apache Kafka has been downloaded.", 'start': 1412.654, 'duration': 1.58}, {'end': 1415.595, 'text': "Let's go to our download folder.", 'start': 1414.254, 'duration': 1.341}, {'end': 1422.197, 'text': 'As you can see we have Kafka 2.12.', 'start': 1418.576, 'duration': 3.621}], 'summary': 'Apache kafka offers versions for scala 2.11 and 2.12. 
suggested to use scala 2.12 for download.', 'duration': 36.834, 'max_score': 1385.363, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/hyJZP-rgooc/pics/hyJZP-rgooc1385363.jpg'}, {'end': 1487.003, 'src': 'embed', 'start': 1459.752, 'weight': 0, 'content': [{'end': 1465.013, 'text': "So you can go ahead and save this thing but I won't be doing it because I already have Kafka installed in my system.", 'start': 1459.752, 'duration': 5.261}, {'end': 1469.435, 'text': "Now you're all set. You can go ahead and start your Zookeeper server first.", 'start': 1466.034, 'duration': 3.401}, {'end': 1473.716, 'text': 'So for that you need to go inside your Kafka root directory.', 'start': 1470.075, 'duration': 3.641}, {'end': 1487.003, 'text': 'Now, inside bin folder, you will have a script called zookeeper-server-start.sh, and now you need to provide the zookeeper configuration file,', 'start': 1476.218, 'duration': 10.785}], 'summary': 'Kafka is already installed, start zookeeper server in kafka root directory.', 'duration': 27.251, 'max_score': 1459.752, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/hyJZP-rgooc/pics/hyJZP-rgooc1459752.jpg'}], 'start': 1337.412, 'title': 'Apache kafka and zookeeper setup', 'summary': 'covers the installation process of apache kafka and zookeeper, including the requirement of java, downloading the latest version of kafka, choosing between scala 2.11 and scala 2.12, setting up kafka and zookeeper, untarring files, configuring server properties, starting zookeeper and kafka broker, creating and listing topics, and starting console consumer and producer.', 'chapters': [{'end': 1422.197, 'start': 1337.412, 'title': 'Installing apache kafka and zookeeper', 'summary': 'covers the installation process of apache kafka and zookeeper, including the requirement of java, downloading the latest version of kafka, and choosing between scala 2.11 and scala 2.12.', 'duration': 84.785, 'highlights': ['Apache Kafka requires Java for installation. Java is required for installing Apache Kafka.', 'The latest version of Kafka is 1.0.0, available for Scala 2.11 and Scala 2.12. The latest version of Kafka is 1.0.0, offered for Scala 2.11 and Scala 2.12.', 'Choosing between Scala 2.11 and Scala 2.12 for downloading Kafka. 
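For reference, a minimal sketch of the download-and-start sequence demonstrated in this section; the 1.0.0 release now lives on archive.apache.org rather than the mirror page shown in the video:

```bash
# Download and unpack the Kafka 1.0.0 build for Scala 2.12.
wget https://archive.apache.org/dist/kafka/1.0.0/kafka_2.12-1.0.0.tgz
tar -xzf kafka_2.12-1.0.0.tgz
cd kafka_2.12-1.0.0

# Start zookeeper first (listens on 2181 by default), then, in a second
# terminal, the Kafka broker itself (listens on 9092 by default).
bin/zookeeper-server-start.sh config/zookeeper.properties
bin/kafka-server-start.sh config/server.properties
```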
Users have the option to choose between Scala 2.11 and Scala 2.12 for downloading Kafka.']}, {'end': 1783.855, 'start': 1422.197, 'title': 'Setting up kafka and zookeeper', 'summary': 'explains the process of setting up kafka and zookeeper, including instructions on untarring files, configuring server properties, starting zookeeper and kafka broker, creating and listing topics, and starting console consumer and producer.', 'duration': 361.658, 'highlights': ['The chapter includes instructions on starting zookeeper and Kafka broker, creating and listing topics, and starting console consumer and producer.', 'The script file for creating a topic is kafka-topics.sh, and the same kafka-topics.sh script file lists topics.', 'The port for zookeeper is specified as 2181, and for Kafka broker is 9092 by default, and log retention hours is set at 168, which is equal to seven days.', "The number of partitions and replication factor for the created topic is specified as one, and the name of the created topic is 'test-edureka'.", 'The log directory of the Kafka broker is specified as /tmp/kafka-logs, and the default number of partitions is one.']}], 'duration': 446.443, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/hyJZP-rgooc/pics/hyJZP-rgooc1337412.jpg', 'highlights': ['The latest version of Kafka is 1.0.0, available for Scala 2.11 and Scala 2.12.', 'Choosing between Scala 2.11 and Scala 2.12 for downloading Kafka.', 'The script file for creating a topic is kafka-topics.sh, and the same kafka-topics.sh script file lists topics.', 'The port for zookeeper is specified as 2181, and for Kafka broker is 9092 by default, and log retention hours is set at 168, which is equal to seven days.', "The number of partitions and replication factor for the created topic is specified as one, and the name of the created topic is 'test-edureka'."]}, {'end': 2052.525, 'segs': [{'end': 2015.19, 'src': 'embed', 'start': 1972.849, 'weight': 0, 'content': [{'end': 1976.01, 'text': 'the leader of this partition is again broker 0,', 'start': 1972.849, 'duration': 3.161}, {'end': 1984.312, 'text': 'the replica is stored on, again, broker 0, and the in-sync replica, that is the replica which is mostly in sync with your leader.', 'start': 1976.01, 'duration': 8.302}, {'end': 1987.593, 'text': "replica is again 0 because you don't have a single node multi broker cluster.", 'start': 1984.312, 'duration': 3.281}, {'end': 1995.098, 'text': 'So you can see the details and, at last, all consumers and producers operating on that', 'start': 1988.374, 'duration': 6.724}, {'end': 1998.02, 'text': 'partition must be connected to the leader. Now moving ahead.', 'start': 1995.098, 'duration': 2.922}, {'end': 2002.142, 'text': 'These are the type of Kafka clusters single node single broker cluster.', 'start': 1998.96, 'duration': 3.182}, {'end': 2005.044, 'text': 'So we have executed this one here.', 'start': 2002.903, 'duration': 2.141}, {'end': 2011.588, 'text': 'We can have multiple producers, but there will be one Kafka broker in the whole cluster and then we will have zookeeper,', 'start': 2005.084, 'duration': 6.504}, {'end': 2015.19, 'text': 'from which the Kafka cluster or you can see, the single broker would be connected.', 'start': 2011.588, 'duration': 3.602}], 'summary': 'Kafka cluster has single broker, in-sync replica and multiple producers operating', 'duration': 42.341, 'max_score': 1972.849, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/hyJZP-rgooc/pics/hyJZP-rgooc1972849.jpg'}], 'start': 1783.855, 'title': 'Kafka console producer and consumer and kafka cluster architecture', 'summary': 'covers how to use kafka console producer and console consumer, explaining their parameters, differences, and benefits for verifying data in a single node single broker cluster. it also delves into kafka cluster architecture, discussing cluster controller responsibilities, partition ownership, and different types of kafka clusters including single node single broker, single node multi broker, and multi-node multi broker clusters.', 'chapters': [{'end': 1915.322, 'start': 1783.855, 'title': 'Kafka console producer and consumer', 'summary': 'explains how to use kafka console producer and console consumer, highlighting the parameters to be specified, the difference between the two, and the benefit of using them for verifying data in a single node single broker cluster.', 'duration': 131.467, 'highlights': ["Developers use the Kafka console producer or Kafka console consumer to verify things when they are writing a producer API or consumer API and producing data to Kafka.", 'In the producer we will pass --broker-list, while in the consumer we need to pass --bootstrap-server.', 'Zookeeper maintains the metadata regarding the topics for Kafka.', 'The benefit of using console producer and console consumer is to verify that your data is getting published in Kafka topic, which is useful when writing a producer or consumer API.', 'The chapter explains the parameters to be specified for the console producer and consumer, such as --broker-list and --bootstrap-server.']}, {'end': 2052.525, 'start': 1915.903, 'title': 'Kafka cluster architecture', 'summary': 'discusses the responsibilities of a cluster controller, details about partition ownership, and various types of kafka clusters including single node single broker, single node multi broker, and multi-node multi broker clusters.', 'duration': 136.622, 'highlights': ['The cluster controller is responsible for administrative operations like 
assigning partitions to brokers and monitoring broker failure in a cluster.', 'The leader of a partition is responsible for handling all consumer and producer operations on that partition.', 'In a single node single broker cluster, there is one Kafka broker and a zookeeper managing the cluster, while in a multi-node multi broker cluster, there are multiple nodes with multiple brokers, providing real-time throughput.', 'In a single node multi broker cluster, multiple brokers are deployed on a single machine, allowing message production and consumption from multiple brokers.']}], 'duration': 268.67, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/hyJZP-rgooc/pics/hyJZP-rgooc1783855.jpg', 'highlights': ['The benefit of using console producer and console consumer is to verify that your data is getting published in Kafka topic, which is useful when writing a producer or consumer API.', "Developers use the Kafka console producer or Kafka console consumer to verify things when they are writing a producer API or consumer API and producing data to Kafka.", 'The chapter explains the parameters to be specified for the console producer and consumer, such as --broker-list and --bootstrap-server.', 'The cluster controller is responsible for administrative operations like assigning partitions to brokers and monitoring broker failure in a cluster.', 'In a single node single broker cluster, there is one Kafka broker and a zookeeper managing the cluster, while in a multi-node multi broker cluster, there are multiple nodes with multiple brokers, providing real-time throughput.']}, {'end': 2338.211, 'segs': [{'end': 2200.958, 'src': 'embed', 'start': 2174.014, 'weight': 0, 'content': [{'end': 2180.255, 'text': 'how to validate system reliability, then how to perform the tuning in Kafka clusters, those things.', 'start': 2174.014, 'duration': 6.241}, {'end': 2186.506, 'text': 'Next moving ahead to module 5 that is Kafka cluster architecture and administering Kafka in this module.', 'start': 2181.101, 'duration': 5.405}, {'end': 2189.268, 'text': 'You will learn about multi-cluster architectures.', 'start': 2186.866, 'duration': 2.402}, {'end': 2197.555, 'text': 'You will learn about Kafka mirror maker and you will also be going through administering Kafka operations like topic operations, dynamic configuration changes,', 'start': 2189.348, 'duration': 8.207}, {'end': 2200.958, 'text': 'partition management, handling unsafe operations, etc.', 'start': 2197.555, 'duration': 3.403}], 'summary': 'Learn about kafka cluster architecture and administration, including multi-cluster architectures and kafka mirror maker.', 'duration': 26.944, 'max_score': 2174.014, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/hyJZP-rgooc/pics/hyJZP-rgooc2174014.jpg'}, {'end': 2301.883, 'src': 'embed', 'start': 2271.895, 'weight': 4, 'content': [{'end': 2273.576, 'text': 'then architecture overview, et cetera.', 'start': 2271.895, 'duration': 1.681}, {'end': 2278.48, 'text': 'Moving ahead to module eight, that is, integration of Kafka with Hadoop, Storm and Spark.', 'start': 2274.397, 'duration': 4.083}, {'end': 2286.933, 'text': 'So here you will learn about Hadoop, Storm and Spark in brief, and then you will learn how to integrate Kafka with Hadoop, Storm and Spark,', 'start': 2279.068, 'duration': 7.865}, {'end': 2287.874, 'text': 'moving to module 9..', 'start': 2286.933, 'duration': 0.941}, {'end': 2291.656, 'text': 'You will again learn about the integration of Kafka with Flume, Cassandra and Talend.', 'start': 2287.874, 'duration': 3.782}, {'end': 2301.883, 'text': 'So modules 8 and 9 show you the real power of Kafka, where you will get an idea of how Kafka works in real-time systems. Then in module 10.', 'start': 2292.236, 'duration': 9.647}], 'summary': 'Learn about integrating kafka with hadoop, spark, cassandra, and talend in modules 8 and 9.', 'duration': 29.988, 'max_score': 2271.895, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/hyJZP-rgooc/pics/hyJZP-rgooc2271895.jpg'}], 'start': 2052.626, 'title': 'Apache kafka: course outline & modules overview', 'summary': 'covers the setup and usage of a single-node, single-broker kafka cluster, the updated course curriculum, and provides an overview of the 10 modules of the kafka course, emphasizing practical applications and real-time scenarios across various topics.', 'chapters': [{'end': 2153.95, 'start': 2052.626, 'title': 'Apache kafka: course outline & real-time performance', 'summary': 'covers the setup and usage of a single-node, single-broker kafka cluster, including starting zookeeper and kafka server, creating topics, producing and consuming messages, and outlines the updated course curriculum of apache kafka certification at edureka.', 'duration': 101.324, 
Chapter: Apache Kafka: Course Outline & Modules Overview [2052.6s-2338.2s]

Summary: Covers the setup and usage of a single node single broker Kafka cluster -- starting ZooKeeper, starting the Kafka server, creating topics, and producing and consuming messages -- and then walks through the updated curriculum of the Apache Kafka certification training at Edureka.

Section: Course outline & real-time performance [2052.6s-2153.9s]

Highlights:
- Creating a single node single broker Kafka cluster involves starting ZooKeeper, starting the Kafka server, and creating a topic for message communication (see the command sketch after this list).
- The chapter demonstrates the console producer and consumer producing and verifying messages in the cluster, showing Kafka's real-time latency and how to observe its performance.
- The course outline covers the introduction to Big Data and Apache Kafka, Kafka cluster components, configuring producers, creating custom serializers, working with consumers, consumer groups, partition rebalance, and understanding commits and offsets.
- The curriculum includes producing keyed and non-keyed messages, sending messages synchronously and asynchronously, and configuring producers for specific requirements, alongside modules on consumers and Kafka's role in the Big Data space.
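Those three steps map onto three commands, roughly as follows (a sketch assuming Kafka's packaged scripts and default property files; topic creation via --zookeeper matches the older releases this tutorial uses, while current releases take --bootstrap-server instead):

    # 1. Start ZooKeeper, which holds the cluster and topic metadata
    $ bin/zookeeper-server-start.sh config/zookeeper.properties

    # 2. Start the Kafka broker
    $ bin/kafka-server-start.sh config/server.properties

    # 3. Create a topic for message communication (name is illustrative)
    $ bin/kafka-topics.sh --create --zookeeper localhost:2181 \
          --replication-factor 1 --partitions 1 --topic kafka-demo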
Section: Kafka course modules overview [2154.3s-2338.2s]

Summary: Provides an overview of the ten modules of the Kafka course, spanning Kafka internals, cluster architecture and administration, monitoring, stream processing, and integration with other systems, with an emphasis on practical applications and real-time scenarios.

Highlights:
- Earlier modules also cover how to validate system reliability and how to perform tuning of Kafka clusters.
- Module 5, Kafka cluster architecture and administering Kafka, covers multi-cluster architectures and Kafka MirrorMaker, along with administrative operations such as topic operations, dynamic configuration changes, partition management, and handling unsafe operations.
- Module 6 covers Kafka monitoring and Kafka Connect: considerations for building data pipelines, metric basics, broker metrics, client monitoring, and using Kafka Connect for reliable streaming of data between systems.
- Module 8 introduces Hadoop, Storm, and Spark in brief and then shows how to integrate Kafka with each of them.
- Module 9 covers the integration of Kafka with Flume, Cassandra, and Talend; together, modules 8 and 9 show the real power of Kafka -- how it works inside real-time systems.
- Module 10 is a real-time project that gathers messages from multiple sources, stores them, and performs stream operations, giving a practical understanding of Kafka in real-time scenarios.

Highlights (full video):
- Kafka's features include high throughput, scalability, protection against data loss, replication, and stream processing, catering to diverse data-processing needs.
- LinkedIn's Kafka deployment spans 1,100+ machines, 31,000+ topics, and 350,000+ partitions, handling 675 billion messages per day with peak loads of 10.5 million messages per second and 18.5 GB per second inbound.
- Kafka offers better throughput, built-in partitioning, replication, and fault tolerance, making it suitable for large-scale message-processing applications.
- Kafka was originally built at LinkedIn to rebuild user activity tracking pipelines as real-time publish-subscribe feeds, tracking site activity such as pageviews, searches, and other user actions.
- Kafka can act as an external commit log for distributed systems, supported by the log compaction feature.
- A Kafka cluster retains all published records, with a default retention policy of seven days.
- Kafka allows a producer to wait on acknowledgement until a record is fully replicated.
- Each partition is an ordered, immutable sequence of records with sequential IDs; a partition is an append-only data structure, so only new records can be added.
- Consumers can consume records in any order and can reset to older offsets to reprocess data (see the sketch after this list).
- Partitioning enables scaling beyond a single server's capacity and provides parallelism for processing data.
- The session concludes with the installation of Apache Kafka and the deployment of a single node single broker cluster, providing practical hands-on experience.
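As a concrete illustration of that offset reset, the stock consumer-groups tool (available since Kafka 0.11) can rewind an inactive consumer group to the earliest retained offset; the group and topic names here are made up:

    # Rewind demo-group to the start of kafka-demo so its consumers
    # reprocess all retained records on their next poll; the group
    # must have no active members while the reset runs.
    $ bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
          --group demo-group --topic kafka-demo \
          --reset-offsets --to-earliest --execute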