Kafka Tutorial | Learn Kafka | Intellipaat


hey guys welcome to this session by
Intellipaat Apache Kafka is used for building real-time data pipelines and
streaming applications Uber, Spotify and Slack are a few companies who use Kafka
in their technology stack and in this session we’ll be learning about Kafka
comprehensively and also before moving on with this session please subscribe to
our channel so that you don’t miss our upcoming videos now let us take a quick
glance at the agenda we’ll start off with a quick introduction to Kafka and
its features and after that we’ll look into Kafka topics and partitions after
that we'll look into the workflow of pub/sub messaging moving on we'll be
looking at the various CLI tools in Kafka and after that we’ll learn how to
configure a single node in Kafka and finally we will do the multi node
cluster setup also guys if you want to do an end-to-end certification training
on Kafka Intellipaat provides a complete certification training on Kafka and
those details are available in the description now let us begin this
session let's start by understanding what is the need of Kafka so currently the
industry is generating lots of real-time data that needs to be
processed in real time so this is the 21st century and all of the top
organizations out there are generating a lot of real-time data so let us
look at some examples so we’ve got sensor data when it’s used to predict
the failure of a system ahead of time now we’ve got all of these sensors
generating real-time data and it’s very important to understand and process this
real-time data in a very quick way similarly we’ve got real time economic
data and it’s based on the preliminary estimates and is frequently adjusted for
better estimates to be available so this is all of the financial data which is
being generated from the stocks and so on and if we tap into this real-time
data which is generated by the stock market then it could be a huge boon for
our economy now what happens is all of the organizations can have multiple
servers at the front end and back end so we can have multiple servers at the front end and like web application server for hosting websites
or applications and we can also have a lot of back-end servers now all of these
servers will need to communicate with the database server thus we’ll have
multiple data pipelines connecting all of them to the database server so this
is what we have so let’s say we’ve got this organization which has all of these
servers over here so this is the front end and this is the back end so we’ve
got the front end server we’ve got a couple of application servers over here
and we've got a chat server similarly at the back end we've got a
database server a security systems server and we've got a real-time
monitoring server and they've got a data warehouse now all of these servers
which are present at the front end would want to interact with the database
server and not just the database server but all of the servers which are present
at the back end so you see over here these are all of the data pipelines
which are present now let's just take the case of this front-end server and
this database server so you see that these are all of the data pipelines
which connect the front-end server to the database server similarly these are
these are the data pipelines connecting this application one server to the
database server now we’ve just got four servers at the front end and five
servers in the backend and we’ve got like almost 50 data pipelines connecting
these servers to the front end and back end and these are a huge amount of data
pipelines and dealing with all of these data pipelines can be a very cumbersome
and time-consuming task so as we see that the data pipelines are getting
complex with the increase in the number of systems and adding a new system or
a server requires more data pipelines which will make the data flow even more
complicated so let’s say I go ahead and add a couple more front end servers and
a couple more database servers over here now just imagine the number of data
pipelines interacting over here but that is a very complex system isn’t it and
managing all of these data pipelines becomes very difficult as each data
pipeline has its own set of requirements so if you got a thousand data pipelines
between the front end servers in the back end servers then managing all of
these thousand data pipelines is a very very cumbersome task and even adding or
removing some of these pipelines is difficult in such cases so this is where
Kafka comes in to solve this problem now Kafka is basically a messaging system
now what is a messaging system so just consider that this is a system which is
there between the front end servers and the back end servers and this basically
decouples all of the data pipelines now what happens is you’ve got producers
which produce all of the messages or generate all of the data and all of the
data is stored in the form of a stream in the Kafka cluster well you can
consider Kafka cluster to be a group of servers which are known as brokers now
all of this data is being generated in real time from these producers and that
data is stored in streams in this Kafka cluster now this consumer generates
a request and it takes in or consumes the data from this Kafka cluster so this
is how the process flow goes so the producer generates all of the data which
is stored in the Kafka cluster and from the Kafka cluster the consumer consumes
all of the data right so we see that the number of data pipelines have decreased
over here right now let’s understand what exactly is Kafka so Apache Kafka is
an open source distributed publish/subscribe messaging system that
manages and maintains the real-time stream of data from different applications and
websites so this basically means that this is an
intermediate system between all of the producers and all of the consumers or
the front-end servers and the back-end servers and it basically
provides a proper system with which we can deal and maintain the real-time
stream of data so Apache Kafka basically originated at LinkedIn now they had a
problem to solve where they were dealing with huge amounts of real-time data so
that is why LinkedIn thought of using a publish/subscribe messaging system and
they came up with Kafka now once they understood how important and valuable
Kafka is that is when it later became an open source Apache project in 2011
and then it became a first-class Apache project in 2012 now as a simple
fact Apache Kafka is written in Scala and Java and it is extremely fast
scalable durable fault tolerant and distributed by design so these are like
some basic features of Apache Kafka all right now let’s properly understand the
solution provided by Kafka so what is happening over here is apache Kafka
basically reduces the complexity of the data pipelines and it makes
communications between systems simpler and manageable and with Kafka it is very
easy to establish remote communication and send data across a network so you
can establish asynchronous communication and send messages with the help of Kafka
so what do I mean by asynchronous communication so this basically means
that the producers over here keep on sending the messages in the form of a
stream to the Kafka cluster and they do not have to wait for an acknowledgement
from the Kafka cluster right so these producers will keep on sending the
messages which would be stored in the Kafka cluster and they would be consumed
by all of these consumers over here so this sort of ensures that there is a
reliable asynchronous communication between all of the producers and all of
the consumers right now let’s go ahead and look at all of the Kafka features so
Kafka is highly scalable in nature because it is a distributed system with
no downtime now what do I mean by distributed systems so let’s say if you
had just one server which would take care of all of the messages consuming
from the producers and only this server sends all of the messages back to the
consumer now dealing with this huge amount of data by a single server is
very difficult so this is where we've got something known as a Kafka cluster where
we've got multiple brokers and all of these multiple brokers simultaneously
take care of all of the messages coming from the producer over there and Kafka
can also take care of huge volume of data so Kafka can take care of terabytes
of data which is continuously being generated by all of the producers and it
can seamlessly send them to the consumers at the back end and Kafka also
provides fault tolerance let's say if there is failure of one
node then all the data which is present in that particular node would have
replicas stored in some other systems so let's say if there is one broker which
fails then the data which is present on that broker would also be present in two
or three more brokers right so this is how Kafka provides fault tolerance and
Kafka also provides reliability because it is distributed partitioned replicated
and fault tolerant next Kafka also provides durability because it uses
distributed commit logs that is messages persist on the disk as fast as possible
so the producer sends a message and this message is immediately stored in the
form of a distributed log in the disk and the performance of Apache Kafka is
also very very good now when I say performance what I
basically mean over here is high throughput and what do I mean by
throughput so throughput is basically the amount of information which is being
passed in a particular amount of time let me just say that Apache Kafka
enables huge amount of information being transferred in just one second of time
so this Kafka cluster can take in terabytes of messages from the producers
in a single second of time and then transfer these messages to the consumers
in a single second of time so this is what is known as throughput so here you
have something known as the producer throughput and the consumer throughput
so producer throughput is the amount of information which the producer
generates in a particular amount of time and the consumer throughput is the
amount of data the consumers can consume in a particular amount of time so this
is how Kafka ensures that there is very high performance and Kafka also provides
zero downtime now Kafka is very fast and guarantees zero downtime and Zero data loss
and another feature of Kafka is it is extensible so there are many ways by
which applications can plug in and make use of Kafka so you can use it on any
platform and use it for multiple purposes right so now that we've looked
at some of the features of Kafka let’s go through the components of Kafka so
let’s start off by understanding what exactly is a Kafka broker so Kafka
brokers are the servers that manage and mediate the conversation
between two different systems so this is basically that server which makes sure
that all of the messages which are coming from the producer are properly
stored in the Kafka cluster and it also ensures that these messages are properly
consumed by the consumer so this is the intermediary between the producer and
the consumer or between the front end and the back end and brokers are also
responsible for the delivery of messages to the right party so next we’ll
understand what exactly is a message so messages are simple byte arrays and any
object can be stored in any format by the developers so the format of these
messages could be in the form of string JSON Avro and so on right so simply
put messages are just simple byte arrays which are sent from the producer and
then broker sends it to the consumer which requests for it right now we’ll
understand what exactly is a topic so in Apache Kafka all the messages are
maintained in what we call as topics so consider it like this so let’s say
there’s this huge organization and there is data related to different things so
that could be data related to sales there could be data related to accounts
there could be data related to technology and there also could be data
related to analytics now all of these could be different topics so let’s say I
just pick up sales over here so sales could be one particular topic now all of
the messages which are related to the sales topic would come under one
category so over here topic basically means that there is a
particular category into which messages could be clustered and all of these
messages are stored published and organized in Kafka topics next we’ll
understand what exactly is a cluster so in Kafka more than one broker that is a
set of servers is collectively known as a cluster so when you have just a single
broker it is just a one broker Kafka architecture and when you have more than
one broker that is known as a Kafka cluster it is basically a
group of computers each having one instance of a Kafka broker so let's say
if you have three computers then all of these three computers
or servers would have one instance of a Kafka broker so next up we'll
understand what exactly are producers so producers are the processes that
publish data or messages to one or more topics so this would come at the front
end or these are basically entities which would generate all of the data and
they are basically the source of data stream in Kafka next we’ll understand
what are consumers so consumers are the processes that read and process the data
from topics by subscribing to one or more topics in the Kafka cluster right
so we already know that consumers are used to consume data from the Kafka
cluster now the consumers can either consume one or more than one topic right
so let’s say there’s this one consumer group and it wants to consume only data
or only messages related to the sales topic that could be another consumer
group which wants to consume data and with respect to the analytics topic as
well as the tech topic right so this is how consumers work next we’ll understand
what are partitions so every broker holds a few partitions and each partition can
either be a leader or a replica for a topic now what basically happens is when
topics are sent from the producer to the consumer these topics are divided into
partitions so you don’t really send the entire topic as a whole from the
producer to the consumer so these are divided into a set of partitions and
these partitions are stored in the cluster and all the writes and reads
to a topic go through the leader which is responsible for updating replicas with
new data and if unfortunately the leader fails then
the replica takes over as the new leader so this is why Kafka is known as fault
tolerant now let’s go ahead and understand the architecture of Kafka
Cluster so we’ve got producers over here and these producers send messages in the
form of topics which are received by the consumers over here now as
I've already told you these topics are divided into partitions over here this
is a one group of cluster so these producers send a topic to this broker
over here now this is just one topic right keep this in mind so we’ve got
topic 1 and this topic 1 is divided into three partitions so we’ve got partition
0 partition 1 and partition 2 and all of these three partitions are stored in a
single broker over here because this is a single broker cluster now this topic
would be consumed by the consumers which are present over here so this is the
basic architecture of a Kafka cluster and the same thing is happening
over here so we’ve got data or the topic which is
sent to the broker over here and this is a single topic over here and the single
topic is being divided into three partitions now when the producer sends
this topic it is basically stored in the form of what is known as offset so
consider that there are three messages which are being sent by this producer
over here right and all of these three messages correspond to the topic one now
each message is tagged with an offset
value over here so let's say there is message one which is tagged with offset
zero and there is message two which is tagged with the offset one and there is
message three which is tagged with the offset 2 over here so this is how it happens so
all of the messages are tagged with an offset value and are stored in this
cluster over here and whenever the consumer requests for this topic
it gets it from the broker now let's understand the workflow of a Kafka
producer so these producers send records to the topics so these records are
nothing but the messages which are sent to the topics now these
producers select the partition to send the message per topic right so these are
like randomly selected and these producers just select one particular
partition to send one particular message per topic to these partitions and these
producers can implement priority systems which are based on sending records to
certain partitions depending on the priority of the record and they send
these records to a partition based on the record key so producers don't
wait for acknowledgments from a broker and send messages as fast as the
broker can handle so this is the asynchronous
communication which we are talking about these producers keep on sending these
messages or records to the broker over here and it will not wait for an
acknowledgement back from the broker so as and when the broker can handle all
these messages it will keep sending these messages to the broker now let's
understand about the Kafka broker so a Kafka cluster typically consists of
multiple brokers to maintain the load balance as I've already told you if
there are multiple messages being produced by the producer then handling
all of them by single broker would be a difficult task that is why we have got
more than one broker which is known as a Kafka cluster and a broker on receiving
messages from the producer assigns offsets to them and commits the messages
to the storage on the disk so as you see over here these are basically the offset
numbers so this partition or this broker has received these two messages from the
producer and it has assigned these offset values to these messages and it
has stored them on the disk now this broker serves the consumers by
responding to fetch requests so partitions in one broker instance can
handle thousands of reads and writes per second and terabytes of messages that is
really huge isn’t it all right so if we just take this one particular broker
over here this one broker itself can handle thousands of writes per second
and terabytes of messages now backups of topic partitions are present in multiple
brokers and if a broker goes down one of the brokers containing the backup
partitions would be elected as the leader of the respective partitions now
let’s understand about Kafka topics and partitions so messages in Kafka are
categorized into topics which we already know now again let’s take the same
example let’s say there is a school and there are different tables over there so
there would be one student table there would be one teachers table and there
would be one table related to departments now we can consider all of these three
different tables to be three different topics and there would be data
pertaining to these three different topics over here right so the messages
or the data related to these three different topics are categorized
individually into these topics and these topics are basically broken down into a
number of partitions and the messages are written
to them in an append-only fashion now what do I mean by append-only fashion this
basically means that I’ve got a producer over here and it sends the first message
or the first record which would be given the offset zero after that we’ll go
ahead and send the next message which will be assigned the offset one and then
it will send the next message which will be assigned the offset two so you
cannot actually go back to the earlier messages or skip one particular
message and go forward right so this would happen in an append-only fashion
and you will be writing message by message so this is how producers write
the messages into the broker but reading messages is different so the reading
messages can either be done in the order from beginning to the end or we can
actually skip certain messages or we can actually go back or we can rewind to
any point in the partition by providing an offset value now let’s say this
consumer a wants this message at offset zero from partition one right now it
doesn’t have to read this message and it can only read this message over here
similarly if this consumer B and if it wants to read only this message with the
offset value one from partition two then it can do it and ignore these two
messages so this is how consumers work now this offset value is basically the
sequential ID provided to the messages another thing to keep in mind is
partitions provide redundancy and scalability so partitions can be hosted
on different servers so that a single topic can be scaled horizontally across
multiple servers thus enhancing the performance all right now let’s take
this example to understand how a topic is basically divided into different
partitions so this figure shows that we have a topic which is divided into four
partitions with writes being appended to the end of each of these partitions so
this is the topic and this topic will have one particular name and it has been
divided into four partitions and these are partition 0 partition 1 partition 2
and partition 3 and these are all of the messages which are present in partition
0 right and the writing takes place at the end
of this write so we basically append the next message over here
similarly the same thing happens at partition one partition two and
partition three now records are stored in a partition either by the record key if it
is present or by round-robin if the key is missing so this is the basic default
behavior.
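As a rough illustration of that default behaviour, the console producer can be asked to send keyed records; the topic name sales and the separator below are purely illustrative assumptions, not commands from this session:

kafka-console-producer.sh --broker-list localhost:9092 --topic sales --property parse.key=true --property key.separator=:
# records are typed as key:value pairs, e.g. india:record1 — records sharing a key
# should land in the same partition, while a plain console producer started without
# the two --property flags sends keyless records that are spread round-robin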
Now let's understand about replication so if we take the same example where we have a topic and that topic is configured into four partitions
so this is partition 0 1 2 & 3 and if we set the replication factor of a topic to
be equal to 3 then Kafka will create three identical replicas of each
partition right so this partition would have three replicas partition 2 will
have 3 replicas partition 0 would have 3 replicas and partition 3 would also have
3 replicas now all of these replicas would be present on the available
brokers which are present in the cluster so let’s take this partition over here
now since the replication factor is set to 3 there would be 3 replicas of this
partition over here and the ID of the replica is the same as the ID of the
broker that hosts it so over here the ID of the broker is 2 right since this is
broker 2 that is why the ID of the replica will also be 2 so this is
replica 2 similarly over here this is broker 3 or the ID of the broker is 3
and that is why the ID of the replica will also be 3 now after all this is
done for each partition kafka will elect one broker as the leader and out of
these 5 brokers over here this broker has been elected as the leader and if
this broker fails due to some unfortunate incident then one of these 2 brokers either
broker 2 or broker 3 would be elected as the leader because it has a
replica of partition 1.
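As a rough illustration of how leaders and replicas show up in practice, the describe option of kafka-topics.sh prints one line per partition; the topic name and broker IDs below are made-up values rather than output from this cluster:

kafka-topics.sh --describe --zookeeper localhost:2181 --topic example
# Topic: example  Partition: 0  Leader: 1  Replicas: 1,2,3  Isr: 1,2,3
# Topic: example  Partition: 1  Leader: 2  Replicas: 2,3,1  Isr: 2,3,1
# Leader is the broker currently serving reads and writes for that partition,
# Replicas lists every broker holding a copy, and Isr is the set of in-sync
# replicas that are eligible to take over if the leader fails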
Now finally let us understand about the Kafka consumer so the Kafka consumer can subscribe to one or more
topics and read messages in the order they were produced so it's not
necessary that every consumer has to read only one particular topic so it's
totally fine if one consumer reads more than one topic and the consumer can keep
track of all of the messages it has already consumed by
keeping track of the offset of messages so let’s say consumer a reads this
message which has offset value zero now to remember that it has already read
the message at offset value zero what it will do is it will keep track of the offset
value and since it knows that it has already read the offset value zero then
it will go to the next offset and it will read the message which is present at
offset number one so consumers can also work as a part of a consumer group so as
we see over here we’ve got a consumer group that is they’ve got one or more
consumers that work together to consume a topic so over here we've got three
consumers present in a consumer group we’ve got consumer a consumer B and
consumer C and all of these three consumers are consuming only this one
particular topic now messages with the same key go to the same consumer and
this consumer group basically ensures that each partition is consumed by only
one member over here partition 1 is being consumed by only consumer A
partition 2 is being consumed by only consumer B and partition 3 is being
consumed by only consumer C let's look at this working mechanism
so over here again we've got these three consumers in a single
group consuming one particular topic now we see that consumer 0 is working on
only one partition which is partition 0 similarly consumer 2 is working on a
single partition which is partition 3 while consumer 1 is working on two
partitions simultaneously which are partition 1 and partition 2 now this is
also possible when it comes to consuming messages with the help of a consumer
group but the only thing which matters is that all of the offset values are in check
over here consumer one reads offset values 5 and 7
consumer 0 reads offset value 6 and consumer two reads offset value 10 so
if all of the consumers work in tandem and maintain the synchronicity then there
would be absolutely no problem.
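As a small sketch of how a consumer group is exercised from the command line, assuming a topic named sales and a group name that is purely illustrative, two console consumers started with the same group would split the partitions between themselves:

kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic sales --group demo-group
# run the same command in a second terminal: both consumers join demo-group and
# each partition of the topic is then delivered to only one member of the group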
Now let's start by understanding what exactly is Apache Zookeeper and how does it help when it comes to working
with Kafka so let’s actually start off by looking
at the definition so zookeeper is an open source Apache project that provides
centralized infrastructure and services that enable synchronization across an
Apache Hadoop cluster now this is a very complicated definition so what do we
mean by this now Apache zookeeper comes in whenever we are working with any sort
of distributed applications now if it is a distributed application for example
Kafka then this would obviously run on multiple
systems so you have multiple nodes Working together parallely now when
multiple nodes are working together parallely like again let’s take Kafka so
you’ve got a multi node cluster set up let’s say you’ve got multiple brokers
you’ve got multiple producers and you’ve got multiple consumers now all of these
brokers consumers and producers have to work parallely in sync now there could
be a lot of cases where this does not happen right so either the broker might
fail or the message is sent by the producer would not have been received by
the brokers or there could be a case where the consumers are not able to
process the information sent by the brokers so to make sure all of this is
working properly in sync we would need Apache zookeeper so zookeeper makes sure that
whenever you are working with any sort of distributed application all of these
work in tandem it provides a lot of services so one of those services which
apache zookeeper provides is the naming service now what do we mean by
naming service it basically means that it is sort of like a DNS but just
for the nodes which are present in this cluster setup so with the help of Apache
zookeeper we can identify which brokers are present in our cluster currently and
which producer is sending what messages to our broker currently so all of this
naming information is present and also it is used to elect the leaders and the
slaves now it is very important when it comes to cluster setup to have one
leader and the workers so that there is a proper
workflow maintained now what happens if the leader itself fails so if the leader
itself fails then Apache Zookeeper maintains a list of all of the workers
which would have the same topics present in them now Apache zookeeper would go
ahead and elect one of these slaves as the leader
so this is where Apache Zookeeper comes in and not only this it also makes
sure that all the topics which are present in the partitions they have the
relevant offset numbers and it also makes sure that these messages are
properly sent to the consumers so these are all the services which are
provided by Apache Zookeeper.
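As a small example of that naming service, Kafka ships a zookeeper-shell.sh utility that can peek at the broker registrations ZooKeeper keeps; the broker IDs shown are just illustrative:

bin/zookeeper-shell.sh localhost:2181
ls /brokers/ids
# something like [0, 1, 2] comes back, one entry per live broker, which is how
# clients and other brokers discover who is currently part of the cluster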
Now just a bit of information about Apache
Zookeeper the service was originally developed at Yahoo and it facilitates synchronization
in the process by maintaining the status on the zookeeper servers which store
information in local log files and the Apache zookeeper servers are capable of
supporting a large Hadoop cluster which we have already known all right so this
is the zookeeper architecture so we’ve got all of these servers so these are
the server applications and these are all of the client applications now all
of these server applications work parallely and all of these client
applications also work parallely so let’s say there is this client which
sends a request to the server application over here now if this client
does not get an acknowledgment from this server then what happens is it will not
wait for a long time and it will send the request to the next server
so this is how parallelization works right so if one server doesn't respond
to this client then immediately that request is sent to the next server and
it will wait for the acknowledgement from this server for a brief amount of
time and if even the second server does not respond to this client in that
specified amount of time then that request is sent to the third server and
it will be responding to this particular client over here so this is how Apache
zookeeper works now let's see how zookeeper and Kafka work in tandem so the
Kafka brokers coordinate with each other using zookeeper right so over here we’ve
got just one broker but let’s say it is a multi node cluster setup or a multi
broker cluster setup so if you have multiple brokers then all of those
multiple brokers need to work with each other
in tandem now that is possible by using the
zookeeper alright so this is how the kafka brokers work and then producers
and consumers are notified by the zookeeper service about the presence of
a new broker in the system or about the failure of a broker in the system so
let’s say as of now we have just got one broker in this Kafka cluster now
suddenly we decide to scale up the process and we add two more brokers in
the system now when we add two more brokers in the system then these two
producers and these three consumers would have to know about the presence of
the new brokers so this is where zookeeper comes in so
zookeeper notifies these two producers and these three consumers that
listen I have added two more brokers into this Kafka cluster and whatever
messages you are sending would be processed parallelly by all of the three
brokers over here right now if any one node fails then on the basis of
the currently live nodes Apache zookeeper will elect the new leader so
I've already told you about this so let's say if one of these partitions is
selected as a leader and that partition fails then the other partitions which
are currently alive one of those would be made the leader and zookeeper and
Kafka keep a set of in-sync replicas so this is how the zookeeper maintains
synchronization between producers consumers and brokers now let’s look at
the Kafka workflow so the producers will start off by sending messages to a topic
at regular intervals so let’s say there is one particular topic and the topic is
related to football now the producer will publish or send messages to that
particular topic at regular intervals after that the brokers will store the messages
in partitions configured for that particular topic so let’s say this topic
football has three partitions then these messages would be stored in these three
partitions now if a producer sends two messages and there exists two partitions
Kafka will store one message in the first partition and second message in
the second so this is how it works so again let’s take our football topic over
there as I’ve said let’s see if this football topic is divided into three
partitions and we are sending around six messages now the first message would be
stored in the first partition the second message would be stored in the
second partition the third message would be stored in the third partition now again
the fourth message would be stored in the first partition again the fifth message
would be stored in the second partition and the sixth message would be stored in
the third partition so this is how it works and after that once the producer
sends all of these messages and they are stored in the broker system then a
consumer would always subscribe to a specific topic so over here we’ve got
consumers and these consumers would subscribe to our football topic and when
the consumer subscribes to a topic Kafka provides the current offset of the topic
to the consumer and the offset is saved in the zookeeper Ensemble now let’s say
we’ve got this consumer which reads the first two messages which are present in
the partitions so over here the offset value would become two now since it has
read two messages the offset value becomes two and it has stored in the
zookeeper now going ahead the consumer has to read the third message which
would be present in the third partition so the consumer would know this by
looking at the offset of the currently consumed message and for new messages
the consumer will request Kafka in regular intervals now as soon as the
messages are received from the producer they are forwarded to the consumers and on
receiving the message the consumer will process it and once the message is
processed an acknowledgement is sent from the consumer to the broker and once
the broker cluster receives the acknowledgment the offset is changed to
a new value and it is updated in the zookeeper now the consumers are also
able to read the next message correctly even during server outages since the
offsets are maintained in the zookeeper now let's say we've got this producer
over here and it continuously keeps on sending messages and there are let's say
100 messages in total and the server breaks down when it has sent around 57
messages now there is no need to worry anything at all because the offset
number would be stored in the zookeeper and the offset value of 57 would be
there now once the server goes back up the consumer can start consuming
messages from the offset number 58 with the help of the zookeeper
and this flow repeats until the consumer stops the request so this is the entire
flow of the messaging system now let’s look at some of the top companies which
are using Apache kafka so these are some of the top companies so we've got Samsung
Electronics Harman International
Industries LinkedIn Corporation and a few others so these are all of the top companies
which are using apache kafka for distributed processing and distributed
synchronization now Intellipaat already provides the kafka setup for you guys
so you don’t have to worry about the setup at all so you can just go through
those support documents and you’ll be able to figure it out and if you have
any doubts you can reach out to our support so they are available 24/7 now
this brings us to the end of the module let's just go through a quick quiz to
recap all of that stuff so we’ve got our first question over here
now each kafka partition has one server that acts as the ___ so what do you think is the
answer right so the answer is leader so each kafka partition has one server
that acts as the leader then we have our next question so Kafka provides only a
___ over messages within a partition so the answer is Kafka provides only a total order
over messages within a partition question number three so Kafka maintains
feeds of messages in categories called so what would be the answer to this –
the answer is topics so all of the feeds of messages are stored in categories
called as topics all right so let's start by understanding the Kafka
cluster now whenever we are dealing with a small amount of data or we are working
in a local development environment then it's fine if you are using just a single
broker but let's say we have a huge amount of real-time data and we have
around one terabyte of real-time data coming every single second now to
process this one terabyte of real-time data every single second one broker
wouldn't be enough so we'd have to scale this load across multiple servers so
this is where we would need multi broker system set up now instead
of using this one broker what we’ll do is we’ll have multiple brokers and all
of this data would be scaled across multiple servers so that the load across
one single broker is reduced now another advantage with
respect to a multiple broker setup is the topic is not just sent as a single topic
it'll have its replicas as well and when a single topic has multiple replicas
this would also give more fault tolerance now also as we have seen Kafka
cluster is effective for applications that involve large-scale message
processing right so now let’s go ahead and look at some Kafka command-line
tools so the Kafka cluster can run with the following types of setup so
we can either have a single broker cluster or we can have a multiple broker
cluster so in a single broker cluster we just have a single broker which would
serve all of the requests and when we have a multiple broker cluster what
happens is the load is divided across all of these multiple brokers and these
are some of the commonly used commands so we'll start off with
zookeeper-server-start.sh now whenever we are working with Kafka or real-time
processing we need zookeeper so to start zookeeper we would have to run this
command so we'll type in zookeeper-server-start.sh as we see over
here it starts zookeeper using the properties configured under config slash
zookeeper dot properties so in simple terms this command line is basically
used to start the zookeeper service and then we also need to start the Kafka
service so this is the command so we'll type in kafka-server-start.sh and
this is the description so it starts the Kafka server using the properties
configured under config slash server dot properties now when it comes to topics
we'll use this command so we've got kafka-topics.sh and this command is
used to create topics list topics delete topics and modify topics and
then we've also got the producer and consumer so you also have a Kafka
producer and a Kafka consumer so as we all know the producer is used to send messages
to the Kafka cluster so this is the command we'll type in
kafka-console-producer.sh to send messages to the Kafka cluster and then we also know that
the consumer is used to consume messages and this is the command to create a consumer
so we'll type in kafka-console-consumer.sh and as it is stated
over here it is our command line client to consume messages from the Kafka
cluster.
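Putting those tools together, a rough cheat sheet of the commands just described would look something like this, assuming the scripts are on the PATH, the default ports are used, and the topic name test is only a placeholder:

zookeeper-server-start.sh config/zookeeper.properties        # start ZooKeeper
kafka-server-start.sh config/server.properties               # start a Kafka broker
kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
kafka-console-producer.sh --broker-list localhost:9092 --topic test                         # send messages
kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning   # read messages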
Now let's look at the different types of kafka clusters available so first we have the single node single broker
cluster so in a single node single broker cluster we just have a single
node now what do I mean by single node so single node basically means a single
system so in the single system we have a single kafka broker over here and we
have producers sending messages to this single node and these messages would
again be consumed by all of these consumers over here so we just have a
single system or a single node and in that single system we have a single
broker present and that broker is responsible for maintaining the balance
between the producers and the consumers so next we have a single node multiple
broker cluster now this means that inside a single system so single node
basically means a single system so inside the single system we have
multiple brokers so as we see over here we have a single system and inside the
single system we've got 3 brokers broker one broker two and broker three so these
producers send all of their messages to this cluster over here right so this is
a multi broker cluster and these consumers consume the messages from this
multi broker system and then we have a multiple node multi broker cluster so
when it comes to multiple node multi broker cluster we have more than one
nodes or in simple terms we’ve got more than one system and in more than one
system we've got multiple brokers present inside it so as we see over here
this is system 1 and system 2 or node 1 and node 2 and inside node 1 we’ve got
broker one and broker 2 and similarly inside node 2 we’ve got broker one and
broker 2 so over here these producers send their messages to node 1 as well as
node 2 and similarly over here this consumer we see that it consumes
messages from this node and these two consumers they consume messages from
this node over here so this is how multiple node multi broker cluster works
now it's finally time to go ahead and configure our single node single broker
cluster right so these are some prerequisites we should have so that we
can configure our system so we need to have Java Kafka and zookeeper
pre-installed so that we can go ahead and set up a cluster all right now to
set up a single broker so we’d have to start off by opening your terminal and
we’ll launch the services of both zookeeper and Kafka so as we have
already seen to launch the zookeeper service we'll type in
zookeeper-server-start.sh and after that we will go ahead and pass in
kafka slash config slash zookeeper dot properties so this is basically the path
where this is present so inside the kafka folder there is this configuration
directory and inside this configuration directory there is this properties file
which is zookeeper dot properties so this helps us to launch the zookeeper service
now similarly to launch kafka we'll type in kafka-server-start.sh and inside the
kafka directory we have the configuration directory and inside the
configuration directory we have this server dot properties so this would help us
to launch the kafka service right so we are starting off the zookeeper service
and we're also starting off the kafka service right now to see if both of the
services are running or not we'll just type in jps and this would give us all
of these results so this Kafka over here basically means that the Kafka daemon is
running and this QuorumPeerMain means that even the zookeeper daemon is
running so both the Kafka daemon and the QuorumPeerMain daemons are running.
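Concretely, the two start-up commands and the jps check would look roughly like this; the exact paths depend on where Kafka is installed, so treat them as an assumption:

zookeeper-server-start.sh kafka/config/zookeeper.properties
kafka-server-start.sh kafka/config/server.properties
jps
# jps should list a Kafka process and a QuorumPeerMain process (the ZooKeeper
# daemon) once both services are up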
So now that we have started the Kafka service it's time to create the topic
and send it from the producer and consume it from the consumer so this is
how we can create a topic so as we had seen earlier this is the command to create
a topic so I'll type in kafka-topics.sh and then we'll type create and after
that we'll pass in zookeeper and then we'll set a localhost port number so we
are setting the localhost port number to be equal to two one eight one and the
replication factor which we are setting up is equal to one after that we'll also
set the number of partitions to be equal to one and then we'll give the name of
the topic so here the name of the topic is example one right so basically we are
creating a topic where the replication factor is 1 and the number of partitions
is also equal to 1 now if we want to have a glance at all of the topics which
are present then we just need to type in kafka-topics.sh
list zookeeper localhost so this would give us all of the topics which are
present.
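Written out, the create and list commands described above would look roughly like this; the port 2181 and the topic name example1 are taken from the session, the rest is the usual flag syntax:

kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic example1
kafka-topics.sh --list --zookeeper localhost:2181      # show every topic on the cluster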
So now that we have created a topic it's time to launch our producer so this is the command for that so we'll type in
kafka-console-producer.sh and we'll give in the broker-list and then
we'll give the port number where the producer sends these messages so it is
nine zero nine two and then this is the topic which the producer is sending so
topic example one so example one is the topic which is being sent by this
producer so we have launched the producer now we have to create a
consumer to consume or process all of these messages so we'll use this command
over here we'll type in kafka-console-consumer.sh and then we'll set
this bootstrap server and we'll set the localhost port to be equal to nine zero nine
two because both the producer and consumer are listening at the same port
and this has subscribed to the topic example one the producer is sending the
topic example one and also the consumer has subscribed to the topic example one
and it is reading the topic from the beginning right so when it reads the
topic from the beginning this is the result which it gets hello this is my
first example.
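Spelled out as commands, the producer and consumer launched here would look roughly like this, with the broker on port 9092 and the topic example1 as in the session:

kafka-console-producer.sh --broker-list localhost:9092 --topic example1
kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic example1 --from-beginning
# whatever is typed into the producer terminal shows up in the consumer terminal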
Now let's go ahead and actually perform this so this is PuTTY I'll just go
ahead and log in so this is the user ID which is training and
then I'll go ahead and give in the password so let me just type in the
password over here right so we have successfully logged in to PuTTY now
it's time to run the Kafka server so I'll type in kafka-server-start.sh
and then I'll type in kafka slash config slash and then after that I need
to type in server so let me type the spelling correctly so this has to be
server dot properties let me hit enter so let's just wait till the Kafka server
starts right so we see that we have successfully
started the kafka server now what I'll do is
I will duplicate this session over here and again let me login over here
let me type in the password as well so now that I've started these so now it's
time to go ahead and create the topic first so this is the command to create a
topic so we'll just type in kafka-topics.sh create zookeeper we'll set
the localhost to be equal to two one eight one and I'm setting the
replication factor for this topic to be equal to one and I'm setting the number
of partitions also to be equal to one and I'll set the topic name to be equal
to example one so I'll hit on enter now let's just wait till this topic is
created right so we see that we have this
message created topic example one so now that we have created the topic let's
have a glance at all the topics which are present now so this is the command
for that so kafka-topics.sh list zookeeper so this gives us all of the
list of the topics which are present so it tells us that these are all the
topics which are currently present we've got example1 example2 flume Kafka topic
one consumer offsets and my topic so now that we have created the topic it's time
to start the producer to send the messages from this topic so again I will
duplicate the session let me log in again and I’ll also type in the password
over here right so this is the command to start
the producer so kafka-console-producer.sh broker-list and then I am
sending this topic from the producer so the name of the topic is example one so
the producer has started let me go ahead and type in some messages so I'll type
in hello how are you doing I am good so these are the messages which
should be sent from the producer to the consumer now I’ll create another session
and I will start the consumer in that let me type in the username and the
password let me also type in the password again and this is the command to start the
consumer so we've got kafka-console-consumer.sh and this consumer would
be listening to this topic example one from the beginning so let me just wait
till all of the messages are coming from the producer side to the consumer side
right so this is what we have hello how are you doing I am good right now let me
actually open up the producer again right so let me add some more messages
over here so I’ll type in Sparta plus 300 now let’s see if we have these
messages in the consumer as well right so we have these messages in the
consumer side as well now again let me add some messages from the producer so
I’ll type in let’s see this is my first Kafka project and I love Kafka do you
like Kafka let me just type all of this right now let me go to the consumer side
right so we see that we have all of these messages on the consumer side as
well so this is how we can set up a single broker set up now let’s see how
can we configure a single node multi broker cluster so again to set up a
multi broker system we'd have to start off by loading the zookeeper
service as well as the kafka service so we'll type in zookeeper-server-start.sh
and then we'll load up this zookeeper dot properties file similarly
we will type in kafka-server-start.sh and then we'll load up this server
dot properties file now once we start the kafka service as well as the
zookeeper service we’d have to go ahead and create multiple brokers till now
we've just got one broker instance which is already in the config slash
server dot properties file so this file which we had loaded up earlier this is
basically the only broker which we have till now now to create multiple broker
instances what we'll do is we will copy this existing server dot properties file
into two new files and rename them as server one dot properties and server two dot
properties now after we do that we will go inside the Kafka slash config
directory and then we create these files right so we'll go inside Kafka slash
config and then we'll create a copy of the original broker file so we'll type
in cp server dot properties server one dot properties so we are just creating the
copy of this original server dot properties file and renaming it to be
server 1 dot properties similarly we'll create another copy of this file and
we’ll rename it to be server 2 dot properties so now that we have created
our two brokers we'd have to go ahead and make these changes so we'll
go inside the server 1 dot properties file and we'll make all of these changes
so in our earlier server dot properties file the broker ID was 0 now we'd have
to make sure that each broker instance has a unique broker ID so we'll give
this new file the broker ID of 1 and this will be listening to the port 9 0 9
3 so the original broker was listening to
port number 9 0 9 2 now again it needs to be kept in mind that one broker
can listen to only one port that is why since this is our new broker this will
be listening to a new port which is nine zero nine three and again we will make
changes over here so we'll set the log dot dirs over here now initially it was
kafka underscore logs so we'll change this to be kafka logs one
similarly we'll go inside the server 2 dot properties file and we'll make these
changes so over here we'll set the broker ID to be equal to 2 because this
is our third broker and this broker will be listening to the port number nine
zero nine four so the first broker was listening to
port number nine zero nine two the second broker was listening to port number nine
zero nine three and this broker which we've created would be listening to port
number nine zero nine four and again we'll make changes over here so we'll
change this to kafka logs 2.
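Put together, the copies and edits described above would look roughly like this; the broker IDs and ports are the ones used in the session, while the exact file names and log directory paths are assumptions:

cd kafka/config
cp server.properties server-1.properties
cp server.properties server-2.properties

# in server-1.properties
broker.id=1
port=9093          # newer Kafka versions set listeners=PLAINTEXT://:9093 instead
log.dirs=/tmp/kafka-logs-1

# in server-2.properties
broker.id=2
port=9094          # or listeners=PLAINTEXT://:9094
log.dirs=/tmp/kafka-logs-2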
Now once we set this up we'd have to go ahead and start
all of these brokers which we've created so the command would be the same we'll type
in kafka-server-start.sh and we'll load up this server one dot properties
similarly we will type in kafka-server-start.sh and we'll type in server
two dot properties so this is how we can start these two brokers which we
have just created now that we have created the brokers we'll go ahead and
create a Kafka topic so to create a topic again it will be the same so we'll
type kafka-topics.sh and then we'll give the name of the topic so
which is the topic example two so example two is the name of this topic and then
we'll type in zookeeper and then we'll set the port of the localhost
which is two one eight one so for this topic which we are creating we are
setting the number of partitions to be equal to three and we are setting the
number of partitions to be equal to three because we have three brokers
earlier when we had set up just a single broker system over there since we had
just one broker the number of partitions was one and over here since we have
three brokers we are setting the number of partitions to be equal to three and
similarly we will set the replication factor to be equal to two
now to check which broker is listening on the currently created topic you can
just use the describe command so when we type in describe we get an idea of
which broker is listening on which partition of the topic over here.
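Written out, the topic creation and the describe check for the multi-broker case would look roughly like this, with the partition count, replication factor, and topic name taken from the session:

kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 2 --partitions 3 --topic example2
kafka-topics.sh --describe --zookeeper localhost:2181 --topic example2
# describe prints one line per partition showing its leader broker, the replica
# list, and the in-sync replica (Isr) set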
So now that we have created the Kafka topic we have to do the same procedure we will have to
start off the producer so I will create my producer over here with
kafka-console-producer.sh and this time I
actually have a broker list so this would be sending messages to three
brokers over here which are listening to these port numbers nine zero nine three
nine zero nine four and nine zero nine two and the producer would be sending
messages from the topic example two right so these are the messages which
are being sent over here do you understand Kafka now hope you
like this session and then going ahead we'll also start the consumer so here I'll
type in kafka-console-consumer.sh and I'll set the bootstrap server and I'll
set the localhost port to be equal to nine zero nine three and the topic is example
two because the producer is sending messages from example two and this will
read everything from the beginning.
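For reference, the multi-broker producer and consumer commands described here would look roughly like this, with ports 9092 to 9094 and topic example2 as used in the session:

kafka-console-producer.sh --broker-list localhost:9093,localhost:9094,localhost:9092 --topic example2
kafka-console-consumer.sh --bootstrap-server localhost:9093 --topic example2 --from-beginning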
Alright so now let's go ahead and perform this demo so I've got my virtual
machine running over here so I'd have to do the
same thing I'd have to start off by loading the Kafka server so I will type
in kafka-server let me type in the correct spelling over here so
kafka-server-start.sh after this this would be kafka slash config slash and
then I'd have to give in the server dot properties name so server dot
properties so let me just wait till the first broker starts alright so we have
successfully started the first broker which is listening on the port
number nine zero nine two now we'd have to go ahead and start the other two
brokers so I'll duplicate this session over here let me again log in so I'll
type in the ID which would be training and then I'll give in the password so the
password is right so I've given the password now as we already know we have this server
dot properties file so I'd have to make a copy of it so to make a copy I'll type
in cp and the name of the file is server dot properties and I will make a copy
with the name server 1 dot properties now similarly I'll make another copy
with the name server 2 dot properties so I will change this to be server 2 right
so I have created server 1 dot properties and server 2 dot
properties now let me go inside server 1 dot properties and make the relevant changes
I'll type in vim let me open this file up so this would be server 1 dot
properties let me edit this right so let me go up
and let me show you what exactly we are supposed to change over here so as we see we’ve got the broker ID to
be equal to zero because we’ve just copied the original file so I will
change this to be equal to one similarly I will go down and wherever we
have broker ID to be equal to zero we’ll set it to be one we’ll make all of those
relevant changes over here so we’d have to do the same thing now once we change
all the broker IDs to be equal to one let me just head down over here and let
me show you what is the next change to be done so a little more down over here
so you see this log dot dirs over here so over here I would have to change
this to 1 similarly let me just go to the bottom of this page and let me add a new line over here so
this time the port number would be equal to nine zero nine three
so the original broker was listening to nine zero nine two and
since we are creating a new broker this would be listening to nine zero nine
three now let me just press escape I'll type in :wq so this would help me to
save the changes which I have made to this new file right now similarly let me also
Now similarly, let me also make the same changes in the server2.properties file. Let me go into insert mode over here and change all of these broker IDs — this time I'll be setting the broker ID to two. So this would be two over here: I'll just delete the zero which is present and set it to two wherever it appears. Similarly, let me head down to the server settings: over here we have the log.dirs property and I will change it to kafka-logs2, and then I'll head down to the bottom of the file and set the port number to 9094. Let me hit Escape, save this file with :wq and hit Enter. So we've made the necessary changes in server1.properties and server2.properties.
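If you want to follow along, the copies and the handful of lines we just edited look roughly like this (the log directories shown are placeholders for whatever data directories you use):

    # make two copies of the original broker config
    cp server.properties server1.properties
    cp server.properties server2.properties

    # server1.properties - the three lines changed in this demo
    #   broker.id=1
    #   log.dirs=/tmp/kafka-logs-1     # example path; use your own data directory
    #   port=9093                      # on newer Kafka: listeners=PLAINTEXT://:9093

    # server2.properties - the three lines changed in this demo
    #   broker.id=2
    #   log.dirs=/tmp/kafka-logs-2
    #   port=9094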
Now let me again open up a new session — let me hit duplicate session over here, log in, and go ahead and start the new brokers which I have created: kafka-server-start.sh kafka/config/server1.properties. Let me hit Enter and just wait till the new broker I've set up, server1, loads up — you can see that the new broker is starting. Again let me open a duplicate session, log in, type in the password, and start the second broker: kafka-server-start.sh, and after this kafka/config/ and the name of the file, which is server2.properties. Again I'll just wait till this loads up. So I have successfully started both of these new brokers.
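At this point there are three separate terminals, one per broker. The start commands are roughly these (the path to the config directory depends on where Kafka is installed):

    # one broker per terminal session
    kafka-server-start.sh kafka/config/server.properties     # broker.id=0, port 9092
    kafka-server-start.sh kafka/config/server1.properties    # broker.id=1, port 9093
    kafka-server-start.sh kafka/config/server2.properties    # broker.id=2, port 9094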
Now that I have started these two brokers, let me go ahead and also create my new topic. Keep in mind that for every single thing we are doing you have to start a new duplicate session. I'll type in the command to create a topic: this is the kafka-topics.sh command, and I'm creating a topic with the name example2; this topic will have three partitions and its replication factor is two. So we have successfully created the topic example2.
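The create command used here is roughly the following (localhost:2181 is this demo's single ZooKeeper instance):

    # create example2 with 3 partitions, each replicated on 2 of the 3 brokers
    kafka-topics.sh --create --zookeeper localhost:2181 \
      --replication-factor 2 --partitions 3 --topic example2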
Now I'll also go ahead and start the producer. This is the command to start the producer, kafka-console-producer.sh; the broker list is localhost 9093, 9094 and 9092 — the three brokers listening on these three ports — and we are sending to the topic example2. Right, so now the producer has started, and let me just send in some messages over here: I'll type in "I love Paris", "I also love India", and "I love Germany as well". So these are the messages we are sending from the producer.
Now I'll again duplicate this session and start the consumer. This is the command to start the consumer, kafka-console-consumer.sh; it connects through the port 9093, it has subscribed to the topic example2, and it will read everything from the beginning. So we have this result over here: the producer has sent these messages and the consumer has consumed them — I also love India, I love Paris, I love Germany as well.
Right, now let me add something else over here. Let me head back to the producer and add a few new lines — I'll just add some gibberish. So I've added four lines of gibberish; when I go to the consumer, we have those four lines of gibberish over here as well. Right, so this is how we can set up a multi-broker system.
Now we'll look at some basic topic operations. Let's see how we can modify a topic. To modify a topic we just have to use the alter topic command — this is the entire command, but the main thing to keep in mind is that you just type in --alter and --topic. So it is kafka-topics.sh, with --zookeeper pointing at ZooKeeper on port 2181, then --alter, --topic, and the name of the topic I want to alter, which is example1. Initially, when we created example1, it was a single-broker setup and the number of partitions was one; now that we have a multi-broker setup I can set the number of partitions to two, and this is the result we get. Note that if partitions are increased for a topic that has a key, the partitioning logic, and therefore the ordering of the messages, will be affected. Alright: adding partitions succeeded.
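Concretely, the alter command here is something like:

    # raise example1 from 1 partition to 2 (partition count can only ever be increased)
    kafka-topics.sh --zookeeper localhost:2181 --alter --topic example1 --partitions 2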
Now we will see how to delete a topic. To delete a topic, this is the command: we just type in --delete --topic and then give the name of the topic, which here is flume topic one, and this is the result we get: flume topic one is marked for deletion. Note that this has no impact if delete.topic.enable is not set to true. And when we go ahead and check the list of all the topics available, we see that the flume topic has been deleted.
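The delete and list commands follow the same pattern (the topic name below is just illustrative):

    # mark a topic for deletion (only takes effect if delete.topic.enable=true on the brokers)
    kafka-topics.sh --zookeeper localhost:2181 --delete --topic flume_topic1

    # list the remaining topics to confirm it is gone
    kafka-topics.sh --zookeeper localhost:2181 --list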
So, as we have seen in the PPT, initially our topic example1 had just one partition, and now I want two partitions in it. This is how we make the change: alter the topic example1 and set the number of partitions to two. Alright, adding partitions succeeded. Now I'll go ahead and delete a topic — I actually want to delete the topic example1 — so I type in --delete --topic example1 and hit Enter. Right, so you see that example1 is marked for deletion. Now let me check the list to see whether this topic has been deleted or not; let me type in the list command, kafka-topics.sh --list with ZooKeeper on port 2181, and we see that we don't have example1 in our list of topics. This means we have successfully deleted example1. Right guys, so this brings us to the end of the session.
So let us go through a quick quiz. This is our first question: Kafka is run as a cluster comprised of one or more servers, each of which is called...? What do you think is the answer? The answer is broker, isn't it — you can have multiple brokers or a single broker in the cluster. The second question: point out the wrong statement. The first statement is "the Kafka cluster does not retain all published messages"; the second statement is "a single Kafka broker can handle hundreds of megabytes of reads and writes per second from thousands of clients"; the third statement is "Kafka is designed to allow a single cluster to serve as the central data backbone for a large organization"; and the fourth statement is "messages are persisted on disk and replicated within the cluster to prevent data loss". The wrong statement among these is "the Kafka cluster does not retain all of the published messages". Alright.
Okay guys, a quick info: if you want to do an end-to-end certification on Kafka, Intellipaat provides a complete certification training on Kafka, and those details are available in the description. Now let us continue with this session. We will set up a multi-node Kafka cluster, then we will see some important administration commands, and after that we will look at some Kafka tools that help with different operations on the cluster. These operations can be a graceful shutdown, balancing leadership, and rebalancing, which is used when you expand your cluster or decommission a broker — with the rebalancing tool you can also move topic partitions from one broker to another — and you can also increase the replication factor. Let us start with the multi-node Kafka cluster implementation.
The next few slides are exactly the same as what we followed in our previous session, that is, downloading the Kafka and ZooKeeper tarballs, so I am going to go over these two or three slides very fast. This is the link you can see on the web user interface, from where you can download ZooKeeper. Now, the important part: to complete the multi-node cluster setup we will first set up a single node, and after that we will simply copy the ZooKeeper and Kafka directories to the rest of the servers and start the services so that they can join the cluster. So on the first node we will untar the two archives, the Kafka tarball and the ZooKeeper tarball, and rename the directories according to our convenience. Then we will set the environment variables KAFKA_HOME and ZOOKEEPER_HOME in the .bashrc file. We will create the kafka-logs directory, where Kafka will store its data, and ZOOKEEPER_HOME/zkdata, where ZooKeeper will store its data. We will complete this step on the first node, and after that we will make some configuration changes on the same node.
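On the first node those preparation steps boil down to something like this (archive names, install paths and the data directories are placeholders; use whatever matches your downloads):

    # untar and rename the two distributions (version numbers are illustrative)
    tar -xzf kafka_2.11-1.0.0.tgz    && mv kafka_2.11-1.0.0  kafka
    tar -xzf zookeeper-3.4.10.tar.gz && mv zookeeper-3.4.10  zookeeper

    # environment variables in ~/.bashrc
    echo 'export KAFKA_HOME=$HOME/kafka'         >> ~/.bashrc
    echo 'export ZOOKEEPER_HOME=$HOME/zookeeper' >> ~/.bashrc
    echo 'export PATH=$PATH:$KAFKA_HOME/bin:$ZOOKEEPER_HOME/bin' >> ~/.bashrc
    source ~/.bashrc

    # data directories for Kafka and ZooKeeper (a root-level path may need sudo)
    mkdir -p /kafka-logs $ZOOKEEPER_HOME/zkdata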
Let us prepare the first node. This is my first machine and I have two tarballs here, one for Kafka and one for ZooKeeper. I am simply going to untar them, and now I am going to rename the newly created directories. So now we have the two directories, one for Kafka and one for ZooKeeper, and we set KAFKA_HOME and ZOOKEEPER_HOME in the .bashrc file — I have done this already; this is the Kafka home and this is the ZooKeeper home. The next step is to create the data directories for Kafka and ZooKeeper: I am going to create kafka-logs, the directory where Kafka will store its data, and similarly one directory for ZooKeeper, zkdata, where ZooKeeper will store its data. Now we have completed the first step; next we will make the configuration changes for ZooKeeper and Kafka, and after that we will simply copy these Kafka and ZooKeeper directories to the rest of the servers. Now let us see what our slide says: we have completed this step.
Next we will make the ZooKeeper configuration changes. For this we need to go to the configuration file zoo.cfg, which lives under the ZooKeeper home. The two important pieces are the dataDir property, which we will update, and the last three lines, where we mention that we have ZooKeeper servers running on the first host, hadoop1.abc.com in this setup, the second host, hadoop2.abc.com, and the third, hadoop3.abc.com. Let us complete this first. I go to the configuration directory of ZooKeeper; here you can see we have zoo_sample.cfg, so simply rename it to zoo.cfg and open the file. First of all, update the dataDir — remember we just created the zkdata directory under the ZooKeeper home; this is the data directory where ZooKeeper will store its data. Secondly, we mention the entries for all of the ZooKeeper servers: the first ZooKeeper server is going to run on hadoop1.abc.com, the ports used will be 2888 and 3888, and the other two servers are going to run on hadoop2.abc.com and hadoop3.abc.com. That was the first part of the configuration changes for ZooKeeper.
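The relevant part of zoo.cfg ends up looking roughly like this (the hostnames and the data directory are this demo's values; replace them with your own):

    # edit $ZOOKEEPER_HOME/conf/zoo.cfg (copied from zoo_sample.cfg) and set:
    #   dataDir=/home/training/zookeeper/zkdata
    #   server.1=hadoop1.abc.com:2888:3888
    #   server.2=hadoop2.abc.com:2888:3888
    #   server.3=hadoop3.abc.com:2888:3888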
What is the second part? Secondly, we need to create a myid file and put in it the same integer that we used in the server.X entries in zoo.cfg. That means we create a file called myid in the data directory and put the integer 1 for the first ZooKeeper server, 2 for the second, and 3 for the third. We are on the first ZooKeeper server, so we simply go to the data directory, zkdata, create the file myid, and since this is our first ZooKeeper server we put 1 in it. This is simply to tell ZooKeeper that this is the first ZooKeeper server, and on the rest of the machines it will recognise the second and the third servers the same way. If you are using only one ZooKeeper server you do not need this myid file at all.
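On each machine that is roughly one command (run with the number that matches its server.X entry):

    echo 1 > $ZOOKEEPER_HOME/zkdata/myid   # first ZooKeeper server
    # echo 2 > ... on the second server, echo 3 > ... on the third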
Now we are done with the ZooKeeper configuration changes; let us make the configuration changes for Kafka as well. For that we need to go to the config directory of Kafka, where we can see a server.properties file. There we need to set the zookeeper.connect property, and it will include all the ZooKeeper server entries — you can see the three ZooKeeper servers. I think I skipped one important property here, which is broker.id: we need to put a unique integer in this field for every Kafka server. We are on the first Kafka machine, so let us make the changes for Kafka as well — cd into the Kafka config directory, where you can see the server.properties file; let us open it and make the changes. You can see this is the broker ID of the first server. You could start with 0 as well and then put 1 and 2 on the other two machines where we will run Kafka; the main point is that the IDs should be in sequence, like 1 2 3 or 0 1 2. So I am just putting 1 here, mentioning that this is my first Kafka server and its broker ID is 1. This is the port that is going to be used, 9092. After that I make the change for log.dirs, the data directory that we created for Kafka — we created it at kafka-logs, the path Kafka will use to store its data. And the last property is zookeeper.connect, where we need to mention all of our ZooKeeper servers, all three of them. So we are done with the Kafka configuration changes; let us move back to the slides.
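So the handful of lines we touched in $KAFKA_HOME/config/server.properties look roughly like this (hostnames and the log directory are this demo's values):

    # broker.id=1                     -> unique per broker: 1, 2, 3 on the three nodes
    # port=9092                       -> newer releases use listeners=PLAINTEXT://:9092
    # log.dirs=/kafka-logs            -> data directory created earlier
    # zookeeper.connect=hadoop1.abc.com:2181,hadoop2.abc.com:2181,hadoop3.abc.com:2181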
Now the next step is to start the services. Whenever you set up a Kafka cluster, the important process is ZooKeeper, because it does the coordination among all of your Kafka services; if ZooKeeper is not running, the Kafka brokers will not run — they will simply shut down. So first of all we need to start the ZooKeeper service on all the machines, and by all the machines I mean the machines you have decided will form the ZooKeeper quorum; in our case we have three ZooKeeper servers, so we will need to start all of them first. But before starting these services, let us copy the Kafka and ZooKeeper setup that we have done on the first node: we simply scp the kafka and zookeeper directories to the second and third nodes. So now I am going to scp the changes we made on this first node to the second and the third one. First let me scp the kafka directory to the second node, hadoop2.abc.com, and similarly the zookeeper directory. We also made changes to the .bashrc file to set the environment variables, so let us copy that too, so that we do not need to change .bashrc manually on each node. Okay, so from this first node we copied the kafka directory, the zookeeper directory and the .bashrc file to the second node; let us copy the same to the third node, hadoop3.abc.com — I copy the kafka directory first, then the zookeeper directory, and lastly the .bashrc file, so that we do not need to make the environment variable changes manually.
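The copy step is just scp, along these lines (user and hostnames are the demo's; adjust to your machines):

    for host in hadoop2.abc.com hadoop3.abc.com; do
      scp -r ~/kafka ~/zookeeper "$host":~/    # Kafka and ZooKeeper directories
      scp ~/.bashrc "$host":~/                 # environment variables
    done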
Okay, we have copied the required directories to the rest of the servers. Now the important thing is that the myid file we created for ZooKeeper should have a unique value on each node, and the broker.id of each Kafka server should be unique as well. On this first server the value in the myid file is 1 and the broker.id is 1, so on the other machines — for example on the second machine — the value of myid should be 2 and the broker.id should be 2. Let us make these changes on the second machine. I am on the second machine; you can see the data directory here, and I change myid to 2; similarly I make it 3 on the third ZooKeeper server. Now it is time to change the server.properties file, where we need to set a unique broker.id. The broker.id for the first server is 1, so on this second server I put broker.id 2 — no need to make any other change, simply save the file — and I do the same on the third node: open server.properties, go to broker.id, make it unique, and set it to 3. So we have three brokers: first, second and third. Okay, we are all set to go.
Let us see what our slide says now. Now it's time to start the services, and as I said, ZooKeeper is the main service and should be started first, otherwise the Kafka brokers will not work — they will simply shut down. Let us start the ZooKeeper servers on the three machines. The start script is present under the ZooKeeper home, in bin; run it, then do jps, and you can see we have QuorumPeerMain, which is the ZooKeeper service we have just started. Let us do the same on the second and third machines — here we can see the ZooKeeper service as well — and the same on the third node. Okay, so all of our ZooKeeper servers are running; now we can simply start the Kafka brokers. Let us start on the first node first: I run kafka-server-start.sh and provide the configuration file, config/server.properties, and from the jps command we get one new service, which is Kafka. We run the same command to start the broker on the second and third nodes. This is the second node — I think we've got some error, let us see what it is. Ah, the KAFKA_HOME and ZOOKEEPER_HOME variables from .bashrc are not set in this session: we copied that file from the first node, but we haven't run the source command, which is required to load the new variables. Once we do that by running source .bashrc, we can start the server again, and it should be a success — you can see the second broker is also up. Let us do the same for the third machine: first source the .bashrc and then start the Kafka broker. Okay, we've got our third Kafka broker up, so we are done with the multi-node cluster setup of Kafka. Let us move to the slides.
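Per node, that start-up sequence is roughly as follows (the ZooKeeper start script is the standard zkServer.sh; paths depend on your install):

    # 1. ZooKeeper first, on every quorum machine
    $ZOOKEEPER_HOME/bin/zkServer.sh start
    jps     # should now list QuorumPeerMain

    # 2. then the Kafka broker on each node
    source ~/.bashrc                                              # if the env vars are not loaded yet
    kafka-server-start.sh $KAFKA_HOME/config/server.properties &
    jps     # should now also list Kafka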
The next thing, to verify the cluster, is to create a topic. First of all let us create one topic: simply use the kafka-topics.sh script — this is the command, with the ZooKeeper details, the replication factor we are giving, which is one, the partitions we want, only one, and finally the topic name. We got the output "created topic test". We can verify this by running the list command, and we can see that we have a topic and its name is test. We are done till here; further, you can simply run kafka-console-producer.sh to produce some messages and kafka-console-consumer.sh to consume them — we did this while setting up the single-node Kafka cluster, so I am simply skipping this step. This is the output you will see on your screen when you run the Kafka producer and consumer.
Okay, as we are done with the multi-node setup of Kafka, let us move to some important administration commands that you are going to use in general. The first one is to create a topic, which we just did: we run kafka-topics.sh with --create and mention the replication factor, the partitions we want and finally the topic name. Then we check whether our newly created topic is there or not using the --list command. The next command is describe: I want to see how many partitions my topic has, what the replication factor is, and what configurations are already set for the topic. For that you simply run the describe command — let us run it: --describe --topic and the topic name, as I want to check the topic test. Okay, we got the output: for the topic test we have partition count one, the replication factor is one, and we can see where this topic lives — on which server it is, the first, second or third node — from the leader field. Leader 2 means this partition is present on the second Kafka broker. For replicas we have only one copy, so the replica is the same, and the ISR, the in-sync replica set, is also on the same node, because we do not have any extra replicas.
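For reference, the describe invocation and the kind of line it prints back look roughly like this (the leader and replica numbers are whatever your cluster assigned):

    kafka-topics.sh --zookeeper localhost:2181 --describe --topic test
    # Topic: test  PartitionCount: 1  ReplicationFactor: 1  Configs:
    #   Topic: test  Partition: 0  Leader: 2  Replicas: 2  Isr: 2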
Let us see the next command. As we have only one partition, let us change the partition count to three. You can use the alter command here — --alter — and simply mention the number of partitions you want. Let us do this: the command is --alter --partitions, and I want three partitions now. Okay, adding partitions succeeded. I can run the describe command again to see the details; this time we should have a partition count of three. Okay, you can see the partition count is 3: partitions 0, 1 and 2. As we still have a replication factor of only one, there is a single entry in each field; the leaders are 2, 1 and 2, which means the first of these partitions is present on the second broker, the next one on the first broker, and the last one again on the second broker. It is similar for the replicas: if we had multiple replicas, they would be listed here in comma-separated form — for example, if you had replicas on the first and second brokers, then 1,2 would be shown there. Okay.
If you want to delete a topic, you can simply use --delete. And if you want to change a configuration — for example, I want to set the maximum message bytes — you can see I am setting it here. Earlier we did not have this configuration, so let us set the new one: I use the same command, that is --alter, and provide the new parameter, --config, with the configuration I need to set. You can see the output "updated config for topic test". Now run the describe command again and you will see that where earlier we had no configs, this time the newly added configuration shows up — you can see something like max.message.bytes equal to the value we just added. Again, if you want to delete a configuration, you can simply use --delete-config. Let us delete the newly added configuration — okay, "updated config for topic test" — and if we run the describe command again we should no longer see the max.message.bytes config; you can see it is gone. So these are some of the important commands that you might need to run on topics.
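On the Kafka version used in this demo these per-topic overrides go through kafka-topics.sh itself; roughly (the 65536 value is just an example, and newer releases prefer kafka-configs.sh for this):

    # add a per-topic override
    kafka-topics.sh --zookeeper localhost:2181 --alter --topic test \
      --config max.message.bytes=65536

    # remove it again
    kafka-topics.sh --zookeeper localhost:2181 --alter --topic test \
      --delete-config max.message.bytes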
Okay, so we are done till here; let us move to the next part, where we are going to cover some important tools and topics in Kafka. First of all, let us see what a graceful shutdown is. When you run a Kafka cluster you can get server crashes and other failures, or you can simply bring servers down intentionally for maintenance purposes. So what happens when a broker goes down, or when we take it down? Two things happen. The first is to sync all of its logs to disk — this is done automatically by Kafka. The second is to move all the leader partitions from that down node to the rest of the nodes. That doesn't mean we move the partitions themselves; we simply transfer the leadership. For example, if we have three replicas and one broker goes down while holding a leader replica, the leadership is transferred to the remaining replica partitions. To make this happen on every shutdown we need to set the property controlled.shutdown.enable=true; if this property is set, the leader election is done automatically by Kafka. One important point to note here is that if you have a replication factor of only one, controlled.shutdown.enable=true has no effect, because Kafka needs some replica copies to which it can transfer the leadership. Let us move ahead now.
Now, balancing leadership. What happens, as we know, is that as soon as a node — a broker — goes down, the leadership of all of its partitions is transferred to their replicas. Now consider that the node comes back up. Since the leadership has already been transferred, the broker that just came up will simply work as a follower, which means reads and writes will not go to that node, because the partitions present on that broker are only followers — there is no leader among them. So we get an imbalance. To handle this kind of situation we can run the kafka-preferred-replica-election.sh script. As an example, you can see on the screen that if the list of replicas for a partition is 1, 5, 9, then node 1 is preferred as the leader over nodes 5 and 9, because it is earlier in the replica list. You can tell the Kafka cluster to try to restore leadership to the restored replicas by running the command below: in this scenario, if node 1 goes down and then simply comes back up, you can run this script afterwards to restore the leadership back to node 1. And since nobody likes doing this manually, we can simply automate it by setting the property auto.leader.rebalance.enable=true.
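The manual and the automatic variants look roughly like this:

    # one-off: move leadership back to the preferred (first-listed) replicas
    kafka-preferred-replica-election.sh --zookeeper localhost:2181

    # or let Kafka do it periodically, in server.properties:
    #   auto.leader.rebalance.enable=true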
Now, extending the cluster. In Kafka it is very easy to add a node, or expand your cluster: you simply bring the node up, copy the Kafka configuration directory from any of the existing nodes, put a new, unique broker.id in the server.properties file, and then start the broker service — and you are done. But what about rebalancing? The new broker will not get existing data automatically. New data that comes in will be distributed among all the servers, but until then some of the existing brokers may be heavily loaded while the new broker sits idle; in that case we need to run the balancer manually. How can we do that? For that we can use the partition reassignment tool. The first thing to note is that it is not automatic; we run it manually using a script. This tool has three modes: the first is generate, the second is execute, and the third is verify. Generate takes the topic names and the broker list where we want to move those topics, and it produces a proposed reassignment; if we are satisfied with it, we copy it into a JSON file and then run the next mode, execute. Execute is the important mode that actually moves your topics from one broker to another. The third mode is verify: after the completion of the execute mode we run the verify mode and it tells us whether our reassignment was successful or not (we will see the exact files and commands in the demo below).
Let us take an example. We will create two topics, foo1 and foo2, with a replication factor of one and one partition each; after that we will see the locations of these two topics and then try to move them from those locations to some other broker. Let us first create foo1 — I am on the first node, creating it — and similarly I create foo2. Now let us describe these two newly created topics so that we know where they actually live. You can see that foo1 has one partition, partition 0, and it is present on the third broker. Let us see where the second topic, foo2, is present — okay, you can see it is present on the second broker.
Now, first of all, we need to create a JSON file, because this tool works only with a JSON file; in that JSON file we simply mention the topics we want to take action on. Let us copy the JSON format from here; we will name the file topics-to-move.json. I create the file topics-to-move.json and paste the contents into it — you can see we are taking action on foo1 and foo2 — and just save it. Now we run our first mode, generate, which simply tells the tool that we are going to take action on these topics and these are the new brokers they should be moved to. So I take the command from here; --generate is the mode I want to run. You can see topics-to-move.json — the file we created, where we mentioned that we want to move foo1 and foo2. foo1 is present on the third broker, so I am going to move it to the first and second, and foo2 is present on the second. So let us see what happens when we run this. Okay.
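This is essentially the partition-reassignment workflow from the Kafka documentation; a sketch of the file and the generate step looks like this (the topic names and the broker list follow this demo, so treat them as examples):

    # topics-to-move.json - which topics the tool should consider
    cat > topics-to-move.json <<'EOF'
    {"topics": [{"topic": "foo1"}, {"topic": "foo2"}], "version": 1}
    EOF

    # generate a proposed reassignment onto brokers 1 and 2 (nothing is moved yet)
    kafka-reassign-partitions.sh --zookeeper localhost:2181 \
      --topics-to-move-json-file topics-to-move.json \
      --broker-list "1,2" --generate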
I think something is not right with the command; let me check — let me copy the complete command. The syntax is not quite correct, so let me just look up the exact command on the internet. This is the exact command, and now we see what we were missing: I paste it on the first node, just updating the localhost part, that is, pointing it at our ZooKeeper server, and now I press Enter. It generates the proposed JSON. You can see the current partition replica assignment: foo1 has partition 0, which is present on the third machine, and foo2 has partition 0, which is present on the second machine; and the proposed partition reassignment configuration is to move foo1 partition 0 to the first node and foo2 partition 0 to the first node, which means we are going to move both of these partitions to the first node.
Now it's time to run the execute mode. Before we do that, we need to create a file where we store the proposed result, and we will use that file to run the execute mode. So I create this file, expand-cluster-reassignment.json, and what do I put in it? The proposed result, that is, this output. Save it and now run execute. Okay, you can see we got "successfully started reassignment of partitions". We will now run the third mode, verify, which just confirms whether our reassignment was successful or not — this is the command we use to verify — and you can see that the reassignment for foo1 partition 0 completed successfully, and for foo2 partition 0 it completed successfully as well.
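The execute and verify steps reuse the same script with the proposed JSON saved to a file, roughly:

    # expand-cluster-reassignment.json holds the proposed assignment printed by --generate
    kafka-reassign-partitions.sh --zookeeper localhost:2181 \
      --reassignment-json-file expand-cluster-reassignment.json --execute

    # later, check that every partition move finished
    kafka-reassign-partitions.sh --zookeeper localhost:2181 \
      --reassignment-json-file expand-cluster-reassignment.json --verify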
Let us move ahead with custom reassignment. Custom reassignment means that earlier we were moving a whole topic from one broker to another, but consider that I want to move only some partitions — say a topic has three partitions and I want to move just the first partition to some new server. In that case, what do we do? Just like earlier, we run the same three modes: generate, which shows the proposed result, then execute, and finally verify. On this slide you can see that if I want to move partition 0 of topic foo1 to some brokers, and partition 1 of topic foo2 to some other brokers, we simply mention in the custom-reassignment.json file: this is the topic, I want to move partition 0, and I want it to go to these two nodes, 5 and 6; similarly, for foo2 I want to move partition 1 and I want it to go to nodes 2 and 3. This is called custom reassignment. Nothing has changed — we just write our own JSON file on which we want to run the execute and verify modes; we then simply run the execute mode and then the verify mode to check whether we got success or not.
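A sketch of such a hand-written file, using the slide's example assignments (broker IDs 5 and 6 come from the slide, not from our three-node demo cluster):

    # custom-reassignment.json - move only the partitions you list
    cat > custom-reassignment.json <<'EOF'
    {"version": 1,
     "partitions": [
       {"topic": "foo1", "partition": 0, "replicas": [5, 6]},
       {"topic": "foo2", "partition": 1, "replicas": [2, 3]}
     ]}
    EOF
    # then run --execute and --verify with --reassignment-json-file custom-reassignment.json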
Now, how do we decommission a broker? This functionality is not there in Kafka — there is no tool to decommission a broker — so your admin will have to manually move the topics from the broker you want to decommission to some other broker, and after that you can simply shut that broker down and it is decommissioned. In future releases of Kafka we may get a decommission tool so that we do not need to do this manually. Now, in my previous session I said that we cannot increase the replication factor — by that I meant that we cannot simply run the alter command to change it. If we want to change the replication factor we need to use this reassignment tool. For that we add the replicas field: remember that earlier we were just mentioning the topic name, then for custom reassignment we also mentioned the partition, and now you can see we can also list the replicas, which will increase the replication factor to the number of brokers given in that field.
Okay, now let us try this. Let us prepare the file, increase-replication-factor.json — we are going to do this for foo1. I open the file and just copy the JSON format from the documentation, so that I do not mess up the format of the file, because that is a very critical thing; I change the topic name to foo1 and say that I want the replicas to be 1, 2 and 3 — I want this topic on all three brokers. Now we run the execute command — let me copy it; you can see we are using the increase-replication-factor.json file we just created — and you can see we got "successfully started the reassignment of partitions" for foo1 onto these three brokers. Let us check with the verify mode whether our reassignment was successful or not.
assignment was successful or not okay we got the result that for one
partition zero is completed successfully we can also verify this by running the
describe command and earlier we were having only one application factor so
let us see what this shows us this time okay you can see partition zero
replication factor has been changed to three and these three replicas are
present at first second and third norm and Instagram because means all the
replicas are in sync okay guys a quick confirm if you want to
Okay guys, a quick info: if you want to do an end-to-end certification on Kafka, Intellipaat provides a complete certification training on Kafka, and those details are available in the description. Okay guys, we've come to the end of this session. I hope this session on Kafka was helpful and informative for you. If you have any queries regarding this session, please leave a comment below and we'll be glad to help you out. Thank you.
