Kafka 0.9 + Zookeeper 3.4.6: cluster setup and configuration, key points of the new client API, high-availability testing, and assorted pitfalls

2019-11-06 09:59:34
Source: reposted
Contributed by: a reader
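Kafka 0.9 made major changes to the Java client API. This article summarizes the process and details of cluster setup, high availability, and the new API in Kafka 0.9, along with the pitfalls I ran into while installing and debugging it.
Kafka's architecture, features, characteristics, and use cases are covered all over the web, so I will skip them and go straight to the main content.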
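Kafka 0.9 cluster installation and configuration
Operating system: CentOS 6.5
1. Install the Java environment
Both Zookeeper and Kafka need a Java runtime, so install a JRE first. Kafka uses the G1 garbage collector by default; unless you change the collector, the official recommendation is JRE 7u51 or later. With an older JRE you need to edit Kafka's startup script and specify a collector other than G1.
Installing Java itself is not covered here.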
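2. Set up the Zookeeper cluster
Kafka relies on Zookeeper to manage its own cluster (brokers, offsets, producers, consumers, and so on), so Zookeeper has to be installed first. For high availability Zookeeper itself must not be a single point of failure, so this section shows how to build a minimal Zookeeper cluster (3 zk nodes).
The Zookeeper version used here is 3.4.6, the version recommended for Kafka 0.9.
First, extract the archive: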
tar -xzvf zookeeper-3.4.6.tar.gz 
Go to Zookeeper's conf directory and copy zoo_sample.cfg to zoo.cfg; this is the Zookeeper configuration file:

cp zoo_sample.cfg zoo.cfg
Edit zoo.cfg:
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
dataDir=/data/zk/zk0/data
dataLogDir=/data/zk/zk0/logs
# the port at which the clients will connect
clientPort=2181
server.0=10.0.0.100:4001:4002
server.1=10.0.0.101:4001:4002
server.2=10.0.0.102:4001:4002
 
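The dataDir and dataLogDir paths must be created before startup.
clientPort is the port on which Zookeeper serves clients.
server.0/1/2 describe the three nodes of the zk cluster in the form hostname:port1:port2, where port1 is used for communication between nodes and port2 for leader election. Make sure both ports are reachable between all three hosts.
Repeat the same steps on the other two hosts to install and configure Zookeeper.
On each of the three hosts, create a file named myid under the dataDir path whose content is that zk node's number. For example, myid contains 0 on the first host and 1 on the second.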
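The relationship between the server.N lines and the myid files can be sketched in a few lines of Java. This is only an illustration under stated assumptions: ZooCfgServers and its Server class are hypothetical names, not part of Zookeeper, and the parsing simply follows the hostname:port1:port2 format described above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical helper for illustration; not part of Zookeeper.
final class ZooCfgServers {

    /** One ensemble member as written in a server.N line. */
    static final class Server {
        final String host;
        final int peerPort;     // port1: communication between nodes
        final int electionPort; // port2: leader election

        Server(String host, int peerPort, int electionPort) {
            this.host = host;
            this.peerPort = peerPort;
            this.electionPort = electionPort;
        }
    }

    /** Returns node id -> server definition for every server.N entry in the text. */
    static Map<Integer, Server> parse(String zooCfg) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(zooCfg));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<Integer, Server> servers = new TreeMap<Integer, Server>();
        for (String key : props.stringPropertyNames()) {
            if (!key.startsWith("server.")) {
                continue;
            }
            int id = Integer.parseInt(key.substring("server.".length()));
            String[] parts = props.getProperty(key).trim().split(":");
            if (parts.length != 3) {
                throw new IllegalArgumentException("expected host:port1:port2 in " + key);
            }
            servers.put(id, new Server(parts[0],
                    Integer.parseInt(parts[1]), Integer.parseInt(parts[2])));
        }
        return servers;
    }

    public static void main(String[] args) {
        String cfg = "clientPort=2181\n"
                + "server.0=10.0.0.100:4001:4002\n"
                + "server.1=10.0.0.101:4001:4002\n"
                + "server.2=10.0.0.102:4001:4002\n";
        for (Map.Entry<Integer, Server> e : parse(cfg).entrySet()) {
            Server s = e.getValue();
            // The myid file on each host has to contain exactly the id printed here.
            System.out.println("myid=" + e.getKey() + " host=" + s.host
                    + " peerPort=" + s.peerPort + " electionPort=" + s.electionPort);
        }
    }
}
```

The point of the sketch is that the number after "server." and the content of myid on that host are the same value; a mismatch between the two is a common reason for a node failing to join the ensemble.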
 
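Next, start the Zookeeper service on all three hosts:
bin/zkServer.sh start
Once all 3 nodes are up, check the cluster status on each of them in turn:
bin/zkServer.sh status
The command prints one of the following:
Mode: leader
Mode: follower
Across the 3 nodes there should be 1 leader and 2 followers.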
 
 
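Verifying the high availability of the Zookeeper cluster:
Suppose that of the 3 zk nodes, server0 is currently the leader and server1 and server2 are followers.
Stop the Zookeeper service on server0:
bin/zkServer.sh stop
Then check the cluster status on server1 and server2: server1 (or possibly server2) is now the leader and the other one is a follower.

Start Zookeeper on server0 again and run zkServer.sh status: the restarted server0 comes back as a follower.
That completes the installation and high-availability check of the Zookeeper cluster.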
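Why 3 nodes is the minimum useful size follows from majority arithmetic: a Zookeeper ensemble keeps working only while more than half of its nodes are up. The small Java sketch below works that out; QuorumMath is a hypothetical name used only for this illustration.

```java
// Hypothetical helper for illustration: majority arithmetic for an ensemble of N nodes.
final class QuorumMath {

    /** Smallest number of nodes that forms a strict majority of the ensemble. */
    static int majority(int ensembleSize) {
        return ensembleSize / 2 + 1;
    }

    /** How many nodes may fail while a majority is still available. */
    static int toleratedFailures(int ensembleSize) {
        return ensembleSize - majority(ensembleSize);
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 5; n++) {
            System.out.println(n + " nodes: majority=" + majority(n)
                    + ", tolerated failures=" + toleratedFailures(n));
        }
    }
}
```

With 3 nodes one failure is tolerated, which matches the test above where stopping server0 left a working leader and follower. A fourth node would not raise the number of tolerated failures, which is why ensembles are usually sized with an odd number of nodes.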
 
Appendix: by default Zookeeper writes its console output to zookeeper.out in the directory it was started from. That is not acceptable in production. The following changes make Zookeeper write log files that roll over by size:
In conf/log4j.properties, change
zookeeper.root.logger=INFO, CONSOLE
to
zookeeper.root.logger=INFO, ROLLINGFILE
In bin/zkEnv.sh, change
ZOO_LOG4J_PROP="INFO,CONSOLE"
to
ZOO_LOG4J_PROP="INFO,ROLLINGFILE"
Then restart Zookeeper.
 
 
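3. Set up the Kafka cluster
In this example we install and configure a Kafka cluster made up of two brokers and create a topic with two partitions on it.
The example uses Kafka 0.9.0.1, the latest version at the time of writing.

First, extract the archive:
tar -xzvf kafka_2.11-0.9.0.1.tgz
Edit config/server.properties. The key parameters are listed below: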
 
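# ID of this broker; every broker in the cluster must have a different ID
broker.id=0
# Listener; the port only needs to match port below
listeners=PLAINTEXT://:9092
# Port the broker listens on
port=9092
# Hostname of the broker; the host's IP address will do
host.name=10.0.0.100
# Hostname and port advertised to producers and consumers (there is a pitfall here, see below)
advertised.host.name=10.0.0.100
advertised.port=9092
# Number of I/O threads; should be greater than the number of disks on the host
num.io.threads=8
# Directory where message files are stored
log.dirs=/data/kafka-logs
# Retention period for message files: messages older than this many hours are deleted
log.retention.hours=168
# Default number of partitions per topic; the partition count is normally given when a topic is created, so 1 is fine
num.partitions=1
# Zookeeper connection string: the IPs and ports of the three zk nodes installed in the previous section
zookeeper.connect=10.0.0.100:2181,10.0.0.101:2181,10.0.0.102:2181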
 
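See the official documentation for a full description of the settings: http://kafka.apache.org/documentation.html#brokerconfigs

Pitfall:
According to the official documentation, advertised.host.name and advertised.port define the host and port that the cluster advertises to producers and consumers, and if they are not set they default to the values of host.name and port. In practice, however, I found that without advertised.host.name a Java client connecting from a remote machine timed out with this exception: org.apache.kafka.common.errors.TimeoutException: Batch Expired
Debugging showed that the connection to the cluster succeeded, but the cluster metadata fetched afterwards was wrong. In the Cluster information of the metadata, the node hostname was a string like iZ25wuzqk91Z rather than the actual IP addresses 10.0.0.100 and 10.0.0.101. iZ25wuzqk91Z is in fact the hostname of the remote broker host. So without advertised.host.name, Kafka did not fall back to the configured host.name as the documentation claims, but advertised the hostname configured on the machine. The remote client has no hosts entry for that name, so it cannot connect. The fix is to set both host.name and advertised.host.name to the literal IP address.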
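Next, install and configure Kafka on the other host in the same way, then start Kafka on both hosts:
bin/kafka-server-start.sh -daemon config/server.properties
Pitfall:
The official way to start Kafka in the background is: bin/kafka-server-start.sh config/server.properties &
Started this way, however, Kafka shut down as soon as I closed the shell or logged out. I do not know whether the cause was the OS, SSH, or Kafka itself (a likely explanation is that a job started with & stays attached to the login session and receives a hangup signal when the session ends). In any case, starting Kafka with -daemon keeps it running after the shell is closed.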
 
 
Next, create a topic named test with two partitions and two replicas:
bin/kafka-topics.sh --create --zookeeper 10.0.0.100:2181,10.0.0.101:2181,10.0.0.102:2181 --replication-factor 2 --partitions 2 --topic test
After it is created, check the topic status with:
bin/kafka-topics.sh --describe --zookeeper 10.0.0.100:2181,10.0.0.101:2181,10.0.0.102:2181 --topic test
Output:
Topic:test PartitionCount:2 ReplicationFactor:2 Configs:
Topic: test Partition: 0 Leader: 1 Replicas: 1,0 Isr: 0,1
Topic: test Partition: 1 Leader: 0 Replicas: 0,1 Isr: 0,1
Reading the output: the topic test currently has 2 partitions, 0 and 1. The leader of partition 0 is 1 (this 1 is a broker.id). Partition 0 has two replicas, on brokers 1 and 0, and the in-sync replicas (Isr) are 0 and 1. The leader of partition 1 is 0; it also has two replicas, and both are in sync.
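The way a partition line of the --describe output is read can also be expressed as code. The following Java sketch parses one such line and reports whether the partition is under-replicated (fewer in-sync replicas than replicas). PartitionState is a hypothetical class written for this article, and the regular expression assumes the exact output format shown above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; parses one partition line of the describe output.
final class PartitionState {
    private static final Pattern LINE = Pattern.compile(
            "Partition:\\s*(\\d+)\\s+Leader:\\s*(-?\\d+)\\s+Replicas:\\s*([\\d,]+)\\s+Isr:\\s*([\\d,]*)");

    final int partition;
    final int leader;             // broker.id of the current leader
    final List<Integer> replicas; // broker ids holding a copy of the partition
    final List<Integer> isr;      // replicas that are currently in sync

    private PartitionState(int partition, int leader, List<Integer> replicas, List<Integer> isr) {
        this.partition = partition;
        this.leader = leader;
        this.replicas = replicas;
        this.isr = isr;
    }

    static PartitionState parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("not a partition line: " + line);
        }
        return new PartitionState(Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)),
                ids(m.group(3)), ids(m.group(4)));
    }

    private static List<Integer> ids(String csv) {
        List<Integer> out = new ArrayList<Integer>();
        for (String s : csv.split(",")) {
            if (!s.isEmpty()) {
                out.add(Integer.parseInt(s));
            }
        }
        return out;
    }

    /** True when at least one replica has fallen out of the in-sync set. */
    boolean underReplicated() {
        return isr.size() < replicas.size();
    }

    public static void main(String[] args) {
        PartitionState p = parse("Topic: test Partition: 0 Leader: 1 Replicas: 1,0 Isr: 0,1");
        System.out.println("partition " + p.partition + " leader=" + p.leader
                + " underReplicated=" + p.underReplicated());
    }
}
```

The same check is useful in the failover tests later on: after a broker is killed, its id disappears from Isr and the partition counts as under-replicated until the broker catches up again.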
 
 
That completes the Kafka 0.9 cluster setup. Next come the new Java API and the high-availability tests for the cluster.

4. Pushing messages with the Kafka Producer API

1) Java client dependency for Kafka 0.9.0.1:
<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-clients</artifactId>
    <version>0.9.0.1</version>
</dependency>
2) Write a KafkaUtil helper class for constructing Kafka clients
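import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;

public class KafkaUtil {
    private static KafkaProducer<String, String> kp;

    // synchronized so that two threads cannot create two producers at the same time
    public static synchronized KafkaProducer<String, String> getProducer() {
        if (kp == null) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "10.0.0.100:9092,10.0.0.101:9092");
            props.put("acks", "1");
            props.put("retries", 0);
            props.put("batch.size", 16384);
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            kp = new KafkaProducer<String, String>(props);
        }
        return kp;
    }
}
In KafkaProducer<K,V>, K is the type of each message's key and V is the type of the message value. The key determines which partition receives the message, so the keys should differ from message to message if the messages are to be spread across partitions.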
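How a key ends up selecting a partition can be sketched as follows. The producer's default behaviour is to hash the serialized key and take the result modulo the number of partitions. The sketch below is an assumption-laden simplification: KeyPartitionSketch is a hypothetical class, and String.hashCode() is only a stand-in for the hash used by the real client, so the partition numbers it prints will not match those of a real cluster.

```java
// Hypothetical illustration of key-based partition selection.
// String.hashCode() is a stand-in here; the real client hashes the serialized key bytes
// with a different hash function, so actual partition numbers will differ.
final class KeyPartitionSketch {

    /** Maps a key to a partition index in the range [0, numPartitions). */
    static int partitionFor(String key, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        int positive = key.hashCode() & 0x7fffffff; // clear the sign bit
        return positive % numPartitions;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 6; i++) {
            String key = String.valueOf(i);
            System.out.println("key " + key + " -> partition " + partitionFor(key, 2));
        }
    }
}
```

Two properties carry over to the real partitioner: the same key always lands on the same partition, and keys that vary are spread over all partitions. Using one constant key for every message would send everything to a single partition.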
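Common producer settings
bootstrap.servers: connection string for the Kafka cluster; may contain several host:port pairs.
acks: how the broker acknowledges messages. There are three modes:
    0: no acknowledgement; the client does not wait for the broker after sending.
    1: acknowledged by the leader; the leader replies as soon as it has received the message.
    all: full acknowledgement; the leader waits until all in-sync followers have confirmed the message before it replies.
    Choose the mode according to how important the messages are. Default: 1.
retries: number of times the producer retries a failed send. Default: 0.
batch.size: when many messages are headed for the same partition at the same time, the producer packs them into batches. If set to 0, every message is sent on its own. Default: 16384 bytes.
linger.ms: number of milliseconds to wait before sending, used together with batch.size. Under light load, linger.ms lets the producer wait a while so that more messages accumulate into a batch, which saves network resources. Default: 0.
key.serializer/value.serializer: serializer classes for the message key and value, chosen according to the key and value types.
buffer.memory: size of the message buffer pool. Messages that have not been sent yet are held in the producer's memory; if messages are produced faster than they are sent, send requests block once the pool is full. Default: 33554432 bytes (32MB).
More producer settings are on the official site: http://kafka.apache.org/documentation.html#producerconfigs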
 
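3) Write a simple producer that sends one message to the Kafka cluster every second:
import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class KafkaTest {
    public static void main(String[] args) throws Exception {
        Producer<String, String> producer = KafkaUtil.getProducer();
        int i = 0;
        while (true) {
            ProducerRecord<String, String> record =
                    new ProducerRecord<String, String>("test", String.valueOf(i), "this is message" + i);
            producer.send(record, new Callback() {
                public void onCompletion(RecordMetadata metadata, Exception e) {
                    if (e != null) {
                        // metadata is null when the send failed, so do not touch it here
                        e.printStackTrace();
                        return;
                    }
                    System.out.println("message send to partition " + metadata.partition() + ", offset: " + metadata.offset());
                }
            });
            i++;
            Thread.sleep(1000);
        }
    }
}
When calling KafkaProducer's send method you can register a callback, which is triggered once the producer has finished sending. The metadata object passed to the callback carries information such as the offset of the sent message and the partition it landed on. Note that with acks set to 0 the callback is still triggered, but the offset assigned by the broker is not available, because the client does not wait for any response.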
Running it produces output like this:
message send to partition 0, offset: 28
message send to partition 1, offset: 26
message send to partition 0, offset: 29
message send to partition 1, offset: 27
message send to partition 1, offset: 28
message send to partition 0, offset: 30
message send to partition 0, offset: 31
message send to partition 1, offset: 29
message send to partition 1, offset: 30
message send to partition 1, offset: 31
message send to partition 0, offset: 32
message send to partition 0, offset: 33
message send to partition 0, offset: 34
message send to partition 1, offset: 32
At first glance the offsets look scrambled, but that is only because the messages are spread over two partitions. Within each partition the offsets increase correctly.
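The observation that offsets only need to increase within a partition can be checked mechanically. The Java sketch below reads producer output lines in the format printed above and verifies that the offsets are strictly increasing per partition. OffsetOrderCheck is a hypothetical helper, and the line format is assumed to be exactly the one produced by the example program.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; checks offset ordering in the demo's output lines.
final class OffsetOrderCheck {
    private static final Pattern LINE = Pattern.compile("partition (\\d+), offset: (\\d+)");

    /** True when, for every partition, each offset is larger than the previous one seen. */
    static boolean increasingPerPartition(List<String> lines) {
        Map<Integer, Long> lastOffset = new HashMap<Integer, Long>();
        for (String line : lines) {
            Matcher m = LINE.matcher(line);
            if (!m.find()) {
                continue; // ignore lines that are not send confirmations
            }
            int partition = Integer.parseInt(m.group(1));
            long offset = Long.parseLong(m.group(2));
            Long previous = lastOffset.get(partition);
            if (previous != null && offset <= previous) {
                return false;
            }
            lastOffset.put(partition, offset);
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "message send to partition 0, offset: 28",
                "message send to partition 1, offset: 26",
                "message send to partition 0, offset: 29",
                "message send to partition 1, offset: 27");
        System.out.println("ordered within each partition: " + increasingPerPartition(lines));
    }
}
```

Looking at the global sequence 28, 26, 29, 27 the offsets jump around, but tracked per partition they read 28, 29 and 26, 27, which is the ordering Kafka actually guarantees.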
 
5. Consuming messages with the Kafka Consumer API

1) Extend the KafkaUtil class with construction of the consumer client.
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;

public class KafkaUtil {
    private static KafkaProducer<String, String> kp;
    private static KafkaConsumer<String, String> kc;

    public static synchronized KafkaProducer<String, String> getProducer() {
        if (kp == null) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "10.0.0.100:9092,10.0.0.101:9092");
            props.put("acks", "1");
            props.put("retries", 0);
            props.put("batch.size", 16384);
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            kp = new KafkaProducer<String, String>(props);
        }
        return kp;
    }

    // Note: the returned KafkaConsumer is not thread-safe; use it from one thread only
    public static synchronized KafkaConsumer<String, String> getConsumer() {
        if (kc == null) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "10.0.0.100:9092,10.0.0.101:9092");
            props.put("group.id", "1");
            props.put("enable.auto.commit", "true");
            props.put("auto.commit.interval.ms", "1000");
            props.put("session.timeout.ms", "30000");
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            kc = new KafkaConsumer<String, String>(props);
        }
        return kc;
    }
}
As before, here are the common consumer settings
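bootstrap.servers/key.deserializer/value.deserializer: same meaning as on the producer side.
fetch.min.bytes: minimum amount of data (in bytes) per fetch. The consumer waits until that much has accumulated before fetching a batch. Default: 1, meaning a message is fetched as soon as there is one.
max.partition.fetch.bytes: maximum amount of data (in bytes) fetched from a single partition per request. Default: 1MB.
group.id: the consumer's group id. Consumers in the same group do not receive the same message twice, while consumers in different groups each receive every message. Note that the number of consumers in a group should not exceed the number of partitions.
enable.auto.commit: whether the offsets of fetched messages are committed automatically. Committing an offset marks the message as consumed, and consumers in the group will not fetch it again (unless the offset is changed manually). Default: true.
auto.commit.interval.ms: interval in milliseconds between automatic offset commits. Default: 5000.
All consumer settings are in the official documentation: http://kafka.apache.org/documentation.html#newconsumerconfigs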
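The note under group.id, that a group should not have more consumers than partitions, becomes obvious once the assignment is written down. The Java sketch below hands out partitions to the consumers of one group in round-robin fashion. It is a simplified illustration, not Kafka's actual assignment code: AssignmentSketch and the consumer names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified illustration of spreading partitions over the consumers of one group.
// Not Kafka's real assignor; names and strategy are assumptions made for this sketch.
final class AssignmentSketch {

    /** Gives each partition to exactly one consumer, cycling through the consumers. */
    static Map<String, List<Integer>> assign(List<String> consumers, int numPartitions) {
        Map<String, List<Integer>> result = new LinkedHashMap<String, List<Integer>>();
        for (String consumer : consumers) {
            result.put(consumer, new ArrayList<Integer>());
        }
        if (consumers.isEmpty()) {
            return result;
        }
        for (int p = 0; p < numPartitions; p++) {
            String owner = consumers.get(p % consumers.size());
            result.get(owner).add(p);
        }
        return result;
    }

    public static void main(String[] args) {
        // Three consumers in one group, but the topic has only two partitions.
        Map<String, List<Integer>> assignment = assign(Arrays.asList("c1", "c2", "c3"), 2);
        for (Map.Entry<String, List<Integer>> e : assignment.entrySet()) {
            System.out.println(e.getKey() + " -> partitions " + e.getValue());
        }
    }
}
```

Because every partition has exactly one owner within a group, the third consumer in this example ends up with an empty list and would sit idle.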
 
2) Write the consumer:
import java.util.Arrays;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class KafkaTest {
    public static void main(String[] args) throws Exception {
        KafkaConsumer<String, String> consumer = KafkaUtil.getConsumer();
        consumer.subscribe(Arrays.asList("test"));
        while (true) {
            // wait up to 1000 ms for new messages
            ConsumerRecords<String, String> records = consumer.poll(1000);
            for (ConsumerRecord<String, String> record : records) {
                System.out.println("fetched from partition " + record.partition() + ", offset: " + record.offset() + ", message: " + record.value());
            }
        }
    }
}
Output:
fetched from partition 0, offset: 28, message: this is message0
fetched from partition 0, offset: 29, message: this is message2
fetched from partition 0, offset: 30, message: this is message5
fetched from partition 0, offset: 31, message: this is message6
fetched from partition 0, offset: 32, message: this is message10
fetched from partition 0, offset: 33, message: this is message11
fetched from partition 0, offset: 34, message: this is message12
fetched from partition 1, offset: 26, message: this is message1
fetched from partition 1, offset: 27, message: this is message3
fetched from partition 1, offset: 28, message: this is message4
fetched from partition 1, offset: 29, message: this is message7
fetched from partition 1, offset: 30, message: this is message8
fetched from partition 1, offset: 31, message: this is message9
fetched from partition 1, offset: 32, message: this is message13
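Notes:
KafkaConsumer's poll method fetches messages from the broker. Before polling, subscribe to a topic with the subscribe method. The argument of poll is the fetch timeout in milliseconds: if there are no new messages, the consumer waits up to that long and then returns an empty result set.
If the topic has several partitions, KafkaConsumer balances the load by polling the partitions in turn. If several consumer threads are started, Kafka coordinates them through a group coordinator on the broker side (the new consumer in 0.9 no longer relies on Zookeeper for this), so that consumers in the same group do not consume the same message twice. Note that the number of consumers should not exceed the number of partitions; the excess consumers receive no data.
As the output shows, the fetched messages are not fully ordered. Kafka only guarantees first-in-first-out order within a partition, so the order of messages across partitions is not guaranteed.
This example uses automatic offset commit: the Kafka client periodically commits the offsets to the broker. If a failure happens between two automatic commits (for example, the whole JVM process dies), some messages will be consumed again. To avoid this, commit offsets manually: set enable.auto.commit to false when constructing the consumer and call consumer.commitSync() in the code.
If you do not want Kafka to balance the partitions across consumers, you can also assign them by hand: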
    public static void main(String[] args) throws Exception {
        KafkaConsumer<String, String> consumer = KafkaUtil.getConsumer();
        String topic = "test";
        TopicPartition partition0 = new TopicPartition(topic, 0);
        TopicPartition partition1 = new TopicPartition(topic, 1);
        // take both partitions explicitly instead of subscribing to the topic
        consumer.assign(Arrays.asList(partition0, partition1));
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(100);
            for (ConsumerRecord<String, String> record : records) {
                System.out.println("fetched from partition " + record.partition() + ", offset: " + record.offset() + ", message: " + record.value());
            }
            // commit the offsets of everything returned by the last poll
            consumer.commitSync();
        }
    }
The consumer.assign() method assigns one or more partitions to the consumer thread.
 
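Pitfall:
In my tests, when partitions were assigned by hand, Kafka's automatic offset commit stopped working for reasons I could not determine. Only manual commits recorded the consumed offsets correctly.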
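An aside:
In a real application the consumer will certainly do more with the fetched messages than print them, and processing a message may well take much longer. A single consumer thread may not keep up with the rate at which producers generate messages, so a multi-threaded model on the consumer side has to be considered. However, KafkaConsumer is not thread-safe, and several threads working on the same KafkaConsumer instance will run into all sorts of problems. The official guidance for multi-threaded processing on the consumer side is as follows:
1. One KafkaConsumer object per thread
Pros:
    Simple to implement
    No coordination between threads is needed, so it is the most efficient
    Ordered processing of messages within each partition is easiest to achieve
Cons:
    Each KafkaConsumer keeps a TCP connection to the cluster
    The number of threads cannot exceed the number of partitions
    Each batch fetched is smaller, which affects throughput somewhat
2. Decoupling: one consumer thread fetches messages and several worker threads process them
Pros:
    The number of worker threads can be chosen freely and is not limited by the number of partitions
Cons:
    The order in which messages are processed cannot be guaranteed
    It is hard to control when to commit offsets manually
Personally I find the second approach preferable. The limit that consumers cannot outnumber partitions is a serious one: you cannot keep splitting a topic into more partitions just to make consumers faster, and in my view the more partitions there are, the lower the availability of the cluster.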
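The second model, one fetching thread feeding several workers, can be sketched with plain JDK concurrency classes. The code below is a minimal sketch under stated assumptions: PollerWorkerSketch is a hypothetical class, a List of strings stands in for what consumer.poll() would return, and the "processing" is just a counter. It deliberately ignores offset commits, which is exactly the hard part named in the cons above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: one "poller" hands messages to a pool of workers through a queue.
final class PollerWorkerSketch {
    // Sentinel that tells a worker to stop; compared by identity so no real message matches it.
    private static final String STOP = new String("STOP");

    /** Pushes all messages through the queue and returns how many the workers processed. */
    static int run(List<String> source, int workerCount) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<String>(16);
        AtomicInteger processed = new AtomicInteger();
        ExecutorService workers = Executors.newFixedThreadPool(workerCount);
        for (int i = 0; i < workerCount; i++) {
            workers.execute(() -> {
                try {
                    while (true) {
                        String msg = queue.take();
                        if (msg == STOP) {
                            return;
                        }
                        processed.incrementAndGet(); // real message handling would go here
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        try {
            // The calling thread plays the poller: only it touches the message source.
            for (String msg : source) {
                queue.put(msg); // blocks when workers fall behind, which throttles fetching
            }
            for (int i = 0; i < workerCount; i++) {
                queue.put(STOP); // one stop marker per worker
            }
            workers.shutdown();
            if (!workers.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            workers.shutdownNow();
        }
        return processed.get();
    }

    public static void main(String[] args) {
        int n = run(Arrays.asList("m0", "m1", "m2", "m3", "m4"), 3);
        System.out.println("processed " + n + " messages");
    }
}
```

The bounded queue is a deliberate choice: when the workers are slower than the fetcher, put() blocks and the fetching thread slows down instead of piling up messages in memory. In a real consumer only the fetching thread would touch the KafkaConsumer instance, since it is not thread-safe.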
 
 
6. Kafka cluster high-availability tests

1) Check the current status of the topic:
/kafka-topics.sh --describe --zookeeper 10.0.0.100:2181,10.0.0.101:2181,10.0.0.102:2181 --topic test
Output:
Topic:test PartitionCount:2 ReplicationFactor:2 Configs:
Topic: test Partition: 0 Leader: 1 Replicas: 1,0 Isr: 0,1
Topic: test Partition: 1 Leader: 0 Replicas: 0,1 Isr: 0,1
As shown, the leader of partition 0 is broker1 and the leader of partition 1 is broker0.
 
2) Start the producer and send messages to the Kafka cluster
Output:
message send to partition 0, offset: 35
message send to partition 1, offset: 33
message send to partition 0, offset: 36
message send to partition 1, offset: 34
message send to partition 1, offset: 35
message send to partition 0, offset: 37
message send to partition 0, offset: 38
message send to partition 1, offset: 36
message send to partition 1, offset: 37
3) Log in over SSH and kill broker0, which is the leader of partition 1

Check the topic status again:
Topic:test PartitionCount:2 ReplicationFactor:2 Configs:
Topic: test Partition: 0 Leader: 1 Replicas: 1,0 Isr: 1
Topic: test Partition: 1 Leader: 1 Replicas: 0,1 Isr: 1
Both partition 0 and partition 1 are now led by broker1.
 
Now look at the producer's output:
[kafka-producer-network-thread | producer-1] DEBUG org.apache.kafka.common.network.Selector - Connection with /10.0.0.100 disconnected
java.net.ConnectException: Connection refused: no further information
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
    at org.apache.kafka.common.network.PlaintextTransportLayer.finishConnect(PlaintextTransportLayer.java:54)
    at org.apache.kafka.common.network.KafkaChannel.finishConnect(KafkaChannel.java:72)
    at org.apache.kafka.common.network.Selector.poll(Selector.java:274)
    at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:256)
    at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:216)
    at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:128)
    at java.lang.Thread.run(Thread.java:745)
[kafka-producer-network-thread | producer-1] DEBUG org.apache.kafka.clients.Metadata - Updated cluster metadata version 7 to Cluster(nodes = [Node(1, 10.0.0.101, 9092)], partitions = [Partition(topic = test, partition = 1, leader = 1, replicas = [1,], isr = [1,], Partition(topic = test, partition = 0, leader = 1, replicas = [1,], isr = [1,]])
The producer's DEBUG log shows the connection to broker0 being dropped. The cluster metadata is refreshed immediately, and the new metadata shows broker1 as the leader of both partitions. The producer resumed within moments and did not miss any message, which shows how well Kafka's failover mechanism works.
 
4) Bring broker0 back up
bin/kafka-server-start.sh -daemon config/server.properties
Then check the topic status again:
Topic:test PartitionCount:2 ReplicationFactor:2 Configs:
Topic: test Partition: 0 Leader: 1 Replicas: 1,0 Isr: 1,0
Topic: test Partition: 1 Leader: 1 Replicas: 0,1 Isr: 1,0
broker0 is up and already in sync (note that Isr changed from 1 to 1,0), but broker1 is still the leader of both partitions, which means broker1 currently handles all produce and fetch requests. That is clearly not what we want; the cluster should go back to a balanced state.
For that, trigger an election with Kafka's election tool:
bin/kafka-preferred-replica-election.sh --zookeeper 10.0.0.100:2181,10.0.0.101:2181,10.0.0.102:2181
After the election, check the topic status again:
Topic:test PartitionCount:2 ReplicationFactor:2 Configs:
Topic: test Partition: 0 Leader: 1 Replicas: 1,0 Isr: 1,0
Topic: test Partition: 1 Leader: 0 Replicas: 0,1 Isr: 1,0
The cluster is back in the state it was in before broker0 went down.
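At this point, however, the producer threw an exception:
org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.
The cause: the producer tried to send a message to partition 1 on broker1 after the leader of partition 1 had already switched to broker0, so the send failed.
Consuming the messages at this point shows a gap in the message numbers: one message really was lost. That is because the producer was built with retries=0, so it does not try again after a failed send.
After changing retries to 3 and repeating the test, the same error occurs when the leader switches, but this time the producer's retry mechanism kicks in and the message is resent successfully. Starting the consumer confirms that every message arrived.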
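The difference between retries=0 and retries=3 in this test can be reproduced with a small simulation that does not need a cluster. In the Java sketch below, FlakyBroker is a hypothetical stand-in for a broker that rejects the first attempt (as happens right after a leader switch), and send() mimics the retry loop. The real producer additionally refreshes its metadata and waits for retry.backoff.ms between attempts, which the sketch leaves out.

```java
// Hypothetical simulation of producer retries; no Kafka classes are involved.
final class RetrySketch {

    /** Stand-in for a broker that rejects a fixed number of attempts before accepting. */
    static final class FlakyBroker {
        private int failuresLeft;

        FlakyBroker(int failures) {
            this.failuresLeft = failures;
        }

        boolean accept(String message) {
            if (failuresLeft > 0) {
                failuresLeft--; // e.g. "not the leader for that topic-partition"
                return false;
            }
            return true;
        }
    }

    /** Tries once, then up to `retries` more times; true if the message was accepted. */
    static boolean send(FlakyBroker broker, String message, int retries) {
        for (int attempt = 0; attempt <= retries; attempt++) {
            if (broker.accept(message)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("retries=0 delivered: " + send(new FlakyBroker(1), "this is message7", 0));
        System.out.println("retries=3 delivered: " + send(new FlakyBroker(1), "this is message7", 3));
    }
}
```

With retries=0 the single failed attempt is final and the message is lost; with retries=3 the second attempt succeeds. One trade-off worth knowing: when retries are enabled and several requests are in flight, a resent message can arrive after a later one, so ordering within a partition may be affected unless the number of in-flight requests is limited.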
 
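After every single-node failure and recovery, a re-election is needed to fully restore the cluster's leader assignment. If doing this by hand each time is too much trouble, set auto.leader.rebalance.enable=true in the broker configuration file (server.properties), and the re-election will happen automatically once the broker is back.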
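Whether a re-election is needed at all can be read off the describe output: the preferred leader of a partition is the first broker in its Replicas list, and the leader assignment is balanced when every partition is led by its preferred replica. A minimal Java sketch of that check follows; PreferredLeaderCheck is a hypothetical name used only here.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration: is a partition led by its preferred replica?
final class PreferredLeaderCheck {

    /** True when the current leader is the first broker in the replica list. */
    static boolean leaderIsPreferred(int leader, List<Integer> replicas) {
        return !replicas.isEmpty() && replicas.get(0) == leader;
    }

    public static void main(String[] args) {
        // State after broker0 came back but before the election:
        // Partition 0: Leader 1, Replicas 1,0   Partition 1: Leader 1, Replicas 0,1
        System.out.println("partition 0 balanced: " + leaderIsPreferred(1, Arrays.asList(1, 0)));
        System.out.println("partition 1 balanced: " + leaderIsPreferred(1, Arrays.asList(0, 1)));
    }
}
```

In the state right after broker0 rejoined, partition 1 fails this check (leader 1, preferred replica 0), which is exactly the imbalance that the election tool or auto.leader.rebalance.enable corrects.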
 
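With that, the tests confirm that the producer keeps working correctly while the cluster goes through a single-node failure and recovery. Next, the consumer side:

5) Start the producer process and the consumer process at the same time
The producer is now producing messages while the consumer consumes them.

6) Kill broker0 and watch the consumer's output:
After broker0 went down, the consumer also printed a series of INFO and WARN messages, but like the producer it recovered by itself after a few seconds. The messages were still consecutive, with no gaps.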
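Checking by eye that the consumed messages are still consecutive gets tedious for longer runs. The Java sketch below extracts the number from each "this is messageN" line of the consumer output and lists the numbers that are missing between the smallest and largest one seen. SequenceGapCheck is a hypothetical helper, and it assumes the message format used by the example producer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; finds gaps in the demo's message numbering.
final class SequenceGapCheck {
    private static final Pattern MESSAGE = Pattern.compile("this is message(\\d+)");

    /** Returns the message numbers missing between the lowest and highest number seen. */
    static List<Integer> missing(List<String> lines) {
        TreeSet<Integer> seen = new TreeSet<Integer>();
        for (String line : lines) {
            Matcher m = MESSAGE.matcher(line);
            if (m.find()) {
                seen.add(Integer.parseInt(m.group(1)));
            }
        }
        List<Integer> gaps = new ArrayList<Integer>();
        if (seen.isEmpty()) {
            return gaps;
        }
        for (int n = seen.first(); n <= seen.last(); n++) {
            if (!seen.contains(n)) {
                gaps.add(n);
            }
        }
        return gaps;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "fetched from partition 0, offset: 418, message: this is message48",
                "fetched from partition 0, offset: 419, message: this is message49",
                "fetched from partition 1, offset: 392, message: this is message50",
                "fetched from partition 0, offset: 420, message: this is message51");
        System.out.println("missing message numbers: " + missing(lines));
    }
}
```

Because the numbers are collected into a sorted set first, the check does not depend on the order in which messages from different partitions arrive, only on whether every number shows up at some point.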
 
7) Start broker0 again, trigger a re-election, and watch the output:
fetched from partition 0, offset: 418, message: this is message48
fetched from partition 0, offset: 419, message: this is message49
[main] INFO org.apache.kafka.clients.consumer.internals.ConsumerCoordinator - Offset commit for group 1 failed due to NOT_COORDINATOR_FOR_GROUP, will find new coordinator and retry
[main] INFO org.apache.kafka.clients.consumer.internals.AbstractCoordinator - Marking the coordinator 2147483646 dead.
[main] WARN org.apache.kafka.clients.consumer.internals.ConsumerCoordinator - Auto offset commit failed: This is not the correct coordinator for this group.
fetched from partition 1, offset: 392, message: this is message50
fetched from partition 0, offset: 420, message: this is message51
After the re-election the consumer also logged a few messages, saying that the current coordinator was no longer valid when it tried to commit offsets. It quickly found a new valid coordinator and resumed automatic offset commits, and checking the committed offset values confirmed that the leader switch caused no errors in offset commits.

So the tests also confirm that the consumer works correctly when the Kafka cluster suffers a single-node failure.