MIT 6.5840
Publish on 2023-04-28
MIT 6.5840 note

😇 All I need to know about 6.5840
Full code on:
https://github.com/humbornjo/MIT-6.5840

Lab1

💡 虽然本次实验没有触及,但是GFS才是Map Reduce的灵魂。这也是很多MR方法局限在单机运行的根本原因。在分布式的场景中,intermediate的交换才是事实上的重点。

Difficulties & solution

coordinator 作为调度者,就算其元素都是线程安全的,依然是需要加一个整体的锁的。(仅针对我的设计与实现)
type Coordinator struct {
	// Your definitions here.
	nMap     int
	nReduce  int
	stage    int
	currTask chan Task
	runnTask map[Task]chan int
	lock     sync.RWMutex
}

在我的实现中,当currTask和runnTask全为空时方可触发stage的转换(比如从MAP转移到REDUCE)。如果分别加锁,就可能导致 →

  1. 检查currTask时为空
  2. runnTask超时,删除任务,未完成的任务重新入队currTask
  3. 检查runnTask,也为空
  4. 状态转换了,MR失败
rpc的设计与细节
/*
type Task struct {
	Tid   int
	Fname string
}
*/

type AskForTaskArgs struct {
	Wid int
}

type AskForTaskReply struct {
	Stage int
	Task  Task
	Nout  int
}

type ReportForTaskArgs struct {
	Task Task
}

type ReportForTaskReply struct {
	Foo bool
}

总体分为Ask和Report两种,主要的问题在于Report RPC前后临时文件的保存和状态的转变,下面进行分类分析。

  1. MAP阶段的Report,是在RPC获得true的回复后进行保存临时文件,还是先保存临时文件,再call RPC。
  2. 如果在获得RPC回复后再保存。如果RPC结果为true(在我的实现中,是reply.Foo == true)。会出现一个恐怖的事,那就是coordinator已经转换状态了。开始分发REDUCE的任务了,但是MAP阶段的intermediate文件还没有保存完。这就会导致,如果有一个Worker此时恰好接了任务,并竞争运行了,然而目标文件甚至不存在,最终REDUCE的任务结果出错。如果RPC结果为false,影响不大,就算保存了也会被另一个Worker的文件写入替换。
  3. 如果在获得RPC回复前再保存。如果RPC结果为true,不会有任何问题,解决了上方的问题。如果RPC结果为false,影响不大,就算保存了也会被另一个Worker的文件写入替换。
  4. 综上,选择在RPC回复前保存。
  5. Reduce阶段的Report,没啥好说的,保存必然是在RPC之前的,因为实现borrow了mrsequential的代码,目前来看,换成在RPC之后阻塞地保存,唯一的风险就是Worker的临时宕机。(最后不要忘了删除临时文件)
if !ReportForTask(
	&ReportForTaskArgs{Task: reply.Task}, 
	&ReportForTaskReply{}) {
	os.Remove(oname)
} else {
	for _, filename := range filenames {
		os.Remove(filename)
	}
}
  • PASS LOG
MIT-6.5840/src/main via 🐹 v1.20.6 
> bash test-mr.sh
*** Starting wc test.
--- wc test: PASS
*** Starting indexer test.
--- indexer test: PASS
*** Starting map parallelism test.
--- map parallelism test: PASS
*** Starting reduce parallelism test.
--- reduce parallelism test: PASS
*** Starting job count test.
--- job count test: PASS
*** Starting early exit test.
--- early exit test: PASS
*** Starting crash test.
--- crash test: PASS
*** PASSED ALL TESTS

Lab2

💡 What u deserve to understand is that, even raft can make mistake. When the delay is high and command flood in.

Core Idea

  1. Raft RPCS is idempotent
  2. Randomization causes simplicity
  3. Leader takes it all
  4. Logical timing instead of physical timing

$$ broadcastTime ≪ electionTimeout ≪ MTBF $$

Mind Trail

lab2 Raft

Difficulties & solution

2A

  • Term严格小的才能给Term大的server投票,但是容易出现以下情况 (忽略第三列),即server S1和S2几乎以同一时间开始选举 (即使存在随机timeout),导致多轮都无法选出leader。
Debug log
S0 C0 ask for vote, T: 8                                                                                                                                     
S0 not leader, election timeout...                                                                                                                           
S0 convert to candidate, calling election T: 7                                                                                                               
                                                                                                         S2 C0 ask for vote, T: 8                            
                                                     S1 C0 ask for vote, T: 8                                                                                
S0 not leader, election timeout...                                                                                                                           
S0 convert to candidate, calling election T: 8                                                                                                               
                                                                                                         S2 C0 ask for vote, T: 9                            
                                                     S1 not leader, election timeout...                                                                      
                                                     S1 convert to candidate, calling election T: 8                                                          
                                                                                                         S2 C0 ask for vote, T: 9                            
                                                     S1 C0 ask for vote, T: 9                                                                                
S0 C0 ask for vote, T: 9                                                                                                                                     
S0 not leader, election timeout...                                                                                                                           
S0 convert to candidate, calling election T: 9                                                                                                               
                                                                                                         S2 C0 ask for vote, T: 10                           
                                                     S1 C0 ask for vote, T: 10                                                                               
                                                     S1 not leader, election timeout...                                                                      
                                                     S1 convert to candidate, calling election T: 9                                                          
                                                                                                         S2 C0 ask for vote, T: 10                           
S0 C0 ask for vote, T: 10                                                                                                                                    
S0 -> S1 vote false, VT: 0                                                                                                                                   
S0 not leader, election timeout...                                                                                                                           
S0 convert to candidate, calling election T: 10
排查出的问题 1. 在RequestVote RPCs中就更新了自己的Term,使得落后的server一下子就追上了因random而领先的server 2. timeout区间为 [150, 300] ms
可能的解决办法 1. ~~不在RequestVote RPCs中更新Term。~~ 在更新以后把isTimeout置为false,跳过一轮选举自增。 2. 增大timeout的区间
方案

采用了方法1,并将timeout区间修改为 [200, 500] ms,暂时解决问题。


  • 观察第二列和最后一列,可以发现这里 S1 和 S6 出现了脑裂现象,但是并未得到解决。原因是两方此时的Term相同,两方互相会拒绝对方的heartbeat log而不会进行状态转换。
Debug log
                       S1 multi leaders(2)                                                                                                                 
                       T: 7                                                                                                                                
                                                                                                                                     S6 multi leaders(2) T:
                                                                                                                                     7                     
                                                                                                                                     S6 -> S5 T: 7, send   
                                                                                                                                     {PLI: 0, PLT: 0}      
                       S1 -> S6 T: 7, send                                                                                                                 
                       {PLI: 0, PLT: 0}                                                                                                                    
                                             S2 convert to                                                                                                 
                                             candidate, calling                                                                                            
                                             election T: 7                                                                                                 
                                                                                                                                     S6 C0 ask for vote, T:
                                                                                                                                     8                     
                                                                                                                                     S6 -> S0 T: 7, send   
                                                                                                                                     {PLI: 0, PLT: 0}      
                                                                                                                                     S6 -> S1 T: 7, send   
                                                                                                                                     {PLI: 0, PLT: 0}      
                                                                                                                                     S6 -> S2 T: 7, send   
                                                                                                                                     {PLI: 0, PLT: 0}      
                                                                                                                                     S6 -> S3 T: 7, send   
                                                                                                                                     {PLI: 0, PLT: 0}      
                                                                                                                                     S6 -> S4 T: 7, send   
                                                                                                                                     {PLI: 0, PLT: 0}      
                       S1 -> S0 T: 7, send                                                                                                                 
                       {PLI: 0, PLT: 0}                                                                                                                    
                       S1 -> S2 T: 7, send                                                                                                                 
                       {PLI: 0, PLT: 0}                                                                                                                    
                       S1 -> S3 T: 7, send                                                                                                                 
                       {PLI: 0, PLT: 0}                                                                                                                    
                       S1 -> S4 T: 7, send                                                                                                                 
                       {PLI: 0, PLT: 0}                                                                                                                    
                       S1 -> S5 T: 7, send                                                                                                                 
                       {PLI: 0, PLT: 0}
排查出的问题 1. Term相同,在身为leader的同时对峙,由于拒绝了heartbeat log导致的状态无法转换
可能的解决方法 1. 本质上是在disconnect的时候没有检查自己的killed(),及时转换为follower。论文中没有提到Term相同的情况下Entry (Including heartbeat)的处理情况。
方案 1. 在每个heartbeat loop中检查killed(),及时转换状态。 2. 运行时又发现了一个小问题,下面分析出问题的代码。AppendEntry分为三个情况,currTerm < args.Term, currTerm = args.Term 和 currTerm > args.Term。**第一种情况**没啥好说的,**第二种情况下**,若server还未投票 (voteFor == -1) 则server一定不是Leader;投了票的情况下拒绝可以保证一个Term只投一次票的原则。第三种情况下,需要注意,server的状态可能是三个状态中的任意一种,所以一定要重置server的状态为Follower来保证Term大的优先的原则。
if rf.currTerm > args.Term {
	reply.VoteGranted = false
	DebugLog(dVote, "S%d -> S%d vote false, T: %d, CT: %d\n", rf.me, args.CandidateId, rf.currTerm, args.Term)
	return
} else if rf.currTerm == args.Term {
	if rf.votedFor == -1 {
		rf.votedFor = args.CandidateId
		reply.VoteGranted = true
		DebugLog(dVote, "S%d -> S%d vote true, same term\n", rf.me, args.CandidateId)
		return
	} else {
		reply.VoteGranted = false
		DebugLog(dVote, "S%d -> S%d vote false, VT: %d\n", rf.me, args.CandidateId, rf.votedFor)
		return
	}
} else {
	rf.currTerm = args.Term
	rf.isTimeout = false
	rf.state = Follower // !!!!!!!!!!!!!! IMPORTANT !!!!!!!!!!!!!!
	rf.votedFor = args.CandidateId
	reply.VoteGranted = true
	DebugLog(dVote, "S%d -> S%d vote true, from T: %d to CT: %d\n", rf.me, args.CandidateId, reply.Term, rf.currTerm)
	return
}

  • PASS LOG
humborn@shin MIT-6.5840/src/raft on master [!] [!]                                                                                        󱑆 09:36:06
go test -run 2A                                    
Test (2A): initial election ...
  ... Passed --   3.0  3   58   15440    0
Test (2A): election after network failure ...
  ... Passed --   4.4  3  140   27904    0
Test (2A): multiple elections ...
  ... Passed --   6.2  7  686  129664    0
PASS
ok      6.824/raft      13.599s

2B

  • Server在成功append以后,竟然没有让Leader更新commitIndex
Debug log
S0 not leader, election timeout...                                                                                                                         
                                        S1 -> S2 T: 1, send {PLI: 0, PLT: 0,                                                                               
                                        CI: 0}                                                                                                             
                                        S1 -> S0 T: 1, send {PLI: 0, PLT: 0,                                                                               
                                        CI: 0}                                                                                                             
S0 -> S1 append ok T: 1, LENLOG: 2                                                                                                                         
                                                                               S2 -> S1 append ok T: 1, LENLOG: 2                                          
                                                                               S2 not leader, election timeout...                                          
                                        S1 -> S2 T: 1, send {PLI: 1, PLT: 1,                                                                               
                                        CI: 0}                                                                                                             
                                        S1 -> S0 T: 1, send {PLI: 1, PLT: 1,                                                                               
                                        CI: 0}                                                                                                             
S0 -> S1 append ok T: 1, LENLOG: 2                                                                                                                         
                                                                               S2 -> S1 append ok T: 1, LENLOG: 2                                          
                                        S1 -> S2 T: 1, send {PLI: 0, PLT: 0,                                                                               
                                        CI: 0}                                                                                                             
                                        S1 -> S0 T: 1, send {PLI: 0, PLT: 0,                                                                               
                                        CI: 0}                                                                                                             
S0 -> S1 append ok T: 1, LENLOG: 2                                                                                                                         
                                                                               S2 -> S1 append ok T: 1, LENLOG: 2                                          
FAIL
FAIL    6.824/raft      2.313s
排查出的问题 1. sort中索引有问题,原来是从小到大排序后,取len(rf.peers)/2-1,当#Server为3时,结果为0,很显然不对。 2. 搞错索引了,AppendEntries时可能会传入空Entries。这时想普适地满足AppendEntries RPCs的最后一条实现规则,就需要取server自身logEntries的最后一条entry的index 3. 创建新logEntry时index和term位置反了,cao!
方案 1. 改为(len(rf.peers)-1)/2 2. 由args.Entries→rf.logEntries取最后一个entry的index 3. cao!
Task log
go test -run BasicAgree                          
Test (2B): basic agreement ...
  ... Passed --   0.5  3   16    4258    3
PASS
ok      6.824/raft      0.494s
  • 如果一个server掉线了,会不断StartElection()自增,当它重联时,它的Term将会很大。就会导致一个情况,当前 leader 的 AppendEntries RPCs 将会被拒绝,reply中的Term巨大,会将Leader打回Follower并更新Term,最好给Leader特权,让Leader尽快timeout,再次成为Leader;由它召开的Election也会被其他server拒绝(LastLogIndex很小)并使其他server更新Term,不用给特权通道。但是在下方log中 S0 apply msg: {true 101 1 false [] 0 0} 重复出现,且重联后,并未一直尝试Append(在第一次被拒绝后)导致没有及时更新。
Debug log
S0 reconnect...                                                                                                                                                
                                                                                                           S2 recv command, LENLOG: 6, T: 1                    
                                                                                                           S2 send to S1, ENTRIES: [{106 1 6}]                 
                                                                                                           S2 send to S0, ENTRIES: [{102 1 2} {103 1 3} {104 1 
                                                                                                           4} {105 1 5} {106 1 6}]                             
                                                      S1 recv append from S2, T: 1, ENTRIES: [{106 1 6}]                                                       
                                                      S1 -> S2 append ok T: 1, LENLOG: 7                                                                       
S0 recv append from S2, T: 1, ENTRIES: [{102 1 2}                                                                                                              
{103 1 3} {104 1 4} {105 1 5} {106 1 6}]                                                                                                                       
S0 append fail, T: 6 large term                                                                                                                                
                                                                                                           S2 -> S1 T: 1, send {PLI: 5, PLT: 1, CI: 5,         
                                                                                                           BEGINLOGIDX: 6, ENDLOGIDX: 6, LOG: [{106 1 6}]}     
                                                                                                           S2 update commitIndex CI: 6                         
                                                                                                           S2 apply msg: {true 106 6 false [] 0 0}             
                                                                                                           S2 -> S0 T: 1, send {PLI: 1, PLT: 1, CI: 5,         
                                                                                                           BEGINLOGIDX: 2, ENDLOGIDX: 6, LOG: [{102 1 2} {103 1
                                                                                                           3} {104 1 4} {105 1 5} {106 1 6}]}                  
                                                                                                           S2 convert to candidate, calling election T: 6      
                                                      S1 C6 ask for vote, T: 7                                                                                 
S0 C6 ask for vote, T: 7                                                                                                                                       
                                                      S1 -> S2 vote true, from T: 1 to CT: 7                                                                   
S0 -> S2 vote true, from T: 6 to CT: 7                                                                                                                         
                                                                                                           S2 convert to leader at T: 7                        
                                                      S1 recv append from S2, T: 7, ENTRIES: []                                                                
                                                      S1 -> S2 recv heartbeat                                                                                  
                                                      S1 apply msg: {true 106 6 false [] 0 0}                                                                  
                                                                                                           S2 -> S1 T: 7, send heartbeat                       
S0 recv append from S2, T: 7, ENTRIES: []                                                                                                                      
S0 append fail, diff term                                                                                                                                      
                                                                                                           S2 -> S0 T: 7, send heartbeat                       
                                                                                                           S2 -> S0 update failed, NEXT: 6                     
                                                      S1 recv append from S2, T: 7, ENTRIES: []                                                                
                                                      S1 -> S2 recv heartbeat                                                                                  
S0 recv append from S2, T: 7, ENTRIES: []                                                                                                                      
S0 append fail, diff term                                                                                                                                      
                                                                                                           S2 -> S1 T: 7, send heartbeat                       
                                                                                                           S2 -> S0 T: 7, send heartbeat                       
                                                                                                           S2 -> S0 update failed, NEXT: 5                     
S0 recv append from S2, T: 7, ENTRIES: []                                                                                                                      
S0 append fail, diff term                                                                                                                                      
                                                      S1 recv append from S2, T: 7, ENTRIES: []                                                                
                                                      S1 -> S2 recv heartbeat                                                                                  
                                                                                                           S2 -> S0 T: 7, send heartbeat                       
                                                                                                           S2 -> S0 update failed, NEXT: 4                     
                                                                                                           S2 -> S1 T: 7, send heartbeat                       
                                                                                                           S2 -> S0 T: 7, fail send {PLI: 1, PLT: 1, CI: 4}    
S0 recv append from S2, T: 7, ENTRIES: []                                                                                                                      
S0 append fail, diff term                                                                                                                                      
                                                      S1 recv append from S2, T: 7, ENTRIES: []                                                                
                                                      S1 -> S2 recv heartbeat                                                                                  
                                                                                                           S2 -> S1 T: 7, send heartbeat                       
                                                                                                           S2 -> S0 T: 7, send heartbeat                       
                                                                                                           S2 -> S0 update failed, NEXT: 3                     
                                                      S1 recv append from S2, T: 7, ENTRIES: []                                                                
                                                      S1 -> S2 recv heartbeat                                                                                  
S0 recv append from S2, T: 7, ENTRIES: []                                                                                                                      
S0 append fail, diff term                                                                                                                                      
                                                                                                           S2 -> S1 T: 7, send heartbeat                       
                                                                                                           S2 -> S0 T: 7, send heartbeat                       
                                                                                                           S2 -> S0 update failed, NEXT: 2                     
                                                      S1 recv append from S2, T: 7, ENTRIES: []                                                                
                                                      S1 -> S2 recv heartbeat                                                                                  
S0 recv append from S2, T: 7, ENTRIES: []                                                                                                                      
S0 -> S2 recv heartbeat                                                                                                                                        
S0 apply msg: {true 101 1 false [] 0 0}                                                                                                                        
                                                                                                           S2 -> S1 T: 7, send heartbeat                       
                                                                                                           S2 -> S0 T: 7, send heartbeat                       
                                                                                                           S2 -> S0 T: 7, fail send {PLI: 1, PLT: 1, CI: 3}    
                                                      S1 recv append from S2, T: 7, ENTRIES: []                                                                
                                                      S1 -> S2 recv heartbeat                                                                                  
                                                                                                           S2 -> S1 T: 7, send heartbeat                       
S0 recv append from S2, T: 7, ENTRIES: []                                                                                                                      
S0 -> S2 recv heartbeat                                                                                                                                        
                                                                                                           S2 -> S0 T: 7, send heartbeat
排查出的问题 1. 只要leaderCommit比自己的commitId要大,就会信号提示applyMsg。 2. 重联后会重新选举Leader,也就是说重新选出的Leader会重新初始化自己的nextIndex。自然的,第一次尝试对重联的Server进行append会因为没有匹配的index而被拒绝,于是寄了。
方案 1. 不再对heartbeat特殊对待,也就是说heartbeat和broadcast争抢对Server的更新
  • Apply massage的时候出现了顺序出错的问题
Debug log
// log
S0 commitIndex: 6                                                                                                                                                    
S0 -> S2 append ok T: 9, LENLOG: 7                                                                                                                                   
                                                                                                               S2 -> S1 T: 9, send heartbeat                         
                                                                                                               S2 -> S1 update success, NEXT: [1 7 7], MATCH: [0 6 0]
                                                                                                               S2 get N: 6                                           
S0 apply msg: {true 106 6 false [] 0 0}                                                                                                                              
S0 ######## CI: 6, cfg.log: [map[0:init server 1:101]                                                                                                                
map[0:init server 1:101 2:102 3:103 4:104 5:105 6:106]                                                                                                               
map[0:init server 1:101 2:102 3:103 4:104 5:105 6:106]]                                                                                 S2 -> S0 T: 7, send heartbeat


// related code
func (cfg *config) checkLogs(i int, m ApplyMsg) (string, bool) {
	err_msg := ""
	v := m.Command
	for j := 0; j < len(cfg.logs); j++ {
		if old, oldok := cfg.logs[j][m.CommandIndex]; oldok && old != v {
			log.Printf("%v: log %v; server %v\n", i, cfg.logs[i], cfg.logs[j])
			// some server has already committed a different value for this entry!
			err_msg = fmt.Sprintf("commit index=%v server=%v %v != server=%v %v",
				m.CommandIndex, i, m.Command, j, old)
		}
	}

	DebugLog(dDrop, "S%d ######## CI: %d, cfg.log: %v\n", i, m.CommandIndex, cfg.logs)
	_, prevok := cfg.logs[i][m.CommandIndex-1]
	cfg.logs[i][m.CommandIndex] = v
	if m.CommandIndex > cfg.maxIndex {
		cfg.maxIndex = m.CommandIndex
	}
	return err_msg, prevok
}
排查出的问题 1. 首先,chan是一个fifo。这里会根据fifo中吐出的ApplyMsg进行判断,将ApplyMsg中的信息提取并储存在cfg.log中。Server重联后,仅放入最新的ApplyMsg会导致index out of range的问题。
方案 1. 也就是说,需要按照顺序往chan里一个一个放入ApplyMsg而不是仅放入最新的ApplyMsg。
  • PASS LOG
go test -run 2B   
Test (2B): basic agreement ...
  ... Passed --   0.5  3   16    4290    3
Test (2B): RPC byte count ...
  ... Passed --   1.3  3   48  113614   11
Test (2B): agreement after follower reconnects ...
  ... Passed --   5.3  3  192   50800    8
Test (2B): no agreement if too many followers disconnect ...
  ... Passed --   3.4  5  264   51340    3
Test (2B): concurrent Start()s ...
  ... Passed --   0.6  3   24    6790    6
Test (2B): rejoin of partitioned leader ...
  ... Passed --   3.9  3  146   35480    4
Test (2B): leader backs up quickly over incorrect follower logs ...
  ... Passed --  14.0  5 2244 1728755  102
Test (2B): RPC counts aren't too high ...
  ... Passed --   2.0  3   62   18665   12
PASS
ok      6.824/raft      31.039s

2C

  • 一运行到unrealiable的test就出现out of index的错误,现在想来好像在2B中偶尔也会出现这个错误,还债了属于是。
Debug log
go test -run UnreliableChurn
Test (2C): unreliable churn ...
panic: runtime error: index out of range [8] with length 8

goroutine 22 [running]:
6.824/raft.(*Raft).Apply(0xc0001b2000)
        /home/humborn/Code/go/MIT-6.5840/src/raft/raft.go:596 +0x214
created by 6.824/raft.Make
        /home/humborn/Code/go/MIT-6.5840/src/raft/raft.go:757 +0x3d6
exit status 2
FAIL    6.824/raft      0.300s
排查出的问题 1. 很奇怪,有时可以通过,有时不行,2C中的Unreliable测试放大了系统的弱点,可以确定的是,心跳间隔的设置和这个错误是有关的。后来发现原来是因为延迟,Follower接受了Index更古老的消息,并覆盖了新的消息。
S0 reconnect...                                                                                                                                  
                              S1 reconnect...                                                                                                    
                                                           S2 reconnect...                                                                       
                                                                                        S3 reconnect...                                          
                                                                                        S3 -> S4 append ok T: 26,                                
                                                                                        LENLOG: 207                                              
                                                                                                                     S4 -> S2 LogInfo {IDX: 192, 
                                                                                                                     LENLOG: 207, LOG:           
                                                                                                                     {426083054347800940 26 206}}
                                                                                                                     S4 LogInfo {IDX: 192,       
                                                                                                                     LENLOG: 207, LOG:           
                                                                                                                     {426083054347800940 26 206}}
                                                                                                                     S4 send to S2, ENTRIES:     
                                                                                                                     {426083054347800940 26 206} 
                                                                                                                     S4 -> S3 update success,    
                                                                                                                     NEXT: [193 205 193 207 193],
                                                                                                                     MATCH: [0 204 0 206 0]      
                                                                                                                     S4 get N: 204               
                                                                                                                     S4, LENLOG: 207, LA: 204    
                                                                                                                     apply msg {CI: 205}         
                                                                                                                     S4 -> S3 LogInfo {IDX: 206, 
                                                                                                                     LENLOG: 207, LOG:           
                                                                                                                     {426083054347800940 26 206}}
                                                                                                                     S4 LogInfo {IDX: 206,       
                                                                                                                     LENLOG: 207, LOG:           
                                                                                                                     {426083054347800940 26 206}}
                                                                                                                     S4 send to S3, ENTRIES:     
                                                                                                                     {426083054347800940 26 206} 
                              S1 -> S4 append ok T: 26,                                                                                          
                              LENLOG: 207                                                                                                        
                                                           S2 append fail, T: 27 large                                                           
                                                           term                                                                                  
                                                                                                                     S4 -> S1 update success,    
                                                                                                                     NEXT: [193 207 193 207 193],
                                                                                                                     MATCH: [0 206 0 206 0]      
                                                                                                                     S4 get N: 206               
                                                                                                                     S4 update commitIndex CI:   
                                                                                                                     206                         
                                                                                                                     S4, LENLOG: 207, LA: 205    
                                                                                                                     apply msg {CI: 206}         
                                                                                                                     S4 -> S2 LogInfo {IDX: 192, 
                                                                                                                     LENLOG: 207, LOG:           
                                                                                                                     {426083054347800940 26 206}}
                                                                                                                     S4 LogInfo {IDX: 192,       
                                                                                                                     LENLOG: 207, LOG:           
                                                                                                                     {426083054347800940 26 206}}
                                                                                                                     S4 send to S2, ENTRIES:     
                                                                                                                     {426083054347800940 26 206} 
                                                                                        S3 commit {CI: 205, LENLOG:                              
                                                                                        207}                                                     
                                                           S2 append fail, T: 27 large                                                           
                                                           term                                                                                  
                                                                                                                     S4 reconnect...             
                              S1 -> S4 append ok T: 26,                                                                                          
                              LENLOG: 206                                                                                                        
                                                                                                                     S4 -> S1 LogInfo {IDX: 206, 
                                                                                                                     LENLOG: 207, LOG:           
                                                                                                                     {426083054347800940 26 206}}
                                                                                                                     S4 LogInfo {IDX: 206,       
                                                                                                                     LENLOG: 207, LOG:           
                                                                                                                     {426083054347800940 26 206}}
                                                                                                                     S4 send to S1, ENTRIES:     
                                                                                                                     {426083054347800940 26 206} 
                                                                                        S3 -> S4 append ok T: 26,                                
                                                                                        LENLOG: 205
  1. 有时候会出现索引取到-1的情况,排查出是append时条件设置有误。此外,当一个Server detach以后,会持续向别的server发送信息,不断收到fail update,自然会不断减小nextIndex的值。直到nextIndex变为0,自减后变为-1,索引越界。
方案 1. 修改条目更新的规则,遵循VoteRequest RPCs中的up-to-date原则。
if len(args.Entries) == 0 {
	DebugLog(dLog, "S%d -> S%d recv heartbeat\n", rf.me, args.LeaderId, rf.currTerm, len(rf.logEntries))
} else {
	rfLastLogIndex, rfLastLogTerm := rf.LogInfoByIndex(len(rf.logEntries) - 1)
	argLastLogIndex, argLastLogTerm := args.Entries[len(args.Entries)-1].Index, args.Entries[len(args.Entries)-1].Term
	if rfLastLogTerm < argLastLogTerm || (rfLastLogTerm == argLastLogTerm && rfLastLogIndex < argLastLogIndex) {
		rf.logEntries = rf.logEntries[:args.PrevLogIndex+1]
		rf.logEntries = append(rf.logEntries, args.Entries...) // persist
		rf.persist()
	}

	DebugLog(dLog2, "S%d -> S%d append ok T: %d, LENLOG: %d\n", rf.me, args.LeaderId, rf.currTerm, len(rf.logEntries))
}

同时要注意修改reply处理部分的代码,不然会将nextIndex和matchIndex更新成旧的索引。

if reply.Success {
	if rf.nextIndex[id] < endOfIndex {
		rf.nextIndex[id] = endOfIndex
		rf.matchIndex[id] = endOfIndex - 1
		DebugLog(dError, "S%d -> S%d update success, NEXT: %v, MATCH: %v\n", rf.me, id, rf.nextIndex, rf.matchIndex)
		go rf.updateCommitIndex()
	}
	// 讲道理,更新成功就要试试能不能update commitIndex
} else {
	rf.nextIndex[id] /= 2
	DebugLog(dError, "S%d -> S%d update failed, NEXT: %d\n", rf.me, id, rf.nextIndex[id])
}
  1. 贴出错误代码
~~if args.PrevLogIndex >= len(rf.logEntries) || rf.logEntries[args.PrevLogIndex].Term != args.PrevLogTerm {~~
if args.PrevLogIndex > len(rf.logEntries)-1 || rf.logEntries[args.PrevLogIndex].Term != args.PrevLogTerm {
	reply.Success = false
	DebugLog(dVote, "S%d append fail, diff term\n", rf.me)
	return
}

此外,打了个补丁,让nextIndex等于max(1, nextIndex/2),不知道这个补丁可不可以美化。

  • 有极小的概率出现问题:ApplyMsg顺序错位。
Debug log
Test (2C): Figure 8 (unreliable) ...
3: log map[1:2296 2:812 3:2449 4:9494 5:4661 6:4352 7:549 8:7084 9:6810 10:2131 11:5797 12:3999 13:6910 14:5209 15:4842 16:8485 17:8166 18:6914 19:6957 20:2262 21:4452 22:1043 23:1784 24:9148 25:759 26:2552 27:7916 28:6243 29:3173 30:749 31:3392 32:2467 33:1591 34:2306 35:2107 36:9868 37:9973 38:1755 39:8718 40:6107 41:7839 42:2187 43:1553 44:629 45:6596 46:4295 47:1687 48:9196 49:2304 50:54 51:6167]; server map[1:2296 2:812 3:2449 4:9494 5:4661 6:4352 7:549 8:7084 9:6810 10:2131 11:5797 12:3999 13:6910 14:5209 15:4842 16:8485 17:8166 18:6914 19:6957 20:2262 21:4452 22:1043 23:1784 24:9148 25:759 26:2552 27:7916 28:6243 29:3173 30:749 31:3392 32:2467 33:1591 34:2306 35:2107 36:9868 37:9973 38:1755 39:8718 40:6107 41:7839 42:2187 43:1553 44:629 45:6596 46:4295 47:1687 48:9196 49:2304 50:54 51:6167 52:9070 53:9855 54:4432 55:1244 56:1773 57:5898 58:2104 59:6905 60:3111 61:6309 62:8721 63:1023 64:4093 65:7848 66:7256 67:6673 68:4167 69:7717 70:9556]
3: log map[1:2296 2:812 3:2449 4:9494 5:4661 6:4352 7:549 8:7084 9:6810 10:2131 11:5797 12:3999 13:6910 14:5209 15:4842 16:8485 17:8166 18:6914 19:6957 20:2262 21:4452 22:1043 23:1784 24:9148 25:759 26:2552 27:7916 28:6243 29:3173 30:749 31:3392 32:2467 33:1591 34:2306 35:2107 36:9868 37:9973 38:1755 39:8718 40:6107 41:7839 42:2187 43:1553 44:629 45:6596 46:4295 47:1687 48:9196 49:2304 50:54 51:6167]; server map[1:2296 2:812 3:2449 4:9494 5:4661 6:4352 7:549 8:7084 9:6810 10:2131 11:5797 12:3999 13:6910 14:5209 15:4842 16:8485 17:8166 18:6914 19:6957 20:2262 21:4452 22:1043 23:1784 24:9148 25:759 26:2552 27:7916 28:6243 29:3173 30:749 31:3392 32:2467 33:1591 34:2306 35:2107 36:9868 37:9973 38:1755 39:8718 40:6107 41:7839 42:2187 43:1553 44:629 45:6596 46:4295 47:1687 48:9196 49:2304 50:54 51:6167 52:9070 53:9855 54:4432 55:1244 56:1773 57:5898 58:2104 59:6905 60:3111 61:6309 62:8721 63:1023 64:4093 65:7848 66:7256 67:6673 68:4167 69:7717 70:9556]
3: log map[1:2296 2:812 3:2449 4:9494 5:4661 6:4352 7:549 8:7084 9:6810 10:2131 11:5797 12:3999 13:6910 14:5209 15:4842 16:8485 17:8166 18:6914 19:6957 20:2262 21:4452 22:1043 23:1784 24:9148 25:759 26:2552 27:7916 28:6243 29:3173 30:749 31:3392 32:2467 33:1591 34:2306 35:2107 36:9868 37:9973 38:1755 39:8718 40:6107 41:7839 42:2187 43:1553 44:629 45:6596 46:4295 47:1687 48:9196 49:2304 50:54 51:6167]; server map[1:2296 2:812 3:2449 4:9494 5:4661 6:4352 7:549 8:7084 9:6810 10:2131 11:5797 12:3999 13:6910 14:5209 15:4842 16:8485 17:8166 18:6914 19:6957 20:2262 21:4452 22:1043 23:1784 24:9148 25:759 26:2552 27:7916 28:6243 29:3173 30:749 31:3392 32:2467 33:1591 34:2306 35:2107 36:9868 37:9973 38:1755 39:8718 40:6107 41:7839 42:2187 43:1553 44:629 45:6596 46:4295 47:1687 48:9196 49:2304 50:54 51:6167 52:9070 53:9855 54:4432 55:1244 56:1773 57:5898 58:2104 59:6905 60:3111 61:6309 62:8721 63:1023 64:4093 65:7848 66:7256 67:6673 68:4167 69:7717 70:9556]
apply error: commit index=52 server=3 2540 != server=4 9070
排查出的问题 1. 没排查出来
方案 1. 将AppendEntries中的 `defer rf.applyCh.Signal()` 变成了 `rf.applyCh.Signal()`
  • PASS LOG
go test -run 2C                                               
Test (2C): basic persistence ...
  ... Passed --   3.1  3   78   18954    6
Test (2C): more persistence ...
  ... Passed --  14.4  5 1036  200240   16
Test (2C): partitioned leader and one follower crash, leader restarts ...
  ... Passed --   1.3  3   36    8539    4
Test (2C): Figure 8 ...
  ... Passed --  27.0  5 1180  256006   58
Test (2C): unreliable agreement ...
  ... Passed --   1.7  5 1048  345471  246
Test (2C): Figure 8 (unreliable) ...
  ... Passed --  33.6  5 19228 18476272  138
Test (2C): churn ...
  ... Passed --  16.4  5 6992 9763562 1432
Test (2C): unreliable churn ...
  ... Passed --  16.1  5 4236 5664994  384
PASS
ok      6.824/raft      113.623s

2D

在2D中,很重要的一点是,切片底层是数组实现,如果想要让GC reach到被snapshot掉的无用内存,就需要重新创建切片,原切片指针置空。

  • 有概率出现问题:在更新commitIndex的时候出现out of Index问题。
Debug log
Test (2C): unreliable churn ...
panic: runtime error: index out of range [1656] with length 1656

goroutine 106315 [running]:
6.824/raft.(*Raft).updateCommitIndex(0xc000136200)
        /home/humborn/Code/go/MIT-6.5840/src/raft/raft.go:647 +0x345
created by 6.824/raft.(*Raft).broadcastAppendEntries.func1
        /home/humborn/Code/go/MIT-6.5840/src/raft/raft.go:622 +0x796
exit status 2
排查出的问题 1. updateCommitIndex函数原本是使用协程实现的,所以有极小的概率,在updateCommitIndex之前,完成了Leader转换、重新广播、更新新的CommitIndex这一系列操作。使得一开始的更新协程出现越界。
方案 1. 线性执行updateCommitIndex,去除函数中的加锁操作,统一由主协程管理锁
func (rf *Raft) updateCommitIndex() {
	// rf.mu.Lock()
	// defer rf.mu.Unlock()

	var dupMatchIndex []int
	dupMatchIndex = append(dupMatchIndex, rf.matchIndex...)
	dupMatchIndex[rf.me] = rf.logEntries[len(rf.logEntries)-1].Index
	sort.Ints(dupMatchIndex)
	DebugLog(dError, "S%d get N: %d\n", rf.me, dupMatchIndex[(len(rf.peers)-1)/2])

	var N int = dupMatchIndex[(len(rf.peers)-1)/2]
	if N > rf.commitIndex && rf.logEntries[N].Term == rf.currTerm {
		rf.commitIndex = N
		defer rf.applyCond.Signal()
		DebugLog(dError, "S%d update commitIndex CI: %d\n", rf.me, rf.commitIndex)
	}
}
  • 死锁问题:一开始执行Snapshot函数就卡在获得锁的操作这里。
Debug log
S2 LogInfo {IDX: 13,                                                          
                                                      LENLOG: 14, LOG:                                                              
                                                      {8815208668614808923 1                                                        
                                                      13}}                                                                          
                                                      S2 -> S1 append ok T: 1,                                                      
                                                      LENLOG: 14                                                                    
                            S1 start create snapshot                                                                                
                            LII: 9, T: 1                                                                                            
                            S1 start create snapshot,                                                                               
                            before lock                                                                                             
                            S1, LENLOG: 14, LA: 1                                                                                   
                            apply msg {CI: 10}                                                                                      
S0 convert to candidate,                                                                                                            
calling election T: 1                                                                                                               
S0 LogInfo {IDX: 1, LENLOG:                                                                                                         
2, LOG:                                                                                                                             
{7859782014890577642 1 1}}                                                                                                          
                                                      S2 C1 ask for vote, T: 2                                                      
                            S1 C1 ask for vote, T: 2                                                                                
                                                      S2 LogInfo {IDX: 13,                                                          
                                                      LENLOG: 14, LOG:                                                              
                                                      {8815208668614808923 1                                                        
                                                      13}}                                                                          
                                                      S2 -> S0 vote false, old                                                      
                                                      log                                                                           
                                                      S2 convert to candidate,                                                      
                                                      calling election T: 2                                                         
                                                      S2 LogInfo {IDX: 13,                                                          
                                                      LENLOG: 14, LOG:                                                              
                                                      {8815208668614808923 1                                                        
                                                      13}}                                                                          
                            S1 C1 ask for vote, T: 3                                                                                
S0 C1 ask for vote, T: 3                                                                                                            
S0 LogInfo {IDX: 1, LENLOG:                                                                                                         
2, LOG:                                                                                                                             
{7859782014890577642 1 1}}                                                                                                          
S0 -> S2 vote true, from T:                                                                                                         
2 to CT: 3                                                                                                                          
                                                      S2 convert to leader at                                                       
                                                      T: 3                                                                          
                                                      S2 -> S1 LogInfo {IDX:                                                        
                                                      13, LENLOG: 14, LOG:                                                          
                                                      {8815208668614808923 1                                                        
                                                      13}}                                                                          
                                                      S2 LogInfo {IDX: 13,                                                          
                                                      LENLOG: 14, LOG:                                                              
                                                      {8815208668614808923 1                                                        
                                                      13}}
排查出的问题 1. 快照的间隔是每10条command进行一次快照,因此节点在进行将已经提交了的指令发送到applyCh进行执行的时候不能获取有rf.mu这个互斥锁,因为在你提交指令并将该指令发送到applyCh执行的同时,测试脚本会调用Snapshot函数进行快照,但是我设计的这个函数也需要获取rf.mu互斥锁,那么这个节点就会进入死锁状态:无法获取rf.mu互斥锁进行快照,另一边是需要等快照结束才能继续提交指令并执行,以及后续动作。
方案 1. 在Apply函数中进行如下修改
	++	rf.mu.Unlock()
			rf.applyCh <- msg
	++	rf.mu.Lock()
  • AppendEntries中出现负数的Log索引
排查出的问题 1. 原来只需要检查args.PrevLogIndex小于等于Log最后一项的索引值。但是在引入snapshot后还需要检查args.PrevLogIndex大于等于lastIncludedIndex,因为历史的AppendEntries RPC可以延迟抵达
	if args.PrevLogIndex > rf.logEntries[len(rf.logEntries)-1].Index ||
		rf.logEntries[args.PrevLogIndex-rf.snapshot.LastIncludedIndex].Term != args.PrevLogTerm { // ATTENTION 这里可能会越界,要减去lastincludedindex
		reply.Success = false
		DebugLog(dVote, "S%d append fail, diff term\n", rf.me)
		return
	}
方案 1. 在这种情况下,可以直接将reply的success置true并返回,因为对于成功时返回值的处理已经考虑到了延迟消息(2C)
	if rf.snapshot.LastIncludedIndex > args.PrevLogIndex {
		reply.Success = true
		DebugLog(dVote, "S%d append success, snapshot interrupt...\n", rf.me)
		return
	}
  • broadcast中出现负数的Log索引
Debug log

nextIndex[0] == snapshot.lastIncludedIndex

S1 -> S0 LogInfo {IDX: 108,                                                                                                 
                                 LENLOG: 8, LOG:                                                                                                             
                                 {5178380306894727311 1 116}},                                                                                               
                                 NEXT: [109 1 117], SS: {109 1}                                                                                              
                                                                S2, LENLOG: 18, LA: 108 apply                                                                
                                                                msg {CI: 109}                                                                                
                                                                S2, LENLOG: 18, LA: 108 apply                                                                
                                                                msg {CI: 110}                                                                                
                                                                S2 -> S1 recv heartbeat

#################################################
panic: runtime error: index out of range [-1]

goroutine 890 [running]:
6.824/raft.(*Raft).LogInfoByIndex(0xc000134100, 0xffffffffffffffff)
	/home/humborn/Code/go/MIT-6.5840/src/raft/util.go:101 +0xbd
6.824/raft.(*Raft).broadcastAppendEntries.func1(0x0)
	/home/humborn/Code/go/MIT-6.5840/src/raft/raft.go:716 +0x84c
created by 6.824/raft.(*Raft).broadcastAppendEntries
	/home/humborn/Code/go/MIT-6.5840/src/raft/raft.go:669 +0x15f
exit status 2
排查出的问题 1. 同样的,广播发布了协程后并未立即执行,而snapshot在协程运行之前更新了lastIncludedIndex。仔细观察可以发现出错的Log的索引都是mod 10余9的,对应了snapshot的生成规律,验证了猜想。
	if args.PrevLogIndex > rf.logEntries[len(rf.logEntries)-1].Index ||
		rf.logEntries[args.PrevLogIndex-rf.snapshot.LastIncludedIndex].Term != args.PrevLogTerm { // ATTENTION 这里可能会越界,要减去lastincludedindex
		reply.Success = false
		DebugLog(dVote, "S%d append fail, diff term\n", rf.me)
		return
	}
方案 1. 将广播条件中触发快照加载的条件由 < 改为 ≤ 。
go func(id int) {
			rf.mu.Lock()
			// 需要注意的是,当rf.nextIndex[id] == rf.logEntries[len(rf.logEntries)-1].Index+1,对应的是发送heartbeat的情况
			if rf.nextIndex[id] > rf.logEntries[len(rf.logEntries)-1].Index+1 {
				rf.mu.Unlock()
				return
			}

			~~if rf.nextIndex[id] < rf.snapshot.LastIncludedIndex {~~
			if rf.nextIndex[id] <= rf.snapshot.LastIncludedIndex {
  • ApplyMsg顺序错位第二弹,详循2C中的第二个问题
Debug log
Test (2D): snapshots basic ...
apply error: server 0 apply out of order, expected index 100, got 101
exit status 1
FAIL    6.824/raft      1.269s
排查出的问题 1. 在原来的Apply实现中,循环中的cid只被初始化了一次,但是在这里,每次循环中会随插入记录的数量增加生成snapshot,改变原来的初始值。
		for cid := max(rf.lastApplied, rf.snapshot.LastIncludedIndex) + 1; cid <= rf.commitIndex; cid++ {
			DebugLog(dError, "S%d, LENLOG: %d, LA: %d apply msg {CI: %d} \n", rf.me, len(rf.logEntries), rf.lastApplied, cid)
			msg := ApplyMsg{
				CommandValid: true,
				Command:      rf.logEntries[cid-rf.snapshot.LastIncludedIndex].Command,
				CommandIndex: cid,
			}
			rf.mu.Unlock()
			rf.applyCh <- msg
			rf.mu.Lock()
		}
方案 1. 多次初始化。
		cid := max(rf.lastApplied, rf.snapshot.LastIncludedIndex) + 1
		for cid <= rf.commitIndex {
			DebugLog(dError, "S%d, LENLOG: %d, LA: %d apply msg {CI: %d} \n",
				rf.me, len(rf.logEntries), rf.lastApplied, cid)
			msg := ApplyMsg{
				CommandValid: true,
				Command:      rf.logEntries[cid-rf.snapshot.LastIncludedIndex].Command,
				CommandIndex: cid,
			}
			rf.mu.Unlock()
			rf.applyCh <- msg
			rf.mu.Lock()
			rf.lastApplied = cid
			cid = max(rf.lastApplied, rf.snapshot.LastIncludedIndex) + 1
		}
  • 首先,可以知道,apply out of order有两种形态,一种是接收到了有延迟的信息,还有一种是接收到了未来的信息。以InstallSnapshot RPC为例,可以通过下面这段代码对延迟信息进行过滤。但是raft的Term、commitIndex和lastApplied等索引都是完备的,讲道理不会出现“未来的”信息。
	if rf.commitIndex < args.LastIncludedIndex {
		rf.commitIndex = args.LastIncludedIndex
	}
	if rf.lastApplied < args.LastIncludedIndex {
		rf.lastApplied = args.LastIncludedIndex
	}
Debug log
apply error: server 0 apply out of order, expected index 230, got 231
exit status 1
FAIL	6.824/raft	1.852s
排查出的问题 1. 在测试代码中,snapshot在ApplyCh中被接收后,会以自身的lastIncludedIndex刷新service server的对应参数,造成“版本回退”。
方案 1. 在InstallSnapshot中与lastApplied进行比较判断过滤“版本回退”
	if args.LastIncludedIndex <= rf.snapshot.LastIncludedIndex ||
		args.LastIncludedIndex <= rf.lastApplied {
		rf.mu.Unlock()
		return
	}
  • 会报莫名其妙的错误
Debug log

第一列的Index和LII不对应造成的

S0 ingestSnap: snapshot doesn't match                                                                                                                    
m.SnapshotIndex, INDEX: 39, LII: 29                                                                                                                      
S0 C46 install snapshot, IDX: 39 T: 1,                                                                                                                   
start from 39 to 46                                                                                                                                      
#############################################################################

apply error: server 0 snapshot doesn't match m.SnapshotIndex
exit status 1
FAIL	6.824/raft	1.041s
排查出的问题 1. 由于在前面的问题排查中已经基本解决了out of order的问题,推测出现这种情况的原因是SaveStateAndSnapshot函数存在先后执行顺序,协程有延迟,使得保存的时候没有起到拦截out of order的作用
方案 1. 把所有的rf.mu.Lock()统统移到SaveStateAndSnapshot函数后面保证线性顺序。
  • PASS LOG
go test -run 2D              
Test (2D): snapshots basic ...
  ... Passed --   1.8  3  506  164008  235
Test (2D): install snapshots (disconnect) ...
  ... Passed --  32.9  3 2444 1245742  316
Test (2D): install snapshots (disconnect+unreliable) ...
  ... Passed --  38.7  3 3604 1300893  327
Test (2D): install snapshots (crash) ...
  ... Passed --  24.5  3 1148  879888  334
Test (2D): install snapshots (unreliable+crash) ...
  ... Passed --  33.0  3 1432  692990  322
Test (2D): crash and restart all servers ...
  ... Passed --   6.7  3  244   64312   54
PASS
ok      6.824/raft      137.622s
© 2024 humbornjo :: based on 
nobloger  ::  rss