What does RelationGetBufferForTuple do in PostgreSQL?

Database | 2024-03-15 17:31

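This section briefly introduces RelationGetBufferForTuple, a buffer-related function that PostgreSQL calls while executing an INSERT. The function returns a page whose free space is at least the requested size. On return, the buffer for that page is pinned and exclusively locked.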

1. Data Structures

BufferDesc: the shared descriptor (state) data for a single shared buffer

```c
/*
 * Flags for buffer descriptors
 *
 * Note: TAG_VALID essentially means that there is a buffer hashtable
 * entry associated with the buffer's tag.
 */
#define BM_LOCKED				(1U << 22)	/* buffer header is locked */
#define BM_DIRTY				(1U << 23)	/* data needs writing */
#define BM_VALID				(1U << 24)	/* data is valid */
#define BM_TAG_VALID			(1U << 25)	/* tag is assigned */
#define BM_IO_IN_PROGRESS		(1U << 26)	/* read or write in progress */
#define BM_IO_ERROR				(1U << 27)	/* previous I/O failed */
#define BM_JUST_DIRTIED			(1U << 28)	/* dirtied since write started */
#define BM_PIN_COUNT_WAITER		(1U << 29)	/* have waiter for sole pin */
#define BM_CHECKPOINT_NEEDED	(1U << 30)	/* must write for checkpoint */
#define BM_PERMANENT			(1U << 31)	/* permanent buffer (not unlogged,
											 * or init fork) */

/*
 * BufferDesc -- shared descriptor/state data for a single shared buffer.
 *
 * Note: Buffer header lock (BM_LOCKED flag) must be held to examine or change
 * the tag, state or wait_backend_pid fields.  In general, buffer header lock
 * is a spinlock which is combined with flags, refcount and usagecount into
 * single atomic variable.  This layout allows us to do some operations in a
 * single atomic operation, without actually acquiring and releasing spinlock;
 * for instance, increase or decrease refcount.  buf_id field never changes
 * after initialization, so does not need locking.  freeNext is protected by
 * the buffer_strategy_lock not buffer header lock.  The LWLock can take care
 * of itself.  The buffer header lock is *not* used to control access to the
 * data in the buffer!
 *
 * It's assumed that nobody changes the state field while buffer header lock
 * is held.  Thus buffer header lock holder can do complex updates of the
 * state variable in single write, simultaneously with lock release (cleaning
 * BM_LOCKED flag).  On the other hand, updating of state without holding
 * buffer header lock is restricted to CAS, which ensures that BM_LOCKED flag
 * is not set.  Atomic increment/decrement, OR/AND etc. are not allowed.
 *
 * An exception is that if we have the buffer pinned, its tag can't change
 * underneath us, so we can examine the tag without locking the buffer header.
 * Also, in places we do one-time reads of the flags without bothering to
 * lock the buffer header; this is generally for situations where we don't
 * expect the flag bit being tested to be changing.
 *
 * We can't physically remove items from a disk page if another backend has
 * the buffer pinned.  Hence, a backend may need to wait for all other pins
 * to go away.  This is signaled by storing its own PID into
 * wait_backend_pid and setting flag bit BM_PIN_COUNT_WAITER.  At present,
 * there can be only one such waiter per buffer.
 *
 * We use this same struct for local buffer headers, but the locks are not
 * used and not all of the flag bits are useful either. To avoid unnecessary
 * overhead, manipulations of the state field should be done without actual
 * atomic operations (i.e. only pg_atomic_read_u32() and
 * pg_atomic_unlocked_write_u32()).
 *
 * Be careful to avoid increasing the size of the struct when adding or
 * reordering members.  Keeping it below 64 bytes (the most common CPU
 * cache line size) is fairly important for performance.
 */
typedef struct BufferDesc
{
	BufferTag	tag;			/* ID of page contained in buffer */
	int			buf_id;			/* buffer's index number (from 0), i.e. its buffer pool slot */

	/* state of the tag, containing flags, refcount and usagecount */
	pg_atomic_uint32 state;

	int			wait_backend_pid;	/* backend PID of pin-count waiter */
	int			freeNext;		/* link in freelist chain */

	LWLock		content_lock;	/* to lock access to buffer contents */
} BufferDesc;
```

BufferTag: the buffer tag identifies which disk block a buffer holds

```c
/*
 * Buffer tag identifies which disk block the buffer contains.
 *
 * Note: the BufferTag data must be sufficient to determine where to write the
 * block, without reference to pg_class or pg_tablespace entries.  It's
 * possible that the backend flushing the buffer doesn't even believe the
 * relation is visible yet (its xact may have started before the xact that
 * created the rel).  The storage manager must be able to cope anyway.
 *
 * Note: if there's any pad bytes in the struct, INIT_BUFFERTAG will have
 * to be fixed to zero them, since this struct is used as a hash key.
 */
typedef struct buftag
{
	RelFileNode rnode;			/* physical relation identifier */
	ForkNumber	forkNum;
	BlockNumber blockNum;		/* blknum relative to begin of reln */
} BufferTag;
```

2. Source Code Walkthrough

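RelationGetBufferForTuple returns a page whose free space is at least the requested size. On return, the page's buffer is pinned and exclusively locked.

Inputs:
- relation: the target table
- len: the amount of space needed
- otherBuffer: used by UPDATE; the previously pinned buffer that holds the tuple being updated
- options: option flags (for example HEAP_INSERT_SKIP_FSM)
- bistate: bulk-insert state (BulkInsertState)
- vmbuffer: the visibility map buffer for the target page
- vmbuffer_other: used by UPDATE; the visibility map buffer for otherBuffer's page

Note: the otherBuffer parameter is confusing at first. It exists because of how PostgreSQL implements UPDATE. An update is not done in place: the old tuple is kept (only its xmax is set) and the new version is inserted as a new tuple. If the old and new versions live in different blocks, locking both blocks can deadlock. For example:
- Session A updates the first row of table T. That row is in block 0, and its new version goes to block 2.
- Session B updates the second row of table T. That row is also in block 0, and its new version also goes to block 2.
Each session must lock both block 0 and block 2 to complete its update. Suppose Session A locks block 2 first and Session B locks block 0 first. If A then tries to lock block 0 while B tries to lock block 2, the two sessions deadlock. To prevent this, PostgreSQL requires that the buffers of one relation be locked in increasing block-number order. To lock blocks 0 and 2, a session must lock block 0 first and then block 2.

Output: the buffer allocated for the tuple.

The main logic is:
1. Initialize local variables.
2. Compute the space to reserve (fillfactor).
3. For an UPDATE, get the block number of the previously pinned otherBuffer.
4. Pick the target page, targetBlock (cached in the BulkInsertState or the relcache entry).
5. If targetBlock is invalid and the FSM is in use, ask the FSM for a page.
6. While targetBlock is valid, loop over candidate pages:
   6.1. Read and exclusively lock the target block, plus otherBuffer if one was given.
   6.2. Pin the visibility map page(s).
   6.3. Check whether the page has enough free space. If it does, return the buffer.
   6.4. Otherwise, call RecordAndGetPageWithFreeSpace to get the next targetBlock and loop again.
7. If no suitable block is found, extend the relation.
8. After extending, read the new buffer in P_NEW mode and lock it.
9. Get the page for that buffer and validate it.
10. If validation fails, raise an error. Otherwise, return the buffer.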

```c
/*
 * RelationGetBufferForTuple
 *
 *	Returns pinned and exclusive-locked buffer of a page in given relation
 *	with free space >= given len.
 *
 *	If otherBuffer is not InvalidBuffer, then it references a previously
 *	pinned buffer of another page in the same relation; on return, this
 *	buffer will also be exclusive-locked.  (This case is used by heap_update;
 *	the otherBuffer contains the tuple being updated.)
 *
 *	The reason for passing otherBuffer is that if two backends are doing
 *	concurrent heap_update operations, a deadlock could occur if they try
 *	to lock the same two buffers in opposite orders.  To ensure that this
 *	can't happen, we impose the rule that buffers of a relation must be
 *	locked in increasing page number order.  This is most conveniently done
 *	by having RelationGetBufferForTuple lock them both, with suitable care
 *	for ordering.
 *
 *	NOTE: it is unlikely, but not quite impossible, for otherBuffer to be the
 *	same buffer we select for insertion of the new tuple (this could only
 *	happen if space is freed in that page after heap_update finds there's not
 *	enough there).  In that case, the page will be pinned and locked only once.
 *
 *	For the vmbuffer and vmbuffer_other arguments, we avoid deadlock by
 *	locking them only after locking the corresponding heap page, and taking
 *	no further lwlocks while they are locked.
 *
 *	We normally use FSM to help us find free space.  However,
 *	if HEAP_INSERT_SKIP_FSM is specified, we just append a new empty page to
 *	the end of the relation if the tuple won't fit on the current target page.
 *	This can save some cycles when we know the relation is new and doesn't
 *	contain useful amounts of free space.
 *
 *	HEAP_INSERT_SKIP_FSM is also useful for non-WAL-logged additions to a
 *	relation, if the caller holds exclusive lock and is careful to invalidate
 *	relation's smgr_targblock before the first insertion --- that ensures that
 *	all insertions will occur into newly added pages and not be intermixed
 *	with tuples from other transactions.  That way, a crash can't risk losing
 *	any committed data of other transactions.  (See heap_insert's comments
 *	for additional constraints needed for safe usage of this behavior.)
 *
 *	The caller can also provide a BulkInsertState object to optimize many
 *	insertions into the same relation.  This keeps a pin on the current
 *	insertion target page (to save pin/unpin cycles) and also passes a
 *	BULKWRITE buffer selection strategy object to the buffer manager.
 *	Passing NULL for bistate selects the default behavior.
 *
 *	We always try to avoid filling existing pages further than the fillfactor.
 *	This is OK since this routine is not consulted when updating a tuple and
 *	keeping it on the same page, which is the scenario fillfactor is meant
 *	to reserve space for.
 *
 *	ereport(ERROR) is allowed here, so this routine *must* be called
 *	before any (unlogged) changes are made in buffer pool.
 */
/* Note: a pinned buffer is one currently in use; it must not be evicted from the buffer pool. */
Buffer
RelationGetBufferForTuple(Relation relation, Size len,
						  Buffer otherBuffer, int options,
						  BulkInsertState bistate,
						  Buffer *vmbuffer, Buffer *vmbuffer_other)
{
	bool		use_fsm = !(options & HEAP_INSERT_SKIP_FSM);	/* use the FSM to find free space? */
	Buffer		buffer = InvalidBuffer;
	Page		page;
	Size		pageFreeSpace = 0,	/* free space on the page */
				saveFreeSpace = 0;	/* space the page must keep free */
	BlockNumber targetBlock,	/* target block */
				otherBlock;		/* block of the previously pinned otherBuffer */
	bool		needLock;		/* need the relation extension lock? */

	len = MAXALIGN(len);		/* be conservative */

	/* Bulk insert is not supported for updates, only inserts. */
	/* (a valid otherBuffer means UPDATE, which must not pass bistate) */
	Assert(otherBuffer == InvalidBuffer || !bistate);

	/*
	 * If we're gonna fail for oversize tuple, do it right away
	 */
	/*
	 * #define MaxHeapTupleSize (BLCKSZ - MAXALIGN(SizeOfPageHeaderData + sizeof(ItemIdData)))
	 * #define MinHeapTupleSize MAXALIGN(SizeofHeapTupleHeader)
	 */
	if (len > MaxHeapTupleSize)
		ereport(ERROR,
				(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
				 errmsg("row is too big: size %zu, maximum size %zu",
						len, MaxHeapTupleSize)));

	/* Compute desired extra freespace due to fillfactor option */
	/*
	 * #define RelationGetTargetPageFreeSpace(relation, defaultff) \
	 *	(BLCKSZ * (100 - RelationGetFillFactor(relation, defaultff)) / 100)
	 */
	saveFreeSpace = RelationGetTargetPageFreeSpace(relation,
												   HEAP_DEFAULT_FILLFACTOR);

	/* UPDATE: get the block of the previously pinned buffer */
	if (otherBuffer != InvalidBuffer)
		otherBlock = BufferGetBlockNumber(otherBuffer);
	else
		otherBlock = InvalidBlockNumber;	/* just to keep compiler quiet */

	/*
	 * We first try to put the tuple on the same page we last inserted a tuple
	 * on, as cached in the BulkInsertState or relcache entry.  If that
	 * doesn't work, we ask the Free Space Map to locate a suitable page.
	 * Since the FSM's info might be out of date, we have to be prepared to
	 * loop around and retry multiple times. (To insure this isn't an infinite
	 * loop, we must update the FSM with the correct amount of free space on
	 * each page that proves not to be suitable.)  If the FSM has no record of
	 * a page with enough free space, we give up and extend the relation.
	 *
	 * When use_fsm is false, we either put the tuple onto the existing target
	 * page or extend the relation.
	 */
	if (len + saveFreeSpace > MaxHeapTupleSize)
	{
		/* can't fit, don't bother asking FSM */
		/* (len + reserve exceeds MaxHeapTupleSize: skip the FSM and extend) */
		targetBlock = InvalidBlockNumber;
		use_fsm = false;
	}
	else if (bistate && bistate->current_buf != InvalidBuffer)
		targetBlock = BufferGetBlockNumber(bistate->current_buf);	/* bulk insert */
	else
		targetBlock = RelationGetTargetBlock(relation);	/* ordinary insert */

	if (targetBlock == InvalidBlockNumber && use_fsm)
	{
		/*
		 * We have no cached target page, so ask the FSM for an initial
		 * target.
		 */
		/* ask the FSM for a block with len + saveFreeSpace bytes free */
		targetBlock = GetPageWithFreeSpace(relation, len + saveFreeSpace);

		/*
		 * If the FSM knows nothing of the rel, try the last page before we
		 * give up and extend.  This avoids one-tuple-per-page syndrome during
		 * bootstrapping or in a recently-started system.
		 */
		if (targetBlock == InvalidBlockNumber)
		{
			BlockNumber nblocks = RelationGetNumberOfBlocks(relation);

			if (nblocks > 0)
				targetBlock = nblocks - 1;
		}
	}

loop:
	while (targetBlock != InvalidBlockNumber)	/* loop until a block with room is found */
	{
		/*
		 * Read and exclusive-lock the target block, as well as the other
		 * block if one was given, taking suitable care with lock ordering and
		 * the possibility they are the same block.
		 *
		 * If the page-level all-visible flag is set, caller will need to
		 * clear both that and the corresponding visibility map bit.  However,
		 * by the time we return, we'll have x-locked the buffer, and we don't
		 * want to do any I/O while in that state.  So we check the bit here
		 * before taking the lock, and pin the page if it appears necessary.
		 * Checking without the lock creates a risk of getting the wrong
		 * answer, so we'll have to recheck after acquiring the lock.
		 */
		if (otherBuffer == InvalidBuffer)
		{
			/* easy case: plain INSERT */
			buffer = ReadBufferBI(relation, targetBlock, bistate);
			if (PageIsAllVisible(BufferGetPage(buffer)))
				visibilitymap_pin(relation, targetBlock, vmbuffer);	/* pin the VM page before locking */
			LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
		}
		else if (otherBlock == targetBlock)
		{
			/* also easy case: UPDATE, new tuple goes on the old tuple's block */
			buffer = otherBuffer;
			if (PageIsAllVisible(BufferGetPage(buffer)))
				visibilitymap_pin(relation, targetBlock, vmbuffer);
			LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
		}
		else if (otherBlock < targetBlock)
		{
			/* lock other buffer first (UPDATE, old block < new block) */
			buffer = ReadBuffer(relation, targetBlock);
			if (PageIsAllVisible(BufferGetPage(buffer)))
				visibilitymap_pin(relation, targetBlock, vmbuffer);
			LockBuffer(otherBuffer, BUFFER_LOCK_EXCLUSIVE);
			LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
		}
		else
		{
			/* lock target buffer first (UPDATE, old block > new block) */
			buffer = ReadBuffer(relation, targetBlock);
			if (PageIsAllVisible(BufferGetPage(buffer)))
				visibilitymap_pin(relation, targetBlock, vmbuffer);
			LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
			LockBuffer(otherBuffer, BUFFER_LOCK_EXCLUSIVE);
		}

		/*
		 * We now have the target page (and the other buffer, if any) pinned
		 * and locked.  However, since our initial PageIsAllVisible checks
		 * were performed before acquiring the lock, the results might now be
		 * out of date, either for the selected victim buffer, or for the
		 * other buffer passed by the caller.  In that case, we'll need to
		 * give up our locks, go get the pin(s) we failed to get earlier, and
		 * re-lock.  That's pretty painful, but hopefully shouldn't happen
		 * often.
		 *
		 * Note that there's a small possibility that we didn't pin the page
		 * above but still have the correct page pinned anyway, either because
		 * we've already made a previous pass through this loop, or because
		 * caller passed us the right page anyway.
		 *
		 * Note also that it's possible that by the time we get the pin and
		 * retake the buffer locks, the visibility map bit will have been
		 * cleared by some other backend anyway.  In that case, we'll have
		 * done a bit of extra work for no gain, but there's no real harm
		 * done.
		 */
		if (otherBuffer == InvalidBuffer || buffer <= otherBuffer)
			GetVisibilityMapPins(relation, buffer, otherBuffer,
								 targetBlock, otherBlock, vmbuffer,
								 vmbuffer_other);	/* pin the VM page(s) */
		else
			GetVisibilityMapPins(relation, otherBuffer, buffer,
								 otherBlock, targetBlock, vmbuffer_other,
								 vmbuffer);	/* pin the VM page(s) */

		/*
		 * Now we can check to see if there's enough free space here. If so,
		 * we're done.
		 */
		page = BufferGetPage(buffer);
		pageFreeSpace = PageGetHeapFreeSpace(page);
		if (len + saveFreeSpace <= pageFreeSpace)
		{
			/* use this page as future insert target, too */
			/*
			 * #define RelationSetTargetBlock(relation, targblock) \
			 *	do { \
			 *		RelationOpenSmgr(relation); \
			 *		(relation)->rd_smgr->smgr_targblock = (targblock); \
			 *	} while (0)
			 */
			RelationSetTargetBlock(relation, targetBlock);
			return buffer;
		}

		/*
		 * Not enough space, so we must give up our page locks and pin (if
		 * any) and prepare to look elsewhere.  We don't care which order we
		 * unlock the two buffers in, so this can be slightly simpler than the
		 * code above.
		 */
		LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
		if (otherBuffer == InvalidBuffer)
			ReleaseBuffer(buffer);
		else if (otherBlock != targetBlock)
		{
			LockBuffer(otherBuffer, BUFFER_LOCK_UNLOCK);
			ReleaseBuffer(buffer);
		}

		/* Without FSM, always fall out of the loop and extend */
		if (!use_fsm)
			break;

		/*
		 * Update FSM as to condition of this page, and ask for another page
		 * to try.
		 */
		/* returns InvalidBlockNumber, ending the loop, if no suitable block is known */
		targetBlock = RecordAndGetPageWithFreeSpace(relation,
													targetBlock,
													pageFreeSpace,
													len + saveFreeSpace);
	}

	/* No suitable existing block was found: extend the relation. */

	/*
	 * Have to extend the relation.
	 *
	 * We have to use a lock to ensure no one else is extending the rel at the
	 * same time, else we will both try to initialize the same new page.  We
	 * can skip locking for new or temp relations, however, since no one else
	 * could be accessing them.
	 */
	needLock = !RELATION_IS_LOCAL(relation);

	/*
	 * If we need the lock but are not able to acquire it immediately, we'll
	 * consider extending the relation by multiple blocks at a time to manage
	 * contention on the relation extension lock.  However, this only makes
	 * sense if we're using the FSM; otherwise, there's no point.
	 */
	if (needLock)
	{
		if (!use_fsm)
			LockRelationForExtension(relation, ExclusiveLock);
		else if (!ConditionalLockRelationForExtension(relation, ExclusiveLock))
		{
			/* Couldn't get the lock immediately; wait for it. */
			LockRelationForExtension(relation, ExclusiveLock);

			/*
			 * Check if some other backend has extended a block for us while
			 * we were waiting on the lock.
			 */
			targetBlock = GetPageWithFreeSpace(relation, len + saveFreeSpace);

			/*
			 * If some other waiter has already extended the relation, we
			 * don't need to do so; just use the existing freespace.
			 */
			if (targetBlock != InvalidBlockNumber)
			{
				UnlockRelationForExtension(relation, ExclusiveLock);
				goto loop;
			}

			/* Time to bulk-extend. */
			RelationAddExtraBlocks(relation, bistate);
		}
	}

	/*
	 * In addition to whatever extension we performed above, we always add at
	 * least one block to satisfy our own request.
	 *
	 * XXX This does an lseek - rather expensive - but at the moment it is the
	 * only way to accurately determine how many blocks are in a relation.  Is
	 * it worth keeping an accurate file length in shared memory someplace,
	 * rather than relying on the kernel to do it for us?
	 */
	buffer = ReadBufferBI(relation, P_NEW, bistate);	/* allocate the new page */

	/*
	 * We can be certain that locking the otherBuffer first is OK, since it
	 * must have a lower page number.
	 */
	if (otherBuffer != InvalidBuffer)
		LockBuffer(otherBuffer, BUFFER_LOCK_EXCLUSIVE);

	/*
	 * Now acquire lock on the new page.
	 */
	LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);

	/*
	 * Release the file-extension lock; it's now OK for someone else to extend
	 * the relation some more.  Note that we cannot release this lock before
	 * we have buffer lock on the new page, or we risk a race condition
	 * against vacuumlazy.c --- see comments therein.
	 */
	if (needLock)
		UnlockRelationForExtension(relation, ExclusiveLock);

	/*
	 * We need to initialize the empty new page.  Double-check that it really
	 * is empty (this should never happen, but if it does we don't want to
	 * risk wiping out valid data).
	 */
	page = BufferGetPage(buffer);

	if (!PageIsNew(page))	/* not a new page: something has gone wrong */
		elog(ERROR, "page %u of relation \"%s\" should be empty but is not",
			 BufferGetBlockNumber(buffer),
			 RelationGetRelationName(relation));

	PageInit(page, BufferGetPageSize(buffer), 0);	/* initialize the new page */

	/* even a fresh page cannot hold the tuple: report an error */
	if (len > PageGetHeapFreeSpace(page))
	{
		/* We should not get here given the test at the top */
		elog(PANIC, "tuple is too big: size %zu", len);
	}

	/*
	 * Remember the new page as our target for future insertions.
	 *
	 * XXX should we enter the new page into the free space map immediately,
	 * or just keep it for this backend's exclusive use in the short run
	 * (until VACUUM sees it)?  Seems to depend on whether you expect the
	 * current backend to make more insertions or not, which is probably a
	 * good bet most of the time.  So for now, don't add it to FSM yet.
	 */
	RelationSetTargetBlock(relation, BufferGetBlockNumber(buffer));

	return buffer;
}
```

3. Trace Analysis

Test script

```
15:54:13 (xdb@[local]:5432) testdb=# insert into t1 values(1,'1','1');
```

Call stack

```
(gdb) b RelationGetBufferForTuple
Breakpoint 1 at 0x4ef179: file hio.c, line 318.
(gdb) c
Continuing.

Breakpoint 1, RelationGetBufferForTuple (relation=0x7f4f51fe39b8, len=32, otherBuffer=0, options=0, bistate=0x0, vmbuffer=0x7ffea95dbf6c, vmbuffer_other=0x0) at hio.c:318
318	  bool  use_fsm = !(options & HEAP_INSERT_SKIP_FSM);
(gdb) bt
#0  RelationGetBufferForTuple (relation=0x7f4f51fe39b8, len=32, otherBuffer=0, options=0, bistate=0x0, vmbuffer=0x7ffea95dbf6c, vmbuffer_other=0x0) at hio.c:318
#1  0x00000000004df1f8 in heap_insert (relation=0x7f4f51fe39b8, tup=0x178a478, cid=0, options=0, bistate=0x0) at heapam.c:2468
#2  0x0000000000709dda in ExecInsert (mtstate=0x178a220, slot=0x178a680, planSlot=0x178a680, estate=0x1789eb8, canSetTag=true) at nodeModifyTable.c:529
#3  0x000000000070c475 in ExecModifyTable (pstate=0x178a220) at nodeModifyTable.c:2159
#4  0x00000000006e05cb in ExecProcNodeFirst (node=0x178a220) at execProcnode.c:445
#5  0x00000000006d552e in ExecProcNode (node=0x178a220) at ../../../src/include/executor/executor.h:247
#6  0x00000000006d7d66 in ExecutePlan (estate=0x1789eb8, planstate=0x178a220, use_parallel_mode=false, operation=CMD_INSERT, sendTuples=false, numberTuples=0, direction=ForwardScanDirection, dest=0x17a7688, execute_once=true) at execMain.c:1723
#7  0x00000000006d5af8 in standard_ExecutorRun (queryDesc=0x178e458, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:364
#8  0x00000000006d5920 in ExecutorRun (queryDesc=0x178e458, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:307
#9  0x00000000008c1092 in ProcessQuery (plan=0x16b3ac0, sourceText=0x16b1ec8 "insert into t1 values(1,'1','1');", params=0x0, queryEnv=0x0, dest=0x17a7688, completionTag=0x7ffea95dc500 "") at pquery.c:161
#10 0x00000000008c29a1 in PortalRunMulti (portal=0x1717488, isTopLevel=true, setHoldSnapshot=false, dest=0x17a7688, altdest=0x17a7688, completionTag=0x7ffea95dc500 "") at pquery.c:1286
#11 0x00000000008c1f7a in PortalRun (portal=0x1717488, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x17a7688, altdest=0x17a7688, completionTag=0x7ffea95dc500 "") at pquery.c:799
#12 0x00000000008bbf16 in exec_simple_query (query_string=0x16b1ec8 "insert into t1 values(1,'1','1');") at postgres.c:1145
#13 0x00000000008c01a1 in PostgresMain (argc=1, argv=0x16dbaf8, dbname=0x16db960 "testdb", username=0x16aeba8 "xdb") at postgres.c:4182
#14 0x000000000081e07c in BackendRun (port=0x16d3940) at postmaster.c:4361
#15 0x000000000081d7ef in BackendStartup (port=0x16d3940) at postmaster.c:4033
#16 0x0000000000819be9 in ServerLoop () at postmaster.c:1706
#17 0x000000000081949f in PostmasterMain (argc=1, argv=0x16acb60) at postmaster.c:1379
#18 0x0000000000742941 in main (argc=1, argv=0x16acb60) at main.c:228
(gdb)
```

感谢各位的阅读，以上就是“PostgreSQL中RelationGetBufferForTuple函数有什么作用”的内容了，经过本文的学习后，相信大家对PostgreSQL中RelationGetBufferForTuple函数有什么作用这一问题有了更深刻的体会，具体使用情况还需要大家实践验证。这里是，小编将为大家推送更多相关知识点的文章，欢迎关注！