我们大家都知道动态查找树能够提高查找效率,比如:二叉查找树,平衡二叉查找树,红黑树。他们查找效率的时间复杂度O(log2n),跟树的深度有关系,那么怎么样才能提高效率呢?当然最快捷的方式就是减少树的深度了。那么怎么减少树的深度呢?为了解答这个问题,我们慢慢来看,先看个实际问题吧。

问题背景

在大型的数据库存储中,实现索引查找,如果采用二叉查找树的查找的话,由于节点的存储数据是有限的(不可能将节点存储过多的数据,否则就变成线性的查找了),这样如果数据量很大的,就会导致树的深度过大从而造成磁盘IO操作过于频繁(你们知道磁盘IO操作是非常耗时的),就会导致效率非常低下。可能有童鞋会问了,那为什么不把节点索引加载到内存中,这样访问不就快了吗?其实这显然是不可能完成的,因为往往存储的索引可能就有好几个G了。全部加载到内存也是不现实的。能做的只有逐一加载每一个磁盘页,这里的磁盘页就相当于索引树的节点。

那怎么解决呢?就回到了前面的那个问题,可以减少树的深度。其中基本思想是:采用多叉树结构。也就是说,因为磁盘的操作费时费资源,如果过于频繁的多次查找势必效率低下。那么如何提高效率,即如何避免磁盘过于频繁的多次查找呢?根据磁盘查找存取的次数往往由树的高度所决定,所以,只要我们通过某种较好的树结构减少树的结构尽量减少树的高度,那么是不是便能有效减少磁盘查找存取的次数呢?那这种有效的树结构是一种怎样的树呢?

根据平衡二叉树的启发,自然就想到了平衡多路查找树结构。也就是本文的主题B-tree,好了废话不多说了,进入正题!

B-tree的简介

B-树就是我们平常说的B树,不要读成B减树了,它在文件系统中很有用(原因之前已经介绍了),我们先来看下一个m阶的Bs树具有如下几个特性:

  1. 根节点至少有两个子女
  2. 每个中间节点都包含k-1个元素和k个孩子,其中m/2<=k<=m
  3. 每个叶子节点都包含k-1元素,其中m/2<=k<=m
  4. 所有的叶子节点都位于同一层
  5. 每个节点的元素从小到大排列,节点当中k-1个元素正好是k个孩子包含的元素的值域分划。

看起来是不是很复杂,没看懂也没有关系,我们用实际例子来演示下。例子来源网络,参考:https://blog.csdn.net/qq_35644234/article/details/66969238

B-树插入

一个原始的B-树阶为3,如下图:

aHR0cDovL3AxLnBzdGF0cC5jb20vbGFyZ2UvcGdjLWltYWdlLzE1MzMyNTM2NTk3OTQyMjU4YzQ2NDU5

首先,我需要插入一个关键字:30,可以得到如下的结果:

aHR0cDovL3AxLnBzdGF0cC5jb20vbGFyZ2UvcGdjLWltYWdlLzE1MzMyNTM2ODQ0ODFiZDU2MDBhOWIz

再插入26,得到如下的结果:

aHR0cDovL3AzLnBzdGF0cC5jb20vbGFyZ2UvcGdjLWltYWdlLzE1MzMyNTM3MTA0MjdiYzE4ZjE3NmVj

OK,此时如图所示,在插入的那个终端结点中,它的关键字数已经超过了m-1=2,所以我们需要对结点进分裂,所以我们先对关键字排序,得到:26 30 37 ,所以它的左部分为(不包括中间值):26,中间值为:30,右部为:37,左部放在原来的结点,右部放入新的结点,而中间值则插入到父结点,并且父结点会产生一个新的指针,指向新的结点的位置,如下图所示:

aHR0cDovL3AzLnBzdGF0cC5jb20vbGFyZ2UvcGdjLWltYWdlLzE1MzMyNTM4NzU0OTM2OThmZmZjYmI5

OK,然后我们继续插入新的关键字:85,得到如下图结果:

aHR0cDovL3A5LnBzdGF0cC5jb20vbGFyZ2UvcGdjLWltYWdlLzE1MzMyNTM5NDkzNjc3ODU1OTgwODA4

正如图所示,我需要对刚才插入的那个结点进行“分裂”操作,操作方式和之前的一样,得到的结果如下:

aHR0cDovL3AzLnBzdGF0cC5jb20vbGFyZ2UvcGdjLWltYWdlLzE1MzMyNTQwMDY4NjcwMmUwNDdlZDIx

哦,当我们分裂完后,突然发现之前的那个结点的父亲结点的度为4了,说明它的关键字数超过了m-1,所以需要对其父结点进行“分裂”操作,得到如下的结果:

aHR0cDovL3AxLnBzdGF0cC5jb20vbGFyZ2UvcGdjLWltYWdlLzE1MzMyNTQwMzAzMzMxNTBjOGMzY2U4

本文提到的「B-树」,就是「B树」,都是 B-tree 的翻译,里面不是减号-,是连接符-。因为有人把 B-tree 翻成 「B-树」,让人以为「B树」和「B-树」是两种树,实际上两者就是同一种树。

aHR0cDovL2ltZy5tcC5pdGMuY24vdXBsb2FkLzIwMTcwNzA1L2ZmYTg0M2U5MmE1NDRiODc4ZjU2ZDE3ZGZjMzAzMmI3X3RoLmpwZw
aHR0cDovL2ltZy5tcC5pdGMuY24vdXBsb2FkLzIwMTcwNzA1LzA0YmZhZmUxNWJlZTQ2OWM5ZWNjOGQ4MTRkMTg1OGFiX3RoLmpwZw
aHR0cDovL2ltZy5tcC5pdGMuY24vdXBsb2FkLzIwMTcwNzA1L2Q0NDczNDYzM2E5OTQ5ZWVhMTk1NDJiZjY3MTJhNGI0X3RoLmpwZw
aHR0cDovL2ltZy5tcC5pdGMuY24vdXBsb2FkLzIwMTcwNzA1L2M5MGFhMmVkMGY3ZjQyMGM4ZDU2MjEyYjE0YjVkNjNiX3RoLmpwZw

————————————

aHR0cDovL2ltZy5tcC5pdGMuY24vdXBsb2FkLzIwMTcwNzA1Lzc4NWEyYzVlZGIxZDQ3ODU4MzU3NjFmMjQxOWM3YTE3X3RoLmpwZw
aHR0cDovL2ltZy5tcC5pdGMuY24vdXBsb2FkLzIwMTcwNzA1L2Y0MDVmYTBmNmZkYjQ0MTQ5MTc3YzgwOWU1OWMzMTM4X3RoLmpwZw
aHR0cDovL2ltZy5tcC5pdGMuY24vdXBsb2FkLzIwMTcwNzA1LzA0ZjkyYzUzOTkxMDQ2ODI4NTZmZjlhMDY5ZjJhYzhlX3RoLmpwZw
aHR0cDovL2ltZy5tcC5pdGMuY24vdXBsb2FkLzIwMTcwNzA1L2FkZmY4ODhiOWQ4OTQwZTlhYTNjNTViNDdjODFkODc1X3RoLmpwZw
aHR0cDovL2ltZy5tcC5pdGMuY24vdXBsb2FkLzIwMTcwNzA1LzI1NDU1YzhkMGM4MDRlMTZhYzk4NjZkNmNhNmVlNDY3X3RoLmpwZw
aHR0cDovL2ltZy5tcC5pdGMuY24vdXBsb2FkLzIwMTcwNzA1L2RjNzA3MDVhOTA5ZjQyMmY5NmRiY2IzYWM5ZGExZTU5X3RoLmpwZw
aHR0cDovL2ltZy5tcC5pdGMuY24vdXBsb2FkLzIwMTcwNzA1LzYzYTQwNzY5MDViZTRjNTQ4ZjVkOGNkZTRiY2U2MGZmX3RoLmpwZw
aHR0cDovL2ltZy5tcC5pdGMuY24vdXBsb2FkLzIwMTcwNzA1LzZmYWVlNjc2MTRiNTQzMmJiMTU0M2QxNjY0MmY2NGFiX3RoLmpwZw
aHR0cDovL2ltZy5tcC5pdGMuY24vdXBsb2FkLzIwMTcwNzA1LzIyOThhZDk4NDE0ZTRlMmRhMDBiZGI1YzZjZTg4ZGRhX3RoLmpwZw

————————————

aHR0cDovL2ltZy5tcC5pdGMuY24vdXBsb2FkLzIwMTcwNzA1Lzg3ZjU3OTY0MmM1NDQwOTc4MGRjNmRkODNmNzc5NThhX3RoLmpwZw
aHR0cDovL2ltZy5tcC5pdGMuY24vdXBsb2FkLzIwMTcwNzA1L2NmMTA2M2UwZTVkMTQ4MDE5M2Y5MDY0ODYzNWJjYTNkX3RoLmpwZw
aHR0cDovL2ltZy5tcC5pdGMuY24vdXBsb2FkLzIwMTcwNzA1L2NiNjgzZWM5ZDJlYTQ4M2Q4NTFhNzdlYzI4ZTI1MTI1X3RoLmpwZw
aHR0cDovL2ltZy5tcC5pdGMuY24vdXBsb2FkLzIwMTcwNzA1LzM5MDFiY2YyNDdiZTQwYjRiYjk1YjQzNTI0OWZkNDgwX3RoLmpwZw
aHR0cDovL2ltZy5tcC5pdGMuY24vdXBsb2FkLzIwMTcwNzA1LzYwNGFiOGNlNDhhODRlZTRiNjI4NzQ2MDM5ZjQ5YjZkX3RoLmpwZw
aHR0cDovL2ltZy5tcC5pdGMuY24vdXBsb2FkLzIwMTcwNzA1LzJjYzUwMzQzNDQ4YTQ1Mjk5NDViMDY5NDZhZGQ2MWE3X3RoLmpwZw
aHR0cDovL2ltZy5tcC5pdGMuY24vdXBsb2FkLzIwMTcwNzA1LzY5YzE0MjJmNWUzMTQ0N2FhODVlNmU0OGU4YTVmMWVkX3RoLmpwZw
aHR0cDovL2ltZy5tcC5pdGMuY24vdXBsb2FkLzIwMTcwNzA1L2ViMmZmMjgxYWI3MzRkNWQ4NGI4ZGM3MWM0OTE2NGUxX3RoLmpwZw
aHR0cDovL2ltZy5tcC5pdGMuY24vdXBsb2FkLzIwMTcwNzA1LzI5MTFkYzY4NWVmMzQxMmZiYmRlZmViNTI4NmNjNjhiX3RoLmpwZw
aHR0cDovL2ltZy5tcC5pdGMuY24vdXBsb2FkLzIwMTcwNzA1Lzc3NjIxMTg0ODg0MTQ1NzZiNjMxYzFkODcyZTgyMGQ2X3RoLmpwZw
aHR0cDovL2ltZy5tcC5pdGMuY24vdXBsb2FkLzIwMTcwNzA1L2M5MTM0MTE3NzJlNDQ2ZGNiMGE3OTlhNzlmZmM4NjFhX3RoLmpwZw
aHR0cDovL2ltZy5tcC5pdGMuY24vdXBsb2FkLzIwMTcwNzA1L2JmNGUyNDkzZGMwNzQzZDU5YTMxZjZkMjg0OTM4NWJjX3RoLmpwZw
aHR0cDovL2ltZy5tcC5pdGMuY24vdXBsb2FkLzIwMTcwNzA1LzRhNGY2ZDIxZDQwMzQzODg4ZGMzZjNjMDkwOWVjNDMzX3RoLmpwZw
aHR0cDovL2ltZy5tcC5pdGMuY24vdXBsb2FkLzIwMTcwNzA1LzJlYWYzMTZjOWM1NDQ3NmViNWVmZmM5ZjI5Nzc2YWRmX3RoLmpwZw

二叉查找树的结构:

aHR0cDovL2ltZy5tcC5pdGMuY24vdXBsb2FkLzIwMTcwNzA1L2U1NTY1Njc5OGVlYTQyMTE4OWEzZjUzMDUxYmU4ZjNmX3RoLmpwZw

第1次磁盘IO:

aHR0cDovL2ltZy5tcC5pdGMuY24vdXBsb2FkLzIwMTcwNzA1L2IwMDkwMzg3ZTg3MzRkYjI5MDVlMmYxN2E2NzQ2Y2UzX3RoLmpwZw

第2次磁盘IO:

aHR0cDovL2ltZy5tcC5pdGMuY24vdXBsb2FkLzIwMTcwNzA1Lzk3MmZmOTQ5MzhkNzQzOTY4ZmQzY2IzZmRiNzM2OTk1X3RoLmpwZw

第3次磁盘IO:

aHR0cDovL2ltZy5tcC5pdGMuY24vdXBsb2FkLzIwMTcwNzA1LzUyZjBmMzM3MWU4MTQ1ODk5YzU1YjdmNDg5Y2QwMTYzX3RoLmpwZw

第4次磁盘IO:

aHR0cDovL2ltZy5tcC5pdGMuY24vdXBsb2FkLzIwMTcwNzA1LzA1ZDZkMGI1MDVjMTQzZmQ5ZDAwZmUzZjZmN2ZjZTM0X3RoLmpwZw
aHR0cDovL2ltZy5tcC5pdGMuY24vdXBsb2FkLzIwMTcwNzA1LzZhM2Y2MTE2M2M4MTRhOGFhNzQxN2JjZjBjNTgxYzFlX3RoLmpwZw
aHR0cDovL2ltZy5tcC5pdGMuY24vdXBsb2FkLzIwMTcwNzA1L2UzZGRhNWVhYjdiNjQ5NjliM2MzZTUzM2QwODA3MzVlX3RoLmpwZw
aHR0cDovL2ltZy5tcC5pdGMuY24vdXBsb2FkLzIwMTcwNzA1LzYxYzUzNTdiYWJlMzQ2ZGZhOWJlYmQ1ZGFiNjBjMGQxX3RoLmpwZw
aHR0cDovL2ltZy5tcC5pdGMuY24vdXBsb2FkLzIwMTcwNzA1LzlmZDgxMmRhNjg5ZTQ1YzQ4MTgyMDNlYjU3MzY3MDQ5X3RoLmpwZw

下面来具体介绍一下B-树(Balance Tree),一个m阶的B树具有如下几个特征:

1.根结点至少有两个子女。

2.每个中间节点都包含k-1个元素和k个孩子,其中 m/2 <= k <= m

3.每一个叶子节点都包含k-1个元素,其中 m/2 <= k <= m

4.所有的叶子结点都位于同一层。

5.每个节点中的元素从小到大排列,节点当中k-1个元素正好是k个孩子包含的元素的值域分划。

aHR0cDovL2ltZy5tcC5pdGMuY24vdXBsb2FkLzIwMTcwNzA1L2UwM2JmOTA2ZDg4MzQ2MzlhYzhlNjQyODBiZjQzZjNmX3RoLmpwZw
aHR0cDovL2ltZy5tcC5pdGMuY24vdXBsb2FkLzIwMTcwNzA1LzRjMTllNDFlODRkYTQ1MDNhODMwMWQ3MjNhYTdmYmFhX3RoLmpwZw
aHR0cDovL2ltZy5tcC5pdGMuY24vdXBsb2FkLzIwMTcwNzA1L2IyZWMzZGFhNDEwOTQ5MjhiMjVhZWI1YjA0ODBlMjkxX3RoLmpwZw
aHR0cDovL2ltZy5tcC5pdGMuY24vdXBsb2FkLzIwMTcwNzA1L2Y5NTdlYTIxM2MyOTQ2ZGJiNmIwYWIyNjVlYmMxODcyX3RoLmpwZw
aHR0cDovL2ltZy5tcC5pdGMuY24vdXBsb2FkLzIwMTcwNzA1LzI0Y2QxMzgyOTBkMjQ4MmViZTc2MjI5YTBmOGVmZGUzX3RoLmpwZw
aHR0cDovL2ltZy5tcC5pdGMuY24vdXBsb2FkLzIwMTcwNzA1L2NhYTg0MTA0N2I3MjQzOTQ4MDQzNGU3ODZhNGZkOWZmX3RoLmpwZw
aHR0cDovL2ltZy5tcC5pdGMuY24vdXBsb2FkLzIwMTcwNzA1LzA4YzBkMmNhMDVkNjQ1ZDg4NzJhOTY3ZTM3MGVkNDA1X3RoLmpwZw

第1次磁盘IO:

aHR0cDovL2ltZy5tcC5pdGMuY24vdXBsb2FkLzIwMTcwNzA1LzBhNDllNzQyMDIzNjRkNTA5YTkwNzE3NDI1Mjg4ZTk1X3RoLmpwZw

在内存中定位(和9比较):

aHR0cDovL2ltZy5tcC5pdGMuY24vdXBsb2FkLzIwMTcwNzA1LzY3NWIwMDY1MjBiMjQ2MWZhYWVhZGYyMjIzNDA1MTg3X3RoLmpwZw

第2次磁盘IO:

aHR0cDovL2ltZy5tcC5pdGMuY24vdXBsb2FkLzIwMTcwNzA1LzRlMmEyYjZiMjBkODQzY2M4YzAwNjNhZmVkZTUxYmZmX3RoLmpwZw

在内存中定位(和2,6比较):

aHR0cDovL2ltZy5tcC5pdGMuY24vdXBsb2FkLzIwMTcwNzA1LzgyYjA4MWFiZDQyNjRhYjJhYTVjMmI0MzQzNTg4Y2Y0X3RoLmpwZw

第3次磁盘IO:

aHR0cDovL2ltZy5tcC5pdGMuY24vdXBsb2FkLzIwMTcwNzA1LzZlYjg2Y2Y3MGU0MjQzNDliYTBhNTY2YWIwOWJhMjExX3RoLmpwZw

在内存中定位(和3,5比较):

aHR0cDovL2ltZy5tcC5pdGMuY24vdXBsb2FkLzIwMTcwNzA1L2Q5Y2FkMzczYjIyMjQ0ZWQ5OWY1YjEwNDY5OTQzYzliX3RoLmpwZw
aHR0cDovL2ltZy5tcC5pdGMuY24vdXBsb2FkLzIwMTcwNzA1LzkzMGY2YzdiNTc1MjRlOTFiNGU3OWJhZmJjZTEzNmYyX3RoLmpwZw
aHR0cDovL2ltZy5tcC5pdGMuY24vdXBsb2FkLzIwMTcwNzA1LzFjOTk0MWZmMWNlMDQ3Zjg4MmNiMjFmNGMyNzFkNWMwX3RoLmpwZw
aHR0cDovL2ltZy5tcC5pdGMuY24vdXBsb2FkLzIwMTcwNzA1LzA4YzFhYmI4ODM3YjQ5YTU4NTEwNjg1YmZhOWE3ZTMyX3RoLmpwZw
aHR0cDovL2ltZy5tcC5pdGMuY24vdXBsb2FkLzIwMTcwNzA1L2FiZmI2OWY1MTk0ZTRhZDM4MmEwNmIzZWE0MTVlMmY1X3RoLmpwZw
aHR0cDovL2ltZy5tcC5pdGMuY24vdXBsb2FkLzIwMTcwNzA1LzFjNmI0YmIxNDRlYzQ5MWQ5ODQ5ODM4NjUxZTgwNmUxX3RoLmpwZw

自顶向下查找4的节点位置,发现4应当插入到节点元素3,5之间。

aHR0cDovL2ltZy5tcC5pdGMuY24vdXBsb2FkLzIwMTcwNzA1L2QzNTNjNmYyYWQ3ZjQ2NTliNDUzZmQwYzkzZmY2ODYyX3RoLmpwZw

节点3,5已经是两元素节点,无法再增加。父亲节点 2, 6 也是两元素节点,也无法再增加。根节点9是单元素节点,可以升级为两元素节点。于是拆分节点3,5与节点2,6,让根节点9升级为两元素节点4,9。节点6独立为根节点的第二个孩子。

aHR0cDovL2ltZy5tcC5pdGMuY24vdXBsb2FkLzIwMTcwNzA1LzAwZTA0OWY4NjNmMTRmYWFiZGNmNmMxZWZmMWI5OWE3X3RoLmpwZw
aHR0cDovL2ltZy5tcC5pdGMuY24vdXBsb2FkLzIwMTcwNzA1LzM2ZjgyN2E0OTIzODQ5ZWQ5ZDEwNzM0Yjk5YzEwYjg5X3RoLmpwZw
aHR0cDovL2ltZy5tcC5pdGMuY24vdXBsb2FkLzIwMTcwNzA1L2EyY2FmODQ0MTc3ZTQwMzdhZDA2OTgzOWY4ZDY4OTk4X3RoLmpwZw
aHR0cDovL2ltZy5tcC5pdGMuY24vdXBsb2FkLzIwMTcwNzA1LzA0ZWMyYzBlOTA1MTRhZTU5NzE5YWZkYzVmYWM4NDllX3RoLmpwZw

自顶向下查找元素11的节点位置。

aHR0cDovL2ltZy5tcC5pdGMuY24vdXBsb2FkLzIwMTcwNzA1LzE3ZWZiMzYxMTgyMDRhNzI5YjVjNTlhMTU0MDhkNmJjX3RoLmpwZw

删除11后,节点12只有一个孩子,不符合B树规范。因此找出12,13,15三个节点的中位数13,取代节点12,而节点12自身下移成为第一个孩子。(这个过程称为左旋)

aHR0cDovL2ltZy5tcC5pdGMuY24vdXBsb2FkLzIwMTcwNzA1L2I5NThiYTc0MWRiODQ5NjFhZjk2ZGY4OTRmMzkwZGUzX3RoLmpwZw
aHR0cDovL2ltZy5tcC5pdGMuY24vdXBsb2FkLzIwMTcwNzA1LzEyMGZkNWIxNjI2NTQ2NDRiMjRmYmM3YWE2OGJhNmVlX3RoLmpwZw
aHR0cDovL2ltZy5tcC5pdGMuY24vdXBsb2FkLzIwMTcwNzA1Lzg2YTQ1ODNlNzg1MTRlNzliZWI4ZDgzODE1YTZhZGFiX3RoLmpwZw
aHR0cDovL2ltZy5tcC5pdGMuY24vdXBsb2FkLzIwMTcwNzA1LzNkN2Q2YzE0YTQ0NDRiZTU5YTMxZGUyMzE0NzA0YmM1X3RoLmpwZw
aHR0cDovL2ltZy5tcC5pdGMuY24vdXBsb2FkLzIwMTcwNzA1LzM4ODIxMzg5MDg3NDRjNmZiYTVjYzdiYTRiOWRmMDc1X3RoLmpwZw