数据结构-跳跃表C++实现

跳跃表/跳表(SkipList)
In computer science, a skip list is a data structure that allows $O(\log n)$ search complexity as well as $O(\log n)$ insertion complexity within an ordered sequence of $n$ elements. —— 维基百科

概述

由于本人知识有限，对于跳跃表的认识可能不够深入，不过还是想把自己的看法写下来…初学跳跃表时也许无法理解，不过当深入理解了跳跃表的原理时，一切都变得easy。比起手写一个红黑树代码跳跃表要简单得多，而且跳跃表也有很多实现例子的，如 Qt中的 QMap ,MemSQL ,Redis… 虽然我都没有接触过…

跳跃表

一个简单的跳跃表结构

我们都知道，对一个 有序链表 执行查找的复杂度为 $O(n)$ ,而对数组使用二分查找的复杂度则是 $O(\log n)$ ，而跳跃表的出现就是为了解决有序链表的查找问题(当然还要其他更重要的作用)，使其复杂度将为 $O(\log n)$ ，不过最坏情况下却仍是 $O(n)$
注意: 跳跃表适用于有序链表。

利用跳跃表之后的查找、插入、删除复杂度如下

方法	查找	插入	删除
平均情况	$O(\log n)$	$O(\log n)$	$O(\log n)$
最坏情况	$O(n)$	$O(n)$	$O(n)$

跳跃表是一个空间换时间的数据结构。

那么跳跃表是如何实现快速查找的呢？跳跃表通过建立 索引节点 ，每次查找时，先通过最顶层索引节点一层层的向下寻找满足条件的节点，每次查找都有可能跨越到其他不相邻的节点，直到抵达最底层也就是原始层链表某处节点位置。

这里的 索引节点 其实就是一组节点地址，在new一个新节点时就分配好，然后在对每一层索引节点进行重组，使其前后串联起来，这样就形成了一个 $i$ 级链表且 $i\in(0,M-1)$ M为跳跃表的最高层级数MaxLevel，这样每一层级均可形成一条链。

要查找key为5的节点，那么从head开始一层层的查找，此次过程跳过了节点1和节点3，最后找到了节点5。

这里我们设置一个有序链表为带有head和tail节点，其中head节点的next指针就是有序链表的起始，而tail节点一般为nullptr，不过这里我们将tail作为哨兵节点，这样就无需处理nullptr问题了。

插入/删除

在一个规则跳表结构中， $i$ 级链表有 $\frac{n}{2^i}$ 个节点， $n$ 为最底层链表的总节点数。如:
$0 \rightarrow i$ 层有 $n,\frac{n}{2},\frac{n}{4},\cdots,\frac{n}{2^{i}}$

也就是说，我们希望在插入时符合这种结构，而插入的节点属于 $i$ 级链表的概率为 $\frac{1}{2^i}$ (不难想出，插入节点属于0级链表概率永远为1，而属于1级链表的概率为 $\frac{1}{2}$ ， $\cdots$ ，属于 $i$ 级链表的为 $\frac{1}{2^i}$ )

这时来看看这段话，摘自William Pugh(就是他搞出了跳表)写的论文里的一段话。
文末参考中有他关于跳表的论文

原文
Skip lists are a data structure that can be used in place of balanced trees.
Skip lists use probabilistic balancing rather than strictly enforced balancing
and as a result the algorithms for insertion and deletion in skip lists are
much simpler and significantly faster than equivalent algorithms for
balanced trees.
翻译:
跳跃表是一种可以用来代替平衡树的数据结构。跳跃表使用概率平衡而不是严格执行的平衡。因此，在跳跃表中插入和删除的算法是要比等效算法简单得多，也快得多平衡树。

注意 跳跃表使用概率平衡而不是严格执行的平衡 这告诉我们，插入操作是动态的插入到不同层级的位置。因此，把一个新节点插入到 $i$ 级链表的概率为 $p^i$ (在一个规则跳表结构中， $p=0.5$ )。

而这个动态插入法就利用到随机数生成函数来完成，一个好的随机数生成函数，能够使跳表结构达到最优的效果使其复杂度不会太高。

除此之外每一次插入/删除操作，都必须要维护跳表结构，也就是处理好链表节点的指向关系，避免混乱。

级的分配

不说那么多，级的分配可以利用一个随机数生成函数来完成。
伪码:

randomLevel()
    lvl := 1
    -- random() that returns a random value in [0...1)
    while random() < p and lvl < MaxLevel do
        lvl := lvl + 1
    return lvl

转化成C++代码

int random_level(){
    int level= 1;
    while (((double)(rand()%101)/101) < (m_prob) 
    && level < m_maxLevel) {
        level++;
    }
    return level;
}

这里还要一个糟糕的随机数生成函数仅供参考

int random_level(){
	int level = rand() % m_maxLevel + 1;
	return level;
}

C++实现跳跃表

OK，现在开始用C++实现跳跃表
首先我们定义两个结构体，用于保存节点数据

template<class KeyType,class ValueType>
struct Element{
    KeyType key;
    ValueType value;
    Element(){}
    Element(KeyType key,const ValueType& value){
        this->key=key;
        this->value=value;
    }
};

template<class KeyType,class ValueType>
struct skipNode{
    Element<KeyType,ValueType> element;
    skipNode**next;
    skipNode(const Element<KeyType,ValueType>&theElement,int size){
        this->element=theElement;
        this->next=new skipNode<KeyType,ValueType>*[size];
    }
};

确定MaxLevel

MaxLevel太大，造成空间的浪费，太小又会导致退化成一个冗杂的 ‘单链表’ 结构，两者均会影响跳跃表的性能。因此合适的MaxLevel十分重要。

从《Skip Lists: A Probabilistic Alternative to
Balanced Trees》中我们可以这样定义一个MaxLevel，即
$L(n)=\lceil log_\frac{1}{p}n \rceil$
$MaxLevel=L(N),N$ 为待插入元素节点总数

利用换底公式 $\log_ab=\frac{\log_cb}{\log_ca}$ 转化为C++代码

1
2
3

int MaxLevel(int numberOfnode,float prob){
    return (int)(ceil(logf((float)numberOfnode)/logf((float)1/prob)));
}

skiplist初始化

我们只需在构造函数进行初始化即可。
这里我们既可以自定义跳跃表的MaxLevel，也可以通过numberOfnode来计算一个MaxLevel

template<class KeyType,class ValueType>
skiplist<KeyType,ValueType>::skiplist(float prob,KeyType tailLargeKey,int max_level,int numberOfnode) {
    srand(time(nullptr));
    m_prob=prob;
    m_size=0;

    // 计算最大链表层数
    if(numberOfnode!=-1)
        m_maxLevel=MaxLevel(numberOfnode,prob);
    else
        m_maxLevel=max_level;
        
    m_curMaxLevel=0;

    // 初始化尾节点
    m_tailKey=tailLargeKey;
    Element<KeyType,ValueType>tailPair{m_tailKey,0};
    m_tailNode=new skipNode<KeyType,ValueType>(tailPair,0);

    // 初始化头结点
    m_headNode=new skipNode<KeyType,ValueType>(tailPair,m_maxLevel);
    for (int i = 0; i < m_maxLevel; i++){
        m_headNode->next[i]=m_tailNode;
    }
    m_forwardNodes=new skipNode<KeyType,ValueType>*[m_maxLevel];
}

尾节点内存分配类似malloc(0)形式

skiplist查找 find和search

template<class KeyType,class ValueType>
skipNode<KeyType,ValueType>* skiplist<KeyType,ValueType>::search(const KeyType&key){
    skipNode<KeyType,ValueType>*forwardNode=m_headNode;
    // 外层循环: 不断的指向下一层级链表
    for (int i = m_curMaxLevel-1; i >= 0; --i){
        // 内层循环: 指向当前层级链表的邻接节点
        while (forwardNode->next[i]!=m_tailNode&&forwardNode->next[i]->element.key<key){
            forwardNode=forwardNode->next[i];
        }
        // 保存前驱节点指针
        this->m_forwardNodes[i]=forwardNode;
    }
    return forwardNode->next[0];
}

template<class KeyType,class ValueType>
Element<KeyType,ValueType>* skiplist<KeyType,ValueType>::find(const KeyType&key){
    if(key > this->m_tailKey)
        return nullptr;
    skipNode<KeyType,ValueType>*forwardNode=m_headNode;
    for (int i = m_curMaxLevel-1; i >=0 ; --i){
        while (forwardNode->next[i]!=m_tailNode && forwardNode->next[i]->element.key < key){
            forwardNode=forwardNode->next[i];
        }
    }
    // 最终会遍历到第0层->原始层，只需要在第0层判断即可
    if(forwardNode->next[0]->element.key==key){
        return &forwardNode->next[0]->element;
    }else{
        return nullptr;
    }
}

其中find和search都是从当前最大层级依次向下遍历，不同的是，search在遍历的同时将每次遍历到每一层的前驱节点保存在m_forwardNodes中，目的是为了衔接后面的插入/删除节点

其中的 forwardNode->next[0] 就是下一个要查找的节点(如果存在的话)

skiplist插入

每一次插入，需要通过search保存 前驱节点m_forwardNodes 从而将待插入节点连接起来。如下图红色箭头表示前后需要链接的节点。
需要注意的是，跳跃表是有序链表，下图只是为了解释而已 (其实是当时不小心画错了… 😦)

template<class KeyType,class ValueType>
void skiplist<KeyType,ValueType>::insert(const Element<KeyType,ValueType>&element){
    // 随机生成索引节点层数 1<=level<=m_maxLevel
    int level=random_level();
    if(level > this->m_curMaxLevel){
        this->m_curMaxLevel=level;
    }
    skipNode<KeyType,ValueType>*pNode=search(element.key);
    // 已经存在
    if(pNode->element.key==element.key){
        pNode->element.value=element.value;
        return;
    }
    // 此时已经保存了合适的前驱节点m_forwardNodes
    // 创建一个具有level层的节点
    skipNode<KeyType,ValueType>*pNewNode = new skipNode<KeyType,ValueType>(element,level);
    
    // 建立索引节点
    for (int i = level-1; i >=0; --i){
        pNewNode->next[i]=m_forwardNodes[i]->next[i];
        m_forwardNodes[i]->next[i]=pNewNode;
    }
    m_size++;
}

skiplist删除

删除操作类似插入操作，也需要通过search保存 前驱节点m_forwardNodes

template<class KeyType,class ValueType>
bool skiplist<KeyType,ValueType>::erase(const KeyType&key){
    // 不符合的key
    if(key>m_tailKey)
        return false;
    skipNode<KeyType,ValueType>*pNode=search(key);
    // 不存在 
    if(pNode->element.key!=key)
        return false;
    // 更新跳表链表结构
    for (int i = m_curMaxLevel-1; i >= 0; --i){
        // 此处 m_forwardNodes[i]->next[i] 可能不是 pNode
        if(m_forwardNodes[i]->next[i]==pNode)
            m_forwardNodes[i]->next[i]=pNode->next[i];
    }
    
    // 维护当前最大层级数
    // 当删除一个具有最大层级的节点时，可能会导致 m_headNode->next[m_maxLevel-1]=m_tailNode，那么此时需要降低层级
    while (m_curMaxLevel-1 > 0&&m_headNode->next[m_curMaxLevel-1]==m_tailNode){
        m_curMaxLevel--;
    }
    delete pNode;   
    m_size--;    
    return true;
}

需要注意的是，删除节点后还要维护m_curMaxLevel使其不能太高，这样的话可以减少遍历次数。

释放跳跃表

在析构函数中释放所有内存

template<class KeyType,class ValueType>
skiplist<KeyType,ValueType>::~skiplist(){
    while (m_headNode!=m_tailNode){
        auto x=m_headNode->next[0];
        delete m_headNode;
        m_headNode=x;
    }
    delete m_tailNode;
    delete []m_forwardNodes;
}

测试

用skiplist和STL map测试插入操作，仅供参考

#include <map>
#include <iostream>
#include "skiplist.h"
#include "ClockTest.h"
using namespace std;

// 一个无穷数
#define TAIL_INFINITY_KEY 0xffffff


int main(int argc, char const *argv[]){
    int n=1000000;
    srand(time(nullptr));
    // 测试skiplist insert
    CClock::start_timeclock();
    skiplist<int,int> sk(0.5,TAIL_INFINITY_KEY,0,n);
    for (int i = n; i >= 0; --i){
        sk.insert(Element<int,int>{rand()%n,i});
    }
    CClock::stop_timeclock();
    auto skiplist_insert_time=CClock::time_duration();
    
    // 测试map insert
    CClock::start_timeclock();
    map<int,int> mp;
    for (int i = n; i >=0; --i){
        mp.insert(pair<int,int>(rand()%n,i));
    }
    CClock::stop_timeclock();
    auto map_insert_time=CClock::time_duration();
    
    // 测试skiplist find
    CClock::start_timeclock();
    for (int i = n; i >=0; --i){
        sk.find(rand()%n);
    }
    CClock::stop_timeclock();
    auto skip_find_time=CClock::time_duration();

    // 测试map find
    CClock::start_timeclock();
    for (int i = n; i >=0; --i){
        mp.find(rand()%n);
    }
    CClock::stop_timeclock();
    auto map_find_time=CClock::time_duration();
    
    // 测试skiplist erase
    CClock::start_timeclock();
    for (int i = n; i >=0; --i){
        sk.erase(rand()%n);
    }
    CClock::stop_timeclock();
    auto skip_erase_time=CClock::time_duration();

    // 测试map erase
    CClock::start_timeclock();
    for (int i = n; i >=0; --i){
        mp.erase(rand()%n);
    }
    CClock::stop_timeclock();
    auto map_erase_time=CClock::time_duration();

    cout<<"SkipList insert time: "<<skiplist_insert_time<<endl;
    cout<<"SkipList find time: "<<skip_find_time<<endl;
    cout<<"SkipList erase time: "<<skip_erase_time<<endl;
    cout<<endl;
    cout<<"Map insert time: "<<map_insert_time<<endl;
    cout<<"Map find time: "<<map_find_time<<endl;
    cout<<"Map erase time: "<<map_erase_time<<endl;

    cout<<"SkipList size: "<<sk.size()<<endl;
    cout<<"Map size: "<<mp.size()<<endl;
    return 0;
}

我机器上测试随机插入/查找/删除，结果仅供参考。单位秒

	insert	find	erase
skiplist1	1.44737	2.03802	2.158
skiplist2	1.34377	1.61364	1.52145
skiplist3	1.35671	2.31105	1.57659
map1	1.48519	1.28156	1.93228
map2	1.00754	1.099	1.29245
map3	1.23837	1.28021	1.35214

由此可见，跳跃表也是很高效的一种数据结构

skiplist源码地址: https://github.com/josexy/skiplist_c

参考

维基百科-skiplist
Skip Lists: A Probabilistic Alternative toBalanced Trees
浅析SkipList跳跃表原理及代码实现

数据结构与算法

C++ DataStructures C

本博客所有文章除特别声明外，均采用 CC BY-SA 4.0 协议，转载请注明出处！

数据结构-AVL树上一篇

数据结构-二叉查找树(BST) 下一篇