[CMU15-445] P0 C++ Primer

【文档 p0-lab详情地址】


Task#1 Copy-On-Write Trie

要求

In this task, you will need to modify trie.h and trie.cpp to implement a copy-on-write trie. In a copy-on-write trie, operations do not directly modify the nodes of the original trie. Instead, new nodes are created for modified data, and a new root is returned for the newly-modified trie. Copy-on-write enables us to access the trie after each operation at any time with minimum overhead(最小开销). Consider inserting ("ad", 2) in the above example. We create a new Node2 by reusing two of the child nodes from the original tree, and creating a new value node 2. (See figure below)

If we then insert ("b", 3), we will create a new root, a new node and reuse the previous nodes. In this way, we can get the content of the trie before and after each insertion operation. As long as we have the root object (Trie class), we can access the data inside the trie at that time. (See figure below)

One more example: if we then insert ("a", "abc") and remove ("ab", 1), we can get the below trie. Note that parent nodes can have values, and you will need to purge all unnecessary nodes after removal.

Your trie should support 3 operations:

  • Get(key): Get the value corresponding to the key.
  • Put(key, value): Set the corresponding value to the key. Overwrite existing value if the key already exists. Note that the type of the value might be non-copyable (i.e., std::unique_ptr). This method returns a new trie.
  • Delete(key): Delete the value for the key. This method returns a new trie.

None of these operations should be performed directly on the trie itself.

You should create new trie nodes and reuse existing ones as much as possible. // 创建新的trie节点并尽可能重用现有节点。

To create a new node, you should use the Clone function on the TrieNode class. To reuse an existing node in the new trie, you can copy std::shared_ptr: copying a shared pointer doesn’t copy the underlying data. You should not manually allocate memory by using new and delete in this project. std::shared_ptr will deallocate the object when no one has a reference to the underlying object.

For the full specifications of these operations, please refer to the comments in the starter code.


实现思路

任务要求我们实现一个写时复制树,并且补充完成插入、查询、删除的操作

01.PUT()插入操作

每次放入一对 (key, value)

插入的Trie节点分为两种:(TrieNodeTrieNodeWithValue<T>)

Trie类成员变量:root_

类关系图如下,std::shared_ptr是共享指针可以直接复制,

image-20230824103333232

考虑边界情况:

  • 插入时是空树
  • 插入时key是空串

遍历key,构建一个由key字符路径组成的字典树:

只遍历到倒数第二个字符,

  • 依次判断字符对应的孩子节点是否存在

    1.不存在,就新建一个普通节点

    2.存在,就使用Clone拷贝原有分支,让新树对应的指针指向拷贝分支

特殊处理最后一个字符,

  • 最后字符对应的节点不存在,需要创建一个无孩子的数值节点
  • 存在对应节点的那个分支,需要覆盖原有数值节点的内容,并且保留对应孩子节点

最后返回Trie的根节点

处理的流程图如下,

image-20230824165749379

02.Get()查询操作

考虑边界情况:

查询到的是空树时,返回空

使用临时指针遍历查询的key:

看能否在Trie的分支中找到对应的字符节点,

  • 某个字符不能找到,返回空树
  • 都能找到,指针会指向key的尾节点,需要继续判断

当前节点是否为is_value_node_这个终止节点,

  • false,说明该节点为普通节点,返回空树

  • true,说明是有对应的数值节点,那就返回对应节点的裸指针

处理流程如下,

image-20230824164321951

03.Remove()删除/移除操作

删除对应节点的值时,不能影响原有节点,操作都在新节点上执行

具体的思路,

自顶向下,从根节点开始遍历串key,采用DFS递归处理,每次记录递归的串下标

边界考虑(递归出口):

处理叶子结点,遍历到最后一个字符,找到要删除的节点时

  • 当前节点已经没有孩子,返回空

  • 有孩子节点,新建一个普通节点(并且根据孩子节点来构造)

    为普通节点时就相当于只把当前节点的值置空了(达到删除目的)

处理中间节点:

当前遍历到的字符,可以在当前节点的孩子节点中找到时,

  • 就递归到下一层,并返回对应的孩子节点

  • 每次拷贝当前节点的分支

    递归有返回值且不空—>说明孩子节点存在,连接到新分支对应位置处

    递归返回的节点为空—>到查找字符路径的最后一个位置了,继续判断

    • 当前节点是数值节点,就可删除对应位置的节点(已经是字符路径的最后一个位置,就可删掉),变为普通节点

    • 当前节点是普通节点且没有孩子时,返回空(理解为置空操作)

  • 每层返回拷贝后经修改后的节点

如果键的下一个字符不存在于当前节点的孩子中,就没找到直接返回原节点root

处理流程图如下,

image-20230824164105191

结合Chatgpt辅助理解,节点删除的过程

做出声明,当删除的节点已经是尾节点时,直接删掉这个数值节点,父节点的状态不变无需更新。

image-20230824121933506

代码实现

balabala…


Task#2 Concurrent Key-Value Store

要求

Task #2 - Concurrent Key-Value Store

After you have a copy-on-write trie which can be used in a single-thread environment, implement a concurrent key-value store for a multithreaded environment. In this task, you will need to modify trie_store.h and trie_store.cpp. This key-value store also supports 3 operations:

  • Get(key) returns the value.
  • Put(key, value). No return value.
  • Remove(key). No return value.

For the original Trie class, everytime we modify the trie, we need to get the new root to access the new content. But for the concurrent key-value store, the put and delete methods do not have a return value. This requires you to use concurrency primitives to synchronize reads and writes so that no data is lost through the process.

Your concurrent key-value store should concurrently serve multiple readers and a single writer. // 多个读者,一个写者

That is to say, when someone is modifying the trie, reads can still be performed on the old root. When someone is reading, writes can still be performed without waiting for reads. // 当有人修改trie时,仍然可以在旧根上读取。当有人正在读取时,仍然可以执行写入,而无需等待读取。

Also, if we get a reference to a value from the trie, we should be able to access it no matter how we modify the trie. The Get function from Trie only returns a pointer. If the trie node storing this value has been removed, the pointer will be dangling. Therefore, in TrieStore, we return a ValueGuard which stores both a reference to the value and the TrieNode corresponding to the root of the trie structure, so that the value can be accessed as we store the ValueGuard.

To achieve this, we have provided you with the pseudo code for TrieStore::Get in trie_store.cpp. Please read it carefully and think of how to implement TrieStore::Put and TrieStore::Remove.

相关知识点

  • 并发访问将写时复制的多版本字典树,简化为两个版本,历史版本和当前版本。
  • 并发访问时,只有一个根节点,始终保存最新的数据。
  • 并发控制原则:共享读、排他写

实现思路

根据伪代码的提示信息,

在多线程环境下,通过对根节点进行加锁或解锁操作,来实现并发键值存储

01.Get()

值得注意的是,在进行查找trie树节点时,我们拿到根节点后需要立即解锁

原因是,写时复制字典树,会保留历史版本信息,读的时候并不会受新Trie修改的影响(可以保证有多个读者)

02.Put()

插入Tire树的节点信息时,需要加读锁,确保此刻只有一个对象来进行修改操作。

插入完成后,同时要将新修改的信息拷贝给Trie的成员变量root_,让其持有最新的信息

03.Remove()

操作类似Put过程

代码实现

balabala…


Task#3 Debugging

要求

In this task, you will learn the basic techniques of debugging. You can choose any way you prefer for debugging, including but not limited to: using cout and printf, using CLion / VSCode debugger, using gdb, etc.

Please refer to trie_debug_test.cpp for instructions. You will need to set a breakpoint after the trie structure is generated and answer a few questions. You will need to fill in the answer in trie_answer.h.

使用Clion进行Debug

这步要求我们在 trie_debug_test.cpp 进行Debug测试,通过debug信息得到题目要求的答案内容,

我使用Clion进行Debug测试后,三个答案的结果是8 1 42, 进行测试后答案并不对。

问题解决

网上转了一圈发现别人也有类似的问题,通过修改为指定的测试样例,是可以解决的,替换TrieDebugger的内容如下,

1
2
3
4
5
6
7
8
9
10
11
auto trie = Trie();
trie = trie.Put<uint32_t>("65", 25);
trie = trie.Put<uint32_t>("61", 65);
trie = trie.Put<uint32_t>("82", 84);
trie = trie.Put<uint32_t>("2", 42);
trie = trie.Put<uint32_t>("16", 67);
trie = trie.Put<uint32_t>("94", 53);
trie = trie.Put<uint32_t>("20", 35);
trie = trie.Put<uint32_t>("3", 57);
trie = trie.Put<uint32_t>("93", 30);
trie = trie.Put<uint32_t>("75", 29);

参考的解决方案:【文档 CMU15-445 Project 0 (Spring 2023)】

需要注意:

修改测试样例后,Debug后的答案是7 2 30,用gtest进行单测后还是不正确,这是没问题的,提交到评测网站上是可以通过的


Task#4 SQL String Functions

要求

Now it is time to dive into BusTub itself! You will need to implement upper and lower SQL functions.

This can be done in 2 steps:

(1) implement the function logic in string_expression.h. // 实现大小写的转换的逻辑函数

(2) register the function in BusTub, so that the SQL framework can call your function when the user executes a SQL, in plan_func_call.cpp. // 在BusTub中实现注册函数,以便SQL框架可以在用户执行SQL时调用该函数

实现思路

string_expression.h

执行SOL时,通过判断expr_type_的转换类型,使用tolower()toupper()对传入的字符串进行处理

plan_func_call.cpp

检验参数的合法性,进行注册操作,按照伪代码提示完成。


拓展知识点

SQL层调用链路框架图,

  1. Bustub模块图
image-20230825085923699

Bustub调用链(到SQL层)

image-20230825090159651

通过Debug,可以看到函数的调用栈

执行shell, 启动bustub

输入select upper('AbCd'), lower('AbCd');来执行查询操作 image-20230825085717023


代码实现

balabala…


代码提交评测

【官方评测工具 代码评测地址】

image-20230825103908981