Protobuf: Descriptor-related classes

Class Descriptor

Describes the meta-information of a message type (not of an individual message object). Its constructor is private, so it must be constructed by DescriptorPool (a friend class).

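const members:

const FileDescriptor* file_: information about the .proto file in which the message is defined
const Descriptor* containing_type_: if the message is nested inside another message in the proto definition, this is the descriptor of the enclosing message; otherwise it is NULL
const MessageOptions* options_: defined in descriptor.proto; according to its comments, it exists for compatibility with MessageSet from the old proto1. The extension-related parts can be ignored for now.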

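Non-const members:

int field_count_: number of fields in this message
FieldDescriptor* fields_: all fields, stored as a contiguous array
int nested_type_count_: number of nested types
Descriptor* nested_types_: messages nested inside this message
int enum_type_count_: number of enums defined inside this message
EnumDescriptor* enum_types_: start address of the contiguous array of enum types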
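A minimal sketch of the "count + contiguous array" layout. MiniField and MiniDescriptor are hypothetical stand-ins, not protobuf's classes; only the accessor names field_count() and field(int) mirror the public Descriptor API, and the linear FindFieldByName() here is a simplification (the real lookup goes through hash tables, see FileDescriptorTables below).

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for FieldDescriptor: only a name and a field number.
struct MiniField {
  std::string name;
  int number;
};

// Hypothetical stand-in for Descriptor, keeping the "count + pointer to a
// contiguous array" layout described above.
class MiniDescriptor {
 public:
  MiniDescriptor(const MiniField* fields, int field_count)
      : field_count_(field_count), fields_(fields) {}

  int field_count() const { return field_count_; }

  // Plain pointer arithmetic into the contiguous array.
  const MiniField* field(int index) const {
    assert(index >= 0 && index < field_count_);
    return fields_ + index;
  }

  // Linear scan by name, for illustration only.
  const MiniField* FindFieldByName(const std::string& name) const {
    for (int i = 0; i < field_count_; ++i) {
      if (fields_[i].name == name) return &fields_[i];
    }
    return nullptr;
  }

 private:
  int field_count_;
  const MiniField* fields_;
};
```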

Class FileDescriptor

Describes an entire .proto file. It contains:

The .proto files it depends on:

int dependency_count_;

const FileDescriptor** dependencies_;
The messages defined in this .proto file:

int message_type_count_;

Descriptor* message_types_;
The tables of all symbols (descriptors of every kind) defined in this .proto file:

const FileDescriptorTables* tables_;
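dependency_count_ and dependencies_ turn the loaded files into a graph. The following sketch walks that graph to collect every direct and indirect dependency of a file. MiniFile and CollectDeps() are hypothetical; dependency_count()/dependency(int) mirror the real FileDescriptor accessors, everything else is assumed.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical stand-in for FileDescriptor: a name plus the
// dependency_count_/dependencies_ pair described above.
struct MiniFile {
  std::string name;
  int dependency_count_;
  const MiniFile** dependencies_;

  int dependency_count() const { return dependency_count_; }
  const MiniFile* dependency(int i) const { return dependencies_[i]; }
};

// Depth-first walk collecting the names of all direct and indirect
// dependencies; `seen` keeps a shared dependency from being visited twice.
void CollectDeps(const MiniFile* file, std::set<std::string>* seen) {
  for (int i = 0; i < file->dependency_count(); ++i) {
    const MiniFile* dep = file->dependency(i);
    if (seen->insert(dep->name).second) CollectDeps(dep, seen);
  }
}
```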

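Class FieldDescriptor

Describes a single field. Its constructor is also private, so it too must be constructed by DescriptorPool (a friend class). A FieldDescriptor is obtained through the descriptor of the message that contains the field, for example with Descriptor::FindFieldByName().

Enum types:

enum Type: the field's type as declared in .proto;
enum CppType: the C++ type used to hold the field's value; the mapping from Type to CppType is fixed;
enum Label: the field's cardinality (optional/required/repeated);

Private const data: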

const Descriptor* containing_type_;
const Descriptor* extension_scope_;
const Descriptor* message_type_;
const EnumDescriptor* enum_type_;
const FieldDescriptor* experimental_map_key_;
const FieldOptions* options_;

Three lookup tables (static const):

static const CppType kTypeToCppTypeMap[MAX_TYPE + 1];
static const char * const kTypeToName[MAX_TYPE + 1];
static const char * const kLabelToName[MAX_LABEL + 1];
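Each table is indexed by the enum value, and slot 0 is reserved for errors. The sketch below rebuilds two of the tables in a standalone namespace so the mapping can be inspected. The enumerator values and table contents are reproduced from memory of protobuf's descriptor.h/descriptor.cc and should be checked against the version you are reading; TypeToCppType() and LabelName() are hypothetical helpers.

```cpp
#include <cassert>
#include <cstring>

namespace sketch {

// Field types as declared in .proto.
enum Type {
  TYPE_DOUBLE = 1, TYPE_FLOAT = 2, TYPE_INT64 = 3, TYPE_UINT64 = 4,
  TYPE_INT32 = 5, TYPE_FIXED64 = 6, TYPE_FIXED32 = 7, TYPE_BOOL = 8,
  TYPE_STRING = 9, TYPE_GROUP = 10, TYPE_MESSAGE = 11, TYPE_BYTES = 12,
  TYPE_UINT32 = 13, TYPE_ENUM = 14, TYPE_SFIXED32 = 15, TYPE_SFIXED64 = 16,
  TYPE_SINT32 = 17, TYPE_SINT64 = 18,
  MAX_TYPE = 18
};

// C++ types used to hold field values.
enum CppType {
  CPPTYPE_INT32 = 1, CPPTYPE_INT64 = 2, CPPTYPE_UINT32 = 3, CPPTYPE_UINT64 = 4,
  CPPTYPE_DOUBLE = 5, CPPTYPE_FLOAT = 6, CPPTYPE_BOOL = 7, CPPTYPE_ENUM = 8,
  CPPTYPE_STRING = 9, CPPTYPE_MESSAGE = 10,
  MAX_CPPTYPE = 10
};

enum Label { LABEL_OPTIONAL = 1, LABEL_REQUIRED = 2, LABEL_REPEATED = 3, MAX_LABEL = 3 };

// Indexed by Type; several wire-level types share one C++ type.
const CppType kTypeToCppTypeMap[MAX_TYPE + 1] = {
  static_cast<CppType>(0),  // 0 is reserved for errors
  CPPTYPE_DOUBLE,   // TYPE_DOUBLE
  CPPTYPE_FLOAT,    // TYPE_FLOAT
  CPPTYPE_INT64,    // TYPE_INT64
  CPPTYPE_UINT64,   // TYPE_UINT64
  CPPTYPE_INT32,    // TYPE_INT32
  CPPTYPE_UINT64,   // TYPE_FIXED64
  CPPTYPE_UINT32,   // TYPE_FIXED32
  CPPTYPE_BOOL,     // TYPE_BOOL
  CPPTYPE_STRING,   // TYPE_STRING
  CPPTYPE_MESSAGE,  // TYPE_GROUP
  CPPTYPE_MESSAGE,  // TYPE_MESSAGE
  CPPTYPE_STRING,   // TYPE_BYTES
  CPPTYPE_UINT32,   // TYPE_UINT32
  CPPTYPE_ENUM,     // TYPE_ENUM
  CPPTYPE_INT32,    // TYPE_SFIXED32
  CPPTYPE_INT64,    // TYPE_SFIXED64
  CPPTYPE_INT32,    // TYPE_SINT32
  CPPTYPE_INT64,    // TYPE_SINT64
};

const char* const kLabelToName[MAX_LABEL + 1] = {
  "ERROR",  // 0 is reserved for errors
  "optional", "required", "repeated",
};

inline CppType TypeToCppType(Type type) { return kTypeToCppTypeMap[type]; }
inline const char* LabelName(Label label) { return kLabelToName[label]; }

}  // namespace sketch
```

Several wire types collapse onto one C++ type (TYPE_INT32, TYPE_SFIXED32 and TYPE_SINT32 all map to CPPTYPE_INT32), which is why a fixed array lookup is enough.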

In descriptor.cc, the functions that expose this data are defined with a macro to keep the code readable:

PROTOBUF_DEFINE_ACCESSOR(FieldDescriptor, default_value_int32, int32)
PROTOBUF_DEFINE_ACCESSOR(FieldDescriptor, has_default_value, bool)

PROTOBUF_DEFINE_ACCESSOR is defined as follows:

// These macros makes this repetitive code more readable.
#define PROTOBUF_DEFINE_ACCESSOR(CLASS, FIELD, TYPE) \
  inline TYPE CLASS::FIELD() const { return FIELD##_; }

This works because FieldDescriptor itself holds the following data members, including a union that stores the default value for each field TYPE:

private:
  bool has_default_value_;
  union {
    int32  default_value_int32_;
    int64  default_value_int64_;
    uint32 default_value_uint32_;
    uint64 default_value_uint64_;
    float  default_value_float_;
    double default_value_double_;
    bool   default_value_bool_;

    const EnumValueDescriptor* default_value_enum_;
    const string* default_value_string_;
  };
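A self-contained sketch of how the accessor macro and the union work together. Only the macro body comes from the text above; MiniFieldDescriptor and SetInt32Default() are hypothetical, and the union is trimmed to three members. Each default_value_xxx() accessor simply reads one union member, so it is meaningful only for a field of the matching type.

```cpp
#include <cassert>
#include <cstdint>

// Macro body as quoted above: defines CLASS::FIELD() returning FIELD##_.
#define PROTOBUF_DEFINE_ACCESSOR(CLASS, FIELD, TYPE) \
  inline TYPE CLASS::FIELD() const { return FIELD##_; }

// Hypothetical, trimmed-down FieldDescriptor with a default-value union.
class MiniFieldDescriptor {
 public:
  MiniFieldDescriptor() : has_default_value_(false), default_value_int64_(0) {}

  bool has_default_value() const;
  int32_t default_value_int32() const;
  double default_value_double() const;

  // Hypothetical setter used only to populate the sketch.
  void SetInt32Default(int32_t v) {
    has_default_value_ = true;
    default_value_int32_ = v;
  }

 private:
  bool has_default_value_;
  // Only one member is in use at a time, selected by the field's type.
  union {
    int32_t default_value_int32_;
    int64_t default_value_int64_;
    double default_value_double_;
  };
};

// Each line expands to an out-of-class inline accessor definition.
PROTOBUF_DEFINE_ACCESSOR(MiniFieldDescriptor, has_default_value, bool)
PROTOBUF_DEFINE_ACCESSOR(MiniFieldDescriptor, default_value_int32, int32_t)
PROTOBUF_DEFINE_ACCESSOR(MiniFieldDescriptor, default_value_double, double)

#undef PROTOBUF_DEFINE_ACCESSOR
```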

Class EnumDescriptor

Describes an enum type defined in a .proto file.

Struct Symbol

A wrapper over protobuf's 7 kinds of descriptors.
In code, it likewise uses a union to hold the different descriptor types:


Type type;

union {
  const Descriptor* descriptor;
  const FieldDescriptor* field_descriptor;
  const EnumDescriptor* enum_descriptor;
  const EnumValueDescriptor* enum_value_descriptor;
  const ServiceDescriptor* service_descriptor;
  const MethodDescriptor* method_descriptor;
  const FileDescriptor* package_file_descriptor;
};

Again, a macro is used to keep the code readable:

#define CONSTRUCTOR(TYPE, TYPE_CONSTANT, FIELD)  \
  inline explicit Symbol(const TYPE* value) {    \
    type = TYPE_CONSTANT;                        \
    this->FIELD = value;                         \
  }

  CONSTRUCTOR(Descriptor         , MESSAGE   , descriptor             )
  CONSTRUCTOR(FieldDescriptor    , FIELD     , field_descriptor       )
  CONSTRUCTOR(EnumDescriptor     , ENUM      , enum_descriptor        )
  CONSTRUCTOR(EnumValueDescriptor, ENUM_VALUE, enum_value_descriptor  )
  CONSTRUCTOR(ServiceDescriptor  , SERVICE   , service_descriptor     )
  CONSTRUCTOR(MethodDescriptor   , METHOD    , method_descriptor      )
  CONSTRUCTOR(FileDescriptor     , PACKAGE   , package_file_descriptor)
#undef CONSTRUCTOR
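A standalone sketch of the same tagged-union pattern, keeping three of the seven kinds. The descriptor types are empty hypothetical structs; the NULL_SYMBOL tag and the default constructor follow protobuf 2.x's Symbol as far as I recall, and IsNull() is a hypothetical helper.

```cpp
#include <cassert>

// Hypothetical empty stand-ins for three of the seven descriptor kinds.
struct Descriptor {};
struct FieldDescriptor {};
struct EnumDescriptor {};

struct Symbol {
  enum Type { NULL_SYMBOL, MESSAGE, FIELD, ENUM };
  Type type;
  // Exactly one pointer is meaningful, selected by `type`.
  union {
    const Descriptor* descriptor;
    const FieldDescriptor* field_descriptor;
    const EnumDescriptor* enum_descriptor;
  };

  Symbol() : type(NULL_SYMBOL), descriptor(nullptr) {}

  // One explicit constructor per pointer type sets both the tag and the
  // matching union member.
#define CONSTRUCTOR(TYPE, TYPE_CONSTANT, FIELD) \
  inline explicit Symbol(const TYPE* value) {   \
    type = TYPE_CONSTANT;                       \
    this->FIELD = value;                        \
  }

  CONSTRUCTOR(Descriptor     , MESSAGE, descriptor      )
  CONSTRUCTOR(FieldDescriptor, FIELD  , field_descriptor)
  CONSTRUCTOR(EnumDescriptor , ENUM   , enum_descriptor )
#undef CONSTRUCTOR

  bool IsNull() const { return type == NULL_SYMBOL; }
};
```

The macro parameter named FIELD does not clash with the enum constant FIELD: parameters are substituted once, so CONSTRUCTOR(FieldDescriptor, FIELD, field_descriptor) expands to `type = FIELD; this->field_descriptor = value;`.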

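Class DescriptorPool::Tables

A collection of data tables, wrapping a series of hash_map structures.
Note that this class is declared in descriptor.h among the private members of class DescriptorPool, so it is an internal data structure of DescriptorPool.

The hash_maps it wraps: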

typedef pair<const void*, const char*> PointerStringPair;
// Pairs a message's Descriptor address with an int, identifying one field of that descriptor (by field number)

typedef pair<const Descriptor*, int> DescriptorIntPair;    
typedef pair<const EnumDescriptor*, int> EnumIntPair;
	

typedef hash_map<const char*, Symbol,
                 hash<const char*>, streq>
  SymbolsByNameMap;
  
typedef hash_map<PointerStringPair, Symbol,                     
                 PointerStringPairHash, PointerStringPairEqual>
  SymbolsByParentMap;   
  
typedef hash_map<const char*, const FileDescriptor*,
                 hash<const char*>, streq>
  FilesByNameMap;
  
typedef hash_map<PointerStringPair, const FieldDescriptor*,
                 PointerStringPairHash, PointerStringPairEqual>
  FieldsByNameMap;
  
typedef hash_map<DescriptorIntPair, const FieldDescriptor*,
                 PointerIntegerPairHash<DescriptorIntPair> >
  FieldsByNumberMap;
  
typedef hash_map<EnumIntPair, const EnumValueDescriptor*,
                 PointerIntegerPairHash<EnumIntPair> >
  EnumValuesByNumberMap;

The meaning of parent

The definition and use of BUILD_ARRAY show what parent means. There are 3 cases:

A message is the parent of its members (fields, nested messages, enums, extensions): the message's Descriptor* is their parent. This can be seen in the BUILD_ARRAY macro defined in DescriptorBuilder::BuildMessage().
An enum is the parent of its enum values (see DescriptorBuilder::BuildEnum()).
A service is the parent of its methods (see DescriptorBuilder::BuildService()).
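Because the key is (parent, name), the same short name can live under different parents without colliding. A sketch of such a map, with std::unordered_map in place of the pre-C++11 hash_map. PointerStringPairHash and PointerStringPairEqual are re-implemented here as assumptions (the real ones in descriptor.cc may hash differently), the value is reduced to an int, and FindByParent() is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

typedef std::pair<const void*, const char*> PointerStringPair;

// Hash the parent pointer and the *contents* of the name, not its address.
struct PointerStringPairHash {
  std::size_t operator()(const PointerStringPair& p) const {
    std::size_t h1 = std::hash<const void*>()(p.first);
    std::size_t h2 = std::hash<std::string>()(p.second);
    return h1 * 31 + h2;
  }
};

struct PointerStringPairEqual {
  bool operator()(const PointerStringPair& a, const PointerStringPair& b) const {
    return a.first == b.first && std::strcmp(a.second, b.second) == 0;
  }
};

// Keys hold raw const char*, so the names must outlive the map (in the real
// pool they are owned by the pool itself). Value simplified to an int id.
typedef std::unordered_map<PointerStringPair, int, PointerStringPairHash,
                           PointerStringPairEqual>
    SymbolsByParentMap;

// Returns the id of `name` under `parent`, or -1 if absent.
int FindByParent(const SymbolsByParentMap& map, const void* parent,
                 const char* name) {
  auto it = map.find(PointerStringPair(parent, name));
  return it == map.end() ? -1 : it->second;
}
```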

Data members

vector<string*> strings_;    // All strings in the pool.
vector<Message*> messages_;  // All messages in the pool.
vector<FileDescriptorTables*> file_tables_;  // All file tables in the pool.
vector<void*> allocations_;  // All other memory allocated in the pool.

SymbolsByNameMap      symbols_by_name_;
FilesByNameMap        files_by_name_;
ExtensionsGroupedByDescriptorMap extensions_;

Data members related to rollback

int strings_before_checkpoint_;
int messages_before_checkpoint_;
int file_tables_before_checkpoint_;
int allocations_before_checkpoint_;
vector<const char*      > symbols_after_checkpoint_;
vector<const char*      > files_after_checkpoint_;
vector<DescriptorIntPair> extensions_after_checkpoint_;

Other data members

vector<string> pending_files_;  // Stack of file names being built, used to detect circular-dependency errors between files
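The stack-based cycle check can be sketched as follows: push a file name before building its imports, pop it afterward, and report an error if a name turns up while it is still on the stack. CycleCheckingBuilder and the DepGraph map are hypothetical; this is not DescriptorBuilder::BuildFile(), only the same idea applied to a plain dependency graph.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical dependency graph: file name -> names it imports.
typedef std::map<std::string, std::vector<std::string>> DepGraph;

class CycleCheckingBuilder {
 public:
  explicit CycleCheckingBuilder(const DepGraph& graph) : graph_(graph) {}

  // Returns false if `name` (directly or transitively) imports itself.
  bool BuildFile(const std::string& name) {
    if (built_.count(name)) return true;  // already built earlier
    // A name already on the stack means we came back to it: a cycle.
    if (std::find(pending_files_.begin(), pending_files_.end(), name) !=
        pending_files_.end()) {
      return false;
    }
    pending_files_.push_back(name);
    auto it = graph_.find(name);
    if (it != graph_.end()) {
      for (const std::string& dep : it->second) {
        if (!BuildFile(dep)) {
          pending_files_.pop_back();
          return false;
        }
      }
    }
    pending_files_.pop_back();
    built_.insert(name);
    build_order_.push_back(name);  // dependencies always come first
    return true;
  }

  const std::vector<std::string>& build_order() const { return build_order_; }

 private:
  const DepGraph& graph_;
  std::vector<std::string> pending_files_;  // stack of files being built
  std::set<std::string> built_;
  std::vector<std::string> build_order_;
};
```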

Checkpoint/Rollback

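As with transactions in a database, a checkpoint is taken while the data is known to be consistent, snapshotting the current state; if something goes wrong during later processing, a rollback restores the data to the last checkpoint, so the underlying service can keep running and availability improves.

A checkpoint is taken in only 2 places, both in DescriptorBuilder::BuildFile():

before it starts modifying the contents of DescriptorPool::Tables* tables_;
after all operations have succeeded;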

DescriptorPool::Tables::Checkpoint():

void DescriptorPool::Tables::Checkpoint() {
  // Record the current sizes of the 4 vectors
  strings_before_checkpoint_ = strings_.size();
  messages_before_checkpoint_ = messages_.size();
  file_tables_before_checkpoint_ = file_tables_.size();
  allocations_before_checkpoint_ = allocations_.size();

  // Clear the 3 `*_after_checkpoint_` vectors
  symbols_after_checkpoint_.clear();
  files_after_checkpoint_.clear();
  extensions_after_checkpoint_.clear();
}

DescriptorPool::Tables::Rollback():

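Removes the entries listed in the `*_after_checkpoint_` vectors from the name-keyed hash_maps;
clears the `*_after_checkpoint_` vectors;
using the sizes recorded by Checkpoint(), deletes the elements at the tail of each vector and resize()s it, freeing memory that is no longer owned;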
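A simplified sketch of the Checkpoint()/AddSymbol()/Rollback() bookkeeping, keeping only strings_ and symbols_by_name_. MiniTables, its int-valued symbols and string_count() are hypothetical, and std::unique_ptr replaces the raw owning pointers; the method names follow the text.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

class MiniTables {
 public:
  // Allocates a pool-owned string (stand-in for AllocateString()).
  const std::string* AllocateString(const std::string& value) {
    strings_.emplace_back(new std::string(value));
    return strings_.back().get();
  }

  // Registers a symbol and remembers its name so Rollback() can undo it.
  bool AddSymbol(const std::string& full_name, int symbol) {
    if (!symbols_by_name_.emplace(full_name, symbol).second) return false;
    symbols_after_checkpoint_.push_back(full_name);
    return true;
  }

  const int* FindSymbol(const std::string& full_name) const {
    auto it = symbols_by_name_.find(full_name);
    return it == symbols_by_name_.end() ? nullptr : &it->second;
  }

  void Checkpoint() {
    strings_before_checkpoint_ = strings_.size();
    symbols_after_checkpoint_.clear();
  }

  void Rollback() {
    // 1. Remove everything registered by name since the checkpoint.
    for (const std::string& name : symbols_after_checkpoint_) {
      symbols_by_name_.erase(name);
    }
    symbols_after_checkpoint_.clear();
    // 2. Shrink back to the recorded size; unique_ptr frees the tail.
    strings_.resize(strings_before_checkpoint_);
  }

  std::size_t string_count() const { return strings_.size(); }

 private:
  std::vector<std::unique_ptr<std::string>> strings_;
  std::unordered_map<std::string, int> symbols_by_name_;
  std::size_t strings_before_checkpoint_ = 0;
  std::vector<std::string> symbols_after_checkpoint_;
};
```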

How is data registered into the tables of DescriptorPool::Tables?

The entry point is DescriptorPool::Tables::AddSymbol(), which is called from DescriptorBuilder::AddSymbol() and DescriptorBuilder::AddPackage().

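Class DescriptorPool

Constructs and manages descriptors of every kind, and helps manage the relationships between cross-linked descriptors and the data dependencies among them. A descriptor can be looked up in a DescriptorPool by name.

The generated pool is served as a singleton; its global data includes: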

EncodedDescriptorDatabase* generated_database_ = NULL;
DescriptorPool* generated_pool_ = NULL;
GOOGLE_PROTOBUF_DECLARE_ONCE(generated_pool_init_);

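google::protobuf::GoogleOnceInit (essentially pthread_once) ensures that initialization runs only once.

Although DescriptorPool provides 3 constructors, InitGeneratedPool() shows that only the one taking a DescriptorDatabase* is used here; the other 2 are not. In this case, const DescriptorPool* underlay_ is in fact NULL.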

void InitGeneratedPool() {
  generated_database_ = new EncodedDescriptorDatabase;
  generated_pool_ = new DescriptorPool(generated_database_);

  internal::OnShutdown(&DeleteGeneratedPool);
}
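The lazy, run-exactly-once initialization can be sketched in standard C++ by swapping GoogleOnceInit for std::call_once. MiniDatabase, MiniPool, GeneratedPool() and the init_calls counter are hypothetical, and the OnShutdown() cleanup is left out.

```cpp
#include <cassert>
#include <mutex>

struct MiniDatabase {};

struct MiniPool {
  explicit MiniPool(MiniDatabase* fallback) : fallback_database_(fallback) {}
  MiniDatabase* fallback_database_;
  const MiniPool* underlay_ = nullptr;  // stays NULL for the generated pool
};

namespace {
MiniDatabase* generated_database_ = nullptr;
MiniPool* generated_pool_ = nullptr;
std::once_flag generated_pool_init_;
int init_calls = 0;  // counts how often InitGeneratedPool() actually runs

void InitGeneratedPool() {
  ++init_calls;
  generated_database_ = new MiniDatabase;
  generated_pool_ = new MiniPool(generated_database_);
}
}  // namespace

// Every caller gets the same pool; the first caller pays for construction.
const MiniPool* GeneratedPool() {
  std::call_once(generated_pool_init_, InitGeneratedPool);
  return generated_pool_;
}
```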

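`DescriptorDatabase* fallback_database_`

Purpose:

Loads descriptors on demand from some "large" database into a DescriptorPool. Because the database is so large, calling DescriptorPool::BuildFile() on each of its proto files one by one would be inefficient. To improve efficiency, the DescriptorPool wraps the DescriptorDatabase and builds only the descriptors that are actually needed.

For each proto file the binary depends on at compile time, the descriptors it contains are not all built at process startup. Building is deferred until some descriptor is actually needed:

(1) the user calls a method such as descriptor(), GetDescriptor(), or GetReflection(), which must return a descriptor;  
(2) the user looks up a descriptor in DescriptorPool::generated_pool();  
This is also why the DescriptorPool's underlying data has to be layered.

Notes:

Once fallback_database_ is set, the pool can no longer be built with the BuildFile*() methods; only the Find*By*() methods can be used.
Because Find*By*() takes a lock, even lookups that never reach fallback_database_ become slower.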

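Purpose of `const DescriptorPool* underlay_`

Purpose of the underlay (from the source comments):

Intended for internal use only, and it may come with many subtle gotchas; the comments recommend solving the problem with DescriptorDatabases instead.

Use case:

A .proto file has to be parsed at runtime and used through DynamicMessage, and its dependencies are known to be already compiled into the binary.

On the one hand, re-parsing and reloading those dependencies should be avoided;
on the other hand, the runtime .proto must not be added to the DescriptorPool produced by generated_pool(). So instead of adding the file's contents to that global pool, a new DescriptorPool is created with the generated_pool() pool as its underlay.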

The `DescriptorPool::Find*By*()` family of functions

Purpose:

When a descriptor is looked up by name, the pool first takes a lock (MutexLockMaybe) and then, as the code shows, searches 3 layers of data in order:

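Search DescriptorPool::Tables tables_; if not found, go on to layer 2;
Search const DescriptorPool* underlay_; if not found, go on to layer 3;
Look up the corresponding proto in DescriptorDatabase* fallback_database_, use the Build*() functions of a temporarily constructed DescriptorBuilder to add the resulting descriptors to tables_, then search tables_ again;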
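A minimal sketch of the three-layer lookup. MiniPool, the map-based MiniDatabase and the "built:" prefix standing in for DescriptorBuilder are hypothetical, and descriptors are reduced to strings. Following the MutexLockMaybe idea, the sketch only creates a mutex when a fallback database is configured (as far as I recall, DescriptorPool does the same); tables_ is mutable because a const lookup may fill it from layer 3.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>
#include <string>

// Hypothetical "database": file name -> serialized content.
typedef std::map<std::string, std::string> MiniDatabase;

class MiniPool {
 public:
  MiniPool(const MiniPool* underlay, const MiniDatabase* fallback)
      : underlay_(underlay),
        fallback_database_(fallback),
        mutex_(fallback ? new std::mutex : nullptr) {}

  // Adds a file directly to this pool's own tables.
  void AddFile(const std::string& name, const std::string& content) {
    tables_[name] = content;
  }

  // Returns the content for `name`, or nullptr if no layer knows it.
  const std::string* FindFileByName(const std::string& name) const {
    std::unique_lock<std::mutex> lock;  // "maybe": only lock if a mutex exists
    if (mutex_) lock = std::unique_lock<std::mutex>(*mutex_);
    // Layer 1: this pool's own tables.
    auto it = tables_.find(name);
    if (it != tables_.end()) return &it->second;
    // Layer 2: the underlay pool (which takes its own lock, if any).
    if (underlay_ != nullptr) {
      const std::string* found = underlay_->FindFileByName(name);
      if (found != nullptr) return found;
    }
    // Layer 3: build from the fallback database into tables_, then re-check.
    if (fallback_database_ != nullptr) {
      auto db_it = fallback_database_->find(name);
      if (db_it != fallback_database_->end()) {
        ++builds_from_database_;
        tables_[name] = "built:" + db_it->second;
        return &tables_.find(name)->second;
      }
    }
    return nullptr;
  }

  int builds_from_database() const { return builds_from_database_; }

 private:
  const MiniPool* underlay_;
  const MiniDatabase* fallback_database_;
  std::unique_ptr<std::mutex> mutex_;
  mutable std::map<std::string, std::string> tables_;
  mutable int builds_from_database_ = 0;
};
```

The second lookup of a file that came from the database is answered by layer 1, so each file is built at most once.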

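Q: Why is a DescriptorBuilder constructed on the spot here?
A: The lock is there for layer 3, DescriptorDatabase* fallback_database_, because loading from it means the pool's data may be read and written concurrently.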

Class DescriptorBuilder

Wraps a DescriptorPool and provides descriptor construction. Its main public interface is DescriptorBuilder::BuildFile(), which builds a FileDescriptor from a FileDescriptorProto.

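DescriptorProto family of classes

The DescriptorProto classes are defined in descriptor.proto and describe the prototypes (protos) of the types that protobuf generates.

There are 7 kinds of proto in total:

FileDescriptorProto	describes a file
DescriptorProto	describes a message
FieldDescriptorProto	describes a field
EnumDescriptorProto	describes an enum
EnumValueDescriptorProto	describes an enum value
ServiceDescriptorProto	describes a service
MethodDescriptorProto	describes a service method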

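Class FileDescriptorTables

The tables belonging to a single proto file. They are fixed once the file has been loaded, so they need no mutex protection; this makes operations that depend on a single file (for example Descriptor::FindFieldByName()) lock-free.
FileDescriptorTables and DescriptorPool::Tables used to be defined as one class.
Google's source carries a comment to that effect: // For historical reasons,xxxxxx.

It contains the following data structures: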

SymbolsByParentMap    symbols_by_parent_;
FieldsByNameMap       fields_by_lowercase_name_;
FieldsByNameMap       fields_by_camelcase_name_;
FieldsByNumberMap     fields_by_number_;       // Not including extensions.
EnumValuesByNumberMap enum_values_by_number_;
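These lookups can be lock-free because every map is filled once, in the constructor, and only read afterwards through const methods. MiniFileTables is a hypothetical sketch with fields reduced to (name, number) pairs; ToCamelCase() and ToLower() are simplified re-implementations, not protobuf's own helpers. Copying is disabled because the maps point into the object's own field vector.

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a field: name as written in .proto.
struct MiniField {
  std::string name;
  int number;
};

// Simplified: drop each '_' and upper-case the letter that follows it.
std::string ToCamelCase(const std::string& name) {
  std::string result;
  bool capitalize_next = false;
  for (char c : name) {
    if (c == '_') {
      capitalize_next = true;
    } else if (capitalize_next) {
      result += static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
      capitalize_next = false;
    } else {
      result += c;
    }
  }
  return result;
}

std::string ToLower(std::string name) {
  for (char& c : name) {
    c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
  }
  return name;
}

class MiniFileTables {
 public:
  // All tables are built here, once; nothing mutates them afterwards,
  // so concurrent readers need no mutex.
  explicit MiniFileTables(const std::vector<MiniField>& fields)
      : fields_(fields) {
    for (const MiniField& f : fields_) {
      fields_by_lowercase_name_[ToLower(f.name)] = &f;
      fields_by_camelcase_name_[ToCamelCase(f.name)] = &f;
      fields_by_number_[f.number] = &f;
    }
  }
  MiniFileTables(const MiniFileTables&) = delete;
  MiniFileTables& operator=(const MiniFileTables&) = delete;

  const MiniField* FindFieldByLowercaseName(const std::string& name) const {
    return Lookup(fields_by_lowercase_name_, name);
  }
  const MiniField* FindFieldByCamelcaseName(const std::string& name) const {
    return Lookup(fields_by_camelcase_name_, name);
  }
  const MiniField* FindFieldByNumber(int number) const {
    return Lookup(fields_by_number_, number);
  }

 private:
  template <typename Map, typename Key>
  static const MiniField* Lookup(const Map& map, const Key& key) {
    auto it = map.find(key);
    return it == map.end() ? nullptr : it->second;
  }

  const std::vector<MiniField> fields_;
  std::map<std::string, const MiniField*> fields_by_lowercase_name_;
  std::map<std::string, const MiniField*> fields_by_camelcase_name_;
  std::map<int, const MiniField*> fields_by_number_;
};
```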

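Class DescriptorDatabase

An interface class for loading descriptors on demand from some "large" database into a DescriptorPool. Because the database is so large, calling DescriptorPool::BuildFile() on each of its proto files one by one would be inefficient.
To improve efficiency, the DescriptorPool wraps the DescriptorDatabase and builds only the descriptors that are actually needed.

It has 4 subclasses, which provide lookup of a file_descriptor_proto by name (note: a file_descriptor_proto, not a file_descriptor).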

Class SimpleDescriptorDatabase

Indexes file_name -> file_descriptor_proto*, takes ownership of the file_descriptor_proto* objects it indexes, and provides add()/find() interfaces. SimpleDescriptorDatabase is not used inside protobuf itself.

Internal implementation:

SimpleDescriptorDatabase::DescriptorIndex<const FileDescriptorProto*> index_ manages the index;
vector<const FileDescriptorProto*> files_to_delete_ tracks the "deep-copied" protos that must be freed;
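A sketch of the interface plus the simple owning implementation. MiniFileProto stands in for FileDescriptorProto, and only FindFileByName() is modeled (the real DescriptorDatabase also finds files by symbol and by extension); std::unique_ptr plays the role of files_to_delete_, and the signatures are simplified.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for FileDescriptorProto.
struct MiniFileProto {
  std::string name;
  std::vector<std::string> message_types;
};

// Interface: look up a *proto* (not a built descriptor) by file name.
class MiniDescriptorDatabase {
 public:
  virtual ~MiniDescriptorDatabase() = default;
  virtual bool FindFileByName(const std::string& filename,
                              MiniFileProto* output) = 0;
};

class MiniSimpleDatabase : public MiniDescriptorDatabase {
 public:
  // Deep-copies `file`; the database owns the copy. Rejects duplicates.
  bool Add(const MiniFileProto& file) {
    if (index_.count(file.name)) return false;
    files_.emplace_back(new MiniFileProto(file));
    index_[file.name] = files_.back().get();
    return true;
  }

  bool FindFileByName(const std::string& filename,
                      MiniFileProto* output) override {
    auto it = index_.find(filename);
    if (it == index_.end()) return false;
    *output = *it->second;  // hand out a copy, keep ownership
    return true;
  }

 private:
  std::map<std::string, const MiniFileProto*> index_;
  std::vector<std::unique_ptr<MiniFileProto>> files_;  // owned copies
};
```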

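Class SimpleDescriptorDatabase::DescriptorIndex

Internal implementation:

a map<string, Value> maps each file name -> Value;
a map<string, Value> maps each symbol the file declares -> Value, where a symbol can be a message, enum, or service defined in the file;
a map keyed by (containing type name, extension field number) maps each extension the file declares -> Value;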
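A sketch of the three indexes as a class template. MiniFileInfo is a hypothetical, flattened view of a file; the (extendee, number) extension key and the prefix match in FindSymbol() (so that a nested name such as pkg.Outer.Inner resolves through its top-level symbol pkg.Outer) follow the real DescriptorIndex as far as I recall, and everything else is assumed.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, flattened view of what a file declares.
struct MiniFileInfo {
  std::string name;                                     // e.g. "foo.proto"
  std::vector<std::string> symbols;                     // top-level full names
  std::vector<std::pair<std::string, int>> extensions;  // (extendee, number)
};

template <typename Value>
class MiniDescriptorIndex {
 public:
  bool AddFile(const MiniFileInfo& file, Value value) {
    if (!by_name_.insert({file.name, value}).second) return false;
    for (const std::string& s : file.symbols) by_symbol_[s] = value;
    for (const auto& ext : file.extensions) by_extension_[ext] = value;
    return true;
  }

  // Each Find*() returns Value() when nothing matches.
  Value FindFile(const std::string& filename) const {
    return FindOrDefault(by_name_, filename);
  }

  // Take the greatest key <= name; accept it if it equals name or is a
  // prefix of name followed by '.'.
  Value FindSymbol(const std::string& name) const {
    auto it = by_symbol_.upper_bound(name);
    if (it == by_symbol_.begin()) return Value();
    --it;
    const std::string& key = it->first;
    bool nested = name.size() > key.size() &&
                  name.compare(0, key.size(), key) == 0 &&
                  name[key.size()] == '.';
    return (name == key || nested) ? it->second : Value();
  }

  Value FindExtension(const std::string& containing_type, int number) const {
    return FindOrDefault(by_extension_, std::make_pair(containing_type, number));
  }

 private:
  template <typename Map, typename Key>
  static Value FindOrDefault(const Map& map, const Key& key) {
    auto it = map.find(key);
    return it == map.end() ? Value() : it->second;
  }

  std::map<std::string, Value> by_name_;
  std::map<std::string, Value> by_symbol_;
  std::map<std::pair<std::string, int>, Value> by_extension_;
};
```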

Usage:

Within protobuf it is used in only these 2 places:

DescriptorIndex<const FileDescriptorProto*> index_ in class SimpleDescriptorDatabase
SimpleDescriptorDatabase::DescriptorIndex<pair<const void*, int> > index_ in class EncodedDescriptorDatabase

Class EncodedDescriptorDatabase

Functionality:

Indexes file_name -> pair<const void*, int>, where the const void* is the address of the encoded_file_descriptor string and the int is its length. A managed encoded_file_descriptor is held with one of two kinds of ownership:

owned, added through EncodedDescriptorDatabase::AddCopy();
not owned, added through EncodedDescriptorDatabase::Add();
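A sketch of the two ownership modes. MiniEncodedDatabase is hypothetical and takes the file name as an explicit argument, whereas the real class reads the name out of the encoded FileDescriptorProto bytes. Add() only records the caller's pointer, so those bytes must outlive the database; AddCopy() copies them into memory the database frees itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

class MiniEncodedDatabase {
 public:
  typedef std::pair<const void*, int> EncodedFile;  // (address, length)

  // Not owned: just remember where the caller's bytes live.
  bool Add(const std::string& name, const void* data, int size) {
    return index_.insert({name, EncodedFile(data, size)}).second;
  }

  // Owned: copy the bytes first, then index the copy.
  bool AddCopy(const std::string& name, const void* data, int size) {
    assert(size >= 0);
    if (index_.count(name)) return false;
    std::unique_ptr<char[]> copy(new char[size]);
    std::memcpy(copy.get(), data, static_cast<std::size_t>(size));
    index_[name] = EncodedFile(copy.get(), size);
    files_to_delete_.push_back(std::move(copy));
    return true;
  }

  // Returns (nullptr, 0) when the file is unknown.
  EncodedFile Find(const std::string& name) const {
    auto it = index_.find(name);
    return it == index_.end() ? EncodedFile(nullptr, 0) : it->second;
  }

 private:
  std::map<std::string, EncodedFile> index_;
  std::vector<std::unique_ptr<char[]>> files_to_delete_;  // owned copies
};
```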

Concrete usage:

Every .pb.cc generated from a proto contains a function that adds the encoded string of that proto file to the EncodedDescriptorDatabase, for example this one in protobuf/compiler/plugin.pb.cc:

void protobuf_AddDesc_google_2fprotobuf_2fcompiler_2fplugin_2eproto() {                                                   

  // …… (omitted)

  ::google::protobuf::DescriptorPool::InternalAddGeneratedFile(
    "\n%google/protobuf/compiler/plugin.proto\022"
    "\030google.protobuf.compiler\032 google/protob"
    "uf/descriptor.proto\"}\n\024CodeGeneratorRequ"
    "est\022\030\n\020file_to_generate\030\001 \003(\t\022\021\n\tparamet"
    "er\030\002 \001(\t\0228\n\nproto_file\030\017 \003(\0132$.google.pr"
    "otobuf.FileDescriptorProto\"\252\001\n\025CodeGener"
    "atorResponse\022\r\n\005error\030\001 \001(\t\022B\n\004file\030\017 \003("
    "\01324.google.protobuf.compiler.CodeGenerat"
    "orResponse.File\032>\n\004File\022\014\n\004name\030\001 \001(\t\022\027\n"
    "\017insertion_point\030\002 \001(\t\022\017\n\007content\030\017 \001(\t", 399);

  // …… (omitted)

}

::google::protobuf::DescriptorPool::InternalAddGeneratedFile() is defined as follows:

void DescriptorPool::InternalAddGeneratedFile(
    const void* encoded_file_descriptor, int size) {

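  // Every .pb.cc file generated by protobuf calls InternalAddGeneratedFile(); it runs at process startup and registers the raw bytes of the FileDescriptorProto for the .proto file.
  // Q: It runs at process startup, but where is the caller one level up? Is every proto file the binary depends on (includes) registered?
  // For each proto file the binary depends on at compile time, the descriptors it contains are not all built at process startup; building is deferred until some descriptor is actually needed:
  // (1) the user calls a method such as descriptor(), GetDescriptor(), or GetReflection(), which must return a descriptor;
  // (2) the user looks up a descriptor in DescriptorPool::generated_pool();
  //
  // When either kind of request arrives, the DescriptorPool first fetches and parses the FileDescriptorProto, then builds the corresponding FileDescriptor (and the descriptors it contains) from it.
  //
  // Because FileDescriptorProto is itself generated by protobuf from protobuf/descriptor.proto, parsing it must avoid any descriptor-based operation, to prevent deadlocks and infinite recursion.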

  InitGeneratedPoolOnce();
  GOOGLE_CHECK(generated_database_->Add(encoded_file_descriptor, size));
}
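As for the question in the comment: in protobuf 2.x generated code, as far as I recall, each .pb.cc also defines a file-scope object whose constructor calls the file's protobuf_AddDesc_xxx() function, so registration happens during static initialization, before main(); check this against your own generated files. The sketch below shows that pattern. Registry(), AddGeneratedFile() and StaticRegistrar are hypothetical names; a function-local static registry avoids depending on the initialization order of globals across translation units. The encoded bytes are a hand-made FileDescriptorProto fragment holding only the name field (tag \n, then length \021 = 17).

```cpp
#include <cassert>
#include <map>
#include <string>

// Registry of (file name -> encoded bytes); created on first use, so it is
// ready no matter which translation unit's initializer runs first.
std::map<std::string, std::string>& Registry() {
  static std::map<std::string, std::string> registry;
  return registry;
}

// Only stores the raw bytes; nothing is parsed or built here, matching the
// lazy "build on first use" behavior described above.
void AddGeneratedFile(const std::string& name, const std::string& encoded) {
  bool inserted = Registry().insert({name, encoded}).second;
  assert(inserted && "file registered twice");
  (void)inserted;
}

// What a generated .pb.cc might contain: a static object whose constructor
// registers the file before main() starts.
namespace {
struct StaticRegistrar {
  StaticRegistrar() {
    AddGeneratedFile("example/foo.proto",
                     std::string("\n\021example/foo.proto", 19));
  }
} static_registrar;
}  // namespace
```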

Internal implementation:

vector<void*> files_to_delete_: records the addresses of the encoded_file_descriptor strings the database owns and must free.

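Class DescriptorPoolDatabase

A wrapper around a single DescriptorPool. A lookup first calls the wrapped DescriptorPool to find the file_descriptor by name, then calls FileDescriptor::CopyTo() to obtain the file_descriptor_proto.

Class MergedDescriptorDatabase

A subclass of DescriptorDatabase that wraps several descriptor databases. Its structure is simple: it keeps them in a vector<DescriptorDatabase*> and queries them one by one.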
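A sketch of the merged database: it asks each wrapped database in order and returns the first hit. MiniFileProto and MiniDescriptorDatabase are hypothetical stand-ins for FileDescriptorProto and DescriptorDatabase (only FindFileByName() is modeled), and MapDatabase is a trivial hypothetical implementation used for the example.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

struct MiniFileProto {
  std::string name;
  std::string origin;  // which database answered; for illustration only
};

class MiniDescriptorDatabase {
 public:
  virtual ~MiniDescriptorDatabase() = default;
  virtual bool FindFileByName(const std::string& filename,
                              MiniFileProto* output) = 0;
};

// Trivial map-backed database used only for the example.
class MapDatabase : public MiniDescriptorDatabase {
 public:
  explicit MapDatabase(std::string origin) : origin_(std::move(origin)) {}
  void Add(const std::string& filename) { files_[filename] = origin_; }
  bool FindFileByName(const std::string& filename,
                      MiniFileProto* output) override {
    auto it = files_.find(filename);
    if (it == files_.end()) return false;
    output->name = it->first;
    output->origin = it->second;
    return true;
  }

 private:
  std::string origin_;
  std::map<std::string, std::string> files_;
};

// Does not own the wrapped databases; queries them in order.
class MiniMergedDatabase : public MiniDescriptorDatabase {
 public:
  explicit MiniMergedDatabase(std::vector<MiniDescriptorDatabase*> sources)
      : sources_(std::move(sources)) {}
  bool FindFileByName(const std::string& filename,
                      MiniFileProto* output) override {
    for (MiniDescriptorDatabase* db : sources_) {
      if (db->FindFileByName(filename, output)) return true;  // first hit wins
    }
    return false;
  }

 private:
  std::vector<MiniDescriptorDatabase*> sources_;
};
```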

[Figure: relationship diagram of the DescriptorDatabase-related classes]
