Protobuf: The Reflection Class

Class Reflection

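Reflection is an interface class that provides methods for dynamically reading and modifying the fields of a message. Call Message::GetReflection() to obtain the reflection object for a message.
These methods live in a separate class rather than in Message itself for efficiency: the vast majority of message implementations share a single Reflection implementation (GeneratedMessageReflection), and all objects of a given Message class share the same reflection object.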

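Notes:

For each field type in FieldDescriptor::TYPE_*, only the matching Get*()/Set*()/Add*() method is valid;
Repeated fields must be accessed through the GetRepeated*()/SetRepeated*() methods, which must not be mixed with the accessors for non-repeated fields;
A message object may only be operated on by its own reflection object (the one returned by message.GetReflection());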
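
The three rules above can be illustrated with a small self-contained model. This is only a sketch: ToyReflection, ToyMessage, ToyField and ToySlot are hypothetical names, not protobuf classes, and the model throws an exception on misuse purely so the behavior is easy to observe. The real library does not promise exceptions; treat misuse there as an error with undefined results.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-ins; none of these names exist in protobuf.
enum class ToyType { kInt32, kString };

struct ToyField {
  int index;      // position of the field within its message
  ToyType type;   // declared type of the field
  bool repeated;  // true for repeated fields
};

struct ToySlot {
  int32_t int_value = 0;
  std::string string_value;
};

class ToyReflection;

struct ToyMessage {
  const ToyReflection* reflection;  // the reflection object of this message type
  std::vector<ToySlot> slots;       // one storage slot per field
};

class ToyReflection {
 public:
  int32_t GetInt32(const ToyMessage& m, const ToyField& f) const {
    Check(m, f, ToyType::kInt32);
    return m.slots[f.index].int_value;
  }
  void SetInt32(ToyMessage* m, const ToyField& f, int32_t value) const {
    Check(*m, f, ToyType::kInt32);
    m->slots[f.index].int_value = value;
  }
  const std::string& GetString(const ToyMessage& m, const ToyField& f) const {
    Check(m, f, ToyType::kString);
    return m.slots[f.index].string_value;
  }

 private:
  // Enforces the three usage rules for singular accessors.
  void Check(const ToyMessage& m, const ToyField& f, ToyType expected) const {
    if (m.reflection != this)
      throw std::logic_error("message belongs to a different reflection object");
    if (f.repeated)
      throw std::logic_error("singular accessor used on a repeated field");
    if (f.type != expected)
      throw std::logic_error("accessor does not match the field type");
    assert(f.index >= 0 && static_cast<std::size_t>(f.index) < m.slots.size());
  }
};

// Returns true if calling GetString() on an int32 field is rejected.
bool MismatchedAccessorIsRejected() {
  ToyReflection r;
  ToyMessage m{&r, std::vector<ToySlot>(1)};
  ToyField age{0, ToyType::kInt32, false};
  r.SetInt32(&m, age, 18);
  try {
    r.GetString(m, age);
  } catch (const std::logic_error&) {
    return true;
  }
  return false;
}
```

The point of the sketch is that the caller, not the interface, selects the typed accessor, so every call has to be validated against the field's declared type.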

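So why is there a separate Get*()/Set*() for each FieldDescriptor::TYPE_*?
Because an abstract representation of a field of arbitrary type would need an extra wrapper object per field. Allocating those wrappers up front would enlarge the memory footprint of every message, and allocating them on demand would be expensive and prone to memory leaks. The flat interface avoids both problems.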

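Class GeneratedMessageReflection

A subclass of Reflection (and the only one in the version discussed here). Each instance serves one fixed descriptor, which is determined when the GeneratedMessageReflection object is constructed. It is the core class of the reflection mechanism.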

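Internal implementation:

To operate on any piece of data, only two things need to be known:

its memory address;
its type information;

GeneratedMessageReflection is designed in exactly this way. It manages all fields of a message as base_addr + offset[i], where offset[i] records the offset of field i within the in-memory message object, and the descriptor holds the type information of each field.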

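To process a given (message, field) pair:

Get the index of the field within the message directly from the descriptor;
Look up offset[index] and add it to the base address of the message to get the field's memory address;
Get the field's type information from the descriptor;
reinterpret_cast the address to that type to obtain the data.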
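
The lookup steps above can be sketched in a few lines. This is a simplified model, not protobuf code: ToyStudent, ToyFieldInfo, ToyCppType, kToyStudentOffsets, GetRawField and ReadAsDouble are all hypothetical names, and the standard offsetof() is used here only because the toy struct is a plain aggregate.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical message class; a real generated class has more members.
struct ToyStudent {
  int32_t age;
  double score;
};

enum class ToyCppType { kInt32, kDouble };

// Stand-in for the per-field information a descriptor provides.
struct ToyFieldInfo {
  int index;        // position of the field within the message
  ToyCppType type;  // type used to interpret the bytes
};

// offsets[i] is the byte offset of field i inside ToyStudent.
static const int kToyStudentOffsets[2] = {
    static_cast<int>(offsetof(ToyStudent, age)),
    static_cast<int>(offsetof(ToyStudent, score)),
};

// Base address + offset gives the field address; the cast applies the type.
template <typename T>
const T& GetRawField(const void* base, const int* offsets, int index) {
  const void* ptr = static_cast<const unsigned char*>(base) + offsets[index];
  return *static_cast<const T*>(ptr);
}

// Reads any numeric field as double, dispatching on the type information.
double ReadAsDouble(const ToyStudent& s, const ToyFieldInfo& f) {
  assert(f.index >= 0 && f.index < 2);
  switch (f.type) {
    case ToyCppType::kInt32:
      return GetRawField<int32_t>(&s, kToyStudentOffsets, f.index);
    case ToyCppType::kDouble:
      return GetRawField<double>(&s, kToyStudentOffsets, f.index);
  }
  return 0.0;
}
```

Note that the cast is only safe because the type recorded for each field matches what is actually stored at that offset; the descriptor plays that role in the real implementation.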

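The core data passed in when constructing a GeneratedMessageReflection object is:

descriptor: pointer to the descriptor of the message being managed;
offsets: the offsets of all field members of the message class within the in-memory message object;
has_bits_offset: the offset of the bitmap that records whether each field is present (the bitmap is a member of the message subclass; the value passed is the offset of its element 0, _has_bits_[0]); the bitmap is ultimately used to tell whether an optional field is present;
unknown_fields_offset: analogous to has_bits_offset; the offset of the member that stores unknown fields;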

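Fields have different types, so the void* must be converted to the corresponding type.

primitive and string fields are represented directly by the corresponding primitive type or by string*;
a singular Message field is stored as a pointer to Message;
an Enum field is stored as an int, which serves as the input to EnumDescriptor::FindValueByNumber();
Repeated fields (see the chapter "repeated fields" for details):
string and Message elements use RepeatedPtrField;
all other primitive elements use RepeatedField;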

Example:

Every .pb.cc file contains a GeneratedMessageReflection object for each message it defines. For example, for message CodeGeneratorRequest in protobuf/compiler/plugin.proto, protobuf/compiler/plugin.pb.cc contains:

namespace {

const ::google::protobuf::Descriptor* CodeGeneratorRequest_descriptor_ = NULL;
const ::google::protobuf::internal::GeneratedMessageReflection*
  CodeGeneratorRequest_reflection_ = NULL;

// ... (omitted)

}  // namespace

// ... (omitted)

void protobuf_AssignDesc_google_2fprotobuf_2fcompiler_2fplugin_2eproto() {

  // ... (omitted)

  // CodeGeneratorRequest contains these 3 fields
  static const int CodeGeneratorRequest_offsets_[3] = {
    GOOGLE_PROTOBUF_GENERATED_MESSAGE_FIELD_OFFSET(CodeGeneratorRequest, file_to_generate_),
    GOOGLE_PROTOBUF_GENERATED_MESSAGE_FIELD_OFFSET(CodeGeneratorRequest, parameter_),
    GOOGLE_PROTOBUF_GENERATED_MESSAGE_FIELD_OFFSET(CodeGeneratorRequest, proto_file_),
  };

  CodeGeneratorRequest_reflection_ =
    new ::google::protobuf::internal::GeneratedMessageReflection(
      CodeGeneratorRequest_descriptor_,
      CodeGeneratorRequest::default_instance_,
      CodeGeneratorRequest_offsets_,
      GOOGLE_PROTOBUF_GENERATED_MESSAGE_FIELD_OFFSET(CodeGeneratorRequest, _has_bits_[0]),
      GOOGLE_PROTOBUF_GENERATED_MESSAGE_FIELD_OFFSET(CodeGeneratorRequest, _unknown_fields_),
      -1,  // extensions_offset; -1 because this message has no extension ranges
      ::google::protobuf::DescriptorPool::generated_pool(),
      ::google::protobuf::MessageFactory::generated_factory(),
      sizeof(CodeGeneratorRequest));

  // ... (omitted)

}

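The `GOOGLE_PROTOBUF_GENERATED_MESSAGE_FIELD_OFFSET` macro

The GOOGLE_PROTOBUF_GENERATED_MESSAGE_FIELD_OFFSET macro computes the offset of a field within the memory of the type that contains it.

Q: In the .pb.h definitions, the fields are private members of the message subclass. Why can private members be accessed through "->" here?

A: The function protobuf_AssignDesc_google_2fprotobuf_2fcompiler_2fplugin_2eproto() is declared as a friend of each message subclass (the declaration sits in the private section).

The code comments here carry a lot of information. protobuf documents its key points in great detail, which is worth learning from.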

// Returns the offset of the given field within the given aggregate type.
// This is equivalent to the ANSI C offsetof() macro.  However, according
// to the C++ standard, offsetof() only works on POD types, and GCC
// enforces this requirement with a warning.  In practice, this rule is
// unnecessarily strict; there is probably no compiler or platform on
// which the offsets of the direct fields of a class are non-constant.
// Fields inherited from superclasses *can* have non-constant offsets,
// but that's not what this macro will be used for.
//
// Note that we calculate relative to the pointer value 16 here since if we
// just use zero, GCC complains about dereferencing a NULL pointer.  We
// choose 16 rather than some other number just in case the compiler would
// be confused by an unaligned pointer.
#define GOOGLE_PROTOBUF_GENERATED_MESSAGE_FIELD_OFFSET(TYPE, FIELD)    \
  static_cast<int>(                                           \
    reinterpret_cast<const char*>(                            \
      &reinterpret_cast<const TYPE*>(16)->FIELD) -            \
    reinterpret_cast<const char*>(16))
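
To see that the pointer arithmetic really yields the same value as offsetof(), the technique can be tried on a plain struct. This is a sketch only: ToyPoint and TOY_FIELD_OFFSET are hypothetical names, the macro body simply repeats the technique shown above, and forming a member address from the fake pointer value 16 relies on the same in-practice compiler behavior that the original comment describes.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical class used only for this demonstration.
struct ToyPoint {
  char tag;
  int x;
  double y;
};

// Same technique as the macro above, under a hypothetical name:
// take the address of FIELD through a fake object at address 16,
// then subtract 16 to get the distance from the start of the object.
#define TOY_FIELD_OFFSET(TYPE, FIELD)                    \
  static_cast<int>(                                      \
      reinterpret_cast<const char*>(                     \
          &reinterpret_cast<const TYPE*>(16)->FIELD) -   \
      reinterpret_cast<const char*>(16))

int OffsetOfX() { return TOY_FIELD_OFFSET(ToyPoint, x); }
int OffsetOfY() { return TOY_FIELD_OFFSET(ToyPoint, y); }

// Compares the macro with the standard offsetof(), which is valid here
// because ToyPoint is a standard-layout type.
bool ToyOffsetsMatchOffsetof() {
  assert(OffsetOfX() > 0);  // x comes after tag, so it cannot be at offset 0
  return OffsetOfX() == static_cast<int>(offsetof(ToyPoint, x)) &&
         OffsetOfY() == static_cast<int>(offsetof(ToyPoint, y));
}
```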

Worked example

The following uses primitive types to show how GeneratedMessageReflection manages fields of different types.

On the read side:

// Template implementations of basic accessors.  Inline because each
// template instance is only called from one location.  These are
// used for all types except messages.
template <typename Type>
inline const Type& GeneratedMessageReflection::GetField(
    const Message& message, const FieldDescriptor* field) const {
  return GetRaw<Type>(message, field);
}

Starting from the base address of the message, add the field's offset within the message object to get the field's address, then reinterpret_cast it to Type (a primitive type).

// These simple template accessors obtain pointers (or references) to
// the given field.
template <typename Type>
inline const Type& GeneratedMessageReflection::GetRaw(                                                                    
    const Message& message, const FieldDescriptor* field) const {
  const void* ptr = reinterpret_cast<const uint8*>(&message) +
                    offsets_[field->index()];
  return *reinterpret_cast<const Type*>(ptr);
}

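The offsets_[] array here is the per-field offset array passed to the constructor GeneratedMessageReflection::GeneratedMessageReflection() (CodeGeneratorRequest_offsets_[3] in the example above). field->index() is the position of the field in its parent's array of children, and is implemented as follows: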

inline int FieldDescriptor::index() const {
  // ... (omitted)
  return this - containing_type_->fields_;
  // ... (omitted)
}
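
The pointer subtraction works because all field descriptors of one message are stored in a single contiguous array. A minimal sketch of that idea follows; ToyFieldDesc and ToyMessageDesc are hypothetical types, and the branches elided from the real index() are not modeled.

```cpp
#include <cassert>
#include <cstddef>

struct ToyMessageDesc;

// Hypothetical stand-in for a field descriptor.
struct ToyFieldDesc {
  const char* name;
  const ToyMessageDesc* containing_type;
  int index() const;
};

// Hypothetical stand-in for a message descriptor.
struct ToyMessageDesc {
  const ToyFieldDesc* fields;  // contiguous array of this message's fields
  int field_count;
};

// A field's index is its distance from the start of the parent's array.
inline int ToyFieldDesc::index() const {
  const std::ptrdiff_t i = this - containing_type->fields;
  assert(i >= 0 && i < containing_type->field_count);
  return static_cast<int>(i);
}
```

No index needs to be stored per field with this design; the position in the array is the index.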

On the write side:

template <typename Type>
inline void GeneratedMessageReflection::SetField(
    Message* message, const FieldDescriptor* field, const Type& value) const {
  *MutableRaw<Type>(message, field) = value;
  SetBit(message, field);
}

template <typename Type>
inline Type* GeneratedMessageReflection::MutableRaw(
    Message* message, const FieldDescriptor* field) const {
  void* ptr = reinterpret_cast<uint8*>(message) + offsets_[field->index()];
  return reinterpret_cast<Type*>(ptr);
}

Here has_bits_offset_ locates a bitmap in which one bit per field records whether that field is present, so presence can be checked quickly:

inline void GeneratedMessageReflection::SetBit(
    Message* message, const FieldDescriptor* field) const {
  MutableHasBits(message)[field->index() / 32] |= (1 << (field->index() % 32));
}

inline uint32* GeneratedMessageReflection::MutableHasBits(
    Message* message) const {
  void* ptr = reinterpret_cast<uint8*>(message) + has_bits_offset_;
  return reinterpret_cast<uint32*>(ptr);
}
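
The write path and the presence bitmap can be put together in a small self-contained model. This is a sketch with assumptions: ToyMsg, kToyMsgOffsets, kToyHasBitsOffset and the free functions below are hypothetical, HasBit() is added only so the effect of SetBit() can be observed, and the sketch shifts an unsigned 1 so that bit 31 is well defined.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical message class with two fields and a presence bitmap.
struct ToyMsg {
  int32_t id;
  double score;
  uint32_t has_bits[1];
};

static const int kToyMsgOffsets[2] = {
    static_cast<int>(offsetof(ToyMsg, id)),
    static_cast<int>(offsetof(ToyMsg, score)),
};
static const int kToyHasBitsOffset = static_cast<int>(offsetof(ToyMsg, has_bits));

// Locates the bitmap through its offset, as the reflection object does.
uint32_t* MutableHasBits(ToyMsg* m) {
  void* ptr = reinterpret_cast<unsigned char*>(m) + kToyHasBitsOffset;
  return static_cast<uint32_t*>(ptr);
}

// Word index is index / 32, bit position within the word is index % 32.
void SetBit(ToyMsg* m, int index) {
  MutableHasBits(m)[index / 32] |= (uint32_t{1} << (index % 32));
}

bool HasBit(const ToyMsg& m, int index) {
  return (m.has_bits[index / 32] & (uint32_t{1} << (index % 32))) != 0;
}

// Writes the value at base + offset, then marks the field as present.
template <typename T>
void SetField(ToyMsg* m, int index, const T& value) {
  assert(index >= 0 && index < 2);
  void* ptr = reinterpret_cast<unsigned char*>(m) + kToyMsgOffsets[index];
  *static_cast<T*>(ptr) = value;
  SetBit(m, index);
}
```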

For the implementation of RepeatedPtrField / RepeatedField, see the repeated_field.* files; details are in the chapter "repeated fields".

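Memory layout

Here is a concrete example of how offset[] works.

The figure below goes from the definition of message Student to the corresponding class Student, and then to the offsets and the memory layout of a Student object:

[Figure: message Student, class Student, its offsets, and the memory layout of a Student object]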

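Memory layout of a Message object

Different types are stored in memory in different ways:

primitive types: the value itself is stored directly in the object;
string type: a string* (the address of the string) is stored;
repeated<message> type: a RepeatedPtrField<message> object is stored, which uses two-level memory management; the first level is an array<void*>, and each void* is the address of an actual message object;
repeated<primitive> type: a RepeatedField<primitive> object is stored, which internally manages an array of the primitive values themselves;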
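
The difference between the two repeated layouts can be modeled roughly as follows. ToyRepeatedField and ToyRepeatedPtrField are hypothetical simplifications: they use std::vector for brevity, and the pointer variant here does not own its elements, which is an assumption made only to keep the sketch short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Flat layout: the array holds the values themselves.
struct ToyRepeatedField {
  std::vector<int32_t> elements;
};

// Two-level layout: the array holds untyped pointers, and each element
// lives somewhere else in memory.
struct ToyRepeatedPtrField {
  std::vector<void*> elements;

  // The caller supplies the element type, just as the reflection layer
  // supplies it from the descriptor.
  template <typename T>
  const T& Get(int i) const {
    assert(i >= 0 && static_cast<std::size_t>(i) < elements.size());
    return *static_cast<const T*>(elements[i]);
  }
};
```

A flat array suits small fixed-size values, while the pointer array lets variable-size elements such as strings and messages stay at stable addresses.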

The Student class contains members of several types; the lookup process for each is shown in the figure below: