Question
Whenever I find myself needing to serialize objects in a C++ program, I fall back to this kind of pattern:
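    class Serializable {
    public:
        // Reads the class ID from the stream and constructs the matching subclass.
        static Serializable *deserialize(istream &is) {
            int id;
            is >> id;
            switch(id) {
                case EXAMPLE_ID:
                    return new ExampleClass(is);
                //...
            }
        }

        // Writes the class ID first, then lets the subclass write its own data.
        void serialize(ostream &os) {
            os << getClassID();
            serializeMe(os);
        }

    protected:
        virtual int getClassID()=0;
        virtual void serializeMe(ostream &os)=0;
    };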
The above works pretty well in practice. However, I've heard that this kind of switching over class IDs is evil and an antipattern; what's the standard, OO-way of handling serialization in C++?
Recommended answer
Boost Serialization, while by no means a standard, is (for the most part) a very well-written library that does the grunt work for you.
The last time I had to manually parse a predefined record structure with a clear inheritance tree, I ended up using the factory pattern with registrable classes (i.e. using a map of keys to (template) creator functions rather than a lot of switch statements) to try to avoid the issue you were having.
EDIT
A basic C++ implementation of the object factory mentioned in the paragraph above.
    #include <cstddef>
    #include <map>
    #include <utility>

    /**
     * A class for creating objects, with the type of object created based on a key
     *
     * @param K the key
     * @param T the super class that all created classes derive from
     */
    template<typename K, typename T>
    class Factory {
    private:
        typedef T *(*CreateObjectFunc)();

        /**
         * Maps keys (K) to functions (CreateObjectFunc).
         * When creating a new type, we simply call the function with the required key
         */
        std::map<K, CreateObjectFunc> mObjectCreator;

        /**
         * Pointers to this function are inserted into the map and called when creating objects
         *
         * @param S the type of class to create
         * @return an object with the type of S
         */
        template<typename S>
        static T* createObject(){
            return new S();
        }

    public:
        /**
         * Registers a class so that it can be created via createObject()
         *
         * @param S the class to register, this must be a subclass of T
         * @param id the id to associate with the class. This ID must be unique
         */
        template<typename S>
        void registerClass(K id){
            if (mObjectCreator.find(id) != mObjectCreator.end()){
                //your error handling here
            }
            // make_pair is called without explicit template arguments,
            // which no longer compile for lvalues since C++11
            mObjectCreator.insert( std::make_pair(id, static_cast<CreateObjectFunc>(&createObject<S>)) );
        }

        /**
         * Returns true if a given key exists
         *
         * @param id the id to check exists
         * @return true if the id exists
         */
        bool hasClass(K id){
            return mObjectCreator.find(id) != mObjectCreator.end();
        }

        /**
         * Creates an object based on an id. It will return null if the key doesn't exist
         *
         * @param id the id of the object to create
         * @return the new object or null if the object id doesn't exist
         */
        T* createObject(K id){
            //Don't use hasClass here as doing so would involve two lookups
            typename std::map<K, CreateObjectFunc>::iterator iter = mObjectCreator.find(id);
            if (iter == mObjectCreator.end()){
                return NULL;
            }
            //calls the required createObject() function
            return ((*iter).second)();
        }
    };
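To tie the factory back to the question, here is a minimal sketch of how a registry of creator functions could replace the switch in deserialize(). The Shape, Circle, Square and ShapeRegistry names are hypothetical, the stream format (an integer id followed by one integer field) is an assumption, and the sketch uses C++11 features (std::unique_ptr, nullptr) that the factory above does not need.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <memory>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical base class standing in for Serializable from the question.
struct Shape {
    virtual ~Shape() {}
    virtual std::string name() const = 0;
};

struct Circle : Shape {
    int radius;
    explicit Circle(std::istream& is) : radius(0) { is >> radius; }
    std::string name() const override { return "Circle"; }
};

struct Square : Shape {
    int side;
    explicit Square(std::istream& is) : side(0) { is >> side; }
    std::string name() const override { return "Square"; }
};

// Maps a class id to a function that builds the object from the stream.
class ShapeRegistry {
public:
    typedef std::unique_ptr<Shape> (*Creator)(std::istream&);

    // Returns false if the id was already registered (the old entry is kept).
    template <typename S>
    bool registerClass(int id) {
        return creators_.insert(std::make_pair(id, static_cast<Creator>(&create<S>))).second;
    }

    // Reads the id, then dispatches through the map; null for unknown ids.
    std::unique_ptr<Shape> deserialize(std::istream& is) const {
        int id = 0;
        if (!(is >> id)) return nullptr;
        std::map<int, Creator>::const_iterator it = creators_.find(id);
        if (it == creators_.end()) return nullptr;
        return it->second(is);
    }

private:
    template <typename S>
    static std::unique_ptr<Shape> create(std::istream& is) {
        return std::unique_ptr<Shape>(new S(is));
    }

    std::map<int, Creator> creators_;
};

// Usage: decode one object from text and report which class was built.
std::string decodeName(const std::string& data) {
    ShapeRegistry registry;
    registry.registerClass<Circle>(1);
    registry.registerClass<Square>(2);
    std::istringstream in(data);
    std::unique_ptr<Shape> shape = registry.deserialize(in);
    return shape ? shape->name() : std::string("unknown");
}
```

Adding a new class then only requires one registerClass call; the dispatch code itself never changes.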
Other recommended answer
Serialization is a touchy topic in C++...
Quick question: which of these do you need?
- Serialization: short-lived structure, one encoder/decoder
- Messaging: longer life, encoders / decoders in multiple languages
Both are useful and have their place.
Boost.Serialization is usually the most recommended library for serialization, though its odd choice of operator&, which serializes or deserializes depending on whether the archive is saving or loading, is really an abuse of operator overloading to me.
For messaging, I would rather suggest Google Protocol Buffers. They offer a clean syntax for describing messages and generate encoders and decoders for a wide variety of languages. There is another advantage when performance matters: the design allows lazy deserialization (i.e. decoding only part of the blob at a time).
Moving on
Now, as for the details of implementation, it really depends on what you want.
- You need versioning. Even for regular serialization, you'll probably need backward compatibility with the previous version anyway.
- You may or may not need a tag + factory system. It's only necessary for polymorphic classes, and then you will need one factory per inheritance tree (kind)... the code can be templatized, of course!
- Pointers / references are going to bite you in the ass... they refer to a position in memory that changes after deserialization. I usually take a tangential approach: each object of each kind is given an id, unique within its kind, and I serialize the id rather than the pointer. Some frameworks handle this for you, as long as you don't have circular dependencies and you serialize the pointed-to / referenced objects first.
Personally, I try as much as I can to separate the serialization / deserialization code from the actual code that runs the class. In particular, I try to isolate it in the source files so that changes to this part of the code do not break binary compatibility.
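The id-instead-of-pointer idea from the last bullet can be sketched as follows. Everything here is an assumption made for illustration: the Department, Employee and Company types, the whitespace-separated text format, and names that contain no spaces. The pointed-to objects are written first, so the reader can resolve each id as soon as it meets it.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <map>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

struct Department { int id; std::string name; };
struct Employee { std::string name; const Department* dept; };

// Departments go first; each employee stores its department's id, not its address.
void save(std::ostream& os, const std::vector<Department>& depts,
          const std::vector<Employee>& staff) {
    os << depts.size() << '\n';
    for (const Department& d : depts) os << d.id << ' ' << d.name << '\n';
    os << staff.size() << '\n';
    for (const Employee& e : staff) os << e.name << ' ' << e.dept->id << '\n';
}

struct Company {
    std::map<int, Department> depts;  // std::map keeps element addresses stable
    std::vector<Employee> staff;
};

// Rebuilds the pointers by looking up each saved id in the freshly loaded map.
bool load(std::istream& is, Company& out) {
    std::size_t count = 0;
    if (!(is >> count)) return false;
    for (std::size_t i = 0; i < count; ++i) {
        Department d;
        if (!(is >> d.id >> d.name)) return false;
        out.depts[d.id] = d;
    }
    if (!(is >> count)) return false;
    for (std::size_t i = 0; i < count; ++i) {
        Employee e;
        int deptId = 0;
        if (!(is >> e.name >> deptId)) return false;
        std::map<int, Department>::const_iterator it = out.depts.find(deptId);
        if (it == out.depts.end()) return false;  // id refers to nothing we loaded
        e.dept = &it->second;
        out.staff.push_back(e);
    }
    return true;
}

// Usage: round-trip two employees and follow the restored pointer.
std::string demoRoundTrip() {
    std::vector<Department> depts = { {1, "Research"}, {2, "Sales"} };
    std::vector<Employee> staff = { {"Ada", &depts[0]}, {"Bob", &depts[1]} };
    std::stringstream buffer;
    save(buffer, depts, staff);
    Company loaded;
    if (!load(buffer, loaded)) return "load failed";
    return loaded.staff[1].name + "@" + loaded.staff[1].dept->name;
}
```

Note that copying a Company would leave the employees pointing into the original map, so a real design would need to handle that.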
On versioning
I usually try to keep the serialization and deserialization code for a given version close together. It's easier to check that they are truly symmetric. I also try to abstract version handling (plus a few other things) directly in my serialization framework, because DRY should be adhered to :)
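One way to keep each version's save and load side by side is a small struct per version, as in this sketch. The Point type, its version history (z added in version 2) and the text format are all made up for the example; a real framework would hide the version dispatch.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>

struct Point { int x; int y; int z; };  // hypothetical: z was added in version 2

// Version 1 layout: x and y only. Save and load sit next to each other.
struct PointV1 {
    static void save(std::ostream& os, const Point& p) { os << p.x << ' ' << p.y; }
    static void load(std::istream& is, Point& p) { is >> p.x >> p.y; p.z = 0; }
};

// Version 2 layout: x, y and z.
struct PointV2 {
    static void save(std::ostream& os, const Point& p) { os << p.x << ' ' << p.y << ' ' << p.z; }
    static void load(std::istream& is, Point& p) { is >> p.x >> p.y >> p.z; }
};

// The version number is written first so the reader knows which layout follows.
void savePoint(std::ostream& os, const Point& p) {
    os << 2 << ' ';
    PointV2::save(os, p);
}

// Reads the version, then hands over to the matching loader.
Point loadPoint(std::istream& is) {
    int version = 0;
    if (!(is >> version)) throw std::runtime_error("Stream is corrupted: no version number");
    Point p = {0, 0, 0};
    switch (version) {
        case 1: PointV1::load(is, p); break;
        case 2: PointV2::load(is, p); break;
        default:
            throw std::runtime_error("No version " + std::to_string(version) +
                                     " for object Point: only 1 and 2");
    }
    if (!is) {
        throw std::runtime_error("Point (version " + std::to_string(version) +
                                 ") was not completely deserialized");
    }
    return p;
}
```

Old data written with version 1 still loads, with z defaulting to 0.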
On error-handling
To ease error detection, I usually use a pair of 'markers' (special bytes) to separate one object from another. It allows me to throw immediately during deserialization, because I can detect desynchronization of the stream (i.e. something ate too many bytes or did not eat enough).
If you want permissive deserialization, i.e. deserializing the rest of the stream even if something failed earlier, you'll have to move toward byte counts: each object is preceded by its byte count and can only eat that many bytes (and is expected to eat them all). This approach is nice because it allows partial deserialization: you can save the part of the stream required for an object and only deserialize it if necessary.
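A minimal sketch of the marker idea, assuming a text stream and two arbitrarily chosen control bytes (0x02 and 0x03) as begin and end markers. The Pair type and function names are hypothetical.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>

// Marker bytes chosen for this sketch; any values that cannot occur in the data work.
const char kBegin = '\x02';
const char kEnd = '\x03';

struct Pair { int a; int b; };

// Consumes exactly one byte and throws if it is not the expected marker.
void expectMarker(std::istream& is, char marker, const char* what) {
    char c = 0;
    if (!is.get(c) || c != marker) {
        throw std::runtime_error(std::string("Stream is corrupted: missing ") + what + " marker");
    }
}

// Each object is wrapped as: kBegin, fields, kEnd.
void writePair(std::ostream& os, const Pair& p) {
    os << kBegin << p.a << ' ' << p.b << kEnd;
}

// Throws as soon as the stream is out of step with what the reader expects.
Pair readPair(std::istream& is) {
    expectMarker(is, kBegin, "begin");
    Pair p = {0, 0};
    if (!(is >> p.a >> p.b)) throw std::runtime_error("Stream is corrupted: bad Pair fields");
    expectMarker(is, kEnd, "end");
    return p;
}
```

If the writer emitted three fields where the reader expects two, the end-marker check fails right away instead of silently corrupting the next object.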
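The byte-count approach can be sketched like this, assuming a simple "length:payload" text framing invented for the example. Because each record's extent is known up front, a payload that fails to decode is skipped and the following records are still read.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Frames a payload as "<byte count>:<payload>".
void writeRecord(std::ostream& os, const std::string& payload) {
    os << payload.size() << ':' << payload;
}

// Extracts exactly one framed payload; returns false at end of stream or on a bad frame.
// Sketch only: a real reader should also cap the byte count before allocating.
bool readRecord(std::istream& is, std::string& payload) {
    std::size_t count = 0;
    char separator = 0;
    if (!(is >> count) || !is.get(separator) || separator != ':') return false;
    payload.resize(count);
    if (count == 0) return true;
    is.read(&payload[0], static_cast<std::streamsize>(count));
    return is.gcount() == static_cast<std::streamsize>(count);
}

// Permissive decoding: every payload should hold one integer and nothing else.
// Bad payloads are counted and skipped; the rest of the stream is still decoded.
std::vector<int> decodeAll(std::istream& is, int& failures) {
    std::vector<int> values;
    failures = 0;
    std::string payload;
    while (readRecord(is, payload)) {
        std::istringstream body(payload);
        int value = 0;
        char extra = 0;
        if ((body >> value) && !(body >> extra)) {
            values.push_back(value);   // payload fully consumed
        } else {
            ++failures;                // bad payload, but framing is intact
        }
    }
    return values;
}
```

The same framing supports lazy decoding: the payload string returned by readRecord can simply be stored and parsed only when someone asks for it.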
Tagging (your class IDs) is useful here, not (only) for dispatching, but simply to check that you are actually deserializing the right type of object. It also allows for pretty error messages.
Here are some error messages / exceptions you may want:
- No version X for object TYPE: only Y and Z
- Stream is corrupted: here are the next few bytes BBBBBBBBBBBBBBBBBBB
- TYPE (version X) was not completely deserialized
- Trying to deserialize a TYPE1 in TYPE2
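As an illustration of tagging used as a sanity check, the sketch below writes a type name in front of the object and verifies it on load, producing messages in the style listed above. The DeserializationError class, the Temperature type and the text format are assumptions made for the example.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical exception type for all decoding failures.
class DeserializationError : public std::runtime_error {
public:
    explicit DeserializationError(const std::string& message) : std::runtime_error(message) {}
};

struct Temperature { double celsius; };

// The tag (here simply the type name) is written before the data.
void saveTemperature(std::ostream& os, const Temperature& t) {
    os << "Temperature " << t.celsius;
}

// The tag is checked before any field is read, so a mismatch gives a clear message.
Temperature loadTemperature(std::istream& is) {
    std::string tag;
    if (!(is >> tag)) throw DeserializationError("Stream is corrupted: no type tag found");
    if (tag != "Temperature") {
        throw DeserializationError("Trying to deserialize a " + tag + " in Temperature");
    }
    Temperature t = {0.0};
    if (!(is >> t.celsius)) {
        throw DeserializationError("Temperature was not completely deserialized");
    }
    return t;
}

// Usage helper: returns the error message, or an empty string on success.
std::string loadError(const std::string& data) {
    std::istringstream in(data);
    try {
        loadTemperature(in);
    } catch (const DeserializationError& e) {
        return e.what();
    }
    return "";
}
```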
Note that, as far as I remember, both Boost.Serialization and protobuf really help with error / version handling.
protobuf has some perks too, because of its ability to nest messages:
- the byte-count is naturally supported, as well as the versioning
- you can do lazy deserialization (i.e. store the message and only deserialize it if someone asks for it)
The downside is that polymorphism is harder to handle because of the fixed format of the messages. You have to design them carefully for that.
Other recommended answer
Serialization is unfortunately never going to be completely painless in C++, at least not for the foreseeable future, simply because C++ lacks the critical language feature that makes easy serialization possible in other languages: reflection. That is, if you create a class Foo, C++ has no mechanism to inspect the class programmatically at runtime to determine what member variables it contains.
Therefore, there is no way to create generalized serialization functions. One way or another, you have to implement a special serialization function for each class. Boost.Serialization is no different; it simply provides you with a convenient framework and a nice set of tools that help you do this.
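To make the point concrete, here is a hand-rolled sketch of what such a per-class function can look like when one member list serves both directions. The Writer and Reader classes are hypothetical mini-archives that only imitate the general idea of an operator& interface; this is not Boost.Serialization's API or implementation, and the whitespace-separated text format is an assumption.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical output archive: operator& writes the value.
class Writer {
public:
    explicit Writer(std::ostream& os) : os_(os) {}
    template <typename T>
    Writer& operator&(const T& value) { os_ << value << ' '; return *this; }
private:
    std::ostream& os_;
};

// Hypothetical input archive: operator& reads into the value.
class Reader {
public:
    explicit Reader(std::istream& is) : is_(is) {}
    template <typename T>
    Reader& operator&(T& value) { is_ >> value; return *this; }
private:
    std::istream& is_;
};

struct Foo { int count; double ratio; std::string label; };

// Without reflection, the member list has to be spelled out by hand for each class.
// Written once here, it is used for both saving and loading.
template <typename Archive>
void serialize(Archive& ar, Foo& foo) {
    ar & foo.count & foo.ratio & foo.label;
}

// Usage: write a Foo to a buffer and read it back into a fresh object.
Foo roundTrip(Foo original) {
    std::stringstream buffer;
    Writer writer(buffer);
    serialize(writer, original);
    Foo copy = {0, 0.0, ""};
    Reader reader(buffer);
    serialize(reader, copy);
    return copy;
}
```

Adding a member to Foo means remembering to add it to serialize() as well; nothing in the language will do that for you.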