Class instance clustering in object reference graph for multi-entries serialization
Posted
by
Juh_
on Programmers
See other posts from Programmers
or by Juh_
Published on 2013-08-02T13:09:15Z
Indexed on
2013/08/02
16:02 UTC
Read the original article
Hit count: 313
My question is on the best way to cluster a graph of class instances (i.e. objects, the graph nodes) linked by object references (the -directed- edges of the graph) around specifically marked objects. To explain better my question, let me explain my motivation:
I currently use a moderately complex system to serialize the data used in my projects:
- "marked" objects have a specific attributes which stores a "saving entry": the path to an associated file on disc (but it could be done for any storage type providing the suitable interface)
- Those object can then be serialized automatically (eg:
obj.save()
) - The serialization of a marked object
'a'
contains implicitly all objects'b'
for which'a'
has a reference to, directly s.t:a.b = b
, or indirectly s.t.:a.c.b = b
for some object'c'
This is very simple and basically define specific storage entries to specific objects. I have then "container" type objects that:
- can be serialized similarly (in fact their are or can-be "marked")
- they don't serialize in their storage entries the "marked" objects (with direct reference): if
a
anda.b
are both marked,a.save()
callsb.save()
and storesa.b = storage_entry(b)
So, if I serialize 'a'
, it will serialize automatically all objects that can be reached from 'a'
through the object reference graph, possibly in multiples entries. That is what I want, and is usually provides the functionalities I need. However, it is very ad-hoc and there are some structural limitations to this approach:
- the multi-entry saving can only works through direct connections in "container" objects, and
- there are situations with undefined behavior such as if two "marked" objects
'a'
and'b'
both have a reference to an unmarked object'c'
. In this case my system will stores'c'
in both'a'
and'b'
making an implicit copy which not only double the storage size, but also change the object reference graph after re-loading.
I am thinking of generalizing the process. Apart for the practical questions on implementation (I am coding in python, and use Pickle to serialize my objects), there is a general question on the way to attach (cluster) unmarked objects to marked ones.
So, my questions are:
- What are the important issues that should be considered? Basically why not just use any graph parsing algorithm with the "attach to last marked node" behavior.
- Is there any work done on this problem, practical or theoretical, that I should be aware of?
Note:
I added the tag graph-database
because I think the answer might come from that fields, even if the question is not.
© Programmers or respective owner