How to avoid keeping old data formats in your code?

Hi,
I didn't have much luck with this question. The question was:
-> "How to avoid keeping old data formats in your code?"
These old data formats are often needed for backward compatibility with
previous versions of your software.
Say you have a data structure that is serialized to a file, and then you
add a field in a later version. You are obliged to keep both versions of
the data structure if you want to be able to read both versions of the
file. If you update your structure often between releases, you may end up
keeping N versions of the data structure in your code.
See the example at: http://acid-state.seize.it/safecopy
On Sun, Nov 30, 2014 at 1:04 PM, Corentin Dupont wrote:
Hi list, I have a question about data migration.
Say you have software in version A which saves data to a file in format FA. Later, you update your software to version B, with data format FB. Now, if you want version B of the software to be able to read data saved by version A, you're obliged to include FA in version B, together with some functions to translate from FA to FB.
This is what I don't find elegant: you're obliged to keep old code (data formats) in your software. -> Do you know of any way to avoid keeping old data formats (for backward compatibility) in your code?
In practice I use acid-state with safecopy: http://acid-state.seize.it/safecopy In this example you can see the problem: the author is obliged to keep old code (data structures) to maintain compatibility. Even worse, you might be obliged to suffix your data structure with a version number:
data MyType_V1 = MyType_V1 Int
data MyType_V2 = MyType_V2 Integer
Instead, I'm thinking of a process using Git or Cabal as a back-end. The idea would be to have an additional program (or library) specialized in the data migration of your main software. It would extract both versions A and B from the repo, and then compile an application capable of handling migrations from FA to FB.
Does something like this exist (even outside of Haskell)?
Cheers, Corentin
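For readers who have not seen the pattern under discussion, here is a minimal sketch of it using only base. It is not safecopy's API: the `Migrate` class, the `Previous` type family, the `decode` function and the "<version> <payload>" text format are all assumptions made up for illustration. It shows why the old type has to stay around: the decoder for version 1 files needs `MyType_V1` to parse into before it can upgrade the value.

```haskell
{-# LANGUAGE TypeFamilies #-}

import Text.Read (readMaybe)

-- Old on-disk format, kept only so that old files can still be decoded.
newtype MyType_V1 = MyType_V1 Int
  deriving (Show, Read, Eq)

-- Current format.
newtype MyType_V2 = MyType_V2 Integer
  deriving (Show, Read, Eq)

-- Hypothetical class (not safecopy's API): each format names its
-- predecessor and says how to upgrade a value from it.
class Migrate a where
  type Previous a
  migrate :: Previous a -> a

instance Migrate MyType_V2 where
  type Previous MyType_V2 = MyType_V1
  migrate (MyType_V1 n) = MyType_V2 (toInteger n)

-- Assumed file format: "<version> <payload>", e.g. "1 MyType_V1 42".
-- Version 1 payloads are parsed as the old type and then migrated.
decode :: String -> Maybe MyType_V2
decode input = case words input of
  ("1" : rest) -> migrate <$> readMaybe (unwords rest)
  ("2" : rest) -> readMaybe (unwords rest)
  _            -> Nothing
```

With this sketch, `decode "1 MyType_V1 42"` and `decode "2 MyType_V2 42"` both give `Just (MyType_V2 42)`, and an unknown version tag gives `Nothing`.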

(sorry, I think I responded only to Corentin the first time)
Hello,
Not sure if it completely solves your problem, but perhaps try Vinyl? (https://hackage.haskell.org/package/vinyl) Adding new fields without having to change the code that doesn't need them becomes much easier.
Regards,
Marcin

On Sat, Dec 20, 2014 at 3:29 AM, Corentin Dupont wrote:
> [...] The question was:
> -> "How to avoid keeping old data formats in your code?"
> [...]
I just keep both versions. It doesn't seem like too much hassle. I move the old version to the Serialize module so modules that aren't concerned with serialization aren't cluttered.
In the fully general case, I think there's no way around it. But if you are just adding a field, or changing a field to a super-type (e.g. 'a' becomes 'Maybe a'), then you can trivially upgrade in place in the deserialization code. Or you can use something like protobufs, which supports that kind of thing automatically.
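A minimal sketch of the "upgrade in place" case, assuming a made-up line-based format whose first line is a version number. The `Settings` record, its fields and `decodeSettings` are hypothetical names, not taken from any library. Only the current type exists in the code; the decoder for old files widens `timeout` from `Int` to `Maybe Int` and supplies a default for the field that did not exist yet.

```haskell
import Text.Read (readMaybe)

-- Only the current record is defined; there is no Settings_V1 type.
data Settings = Settings
  { timeout :: Maybe Int  -- was a plain Int in format version 1
  , label   :: String     -- added in format version 2
  } deriving (Show, Eq)

-- Assumed format: the first line is the version, then one field per line.
decodeSettings :: String -> Maybe Settings
decodeSettings input = case lines input of
  -- Version 1: timeout stored as a bare Int, no label field.
  ["1", t]    -> Settings <$> (Just <$> readMaybe t) <*> pure ""
  -- Version 2: timeout stored as "Nothing" or "Just n", followed by the label.
  ["2", t, l] -> Settings <$> readMaybe t <*> pure l
  _           -> Nothing
```

For example, `decodeSettings "1\n30"` gives `Just (Settings {timeout = Just 30, label = ""})`. This only works while every old value can be mapped into the new type without extra information, which is the limitation mentioned above for the fully general case.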

On Sun, Dec 21, 2014 at 1:50 AM, Evan Laforge wrote:
> [...] In the fully general case, I think there's no way around it. [...]
What I'm thinking about is to put some code in Setup.hs that extracts the older versions of the data structure from Git. This way you can keep only the latest version in HEAD and still be able to build the migration code. Do you know of any such custom build?
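To make the idea concrete, here is a rough sketch of the extraction step only. It assumes the `process` package and a `git` executable on the PATH, and relies on `git show <rev>:<path>`, which prints a file as it was at a given revision. The function names, the tag `v1.0` and the file paths are hypothetical, and the wiring into a custom Setup.hs is left out. The old module is renamed so that it could be compiled next to the current one; the migration function between the two types would still have to be written by hand.

```haskell
import System.Process (readProcess)

-- Arguments for `git show <rev>:<path>`.
gitShowArgs :: String -> FilePath -> [String]
gitShowArgs rev path = ["show", rev ++ ":" ++ path]

-- Rename the module in its header line so the old source does not clash
-- with the current module. Assumes the header starts a line with
-- "module <Name>"; all other lines are left unchanged.
renameModule :: String -> String -> String -> String
renameModule old new = unlines . map fix . lines
  where
    fix l = case words l of
      ("module" : m : rest) | m == old -> unwords ("module" : new : rest)
      _                                -> l

-- Fetch the old source from Git and write it out under the new module name.
-- The output directory must already exist.
extractOldVersion :: String -> FilePath -> String -> String -> FilePath -> IO ()
extractOldVersion rev path old new out = do
  src <- readProcess "git" (gitShowArgs rev path) ""
  writeFile out (renameModule old new src)
```

A call might look like `extractOldVersion "v1.0" "src/Types.hs" "Types" "Types.V1" "migration/Types/V1.hs"`, run from inside the repository.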

I haven't heard of anything like that, it sounds too complicated for me.
You still need to convert the old format to the new one, so it seems like
you still need a bit of code for each version, which needs access to both
old and new.
On Dec 21, 2014 9:10 AM, "Corentin Dupont" wrote:
> [...] Do you know of any such custom build?

Yeah, I agree with Evan, it does sound too complicated. I also have a SafeCopy module where I keep my SafeCopy instances and old versions of data types.
On 21/12/14 07:29, Evan Laforge wrote:
> I haven't heard of anything like that, it sounds too complicated for me. [...]
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
participants (4)
- Corentin Dupont
- Evan Laforge
- Marcin Mrotek
- Roman Cheplyaka