Final steps in GHC's Trac-to-GitLab migration

Hi everyone,

Over the past few weeks we have been hard at work sorting out the last batch of issues in GHC's Trac-to-GitLab import [1]. At this point I believe we have sorted out the issues which are necessary to perform the final migration:

* We are missing only two tickets (#1436 and #2074, which will require a bit of manual intervention to import due to extremely large description lengths)
* A variety of markup issues have been resolved
* More metadata is now preserved via labels. We may choose to reorganize or eliminate some of these labels in time but it's easier to remove metadata after import than it is to reintroduce it. The logic which maps Trac metadata to GitLab labels can be found here [2]
* We now generate a Wiki table of contents [3] which is significantly more readable than GitLab's default page list. This will be updated by a cron job until the underlying GitLab page list becomes more readable.
* We now generate redirects for Trac ticket and Wiki links (although this isn't visible in the staging instance)
* Milestones are now properly closed when closed in Trac
* Mapping between Trac and GitLab usernames is now a bit more robust

As in previous test imports, we would appreciate it if you could have a look over the import and let us know of any problems you encounter.

If no serious issues are identified with the staging site we plan to proceed with the migration this coming weekend. The current migration plan is to perform the final import on gitlab.haskell.org on Saturday, 9 March 2019.

This will involve both gitlab.haskell.org and ghc.haskell.org being down for likely the entirety of the day Saturday and likely some of Sunday (EST time zone). Read-only access will be available to gitlab.staging.haskell.org for ticket lookup while the import is underway.

After the import we will wait at least a week or so before we begin the process of decommissioning Trac, which will be kept in read-only mode for the duration.

Do let me know if the 9 March timing is problematic.

Cheers,
- Ben

[1] https://gitlab.staging.haskell.org/ghc/ghc
[2] https://github.com/bgamari/trac-to-remarkup/blob/master/TicketImport.hs#L227
[3] https://gitlab.staging.haskell.org/ghc/ghc/wikis/index

This looks great, thanks to everyone involved!
Some feedback:
- When I click the "Wiki" link on the left it opens the "Home" page and I don't
know how to go to the index from there. I think we may want the index to be the
home page for the wiki?
- Redirects don't seem to work:
https://gitlab.staging.haskell.org/ghc/ghc/wikis/commentary/rts/heap-objects
- Comparing these two pages:
https://ghc.haskell.org/trac/ghc/wiki/Commentary/Rts/Storage/HeapObjects?red...
https://gitlab.staging.haskell.org/ghc/ghc/wikis/commentary/rts/storage/heap...
The GitLab page doesn't have the images that the Trac page has. Secondly, the "_|_"
string used in the Trac page is migrated as italic "|" in GitLab.
Ömer

On Wed, Mar 06, 2019 at 09:32:44AM +0300, Ömer Sinan Ağacan wrote:
- Redirects don't seem to work: https://gitlab.staging.haskell.org/ghc/ghc/wikis/commentary/rts/heap-objects
I believe this is an unfortunate result of the way we migrate wiki pages. The way that works is that we don't actually parse the original Trac markup; instead, we scrape the rendered HTML directly from the live Trac instance, and massage that into GitLab markup.

This has a few interesting consequences:

1. "Wiki processors", such as for example dynamically-generated TOCs and issue lists, get to run on the Trac instance as we request the page, and thus capture a snapshot of the dynamic data at the time of migration.
2. Redirects, being implemented as such wiki processors, cause client-side redirects, which our scraper will not follow. Hence, the converted page is based on an HTML page body that you don't normally get to see, and no actual redirect is generated on the GitLab side of things.
3. The scraper only looks at what is normally the actual page content; any additional UI generated outside of the main content element is ignored. Hence, when Trac generates links to the redirect target for clients that do not support client-side redirects, those links don't make it into the converted page.
4. Because redirects are usually the last thing to be added to a page, that page's history ends there, and becomes the "current" version on the GitLab side.

So we end up with what you're seeing: a nonsensical page that contains the fallback content, a somewhat cryptic question asking whether it should redirect, and no way to answer that question.

Since GitLab doesn't have an equivalent to those "wiki processors", and AFAIK does not cater for such redirects, the question is how we should handle these. I can think of several options:

1. Do nothing; when anyone complains, fix the offending pages manually (either by converting the useless redirect message into a proper hyperlink, or by manually adding a rewrite entry to the nginx configuration).
2. Generate a list of redirecting pages from the Trac dataset, either as part of the import (2a), or with some grep/sed/awk magic based on the converted git repo after the fact (2b); then use that list to generate suitable nginx redirects.
3. Extend the import script to detect redirects, and special-case those so that they render as proper links to the redirect target.
4. Do more research and see if there is a way to make GitLab redirect based on wiki content, then extend the import script like in step 3, but render redirecting pages to use the (currently hypothetical) redirect feature.

Personally, I'm inclined to say let's go with option 2b: run the import, then grep for 'redirect(wiki:', and massage that into nginx redirects.

TL;DR: the import currently ignores Trac wiki redirects, and I'm not sure what the best way is to deal with this.
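
Option 2b above could be sketched roughly as follows. This is only an illustration under several assumptions, not the script used for the migration: it assumes the converted pages still contain the literal text `redirect(wiki:Target)`, that pages are served under `/ghc/ghc/wikis/` as on the staging site, and that an exact-match nginx `location` with `return 301` is an acceptable way to express the redirect. The function names and the `slug_of` parameter (a caller-supplied mapping from a Trac page name to its GitLab slug) are hypothetical.

```python
import re

# Assumption: converted pages keep the literal Trac macro text,
# e.g. "redirect(wiki:Commentary/Rts/Storage/HeapObjects)".
REDIRECT_RE = re.compile(r"redirect\(wiki:([^)\s]+)\)")

# Assumption: wiki pages live under this path prefix.
WIKI_PREFIX = "/ghc/ghc/wikis/"


def find_redirect_target(page_text):
    """Return the page named by the first redirect marker, or None."""
    match = REDIRECT_RE.search(page_text)
    return match.group(1) if match else None


def nginx_redirect(source_slug, target_slug):
    """Format an exact-match nginx location that issues a 301."""
    return "location = {p}{s} {{ return 301 {p}{t}; }}".format(
        p=WIKI_PREFIX, s=source_slug, t=target_slug
    )


def redirects_for(pages, slug_of=lambda name: name):
    """Build one nginx line per redirecting page.

    pages maps a GitLab wiki slug to the converted page text.
    slug_of (hypothetical) turns a Trac page name into a GitLab slug;
    by default the name is used unchanged.
    """
    lines = []
    for slug, text in sorted(pages.items()):
        target = find_redirect_target(text)
        if target is not None:
            lines.append(nginx_redirect(slug, slug_of(target)))
    return lines
```

In practice `pages` would be filled by walking the converted wiki repository, and `slug_of` would need to match whatever naming scheme the importer applies to page names.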

The lack of redirect support is unfortunate. In my opinion this is something we will need to handle going forward as well; a one-time solution like adding nginx redirects doesn't seem like the right approach to me.
I would rather advocate either option 3 or one of the following options:
5. Detect redirects and convert them to symbolic links in the repo
6. Request redirect support in the GitLab wiki.

On Wed, Mar 06, 2019 at 06:09:35AM -0500, Ben Gamari wrote:
The lack of redirect support is unfortunate. In my opinion this is something we will need to handle going forward as well; a one-time solution like adding nginx redirects doesn't seem like the right approach to me.
I would rather advocate either option 3 or one of the following options:
5. Detect redirects and convert them to symbolic links in the repo
6. Request redirect support in the GitLab wiki.
OK, I'll see what I can do about option 3. Option 5 is something that I believe we can still do after the fact if need be. Option 6, I think, we should do anyway, because we will want that feature for future pages, and the solutions outlined so far only take care of existing pages.
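
As a rough illustration of what option 3 might look like, the sketch below replaces a redirecting page's body with a plain Markdown link to the target. It is not taken from the import script: the marker pattern, the wording of the replacement text, and the `slug_of` parameter (a hypothetical mapping from Trac page name to GitLab slug) are all assumptions.

```python
import re

# Assumption: the scraped page body contains the literal Trac macro text.
REDIRECT_RE = re.compile(r"redirect\(wiki:([^)\s]+)\)")


def render_redirect_as_link(page_text, slug_of=lambda name: name):
    """Return a one-line 'moved' page if page_text is a redirect.

    Pages without a redirect marker are returned unchanged.
    slug_of (hypothetical) maps a Trac page name to a GitLab wiki slug.
    """
    match = REDIRECT_RE.search(page_text)
    if match is None:
        return page_text
    target = match.group(1)
    return "This page has moved to [{}]({}).\n".format(target, slug_of(target))
```

Unlike the nginx approach, this keeps the information inside the wiki repository, at the cost of one extra click for the reader.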

For context: there is a total of 22 pages that use the redirect feature. So it may actually be feasible to just do this manually.
-- Tobias Dammers - tdammers@gmail.com

On March 6, 2019 1:32:44 AM EST, "Ömer Sinan Ağacan" wrote:
This looks great, thanks to everyone involved!
Some feedback:
- When I click the "Wiki" link on the left it opens the "Home" page and I don't know how to go to the index from there. I think we may want the index to be the home page for the wiki?
Yes, I do think we at least want to link to the index from the wiki home page.
- Redirects don't seem to work: https://gitlab.staging.haskell.org/ghc/ghc/wikis/commentary/rts/heap-objects
Yes this needs to be fixed.

On Mar 6, 2019, at 1:32 AM, Ömer Sinan Ağacan wrote:
- Comparing these two pages:
https://ghc.haskell.org/trac/ghc/wiki/Commentary/Rts/Storage/HeapObjects?red...
https://gitlab.staging.haskell.org/ghc/ghc/wikis/commentary/rts/storage/heap...
The GitLab page doesn't have the images that the Trac page has. Secondly, the "_|_" string used in the Trac page is migrated as italic "|" in GitLab.
The missing "images" (structure layout diagrams, ...) do make it difficult to follow the exposition. I do hope those are ultimately migrated.
-- Viktor.
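
The "_|_" problem reported above is consistent with Markdown reading the two underscores as emphasis delimiters around "|". One possible remedy, shown here only as a sketch and not necessarily what the importer does, is to backslash-escape emphasis characters in literal text before emitting it. The function name is hypothetical.

```python
def escape_emphasis(text):
    """Backslash-escape characters Markdown may read as emphasis markers.

    Backslashes are escaped too, so the original text can be recovered.
    """
    out = []
    for ch in text:
        if ch in "\\*_":
            out.append("\\")
        out.append(ch)
    return "".join(out)
```

With this, "_|_" becomes `\_|\_`, which should render as the literal three characters. Wrapping the text in a code span would be another option. The sketch does not handle "|" inside tables, where the pipe itself would also need escaping.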

Super excited for this! Thank you to everyone who's put in so much hard work to get it done!
One question: what is happening with the trac tickets mailing list? I imagine it’ll be going away, but for those of us that use it to keep track of things is there a recommended alternative?
Best,
_ara

On March 6, 2019 6:11:49 AM EST, Ara Adkins wrote:
Super excited for this! Thank you to everyone who's put in so much hard work to get it done!
One question: what is happening with the trac tickets mailing list? I imagine it’ll be going away, but for those of us that use it to keep track of things is there a recommended alternative?
The ghc-commits list will continue to work.
The ghc-tickets list is a good question. I suspect that under GitLab there will be less need for this list, but we may still want to continue maintaining it regardless for continuity's sake. Thoughts?
Cheers,
- Ben

Personally, I would like to see it continued, but it may not be worth the work if I’m in a minority here. A potential stopgap would be to ‘watch’ the GHC project on our GitLab instance, but I can’t see any way to get notifications by email rather than having to check in at GitLab all the time.
_ara
participants (5): Ara Adkins, Ben Gamari, Tobias Dammers, Viktor Dukhovni, Ömer Sinan Ağacan