
Y2K and other date/time issues?

MTS wasn't in use for production when the world reached the Y2K deadline.

I remember hearing that when they were shutting down one of their MTS systems for the last time, the folks at UBC set the TOD clock ahead to sometime in the 21st century.  What I don't remember is what the outcome of that experiment was.  Can someone from UBC provide the exciting conclusion to the story?
 
And I think Mike Alexander mentioned recently that he had to reassemble some component or other due to a date/time issue when he was running MTS under Hercules. It might have been the Tape routines.  Mike, can you confirm that?

Are there other MTS related date/time issues that people remember?




1.3. Halfword overflow

posted Aug 20, 2013, 1:19 PM by David Lee

What a coincidence!  Just a few days ago at my post-Durham workplace (ECMWF, Berkshire) I was chatting to someone about this, having not mentioned it for years.

I, too, remember that incident.  This cropped up simultaneously at NCL and DUR.  I think the person who figured it out was Mike Ellison at Durham, who, as I recall, had called in sick that morning, but came in to address it and successfully diagnosed it, despite looking very poorly.  I think NCL had a similar difficulty because key personnel were absent, although someone there (I forget who) set to work on it.

I, too, recall that this "travelled with the Sun", so it hit us (the UK) first, then UMICH (and presumably RPI, etc.) then finally UBC/SFU.  (A pity that the MTS community hadn't had that prospective site in Yugoslavia!)

Of course, seen in retrospect from today, this makes MTS look like the Windows of its time, not like the UNIX of its time.  (Boo!  Hiss!  I hear you cry.)  Why?  Because MTS, like the Windows of yesteryear, did its timekeeping via local time, not by setting its clock to UTC and applying a timezone offset.  Had MTS been like UNIX, then we would all have been in it together...
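To make the local-time point concrete, here is a minimal Python sketch that asks when "midnight, 16 November 1989" began at three kinds of MTS site, expressed in UTC. The site labels are illustrative, and the fixed offsets (GMT, US Eastern Standard Time, US Pacific Standard Time) are an assumption that all three sites were on standard time that week. A system keeping local time rolls over to the new day at each of these instants in turn; a system keeping UTC rolls over once, everywhere at the same moment.

```python
from datetime import datetime, timedelta, timezone

# Assumed standard-time offsets for mid-November 1989; labels are illustrative.
SITES = {
    "Newcastle/Durham (GMT)": timezone(timedelta(hours=0)),
    "Ann Arbor (EST)": timezone(timedelta(hours=-5)),
    "Vancouver (PST)": timezone(timedelta(hours=-8)),
}

def local_midnight_in_utc(year: int, month: int, day: int, tz: timezone) -> datetime:
    """Return the UTC instant at which the given local calendar date begins."""
    local_midnight = datetime(year, month, day, tzinfo=tz)
    return local_midnight.astimezone(timezone.utc)

# Each local-time clock reaches the new day at a different UTC instant,
# which is why the problem "travelled with the Sun".
for site, tz in SITES.items():
    print(f"{site:24} {local_midnight_in_utc(1989, 11, 16, tz):%Y-%m-%d %H:%M} UTC")
```

The printed instants are 00:00, 05:00 and 08:00 UTC, matching the five-to-eight-hour head start described elsewhere on this page.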

4. Dave Mills, the Network Time Protocol (NTP), its timescale and chronometry

posted Sep 20, 2010, 7:54 AM by Jeff Ogden   [ updated Jun 4, 2014, 7:02 AM ]

Dave Mills worked at the UM Computing Center in the late 1960s and early 1970s where he developed the PDP-8 based Data Concentrator and what was almost certainly the first non-IBM implementation of a S/360 control unit to I/O channel interface. After Dave left UM he went on to do many important things related to satellite and data communication and what would become today's Internet. Among his work is the design of the Network Time Protocol (NTP). NTP is used by pretty much every computer in the world that is connected to the Internet. In 2008 Dave was elected to the National Academy of Engineering for "contributions to Internet timekeeping and the development of the Network Time Protocol".

A Wikipedia article gives more information about Dave.

There is also a bio sketch for Dave in the People section of this web site.

Starting with the second NTP RFC, the RFCs include a fascinating collection of information on time and dates that goes well beyond what is strictly necessary for the implementation of an Internet protocol. The information has been edited and reworked in each of the three later NTP RFCs:

RFC 958 - Network Time Protocol (NTP), September 1985
    Does not include separate sections with timescale and chronometry information
RFC 1059 - Network Time Protocol (Version 1), July 1988
    See section "2.3 Time Scales"
RFC 1119 - Network Time Protocol (Version 2), September 1989
    See sections "2.3. The NTP Timescale", "2.4. The NTP Calendar", and "2.5. Time and Frequency Dissemination"
RFC 1305 - Network Time Protocol (Version 3), March 1992
    See "Appendix E. The NTP Timescale and its Chronometry"

And Dave has written a book:

Computer Network Time Synchronization: the Network Time Protocol, CRC Press 2006, 304 pp.
From Dave's description of his book:
Chapter 13 describes how we reckon the time according to the stars and atoms. It explains the relationships between the international timescales TAI, UTC and JDN dear to physicists and navigators and the NTP timescale. If we use NTP for historic and future dating, there are issues of rollover and precision. Even the calendar gets in the act, as the astronomers have their ways and the historians theirs. Since the topic of history comes up, Chapter 15 reveals the events of historic interest since computer network timekeeping started over two decades ago.
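As a small illustration of the "rollover and precision" issues Dave mentions, here is a Python sketch of converting an NTP timestamp (32 bits of seconds since 1 January 1900 UTC plus a 32-bit binary fraction of a second) to Unix time. It handles only the first NTP era; the function and constant names are ours, not from any NTP implementation.

```python
from datetime import datetime, timezone

# NTP counts from 1900-01-01 00:00:00 UTC; the Unix epoch is 70 years
# (25,567 days) later.
NTP_UNIX_OFFSET = 2_208_988_800  # seconds from 1900-01-01 to 1970-01-01

# Precision: one unit of the fraction field is 2**-32 s (about 233 picoseconds).
RESOLUTION = 2**-32

def ntp_to_unix(seconds: int, fraction: int = 0) -> float:
    """Convert an era-0 NTP timestamp (32-bit seconds, 32-bit fraction) to Unix time."""
    if not (0 <= seconds < 2**32 and 0 <= fraction < 2**32):
        raise ValueError("NTP seconds and fraction are unsigned 32-bit fields")
    return (seconds - NTP_UNIX_OFFSET) + fraction * RESOLUTION

# Rollover: the largest value the 32-bit seconds field can hold.
last = ntp_to_unix(2**32 - 1)
print(datetime.fromtimestamp(last, tz=timezone.utc))  # 2036-02-07 06:28:15+00:00
```

The seconds field runs out early in February 2036, after which a timestamp is ambiguous unless the software knows which era it belongs to, much like the MTS halfword day counts discussed on this page.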



3. Simulating Y2K at UBC

posted Sep 20, 2010, 6:53 AM by Jeff Ogden   [ updated Sep 29, 2010, 7:32 AM ]

I received this e-mail from Doug Wade at UBC:

From: Doug Wade <@UBC>
Date: September 20, 2010 2:35:17 AM EDT
To: Jeff Ogden
Subject: MTS - I need help

Hey Jeff

Your MTS contributions to both Wikipedia and the MTS archive are really cool. I've worked at UBC as a "computer operator" since 1981. Except for Dennis O'Reilly, that makes both me and Dean Main the dinosaurs of UBC IT (Computing Centre -> Computing Services -> IT Services -> UBC IT). I'd love to help you out with the MTS stuff.

. . .

4) You asked about the final IPL of MTS at UBC. I was there. We put it into the future (Y2K + a few years). My memory is that nothing in the system broke, and we were quite surprised. I had asked about doing that before the final shutdown and was told "don't you dare" because it might break things to the point that the system would never come up again if needed.

. . .

Keep in touch and any way I can help is no problem. BTW, I would die and go to heaven if I ever managed to get a version of MTS running on my iMac!

Cheers
Doug Wade


1.2. Halfword overflow (continued) - something semiofficial

posted Sep 15, 2010, 5:45 PM by Jeff Ogden   [ updated Dec 25, 2011, 9:15 PM ]

I found the following on the "Anecdotes" page of Josh Simon's Web site and the change log looks pretty real. Unlike some of the other materials about MTS at this site these items are not from the 13 May 1996 issue of UM's IT Digest (the "goodbye to MTS issue").

Problems with the date

In November of 1989, a minor itsy-bitsy bug was discovered in the MTS code. Nothing you'd call major, really. Seems that the United Kingdom-based MTS sites were having all sorts of file system-related problems. Luckily for us in the United States, we had 5 hours before it became midnight locally. The problem was that some of the file system code used an unsigned half-word integer (16 bits) to store the number of days since zero time (March 1, 1900). Unfortunately, the rest of the file system code used a signed half-word integer (15 bits data, 1 bit sign) — and when it became the 32,768th day after zero time, the sign bit flipped and parts of the system thought files were stamped as being created or modified 32,767 days in the future. MTS didn't like this concept, so it caused all sorts of system problems. (The change log comments are available.)

The systems programmers hurriedly patched the file system code to use unsigned half-word integers consistently, recompiled the operating system, and provided patches to the various MTS Consortium sites. (Hewlett-Packard was using a previous version of MTS — Distribution 5.1 instead of the then-current Distribution 6.0 — at one of their sites. We provided them with a binary-only version of the patch and informed them not to trust any previous backups of the operating system.)

Of course, as the senior programmer noted on the systems programmers' mailing list, this solution will only work until the 65535th day after zero time (which maps out to some time in 2061). His comment was that if anyone was still running what would in effect be a century-old operating system, then they got what they deserved. And besides, by 2061, all of the then-current systems programmers would be retired or deceased, so they really didn't much care. (Shades of the Year 2000 problem, huh?)
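The limits are easy to check with a few lines of Python. This sketch assumes that 1 March 1900 counts as day 1, which is the convention that makes 16 November 1989 come out as day 32768 as the reports say; the actual MTS encoding isn't documented on this page, and the function name is hypothetical. Under this assumption the last day an unsigned halfword can represent falls in August 2079 rather than 2061 (counting 1 March 1900 as day 0 instead shifts it by only one day).

```python
from datetime import date, timedelta

# Assumption: day 1 is 1 March 1900 ("zero time" counted inclusively).
ZERO_TIME = date(1900, 3, 1)

def mts_day_to_date(day: int) -> date:
    """Map an (assumed) MTS day number back to a calendar date."""
    return ZERO_TIME + timedelta(days=day - 1)

print("last signed day   (32767):", mts_day_to_date(32767))  # 1989-11-15
print("first overflow    (32768):", mts_day_to_date(32768))  # 1989-11-16
print("last unsigned day (65535):", mts_day_to_date(65535))  # 2079-08-03
```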

And the change log:

Change Log

3:15pm 16 November 1989
Problems with the file system, notably $PERMIT and $FILESTATUS, on every system that uses MTS, made emergency reloads a requirement. The UB system was reloaded at 1:53pm and the UM system at 1:25pm.
4:47pm 16 November 1989
More information from the systems staff:

The down-time on the UM and UB systems was due to a bug that was exposed at midnight of November 16, 1989, which was the 32768th day after Mar 1, 1900. The file system uses halfwords to store the number of days since Mar. 1, 1900 for lastref, lastcat, and credat.

A halfword can be used to store values from -32768 to 32767, or from 0 to 65535. Parts of the system assumed the first value range, which caused various other parts of the system to PGNT, or use an incorrect value for evaluating how old a file is. This caused the $PERMIT PGNT, and the problems with HASPLOG and CMDSTAT. $FILESTATUS and $DUPLICATE also suffered from this bug.
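The two value ranges in the change log are two readings of the same 16 bits. This Python sketch uses the standard `struct` module with big-endian formats (`>h` signed, `>H` unsigned, matching the byte order of System/370-class machines) to show how the day values for 15 and 16 November 1989 read under each interpretation. The "days since last reference" calculation at the end is a hypothetical stand-in for the file-age logic, not actual MTS code.

```python
import struct

def as_signed(halfword: int) -> int:
    """Read a 16-bit pattern as a signed (two's-complement) halfword."""
    return struct.unpack(">h", struct.pack(">H", halfword))[0]

def as_unsigned(halfword: int) -> int:
    """Read a 16-bit pattern as an unsigned halfword."""
    return struct.unpack(">H", struct.pack(">H", halfword))[0]

yesterday, today = 0x7FFF, 0x8000  # day values for 15 and 16 Nov 1989
for label, value in (("yesterday", yesterday), ("today", today)):
    print(f"{label:9} X'{value:04X}'  unsigned={as_unsigned(value):6}  signed={as_signed(value):6}")

# Hypothetical age check for a file last referenced yesterday.
age_unsigned = as_unsigned(today) - as_unsigned(yesterday)  # 1 day, as intended
age_signed = as_signed(today) - as_signed(yesterday)        # -65535: a nonsense age
```

Any range check written for the signed reading suddenly sees "today" as a large negative number, which is the kind of failure the change log describes.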

So, my take on this is that MTS didn't crash, but the problems were serious enough to require an unscheduled shutdown and reload, which is pretty close to a crash.  And it doesn't appear that UM took advantage of the advance warning from the UK to avoid the need for an unscheduled shutdown. The sites in Canada had a couple or three more hours of warning; I wonder if it was enough to help?

2. Setting the TOD clock (incorrectly)

posted Sep 14, 2010, 12:17 PM by Jeff Ogden   [ updated Sep 29, 2010, 7:30 AM ]

More items from Risks.

From the Risks Digest, Volume 17, Number 19, 19 June 1995:

Re: Multo ante natus eram [Latin, roughly "I was born long before"; free translation by translation-guide.com: much before to be born was]

Mike Alexander
Thu, 15 Jun 1995 19:45:13 -0400

I, too, am glad to see that Multics is still used. It is a system that was far ahead of its time in many respects.

In MTS (the Michigan Terminal System), a system contemporaneous with Multics which is also still in use, we solved the problem of the operators entering a bad time in a slightly different way. During initialization, the system compares the current time with the time in the last billing record recorded. If the current time is earlier or too much later (more than 12 hours, unless the day is Sunday in which case 18 hours is ok) it complains and asks the operators to confirm that the time is ok. This has several advantages: it doesn't use hard-coded dates, it is a more precise check, and it never makes the system unusable. Of course this has become less important as modern machines maintain the time of day even when not running and the clock rarely needs to be set at all.

Mike Alexander, Univ. of Michigan
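The check Mike describes can be sketched in a few lines of Python. This is an illustration, not MTS code: the function name is hypothetical, and the post doesn't say which day the Sunday allowance is keyed on, so this sketch assumes it is the weekday of the current time. In MTS the result led to an operator prompt asking for confirmation, never a refusal to start.

```python
from datetime import datetime, timedelta

def clock_needs_confirmation(now: datetime, last_billing: datetime) -> bool:
    """Return True if operators should be asked to confirm the time at startup.

    Assumption: the 18-hour Sunday allowance depends on the weekday of `now`.
    """
    if now < last_billing:
        return True  # the clock appears to have gone backwards
    hours = 18 if now.weekday() == 6 else 12  # weekday() == 6 means Sunday
    return now - last_billing > timedelta(hours=hours)

last = datetime(1995, 6, 10, 23, 0)  # hypothetical last billing record (a Saturday)
print(clock_needs_confirmation(datetime(1995, 6, 11, 16, 0), last))  # False: Sunday, 17 hours later
print(clock_needs_confirmation(datetime(1995, 6, 10, 22, 0), last))  # True: earlier than the record
```

Because the reference point is the last billing record rather than a fixed date, the check never goes stale.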

And an earlier note to Risks, Volume 17, Number 18, 15 June 1995 that Mike was probably responding to:

Re: Multo ante natus eram

"Bernard S. Greenberg"
8 Jun 1995 22:45:59 GMT
[A woodka tonic forwarded to RISKS by Donna Woodka, who probably knows my penchant (or even pun-chant) for Multics tales. Thanks to Bernard for having fortifived our archives and providing evidence that Multics still lives! PGN]
Ward Anderson at ACTC just reported an interesting crash on Multics (10.2) at ACTC -- Collection 1 initialization discovered that I became 45 years old Tuesday past, an event which was extremely unlikely, and crashed the system before the clock did damage to the file system, or so it feared.

The code in scs_and_clock_init is perfectly clear - the time "06/06/95 18:31 est Tuesday" is hard-coded in, in characters, with the comment that it is "Bernard S. Greenberg's 45th birthday". It has been there for twenty years in plain text visible to anyone reading the code! (I loved to read code in my day, especially initialization - perhaps I was the last?)

Maybe Tom Van Vleck remembers, but it is extremely likely that twenty years ago at CISL our operator at the time for the nth and last time forgot to set the clock, or set it poorly, and damaged the file system (which looks quite askance on "back to the future" jaunts), and Tom and I said "This has to end. We have to put a gullibility check in the clock init code", and I did this. Probably saved a lot of file system damage over the years. If I had it to do over again, I'd do it over again! This code did the -right-thing-!

At 25, I could not imagine I'd ever be 45, let alone that scs_and_clock_init.pl1 would be there along with me! Somehow, though, 65 doesn't seem that far away any more...

As Ward said, this is a -real- Multics story.

Bernie
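For contrast with Mike Alexander's relative check, here is a minimal Python sketch (not the Multics PL/I in scs_and_clock_init.pl1, which isn't reproduced here) of the fixed-ceiling "gullibility check" Bernie describes: the clock is believed only if it reads earlier than a hard-coded date. The constant and function names are ours, and treating "est" as a fixed UTC-5 offset is an assumption.

```python
from datetime import datetime, timedelta, timezone

# Hard-coded ceiling from the Multics story: "06/06/95 18:31 est Tuesday".
EST = timezone(timedelta(hours=-5))
GULLIBILITY_LIMIT = datetime(1995, 6, 6, 18, 31, tzinfo=EST)

def clock_is_believable(now: datetime) -> bool:
    """Accept the clock only if it is earlier than the hard-coded date."""
    return now < GULLIBILITY_LIMIT

print(clock_is_believable(datetime(1975, 6, 6, 12, 0, tzinfo=EST)))  # True
print(clock_is_believable(datetime(1995, 6, 8, 12, 0, tzinfo=EST)))  # False: the check fires
```

A fixed ceiling is simple and catches wild clock settings, but once real time passes the hard-coded date, every correct clock is rejected, which is what happened at ACTC in June 1995.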



1.1. Halfword overflow (continued) – Brian Randell's initial post to Risks

posted Sep 14, 2010, 11:47 AM by Jeff Ogden   [ updated Dec 25, 2011, 9:17 PM ]

I found the following.  I think it is Brian Randell's initial post to Risks about this event. It is still hard to figure out what, if anything, crashed and if the crash or other problem actually occurred on both sides of the Atlantic or if the warning from the UK came soon enough to save the rest of us some embarrassment. The post talks about an "unexpected system shutdown" and a "bug", but doesn't use the word "crash".

From the Risks Digest, Volume 9, Number 45, 20 November 1989:

Another foretaste of the Millenium

Brian Randell <Brian.Randell@newcastle.ac.uk>
Fri, 17 Nov 89 9:17:33 BST
We apologise for the unexpected system shutdown today (Thursday).  This was
caused by a bug in the MTS system that was a "time-bomb" in all senses of the
word. It was triggered by today's date, 16th November 1989.

This date is specially significant. Dates within the file system are stored as
half-word (16 bit) values which are the number of days since the 1st March
1900. The value of today's date is 32,768 decimal (X'8000' hexadecimal). This
number is exactly 1 more than the largest positive integer that can be stored
in a half-word (the left-most bit is the sign bit). As a result, various range
checks that are performed on these dates began to fail when the date reached
this value.

The problem has a particular interest because all the MTS sites world-wide are
similarly affected. Durham and Newcastle were the first to experience the bug
because of time zone differences and we were the first to fix it.  The American
and Canadian MTS installations are some 4 to 8 hours behind us so the
opportunity to be the first MTS site to fix such a serious problem has been
some consolation.  The work was done by our MTS specialist who struggled in
from his sick bed to have just that satisfaction!
Does anyone remember who the "MTS specialist" was?
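Randell's arithmetic can be checked with Python's `datetime` module. A plain subtraction puts 16 November 1989 exactly 32,767 days after 1 March 1900, so the reported value of 32,768 (X'8000') implies that 1 March 1900 itself counted as day 1. That inclusive convention is an inference from the reports, not something documented here, and the function name is hypothetical.

```python
from datetime import date

ZERO_TIME = date(1900, 3, 1)

def mts_day_number(d: date) -> int:
    """Days since 1 March 1900, counting 1 March 1900 as day 1 (assumed)."""
    return (d - ZERO_TIME).days + 1

n = mts_day_number(date(1989, 11, 16))
print(n, f"X'{n:04X}'")  # 32768 X'8000'
print(n > 32767)         # True: too large for a signed halfword
```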

And a little more from the next Risks digest, Volume 9, Number 46, 22 November 1989. PGN is the Risks moderator Peter G. Neumann.

Another Foretaste of the Millenium? (RISKS-9.45, corrigenda)

Brian Randell <Brian.Randell@newcastle.ac.uk>
Tue, 21 Nov 89 10:12:20 BST
   [Brian sent me two versions of the MTS saga, part of one of which ran in
   RISKS-9.45 -- but without the explanation indicating that the MTS message
   was not from Brian but rather from someone else.  The surrounding text is
   given below, in case anyone thought that the "We apologise ..." message
   was originally Brian's.  I apologize to Brian in case anyone was misled.  PGN]

The university computing service here runs MTS (the Michigan Terminal System)
on an Amdahl mainframe, which crashed mysteriously today, as did various other
MTS sites in North America, some time later. The explanation is given in the
following message which I have just received from one of the systems
programmers here.

> We apologise for the unexpected system shutdown ... [see RISKS.9-45 for text.]

I hadn't realised that there was this disadvantage to living on this side of
the Atlantic! Ah, well, it makes up for various advantages :-)

Brian Randell

This note does use the word "crash" and says that there were problems on both sides of the Atlantic.

1.0. Halfword overflow?

posted Sep 14, 2010, 7:27 AM by Jeff Ogden   [ updated Sep 29, 2010, 7:24 AM ]

It wasn't a Y2K issue exactly, but one MTS date and time issue that is widely mentioned on the Web had to do with a halfword integer overflow first encountered by the folks at NUMAC.

From Computer-Related Risks: Excerpts on Computer Calendar-Clock Problems, Peter G. Neumann, Computer Science Laboratory, SRI International:

Overflows. The number 32,768 = 2^15 has caused all sorts of grief that resulted from the overflow of a 16-bit word. ...  Brian Randell reported that the University of Newcastle upon Tyne, England, had a Michigan Terminal System (MTS) that crashed on 1989 Nov 16, 2^15 days after 1900 Mar 01. Five hours later, MTS installations on the U.S. east coast died, and so on across the country, an example of a genuine (but unintentional) distributed time bomb.

I'd left the UM Computing Center for Arbortext when this took place, but I heard about it. The story I remember was a little different. It was that the five to eight hour time difference between Newcastle and sites in North America allowed the Newcastle folks to spread the word and get the North American MTS systems patched in time to avoid the problem. Is my memory any good or is this just wishful thinking on my part?

Tony Young included this P.S. in a note that he sent me back in August:
I'm sure MTS anecdotes are totally inappropriate to your article, but do you remember the 31-bit MTS date overflow problem - perhaps one of the few advantages of having MTS systems in England who hit the problem some 5(6?) hrs earlier and were able to give early warning?

Anecdotes may have been inappropriate for the Wikipedia article, but they are just the thing for this web site.

And George Helffrich sent me this note yesterday (13Sep2010):
I don't think the system crashed; I think it was *FILESAVE that was unavailable due to 16 bit integer days in the directory of file versions saved.  It would be interesting to unearth newsletters to see what actually happened.  Viktors Berstis would probably remember; he wrote the original code, though had left for IBM by then.

There are copies of the MTS Newsletters at UM's Bentley Historical Library, so I can check them there. But I'm guessing that by 1989 this would have been recorded in CONFER, *FORUM, or sent via e-mail and not included in the paper newsletter, if we were still publishing the paper version in 1989.

Does anyone know where we can find Brian Randell's initial report?  Or does anyone remember the details of this event?

