Is High-End Math Software Accurate?

Math functions for advanced mathematics are often implemented in only one or two software packages apiece, I’m told. That’s living close to the edge. If any highly complex, important functions are available only in proprietary software, that’s even riskier, even edgier. Massive volumes of peer-reviewed research would then depend on edgy calculations that may never get rerun.

The most advanced mathematics is very advanced indeed. No one alive understands all the branches of math at their leading edges; it has been about a century since any one individual could. Mathematicians have specialized. You don’t ask a foot doctor about your ears. If a research paper depends on math that few understand, and if a referee typically spends an hour on a paper (an expert can cover a lot in an hour, and in math it may be more), the referee is unlikely to carry out the author’s recipe step by step. Journal editors may spend even less time confirming the articles they accept. One mathematician spent seven years developing a single proof. Granted, he provided a blueprint; still, no one could inspect it in an hour. Some academics took his word for it (legitimately enough, under the circumstances).

And high-end number-crunching is in demand. If your research is into gray holes in space, you’d rather have machines do your numerical drudgery for you. They’re faster, by a lot. You have other research to do, like whether you can retrieve the glasses you dropped into a gray hole. So, having a long research agenda and little time, you start trusting the programs in those machines. But that trust has to stay delicate.

Doubtless, all the trusted programs have been examined by qualified mathematicians. But how thoroughly can they have been, given time limits, budget constraints, proprietors’ trade secrecy, and the steady river of newly discovered bugs in other software? Do mathematicians who understand a very high level of math also read computer languages? If not, will they at least understand pseudocode? Reading specifications won’t be enough. And if the mathematicians aren’t fluent in pseudocode or source code, then someone else has to be, and splitting the proofreading between two people that way, one weak in computer code and the other not so good at math, makes errors likely.

Deliberate error preservation is not the issue. None of these programs will claim that pi equals exactly 3. I don’t have to check that; no one would be stupid enough to do it, and no one would get away with it. Nor will anyone make a management decision to oversimplify a hard formula because programming it properly would cost too much. No; I think management will demand the best quality of math for whatever parts of math they include in the software.

But a manager may not allocate enough time to a mathematician, perhaps a student or recent graduate without a large reputation (yet), to put a formula into a form programmers can code. The mathematician may rush, and an error may slip through. And if the mathematician doesn’t catch it, no one else will, because in an organization it’s typical for each department to avoid challenging other departments’ work.

Complaints may come in. But if the complainant can’t directly see the math or the source code, the complaint is almost certainly vague or based on circumstantial evidence, either of which makes it easier to ignore; and, at any rate, it may land not in the hands of a high-end mathematician but of someone with other qualifications. If outsiders can’t verify the software, complaints probably can’t be verified either, and so they get less weight. If public complaints would be embarrassing, the file is kept private. But none of that means errors don’t exist. We don’t judge the quality of anything else important by saying we heard of no complaints so it must be okay; we evaluate it critically anyway. Surely some scientists and technologists using this kind of leading-edge math program have done their best to perform those evaluations, but, I argue, they didn’t have the tools for a thoroughgoing quality inspection of the vital parts. And they still don’t.

Whenever a major new technology is introduced, various programs need revising. Whenever a complicated thing gets revised, errors get introduced. Your DNA mutates and your body has to repair it so you don’t die, and DNA has been around for millions of years. Microsoft Windows has had major overhauls every few years. Is it plausible that a program that can ship with a misaligned label or a zero-day security gap would never once be wrong in its mathematics?

Excel has sometimes been found to display wrong math results, and users have always depended on Excel for the math. It got answers wrong for numbers with many decimal places. Most customers need only two places or fewer, so the errors usually didn’t matter; but where precision does matter, an error is an issue.
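The many-decimal-places problem comes from the binary floating-point arithmetic that most ordinary software uses. A minimal sketch of the effect in Python (not Excel itself, just the same underlying arithmetic):

```python
# Binary floating point cannot represent most decimal fractions exactly,
# which produces exactly the many-decimal-place discrepancies described above.
a = 0.1 + 0.2
print(a)            # 0.30000000000000004, not 0.3
print(a == 0.3)     # False
print(round(a, 2))  # 0.3 -- rounding to two places hides the error
```

This is why a customer who only ever looks at two decimal places may never notice a discrepancy that sits in the fifteenth.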

Advanced programs generally don’t use the limited-precision arithmetic that ordinary programs use, so they avoid that particular problem, but that does not make them immune to other math problems. Excel has been made for decades by a big, well-funded company that knows how to check software for quality, and that company still missed some of it. The same company didn’t know that its own people had embedded an entire flight simulator inside Excel until management got a bunch of thank-you letters and investigated. A flight simulator isn’t math, and it isn’t why anyone buys Excel, but it is still a big thing to miss, and the company missed it.

Peering in Every Inch

Math quality-assurance testers at software producers likely check ordinary cases and some cases around the boundaries of what has to be calculated; that is where vulnerabilities tend to be found. But real-world usage, especially complicated usage over years, likely uncovers more boundary cases that were never examined.
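As an illustration of how ordinary cases can pass while boundary cases fail, here is a classic example in Python (the `naive_hypot` function is mine, not taken from any shipping product): the textbook distance formula works for everyday inputs but overflows near the edge of the floating-point range, even though the true answer is perfectly representable.

```python
import math

def naive_hypot(x, y):
    # Textbook formula: fine on the ordinary inputs a QA pass would try,
    # but x*x overflows to infinity for large (still representable) x.
    return math.sqrt(x * x + y * y)

print(naive_hypot(3.0, 4.0))   # 5.0 -- the ordinary case looks perfect

big = 1e200
print(naive_hypot(big, big))   # inf -- the boundary case the tester missed
print(math.hypot(big, big))    # ~1.414e200 -- a careful routine copes
```

A test plan built from "typical" magnitudes would pass the naive version every time.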

Even that would be tolerable if users could check results for themselves against other math programs, or if outside programmers could peek inside the code. But, often, they can’t.

If a given advanced math function is available in only one or two computer programs, checking it against some other way of calculating that function is, if not impossible, awfully difficult. If it’s in two programs, checking one against the other may be inadequate; and, if it’s in only one, even that possibility is unavailable. Whatever the other way to calculate might be, it will be too advanced for an abacus. Maybe you could do it with pencil and paper, but it would take a prodigious amount of paper and a matching truckload of pencils and dinners. It becomes like the problem of counting the roughly one million pennies in a bathtub. (Set two bathtubs side by side with closed drains, fill one with the loose pennies and leave the other empty, count a dozen or so pennies from the full tub into the empty one, write the number down, repeat, and keep a running tally until the once-full tub is empty. Then do it all again. You’re a human being, so you won’t get the same number twice.) Maybe an advanced function can be reduced to simple arithmetic, but it will be an astronomical amount of arithmetic, and who can be sure of not making a mistake somewhere in the reams of paper? Never mind the time. And never mind doing it several times with different input data, since you’re testing.
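When a second route to the answer does exist, the cross-check is at least mechanizable. A sketch in Python, using the standard library’s `math.erf` as the "program under test" and an independently coded Maclaurin series as the second opinion (the series routine is written here purely for the comparison, not taken from any package):

```python
import math

def erf_series(x, terms=40):
    # Independent recomputation straight from the definition:
    # erf(x) = (2/sqrt(pi)) * sum over n of (-1)^n x^(2n+1) / (n! (2n+1))
    total = 0.0
    for n in range(terms):
        total += (-1) ** n * x ** (2 * n + 1) / (math.factorial(n) * (2 * n + 1))
    return 2.0 / math.sqrt(math.pi) * total

# Two independent routes to the same function should agree closely.
for x in (0.1, 0.5, 1.0, 2.0):
    print(x, abs(math.erf(x) - erf_series(x)))  # discrepancies should be tiny
```

Of course, this only works for functions simple enough to recompute from scratch; for the truly high-end functions the essay is worried about, the second opinion is exactly what’s missing.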

Computer math is a breed unto itself. That’s because, while the brain has certain limits, computers have different ones. Even 1 + 1 + 1 is too complicated for a computer at the fundamental hardware level, which adds only two numbers at a time. Software must preprocess the problem into bite-size chunks (no, not byte-size chunks) and combine everything to get the final answer. And adjustments must be made not only at the hardware level, where the registers are, but at higher levels too. One spreadsheet had a limit of 250 characters in a formula, so a longer formula had to be cut into at least two, with one formula feeding its answer into another. Dividing formulae into a chain takes thought.
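The formula-splitting workaround can be sketched in a few lines of Python; the formula itself is made up purely for illustration:

```python
# Breaking one long formula into named intermediate steps, each feeding
# the next -- the same trick a 250-character spreadsheet limit forces.
# Hypothetical formula: y = ((a + b) * (c - d)) / (e + f) + g
a, b, c, d, e, f, g = 2.0, 3.0, 10.0, 4.0, 1.0, 1.0, 0.5

# One-shot version (what a single long cell formula would compute):
y_direct = ((a + b) * (c - d)) / (e + f) + g

# Chained version (cell1 feeds cell2 feeds cell3):
cell1 = (a + b) * (c - d)   # 5 * 6 = 30
cell2 = cell1 / (e + f)     # 30 / 2 = 15
cell3 = cell2 + g           # 15 + 0.5 = 15.5

assert cell3 == y_direct    # the chain must reproduce the one-shot answer
```

The thought the essay mentions goes into choosing where to cut: each intermediate value has to be meaningful on its own, and each cut is one more place for a transcription error to creep in.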

Then there’s the legal side, and the question of how much help you can get. There’s a reason these proprietary programs are mysterious little black boxes.

License restrictions on closed-source software may forbid reverse engineering, decompiling, disassembly, benchmarking, and perhaps other techniques for taking an X-ray of a program’s internals. License is a dull word, but it covers a lot: the software is under copyright or patent, often both; it can’t be copied without a license; and, unless it’s copied, it can’t be used on most machines, or perhaps any. For some people, that’s barrier enough. Others, hacking around and probably violating the license (something institutions frown on when their own people plan to do it, and institutions are where many high-end math programs get used), will look at the code anyway. But the community of hackers who will bounce ideas and analyses around to refine their thinking and figure out solutions, while software proprietors try to silence them, is a small community, and some of its members have other interests.

Open-source software (OSS) welcomes that scrutiny. All that’s needed is someone willing to put time into it. They can write the patch themselves, or report their discovery so someone else can write it, if a patch is needed at all. Or maybe they’ll confirm the code is accurate as it stands, no repair required. Some hacker communities are large, and some are transparent, with Bugzilla or similar software for reporting possible bugs and feature requests so that anyone can sort through the reports for interesting tidbits to work on. Where the communities are small, patches are rarer and slower to come, maybe less tested, and lower-priority work may not get done at all.

The contrary argument starts with closed-source software selling at higher prices than open-source software, especially since OSS is often free. That income can finance development, including better validation, just as some OSS development is financed now.

But that does not address the weakness inherent in closed-source software: independent researchers are not permitted to peel it apart to find flaws. Independence is needed among some participants in the process, because those whose income depends on the software maker tend to adhere to the maker’s business decisions about priorities even when those run contrary to customers’ interests or good programming practice. The dependents have access to expertise and institutional memory; we can benefit from the net contributions of both dependents and independents.

Moving to Clarity

So, the solution may be to replace the closed-source math modules with open-source versions, so that every high-level mathematical function is available in open-source routines.

Even combining is possible. Major programs usually have non-math features that can stay proprietary without interfering with the math. Open-source math components can be integrated with closed-source code that does other things, like providing multiplatform compatibility, input-output flexibility, and a choice of user interfaces. The integration should be explicit, so a scholar or technician can verify the math methods encapsulated in the open-source white boxes. It’s fine to lock the open-source modules inside closed-source frames, as long as the open code can be traced back to providers where it can be modified, with modifications vetted for accuracy before they update the proprietary packages. Red Hat and Micro Focus both support Linux operating systems in free and paid-for forms, and other companies can adopt hybrid business models too.
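A toy sketch of that arrangement in Python, with every name hypothetical: the math lives in an auditable open-source module whose version is traceable to a public repository, and the proprietary shell only wraps it, doing no math of its own.

```python
# --- open-source, auditable math core (hypothetical module) ---
OPEN_MATH_VERSION = "1.4.2"   # traceable to a public repository and its history

def core_function(xs):
    # Stand-in for a vetted open-source routine (here: a sum of squares).
    return sum(x * x for x in xs)

# --- proprietary shell: handles presentation, delegates every calculation ---
def report(xs):
    value = core_function(xs)
    # Recording the math core's version keeps the calculation traceable.
    return f"result={value} (math core v{OPEN_MATH_VERSION})"

print(report([1.0, 2.0, 3.0]))   # result=14.0 (math core v1.4.2)
```

The point of the explicit boundary is that an auditor can verify `core_function` in isolation, without ever seeing the proprietary shell.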

One hopes that no science is slipping on bad numbers. But we deserve something better than hope before we move out of pure theory, try to build a rocket or a disease cure, and discover a house of cards.