Coupling & Crosstalk: Trust Your Paranoia!

Coupling & Crosstalk is my column in the MEPTEC Report. This column appears in the Winter 2019 edition on pages 11-12.

Electronic coupling is the transfer of energy from one circuit or medium to another. Sometimes it is intentional and sometimes not (crosstalk). I hope that this column, by mixing technology and general observations, is thought provoking and “couples” with your thinking. Most of the time I will stick to technology but occasional crosstalk diversions may deliver a message closer to home.

Trust Your Paranoia!

President Ronald Reagan’s use of the Russian proverb “Doveryai, no proveryai” was the perfect soundbite to describe the 1987 Intermediate-Range Nuclear Forces Treaty. What do this and Andy Grove’s “only the paranoid survive” have to do with semiconductors?

“Trust, but verify” appears to be a nonsensical phrase since verification implies a lack of trust. However, it accurately describes how models of trust work. The extensive process of verification built into the treaty basically meant there was no trust between the United States and the former Soviet Union. But as each side performed its commitments in a verifiable manner, the treaty worked.

Most people trust something until it no longer works. We hop in our car trusting that the brakes will stop the vehicle when needed. Did you take a moment to inspect the brake pads before driving off? Of course not, unless you already suspected a problem. Did you tell a colleague or friend a ‘secret’? Will you continue to share with them? Yes, until that trust is broken and something untoward happens. Cases of “blind trust” rarely end well…

A good engineer, by nature and practice, tends to be skeptical. Usually a project needs to be successful the first time or the cost of failure will be too high. We double-check, we cross-check, and sometimes triple-check our process assumptions as well as the calculations and data. The cost of correcting a small error on a semiconductor mask set can run into the millions of dollars. Some endeavors, like space flight, demand first-time success with no failures at all because of the lives at stake and the enormous cost.

My son does not fully comprehend why I insist on checking his math homework with him before he is ‘done’. Like most high school students, he thinks he is done after he quickly writes out the answers in his notebook but before we go through to check… He hasn’t grasped the value in reinforcing his mastery of the material as he attempts to explain his steps (especially those he failed to write out) or in finding simple calculation errors before turning in his homework. Asking him to detail out all the steps of his calculations is our family’s current “trust but verify”.

Semiconductor design using modern electronic design automation (EDA) software has extensive process checks to make sure the design is correct. As the process nodes have shrunk to ever smaller sizes, not only have the non-recurring engineering (NRE) costs increased, but the complexity of the physical process has also increased, resulting in more opportunities for design errors and fabrication yield loss. Most modern integrated circuits (ICs) are designed using a hardware description language (HDL) such as VHDL or Verilog, which provides a ‘logical’ language (think software code) to describe the desired functionality. The design in HDL can be simulated and analyzed to verify the desired operation. At some point the HDL is expressed as an electrical circuit schematic which is also checked and simulated. This circuit schematic is then translated into a physical layout, i.e. the patterns and shapes of the traces to be processed on the IC.
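To make the idea of simulating a design against its intended behavior concrete, here is a minimal sketch in Python rather than in an actual HDL. It assumes a toy one-bit full adder; the function names (full_adder_spec, full_adder_gates, verify) are hypothetical and are not part of any EDA tool. The ‘spec’ states what the circuit should do, the ‘gates’ version states how it is built, and the check simply tries every input combination.

```python
from itertools import product

def full_adder_spec(a, b, cin):
    """Behavioral description: what the circuit is supposed to compute."""
    total = a + b + cin
    return total & 1, total >> 1          # (sum bit, carry-out bit)

def full_adder_gates(a, b, cin):
    """Gate-level description: how the circuit is actually built."""
    p = a ^ b                             # XOR gate
    s = p ^ cin                           # XOR gate
    cout = (a & b) | (p & cin)            # two AND gates feeding an OR gate
    return s, cout

def verify(spec, impl, n_inputs):
    """Exhaustively simulate both descriptions; return the inputs where they disagree."""
    return [bits for bits in product((0, 1), repeat=n_inputs)
            if spec(*bits) != impl(*bits)]

def buggy_adder(a, b, cin):
    """Same as full_adder_gates, but one AND gate was 'forgotten'."""
    return (a ^ b) ^ cin, a & b

mismatches_good = verify(full_adder_spec, full_adder_gates, 3)
mismatches_bad = verify(full_adder_spec, buggy_adder, 3)
print(mismatches_good)
print(mismatches_bad)
```

Exhaustive simulation only works for tiny blocks like this one. Real designs have far too many input combinations and internal states, which is why verification teams rely on targeted test benches and formal methods instead of brute force.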

One of the last checks performed before the patterns are written on the masks is “physical verification”. At this stage the mask patterns are checked against design rules (minimum feature size, minimum clearances, etc.), electrical checks are performed, and layout versus schematic (LVS) is run. LVS extracts a schematic from the patterns, as generated, and compares it against the input schematic. This makes sure the EDA tools did not create an error by adding extra circuitry, neglecting part of the circuitry, or misinterpreting the circuitry. Yes, we trust the tools to operate correctly, but it is essential to verify the output to make sure there are no errors. Each and every discrepancy needs to be checked carefully to make sure there are no systemic errors or fatal mistakes. Only after the design team is fully satisfied that everything is correct does the mask data get sent off to generate the physical glass masks.
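The LVS comparison described above can be sketched in a few lines of Python. This is a deliberately simplified illustration, not how a production tool works: it assumes each transistor is a hypothetical (type, gate, source, drain) tuple and that the net names already agree between schematic and layout, whereas real LVS tools must match the two circuits by their connectivity alone. The names device_key and lvs_compare are made up for this example.

```python
from collections import Counter

def device_key(dev):
    """Normalize one transistor so equivalent devices compare equal."""
    kind, gate, source, drain = dev
    # Source and drain of a MOSFET are interchangeable, so sort them.
    return (kind, gate, tuple(sorted((source, drain))))

def lvs_compare(schematic, layout):
    """Return (missing, extra): devices absent from, or added to, the layout."""
    want = Counter(device_key(d) for d in schematic)
    got = Counter(device_key(d) for d in layout)
    missing = sorted((want - got).elements())
    extra = sorted((got - want).elements())
    return missing, extra

# A CMOS inverter as drawn in the schematic: (type, gate, source, drain).
schematic = [("pmos", "in", "vdd", "out"), ("nmos", "in", "gnd", "out")]

# The same inverter as extracted from the layout (device order and
# source/drain order differ, which should not matter).
layout_ok = [("nmos", "in", "out", "gnd"), ("pmos", "in", "out", "vdd")]

# A layout with one extra transistor that is not in the schematic.
layout_bad = layout_ok + [("nmos", "en", "out", "gnd")]

result_ok = lvs_compare(schematic, layout_ok)
result_bad = lvs_compare(schematic, layout_bad)
print(result_ok)
print(result_bad)
```

A clean run reports nothing missing and nothing extra. Any entry in either list is a discrepancy that someone has to explain before the mask data goes out.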

The moment the EDA data becomes physical masks is where nature intrudes and physical variations become the difference between failure and success. Just like a film negative, an error in generating the mask or contamination on the mask will cause defects in every IC patterned. Needless to say, mask fabricators very carefully measure and inspect the masks to ensure they match the submitted data.

In the actual IC fabrication process itself there are systemic (process “bias”) and random defects that will impact the yield of the individual devices. Hence the need for extensive quality control including “metrology” (measurement) and testing to separate the defective devices from those presumed good. I’ve previously discussed the need for Known Good Die (KGD) and I hope you will join us as MEPTEC discusses the need and ever-present challenges at the upcoming KGD Workshop on April 21, 2020.

In addition to a method for finding the good die, what is needed in terms of trust? Beyond the random variations that cause die to fail, how does one ensure there are no ‘bad actors’ in one’s supply chain? Jeff Demmin (Booz Allen Hamilton and MEPTEC Advisory Board member) touched upon this in his MEPTEC-IMAPS Semiconductor Industry Speaker Series luncheon presentation on September 11, 2019.

Today’s distributed global supply chain typically has over a dozen entities ‘touching’ an IC between fabrication and assembly into an electronic sub-system. Most of these companies are located outside of the US. And there are increasing cases of counterfeit ICs being found due to the economic incentives. With the IC itself it is very difficult and costly to confirm that the actual circuits match those of the design and do not contain changes or extraneous circuitry. Unfortunately, reverse engineering an IC is a destructive process. So, one can never say with 100% certainty that a specific IC being used has not been tampered with.

Combined with no longer having domestic companies that are ‘trusted’ semiconductor foundries available to build complete devices (especially at advanced process nodes), this creates a challenge for defense-related applications. In addition to developing methods for detecting the counterfeiting of or tampering with packaged devices, the Defense Advanced Research Projects Agency (DARPA) is driving the development of Heterogeneous Integration (HI) based solutions. Their approach is to use HI ‘chiplets’ developed and fabricated in secure and trusted facilities for the most ‘sensitive’ functions. These chiplets can then be integrated with more common building blocks available from the commercial market with less concern as to the level of trust.

So, regardless of one’s level of paranoia, processes (as always) need to be established to ensure one’s supply chain’s quality and security. And the roots of the paranoia expressed by former CEO and Intel co-founder Andy Grove? His biggest concern was that of missing strategic inflection points (SIPs) where the fundamentals of the business or available technology have shifted. If a company does not see and respond to these SIPs, it may be passed by competitors or left out of a market entirely. Hence his focus on manufacturing and quality (product and process).

Mr. Grove realized that existing teams have a very large bias towards the status quo. He often used outsiders (other divisions / business units, consultants, etc.) to review things from a fresh perspective. This is similar to having a separate team do the design verification and physical verification of a chip design. Like most audit processes, an outsider may find details overlooked by those involved in the day-to-day activities.

Bottom line? For any important process, even though you may trust that it is being done ‘correctly’, you must establish an independent ‘audit’ or checking process. Be paranoid and trust, but verify. And do not hesitate to ask for ‘outside’ help!

As always, I look forward to hearing your comments directly. Please contact me to discuss your thoughts or if I can be of any assistance.