Resurrecting a dead MacBook Pro (mid-2012 13-inch, model A1278)

As seen on Hackaday!

A couple weeks ago, I picked up a dead MacBook Pro that was on its way to the recycle bin, and was curious as to whether I would be able to fix it. It had a note attached to it citing several issues with the computer: the display doesn’t work, the battery doesn’t charge, one of the USB ports doesn’t work, and it won’t load an operating system. It certainly didn’t look particularly promising, but I felt it would be a good way to test my skills in component-level repair – with a pretty nice prize if I succeeded.

Triage

The computer I picked up is a mid-2012 MacBook Pro by Apple; it is the A1278 model with a logic board number of 820-3115-B, and it comes with an i7-3520M CPU and 8 GB of DDR3 RAM – however, the hard drive was taken out of the computer by the time I received it. As previously noted, the computer had a laundry list of issues that were certainly the reason the original owner decided to discard their computer – a laptop that doesn’t boot nor have a display isn’t a particularly useful one.

Connecting a MagSafe AC adapter to the computer revealed even more issues: even though the unit was already noted that it wouldn’t charge, I noticed there was no LED indicator on the power adapter’s plug, and the computer wouldn’t power on, even with external power connected; the only sign of life was one of the LED level indicators rapidly flashing when I pressed the button. With this functionality test being unsuccessful, I decided to open up the computer to see what else was wrong…

Troubleshooting & Diagnosis

Unscrewing the bottom cover revealed what horrors the computer had experienced. There was clear evidence that it had suffered from liquid damage: rampant corrosion around the LCD connector and some of the power circuitry, and some of the corrosion deposits were even left on the computer’s bottom cover! If you watch Louis Rossmann’s videos, you would know that liquid damage rarely is an easy fix, especially when high-voltage LED backlight circuitry gets involved.

Liberal use of a 70% isopropyl alcohol solution and a brush was able to scrub away all the corrosion on the computer’s logic board, and the results were not pretty:

Many PCB test pads were either corroded or entirely gone, the backlight fuse (and its pads) were nowhere to be found, and some ICs were missing entire pins! Whatever was spilled on this area of the MacBook certainly had some corrosive properties to it, and it looks like nothing was done to stop the initial damage. With a schematic and board-view software in hand, it was time to investigate what particular components had suffered damage.

Power Supply

Before any device can perform any useful functions, it needs power. I reconnected the AC adapter and started to check the voltages around the DC input jack and its surrounding support circuitry. Since I was able to press the MacBook’s battery indicator and get some response when connected to power, I knew that the main input fuse was intact, and that the SMC (System Management Controller) chip was receiving power via the PP3V42_G3H rail and functional; the G3H (G3-Hot) designation means that the power rail is always on, even if the computer is otherwise turned off. I checked the voltage at the DC jack’s ADAPTER_SENSE line, which is normally at approximately 3.3 volts and uses the 1-Wire protocol to communicate with the power adapter and control the LED on the power adapter’s MagSafe plug. To my surprise, it was at a staggering 16 volts, which meant that something was shorting the DC input voltage (about 16.5 volts) onto a low-voltage communication line – no wonder there wasn’t any LED indicator when I plugged it in! A multimeter measurement found about 2 kOhms of resistance from the power line to the communication line. Thankfully the MacBook’s logic board features a MAX9940 1-Wire overvoltage protection chip, which is rated to protect against voltages as high as 30 volts. I scavenged another DC input connector from an older, dead MacBook which shared the same connector and pinout. After connecting this to the logic board, I found that I got a green LED upon connecting the AC adapter, and the CPU fan started spinning; this is a very good sign as this means the main power circuitry is intact. Measuring the CPU’s Vcore voltage revealed a voltage of about 0.8 volts, which is normal for a modern laptop CPU. With the “heart” of the computer checked out, it was time to focus on the area most affected by the water damage.

LCD & Backlight

Examining the backlight connector and its surrounding circuitry revealed significant damage to many components and the PCB itself. The power supply pins on the LCD connector showed a significant amount of corrosion, and I was concerned that the backlight’s output voltage (up to 52 volts!) could have made its way through all the corrosion residue and damaged critical data lines between the display and the graphics controller. I noticed the backlight fuse (F9700, a 3-amp 0603-size fuse) had gone more than just open-circuit – I couldn’t even find the fuse or its corresponding PCB pads initially! I then probed the LCD connector and found that the display’s 3.3-volt power lines were open-circuit; the corrosion had eaten through the traces between the connector and its decoupling capacitors nearby. Using a diode-mode measurement on the FPD-Link (often called LVDS) lines revealed that the connections were intact; there weren’t any anomalous readings or short-circuits on those lines, the DDC (Display Data Channel) lines, and the 3.3-volt power lines.

Due to the high voltages used to drive LED backlights, I had my suspicions on U9701 (a Texas Instruments LP8550 LED backlight driver). It’s a tiny ball-grid array (BGA) package, and attempts to clean the chip from its edges didn’t seem to do much. Its corrosion looked limited at first – only the feedback line’s probe point was lost – but I was sure the chip was on its last legs (or is it balls?).

Power Management

The LCD connector is in close proximity to the computer’s DC input and its “PPBUS_G3Hot” power rail, which is always on (even if the computer is otherwise turned off), which exacerbates any corrosion due to liquid damage due to its high voltage. Further examination revealed significant corrosion on the outside of the CPU’s high-side current sense resistor (R5400), and the current-measurement pins (pins 4 and 5) on U5400 (a Texas Instruments INA213 current-sense amplifier) were completely gone! Clearly there was no way to salvage that component.

There was significant damage to the SMC’s DC input voltage sense circuitry (“VD0R”), with pins 3 and 4 of Q5490 (an ON Semiconductor NTUD3169CZ complementary pair of N-channel and P-channel MOSFETs) being completely eroded away, much like U5400’s current-sense pins; this part of the circuit uses a P-channel MOSFET to switch on a resistive voltage divider, allowing the SMC to measure what the voltage is on its MagSafe input connector. Also, many of the probe points related to that circuit were also completely eroded, revealing dark pits instead of silver-plated copper pads.

FireWire

The FireWire circuit wasn’t spared from the carnage, either. Pins 3 and 4 on Q4262 (a Diodes Incorporated BSS8042DW complementary MOSFET pair) were also severely damaged; these pins are used to quickly disable the FireWire power output transistor (Q4260, an ON Semiconductor FDC638P P-channel MOSFET) in case of a “Late-VG event“. This occurs if the ground pins of the FireWire connector are mated too late when plugging in a device – this creates a dangerous overvoltage condition on the FireWire data lines, as up to 30 volts briefly find a return path through the data lines, risking damage to the device and host controller. I wasn’t as concerned with this circuit, as I don’t have any FireWire peripherals, and the circuit in its current state simply means the FireWire port will be unable to disconnect power if a bad cord is plugged in.

Thunderbolt

The area that had the least liquid damage was C3897, which belonged to U3890 (a Linear Technologies LT3957, a 15-volt boost converter for the MacBook’s T29 chip and Thunderbolt interface). All this area needed was a bit of corrosion cleanup.

USB Port

During the functionality tests, I noticed the metal casings of the USB port were getting very hot to the touch, and I nearly burned myself on U4600 (a Texas Instruments TPS2561, a dual-channel load switch with internal current limiting)! I found a short-to-ground problem on a power line on one of the USB ports, which explains the symptom listed on the note. I desoldered the chip, initially thinking the issue was in the chip itself, but the fault remained. I narrowed the problem down to C4695, a 10-microfarad ceramic capacitor that had short-circuited internally; this caused the TPS2561 to go into current-limiting mode, which turns the chip into a resistor and dissipate copious amounts of heat into the PCB, which made its way to the USB ports (and then my fingers – ouch).

Hard Drive Cable

During the repair process, I was able to install Mac OS X Lion to a SATA SSD, but soon found the MacBook unable to recognize SSDs, despite hard disk drives showing up just fine! As it turns out, the A1278 is notorious for bad HDD cables, with even replacements failing within months of installation. This appeared to be caused by chronic frictional damage, as the cable is sandwiched between the hard drive and the MacBook’s rough aluminum casing – even regular use of the laptop was found to create hairline cracks in the cable. Thankfully replacements are relatively inexpensive, and a little bit of Kapton tape as a barrier against the casing was the “vaccine” against future cable failures.

Repairs

With all of the problems written down, it was time to start fixing up the MacBook. Time to break out the hot air rework station, soldering iron, solder, magnet wire, and plenty of flux!

DC Input Jack

I desoldered the DC input jack, and found there was a lot of corrosion residue bridging the +16.5-volt power line to the ADAPTER_SENSE 1-Wire communication line.

With some isopropyl alcohol and some scrubbing with a small brush, I was able to clean up the corrosion and resoldered the jack into place. A quick multimeter test found that there was no more 2 kOhms of resistance from the power to the data line, and I was able to get an LED indication when I plugged in the AC adapter, including an orange light that indicates the battery is charging.

LCD Connector

I wanted to determine if the display was still functional, so I first focused my attention to the LCD connector, even if I had to eschew the LED backlight for a bit.

I ran a jumper wire from L9004 to pins 2 and 3 of the LCD connector; this belongs to PP3V3_LCDVDD_SW_F, which provides the 3.3-volt power to run the LCD panel except the backlight. After cleaning out the flux and corrosion on the logic board’s connector as well as the LCD cable, I was able to get an image on the display!

USB Port

With the faulty component identified, I replaced C4695 with an identically-rated 10-microfarad 6.3-volt X5R ceramic capacitor in an 0603-sized package. After replacing the capacitor, the USB port was fully functional again!

Current-Sense Amplifier

After ordering both the INA213 and LP8550 from Texas Instruments, it was only a few days before they arrived in the mail. I desoldered the dead chip from the logic board, cleaned up the pads with some flux and desoldering braid, and installed the new chip. Running Apple Service Diagnostic tools showed that the current-sensing circuit was working correctly.

DC Input Voltage Divider Switch

I didn’t want to buy another transistor pair for Q5490, so I replaced the P-channel half with an ON Semiconductor NTK3142P P-channel MOSFET that I salvaged from an older donor MacBook logic board. I scraped away some solder mask on one of the broken traces heading to the SMC’s voltage divider so I could solder the transistor’s drain terminal to it, and used magnet wire to connect the transistor’s gate and source to their corresponding locations across R5491. R5494, leading to PM_SUS_EN, was found to have a 0-ohm resistor that was open-circuit; this was easily bypassed with a wire jumper across the resistor’s original pads. After cleaning off the flux and performing continuity measurements, I measured the voltage at the SMC’s voltage divider resistors and got a valid voltage reading when I plugged in the AC adapter.

LED Backlight Driver

The LP8550 was up next for repair. I took a 2-amp 0603-sized fuse from a dead hard drive, and used some magnet wire to reattach it to the remnants of F9700, which was a 3-amp fuse originally; note that it’s far safer to use a fuse of a smaller rating instead of a larger one, should a circuit fault still exist.

Tracing the other lines to the LP8550 revealed that R9731 (leading to PPBUS_SW_LCDBKLT_PWR) was open-circuit at a via, which was easily bridged with some solder and magnet wire. R9010 (leading to PPBUS_SW_BKL) was open as well.

After reinstalling the fuse, I actually got the backlight working! However, upon a power cycle I heard a snap, saw a puff of smoke, and lost the original backlight chip. Chances are there was indeed some corrosion residue had caused 50-odd volts to end up on a more sensitive pin on the LP8550. I used an Xacto knife to lightly scratch an outline around the chip, then used copious amounts of flux and desoldered the dead chip with my hot air rework station; I also removed the fuse to help in further troubleshooting to ensure that there weren’t any short-circuits to ground on the backlight circuit. I cleaned up the area with leaded solder and some solder wick, and cleaned up the residual flux in anticipation of the new chip’s installation.

The chip was remarkably easy to install – just get the A1 ball lined up according to the board view, and heat the board to the right temperature. After thoroughly cleaning away the flux from the area, I turned on the MacBook… and let there be (back)light! I power-cycled the computer and the LED backlight remained functional! (And for the record, the fuse didn’t even blow during the entire ordeal.)

FireWire Late-VG Protection Circuit

I considered this issue to be a “WONTFIX, as I had no use for FireWire connectivity (nor do I have the correct FireWire 800 cables anyway). If I want to sell this computer, I might install a P-channel MOSFET to replace Q4262 (see the LCD Connector section above) in a similar fashion to the DC input voltage-sensing circuitry.

UPDATE (February 26, 2020): Actually, several months ago I decided to finish things once and for all, so I replaced the damaged P-channel MOSFET circuitry. Do I have much use for FireWire? Not really, but it at least means I can experiment with it or something later on, if desired.

Testing

It takes a little bit of Google-Fu, but with the help of a BitTorrent client, I downloaded the disk images to create an Apple Service Diagnostic (ASD) drive. This is far more sophisticated than the built-in diagnostic when you boot the computer while holding down the D key. With ASD, one has the option to use a stripped-down version of Mac OS X – in a similar vein as WinPE – or a very lightweight UEFI (Universal Extensible Firmware Interface) environment that looks very much like Mac OS 9 and earlier.

It took over half an hour, but all the tests passed without a problem, since all the sensor readings were valid. My MacBook Pro has been restored to working order! I installed Mac OS X High Sierra to a 1TB SSD, and used Boot Camp to run Windows 10 Pro as the default operating system (what can I say, I like Windows 🙂 ). The Mac Precision Touchpad driver project makes the touchpad a pleasure to use, as the built-in Boot Camp driver provides a much less-comfortable experience.

Conclusion

Much like solving a puzzle, component-level troubleshooting of modern electronics is possible, but this is only feasible if the relevant documentation exists as a good reference point. One can do without them, but the act of reverse engineering isn’t easy if one only has a non-working device.

With the help of a schematic and board view (including the open-source software OpenBoardView), one can easily find what circuits a component belongs to, and where it goes. By following the connections, one can track down the problem(s) with the board, and hopefully save a device from an untimely end in the landfill or a recycling facility.

Right to Repair

This project is an example of why I believe in the right to repair. If I didn’t have (even unofficial) access to schematics, board views, and diagnostic software, I wouldn’t have been able to bring this dead MacBook Pro back to life. However, with a little bit of electronics troubleshooting knowledge and skill, I was delighted that I diverted a discarded dysfunctional device from a demise in the dumpster. In fact, this blog post was written from the MacBook I just repaired!

Atomic Pi Adventures, Episode 1: Adding external PCI Express expansion by removing onboard Ethernet

As seen on Hackaday!

TL;DR: The Atomic Pi single-board computer CAN be expanded through PCIe. It’s just a massive pain to do so, even if you have steady hands. Let’s just say it’s a long story…

DISCLAIMER: The modification performed in this blog post can, and has, caused permanent hardware damage to my Atomic Pi, albeit repairable with much skill and effort. Reenacting what I’ve done requires significant experience with SMT (surface-mount technology) components, some barely larger than a grain of sand (I consider 0402-size components to be “oversize” in this instance). I accept no responsibility for damages arising from attempting this modification.

Introduction

Single-board computers (SBCs) are all the rage nowadays, with the Raspberry Pi being the most well-known in this category. SBCs are compact computers, carrying their own CPU and memory, and usually some on-board storage and various I/O connections (e.g. USB, HDMI, Ethernet). Most of these computers use the ARM architecture, found on almost all mobile devices today. However, some use the x86 architecture, which is used in higher-end tablets, laptops and desktop computers.

Recently, the Atomic Pi made waves in the electronics hobbyist space, boasting an Intel Atom Z8350 quad-core CPU with 2 GB of RAM, 16GB of eMMC storage, Gigabit Ethernet, Wi-Fi, USB 3.0, built-in speaker amplifiers, and lots of general-purpose I/O (GPIO) pins – all for less than $40 USD!

As one might expect, there were caveats to this little computer, with some dismissing it very harshly, if not unfairly. With some investigative work, members of the community found out that the “Atomic Pi” board actually belonged to the Kuri robot from Mayfield Robotics; the company shut down in late 2018, and the liquidated stock of these SBCs were snatched up by Digital Loggers with the help of a Kickstarter campaign, who then developed breakout boards to make using them easier. This is because – unlike almost every other SBC – you can’t just plug in a barrel jack or USB cord to power it! Instead, it uses a 2×13 pin header, which many users did not have on hand, nor had the skill and/or resources to build their own solution. This is compounded by the board’s need for clean, well-regulated 5 volts at 2-4 amps, with 12 volts being optional to power the onboard speaker amplifier. The 16 gigabytes of eMMC storage proved to be too cramped to run Windows 10 directly, and the Realtek RTL8111G Gigabit Ethernet chipset is often frowned upon by those in the pfSense (a free firewall/router OS) community.

The NIC’s usage of the Z8350’s single PCI Express (PCIe) lane is what caught my attention. Unfortunately, the RTL8111G Ethernet chip is soldered to the board, with no easy method to replace it with an external card. A few people have attempted to remove the chip and wire in an external PCIe riser, without success. With my previous experience with fine-pitch electronics work, I figured that this would be a fairly easy modification to make.

It wasn’t. In fact, this was one of the most frustrating electronics projects I’ve done to date – and now you get to come with me in my adventures (and misadventures) in microsoldering on the Atomic Pi (or was it Kuri? The jury seems to be out on the nomenclature).

Optional Reading: PCI Express Signals

PCI Express (or PCIe for short) is a very common high-speed interface for connecting processors or chipsets to all sorts of different peripherals. It uses low-voltage differential signaling to minimize interference, and a single lane of PCIe can carry 250 MB/s for the first generation, all the way up to 2 GB/s for the fourth generation!

A single PCIe lane is made up of three differential pairs: receive, transmit, and a 100 MHz reference clock. Control signals for waking up and resetting peripherals are also provided. Riser cards, often used for cryptocurrency mining, use these five signals to provide the minimum connectivity for any PCIe peripheral. The interface is highly flexible, allowing graphics cards that normally use 16 lanes to run on just one lane. The specifications for PCIe make board design easier, as the differential lines are designed to adapt to different lane configurations; the polarities in a pair (transmit, receive, clock) are polarity independent (all that matters is that you maintain a good differential pair).

PCIe Signal Pinout

Since the Atomic Pi lacks a PCIe slot of any kind, I have provided the diagram indicating which signals go to what points on the board:

Note that the AC coupling capacitors are required for the peripheral Rx (receive) pair, and are optional for the peripheral Tx (transmit) pair.

Attempt 1: “Chipple” Bypass

I started this project with the intent of making my modifications as reversible as possible. Looking at the RTL8111E’s datasheet (a similar model to the 8111G) revealed the presence of an “ISOLATEB” pin; grounding this pin causes the chip to disable itself and release its hold on the PCI Express lines.

This brings us to the first roadblock: the Ethernet chip is soldered to the board, and the components that connect it are known as 0201 (0.002 inches in length, 0.001 inches in width), smaller than a breadcrumb!

After disabling the Ethernet chip, I used my trusty 0.1mm (that’s four-thousandths of an inch!) magnet wire to tap into the PCIe lines, and brought them out to a PCIe riser card, which provided a USB 3.0-type socket for an external connection.

Unfortunately, high-speed signals aren’t just about wiring from one device to the next, and this was no exception. The rat’s nest of wires were no suitable medium for the 5-gigabit signals to traverse (PCIe requires very tight control of electrical trace layouts), and the Atomic Pi was unable to detect the presence of any device on the external PCIe port.

Attempt 2: Thin Twinax Troubles

With the signaling issue in mind, I tried some very thin twinaxial cables from a dead MacBook’s hard drive cable to connect the PCIe lines to my riser card. This cable is very thin, with an outer thickness similar to fishing line. It uses a foil and wire “shield” around the two internal wires to protect it from external interference. I figured that this should help protect the delicate signals from the harsh outside world.

I didn’t have very much of this thin twinax cable on hand (that said, if anyone knows where I can buy this stuff in bulk, please let me know!), so I was limited to very short lengths for each pair. I ran out of twinax after the transmit and receive pairs, and resorted to using two coaxial cables from that same cable bundle.

After fiddling with the super-thin wires and soldering them to the header, I still got nowhere and could not get the Atomic Pi to see an external PCIe card.

Removing my modifications revealed my first blunder: despite placing the chip into a temporarily disabled state, the Ethernet chip was dead! This meant that I could not use the built-in Ethernet port anymore, and the LEDs next to the Ethernet port simply glowed a dull orange instead of their rapid blinking pattern when data is being transmitted over Ethernet. I figured that there was no use keeping the chip on the board, so I used my hot air rework station to remove it, using a generous amount of heat and flux to get the chip removed.

Attempt 3: To Ribbons, You Say?

With the dead Ethernet chip removed, I decided to use another tactic to bring out the PCIe interface. I decided to use a thin ribbon cable (okay, more accurately a “flat flexible cable”) to help decouple any mechanical stresses from moving the external USB 3.0 connector around during testing, and its compact size allowed me to try using the QFN pads to connect my ribbon cable, hopefully minimizing noise that could be picked up, as well as avoiding any “stubs” of wiring within the differential pair that would degrade the signals.

The challenge is that the chip uses a very small pad spacing, and I wanted to avoid soldering directly to the ribbon cable. I managed to salvage a couple connectors from some other devices, and used a cut-up CompactFlash memory card’s PCB to make it easier to solder the connector, as well as provide a base for the connector to sit on.

The connector on the board was affixed onto the Ethernet chip’s footprint with the help of some double-sided foam tape, and magnet wire was used to bond the PCIe signal wires onto the ribbon cable. The ribbon itself used copper foil on one side to help with interference suppression, and provide the signals a “ground plane” to travel across.

Things weren’t quite as elegant on the other end of the cable, however. I still had to contend with a fine-pitch ribbon connector, and a way to connect that to a male USB 3.0 plug to hook it up to the PCIe riser. I had little option except to use the small lengths of twinax from the last attempt, and (perhaps unsurprisingly) it didn’t work.

Attempt 4: Teeny Tiny Twists

The next attempt went to a smaller scale, using a very fine twisted pair I created out of my 40 AWG magnet wire (each wire is as thick as a hair). I then shielded this magnet wire by sandwiching it between pieces of copper tape to act as a ground plane for the signals. I reused a dead USB 3.0 drive to act as the plug for the PCIe riser, and wired it straight to the QFN pads of the original Ethernet chip.

Although the twisted pair would have, in theory, reduced intra-pair skew and some degree of EMI resistance thanks to the copper tape, the homemade twisted pair almost certainly would not have provided a 100-ohm differential impedance that’s required for PCIe signaling. Once again, the attempt to break out PCIe from the Atomic Pi was a failure.

Attempt 5: SASsiness Yields Results…?

Since the central issue with external PCIe connectivity involved the connection between the Atomic Pi board and the riser, I figured I would try a medium that was specifically intended for high-speed differential signals. I bought some thin SATA cables (specifically the 18-inch thin cables on Amazon), which used two pairs of thin SAS cable per SATA connection. The difference between SATA, SAS and PCIe is of little difference here, as the key criteria was a differential pair with 100-ohm impedance and ability to carry high-frequency signals.

Despite being a thin cable (each conductor is only 30 AWG, or 0.25 mm in diameter), these cables were too stiff to directly connect to the AC coupling capacitors without a very high risk of damaging the capacitor and/or the pad it is soldered to. I had to reinforce the cables by soldering them down and hot gluing the cables to the board at multiple points to relieve stress on the connections.

The original 0201-sized AC coupling capacitors were removed, and more reasonably-sized 0402 capacitors were used in their place. Each capacitor was a common 100nF capacitance, and were easy to harvest from some dead laptop motherboards I had on hand. I “flipped” the layout of the capacitors away from the QFN footprint, but this meant that only one side of the capacitor was actually anchored to the board; this made the capacitors very fragile and I often lost the terminations on the capacitors (thankfully they are plentiful in electronic devices like this).

To help minimize stress on the vulnerable capacitors, I used magnet wire as a flexible “bridge” between the capacitor and the SAS cable, and the cables were tied down to solder tab wire that I used as “bus bars” for a strong electrical ground as well as a tie-down point to take away most of the strain of the cable’s flexion. Despite negatively affecting signal integrity, I was able to get a sufficiently stable connection to perform some initial testing, and I succeeded! I was able to connect an Intel 82576-based dual-port network card to my external riser.

However, this didn’t last long. The simple movement of the wires on the board when trying to connect different peripherals was enough to break the AC coupling capacitors, despite the use of my flexible terminations from the SAS cable to the capacitors. Replacing the damaged capacitors and redoing the magnet wire terminations failed to restore connectivity, so I desoldered all of the existing connections and started over.

Attempt 6: We Have Lift-Off! (that’s bad)

In an attempt to further improve signal integrity, I decided to take the bold move of eschewing the flexible terminations of magnet wire, and decided to directly solder the SAS cable’s wires to the capacitors, which are soldered to the PCIe pads on the Atomic Pi. I anticipated that physical stresses would damage the capacitors, so I opted to use longer lengths of SAS cable, and to hot-glue the cable to the board at regular intervals to minimize the amount of stress that gets coupled into the cable and capacitors. The 100 MHz reference clock continued to use magnet wire for connection as it was at a much lower frequency than the PCIe signals, and reduced the amount of physical crowding around the PCIe pads.

This leads me to my next issue. To help with structural integrity, I repositioned one of the SAS cables so that it would remain on the board before going to the PCIe riser connector, meaning it would be perpendicular to the other cables. This required me to strip extra foil shielding to jump over the existing capacitors, which increases the risk of physical stress on the capacitors, as well as reduced signal integrity. Additionally, I used a longer set of twinax cables, sacrificing two SATA cables to get most of the original 18 inches of length per cable.

Testing of this construction method failed, and it was only at this point that I realized that two of the four AC coupling capacitors I used weren’t the same capacitance; PCIe usually uses 100 nF capacitors, but I had a 1 nF on one PCIe pair, and 10 nF on the other – no wonder things weren’t communicating (and that’s what I get for assuming the capacitors were the same)! Unfortunately, the process of removing the SAS cable connections resulted in the phenomenon I was trying to avoid: the board’s PERp0 (the positive side of the PCIe receiver) pad had lifted away from the PCB, leaving me with an unsolderable crater.

Attempt 7: Success!

After losing one of the pads for the original coupling capacitors, I was feeling pretty defeated. I didn’t let this stop me, and I decided to apply my magic skills with 40-gauge magnet wire to the trace, and was able to get a sufficient connection with a replacement 100 nF capacitor (and this time I measured it…), hopefully in such a way as to not too negatively affect signal integrity. Unfortunately, the 0201-sized resistors used to connect the PCIe reset and wake signals lost their terminations and no longer took solder on one end; I opted to keep the resistors in place and soldered magnet wire to the other end (facing the SoC rather than the Ethernet chip), as further measurement revealed that the resistors were simply 0-ohm jumpers anyway. I scrapped the longer SAS cables, and went back to the ~8-inch lengths to reduce attenuation.

After rewiring the PCIe data, clock and control lines, I crossed my fingers and retried the riser with my Intel NIC – and it worked! After verifying the connections were sufficiently strong, I decided to make the modification permanent, and covered the area with epoxy to prevent any of the components from breaking loose. Additionally, I used some aluminum sheet metal, double-sided foam tape and some zip ties to create a reinforcement bar, helping to strengthen the cable connections as they leave the board.

With all the connections in place, it was time to get to the fun part: seeing what PCIe peripherals work on the Atomic Pi!

Testing Results

One interesting behaviour of the Atomic Pi when using Windows is that changing PCIe cards often causes the system to immediately power down or freeze during boot. Simply powering the board on again seems to fix this issue. Why am I using Windows? I already upgraded the board to a 64GB eMMC instead of the original 16GB, and the appeal of the Atomic Pi’s x86 architecture was the ability to run desktop apps – in particular, Windows apps.

The UEFI firmware has a hidden advanced menu, accessible if the “shutdown /r /fw” command is run as administrator. There is a section for PCIe settings, and the ones that interested me were the AER (Advanced Error Reporting) and PCIe hot-plug settings. Unfortunately, these do not seem to have any effect as Windows doesn’t seem to pick up any hardware events in the Event Log.

Test 1: Network Devices

PASS: Intel 82576 Dual-Port Adapter

If the port(s) have a connection to another device (e.g. a network switch), the orange and green LEDs will illuminate soon after the PCIe link is properly initialized, which makes for quicker troubleshooting. Even if the system is off, the green link LED remains lit, but goes out once the PCIe bus is reset.

The card works a treat in pfSense and Windows, allowing high-performance Gigabit-speed transfers with both ports in use. However, the UEFI firmware does not recognize the card as a bootable medium (which is for the best in my opinion – the PXE boot program tends to hinder the boot process more than anything).

PASS: Realtek RTL8111GS (External Card)

The PXE network boot program built into the UEFI firmware works just fine, which is expected as the chip is the same type as the original RTL8111G, with the exception that the chip has a built-in switching converter rather than the linear LDO that the Atomic Pi uses natively.

PASS: JMicron JMC250

The card works in Windows and pfSense; the particular card I have has always been flaky in operation, and this was no exception.

PASS: ASUS PCE-N15 (Realtek RTL8192CE) 802.11n WLAN Adapter

Using this card is a performance downgrade compared to the dual-band adapter already present on the Atomic Pi, but it did function correctly in Windows.

Test 2: External Graphics Cards (eGPU)

PASS: EVGA/nVidia 8500 GT

The UEFI firmware doesn’t seem to want to acknowledge the presence of PCIe graphics adapters, which results in issues when testing in Windows. Initially, the card failed to enumerate correctly, identifying as a Microsoft Basic Display Adapter and displaying an error in Device Manager:

This device is not working properly because Windows cannot load the drivers required for this device. (Code 31)

The driver trying to start is not the same as the driver for the POSTed display adapter.

Manually downloading and installing the drivers allowed the card to work properly. As previously mentioned, the UEFI does not recognize the presence of the graphics card, so a monitor that is plugged into the graphics card won’t start working until Windows is loaded and the graphics driver initializes.

Additionally, the adapter’s resources won’t be used if the monitor is connected to the Atomic Pi’s built-in HDMI port – it’s a tradeoff between graphics performance and the ability to see what’s happening on boot; maybe this is different in a multi-monitor configuration, but I didn’t have enough desk space to test this.

PASS: XFX/AMD Radeon 4650

The same driver issues popped up when using this card as well, and the same fix applies.

PASS: ASUS/nVidia GTX 660 Ti

Ditto. This card was the best I had on hand for testing, and amusingly it consumes about 10-20 times the amount of power that the Atomic Pi uses, and is bigger in size as well.

Connecting this card allowed me to run games that would otherwise simply not work, such as Just Cause 2. It performed at about 20-30 frames per second: not great but not terrible. Just Cause 3 was absolutely painful to run, but this is not surprising as the Atomic Pi’s hardware is far below the game’s minimum requirements in almost all aspects.

Test 3: Storage

PASS: Dell PERC H200 SAS Adapter

To my surprise, not only will it work in Windows, it even supports booting from the UEFI! It mentions a SAS controller driver during boot, but I am not sure as to whether this is present in the Atomic Pi’s UEFI, or if it is a driver provided by the SAS adapter itself. The PCIe 2.0 1x interface limits the maximum throughput to less than 500 MB/s, which is slower than a single SATA III connection, and informal testing showed a maximum speed of about 330 MB/s. I have not investigated whether I could configure the hardware RAID features of the adapter from the operating system, but I imagine that doing so would dramatically improve performance when running a RAID 1 (mirrored) setup.

FAIL: OWC (What’s This?) Aura Pro X 480GB NVMe SSD

This SSD was originally meant as an aftermarket SSD upgrade for a MacBook Air, but I bought a PCIe adapter card to use it in a regular PC. I was able to get it to enumerate in Windows, but any sort of read/write action caused it to lock up and throw errors in the Windows Event Log. I suspect this is likely a power supply issue due to how the PCIe riser board provides power (12 volts -> DC/DC converter -> 3.3V LDO linear regulator), or maybe the card just doesn’t want to cooperate.

FAIL: Marvell 88SE9128 2x SATA III + 1x PATA UDMA-133

This adapter caused Windows to almost freeze during boot. Instead, the boot animation would advance by one frame every 2 seconds until the card is removed. I suspect this is the PCIe bus attempting to negotiate a connection but failing, possibly due to signal integrity issues. Checking the Windows Event Log turned up nothing either.

Test 4: … More PCIe?

FAIL: Extender Cable

One would think that a simple straight-through cable would work, right? Nope – it seems like the signal has already been degraded enough after traveling across multiple non-ideal connections, and the extra length in a cable was just enough to degrade the signal beyond recovery.

PASS: PCIe Port Multiplier / ASMedia ASM1184e

UPDATE (July 3, 2019): It seems that the “USB 3.0” pinout on these PCIe risers is inconsistent. I was somehow able to make do by using a PCIe card plugged into my current riser, which then led to the ASM1184e board, which splits the PCIe 2.0 1x lane into four slots. It seems like the chip is able to handle the attenuation over such a long cable length, and it even includes a 100 MHz clock buffer, effectively “refreshing” the signal for more sensitive PCIe cards.

I tried the 88SE9128 card again, but still had no luck. Ethernet cards worked without issue, but I wasn’t able to test the graphics cards as the ASM1184e board’s PCIe slots are closed-ended 1x slots. I could try cutting the edge of the connector to allow for larger cards, but that’s an experiment for another day.

Conclusion

Despite all the roadblocks I encountered in this project (sometimes rage-inducing ones), I still enjoyed this project and the various techniques I tried along the way. The prospect of attaching desktop PCI Express peripherals to a CPU designed for a tablet opens up many different applications for the Atomic Pi, including ones that would otherwise have been less efficient or impossible using the board’s built-in hardware (e.g. dual Gigabit Ethernet without using USB, or GPU-intensive computing workloads).

The high-speed nature of modern computer systems presents many challenges for engineers and modders alike. Multi-gigabit interfaces easily reach into the realm of radio-frequency (RF) electronics, where things like AC transmission line effects and differential signal routing become very important. This issue, coupled with the fact that most modern devices use very small components, makes it difficult for many electronics hobbyists to access these interfaces on their own. Although a modification like what I did is technically possible, I don’t consider it practical for an average electronics/computer enthusiast.

Future Ideas

One avenue I’ve thought about pursuing is a mod board that is soldered onto the Atomic Pi’s PCB, allowing a more robust connection for the PCIe lines; this would include signal integrity improvements like ESD protection and PCIe line redrivers to strengthen the signals to/from the SoC and peripheral(s). PCB design is currently beyond my scope of knowledge, but it would be a fun way to dive right into the subject matter.

eMMC Adventures, Episode 4: Recovering data from physically damaged BGA eMMC Flash storage chips

As seen on Hackaday!

The ball grid array (BGA) chip package has been instrumental in getting modern electronics to fit in smaller and smaller spaces, as it uses tiny balls of solder on the bottom of the package to make electrical connections, instead of copper leads on the edge of the chip package. This allows for hundreds of connections to be made in a small amount of PCB area, but their size also makes them very vulnerable to damage as well.

One common way for BGA chips to become damaged is called “pad cratering“, where the copper pad on the package’s substrate (basically a wafer-thin circuit board) separates and leaves behind a crater.

In the case of eMMC (Embedded MultiMediaCard), its package type is known as an FBGA (Fine-pitch Ball Grid Array), so the area of each pad is very small (0.4 mm in diameter!); it doesn’t take much at all to crater the pad – even gently removing solder with solder wick and generous amounts of flux can still cause damage! Most of the pads on an eMMC package are unused, but if any one of the DAT0, CLK, or CMD pads are damaged, then the chip is rendered unusable, even if the chip is placed into a socket for data recovery. If the DAT1-DAT7 pads are damaged, data recovery becomes much slower as the chip is forced to use fewer lines to transmit data (the MMC standard supports data over 1, 4, or 8 lines).

However, there is some hope. Many FBGA packages, including eMMC, use pads that are SMD (solder-mask defined); this means the solder mask is what defines the size of the pad, not the copper itself. Therefore, when the pad gets cratered, often there is a “halo” of copper left behind that still has a chance of getting an electrical connection.

The trick is how to to get a flat conductive area that a chip socket can use to get a reliable connection (without a copper pad, soldering is no longer an option). The eMMC socket adapter I used breaks out the eMMC onto an MMCplus-shaped PCB that can be inserted into any commercial SD card reader.

Filling in the Blanks

There are a few possible materials that can be used to restore contact area on a damaged BGA pad. One possible option is a silver-filled conductive epoxy, but I have not tested its efficacy; an additional consideration is that the volume of the filled-in crater might not be enough to get a filling with sufficiently low resistance for a good connection.

Another option is using solder paste, which I used in this case. Unfortunately, solder’s surface tension is our enemy when trying to fill in a flat area (it wants to form cohesive balls and therefore won’t want to stick to the ring of copper left around the crater), so a means of forcing the solder into the crater requires something flat, rigid and capable of handling the high temperatures experienced during soldering.

At first I tried Kapton (polyimide) tape, but that was a massive failure since it didn’t have the rigidity to stay flat when the solder paste began to melt, and the liquid flux rendered the adhesive useless.

The solution to the issue came in the form of glass. Specifically, I used very thin (0.15 mm thick!) glass “cover slips” normally used to prepare specimens for viewing under a microscope. These can be very inexpensive and one can obtain hundreds of them for a few dollars. The key is to fill the craters with the solder paste by using a knife as a squeegee, then placing the cover slip on top of the eMMC and reflowing it.

It took a few iterations for the pads on some of my eMMC chips to be restored sufficiently, as the volume taken up by the solder will be less than the paste and its accompanying flux. It doesn’t have to fill the entire crater – it just needs to be enough for the eMMC socket’s pins to make a solid connection.

Conclusion

The high-density nature of modern BGA chips is both a blessing and a curse. When trying to do data recovery from devices that use such tiny chips, such as eMMC or UFS Flash storage, sometimes the desoldering process is too much for the chip’s pads to handle. With some ingenuity (and thin glass), it might be possible to temporarily restore enough conductive pad area to get the data off with the help of an eMMC socket.