The Amazing $1 Microcontroller

While some projects that come across my desk are complex and beefy enough to require a hundreds-of-MHz MCU with all the bells and whistles, it’s amazing how many projects work great using nothing more than a $1 MCU.

I wanted to explore the $1 pricing zone, since it’s the least amount of money you can spend on an MCU that’s still general-purpose enough to be widely useful. That’s not to say there aren’t cheaper MCUs out there; these days, there’s definitely a category of ultra-cheap / ultra-small MCUs that fall into the sub-50-cent zone — but these parts aren’t functional enough to be considered general-purpose (some of them only have a few dozen bytes of RAM, most don’t have an ADC, or any peripherals other than a single timer and some GPIO). On the other end of the spectrum, once you start spending more than $1 on an MCU, the field completely opens up into lots of different design philosophies — all with pretty heavy specialization in one particular area.

To do this, I purchased several different MCUs — all less than $1 — from a wide variety of brands and distributors. I’m sure people will chime in and either claim that a part is more than a dollar, or that I should have used another part which can be had for less than a dollar. I used a price-break of 100 units when determining pricing, and I looked at typical, general suppliers I personally use when shopping for parts.

These MCUs will be selected to represent their entire families — or sub-families, depending on the architecture — and in my analysis, I’ll offer some information about the family as a whole. I’m not going to call it a competition, since competitions have winners — and if there were a clear winner, I’d imagine we’d all be using those parts exclusively. At the same time, don’t think that microcontrollers are all special snowflakes that are all equally as good as all the others. While there are no winners, there are definitely losers — we’ll find out which architectures you should avoid in new designs. Also, don’t expect another tired Atmel-vs-Microchip-vs-whatever rant that focuses on intangibles. While I’ll write qualitatively where appropriate, I’m going to quantify things where possible, which means I’ll be benchmarking these MCUs through a few different real-world and not-so-real-world tasks, as well as stacking up specs to compare these across the board.

Overall, I’ll be looking at a few different categories:

Parametrics, Packaging, and Peripherals

Within a particular family, what is the range of core speed? Memory? Peripherals? Price? Package options?

Some microcontroller families are huge — with hundreds of different models that you can select from to find the perfect MCU for your application. Some families are much smaller, which means you’re essentially going to pay for peripherals, memory, or features you don’t need. But these have an economies-of-scale effect; if we only have to produce five different MCU models, we’ll be able to make a lot more of each of them, driving down the price. How do different MCU families end up on that spectrum?

Package availability is another huge factor to consider. A professional electronics engineer working on a wearable consumer product might be looking for a wafer-level CSP package that’s less than 2×2 mm in size. A hobbyist who is uncomfortable with surface-mount soldering may be looking for a legacy DIP package that can be used with breadboards and solder protoboards. Different manufacturers choose packaging options carefully, so before you dive into an architecture for a project, one of the first things to consider is making sure that it’s in a package you actually want to deal with.

Peripherals can vary widely from architecture to architecture. Some MCUs have extremely powerful peripherals with multiple interrupt channels, DMA, internal clock generators, tons of power configuration control, and various clocking options. Others are incredibly simple — almost basic. Just as before, different people will be looking for different things (even for different applications). It would be a massive undertaking to go over every single peripheral on these MCUs, but I’ll focus on the ones that all MCUs have in common, and point out fine-print “gotchas” that datasheets always seem to gloss over.

Development Experience

While this is where things get subjective and opinion-oriented, I’ll attempt to present “just the facts” and let you decide what you care about. The main source of subjectivity comes from weighing these facts appropriately, which I will not attempt to do.

IDEs / SDKs / Compilers: What is the manufacturer-suggested IDE for developing code on the MCU? Are there other options? What compilers does the MCU support? Is the software cross-platform? How much does it cost? These are the sorts of things I’ll be exploring while evaluating the software for the MCU architecture.

Platform functionality and features will vary a lot by architecture, but I’ll look at basic project management, source-code editor quality, initialization code-generation tools, run-time peripheral libraries, debugging experience, and documentation accessibility.

I’ll focus on manufacturer-provided or manufacturer-suggested IDEs and compilers (and these will be what I use to benchmark the MCU). There are more than a dozen compilers / IDEs available for many of these architectures, so I can’t reasonably review all of them. Feel free to express your contempt of my methodology in the comments section.

Programmers / debuggers / emulators / dev boards: how do I get code onto this MCU, and how much is it going to cost me? How clunky is it to interface the debugger to the target? Does the device support DFU uploads, or anything that would allow us to forgo a debugger altogether? Every company has a slightly different philosophy for development boards. I’ll explore the availability and pricing

The test setup consists of a Saleae Logic Pro 16, capturing timing data at 500 Msps, a repurposed Silicon Labs STK board, which was used for power measurement, large bulk capacitors which absorbed current spikes to ensure power measurements were accurate, and an FTDI-enabled RS-485 transceiver (operating in single-ended mode) for the DMX-512 demo testing.

Performance & Power Consumption

I’ve established three different code samples I’ll be benchmarking the cores with. With these tests, I want to establish some performance and efficiency scores.

Clock efficiency: How many clock cycles does it take to execute each chunk of code? We’re looking at clock cycles, not instruction count, since some MCUs implement a RISC-like instruction set (where code gets compiled into lots of very basic instructions that each execute in a single cycle), and some MCUs implement a CISC-like instruction set (where code gets compiled into fewer, more complex instructions that can take multiple cycles to complete). If an 8 MHz MCU has 4x the clock efficiency of a 32 MHz MCU, they will perform the same. It’s amazing that more benchmarks don’t evaluate MCUs this way; a 4x difference may seem insane and improbable, but as we’ll see, this level of performance difference is not at all unusual.
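As a rough illustration of that arithmetic, here is a minimal sketch in C. The helper name exec_time_ns and the cycle counts used with it are made up for the example; they are not measurements from this review.

```c
#include <assert.h>
#include <stdint.h>

/* Wall-clock time, in nanoseconds, for a chunk of code that needs
 * `cycles` clock cycles on a core running at `clock_hz`.
 * (Hypothetical helper; the numbers used with it are illustrative.) */
static uint64_t exec_time_ns(uint32_t cycles, uint32_t clock_hz)
{
    assert(clock_hz > 0);
    return ((uint64_t)cycles * 1000000000ULL) / clock_hz;
}
```

For example, exec_time_ns(10, 8000000) and exec_time_ns(40, 32000000) both come out to 1250 ns: the 8 MHz part needs a quarter of the cycles, so it finishes at the same time as the 32 MHz part.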

Practical clocking limits: Some MCUs can comfortably run at 64 MHz — while some only hit 20 MHz. Some have internal oscillators that can drive the core near its maximum speed, while others have internal oscillators that are quite a bit slower than what the MCU tops out at, which means you’ll have to use external clocking to get near the performance maximums of the platform (and there’s nothing that professional electronics engineers hate more than laying down expensive quartz crystal oscillator circuits simply to get the full speed out of the part). I’ll be explaining these limitations for each architecture, which will help you get a feel for what kind of speed to expect out of the architecture, given your clocking configuration.

Silicon Labs STK boards aren’t the cheapest dev boards around, but they have a fantastic energy monitor built into the board, plus an Eclipse plugin that visualizes this data in real time. It’s great to see a company willing to put their money where their mouth is, in terms of low-power performance, and I ended up using this dev board for all my low-power measurement.

Power efficiency: this is a hugely important metric when building portable devices, or just, you know, trying to not screw over the planet. If two MCUs are equally performant, yet one has a run current that’s half of the other one, I want to know about it. So, while we’re benchmarking the code to determine clock efficiency, we’ll also be measuring active-mode power consumption, which is code-dependent. I will present a bit of information about low-power efficiency, too, but this varies so much by your application (i.e., which peripherals you have on or off), it’s too daunting to try to measure and compare; instead, I’ll present some metrics from the datasheets for anything that stands out (extremely bad or extremely good sleep- or stop-mode current-consumption, no RTC support, etc).

While(1) Blink

Toggle a pin in a while() loop. Use the fastest C code possible. Prefer a bit-complement call, but resort to a read-modify-write if the platform doesn’t support it. Don’t use GPIO-specific “toggle” registers (which defeats the purpose of the test), but if the platform has it, make a note in the discussion. Report on which instructions were executed, and the number of cycles they took.

What this tests: This gives some good intuition for how the platform works — because it is so low-level, we’ll be able to easily look at the assembly code and individual instruction timing. This routine will also test bit manipulation performance; some architectures implement a “bit complement” instruction, while some don’t. Who comes out ahead?
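To make the two toggle styles concrete, here is a small sketch. The variable fake_port is a hypothetical stand-in for a memory-mapped GPIO output register, and the LED position on bit 3 is an assumption; on real hardware you would use the port register from the vendor's header instead.

```c
#include <stdint.h>

/* Hypothetical stand-in for a memory-mapped GPIO output register;
 * on real hardware this would be the vendor header's port register. */
static volatile uint8_t fake_port;

#define LED_MASK ((uint8_t)(1u << 3)) /* assumed: LED on bit 3 */

/* Preferred form: a complement the compiler may turn into a single
 * bit-complement instruction where the ISA has one. */
static void toggle_complement(void)
{
    fake_port ^= LED_MASK;
}

/* Fallback: the same operation spelled out as read-modify-write. */
static void toggle_read_modify_write(void)
{
    uint8_t value = fake_port;           /* read   */
    value = (uint8_t)(value ^ LED_MASK); /* modify */
    fake_port = value;                   /* write  */
}
```

Both functions have the same effect in C; what the test looks at is which instructions the compiler emits for them on each target. In the benchmark itself, the body of one of these would sit directly inside the while() loop.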

64-Sample Biquad Filter

You can read CoreMark scores in the datasheets. Instead, I’ll focus on a real-world application where you need performance: digital signal processing. For this test, I’ll process signed 16-bit data through a 400 Hz second-order high-pass filter (Transposed Direct Form II implementation, for those playing along at home), assuming a sample rate of 8 kHz. We won’t actually sample the data from an ADC — instead, the variables will be marked “volatile” so the compiler doesn’t play games with them. We’ll wrap this in a loop that works through a 64-sample buffer, and I’ll wiggle a bit at the beginning and end to measure performance with my 100 MHz Saleae logic analyzer.

In addition to the samples-per-second measure of raw processing power, we’ll also measure power consumption, which will give us a “nanojoule-per-sample” measure; this will help you figure out how efficient a processor is, since only in limited circumstances do you actually need maximum processing performance out of a chip.
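Both figures fall out of the captured pulse width with a little arithmetic. The sketch below shows one way to compute them; the function names and the example numbers in the note that follows are illustrative assumptions, not results from the review.

```c
#include <assert.h>

/* Core clock cycles spent per sample, from the width of the pulse
 * captured on the logic analyzer around one block of `samples`. */
static double cycles_per_sample(double pulse_s, double clock_hz,
                                unsigned samples)
{
    assert(samples > 0);
    return pulse_s * clock_hz / samples;
}

/* Energy per sample in nanojoules: E = V * I * t, divided by the
 * number of samples processed during time t. */
static double nanojoules_per_sample(double supply_v, double current_a,
                                    double pulse_s, unsigned samples)
{
    assert(samples > 0);
    return supply_v * current_a * pulse_s / samples * 1e9;
}
```

As a made-up example, a 64 µs pulse on a 32 MHz core works out to about 32 cycles per sample, and at 3.3 V and 1 mA that is about 3.3 nJ per sample.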

What this tests: Memory and math performance per microamp, essentially. The MCU will have to do five 16-bit signed multiplications, four signed additions/subtractions, plus array indexing. The 8-bit MCUs in our round-up are going to struggle with this pretty hardcore — it’ll be interesting to see just how much better the 16- and 32-bit MCUs do!

Test notes: I’m sure some 8-bit fans (of which I am!) will scream that I should use native data types. But these days, it seems like every temperature sensor or ADC or IMU spits out 10, 12, or 16-bit data. An 8-bit biquad has little usefulness, so I’m going to stick to a 16-bit fixed-point implementation. I don’t want to get into a compiler optimization war, and while I’ll try a few different compiler optimizations and pick the best one, I’ve tried to write code that will work well across platforms, regardless of optimization. Here is the actual filter source code:

volatile int16_t in[64];
volatile int16_t out[64];

const int16_t a0 = 16384;
const int16_t a1 = -32768;
const int16_t a2 = 16384;
const int16_t b1 = -25576;
const int16_t b2 = 10508;

int16_t z1, z2;
int16_t outTemp;
int16_t inTemp;

// this goes in main.c's while() loop:
uint8_t i;

// set pin high here

for (i = 0; i < 64; i++)
{
    inTemp = in[i];
    outTemp = inTemp * a0 + z1;
    z1 = inTemp * a1 + z2 - b1 * outTemp;
    z2 = inTemp * a2 - b2 * outTemp;
    out[i] = outTemp;
}

// set pin low here
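The listing above is written for timing, and it keeps every intermediate in 16 bits. If you want to run the same filter on a desktop and look at the output, a variant along the following lines may help. It is only a sketch, not the benchmark code: it assumes the coefficients are in Q14 format (so 16384 means 1.0), widens the math to 32 bits, and rescales the output. The names in_buf, out_buf, and biquad_q14 are mine, and there is no overflow protection for full-scale input.

```c
#include <stdint.h>

#define N_SAMPLES 64
#define Q14_ONE   16384 /* assumed scale: 1.0 in Q14 */

static int16_t in_buf[N_SAMPLES];
static int16_t out_buf[N_SAMPLES];

/* Same coefficient values as the listing above, widened to 32 bits. */
static const int32_t a0 = 16384, a1 = -32768, a2 = 16384;
static const int32_t b1 = -25576, b2 = 10508;

/* Transposed Direct Form II biquad with 32-bit state at Q14 scale. */
static void biquad_q14(void)
{
    int32_t z1 = 0, z2 = 0;
    for (uint8_t i = 0; i < N_SAMPLES; i++) {
        int32_t x = in_buf[i];
        int32_t y = (a0 * x + z1) / Q14_ONE; /* back to sample units */
        z1 = a1 * x + z2 - b1 * y;
        z2 = a2 * x - b2 * y;
        out_buf[i] = (int16_t)y;
    }
}
```

Feeding it a constant input is a quick sanity check: since this is a high-pass filter, the output should start at the input value and then decay toward zero.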

 

DMX-512 RGB light

DMX-512 is a commonly-used lighting protocol for stage, club, and commercial lighting systems. Electrically, it uses RS-485; the protocol uses a long BREAK at the beginning of each frame, followed by a “0” start code and then up to 512 bytes of channel data, transmitted at 250 kbaud. Implement a DMX-512 receiver that directly drives a common-anode RGB LED. Since this is only for prototyping, detect the start-of-frame by looking for an RX framing error (if the hardware supports it — otherwise, resort to a timer). Minimize power consumption as much as possible by using interrupt-based UART receiver routines, and halting or sleeping the CPU. Report the power consumption (with the LED removed from the circuit, of course). To get a rough idea of the completeness of the code-generator tools, report the total number of statements written as well.

What this tests: This is a sort of holistic test that lets me get into the ecosystem and play around in a way that wiggling a pin or running some copy-pasta filter routine doesn’t. This stuff is the bread and butter of embedded programming: interrupt-based UART reception with a little bit of flair (framing error detection), multi-channel PWM configuration, and nearly-always-halted state-machine-style CPU programming. Once you have your hardware set up and you know what you’re doing (say, after you’ve implemented this on a dozen MCUs before…), with a good code-gen tool and easy-to-access documentation, you should be able to program this with just a few lines of code.
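The receive logic itself is small. The sketch below shows one possible shape for it, kept free of any vendor API so it can be read on its own: the type dmx_rx_t, the two handler names, and the choice of slots 1 to 3 for red, green, and blue are all assumptions for illustration. On a real part, the two functions would be called from the UART interrupt handler and the duty values written to the PWM compare registers.

```c
#include <stdbool.h>
#include <stdint.h>

#define DMX_BASE_SLOT 1u /* assumed: red on slot 1, green on 2, blue on 3 */

typedef struct {
    uint16_t slot;    /* index of the next byte in the frame; 0 = start code */
    bool     active;  /* false until a BREAK starts a frame */
    uint8_t  duty[3]; /* PWM compare values for R, G, B */
} dmx_rx_t;

/* Call from the UART ISR when a framing error (the BREAK) is seen. */
static void dmx_on_break(dmx_rx_t *rx)
{
    rx->slot = 0;
    rx->active = true;
}

/* Call from the UART ISR for every correctly framed byte. */
static void dmx_on_byte(dmx_rx_t *rx, uint8_t byte)
{
    if (!rx->active)
        return;
    if (rx->slot == 0) {
        /* Start code: only 0 marks ordinary dimmer data. */
        rx->active = (byte == 0);
    } else if (rx->slot >= DMX_BASE_SLOT && rx->slot < DMX_BASE_SLOT + 3u) {
        /* Common-anode LED: the pin sinks current, so invert the level. */
        rx->duty[rx->slot - DMX_BASE_SLOT] = (uint8_t)(255u - byte);
    }
    rx->slot++;
}
```

Between interrupts the CPU has nothing to do, which is what lets the main loop halt or sleep.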

Test notes: I’m using FreeStyler to generate DMX messages through an FTDI USB-to-serial converter (identified to the program as an Enttec Open DMX). As my FTDI cable is 5V, I put a 1k resistor with a 3.3V zener diode to ground, which clamps the signal to 3.3V. The zener clamp isn’t there to protect the MCU — all of the chips tested have clamping diodes to protect against over-voltage — but rather, so that the MCU doesn’t draw power from the Rx pin.

The Contenders

This content will be updated as new MCUs are tested. You can read the full, in-depth review by clicking on the title, or just keep scrolling for comparative analysis.

Atmel (Microchip) tinyAVR

Part tested: ATTINY1616

Subjective Assessment: This entry-level AVR device has decent clock-cycle efficiency, but suffers from high active-mode power consumption, spartan peripherals, poor internal clocking options, and a quirky debugging experience. It’s unfortunate that the architecture was nearly abandoned for several years as Atmel moved onto their other products; die-hard open-source devs will appreciate the modern avr-gcc compiler that produces reasonably good machine code, and there are tons of open-source dev tools available on the hardware and software front for flashing AVR MCUs (though none of these support on-chip debugging in any capacity).

And that’s really why AVR is so popular among the hobbyist community (though not necessarily so much among professional engineers) — it’s all the low-cost open-source tools available. These tools seem to be made more out of necessity than pride, though; Atmel’s official AVR tools are not particularly competitive in terms of features, stability, or cost when compared with those from the other brands tested here, nor do they run on Linux or macOS.

And, of course, the 500-pound gorilla in the room is that AVR is the choix du chef of the Arduino platform. That brings with it a huge community of people playing around and learning microcontrollers, which is never a bad thing. I hope the new ATtiny417/814/816/817 will start shipping soon, as these chips might bring a small breath of fresh air to this stale, legacy ecosystem. Until then, this architecture is best-suited for die-hard open-source fans, or hobbyists who need access to a huge community.

Atmel (Microchip) SAM D10

Part tested: ATSAMD10D14A-SSUT

Subjective Assessment: Atmel’s least-expensive ARM Cortex-M0 offering, the new SAM D10, is positioned to kill off all but the smallest TinyAVR MCUs with its performance numbers, peripherals, and price. Stand-out analog peripherals like a 10-channel 12-bit ADC and 10-bit DAC cap off a peripheral set that would feel at home on a controller twice the cost. Having said that, the 32-bit instruction fetch width means this thing will gobble up program space rapidly, and it’s missing the wake-up interrupt controller, which will affect how quickly you can get out of sleep mode. Sleep current isn’t great, and run current is pretty average, too, which points to an area where a smartly-designed 8-bit MCU can still potentially excel (depending on the application).

As this is an ARM Cortex-M0, developers get full access to commercial and open-source ARM tools for compiling, flashing, and debugging their parts, so there’s de facto cross-platform support for Windows, Linux, and macOS — though the official IDE of this MCU, Atmel Studio, only runs on Windows. Atmel Studio, by the way, turns from a driver-issues-laden, clunky, never-quite-works-right IDE to a polished, smooth-running machine when you switch from AVR to SAM MCUs; professionals will appreciate the painless SAM-ICE (J-Link) setup, while hobbyists will hate the price tag that comes with that debugger. The MCU comes in an SOIC package, which is about as beginner-friendly as you’ll see in the ARM ecosystem. If you’re just getting started, though, I’d recommend the SAM D10 Xplained Mini board, which is just $10 — it’s a “chip breakout”-style dev kit.

Cypress PSoC 4000S

Part Tested: CY8C4024LQI-S401

Subjective Assessment: Cypress was one of the first companies to integrate a ton of analog on their MCUs, but it seems like everyone else in the field has caught up and then surpassed them, especially in terms of pricing; many of the MCUs we’re reviewing have DACs, comparators, and ADCs with similar resolution and speed. Having said that, from my own development, I know that Cypress has an “it just works” capacitive-touch sensing library, and their proprietary IDE is much lighter weight than Eclipse, while offering most of the same functionality.

Freescale (NXP) KE04

Part tested: MKE04Z8VTG4
Subjective Assessment: The MKE04 was introduced to kill off 8-bit MCUs — with 2.7-5.5V operation, tons of timers, bit manipulation / bit-band support, and a decent analog portfolio, it’s a step in that direction.

While it was the least-expensive ARM MCU for a while, ST and Atmel have entered the Cortex-M0 market with a lot of force, and the MKE04 doesn’t compete as well as it used to.

Kinetis Design Studio, NXP’s free Eclipse Luna SR2-based IDE, is one of the highlights of the platform. It’s extremely flexible; it has built-in support for ARM-GCC, Keil, and IAR compilers; and it’s the only IDE I tested that has built-in support for OpenOCD (as well as supporting the usual J-Link, et al.). Unfortunately, the whole process of installing Kinetis SDK — with multiple versions floating around that support different MCUs, plus separate Kinetis-SDK / Eclipse-plugin-for-Kinetis-SDK download installation processes, combined with ambiguous references to Processor Expert — steepens the learning curve dramatically.

 

Freescale (NXP) KL03

Part tested: MKL03Z8VFG4
Subjective Assessment: While it may seem strange to have two ARM MCUs from the same manufacturer here, the KE and KL series are quite different.


Holtek HT66

Part Tested: HT66F0185

Subjective Assessment: A basic 8-bit microcontroller with a slow, 4-cycle PIC16-like single-accumulator RISC core. Anemic peripheral selection and limited 4K flash / 256-byte RAM capacity make this a great one-trick pony — not a main system controller. Having said that, the 4T core helps keep the MCU running cool, even with a fast clock running into its peripherals — the HT66 uses less than half a milliamp of current when running at 4 MHz, and 10 µA using the internal 32 kHz RC oscillator.

The development environment has a basic text editor that lacks many text-completion features found in Eclipse-based IDEs, but the IDE has good debugger integration, and the ecosystem is well-documented (though short on code samples and peripheral libraries). I’m not sure I can recommend this particular MCU for general-purpose applications, but reviewing this platform gave me a ton of confidence in the ecosystem — and Holtek has a wide range of interesting application-specific MCUs that integrate this core with high-voltage bridge drivers, specialized analog peripherals, USB, and LCD segment drivers. Any developer experienced with 8-bit MCUs should feel confident in engineering something around these Holtek parts — even if this obscure platform feels a bit dated.

Infineon XMC1100

Part Tested: XMC1100T016X0016

Subjective Assessment: One interesting thing about this MCU is its timer setup. Each of the four timer units has 4 timer slices which can be used in parallel in capture mode, which, in concatenation mode, allows you to capture signals with 64-bit dynamic range.

Microchip PIC16

Part tested: PIC16F18325
Subjective Assessment: The STM32F0

Microchip PIC16 is hugely popular among hobbyists, and has a wide community. The Microchip forums

There are far too many PIC16 microcontrollers for Microchip to build a dev kit for each of them, so instead, they ship an 8-bit PIC Curiosity Debugger, which is essentially the guts of a PICkit put on a board with a DIP socket. The board supports 8, 14, and 20-pin PIC MCUs (it’s neat that they’re pinned out in the way that they are).

Microchip PIC24

Part tested: PIC24F04KL100
Subjective Assessment: The STM32F0


Microchip PIC32MM

Part tested: PIC32MM0064
Subjective Assessment: The STC15W

The official chip-breakout-style dev board, the Curiosity series, is expensive ($27), but gets the job done.


Nuvoton N76

Part tested: N76E003AT20
Subjective Assessment: The N76


Nuvoton M051

Part tested: M052LDN
Subjective Assessment: The M052

NXP LPC811

Part Tested: LPC811M001JDH16


Renesas RL78

Part Tested: R5F102A8ASP

Silicon Labs EFM8

Part tested: EFM8LB11
Subjective Assessment: The

A lot of people describe the EFM8 as a “niche” product, but there’s really nothing niche about it — it’s an excellent, general-purpose 8-bit MCU. It’s just got an extremely small community of die-hard fans.

ST STM8

Part tested: STM8S103F3P6
Subjective Assessment: ST is flooding the market with low-cost STM8 MCUs, and while the DigiKey pricing for this part is just under $1, these MCUs are routinely available through Chinese sites like TaoBao or Ali Express for less than 25 cents. This is the lowest-cost MCU reviewed here. But don’t be fooled by the price tag: the STM8 is an incredibly performant microcontroller, thanks to its 32-bit-wide program memory bus (which allows single-cycle fetching of instructions). Its peripherals feel downright modern — I don’t know of any other 8-bit MCU with adjustable-slew-rate / drive-strength GPIO, a nested vector interrupt controller, individual peripheral power gating, or quite as much timer flexibility as this chip has. While the peripherals seem modern, the complement is fairly average (other than a motor-controller-friendly timer module). Power consumption is average in run mode, and below-average in sleep mode.

Before you jump on Ali Express to buy a batch, you may want to check out the IDE you’ll be using. ST Visual Develop (STVD) is… special. It’s got a strong Windows 98 stench; it feels like a clone of Keil uVision 2 or so. It has primitive code-completion, but not much in the way of features or configurability. Debugging is a breeze, though; it’s cool that the low-cost ST-Link debuggers work across ST’s ARM and STM8 MCUs effortlessly. The official “chip breakout”-style dev kit is the STM8DISCOVERY, which is sold at a loss ($7).

ST STM32F0

Part tested: STM32F030F4P6
Subjective Assessment: The STM32F0 represents why ARM MCUs are an obvious choice when sleep-mode power consumption isn’t a primary concern in your application. Four timers (plus an advanced control timer, like that found on the STM8), plus a 12-bit ADC are this chip’s stand-out peripherals, though the family as a whole can be configured with up to 8 timers, 6 USARTs, a pair of dedicated SPI modules, as well as a pair of I2C ports, too.

As with the STM8, ST is dumping these on the Chinese market, so these parts can be purchased for 40 cents or so each in ten-packs on English-language retail web sites like Ali Express. The STM32F0DISCOVERY chip breakout dev board is $9.

Unlike the other ARM vendors in this round-up, ST has no in-house IDE, so you’ll be relying on third-party development tools, which can get expensive. You can always roll your own Eclipse-based toolchain, but this can be a tedious process to get everything working. ST’s IDE omission is strange, as ST does provide an in-house debugger — the STLink — and it’s actually one of the highlights of the platform: fast, easy to use, and extremely low-cost (perfectly functional clones hover in the $5 range on eBay). If you really want to avoid proprietary tools, the MCU also has a built-in UART bootloader.

STM32Cube is a workable code-gen tool that makes CMSIS less verbose,

STC 15W

Part tested: IAP15W4K61S4
Subjective Assessment: The STC15W is the newest generation of low-cost 8051-derivative MCUs made by STC in Beijing. This is a modern, single-clock-per-machine-cycle 8051 that’s probably the best bang-for-your-buck in terms of raw performance and peripheral functionality. It is well-suited to hobbyist hackers, as all STC15W parts come in DIP packages and can be used in 3.3V or 5V systems.

There’s no expensive debugger to buy — you’ll interact with the STC part through a ROM-based bootloader that you can interface with using a low-cost USB-to-TTL-serial converter. STC offers an ISP app with built-in code samples and decent integration with Keil uVision.

If it all sounds too good to be true, keep in mind there are no U.S.-based distributors for the parts, and no true on-chip debugging (though a monitor program is available). All the chips now have English documentation, but there are essentially no English speakers using this part, so forum-based support is going to be challenging.

Texas Instruments MSP430FR

Part tested: MSP430FR2111
Subjective Assessment: The

Specs

Packaging

Atmel only offers DIP packages for their ancient AVR parts (none of which I reviewed), and Texas Instruments’ meager selection leaves you with few MSP430 DIP options.

Timers

Timers were challenging to quantify, since everyone does things differently. For backwards-compatibility, Silicon Labs provides two old-school 8051-style timers (16-bit counters that can also work as 8-bit timers with auto-reload). Additionally, they fortify this with four modern, 16-bit auto-reload timers (that can also run double-duty in 8-bit mode), as well as an entirely separate 16-bit PWM module driving 6 output channels.

Microchip’s PIC16 has three entirely different timer designs supporting various combinations of 16-bit and 8-bit modes (with only the 8-bit modes doing auto-reload). There are 7 timers total, but there are also 3 CCP modules for capture/compare/PWM generation, along with two additional dedicated PWM modules.

Everyone else provided unified results

Development Environment


Development Tools

Development tools have been shrinking over the last few years; no longer

SDKs

The term “software development kit” is broad; since we’re focusing on C programming in this review, a microcontroller’s SDK consists of:

  • A toolchain — compiler, assembler, linker, and other utilities used to convert source code into binary objects to load onto the controller;
  • Start-up scripts necessary to initialize the system, as well as initialize any C variables;
  • Header files that the developer uses to access peripheral registers and interrupts on the target;
  • Run-time libraries that contain math, string, and other utility functions — as well as functions for manipulating peripherals;
  • Examples, manuals, and other documentation that explains how these components are glued together to produce applications
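To give a feel for what the start-up code in that list does for C variables, here is a minimal sketch. It is not any vendor's actual start-up file: on a real target the source and destination addresses come from linker-script symbols and the routine runs before main(), whereas here they are plain parameters with hypothetical names so the idea can be shown in isolation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* What a C start-up routine typically does before calling main():
 * copy the initial values of initialized variables from flash to RAM,
 * then zero the variables that have no initializer. */
static void init_c_variables(uint8_t *data_ram, const uint8_t *data_flash,
                             size_t data_len, uint8_t *bss, size_t bss_len)
{
    memcpy(data_ram, data_flash, data_len); /* initialized variables */
    memset(bss, 0, bss_len);                /* zero-initialized variables */
}
```

Without this step, a global declared with an initial value would hold whatever happened to be in RAM at power-up.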

Toolchains

Traditionally, compilers were a very expensive component of a development system; it was common for semiconductor manufacturers to outsource the entire development environment for their products to third parties — this gave developers a wide range of options for developing, and one could often turn to a single software package to target a wide variety of embedded devices. IAR Embedded Workbench, as an example, is available for more than 20 architectures — including many parts reviewed here.

However, most vendors today provide their own free toolchains (usually as part of an IDE) to help potential customers get started quickly. For ARM microcontrollers, the ubiquitous choice for free toolchains is GCC; the GNU ARM Embedded Toolchain, specifically, is officially supported and maintained by ARM themselves.

GCC appears on other tested platforms, too.

Nuvoton was the only vendor in my review that provides no packaged toolchain — instead, for their 8051-based N76 MCUs, they rely on customers going directly to Keil for C51; and for their ARM portfolio, they seem to have contributed to CooCox, who maintains a GCC-ARM/Eclipse-based IDE for some of their ARM microcontrollers.

Header Files

One thing we don’t often think about is the header files we’re using when developing embedded applications. In my review, these varied wildly by vendor.

Since MCUs rely on setting, clearing, toggling, and inspecting bits in registers, it’s extremely important for the compiler and header files to provide methods for setting and clearing individual bits (and preferably bit ranges) inside registers.

This is somewhat limited by the compiler, but not by the target processor. The PIC16 lacks bit-set/bit-clear instructions, yet the XC8 compiler will generate read-modify-write instructions when calling into bit-union-defined registers. I like that the PIC16 header files include bit ranges, too, so if there’s a three-bit range in a register somewhere, you can directly assign it a numeric value:

TCONbits.EDA = 5; // sets EDA0 and EDA2
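A header can offer that kind of access with a union of bit-field structs. The sketch below mimics the style; it is not copied from any Microchip header, the type name tcon_bits_t is hypothetical, and bit-field layout is implementation-defined in C, so a real header is always written for one specific compiler.

```c
#include <stdint.h>

/* Hypothetical register layout in the style of a bit-union header:
 * a 3-bit field named EDA, also reachable as individual bits. */
typedef union {
    struct {
        unsigned EDA  : 3; /* the whole range as one number */
        unsigned      : 5;
    };
    struct {
        unsigned EDA0 : 1; /* the same three bits, one at a time */
        unsigned EDA1 : 1;
        unsigned EDA2 : 1;
        unsigned      : 5;
    };
} tcon_bits_t;

/* On a real target this would be placed at the register's address. */
static volatile tcon_bits_t TCONbits;
```

With this in place, assigning 5 to TCONbits.EDA sets EDA0 and EDA2 and leaves EDA1 clear, and each bit can still be read or written by its own name.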

Conversely, some processors provide bit-set and bit-clear instructions, but don’t have simple C constructs for making use of them. Atmel’s AVR-GCC is the biggest offender in this category: to set or clear pins, you have to write C statements that look like read-modify-write operations:

PORTB_DDR |= 2; // set B1 as an output (gross!)
PORTB_DDR &= ~_BV(3); // set B3 as an input (ugly!)

But the compiler will quietly translate these to proper bit-set / bit-clear instructions:

sbi DDRB, 1 ; set bit 1 (B1 as an output)
cbi DDRB, 3 ; clear bit 3 (B3 as an input)
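If the mask arithmetic isn’t second nature, here’s a small sketch you can run on a PC. The `BIT()` macro is my own stand-in that does the same job as avr-libc’s `_BV()`, and `fake_ddrb` is an ordinary variable standing in for the real data-direction register; neither name comes from the AVR headers.

```c
#include <assert.h>
#include <stdint.h>

#define BIT(n) (1u << (n))          /* same idea as avr-libc's _BV(n) */

static volatile uint8_t fake_ddrb;  /* hypothetical stand-in for the DDR register */

uint8_t demo_ddr(uint8_t initial)
{
    fake_ddrb = initial;
    fake_ddrb |= BIT(1);            /* OR in the mask 0x02: B1 becomes an output */
    fake_ddrb &= (uint8_t)~BIT(3);  /* AND with 0xF7: B3 becomes an input */
    assert((fake_ddrb & BIT(1)) != 0 && (fake_ddrb & BIT(3)) == 0);
    return fake_ddrb;
}
```

On a real AVR, the compiler can only collapse these into single bit-set / bit-clear instructions when the register sits in the low I/O space and the bit number is a compile-time constant; otherwise you get the full load, mask, and store.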

Renesas provides a bit-union setup similar to Microchip’s, but only uses a single generic set of names for all registers — there’s essentially no value to this outside of GPIO manipulation (where “bit7” of port 0 obviously corresponds to P07).

The 8051 vendors all used Keil’s proprietary SFR bit syntax, allowing:

P0_B5 = 1; // set Port 0, bit 5 high on the EFM8
P12 = 0; // set P1.2 low, using standard Keil 8051 bit names

Silicon Labs uses slightly more verbose (and less ambiguous) names when compared to the standard Keil C51 names that Nuvoton’s N76 and STC’s parts use.

It’s important to note that many 8051 registers are not bit-addressable; only those that lie on multiples-of-8 addresses can be manipulated this way (since SETB and CLR are only two-byte instructions — you only get 256 possible bits).
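To make the addressing rule concrete, here’s a sketch of how the one-byte bit address works on a standard 8051. The function names are mine. Half of the 256 bit addresses (0x00-0x7F) map onto internal RAM bytes 0x20-0x2F; the other half map onto SFRs, where the bit address is simply the SFR address plus the bit number, which is why only SFRs at multiples of 8 qualify.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* An SFR (0x80-0xFF) is bit-addressable only if its address is a multiple of 8. */
bool sfr_is_bit_addressable(uint8_t sfr_addr)
{
    return sfr_addr >= 0x80 && (sfr_addr % 8) == 0;
}

/* Bit address encoded in the second byte of SETB/CLR: SFR address + bit number. */
uint8_t sfr_bit_address(uint8_t sfr_addr, uint8_t bit)
{
    assert(sfr_is_bit_addressable(sfr_addr) && bit < 8);
    return (uint8_t)(sfr_addr + bit);
}
```

With the standard 8051 map, P1 lives at 0x90, so P1.2 ends up at bit address 0x92; TMOD at 0x89 doesn’t fall on a multiple of 8 and gets no bit addresses at all.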

Why this limitation is surfaced even inside C is a bit puzzling when you consider the underlying ISA of the 8051: Keil C51 could easily implement bit-addressable registers using the 8051’s ORL, ANL, and XRL instructions. You’d lose a cycle, but this wouldn’t have any read-modify-write consequences: bitwise operations (even on registers) are single instructions on the CISCy 8051, so there’s no worry about interrupts or other issues interfering with the value of the register.

You could set P1.2 with:

SETB P12    ; bit-set method, 2 cycles on the EFM8

Or just as easily:

ORL P1, #4    ; OR method, 3 cycles on the EFM8

Holtek was the only vendor that provided bit-addressable singletons for every bit in every SFR on the microcontroller.

Performance

Biquad

With good flash prefetching (or running the code out of RAM), an ARM Cortex-M0 should be able to execute a single filter iteration in less than 30 cycles. 16-bit MCUs with comparable ISAs should take roughly the same time (since the filter is implemented with 16-bit math, not 32-bit math). On the other hand, 8-bit MCUs will struggle with these operations dramatically, and might take 4-12 times more cycles to do this, depending on hardware multiplier availability, and some simple compiler optimizations (two of the multiplies will be replaced by logical shifts if the compiler is halfway decent).
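To give a feel for the workload, here’s a generic 16-bit fixed-point biquad step. This is a sketch, not the benchmark source: the direct form I structure, the Q14 coefficient format, and every name in it are my assumptions for illustration. It also assumes inputs and coefficients are scaled so the 32-bit accumulator can’t overflow, and that right-shifting a negative value is an arithmetic shift (true for GCC and Clang).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BIQUAD_Q 14   /* assumed format: 1.0 is represented as 1 << 14 */

typedef struct {
    int16_t b0, b1, b2;   /* feed-forward coefficients (Q14) */
    int16_t a1, a2;       /* feedback coefficients (Q14), a0 normalized to 1.0 */
    int16_t x1, x2;       /* previous two inputs */
    int16_t y1, y2;       /* previous two outputs */
} biquad_t;

/* One filter iteration: five 16x16 multiplies accumulated in 32 bits. */
int16_t biquad_step(biquad_t *f, int16_t x)
{
    assert(f != NULL);
    int32_t acc = (int32_t)f->b0 * x
                + (int32_t)f->b1 * f->x1
                + (int32_t)f->b2 * f->x2
                - (int32_t)f->a1 * f->y1
                - (int32_t)f->a2 * f->y2;
    int16_t y = (int16_t)(acc >> BIQUAD_Q);   /* scale back from Q14, no saturation */

    f->x2 = f->x1;  f->x1 = x;                /* shift the input history */
    f->y2 = f->y1;  f->y1 = y;                /* shift the output history */
    return y;
}
```

Every one of those multiplies is a single instruction on a part with a 16x16 hardware multiplier, and a software loop on a part without one, which is where most of the gap between the 8-bit and 16/32-bit parts comes from.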

Obviously every 16/32-bit part was way faster than every 8-bit part, to the point where I would have to plot them logarithmically to ensure the data is legible. Consequently, I’m going to divide up the results into 16/32-bit, and 8-bit results. As for the Renesas RL78, I’m placing it in the 16/32-bit pool, since it has a 16-bit ALU (and Renesas considers it a 16-bit MCU) — even though its data pathways are all 8-bit (and it has power consumption figures and clock rates more common with 8-bit parts).

You may think it’s strange that the nJ/sample and pure performance metrics line up so well. It turns out that static power consumption is a major source of waste on microcontrollers; simply by getting a part to run faster, it can usually perform work more efficiently (by performing it more quickly). In the real world, this translates to waking up from deep sleep on an interrupt, doing some quick processing, and then going back to sleep — you’ll almost always win with an MCU that can perform that processing more quickly (of course, this excludes sleep-mode power consumption, which is often the bigger factor to consider!).

The SAM D10 and Kinetis KL03 are the obvious winners, but the Renesas RL78’s hybrid 8/16-bit architecture turned in results that rivaled some of the worse 32-bit MCUs, and the EFM8 is the clear winner among the true 8-bit MCUs — it’s the only 8-bit MCU that can attempt to compete with the 32-bit and 16-bit parts.

In terms of biquad filter performance, the PIC32 delivered the worst results of any 32-bit processor in the review, managing only 829.88 ksps processing speed @ 7.65 mA, which puts it at 30.42 nJ/sample — three times more energy per sample than most of the Cortex-M0 processors (only the Nuvoton M051 had worse power measurements among 32-bit MCUs). The major contributor was the slow 24 MHz clock — the 48 MHz Cortex-M0 parts delivered twice the performance, while using the same (or less!) power.
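The nJ/sample figure is just supply power divided by sample rate. Here’s a sketch of the arithmetic; the function name is mine, and the 3.3 V supply is an assumption on my part (it isn’t stated in this section, but it reproduces the PIC32 figure quoted above).

```c
#include <assert.h>

/* Energy per processed sample in nanojoules: E = V * I / sample_rate. */
double energy_nj_per_sample(double supply_v, double current_ma, double rate_ksps)
{
    assert(rate_ksps > 0.0);
    double power_mw = supply_v * current_ma;   /* volts * milliamps = milliwatts */
    double energy_uj = power_mw / rate_ksps;   /* mW / ksps = microjoules per sample */
    return energy_uj * 1000.0;                 /* microjoules to nanojoules */
}
```

Plugging in 3.3 V, 7.65 mA, and 829.88 ksps gives 25.245 mW / 829.88 ksps, or about 30.42 nJ per sample. The same formula shows why a faster part can win on energy even if it draws the same current: doubling the sample rate halves the result.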

The tinyAVR and PIC16 MCUs delivered a poor showing in the biquad testing. The PIC’s 4x PLL nicely cancels out its 4T architecture — with the 4x PLL enabled, its 8 MHz oscillator produces an 8 MHz instruction clock speed (via a 32 MHz input clock). Unfortunately, this 8 MHz clock speed is relatively slow compared to the 16-72 MHz speeds of the competing 8-bit MCUs. The PIC16 turned in the second-worst numbers in terms of raw speed, and static power consumption overcomes the PIC in the power consumption measurement, too, using more than 500 nJ/sample to process the data.

The AVR’s excessive power consumption ruined the nJ/sample measurements, as well — even though it turned in reasonable performance figures given its clock speed.

The worst MCU on the list was the Holtek HT66, which suffers from its 4T architecture (much like the PIC16). But unlike the PIC16, it has no PLL, and is stuck at a 16 MHz internal oscillator speed that makes it perform similarly to a 4 MHz single-cycle part.
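The 4T math for both parts is simple enough to sketch in a few lines. The function and parameter names here are mine; the numbers are the ones quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Effective instruction rate: oscillator * PLL multiplier / clocks per instruction. */
uint32_t instruction_rate_hz(uint32_t osc_hz, uint32_t pll_mult,
                             uint32_t clocks_per_instr)
{
    assert(pll_mult > 0 && clocks_per_instr > 0);
    /* Widen to 64 bits so the multiply can't overflow before the divide. */
    return (uint32_t)(((uint64_t)osc_hz * pll_mult) / clocks_per_instr);
}
```

For the PIC16, `instruction_rate_hz(8000000, 4, 4)` comes out to 8 MHz: the PLL exactly pays back the 4T penalty. For the HT66, with no PLL, `instruction_rate_hz(16000000, 1, 4)` leaves you with 4 MHz.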

The best 8-bit MCUs on the list were the SiLabs EFM8 and ST’s STM8. The EFM8 does it through sheer clock speed (72 MHz), even though its flash wait-states caused it to consume a high number of cycles (356). The STM8, on the other hand, does it with a much slower 16 MHz clock speed, but with the most efficient 8-bit implementation I saw — it required only 201 clock cycles to complete a filter operation, which is getting in the range of 16-bit and 32-bit MCUs.
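Clock speed and cycles-per-sample trade off directly, which a quick back-of-the-envelope sketch makes obvious. The function name is mine, and the results are ceilings computed from the clock and cycle counts above, not measured throughput figures.

```c
#include <assert.h>
#include <stdint.h>

/* Upper bound on filter throughput: core clock divided by cycles per sample. */
uint32_t samples_per_second(uint32_t clock_hz, uint32_t cycles_per_sample)
{
    assert(cycles_per_sample > 0);
    return clock_hz / cycles_per_sample;   /* integer division rounds down */
}
```

With these numbers, 72 MHz / 356 cycles works out to roughly 202 ksps for the EFM8, while 16 MHz / 201 cycles gives the STM8 roughly 80 ksps: the STM8 needs far fewer cycles, but the EFM8’s clock more than makes up the difference.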

Speaking of instruction cycles, you can throw out the AVR folklore regarding its supposed “instruction efficiency” — while its 372 clock cycles per sample were better than the PIC16, it was beaten handily by the STM8 and EFM8 — even though the EFM8’s instruction efficiency was greatly handicapped by its flash wait-states (since it was running at 72 MHz).

DMX-512 Receiver

This test penalizes beefy MCUs that use a lot of power, and rewards efficient, simple MCUs that can run at slightly-reduced speeds, while staying on all the time. This is where 8-bit MCUs should shine, and my results were really interesting.

When compared to the biquad test, the Holtek HT66 went from being the absolute worst to nearly the best microcontroller in this test — due to its simplistic 4T core, it was able to keep the 4 MHz clock rate (necessary for decoding 250 kbps UART data with a 16x oversampling rate) going while drawing only 570 µA of current. The Renesas RL78 was able to best it by slowing down to 2 MHz (using an 8x oversampling UART), and making use of its extremely efficient architecture to keep the ISR period short enough to process the incoming bytes (no FIFOs were used) — it clocked in at a competition-winning 515.62 µA.
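The clock floor in this test falls straight out of the UART’s oversampling ratio. A tiny sketch, with a function name of my own choosing:

```c
#include <assert.h>
#include <stdint.h>

/* Slowest UART input clock that can still sample each bit `oversample` times. */
uint32_t uart_min_clock_hz(uint32_t baud, uint32_t oversample)
{
    assert(baud > 0 && oversample > 0);
    return baud * oversample;
}
```

DMX-512 runs at 250 kbps, so a 16x oversampling UART needs at least 4 MHz, while an 8x oversampling UART gets by on 2 MHz. That halving is the headroom the RL78 used to undercut the HT66.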

The ARM microcontrollers were really bad at this. Most of them had to run

Hall of Shame

I refuse to declare a “winner” — but I love picking losers. Let’s look at the worst microcontrollers in each category:

Performance

The Holtek HT66 is really slow. Its 4T core is built to sit back and manage the peripherals, and run through state machines — not perform intensive math operations. While it has decent power consumption figures, long-running math operations coupled with static power consumption really ruin its showing in the biquad computational test. The other MCUs with horrible performance were the Atmel tinyAVR and Microchip PIC16 cores.

 

 
