The Amazing $1 Microcontroller

As an embedded design consultant, I have a diverse array of projects on my desk that have wildly different requirements, and end up targeting vastly different microcontroller architectures. At the same time, we all have our go-to chips — those parts that linger in our toolkit after being picked up in school, through forum posts, or from previous projects.

In 2017, we saw several new MCUs hit the market, as well as general trends continuing in the industry: the migration to open-source, cross-platform development environments and toolchains; new code-generator tools that integrate seamlessly (or not so seamlessly…) into IDEs; and, most notably, the continued invasion of ARM Cortex-M0+ parts into the 8-bit space.

I wanted to take a quick pulse of the industry to see where everything is — and what I’ve been missing while backed into my corner of DigiKey’s web site.

It’s time for a good ol’ microcontroller shoot-out.

The Rules

While some projects that come across my desk are complex enough to require a hundreds-of-MHz microcontroller with all the bells and whistles, it’s amazing how many projects work great using nothing more than a $1 chip, so this is the only rule I established for the shoot-out. (To get technical: I purchased several different MCUs, all less than $1, from a wide variety of brands and distributors. I’m sure people will chime in and either claim that a part is more than a dollar, or that I should have used another part which can be had for less than a dollar. I used a price break of 100 units when determining pricing, and I looked at the typical, general suppliers I personally use when shopping for parts. I avoided eBay/AliExpress/Taobao unless they were the only source for the parts, which is common for devices most popular in China and Taiwan.)

I wanted to explore the $1 pricing zone specifically because it’s the least amount of money you can spend on an MCU that’s still general-purpose enough to be widely useful in a diverse array of projects.

Any cheaper, and you end up with 6- or 8-pin parts with only a few dozen bytes of RAM, no ADC, and no peripherals other than a single timer and some GPIO.

Any more expensive, and the field completely opens up to an overwhelming number of parts — all with heavily-specialized peripherals and connectivity options.

These MCUs were selected to represent their entire families — or sub-families, depending on the architecture — and in my analysis, I’ll offer some information about the family as a whole.

If you want to scroll down and find out who the winner is, don’t bother — there’s really no sense in trying to declare the “king of $1 MCUs” as everyone knows the best microcontroller is the one that best matches your application needs. I mean, everyone knows the best microcontroller is the one you already know how to use. No, wait — the best microcontroller is definitely the one that is easiest to prototype with. Or maybe that has the lowest impact on BOM pricing?

I can’t even decide on the criteria for the best microcontroller — let alone crown a winner.

What I will do, however, is offer a ton of different recommendations for different users at the end. Read on!

The Criteria

This will not be another Atmel-vs-Microchip opinion piece. Instead, this review will have both qualitative and quantitative assessments. Overall, I’ll be looking at a few different categories:

Parametrics, Packaging, and Peripherals

Within a particular family, what is the range of core speed? Memory? Peripherals? Price? Package options?

Some microcontroller families are huge — with hundreds of different models that you can select from to find the perfect MCU for your application. Some families are much smaller, which means you’re essentially going to pay for peripherals, memory, or features you don’t need. But these have an economies-of-scale effect; if we only have to produce five different MCU models, we’ll be able to make a lot more of each of them, driving down the price. How do different MCU families end up on that spectrum?

Package availability is another huge factor to consider. A professional electronics engineer working on a wearable consumer product might be looking for a wafer-level CSP package that’s less than 2×2 mm in size. A hobbyist who is uncomfortable with surface-mount soldering may be looking for a legacy DIP package that can be used with breadboards and solder protoboards. Different manufacturers choose packaging options carefully, so before you dive into an architecture for a project, one of the first things to consider is making sure that it’s in a package you actually want to deal with.

Peripherals can vary widely from architecture to architecture. Some MCUs have extremely powerful peripherals with multiple interrupt channels, DMA, internal clock generators, tons of power configuration control, and various clocking options. Others are incredibly simple, almost basic. Just as before, different people will be looking for different things (even for different applications). It would be a massive undertaking to go over every single peripheral on these MCUs, but I’ll focus on the ones that all MCUs have in common, and point out fine-print “gotchas” that datasheets always seem to gloss over.

Development Experience

While this is where things get subjective and opinion-oriented, I’ll attempt to present “just the facts” and let you decide what you care about. The main source of subjectivity comes from weighing these facts appropriately, which I will not attempt to do.

IDEs / SDKs / Compilers: What is the manufacturer-suggested IDE for developing code on the MCU? Are there other options? What compilers does the microcontroller support? Is the software cross-platform? How much does it cost? These are the sorts of things I’ll be exploring while evaluating the software for the MCU architecture.

Platform functionality and features will vary a lot by architecture, but I’ll look at basic project management, source-code editor quality, initialization code-generation tools, run-time peripheral libraries, debugging experience, and documentation accessibility.

I’ll focus on manufacturer-provided or manufacturer-suggested IDEs and compilers (and these will be what I use to benchmark the MCU). There are more than a dozen compilers/IDEs available for many of these architectures, so I can’t reasonably review all of them. Feel free to express your contempt of my methodology in the comments section.

Programmers / debuggers / emulators / dev boards: What dev boards and debuggers are available for the ecosystem? How clunky is the debugging experience? Every company has a slightly different philosophy for development boards and debuggers, so this will be interesting to compare.


I’ve established three different code samples I’ll be using to benchmark the parts; I’ll be measuring quantitative parameters like benchmark speed, clock-cycle efficiency, power efficiency, and code-size efficiency.

While(1) Blink

For this test, I’ll toggle a pin in a while() loop. I’ll use the fastest C code possible, while also reporting whether the code-generator tool or peripheral libraries were able to produce efficient code. I’ll use a bitwise complement or GPIO-specific “toggle” registers if the platform supports them; otherwise, I’ll resort to a read-modify-write operation. I’ll report which instructions were executed, and the number of cycles they took.
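To make the two toggling approaches concrete, here is a minimal, desktop-runnable sketch. It models a GPIO port as a plain struct; the `gpio_port_t` type and its `OUT`/`OUTTGL` fields are hypothetical stand-ins rather than any vendor’s actual header, and on real hardware the struct would be a pointer to the peripheral’s fixed address.

```c
#include <stdint.h>

/* Hypothetical GPIO register block; names are illustrative, not a real vendor header.
 * On silicon this struct would be mapped onto the port's fixed address. */
typedef struct {
    volatile uint32_t OUT;     /* output data register */
    volatile uint32_t OUTTGL;  /* write-1-to-toggle register, if the part has one */
} gpio_port_t;

/* Desktop-only stand-in for what the hardware does when OUTTGL is written:
 * every bit written as 1 flips the matching OUT bit. */
static void sim_apply_toggle(gpio_port_t *port)
{
    port->OUT ^= port->OUTTGL;
    port->OUTTGL = 0u;
}

/* Option 1: dedicated toggle register, a single store per toggle on target. */
static void toggle_with_tgl_register(gpio_port_t *port, uint32_t mask)
{
    port->OUTTGL = mask;
    sim_apply_toggle(port);   /* only needed in this simulation */
}

/* Option 2: read-modify-write fallback, a load, an XOR, and a store. */
static void toggle_with_rmw(gpio_port_t *port, uint32_t mask)
{
    port->OUT ^= mask;
}
```

On target, the benchmark loop would simply be `while (1) { toggle_with_tgl_register(PORT, MASK); }` with a real port pointer and pin mask. The toggle-register path needs only one store per iteration, while the read-modify-write path needs a load, an XOR, and a store, which is why the toggle register is preferred when the platform offers one.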

What this tests: This gives some good intuition for how the platform works. Because it is so low-level, we’ll be able to easily look at the assembly code and individual instruction timing. Since many of these parts run faster than their flash can be read, this test will show what’s going on with the flash read accelerator (if one exists), and it doubles as a general diagnostic/debugging tool for getting the platform up and running at the proper speed. This routine will obviously also test bit-manipulation performance (though this is rarely important in general-purpose projects).

64-Sample Biquad Filter

This is an example of a real-world application where you often need good, real-time performance, so I thought it would be a perfect test to evaluate the raw processing power of each microcontroller. For this test, I’ll process 16-bit signed integer data through a 400 Hz second-order high-pass filter (Transposed Direct Form II implementation, for those playing along at home), assuming a sample rate of 8 kHz. We won’t actually sample the data from an ADC; instead, we’ll process 64-element arrays of dummy input data, and record the time it takes to complete this process (by toggling a GPIO pin connected to my 500 MHz Saleae Logic Pro 16 logic analyzer).
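For those playing along at home, here is a minimal fixed-point sketch of a Transposed Direct Form II biquad that processes 64-sample blocks. It is not the benchmark code used for the measurements. The coefficients are an assumption: they come from the RBJ “Audio EQ Cookbook” high-pass formulas with f0 = 400 Hz, fs = 8 kHz, and Q = 0.7071 (the Q is not stated above), quantized to Q14. The `biquad_t` type and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Q14 coefficients (1.0 == 16384), normalized so a0 = 1. Assumed design:
 * RBJ cookbook high-pass, f0 = 400 Hz, fs = 8 kHz, Q = 0.7071.
 * b1 is exactly -2*b0, so the DC gain is exactly zero. */
#define BQ_SHIFT 14
static const int32_t BQ_B0 = 13117;   /*  0.8006 */
static const int32_t BQ_B1 = -26234;  /* -1.6012 */
static const int32_t BQ_B2 = 13117;   /*  0.8006 */
static const int32_t BQ_A1 = -25576;  /* -1.5610 */
static const int32_t BQ_A2 = 10508;   /*  0.6414 */

typedef struct {
    int32_t s1;   /* state registers, kept at Q14 scale */
    int32_t s2;
} biquad_t;

static void biquad_init(biquad_t *f)
{
    f->s1 = 0;
    f->s2 = 0;
}

static int16_t sat16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Transposed Direct Form II:
 *   y[n] = b0*x[n] + s1
 *   s1   = b1*x[n] - a1*y[n] + s2
 *   s2   = b2*x[n] - a2*y[n]
 * Assumes |x| <= 16384 (6 dB of headroom) so the 32-bit sums cannot overflow,
 * and an arithmetic right shift for negative values (true on common compilers). */
static void biquad_process(biquad_t *f, const int16_t *in, int16_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int32_t x = in[i];
        int32_t acc = BQ_B0 * x + f->s1;
        /* round to nearest, then drop the Q14 fraction */
        int16_t y = sat16((acc + (1 << (BQ_SHIFT - 1))) >> BQ_SHIFT);
        f->s1 = BQ_B1 * x - BQ_A1 * y + f->s2;
        f->s2 = BQ_B2 * x - BQ_A2 * y;
        out[i] = y;
    }
}
```

Only the output is rounded; the states keep full Q14 precision. Every multiply has 16-bit-sized operands and a 32-bit result, which is the sort of math the 8-bit parts will have to grind through in software. With a constant input, the output settles to within a few LSBs of zero rather than exactly zero, a normal rounding effect in fixed-point IIR filters.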

In addition to the samples-per-second measure of raw processing power, we’ll also measure power consumption, which will give us a “nanojoule-per-sample” measure; this will help you figure out how efficient a processor is.
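The energy figure is simple arithmetic: supply voltage times average current gives power, power times the time taken to filter a block gives energy, and dividing by the block size gives energy per sample. A minimal sketch, with a hypothetical function name; the numbers in the usage note are illustrative, not measured results:

```c
#include <stddef.h>

/* Energy per sample in nanojoules:
 *   E_sample = V_supply * I_avg * t_block / n_samples
 * with volts in V, amps in A, and block_seconds in s. */
static double nanojoules_per_sample(double volts, double amps,
                                    double block_seconds, size_t samples)
{
    return volts * amps * block_seconds / (double)samples * 1e9;
}
```

For example, a part drawing 1 mA at 3.3 V that filters a 64-sample block in 64 µs spends 3.3 mW for 1 µs per sample, or 3.3 nJ per sample.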

What this tests: Memory and 16-bit math performance per microamp, essentially. The 8-bit MCUs in our round-up are going to struggle with this pretty hard; it’ll be interesting to see just how much better the 16- and 32-bit MCUs do. Like it or hate it, this will also evaluate a compiler’s optimization abilities, since different compilers implement math routines quite differently.

DMX-512 RGB light

DMX-512 is a commonly used lighting protocol for stage, club, and commercial lighting systems. Electrically, it uses RS-485; the protocol uses a long BREAK at the beginning of each frame, followed by a “0” start code and then up to 512 bytes of channel data, transmitted at 250 kbaud. In this test, I’ll implement a DMX-512 receiver that directly drives a common-anode RGB LED. I will do this with whatever peripheral library or code-generator tool the manufacturer provides (if any at all).

While you should really look for a precisely timed break, since this is only for prototyping, I’ll detect the start of frame by looking for an RX framing error (or a dedicated break-detect flag, on UARTs that support LIN).
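The receive path described above boils down to a small state machine driven from the UART RX interrupt. Here is a minimal, hardware-independent sketch. It assumes a hypothetical ISR that passes each received byte and a framing-error/break flag to `dmx_rx_byte()`; the start address `DMX_RGB_START`, the three-slot RGB layout, and all other names are illustrative assumptions. Because the LED is common-anode, a pin driven low turns its segment on, so the duty cycle written to the PWM is inverted.

```c
#include <stdbool.h>
#include <stdint.h>

#define DMX_RGB_START 1u   /* hypothetical: first DMX slot (1-based) used for red */

typedef enum { DMX_WAIT_BREAK, DMX_WAIT_START_CODE, DMX_IN_DATA } dmx_state_t;

typedef struct {
    dmx_state_t state;
    uint16_t slot;        /* 1-based index of the next data slot */
    uint8_t rgb[3];       /* latest red, green, blue levels */
} dmx_rx_t;

/* Call once per received byte, e.g., from the UART RX interrupt.
 * A framing error (or break flag) is treated as the start of a new frame. */
static void dmx_rx_byte(dmx_rx_t *rx, uint8_t byte, bool framing_error)
{
    if (framing_error) {                  /* BREAK: resynchronize */
        rx->state = DMX_WAIT_START_CODE;
        rx->slot = 1;
        return;
    }
    switch (rx->state) {
    case DMX_WAIT_START_CODE:
        /* Start code 0 means dimmer data; skip any other packet type. */
        rx->state = (byte == 0) ? DMX_IN_DATA : DMX_WAIT_BREAK;
        break;
    case DMX_IN_DATA:
        if (rx->slot >= DMX_RGB_START && rx->slot < DMX_RGB_START + 3u)
            rx->rgb[rx->slot - DMX_RGB_START] = byte;
        if (++rx->slot > 512u)            /* at most 512 data slots per frame */
            rx->state = DMX_WAIT_BREAK;
        break;
    case DMX_WAIT_BREAK:
    default:
        break;                            /* discard until the next break */
    }
}

/* Common-anode LED: the pin sinks current, so full brightness = 0% high time. */
static uint8_t common_anode_duty(uint8_t level)
{
    return (uint8_t)(255u - level);
}
```

Treating any framing error as a break is the prototyping shortcut described above; a production receiver would also check the break length and the mark-after-break before trusting the frame.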

I’ll minimize power consumption by lowering the frequency of the CPU as much as possible, using interrupt-based UART receiver routines, and halting or sleeping the CPU. I’ll report the average power consumption (with the LED removed from the circuit, of course). To get a rough idea of the completeness and quality of the code-generator tools or peripheral libraries, I’ll report the total number of statements I had to write, as well as the flash usage.

What this tests: This is a sort of holistic test that lets me get into the ecosystem and play around with the platform. This stuff is the bread and butter of embedded programming: interrupt-based UART reception with a little bit of flair (framing error detection), multi-channel PWM configuration, and nearly-always-halted state-machine-style CPU programming. Once you have your hardware set up and you know what you’re doing (say, after you’ve implemented this on a dozen MCUs before…), with a good code-gen tool or peripheral library, you should be able to program this with just a few lines of code in an hour or less — hopefully without having to hit the datasheet much at all.

Test notes: I’m using FreeStyler to generate DMX messages through an FTDI USB-to-serial converter (the program uses the Enttec Open DMX plugin to do this). As my FTDI cable is 5V, I put a 1k series resistor and a 3.3V zener diode to ground, which clamps the signal to 3.3V. The zener clamp isn’t there to protect the MCU (all of the chips tested have clamp diodes to protect against over-voltage), but rather so that the MCU doesn’t inadvertently draw power from the Rx pin, which would ruin my current-measurement results.

The Contenders

This page will compare the devices, development tools, and IDEs all together. However, to prevent this article from getting overwhelmingly long, I’ve created review pages for each device that cover way more details about the architecture — along with more complete testing notes, different benchmarking variations, and in-depth assessment. If an architecture strikes your interest, you should definitely check out the full review below.

Atmel (Microchip) tinyAVR 1-Series

Part tested: ATtiny1616
The new-for-2017 tinyAVR line includes seven parts with XMEGA-style peripherals, a two-cycle 8×8 multiplier, the new UPDI one-wire debug interface, and a 20 MHz oscillator that should shoot some energy into this line of entry-level AVR controllers that was looking quite long in the tooth when compared to other 8-bit parts.

Atmel (Microchip) megaAVR

Part tested: ATmega168PB
The AVR earned its hobbyist-friendly badge as the first MCU programmable in C with open-source tools. The “B” version of the classic ATmega168 takes a price cut due to a die-shrink, but little else has changed, including the anemic 8 MHz internal oscillator; and, like the tinyAVR, it needs 5V to hit its full 20 MHz speed.

Atmel (Microchip) SAM D10

Part tested: ATSAMD10D14A
Atmel is positioning their least-expensive ARM Cortex-M0+ offering, the new SAM D10, to kill off all but the smallest tinyAVR MCUs with its performance numbers, peripherals, and price. Standout analog peripherals top off a peripheral set and memory configuration that give this part good value.

Cypress PSoC 4000S

Part Tested: CY8C4024LQI
Reconfigurable digital logic (the PSoC’s claim to fame) is absent from this entry-level 24 MHz ARM part, which also has only average analog features and no support for Cypress’s handy “it just works” capacitive-touch hardware. Other than its unique development environment, this part treads water in a sea of low-cost ARM devices.

Freescale (NXP) Kinetis KE04

Part tested: MKE04Z8VTG4
Freescale introduced the ARM Cortex-M0+ KE04 to kill off 8-bit MCUs, and with 2.7–5.5V support, tons of timers, and decent analog options, it’s a step in that direction. Processor Expert provides a unique development experience for rapid prototyping, which may be enough to lure devs away from new, better-endowed parts.

Freescale (NXP) Kinetis KL03

Part tested: MKL03Z8VFG4
While the KE series attacks the 8-bit sector with bells and whistles, the KL series focuses on being some of the lowest-power ARM parts on the market, with good low-leakage performance in sleep mode. I’m testing this 48 MHz ARM part inside of NXP’s MCUXpresso, which recently added support for the newer Kinetis devices.

Holtek HT-66

Part Tested: HT66F0185
A basic 8-bit microcontroller with a slow, 4-cycle PIC16-style single-accumulator RISC core. An anemic peripheral selection and limited memory capacity make this a better one-trick pony than a main system controller. Holtek has a wide range of application-specific MCUs that integrate this core with H-bridges, special analog, and other goodies.

Infineon XMC1100

Part Tested: XMC1100T016X0016
Infineon ARM chips are common picks for control projects, and the new XMC1100 is no different. Its 16 KB of RAM, a 1 MSPS six-channel ADC, flexible communications, up to 16 timer capture channels, and the ability to form a 64-bit timer for large-range timing give this part a bit of personality among entry-level Cortex-M0 microcontrollers.

Microchip PIC16

Part tested: PIC16LF18325
Vying with the 8051 as the most famous microcontroller of all time, the latest PIC16 Five-Digit Enhanced parts feature improved peripheral interconnectivity, more timers, and better analog. Still driven by a sluggish core that clambers along at one-fourth its clock speed, the PIC16 has always been best-suited for peripheral-heavy workloads.

Microchip PIC24

Part tested: PIC24F04KL100
An expensive 16-bit part that’s designed (and priced) to mirror the MSP430. While it’s got decent performance and power consumption, it’s hard not to look toward other parts — even Microchip’s own PIC32MM — which offer better pricing, and can beat the PIC24 at everything other than deep-sleep current consumption.

Microchip PIC32MM

Part tested: PIC32MM0064
The 32-bit MIPS-powered PIC32MM compares similarly with ARM controllers on a per-cycle basis, but doesn’t provide the same flexibility with tooling that ARM does. It’s a great part for 32-bit beginners, though, as it brings along PIC18/PIC24-style peripherals and fuse-based configuration, making development simpler.

Nuvoton N76

Part tested: N76E003AT20
The N76 is a 1T-style 8051 that brings a few twists and useful additions to the basic set of ’51 peripherals. This MCU has slower instruction timing than the EFM8 or STC8, but it’s hard to complain about a well-documented, fully featured MCU with North American support that you can buy with East Asia pricing.

Nuvoton M0

Part tested: M052LDN
The M0 series is a high-value, 50 MHz Cortex-M0 with excellent timers and comms peripherals, a coherent, easy-to-use function-oriented peripheral library, a relatively high pin count, and utilitarian dev tools. Its Achilles’ heels are the somewhat limited IDE options, buggy software, and gross power consumption figures.

NXP LPC81x

Part Tested: LPC811M001JDH16
The LPC81x is famous among hobbyists for the LPC810 — an 8-pin DIP-package MCU. For everyone else, the LPC81x is an older, forgettable 30 MHz ARM that’s short on peripherals (it doesn’t even have an ADC). An easy-to-use function-oriented peripheral library, serial loader, and plenty of code examples on blog posts keep this part alive.

Renesas RL-78

Part Tested: R5F102A8ASP
With the RL-78, Renesas built a clever hybrid MCU with an 8-bit-wide data path and a 16-bit-wide ALU, balancing cost and performance. Excellent low-power consumption, arrayed comms and timer peripherals, plus a good code-gen tool built into the free Eclipse IDE makes this part a strong competitor against the PIC24 and MSP430.

Sanyo (ON Semiconductor) LC87

Part Tested: LC87F1M16
There’s not much to like in the LC87. Abysmal power consumption, underwhelming peripherals, unfriendly pricing, and an obnoxiously antiquated development ecosystem should steer almost any rational person away from this architecture, which, judging from the copyright dates on the development tools, looks to be headed to the grave.

Silicon Labs EFM8 Laser Bee

Part tested: EFM8LB11
The EFM8 Laser Bee is a snappy 72 MHz 8051 MCU that’s both the fastest 8-bit MCU in our round-up, as well as one of the lowest-power. Low-cost tools, a wonderful free cross-platform IDE, easy-to-program peripherals, and a helpful online community should get hobbyists interested in exploring outside the Atmel/Microchip ecosystem.

ST STM8

Part tested: STM8S103F3P6
The STM8 feels like an ARM in disguise: a 32-bit-wide program memory bus with efficient compute performance, peripheral power gating, and a nested vector interrupt controller makes this thing look more like its STM32 big brothers. If only its STVD development environment felt as modern as its peripheral set does.

ST STM32F0

Part tested: STM32F030F4P6
While the F0 has an average peripheral set and worse-than-average power consumption, its low-cost ST-Link debugger, free IDE, good code-gen tools, and huge parametric latitude (up to the 180 MHz, 2 MB STM32F4) make this a useful family to learn; plus, everyone seems to have an STM32 Discovery board lying around anyway.

STC STC8

Part tested: STC8A8K64S4A12
A brand-new, single-cycle 8051 jam-packed full of flash, RAM, and oodles of peripherals — and a large, 64-pin package to make use of all these guts. Unfortunately, this part isn’t quite ready for prime-time: the datasheet hasn’t been translated into English yet, the errata is massive, and there’s limited availability of the part.

Texas Instruments MSP430FR

Part tested: MSP430FR2111
Texas Instruments dials down the power consumption in the latest iteration of the MSP430. FRAM memory, flexible power states, and tons of internal clocking options make this part a battery’s dream come true. You’ll pay for this power during checkout: the MSP430 tends to be twice as expensive as competing 8-bit designs.

Specs Comparison


Microcontrollers continue to divide into two camps: those with vendor-specific core architectures, and those that use a third-party core design. Out of the 21 microcontrollers reviewed here, eight use a 32-bit Arm core, which is becoming ubiquitous in the industry, even at this price point. Three use an 8-bit 8051-compatible ISA. Of the remaining ten, nine use the vendor’s proprietary core design (six 8-bit parts and three 16-bit parts), and the PIC32MM is the sole 32-bit part that doesn’t use an Arm core; it is built around a MIPS core instead.

Arm Cortex-M0

The Arm Cortex-M0 is a 32-bit RISC core that serves as the entry-level Arm architecture available to silicon vendors for microcontroller applications. (The company was formerly styled “ARM”; as of August 1, 2017, “Arm” is the capitalization style it uses.) Arm cores are designed by Arm Holdings and licensed to semiconductor manufacturers for integration into their products.

Arm started out as a personal-computer microprocessor for Acorn’s computers, and Advanced RISC Machines was formed as a joint venture between Acorn, Apple, and VLSI Technology to develop these 32-bit processors. While Arm cores have grown in popularity as microprocessors for battery-powered systems (they are almost certainly powering your smartphone), Arm moved into the microcontroller sphere as well. The ARM7TDMI-S was probably the first Arm core used in microcontrollers (i.e., processors with completely self-contained RAM, flash, and peripherals), and the Atmel AT91 and ST STR7 were probably the first microcontroller parts designed with an Arm core.

It’s important to understand the history of Arm, because it explains a serious feature of Arm microcontrollers that differs substantially from the 8051 (the other multi-vendor architecture that dominates the field): Unlike the 8051, Arm is just a core, not a complete microcontroller.

The ARM7TDMI-S didn’t come with any GPIO designs, or provisions for UARTs or ADCs or timers; it was designed as a microprocessor. Thus, as vendors started stuffing this core into their extremely high-end MCUs, they had to add their own vendor-specific peripherals to the AHB (AMBA High-performance Bus, where AMBA stands for Advanced Microcontroller Bus Architecture; these multi-level acronyms are getting tedious).

Consequently, Freescale used a lot of HC08 and ColdFire peripherals, while Atmel designed new peripherals from scratch. ST borrowed a bit from the ST7 (the precursor to the STM8), but used new designs for timers and communications peripherals.

Since many microcontroller projects spend 90% or more of the code base manipulating peripherals, this is a serious consideration when switching from one Arm MCU vendor to another: there’s absolutely zero peripheral compatibility between vendors, and even within a single vendor, their Arm parts can have wildly different peripherals.

Unlike other Arm parts, the M0 series only supports a subset of the 16-bit Thumb instruction set, which allows it to be about 1/3 the size of a Cortex-M3 core. Still, there’s a full 32-bit ALU, with a 32-bit hardware multiplier supporting a 32-bit result. Arm provides the option of either a single-cycle multiply, or a 32-cycle multiply instruction, but in my browsing, it seems as though most vendors use the single-cycle multiply option.

In addition to the stack pointer, link register, and program counter, Arm cores have 13 general-purpose working registers (R0 through R12), which is about the sweet spot. The core has a nested vectored interrupt controller (NVIC) with up to 32 interrupt vectors and 4 interrupt priorities: plenty when compared to the 8-bit competition, but a far cry from the 240 interrupts and 256 priority levels that the larger Arm parts support. The core also has full support for runtime exceptions, which isn’t a feature found on 8-bit architectures.

The M0+ is an improved version of the M0 that supports faster two-cycle branches (due to the pipeline going from three-stage to two-stage), and lower power consumption. There are a slew of silicon options that vendors can choose from: single-cycle GPIO, support for a simple instruction trace buffer called Micro Trace Buffer (MTB), vector table relocation, and a rudimentary memory protection unit (MPU).

One of the biggest problems with Arm microcontrollers is their low code density for anything other than 16- and 32-bit math, even on parts that use the 16-bit Thumb instruction set. This means normal microcontroller-type routines (shoving bytes out a communication port, wiggling bits around, performing software ADC conversions, and updating timers) can take a lot of code space on these parts. Exacerbating this problem are the peripherals, which tend to be more complex (I mean “flexible”) than those on 8-bit parts, often necessitating run-time peripheral libraries and tons of register manipulation.

Another problem with Arm processors is the severe interrupt latency: 16 cycles on the Cortex-M0 (15 on the M0+). Part of that time goes to the hardware automatically stacking eight registers, and the compiler may save and restore still more registers in the ISR prologue and epilogue, so these cycles start to add up. ISR latency is one area where a 16 MHz 8-bit part can easily beat a 72 MHz 32-bit Arm microcontroller.

8051 Instruction Timing

(Chart: 8051 instruction timing for the STC8, EFM8, STC15W, and Nuvoton N76.)


The 8051 was originally an Intel microcontroller introduced in 1980 as one of the first widely-deployed 8-bit parts. It was built from the ground up for

Other Cores



The charts below compare precise specs across families.

Atmel tinyAVR 20 MHz
Atmel megaAVR 20 MHz
Atmel SAM D10 48 MHz
Cypress PSoC 4000S 24 MHz
Freescale KE04 48 MHz
Freescale KL03 48 MHz
Holtek HT66 20 MHz
Infineon XMC1100 32 MHz
Microchip PIC16 32 MHz
Microchip PIC24 32 MHz
Microchip PIC32MM 25 MHz
Nuvoton N76 16 MHz
Nuvoton M0 50 MHz
NXP LPC811 30 MHz
Renesas RL78 24 MHz
Sanyo LC87 12 MHz
Silicon Labs EFM8 72 MHz
ST STM8 16 MHz
ST STM32F0 48 MHz
TI MSP430FR 16 MHz

The chart above illustrates the differences in core clock speed among each MCU. As will be seen in the evaluation section, core clock speed is not a good predictor of performance when comparing between different MCU families (especially between 8-, 16-, and 32-bit parts). However, most MCUs limit the maximum peripheral clock rate to that of the CPU, which may be a driving factor if your application requires fast peripheral clocks (say, for fast GPIO bit-banging or for high-speed capture/compare timer operations). The Infineon XMC1100 is a neat exception to this rule — its peripheral clock can run at up to 64 MHz.

There are other important asterisks to this data: the Atmel tinyAVR and megaAVR parts have severely limited operating ranges when running below 5V, which will affect most modern designs. The tinyAVR can only run at 10 MHz below 3.6V, and at 5 MHz below 2.2V. The megaAVR has the same speed grades, but even worse, has nothing faster than an 8 MHz internal oscillator. When talking about sub-$1 MCUs, adding a crystal, or even a low-cost ceramic resonator, adds a sizable fraction of the MCU’s cost to the BOM.

The Silicon Labs EFM8 Laser Bee, with its 72 MHz core clock speed, beats out even the ARM microcontrollers in this round-up. The Sanyo LC87 brings in a 12 MHz reading — but bear in mind this is a 3T architecture, which limits the actual instruction clock speed to 4 MHz. The Holtek HT66 and Microchip PIC16 are both 4T architectures, but the PIC16 has a relatively snappy 32 MHz core speed (thanks to its on-board PLL), which allows it to compete better with 8 MHz parts.
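The effect of an “nT” core on throughput is easy to estimate: divide the core clock by the number of clocks per instruction. This is only a first-order figure (it ignores multi-cycle instructions and flash wait states), and the helper below is a hypothetical illustration using clock figures from the chart above.

```c
#include <stdint.h>

/* First-order instruction rate, in MIPS, for a core that executes one
 * instruction every `clocks_per_instruction` core clocks (integer MHz).
 * Ignores multi-cycle instructions and flash wait states. */
static uint32_t effective_mips(uint32_t core_mhz, uint32_t clocks_per_instruction)
{
    return core_mhz / clocks_per_instruction;
}
```

For example, `effective_mips(12, 3)` gives the LC87’s 4 MIPS, `effective_mips(20, 4)` gives the HT66’s 5 MIPS, and `effective_mips(32, 4)` gives the PIC16’s 8 MIPS, which is why the PLL-boosted PIC16 lines up against 8 MHz single-cycle parts.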

Atmel tinyAVR 16 KB
Atmel megaAVR 16 KB
Atmel SAM D10 16 KB
Cypress PSoC 4000S 16 KB
Freescale KE04 8 KB
Freescale KL03 8 KB
Holtek HT66 8 KB
Infineon XMC1100 14 KB
Microchip PIC16 14 KB
Microchip PIC24 4 KB
Microchip PIC32MM 32 KB
Nuvoton N76 18 KB
Nuvoton M0 12 KB
Renesas RL78 10 KB
Sanyo LC87 16 KB
Silicon Labs EFM8 16 KB
ST STM32F0 16 KB

Here, we consider flash capacity in terms of bytes, but be aware that flash usage varies considerably by core. The PIC16 may have 14 KB of flash, but being a 14-bit-wide core, it will feel more like it has 8 KB of flash when it comes time to actually store and manipulate data and count instructions. The Holtek HT66 is similar to the PIC16, but with a 16-bit-wide fetch. The same goes for the PIC24, which has a 24-bit-wide flash data path. While other cores may have a wider-than-a-byte fetch size, they still allow instructions and data to be packed into flash efficiently.
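To compare these cores on an instruction basis, convert the byte figure into program-memory words using the core’s word width. A hypothetical helper, assuming each word holds one instruction (true for the PIC16; most, but not all, PIC24 instructions occupy a single word):

```c
#include <stdint.h>

/* Program-memory words that fit in `flash_bytes` of storage when each
 * word is `word_bits` wide (e.g., 14 for the PIC16, 24 for the PIC24). */
static uint32_t flash_words(uint32_t flash_bytes, uint32_t word_bits)
{
    return (flash_bytes * 8u) / word_bits;
}
```

The PIC16’s 14 KB (14,336 bytes) works out to 8192 words. Working backward from the PIC24’s 1408-instruction figure gives 1408 × 3 = 4224 bytes, which suggests its “4 KB” label is a rounded number.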

The STC8’s insane equipment list starts to shine — with 64 KB of flash, this part should be able to do essentially anything you throw at its space-efficient 8051 core. On the other side of the spectrum, the ARM processors were especially stingy with flash capacity, which is important to consider for applications that rely on a lot of peripheral runtime libraries: as we’ll see in the performance evaluation, many of these parts had little room left over after the test code was programmed onto them. The Microchip PIC24’s 4 KB flash capacity makes the part essentially unusable for general-purpose applications that would usually target a 16-bit controller, as it can only hold 1408 instructions in flash (due to its 24-bit-wide word fetches).

Atmel tinyAVR 2 KB
Atmel megaAVR 1 KB
Atmel SAM D10 4 KB
Cypress PSoC 4000S 2 KB
Freescale KE04 1 KB
Freescale KL03 2 KB
Holtek HT66 0.25 KB
Infineon XMC1100 16 KB
Microchip PIC16 1 KB
Microchip PIC24 0.5 KB
Microchip PIC32MM 8 KB
Nuvoton N76 1 KB
Nuvoton M0 4 KB
Renesas RL78 0.768 KB
Sanyo LC87 1 KB
Silicon Labs EFM8 1.28 KB

Infineon’s memory configuration is actually “backwards” — it has 8 KB of flash, and 16 KB of RAM. It doesn’t have a good flash pre-fetch engine, so performance-critical code should be moved to RAM for fast execution. While you’re at it, unless you’ve got some large data capturing/analyzing procedures, you could just move your entire program into RAM, too.

The KE04 (and probably the PIC24) have too little RAM for most projects that would target these architectures. I’m actually less worried about the Holtek HT66; its efficient peripheral library uses essentially no RAM, and 256 bytes of user data is plenty of space for an ultra-low-power MCU that’s designed for only the most basic duties.

Atmel tinyAVR 28 Points
Atmel megaAVR 17 Points
Atmel SAM D10 51 Points
Cypress PSoC 4000S 37 Points
Freescale KE04 50 Points
Freescale KL03 26 Points
Holtek HT66 12 Points
Infineon XMC1100 48 Points
Microchip PIC16 33 Points
Microchip PIC24 17 Points
Microchip PIC32MM 44 Points
Nuvoton N76 31 Points
Nuvoton M0 62 Points
NXP LPC811 45 Points
Renesas RL78 54 Points
Sanyo LC87 22 Points
Silicon Labs EFM8 42 Points
ST STM8 43 Points
ST STM32F0 40 Points
STC STC8 51 Points
TI MSP430FR 13 Points

Each MCU’s review page discusses its complement of timers, but I wanted to come up with a score to help quickly compare timer peripherals across MCUs — I call it TimerMark.

Here’s how it works:

Basic Timer Blocks

  • 1 point for 8-bit counters
  • 2 points for 8-bit auto-reload (period) timers
  • 2 points for 16-bit counters
  • 4 points for 16-bit auto-reload timers
  • 6 points for 24-bit auto-reload timers
  • 8 points for 32-bit auto-reload timers
  • 2 points for RTC
  • 2 points for each 16-bit timer set that can be extended to 32-bit

Capture Channels & PWM

  • 1 point for each capture channel (regardless of depth)
  • 1 point for each 8-bit PWM channel
  • 2 points for each 16-bit PWM channel
  • 3 points for each 24-bit PWM channel
  • 3 points for each arbitrary-phase PWM channel (i.e., if the channel has separate ON and OFF comparator registers)
  • 2 points if the PWM module is “power friendly” (multi-channel outputs with programmable dead-time, blanking, etc.)
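The TimerMark rules translate directly into a weighted sum. A minimal sketch, assuming a hypothetical `timer_inventory_t` struct that counts each kind of block or channel listed above. Two interpretations are assumptions: the RTC bonus is counted per RTC, and the arbitrary-phase points are added on top of a channel’s resolution points. The example MCU in the usage note is made up, not one of the contenders.

```c
#include <stdbool.h>

/* Counts of each timer resource, per the TimerMark rules. */
typedef struct {
    int counters8;            /* 8-bit counters                         -> 1 pt each */
    int reload8;              /* 8-bit auto-reload timers               -> 2 pts */
    int counters16;           /* 16-bit counters                        -> 2 pts */
    int reload16;             /* 16-bit auto-reload timers              -> 4 pts */
    int reload24;             /* 24-bit auto-reload timers              -> 6 pts */
    int reload32;             /* 32-bit auto-reload timers              -> 8 pts */
    int rtcs;                 /* RTCs (usually 0 or 1)                  -> 2 pts */
    int chainable16;          /* 16-bit timer sets extendable to 32-bit -> 2 pts */
    int capture_channels;     /* capture channels, any depth            -> 1 pt  */
    int pwm8;                 /* 8-bit PWM channels                     -> 1 pt  */
    int pwm16;                /* 16-bit PWM channels                    -> 2 pts */
    int pwm24;                /* 24-bit PWM channels                    -> 3 pts */
    int pwm_arbitrary_phase;  /* channels with separate ON/OFF compares -> 3 pts */
    bool power_friendly_pwm;  /* dead-time, blanking, etc.              -> 2 pts once */
} timer_inventory_t;

static int timermark(const timer_inventory_t *t)
{
    int score = 0;
    /* basic timer blocks */
    score += 1 * t->counters8 + 2 * t->reload8;
    score += 2 * t->counters16 + 4 * t->reload16;
    score += 6 * t->reload24 + 8 * t->reload32;
    score += 2 * t->rtcs + 2 * t->chainable16;
    /* capture channels and PWM */
    score += 1 * t->capture_channels;
    score += 1 * t->pwm8 + 2 * t->pwm16 + 3 * t->pwm24;
    score += 3 * t->pwm_arbitrary_phase;
    if (t->power_friendly_pwm)
        score += 2;
    return score;
}
```

For a made-up part with two 16-bit auto-reload timers, an RTC, four capture channels, and four 16-bit PWM channels, the sum is 8 + 2 + 4 + 8 = 22 points, or 24 if its PWM module also counts as power friendly.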

MCU                   Channels
Atmel tinyAVR         9
Atmel megaAVR         6
Atmel SAM D10         8
Cypress PSoC 4000S    5
Freescale KE04        8
Freescale KL03        4
Holtek HT66           3
Infineon XMC1100      4
Microchip PIC16       7
Microchip PIC24       2
Microchip PIC32MM     3
Nuvoton N76           6
Nuvoton M0            8
NXP LPC811            4
Renesas RL78          7
Sanyo LC87            4
Silicon Labs EFM8     6
ST STM8               9
ST STM32F0            6
STC STC8              12
TI MSP430FR           2

MCU                   Resources
Atmel tinyAVR         3
Atmel megaAVR         4
Atmel SAM D10         3
Cypress PSoC 4000S    2
Freescale KE04        3
Freescale KL03        3
Holtek HT66           2
Infineon XMC1100      2
Microchip PIC16       3
Microchip PIC24       2
Microchip PIC32MM     4
Nuvoton N76           4
Nuvoton M0            6
NXP LPC811            4
Renesas RL78          4
Sanyo LC87            3
Silicon Labs EFM8     4
ST STM8               3
ST STM32F0            3
STC STC8              6
TI MSP430FR           1

The chart above illustrates the total number of communications resources each microcontroller has.

Most vendors use a separate, dedicated module for UART, SPI, and I2C communication. Nuvoton’s M0 went a step further and doubled up everything: two UARTs, two SPI modules, and two I2C modules. Except on heavily loaded buses or in unusual designs, there’s rarely a need for multiple SPI or I2C buses, so this decision struck me as a bit odd.

STC’s decision to incorporate four separate UARTs in addition to an SPI and an I2C module seems more useful, since UARTs are almost always dedicated to a particular peripheral. Doubling up on UARTs was a popular decision overall: the other 8051s (the EFM8 and the N76) bring a second UART, as do the PIC16, the PIC32MM, and the LPC811.

I also like the flexibility that Atmel, Cypress, Infineon, and Renesas offer: they use arrayed “serial units,” each of which can morph into a UART, SPI, or I2C peripheral. Cypress and Infineon have two; Atmel and Renesas have three (plus a fourth, I2C-slave-only interface on the Renesas part).

The odd ones out were the PIC32MM, the LC87, and the MSP430: none of these MCUs has an I2C peripheral, a curious omission in an era of SMBus-compliant digital sensors everywhere. And unlike the 8051’s quasi-bidirectional GPIO configuration, none of these MCUs is well-suited to bit-banging I2C (though it’s obviously possible by switching data direction).
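Bit-banging I2C on a push-pull GPIO port comes down to emulating an open-drain output. You preload the output latch with 0 and drive the line low by making the pin an output. To “release” the line, you switch the pin back to an input and let the external pull-up take it high. The sketch below shows that idea with simulated direction and output registers so it runs on a PC, and all names in it are hypothetical. It omits ACK checking, stop and repeated-start conditions, and real timing.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simulated port bits (hypothetical names). On a real MCU these would be
   the data-direction and output-latch bits for the SDA and SCL pins.
   A direction bit of 1 means "driven as an output". */
static uint8_t sda_dir_out, sda_latch;
static uint8_t scl_dir_out, scl_latch;

/* With external pull-ups, a released (input) line reads high; it reads low
   only while the pin is an output driving 0. */
static bool sda_level(void) { return !(sda_dir_out && sda_latch == 0); }
static bool scl_level(void) { return !(scl_dir_out && scl_latch == 0); }

/* Open-drain emulation: keep the latch at 0 and only change direction.
   The pin is never actively driven high. */
static void sda_low(void)     { sda_latch = 0; sda_dir_out = 1; }
static void sda_release(void) { sda_dir_out = 0; }
static void scl_low(void)     { scl_latch = 0; scl_dir_out = 1; }
static void scl_release(void) { scl_dir_out = 0; }

/* Placeholder: on hardware, wait about half a bit period (~5 us at 100 kHz). */
static void half_bit_delay(void) { }

/* Test instrumentation: SDA level seen during each SCL-high phase. */
static uint8_t sampled_bits[16];
static unsigned n_sampled;

void i2c_start(void)
{
    sda_release();
    scl_release();
    half_bit_delay();
    sda_low();              /* SDA falls while SCL is high: START condition */
    half_bit_delay();
    scl_low();
}

void i2c_write_byte(uint8_t byte)
{
    for (int bit = 7; bit >= 0; --bit) {    /* MSB first */
        if (byte & (1u << bit))
            sda_release();
        else
            sda_low();
        half_bit_delay();
        scl_release();
        while (!scl_level()) {
            /* slave is stretching the clock; no timeout in this sketch */
        }
        if (n_sampled < sizeof sampled_bits)
            sampled_bits[n_sampled++] = sda_level();   /* receiver samples here */
        half_bit_delay();
        scl_low();
    }
    sda_release();          /* hand SDA to the slave for its ACK bit (not read here) */
}
```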

Peripheral Highlights

All of these

The 1-series appears to use the full megaAVR core, which includes a hardware multiplier. Interestingly, while it is considerably cheaper than the ATmega168PB, in many ways it’s a better microcontroller: it has a two-level interrupt controller, the same core, and twice as much RAM.

Development Ecosystem
The development ecosystem of a microcontroller has a profound impact on productivity and ease of use, and the IDEs, peripheral libraries, dev boards, and debuggers varied wildly among the parts reviewed here.

Development Environments

Which of these #ifdefs are enabled? Your guess is as good as mine; Atmel Studio is Visual Studio without Microsoft’s excellent IntelliSense engine, making it worse than even Keil µVision in terms of text-editing productivity — and far inferior to the Eclipse- and NetBeans-based IDEs from competitors. I added 6 publicly-visible global variables in this file among others in the project, and none of them appear in the auto-complete list.

Atmel Studio

This error started popping up recently in Atmel Studio, and the only solution seems to be restarting my computer. It’s obviously from an old chunk of code, since it refers to the program as “AVR Studio.”

While many vendors have transitioned to Eclipse-based IDEs, Atmel went with a Visual Studio Isolated Shell-based platform starting with AVR Studio 5. I do a ton of .NET and desktop C++ development, so I expected to feel right at home in Atmel Studio when I first launched it. Unfortunately, Microsoft calls this product “Visual Studio Isolated Shell” for a reason: it’s simply the shell of Visual Studio, without any of the meat. The excellent IntelliSense engine that Microsoft spent years perfecting has been replaced by some sort of Atmel-proprietary “Visual Assist” technology that struggles to identify global variables, evaluate preprocessor definitions, or refactor items defined outside the current file. The toolchain editor is a near-clone of the Eclipse CDT one (no reason to reinvent the wheel), but it’s missing checkboxes and inputs for commonly used compiler and linker options. One stunning omission is link-time optimization, which doesn’t seem to work even when specified manually as a command-line option. That’s odd, since Atmel is using a recent version of GCC.

My biggest issue with Atmel Studio is how incredibly buggy and unstable it has been every time I’ve used it in the last two years. I’m not referring to a specific installation on a specific computer: rather, every single time I’ve installed the software, I’ve fought with AVR Dragon drivers, a bad DLL file in the installer, programmer firmware issues, or, most recently, the software popping up the “Waiting for an operation to complete” message that prevents me from debugging any Atmel product without restarting my computer. Look, I get it: embedded firmware development is a highly specialized task, so maintaining software that works reliably for such a small user base can be challenging. Yet every other vendor’s tools that I tested worked nearly flawlessly.


Bit Toggling

MCU                   Cycles
Atmel tinyAVR         4
Atmel megaAVR         3
Atmel SAM D10         3
Cypress PSoC 4000S    11
Freescale KE04        3
Freescale KL03        4
Holtek HT66           16
Infineon XMC1100      5
Microchip PIC16       20
Microchip PIC24       10
Microchip PIC32MM     3
Nuvoton N76           7
Nuvoton M0            8
NXP LPC811            4
Renesas RL78          5
Sanyo LC87            18
Silicon Labs EFM8     8
ST STM8               4
ST STM32F0            9
STC STC8              4
TI MSP430FR           7

MCU                   kHz       mA      nJ/sample
Atmel tinyAVR         53.73     3.34    69.01
Atmel megaAVR         123.27    3.11    83.26
Atmel SAM D10         1822.32   5.09    9.22
Cypress PSoC 4000S    885.69    3.66    13.64
Freescale KE04        1715.36   14.31   27.53
Freescale KL03        1645.24   5.21    10.45
Holtek HT66           2.71      1.89    2304.95
Infineon XMC1100      805.84    4.24    17.36
Microchip PIC16       21.83     3.61    545.76
Microchip PIC24       838.46    10.74   42.27
Microchip PIC32MM     829.88    7.65    30.42
Nuvoton N76           38.79     3.64    309.68
Nuvoton M0            1732.07   19.16   36.50
NXP LPC811            157.17    4.88    102.46
Renesas RL78          731.68    3.79    17.09
Sanyo LC87            23.55     6.51    912.36
Silicon Labs EFM8     202.40    14.15   176.20
ST STM8               79.49     6.45    267.76
ST STM32F0            1647.79   12.15   24.33
STC STC8              156.56    11.87   192.24
TI MSP430FR           129.95    2.41    61.20
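The nJ/Sample figures appear to follow from the mA and kHz figures as power divided by rate: E = I × V / f, reading the kHz figure as samples per second. This assumes a 3.3 V supply, a voltage not stated in this section but one that reproduces many rows, such as the megaAVR and SAM D10. The unit bookkeeping is easy to get wrong, so here it is as a tiny helper:

```c
/* Energy per sample in nanojoules from supply current (mA), sample rate
   (kHz) and supply voltage (V): E = I * V / f.
   Converting mA -> A (1e-3), kHz -> Hz (1e3) and J -> nJ (1e9) collapses
   to a single factor of 1e3. */
double energy_per_sample_nj(double current_ma, double rate_khz, double supply_v)
{
    return current_ma * supply_v / rate_khz * 1e3;
}
```

For example, the megaAVR’s 3.11 mA at 123.27 kHz and 3.3 V works out to about 83.26 nJ per sample. Not every row reproduces at 3.3 V (the tinyAVR, EFM8, and STC8 rows don’t), so treat the voltage as an assumption rather than the documented test condition.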

Keil C51 struggled to generate good code in the biquad experiment, the biggest problem being the 16-bit signed multiplication. Rather than emitting inline instructions that operate on whichever registers hold these variables, Keil generates calls into a signed 16-bit multiply library routine. This has drastic performance implications compared to the much better AVR-GCC code. The STC8, with 82% of its CISC instructions executing in a single cycle, should have no problem outperforming the AVR MCUs (which have much simpler RISC instructions). Instead, it requires more than twice as many clock cycles as the AVR: 153 versus 63.
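For context, a fixed-point biquad inner loop looks roughly like the sketch below. It is a generic Direct Form I filter in Q14 format, not the benchmark code used in this experiment, and the struct layout and names are assumptions. Each `(int32_t)a * b` is a 16 × 16 → 32-bit signed multiply, the operation the paragraph above says Keil C51 turns into a library call.

```c
#include <stdint.h>

#define Q 14   /* Q14 fixed point: 1.0 == 16384 (assumed format) */

/* Direct Form I biquad state; coefficients in Q14.
   Convention: y = b0*x + b1*x1 + b2*x2 - a1*y1 - a2*y2 */
typedef struct {
    int16_t b0, b1, b2;
    int16_t a1, a2;
    int16_t x1, x2;   /* previous inputs */
    int16_t y1, y2;   /* previous outputs */
} biquad_q14;

int16_t biquad_step(biquad_q14 *f, int16_t x)
{
    /* Five widened 16x16 -> 32-bit signed multiplies. Assumes coefficients
       and inputs leave enough headroom that the 32-bit sum cannot overflow. */
    int32_t acc = (int32_t)f->b0 * x
                + (int32_t)f->b1 * f->x1
                + (int32_t)f->b2 * f->x2
                - (int32_t)f->a1 * f->y1
                - (int32_t)f->a2 * f->y2;

    /* Right-shifting a negative value is implementation-defined in C;
       GCC performs an arithmetic shift. */
    acc >>= Q;

    /* Saturate back to 16 bits. */
    if (acc > INT16_MAX) acc = INT16_MAX;
    if (acc < INT16_MIN) acc = INT16_MIN;

    int16_t y = (int16_t)acc;
    f->x2 = f->x1;  f->x1 = x;
    f->y2 = f->y1;  f->y1 = y;
    return y;
}
```

The accumulator has no guard bits, so a production filter would pre-scale the input or use a wider accumulator.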

DMX-512 Receiver

MCU                   µA
Atmel tinyAVR         1430
Atmel megaAVR         1270
Atmel SAM D10         3410
Cypress PSoC 4000S    1030
Freescale KE04        3340
Freescale KL03        1340
Holtek HT66           568
Infineon XMC1100      1390
Microchip PIC16       744
Microchip PIC24       667
Microchip PIC32MM     493
Nuvoton N76           1750
Nuvoton M0            1970
NXP LPC811            1970
Renesas RL78          516
Sanyo LC87            6420
Silicon Labs EFM8     607
ST STM8               1620
ST STM32F0            769
STC STC8              5470
TI MSP430FR           279

MCU                   Cycles
Atmel tinyAVR         33
Atmel megaAVR         34
Atmel SAM D10         391
Cypress PSoC 4000S    35
Freescale KE04        0
Freescale KL03        0
Holtek HT66           73
Infineon XMC1100      16
Microchip PIC16       35
Microchip PIC24       22
Microchip PIC32MM     62
Nuvoton N76           0
Nuvoton M0            35
NXP LPC811            45
Renesas RL78          25
Sanyo LC87            28
Silicon Labs EFM8     26
ST STM8               0
ST STM32F0            24
STC STC8              6
TI MSP430FR           16

MCU                   kHz
Atmel tinyAVR         36
Atmel megaAVR         33
Atmel SAM D10         86
Cypress PSoC 4000S    89
Freescale KE04        0
Freescale KL03        0
Holtek HT66           166
Infineon XMC1100      56
Microchip PIC16       35
Microchip PIC24       22
Microchip PIC32MM     29
Nuvoton N76           0
Nuvoton M0            54
NXP LPC811            115
Renesas RL78          15
Sanyo LC87            131
Silicon Labs EFM8     59
ST STM8               0
ST STM32F0            58
STC STC8              15
TI MSP430FR           56

MCU                   Cycles
Atmel tinyAVR         69
Atmel megaAVR         67
Atmel SAM D10         477
Cypress PSoC 4000S    124
Freescale KE04        0
Freescale KL03        0
Holtek HT66           239
Infineon XMC1100      72
Microchip PIC16       205
Microchip PIC24       53
Microchip PIC32MM     91
Nuvoton N76           0
Nuvoton M0            89
NXP LPC811            159
Renesas RL78          40
Sanyo LC87            159
Silicon Labs EFM8     85
ST STM8               0
ST STM32F0            83
STC STC8              21
TI MSP430FR           73

MCU                   Bytes
Atmel tinyAVR         962
Atmel megaAVR         700
Atmel SAM D10         3720
Cypress PSoC 4000S    2280
Freescale KE04        2788
Freescale KL03        5028
Holtek HT66           226
Infineon XMC1100      4868
Microchip PIC16       372
Microchip PIC24       618
Microchip PIC32MM     3904
Nuvoton N76           0
Nuvoton M0            5876
NXP LPC811            2828
Renesas RL78          1118
Sanyo LC87            0
Silicon Labs EFM8     534
ST STM8               1903
ST STM32F0            3544
STC STC8              376
TI MSP430FR           1076

MCU                   Percent
Atmel tinyAVR         6%
Atmel megaAVR         4%
Atmel SAM D10         23%
Cypress PSoC 4000S    14%
Freescale KE04        35%
Freescale KL03        63%
Holtek HT66           3%
Infineon XMC1100      61%
Microchip PIC16       3%
Microchip PIC24       15%
Microchip PIC32MM     12%
Nuvoton N76           0%
Nuvoton M0            49%
NXP LPC811            35%
Renesas RL78          11%
Sanyo LC87            0%
Silicon Labs EFM8     3%
ST STM8               6%
ST STM32F0            22%
TI MSP430FR           29%

The DMX-512 Receiver test is useful for evaluating many different aspects of an MCU: its core, its peripherals, and its vendor’s code-generator tools and peripheral libraries.
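DMX-512 is sent at 250 kbit/s with 8 data bits and 2 stop bits. Each packet begins with a break (the line held low for at least 88 µs), followed by a start code and up to 512 slot bytes. One common receiver approach, sketched below, detects the break as a UART framing error on a 0x00 byte. This is not necessarily how the firmware in this test was written, and the function and type names are hypothetical. Some UARTs have a dedicated break-detect flag that could replace the framing-error check.

```c
#include <stdbool.h>
#include <stdint.h>

#define DMX_MAX_SLOTS 512

typedef enum { DMX_WAIT_BREAK, DMX_WAIT_START_CODE, DMX_RECEIVING } dmx_state_t;

typedef struct {
    dmx_state_t state;
    uint16_t    slot;                  /* index of the next slot to fill */
    uint8_t     data[DMX_MAX_SLOTS];   /* slot 1 is stored in data[0] */
} dmx_rx_t;

/* Call from the UART RX interrupt with the received byte and whether the
   UART flagged a framing error (a break reads as 0x00 with a framing error). */
void dmx_rx_byte(dmx_rx_t *rx, uint8_t byte, bool framing_error)
{
    if (framing_error) {
        /* A break starts a new packet; any other framing error is noise. */
        rx->state = (byte == 0x00) ? DMX_WAIT_START_CODE : DMX_WAIT_BREAK;
        rx->slot = 0;
        return;
    }
    switch (rx->state) {
    case DMX_WAIT_BREAK:
        break;                         /* ignore data until we sync on a break */
    case DMX_WAIT_START_CODE:
        /* Start code 0x00 carries dimmer data; other start codes are skipped. */
        rx->state = (byte == 0x00) ? DMX_RECEIVING : DMX_WAIT_BREAK;
        break;
    case DMX_RECEIVING:
        rx->data[rx->slot++] = byte;
        if (rx->slot >= DMX_MAX_SLOTS)
            rx->state = DMX_WAIT_BREAK;
        break;
    }
}
```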

Code generation tools

Many of the development environments tested have code-gen tools either integrated directly into the IDE (MPLAB X, Simplicity Studio, DAVE, PSoC Creator, Kinetis Design Studio), or have stand-alone tools (STM32CubeMX, Atmel START, STC-ISP).

Processor Expert uses a component-oriented model with dependency resolution; for example, two high-level “PWM” component instances will share a common “Timer Unit” low-level component (as they will end up on different output-compare channels of that timer unit).

High-level components implement conceptual functionality, not peripheral functions. For example, if you wanted a function to execute every 25 ms, you would add a “TimerInt” component and set the interval to 25 ms. Processor Expert will figure out which timer to use (FlexTimer, LPTimer, PIT, etc.), route the clock appropriately, calculate the necessary period register values, enable interrupts, and generate an interrupt handler that takes care of any bits you need to set or clear.
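Calculating the period register values is simple arithmetic that a code generator can do at configuration time. Here is a sketch that assumes a hypothetical 16-bit timer with power-of-two prescalers from 1 to 128 and a reload value of (count − 1). It is not Processor Expert’s actual algorithm.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint16_t prescaler;   /* clock divider: 1, 2, 4, ..., 128 */
    uint16_t reload;      /* period register value: counts - 1 */
} timer_cfg;

/* Pick the smallest prescaler whose tick count fits a 16-bit period
   register. Truncates toward zero; returns false if no setting fits. */
bool timer_calc(uint32_t clock_hz, uint32_t period_us, timer_cfg *cfg)
{
    uint64_t ticks = (uint64_t)clock_hz * period_us / 1000000u;

    for (uint32_t ps = 1; ps <= 128; ps <<= 1) {
        uint64_t count = ticks / ps;
        if (count >= 1 && count <= 65536u) {
            cfg->prescaler = (uint16_t)ps;
            cfg->reload = (uint16_t)(count - 1);
            return true;
        }
    }
    return false;
}
```

With a 48 MHz timer clock, a 25 ms period needs 1,200,000 ticks. The smallest prescaler that fits in 16 bits is 32, giving a reload value of 37,499.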

If you understand the general use of microcontroller peripherals, that’s essentially all you need to know to program a microcontroller if you’re in Processor Expert. The AsyncSerial (UART) component lets you specify an RX and TX buffer size; Processor Expert generates callbacks alerting you to a character being received, as well as when the buffer is full. You can simply copy the data out of the buffer without thinking for a moment about testing and clearing interrupt flags, configuring UART FIFOs, or fussing with DMA.
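A component like AsyncSerial typically wraps a receive ring buffer that is filled from the UART interrupt. The sketch below shows that general pattern with per-character and buffer-full callbacks. The names, the buffer size, and the drop-on-overflow policy are illustrative assumptions, not the code Processor Expert generates.

```c
#include <stddef.h>
#include <stdint.h>

#define RX_BUF_SIZE 8u   /* hypothetical; must divide 256 so uint8_t wrap stays consistent */

typedef struct {
    volatile uint8_t head;     /* written only by the ISR */
    volatile uint8_t tail;     /* written only by the application */
    volatile uint8_t buf[RX_BUF_SIZE];
    void (*on_char)(void);     /* optional: a character was received */
    void (*on_full)(void);     /* optional: the buffer just became full */
} uart_rx_t;

static inline uint8_t rx_count(const uart_rx_t *rx)
{
    return (uint8_t)(rx->head - rx->tail);   /* modulo-256 difference */
}

/* Called from the UART RX interrupt with the byte read from the data register. */
void uart_rx_isr(uart_rx_t *rx, uint8_t byte)
{
    if (rx_count(rx) >= RX_BUF_SIZE)
        return;                               /* full: drop the byte */
    rx->buf[rx->head % RX_BUF_SIZE] = byte;
    rx->head++;
    if (rx->on_char)
        rx->on_char();
    if (rx_count(rx) == RX_BUF_SIZE && rx->on_full)
        rx->on_full();
}

/* Application side: copy out up to len bytes; returns how many were copied. */
size_t uart_rx_read(uart_rx_t *rx, uint8_t *dst, size_t len)
{
    size_t n = 0;
    while (n < len && rx_count(rx) > 0) {
        dst[n++] = rx->buf[rx->tail % RX_BUF_SIZE];
        rx->tail++;
    }
    return n;
}
```

Because each index is written by only one side (the ISR or the application) and byte stores are atomic, this single-producer, single-consumer buffer needs no interrupt masking on a single-core MCU.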

Unlike some code-gen tools, like Silicon Labs’ Simplicity Configurator, Processor Expert generates initialization/interrupt callbacks as well as runtime libraries. Unlike code-gen tools like Infineon DAVE CE, which generates code that calls into standard runtime peripheral libraries, Processor Expert generates specific API calls on request, with any initialization values pre-calculated as constants. As an example, while some code-gen tools will generate a UART module that calls a standard runtime initialization routine with a human-readable baud rate, Processor Expert will generate code to directly write the correct baud rate generator values to the appropriate registers.
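You can see the difference between the two code-gen styles in how a baud-rate divisor is produced. The sketch assumes a hypothetical UART that oversamples by 16, with divisor = clock / (16 × baud) rounded to the nearest integer, and an assumed 48 MHz peripheral clock.

```c
#include <stdint.h>

/* Hypothetical 16x-oversampling UART: divisor = clock / (16 * baud), rounded. */
#define UART_DIVISOR(clk_hz, baud) \
    (((clk_hz) + 8u * (baud)) / (16u * (baud)))

#define UART0_CLOCK_HZ 48000000u   /* assumed peripheral clock */

/* Pre-calculated style: the compiler folds this to a constant, so the init
   code only has to store it in the baud-rate register. */
static const uint16_t uart0_divisor =
    (uint16_t)UART_DIVISOR(UART0_CLOCK_HZ, 115200u);

_Static_assert(UART_DIVISOR(UART0_CLOCK_HZ, 115200u) == 26u, "unexpected divisor");

/* Runtime-library style: a human-readable baud rate is converted on the
   target each time init runs, costing a 32-bit division. */
uint16_t uart_divisor_runtime(uint32_t clk_hz, uint32_t baud)
{
    return (uint16_t)((clk_hz + 8u * baud) / (16u * baud));
}
```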

A lot of code-gen tools suffer from a loss of generality that makes them work well in toy example cases, but become useless when deployed in real application scenarios — especially for projects that have low-power requirements, or have dynamic pin-muxing in use. However, Processor Expert supports essentially anything imaginable — multiple system configurations allow you to shift between different clock and run modes, and components can be explicitly configured to share pins with other components. If worse comes to worst, and you’re stuck with a component that needs to behave slightly differently, you can always disable re-generation at the individual component granularity — allowing you to modify a component to your liking.

While this all makes Processor Expert sound extremely fast and optimized, performance is actually somewhat mixed. On large processors with DMA, auto-scanning ADCs, FIFOs, and other advanced features, Processor Expert will transparently use these, which can provide a huge performance boost over runtime peripheral libraries that don’t always expose complex functionality like this to the end user.

Unfortunately, some of the generated code simply makes you shake your head: it takes 40 cycles to toggle a GPIO pin, since the high-level “Bit” component calls into a low-level GPIO component, passing it a configuration structure and other unused parameters. Nothing is inlined, and even compiling with -O3 won’t eliminate these function calls. Of course, for performance-critical GPIO calls, it’s easy enough to use direct register writes, but for most use cases, it’s hard to replace PE code with optimized application-specific routines, especially when interrupts are involved.
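On parts with a dedicated toggle register, a direct toggle is a single store. The sketch models a Kinetis-style GPIO port (PDOR, PSOR, PCOR, PTOR, PDIR, PDDR). On target, you would use the port struct and base-address macros from the vendor’s device header rather than this local copy. On a PC, the struct is plain memory, so the test only confirms which register receives the write.

```c
#include <stdint.h>

/* Local model of a Kinetis-style GPIO register block (for illustration). */
typedef struct {
    volatile uint32_t PDOR;   /* data output */
    volatile uint32_t PSOR;   /* set output: writing 1s sets those bits */
    volatile uint32_t PCOR;   /* clear output: writing 1s clears those bits */
    volatile uint32_t PTOR;   /* toggle output: writing 1s toggles those bits */
    volatile uint32_t PDIR;   /* data input */
    volatile uint32_t PDDR;   /* data direction */
} gpio_port_t;

/* One store, no configuration structs, and no call overhead once inlined. */
static inline void pin_toggle(gpio_port_t *port, unsigned pin)
{
    port->PTOR = 1u << pin;
}
```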

It’s tough to compare code size, but Processor Expert does seem to generate more efficient code than using runtime peripheral libraries (even when link-time optimization is enabled in the latter cases). In my testing, Processor Expert generated peripheral code that was close to half the size of the runtime peripheral library code for other MCUs (including Kinetis SDK on the KL03). Only PSoC Creator seems to generate more optimized code.

The 500-lb gorilla in the room, however, is this: if you’re new to Processor Expert, I think the first thing you’ll notice is how incredibly slow it is to use. I don’t mean “complicated” or “intricate” — I mean that even on my 4.5 GHz 12-thread desktop, creating instances of components, switching views, changing values, generating code, and building projects takes forever. The entire system is single-threaded, and every time a property is changed, everything has to be re-evaluated. I’m not sure it could be made faster — because of how flexible Processor Expert is, almost everything has a huge dependency graph; and because almost everything is automated, the whole system has to solve for the proper register values from a near-infinite possible selection.

Having said all that, in my testing, it was “fast enough” to not be completely frustrating to use, and it’s one of the only development environments I tested where I was able to literally complete an entire project without even glancing at a datasheet for the microcontroller. That would be impressive enough on an 8-bitter, but on a modern Cortex-M0+ ARM microcontroller, with complex (“flexible”) peripherals, it’s downright incredible. It is, by far, the most complete, flexible code-generator tool I’ve ever used, too.

So who is it for? Honestly, peripheral configuration and bring-up is generally a drop in the bucket compared to the time required to implement an entire commercial embedded project. But if you’re working on tiny projects (either hobbyist stuff or simple proof-of-concept engineering demos), having a tool like Processor Expert around can get things working much more quickly than using runtime peripheral libraries, especially when you’re new to the Kinetis ecosystem, ARM microcontrollers, or MCU programming in general.


I think there was a large contingent of hobbyists in the 1990s and 2000s who were burned by cheap-but-not-free tools ($199 IDEs, $99 compilers, etc.) that didn’t work well, crashed often, and quickly disappeared from the market without a trace. However, Keil, IAR, CrossWorks, and Cosmic have been around for 30 years and aren’t going anywhere soon. I’ve used all of these products in my testing, and really haven’t had any issues with them.

Making broad generalizations, my testing has tended to agree with the community: GCC is really good at “general computing stuff” — generating good math code, supporting recent C standards, and outputting nicely-optimized code (link-time optimization really helped with this).

But the GCC backends for the 8-bit architectures I tested (the megaAVR, tinyAVR, MSP430, and RL78) struggled to produce sensible register operations without optimization enabled. On one architecture, the RL78, GCC was never able to produce reasonable code regardless of the optimization level, so I abandoned it for the vendor’s proprietary compiler. And even with optimizations on, GCC often produced large interrupt preambles.

Footnotes

1. To get technical: I purchased several different MCUs, all less than $1, from a wide variety of brands and distributors. I’m sure people will chime in and either claim that a part is more than a dollar, or that I should have used another part which can be had for less than a dollar. I used the 100-unit price break when determining pricing, and I looked at typical, general suppliers I personally use when shopping for parts. I avoided eBay/AliExpress/Taobao unless they were the only source for the parts, which is common for devices most popular in China and Taiwan.
2. Formerly ARM, but as of August 1, 2017, “Arm” is the capitalization style they now use.
3. Advanced Microcontroller Bus Architecture (these multi-level acronyms are getting tedious).
