As an embedded design consultant, I have a diverse array of projects on my desk that have wildly different requirements, and end up targeting vastly different microcontroller architectures. At the same time, we all have our go-to chips — those parts that linger in our toolkit after being picked up in school, through forum posts, or from previous projects.
In 2017, we saw several new MCUs hit the market, as well as general trends continuing in the industry: the migration to open-source, cross-platform development environments and toolchains; new code-generator tools that integrate seamlessly (or not so seamlessly…) into IDEs; and, most notably, the continued invasion of ARM Cortex-M0+ parts into the 8-bit space.
I wanted to take a quick pulse of the industry to see where everything is — and what I’ve been missing while backed into my corner of DigiKey’s web site.
It’s time for a good ol’ microcontroller shoot-out.
While some projects that come across my desk are complex enough to require a hundreds-of-MHz microcontroller with all the bells and whistles, it’s amazing how many projects work great using nothing more than a $1 chip — so this is the only rule I established for the shoot-out. (To get technical: I purchased several different MCUs — all less than $1 — from a wide variety of brands and distributors. I’m sure people will chime in and either claim that a part is more than a dollar, or that I should have used another part which can be had for less than a dollar. I used a price break of 100 units when determining pricing, and I looked at the typical, general suppliers I personally use when shopping for parts — I avoided eBay/AliExpress/Taobao, unless they were the only source for a part, which is common for devices most popular in China and Taiwan.)
I wanted to explore the $1 pricing zone specifically because it’s the least amount of money you can spend on an MCU that’s still general-purpose enough to be widely useful in a diverse array of projects.
Any cheaper, and you end up with 6- or 8-pin parts with only a few dozen bytes of RAM, no ADC, and no peripherals other than a single timer and some GPIO.
Any more expensive, and the field completely opens up to an overwhelming number of parts — all with heavily-specialized peripherals and connectivity options.
These MCUs were selected to represent their entire families — or sub-families, depending on the architecture — and in my analysis, I’ll offer some information about the family as a whole.
If you want to scroll down and find out who the winner is, don’t bother — there’s really no sense in trying to declare the “king of $1 MCUs” as everyone knows the best microcontroller is the one that best matches your application needs. I mean, everyone knows the best microcontroller is the one you already know how to use. No, wait — the best microcontroller is definitely the one that is easiest to prototype with. Or maybe that has the lowest impact on BOM pricing?
I can’t even decide on the criteria for the best microcontroller — let alone crown a winner.
What I will do, however, is offer a ton of different recommendations for different users at the end. Read on!
This will not be another Atmel-vs-Microchip opinion piece. Instead, this review will have both qualitative and quantitative assessments. Overall, I’ll be looking at a few different categories:
Parametrics, Packaging, and Peripherals
Within a particular family, what is the range of core speed? Memory? Peripherals? Price? Package options?
Some microcontroller families are huge — with hundreds of different models that you can select from to find the perfect MCU for your application. Some families are much smaller, which means you’re essentially going to pay for peripherals, memory, or features you don’t need. But small families have an economies-of-scale effect: if a vendor only has to produce five different MCU models, it can make a lot more of each of them, driving down the price. How do different MCU families end up on that spectrum?
Package availability is another huge factor to consider. A professional electronics engineer working on a wearable consumer product might be looking for a wafer-level CSP package that’s less than 2×2 mm in size. A hobbyist who is uncomfortable with surface-mount soldering may be looking for a legacy DIP package that can be used with breadboards and solder protoboards. Different manufacturers choose packaging options carefully, so before you dive into an architecture for a project, one of the first things to consider is making sure that it’s in a package you actually want to deal with.
Peripherals can vary widely from architecture to architecture. Some MCUs have extremely powerful peripherals with multiple interrupt channels, DMA, internal clock generators, tons of power configuration control, and various clocking options. Others are incredibly simple — almost basic. Just as before, different people will be looking for different things (even for different applications). It would be a massive undertaking to go over every single peripheral on these MCUs, but I’ll focus on the ones that all MCUs have in common, and point out fine-print “gotchas” that datasheets always seem to glance over.
While this is where things get subjective and opinion-oriented, I’ll attempt to present “just the facts” and let you decide what you care about. The main source of subjectivity comes from weighing these facts appropriately, which I will not attempt to do.
IDEs / SDKs / Compilers: What is the manufacturer-suggested IDE for developing code on the MCU? Are there other options? What compilers does the microcontroller support? Is the software cross-platform? How much does it cost? These are the sorts of things I’ll be exploring while evaluating the software for the MCU architecture.
Platform functionality and features will vary a lot by architecture, but I’ll look at basic project management, source-code editor quality, initialization code-generation tools, run-time peripheral libraries, debugging experience, and documentation accessibility.
I’ll focus on manufacturer-provided or manufacturer-suggested IDEs and compilers (and these will be what I use to benchmark the MCU). There’s more than a dozen compilers / IDEs available for many of these architectures, so I can’t reasonably review all of them. Feel free to express your contempt of my methodology in the comments section.
Programmers / debuggers / emulators / dev boards: What dev boards and debuggers are available for the ecosystem? How clunky is the debugging experience? Every company has a slightly different philosophy for development boards and debuggers, so this will be interesting to compare.
I’ve established three different code samples I’ll be using to benchmark the parts; I’ll be measuring quantitative parameters like benchmark speed, clock-cycle efficiency, power efficiency, and code-size efficiency.
Pin Toggle

For this test, I’ll toggle a pin in a while() loop. I’ll use the fastest C code possible — while also reporting whether the code-generator tool or peripheral libraries were able to produce efficient code. I’ll use a bitwise complement or GPIO-specific “toggle” registers if the platform supports them; otherwise, I’ll resort to a read-modify-write operation. I’ll report which instructions were executed, and the number of cycles they took.
What this tests: This gives some good intuition for how the platform works — because it is so low-level, we’ll be able to easily look at the assembly code and individual instruction timing. Since many of these parts operate above their flash read speed, this lets us see what’s going on with the flash read accelerator (if one exists), and it serves as a general diagnostic / debugging tool for getting the platform up and running at the proper speed. This routine will obviously also test bit-manipulation performance (though this is rarely important in general-purpose projects).
64-Sample Biquad Filter
This is an example of a real-world application where you often need good, real-time performance, so I thought it would be a perfect test to evaluate the raw processing power of each microcontroller. For this test, I’ll process 16-bit signed integer data through a 400 Hz second-order high-pass filter (Transposed Direct Form II implementation, for those playing along at home), assuming a sample rate of 8 kHz. We won’t actually sample the data from an ADC — instead, we’ll process 64-element arrays of dummy input data, and record the time it takes to complete this process (by wiggling a GPIO pin wired into my 500 MHz Saleae Logic Pro 16 logic analyzer).
In addition to the samples-per-second measure of raw processing power, we’ll also measure power consumption, which yields a “nanojoules-per-sample” figure; this will help you judge how efficient a processor is.
What this tests: Memory and 16-bit math performance per microamp, essentially. The 8-bit MCUs in our round-up are going to struggle with this pretty hardcore — it’ll be interesting to see just how much better the 16- and 32-bit MCUs do. Love it or hate it, this will also evaluate a compiler’s optimization abilities, since different compilers implement math routines quite differently.
DMX-512 RGB light
DMX-512 is a commonly-used lighting protocol for stage, club, and commercial lighting systems. Electrically, it uses RS-485; the protocol begins each frame with a long BREAK, followed by a “0” start code and then up to 512 bytes of channel data, transmitted at 250 kbaud. In this test, I’ll implement a DMX-512 receiver that directly drives a common-anode RGB LED. I will do this with whatever peripheral library or code-generator tool the manufacturer provides (if any at all).
While you should really look for a precisely-timed break, since this is only for prototyping, I’ll detect the start-of-frame by looking for an RX framing error (or a dedicated “break detect” flag, on UARTs that support LIN).
I’ll minimize power consumption by lowering the frequency of the CPU as much as possible, using interrupt-based UART receiver routines, and halting or sleeping the CPU. I’ll report the average power consumption (with the LED removed from the circuit, of course). To get a rough idea of the completeness and quality of the code-generator tools or peripheral libraries, I’ll report the total number of statements I had to write, as well as the flash usage.
What this tests: This is a sort of holistic test that lets me get into the ecosystem and play around with the platform. This stuff is the bread and butter of embedded programming: interrupt-based UART reception with a little bit of flair (framing error detection), multi-channel PWM configuration, and nearly-always-halted state-machine-style CPU programming. Once you have your hardware set up and you know what you’re doing (say, after you’ve implemented this on a dozen MCUs before…), with a good code-gen tool or peripheral library, you should be able to program this with just a few lines of code in an hour or less — hopefully without having to hit the datasheet much at all.
Test notes: I’m using FreeStyler to generate DMX messages through an FTDI USB-to-serial converter (the program uses the Enttec Open DMX plugin to do this). As my FTDI cable is 5V, I put a 1k series resistor with a 3.3V zener diode to ground, which clamps the signal to 3.3V. The zener clamp isn’t there to protect the MCU — all of the chips tested have clamp diodes to protect against over-voltage — but rather so that the MCU doesn’t inadvertently draw power from the RX pin, which would ruin my current-measurement results.
This page will compare the devices, development tools, and IDEs all together. However, to prevent this article from getting overwhelmingly long, I’ve created review pages for each device that cover way more details about the architecture — along with more complete testing notes, different benchmarking variations, and in-depth assessment. If an architecture piques your interest, you should definitely check out the full review below.
Microcontrollers continue to divide into two camps — those with vendor-specific core architectures, and those who use a third-party core design. Out of the 21 microcontrollers reviewed here, eight of them use a 32-bit ARM core, which is becoming ubiquitous in the industry — even at this price point. Three of the microcontrollers use an 8-bit 8051-compatible ISA. The remaining ten use the vendor’s proprietary core design: six are 8-bit parts, three are 16-bit parts, and the PIC32MM is the sole 32-bit part that doesn’t use an ARM core.
The Arm Cortex-M0 (formerly “ARM,” but as of August 1, 2017, “Arm” is the capitalization style the company uses) is a 32-bit RISC architecture that serves as the entry-level Arm architecture available to silicon vendors for microcontroller applications. Arm cores are designed by Arm Holdings and licensed to semiconductor manufacturers for integration into their products.
Arm started out as a personal computer microprocessor: Advanced RISC Machines was formed as a joint venture between Acorn, Apple, and VLSI Technology to manufacture 32-bit processors for the Acorn computer. While Arm cores have grown in popularity as microprocessors for battery-powered systems (they are almost certainly powering your smartphone), Arm moved into the microcontroller sphere as well — the ARM7TDMI-S was probably the first Arm core used in microcontrollers, i.e., processors with completely self-contained RAM, flash, and peripherals. The Atmel AT91 and ST STR7 were probably the first microcontroller parts designed with an Arm core.
It’s important to understand the history of Arm, because it explains a serious feature of Arm microcontrollers that differs substantially from the 8051 (the other multi-vendor architecture that dominates the field): Unlike the 8051, Arm is just a core, not a complete microcontroller.
The ARM7TDMI-S didn’t come with any GPIO designs, or provisions for UARTs or ADCs or timers — it was designed as a microprocessor. Thus, as vendors started stuffing this core into their extremely high-end MCUs, they had to add in their vendor-specific peripherals to the AHB (the AMBA High-performance Bus — AMBA being the Advanced Microcontroller Bus Architecture; these multi-level acronyms are getting tedious).
Consequently, Freescale reused a lot of HC08 and ColdFire peripherals, while Atmel designed new peripherals from scratch. ST borrowed a bit from the ST7 (the precursor to the STM8), but used new designs for its timers and communications peripherals.
Since many microcontroller projects spend 90% or more of the code base manipulating peripherals, this is a serious consideration when switching from one Arm MCU vendor to another: there’s absolutely zero peripheral compatibility between vendors, and even within a single vendor, their Arm parts can have wildly different peripherals.
Unlike other Arm parts, the M0 series only supports a subset of the 16-bit Thumb instruction set, which allows it to be about 1/3 the size of a Cortex-M3 core. Still, there’s a full 32-bit ALU, with a 32-bit hardware multiplier supporting a 32-bit result. Arm provides the option of either a single-cycle multiply, or a 32-cycle multiply instruction, but in my browsing, it seems as though most vendors use the single-cycle multiply option.
Beyond the stack pointer, link register, and program counter, Arm cores have 13 general-purpose working registers, which is about the sweet spot. The core has a nested vectored interrupt controller with up to 32 interrupt vectors and 4 interrupt priorities — plenty when compared to the 8-bit competition, but a far cry from the 240 interrupts at 256 priority levels that the larger Arm parts support. The core also has full support for runtime exceptions, which isn’t a feature found on 8-bit architectures.
The M0+ is an improved version of the M0 that supports faster two-cycle branches (due to the pipeline going from three-stage to two-stage), and lower power consumption. There are a slew of silicon options that vendors can choose from: single-cycle GPIO, support for a simple instruction trace buffer called Micro Trace Buffer (MTB), vector table relocation, and a rudimentary memory protection unit (MPU).
One of the biggest problems with ARM microcontrollers is their low code density for anything other than 16- and 32-bit math — even those that use the 16-bit Thumb instruction set. This means normal microcontroller-type routines — shoving bytes out a communication port, wiggling bits around, performing software ADC conversions, and updating timers — can take a lot of code space on these parts. Exacerbating this problem are the peripherals, which tend to be more complex — I mean “flexible” — than those on 8-bit parts, often necessitating run-time peripheral libraries and tons of register manipulation.
Another problem with ARM processors is the severe 12-cycle interrupt latency. When coupled with the large number of registers that are saved and restored in the prologue and epilogue of the ISR handlers, these cycles start to add up. ISR latency is one area where a 16 MHz 8-bit part can easily beat a 72 MHz 32-bit Arm microcontroller.
8051 Instruction Timing

The 8051 was originally an Intel microcontroller introduced in 1980 as one of the first widely-deployed 8-bit parts. It was built from the ground up for embedded control applications.
The chart above illustrates the differences in core clock speed among each MCU. As will be seen in the evaluation section, core clock speed is not a good predictor of performance when comparing between different MCU families (especially between 8-, 16-, and 32-bit parts). However, most MCUs limit the maximum peripheral clock rate to that of the CPU, which may be a driving factor if your application requires fast peripheral clocks (say, for fast GPIO bit-banging or for high-speed capture/compare timer operations). The Infineon XMC1100 is a neat exception to this rule — its peripheral clock can run at up to 64 MHz.
There are other important asterisks to this data: the Atmel tinyAVR and megaAVR parts have severely limited operating ranges when running below 5V, which will affect most modern designs. The tinyAVR can only run at 10 MHz below 3.6V, and at 5 MHz below 2.2V. The megaAVR has the same speed grades, but even worse, has nothing faster than an 8 MHz internal oscillator. When talking about sub-$1 MCUs, adding a crystal or even low-cost ceramic resonator adds a sizable portion of the cost of the MCU to the BOM.
The Silicon Labs EFM8 Laser Bee, with its 72 MHz core clock speed, beats out even the ARM microcontrollers in this round-up. The Sanyo LC87 brings in a 12 MHz reading — but bear in mind this is a 3T architecture, which limits the actual instruction clock speed to 4 MHz. The Holtek HT66 and Microchip PIC16 are both 4T architectures, but the PIC16 has a relatively snappy 32 MHz core speed (thanks to its on-board PLL), which allows it to compete better with 8 MHz parts.
Here, we consider flash capacity in terms of bytes — but be aware that flash usage varies considerably by core. The PIC16 may have 14 KB of flash, but being a 14-bit-wide core, it will feel more like it has 8 KB of flash when it comes time to actually store and manipulate data, and count instructions. The Holtek HT66 is similar to the PIC16, but with a 16-bit-wide fetch. The same goes for the PIC24, which has a 24-bit-wide flash data path. While other cores may have a wider-than-a-byte fetch size, they still allow instructions and data to be packed into flash efficiently.
The STC8’s insane equipment list starts to shine — with 64 KB of flash, this part should be able to do essentially anything you throw at its space-efficient 8051 core. On the other side of the spectrum, the ARM processors were especially stingy with flash capacity, which is important to consider for applications that rely on a lot of peripheral runtime libraries: as we’ll see in the performance evaluation, many of these parts had little room left over after the test code was programmed onto them. The Microchip PIC24’s 4 KB flash capacity makes the part essentially unusable for general-purpose applications that would usually target a 16-bit controller, as it can only hold 1408 instructions in flash (due to its 24-bit-wide word fetches).
Infineon’s memory configuration is actually “backwards” — it has 8 KB of flash, and 16 KB of RAM. It doesn’t have a good flash pre-fetch engine, so performance-critical code should be moved to RAM for fast execution. While you’re at it, unless you’ve got some large data capturing/analyzing procedures, you could just move your entire program into RAM, too.
The KE04 (and probably the PIC24) both have too little RAM for most projects that would target these architectures. I’m actually less worried about the Holtek HT66; its efficient peripheral library uses essentially no RAM, and 256 bytes of user data is plenty of space for an ultra-low-power MCU that’s designed for only the most basic duties.
Each MCU’s review page discusses its complement of timers, but I wanted to come up with a score to help quickly compare timer peripherals across MCUs — I call it TimerMark.
Here’s how it works:
Basic Timer Blocks
- 1 point for 8-bit counters
- 2 points for 8-bit auto-reload (period) timers
- 2 points for 16-bit counters
- 4 points for 16-bit auto-reload timers
- 6 points for 24-bit auto-reload timers
- 8 points for 32-bit auto-reload timers
- 2 points for RTC
- 2 points for each 16-bit timer set that can be extended to 32-bit
Capture Channels & PWM
- 1 point for each capture channel (regardless of depth)
- 1 point for each 8-bit PWM channel
- 2 points for each 16-bit PWM channel
- 3 points for each 24-bit PWM channel
- 3 points for each arbitrary-phase PWM channel (i.e., if the channel has separate ON and OFF comparator registers)
- 2 points if the PWM module is “power friendly” (multi-channel outputs with programmable dead-time, blanking, etc)
The chart above illustrates the total number of communications resources each microcontroller has.
Most vendors use a separate, dedicated module for UART, SPI, and I2C communication. Nuvoton’s M0 went a step further and doubled up everything — two UARTs, two SPI, and two I2C. Except under heavily-loaded buses, or under extremely strange designs, there’s rarely a need for multiple SPI or I2C buses, so this decision struck me as a bit odd.
STC’s decision to incorporate four separate UARTs in addition to an SPI and an I2C module seems more useful, since UARTs are almost always dedicated to a particular peripheral. Doubling-up on UARTs was a popular decision overall: the other 8051s — the EFM8 and the N76 — bring a second UART, as do the PIC16 and PIC32MM, along with the LPC811.
I also like the flexibility that Atmel, Cypress, Infineon, and Renesas have — they use arrayed “serial units,” each of which can morph into a UART, SPI, or I2C peripheral. Cypress and Infineon have two; Atmel and Renesas have three (plus a fourth I2C-slave-only interface on the Renesas).
The odd ones out were the PIC32MM, the LC87, and the MSP430: none of these MCUs have an I2C peripheral — a curious omission in an era of SMBus-compliant digital sensors everywhere. And unlike the 8051, with its quasi-bidirectional GPIO configuration, none of these MCUs are well-suited to bit-banging I2C (though it’s obviously possible by switching data direction).
The 1-series appears to use the full megaAVR core, which includes a hardware multiplier. Interestingly, while it is considerably cheaper than the ATmega168pb, in many ways, it’s a better microcontroller. It has a two-level interrupt controller, the same core, and twice as much flash.
While many vendors have transitioned to Eclipse-based IDEs, Atmel went with a Visual Studio Isolated Shell-based platform starting with AVR Studio 5. I do a ton of .NET and desktop-based C++ development, so I expected to feel right at home in Atmel Studio when I first launched it. Unfortunately, Microsoft calls this product “Visual Studio Isolated Shell” for a reason — it’s simply the shell of Visual Studio, without any of the meat. The excellent IntelliSense engine that Microsoft spent years perfecting has been replaced by the third-party “Visual Assist” extension, which struggles to identify global variables, evaluate pre-processor definitions, or refactor items defined outside of the current file. The Toolchain editor is a near-clone of the Eclipse CDT one (no reason to reinvent the wheel), but it’s missing checkboxes and inputs for commonly-used compiler and linker options; one stunning omission is link-time optimization, which doesn’t seem to work even when manually specified as command-line parameters — odd, since Atmel ships a recent version of GCC.
My biggest issue with Atmel Studio is how incredibly buggy and unstable it has been every time I’ve used it in the last two years. I’m not referring to a specific installation on a specific computer: rather, every single time I’ve installed the software, I’ve fought with AVR Dragon drivers, a bad DLL file in the installer, programmer firmware issues, or, most recently, the software popping up the “Waiting for an operation to complete” message that prevents me from debugging any Atmel product without restarting my computer. Look, I get it: embedded firmware development is a highly-specialized task, so maintaining software that works reliably for such a small user base can be challenging. Yet every other vendor’s tools tested worked nearly flawlessly.
Keil C51 struggled to generate good code in the biquad experiment — the biggest problem being the 16-bit signed multiplication. Rather than producing raw assembly that operates on whichever registers end up with these variables, Keil generates function calls into a signed 16-bit multiply library routine. This has drastic performance implications when compared to the much better AVR-GCC code. The STC8, with 82% of its CISC instruction set being single-cycle, should have no problem outperforming the AVR MCUs (which have much simpler RISC instructions) — yet the STC8 requires more than twice the number of clock cycles — 153 versus 63 — of the AVR architecture.
The DMX-512 Receiver test is useful for evaluating tons of different things about the MCU’s core, peripherals, code generator tools, and peripheral libraries.
Code generation tools
Many of the development environments tested have code-gen tools either integrated directly into the IDE (MPLAB X, Simplicity Studio, DAVE, PSoC Creator, Kinetis Design Studio), or have stand-alone tools (STM32CubeMX, Atmel START, STC-ISP).
Processor Expert uses a component-oriented model with dependency resolution; for example, two high-level “PWM” component instances will share a common low-level “Timer Unit” component (with each PWM landing on a different output-compare channel of that timer unit).
High-level components implement conceptual functionality, not peripheral functions. For example, if you wanted a function to execute every 25 ms, you would add a “TimerInt” component and set the interval to 25 ms. Processor Expert will figure out which timer to use (FlexTimer, LPTimer, PIT, etc.), route the clock appropriately, calculate the necessary period register values, enable interrupts, and generate an interrupt handler that takes care of any flags you need to set or clear.
If you understand the general use of microcontroller peripherals, that’s essentially all you need to know to program a microcontroller with Processor Expert. The AsyncSerial (UART) component lets you specify an RX and TX buffer size; Processor Expert generates callbacks alerting you to a character being received, as well as when the buffer is full. You can simply copy the data out of the buffer without thinking for a moment about testing and clearing interrupt flags, configuring UART FIFOs, or fussing with DMA.
Unlike some code-gen tools, like Silicon Labs’ Simplicity Configurator, Processor Expert generates initialization/interrupt callbacks as well as runtime libraries. Unlike code-gen tools like Infineon DAVE CE, which generates code that calls into standard runtime peripheral libraries, Processor Expert generates specific API calls on request, with any initialization values pre-calculated as constants. As an example, while some code-gen tools will generate a UART module that calls a standard runtime initialization routine with a human-readable baud rate, Processor Expert will generate code to directly write the correct baud rate generator values to the appropriate registers.
A lot of code-gen tools suffer from a loss of generality that makes them work well in toy example cases, but become useless when deployed in real application scenarios — especially for projects that have low-power requirements, or have dynamic pin-muxing in use. However, Processor Expert supports essentially anything imaginable — multiple system configurations allow you to shift between different clock and run modes, and components can be explicitly configured to share pins with other components. If worse comes to worst, and you’re stuck with a component that needs to behave slightly differently, you can always disable re-generation at the individual component granularity — allowing you to modify a component to your liking.
While this all makes Processor Expert sound extremely fast and optimized, performance is actually somewhat mixed. On large processors with DMA, auto-scanning ADCs, FIFOs, and other advanced features, Processor Expert will transparently use these, which can provide a huge performance boost over runtime peripheral libraries that don’t always expose complex functionality like this to the end user.
Unfortunately, some of the generated code simply makes you shake your head: it takes 40 cycles to toggle a GPIO pin, since the high-level “Bit” component calls into a low-level GPIO component, passing it a configuration structure and other unused parameters. Nothing is inlined, and even compiling with -O3 won’t eliminate these function calls. Of course, for performance-critical GPIO calls, it’s easy enough to use direct register calls, but for most use-cases, it’s hard to replace PE code with optimized application-specific routines — especially when interrupts are involved.
It’s tough to compare code size, but Processor Expert does seem to generate more efficient code than using runtime peripheral libraries (even when link-time optimization is enabled in the latter cases). In my testing, Processor Expert generated peripheral code that was close to half the size of the runtime peripheral library code for other MCUs (including Kinetis SDK on the KL03). Only PSoC Creator seems to generate more optimized code.
The 500-lb gorilla in the room, however, is this: if you’re new to Processor Expert, I think the first thing you’ll notice is how incredibly slow it is to use. I don’t mean “complicated” or “intricate” — I mean that even on my 4.5 GHz 12-thread desktop, creating instances of components, switching views, changing values, generating code, and building projects takes forever. The entire system is single-threaded, and every time a property is changed, everything has to be re-evaluated. I’m not sure it could be made faster — because of how flexible Processor Expert is, almost everything has a huge dependency graph; and because almost everything is automated, the whole system has to solve for the proper register values from a near-infinite possible selection.
Having said all that, in my testing, it was “fast enough” to not be completely frustrating to use, and it’s one of the only development environments I tested where I was able to literally complete an entire project without even glancing at a datasheet for the microcontroller. That would be impressive enough on an 8-bitter, but on a modern Cortex-M0+ ARM microcontroller, with complex (“flexible”) peripherals, it’s downright incredible. It is, by far, the most complete, flexible code-generator tool I’ve ever used, too.
So who is it for? Honestly, peripheral configuration and bring-up is generally a drop in the bucket when compared to the time required to implement an entire commercial embedded project — but if you’re working on tiny projects (either hobbyist stuff, or simple proof-of-concept engineering demos), having a tool like Processor Expert around can get things working much more quickly than using runtime peripheral libraries; especially when you’re new to the Kinetis ecosystem, ARM microcontrollers, or MCU programming in general.
I think there was a large contingent of hobbyists in the 1990s and 2000s who were burned by cheap-but-not-free tools ($199 IDEs, $99 compilers, etc.) that didn’t work well, crashed often, and quickly disappeared from the market without a trace. However, Keil, IAR, CrossWorks, and Cosmic have been around for decades, and aren’t going anywhere quickly. I’ve used all of these products in my testing, and really haven’t had any issues with them.
Making broad generalizations, my testing has tended to agree with the community: GCC is really good at “general computing stuff” — generating good math code, supporting recent C standards, and outputting nicely-optimized code (link-time optimization really helped with this).
But the GCC backends for the 8-bit architectures I tested (the megaAVR, tinyAVR, the MSP430, and the RL78) struggled to produce sensible register operations without optimization enabled. On one architecture, the RL78, GCC was never able to produce reasonable code — regardless of optimization level — so I abandoned it for the vendor’s proprietary compiler. And even with optimizations on, GCC often produced large interrupt preambles.