What's up with these 3-cent microcontrollers?
(A review of the Padauk PMS150C and friends)

Full disclosure: Padauk sent over a programmer plus two ICEs for me to use in this review, so long as I covered shipping. In hindsight (after seeing my UPS bill that month!), it would have been a lot cheaper to just buy the dev tools myself, so I can’t recommend ordering straight from them if you live in the United States, as they exclusively use UPS. I purchased several different Padauk MCUs from LCSC.com, which is not an official Padauk distributor.

The Padauk microcontrollers — especially the 3-cent PMS150 and newer PMS150C — are all the rage these days. I first heard of the part via a @mikelectricstuf tweet. Dave Jones of the EEVBlog did a brief (for him) video looking at the Padauk stuff, and then a pair of videos using the Padauk ICE and programming an OTP part. SDCC recently got support for the Padauk “pdk14” and “pdk15” architecture, there’s a free open-source programmer, and several blog posts (including a neat comparison of other low-cost parts) and forum threads have hit the Internet.

The parts, of course, date back much further. Padauk was founded in 2005 (and filed patents in 2005 and 2006, describing their Field Programmable Processor Array technology, which we’ll discuss later). I’m not sure the English-speaking world became aware until at least 5 years later: Hector Martin noticed the datasheets for the Padauk parts ripping sections out of the Microchip PIC datasheets way back in 2011, and Kyle Machulis did some early investigating into these parts used in the MyKeepon.

Some of the write-ups have been a little loose on technical details, and few seem to be interested in determining which sorts of applications these parts are suited for — mostly dismissing the parts as novelty items. I wanted to dive a bit deeper into the ecosystem and figure out where you should and shouldn’t use low-cost parts like these, so I’ve spent some time in the last few weeks exploring.

I built a couple cute projects with them — a NeoPixel SPI bridge adapter, as well as a small bike light controller. You can check out my project code on GitHub.

Padauk Ecosystem

The Padauk ecosystem comprises:

  • Different microcontrollers ranging from 6 to 26 I/O, with 512 to 4096 words of program memory and 64-256 bytes of RAM — all with the same 16 MHz main oscillator.
  • A proprietary, extremely light-weight IDE.
  • Various versions of programmers — each capable of programming all the Padauk devices.
  • Various versions of two ICEs — one for the single-FPPA parts, and one for the multi-FPPA parts.

Here’s a big table I built that has all the parts, plus some electrical specifications that might help your selection process. (The groupings and color codes match the official selection guide PDF. Note that the PMS (commercial) line is also available as PMC (industrial). Though their operating ranges differ, they are otherwise identical.)

Part | LCSC Price @ 100 | IO Max | FPPA | ROM (KB) | RAM (B) | ADC | 8-bit PWM | 11-bit PWM | 1/2 VDD LCD | CMP | MULT | IRC | Min Supply | Min Supply @ Max Speed | uA @ 1 MHz 5V | uA @ ILRC 3.3V | nA PD @ 3.3V
SOVA PMS                 
IO Range                 
PMS15A 610.5641622345013500
PMS150C $ 0.03186116411622345013500
PMS152 $ 0.04321411.2580131551.83.5100015600
PMS154B  14121282341702.22.245012*500
PMS154C $ 0.054714121282341701.81.845012*500
8-bit ADC                 
PMS171B $ 0.06141411.596122  1 501.8370036600
12-bit ADC                 
PMS132B $ 0.073814121281223 11552.23.5100015600
PMS133 $ 0.086418132561423411632.23.575040100
PMS134 $ 0.106122142561423411632.23.575040100
PMS                 
IO Range                 
PMS150 61160372.22.510006500
PMS153 $ 0.0623121164352.22.510007500
PMS156 $ 0.07751611644372.22.517008500
PMC251 $ 0.0841122164242.2317008400
8-bit ADC                 
PMS271 $ 0.083316216484242.2317008400
12-bit ADC                 
PMS130  1411.5881221372.22.51700151000
PMS131 $ 0.08181411.51601221372.22.51700151000
PMS232 $ 0.16211822881014242.23170015500
PMS234 $ 0.2000262420811141242.24170015500
MTP                 
PFS154 $ 0.06591412128 23417023.555012*100
PFS172 $ 0.081814121281221561.8360076
PFS173 $ 0.08181813256142351932.23.575087100

Parts in bold-face are parts I evaluated during this review. Specifications in bold-face are best-in-class.

There are a few interesting take-aways:

No comms peripherals. The Padauk microcontrollers have an interrupt controller, at least one timer, sometimes an ADC, comparator, and/or PWM controller, and… well, that’s it. One of the first things I noticed about these parts is that all the UART, I2C, and SPI communication you’ll need for a project has to be implemented in software.

Multiple “processing-unit” design. Higher-end Padauk parts — like the PMS2xx, PMC2xx, and PGC2x — have two “processing units” in their “FPPA” while the unreleased PGC4xx will be a 4-FPPA design. I put “processing units” in quotes for reasons explained later. It looks like they have had legacy 8-core designs (like the MCS11) that are no longer carried by distributors.

Flash and OTP options. Most modern western-designed MCUs use flash memory for program storage. This is convenient for us developers, but it’s kind of silly when you consider that most embedded devices have no mechanism for firmware updates. Flash memory is expensive and finicky to implement at a process level. Typical low-voltage-programmed flash memory is less immune to damage than the high-voltage OTP-programmed memory (which, itself, is less immune when compared to true mask ROM).

Middling low-power capabilities. Any low-cost 8-bit MCU in 2019 needs to target ultra-low-current battery-powered applications (where other parts can’t compete), but the Padauk series as a whole doesn’t fare well compared to other vendors. Across their entire line, 1 MHz run-mode current consumption varies from 450 – 1700 µA when operating at 5V. Note, however, that the newer parts are actually quite good — the PMS150C uses about 300 µA at 3.3V, which is similar to the exceptionally low-power EFM8SB1 that uses about 270 µA when running at the same speed. Sleep current is below 1 µA with the oscillators stopped, which should be good enough for most battery-operated applications.

Poor battery support. Not only is the current consumption rather high on older parts, but the minimum supply voltage is, too. All microcontrollers should be 1.8V-friendly, but most Padauk MCUs bottom out at 2.2V — only the newer PMS152, PMS154C, and PMS171B are rated for 1.8V operation, and only the PMS154C can run full-speed at 1.8V. Most parts need at least 2.5V for full-speed operation, while some — like the two-FPPA PMS parts, 12-bit PMS parts, and the MTP flash-programmable parts — need 3.5 or even 4V. This means many Padauk-powered gadgets will die long before their batteries do.

Flexible pin-count. Many Padauk parts are simultaneously available in 8-pin, 16-pin, and sometimes 18-20 pin packages. While high-pin-count ARM MCUs are often available in ~80-ball CSPs and much, much larger 100- or 144-pin LQFPs, I haven’t seen many other 8-bit MCUs offered in package sizes with such stark differences in pin count.

Inflexible packages. While most Padauk datasheets advertise SOT, SOP, QFN, and sometimes MSOP, I wasn’t able to find packages other than SOP and SOT available from Padauk distributors (official and unofficial), so I think the QFN and MSOP packages are only available on request. It’d be nice to have widely-available 15-20 I/O chips in a 3×3 or 4×4 QFN. 16-pin SOIC packages are massive, so they rarely make it into my designs.

Architecture

All Padauk MCUs have what appears to be an identical architecture. They use an accumulator-based machine where all instructions — other than jumps, call, ret, and ldtab — are executed in a single cycle. The instruction set is somewhere between a mid-range and enhanced mid-range Microchip PIC16 — it supports indirect loads and stores from an arbitrary memory location, but not with a literal offset or automatic incrementing. Translation for programmers: you can use pointers to dereference memory, but your pointer arithmetic has to be done to the pointer’s memory location first, and using separate instructions that will incur cost. For example, to walk through an array, it will take at a minimum three cycles — two to dereference the pointer, and one to increment the pointer’s value.

Like the PIC16, there’s just a single interrupt vector. Interrupts are automatically disabled before the ISR executes, and re-enabled afterwards. Unlike the PIC16, the stack is located in main RAM, so you don’t have to worry about deep call stacks. The Padauk parts use a 2T architecture where the PIC16 uses a 4T — that means a Padauk part running at 16 MHz (the maximum frequency) will execute a maximum of 8 MIPS (125 ns instruction cycle), which is equivalent to a PIC16 running at 32 MHz.
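To make the 2T-versus-4T arithmetic concrete, here is a small sketch in plain C (meant for a PC, not for the Padauk toolchain). The function names are made up for illustration; the only inputs are the oscillator frequency and the number of clocks per instruction quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Instructions per second for a core that needs `clocks_per_instr`
 * oscillator clocks per instruction (2 for a 2T design, 4 for a 4T one).
 * The names are illustrative, not from any vendor header. */
uint32_t instr_per_second(uint32_t osc_hz, uint32_t clocks_per_instr)
{
    assert(clocks_per_instr > 0);
    return osc_hz / clocks_per_instr;
}

/* Length of one instruction cycle in nanoseconds. */
uint32_t instr_cycle_ns(uint32_t osc_hz, uint32_t clocks_per_instr)
{
    uint32_t ips = instr_per_second(osc_hz, clocks_per_instr);
    assert(ips > 0);
    return 1000000000u / ips;
}
```

Calling `instr_per_second(16000000, 2)` and `instr_per_second(32000000, 4)` both give 8,000,000, and `instr_cycle_ns(16000000, 2)` gives 125, which matches the figures in the paragraph above.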

The instruction set is optimized for traditional low-end 8-bit fare: dedicated instructions to set, clear, and compare individual bits in both peripheral and memory locations give you rock-steady timing performance (and efficient ROM usage) when writing bit-bang code. There are also dedicated instructions to load and store the 16-bit timer value in a RAM location in a single cycle. Neat!

Peripherals

There are only a few peripherals to select from:

  • Clock system with 16-MHz and ~70 kHz internal oscillators
  • 16-bit timer
  • Watchdog timer
  • One or two 8-bit timers*
  • 11-bit PWM generator*
  • VDD/2 LCD bias generator*
  • Analog comparator*
  • 8 or 12-bit ADC*

* available on some models.

GPIO

GPIO pins are the meat and potatoes of any MCU, and because these Padauk parts have so few peripherals, you’ll be using GPIO pins even more than usual.

GPIO input pins have a selectable pull-up resistor (nominally 100k at 5V). GPIO input pins have a 2V threshold voltage when running at 5V (and 1.5V when running at 3.3V). (The cheapest logic-level converter I could find is $0.05 each in volume, so yes, it’s cheaper to program a Padauk PMS150C to be a logic-level converter than to just buy a logic-level converter.) Leakage current isn’t specified in the datasheet, but from my observations, it’s extremely low: low enough not to affect power consumption in sleep mode in a meaningful way.

In general, GPIO output pins can be configured globally for normal or low drive strength. Normal drive will source 12 mA at 5V and 5 mA at 3V. Low drive will source 3.5 mA at 5V and about 1.5 mA at 3V. Sinking current is 15 mA at 5V and 5 mA at 3V for normal drive strength, and 5 mA at 5V and 2 mA at 3V for low drive strength. Each MCU appears to have a single open-drain pin (PA5, the reset pin) which can sink almost twice as much current as the other GPIO pins (but, being open-drain, can’t drive the pin high).

By default, a change-in-value on any GPIO pin will wake up a sleeping Padauk part, but you can disable wake-up on specific pins if you’d like.

Clock System

All Padauk MCUs have a clocking system that supports an external crystal, along with a 16-MHz high-speed internal oscillator (IHRC) and a low-speed internal oscillator (ILRC). Depending on the part, the nominal value of the low-speed oscillator ranges from 24 – 93 kHz. For example, the PMS150C has an IRC frequency of 62 kHz, while the PMS271’s is 24 kHz, and the PFS173’s is 93 kHz. (I can’t, for the life of me, figure out why the IRC frequencies vary chip to chip. There seems to be no rhyme or reason to intentionally do this; I wonder if the designers simply threw in whatever they could fit in the remaining area, spun some samples, and figured out what they had come up with. I know oscillator design can get finicky, so that’s the only explanation I could come up with.) Padauk seems to be a bit inconsistent with this nomenclature; you can set SYSCLK to a maximum frequency of IHRC/2 (8 MHz). Or, you can set SYSCLK to ILRC, but when you do, it doesn’t run at 63 kHz, as indicated in the datasheet, but rather half that (31.5 kHz). I assume some datasheet ILRC ratings are for the actual ILRC oscillator, and some datasheets rate them for the effective speed the system runs at when sourced from one of these oscillators (which is half).

These oscillators are designed to be extremely low-power and low-cost; Padauk provides no specification for the performance of the oscillators prior to calibration, but I wouldn’t be surprised if it were 10-15% error, or worse. However, once calibrated, the oscillators are good to 1% under constant voltage and temperature (which makes sense if you know anything about RC circuits).

Padauk has an ingenious solution to this problem: their in-circuit programmer will actually measure and calibrate each MCU during the programming process. You just have to insert a macro in your code to trigger this process — something like this:

.ADJUST_IC SYSCLK=IHRC/2, IHRC=16MHz, VDD=3.3V, Bandgap=On; 

16-bit Timer

The 16-bit timer is a simple up-counting, non-reloadable counter. Its clock source selection is independent of the CPU clock, and it can also count from one of two GPIO pins. It has a divide-by-1, 4, 16, or 64 prescaler, and an overflow interrupt. The timer’s value can be read and written from software, and… that’s it.
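Because the counter can’t be reloaded by hardware, the interval between overflow interrupts is set entirely by the clock source and the prescaler. A minimal sketch of that arithmetic in plain C follows; the function name is hypothetical, and it assumes the interrupt fires once per full 65,536-count wrap, which is how I read the description above.

```c
#include <assert.h>
#include <stdint.h>

/* Time between overflows of a free-running 16-bit up-counter, in
 * microseconds. `prescale` must be one of the options listed in the
 * text: 1, 4, 16, or 64. The name is illustrative only. */
uint64_t timer16_overflow_us(uint32_t clk_hz, uint32_t prescale)
{
    assert(clk_hz > 0);
    assert(prescale == 1 || prescale == 4 || prescale == 16 || prescale == 64);
    return (65536ull * prescale * 1000000ull) / clk_hz;
}
```

With a 16 MHz clock and no prescaling this works out to 4,096 µs; with an 8 MHz clock and the /64 prescaler it is 524,288 µs, a little over half a second.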

Watchdog Timer

All Padauk parts have a watchdog timer clocked from the internal low-frequency oscillator.

8-bit Timer

Many (but not all) Padauk parts also have one or two 8-bit timers that have an auto-reload (period) output-compare mode as well as 8- or 6-bit PWM. The 8-bit timer has the same prescaler and input settings as the 16-bit timer, but can also take an input from the optional analog comparator (useful for counting threshold-exceeded events), as well as from a few GPIO input complements (letting you count rising- or falling-edge events). The timer also has a divide-by-1 to 31 integer scaler to give you a bit more control over the period.

You’ll want as much prescaler control as you can get, since, when the timer is in PWM mode, you’ll have a fixed period of 256 counts (0 to 255), which prevents fine-grained control over frequency. Luckily, this isn’t usually an issue with PWM applications. There’s an interesting Code Option bit that you can flick on to disable the PWM output whenever the comparator output goes high (useful for, say, current-limiting H-bridge motor drivers).
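To see how coarse the frequency steps are, here is a rough sketch in plain C of the PWM frequency you’d get from a given clock, prescaler, and scaler. The formula (clock divided by prescaler, scaler, and the fixed 256-count period) is my reading of the description above rather than something taken from a datasheet, and the function name is invented for the example.

```c
#include <assert.h>
#include <stdint.h>

#define PWM8_PERIOD_COUNTS 256u  /* fixed period in 8-bit PWM mode */

/* Approximate PWM frequency in Hz (integer division truncates).
 * prescale: 1, 4, 16, or 64; scaler: 1 to 31, as described in the text. */
uint32_t pwm8_frequency_hz(uint32_t clk_hz, uint32_t prescale, uint32_t scaler)
{
    assert(prescale == 1 || prescale == 4 || prescale == 16 || prescale == 64);
    assert(scaler >= 1 && scaler <= 31);
    return clk_hz / (prescale * scaler * PWM8_PERIOD_COUNTS);
}
```

Under those assumptions, an 8 MHz clock with no division gives 31,250 Hz, and the slowest setting from a 16 MHz clock (/64 prescaler, /31 scaler) gives roughly 31 Hz.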

11-bit PWM

Some Padauk parts feature a 3-channel 11-bit “SuLED” PWM generator designed specifically for RGB LED control. Padauk introduces this feature on the 4-cent PMS152. The 8-pin version of this chip has just enough pins to implement an SPI daisy-chained RGB LED controller, so I wouldn’t be surprised if we saw applications like that. This peripheral also has selectable complementary outputs, a /2 to /128 prescaler, and an adjustable period register (if you don’t need all 11 bits of resolution).

ADC

There are 8- or 12-bit ADCs available on several Padauk parts. The input mux seems wide enough to allow all — or almost all — pins to function as analog inputs, though I didn’t check every chip and package.

You can choose from an external pin or internal VDD reference, and there’s also a 1.2V band-gap you can measure, but you can only use the bandgap as a reference on 12-bit ADC parts. These higher-end parts also have a 1/4 VDD measurement input which — when coupled with the 1.2V bandgap — makes measuring battery life a breeze (why don’t more microcontrollers have this feature?).
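The battery-measurement trick boils down to one line of arithmetic: with the 1.2V bandgap as the ADC reference and the 1/4 VDD channel as the input, the supply voltage is four times the measured fraction of 1.2V. Here is a sketch in plain C, assuming an ideal 12-bit converter (4096 codes), a bandgap of exactly 1.2V, and an exact divide-by-four; the function name is hypothetical and a real part would need calibration.

```c
#include <assert.h>
#include <stdint.h>

#define BANDGAP_MV   1200u  /* nominal bandgap reference, from the text */
#define VDD_DIVIDER  4u     /* the 1/4 VDD measurement input */
#define ADC_CODES    4096u  /* assumed ideal 12-bit converter */

/* Convert a raw 12-bit reading of the 1/4 VDD channel (bandgap reference)
 * into the supply voltage in millivolts. */
uint32_t vdd_millivolts(uint16_t adc_code)
{
    assert(adc_code < ADC_CODES);
    return ((uint32_t)adc_code * VDD_DIVIDER * BANDGAP_MV) / ADC_CODES;
}
```

Under these assumptions a reading of 2816 corresponds to 3,300 mV, and the largest supply this arrangement can report is just under 4.8V.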

Multiplier

Some Padauk parts have a hardware multiplier. It’s kind of weird: it’s both an instruction set extension and a memory-mapped peripheral that must be loaded with data. You load the values you want into it, perform the mul instruction, and get the values out. The compiler has no idea how to use it, so this is all done by you, manually. It’s extremely goofy.

Sleep

The Padauk parts have two low-power instructions: stopexe (wait-for-interrupt, basically) and stopsys (clock-power-down sleep). I love how simple it is to do low-power on these parts; every GPIO input pin is, by default, a wake source, so anytime you’re in a busy loop waiting for something, you can just throw a stopsys instruction in there and you’ll get instant power-savings. Wake-up time is configurable as 32 or 2048 ticks of the oscillator, so as fast as 2 µs (32 ticks of the 16 MHz oscillator). It’s not clear to me why you’d ever configure the system for slow wake-up, as there doesn’t seem to be any advantage, but it’s an option, too.

FPPA — A Multi-Core Microcontroller?

Some of the Padauk parts feature what they call an FPPA — Field Programmable Processor Array. Without ever looking into the details, people incorrectly tout this as a multi-core microcontroller. It’s obviously a cute play on “FPGA” — but it’s also a complete misnomer, as it is neither field-programmable, nor a processor array. You see, not even the flash-based Padauk parts support the self-programming functionality needed for in-field firmware updates, and even more confusingly, this thing doesn’t actually have more than one processor. Instead, this part has what I would call a multi-context processor design.

Multi-what?

While an MCU’s CPU has access to ROM, RAM, and peripherals, it does its work using internal registers. For the Padauk part, these registers store which program memory location the CPU is executing (Program Counter), a pointer to the current stack value (Stack Pointer), the source and destination of arithmetic operations (Accumulator), and the state of the CPU’s operations (Flag Register). (This architecture is sometimes called an accumulator-based machine, and it contrasts with architectures that have many working registers.)

A CPU’s context is simply the current state of all these registers. While this may seem strange to believe, if we copied the context of a CPU at one instant in time, and dumped it back into the CPU later on, the CPU would appear to pick up right where it left off, since it would start executing the same program memory instruction, with the same stack pointer, and the same data in the accumulator. So if we want to preserve the context of an execution cycle, we just need to save these registers. (Incidentally, this is what happens when interrupts execute: the context is saved, the CPU executes the ISR, and then the context is restored.)

Going back to the Padauk’s FPPA feature, each FPP is just a bag of CPU registers — program counter, stack pointer, accumulator, and CPU flag register — that represent the context of the CPU. The CPU takes turns executing each context for one clock cycle before switching to the next.
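A toy model may help here. The sketch below, in plain C for a PC, treats each FPP as a struct holding the four registers named above and hands out one clock cycle to each in turn. Everything in it (the struct, the function names, the made-up one-line "instruction", and the choice of two contexts) is an assumption for illustration, not a description of the real silicon.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_FPP 2  /* two contexts, as on a two-FPP part (assumed) */

/* One hardware context: the "bag of registers" described above. */
struct fpp_context {
    uint16_t pc;     /* program counter */
    uint8_t  sp;     /* stack pointer */
    uint8_t  acc;    /* accumulator */
    uint8_t  flags;  /* CPU flags */
};

/* Stand-in for executing one instruction: add (index + 1) to the
 * accumulator and advance the program counter. */
void execute_one(struct fpp_context *ctx, unsigned index)
{
    ctx->acc = (uint8_t)(ctx->acc + index + 1);
    ctx->pc++;
}

/* Run `cycles` clock cycles, giving each cycle to the next context in
 * turn. Nothing is saved or restored: "switching" is just selecting a
 * different set of registers. */
void run_round_robin(struct fpp_context *ctx, unsigned cycles)
{
    assert(ctx != 0);
    for (unsigned c = 0; c < cycles; c++) {
        unsigned index = c % NUM_FPP;
        execute_one(&ctx[index], index);
    }
}
```

After eight cycles in this model, each of the two contexts has executed four instructions, which is the sense in which each FPP sees half of the system clock.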

Why would we want to have multiple CPU contexts?

We usually only think about saving and restoring a CPU’s context when we’re dealing with interrupts, or if we want to run a multi-tasking system that allows us to run multiple functions “simultaneously.” (Of course, we can’t actually run multiple instructions concurrently, because we only have one CPU. However, if we constantly switch between the CPU contexts for two different tasks, it will appear as though they are executing simultaneously.) Having multiple FPPs lets you switch context quickly; the FPPA feature is basically like a hardware-based time-slicing RTOS.

Some Padauk parts have multiple FPPs, which enables quick context-switching between different tasks

Let that sink in for a moment, and consider the ramifications: An 8-cent part like the PMS271 lets you build a preemptive multitasking system with a 4 MHz context-switching speed. If that doesn’t sound impressive, consider that most RTOS projects found in the wild are running on 100 MHz ARM Cortex MCUs and don’t have a context-switching (tick) speed much faster than 1 kHz.

Ok, but why would you need or want this capability?

Let’s back up. Recall that none of the Padauk parts have communication peripherals, so if you need SPI, I2C, UART, 1-Wire, or any other communication interface, you’ll have to do it in software. This has some definite advantages: comms peripherals take up valuable silicon and often go unused. It’s hard for IC vendors to figure out what to stuff in MCUs. Will customers need an SPI master? Two UARTs? I2C slave support? A separate I2C master and I2C slave? By moving everything to software, you can implement just the comms peripherals you need for your application, without having extra junk lying around.

This raises a problem, though: bit-banging a communication interface is tedious and imprecise. Sure, it’s easy if that’s the only thing your part is doing — but if it’s managing a PID control loop or calculating HSL values to PWM to an RGB LED, it’s going to be hard to get the timing correct for any communication you may need.

This is where the Padauk FPPA comes in handy: you can dedicate an FPP core to, say, a UART transceiver, and it can process incoming UART data and transmit responses back without having to worry about anything else.

Specialized Instructions

While this is certainly better than doing everything on a single core, anyone who has written software-based bit-banged peripherals before knows that getting accurate timing is a headache. Think about the number of instructions it takes to test a bit and jump out of a while() loop once the bit gets set or cleared. Or think about the tediousness of trying to write cycle-accurate delay() loops.

Padauk has a solution. These are the only MCUs I’ve seen that, at a hardware level, support wait and delay CPU instructions. You can delay for up to 256 cycles with a single instruction, and you can also delay until a bit is set (wait1) or cleared (wait0). These instructions remove the latency, unpredictability, and code size bloat you’d encounter when doing conditional loops.

This makes implementing cycle-accurate communication interfaces a cinch. (Note that neither the wait nor the delay instructions are available on single-FPP devices, and the delay instruction is only available on some of the multi-FPP ones.)
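To give a feel for what a software UART has to get right, here is a small sketch in plain C of the two pieces of bookkeeping involved: how many instruction cycles one bit lasts, and what sequence of line levels makes up a frame. It assumes the common 8N1 framing (one start bit, eight data bits sent LSB first, one stop bit); the function names are invented, and on a real part the per-bit wait is where the delay instruction would go.

```c
#include <assert.h>
#include <stdint.h>

/* Instruction cycles per UART bit, rounded to the nearest cycle. */
uint32_t cycles_per_bit(uint32_t instr_hz, uint32_t baud)
{
    assert(baud > 0);
    return (instr_hz + baud / 2) / baud;
}

/* Fill `levels` with the ten line levels of an 8N1 frame:
 * start bit (0), data bits LSB first, stop bit (1). */
void uart_frame_8n1(uint8_t byte, uint8_t levels[10])
{
    levels[0] = 0;
    for (int i = 0; i < 8; i++)
        levels[1 + i] = (uint8_t)((byte >> i) & 1u);
    levels[9] = 1;
}
```

At 9600 baud, a context executing 8 million instructions per second has about 833 cycles per bit, and one executing 4 million has about 417.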

The Padauk IDE has a “Code Generate” button that launches this lightweight code generator.
I tested several generated code blocks, like this SPI code, and all worked flawlessly — though you’ll have to modify the resulting code.

To sweeten the deal even more, Padauk has a code-gen tool built into their IDE that does all the work for you — just tell it you want a UART receiver operating at 9600 baud, or an I2C slave device, and it will generate the code for you.

In many ways, the FPPA approach is actually better than having dedicated hardware peripherals: you can implement the number and type of peripherals you actually need, there are no unused comms peripherals lying around, and because the peripherals are implemented in software, you can implement arbitrary protocols, or do goofy things that wouldn’t otherwise be easy.

I saw Anders Nielsen’s blog post about driving the WS2812B with a Padauk PMS150C, and was inspired to write a little SPI-to-WS2812B converter using one of these 2-FPP parts — specifically the PMS271. This will make WS2812B LED strips behave like APA102C strips.

Development on the Padauk

In-Circuit Emulator (ICE)

Padauk PDK3S-I-003 Multi-FPPA in-circuit emulator (ICE)

Many people scoffed at the one-time programmability of the Padauk parts (even me), but after using the entire ecosystem for a few days, I totally get it — and I’m starting to wonder why you don’t see more In-Circuit Emulator (ICE) tools anymore, because it makes perfect sense for stuff like this.

For the uninitiated, an ICE is essentially a special-built system that pretends to be your target microcontroller — running code full-speed, with all peripherals available — but also has all the debug circuitry and communication channels necessary to communicate with your host computer.

You wire the ICE into your product where the microcontroller would go. With the large SOIC packages common with Padauk parts, I could imagine soldering breadboard wires to the SOIC pads of my product PCB, and connecting the wires straight into the ICE.

I’m using the plural form because Padauk has different ICEs for different microcontrollers. I tested out both the 3S-I-003 (for multi-FPPA parts), and the 5S-I-S02B (for single-FPPA parts).

The 3S-I-003 is large, with lots of unnecessary LED indicators, and requires a separate 9V wall-wart in addition to the USB connection. It looks like it can simulate parts with up to 5 full GPIO ports (40 I/O total). It has a push-button permanently wired to PA0, and a transistor-controlled LED attached to PA1.

The new 5S-I-S02B is a small ICE designed for single-core Padauk devices.

The single-FPPA 5S-I-S02B ICE is much smaller than the multi-FPPA one, and seems to run fine without a 9V power connection. Both ICEs have removable HC49-packaged crystals and nicely-labeled I/O.

Once I started using the ICEs, I realized they didn’t feel much different than using a traditional microcontroller dev board. The Padauk ICEs have high-speed USB interfaces; once connected, they load the program into RAM inside the emulator IC, which keeps download times quick.

And this is when I really started wondering why other low-pin-count vendors don’t provide an ICE. If you’re developing an application with a part that only has a few I/O pins, why would you want to spare even a single one for debugging? Let alone the three or four that many architectures require?

All of a sudden, the OTP nature of the Padauk parts became a bit of a non-issue. You sit in front of your computer, ICE in hand, and work through your project. You code, build, test, code, build, test, code build test…. and once you get things where you want them, you pull up the programmer, program your OTP part, solder it to your real board, and see your hard work pay off.

There are a few gotchas when using Padauk ICEs: they don’t perfectly emulate each MCU, so be sure to read the MCU datasheet for notes about using the ICE. One issue I noticed that wasn’t mentioned in the datasheet is that the ICE doesn’t appear to emulate the various ILRC frequencies of the Padauk parts (from the table above, these vary from 24 to 93 kHz). For my “bike light” test project, I had all my delays dialed in nicely for my emulator, but when I programmed a PFS173 with the firmware, it was strobing much more quickly. But when I programmed a PMS150C, it behaved identically to the ICE, so I suspect the ICE’s ILRC frequency is around 62 kHz. Another issue with using an ICE is that there’s no way to get power consumption figures. For more complex projects, you may want to use one of the Padauk flash parts, or buy an SOIC ZIF socket to allow you to iterate your code using the actual MCU.

I will say that — like everything else about this ecosystem — everything is optimized for low-cost, high-volume products that have very simple firmware. I could imagine the flash parts would be useful in more complex projects, where you may want to do quite a bit of testing and firmware development in-situ — here, flash memory is a nice thing to have. Just remember that you’ll be giving up PA3, PA5, and PA6 if you want to be able to do in-circuit flash programming of the MTP parts.

The Padauk programmer can run attached to a computer or stand-alone.

Padauk 5S-P-003 Programmer

This Padauk programmer feels like a typical production programmer you’d see at a manufacturer. It’s got a programming socket, a big “PROGRAM” button on it, and a 16×2 character LCD that keeps track of good / no-good count as you program them. You can use the programmer while attached to your computer, controlled with a GUI tool, or you can disconnect it and use the programmer in a stand-alone mode. I found programming speeds to be incredibly fast — less than a second.

Mini C Compiler

Before I actually pulled the trigger on my Padauk order, the scariest part about the ecosystem was the compiler. I definitely had some fun with compilers while working on the $1 Microcontrollers series, so when I noticed that Padauk’s “Mini C” compiler didn’t support function arguments, return values, for() loops, and many other common C expressions, I became really nervous.

But here’s the thing: when you’re working on a part with no peripherals, 1K of program memory, and 64 bytes of RAM, you’re probably not going to the moon (that would require 70K of program memory and 2K of RAM). And here’s a thought: would you really even want those features? Wouldn’t you prefer an IDE that lets you get to be a bit more specific with what you want the MCU to be doing?

I discovered this while working on my SPI-to-WS2812 converter. When I first needed a for() loop, I wrote:

int i = 7; 
while(i >= 0) {
   ...
   i--;
}

This works, but it’s not nearly as cycle-efficient as using one of Padauk’s handy macros:

.FOR i, <7,6,5,4,3,2,1,0>
  ...
.ENDM

When the compiler encounters these macros, it does a source-level replacement of “i” in the content of the block with each value — copying-and-pasting the entire block.

This gives you immense flexibility to hand-tune things. I anticipate most Padauk projects will have ample program memory space for doing this sort of macro expansion, but if you need to pack things in tighter, you can convert your “compile-time” loops to “run-time” loops.
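For readers who haven’t used a compile-time loop before, a rough analogy in standard C is a preprocessor macro that pastes the same body once per value. The sketch below is not Mini C syntax and the macro and function names are invented; it just contrasts an unrolled bit-reversal (eight straight-line statements) with the run-time loop it replaces.

```c
#include <assert.h>
#include <stdint.h>

/* The loop body, written once with the index as a macro parameter. */
#define COPY_BIT(i)  out |= (uint8_t)(((in >> (i)) & 1u) << (7 - (i)));

/* Paste the body once per listed value, in the spirit of
 * .FOR i, <7,6,5,4,3,2,1,0> */
#define FOR_7_TO_0(BODY) \
    BODY(7) BODY(6) BODY(5) BODY(4) BODY(3) BODY(2) BODY(1) BODY(0)

/* "Compile-time" loop: expands to eight statements, no counter, no branch. */
uint8_t reverse_bits_unrolled(uint8_t in)
{
    uint8_t out = 0;
    FOR_7_TO_0(COPY_BIT)
    return out;
}

/* "Run-time" loop: smaller code, but pays for the counter and the
 * branch on every pass. */
uint8_t reverse_bits_loop(uint8_t in)
{
    uint8_t out = 0;
    for (int i = 7; i >= 0; i--)
        out |= (uint8_t)(((in >> i) & 1u) << (7 - i));
    return out;
}

/* Quick self-check: both versions must agree for every input byte. */
void check_versions_agree(void)
{
    for (int v = 0; v < 256; v++)
        assert(reverse_bits_unrolled((uint8_t)v) == reverse_bits_loop((uint8_t)v));
}
```

The trade-off is the one described above: the unrolled version spends program memory to save cycles, and the loop version does the opposite.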

There are many other macros the compiler supports, many of which would be extremely challenging to try to write using traditional preprocessor commands and a “dumb” C compiler. For example, you can delay an arbitrary number of clock cycles like so:

.delay 1100;

The neat thing about this macro is that it will get compiled into different instructions depending on the constant value specified; if the delay value is small, the compiler will simply insert the appropriate number of NOPs (or the CPU’s actual delay instruction, if supported). Once the parameter gets larger, the compiler will create a traditional software loop that counts down the correct delay value. The compiler knows how many cycles each instruction takes, so this generated code is cycle-accurate.
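The decision the compiler makes can be sketched in a few lines of plain C. To be clear, this is a guess at the general shape of the logic, not Padauk’s actual algorithm: the threshold and the cycle costs of the loop are assumed numbers chosen for illustration, and a real implementation would also have to nest loops once an 8-bit counter runs out.

```c
#include <assert.h>
#include <stdint.h>

/* All three constants are ASSUMED for illustration, not datasheet values. */
#define NOP_LIMIT      8u  /* at or below this, emit NOPs only */
#define LOOP_SETUP     2u  /* cycles to load the loop counter */
#define LOOP_PER_ITER  3u  /* cycles per decrement-and-branch pass */

struct delay_plan {
    uint32_t loop_iterations;  /* 0 means "no loop, NOPs only" */
    uint32_t nops;             /* NOPs used as the delay or as padding */
};

/* Total cycles a plan consumes under the assumed costs. */
uint32_t plan_total_cycles(struct delay_plan p)
{
    if (p.loop_iterations == 0)
        return p.nops;
    return LOOP_SETUP + p.loop_iterations * LOOP_PER_ITER + p.nops;
}

/* Pick NOPs for short delays, or a counted loop plus padding NOPs for
 * long ones, so the total is exactly `cycles`. */
struct delay_plan plan_delay(uint32_t cycles)
{
    struct delay_plan p = {0, cycles};
    if (cycles > NOP_LIMIT) {
        uint32_t body = cycles - LOOP_SETUP;
        p.loop_iterations = body / LOOP_PER_ITER;
        p.nops = body % LOOP_PER_ITER;
    }
    assert(plan_total_cycles(p) == cycles);  /* plan must be cycle-exact */
    return p;
}
```

Under these assumed costs, a request for 1100 cycles becomes a 366-pass loop with no padding, while a request for 5 cycles becomes five NOPs.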

I can hear you exclaiming “on a real microcontroller, you’d just use a timer!” That is utter nonsense. Timers are only accurate when there’s no CPU involved (PWM generation, output compare toggling, input capture, etc). Timers become inaccurate once you start polling a timer register (which can take plus-or-minus a few cycles depending on things like the CPU’s pipeline or flash memory wait states) or waiting on ISRs (which can have 30 or more clock cycles of latency, and the exact latency is usually indeterminate). If you want to belch out a really accurate digital waveform, or measure one, this is a great platform for doing that.

Getting Closer to the Metal

Once you get past basic blinking, you’ll quickly realize how limiting the compiler is. Other than the macros — which come in handy — the rest of Mini C is pretty weird and cerebral.

The Mini C compiler is interesting in that it forces you to understand your architecture and what you can and can’t do on it. You’ll get a compilation error if you try to write a line of code like this:

int b = *(ptr + i);

Why? Because your MCU does not have the ability to do indirect addressing with offsets (constant or otherwise). The compiler wants you to tell it precisely what you want to do. Maybe you want to do this:

ptr += i; // wipes out the old value of ptr, but it's fast
int b = *ptr;

Or maybe this:

int newPtr = ptr + i; // uses a new memory allocation and is slower
int b = *newPtr;

Both of these have trade-offs. The more you develop in Mini C, the more you’ll start to realize exactly how much performance, code size, and RAM usage you’re used to sacrificing every day, just to make your software a bit briefer to write.

Performance testing

I wanted to put the Padauk PMS150C through its paces with my usual Biquad Filter Test.

When it was all said and done, the part clocked in at 21.73 µs per sample (that’s 46 kHz), which was as good as or better than much more expensive parts in my $1 round-up, and perfectly acceptable for many applications.

But it was definitely not easy getting there. The C code I used for those original MCU tests looked something like this:

volatile int16_t in[25];
volatile int16_t out[25];

const int16_t a0 = 16384;
const int16_t a1 = -32768;
const int16_t a2 = 16384;
const int16_t b1 = -25576;
const int16_t b2 = 10508;

int16_t z1, z2;
int16_t outTemp;
int16_t inTemp;
int i; // loop counter used in main()

void main()
{
	while(1) {
		_pa = 2;
		for(i=0;i<25;i++)
		{
			inTemp = in[i];
			outTemp = inTemp * a0 + z1;
			z1 = inTemp * a1 + z2 - b1 * outTemp;
			z2 = inTemp * a2 - b2 * outTemp;
			out[i] = outTemp;
		}
		_pa = 0;
	}
}

The Padauk code looks like this:

WORD in[11];
WORD out[11];

WORD z1, z2;

WORD pOut, pIn; // these are pointers, but aren't typed as such

int i;

void	FPPA0 (void)
{
	.ADJUST_IC	SYSCLK=IHRC/2		//	SYSCLK=IHRC/2

	PAC.6 = 1; // make PA6 an output
	while(1) {
		PA.6 = 1;
		pOut = out;
		pIn = in;
		i = 0;
		do {
			*pOut = (*pIn << 14) + z1;
			z1 = -(*pIn << 15) + z2
				+ (*pOut << 14)
				+ (*pOut << 13)
				+ (*pOut << 9)
				+ (*pOut << 8)
				+ (*pOut << 7)
				+ (*pOut << 6)
				+ (*pOut << 5)
				+ (*pOut << 3);

			z2 = (*pIn << 14)
				- (*pOut << 13)
				- (*pOut << 11)
				- (*pOut << 8)
				- (*pOut << 3)
				- (*pOut << 2);

			i++;
			pOut++;
			pIn++;
		} while(i < 11);

		PA.6 = 0;
	}
}

There are several things going on. First of all, don’t bother including <stdint.h>: it doesn’t exist. You’ll be using “WORD” for 16-bit numbers, “BYTE” for 8-bit ones, “EWORD” for 24-bit, and “DWORD” for 32-bit. I think int is 8 bits (??) on this architecture.

Starting in FPPA0(), the equivalent of main(), you’ll see that, indeed, Padauk has no for() loops. You may think it would make more sense to replace it with a while() loop, but there’s a huge gotcha: the Padauk compiler cannot access memory with an offset determined by a variable. In other words, if i is a variable, you cannot access the ith element of myArray by writing myArray[i]. There are two ways of accessing array elements: using the aforementioned .FOR macro, or walking through the array by incrementing a pointer, as shown above.
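To make the pointer-walking idiom concrete, here is a minimal sketch in plain C for a desktop compiler, not Mini-C. The function name sum_by_pointer and the <stdint.h> types are assumptions made for illustration; the Padauk toolchain provides neither. It mirrors the do/while shape of the listing above: read through the pointer, then increment it, with no indexed access anywhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum n 16-bit samples by walking a pointer instead of indexing.
   Hypothetical helper for illustration; plain C, not Mini-C. */
int32_t sum_by_pointer(const int16_t *samples, size_t n)
{
    assert(samples != NULL && n > 0); /* do/while always runs at least once */

    const int16_t *p = samples;       /* plays the role of pIn in the listing */
    size_t i = 0;
    int32_t total = 0;

    do {
        total += *p;                  /* plain indirect load, no offset needed */
        p++;                          /* step to the next element */
        i++;
    } while (i < n);

    return total;
}
```

Calling sum_by_pointer on the array {1, -2, 300, 4} with n = 4 returns 303.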

While a macro .FOR would be faster, I quickly ran out of ROM space, since the entire contents of the function will be duplicated multiple times. I also tried calling the filter code through a function to reduce the ROM requirements, and it was similar in performance.

As for the filter function itself, you’ll see that all the multiplies have been replaced with shift-adds. The Padauk part does not recognize the * operator for multiplication; trying to use it to multiply two variables together results in a syntax error. No, I’m not joking.

Now, on some level, this makes sense since the PMS150C has no hardware multiplier — having said that, this is the kind of stuff you want your compiler to do for you.

And even when you move up to a Padauk part that does have a hardware multiplier, like a PMS132B or PMS134, the compiler still doesn’t recognize the operation.
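The shift-add rewrite is mechanical: each set bit in the magnitude of a coefficient contributes one shifted copy of the operand. The sketch below is plain desktop C rather than Mini-C, and mul_shift_add, B1_MAG, and B2_MAG are hypothetical names introduced for illustration. It assumes the usual two's-complement conversion from unsigned to signed. The two constants spell out the shift lists from the Padauk listing above, and they recover the magnitudes of b1 (25576) and b2 (10508) from the original C code.

```c
#include <assert.h>
#include <stdint.h>

/* Shift amounts copied from the Padauk listing: z1 uses 14,13,9,8,7,6,5,3
   and z2 uses 13,11,8,3,2. OR-ing the bits gives back |b1| and |b2|. */
#define B1_MAG ((1u << 14) | (1u << 13) | (1u << 9) | (1u << 8) | \
                (1u << 7) | (1u << 6) | (1u << 5) | (1u << 3))
#define B2_MAG ((1u << 13) | (1u << 11) | (1u << 8) | (1u << 3) | (1u << 2))

/* Multiply x by a non-negative 16-bit constant k using only shifts and
   adds: one shifted copy of x is accumulated for every set bit of k. */
int32_t mul_shift_add(int32_t x, uint16_t k)
{
    /* Shift as unsigned, because left-shifting a negative int is
       undefined behavior in C. The arithmetic wraps modulo 2^32. */
    uint32_t ux = (uint32_t)x;
    uint32_t acc = 0;

    for (unsigned bit = 0; bit < 16; bit++) {
        if (k & (1u << bit)) {
            acc += ux << bit;
        }
    }
    /* Assumes two's-complement wraparound on conversion (true for
       mainstream compilers). */
    return (int32_t)acc;
}
```

For example, mul_shift_add(100, B1_MAG) gives 2557600, the same as 100 * 25576. On the Padauk part each of those shifted terms is written out by hand, which is why the filter body is so long.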

What about the multiplier?

By the way, for those keeping score, the hardware multiplier helps quite a bit: the PMS134 can process each sample in 14 µs, for a sample rate of about 70 kHz. To put that into context, at this point, we’re obtaining about half the math performance of an 8 MHz Microchip megaAVR, which is pretty impressive for a 7-cent chip (see footnote 8).

Should you switch to SDCC?

Wouldn’t it be nice if you could use a standards-compliant compiler, like SDCC, to target the Padauk parts? That would solve all these problems, wouldn’t it?

I don’t know — I didn’t try it. Why not? Because I honestly have no idea how you’d develop code on one of these parts with SDCC in a useful fashion. There doesn’t appear to be a way of loading an arbitrary binary into the ICE through the Padauk IDE software, and even if you could, there would be no way of getting the Padauk IDE to set breakpoints and inspect variables unless it knew how memory was mapped back to source code.

To this end, there’s really no good way to iteratively develop code using SDCC. The only options would be to either blow through hundreds of OTP parts or use an expensive MTP (flash) part. Either way, you’d lose all ability to debug, which is critical for tiny parts like this. If you’re a maker/hobbyist used to the printf() route on your AVR, think for a minute how many resources will be consumed with a single call to that function on one of these parts. Also keep in mind you have no UART…

The IDE has everything you want, plus a few extras I wish other vendors would incorporate. For example, why doesn’t every MCU IDE let me know when the MCU is actually asleep?

Padauk FPPA IDE

OK, I kind of like ghetto IDEs — one of my favorite underdogs from the $1 Microcontroller series was the Holtek HT-IDE 3000. Padauk’s FPPA IDE makes the Holtek one look like Visual Studio 2019.

This thing is lean, mean, and suuuuuper basic. But it totally gets the job done, and if you hate it, you can go use your favorite text editor instead. Here’s what to expect with it, though:

  • 3.9 MB download. The IDE is free, and there’s no registration.
  • Half-second build times. The compiler is statically linked as a library call instead of a separate executable, and the compilation process itself is trivial.
  • 1.3 seconds to launch a debug session. The Padauk ICE communicates using a Cypress high-speed USB 2.0 interface, and it shows. Starting and stopping debugging, setting breakpoints, stepping through code, and inspecting memory felt like I was debugging a PC app locally — not an embedded system.
  • Light and Dark themes: Just go to Help > Color to switch. Then ponder the menu location for that feature.
  • No code completion. You won’t get a lot of IntelliSense-like features here, but there are plausibly-useful tooltips that appear when you hover over symbols, and you can go to definitions of (some) things, too. A smattering of macros have hard-coded help links to them — anything that’s underlined has instant help available.

Test Projects

In addition to the performance evaluation above, I built two test projects to evaluate what the overall experience is like to develop on these parts.

The source files for these projects are available on GitHub.

NeoPixel SPI adapter

This PMC251 serves as an SPI-to-NeoPixel bridge, converting a 500 kHz MOSI / SCK signal pair to a WS2812B-compatible pulse train.

I wanted to try out the dual-FPPA functionality found on the PMC2xx and PMS2xx parts, and since I just got done talking about how great it is for communication, I thought I’d create a NeoPixel SPI adapter built around the PMC251.

The firmware takes a 480 – 510 kHz SPI signal (data and clock only) and outputs it as a WS2812B-compatible pulse train. There is a one-byte (16 µs) latency, which should be fine for most applications.

This part is around 7 or 8 cents, depending on whether you get the 8-pin or 14-pin version (either will work for this project).

Because of the flexible logic input thresholds on the Padauk parts, this implementation should work down to 3.0V logic levels or lower, which the WS2812B wouldn’t natively support if you tried driving it directly (for 3.3V interfacing, Adafruit recommends using a 74HCT125 logic-level converter, which is substantially more expensive than this Padauk part).

The NeoPixel SPI adapter has pretty tight timing requirements that mean your 500 kHz SPI signal can’t stop mid-transmission.

The biggest problem I ran into was the ~9 µs timeout of the WS2812B. The fastest SPI signal I could reliably receive was about 460 kHz, but this introduced too much delay while waiting for the next byte. I tried increasing the SPI frequency to 500 kHz, but I would start losing bits when I was running the system at the normal 16 MHz frequency of the chip (which is really 8 MHz for the instruction clock, which is really 4 MHz for the instruction clock for each FPP).

I was able to trim some code down to tighten things up, but the real breakthrough was when I remembered I could nudge the system clock up or down with a single line of code, and the Padauk programmer would calibrate this into each part. I raised the clock to 18 MHz, which allowed me to reliably receive SPI traffic up to 510 kHz. This provided a 6 or 7 µs window between transmitted bytes that the NeoPixels seemed to be able to handle.
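The clock numbers in the last two paragraphs are easier to compare as an instruction budget per SPI bit. The sketch below is plain C, and cycles_per_spi_bit is a hypothetical helper, not anything from the Padauk tools. It relies only on the relationship stated above: the instruction clock is half the system clock, and that is split evenly between the two FPP units.

```c
#include <assert.h>

/* Instruction cycles one FPP unit gets per incoming SPI bit.
   Per the text, a 16 MHz system clock gives an 8 MHz instruction clock,
   which is shared equally by two FPP units (4 MHz each). */
double cycles_per_spi_bit(double sysclk_hz, double spi_hz)
{
    assert(sysclk_hz > 0.0 && spi_hz > 0.0);

    double instr_hz = sysclk_hz / 2.0; /* instruction clock */
    double fpp_hz = instr_hz / 2.0;    /* share seen by each FPP */
    return fpp_hz / spi_hz;
}
```

cycles_per_spi_bit(16e6, 500e3) is exactly 8 cycles per bit, the setting where bits were being lost. The two settings reported as reliable, 16 MHz at 460 kHz and 18 MHz at 510 kHz, work out to about 8.7 and 8.8 cycles per bit. That is consistent with, though it does not prove, a receive loop that needs slightly more than 8 cycles per bit.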

The dev tools worked well — once I had the code working on the ICE, I popped a fresh PMC251 onto a breakout board, threw it in the programmer, flashed the image, moved it over to my breadboard, and everything just worked.

Bike Light

I used the PMS150C to build a simple flashing bike light

This is a simple project that uses a tactile push-button switch to toggle on or off a blinking LED, potentially as a bicycle light. While many people might implement this project using a slide switch and a 555 timer, this implementation has substantially fewer BOM lines as well as a much lower BOM cost.

In the GIF above, I’m flashing a single LED; however, the MCU will blink all pins on port A the same way. On SOT23-6 devices, you can use up to 3 LEDs, while SO-8 packages will blink up to 5 LEDs.

This project was mostly designed as a practical test of the sleep power modes of the Padauk parts. In sleep mode, the PMS150C sips on just 350 nA when powered from a 3.3V supply. Considering this includes the leakage current from the internal pull-up on the GPIO input pin used for the push-button, this is pretty impressive. A CR2032 battery could power this thing in sleep mode for 10-15 years; the limiting factor would be the self-discharge of the battery itself.
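A quick back-of-the-envelope check shows why self-discharge, rather than the MCU, sets the limit. The sketch below is plain C with a hypothetical helper name, battery_life_years. The 220 mAh capacity used in the note after it is an assumed typical figure for a CR2032, not a number from this review, and the calculation ignores self-discharge and any time spent awake and blinking.

```c
#include <assert.h>

/* Years a cell would last at a constant current drain.
   Deliberately ignores self-discharge and active-mode current. */
double battery_life_years(double capacity_mah, double current_na)
{
    assert(capacity_mah > 0.0 && current_na > 0.0);

    double current_ma = current_na / 1.0e6;   /* nA -> mA */
    double hours = capacity_mah / current_ma; /* mAh / mA = hours */
    return hours / (24.0 * 365.0);            /* hours -> years */
}
```

With the assumed 220 mAh cell, battery_life_years(220.0, 350.0) comes out at roughly 72 years of sleep current alone, several times longer than the 10-15 years the cell itself is likely to last.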

Closing Thoughts

The Padauk parts are nearly perfect for a set of specific, targeted applications. They should not be considered general-purpose microcontrollers that are drop-in replacements for other more expensive parts, but rather, extremely optimized parts that you reach for when you really want to show off your engineering chops and get BOM cost down to a minimum. Anyone who thinks these parts are terrible doesn’t understand (or at least value) how to design embedded systems properly.

So what are the use cases to avoid? Basically anything where you need a whole lot of math, a whole lot of comms, or a whole lot of analog.

Because any comms you need will be done in software, it is unlikely you’ll be able to run a receiver at rates faster than about 500 kbps. But for 100 kHz I2C, transmit-only UART, and slower SPI, I had no problems. In fact, compared to other MCUs, doing everything in software was downright easy, since there are no peripheral configuration registers to consult in the datasheet.
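As an illustration of how little there is to a software transmitter, here is a sketch of the framing step of a transmit-only UART in plain C. Everything in it is an assumption for illustration: the 8N1 format, the helper name uart_frame_8n1, and the idea of precomputing the line levels into an array. It is not the code from the test projects. On a real part, each level would be written to a GPIO pin and held for one bit period with a cycle-counted delay.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UART_FRAME_BITS 10 /* 1 start + 8 data + 1 stop (8N1, assumed) */

/* Fill levels[] with the line state for each bit period of one frame.
   The line idles high, so the start bit is low and the stop bit is high. */
void uart_frame_8n1(uint8_t byte, uint8_t levels[UART_FRAME_BITS])
{
    assert(levels != NULL);

    levels[0] = 0;                                       /* start bit */
    for (int bit = 0; bit < 8; bit++) {
        levels[1 + bit] = (uint8_t)((byte >> bit) & 1u); /* LSB first */
    }
    levels[UART_FRAME_BITS - 1] = 1;                     /* stop bit */
}
```

For the byte 0x55 this produces the alternating pattern 0,1,0,1,0,1,0,1,0,1, which is handy on a scope because every bit period is visible as an edge.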

The compiler is great for efficiently shuffling bits around, but it’s really restrictive for any sort of signal processing duties. It has no math library, so you’ll be writing routines for anything more complicated than addition and subtraction. Still, forcing yourself to use it definitely offers some bare-metal learning outcomes. SDCC support for these is interesting, but without tie-ins with the vendor tools, it’s more of a novelty than a useful tool right now.

If you’re not building large quantities of a product, remember that spending hundreds of dollars on ICEs and programmers will bump up the per-unit cost of development. Hobbyists, hackers, makers, and people working in small shops building one-offs will likely only find these parts amusing — but for engineers designing products in volume, I couldn’t find any reason not to consider these parts for a wide range of simple applications.

Footnotes

1. The groupings and color codes match the official selection guide PDF. Note that the PMS (commercial) line is also available as PMC (industrial). Though their operating ranges differ, they are otherwise identical.
2. The cheapest logic-level converter I could find is $0.05 each in volume, so yes, it’s cheaper to program a Padauk PMS150C to be a logic-level converter than to just buy a logic-level converter.
3. I can’t, for the life of me, figure out why the IRC frequencies vary chip to chip. There seems to be no rhyme or reason to intentionally do this; I wonder if the designers simply threw in whatever they could fit in the remaining area, spun some samples, and figured out what they had come up with. I know oscillator design can get finicky, so that’s the only explanation I could come up with.
4. This architecture is sometimes called an accumulator-based machine, and it contrasts with architectures that have many working registers.
5. Incidentally, this is what happens when interrupts execute: the context is saved, the CPU executes the ISR, and then the context is restored.
6. Of course, we can’t actually run multiple instructions concurrently, because we only have one CPU. However, if we constantly switch between the CPU contexts for two different tasks, it will appear as though they are executing simultaneously.
7. Note that neither of these instructions is available on single-FPP devices, and the delay instruction is only available on some of the multi-FPP ones.
8. OK, the PMS134 is closer to 10 cents, but the PMS132B has similar specs, minus a few peripherals, and should be able to hit the same numbers.

Comments (12)

Hi Jay
Thanks for the writeup and thanks for the mention. I’m currently working on some code intended to run the nRF24L01+ compatible SI24R1 2.4 GHz RF transceiver with the PMS150C, should publish next week. Currently works perfectly. SPI was a breeze but I doubt I can get the MOSI data rate higher than 1 MHz.
Oh btw – Padauk uses the term “Mini C”, not “Micro C” 🙂

That’s going to be one cheap RF setup! Where’d you source your SI24R1? Congrats on getting things working!

Thanks! The SI24R1’s are just leftover “nRF24L01+ modules” when testing – if you see a module with an inverted F antenna, it’s usually the SI24R1. Sometimes the IC says nRF24L01+ – sometimes SI24R1. Especially if it’s less than 5$. LCSC stocks SI24R1 IC’s along with the SMD-style modules.
There’s also a cheaper (and non-nRF24L01-compatible) “clone” that might be worth trying out with the Padauks. Don’t remember the IC name though.

Brilliant write-up (again), Jay.
You might want to know that you have an incomplete sentence that begins “You can delay for”

Cheers,
Berwyn

Thanks for spotting my copy-pasta. Fixed!

If you feel like adding it, I think it would be very helpful to have a column listing the packages available for each part (with a * indicating the ones that are commonly stocked).

Also, a link to a spreadsheet version of the table would be very handy if you have it.

Thanks!

The only commonly stocked package for these parts is SO (except for the PMS150C, which is also available in SOT). As for pin count, you can usually get the SO-16 chips in SO-14 and SO-8, too. SO-28 chips can be found in SO-14 and SO-16 packages, too. But sometimes you can find a big chip in an 8-pin package that the datasheet doesn’t even mention. As I mentioned, all of these packages are huge, so you usually spec the part based on how many IO you need, and just pick the smallest one.

Regarding the 2048 cycle delay on oscillator start-up, that’s probably to support crystal-based oscillation. Microchip parts have the same discrepancy between crystal and IRC wake-up times.

The free toolchain, which SDCC is a part of, indeed does not tie in with the vendor toolchain. The goal was to create a _free_ toolchain; it also works on various OSes (I’ve seen it used on GNU/Linux, MacOS, Windows). So interacting with non-free Windows tools was not a priority.

On the other hand, it offers various advantages:

* It is free
* It should work on any OS
* It allows use of standard C. The productivity gained would probably offset the harder debugging.
* It is cheap: While the 7 cent Flash µC is relatively expensive to the OTP µC, one can use it during development, and later port to the OTP. So you’d have to compare the price of that 7 cent µC to the ICE. Similarly, the free programmer is cheaper to make (though it is not being mass-produced yet) than the vendor’s programmer.
* It is easy to use with other standard free tools (e.g. make, Code::Blocks).

On the other hand, there are shortcomings:
* The debugging situation is far from optimal (well, the ICE doesn’t behave exactly like the real hardware either, but that’s another story). Unlike e.g. the STM8, the device has no support for on-target debugging. SDCC comes with an emulator, but that is only useful for debugging the software in isolation. One could use printf() (as long as the argument is just a ‘\n’-terminated string literal, SDCC optimizes it into puts()).
* There is no support for multiple FPPA yet.
* There is no support for the 16-bit pdk16 devices yet (only pdk13, pdk14, pdk15 supported so far).

But people have done stuff using the free toolchain. On the eevblog forums someone stated, they created an SD-card-music player based on a Padauk µC using the free toolchain.

About 6 weeks ago, I held a small 4-hour workshop on the free toolchain. It was mostly attended by first-year bachelor students of computer science, that had attended a lecture on C, but did not have previous experience programming µC. Soon they had the free tools working on their laptops, and had the “Hello, world” (via software-emulated UART) and timer / blink example program working.
Toward the end of the workshop they did their own ideas (some used PWM to modulate the brightness of an LED, I saw both pure software solutions and use of the PWM peripherals).

P.S.: In your table, the ROM (KB) column has values in KW despite saying KB in the table header. Word width varies from 13 to 16 bit.

Hi Jay. Very good article – I’m excited to see new contenders in the “cheaper than their weight in silicon” category, however I’d like to point out there’s some inaccuracy with respect to the following statement about RTOS: “…consider that most RTOS projects found in the wild are running on 100 MHz ARM Cortex MCUs and don’t have a context-switching (tick) speed much faster than 1 kHz.”.

This sentence assumes only the behavior of preemptive RTOS, which relies on RTOS clock (Arm SysTick or otherwise) to instigate the context switch. Having worked in this area for last half of the decade I must say that this is typically the corner case that is most commonly a result of bad application design, rather than intended behavior. Most well-written applications perform context switches on thread/task object interactions, not based on time slicing. Low-priority thread continuously putting semaphores paired up with high-priority thread continuously pending on and consuming them will result in context switch count much closer to 1MHz than 1kHz, on a 100MHz Arm core. It is a common misconception that maximum context switching rate depends on the system clock, when in reality any decent RTOS will switch to highest-priority task that is ready to run without waiting for the timer event.

The article is a bit misleading in representing the FPPA architecture as something akin to preemptive multitasking. Since the context can be switched for each cycle, it is very similar to having several cores. A better comparison would be multithreading as implemented on modern CPUs (Pentium 4 and later) or in the XMOS architecture. (https://en.wikipedia.org/wiki/XCore_Architecture)

Preemptive multitasking as in an RTOS will not allow “bitbanged” peripherals that rely on cycle-exact timing. FPPA/XMOS does so, and that is why no SPI/I2C/UART peripheral is needed on these devices.

Hi Jay,

excellent article, thank you for providing more insight into these very interesting niche MCUs.

The reason why you did not find any way to integrate SDCC with the official Padauk tools is that the open source toolchain exists in parallel to the official tools. Unfortunately, Padauk is not willing to provide any integration options into their toolchain as of now, which is a pity. Being able to use a full C compiler is very helpful in many instances.

In addition to SDCC, there is also an open hardware programmer available at https://free-pdk.github.io/.

Most nonprofessionals are not too keen on spending 100+ USD on programming hardware or an ICE, especially considering the abundance of options for STM8, AVR, STM32 and others.
