Browse the Microcontroller pages of LCSC and you’ll see tons of low-cost MCUs from companies like Padauk, Nyquest, Holychip, SimOne, and Fremont Micro Devices, with prices as low as 4 cents. The problem is these parts all use EPROM rather than EEPROM or flash. The missing “E” means these aren’t electrically erasable: they’re OTP (One-Time Programmable) parts. Once you’ve burned your hex file, you’re locked in. OTP parts are widely used for simple devices in cost-sensitive applications, but they’re annoying to design with: to develop firmware, you need special emulator hardware (or a huge supply of ICs and a conveniently located trash can).
I was interested in finding the cheapest flash microcontroller LCSC sells, and it turns out it’s not a crusty old 8051 or a PIC16 clone — and it’s not the WCH CH32V003 that the Internet is freaking out about — it’s actually an Arm Cortex-M0+ made by Puya, it’s less than 10 cents in moderate quantities, and it’s awesome.
Puya? They’re a flash memory company out of Shanghai that’s been around since 2012 — mostly making really low-cost SPI flash memory for IoT gadgets. While it may seem odd that the cheapest MCU is made by a Flash memory company, if you know a thing or two about chip design (which I don’t) you’ll know that it’s often the flash memory inside the part that’s the most expensive IP in the design. This is why so many fabless semiconductor shops don’t bother with it and just use external SPI flash or OTP memory (just look at any USB webcam controller, Bluetooth audio chip, or even the popular Espressif products).
Puya PY32 Selection Guide
Puya makes a range of ultra-low-cost Arm Cortex-M0+ parts in the PY32 series. There are three families in the series: the PY32F002A, the PY32F003, and the PY32F030. The families differ in memory, maximum speed, and peripheral inclusion, but they seem entirely progressive (e.g., firmware written for the PY32F002A should run on the PY32F003 and PY32F030). Regardless, all the parts are modern designs that use internal regulators and have an operating voltage range of 1.7 to 5.5V.
Unlike many other low-cost MCUs that only come in one or two (usually low-density) package options, the PY32 has nine different choices, ranging from a tiny 2×3 mm DFN to a full-sized 32-pin LQFP — along with QFN, TSSOP, and drunk-soldering-friendly SOIC packages.
I bought a ton of different PY32 parts from LCSC (just search for PY32). At the time of publication, prices have shot up quite a bit compared to when I first ordered these parts, but currently, the prices range from less than 8 cents up to $0.74 depending on the exact part, configuration, and quantity you’re buying.
If you want to avoid redoing some of the work I’ve done to get these parts going, I have a py32-template GitHub repo with a sample project you can clone and adapt to your needs.
The F002A is the base model; it clocks in at 24 MHz, has 20K of flash, 3K of SRAM, and comes in SOIC8/16, TSSOP20, and QFN16 packages, along with an odd 1 mm pitch ESSOP10 package. It has one advanced-control timer with 4 outputs (plus two complementary pairs for driving H-bridges), 1 low-power timer, 1 general-purpose timer, a 1 MSPS 12-bit ADC with 6 external channels, two comparators, a basic watchdog timer, and one SPI, I2C, and USART peripheral. And since this is a Cortex-M0+, you get an SWD debugging interface, a SysTick timer, and a nested vectored interrupt controller (NVIC) bundled in.
The big thing the PY32F003 has over the F002A is DMA support, which is an unusual feature in this price range. The PY32F003 comes in 16K and 32K flash versions (with 2K and 4K of RAM respectively). In addition to the DMA, it adds three more general-purpose timers, a second USART, plus an “IR timer” (which is really just a NAND gate fed from TIM16 and TIM17 which can be configured to generate the carrier and data signals respectively). It also adds some reliability with a windowed watchdog (WWDG) and programmable voltage detect (PVD) brownout supervisor.
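To make the “IR timer” description concrete, here is a rough logic-level model in Python. It only mimics the gate described above (a NAND of the carrier and data signals); it does not touch any PY32 registers, the function names are made up for illustration, and the real output polarity should be checked against the reference manual.

```python
def ir_out(carrier: int, data: int) -> int:
    """NAND of the two timer outputs, following the description in the text."""
    return 0 if (carrier and data) else 1


def modulate(data_bits, carrier_cycles=2):
    """Return the gate output, sampled twice per carrier cycle, for each data bit.

    The carrier is modelled as a square wave (1, 0, 1, 0, ...); each data bit
    is held steady for `carrier_cycles` full carrier periods.
    """
    samples = []
    for bit in data_bits:
        for i in range(carrier_cycles * 2):
            carrier = 1 - (i % 2)  # 1 on even samples, 0 on odd samples
            samples.append(ir_out(carrier, bit))
    return samples
```

With this model, a data bit of 1 lets the (inverted) carrier through, while a data bit of 0 holds the output steady at 1.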
While the F003 advertises a 32 MHz operating frequency, its internal oscillator only runs at 24 MHz. To get to 32 MHz, you’ll need an external crystal, which can cost more than half as much as the actual microcontroller does, and depending on the MCU package, can double the board area required to lay out the part.
The PY32F030 is the highest-end part in the lineup. It adds a 2x PLL that can run the core at 48 MHz off of the 24 MHz internal oscillator. The F030 also adds 32-pin QFP/QFN package support to get you up to 30 GPIO pins. In addition to the other capacities already mentioned, the F030 is also available in a higher-capacity 64K flash / 8K RAM version. The peripheral set between the F003 and F030 is identical, save for the F030’s built-in scanning LED matrix controller, which supports four 8-segment digits.
What’s particularly strange is that the F030 has two different sets of pinouts. The F2 QFN and TSSOP 20-pin packages have three extra analog inputs, which the F1 package trades for 32 kHz oscillator support. On the 32-pin package, the K2 version adds an extra GPIO pin in exchange for one fewer GND pin. I’m not sure of the rationale for trading GPIO pins for redundant power pins; maybe for better analog performance?
Are these STM32 clones?
While the PY32F030 family looks very much like an STM32F030, they are not compatible parts in any capacity. None of the parts appear to have complete pin compatibility, so they are not compatible at a hardware level. And the register offsets and interrupt vectors are different, meaning there is no binary compatibility for even very basic programs. While the peripherals are very similar to STM32 ones, there are subtle differences in the registers that will trip you up if you’re familiar with STM32 programming. These changes mostly appear to correct questionable decisions ST made in the first place; for example, the EXTI peripheral now contains configuration registers that allow you to route the correct GPIO bank to the EXTI signal you want (this was strangely located in SYSCFG on the STM32F030).
Overall, these parts are much easier to use than other parts in this price range, which often have poor English documentation, no code samples or peripheral libraries, or require vendor-provided IDEs and weird tooling.
You don’t need to buy these parts from weird Taobao links or AliExpress sellers; they’re available straight from LCSC. I initially designed my own breakout boards for these parts, but my friend Charles noticed Puya has $5 dev boards available too (which are really just breakout boards; they don’t even have a built-in debugger). Stock has been fluctuating a lot, as have prices, but they’re cheap enough to hoard for short-term low-volume projects.
Documentation and SDK
PY32 documentation is quite good. Visit Puya’s PY32 web page and you can download everything you need: just scroll horizontally all the way to the right and click on the “Datasheet” download (here’s a direct link that I’ll surely forget to update in the future). Notice this isn’t a datasheet, but a ZIP that contains all the PY32 resources (and yes, each product links to this same file on their web site). Inside, you’ll find English and Chinese datasheets and reference manuals, code samples, and a Keil DFP that contains the flash algorithms you’ll need to burn your code onto these parts with an SWD probe. At the time of writing, only the PY32F030 has an English reference manual, but this isn’t much of a problem, as you can use the same guide for all parts.
Next to all the goofy low-cost 8-bit parts I’ve played with, boy, it sure does feel nice to be working on an Arm Cortex part. I fired up VSCode, pulled in the SDK, created a Makefile and launch.json config, and I was off to the races.
Well… not quite. Like most Arm parts from the East, Puya’s SDK has set them up for Keil MDK development. You can definitely go grab a free version of Keil MDK and get going immediately, but I wanted a VSCode / Cortex-Debug / GCC-based solution, which took a bit more fiddling. Using other GCC Arm Cortex project code I had lying around as a reference, I created GCC linker config and startup files for each part with a memory map and vector table based on the Keil linker config and startup files. With a bit more source-file mashing, I was able to pull in their STM32 HAL-like peripheral library and build a project.
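As a rough illustration of the per-part memory map, here is a small Python sketch that prints the MEMORY block of a GNU ld linker script from a part’s flash and RAM sizes. The base addresses are the ones that appear in the J-Link device entry later in this post; the helper name and the region names (FLASH, RAM) are my own choices for the example, not something taken from Puya’s SDK.

```python
FLASH_BASE = 0x08000000  # flash base address (matches BaseAddr in the J-Link entry)
RAM_BASE = 0x20000000    # SRAM base address (matches WorkRAMAddr in the J-Link entry)


def memory_block(flash_kb: int, ram_kb: int) -> str:
    """Build the MEMORY section of a GNU ld script for one part.

    Sizes are given in KiB; the region names are illustrative.
    """
    return (
        "MEMORY\n"
        "{\n"
        f"  FLASH (rx)  : ORIGIN = 0x{FLASH_BASE:08X}, LENGTH = {flash_kb}K\n"
        f"  RAM   (rwx) : ORIGIN = 0x{RAM_BASE:08X}, LENGTH = {ram_kb}K\n"
        "}\n"
    )


# Sizes quoted in this post: the F002A has 20K flash / 3K SRAM.
print(memory_block(20, 3))
```

The rest of a linker script (sections, stack placement, the vector table at the start of flash) still has to be written by hand or adapted from an existing Cortex-M project.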
I really only use two debug servers these days: Segger’s J-Link GDB Server (obviously for use with their Segger J-Link) and pyOCD (which I use with low-cost CMSIS-DAP probes). To get pyOCD working, I tried searching for the Puya pack, but it’s not indexed in the repositories that pyOCD searches, so I had to manually install it. I extracted the PY32 DFP (which is just a ZIP file), then I copied the pdsc to pyOCD’s cmsis-pack-manager’s pack folder inside the Python distribution I’m using (mine was in LocalCache\Local\cmsis-pack-manager\cmsis-pack-manager). It’s important to rename the file to include the version number in it: Puya.PY32F0xx_DFP.1.1.0.pdsc. I then copied the original whole DFP file to the same directory, following the same convention all the other packs use: copying it to Puya\PY32F0xx_DFP\1.1.0.pack. After that, to rebuild the index file to include the new pack, I ran:
PS C:\> pack-manager add-packs Puya.PY32F0xx_DFP.1.1.0.pdsc
I checked that pyOCD could now find the pack:
PS C:\> pyocd pack find PY32
  Part          Vendor   Pack                Version   Installed
------------------------------------------------------------------
  PY32F002Ax5   Puya     Puya.PY32F0xx_DFP   1.1.0     True
  PY32F003x4    Puya     Puya.PY32F0xx_DFP   1.1.0     True
  PY32F003x6    Puya     Puya.PY32F0xx_DFP   1.1.0     True
  PY32F003x8    Puya     Puya.PY32F0xx_DFP   1.1.0     True
  PY32F030x3    Puya     Puya.PY32F0xx_DFP   1.1.0     True
  PY32F030x4    Puya     Puya.PY32F0xx_DFP   1.1.0     True
  PY32F030x6    Puya     Puya.PY32F0xx_DFP   1.1.0     True
  PY32F030x7    Puya     Puya.PY32F0xx_DFP   1.1.0     True
  PY32F030x8    Puya     Puya.PY32F0xx_DFP   1.1.0     True
  PY32F072xB    Puya     Puya.PY32F0xx_DFP   1.1.0     True
and then installed it:
PS C:\> pyocd pack install PY32F003x8
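The manual file shuffling described above (renaming the pdsc to include the version, and copying the pack into a vendor/name/version layout) could be scripted roughly like this. It is only a sketch of the copy steps: the function name and arguments are hypothetical, the pack folder location differs between machines, and you would still run the pack-manager and pyocd commands shown above afterwards.

```python
import shutil
from pathlib import Path


def stage_pack(pdsc: Path, pack: Path, pack_dir: Path,
               vendor: str = "Puya", name: str = "PY32F0xx_DFP",
               version: str = "1.1.0"):
    """Copy a pdsc and its pack file into a cmsis-pack-manager folder.

    Produces <pack_dir>/<vendor>.<name>.<version>.pdsc and
    <pack_dir>/<vendor>/<name>/<version>.pack, the layout described in the text.
    """
    pdsc_dst = pack_dir / f"{vendor}.{name}.{version}.pdsc"
    pack_dst = pack_dir / vendor / name / f"{version}.pack"

    # Creating the nested pack folder also creates pack_dir if it is missing.
    pack_dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(pdsc, pdsc_dst)
    shutil.copyfile(pack, pack_dst)
    return pdsc_dst, pack_dst
```

Pass the extracted pdsc, the original DFP file, and your own cmsis-pack-manager folder as the three arguments.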
Getting J-Link support was actually less involved. Segger doesn’t have Puya PY32 support, but you can add custom device support by following directions here. I created a JLinkDevices\Puya\PY32 folder structure, then added a Devices.xml file:
<Database>
  <Device>
    <ChipInfo Vendor="Puya" Name="PY32F002Ax5" WorkRAMAddr="0x20000000" WorkRAMSize="3072" Core="JLINK_CORE_CORTEX_M0" />
    <FlashBankInfo Name="Internal code flash" BaseAddr="0x08000000" AlwaysPresent="1" >
      <LoaderInfo Name="Default" MaxSize="0x5000" Loader="PY32F0xx_20.FLM" LoaderType="FLASH_ALGO_TYPE_OPEN" />
    </FlashBankInfo>
  </Device>
</Database>
I then copied the FLM flash programming algorithm files out of Puya’s DFP into this folder. After doing that, I was able to see the device in J-Link’s target selection box, and everything worked flawlessly. I will note that you have to be on a relatively-recent version of J-Link for this to work, so make sure you’ve upgraded and actually have the correct version on your PATH.
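If you want entries for several parts, writing the XML by hand gets tedious. Here is a hedged Python sketch that generates a Devices.xml with the same element and attribute names as the entry above. The helper names are my own, and the flash size, RAM size, and FLM file name for any other part are things you would need to look up in the datasheet and DFP; only the PY32F002Ax5 values come from this post.

```python
import xml.etree.ElementTree as ET


def device_entry(name: str, flash_bytes: int, ram_bytes: int, loader: str) -> ET.Element:
    """Build one <Device> element mirroring the hand-written entry above."""
    device = ET.Element("Device")
    ET.SubElement(device, "ChipInfo", {
        "Vendor": "Puya",
        "Name": name,
        "WorkRAMAddr": "0x20000000",
        "WorkRAMSize": str(ram_bytes),
        "Core": "JLINK_CORE_CORTEX_M0",
    })
    bank = ET.SubElement(device, "FlashBankInfo", {
        "Name": "Internal code flash",
        "BaseAddr": "0x08000000",
        "AlwaysPresent": "1",
    })
    ET.SubElement(bank, "LoaderInfo", {
        "Name": "Default",
        "MaxSize": hex(flash_bytes),
        "Loader": loader,
        "LoaderType": "FLASH_ALGO_TYPE_OPEN",
    })
    return device


def build_database(parts) -> str:
    """Wrap a list of (name, flash_bytes, ram_bytes, loader) tuples in <Database>."""
    root = ET.Element("Database")
    for part in parts:
        root.append(device_entry(*part))
    ET.indent(root)  # pretty-printing; requires Python 3.9 or newer
    return ET.tostring(root, encoding="unicode")


# The one entry taken from this post: 20K flash, 3K RAM.
print(build_database([("PY32F002Ax5", 20 * 1024, 3 * 1024, "PY32F0xx_20.FLM")]))
```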
I tested both Segger Ozone and VSCode (via Cortex-Debug) and was a little disappointed with how little effort it took to get going. It’s a Cortex-M0+, so debugging worked without a hiccup. Breakpoints, reading memory, yada-yada. I will say that while I was working through some low-power code, I noticed the SVD files provided by Puya were missing the PWR peripheral, so I had to manually add it. The SVD files are very bare-bones, with no descriptions or enumerations for things, but they seem to be accurate. Again, when compared to other parts in this price range, there’s no contest.
On parts with unique peripherals, I try to play with everything a bit before casting judgment — just to look for gaping holes or big-time “gotchas” — but since these are STM32-like peripherals, you kind of know what you’re getting.
One big difference between this and the STM32F030 is low-power support. The PY32F030 parts advertise a typical low-power stop mode current consumption of 4.5 µA when using a 1.0V setting with the internal regulator, and I was able to get the part to drop down to 5 µA in my testing. Meanwhile, the STM32F030 uses less than half of that in stop mode, and in run mode always outpaces the PY32F030 substantially. It doesn’t have the low-power internal 32 kHz oscillator the PY32 does, but it doesn’t seem to need it: when running from its internal 8 MHz oscillator, it uses less than half the current of the PY32 running off its internal 32 kHz one (while running 250 times faster). All told, the PY32F030 does not live in the sub-microamp ultra-low-power territory that you can hit with more expensive parts, but we’re still talking about a microcontroller that can be powered for several years with a minuscule CR2032 battery.
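For a sanity check on the “several years” claim, here is the back-of-the-envelope arithmetic in Python. The 220 mAh capacity is my assumption for a typical CR2032 (it is not a figure from my measurements), and the estimate ignores battery self-discharge and any time the part spends awake, so treat the result as an upper bound.

```python
HOURS_PER_YEAR = 24 * 365


def battery_life_years(capacity_mah: float, current_ua: float) -> float:
    """Ideal runtime in years for a constant current draw."""
    hours = capacity_mah * 1000.0 / current_ua  # mAh -> µAh, then divide by µA
    return hours / HOURS_PER_YEAR


# Assumed 220 mAh cell, 5 µA measured stop-mode current: roughly 5 years.
print(round(battery_life_years(220, 5.0), 1))

# Clock ratio behind the "250 times faster" remark: 8 MHz vs 32 kHz.
print(8_000_000 // 32_000)
```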
Puya chose to nearly copy the STM32 peripheral register API, as well as the STM32 HAL library. When I started with STM32 years ago, I wasn’t particularly offended by STM32’s HAL library, but as I’ve used more parts over the years, I find it more and more annoying to configure and use. On a part with only 20K of flash, these HAL libraries can eat into quite a bit of space, so if you’re using this part for real projects, you may find yourself using lighter-weight peripheral libraries to save space.
What about the CH32V003 or STM32F030?
The elephant in the room is the WCH CH32V003, an entry-level RISC-V chip widely covered in the electronics press that may or may not be available for around 10 cents. While people mention buying these parts from WCH’s official AliExpress store, at the time of writing, it appears that they yanked the part, and I can’t find them anywhere on AliExpress for less than $2 per chip, though it’s on Taobao for about 25 cents each.
It’s hard to find a direct cross between the CH32V003 and the PY32. In terms of flash/RAM/peripherals, the CH32V003 is very similar to the entry-level PY32F002A, which can be had for less than 10 cents in fairly modest volumes (though for most projects, you’d need more pins and would opt for the 14-cent QFN16 version). The Puya part has more flash and RAM, while the CH32V003 has an internal 2x PLL that runs the core twice as fast. The CH32V003 also has DMA, which is especially important when working with high-bit-rate communications interfaces on these low-speed devices. If you move up to the PY32F030 to match the clock speed and DMA of the CH32V003, you’ve jumped up considerably to 30 cents or more, but you’re also getting a part with four times more flash and RAM, plus 3 more timers and another USART.
My biggest gripe with the CH32V003 is the tooling: it uses a goofy one-wire debug interface requiring a proprietary debugger and a hacked-together custom version of OpenOCD (software I detest). I think most WCH users just use MounRiver, an Eclipse-based IDE that comes preconfigured for WCH CH32 parts. This software feels painfully slow compared to VSCode and Ozone-based workflows. Some of these issues are WCH-specific, and some are more broadly a problem with RISC-V: no pyOCD support, no Cortex-Debug support, and spotty J-Link support, too. I haven’t looked into higher-end RISC-V cores, so I have no idea what kind of support there is for instruction trace and other advanced debugging tools, but it just seems like the RISC-V development ecosystem has a long way to go before it achieves parity with Arm.
What about Arm competitors? It’s no surprise these PY32 parts are cheaper than the STM32F030, though the direct cross — the PY32F030 — really isn’t that much cheaper. You can get STM32F030F4P6s from LCSC for about 50 cents each in moderate quantity. These don’t have the flash or RAM or 5V compatibility the PY32F030 does, but you get better low-power support and a bit less pain in initial project set-up (especially considering the STM32CubeMX compatibility). The Puya parts, however, offer way more varied package options. I’m not a big fan of TSSOP and LQFP packages, so it’s nice to get similarly-spec’d parts in tiny DFN and QFN packages. And if you can sacrifice some performance, you can get into the PY32 for much less than an STM32F030.
A while back I wrote a large blog post on the Padauk PMS150; the post ended up so long simply because the ecosystem was quite different than modern development environments. There were several-hundred-dollar emulators and programmers you had to buy, a weird IDE that used a subset C dialect that often looked more like assembly, and OTP memory that meant that one mistake could send you to the soldering rework station to pop off and discard the MCU you just accidentally ruined. Unless you’re making products in super high volumes, it really doesn’t make much sense to invest in the tools, time, and effort to learn that ecosystem.
The Puya PY32 couldn’t be more different. The entire time I worked through this project, I kept forgetting that I was using a part that is 20 times cheaper than what I’m used to using. It has the same sorts of peripherals you’d expect to find on any NXP, ST, or Microchip Arm part you’d buy off DigiKey. It has good English documentation, full compatibility with the debug probes you already own, and you can use whatever software ecosystem you prefer — Eclipse, Vim, VSCode, Embedded Studio, Keil, IAR, or nearly anything else.
Bottom line: the Puya PY32 ecosystem is just plain boring — and that’s a good thing.