Freescale (NXP) KL03

The KL03 is typical entry-level ARM Cortex-M0+ fare:

  • 48 MHz ARM Cortex-M0+
  • 8 KB of flash
  • 2 KB of SRAM
  • QFN-16 package with 14 I/O
  • 12-bit 818 ksps ADC with 4 channels
  • Dedicated 8 KB ROM with SPI, I2C, and UART bootloaders built in
  • 5 timer channels, including 4 channels of PWM
  • Separate UART, SPI, and I2C modules
  • Analog comparator
  • On-chip 48 MHz, 8 MHz, and 1 kHz oscillators
  • Lots of low-power modes, down to about 1.5 µA with SRAM retention and an RTC running

Development Environment

While you can use Kinetis Design Studio to develop with the KL03, NXP is pushing its own MCUXpresso environment pretty hard, so that’s what I used for evaluating this microcontroller. MCUXpresso is an Eclipse Neon-based IDE.

While MCUXpresso doesn’t have a built-in config tool, the MCUXpresso Config Tools (a separate application) works well enough on its own. Too bad it integrates so poorly with MCUXpresso IDE.

Code Gen Tools

The old way of doing code-gen on Kinetis was with Processor Expert — the new recommended way is MCUXpresso Config Tools, a stand-alone, lightweight program that can configure clocking, pin muxing, and… well, that’s about it. I was surprised by how bare-bones it is; it doesn’t even let you initialize peripherals.

And some of the stuff it claims to do, it’s just lying about. There’s a nice little pane below the processor top-view that seems to allow you to set up GPIO pins with their direction, pull-up, drive strength, and even labeling. I spent a good 30 minutes trying to figure out why I couldn’t get my little LED to blink when calling GPIO_WritePinOutput(). Turns out the GPIO_PinInit() function was never generated by MCUXpresso Config Tools. Whoops! That’s OK, though — engineers are pretty cheap to pay to sit around and troubleshoot this stuff.

[Screenshot: the MCUXpresso Config Tools pin configuration pane]
MCUXpresso Config Tools is such a trickster — it gives you options to label and set the direction of GPIO pins. Don’t spend too much time diligently working through this, though: the tool doesn’t actually use this information to generate any GPIO initialization routines. You’ll be doing that yourself. Why is this in the tool? Who knows.
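For the record, here’s roughly the GPIO initialization you end up writing by hand: a minimal sketch, assuming the KSDK 2.x fsl_port/fsl_gpio drivers and a purely hypothetical LED on PTB10 (check your own schematic):

[code language="cpp"]
#include "fsl_clock.h"
#include "fsl_port.h"
#include "fsl_gpio.h"

#define LED_GPIO  GPIOB   /* hypothetical LED pin -- substitute your board's actual pin */
#define LED_PORT  PORTB
#define LED_PIN   10U

void led_init(void)
{
    CLOCK_EnableClock(kCLOCK_PortB);                     /* ungate the PORT clock first */
    PORT_SetPinMux(LED_PORT, LED_PIN, kPORT_MuxAsGpio);  /* mux the pin as plain GPIO */

    gpio_pin_config_t led_config = {
        .pinDirection = kGPIO_DigitalOutput,
        .outputLogic  = 0U,
    };
    GPIO_PinInit(LED_GPIO, LED_PIN, &led_config);        /* the call Config Tools never generates */
}

/* ...and then, in the blink loop: GPIO_WritePinOutput(LED_GPIO, LED_PIN, 1U); */
[/code]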

While other vendors make stand-alone config tools that can generate project files for a wide variety of IDEs that these vendors don’t even make, MCUXpresso Config Tools can’t even manage to generate a config file for its own IDE. When you click the “generate project” button in MCUXpresso Config Tools, it generates some weird xml/gen/mex file collection that has nothing to do with MCUXpresso IDE’s project file format.

If you want to bring this into MCUXpresso IDE, you have to click the “Import SDK example(s)…” button in MCUXpresso. “Wait, SDK examples? I thought we were importing a code-generated project?” — yeah, but, in NXP’s twisted universe, there’s no difference. Choose a board (“why?” — I don’t know), and then click Next. Ignore everything on the page except for the little, unlabeled icon with an arrow pointing into a box — this is how you import your code-gen project.

Forget to mux a pin correctly? No problem, just head back to MCUXpresso Config Tools and fix the configuration. When you’re done, you’ll see a shiny button that says “Update Project” — go ahead and click it, then go back to MCUXpresso IDE and notice that absolutely nothing happened. When you imported the “SDK Example(s)” you really just copied them. So the only way to update the project is to go through the whole process again, and then rewrite all your code.

This is an utter disaster.

I ended up exporting the “generated project” into a subfolder in my MCUXpresso IDE project; I had to clear out the “src” folder and configure the Paths and Symbols Properties to include the header and source folders inside the generated folder. It’s a total hack, and means that it takes quite a bit of time to start a new project.

By the way, if you’re shuffling around between multiple projects, be careful: MCUXpresso Config Tools remembers the last place you generated a project, and that setting is retained even when you open different .mex files.

It’s absolutely incredible that Freescale has gone from being the company with the best code-gen tool out there to a company that produces worthless tools that have to be hacked together to get anything done. Imagine being a first-time ARM microcontroller user trying to navigate this mess.

[Screenshot: the only UART code example in the Kinetis SDK documentation]
Just how bad is the Kinetis SDK documentation, you ask? This is the only “complete” code example for the UART peripheral, and the functions it calls — SendDataPolling() and ReceiveDataPolling() — don’t even exist in this version of the API. Whoops.

Kinetis SDK

By now, you’ve realized that the code-gen tools really just handle clocking and pin muxing. If you want to actually do anything with those peripherals you just clocked and pin-muxed, you’ll have to use the Kinetis SDK (KSDK), which is a runtime peripheral library.

KSDK is a lightweight C library consisting of thin-layer APIs, each sitting on top of an individual peripheral module. This is a library built for people who have a good, fundamental understanding of Kinetis peripherals. KSDK feels like a knee-jerk counterpoint to Processor Expert; it’s jarring to use if you come from that development environment, since all of a sudden you’re going to need the reference manual on your screen at all times, and you’ll be constantly reading KSDK functions to see precisely what they do and whether they’re suitable to call.

Consequently, KSDK has an extremely steep learning curve.

For people who are interested in getting their projects plumbed quickly so they can focus on the application itself, KSDK is not for you. Each peripheral library module is confined solely to that peripheral, which wouldn’t be a problem on an 8-bit MCU, but on a modern ARM microcontroller with complex clocking requirements and power gating, you’re actually going to be juggling several different KSDK modules just to get one peripheral working.

As an example, to initialize the LPUART (low-power UART), you need to:

[code language=”cpp”]
CLOCK_SetLpuart0Clock(1U); /* Set LPUART0 clock source. */

PORT_SetPinMux(PORTB, PIN1_IDX, kPORT_MuxAlt3); /* PORTB1 (pin 13) is configured as LPUART0_RX */
PORT_SetPinMux(PORTB, PIN2_IDX, kPORT_MuxAlt3); /* PORTB2 (pin 14) is configured as LPUART0_TX */

SIM->SOPT5 = ((SIM->SOPT5 &
(~(SIM_SOPT5_LPUART0RXSRC_MASK))) /* Mask bits to zero which are setting */
| SIM_SOPT5_LPUART0RXSRC(SOPT5_LPUART0RXSRC_LPUART_RX) /* LPUART0 Receive Data Source Select: LPUART_RX pin */
);

lpuart_config_t config;
LPUART_GetDefaultConfig(&config);
config.baudRate_Bps = 9600;
config.enableRx = true;
config.enableTx = true;
LPUART_Init(LPUART0, &config, CLOCK_GetFreq(SYS_CLK));
LPUART_WriteBlocking(LPUART0, (const uint8_t *)"Hello world", sizeof("Hello world") - 1);
[/code]

Notice that we’re calling into four separate peripheral APIs: CLOCK, PORT, LPUART, and SIM (and for SIM, we’re not even using KSDK APIs — just interacting with the raw registers).

Unlike other peripheral libraries, KSDK doesn’t attempt to manage the hairball stuff for you. No, you can’t just tell it that you want to use B2 as a UART transmit pin; you need to look up in the processor datasheet which alt mode corresponds to that function, and pass that along to KSDK.

 

Full disclosure: I am not a fan of this style of peripheral library. To me, it feels like KSDK is, itself, just lazy. Like, Freescale doesn’t want to take the time to write up all this meticulous crap that has to go on under the hood to be able to abstract this stuff. Maybe they got a little gun-shy after trying to maintain Processor Expert for their ever-changing list of products.

Still, I feel like it wouldn’t have to be this way. Imagine a hypothetical re-imagining of the LPUART initialization routine:

[code language=”cpp”]
const lpuart_config_t config = {
    .baudRate_Bps = 9600,
    .enableRx = true,
    .enableTx = true,
    .txPin = Pin_B2, // mux pin B2 as TX
    .rxPin = Pin_B1, // mux pin B1 as RX
};
LPUART_Init(LPUART0, &config);
LPUART_WriteBlocking(LPUART0, "Hello world", sizeof("Hello world"));
[/code]

Here, the lpuart_config_t struct has all-zero “defaults” — which means parameters like “number of stop bits,” “parity,” and “data bit count” all need “0” to correspond to their defaults. Suddenly, enums would no longer map directly to the underlying hardware’s bits; they’d simply be convenient abstractions.

This would also mean that LPUART would have to have knowledge of the PORT module, and then tell the PORT module to mux the pins in the desired way. Look-up tables would have to be generated for each processor so that the peripheral runtime would know that Pin_B2’s TX function lives on alt setting “Alt3” — and LPUART would have to ask CLOCK what the system frequency is.
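None of this exists in KSDK, of course, but a sketch of the kind of per-device lookup table I’m imagining (the names here are mine, not NXP’s, and Pin_B1/Pin_B2 are the hypothetical enums from the snippet above) might look like this:

[code language="cpp"]
/* Hypothetical, per-device pin-function table -- nothing like this ships with KSDK. */
typedef struct {
    PORT_Type *port;         /* which PORT module owns the pin          */
    uint32_t   pinIndex;     /* pin number within that port             */
    port_mux_t lpuartTxAlt;  /* alt setting that routes LPUART0_TX here */
    port_mux_t lpuartRxAlt;  /* alt setting that routes LPUART0_RX here */
} pin_function_entry_t;

static const pin_function_entry_t kl03PinTable[] = {
    /* Pin_B1: LPUART0_RX is on ALT3; there's no TX function on this pin */
    [Pin_B1] = { PORTB, 1U, kPORT_PinDisabledOrAnalog, kPORT_MuxAlt3 },
    /* Pin_B2: LPUART0_TX is on ALT3; there's no RX function on this pin */
    [Pin_B2] = { PORTB, 2U, kPORT_MuxAlt3, kPORT_PinDisabledOrAnalog },
    /* ...one entry per pin, generated from the device's signal multiplexing table... */
};
[/code]

LPUART_Init() would then index this table with the txPin/rxPin members and call PORT_SetPinMux() itself.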

Some would argue that this would “reduce performance” — however, with modern compiler optimizations, I believe all of this code would get compiled down to constant register writes anyway.

I think meticulous, detail-oriented people who want to rack up a lot of billable hours will have no complaints about KSDK, but personally, it makes me feel like a human compiler. I was trying to get three channels of PWM going (spread across the two TPM modules on the MCU). Here’s the code I had:

[code language=”cpp”]
// Timer Config
tpm_config_t tpmInfo;
TPM_GetDefaultConfig(&tpmInfo);
TPM_Init(TPM0, &tpmInfo);

uint8_t updatedDutycycle = 0U;
tpm_chnl_pwm_signal_param_t tpmParam[2];

tpmParam[0].chnlNumber = kTPM_Chnl_0;
tpmParam[0].level = kTPM_LowTrue;
tpmParam[0].dutyCyclePercent = 0U;

tpmParam[1].chnlNumber = kTPM_Chnl_1;
tpmParam[1].level = kTPM_LowTrue;
tpmParam[1].dutyCyclePercent = 0U;

TPM_SetupPwm(TPM0, tpmParam, 2, kTPM_EdgeAlignedPwm, 1000, CLOCK_GetFreq(SYS_CLK));
TPM_SetupPwm(TPM1, tpmParam, 1, kTPM_EdgeAlignedPwm, 1000, CLOCK_GetFreq(SYS_CLK));
TPM_StartTimer(TPM0, kTPM_SystemClock);
TPM_StartTimer(TPM1, kTPM_SystemClock);
[/code]

When I set up TPM1, KSDK called something that hard-faulted the processor. The Freescale nerds reading this will quickly notice the problem, but it probably took me two hours to work through.
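For everyone who isn’t a Freescale nerd, my best guess at the culprit: I never called TPM_Init() for TPM1, so that module’s clock gate was never enabled, and on Kinetis parts, touching the registers of an unclocked peripheral earns you a bus fault that the Cortex-M0+ escalates to a hard fault. Presumably, one extra line fixes it:

[code language="cpp"]
/* Presumably the missing line: TPM_Init() ungates TPM1's clock before any register access. */
TPM_Init(TPM1, &tpmInfo);
[/code]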

This is the stuff I thought I was done with — I thought we figured out how to make peripheral initialization and configuration just work, so we can focus on the interesting, application-specific problems we love solving. Why are we getting paid hundred(s) of dollars an hour to be human compilers?

Documentation

The KSDK documentation is essentially just an API reference sheet: a list of the functions, their parameters, and return types, with a brief, one-sentence explanation of what each function does. A conservative smattering of example code snippets is sprinkled throughout, but they’re usually not much help.

Instead, KSDK leans heavily on “peripheral driver examples” — a collection of projects you get to import into your workspace, load on the board to see what they do, and then step through the code, line-by-line, trying to make sense of everything. You then copy and paste this junk into your own project, hack away at it a bit, and hope for the best.

 

I don’t want a directory full of code examples that are spread out across a dozen initialization files; I want a simple, concise set of instructions in the Kinetis SDK manual that tells me how to configure each of the peripherals to do all of their supported tasks. Sure, it’s a lot of work to put that together, but many other manufacturers have taken the time; why can’t Freescale?

[Screenshot: the KL03 reference manual’s Framing Error (FE) flag description]
The documentation for the Framing Error (FE) flag instructs you to clear the NF flag by writing a logic one to NF. Hmm, but how do I clear the FE flag?

Honestly, since KSDK is just a wrapper around the underlying peripherals, your main source of documentation isn’t going to be your peripheral library — it’s going to be the reference manual. The KL03 family reference manual is little more than a list of registers, with a few pictures illustrating the peripheral in use. No code examples, configuration procedures, or any other extraneous information is presented. While I was briefly looking through it, I noticed several errors. I went to consult the “Framing Error Flag” (FE) documentation to figure out how to clear the flag (write a 0? write a 1? something else?) and the only thing it tells me is “To clear NF, write logic one to the NF” — hmm OK, I came here to learn about the FE flag, not the unrelated NF flag, but thanks for the tip! I’d consider this documentation to be much worse than average.

[Schematic detail: a 0.1 µF capacitor hanging off the dev board’s UART RX pin]
Why the hell is there a 0.1 µF capacitor across this RX pin?!

Development Tools

The FRDM boards were Freescale’s attempt to compete with all the low-cost development tools out there — unfortunately, they suffer from the same “evaluation kit” marketing bloat that other dev boards do: yes, you get a microcontroller and a debugger built in. But you also get:

  • An RGB LED
  • An MMA8451Q accelerometer
  • Some random thermistor thing
  • Plenty of random pull-ups, debouncing filters, push-buttons, and mysterious zero-ohm switch resistors everywhere.

In fact, of the 22 I/O pins on this board, only three are completely free of extraneous circuitry: A5, A6, and A7. This means that you have to be extremely careful when trying to do anything on this dev board other than running the pre-programmed demos.

For example, while you can certainly send and receive data from the built-in USB UART at 9600 baud, don’t think that will translate to higher bit rates or arbitrary transmitters — there’s a whopping 0.1 µF capacitor across the RX pin, which will completely destroy any high-speed communication you want to do. Why did they put that capacitor there? Because all the pins are heavily multiplexed, and in some alternate-function universe, that pin is used as a VREF capacitor.

This is the kind of stuff that absolutely kills productivity — I spent a good 30-45 minutes going back and forth between my computer and my workbench, removing zero-ohm resistors to try to isolate stuff.

And the problem is that even when they do design something properly on the board, you don’t know that they did, because it’s not in the documentation (or if it is, it would take too long to find and read). This is a dev board you can’t trust. When I couldn’t get my transmitting code to work right, I started popping off the zero-ohm switch resistors that connect the target MCU to the on-board USB-to-serial converter. That didn’t fix the problem (it turns out Freescale tri-states the USB-to-UART buffer when the port isn’t open on your computer, so it doesn’t interfere with things), but I couldn’t trust it. There’s nothing more maddening than that.

Debugging

Just like the ST-Link and a few other vendors’ on-board debuggers, the FRDM boards support running an alternative J-Link firmware, in addition to the PE Micro OpenSDA debugger firmware. I’m not really sure why you wouldn’t immediately upgrade every FRDM board you ever buy with the J-Link firmware, as it’s extremely fast, reliable, and supports unlimited breakpoints (“for evaluation only” — whatever that means?).

To swap out firmware, unplug the FRDM board and while holding down the boot button next to the USB connector, plug the board in again. Download the OpenSDA V1 version of the J-Link firmware image and copy it onto the USB MSD that appears on your computer when the FRDM board is in bootloader mode.

It took 5.5 seconds to start a debug session and break on the main() function. Debugging in MCUXpresso is completely average for an ARM development environment, with very few bells and whistles. Many of the MCUXpresso-specific debugging features only work with LPC debuggers, and not the FRDM boards from the Freescale acquisition. Strangely, it’s not clear that NXP even wants you to be in debug sessions. This is an Eclipse-based IDE, yet when you start debugging, it keeps you in the Developer perspective; if you click to switch to the Debug perspective so that you can look at memory, inspect and change variables, set breakpoints, view the disassembly, look at the call stack — you know, unimportant stuff like that — MCUXpresso will dump you back into the Developer perspective immediately. You can change this in the preferences menus, but why is it the default?

Performance

As annoying as the development ecosystem can be, it has to be said that this chip is fast. For the biquad test, I set GCC optimization to -O2 and ensured it wasn’t doing loop unrolling. The KL03 achieved a throughput of 1.542 MSPS while drawing 5.21 mA, for an efficiency of 10.45 nJ/sample.

FGPIO (the Cortex-M0+ single-cycle I/O port alias of the GPIO registers) gives you a 1-cycle store, versus regular 2-cycle STR accesses. Coupled with the two-cycle Cortex-M0+ unconditional branch instruction, this should yield a 3-cycle bit-wiggle. However, as is often the case with fast MCUs, we’re limited by the flash memory controller, which only has a 24 MHz clock.
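For reference, the bit-wiggle loop is nothing fancier than hammering the fast-GPIO toggle register in a tight loop; a rough sketch (the pin choice is arbitrary, and it assumes the pin is already muxed as GPIO):

[code language="cpp"]
/* Tightest possible bit-wiggle: one single-cycle IOPORT store plus an unconditional branch. */
FGPIOB->PDDR |= (1U << 10);       /* hypothetical pin PTB10 as an output */
while (1)
{
    FGPIOB->PTOR = (1U << 10);    /* toggle via the fast-GPIO (single-cycle IOPORT) alias */
}
[/code]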

Out of the gate, the MCU can only execute the bit-wiggle loop in 6 clock cycles, due to the default caching behavior of the flash memory controller (essentially handicapping the core to 24 MHz for this particular operation). But when prefetch speculation is enabled for both data and instruction memory, this drops to a 4-cycle bit-wiggle.
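If you want to try this yourself, I believe the relevant knob lives in the MCM’s platform control register; something along these lines should do it (bit names are from the KL03 device header, so double-check them against the reference manual):

[code language="cpp"]
/* Enable flash controller speculation for instruction fetches (clear DFCS)
   and for data accesses as well (set EFDS). MCM and the MCM_PLACR bit masks
   come from the MKL03Z4 device header. */
MCM->PLACR = (MCM->PLACR & ~MCM_PLACR_DFCS_MASK) | MCM_PLACR_EFDS_MASK;
[/code]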

Oddly, this is still a cycle longer than it should be. I’ve verified that if I run the flash and CPU clocks at the same rate (24 MHz), the 4-cycle loop drops to 3 cycles (albeit at 24 MHz, not 48). In my opinion, this indicates that the flash prefetch is unable to cache even the shortest loops. I’m not sure if there are additional steps I could take to get down to 3 cycles at 48 MHz, but the Kinetis SDK doesn’t provide any mechanism for further configuration, and the reference manual doesn’t offer any additional hints. This is very strange, as the KL03’s sister chip in this review, the KE04, executes this same code in 3 cycles at 48 MHz.

DMX-512 Receiver

I struggled over the course of a few nights to get this working on the KL03. While Processor Expert provides highly customizable callbacks and methods, KSDK has two extremes: low-level functions that manipulate registers, and super-high-level functions that have precisely one way of working. The DMX-512 receiver project was a perfect illustration of the doughnut hole in the middle that other MCUs’ code generators and peripheral libraries fill. KSDK has no high-level callback mechanism for anything other than vanilla UART transactions; there’s no API to register a callback on a framing error, which is the “hack” that makes DMX-512’s start-of-frame receivable with nothing more than a UART.

Because KSDK didn’t have this mechanism, the only way for me to implement it was to write an LPUART0_IRQHandler() function myself. If that looks like a low-level interrupt routine that gets called directly by the processor, that’s because it is. I think my knowledge gap was in forgetting that I needed to study the reference manual thoroughly to ensure I was babysitting the UART when necessary; in this case, that meant clearing any exception flags (framing errors, overrun errors, etc.) inside the interrupt. KSDK has functions to do some of this stuff, but everything comes back to the fact that there’s essentially no documentation, so it’s challenging to figure out precisely what needs to be done unless you back up and read the reference manual directly.
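To give you an idea of the shape of the thing, here’s a stripped-down sketch of such a handler, assuming the stock KSDK 2.x LPUART status-flag APIs; the DMX state machine itself (the dmx_* calls) is hypothetical and elided:

[code language="cpp"]
#include "fsl_lpuart.h"

/* Sketch of a bare-metal LPUART0 interrupt handler for DMX-512 reception.
   A framing error marks the DMX break -- i.e., the start of a new frame. */
void LPUART0_IRQHandler(void)
{
    uint32_t flags = LPUART_GetStatusFlags(LPUART0);

    if (flags & kLPUART_FramingErrorFlag)
    {
        /* Babysit the UART: clear the error flags ourselves, because nobody else will. */
        LPUART_ClearStatusFlags(LPUART0, kLPUART_FramingErrorFlag |
                                         kLPUART_RxOverrunFlag |
                                         kLPUART_NoiseErrorFlag);
        (void)LPUART_ReadByte(LPUART0);            /* discard the break byte */
        dmx_start_of_frame();                      /* hypothetical: reset the slot counter */
    }
    else if (flags & kLPUART_RxDataRegFullFlag)
    {
        dmx_store_slot(LPUART_ReadByte(LPUART0));  /* hypothetical: stash the next slot */
    }
}
[/code]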

After two nights of struggling, I finally got the UART and the timer working properly. While the KL03 blazed through the other performance measures, it struggled with this one; the lowest clock speed I could get the microcontroller down to was 4 MHz all around (core, bus, flash, peripherals). Anything lower than that, and the microcontroller started missing bytes from the DMX frame. Just like with all the other microcontrollers, I put the KL03 in a “wait” mode in the main loop to reduce power consumption; all said, the KL03 pulled 1.31 mA while receiving DMX. While I disconnected the MCU’s own VDD supply for current measurement, I haven’t finished unsoldering all the crap on this board — these power measurements might be affected by that, but only marginally (this 1.31 mA figure is similar to what the datasheet quotes).
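The “wait mode” bit, for reference, is about as simple as it sounds; here’s a sketch using the KSDK SMC driver (the exact call may vary by SDK version):

[code language="cpp"]
#include "fsl_smc.h"

int main(void)
{
    /* ...clock, pin-mux, LPUART, and TPM initialization elided... */
    while (1)
    {
        /* Core stops clocking, peripherals keep running; any enabled IRQ wakes us back up. */
        SMC_SetPowerModeWait(SMC);
    }
}
[/code]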

Code size could become a huge issue on “real” projects — this DMX-512 project was compiled with -Os and link-time optimization enabled, and it used 5,284 bytes of flash (65% of our 8 KB part) and 964 bytes of SRAM (47% of our 2 KB of RAM). I understand that a lot of this is startup code and reusable peripheral code, but it could really handicap you down the road.

Closing Thoughts

I was excited to try out the KL03 to see what Freescale has been up to. The first ARM microcontroller I ever worked on was a Freescale part; I purchased a ton of their dev boards and laid down microcontrollers from all over their catalog. My, how things have changed. In many ways, Freescale seems to be aligning its dev tools to be more similar to how other ARM vendors do things — with run-time peripheral libraries that you pass big config structures into to get up and running. This was the CMSIS idea several years ago, and it has never been fully realized at the peripheral level.

At the same time, most other ARM vendors are actually moving toward what Freescale is walking away from with Processor Expert: ST has STM32Cube, Infineon has DAVE, Atmel has START (and ASF), and Cypress has PSoC Creator.

I wouldn’t recommend this part to someone getting started with ARM microcontrollers, since there’s so much else out there in the same price range that has better peripherals, a more sane development environment, and development tools that have better out-of-the-box productivity.

At the same time, if you’re working on an ultra-low-power project that needs some computing power, and you don’t mind spending quite a bit of time learning a new platform at a fairly deep, fundamental level, this is a tough microcontroller to beat. It has some of the best compute-per-nJ performance I’ve evaluated, and it also has excellent deep sleep capabilities.