0x52504A's site

I have been working on an ESP32-S3 based RC car with camera streaming (OV5640 camera). I'm limited to around 5 FPS, though: when I bump the camera clock from 8 MHz to around 20 MHz, it hogs all the processing and the WiFi stack becomes bursty.

The problem seems to be a camera task created on core 0 by default. Even when not capturing frames, the VSync-triggered DMA is enough to bog down the micro when the camera is clocked too high.

The task is also running at configMAX_PRIORITIES - 2, which is nuts: it's tied with the WiFi task. I guess it makes sense because the window for DMA is really short, but in any case I wanted to test whether moving it to the other core, or at least lowering the priority, would help.

In my Arduino framework based project, I'm expecting to see:

Task	Core	Pri	Notes
ipc1	1	24	inter-processor calls, core 1
loopTask	1	1	runs setup() then loop()
IDLE1	1	0	core 1 idle
ipc0	0	24	inter-processor calls, core 0
wifi	0	23	main wi-fi task
cam_task	0	23	VSync DMA's for the camera, what we are debugging.
esp_timer	0	22	esp_timer dispatch
wifi_sta	0	23	station management (sometimes merged with wifi)
tiT	0	18	TCP/IP (lwIP) task
IDLE0	0	0	core 0 idle

The crux of the problem is that the #define CONFIG_CAMERA_CORE1 is fully ignored because Arduino-based projects come with a precompiled libesp32-camera.a binary, and they decided to compile it for core 0.

This leaves me with just a few options to change the core and priority:

1. I could figure out how to compile that myself and tell the linker to override the default .a file. Boring.
2. I could migrate the project to ESP-IDF. Painful.
3. I can hack the hell out of the .a binary. Fun. Not going to end well, but fun.

We are obviously going with option 3.

As a side note, I explored changing the core at runtime, but the FreeRTOS version that the ESP uses does not support those methods.

Tools

This is not a mainstream architecture like x86 or ARM; it's Xtensa, so we need its tooling:

AR=~/.platformio/packages/toolchain-xtensa-esp32s3/bin/xtensa-esp32s3-elf-ar
ARCHIVE=~/.platformio/packages/framework-arduinoespressif32/tools/sdk/esp32s3/lib/libesp32-camera.a
OBJDUMP=~/.platformio/packages/toolchain-xtensa-esp32s3/bin/xtensa-esp32s3-elf-objdump
AS=~/.platformio/packages/toolchain-xtensa-esp32s3/bin/xtensa-esp32s3-elf-as
READELF=~/.platformio/packages/toolchain-xtensa-esp32s3/bin/xtensa-esp32s3-elf-readelf

Now that we have their toys, we can peek inside libesp32-camera.a:

LIB=~/.platformio/packages/framework-arduinoespressif32/tools/sdk/esp32s3/lib/libesp32-camera.a
$AR -t $LIB | grep cam

cam_hal.c.obj
esp_camera.c.obj
ll_cam.c.obj

We care about cam_hal.c.obj, since cam_hal.c is the source file that creates the task. Let's copy the library into some tmp folder and work there:

# Quick reference for objdump flags:
# -t — symbol table: Lists all symbols (functions, variables) defined or referenced in the object file.
# -d — disassemble: Tells you what the code does.
# -r — relocations: Lists places where addresses need to be interpolated.
$AR -x ./libesp32-camera.a cam_hal.c.obj
$OBJDUMP -r cam_hal.c.obj | grep -i "PinnedToCore"

000000c8	R_XTENSA_32	xTaskCreatePinnedToCore
346	R_XTENSA_ASM_EXPAND	xTaskCreatePinnedToCore

You can read the whole objdump output to verify this, but the first result is in .literal and can be ignored. The one at 0x346 is in .text (the actual code), and it's the one we care about.

$OBJDUMP -d cam_hal.c.obj | grep -C 20 "346:"

...
30f:	ffaa86        	j	1bd <cam_config+0x1bd>
312:	000081        	l32r	a8, fffc0314 <cam_config+0xfffc0314>
315:	0008e0        	callx8	a8
318:	202aa0        	or	a2, a10, a10
31b:	013a16        	beqz	a10, 332 <cam_config+0x332>
31e:	000081        	l32r	a8, fffc0320 <cam_config+0xfffc0320>
321:	0008e0        	callx8	a8
324:	000021        	l32r	a2, fffc0324 <cam_config+0xfffc0324>
327:	0000b1        	l32r	a11, fffc0328 <cam_config+0xfffc0328>
32a:	1129      	s32i.n	a2, a1, 4
32c:	92a122        	movi	a2, 0x192
32f:	ffa286        	j	1bd <cam_config+0x1bd>
332:	05f8      	l32i.n	a15, a5, 0
334:	0add      	mov.n	a13, a10
336:	01a9      	s32i.n	a10, a1, 0
338:	0000c1        	l32r	a12, fffc0338 <cam_config+0xfffc0338>
33b:	0000b1        	l32r	a11, fffc033c <cam_config+0xfffc033c>
33e:	0000a1        	l32r	a10, fffc0340 <cam_config+0xfffc0340>
341:	30cff2        	addi	a15, a15, 48
344:	7e1c      	movi.n	a14, 23
346:	000081        	l32r	a8, fffc0348 <cam_config+0xfffc0348>
349:	0008e0        	callx8	a8
34c:	ff3806        	j	30 <cam_config+0x30>
...

To understand this, we first need to look at the call that pins cam_task to core 0 (and sets the priority).

xTaskCreatePinnedToCore(
  cam_task,  // Arg 1: the task
  "cam_task", // Arg 2: debug label
  CAM_TASK_STACK, // Arg 3: stack size
  NULL, // Arg 4: task parameters (none)
  configMAX_PRIORITIES - 2, // Arg 5: Priority
  &cam_obj->task_handle, // Arg 6: where to store the task handle
  0  // Arg 7: The core!
);

The ABI for Xtensa can get a bit tricky because of register windowing, but the gist is that the caller puts the first 6 arguments into a10, a11, ..., a15, and the seventh goes onto the stack (a1 is the stack pointer, so it would be sp[0] in x86 lingo). The return value comes back in a10; a0 holds the return address.
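To make that mapping concrete, here is a minimal Ruby sketch. It assumes a call8/callx8 call where every argument is a plain 32-bit word; the method name and the argument labels are made up for illustration.

```ruby
# Where a caller places arguments for a call8/callx8, as described above:
# six registers first, then 4-byte stack slots starting at a1 + 0.
# Assumption: every argument is a 32-bit word (no 64-bit values).
ARG_REGISTERS = %w[a10 a11 a12 a13 a14 a15].freeze

# Returns an array of [argument_name, location] pairs.
def call8_arg_locations(arg_names)
  arg_names.each_with_index.map do |name, i|
    if i < ARG_REGISTERS.size
      [name, ARG_REGISTERS[i]]
    else
      [name, "[a1 + #{(i - ARG_REGISTERS.size) * 4}]"]
    end
  end
end

# Illustrative labels for the seven arguments of the call above.
args = %w[cam_task name stack_size params priority task_handle core]
locations = call8_arg_locations(args)
locations.each { |name, loc| puts format('%-12s -> %s', name, loc) }
```

The two entries that matter for the rest of the post are the priority, which lands in a14, and the core, which lands in the first stack slot.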

If you want, you can also RTFM. It helps with the opcodes.

With this information we can demystify the assembly a bit (see comments on the right):

...
30f:  ffaa86      j  1bd
312:  000081      l32r  a8, fffc0314
315:  0008e0      callx8  a8
318:  202aa0      or  a2, a10, a10
31b:  013a16      beqz  a10, 332       # Skips a few lines if a10=0
31e:  000081  J┌─ l32r  a8, fffc0320
321:  0008e0  U│  callx8  a8
324:  000021  M│  l32r  a2, fffc0324
327:  0000b1  P│  l32r  a11, fffc0328
32a:  1129    E│  s32i.n  a2, a1, 4
32c:  92a122  D│  movi  a2, 0x192
32f:  ffa286   └─ j  1bd
332:  05f8        l32i.n  a15, a5, 0
334:  0add        mov.n  a13, a10
336:  01a9        s32i.n  a10, a1, 0   # <- set stack[0] (the core) to a10 (must be a 0)
338:  0000c1      l32r  a12, fffc0338
33b:  0000b1      l32r  a11, fffc033c
33e:  0000a1      l32r  a10, fffc0340
341:  30cff2      addi  a15, a15, 48
344:  7e1c        movi.n  a14, 23      # Move 23 to arg 5 (priority!)
346:  000081      l32r  a8, fffc0348
349:  0008e0      callx8  a8           # actual call to pin the task
34c:  ff3806      j  30
...

Because s32i.n does a1[0] = a10 and the core argument is 0 in the source, a10 must be 0 at that point, so the beqz always jumps over the bytes marked as "JUMPED". That area is free real estate in case we want to inject any crap in there.
The priority can be easily changed: it's a single instruction, and we only need to change its immediate value.
There are plenty more hacks possible, but those 2 should be good enough to let us test the patching.

Approach 1: Abuse the skipped bytes

We are skipping over them, so we may as well nop out the beqz and put whatever we want there. In our case, a10 = 1 (set the core to 1).

We don't even need to pad with nops, we could just jump when done:

31b:  nop             ← 3 bytes (was beqz a10, 332)
31e:  movi.n a10, 1   ← 2 bytes (fixes issue)
320:  j 332           ← 3 bytes (continue normal flow)

What?

If you don't want to RTFM again, you can quickly assemble to see the opcodes:

cat > /tmp/patch.s <<'EOF'
.text
.begin no-transform # or the assembler plays it smart and prefers short instructions
nop
nop.n
nop.n
movi.n a10, 1
j 332
.end no-transform
EOF

$AS /tmp/patch.s -o /tmp/patch.o && $OBJDUMP -d /tmp/patch.o

/tmp/patch.o:     file format elf32-xtensa-le


Disassembly of section .text:

00000000 <.text>:
   0:	0020f0        	nop
   3:	f03d      	nop.n
   5:	f03d      	nop.n
   7:	1a0c      	movi.n	a10, 1
   9:	000006        	j	0xd

Now that we have an opcode cheatsheet, and keeping in mind that this architecture is little-endian, we can plan the patch (I'm going with just nopping instead of jumping):

         beqz a10, 332 = 013a16
               ↓
            ┌─────┐ ┌─────┐┌─────┐ ┌─────┐┌─────┐ ┌──┐ ┌─────┐┌─────┐ (....normal flow)
Original: 2016 3a01 8100 00e0 0800 2100 00b1 0000 2911 22a1 9286 a2ff f805 dd0a a901 c100
Patched : 200c 1af0 2000 3df0 3df0 3df0 3df0 3df0 3df0 3df0 3df0 3df0 f805 dd0a a901 c100
            └───┘└─────┘ └──┘ └──┘ └──┘ └──┘ └──┘ └──┘ └──┘ └──┘ └──┘ (....normal flow)
           movi.n  nop   nop  nop  nop  nop  nop  nop  nop  nop  nop
           (1a0c) (long,
                  0020f0)

Where?

To figure out where the bytes live in the file, the "canonical" approach is readelf plus computing the offset:

$READELF -S cam_hal.c.obj  | grep cam_config

[42] .text.cam_config  PROGBITS        00000000 000704 00034f 00  AX  0   0  4

beqz was at 0x31b within the section, so adding the section's file offset (0x704) gives the absolute position 0xa1f. Let's check the hexdump:

xxd -s 0xA1F -l 100 cam_hal.c.obj

00000a1f: 163a 0181 0000 e008 0021 0000 b100 0029  .:.......!.....)
00000a2f: 1122 a192 86a2 fff8 05dd 0aa9 01c1 0000  ."..............
00000a3f: b100 00a1 0000 f2cf 301c 7e81 0000 e008  ........0.~.....
00000a4f: 0006 38ff 0036 4100 8100 000c 1ba8 0881  ..8..6A.........
00000a5f: 0000 e008 001d f000 0036 4100 8100 00b8  .........6A.....
00000a6f: 080c 08c2 2b12 c718 13e0 9811 8a99 a89b  ....+...........
00000a7f: d099 119a                                ....

This works, but it's super annoying because of endianness: instead of 013a16 (beqz), 000081, 0008e0, 000021, … we get 163a01, 810000, etc. But the data is there.
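Both steps are easy to get wrong by hand, so here is a small Ruby sketch of the arithmetic and the byte reversal. The constants come from the readelf and objdump output above; the helper name file_bytes is made up for illustration.

```ruby
# File offset of an instruction = section's file offset + address in the section.
SECTION_FILE_OFFSET = 0x704 # .text.cam_config, from the readelf output above
BEQZ_ADDRESS        = 0x31b # address objdump shows for the beqz

file_offset = SECTION_FILE_OFFSET + BEQZ_ADDRESS

# objdump prints each instruction as a single number, while the file stores
# its bytes least-significant first, so the byte order has to be reversed.
def file_bytes(objdump_hex)
  [objdump_hex].pack('H*').reverse
end

beqz = file_bytes('013a16')
puts format('offset 0x%x, bytes %s', file_offset, beqz.unpack1('H*'))
# prints "offset 0xa1f, bytes 163a01"
```

The result lines up with the hexdump: the dump at 0xa1f starts with 16 3a 01.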

But honestly, there is enough entropy to just search for the bytes in the binary, lol. Even that single instruction is a unique match here; you can get fancier and search for a longer sequence to rule out duplicates, but here it works fine.

bytes = File.read('cam_hal.c.obj', encoding: 'ASCII-8BIT')
beqz = [0x01, 0x3a, 0x16]
pattern = beqz.reverse.pack('C*')

puts bytes.index(pattern).to_s(16)

a1f

How?

Now we just patch (i.e. do a search and replace in the binary string):

bytes = File.read('cam_hal.c.obj', encoding: 'ASCII-8BIT')
beqz = [0x01, 0x3a, 0x16]
short_nop = [0xf0, 0x3d]
long_nop = [0x0, 0x20, 0xf0]
mov_1_to_a10 = [0x1a, 0x0c]

original_pattern = [
  beqz, [0x00,0x00,0x81], [0x00,0x08,0xe0], [0x00,0x00,0x21], [0x00,0x00,0xb1],
  [0x11,0x29], [0x92,0xa1,0x22], [0xff,0xa2,0x86]
].map(&:reverse).flatten.pack('C*')

new_pattern = (
  [] + mov_1_to_a10.reverse + long_nop.reverse + (short_nop.reverse * 9)
).pack('C*')

new_bytes = bytes.sub(original_pattern, new_pattern)

File.write('cam_hal.c.patched.obj', new_bytes)

This script is simplified: you need to make sure the pattern matches exactly once in the binary, in case of coincidences, but this is the idea. Careful with the endianness.
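For the "exactly once" part, a stricter helper could look like the sketch below. It is not the script I actually ran; patch_once is a made-up name. It refuses to patch unless the pattern occurs exactly once and both patterns have the same length, and it uses slice assignment instead of String#sub so that a backslash byte in the replacement can't be misread as a back-reference.

```ruby
# Replace `original` with `replacement` only if `original` occurs exactly once.
# Both patterns must have the same length so no other byte in the file moves.
def patch_once(bytes, original, replacement)
  unless original.bytesize == replacement.bytesize
    raise ArgumentError, 'patterns differ in length'
  end

  # Collect every position where the pattern starts (overlaps included).
  hits = []
  pos = -1
  hits << pos while (pos = bytes.index(original, pos + 1))
  raise "expected exactly 1 match, found #{hits.size}" unless hits.size == 1

  patched = bytes.dup
  patched[hits.first, original.bytesize] = replacement # plain byte overwrite
  patched
end

# Tiny synthetic example: swap a beqz-like 3-byte sequence for a long nop.
blob = "\x00\x16\x3a\x01\xff".b
out  = patch_once(blob, "\x16\x3a\x01".b, "\xf0\x20\x00".b)
puts out.unpack1('H*') # prints "00f02000ff"
```

All strings are kept in binary encoding (`.b`, or what `pack('C*')` returns), so indexes and lengths are counted in bytes.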

And?

Of course it doesn't work, did you think it would be easy?

Linking .pio/build/esp32s3-psram-opi/firmware.elf
~/.platformio/packages/framework-arduinoespressif32/tools/sdk/esp32s3/lib/libesp32-camera.a(cam_hal.c.obj): in function `cam_config':
/home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/components/esp32-camera/driver/cam_hal.c:402:(.text.cam_config+0x31b): dangerous relocation: movi.n: cannot encode: (.text.cam_config+0x332)
/home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/components/esp32-camera/driver/cam_hal.c:402:(.text.cam_config+0x31e): dangerous relocation: cannot decode instruction opcode: (.literal.cam_config+0x38)
/home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/components/esp32-camera/driver/cam_hal.c:402:(.text.cam_config+0x324): dangerous relocation: unexpected relocation: (.literal.cam_config+0x2c)
/home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/components/esp32-camera/driver/cam_hal.c:402:(.text.cam_config+0x327): dangerous relocation: unexpected relocation: (.literal.cam_init+0x4)
/home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/components/esp32-camera/driver/cam_hal.c:402:(.text.cam_config+0x32f): dangerous relocation: unexpected relocation: (.text.cam_config+0x1bd)
collect2: error: ld returned 1 exit status
*** [.pio/build/esp32s3-psram-opi/firmware.elf] Error 1
================================================================ [FAILED] Took 8.70 seconds ================================================================

What we have definitely broken is the relocations: the linker still goes to the address where the beqz used to be, expecting to replace the placeholder relative address with the final one, and instead of a beqz it finds a movi.n (which takes no address) and freaks out. The same goes for every other relocated instruction we nopped out. You could go through the relocation entries and remove the offending ones, but that's too much work. Instead, I tried other approaches.
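A quick way to see this coming is to list the relocations that fall inside the bytes you plan to overwrite before patching anything. Below is a rough Ruby sketch that parses objdump -r style lines ("OFFSET TYPE VALUE"). The offsets and targets in the sample are the ones from the linker errors above, but the R_XTENSA_SLOT0_OP type on those lines is an assumption for illustration, not copied from the real dump; relocs_in_range is a made-up name.

```ruby
# Keep the relocations whose offset lands inside the range we intend to patch.
Reloc = Struct.new(:offset, :type, :target)

def relocs_in_range(objdump_text, range)
  objdump_text.each_line.filter_map do |line|
    m = line.match(/\A\s*(\h+)\s+(R_\w+)\s+(\S+)/)
    next unless m # skip headers and blank lines

    reloc = Reloc.new(m[1].to_i(16), m[2], m[3])
    reloc if range.cover?(reloc.offset)
  end
end

# Illustrative input: offsets/targets from the linker errors, types assumed.
sample = <<~DUMP
  0000031b R_XTENSA_SLOT0_OP .text.cam_config+0x332
  0000031e R_XTENSA_SLOT0_OP .literal.cam_config+0x38
  00000324 R_XTENSA_SLOT0_OP .literal.cam_config+0x2c
  00000327 R_XTENSA_SLOT0_OP .literal.cam_init+0x4
  0000032f R_XTENSA_SLOT0_OP .text.cam_config+0x1bd
  00000346 R_XTENSA_ASM_EXPAND xTaskCreatePinnedToCore
DUMP

patched_range = (0x31b...0x332) # the 23 bytes approach 1 overwrote
hits = relocs_in_range(sample, patched_range)
hits.each { |r| puts format('0x%x %s %s', r.offset, r.type, r.target) }
```

Any hit means the linker will later try to rewrite an instruction at that offset, so those bytes are only safe to replace if the relocation is removed too.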

Approach 2: Change just the priority

We have an easy movi.n a14, 23 (7e1c) that we can change to whatever we want.

Using the assembler (or reading the encoding in the ISA doc), we can see that replacing 7e1c with ae0c changes the instruction to load 10 instead of 23 into a14.
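If you'd rather not fire up the assembler for every value, the encoding is simple enough to sketch in Ruby. The bit layout below is inferred from the three movi.n encodings that appear in this post (1a0c, 7e1c, ae0c), so treat it as an assumption and check it against the ISA doc; encode_movi_n is a made-up name, and the sketch only handles immediates from 0 to 95.

```ruby
# Encode `movi.n aREG, IMM` the way objdump prints it (e.g. "7e1c").
# Assumed layout: low nibble 0xc is the opcode, bits 4-6 hold imm[6:4],
# bits 8-11 hold the register number, bits 12-15 hold imm[3:0].
def encode_movi_n(reg, imm)
  raise ArgumentError, 'register must be 0..15' unless (0..15).cover?(reg)
  raise ArgumentError, 'this sketch only handles 0..95' unless (0..95).cover?(imm)

  word = 0xc | ((imm >> 4) << 4) | (reg << 8) | ((imm & 0xf) << 12)
  format('%04x', word)
end

puts encode_movi_n(14, 23) # prints "7e1c" (the original priority)
puts encode_movi_n(14, 10) # prints "ae0c" (the patched priority)
puts encode_movi_n(10, 1)  # prints "1a0c" (movi.n a10, 1 from approach 1)
```

Remember that this is objdump's display order; the bytes in the file are reversed (1c 7e for the original instruction).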

It actually works, as confirmed by logging with uxTaskPriorityGet! Sadly, it doesn't help much: the camera still bogs down the WiFi. I guess the interrupts it generates are independent of the task, or maybe it causes interference in some other way.

Approach 3: Change other instructions

Back to changing the core. We have other places to exploit. The tricky bit is that there is no "store this immediate to a1[0]" instruction; the value has to come from a register. So we need to find 2 instructions to change: one that loads the value into a register and one that stores it.

The other one that we can abuse is the or above. It's just setting a2 to a10 | a10 (using or as a mov, an Xtensa idiom), and a2 is only used inside the branch that we never take!

Well, the branch would be taken in case of an error, but we are rockstars and we don't have errors.

In summary,

...
30f:  ffaa86      j  1bd
312:  000081      l32r  a8, fffc0314
315:  0008e0      callx8  a8
318:  202aa0      or  a2, a10, a10     #(CHANGE) Abuse this: a2=1 instead
31b:  013a16      beqz  a10, 332       # Will skip, don't touch
31e:  000081  J┌─ l32r  a8, fffc0320
321:  0008e0  U│  callx8  a8
324:  000021  M│  l32r  a2, fffc0324
327:  0000b1  P│  l32r  a11, fffc0328
32a:  1129    E│  s32i.n  a2, a1, 4
32c:  92a122  D│  movi  a2, 0x192
32f:  ffa286   └─ j  1bd
332:  05f8        l32i.n  a15, a5, 0
334:  0add        mov.n  a13, a10
336:  01a9        s32i.n  a10, a1, 0   #(CHANGE) (core arg) stack[0]=a10 → stack[0]=a2
338:  0000c1      l32r  a12, fffc0338
33b:  0000b1      l32r  a11, fffc033c
33e:  0000a1      l32r  a10, fffc0340
341:  30cff2      addi  a15, a15, 48
344:  7e1c        movi.n  a14, 23
346:  000081      l32r  a8, fffc0348
349:  0008e0      callx8  a8           # actual call to pin the task
34c:  ff3806      j  30
...

We can assemble again and find that we need the instructions

01a022, movi a2, 1, and
0129, s32i.n a2, a1, 0

Whipping out a quick ruby script again,

bytes = File.read('cam_hal.c.obj', encoding: 'ASCII-8BIT')

OBJDUMP = "~/.platformio/packages/toolchain-xtensa-esp32s3/bin/xtensa-esp32s3-elf-objdump"

orig = %w[202aa0  013a16 000081 0008e0 000021 0000b1 1129 92a122 ffa286 05f8 0add 01a9]
orig = orig.map { |x| x.scan(/../).map {|hx| hx.to_i(16)}.reverse }.flatten.pack('C*')

new = %w[01a022 013a16 000081 0008e0 000021 0000b1 1129 92a122 ffa286 05f8 0add 0129]
new = new.map { |x| x.scan(/../).map {|hx| hx.to_i(16)}.reverse }.flatten.pack('C*')

new_bytes = bytes.sub(orig, new)

File.write('cam_hal.c.patched.obj', new_bytes)

`#{OBJDUMP} -d cam_hal.c.obj > /tmp/orig`
`#{OBJDUMP} -d cam_hal.c.patched.obj > /tmp/new`

puts `diff /tmp/orig /tmp/new`

2c2
< cam_hal.c.obj:     file format elf32-xtensa-le
---
> cam_hal.c.patched.obj:     file format elf32-xtensa-le
734c734
<  318:	202aa0        	or	a2, a10, a10
---
>  318:	01a022        	movi	a2, 1
745c745
<  336:	01a9      	s32i.n	a10, a1, 0
---
>  336:	0129      	s32i.n	a2, a1, 0

Cool, perfect. Repack all the files (everything extracted with $AR x libesp32-camera.a, then packed again with $AR rcs libesp32-camera.a *.obj, with cam_hal.c.obj replaced by the patched version), and put the new library in place of the original in the Arduino framework.

It crashes spectacularly, with fireworks and stuff:

E (4399) camera: Camera config failed with error 0x1
E (4400) gdma: gdma_disconnect(299): no peripheral is connected to the channel
Guru Meditation Error: Core  1 panic'ed (LoadProhibited). Exception was unhandled.

Core  1 register dump:
PC      : 0x42003932  PS      : 0x00060e30  A0      : 0x820037f2  A1      : 0x3fcebf10
A2      : 0x00000000  A3      : 0x00000001  A4      : 0x3fc9c6d0  A5      : 0x3fc9cb90
A6      : 0x4208a58c  A7      : 0x00ffffff  A8      : 0x82003930  A9      : 0x3fcebef0
A10     : 0x00000000  A11     : 0x3fcebf18  A12     : 0x3fc9e000  A13     : 0x00000004
A14     : 0x00000005  A15     : 0x00000003  SAR     : 0x00000016  EXCCAUSE: 0x0000001c
EXCVADDR: 0x00000002  LBEG    : 0x400556d5  LEND    : 0x400556e5  LCOUNT  : 0xfffffffb


Backtrace: 0x4200392f:0x3fcebf10 0x420037ef:0x3fcebfa0 0x4200a486:0x3fcebfd0




ELF file SHA256: 8acacbf099a9c19d

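I guess it really needs to be on core 0, no idea. One suspicious detail, though: the reported error 0x1 is exactly the value the patch leaves in a2, so cam_config may simply be returning the patched a2 as its error code. In any case it was good fun.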

--EOF