I just tried compiling your code with avr-gcc 4.9.2:
buffer[0] = PORTD;
buffer[1] = PORTD;
...
buffer[29] = PORTD;
Here is what I got:
in r24, 0x0b ; temp = PORTD – 1 cycle
sts 0x0110, r24 ; buffer[0] = temp – 2 cycles
in r24, 0x0b ; temp = PORTD – 1 cycle
sts 0x0111, r24 ; buffer[1] = temp – 2 cycles
...
That's 3 cycles per read, i.e. a 5.33 MHz reading frequency. For
some reason, the compiler didn't want to use the st X+, r24
instruction suggested in Tom Carpenter's answer. Let's try to give the
compiler a little hint, and rewrite the C code as follows:
uint8_t * p = buffer;
*p++ = PORTD;
*p++ = PORTD;
...
This generated the exact same assembly! The compiler figured out
the address of each memory write and replaced each occurrence of the
pointer p with an explicit address. To defeat this kind of
“optimization”, let's make the pointer a variable whose value is unknown
at compile time:
void fill_buffer(uint8_t *p)
{
*p++ = PORTD;
*p++ = PORTD;
...
*p++ = PORTD;
}
Here is the generated assembly:
movw r30, r24 ; Z = p (Z is the register pair r31:r30)
in r24, 0x0b ; temp = PORTD – 1 cycle
st Z, r24 ; *Z = temp – 2 cycles
in r24, 0x0b ; temp = PORTD – 1 cycle
std Z+1, r24 ; *(Z+1) = temp – 2 cycles
...
in r24, 0x0b ; temp = PORTD – 1 cycle
std Z+29, r24 ; *(Z+29) = temp – 2 cycles
ret ; return
Still 3 cycles per read. Here the compiler is using the std
(store with displacement) instruction rather than st X+ (store with
post-increment).
In the end, which instruction the compiler chooses doesn't really
matter: all the memory access instructions take two cycles. So
repeatedly transferring data from a port to RAM will cost
3 cycles per transfer, irrespective of the instruction used
for the memory write.
Now, this doesn't mean you can't read faster. The AVR CPU core has
32 general-purpose registers. Since you are only performing
30 port reads per burst, you can use the register file
as an ultra-fast temporary buffer. This is easier to do in assembly
than in C, and it costs a significant overhead in saving the registers
to the stack and restoring them afterwards, but the read burst itself
will be faster:
; declare as:
; extern "C" void fill_buffer(uint8_t *p);
.global fill_buffer
fill_buffer:
; Prologue: save registers and move the pointer.
push r2 ; save all the registers belonging to the caller:
push r3 ; - 18 registers to save (r2 – r17, r28, r29)
...
push r28 ; - 2 cycles per register
movw r30, r24 ; Z = p (Z = r31:r30 is a pointer register)
; Now we can read the port really fast.
in r0, 0x0b ; temp_0 = PORTD – 1 cycle
in r1, 0x0b ; temp_1 = PORTD – 1 cycle
...
in r29, 0x0b ; temp_29 = PORTD – 1 cycle
; Now save to RAM.
st Z+, r0 ; *Z++ = temp_0 – 2 cycles
st Z+, r1 ; *Z++ = temp_1 – 2 cycles
...
st Z+, r29 ; *Z++ = temp_29 – 2 cycles
; Epilogue: restore the registers.
pop r28 ; restore all the previously saved registers:
...
pop r3 ; - 18 registers to restore
pop r2 ; - 2 cycles per register
clr r1 ; leave r1 cleared, as required by the ABI
ret ; return
Now we are reading the port at 16 MHz: one read per cycle!
It turns out that we can convince the compiler to do exactly this.
I had to see it to believe it, but it works. Something essentially
equivalent to the above assembly can be generated from C++ like this:
// Quickly read the port into temporaries.
uint8_t temp_0 = PORTD;
uint8_t temp_1 = PORTD;
...
uint8_t temp_29 = PORTD;
// Now save to RAM.
buffer[0] = temp_0;
buffer[1] = temp_1;
...
buffer[29] = temp_29;