
In a program, I have a lot of arrays of strings of different lengths, and each array is declared as an array of pointers to those strings, like:

static const char * num_tab[] = {"First", "Second", "Third"};
static const char * day_tab[] = {"Sunday", "Monday", "Tuesday"};
static const char * random_tab[] = {"Strings and arrays can have", "different", "lengths"};

Pointers to the strings are returned from simple functions such as:

const char * dayName(int index) {
  return day_tab[index];
}

On the AVR architecture, those strings need to be stored in program memory. I understand that the functions need to be changed to work on AVRs as well (they need to copy the string from program memory to a buffer in RAM and return a pointer to that buffer instead).
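A minimal sketch of such a changed function, assuming avr-libc's <avr/pgmspace.h> (which provides PROGMEM and strncpy_P) and strings stored individually in flash as in the named-string approach below. The signature dayName(index, buf, size), which fills a caller-supplied buffer, is a hypothetical choice; on non-AVR targets the macros fall back to plain strncpy, so the same source builds everywhere.

```c
#include <stddef.h>
#include <string.h>

#ifdef __AVR__
#include <avr/pgmspace.h>  /* PROGMEM, strncpy_P */
#else
/* Hosted fallback: no separate program memory, so plain strncpy will do. */
#define PROGMEM
#define strncpy_P(dst, src, n) strncpy((dst), (src), (n))
#endif

/* Each string lives in flash on AVR (named, as in the question). */
static const char day0[] PROGMEM = "Sunday";
static const char day1[] PROGMEM = "Monday";
static const char day2[] PROGMEM = "Tuesday";

/* The pointer table itself stays in RAM, so it can be indexed directly. */
static const char *const day_tab[] = {day0, day1, day2};

/* Hypothetical accessor: copies string number index from flash into the
   caller's RAM buffer and returns that buffer. Requires size > 0; the
   result is always NUL-terminated (truncated if the buffer is too small). */
const char *dayName(int index, char *buf, size_t size)
{
    strncpy_P(buf, day_tab[index], size - 1);
    buf[size - 1] = '\0';
    return buf;
}
```

A caller would write something like char buf[16]; followed by dayName(1, buf, sizeof buf). Returning a pointer to one static buffer inside the function would also work, but then every call overwrites the previous result.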

How can I change the arrays' initialization to use PROGMEM, without the need to name each individual string?

The only way I found is to define each string with a name (and PROGMEM), and define an array of pointers initialized with pointers to those strings:

static const char d1[] PROGMEM = "First";
static const char d2[] PROGMEM = "Second";
static const char d3[] PROGMEM = "Third";

const char * const day_tab[] = {d1, d2, d3}; // only needs PROGMEM for large arrays

This works, but for large arrays of different sizes it turns a few lines of code into hundreds, which makes maintenance practically impossible. Also, adding or removing a value from an array requires renumbering all of the following items.
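If external scripts are ruled out, the C preprocessor can generate the individual names from a single list using an X-macro (the same technique the second answer below uses for a different layout). A sketch, assuming avr-libc's PROGMEM; DAY_LIST, day_str_* and DAY_* are hypothetical names. Adding or removing a line in DAY_LIST updates the named arrays, the pointer table and the index enum together, so nothing needs renumbering.

```c
#ifdef __AVR__
#include <avr/pgmspace.h>  /* PROGMEM */
#else
#define PROGMEM            /* hosted fallback: ordinary const data */
#endif

/* The single list to maintain: add, remove or reorder entries here only. */
#define DAY_LIST(X)        \
    X(sunday,  "Sunday")   \
    X(monday,  "Monday")   \
    X(tuesday, "Tuesday")

/* 1. One named flash array per entry: day_str_sunday, day_str_monday, ... */
#define DAY_DEFINE(name, text) static const char day_str_##name[] PROGMEM = text;
DAY_LIST(DAY_DEFINE)

/* 2. Pointer table generated from the same list (kept in RAM here). */
#define DAY_PTR(name, text) day_str_##name,
static const char *const day_tab[] = { DAY_LIST(DAY_PTR) };

/* 3. Matching indices: DAY_sunday == 0, DAY_monday == 1, ... */
#define DAY_ENUM(name, text) DAY_##name,
enum { DAY_LIST(DAY_ENUM) DAY_COUNT };
```

The strings still get names, but they are produced by the preprocessor rather than written by hand.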

  • This may be a job for a Python or Perl script (or another C program) to generate the declarations for you from a list of strings. Commented May 19, 2024 at 12:51
  • Doesn't static const char PROGMEM * const num_tab[] = { "First", "Second", "Third"}; work? Commented May 19, 2024 at 13:54
  • @TedLyngmo: no, that will put the array of pointers in progmem, and the strings will be in ram. It doesn't really matter where you put the PROGMEM macro. Commented May 19, 2024 at 16:00
  • @JohnBode: I am searching for a way to make it work without changing the code too much, or using external scripts. Commented May 19, 2024 at 16:02
  • In C, you can't. Both PROGMEM string literals and PROGMEM pointers to them must be accessed with lpm instruction, e.g. with pgm_read functions family. Commented May 19, 2024 at 18:04

2 Answers


As your question is tagged "C", GNU C and named address spaces as specified in ISO/IEC DTR 18037 may be the way to go. Compile with -std=gnu99 or higher:

#define F(X) ((const __flash char[]) { X })

static const __flash char *const __flash nums[] =
{
    F("first"), F("second"), F("third")
};

#include <stdio.h>

void print_num (int id)
{
    printf ("num = %S\n", nums[id]);
}

The generated code for the nums[] array is:

    .section    .progmem.data,"a",@progbits
    .type   nums, @object
    .size   nums, 6
nums:
    .word   __compound_literal.0
    .word   __compound_literal.1
    .word   __compound_literal.2
    .type   __compound_literal.2, @object
    .size   __compound_literal.2, 6
__compound_literal.2:
    .string "third"
    .type   __compound_literal.1, @object
    .size   __compound_literal.1, 7
__compound_literal.1:
    .string "second"
    .type   __compound_literal.0, @object
    .size   __compound_literal.0, 6
__compound_literal.0:
    .string "first"

The code to print the string uses LPM to read the pointer nums[id], which is itself in progmem; the string it points to is also in progmem and is printed using the %S format specifier:

print_num:
    lsl r24  ;  33  [c=8 l=2]  *ashlhi3_const/1
    rol r25
    movw r30,r24     ;  26  [c=4 l=1]  *movhi/0
    subi r30,lo8(-(nums))    ;  8   [c=8 l=2]  *addhi3/1
    sbci r31,hi8(-(nums))
    lpm r24,Z+   ;  27  [c=8 l=2]  *movhi/2
    lpm r25,Z+
    push r25         ;  11  [c=4 l=1]  pushqi1/0
    push r24         ;  13  [c=4 l=1]  pushqi1/0
    ldi r24,lo8(.LC0)    ;  14  [c=4 l=2]  *movhi/4
    ldi r25,hi8(.LC0)
    push r25         ;  16  [c=4 l=1]  pushqi1/0
    push r24         ;  19  [c=4 l=1]  pushqi1/0
    call printf  ;  20  [c=16 l=2]  call_value_insn/1
     ; SP += 4   ;  21  [c=4 l=4]  *addhi3_sp
    pop __tmp_reg__
    pop __tmp_reg__
    pop __tmp_reg__
    pop __tmp_reg__
    ret      ;  31  [c=0 l=1]  return

Note: You can port this to archs without __flash by means of:

#ifndef __FLASH
#define __flash /* empty */
#endif

The macro __FLASH is a builtin macro in avr-gcc and only defined when address-space __flash is present.

The %S conversion prints a string located in progmem / flash. (On conforming platforms it denotes a wide string.) So on other platforms you have to use %s instead.
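Putting the two notes together, a sketch that should build both with avr-gcc (where __FLASH is predefined, the data goes to flash and %S is used) and with a hosted GCC (where __flash expands to nothing and %s is used). FLASH_FMT and format_num are hypothetical names; snprintf is used instead of printf only so the result can be checked. As in the answer, compile as GNU C (e.g. -std=gnu99) on AVR.

```c
#include <stddef.h>
#include <stdio.h>

/* avr-gcc predefines __FLASH only when the __flash address space exists. */
#ifdef __FLASH
#define FLASH_FMT "%S"   /* avr-libc: the string argument is in flash */
#else
#define __flash          /* empty: ordinary memory on other targets */
#define FLASH_FMT "%s"
#endif

/* Each F(...) is a file-scope compound literal, placed in flash on AVR. */
#define F(X) ((const __flash char[]) { X })

static const __flash char *const __flash nums[] =
{
    F("first"), F("second"), F("third")
};

/* Formats entry id into out; returns snprintf's result. */
int format_num(char *out, size_t size, int id)
{
    return snprintf(out, size, "num = " FLASH_FMT "\n", nums[id]);
}
```

Only the format string and the address-space qualifier differ between targets; the table definition is written once.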


11 Comments

YES! This is the answer I was looking for. The question was really "how can I store arrays of strings in progmem on AVRs so the code still works on non-AVRs, with as few changes as possible". Should I edit the question? This solution has the additional bonus that the array of pointers to the strings is also stored in progmem, and the only change required in the code is wrapping the string constants with F().
Notably this wastes extra flash though, since you have to store the string literal itself as well as the pointer in flash. The linker really ought to have an option telling it how to deal with string literals, dropping them in for example .text rather than .rodata or some such.
@Lundin: That's inevitable: You either have an array with pointers to string literals that may have different lengths (and hence need an array of pointers), or you have an array of literals that all must have the same length (which might waste space when the literals have different lengths). And this is no different to archs that have .rodata in flash from the start.
I can think of a few dirty tricks to avoid that, in case flash is precious. For example you know how large each string is from the start, so you could allocate them in adjacent memory and still access them by a memory offset. Might look a bit ugly, but not much worse than the custom macro language you've invented here.
@GabrielStaples: It's not from Arduino, and I don't use Arduino. Moreover, Arduino is C++ / avr-g++, which does not support named address spaces. That's why I referred to C and the c tag of the question. That macro won't even compile as C++, so why would you even think I stole it or whatever... The F macro from Arduino is rather like the PSTR macro from AVR-LibC's pgmspace.h. And no, AVR-LibC didn't steal it from Arduino.

The AVR PROGMEM fix can be hidden behind a macro like:

#ifdef __AVR__
  #define MEM PROGMEM
#else
  #define MEM /* dummy macro */
#endif

Regarding performance and allocation:

The advantage of having a pointer-based look-up table as in the first example is execution speed - you'll be able to grab each string quickly with no run-time overhead.

The downside is that you have to allocate the pointers themselves too, so it wastes extra flash, or in the worst case RAM plus flash if the pointer table is copied from ROM to RAM during start-up.

This answer is only applicable if you really need to save flash over everything else, or if you want to centralize code maintenance in a single list. You can then cook up something evil-looking using "X macros", optionally with a name for each string:

#define STR_LIST(X)     \
  X(d1, "first")        \
  X(d2, "second")       \
  X(d3, "third")        \

You can then allocate this stuff adjacently in the same big array, like this:

static const char STRINGS[] MEM =
{
  #define STR_ALLOC(name, str) str "\0"
  STR_LIST(STR_ALLOC)
};

This uses string literal concatenation for convenience, but with manually added null terminators, since concatenation merges adjacent literals into one array with only a single terminator at the very end. It expands to:

"first" "\0" "second" "\0" "third" "\0"

And gets concatenated to:

 "first\0second\0third\0"

(We'll get an extra null terminator at the end, but that might be neat for "sentinel value" purposes.)
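A quick hosted check of that layout (the array names are hypothetical): the split literals and the single pre-concatenated literal produce the same 20 bytes, with terminators at offsets 5, 12, 18 and 19.

```c
/* Adjacent literals merge into one array with a single final '\0',
   so each separator has to be written out explicitly. */
static const char packed[] = "first" "\0" "second" "\0" "third" "\0";

/* The same bytes written as one literal, for comparison. */
static const char merged[] = "first\0second\0third\0";
```

Escape sequences are converted before adjacent literals are joined, so "\0" followed by a literal starting with a digit still yields a separate NUL byte.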


Various dirty hacks for "named" access at run time (optimizing for flash size over speed) - full code example:

#define STR_LIST(X)     \
  X(d1, "first")        \
  X(d2, "second")       \
  X(d3, "third")        \

#ifdef __AVR__
  #define MEM PROGMEM
#else
  #define MEM /* dummy macro */
#endif

static const char STRINGS[] MEM =
{
  #define STR_ALLOC(name, str) str "\0"
  STR_LIST(STR_ALLOC)
};

typedef enum
{
  #define STR_ENUM(name, str) STR_ENUM_##name,
  STR_LIST(STR_ENUM)
  
  STR_ENUM_N
} str_enum_t;

static str_enum_t key;
#define STR_COUNT(name, str) +(STR_ENUM_##name<key ? sizeof(str) : 0)

#define STR_GET_POS(name) (key=STR_ENUM_##name, STR_LIST(STR_COUNT))

#include <stdio.h>
int main (void)
{
  puts("Memory dump, | marks null terminators:");
  for(size_t i=0; i<sizeof(STRINGS); i++)
  {
    printf("%c", STRINGS[i]=='\0' ? '|' : STRINGS[i]);
  }
  puts("");puts("");

  puts("What the strings are named:");
  #define STR_PRINT_NAME(name, str) printf("%s: %s\n", #name, str);
  STR_LIST(STR_PRINT_NAME)
  puts("");

  puts("Where the strings are found:");
  printf("%s %2zu ", "d1", STR_GET_POS(d1));
  printf("%s\n", &STRINGS[STR_GET_POS(d1)]);
  printf("%s %2zu ", "d2", STR_GET_POS(d2));
  printf("%s\n", &STRINGS[STR_GET_POS(d2)]);
  printf("%s %2zu ", "d3", STR_GET_POS(d3));
  printf("%s\n", &STRINGS[STR_GET_POS(d3)]);
}

Output:

Memory dump, | marks null terminators:
first|second|third||

What the strings are named:
d1: first
d2: second
d3: third

Where the strings are found:
d1  0 first
d2  6 second
d3 13 third
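The demo reads STRINGS directly, which only works where MEM is empty; on AVR a PROGMEM array has to be read through avr-libc's pgm_read_byte (see the first comment below). A sketch of a runtime lookup by index that routes every byte read through such a macro, assuming <avr/pgmspace.h> on AVR; MEM_READ_BYTE and str_offset are hypothetical names. It walks the null terminators, so the cost grows with the total length of the preceding strings instead of being O(1), which is the trade-off this answer makes to avoid a pointer table.

```c
#include <stddef.h>

#ifdef __AVR__
#include <avr/pgmspace.h>
#define MEM PROGMEM
#define MEM_READ_BYTE(p) pgm_read_byte(p)   /* LPM read from flash */
#else
#define MEM                                 /* dummy macro */
#define MEM_READ_BYTE(p) (*(p))             /* ordinary memory read */
#endif

#define STR_LIST(X)     \
  X(d1, "first")        \
  X(d2, "second")       \
  X(d3, "third")

/* Same packed layout as the answer: "first\0second\0third\0" */
#define STR_ALLOC(name, str) str "\0"
static const char STRINGS[] MEM = { STR_LIST(STR_ALLOC) };

/* Number of strings, generated from the same list. */
#define STR_ENUM(name, str) STR_ENUM_##name,
enum { STR_LIST(STR_ENUM) STR_ENUM_N };

/* Hypothetical helper: offset of string number idx inside STRINGS,
   found by skipping idx null terminators; (size_t)-1 if out of range. */
size_t str_offset(size_t idx)
{
    size_t pos = 0;

    if (idx >= (size_t)STR_ENUM_N)
        return (size_t)-1;
    while (idx > 0) {
        if (MEM_READ_BYTE(&STRINGS[pos]) == '\0')
            idx--;              /* passed the end of one string */
        pos++;
    }
    return pos;
}
```

On a hosted build, str_offset(1) gives 6 and str_offset(2) gives 13, matching the positions printed above.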

5 Comments

So where is the PROGMEM? To access it, you need the pgm_read_xxx macros from #include <avr/pgmspace.h>. Also, you cannot access it as an array, or am I missing something?
This works if string identifier is known at compile-time (if you access &STRINGS[STR_GET_POS(d1)] with pgm_read_byte()) or by sequential access. You can't access a string by arbitrary index at runtime with O(1).
@emacsdrivesmenuts If more platform-specific stuff is necessary to access memory, then just add it. This is a plain array. You can access each string as an array, or the whole lot through STRINGS.
@dimich As was given emphasis in the answer: "This answer is only applicable in case you really need to save flash over everything else." It's a ROM size over execution speed manual optimization for the sake of saving flash.
@Lundin I understand. IIRC something similar was used in the ZX Spectrum 48K ROM for BASIC tokens, but with the MSB of the last character inverted as the end-of-word indicator instead of a NUL byte.
