As you can see here, the definition of the return value for esp_camera_fb_get() is
typedef struct {
uint8_t * buf; /*!< Pointer to the pixel data */
size_t len; /*!< Length of the buffer in bytes */
size_t width; /*!< Width of the buffer in pixels */
size_t height; /*!< Height of the buffer in pixels */
pixformat_t format; /*!< Format of the pixel data */
} camera_fb_t;
..
camera_fb_t* esp_camera_fb_get();
..
typedef enum {
PIXFORMAT_RGB565, // 2BPP/RGB565
PIXFORMAT_YUV422, // 2BPP/YUV422
PIXFORMAT_GRAYSCALE, // 1BPP/GRAYSCALE
PIXFORMAT_JPEG, // JPEG/COMPRESSED
PIXFORMAT_RGB888, // 3BPP/RGB888
PIXFORMAT_RAW, // RAW
PIXFORMAT_RGB444, // 3BP2P/RGB444
PIXFORMAT_RGB555, // 3BP2P/RGB555
} pixformat_t;
Meaning that fb->buf and fb->len hold the raw data in the format specified by pixformat_t. In other words, this is not a "string" but raw bytes, which you can still base64-encode perfectly fine.
So for the base64 library
static String encode(const uint8_t * data, size_t length);
it already accepts the right data type and you can do
camera_fb_t *fb = esp_camera_fb_get();
..
//will be allocated on the heap. The output is about 4/3 of the input size, so together with the frame buffer you need roughly 2.3x the frame size in memory
String imgDataB64 = base64::encode(fb->buf, fb->len);
//add to a JSON object with the metadata width, height and format so that it can be decoded
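The JSON step mentioned in the last comment could look roughly like the sketch below. It is plain C++ with std::string so that it can be tried off-device; frameToJson and the field names "width", "height", "format" and "data" are made up for illustration and are not part of the camera driver or the base64 library.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: wraps an already base64-encoded frame in a JSON object
// together with the metadata the receiver needs to rebuild the image.
// `format` is the numeric value of fb->format (a pixformat_t enumerator).
std::string frameToJson(const std::string &dataB64, std::size_t width,
                        std::size_t height, int format) {
    std::string json;
    json.reserve(dataB64.size() + 64); // one allocation for payload + metadata
    json += "{\"width\":";
    json += std::to_string(width);
    json += ",\"height\":";
    json += std::to_string(height);
    json += ",\"format\":";
    json += std::to_string(format);
    // The base64 alphabet (A-Z, a-z, 0-9, '+', '/', '=') needs no JSON escaping.
    json += ",\"data\":\"";
    json += dataB64;
    json += "\"}";
    return json;
}
```

On the device this would be called as something like frameToJson(imgDataB64.c_str(), fb->width, fb->height, (int)fb->format). Keep in mind that the JSON string is yet another copy of the encoded image, so a JSON library you already use (for example ArduinoJson) or streaming the pieces directly to the client may be the better choice when memory is tight.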
You should output the fb->format value to check what format the data is in, and add it together with the width and height so that the image can be reconstructed on the other side. Beware of the high memory requirements, since the base64 encoding creates a new buffer to hold the encoded representation. That could be optimized by writing the framebuffer data into an initially bigger buffer which is then transformed in place, but that would have to be changed at the image driver level.