Removing Printf Code from a Rust Program

Posted on September 2, 2022

In which I try to remove all traces of printf from a Rust program.

Context: a really tiny Rust program

Recently, I was working on a Rust program meant to run in a severely constrained environment; for example, the binary size on disk absolutely had to be under 1 MB, and we were aiming to have it come in around 100 KB. To that end, we had a couple of rules for our program:

  • No use of heap allocations

  • No use of printf-style formatting code

We made extensive use of the libc Rust crate. The general style was that Rust, with its safety checks, took care of the “business logic” of the program, and we used libc calls to ensure our interactions with the OS didn’t have any unexpected effects (like heap allocations or printf formatting).
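As an illustration of that style, here is a minimal sketch of formatting-free output. It assumes that under the no-formatting rule even Rust’s println! (which goes through core::fmt) is off the table. The helper name write_all_raw is hypothetical, and write(2) is declared by hand so the sketch needs no crates. With the libc crate you would call libc::write instead.

```rust
use std::ffi::{c_int, c_void};

// Hand-written declaration of write(2) so the sketch compiles without any
// crates; with the libc crate you would call libc::write instead.
unsafe extern "C" {
    fn write(fd: c_int, buf: *const c_void, count: usize) -> isize;
}

/// Writes all of `msg` to `fd` using raw write(2) calls: no println!,
/// no core::fmt, no heap. Returns false if write(2) reports an error
/// (EINTR is not retried in this sketch) or makes no progress.
fn write_all_raw(fd: c_int, msg: &[u8]) -> bool {
    let mut rest = msg;
    while !rest.is_empty() {
        // Safety: the pointer and length come from a live byte slice.
        let n = unsafe { write(fd, rest.as_ptr() as *const c_void, rest.len()) };
        if n <= 0 {
            return false;
        }
        // write(2) may accept fewer bytes than requested; continue with the rest.
        rest = &rest[n as usize..];
    }
    true
}
```

A call such as write_all_raw(2, b"openpty failed\n") reports an error with a fixed byte string, so no formatting code is ever linked in for it.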

So imagine my surprise when adding code to create pseudoterminals caused the binary to suddenly balloon in size. (By “balloon” I mean “increased by an unexpected ~10KB”.)

#![cfg_attr(not(test), no_main)]
extern crate libc;

use std::ffi::c_int;
use std::ptr;

#[cfg_attr(not(test), no_mangle)]
fn main() {
    let mut master: c_int = 0;
    let mut slave: c_int = 0;
    if 0 > unsafe {
        libc::openpty(
            &mut master,
            &mut slave,
            ptr::null_mut(),
            ptr::null(),
            ptr::null(),
        )
    } {
        panic!("openpty");
    }
    match unsafe { libc::fork() } {
        -1 => panic!("fork"),
        0 => {
            for fd in 0i32..3 {
                if 0 > unsafe { libc::dup2(slave, fd) } {
                    panic!("dup2");
                }
            }
            // Child execs a shell (or other program) that can be "driven" by
            // the parent program.
            // unsafe { libc::execl(b"/bin/sh\0".as_ptr() as *const c_char, ptr::null()) };
        }
        _ => {
            // parent communicates with child through the master side of the pty
            // read/write calls here (from stdin/stdout or a socket or whatnot)
        }
    }
}

cargo bloat is a handy tool that tells you exactly what is taking up all that space. Using it to compile and analyze the above snippet[1], I get this output:

     File  .text    Size     Crate Name
     4.1%  22.8%  2.9KiB [Unknown] fmt_fp
     3.4%  18.6%  2.3KiB [Unknown] printf_core
     0.6%   3.1%    403B [Unknown] vfprintf
     0.6%   3.0%    391B [Unknown] static_init_tls
     0.5%   3.0%    385B [Unknown] __init_libc
     0.5%   2.9%    366B [Unknown] pop_arg
     0.5%   2.7%    346B [Unknown] _start_c
     0.4%   2.3%    300B [Unknown] openpty
     0.4%   2.1%    268B [Unknown] wcrtomb
     0.3%   1.9%    238B [Unknown] __strchrnul
     0.3%   1.7%    221B [Unknown] __stdio_write
     0.3%   1.5%    196B [Unknown] memchr
     0.3%   1.4%    178B [Unknown] __lockfile
     0.2%   1.4%    175B [Unknown] __fwritex
     0.2%   1.3%    168B [Unknown] vsnprintf
     0.2%   1.3%    162B [Unknown] __lock
     0.2%   1.2%    150B [Unknown] fork
     0.2%   1.2%    148B [Unknown] pad
     0.2%   1.1%    143B [Unknown] fprintf
     0.2%   1.1%    140B [Unknown] sn_write
     3.9%  21.6%  2.7KiB           And 55 smaller methods. Use -n N to show more.
    18.1% 100.0% 12.5KiB           .text section size, the file size is 69.1KiB

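Almost 10% of the total file size is taken up by formatting code: fmt_fp, printf_core, and vfprintf alone account for 8.1%, and printf helpers such as pop_arg, wcrtomb, vsnprintf, pad, fprintf, and sn_write push the total close to 10%. And this is a do-nothing program! In the real program, the formatting code took up even more space. When you’re aiming for a <100KB program, even 5–10KB is a noticeable percentage.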

Why does openpty require printf?

After some experimentation[2], I realized that the culprit was the openpty call. But experimentation doesn’t tell us why. Nor does the manpage give us any hint; its only reference to string manipulation is:

if name is not NULL, the filename of the slave is returned in name.

But our program does set name to be NULL. So what gives?

In the end, I had to check musl’s source code for openpty to solve the mystery.

int openpty(int *pm, int *ps, char *name, const struct termios *tio, const struct winsize *ws)
{
    /* variable declarations... */
	m = open("/dev/ptmx", O_RDWR|O_NOCTTY);
	if (m < 0) return -1;

	pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &cs);

	if (ioctl(m, TIOCSPTLCK, &n) || ioctl (m, TIOCGPTN, &n))
		goto fail;

	if (!name) name = buf;
	snprintf(name, sizeof buf, "/dev/pts/%d", n);
	if ((s = open(name, O_RDWR|O_NOCTTY)) < 0)
		goto fail;
    /* code snipped */
}

And you can see the snprintf call as plain as day. It turns out that even if the caller doesn’t request the slave filename, openpty still needs to figure out the filename to open the file and return the file descriptor. Makes perfect sense once you think about it.
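What that snprintf call produces is simple: "/dev/pts/" followed by the pty number in decimal. As a rough illustration of how little code that takes when only a non-negative integer needs converting, here is a sketch in safe Rust. The name pts_path is hypothetical, and treating the number as a u32 is an assumption (the kernel reports it as an unsigned int).

```rust
/// Writes "/dev/pts/<n>" plus a trailing NUL into `out`, using only integer
/// arithmetic and slice copies (no snprintf, no core::fmt, no heap).
/// Returns the path length excluding the NUL, or None if `out` is too small.
fn pts_path(n: u32, out: &mut [u8]) -> Option<usize> {
    const PREFIX: &[u8] = b"/dev/pts/";
    // u32::MAX is 4294967295, so 10 digits always suffice.
    let mut digits = [0u8; 10];
    let mut i = digits.len();
    let mut rest = n;
    // Emit digits least-significant first; loop at least once so 0 becomes "0".
    loop {
        i -= 1;
        digits[i] = b'0' + (rest % 10) as u8;
        rest /= 10;
        if rest == 0 {
            break;
        }
    }
    let num = &digits[i..];
    let len = PREFIX.len() + num.len();
    // Need room for the prefix, the digits, and the NUL terminator.
    if len + 1 > out.len() {
        return None;
    }
    out[..PREFIX.len()].copy_from_slice(PREFIX);
    out[PREFIX.len()..len].copy_from_slice(num);
    out[len] = 0;
    Some(len)
}
```

Because it writes into a caller-supplied buffer, it needs neither a heap allocation nor any general-purpose formatting machinery.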

How do we get a pseudoterminal without printf?

An experienced reader may note that openpty is not the POSIX standard way to open a pseudoterminal. POSIX specifies a sequence of calls instead: posix_openpt, grantpt, unlockpt, ptsname, and finally open on the slave filename. (You can see why openpty is more popular, despite being non-standard.) Unfortunately, using the POSIX standard functions doesn’t save us: ptsname[3] uses snprintf to return the filename in almost exactly the same manner as openpty does.
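For comparison, the standard sequence can be sketched in Rust as follows. This is a sketch under several assumptions: Linux, hand-written extern declarations in place of the libc crate, hard-coded O_RDWR/O_NOCTTY values as used on x86_64 and aarch64 Linux, and a hypothetical function name. It uses ptsname_r, the reentrant variant that glibc and musl provide, so the name lands in a caller-owned buffer; on musl, that call is exactly what drags snprintf back in.

```rust
use std::ffi::{c_char, c_int};

// Hand-written declarations so the sketch needs no crates; the libc crate
// exposes the same functions.
unsafe extern "C" {
    fn posix_openpt(flags: c_int) -> c_int;
    fn grantpt(fd: c_int) -> c_int;
    fn unlockpt(fd: c_int) -> c_int;
    fn ptsname_r(fd: c_int, buf: *mut c_char, buflen: usize) -> c_int;
    fn open(path: *const c_char, flags: c_int, ...) -> c_int;
    fn close(fd: c_int) -> c_int;
}

// Assumption: Linux on x86_64/aarch64, where these are the flag values.
const O_RDWR: c_int = 0o2;
const O_NOCTTY: c_int = 0o400;

/// posix_openpt -> grantpt -> unlockpt -> ptsname_r -> open.
/// Returns (master fd, slave fd, NUL-terminated slave path) or None on failure.
fn open_pty_pair_standard() -> Option<(c_int, c_int, [c_char; 64])> {
    unsafe {
        let master = posix_openpt(O_RDWR | O_NOCTTY);
        if master < 0 {
            return None;
        }
        let mut name = [0 as c_char; 64];
        // On musl, the ptsname_r call is where snprintf gets linked in.
        if grantpt(master) != 0
            || unlockpt(master) != 0
            || ptsname_r(master, name.as_mut_ptr(), name.len()) != 0
        {
            close(master);
            return None;
        }
        let slave = open(name.as_ptr(), O_RDWR | O_NOCTTY);
        if slave < 0 {
            close(master);
            return None;
        }
        Some((master, slave, name))
    }
}
```

Every step returns a file descriptor or status code except ptsname_r, which is the only one that has to produce text.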

So we’re going to have to reimplement some libc functionality to get around the formatting call. We have a few options:

  1. Keep using openpty, but write our own openpty function, being careful to not use any formatting functions.

  2. Use the POSIX standard functions, but write our own ptsname function, being careful to not use any formatting functions.

  3. Write our own snprintf function, and make sure that it overrides the libc version. This can work because our version of snprintf would only handle integer conversion; libc’s formatting code is bloated because it has to handle floating point, hex, padding, pointers, etc.

Any of these will work. I chose the second option because 1) it limits the amount of code I have to write (ptsname_r is a smaller function than openpty), and 2) it doesn’t rely on redefining a libc function.

use std::ffi::{c_char, c_int, c_uint, CStr};

fn no_printf_ptsname_r(fd: c_int, buf: *mut c_char, buflen: libc::size_t) -> c_int {
    // The ioctl call gives us the pseudoterminal number, but then we need
    // to convert the number to text _without_ using formatting calls.
    // The kernel writes an unsigned int through the pointer, so pass a
    // mutable pointer (not a shared reference) to a c_uint.
    let mut ptsnum: c_uint = 0;
    if 0 != unsafe { libc::ioctl(fd, libc::TIOCGPTN, &mut ptsnum as *mut c_uint) } {
        return -1;
    }
    // This block of code is roughly equivalent to a very limited itoa() call.
    // We can make a couple of simplifying assumptions, such as a hard limit
    // on the size of the pseudoterminal number (u32::MAX has 10 digits).
    const MAX_DIGITS_U32: usize = 10;
    // The extra last byte stays 0 and serves as the NUL terminator.
    let mut ptsbuf: [u8; MAX_DIGITS_U32 + 1] = [0; MAX_DIGITS_U32 + 1];
    let mut i = MAX_DIGITS_U32;
    // Emit at least one digit, so that pty number 0 becomes "0", not "".
    loop {
        i -= 1;
        // b'0' is 0x30 in ASCII (and therefore in UTF-8).
        ptsbuf[i] = b'0' + (ptsnum % 10) as u8;
        ptsnum /= 10;
        if ptsnum == 0 {
            break;
        }
    }
    // CString is far easier to use, but it requires a heap allocation, so
    // we use CStr instead.
    let ptsstrlen = MAX_DIGITS_U32 - i;
    let ptsstr = CStr::from_bytes_with_nul(&ptsbuf[i..]).unwrap();

    // The rest of the function is just copying bytes around so we can end up
    // with a buffer containing b"/dev/pts/<ptsnum>\0"
    let path = b"/dev/pts/\0";
    let pathlen = path.len() - 1;
    // Leave room for the prefix, the digits, and the trailing NUL.
    if pathlen + ptsstrlen + 1 > buflen {
        return -1;
    }
    let path = CStr::from_bytes_with_nul(path).unwrap();
    unsafe {
        path.as_ptr().copy_to(buf, pathlen);
        ptsstr.as_ptr().copy_to(buf.add(pathlen), ptsstrlen);
        *buf.add(pathlen + ptsstrlen) = 0;
    }
    0
}

Then we can call the POSIX standard functions to open a master and slave pty pair, substituting our custom ptsname_r function for the normal one.


#[cfg_attr(not(test), no_mangle)]
fn main() -> i8 {
    let master = unsafe { libc::posix_openpt(libc::O_RDWR) };
    if master < 0 {
        panic!("posix_openpt");
    }
    if 0 > unsafe { libc::grantpt(master) } {
        panic!("grantpt");
    }
    if 0 > unsafe { libc::unlockpt(master) } {
        panic!("unlockpt");
    }
    match unsafe { libc::fork() } {
        -1 => panic!("fork"),
        0 => {
            let mut slave_name: [c_char; 64] = [0; 64];
            // On failure the buffer holds no usable path, so check the result.
            if 0 != no_printf_ptsname_r(master, slave_name.as_mut_ptr(), slave_name.len()) {
                panic!("ptsname");
            }
            let slave = unsafe { libc::open(slave_name.as_ptr(), libc::O_RDWR) };
            if slave < 0 {
                panic!("open");
            }
            for fd in 0i32..3 {
                if 0 > unsafe { libc::dup2(slave, fd) } {
                    panic!("dup2");
                }
            }
            // Child execs a shell (or other program) that can be "driven" by
            // the parent program.
            // unsafe { libc::execl(b"/bin/sh\0".as_ptr() as *const c_char, ptr::null()) };
            return 0;
        }
        _ => {
            // parent communicates with child through the master side of the pty
            // read/write calls here (from stdin/stdout or a socket or whatnot)
            return 0;
        }
    }
}

Afterwards, we can see that all formatting code has vanished from our binary: the text section has more than halved, from 12.5 KiB to 5.7 KiB, and the file size dropped from 69.1 KiB to 61.3 KiB.

     File  .text   Size     Crate Name
     1.1%  12.2%   715B [Unknown] __vdsosym
     0.6%   6.7%   391B [Unknown] static_init_tls
     0.6%   6.6%   387B [Unknown] __init_libc
     0.6%   6.0%   351B [Unknown] fork
     0.6%   5.9%   346B [Unknown] _start_c
     0.5%   5.0%   291B [Unknown] main
     0.4%   4.6%   269B [Unknown] __timedwait_cp
     0.4%   3.8%   221B [Unknown] __stdio_write
     0.3%   3.3%   191B [Unknown] _Fork
     0.3%   3.0%   178B [Unknown] __lockfile
     0.3%   3.0%   178B [Unknown] __lock
     0.2%   2.4%   142B [Unknown] open64
     0.2%   2.1%   121B [Unknown] __copy_tls
     0.2%   2.0%   118B [Unknown] __pthread_rwlock_timedwrlock
     0.2%   1.9%   113B [Unknown] pthread_rwlock_unlock
     0.2%   1.8%   108B [Unknown] __clock_gettime
     0.2%   1.8%   107B [Unknown] __init_tp
     0.2%   1.8%   105B [Unknown] __do_fini
     0.1%   1.5%    88B [Unknown] __timedwait
     0.1%   1.3%    78B [Unknown] close_file
     1.9%  20.9% 1.2KiB           And 42 smaller methods. Use -n N to show more.
     9.3% 100.0% 5.7KiB           .text section size, the file size is 61.3KiB

Resources

min-sized-rust: Why all the weird flags in my example cargo commands? This GitHub repository contains a list of suggestions for making your Rust program as small as possible. I didn’t go all the way to writing no_std code, but I did use the nightly build-std feature, which recompiles Rust’s std from source with only the features you specify.

musl’s source code: Besides being the libc I was using, musl is where I generally go to check what the implementation of a libc function looks like, since its code is much easier to read than GNU libc’s.

Advanced Programming in the UNIX Environment, 3rd ed.: APUE contains a fairly comprehensive discussion of pseudoterminals and how to use them in chapter 19.

Appendix

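Want to check my work? Use the following script to set up a Rust project and check the cargo bloat numbers. It assumes that a nightly toolchain with the rust-src component (needed by -Zbuild-std) and the cargo-bloat subcommand are installed.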

#!/bin/bash
rustproject="rust-remove-printf"

echo '[+] initializing rust project'
cargo init "$rustproject"
rm "$rustproject"/src/main.rs
#
# Initial state of our rust project
#
cat << EOF > $rustproject/src/before.rs
#![cfg_attr(not(test), no_main)]
extern crate libc;

use std::ffi::c_int;
use std::ptr;

#[cfg_attr(not(test), no_mangle)]
fn main() {
    let mut master: c_int = 0;
    let mut slave: c_int = 0;
    if 0 > unsafe {
        libc::openpty(
            &mut master,
            &mut slave,
            ptr::null_mut(),
            ptr::null(),
            ptr::null(),
        )
    } {
        panic!("openpty");
    }
    match unsafe { libc::fork() } {
        -1 => panic!("fork"),
        0 => {
            for fd in 0i32..3 {
                if 0 > unsafe { libc::dup2(slave, fd) } {
                    panic!("dup2");
                }
            }
            // Child execs a shell (or other program) that can be "driven" by
            // the parent program.
            // unsafe { libc::execl(b"/bin/sh\0".as_ptr() as *const c_char, ptr::null()) };
        }
        _ => {
            // parent communicates with child through the master side of the pty
            // read/write calls here (from stdin/stdout or a socket or whatnot)
        }
    }
}
EOF
#
# Rust project with our custom function
#
cat << EOF > $rustproject/src/after.rs
#![cfg_attr(not(test), no_main)]
extern crate libc;

use std::ffi::{c_char, c_int, c_uint, CStr};

fn no_printf_ptsname_r(fd: c_int, buf: *mut c_char, buflen: libc::size_t) -> c_int {
    // The ioctl call gives us the pseudoterminal number, but then we need
    // to convert the number to text _without_ using formatting calls.
    // The kernel writes an unsigned int through the pointer, so pass a
    // mutable pointer (not a shared reference) to a c_uint.
    let mut ptsnum: c_uint = 0;
    if 0 != unsafe { libc::ioctl(fd, libc::TIOCGPTN, &mut ptsnum as *mut c_uint) } {
        return -1;
    }
    // This block of code is roughly equivalent to a very limited itoa() call.
    // We can make a couple of simplifying assumptions, such as a hard limit
    // on the size of the pseudoterminal number (u32::MAX has 10 digits).
    const MAX_DIGITS_U32: usize = 10;
    // The extra last byte stays 0 and serves as the NUL terminator.
    let mut ptsbuf: [u8; MAX_DIGITS_U32 + 1] = [0; MAX_DIGITS_U32 + 1];
    let mut i = MAX_DIGITS_U32;
    // Emit at least one digit, so that pty number 0 becomes "0", not "".
    loop {
        i -= 1;
        // b'0' is 0x30 in ASCII (and therefore in UTF-8).
        ptsbuf[i] = b'0' + (ptsnum % 10) as u8;
        ptsnum /= 10;
        if ptsnum == 0 {
            break;
        }
    }
    // CString is far easier to use, but it requires a heap allocation, so
    // we use CStr instead.
    let ptsstrlen = MAX_DIGITS_U32 - i;
    let ptsstr = CStr::from_bytes_with_nul(&ptsbuf[i..]).unwrap();

    // The rest of the function is just copying bytes around so we can end up
    // with a buffer containing b"/dev/pts/<ptsnum>\0"
    let path = b"/dev/pts/\0";
    let pathlen = path.len() - 1;
    // Leave room for the prefix, the digits, and the trailing NUL.
    if pathlen + ptsstrlen + 1 > buflen {
        return -1;
    }
    let path = CStr::from_bytes_with_nul(path).unwrap();
    unsafe {
        path.as_ptr().copy_to(buf, pathlen);
        ptsstr.as_ptr().copy_to(buf.add(pathlen), ptsstrlen);
        *buf.add(pathlen + ptsstrlen) = 0;
    }
    0
}

#[cfg_attr(not(test), no_mangle)]
fn main() -> i8 {
    let master = unsafe { libc::posix_openpt(libc::O_RDWR) };
    if master < 0 {
        panic!("posix_openpt");
    }
    if 0 > unsafe { libc::grantpt(master) } {
        panic!("grantpt");
    }
    if 0 > unsafe { libc::unlockpt(master) } {
        panic!("unlockpt");
    }
    match unsafe { libc::fork() } {
        -1 => panic!("fork"),
        0 => {
            let mut slave_name: [c_char; 64] = [0; 64];
            // On failure the buffer holds no usable path, so check the result.
            if 0 != no_printf_ptsname_r(master, slave_name.as_mut_ptr(), slave_name.len()) {
                panic!("ptsname");
            }
            let slave = unsafe { libc::open(slave_name.as_ptr(), libc::O_RDWR) };
            if slave < 0 {
                panic!("open");
            }
            for fd in 0i32..3 {
                if 0 > unsafe { libc::dup2(slave, fd) } {
                    panic!("dup2");
                }
            }
            // Child execs a shell (or other program) that can be "driven" by
            // the parent program.
            // unsafe { libc::execl(b"/bin/sh\0".as_ptr() as *const c_char, ptr::null()) };
            return 0;
        }
        _ => {
            // parent communicates with child through the master side of the pty
            // read/write calls here (from stdin/stdout or a socket or whatnot)
            return 0;
        }
    }
}
EOF
#
# Cargo toml with custom release and bloat profiles
#
cat << EOF > $rustproject/Cargo.toml
[package]
name = "$rustproject"
version = "0.1.0"
edition = "2021"

[dependencies]
libc = "*"

[profile.release]
opt-level = "z"
lto = true
codegen-units = 1
panic = "abort"
strip = true

[profile.bloat]
inherits = "release"
strip = false

[[bin]]
name = "before"
path = "src/before.rs"

[[bin]]
name = "after"
path = "src/after.rs"
EOF

cd $rustproject
echo '[+] comparing bloat of before and after targets'
echo '    bloat with formatting'
cargo +nightly bloat --target x86_64-unknown-linux-musl --profile bloat --bin before -Zbuild-std=std,core,panic_abort -Zbuild-std-features=panic_immediate_abort 2> /dev/null
echo
echo '    bloat without formatting'
cargo +nightly bloat --target x86_64-unknown-linux-musl --profile bloat --bin after -Zbuild-std=std,core,panic_abort -Zbuild-std-features=panic_immediate_abort 2> /dev/null

Footnotes

  1. The cargo bloat command

    cargo +nightly bloat --target x86_64-unknown-linux-musl --profile bloat --bin before -Zbuild-std=std,core,panic_abort -Zbuild-std-features=panic_immediate_abort 2> /dev/null

    closely approximates the real-world program’s compilation options. See Resources and Appendix for how and why I used these options.

  2. Mostly using Rust’s todo! macro to remove code and recheck the bloat results.

  3. ptsname_r is the thread-safe version of ptsname. musl implements ptsname as a call to ptsname_r.