In which I try to remove all traces of printf from a Rust program.
Context: a really tiny Rust program
Recently, I was working on a Rust program meant to run in a severely constrained environment; for example, the binary size on disk absolutely had to be under 1 MB, and we were aiming to have it come in around 100 KB. To that end, we had a couple rules for our program:
No use of heap allocations
No use of printf-style formatting code
We made extensive use of the libc Rust crate. The general style was that Rust, with its safety checks, took care of the “business logic” of the program, and we used libc calls to ensure our interactions with the OS didn’t have any unexpected effects (like heap allocations or printf formatting).
So imagine my surprise when adding code to create pseudoterminals caused the binary to suddenly balloon in size. (By “balloon” I mean “increased by an unexpected ~10KB”.)
#![cfg_attr(not(test), no_main)]

extern crate libc;

use std::ffi::c_int;
use std::ptr;

#[cfg_attr(not(test), no_mangle)]
fn main() {
    let mut master: c_int = 0;
    let mut slave: c_int = 0;
    if 0 > unsafe {
        libc::openpty(
            &mut master,
            &mut slave,
            ptr::null_mut(),
            ptr::null(),
            ptr::null(),
        )
    } {
        panic!("openpty");
    }
    match unsafe { libc::fork() } {
        -1 => panic!("fork"),
        0 => {
            for fd in 0i32..3 {
                if 0 > unsafe { libc::dup2(slave, fd) } {
                    panic!("dup2");
                }
            }
            // Child execs a shell (or other program) that can be "driven" by
            // the parent program.
            // unsafe { libc::execl(b"/bin/sh\0".as_ptr() as *const c_char, ptr::null()) };
        }
        _ => {
            // parent communicates with child through the master side of the pty
            // read/write calls here (from stdin/stdout or a socket or whatnot)
        }
    }
}
cargo bloat is a handy tool that will tell you what exactly is taking up all that space. Using it to compile and analyze the above snippet[1], I get this output:
File .text Size Crate Name
4.1% 22.8% 2.9KiB [Unknown] fmt_fp
3.4% 18.6% 2.3KiB [Unknown] printf_core
0.6% 3.1% 403B [Unknown] vfprintf
0.6% 3.0% 391B [Unknown] static_init_tls
0.5% 3.0% 385B [Unknown] __init_libc
0.5% 2.9% 366B [Unknown] pop_arg
0.5% 2.7% 346B [Unknown] _start_c
0.4% 2.3% 300B [Unknown] openpty
0.4% 2.1% 268B [Unknown] wcrtomb
0.3% 1.9% 238B [Unknown] __strchrnul
0.3% 1.7% 221B [Unknown] __stdio_write
0.3% 1.5% 196B [Unknown] memchr
0.3% 1.4% 178B [Unknown] __lockfile
0.2% 1.4% 175B [Unknown] __fwritex
0.2% 1.3% 168B [Unknown] vsnprintf
0.2% 1.3% 162B [Unknown] __lock
0.2% 1.2% 150B [Unknown] fork
0.2% 1.2% 148B [Unknown] pad
0.2% 1.1% 143B [Unknown] fprintf
0.2% 1.1% 140B [Unknown] sn_write
3.9% 21.6% 2.7KiB And 55 smaller methods. Use -n N to show more.
18.1% 100.0% 12.5KiB .text section size, the file size is 69.1KiB
Almost 10% of the total file size is taken up by formatting code (fmt_fp, printf_core, and vfprintf). And this is a do-nothing program! In the real code, it used more space. When you’re aiming for a <100KB program, even 5–10KB is a noticeable percentage.
Why does openpty require printf?
After some experimentation[2], I realized that the culprit was the openpty call. But experimentation doesn’t tell us why. Nor does the manpage give us any hint: the only reference to string manipulation is:
if name is not NULL, the filename of the slave is returned in name.
But our program does set name to be NULL. So what gives?
In the end, I had to check musl’s source code for openpty to solve the mystery.
int openpty(int *pm, int *ps, char *name, const struct termios *tio, const struct winsize *ws)
{
    /* variable declarations... */

    m = open("/dev/ptmx", O_RDWR|O_NOCTTY);
    if (m < 0) return -1;

    pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &cs);

    if (ioctl(m, TIOCSPTLCK, &n) || ioctl (m, TIOCGPTN, &n))
        goto fail;

    if (!name) name = buf;
    snprintf(name, sizeof buf, "/dev/pts/%d", n);
    if ((s = open(name, O_RDWR|O_NOCTTY)) < 0)
        goto fail;

    /* code snipped */
}
And you can see the snprintf call as plain as day. It turns out that even if the caller doesn’t request the slave filename, openpty still needs to figure out the filename to open the file and return the file descriptor. Makes perfect sense once you think about it.
How do we get a pseudoterminal without printf?
An experienced reader may note that openpty is not the POSIX standard for opening a pseudoterminal. The POSIX standard specifies posix_openpt, grantpt, unlockpt, ptsname, and finally calling open on the slave filename. (You can see why openpty is more popular, despite being non-standard.) Unfortunately, using the POSIX standard functions doesn’t save us: ptsname[3] uses snprintf to return the filename in almost the exact same manner as openpty does.
So we’re going to have to reimplement some libc functionality to get around the formatting call. We have a couple of options:
1. Keep using openpty, but write our own openpty function, being careful to not use any formatting functions.
2. Use the POSIX standard functions, but write our own ptsname function, being careful to not use any formatting functions.
3. Write our own snprintf function, and make sure that it overrides the libc version. This can work because our version of snprintf would only handle integer conversion; formatting code is bloated because it has to handle floating point, hex, padding, pointers, etc.
Any of these will work. I chose the second option because 1) it limits the amount of code I have to write (ptsname_r is a smaller function than openpty) and 2) it doesn’t rely on redefining a libc function.
use std::ffi::{c_char, c_int, CStr};
use std::mem::MaybeUninit;

fn no_printf_ptsname_r(fd: c_int, buf: *mut c_char, buflen: libc::size_t) -> c_int {
    // The ioctl call gives us the pseudoterminal number, but then we need
    // to convert the number to text _without_ using formatting calls.
    // TIOCGPTN writes through the pointer, so it must be a mutable one.
    let mut ptsnum: c_int = 0;
    if 0 != unsafe { libc::ioctl(fd, libc::TIOCGPTN, &mut ptsnum as *mut c_int) } || ptsnum < 0 {
        return -1;
    }

    // This block of code is roughly equivalent to a very limited itoa() call.
    // We can make a couple of simplifying assumptions, such as a hard limit
    // on the size of the pseudoterminal number.
    const MAX_DIGITS_U32: usize = 10;
    let mut ptsbuf: [u8; MAX_DIGITS_U32 + 1] = [0; MAX_DIGITS_U32 + 1];
    let mut i = MAX_DIGITS_U32;
    // Always emit at least one digit, so that pty number 0 becomes "0"
    // instead of an empty string.
    loop {
        i -= 1;
        let digit = ptsnum % 10;
        // 0x30 = '0'. Depends on the character encoding being UTF-8
        // (or any other ASCII-compatible encoding)
        ptsbuf[i] = (0x30 + digit)
            .try_into()
            .expect("can't convert digit to u8");
        ptsnum /= 10;
        if ptsnum == 0 {
            break;
        }
    }

    // CString is far easier to use, but it requires a heap allocation, so
    // we use CStr instead.
    let ptsstrlen = MAX_DIGITS_U32 - i;
    let ptsstr = CStr::from_bytes_with_nul(&ptsbuf[i..]).unwrap();

    // The rest of the function is just copying bytes around so we can end up
    // with a buffer containing b"/dev/pts/<ptsnum>\0"
    let path = b"/dev/pts/\0";
    let pathlen = path.len() - 1;
    // The prefix, the digits, and the trailing NUL must all fit in buf.
    if pathlen + ptsstrlen + 1 > buflen {
        return -1;
    }
    let path = CStr::from_bytes_with_nul(path).unwrap();
    unsafe {
        path.as_ptr().copy_to(buf, pathlen);
        ptsstr.as_ptr().copy_to(buf.add(pathlen), ptsstrlen);
        *buf.add(pathlen + ptsstrlen) = '\0' as c_char;
    }
    0
}
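The digit conversion is the part most likely to hide an off-by-one, so it can help to look at it in isolation. The following is only a minimal sketch of the same technique using safe slices instead of raw pointers. The helper name write_pts_path and its Option-returning interface are hypothetical, chosen for illustration; they are not part of libc or of the program above.

```rust
/// Writes b"/dev/pts/<num>\0" into `buf` without any formatting machinery.
/// Returns the path length (excluding the trailing NUL), or None if `buf`
/// is too small. Hypothetical helper, for illustration only.
fn write_pts_path(num: u32, buf: &mut [u8]) -> Option<usize> {
    const PREFIX: &[u8] = b"/dev/pts/";
    const MAX_DIGITS_U32: usize = 10; // u32::MAX is 4294967295

    // Fill the scratch buffer from the right, least significant digit first.
    // The loop body runs at least once, so 0 is rendered as "0".
    let mut scratch = [0u8; MAX_DIGITS_U32];
    let mut start = MAX_DIGITS_U32;
    let mut n = num;
    loop {
        start -= 1;
        scratch[start] = b'0' + (n % 10) as u8;
        n /= 10;
        if n == 0 {
            break;
        }
    }
    let digits = &scratch[start..];

    // The prefix, the digits, and the trailing NUL must all fit.
    let len = PREFIX.len() + digits.len();
    if len + 1 > buf.len() {
        return None;
    }
    buf[..PREFIX.len()].copy_from_slice(PREFIX);
    buf[PREFIX.len()..len].copy_from_slice(digits);
    buf[len] = 0;
    Some(len)
}
```

Called with a 64-byte buffer and the number 42, this sketch fills the first 12 bytes with b"/dev/pts/42\0" and returns Some(11). The two cases worth testing by hand are pty number 0 and a buffer that is exactly one byte too short.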
Then we can call our POSIX standard functions to open a master and slave pty pair, substituting our custom ptsname_r function in place of the normal one.
#[cfg_attr(not(test), no_mangle)]
fn main() -> i8 {
    let master = unsafe { libc::posix_openpt(libc::O_RDWR) };
    if master < 0 {
        panic!("posix_openpt");
    }
    if 0 > unsafe { libc::grantpt(master) } {
        panic!("grantpt");
    }
    if 0 > unsafe { libc::unlockpt(master) } {
        panic!("unlockpt");
    }
    match unsafe { libc::fork() } {
        -1 => panic!("fork"),
        0 => {
            let mut slave_name: [c_char; 64] = unsafe { MaybeUninit::zeroed().assume_init() };
            no_printf_ptsname_r(master, slave_name.as_mut_ptr(), 64);
            let slave = unsafe { libc::open(slave_name.as_ptr() as *const c_char, libc::O_RDWR) };
            if slave < 0 {
                panic!("open");
            }
            for fd in 0i32..3 {
                if 0 > unsafe { libc::dup2(slave, fd) } {
                    panic!("dup2");
                }
            }
            // Child execs a shell (or other program) that can be "driven" by
            // the parent program.
            // unsafe { libc::execl(b"/bin/sh\0".as_ptr() as *const c_char, ptr::null()) };
            return 0;
        }
        _ => {
            // parent communicates with child through the master side of the pty
            // read/write calls here (from stdin/stdout or a socket or whatnot)
            return 0;
        }
    }
}
Afterwards, we can see that all formatting code has vanished from our binary and the size of our text section has more than halved, from 12.5 KiB to 5.7 KiB.
File .text Size Crate Name
1.1% 12.2% 715B [Unknown] __vdsosym
0.6% 6.7% 391B [Unknown] static_init_tls
0.6% 6.6% 387B [Unknown] __init_libc
0.6% 6.0% 351B [Unknown] fork
0.6% 5.9% 346B [Unknown] _start_c
0.5% 5.0% 291B [Unknown] main
0.4% 4.6% 269B [Unknown] __timedwait_cp
0.4% 3.8% 221B [Unknown] __stdio_write
0.3% 3.3% 191B [Unknown] _Fork
0.3% 3.0% 178B [Unknown] __lockfile
0.3% 3.0% 178B [Unknown] __lock
0.2% 2.4% 142B [Unknown] open64
0.2% 2.1% 121B [Unknown] __copy_tls
0.2% 2.0% 118B [Unknown] __pthread_rwlock_timedwrlock
0.2% 1.9% 113B [Unknown] pthread_rwlock_unlock
0.2% 1.8% 108B [Unknown] __clock_gettime
0.2% 1.8% 107B [Unknown] __init_tp
0.2% 1.8% 105B [Unknown] __do_fini
0.1% 1.5% 88B [Unknown] __timedwait
0.1% 1.3% 78B [Unknown] close_file
1.9% 20.9% 1.2KiB And 42 smaller methods. Use -n N to show more.
9.3% 100.0% 5.7KiB .text section size, the file size is 61.3KiB
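As a quick sanity check on the two cargo bloat reports, the savings can be worked out directly. This is only an illustrative calculation: the function names percent_saved and report are hypothetical, and the sizes are copied from the last line of each report in this post.

```rust
/// Percentage by which `after_kib` is smaller than `before_kib`.
fn percent_saved(before_kib: f64, after_kib: f64) -> f64 {
    (before_kib - after_kib) / before_kib * 100.0
}

/// Returns (.text savings, whole-file savings) in percent, using the
/// sizes reported by cargo bloat before and after the change.
fn report() -> (f64, f64) {
    let text = percent_saved(12.5, 5.7); // .text section, KiB
    let file = percent_saved(69.1, 61.3); // file on disk, KiB
    (text, file)
}
```

With these inputs, the .text section shrinks by roughly 54% and the file on disk by roughly 11% (7.8 KiB).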
Resources
min-sized-rust: Why all the weird flags in my example cargo commands? This GitHub repository contains a list of suggestions for making your Rust program as small as possible. I didn’t go all the way to writing no_std code, but I did use the nightly build-std feature, which recompiles Rust’s std with only the features you specify.
musl’s source code: Besides the fact that I was using musl as my chosen libc, I generally use the musl source code to check what a reference implementation of a libc function looks like, since the code is much easier to read than GNU libc.
Advanced Programming in the UNIX Environment, 3rd ed: APUE contains a fairly comprehensive discussion of pseudoterminals and how to use them in chapter 19.
Appendix
Want to check my work? Use the following script to set up a rust repository and check the cargo bloat numbers.
#!/bin/bash
rustproject="rust-remove-printf"
echo '[+] initializing rust project'
cargo init "$rustproject"
rm "$rustproject"/src/main.rs
#
# Initial state of our rust project
#
cat << EOF > $rustproject/src/before.rs
#![cfg_attr(not(test), no_main)]

extern crate libc;

use std::ffi::c_int;
use std::ptr;

#[cfg_attr(not(test), no_mangle)]
fn main() {
    let mut master: c_int = 0;
    let mut slave: c_int = 0;
    if 0 > unsafe {
        libc::openpty(
            &mut master,
            &mut slave,
            ptr::null_mut(),
            ptr::null(),
            ptr::null(),
        )
    } {
        panic!("openpty");
    }
    match unsafe { libc::fork() } {
        -1 => panic!("fork"),
        0 => {
            for fd in 0i32..3 {
                if 0 > unsafe { libc::dup2(slave, fd) } {
                    panic!("dup2");
                }
            }
            // Child execs a shell (or other program) that can be "driven" by
            // the parent program.
            // unsafe { libc::execl(b"/bin/sh\0".as_ptr() as *const c_char, ptr::null()) };
        }
        _ => {
            // parent communicates with child through the master side of the pty
            // read/write calls here (from stdin/stdout or a socket or whatnot)
        }
    }
}
EOF
#
# Rust project with our custom function
#
cat << EOF > $rustproject/src/after.rs
#![cfg_attr(not(test), no_main)]

extern crate libc;

use std::ffi::{c_char, c_int, CStr};
use std::mem::MaybeUninit;

fn no_printf_ptsname_r(fd: c_int, buf: *mut c_char, buflen: libc::size_t) -> c_int {
    // The ioctl call gives us the pseudoterminal number, but then we need
    // to convert the number to text _without_ using formatting calls.
    // TIOCGPTN writes through the pointer, so it must be a mutable one.
    let mut ptsnum: c_int = 0;
    if 0 != unsafe { libc::ioctl(fd, libc::TIOCGPTN, &mut ptsnum as *mut c_int) } || ptsnum < 0 {
        return -1;
    }

    // This block of code is roughly equivalent to a very limited itoa() call.
    // We can make a couple of simplifying assumptions, such as a hard limit
    // on the size of the pseudoterminal number.
    const MAX_DIGITS_U32: usize = 10;
    let mut ptsbuf: [u8; MAX_DIGITS_U32 + 1] = [0; MAX_DIGITS_U32 + 1];
    let mut i = MAX_DIGITS_U32;
    // Always emit at least one digit, so that pty number 0 becomes "0"
    // instead of an empty string.
    loop {
        i -= 1;
        let digit = ptsnum % 10;
        // 0x30 = '0'. Depends on the character encoding being UTF-8
        // (or any other ASCII-compatible encoding)
        ptsbuf[i] = (0x30 + digit)
            .try_into()
            .expect("can't convert digit to u8");
        ptsnum /= 10;
        if ptsnum == 0 {
            break;
        }
    }

    // CString is far easier to use, but it requires a heap allocation, so
    // we use CStr instead.
    let ptsstrlen = MAX_DIGITS_U32 - i;
    let ptsstr = CStr::from_bytes_with_nul(&ptsbuf[i..]).unwrap();

    // The rest of the function is just copying bytes around so we can end up
    // with a buffer containing b"/dev/pts/<ptsnum>\0"
    let path = b"/dev/pts/\0";
    let pathlen = path.len() - 1;
    // The prefix, the digits, and the trailing NUL must all fit in buf.
    if pathlen + ptsstrlen + 1 > buflen {
        return -1;
    }
    let path = CStr::from_bytes_with_nul(path).unwrap();
    unsafe {
        path.as_ptr().copy_to(buf, pathlen);
        ptsstr.as_ptr().copy_to(buf.add(pathlen), ptsstrlen);
        *buf.add(pathlen + ptsstrlen) = '\0' as c_char;
    }
    0
}

#[cfg_attr(not(test), no_mangle)]
fn main() -> i8 {
    let master = unsafe { libc::posix_openpt(libc::O_RDWR) };
    if master < 0 {
        panic!("posix_openpt");
    }
    if 0 > unsafe { libc::grantpt(master) } {
        panic!("grantpt");
    }
    if 0 > unsafe { libc::unlockpt(master) } {
        panic!("unlockpt");
    }
    match unsafe { libc::fork() } {
        -1 => panic!("fork"),
        0 => {
            let mut slave_name: [c_char; 64] = unsafe { MaybeUninit::zeroed().assume_init() };
            no_printf_ptsname_r(master, slave_name.as_mut_ptr(), 64);
            let slave = unsafe { libc::open(slave_name.as_ptr() as *const c_char, libc::O_RDWR) };
            if slave < 0 {
                panic!("open");
            }
            for fd in 0i32..3 {
                if 0 > unsafe { libc::dup2(slave, fd) } {
                    panic!("dup2");
                }
            }
            // Child execs a shell (or other program) that can be "driven" by
            // the parent program.
            // unsafe { libc::execl(b"/bin/sh\0".as_ptr() as *const c_char, ptr::null()) };
            return 0;
        }
        _ => {
            // parent communicates with child through the master side of the pty
            // read/write calls here (from stdin/stdout or a socket or whatnot)
            return 0;
        }
    }
}
EOF
#
# Cargo toml with custom release and bloat profiles
#
cat << EOF > $rustproject/Cargo.toml
[package]
name = "$rustproject"
version = "0.1.0"
edition = "2021"
[dependencies]
libc = "*"
[profile.release]
opt-level = "z"
lto = true
codegen-units = 1
panic = "abort"
strip = true
[profile.bloat]
inherits = "release"
strip = false
[[bin]]
name = "before"
path = "src/before.rs"
[[bin]]
name = "after"
path = "src/after.rs"
EOF
cd $rustproject
echo '[+] comparing bloat of before and after targets'
echo ' bloat with formatting'
cargo +nightly bloat --target x86_64-unknown-linux-musl --profile bloat --bin before -Zbuild-std=std,core,panic_abort -Zbuild-std-features=panic_immediate_abort 2> /dev/null
echo
echo ' bloat without formatting'
cargo +nightly bloat --target x86_64-unknown-linux-musl --profile bloat --bin after -Zbuild-std=std,core,panic_abort -Zbuild-std-features=panic_immediate_abort 2> /dev/null
[1] The cargo bloat command
cargo +nightly bloat --target x86_64-unknown-linux-musl --profile bloat --bin before -Zbuild-std=std,core,panic_abort -Zbuild-std-features=panic_immediate_abort 2> /dev/null
closely approximates the real-world program’s compilation options. See Resources and Appendix for how and why I used these options.
[2] Mostly using Rust’s todo! macro to remove code and recheck the bloat results.
[3] ptsname_r is the thread-safe version of ptsname. musl implements ptsname as a call to ptsname_r.