Because bzip2 was written for 32-bit machines, and yet it must handle 64-bit files, the C code does not use "long" nor "size_t" anywhere (they would be 32-bit on those platforms).

This should clean up nicely in Rust, but the code will have to be careful not to convert file sizes to usize!

@federicomena Porting the code to C99 would already likely improve its readability. Relying on `size_t` and sized integer types in a code base that deals with files, parsing, and binary data is kind of a must.

@ebassi @federicomena Hi. I'm new to this. Why is using size_t a must for parsing files?

@aeveltstra @federicomena size_t is defined to be able to hold the size of the largest representable object; its width is platform-defined, and can be bigger or smaller than an unsigned integer. If you're reading and writing data on disk or over the wire, you should always strive to use types that have a well-defined size.

@ebassi @federicomena Understood. What if you have no good way to predict file size because the program is portable and could be used to read small as well as gigantic files? Wouldn't that make a choice for a defined data type ill-fit?

@aeveltstra @federicomena size_t is for sizes in memory. If you're looking at file sizes and offsets into them, always use off_t, as it can be resolved to 64 bit sizes even for 32 bit applications.

@ebassi @aeveltstra Yup. For example, note how fs::Metadata::len() explicitly returns u64 in Rust.

In C, size_t fits a pointer, so it depends on the architecture - it's the same as Rust's usize.

In C, off_t fits file sizes, for example, it's the type of struct stat's st_size field. For compatibility with old code, -D_FILE_OFFSET_BITS=64 makes the system's header files turn off_t into a 64-bit type, and stat into stat64 - which one should basically always use nowadays.
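On the Rust side, the same split can be sketched roughly as follows: keep file lengths as u64, and only move to usize through a checked conversion. The helper names `file_len` and `len_as_usize` are made up for illustration; `fs::metadata`, `Metadata::len()` and `usize::try_from` are the standard library calls.

```rust
use std::convert::TryFrom;
use std::fs;
use std::io;
use std::path::Path;

/// Length of the file as the OS reports it. This is a u64 on every
/// platform, including 32-bit ones. (Hypothetical helper name.)
fn file_len(path: &Path) -> io::Result<u64> {
    Ok(fs::metadata(path)?.len())
}

/// Convert a file length to an in-memory size without truncating.
/// Returns None when the length does not fit in this platform's usize,
/// which can happen on 32-bit targets. (Hypothetical helper name.)
fn len_as_usize(len: u64) -> Option<usize> {
    usize::try_from(len).ok()
}
```

A caller would treat `None` as "too big to hold in memory at once" and fall back to reading the file in chunks, rather than writing `len as usize`, which silently drops the high bits.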

@ebassi @aeveltstra One particularly nasty bit is that mmap() is

void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset);

Which means that on a 32-bit system with size_t being 32-bit and off_t being 64-bit, you can only map up to a 4 GB window at a time from a huge file. Code which blindly does something like

fd = open(...);
fstat (fd, &st);
p = mmap(NULL, st.st_size, ..., fd, 0);

is broken, since passing st.st_size like that will truncate the off_t.
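To make the truncation concrete, here is a small Rust sketch that simulates a 32-bit size_t with u32, so it behaves the same on any machine. `Size32`, `truncating_len` and `window_len` are hypothetical names for this illustration, and nothing here actually calls mmap().

```rust
use std::convert::TryFrom;

/// Stand-in for a 32-bit size_t (assumption made for illustration).
type Size32 = u32;

/// What the unchecked call effectively does to the length:
/// the cast keeps only the low 32 bits of the 64-bit file size.
fn truncating_len(st_size: u64) -> Size32 {
    st_size as Size32
}

/// Length of the next window that could be mapped starting at `offset`,
/// clamped to the largest value a 32-bit size can express.
/// Returns None when `offset` lies past the end of the file.
fn window_len(file_len: u64, offset: u64) -> Option<Size32> {
    let remaining = file_len.checked_sub(offset)?;
    Some(Size32::try_from(remaining).unwrap_or(Size32::MAX))
}
```

For a 5 GiB file, `truncating_len` yields 1 GiB, so the naive call would quietly map only the first gigabyte. `window_len` instead reports how much can be mapped from a given offset; a real mapping loop would also have to keep each offset a multiple of the page size and would be limited by the free address space, which is well below 4 GB in a 32-bit process.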

@federicomena @aeveltstra Things that happen when the OS-level API is cobbled together across four decades of accidental collaboration

@ebassi @aeveltstra the whole "rust stdlib without libc underneath" seems more and more attractive all the time :)

@federicomena Hi. I'm new to #rustlang. What problems are caused by converting file sizes to usize?

@aeveltstra @federicomena That's because usize on a 32-bit OS is a 32-bit integer. Even 32-bit operating systems use 64-bit integers when interacting with files and disk devices.

If files and disk devices were limited to what's available in a usize, then a file or device could not be larger than 4GB.
