Seguir

This paper on a malloc() replacement that DOES COMPACTION even on C/C++ is making the rounds: arxiv.org/pdf/1902.04738.pdf

Scarily beautiful.

@federicomena
It's been what... forty years since personal computers, and such an elegant solution JUST appeared?

@federicomena woah, that's cool. i wonder if anyone has tried it with rust yet?

@balrogboogie no idea. It would be nice to measure!

Also, I suppose it could really help with everything-on-the-heap languages...

@federicomena Interesting! Well worth a read!

Strikes me that GNOME partitions their heap quite similarly with their "Slice" allocator, so I wonder how quickly they would adopt this?

@alcinnz I'm afraid glib's slice allocator is pretty obsolete these days 😅

I think it was nice when system malloc() was slow, but those have improved a lot. Also, unfortunately Glib's slice allocator could never really use the recycle-to-default-values scheme from Solaris's original allocator.

I don't remember if the GTK team had plans to remove GSlice and just use malloc... maybe @hergertme remembers?

@federicomena @hergertme From what I'm seeing it certainly doesn't look absolete.

I still see it used heavily in plenty of GNOME code, and in the code generated by Vala.

@alcinnz @federicomena @hergertme obsolete in the sense that it doesn't provide any advantage, not in the sense that no code uses it

@federicomena @alcinnz @hergertme

GLibc malloc was always on par with GSlice performance. GSlice was faster than non-glibc malloc() impls and it's more memory efficient, because it doesn't have to store boundary tags (2*size_t fields before each memory block that contain the memory size free() needs to know about).

True to the original slab magazine paper, it only recycles memory back to the kernel after a timeout of several seconds, a fact the Gitlab bug benchmarks don't reflect.

@alcinnz @federicomena
The GSlice allocator doesn't suffer from the catastrophic fragmentation described in the paper intro. Basically, it has per-thread aches for objects of the same size and it allocates and recycles objects of the same size from a single page, which prevents large fragmentation due to widely varying object sizes.

@federicomena whoa, I haven't read the whole paper yet and already can't wait to see this used in the glibc allocator

@bugaevc @federicomena heh, I was just thinking I should gank their code and dump it into glibc verbatim and dare anyone to object

(glibc malloc has a bad case of territorial maintainer)

@federicomena Ooh, that's clever. I was ready to ramble about how that can already work at the OS level but they addressed that concern right away and added another layer to it that cleverly fixes the problem with that.

@fluffy I'm boggling at the part where it installs a segfault handler to catch writes to pages that it is in the process of compacting. Truly a lovely, scary hack.

@federicomena @fluffy yikes, that might kill any chance of putting it into a general purpose C library, signal handlers in general can only be set by the application

@zwol @fluffy I'd love to see a survey of how programs use signal handlers 🧐 It's hard for me to swallow these together:

* sigsegv as "game over, program is buggy"
* sigsegv as "perfectly legitimate way to catch writes to a page, who's your VMM now"

And of course I'm biased, but using this allocator on a memory safe language seems like a wonderful opportunity.

@federicomena @zwol chained handlers are a thing and segfaults are already a natural part of how mmap() et al work. Not to mention virtual memory in general. There’s definitely some care you need to take in chaining sigsegv due to the prevalence of crash reporters and such though. I don’t see this as something you’d want to LD_PRELOAD willy-nilly

@fluffy @zwol yeah, the paper mentions that they only do their wait-until-remapped thing in the sigsegv handler only if the pointer in question is known to the Mesh allocator. I haven't read the code to see if it chains to other handlers if it isn't, but

a) that sounds like the right thing to do, anyway;

b) I have absolutely no clue how they deal with the order of initialization of the handlers :)

@federicomena @zwol yeah and I assume the allocator itself tries to encourage offsets to be non-overlapping in the first place. But that’s just preloading the fragmentation instead...

@federicomena @zwol I guess my big question about this is whether it’s actually beneficial in actual usage - intuitively I feel like the chance of two pages having non-overlapping allocation offsets isn’t going to be much higher than the chance of a page having no allocations at all, and the OS VMM can already defragment physical allocations in the latter case.

@fluffy @zwol they prove some interesting bounds in the paper; they use a randomized allocation strategy precisely to encourage pages to mesh. (They don't do per-page meshing, they do spans of multiple pages. It's pretty well thought out.)

@federicomena @fluffy I was thinking about this with my “occasional contributor to glibc” hat on, and the issue there is that signal handlers are process globals, which means libraries mustn’t touch them. Also, come to think of it, nothing stops you from using sigprocmask to block delivery of SIGSEGV and (hurriedly writes test program) this causes both Linux and BSD kernels to kill the process instead of invoking a handler.

@federicomena @fluffy Chained handlers are a thing, yes, but not one I would consider reliable enough to use in the guts of malloc. That’s me though.

@zwol @federicomena yeah that’s absolutely fair and I would expect anyone who’s using this allocator to specifically know they’re doing it and only use LD_PRELOAD as a reliable means of overriding the libc one. Because overriding a default allocator in C++ at the language level is a gigantic pain in the butt.

@fluffy @federicomena I’m pretty seriously thinking about writing a variant that stops the world during copies and doing some benchmark bake-offs. It might not be slower, and stopping the world is, oddly enough, something the C library *can* safely do

@zwol thanks, now I can't unsee that loop. But it *is* pretty clever 😮

@federicomena
> For example, on low-end Android devices,Google reports that more than 99 percent of Chrome crashesare due to running out of memory when attempting to dis-play a web page

surely the solution is to write some allocator trickery. yup.

@grainloom heap fragmentation is a real problem; I don't think they mention the Chrome factoid other than for illustrative purposes.

@federicomena I assume they're updating pointers by making pages inaccessible and using a segfault handler? That's a lot more practical now that everything is 64 bit. I guess the need to keep the mappings around isn't that big a deal if you don't move objects gratuitously.

I'd been thinking about a similar approach for object-level virtual memory/persistence, inspired by "Pointer Swizzling At Page Fault Time" and the implementation of libgc.

@federicomena Another idea I had is to achieve some level of memory safety by never reusing addresses. This would work really well with a compacting allocator.

@freakazoid they don't update pointers. They move an allocated block to another page, at the same page offset it had before, and remap the old page to point to the new one.

@federicomena That's not even theoretically ...

> Hound first searches for pages whose live objects do not overlap. It then copies the contents of one page onto the other, remaps one of the virtual pages to point to the single physical page now holding the contents

Oh! Right! I stand corrected! Wow!

And then Mesh adds two randomized algorithms to reduce the risk of unmeshable pages and to quickly find meshable pairs of pages.

I am in awe.
Regístrate para participar en la conversación
MaSToDoN.MX

Mastodon es una red social basada en protocolos web abiertos y software libre y de código abierto. Está descentralizado como correo electrónico.