Venti notes

Major flaws

There are dozens, possibly hundreds, of sysfatals. There is no graceful error handling. If the cache is unable to grow, for instance, instead of falling back to erasing old items to make room, Venti crashes. This is incredibly stupid. The only such "abort on error" instances that should be allowed are a) during initialization, and b) truly unrecoverable errors. "I didn't bother hooking this up to the existing cache eviction" clearly does not meet either requirement.

Note: there may be legitimate reasons during runtime to decide the process needs to die, because of irrecoverable errors. This is reasonable. Sysfatal means that we’re not making any attempt at cleaning up first, which seems like a bad idea when there may be in-progress writes - need to look into how that interacts. For instance, at present, any attached unvac processes will hang indefinitely if memory allocation fails.

I’ve mostly fixed the memory layer. Here are the old notes, though:

Memory allocation is simply broken. The pool allocator hard-caps individual allocations at 2G, and needs to be configured to allow more than 2G total. Rather than address those underlying issues, venti uses a custom memory allocation layer - which, of course, wraps around the pool allocator, thus keeping all of its issues, while adding several new ones.

First, Venti’s memory allocation currently uses signed 32-bit integers for sizes. I’ve personally had that patched out for months now, but I need to further analyze the impact of fixing this on the codebase before I’m confident enough in the myriad cascading changes to push it upstream. I have not had any issues, but that is not the same as "there are not any issues," which is what I need to be sure of before I’m willing to truly commit this.
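To be concrete about why signed 32-bit sizes matter: any request of 2G or more goes negative before the allocator ever sees it. A minimal illustration in standard C (not Venti code; the conversion is implementation-defined, but it wraps negative on the usual two’s-complement targets):

```c
#include <limits.h>

/* With a 32-bit signed size parameter, any request above INT_MAX
 * (i.e. 2G or more) wraps around before the allocator sees it.
 * The conversion below is implementation-defined for n > INT_MAX,
 * but comes out negative on common two's-complement targets. */
int
wrapped_size(unsigned long n)
{
	return (int)n;
}
```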

Second, the so-called "brk" function (vtbrk) is just a wrapper around vtmallocz that guarantees specific alignments. It locks access to vtmallocz, which, okay, makes sense - but if there’s a need for locking inside mem.c, why the hell isn’t it in vtmalloc proper? vtbrk is, effectively, an arena allocator wrapped around the pool allocator, whose allocations can never be freed.
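For reference, the pattern vtbrk implements - handing out aligned, zeroed slices from larger chunks, with no way to ever free them - is an ordinary bump/arena allocator. A minimal sketch of the idea (names, chunk size, and alignment are mine, not Venti’s, and this sits on malloc rather than the pool allocator):

```c
#include <stdlib.h>
#include <string.h>

enum { ChunkSize = 128*1024, Align = 8 };

static char *brkp;	/* next free byte in the current chunk */
static size_t brkn;	/* bytes remaining in the current chunk */

/* Bump-allocate n zeroed, Align-aligned bytes out of chunks grabbed
 * from the underlying allocator. Nothing handed out here can ever be
 * individually freed - exactly the property noted above. */
void*
brkalloc(size_t n)
{
	void *p;
	size_t c;

	n = (n + Align - 1) & ~(size_t)(Align - 1);
	if(n > brkn){
		/* abandon the tail of the old chunk; grab a fresh one */
		c = n > ChunkSize ? n : ChunkSize;
		brkp = malloc(c);
		if(brkp == NULL)
			return NULL;
		brkn = c;
	}
	p = brkp;
	brkp += n;
	brkn -= n;
	memset(p, 0, n);
	return p;
}
```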

Both of those issues have now been fixed. Additionally, the ezmalloc layer in venti/srv has been subsumed into vt*.

Packet management is stupidly complicated at present. The entire packet layer needs to be nuked long term. Might be fine to just use debugpacket.c, which doesn’t appear to include any debugging info, and seems to just be a simpler implementation...

Note: the packet.c layer provides a roughly 3% speedup over debugpacket.c, with debugpacket.c patched just enough to barely work, in exchange for a mound of crappy code; as such, it’s been deleted in my fork. Furthermore, the Packet APIs are largely awful. Many of them were never used, or were used in a single place where a better API already exists.

One example: libventi/client.c creates a foreign packet out of an existing buffer just so it can pass it to vtwritepacket. If a vtwrite function existed which could write a raw buffer, that would be entirely unnecessary, and would remove the only use of packetforeign that registers a custom data freeing function. However, that custom freeing function is for a buffer allocated via vtmalloc, so it can be removed anyways!

libventi architecture - file structure

cache.c: VtBlock caching, exists only in memory

Uses a hash chain to store Venti blocks, and an array for "local blocks." What does it mean for a block to be local? libventi is used for clients *and* the server; is this for local caching in the libventi client? Also appears to depend on heap behavior? "free blocks are in the heap, which is supposed to be index by second-to-last use but actually appears to be last use" - need to check if the grammar there is correct (whether it’s actually talking about the heap, and not the free blocks), and simplify the caching system.
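For reference, the hash-chain half is the standard score-keyed chaining scheme; a minimal sketch (structure and function names are mine, not libventi’s):

```c
#include <stddef.h>
#include <string.h>

enum { ScoreSize = 20, NHash = 256 };

typedef struct Block Block;
struct Block {
	unsigned char score[ScoreSize];
	Block *next;	/* hash chain link */
};

static Block *hashtab[NHash];

/* The score is already a cryptographic digest, so its leading
 * bytes are as well-mixed as any hash function's output. */
static unsigned int
hashscore(unsigned char *score)
{
	return score[0] % NHash;	/* use more bytes for bigger tables */
}

Block*
lookup(unsigned char *score)
{
	Block *b;

	for(b = hashtab[hashscore(score)]; b != NULL; b = b->next)
		if(memcmp(b->score, score, ScoreSize) == 0)
			return b;
	return NULL;
}

void
insert(Block *b)
{
	unsigned int h = hashscore(b->score);

	b->next = hashtab[h];
	hashtab[h] = b;
}
```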

client.c and server.c: client and server for the venti protocol, respectively. server.c spins off processes to handle each received request - it might be worthwhile to take the hit of using a single process to make debugging easier, though of course that would be temporary and should never be committed. Neither the client nor the server appears to directly abort on error, but I believe errors would never get propagated to them anyways.

mem.c: A very stupid thin wrapper over malloc and free. Should be eliminated entirely, in the long run. Short-term: need to replace the use of vtmallocz in vtbrk with sbrk+memset. Should probably keep the lock unless I’m certain it’s not needed. Maybe add locking to vtmalloc{,z} as well?

Update: mem.c is now basically as good as it can get; the long-term goal of removing it remains, but is a very low priority.

debug.c: contains only vtdebug, a thin fprint wrapper that does nothing unless the connection’s debug flag is set.

debugpacket.c: Simpler packet layer. I’m in the process of cleaning it up and debugging it; this should simplify the code, reduce bugs, improve maintainability, and improve performance in the long run.

packet.c: a very much not simple packet implementation. Includes such reasonable features as: packet fragmentation, packet splitting that (ab)uses fragments such that a packet can contain fragments which are OWNED BY OTHER FUCKING PACKETS, inconsistent memory allocation (malloc and vtmalloc) WITHIN ANY GIVEN PACKET, and unbounded memory allocations. Obviously, this is all absolutely necessary for any sort of performant communication. The total benefit, as benchmarked against debugpacket.c as a baseline? 3%. Worse, cleaning up the crappy design of the packet layer more than recoups this 3% anyways, thus rendering the entire value proposition of packet.c a JOKE.

file.c: manages VtFile trees. "There may be more than one VtFile structure for a given Venti file," "there’s just mutable local blocks and immutable venti blocks." Local blocks again... need to find out more about those. Need to look into the venti disk layout docs.

entry.c, fcall.c, file.c, root.c: these are all related to details of the on-disk format which I don’t currently know.

queue.c: job queue?

strdup.c: Why? Why is there a vtstrdup function whose behavior is identical to strdup? Oh, that’s right - so it can use vtmalloc instead of malloc, because obviously that’s extremely important and not in any way indicative of a really stupid design. Should probably chuck this into mem.c.
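For what it’s worth, the entire behavior is just strdup; a portable sketch of what vtstrdup amounts to, minus the allocator swap:

```c
#include <stdlib.h>
#include <string.h>

/* vtstrdup behaves like plain strdup: copy strlen(s)+1 bytes into a
 * fresh allocation. The only upstream difference is which allocator
 * gets called. */
char*
xstrdup(const char *s)
{
	size_t n = strlen(s) + 1;
	char *ss = malloc(n);

	if(ss == NULL)
		return NULL;
	memcpy(ss, s, n);
	return ss;
}
```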

zeroscore.c: hard-coded score of a zero-sized block (the SHA1 of zero bytes). Definitely not going to break if and when we e.g. migrate to SHA2. Other than that, this part is actually reasonable.

The important caches, in terms of bugs which currently exist in 9front, are not in libventi, but in venti/srv.

Memory allocation

Exposed functions for allocation:

• vtmalloc

• vtmallocz

• vtrealloc

• vtbrk

• vtstrdup

• ez* versions of all of the above (these have been eliminated in favor of the vt* variants)

DONE

• Checked the types of all mem.c clients to ensure it’s safe to change to ulong

• Fixed the sites where it wasn’t

• Switched to ulong

• vtfree is literally just if(p != 0)free(p), but free(nil) is a no-op, so it’s equal to free(p) - it can be removed from mem.c and replaced textually everywhere else, since vtfree has no special meaning beyond free.

• Nuked packet.c.

• Migrated the caches to use vtbrk for permanent allocations instead of vtmalloc. This allows for significantly larger sizes.

• Did performance testing. packet.c is overcomplicated, but if there were a measurable, significant benefit to using it, that would indicate that debugpacket.c might not be good enough, and a simpler method of improving performance would need to be found. The results: packet.c is trash; debugpacket.c is at worst 3% slower, and will probably be faster with a cleaned-up implementation.
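On the vtfree point: ISO C defines free(NULL) as a no-op, so the null guard is dead code. A quick demonstration:

```c
#include <stdlib.h>

/* vtfree's entire body was: if(p != 0) free(p). Since ISO C defines
 * free(NULL) as a no-op, the guard buys nothing and vtfree(p) is
 * observably identical to free(p). */
int
freelike(void *p)
{
	free(p);	/* safe even when p == NULL */
	return 1;
}
```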

TODO

• Finish

• Look into the on-disk layout and the caching system and get an understanding of how it all fits together, and how each piece works.

• Audit the small stuff - make sure there’s no glaring oversights.

• ADD ERROR HANDLING. No more of this, "we didn’t expect you to specify 16G caches so we’re going to SILENTLY CRASH" bullshit.

• Fix known fossil bugs. Expose an internal tracing mode for usage with automated testing. Enable said automated testing.

• Once the code is cleaned up, better tested, and we’re more confident that it’s correct and working properly: look into improvements to Venti proper beyond just implementation details.

• Follow up with mycroft re: changing the hash.

• Follow up with ori re: using gear hash and incremental chunking, and designing a new, modern arena format.

• Move direct uses of vtmalloc, vtmallocz, vtrealloc, and vtfree to malloc, mallocz, realloc, and free respectively.

• setmalloctag and setralloctag are only needed due to the wrappers.

• Need to actually handle the result - in some places, sysfatal may well be appropriate. In others, more graceful error handling may be desirable - at which point the only question is how much the implementation will need to be reworked to support it.

• Once all uses are moved, remove the functions entirely. This will leave vtbrk as the only function in mem.c, and it will wrap around sbrk instead of vtmallocz.

• At this point, the memory allocation layer will be gone, and at the very least the path towards graceful error handling will be opened.
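The per-call-site decisions described above might look like this (hypothetical call sites, not actual Venti code): one place where aborting is acceptable, one where the failure is reported so the caller can evict or retry:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Init-time allocation: nothing to clean up yet, so aborting on
 * failure is defensible here. */
char*
loadtable(size_t n)
{
	char *t = malloc(n);

	if(t == NULL){
		fprintf(stderr, "loadtable: out of memory\n");
		exit(1);
	}
	memset(t, 0, n);
	return t;
}

/* Runtime allocation: report failure and leave the old buffer
 * intact, so the caller can fall back to evicting old entries
 * instead of crashing. */
int
growcache(char **cache, size_t *len, size_t want)
{
	char *p = realloc(*cache, want);

	if(p == NULL)
		return -1;
	*cache = p;
	*len = want;
	return 0;
}
```

The point of dropping the aborting wrappers is exactly that each caller gets to make this choice, instead of having sysfatal made for it.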

Clients:

• vtbrk is NOT used by cmd/venti, so the only uses are in libventi’s packet.c

In libventi:

cache.c:    c = vtmallocz(sizeof(VtCache));

cache.c:    c->hash = vtmallocz(nblock*sizeof(VtBlock*));

cache.c:    c->heap = vtmallocz(nblock*sizeof(VtBlock*));

cache.c:    c->block = vtmallocz(nblock*sizeof(VtBlock));

cache.c:    c->mem = vtmallocz(nblock*c->blocksize);

conn.c: z = vtmallocz(sizeof(VtConn));

debugpacket.c:  p = vtmallocz(sizeof *p);

debugpacket.c:  q->data = vtmalloc(n);

fcall.c:            f->crypto = vtmalloc(f->ncrypto);

fcall.c:            f->codec = vtmalloc(f->ncodec);

file.c: r = vtmallocz(sizeof(VtFile));

log.c:  s = vtmalloc(nname*sizeof(char*)+size);

log.c:  l = vtmalloc(sizeof *l + nc*(sizeof(*l->chunk)+LogChunkSize) + strlen(name)+1);

queue.c:    q = vtmallocz(sizeof(Queue));

queue.c:    e = vtmalloc(sizeof(Qel));

rpc.c:  r = vtmallocz(sizeof(Rwait));

server.c:   s = vtmallocz(sizeof(VtSrv));

server.c:       sc = vtmallocz(sizeof(VtSconn));

server.c:       r = vtmallocz(sizeof(VtReq));

strdup.c:   ss = vtmalloc(n);

string.c:   s = vtmalloc(n+1);

debugpacket.c:  p->data = vtrealloc(p->data, p->len+n);

debugpacket.c:  p->data = vtrealloc(p->data, p->len+n);

packet.c:       p = vtbrk(sizeof(Packet));

packet.c:       f = vtbrk(sizeof(Frag));

packet.c:       m = vtbrk(sizeof(Mem));

packet.c:       m->bp = vtbrk(nn);

In cmd/venti:

copy.c: buf = vtmallocz(VtMaxLumpSize);

copy.c: buf = vtmallocz(VtMaxLumpSize);

randtest.c: buf2 = vtmalloc(blocksize);

randtest.c: buf2 = vtmalloc(blocksize);

randtest.c: buf = vtmalloc(blocksize);

randtest.c: order = vtmalloc(packets*sizeof order[0]);

randtest.c:         buf = vtmalloc(blocksize);

randtest.c: template = vtmalloc(blocksize);

read.c: buf = vtmallocz(VtMaxLumpSize);

readlist.c: buf = vtmallocz(VtMaxLumpSize);

ro.c:   buf = vtmalloc(r->tx.count);

root.c: buf = vtmallocz(VtMaxLumpSize);

write.c:    p = vtmallocz(VtMaxLumpSize+1);

srv/arena.c:        arena->cig = vtmalloc(1);

srv/arena.c:    cig = vtmalloc(ncig*sizeof cig[0]);

srv/arena.c:    ci = vtmalloc(ArenaCIGSize*sizeof ci[0]);

srv/bloom.c:    b = vtmallocz(sizeof *b);

srv/bloom.c:    data = vtmallocz(b->size);

srv/bloom.c:    data = vtmallocz(b->size);

srv/checkindex.c:       newbloom = vtmallocz(sizeof *newbloom);

srv/checkindex.c:       newbloom->data = vtmallocz(oldbloom->size);

srv/cmparenas.c:    data = vtmalloc(blocksize);

srv/cmparenas.c:    data1 = vtmalloc(blocksize);

srv/fixarenas.c:        sb->hist = vtmalloc(sb->nhist*sizeof *sb->hist);

srv/fixarenas.c:    bci = vtmalloc(nci*sizeof bci[0]);

srv/fixarenas.c:    an = vtmallocz(sizeof *an);

srv/fmtbloom.c: b.data = vtmallocz(size);

srv/hdisk.c:    blk = vtmalloc(8192);

srv/hdisk.c:    table = vtmalloc(ap->tabsize+1);

srv/hdisk.c:    blk = vtmalloc(HeadSize);

srv/hdisk.c:    blk = vtmalloc(head.blocksize);

srv/hdisk.c:    blk = vtmalloc(arena->blocksize);

srv/hdisk.c:    blk = vtmalloc(arena->blocksize);

srv/hdisk.c:    blk = vtmalloc(ClumpSize + VtMaxLumpSize);

srv/hdisk.c:        blk2 = vtmalloc(VtMaxLumpSize);

srv/icache.c:   ih = vtmallocz(sizeof(IHash)+size*sizeof(ih->table[0]));

srv/icache.c:   icache.entries = vtmallocz(entries*sizeof icache.entries[0]);

srv/icache.c:   icache.sum = vtmallocz(scache*sizeof icache.sum[0]);

srv/icache.c:   icache.sum[0] = vtmallocz(scache*sizeof icache.sum[0][0]);

srv/icache.c:   icache.sentries = vtmallocz(scache*ArenaCIGSize*sizeof icache.sentries[0]);

srv/png.c:  zw.buf = vtmalloc(IDATSIZE);

srv/reseal.c:   data = vtmalloc(blocksize);

srv/verifyarena.c:  data = vtmalloc(blocksize);

srv/arena.c:        arena->cig = vtrealloc(arena->cig, (arena->ncig+1)*sizeof arena->cig[0]);

srv/buildindex.c:       isect = vtrealloc(isect, (nisect+1)*sizeof(isect[0]));

srv/fixarenas.c:                sb->hist = vtrealloc(sb->hist, sb->nhist*sizeof *sb->hist);

srv/fixarenas.c:        cibuf = vtrealloc(cibuf, mcibuf*sizeof cibuf[0]);

srv/fixarenas.c:            an->map = vtrealloc(an->map, (an->n+1)*sizeof an->map[0]);