August 2022 – [wt : 4]

Piecemeal

Apropos nothing (a lie, of course; it almost always is), I am once again noticing how incoherent and ad hoc my areas of knowledge and expertise seem to be. Approaching a topic such as linear algebra, which I use a lot and have been using in various ways for literally decades, I can’t help getting the sense that everything I know about it is sort of cobbled together: accreted erratically (perhaps organically would be more charitable), jury-rigged from disparate, disreputable sources on a strictly need-to-know basis. Whereas every other user has, surely, learned the subject in a nice, clean, well-structured and elegant fashion, built up piece by precision-engineered piece from rock-solid foundations to flying buttresses, assembled into an unassailable edifice of logic and proof.

Or so I imagine.

Where is your beautiful theory?

This is just an example, of course. The same applies to everything from finance to pharmacology, modelling to music, arsehole to breakfast time. I know, or at least have known, a lot of things, a lot of fragments and random bits of stuff, but it doesn’t feel like it all adds up to anything very systematic or lucid, anything that could be persuasively interpreted as understanding.

Or it’s just routine imposter syndrome rearing its head again, as the new term looms with its complement of new tasks. I’m basically gaslighting myself at this point. I should probably stop.

Notebook

I’ve adapted the music generation experiments from the other day to Google Colab here. Not that I expect anyone to go look at it anytime soon — let alone run it — at least not from a discreet link tucked away in this hillbilly offramp from the information superhighway. But some future iteration may (eventually) become useful in some (as yet unclear) way to some (possibly imaginary) students.

Colab is both a blessing and a curse in these matters. On the one hand, Google dispense compute resources profligately to anyone who wants them (albeit with no guarantees and somewhat unpredictable constraints); on the other hand, it’s all delivered via the medium of Jupyter fucking Notebook.

I don’t want to get into a tedious round of “that thing you like is bad, actually”, but I’m very much not a fan of Jupyter. I do appreciate that its web-based UI can be handily draped over an online and/or virtual backend to alleviate (or at least delegate) some of the dependency hell that besets complex software these days. Setting up a functioning work environment for data science and deep learning can be a tooth-grinding business (a huge chunk of student queries and TA workload for COMP0088 last year arose from that), so it’s not really surprising that people will flock to anything that eases the burden a bit. But that doesn’t mean said web-based UI isn’t also a problem in its own right.

Jupyter represents the lonesome death of software engineering. It actively militates against modularisation and scoping. Why bother to define a function when you can select a cell and run it again? It’s like Edsger Dijkstra and Niklaus Wirth died for nothing*.

Unless you really strive not to, you’ll wind up dumping everything into a global namespace that can be acted on — created, changed, deleted — unpredictably, out of order; and worse, that can persist unacknowledged between runs. When you load up a notebook there might be any amount of unknowable state lurking in it to bite you in the arse.
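The hidden-state problem is easy to reproduce in miniature. This is a toy model, not how Jupyter kernels are actually implemented: each hypothetical “cell” is a string handed to Python’s built-in `exec` against one shared dictionary standing in for the kernel’s global namespace. Re-running a single cell changes what another one produces, even though nothing on the page has changed. The plain function at the end gives the same answer however many times it is called.

```python
# Toy model of a notebook kernel: every "cell" runs against one shared,
# persistent global namespace. Cell names and contents are made up.
kernel_globals = {}

cells = {
    "load": "data = [3, 1, 2]",
    "scale": "data = [x * 10 for x in data]",
    "summarise": "total = sum(data)",
}

def run(name):
    """Execute one cell in the shared namespace, as a kernel would."""
    exec(cells[name], kernel_globals)

# Top to bottom, the way the notebook reads on the page.
for name in ("load", "scale", "summarise"):
    run(name)
first_total = kernel_globals["total"]  # 60

# Later someone re-runs "scale" (say, after fiddling with it), then "summarise".
run("scale")
run("summarise")
second_total = kernel_globals["total"]  # 600: same cells, different answer

print(first_total, second_total)

# The boring alternative: a function whose result depends only on its input.
def pipeline(raw):
    data = [x * 10 for x in raw]
    return sum(data)

print(pipeline([3, 1, 2]), pipeline([3, 1, 2]))  # 60 60
```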

Python and Markdown are sometimes abstruse, but fundamentally they are made up of readable text. Notebooks are technically JSON, but they might as well be binary files full of crap: serialised outputs, execution counts, base64-encoded images and assorted metadata. They contain nice textual code, but they are so much more than that and, by the same token, so much less. They suck for version control and configuration management. They suck for automation. They suck for reusability and, most of the time, for plain usability.
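To make the version-control complaint a bit more concrete: an `.ipynb` file is JSON in the nbformat layout, and a good part of the diff noise comes from outputs and execution counts that change on every run. The sketch below uses a minimal, hand-written notebook and a hypothetical `strip_outputs` helper that clears them, which is roughly what tools such as nbstripout do before a commit. It relies only on the standard library, and real notebooks carry far more metadata than this.

```python
import json

# A minimal notebook in the nbformat v4 layout, hand-written for illustration.
raw = r"""
{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": 7,
      "metadata": {},
      "outputs": [
        {"output_type": "stream", "name": "stdout", "text": ["42\n"]}
      ],
      "source": ["print(6 * 7)"]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": ["Some *notes*"]
    }
  ],
  "metadata": {},
  "nbformat": 4,
  "nbformat_minor": 4
}
"""

def strip_outputs(nb):
    """Clear outputs and execution counts from code cells, in place.

    Afterwards a diff shows changes to the code rather than churn in
    results and run order. Markdown cells are left untouched.
    """
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return nb

nb = strip_outputs(json.loads(raw))
print(json.dumps(nb["cells"][0], indent=1))
```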

Basically — I guess my position on this is not exactly a secret by this point — they suck.

It’s only fair to note that the notebook concept didn’t originate with Jupyter. I first encountered it in Stephen Wolfram’s amazing yet also awful Mathematica, and it may well predate that. The same decomposition into functionally unordered code cells interspersed with rich text documentation went on to infect Matlab and R too. But similar fundamental objections apply wherever it rears its ugly head. Jupyter strikes me as the worst iteration yet, but that may just be because it’s the one I used today.

Anyway, you pays your money and you takes your choice. Jupyter is the face of Colab, so everyone uses it. Even — apparently, occasionally, for my sins — me.

* Yes, yes, Wirth is still alive and Dijkstra wasn’t exactly killed in hand-to-hand combat with a GOTO statement, but you know what I mean.

Patchwork

One of the modules I’m likely to be mucking in on next year is the brand-new Auditory Computing. It’s not at all clear what I’ll be doing on it, and I haven’t even seen the syllabus yet, but one of the potential tasks might be the setting of coursework. In idle and probably misdirected preparation for that, I’ve been having a bit of a play with generating music using deep learning. People have tended to use LSTMs for this, but I thought it would be fun to try minGPT, Andrej Karpathy’s neat little implementation of the notorious GPT. Training data is from the Classical Music MIDI dataset.

The results aren’t exactly going to be headlining the Last Night of the Proms, but some are quite cute, I think:

GPT-nano, trained on Mozart, Bach & Haydn

GPT-micro, trained on the whole dataset

You can definitely hear the model regurgitating memorised fragments. But don’t we all do that?

The task is treated as a language modelling one, with a vocabulary of chords and durations. To somewhat reduce the vocab size and increase the representational overlap, I’ve pruned chords to no more than 3 notes at a time. A snippet of code for this is below, not because I expect anyone to read or reuse it; really, this is just to test out the syntax-colouring WordPress plugin that I’ve just installed.

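import numpy as np
import music21 as MU  # assumed: MU is an alias for music21

local_rng = np.random.default_rng()  # assumed: module-level default RNG

def simplify ( s, limit, mode='low', rng=local_rng ):
    """
    Drop notes from big chords so they have no more than `limit` notes.
    NB: operates in place.
    
    Drop strategies are pretty dumb. We always keep the highest and lowest notes
    (crudely assumed to be melody and bass respectively). Notes are dropped from
    the remainder according to one of three strategies:
    
        'low': notes are dropped from low to high (the default)
        'high': notes are dropped from high to low
        'random': notes are dropped randomly
    
    Latter could actually increase vocab by mapping the same input chord
    to several outputs. Modes can be abbreviated to initial letters.
    """
    if limit < 2: limit = 2
    
    # each strategy takes the droppable inner notes (sorted low to high)
    # and returns the `c` of them to remove
    drop_func = {
                    'r' : lambda d, c: rng.choice(d, c, replace=False),
                    'h' : lambda d, c: d[(len(d)-c):]
                }.get(mode.lower()[0],
                      lambda d, c: d[:c])
    
    # recurse() yields the stream's own element objects, so edits stick
    # (.flat is deprecated in recent music21 releases)
    for element in s.recurse():
        if isinstance(element, MU.chord.Chord):
            if len(element) > limit:
                drop_count = len(element) - limit
                # chord notes aren't guaranteed to be in pitch order, so sort
                # first; otherwise [1:-1] needn't exclude the bass and melody
                ordered = sorted(element.notes, key=lambda nn: nn.pitch.ps)
                drops = [ nn.pitch.nameWithOctave for nn in ordered ][1:-1]
                
                if len(drops) > drop_count:
                    drops = drop_func(drops, drop_count)
                
                for note in drops:
                    element.remove(note)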

Perhaps that will get more use in future, if all this coheres and I start working more of this out in public. Perhaps not.
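The chords-and-durations framing can also be sketched without music21, with chords as tuples of MIDI note numbers and durations in quarter-note lengths. Everything here is hypothetical and is not the code behind the samples above: the `prune`, `token` and `build_vocab` names and the `"60.67.72|1.0"` token format are all illustrative. `prune` just restates the drop strategy of `simplify` on plain tuples.

```python
import random

def prune(chord, limit=3, mode="low", rng=random):
    """Return `chord` (MIDI note numbers) sorted, with at most `limit` notes.

    Lowest and highest notes are always kept; inner notes are dropped
    low-to-high ('low'), high-to-low ('high') or at random ('random').
    """
    limit = max(limit, 2)
    notes = sorted(chord)
    if len(notes) <= limit:
        return tuple(notes)
    inner = notes[1:-1]
    keep_count = limit - 2  # number of inner notes that survive
    key = mode.lower()[:1]
    if key == "r":
        keep = sorted(rng.sample(inner, keep_count))
    elif key == "h":
        keep = inner[:keep_count]  # drop from the top of the inner notes
    else:
        keep = inner[len(inner) - keep_count:]  # drop from the bottom
    return (notes[0], *keep, notes[-1])

def token(chord, duration):
    """Hypothetical token format: dot-joined pitches, '|', then duration."""
    return ".".join(str(n) for n in chord) + "|" + str(duration)

def build_vocab(pieces, limit=3):
    """Map each distinct (pruned chord, duration) token to an integer id."""
    vocab = {}
    encoded = []
    for piece in pieces:
        ids = []
        for chord, dur in piece:
            tok = token(prune(chord, limit), dur)
            ids.append(vocab.setdefault(tok, len(vocab)))
        encoded.append(ids)
    return vocab, encoded

# Two made-up "pieces" as (chord, duration) pairs.
pieces = [
    [((60, 64, 67, 72), 1.0), ((62, 65, 69), 0.5), ((48, 60, 64, 67, 72), 1.0)],
    [((60, 67, 72), 1.0), ((62, 65, 69), 0.5)],
]
vocab, encoded = build_vocab(pieces)
print(encoded)                                 # [[0, 1, 2], [0, 1]]
print(len(vocab), len(build_vocab(pieces, limit=10)[0]))  # 3 4
```

On this toy input, pruning maps the four-note C major voicing onto the same token as the three-note one in the second piece, so the vocabulary shrinks from four entries to three. With `mode="random"` the same input chord can come out differently on different calls, which is the vocab-inflating caveat in the `simplify` docstring.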

Solidity

Today in “things that are unlikely to become clear to anyone anytime soon”:

Addendum: 3D software is always and inevitably difficult and, y’know, good luck to all who sail in her and all that, but I really don’t like SketchUp.