How patten used text-to-audio AI to make an entire album: "We're at the precipice of a fundamental shift in how we think about making music"

Matt Mullen

Fri, May 19, 2023 at 10:48 AM UTC

Experimental is a word that’s thrown around a lot these days. It’s fair to say that over time, its omnipresence in creative discourse has gradually diluted its meaning, the adjective effectively reduced to denoting anything that, however timidly, colours even a shade outside of the well-defined lines that popular music has drawn.

There are few artists, though, that approach their work in a way that’s truly, quite literally, experimental: testing new techniques and investigating new ideas, they challenge established musical paradigms, making discoveries that, at their best, advance our collective understanding.

Electronic musician, visual artist and lecturer Damien Roach has spent close to two decades experimenting with sound, images and concepts under the alias patten. His serpentine career path has seen him release nine full-length projects and more than a dozen EPs, remix Björk and Giorgio Moroder, perform at London’s Tate Modern, design album art for artists like Caribou and Nathan Fake, construct immersive audiovisual installations and teach students at the London College of Communication.

While much of his early work found a home in the storied Warp Records catalogue, Roach now releases music through 555-5555, a record label and creative agency he founded in 2018.

Though these projects may seem disparate, Roach tells us that they collectively form an interdisciplinary web of ideas that seeks to examine the nature of creative thought. Across every facet of his work, Roach says, he’s chiefly interested in pursuing creative impulses that are “counter-hegemonic: things that are intended to explore outside of the realm of that which we already know”.

With every new release, he pokes and prods musical norms and artistic conventions, probing for the broken seams through which new and untested ideas might spill forth. Roach’s latest project, Mirage FM, is perhaps his most experimental yet.

Mirage FM is the first album to be made entirely from samples produced by generative artificial intelligence. The samples in question were created with an AI-powered text-to-audio sound generator called Riffusion. Where projects like Google’s MusicLM and OpenAI’s Jukebox have used AI to create novel sounds, musical ideas and even entire tracks from scratch, Riffusion takes a slightly different tack.

Its creators have repurposed Stable Diffusion, an AI model that generates images from text prompts, to produce images of spectrograms, which are visual representations of sound. Those spectrograms can then be converted into audio, meaning that Riffusion can generate sounds in response to text input.
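Riffusion’s own code isn’t reproduced here, but a minimal Python sketch can show what a spectrogram actually contains. Assuming a mono signal stored as a list of floats, the hypothetical `magnitude_spectrogram` function below slices the signal into overlapping, windowed frames and takes the magnitude of each frame’s discrete Fourier transform (a short-time Fourier transform). Each frame becomes one column of the image, with frequency running up the vertical axis. The frame size, hop length and test tone are illustrative choices, not Riffusion’s settings.

```python
import cmath
import math


def magnitude_spectrogram(samples, frame_size=64, hop=32):
    """Short-time Fourier transform magnitudes of a mono signal.

    Returns one list per frame (a column of the spectrogram image); entry k
    is the strength of frequency bin k, for k = 0 .. frame_size // 2.
    """
    # Periodic Hann window: tapers each frame's edges to limit spectral leakage.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size)
              for n in range(frame_size)]
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        chunk = [samples[start + n] * window[n] for n in range(frame_size)]
        column = []
        for k in range(frame_size // 2 + 1):
            # Naive DFT for bin k; real tools use an FFT, which gives the same result faster.
            acc = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                      for n in range(frame_size))
            column.append(abs(acc))  # keep loudness only; the phase is discarded
        frames.append(column)
    return frames


if __name__ == "__main__":
    sample_rate = 8000
    # A 1 kHz sine: with 64-sample frames each bin spans 8000 / 64 = 125 Hz,
    # so the tone should peak in bin 1000 / 125 = 8 in every frame.
    tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(800)]
    spec = magnitude_spectrogram(tone)
    peaks = [max(range(len(col)), key=col.__getitem__) for col in spec]
    print(len(spec), set(peaks))
```

Going the other way, from a picture like this back to sound, means estimating the phase the image throws away; iterative methods such as Griffin-Lim are a common choice for that step, which this sketch doesn’t attempt.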

Type in any prompt that you can think of - anything from ‘electric guitar solo’ to ‘serialist G-funk theme tune for ’70s cop show’ - and Riffusion will begin playing a short, looped clip based on the prompt you’ve entered.

Though the music it produces is garbled, lo-fi and often completely unrelated to the text input, the results are nonetheless absolutely fascinating. The fact that these sounds are being generated in real time, summoned instantaneously from the digital ether without the involvement of any synths or samples, is undeniably mind-blowing.

You get the sense that this is a kind of music that simply couldn’t have been made without this technology. The way that the AI interprets instructions - and crucially, the way that it fails to interpret instructions - is utterly singular: in its confused, imperfect attempts to imitate the sound of humanity, there’s something startlingly new to be found, something bewitchingly inhuman.

Upon discovering Riffusion, Roach was curious to see how it could be used as a tool in the studio. After spending several late nights feeding the app with a variety of prompts, he patiently combed through the results to find sounds that caught his ear. These were then arranged, manipulated and stitched together in software to produce Mirage FM’s 21 tracks.

Roach’s brief manifesto for the record reads “crate-digging in latent space”, likening his process to the way a producer like DJ Shadow or Madlib might rifle through crates of vinyl, hunting for samples with which to craft a beat. The difference is that Roach’s digging took place not within the four walls of a record shop but in the digital field of infinite possibility opened up by artificial intelligence.

The 21 tracks that make up Mirage FM are captivating, entirely unique and disorienting in their strangeness. Each song is a marvel of oddity, but familiar genres are discernible beneath the mirage: there are bizarre approximations of aqueous techno on Drivetime, smooth jazz on Fade, and sultry R&B on Where Does The Time Go? Alright could be an ‘80s ballad that’s been sliced up Dada-style and run through a bitcrusher, while Walk With U sounds like what an extraterrestrial might offer us if we tasked it with reproducing old-school grime.
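A bitcrusher is a standard effect, and a rough sketch shows why it sounds lo-fi. Assuming samples are floats between -1 and 1, the hypothetical `bitcrush` function below does two things: it rounds each sample to a coarse grid (fewer bits of resolution) and holds each value for several samples (a crude sample-rate reduction). The parameter names and defaults are illustrative, not those of any particular plugin, and nothing here reflects Roach’s actual processing chain.

```python
def bitcrush(samples, bits=4, downsample=4):
    """Crude bitcrusher for samples in [-1.0, 1.0].

    bits       -- resolution to keep; fewer bits means coarser quantisation steps
    downsample -- hold each new value for this many samples (sample-rate reduction)
    """
    if bits < 1 or downsample < 1:
        raise ValueError("bits and downsample must be at least 1")
    levels = 2 ** (bits - 1)  # quantisation steps between silence and full scale
    out = []
    held = 0.0
    for i, s in enumerate(samples):
        if i % downsample == 0:
            s = max(-1.0, min(1.0, s))         # clamp to full scale
            held = round(s * levels) / levels  # snap to the nearest step
        out.append(held)                       # repeat the held value in between
    return out


if __name__ == "__main__":
    print(bitcrush([0.1, 0.3, 0.6, 0.9], bits=2, downsample=2))  # [0.0, 0.0, 0.5, 0.5]
```

Both steps throw information away on purpose: the rounding adds quantisation noise, and the sample-and-hold adds the grainy, aliased edge associated with early digital gear.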

Testament to Roach’s skill as a producer is his ability to recontextualise Riffusion’s disjointed output into tracks that feel somewhat coherent, using carefully applied audio editing and effects processing to create mini-narratives in each track: moments of tension and release that help lend Riffusion’s jumbled miniatures a recognisable shape. He’s perhaps ideally prepared to take on such head-scratching source material, having spent a career producing abstract, leftfield beats that turn conventional musical structures on their head.

We spoke with patten following the release of Mirage FM to find out more about how the album was made, unpack the ideas behind its conception and discuss the implications of generative AI for musicians, producers and society as a whole.

Could you talk us through the background to Mirage FM? What led you towards the idea initially?

“I've been working with AI for a long time in lots of different ways, especially across the visual element of what I do. What’s surprising about Riffusion is that the tool can do a lot more than it seems like people are using it for. Sonically and stylistically, it's really shocking.

“It's similar to the situation with AI visuals, whereby one logic would say: okay, you’re giving people a tool where they can really conjure up anything from their imagination. But I’ve been surprised by the imagery that's kind of flourished from the various AI tools that are available, and how often it's derived from the language of cinema, or the language of ubiquitous popular culture.

“You're sitting in front of DALL-E 2, faced with a text input box. You can make any image you want and I'm shocked that that image is often more or less Mickey Mouse in space, or Indiana Jones riding a hoverboard. That’s the extent of the field of imagination that's being explored: it’s often very much based on what's already known. There's a combination of pop cultural tropes which are used to produce a new kind of imagery.

“These tools are extremely powerful, but there's a lot of room for using them in more powerful ways than they’ve been used so far. So when I found this Riffusion tool, I was intrigued as to what was possible to do with it, and I was really shocked. Very quickly, after experimenting with it, I became aware of what the possibilities might be. I wasn't planning to make an album at all. It was something that happened through finding this tool and exploring what it could do, then immediately realising that there was a huge potential there.”

How did the process develop from there?

“I started making loads of recordings of things that I was generating from it. Hours and hours and hours of recordings - I was using it constantly in an all day and all night session. As for what was coming out of it - I think the tool is incredible, and very interesting, but it can’t write a song. It can’t create something with a structure that you would recognise as music. Essentially, it can make loops and little sequences that have some relationship to what we'd understand as being music.

“I’ve taken all these tiny snippets and spliced them together. One thing that's interesting is that I’ve tried to have some truth to the form of slightly jittery, broken music that the tool pokes at. I’ve tried to retain some of that, but it wasn't really necessary. I think it'd be very hard for anyone to tell what's intentional, and what’s derived from the process. I wanted to explore a form of music that was almost only suggestive of songs, or almost literally like a mirage.”

What kind of processing were you applying to the samples once they were pieced together?

“There's some reverb, there’s a little bit of pitch-shifting. There’s no MIDI across the whole thing and everything is based on using existing sound as building blocks for the record. There’s lots of EQ, and loads of deep DSP. The dry sound that you get out of Riffusion, it’s very low-bitrate. I was very interested in that as an aesthetic. It reminded me of listening to a mix on dial-up or something, that old, lo-fi, internet-streamed sound. It did make me think about the aesthetics of failure, and how historically those have ended up becoming something where the error is sought out.

“The number of plugins that exist today to make something sound like a badly recorded cassette, for example. There’s bitcrushing plugins, and vinyl crackle, and all that sort of stuff. I was interested in those moments where what you're listening to speaks about the way that the audio itself has found its way into the world, then rather than shying away from that reveal, actually moves towards it. I thought, is there some room for the sound of low-bitrate recordings to have a quality which is embraced positively, or explored for its aesthetic qualities?

“So I didn't want to get rid of that entirely, of course, but there's a lot of processing. Even to give the impression of stereo, for example, which has a massive impact on fidelity. Working the sound until it felt rich in some way. For example, the track Drivetime is an interesting one. I suppose you could describe it as sort of a techno track. It sounds like it’s progressing, and hi-hats are coming in and things like that, but that’s all EQ.

“I’ve done a lot with using small bits of sound, and then pulling out certain elements of it using EQ, and maybe sometimes layering copies of the same sound on top of each other, but very heavily. So it might be one layer that’s just this super-high frequency little bit of sound. So I’m using techniques like that to give the idea of progression. There's a lot going on that I don't think is necessarily that perceptible. But it was definitely an interesting process, teasing songs out of all these recordings I'd made.”
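Roach doesn’t say which EQ or plugins he used, so the following is only a hedged illustration of the general idea he describes: pulling the top end out of a sound and layering it back over the original. It uses a first-order (one-pole) high-pass filter, a far blunter tool than a mixing EQ; the function names `highpass` and `layer`, the 2 kHz cutoff and the gain are all hypothetical choices for the example.

```python
import math


def highpass(samples, cutoff_hz, sample_rate):
    """First-order RC high-pass filter: content well below cutoff_hz is attenuated."""
    rc = 1.0 / (2 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)
    out = []
    prev_in = prev_out = 0.0
    for x in samples:
        # y[n] = alpha * (y[n-1] + x[n] - x[n-1])
        y = alpha * (prev_out + x - prev_in)
        out.append(y)
        prev_in, prev_out = x, y
    return out


def layer(base, top, gain=0.5):
    """Mix a processed copy (top) over the original (base), sample by sample."""
    return [b + gain * t for b, t in zip(base, top)]


if __name__ == "__main__":
    sr = 8000
    # A low 100 Hz hum plus a quieter 3 kHz shimmer standing in for hi-hat-like detail.
    sound = [math.sin(2 * math.pi * 100 * t / sr) + 0.3 * math.sin(2 * math.pi * 3000 * t / sr)
             for t in range(800)]
    highs = highpass(sound, cutoff_hz=2000, sample_rate=sr)
    mix = layer(sound, highs, gain=1.0)  # lifts mostly the top end; the lows barely change
    print(round(max(abs(v) for v in highs[100:]), 2))
```

Layering a filtered copy rather than filtering the original keeps the full sound intact while letting the isolated layer be brought in or out over time, which is one way a loop can be made to seem to progress.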

What kind of prompts were you giving Riffusion? Was it a scattergun approach or were you homing in on things you were looking to hear?

“A combination of those, actually. As I said, the source material is derived from a very exploratory process, which was about a sense of wondering what was possible, combined with things that I wanted to hear, I guess. I didn't go in with a particular outcome in mind at all. It was really one thing to the next: like, ‘I just want to try this, let me hear this’, as I went along.

“The amount of material that’s gone into a single, short piece of music was huge. A lot of what was being spat out by Riffusion was not great. You had to wait a long time for something, a tiny little thing that made you say ‘yes, okay, there's something there’. A lot of it was looking for things that had a simultaneous sort of familiarity and also a strangeness to them, that I could kind of tease out and push in the direction I wanted to.”

You’ve compared the process to a new kind of crate-digging, but you’re sampling a neural network as opposed to a human performance. Could you expand on that?

“To step back a little bit and talk a bit about the wider sphere of artificial intelligence and the creative process: there’s a huge potential in being able to conjure up any source material that you want. But not really necessarily thinking about that as being an end result, instead something that you can then fold into another process. In terms of AI, music and sound, it's crazy, isn't it?

“There are lots of questions about sampling and crate-digging in this way. It's like being presented with an infinite record shop, and the limits of what you can find and use are boundless. You’re only bound by the limits of your imagination. It’s mind-boggling, and that's the reality that we're inside of now.

“We are at the precipice of a very fundamental shift in the way we think about making all sorts of creative work: visual stuff, music, writing, and so on. There are so many questions that I think it'd be impossible for me to not engage in it. It’s almost unbelievable that that's possible. So for me, I found that really inspiring as an idea. That tagline, crate-digging in latent space, was just a way of trying to communicate what it is.

“If you're not a musician or you haven't explored anything to do with these tools, one of the challenging things is trying to explain to people what it is they’re listening to. I think it is important and it's part of what the music is and what it's about, the way that it's been made, so I was trying to explain that using terms that people already understand. That one phrase is almost like a manifesto for the record and the idea behind it.

“Because it's not the case that I've written a prompt and it's made this thing that I’m then serving to people. There would be no problem with that, as well: I don’t believe in a fixed or concrete connection between the time spent on something and its value. We all know this is a misnomer, right? You could have something that’s been worked on for years and years and years, and it's just not very interesting, or it’s not touching, or it doesn't affect people in any way.

“But somebody could just pick up an acoustic guitar and do something on the fly, and it's just beautiful, and it makes you cry, or makes you laugh. So if you could put in a prompt, and it would spit out a track that was fully formed, there's no problem with that. But in this case, that's not what's happened.”

What are your thoughts on how we should navigate the realm of copyright and authorship when it comes to AI that’s been trained on the work of existing artists? For instance, Getty Images are suing the creators of Stable Diffusion, because their AI model has been trained using Getty’s photographs.

“It's so interesting. One of the most fascinating things about all the discourse around AI and AI-assisted creativity is that we're having to ask ourselves these fundamental questions about, what is creativity? What is inspiration? What's theft? How do we generate ideas? Completely removing AI from the equation for a second, these are the questions we're having to ask ourselves. How does somebody actually come up with an idea? What is that?

“Then coming back to these tools and thinking, how does what's happening with AI-assisted creativity differ from what's happening when somebody is drawing from their own experiences, their knowledge and tastes and desires? I think where the conversation gets super interesting is, where we have been forced to ask these fundamental questions about what creativity is, where it is, and how it happens. There’s no simple answer.

“The lawsuit that you mentioned is based on a slight misunderstanding of the way these systems work. They’re not reproducing things, but they're looking at characteristics, and then embedding those characteristics in new things. As mentioned in that case, there were a small number of anomalies where the AI reproduced images very similar to ones that exist out there. But you have very few cases where this happens. I think that was based on a degree of repetition within the dataset, related to one particular prompt. It shouldn’t happen, really. That was very much an exception to how those things work.

“I'm interested to see how that all plays out. One of the motivations behind making this record is just to talk about that. As somebody who's spent their whole life making things of different types, it’s both fascinating and exciting to see these tools come into being but it's also terrifying, the fundamental shift in the way that we think about creativity and the value attributed to creativity. It throws up a lot of these questions about value.”

Thinking about AI tools like MusicLM that can spit out almost a full track - if someone just types in a three-word prompt, and then releases the audio that’s produced as their own song, what level of authorship can we ascribe to their involvement in that process, in comparison to somebody who labours over a song in the conventional way?

“The traditional way of making music that you refer to, that's a very wide remit. How different is what you described from using a sample bank, or using Splice? A lot of people do that - people make music in very different ways. We're used to that, to the point where it hasn't really become a question - nobody cares if you made that loop by picking a kick drum and picking a hi-hat or whatever, and putting it together yourself. I don't know how much people worry about those things. I'm not saying it doesn't matter. But we're used to some of the technologies that are prevalent today in terms of making music, and those are things that aren't really questioned.

“Again, this is one of the reasons why I think it's really fascinating what's happening with AI, because it's making us ask these questions about, what is it that we value in terms of music-making? What you described, that process of someone getting this tool, putting in a prompt, it generates a song, then they release the song… the question is what kind of level of attribution can we give them, and how creative or talented is that person? That’s the question you’re asking, and it’s based in how we think of music as this very individual, talent-based activity.

“What happens if you remove that craft and artisanal element of the way that we attribute values to things? Does it matter? Is it important that someone spent hours working on something? I don't really know. What I do like is the idea that people who have been restricted from contributing to culture, to music and writing and visual art, because their skill set doesn't allow them to produce the things they're imagining, could be given the tools to remove that barrier. That seems good, right?”

Do you believe that AI could make music composition and production more accessible?

“I think we’re all born creative. Children make music, they imagine things, they dance around, they paint, they draw, they make up stories. That’s the operating system that we start with, right? It gets drummed out of people because we have to be realistic. That creative impulse gets drummed out of people very early in life, in the majority of cases. There’s a level of limitations that are imposed on us. We have this idea that there are creative people, and then there are other people who aren't really creative and that's not what they do. I think that's really sad for individuals, but also for culture as a whole.

“Surely, if there's more stuff being made from more minds, we win. There must be people out there with amazing ideas who've just never picked up an instrument before. But they can think of something that maybe, for some reason, nobody has thought of - or maybe that doesn't even matter - but something that could touch you, and that's great, right? Surely that's a good thing. There are things that could not exist if it wasn't for these tools.

“If we look at where we're at now, and we go back through time, the accessibility of making music, and making visual art, it's become so much more democratised as time has gone on. You don't have to be able to play an instrument to realise your ideas now, if you don't want to do that. I think that's really good.

“I know there's a lot of fear - and it’s valid - amongst various creative communities, because if someone has built their whole identity, their whole sense of worth and their livelihood, on their engagement with the creative process, I completely understand that that's threatened. But it's our reality, and it's not going anywhere. For me it felt like the most important thing I could do would be to engage with it and explore what the possibilities are with these systems, whilst also being very aware of the potential implications, and speaking about those implications through the work.”

I suppose one of the potential drawbacks would be a company like Spotify using this technology to fill their service with AI-generated content that doesn’t come out of any kind of creative impulse. It’s just endless amounts of material for them to monetize.

“I agree with you, I think this is really worrying. But what would make that different to what's already happening now? We all know that on Spotify there’s this playlist fodder by artists that don't seem to really exist, this stuff that seems to have been generated by Spotify for plays. You have to wonder what the difference is.

“What does it mean when you can sit down and listen to music that sounds like things you might already be into, but it’s just being generated, and the artist is completely cut out of the picture? Why pay artists to make new music when you can just have an algorithm producing stuff that's tuned exactly to the tastes of the end user? That's concerning, isn't it?

“The question here is really about the ethics of how much of our engagement with culture is down to the decisions made by tech companies. It leads into other sociological questions about how much power those companies have in terms of the delivery of art.”

Do you share any concerns with those who might suggest that this kind of technology could put artists, producers and sound designers out of a job in the future?

“AI has presented us with a new site or environment for the continuation of lots of questions that we've been having for hundreds of years, about what creativity is and how we attribute value to things. It’s not easy to resolve this at all. It's complicated and definitely concerning, but the thing is, the way that we frame and value creativity is something of its time. Sometimes times do change, and the way that we consider certain activities changes.

“Here’s an example: the colour purple. If you look at old paintings, you see people from within the monarchy wearing purple furs and things like that. Do you know why that is? The dye used to produce those fabrics was derived from a really rare shell. You had to crush the shells to get even a little bit of the powder to make this stuff. It was really expensive and hard to make. So when you saw someone wearing that, what it would indicate is an incredible amount of value attributed just to this particular colour.

“So at one point in time, there was a value there which was embedded in the process of its creation. The point is that value shifts. What things mean and how we think of where they sit within our society, that does shift. What we've seen happening quite rapidly within the music industry is a huge shift in value.

“What value is a song, when you have access to all songs created by all of humanity, at very little cost? It’s strange, and it's really concerning. I’m not pro the dismantling of the economic viability of being a musician. But I'm also aware that when you zoom out and look at history, there's nothing natural about the way that we think about things. It shifts and then changes depending on circumstances.

“It could simply be that the era that preceded Napster, it's just gone. The album as an idea is something that's just of the past and we need to move forward from that. The album as a format and the artistry that we apply to that is based on a set of circumstances, a set of technologies and social norms built around them. The question is whether we should hold on to those values.”

Or whether it’ll even be possible to hold on to them, if we want to.

“Exactly. That's another question. There will always be people who do things a very specific way, right? Some might say there's no reason to learn to play the drums, because why would you learn to play the drums when you could use software?

“But of course, people will always learn to play the drums. People want to see that. People learn to typeset with huge machines or people learn to render images using an airbrush. These things seem completely wild, but there are still going to be people who value and take pleasure in those processes, even if they're not necessary.”

I think the fascinating thing will be when this kind of technology comes into the DAW, as opposed to existing in a separate domain. We’ve spoken to developers working on a text-to-audio sample generation plugin, pretty much like a more advanced Riffusion in your DAW that will create any sample you want on demand.

“It's funny you say that, because it makes me think of another question, or a statement. I feel that we may sometimes overestimate the amount of creativity that goes into music production, in that a lot of music is very genre-based, whereby a techno producer might be aware of a certain set of limitations in terms of what they might do sonically or production-wise. The same goes for a rock band, or a jungle producer, or something. There are specific sounds and modes and patterns in the way that a piece might manifest itself.

“This is also an interesting point, isn’t it: certain modes of production, how creative are they? Or how much is it a case of someone making variations on an existing set of rules and systems? That could be melodically, or sonically, or production-wise. This is also something which is important to think about in terms of, how creative are creative people really being a lot of the time?

“Is there much difference between that and putting a prompt into a system that has that same knowledge that a producer might have and making a sort of a pseudo-new version of this thing which is actually based in a very specific set of rules? I don't really know if there's much difference there. Personally, my interest is in trying to find ways to push the edges of these systemic formats to see what lies outside of that, and what possibilities are there.”

patten’s Mirage FM is out now on 555-5555.
