Artificial Intelligence as a Tool and Beyond: Three Perspectives on Positioning Generative AI Apps[1]
Mui Kato
Goldsmiths, University of London
When I began my PhD research on artificial intelligence (AI) in popular music production in 2021, the technology seemed mysterious and abstruse. There were some conspicuous cases around that time, notably, for me, Holly Herndon’s Proto and the AI reproduction of the late singer Misora Hibari in 2019.[2] However, for non-experts like me, implementing the technology in my own music creation appeared challenging. The landscape surrounding creative AI changed with the release of OpenAI’s ChatGPT in November 2022. As the release popularized generative AI, a branch of AI that focuses on the generation of content (Carle and Eck 2023), the technology has become increasingly accessible to the public through its integration into various applications.[3] Before such applications became popular, the public might have known AI technology mainly from cultural representations and science magazines (Natale and Ballatore 2020). Today, however, individuals can easily harness that technology for their creative activities for the first time in history.
With this shift, AI’s influence extends beyond experimental composers to broader areas of musical life. Music has always involved technology, but the current situation requires ethnomusicologists to ask how the impact of this new technology compares with that of earlier ones, notably musical instruments and recording techniques. This paper considers a framework for analyzing musical engagements with generative AI apps.[4] The first section briefly explains how artists have come to no longer regard AI merely as a passive tool, acknowledging the subjectivity of technological objects.[5] Reflecting on this, the subsequent section proposes three perspectives for positioning generative AI apps relative to users: the Other, the (Reflection of) Self, and the Extension of Corporate Power.
AI as a Tool and Beyond in Its Use by Composers
One of the earliest compositions associated with AI is Lejaren Hiller and Leonard Isaacson’s The Illiac Suite for String Quartet (1957), arguably the first computer composition (Roads 1996). In particular, the fourth movement influenced later AI music through its use of a stochastic model called the Markov chain (Avdeeff 2019). Theoretical and practical studies on the use of AI in composition followed in the 1960s. Specifically, Roads (1980, 15) states, “1968 marks the beginnings of modern research into AI and music.” However, the purview of these studies was limited to practical uses, namely composition, analysis, and education. AI was a tool aiding human activity rather than a subjective agent.
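As a rough illustration of the kind of stochastic model mentioned above, the following Python sketch builds a first-order Markov chain from a short melody and samples a new note sequence from it. It is a minimal sketch under stated assumptions, not a reconstruction of Hiller and Isaacson’s procedure: the training melody and the function names `fit_transitions` and `generate` are hypothetical and invented for this example.

```python
import random
from collections import defaultdict


def fit_transitions(melody):
    """Record, for each note, the notes that follow it in the training melody."""
    transitions = defaultdict(list)
    for current, following in zip(melody, melody[1:]):
        transitions[current].append(following)
    return dict(transitions)


def generate(transitions, start, length, seed=None):
    """Sample a note sequence by repeatedly drawing a successor of the last note."""
    rng = random.Random(seed)
    output = [start]
    while len(output) < length:
        successors = transitions.get(output[-1])
        if not successors:
            # Dead end: this note never had a successor in the training melody.
            break
        # Repeated successors in the list make frequent transitions more likely.
        output.append(rng.choice(successors))
    return output


# Hypothetical toy melody (note names only), invented for illustration.
TRAINING_MELODY = ["C", "D", "E", "C", "E", "G", "E", "D", "C"]

if __name__ == "__main__":
    model = fit_transitions(TRAINING_MELODY)
    print(generate(model, start="C", length=12, seed=0))
```

Because each new note depends only on the note before it, the output recombines transitions found in the training melody rather than reproducing the melody as a whole.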
A different view is presented by George Lewis in a theoretical reflection (2000) on his own work Voyager. Voyager is “a non-hierarchical, interactive musical environment that privileges improvisation” (33). It analyzes the performance “in real time” and generates “complex responses … and independent behavior that arises from its own internal processes” (33–39). Debunking the conventional understanding of computer programs as “‘objective’ or ‘universal’” (33), Lewis suggests a collaborative perspective on the interaction with the system. Responding to dismissals of computer-assisted music, he insists that “musicality” arises between the human and the system rather than from either side alone (38). In this view, the system is not subordinate to the human performer, because its status is equivalent to the human’s. His perspective has influenced subsequent practices, directly or indirectly: Holly Herndon, a leading composer in the domain, cites Lewis as “the most prescient inspiration” and calls her system Spawn “a child” to collaborate with and look after (Herndon 2019). These practices afford subjectivity to AI, treating their systems as more than mere tools for human use.
Along this trajectory, current scholarship has increasingly acknowledged the subjectivity of AI. Adrian Mackenzie (2017), for example, investigates how machine learning is shaping modes of learning by ethnographically observing the workings of “machine learners,” which include “both humans and machines or human-machine relations” (6). This approach, which seeks to understand the machine’s subjectivity, reflects larger conceptual shifts in the humanities, namely the nonhuman turn and posthumanism (Grusin 2015). However, these theoretical frameworks are not simply “new” concepts that appeared after AI and are merely applied to its analysis. Rather, they derive from AI itself and its conceptual precursor, cybernetics, both of which developed through cross-referencing between machine and human mechanisms (Hayles 1999). Considering the machine’s subjectivity in relation to the human thus has historical significance when discussing AI.[6]
Positioning Generative AI Apps in Relation to the User
Unsurprisingly, claiming the subjectivity of a physical object is not entirely new to ethnomusicologists. For example, avoiding taking musical instruments “as things that humans use or make or exchange, or as passive artifacts from which sound emanates,” Eliot Bates (2012) explains that the instruments have their own “power, mystique, and allure” according to the social relations in which they are embedded (364). Bates’ approach to the subjectivity of an instrument focuses on its various relationships to the external environment. This relational thinking permeates my approach to generative AI apps. However, I update Bates’ framework by acknowledging the technological differences from conventional instruments. Specifically, I consider the following elements of the AI underlying generative AI apps to be crucial: the automatic capacity, the training datasets, and the owner company. I therefore propose three ways of positioning the apps: the Other, the (Reflection of) Self, and the Extension of Corporate Power.
Arguably, the most famous generative AI app is ChatGPT. Although it is not designed for music composition, many have been interested in its musicality, as demonstrated by hundreds of articles and YouTube videos. When I asked the chatbot “Can you compose?”, it answered affirmatively but emphasized its limited, auxiliary role:
As an AI language model, I can generate text-based representations of musical ideas, but I don’t have the ability to directly generate sound or audio files.
For example, I can create simple compositions in a music notation format such as ABC notation, or I can assist you in creating lyrics for a song. However, these are not fully-fledged compositions and they still require human musicians or software to perform and interpret them into music.
If you’re interested in composing a song or piece of music, I’d be glad to help generate ideas, assist with music theory questions, and provide input on song structure, chord progressions, and more.[7]
However, when people engage with the app, they often expect independent creativity rather than assistance. For example, a YouTuber, Aries, made up a new genre, “Aqua-funk,” by asking ChatGPT to compose the chord progressions, bassline, drum patterns, lead melody, and lyrics.[8] Others have generated music in a more traditional style, such as a four-voice, Bach-inspired chorale.[9] In these and other YouTube videos, the uploader typically follows such experiments by discussing the output’s musical quality, notably on the basis of human standards.
To interpret this, I introduce the first perspective, the Other, which positions the technology as an agent alien to its users. This perspective associates the above way of evaluating AI’s output with the Turing test, which judges how close a machine is to humans (Turing 1950). Here, the chatbot’s ability is viewed with suspicion, although the human user may perceive its agency because of the automatic function that generates plausible responses. In the examples above, the chatbot is expected to have creativity that surpasses or catches up with that of humans. To bring this perspective into current ethnomusicological studies, one can focus on the assumptions people hold when dealing with output that appears unique in some way. For example, they may incorporate the output into their work, emphasizing its uncanniness (Avdeeff 2019). How, then, do conventional assumptions inform creative decisions? What impact do those decisions have, in turn, on the assumptions? These questions help us understand how assumptions about music continuously change through negotiation between human and non-human subjects.
In contrast, the (Reflection of) Self views the technology as both an extension and a representation of its users and creators. Since generative AI develops its models based on human creations, the resulting output inherently mirrors humans, regardless of how alien it may appear.[10] This perspective helps us study our own society, much as ethnomusicologists have examined the places where they live (Finnegan 2007). Eryk Salvaggio (2023) has developed methodologies for analyzing AI-generated images, understanding them as “infographics” that represent social assumptions embedded in the training datasets. Because these biases are too subtle to be easily recognized, he suggests comparing multiple images. Ruha Benjamin (2019) argues that once social bias is encoded in the technological infrastructure, it can, in turn, be reinforced. If creators actively use generative AI to produce cultural content, such biases might be embedded in it unintentionally, unnoticed by the producer. This is already beginning to happen in the film industry, where some have used generative AI for images and scripts (Heaven 2023). Thus, generative AI apps not only reflect but also shape the Self. Investigation of AI apps from this perspective can be proactive and speculative, since it addresses potential risks.
Regarding the last point, it is important to note that the way generative AI apps reflect existing assumptions is not neutral, because the owner company designs the app: it selects the training dataset, determines the operation, and modulates the output. As such, the third perspective, the Extension of Corporate Power, views generative AI apps as subject to particular companies rather than independent. It concerns the expansion of online platforms conditioning the ecology of cultural products (Srnicek 2016), wherein the production, distribution, and consumption of music become more and more mediated by IT and data companies (Negus 2019). This concern stems from conventional inquiries into the impact of international music industries. For instance, Wallis and Malm (1984) consider how corporations affect local music production by controlling access to the studio and the flow of recordings. Similarly, by focusing on musical instruments, Théberge (1997) emphasizes that a musician, the producer of musical sound, is also a consumer of the technology. AI app proponents often promise musical democratization, a cliché long used to promote musical instruments and software (Clancy 2021, 94; Théberge 1997, 29). Generative AI apps certainly lower many initial barriers to music creation, but they also involve the company in the process of individual music creation. This is of particular concern because AI datasets are frequently opaque, which often blurs the company’s responsibility when an app seems to appropriate someone’s work.[11] Owing to this technical opacity, the effect of the company’s involvement is disguised as that of the machine’s agency. However, this perspective does not dismiss the subjectivity of apps; rather, it suggests investigating both the technology and the owner by assuming their complicit relationship.
Conclusion
In this paper, I explained how AI has come to be more than a mere tool for artists and proposed three perspectives positioning generative AI apps as the Other, the (Reflection of) Self, and the Extension of Corporate Power. While I treated the three types of positioning as distinct categories, they are interrelated. Because discussing one aspect touches on another, one can also investigate the areas where they overlap. The framework may not depict the overall picture of generative AI and music culture, but it offers an entry point to these entanglements.
Notes:
[1] I am at a preliminary stage of my research on applications utilizing generative AI. This paper aims to present a tentative framework for further development of the study.
[2] https://pitchfork.com/reviews/albums/holly-herndon-proto/; https://www.yamaha.com/en/news_release/2019/19100801/.
[3] Generative AI is technically known as generative modeling. The algorithm first analyzes a set of training data to build a probabilistic model representing the general patterns within that dataset, and the model is then used to generate new content (samples) similar to the original dataset (Foster 2022, 24).
[4] By “generative AI apps,” I refer to software and web-browser-based applications adopting generative modeling.
[5] I use the term “subjectivity” in the Foucauldian sense, following Mackenzie (2017). In this sense, the concept points to the positive role of technological objects that generate content from a particular social position.
[6] According to Hui (2019), the discussion about the boundary between the machine and living entities, including humans, was well entrenched in philosophy long before the cybernetics movement.
[7] Response to my prompt “Can you compose?” from GPT-4, 19 May 2023.
[8] https://www.youtube.com/watch?v=c1vkiYPz8Fc
[9] https://www.youtube.com/watch?v=d_7EsKcn8nw
[10] This does not deny the agency of the AI app itself, because its agency can be recognized in the way it reassembles the material.
[11] There are many articles discussing this. For instance, see Chayka (2023).
References
Avdeeff, Melissa. 2019. “Artificial Intelligence & Popular Music: SKYGGE, Flow Machines, and the Audio Uncanny Valley.” Arts 8 (4): 130. https://doi.org/10.3390/arts8040130.
Bates, Eliot. 2012. “The Social Life of Musical Instruments.” Ethnomusicology 56 (3): 363–95. https://doi.org/10.5406/ethnomusicology.56.3.0363.
Benjamin, Ruha. 2019. Race After Technology: Abolitionist Tools for the New Jim Code. Medford, MA: Polity Press.
Carle, Eben, and Douglas Eck. 2023. “Ask a Techspert: What Is Generative AI?” Google. April 11, 2023. https://blog.google/inside-google/googlers/ask-a-techspert/what-is-generative-ai/.
Chayka, Kyle. 2023. “Is A.I. Art Stealing from Artists?” The New Yorker, February 10, 2023. https://www.newyorker.com/culture/infinite-scroll/is-ai-art-stealing-from-artists.
Clancy, Martin. 2021. “Reflections on the Financial and Ethical Implications of Music Generated by Artificial Intelligence.” PhD dissertation, Trinity College Dublin.
Finnegan, Ruth. 2007. The Hidden Musicians: Music-Making in an English Town. Reprint edition. Middletown, CT: Wesleyan University Press.
Foster, David. 2022. Generative Deep Learning: Teaching Machines to Paint, Write, Compose and Play. Sebastopol, CA: O’Reilly Media, Inc.
Grusin, Richard A. 2015. The Nonhuman Turn. Minneapolis: University of Minnesota Press.
Hayles, N. Katherine. 1999. How We Became Posthuman: Virtual Bodies in Cybernetics, Literature, and Informatics. Chicago: University of Chicago Press.
Heaven, Will Douglas. 2023. “Welcome to the New Surreal. How AI-Generated Video Is Changing Film.” MIT Technology Review. June 1, 2023. https://www.technologyreview.com/2023/06/01/1073858/surreal-ai-generative-video-changing-film/.
Herndon, Holly Rebecca. 2019. “Proto.” DMA dissertation, Stanford University. https://purl.stanford.edu/fh292ky0538.
Hui, Yuk. 2019. Recursivity and Contingency. London: Rowman & Littlefield International.
Lewis, George E. 2000. “Too Many Notes: Computers, Complexity and Culture in Voyager.” Leonardo Music Journal 10 (December): 33–39. https://doi.org/10.1162/096112100570585.
Mackenzie, Adrian. 2017. Machine Learners: Archaeology of a Data Practice. Cambridge, MA: MIT Press.
Natale, Simone, and Andrea Ballatore. 2020. “Imagining the Thinking Machine: Technological Myths and the Rise of Artificial Intelligence.” Convergence 26 (1): 3–18. https://doi.org/10.1177/1354856517715164.
Negus, Keith. 2019. “From Creator to Data: The Post-Record Music Industry and the Digital Conglomerates.” Media, Culture & Society 41 (3): 367–84. https://doi.org/10.1177/0163443718799395.
Roads, Curtis. 1980. “Artificial Intelligence and Music.” Computer Music Journal 4 (2): 13–25. https://doi.org/10.2307/3680079.
———. 1996. The Computer Music Tutorial. Cambridge, MA: MIT Press.
Salvaggio, Eryk. 2023. “How to Read an AI Image.” IMAGE. May 22, 2023. https://image-journal.de/how-to-read-an-ai-image/.
Srnicek, Nick. 2016. Platform Capitalism. Cambridge: Polity Press.
Théberge, Paul. 1997. Any Sound You Can Imagine: Making Music / Consuming Technology. Middletown, CT: Wesleyan University Press.
Turing, A. M. 1950. “Computing Machinery and Intelligence.” Mind 59 (236): 433–60.
Wallis, Roger, and Krister Malm. 1984. Big Sounds from Small Peoples: The Music Industry in Small Countries. London: Constable.