Hit Music by Artificial Intelligence Coming in a Decade?


Can artificial intelligence write hit songs? Again and again?

Yes, maybe rolling them out a decade from now, given enough capital.

Not writing the lyrics, yet. Humans will be better at that for a long time. But the music, yes. And human lyricists can partner with AI supplying the scores.

Premises

The ten years I’m rashly predicting is contradicted by history; time-frame predictions haven’t all held up. Chess, for example, should not be too complicated to program: an ordinary newcomer can probably learn the rules in half an hour and play a legal game, and the game is played on only 64 squares, plus two rest areas, between only two players using only sixteen pieces each of only six types. Yet a challenge to make a computer beat human masters was raised decades ago, and 32 years later a top-level human could still beat a computer at chess. Groundwork for music-writing AI is already being laid; cybernetics, the comparative study of how brains and machines work, has been studied since at least Norbert Wiener’s groundbreaking book was published in 1948. Sony produced one Beatles-style musical composition in 2016, although, as far as I know, it did not sell well, if it sold at all. High finance is needed, and whether it’s forthcoming is dubious. But if the finance were to come, I think we’re close enough in skill and hardware for ten years to be a reasonable estimate for developing the software.

While, according to statistics rather than anyone’s promise, millions of monkeys randomly typing on keyboards would eventually generate a complete and accurate copy of all of Shakespeare’s works, that would take too long and too many monkeys to be worthwhile. The AI music project can’t take that long or that many parallel systems, or anticipating success within this decade on planet Earth won’t be realistic. So we can’t chance waiting on randomness. Nor can we let the AI get almost everything wrong through bad decisions, with no corrective influences, and merely hope it eventually gets it right. With AI, we don’t have to: AI often learns, and we understand the concept of machine learning.

In music, hits are what’s popular; defining that is easy. Writing what’s popular, which much of a genre’s audience demands, is harder. If that audience generally buys what it likes, then sales count; but most of what an audience likes it probably doesn’t buy, appreciating it in other public ways instead, and that should be measured too. Even sales, which might seem more easily measured, are hard to measure. The data is often proprietary and private, and it is often manipulated into unreliability by some of the very people who rely on it. For example, when record albums were normally consigned to dealers, a record would occasionally be distributed with a prohibition on returns, forcing sales at reduced prices to customers with less enthusiasm (such as for gifts), since only quantities sold counted for the common ranking charts.

Lyrics will be written and selected in whatever ways humans write and select them. Our focus here will be on the music. I don’t judge whether AI is necessary or even just useful, other than that as an exercise for AI it is useful, whether or not it is useful to music consumers.

Probably AI won’t be adopted for large-scale music creation for consumption until it tends to outperform humans. That will likely require working within a niche until one of the creations happens to become a major hit that overflows its intended niche. Then there’ll be an invitation to see what else it can do. The niche music thereby created will be music likely to be accepted by fans of other niches as well, a despecialization that widens the composition’s appeal.

Some of AI’s output will be rejected by much of the intended human audience as lacking credibility precisely because it’s from AI. That could be enough resistance to defeat popularity, which could defeat AI on this mission. It’s generally unpersuasive for a computer to say it falls in love, wants justice, or relaxes on a mountainside. Human artists experience that kind of rejection too, and for that reason singers select songs they can credibly sing as if the words were their own. The solution is either to cultivate public acceptance of creativity by AI or to pass the creativity off as humanly generated, letting it become popular while the association with AI is unknown to the public and revealing the AI’s role only later. Given what little I know of industry practice, I think passing first as human is likelier, except for some narrower genres.

Experience, and its absence, matter. Music from AI will mainly be of low quality in the beginning. Improvement will be gradual, perhaps in steps rather than rising smoothly, but it will not suddenly become nearly perfect. Human artists often seem unknown one day and nearly perfect the next, but that’s because they merely crossed a threshold, partly one of fame; most of the audience paid attention only once the threshold was crossed, having previously attended to other artists who had already crossed it. The rise in the ability of AI, being in one new system and not many, should be compared to that of one human artist: a new human artist is observed by a small number of people before getting famous, and a new AI system will also be observed by a small number of people before getting famous. That is the valid comparison.

What might be most difficult would be evaluating a proposed establishment of a new genre, or other newness outside known boundaries, even though boundaries tend to be fuzzy. That’s difficult enough without AI. Someone testing music would not need to specify a genre, although they should for the sake of a more focused evaluation; but the AI probably would not have enough data to be reliably successful at predicting that test music could start a new genre, even after successfully predicting that the test music would likely succeed within a given genre.

Developing AI Side-by-Side

Multiple parallel, nonidentical AI implementations should be developed and applied, especially to support experimentation, so that experiments do not undermine reliance on the stability already achieved as the AI progresses. The same inputs should be available to all of the implementations, and customers seeking output should be able to get it from any or all of the implementations for the one fee paid.
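
As a minimal sketch of that arrangement (every name and score here is a hypothetical stand-in), a broker function could fan a single submission out to several nonidentical implementations and return whichever evaluations the customer requested:

```python
# Hypothetical sketch: a broker fans one customer submission out to several
# nonidentical evaluator implementations and returns any or all of the
# results, all covered by the one fee.
from typing import Callable, Dict, Optional, Set

Evaluator = Callable[[str], float]  # each implementation: submission in, score out

def broker(submission: str,
           implementations: Dict[str, Evaluator],
           requested: Optional[Set[str]] = None) -> Dict[str, float]:
    """Send the same input to every requested implementation; collect all results."""
    chosen = requested if requested is not None else set(implementations)
    return {name: evaluate(submission)
            for name, evaluate in implementations.items()
            if name in chosen}

if __name__ == "__main__":
    # Two stand-in evaluators that differ only in how they "score" a submission.
    impls: Dict[str, Evaluator] = {
        "system_a": lambda s: (len(s) % 10) / 10.0,
        "system_b": lambda s: s.count("a") / max(len(s), 1),
    }
    print(broker("a test melody description", impls))                 # all implementations
    print(broker("a test melody description", impls, {"system_b"}))   # just one
```

The point of the sketch is only the fan-out: the same submission reaches every implementation, and the customer chooses how many answers to take.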

Learning

Machine learning depends on initial states being set and on subsequent feedback being applied to adjust those states. Human learning does, too. Non-AI computers benefit from feedback when a human changes settings, perhaps with DIP switches, because of what the human has learned; AI automates some of the process of perceiving feedback and changing internal settings, with less human intervention. Thus, AI can learn, making some mistakes along the way but still progressively learning more and pursuing a desired direction.
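
For concreteness, here is a toy sketch of that loop, under the assumption that the "internal settings" are nothing more than a numeric weight adjusted by an error signal; it illustrates the concept of learning from feedback, not the music system itself:

```python
# Toy sketch of the learning loop described above: start from an initial
# setting, then let feedback (prediction error) adjust that setting
# automatically, with no human flipping switches.
def learn(examples, initial_weight=0.0, rate=0.1, passes=50):
    """Fit y ≈ weight * x by repeatedly correcting the weight from feedback."""
    weight = initial_weight                    # the "initial state"
    for _ in range(passes):
        for x, y in examples:
            error = y - weight * x             # feedback: how wrong were we?
            weight += rate * error * x         # adjust the internal setting
    return weight

if __name__ == "__main__":
    data = [(1, 2), (2, 4), (3, 6)]            # underlying rule: y = 2x
    print(round(learn(data), 3))               # converges near 2.0
```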

To help it learn, we must input data, including data most people don’t think of as data, such as expertise.

Experts should include elite and popular figures; academics who have published refereed studies and conference papers as well as untrained people who just know what they like (scholars often decline popular knowledge as not adequately proven for peer-reviewed publication, which may be good scholarly caution, since upholding standards raises the reliability of scholars’ pronouncements, but it doubts the validity of knowledge that is popularly accepted and then influences popular actions, including the choice of music); musicians and listeners; critics known for liking almost anything popular and critics more critical; broadcast programmers and venue bookers; global sales managers, regional wholesalers, and local retailers; executives, investors, and consumers; and the currently active and the recently retired (the recently retired might be freer to talk and might have begun to reflect on their work, leading to shifts in conclusions, and the long-retired might qualify on other grounds).

Experts should be geographically sourced and linguistically and otherwise culturally diverse, so that they contribute expertise that would otherwise likely be overlooked; overlooking it would bias the knowledge base and weaken the analytical abilities the AI needs for its advice to customers.

Insights from expertise should be more than true-or-false statements; they should include nuance and relativity. Perhaps the experts should be asked to assign percentage probabilities or temperature-scale ratings to their statements. A majority of the public appears able to apply temperature scales properly but is not good at applying any larger than five points, although a significant minority, especially one including experts comfortable with advanced statistics, scales, and nuance, might be good with larger scales.
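
A minimal sketch of how such ratings might be recorded and reconciled, with all field names and numbers invented for illustration, could normalize a percentage probability and a coarse five-point rating onto one common scale:

```python
# Hypothetical sketch: record an expert statement with either a percentage
# probability or a coarse five-point ("temperature") rating, and normalize
# both to one 0..1 confidence so downstream weighting can treat them alike.
from dataclasses import dataclass
from typing import Optional

@dataclass
class ExpertStatement:
    expert: str
    claim: str
    probability: Optional[float] = None   # 0..100, for experts comfortable with it
    five_point: Optional[int] = None      # 1..5, for coarser judgments

    def confidence(self) -> float:
        if self.probability is not None:
            return self.probability / 100.0
        if self.five_point is not None:
            return (self.five_point - 1) / 4.0   # map 1..5 onto 0..1
        return 0.5                               # no rating given: treat as neutral

if __name__ == "__main__":
    a = ExpertStatement("producer", "Sparse bass lines age well in this genre", probability=70)
    b = ExpertStatement("fan panel", "Key changes near the end feel dated", five_point=4)
    print(a.confidence(), b.confidence())        # 0.7 and 0.75
```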

The most relevant expertise, besides expertise on computers, will obviously concern musical history but also, less obviously, industrial and social history. So an AI system will need all of that history as input.

Music Data

Before music collections are exposed to or stored in the computer system, means of analyzing the music are needed, and experts on music are needed to supply those tools. Some high school music teachers, and a larger number of musicians who’ve made modest local livings (enough to pay market rent on a bedroom by playing original music), can break music down into its basic musical components. Going up the escalator, bringing in more accomplished musicians who are good with various instruments, and with their singing voices as instruments, will introduce more sophisticated analytical capabilities. One who knows, say, trumpet or alto voice may know what gains or loses prospective fans when a trumpet is blown or an alto part is sung in a given genre. The ten best opera singers are certainly remarkable, and we’re amazed at what comes out of their lungs, but they’re probably not who became popular in most genres. Producers, both specialized and varied in their backgrounds, and musicologists with substantial scholarship can build on what the teachers and the studio-apartment musicians said. Escalating again: Mozart and someone else were competing at a party; Mozart said he could write a piano piece the other person couldn’t play, which turned out to be true, but the other person challenged Mozart to play it himself, if he was so good; when Mozart’s hands spread out to reach the high and low keys at the same time and there was a note in the middle, Mozart played it with his nose. We need the advice of musicians who know their instruments that well.

Then the music itself will have to be inputted, either by one-time exposure or into storage. If it is merely exposed without storage, costs may arguably be lower. Either choice allows analysis by the AI, but when the criteria governing the analysis change over time, which is likely, only storage allows the AI to reanalyze the music. Thus, over time, the AI will do a better job with storage than with exposure.
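
A small sketch of that trade-off, using trivial stand-ins for tracks and criteria, shows why only storage permits reanalysis once the criteria change:

```python
# Sketch of the exposure-versus-storage trade-off. With storage the originals
# survive, so a later change of criteria can be re-run over them; with
# exposure only, all that survives is whatever the old criteria produced.
# The "tracks" and criteria here are trivial stand-ins.
def analyze(track: str, criteria) -> float:
    return float(criteria(track))

tracks = ["song_one", "song_two"]

# Exposure only: analyze once, keep the scores, discard the originals.
old_criteria = len
exposure_scores = {t: analyze(t, old_criteria) for t in tracks}

# Storage: the originals are still on hand, so new criteria can be applied later.
new_criteria = lambda t: t.count("o")
reanalyzed_scores = {t: analyze(t, new_criteria) for t in tracks}

print(exposure_scores)    # all we would ever have under exposure only
print(reanalyzed_scores)  # possible only because the tracks were stored
```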

We’ll need all the top sellers in a given genre. We could settle for 99% of the top songs, but that’s tricky, because many top songs are distinctly different from all others at their level of popularity. One that we don’t get, out of a hundred we need, may well be different enough that the AI needs to know about it to give good advice. Our only hope in that case is that many secondary songs will have mimicked the unique qualities of the missing one, but that’s uncertain, because maybe no one will know what influenced a secondary song. We could hope an expert tells the AI about the missing song and which other songs it influenced, but that would be nowhere near as good as getting the missing piece itself.

We’ll need high percentages of the songs that attained second-rank and lower-rank popularity. Even for very-low-rank popularity, we’ll need to input some percentage of the songs. We’ll have to determine the percentage needed at each rank, keeping the percentages toward the high side, because we need the variations within a genre. Then we should neither choose thoughtlessly nor merely hope we happen upon the best choices; we should apply a scientific method designed for random selection. Randomness means getting lousy clunkers mixed in with the halfway decent stuff, clunkers we might personally hate from our guts, but that’s because, on average, some percentage are clunkers. If we have a fair cross-section of music, we will have about the same percentage of clunkers as exists in all the music of the genre. We need that fair cross-section, so we’ll live with it. The good news is that we don’t have to listen to most of it. The computer does.
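
Here is a minimal sketch of such stratified random selection; the catalog, the rank labels, and the per-rank percentages are made up purely for illustration:

```python
# Minimal sketch of stratified random selection by popularity rank; the
# per-rank shares here are invented solely for illustration.
import random
from typing import Dict, List

def stratified_sample(catalog_by_rank: Dict[str, List[str]],
                      share_by_rank: Dict[str, float],
                      seed: int = 0) -> List[str]:
    """Take a random share of songs from each popularity rank."""
    rng = random.Random(seed)
    chosen: List[str] = []
    for rank, songs in catalog_by_rank.items():
        share = share_by_rank.get(rank, 0.0)
        k = max(1, round(share * len(songs))) if songs else 0
        chosen.extend(rng.sample(songs, k))
    return chosen

if __name__ == "__main__":
    catalog = {
        "top":    [f"hit_{i}" for i in range(100)],
        "second": [f"mid_{i}" for i in range(1000)],
        "low":    [f"obscure_{i}" for i in range(10000)],
    }
    shares = {"top": 1.0, "second": 0.8, "low": 0.1}   # keep percentages toward the high side
    print(len(stratified_sample(catalog, shares)))
```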

While many genres have popular lyrics about love and variations on that theme, they often have popular lyrics about other subjects too; which subjects are more popular varies by genre, because it varies with the audience that likes the given genre. The lyrics likely influence the music. So we’d need to input the lyrics for analysis, not of the lyrics per se, but to assist the analysis of the music.

Copying not from lyric sheets but by playing musical recordings and having the computer parse the sound to extract the lyrics may still be too difficult and unreliable, especially since much music musicalizes the voice carrying the lyrics.

Listeners we would pay may have to decipher the audio and transcribe the words, or edit the lyric inputs.

Copying from lyric sheets is itself sometimes unreliable, because some lyrics cannot legally be broadcast: music may be issued in a legally broadcastable form that differs from how it is sold or distributed to consumers, and the lyric sheets may state only the legally broadcastable lyrics even when the recording does not conform to the sheets.

To support analysis of lyrics, and of music as dependent on the lyrics, we’d need to understand the social conditions that give rise to the popularity (or not) of some lyrics. What matters about social conditions is popular perceptions of them, including their content, popularity, and intensity, so we’d need to input data about social perceptions. For accuracy, we’d need a lot of data from a lot of sources, including sources with overlapping expertise.

Music can be selected on the basis of genre, time period of scoring or performance, audience reaction, and geographic distribution of audience reaction. Selecting just one genre would weaken the AI’s analytical power even within that genre, because genres and their popularity are influenced by other genres: audiences judge music partly by comparison to what they already know, including a genre popular in the same demographic, and classical, children’s, and religious music as genres often taught, respectively, in the schools attended by the demographic, in their homes, and in the houses of worship they attend. So, even if the AI is intended to focus on one genre, it should have data about several genres.
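
A hypothetical sketch of that kind of selection, with an invented catalog and invented criteria, might filter on genre, period, and region while deliberately keeping a neighboring genre in the result:

```python
# Hypothetical sketch of selecting input music on several criteria at once
# (genre, period, region), while keeping more than one genre in the result.
# The catalog entries and criteria are invented for illustration.
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple

@dataclass
class Track:
    title: str
    genre: str
    year: int
    region: str

def select(tracks: Iterable[Track],
           genres: Set[str],
           year_range: Tuple[int, int],
           regions: Set[str]) -> List[Track]:
    lo, hi = year_range
    return [t for t in tracks
            if t.genre in genres and lo <= t.year <= hi and t.region in regions]

catalog = [
    Track("A", "country", 1978, "US"),
    Track("B", "children", 1982, "US"),
    Track("C", "country", 1995, "UK"),
]
# A focus genre plus a neighboring genre the same audience also hears.
print(select(catalog, {"country", "children"}, (1970, 1990), {"US"}))
```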

Selectivity may vary within a set of criteria, so that input within a category is not all-or-nothing but perhaps some percentage in between.

Extensive metadata should be included. Where a song is used in, among other things, cover performances, advertising, and popular culture such as movies, metadata about those additional uses should be included too.

Industry Data

Industrial history to be inputted will include production and media information: which music companies had greater or lesser market power; which producers had what track records of success; what instruments were generally available, especially for older music; audio engineers’ knowledge of which effects and mixes possible today appeal to fans; proprietary market studies; sales and return figures and proxies such as Billboard chart data (the proxies have their own influence apart from that of the underlying sales figures); promotion; broadcast play and audience approval; concert performances and ticket sales; intellectual musicology; psychological studies of audiences; and the effects of dislike on fans (e.g., whether it shortens the popularity among fans or makes it shallower).

A lot of industrial history will be needed.

Social Data

Social-environmental history also has to be inputted. One report was that when the economy takes a downturn, such as during a recession, popular music’s rhythms tend to be simpler. War may increase the prevalence of martial qualities in music, such as rhythms reminiscent of marching and instrumentation reminiscent of bugling. Crisis may favor the reassuringly familiar and thus crowd out some originality in music. Adverse confinement in which music is regulated may influence the music produced after the confinement, or produced by contemporaries who bond with those confined. Seasons likely matter, and a climatologist should illuminate why and when, similarly to noting that the months-long nights of northern Norway have been said to correlate with higher rates of suicide. Occasions may be associated with particular music and musical styles. Scientific psychologists and neurologists likely have contributions to make to this research, for example on how brains react to various musical stimuli. Sociologists can talk about how fan groups interact, influencing their decisions to like and to buy.

Social-environmental data, such as indicators and predictors (including past predictions, for evaluation), have to be inputted in large volumes.

Post-Input Analysis

People from low-level experts to high-level experts, multiple at each level, should evaluate the data by various criteria. Each expert should be granted a weight relative to their degree of expertise in their field. Each expert in turn should assign a weight to subsets of data the expert evaluates within their expertise. An expert human may determine that some data even within their own field of expertise is irrelevant; they should add that determination to what the AI will process.

Because experts overlap, two may evaluate a datum differently, and both views should be respected for the knowledge base according to the weights of the respective experts and the weights assigned by the respective experts.
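
One plausible way to combine overlapping judgments, assuming each judgment carries the expert’s own weight times the weight that expert gave the data subset, is a simple weighted average; the numbers below are illustrative only:

```python
# Sketch of combining overlapping expert judgments of one datum: each judgment
# counts in proportion to the expert's weight times the weight that expert
# assigned to the relevant data subset. All figures are invented.
def combine(judgments):
    """judgments: list of (score, expert_weight, subset_weight) tuples."""
    total_weight = sum(ew * sw for _, ew, sw in judgments)
    if total_weight == 0:
        return None                      # e.g., every expert judged the datum irrelevant
    weighted_sum = sum(score * ew * sw for score, ew, sw in judgments)
    return weighted_sum / total_weight

if __name__ == "__main__":
    # Two experts rate the same datum differently; the weights decide the blend.
    datum_judgments = [(0.9, 0.8, 1.0),   # senior producer, fully relevant subset
                       (0.4, 0.3, 0.5)]   # junior critic, subset judged half-relevant
    print(round(combine(datum_judgments), 3))
```

The weighted average is just one option; the point is that both views are retained in proportion to their weights rather than one simply overriding the other.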

The experts do not have to be computer experts. Programmers can translate non-computer expertise into a form the AI can apply.

The humans should not do what the AI should do. This is about what AI can do, so humans should do what humans are uniquely qualified to do and leave the rest to AI.

Concentrated Service Organization

Because this would be too costly for one record label or similar kind of industry member to support, a service bureau should specialize in this, and then serve anyone who’s interested, including competitors in the music business.

The service bureau would preserve neutrality and rent time at its terminals to anyone for a fee, even an anonymous customer, perhaps a musician, lyricist, producer, instrument maker, researcher, executive, or budding competitor. The customer would prepare data files for their own needs, perhaps including their choice of audience and taste niche (e.g., children’s music, music for a distinct occasion, or a mood), the timing of intended sales, and their own music exemplars to be tested. They would input their own data and then receive analytical evaluations of the probability of success, either for the music they inputted or for a set of characteristics of potential music they wish to test. Test music could be inputted by live performance but probably should not be; it should instead be inputted by prerecorded performance or by a compatible system of notation, so that the resulting analysis can be associated with a known input. The customer’s input should, by default, be by exposure without storage beyond the session time; but it could be stored across multiple sessions, presumably by agreement, so that re-inputting can be waived for the sake of consistency.
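
A rough sketch of such a session, with all names and the scoring stand-in invented for illustration, might look like this, with exposure-only handling as the default:

```python
# Hypothetical sketch of one service-bureau session: the customer supplies a
# niche, an intended release window, and test material, receives a
# probability-of-success estimate, and by default nothing is retained once
# the session closes.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Session:
    niche: str                     # e.g., "children's music"
    intended_release: str          # e.g., "2026-Q4"
    retain: bool = False           # default: exposure only, no storage
    _inputs: List[str] = field(default_factory=list)

    def submit(self, test_music: str) -> float:
        self._inputs.append(test_music)
        # Stand-in for the real evaluation; here just a dummy number in 0..1.
        return min(1.0, len(test_music) / 100.0)

    def close(self) -> None:
        if not self.retain:
            self._inputs.clear()   # discard everything unless storage was agreed

if __name__ == "__main__":
    s = Session(niche="children's music", intended_release="2026-Q4")
    print(s.submit("notation or prerecorded performance goes here"))
    s.close()
```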

Test music can be anything from a single strum of a guitar to the length of a multi-disc album, from a solo on-pitch note to an opera in movements. However, there have been at least two compositions consisting of silence (and, I think, a copyright dispute arose between the parties) and I don’t know what AI could contribute to a future composition of silence.

Whether the AI would retain its recommendation beyond when the customer wants it retained is a decision balancing confidentiality, for the customer’s benefit, against the AI’s ability to evaluate, as feedback, any success attributable to the recommendation.

Developmental Test Volume

To gain a large body of experience for the AI, a great volume of test submissions from many sources should be sought. To encourage more participation, the fee should be low. Nonetheless, the analytical service should be the same.

Creative Output

While the AI should start by evaluating proposed music, it should progress into creating music for a customer’s approval. This can be done a step at a time. Perhaps it will contribute a better bass line to a customer’s song, replacing the customer’s prior bass line. Probably, later, it will score all the parts for another customer.
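
As a hedged illustration of that step-at-a-time progression, a generate-and-score loop could propose candidate bass lines and offer one only if it beats the customer’s existing line; the note choices and the scoring function here are dummies, not the real evaluator:

```python
# Sketch of the step-at-a-time idea: generate candidate bass lines, score each
# with the same kind of evaluator used for customer submissions, and offer the
# best candidate only if it beats the customer's existing line.
import random

def propose_bass_line(rng):
    return [rng.choice(["C", "E", "G", "A"]) for _ in range(8)]

def score(bass_line):
    return bass_line.count("C") / len(bass_line)   # dummy stand-in for the real evaluator

def improve(customer_line, attempts=50, seed=0):
    rng = random.Random(seed)
    best, best_score = customer_line, score(customer_line)
    for _ in range(attempts):
        candidate = propose_bass_line(rng)
        if score(candidate) > best_score:
            best, best_score = candidate, score(candidate)
    return best

if __name__ == "__main__":
    print(improve(["E", "E", "G", "A", "E", "E", "G", "A"]))
```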

This might raise a legal issue about copyright in music resulting from the AI’s contribution: whether the copyright belongs to the customer or to the AI’s owner. That will depend on the role each played in the final music newly in the customer’s possession at the end of an AI session. If it can be resolved before a session begins, it should be resolved in favor of the copyright belonging to the customer, to encourage more acceptance by customers and prospective customers, albeit at the cost of denying the AI’s owner revenue from copyright exploitation.

AI Growth

AI development should be continuous. Even as AI is being applied and even as successes are being achieved, development should continue, including using originality, feedback, and experimentation to inform future development.

The feedback from every AI implementation should be available to all of the implementations and adaptable by each. For purposes of adapting the feedback, no AI implementation should be a black box; these should not be competitors, in that the input and the output should each be mutually intelligible across implementations. On the other hand, after this creative potential has been proven and businesses compete to outdo each other in musicianship by AI, mutual unintelligibility can be a hallmark, just not while attempting the proof of the concept.
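
A minimal sketch of such a mutually intelligible feedback record, with field names that are assumptions rather than any established format, could be a plain serializable structure that every implementation emits and consumes:

```python
# Sketch of a shared, mutually intelligible feedback record: every
# implementation emits and consumes the same plain structure, so none of
# them is a black box to the others. Field names are assumptions.
import json
from dataclasses import dataclass, asdict

@dataclass
class FeedbackRecord:
    implementation: str      # which system produced the recommendation
    submission_id: str
    predicted_success: float # what the system predicted
    observed_success: float  # what actually happened, once known

record = FeedbackRecord("system_a", "sub-0001", 0.72, 0.35)
shared = json.dumps(asdict(record))     # any other implementation can parse this
print(FeedbackRecord(**json.loads(shared)))
```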

The feedback also has to include a wide range of new human music, excepting some of the worst failures; there are too many of the worst by humans to input all of them, even within a genre or other subset circumscribing all of the successes being inputted. The human comparisons continue, without interruption, the human collection already inputted at the beginning, by exposure or storage. The lack of interruption keeps the AI’s knowledge base contemporaneous when the AI is evaluating its creative output. The continuity of inputting has to be consistent with the rights of copyright holders.

Initially, humans will probably always outdo AI. With experience, however, AI will gain some modest occasional successes, and then proportionately more and greater ones. The failures, even the worst of them, the successes, and everything in between that is creditable to the help of AI should feed the feedback.

Tower of Money

The hardware is likely to have to be a supercomputer, just as weather prediction requires one. That would be expensive just to buy, never mind the programming or the data acquisition.

The AI itself has to be developed for this project, and because we would not be waiting for other uses of AI to produce improvements this project could borrow, that development would be expensive.

Multiple parallel nonidentical AI implementations should be organized so as to keep down the costs of the copyright licenses for music being inputted.

Acquisition of expertise is expensive. This application of AI has potential major commercial value and not much noncommercializable value, so arguing that the expertise should be shared for free won’t get far. Experts who have spent years honing their expertise and earning high compensation will tend to want to be paid well again. The top expertise will be the hardest to obtain at low prices. While lower-level people are not much worried about the competition we would introduce later, the elites will see the competition from a mile away, because they didn’t get to be big fish without steering clear of the fishhooks. If we’re going to put them out of business, at least they’d like to retire comfortably with their families, and they’re already well-heeled, so you’ll have to pay enough to impress them for their remaining lifetimes.

One-time exposure of music to the computer, without storage, may still cost significant money for copyright licenses. Possibly the law does not require licensing for one-time performances (I don’t know); if licensing is required, perhaps counterparties will agree to lower fees, though they may still demand major money, and even modest demands could add up to a substantial total. If the music is stored, there’d be no access to the exemptions in copyright law for performing for friends and family (it’ll be hard to persuade a jury that a computer or a fee-paying customer is a friend) or for fair use (among other reasons, because copying not only entire works but entire collections of works would normally far exceed fair use). Copyrights would then definitely have to be licensed for performance and storage, and that licensing is likely an expensive proposition, especially since storage will likely last multiple years. It is likely expensive even for music in the public domain, because modern performances of that music are generally not in the public domain. Likely no law says how much to pay for the rights; we’ll have to negotiate with people who know we want completeness and thus have no alternative. Three licensing organizations (ASCAP, BMI, and SESAC) license most of the well-known music, and all three licenses will be needed, or else musical selection based on which licenses are held will be neither random nor comprehensive. For music under copyright but not represented by any of the three organizations, we may need to license directly from copyright holders, who can be hard or impossible to find. If they are found, and if their representatives take the shark-teeth plaques off the wall, they’re still sharks.

An alternative, relying on license-fee regulation by the Federal government as occurs with compulsory licensing, is unavailable unless the law were amended to include this project or purposes encompassing it. Such an amendment is politically unlikely to be enacted, and a public-policy argument for it is hardly compelling against opposition likely to come from copyright holders concerned about steep reductions in earnings from their creative contributions.

Copying the lyrics not in the public domain into computer storage requires copyright licensing and, generally, fees.

All the data collection and human analysis is likely so expensive that no one company in the music industry could afford to do it without substantial compromises, such as collecting data only about its own performers. That would lower the AI’s ability to produce valuable output even during the proof-of-concept stage and thus might impede the proof altogether.

The fees for customers of the AI service should be below breakeven, whether across the board or through selective reductions for prospective customers on tighter budgets. Not only will executing this plan be expensive, the revenue would be inadequate for years.

Conclusion

This, for me, is an intellectual challenge. I don’t have an interest in art being by computers. Art is interesting or not; but usually that’s regardless of who or what created it.

If computerization results in fewer people creating popular music, there’ll be other things the displaced creators will do, likely creative and almost certainly remunerative.

Society has done well with most of the innovations it has developed.