Editing Audio Metadata

Last updated: November 25, 2023

Minim can organize your local music library by tagging audio files with metadata retrieved from popular music services, such as iTunes, Spotify, and TIDAL.

from pathlib import Path

from minim import audio, itunes, spotify, tidal, utility
import numpy as np

Setup

Instantiating API clients

To get started, we will need to create API clients for the music services that we want to query for album and track information:

client_itunes = itunes.SearchAPI()
client_spotify = spotify.WebAPI()
client_tidal = tidal.PrivateAPI()

Finding audio files

To find all audio files in a specified directory, we use the pathlib.Path.glob() method:

audio_files = [f for f in (Path().resolve().parents[3] 
                           / "tests/data/previews").glob("**/*") 
               if f.suffix == ".flac"]
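
The suffix comparison above is case-sensitive and tied to a single format. As a minimal sketch, assuming you want to match several extensions regardless of case, the same search can be wrapped in a small helper. The name find_audio_files and the suffix set are illustrative and not part of Minim:

```python
import tempfile
from pathlib import Path

# Hypothetical set of extensions; adjust it to the formats in your library.
AUDIO_SUFFIXES = {".flac", ".mp3", ".m4a"}


def find_audio_files(root, suffixes=AUDIO_SUFFIXES):
    """Recursively collect files under `root` whose suffix is in `suffixes`."""
    return sorted(
        path
        for path in Path(root).rglob("*")
        if path.is_file() and path.suffix.lower() in suffixes
    )


# Demonstration on a throwaway directory so the result is predictable.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    (root / "a.flac").touch()
    (root / "sub" / "b.FLAC").touch()
    (root / "notes.txt").touch()
    print([path.name for path in find_audio_files(root)])
    # ['a.flac', 'b.FLAC']
```

Lowercasing the suffix before the membership test is what lets b.FLAC through; the original comprehension would skip it.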

Defining helper functions

Before diving into the examples, we define a helper function that will print out the metadata of an audio file:

def print_metadata(audio_file):
    for field, value in audio_file.__dict__.items():
        if not field.startswith("_"):
            if field in {"artwork", "lyrics"}:
                if value:
                    value = type(value)
            field = (field.upper() if field == "isrc" 
                     else field.replace("_", " ").capitalize())
            print(f"{field}: {value}")
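
To see what the helper does without a real audio file, the sketch below applies the same formatting rules to a stand-in object and returns the lines instead of printing them. The function format_metadata and the dummy object are illustrative only; they are not part of Minim:

```python
from types import SimpleNamespace


def format_metadata(audio_file):
    """Return 'Label: value' lines using the same rules as print_metadata."""
    lines = []
    for field, value in vars(audio_file).items():
        if field.startswith("_"):
            continue  # private attributes are skipped
        if field in {"artwork", "lyrics"} and value:
            value = type(value)  # summarize large payloads by their type
        label = ("ISRC" if field == "isrc"
                 else field.replace("_", " ").capitalize())
        lines.append(f"{label}: {value}")
    return lines


# Stand-in object with a few public fields and one private attribute.
dummy = SimpleNamespace(
    album_artist="Spektrem",
    isrc="GB2LD0901581",
    artwork=b"\x89PNG",
    lyrics=None,
    _handle=object(),
)
print("\n".join(format_metadata(dummy)))
# Album artist: Spektrem
# ISRC: GB2LD0901581
# Artwork: <class 'bytes'>
# Lyrics: None
```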

The two examples below highlight the utility of the minim.audio.*Audio classes. The first example involves an audio file with no metadata other than what is stored in its filename, and the second example shows how to update the tags of an audio file without overwriting existing metadata.

Converting and tagging an audio file with no metadata

First, we load the audio file into a file handler by passing its filename to the minim.audio.Audio constructor, together with a regular expression and the metadata fields that its capture groups correspond to:

audio_file = audio.Audio(audio_files[0], pattern=("(.*)_(.*)", ("artist", "title")))
audio_files[0].name, audio_file
('spektrem_shine.flac', <minim.audio.FLACAudio at 0x7f9d871d2dd0>)

A minim.audio.FLACAudio object is returned, as the minim.audio.Audio constructor has automatically determined the audio format. Let’s take a look at the file’s metadata:

print_metadata(audio_file)
Album: None
Album artist: None
Artist: spektrem
Comment: None
Composer: None
Copyright: None
Date: None
Genre: None
ISRC: None
Lyrics: None
Tempo: None
Title: shine
Compilation: None
Disc number: None
Disc count: None
Track number: None
Track count: None
Artwork: None
Bit depth: 16
Bitrate: 1030107
Channel count: 2
Codec: flac
Sample rate: 44100

While the file originally had no artist or title information, the search pattern we provided to the minim.audio.Audio constructor has allowed it to pull the information from the filename. At this point, however, the artist and title information have not yet been written to file.
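
How a pattern such as "(.*)_(.*)" splits a filename can be checked with the standard re module. This is a sketch under the assumption that the pattern is matched against the filename stem and that the capture groups are paired with the field names in order; Minim's internal matching may differ, and parse_filename is a hypothetical helper:

```python
import re
from pathlib import Path


def parse_filename(path, pattern, fields):
    """Map the capture groups of `pattern` onto `fields` for a filename stem."""
    match = re.fullmatch(pattern, Path(path).stem)
    if match is None:
        return {}
    return dict(zip(fields, match.groups()))


fields = ("artist", "title")

print(parse_filename("spektrem_shine.flac", r"(.*)_(.*)", fields))
# {'artist': 'spektrem', 'title': 'shine'}

# A greedy first group grabs everything up to the LAST underscore.
print(parse_filename("tobu_back_to_you.flac", r"(.*)_(.*)", fields))
# {'artist': 'tobu_back_to', 'title': 'you'}

# A non-greedy first group stops at the FIRST underscore instead.
print(parse_filename("tobu_back_to_you.flac", r"(.*?)_(.*)", fields))
# {'artist': 'tobu', 'title': 'back_to_you'}
```

If titles in your library contain underscores, a non-greedy first group may be the safer choice.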

If we want compatibility with most music players, we can convert the FLAC file to an MP3 file using minim.audio.FLACAudio.convert():

audio_file.convert("mp3")
audio_file
size=    1032kB time=00:00:30.09 bitrate= 280.9kbits/s speed=63.3x    
<minim.audio.MP3Audio at 0x7f9d871d2dd0>

With the file conversion, the audio_file object is automatically updated to a minim.audio.MP3Audio object. Let’s take a look at the new file’s metadata:

print_metadata(audio_file)
Album: None
Album artist: None
Artist: spektrem
Comment: None
Compilation: None
Composer: None
Copyright: None
Date: None
Genre: None
ISRC: None
Lyrics: None
Tempo: None
Title: shine
Disc number: None
Disc count: None
Track number: None
Track count: None
Artwork: None
Bit depth: None
Bitrate: 280593
Channel count: 2
Codec: mp3
Sample rate: 44100

The metadata persisted, including the artist and title, which had not been written to the FLAC file. The only exceptions are format-specific properties, such as the bit depth, bitrate, and codec.

Now, we start populating the file’s metadata. The Apple Music/iTunes catalog typically contains the most complete and accurate information about a track, so it is generally a good idea to start there. As such, we

  • build a query using the only information available to us, namely, the artist and title,

  • search for the track on iTunes via minim.itunes.SearchAPI.search(),

  • select the closest match out of the results by choosing the one with the highest Levenshtein ratio (that is, the smallest edit distance) for the artist and title (available via minim.utility.levenshtein_ratio()),

  • separately get the track’s album information using minim.itunes.SearchAPI.lookup(), and

  • populate the file handler’s metadata with the JSON results using minim.audio.FLACAudio.set_metadata_using_itunes().
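
The closest-match step in the list above can be illustrated without Minim or NumPy. The sketch below implements a plain edit distance and normalizes it by the longer string's length; this is one common normalization and is not necessarily the formula that minim.utility.levenshtein_ratio() uses. The function names and candidate strings are hypothetical:

```python
def levenshtein_distance(a, b):
    """Edit distance with unit-cost insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution
            ))
        previous = current
    return previous[-1]


def similarity_ratio(a, b):
    """1.0 for identical strings, approaching 0.0 as they diverge."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein_distance(a, b) / longest


def closest_match(query, candidates):
    """Index of the candidate with the HIGHEST similarity to `query`."""
    ratios = [similarity_ratio(query, candidate) for candidate in candidates]
    return max(range(len(candidates)), key=ratios.__getitem__)


# Made-up candidates standing in for "artist title" strings from a search.
candidates = ["spektrem shine (radio edit)", "spektrem shine", "shine on"]
print(closest_match("spektrem shine", candidates))
# 1
```

Note that closest_match raises ValueError on an empty candidate list, so a search that returns no results should be handled before the selection step.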

query = f"{audio_file.artist} {audio_file.title}".lower()
itunes_results = client_itunes.search(query)["results"]
itunes_track = itunes_results[
    np.argmax(
        utility.levenshtein_ratio(
            query, 
            [f"{r['artistName']} {r['trackName']}".lower() 
             for r in itunes_results]
        )
    )
]
itunes_album = client_itunes.lookup(itunes_track["collectionId"])["results"][0]
audio_file.set_metadata_using_itunes(itunes_track, album_data=itunes_album, 
                                     overwrite=True)
print_metadata(audio_file)
Album: Enter the Spektrem - Single
Album artist: Spektrem
Artist: Spektrem
Comment: None
Compilation: False
Composer: None
Copyright: ℗ 2013 GFTED
Date: 2013-03-06T12:00:00Z
Genre: Electronic
ISRC: None
Lyrics: None
Tempo: None
Title: Shine
Disc number: 1
Disc count: 1
Track number: 2
Track count: 3
Artwork: <class 'bytes'>
Bit depth: None
Bitrate: 280593
Channel count: 2
Codec: mp3
Sample rate: 44100

We can see that most of the fields have now been filled out. The iTunes Search API does not return composers, ISRC, lyrics, or tempo information, so we will have to use the Spotify Web API and the TIDAL API to complete the metadata.

The Spotify catalog contains ISRCs for tracks. Conveniently, Minim also wraps the Spotify Web API’s audio features endpoint in minim.spotify.WebAPI.get_track_audio_features(), which returns a dict of audio features, including the track’s tempo.

As with the iTunes Search API, we

  • search for the track on Spotify via minim.spotify.WebAPI.search(),

  • select the closest match out of the results by choosing the one with the highest Levenshtein ratio (that is, the smallest edit distance) for the artist and title,

  • get the track’s audio features using minim.spotify.WebAPI.get_track_audio_features(), and

  • populate the file handler’s metadata with the JSON results using minim.audio.FLACAudio.set_metadata_using_spotify().

Note

By default, the minim.audio.FLACAudio.set_metadata_using*() methods do not overwrite existing metadata. To change this behavior, pass overwrite=True as a keyword argument.
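
The effect of the overwrite flag can be pictured as a dictionary merge. The following is a sketch of the semantics described in the note, not Minim's implementation; merge_metadata and the sample values, including the alternative title, are hypothetical:

```python
def merge_metadata(existing, incoming, overwrite=False):
    """Fill empty fields from `incoming`; replace set ones only if `overwrite`."""
    merged = dict(existing)
    for field, value in incoming.items():
        if value is None:
            continue  # the new source has nothing to contribute
        if overwrite or merged.get(field) is None:
            merged[field] = value
    return merged


existing = {"title": "Shine", "isrc": None, "tempo": None}
incoming = {"title": "Shine (Original Mix)", "isrc": "GB2LD0901581", "tempo": 128}

# Default behavior: only the empty fields are filled in.
print(merge_metadata(existing, incoming))
# {'title': 'Shine', 'isrc': 'GB2LD0901581', 'tempo': 128}

# With overwrite=True, the existing title is replaced as well.
print(merge_metadata(existing, incoming, overwrite=True))
# {'title': 'Shine (Original Mix)', 'isrc': 'GB2LD0901581', 'tempo': 128}
```

This is why the order of the services matters: with the default setting, whichever source fills a field first wins.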

spotify_results = client_spotify.search(query, type="track")["items"]
spotify_track = spotify_results[
    np.argmax(
        utility.levenshtein_ratio(
            query, 
            [f"{r['artists'][0]['name']} {r['name']}".lower() 
             for r in spotify_results]
        )
    )
]
audio_file.set_metadata_using_spotify(
    spotify_track, 
    audio_features=client_spotify.get_track_audio_features(spotify_track["id"])
)
print_metadata(audio_file)
Album: Enter the Spektrem - Single
Album artist: Spektrem
Artist: Spektrem
Comment: None
Compilation: False
Composer: None
Copyright: ℗ 2013 GFTED
Date: 2013-03-06T12:00:00Z
Genre: Electronic
ISRC: GB2LD0901581
Lyrics: None
Tempo: 128
Title: Shine
Disc number: 1
Disc count: 1
Track number: 2
Track count: 3
Artwork: <class 'bytes'>
Bit depth: None
Bitrate: 280593
Channel count: 2
Codec: mp3
Sample rate: 44100

Finally, we repeat the process above using the TIDAL API to get the composers and lyrics by

  • searching for the track on TIDAL via minim.tidal.PrivateAPI.search(),

  • selecting the correct result by matching the ISRC,

  • getting the track’s composers and lyrics using minim.tidal.PrivateAPI.get_track_composers() and minim.tidal.PrivateAPI.get_track_lyrics(), respectively, and

  • populating the file handler’s metadata with the JSON results using minim.audio.FLACAudio.set_metadata_using_tidal().
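
Matching on ISRC can come back empty, for example when the service does not carry the track, and indexing into a missing result would then raise a TypeError. A minimal sketch of a defensive lookup follows; find_by_isrc, normalize_isrc, and the sample results are hypothetical and only mimic the shape of the dicts used in this tutorial:

```python
def normalize_isrc(isrc):
    """Drop hyphens and whitespace and uppercase the code for comparison."""
    return isrc.replace("-", "").strip().upper()


def find_by_isrc(results, isrc):
    """Return the first result whose ISRC matches, or None if there is none."""
    if not isrc:
        return None
    target = normalize_isrc(isrc)
    return next(
        (r for r in results if normalize_isrc(r.get("isrc") or "") == target),
        None,
    )


# Made-up search results; the IDs are placeholders.
results = [
    {"id": 1, "isrc": "GB2LD0901580"},
    {"id": 2, "isrc": "gb-2ld-09-01581"},
]

match = find_by_isrc(results, "GB2LD0901581")
if match is None:
    print("No track with a matching ISRC; skipping this service.")
else:
    print(match["id"])
# 2
```

Checking for None before requesting composers or lyrics keeps a missing track from aborting the whole tagging run.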

tidal_results = client_tidal.search(query)["tracks"]["items"]
tidal_track = next((r for r in tidal_results if r["isrc"] == audio_file.isrc), None)
tidal_composers = client_tidal.get_track_composers(tidal_track["id"])
tidal_lyrics = client_tidal.get_track_lyrics(tidal_track["id"])
audio_file.set_metadata_using_tidal(tidal_track, composers=tidal_composers, lyrics=tidal_lyrics)
print_metadata(audio_file)
WARNING:root:Either lyrics are not available for this track or the current account does not have an active TIDAL subscription.
Album: Enter the Spektrem - Single
Album artist: Spektrem
Artist: Spektrem
Comment: None
Compilation: False
Composer: None
Copyright: ℗ 2013 GFTED
Date: 2013-03-06T12:00:00Z
Genre: Electronic
ISRC: GB2LD0901581
Lyrics: None
Tempo: 128
Title: Shine
Disc number: 1
Disc count: 1
Track number: 2
Track count: 3
Artwork: <class 'bytes'>
Bit depth: None
Bitrate: 280593
Channel count: 2
Codec: mp3
Sample rate: 44100

The metadata for the track is now as complete as the three services allow. (In this example, TIDAL had no songwriting credits for the track, which sometimes happens with less popular tracks, and the warning above shows that lyrics were unavailable as well.)

Don’t forget to write the changes to file using minim.audio.FLACAudio.write_metadata()!

audio_file.write_metadata()

Tagging an audio file with existing metadata

Now, we will process an audio file that already has most of the metadata fields populated. As before, we load the file, but this time using the minim.audio.FLACAudio constructor directly:

audio_file = audio.FLACAudio(audio_files[1])
audio_files[1].name, audio_file
('tobu_back_to_you.flac', <minim.audio.FLACAudio at 0x7f9cdddd6090>)

Let’s take a look at the file’s metadata:

print_metadata(audio_file)
Album: Back To You - Single
Album artist: Tobu
Artist: Tobu
Comment: None
Composer: Tobu & Toms Burkovskis
Copyright: 2022 NCS 2022 NCS
Date: 2023-07-06T07:00:00Z
Genre: House
ISRC: GB2LD2210368
Lyrics: None
Tempo: None
Title: Back To You
Compilation: None
Disc number: 1
Disc count: 1
Track number: 1
Track count: 1
Artwork: <class 'bytes'>
Bit depth: 16
Bitrate: 1104053
Channel count: 2
Codec: flac
Sample rate: 44100

The file has a poorly formatted copyright string and is missing lyrics and tempo information; its compilation flag is also unset. (Cover art is already embedded, as the Artwork field shows.) We can fix this by querying the three APIs as we did in the previous example, overwriting the existing metadata where needed:

query = f"{audio_file.artist} {audio_file.title}".lower()

# iTunes Search API
itunes_results = client_itunes.search(query)["results"]
itunes_track = itunes_results[
    np.argmax(
        utility.levenshtein_ratio(
            query, 
            [f"{r['artistName']} {r['trackName']}".lower() 
             for r in itunes_results]
        )
    )
]
itunes_album = client_itunes.lookup(itunes_track["collectionId"])["results"][0]
audio_file.set_metadata_using_itunes(itunes_track, album_data=itunes_album, 
                                     overwrite=True)

# Spotify Web API
spotify_results = client_spotify.search(query, type="track")["items"]
spotify_track = spotify_results[
    np.argmax(
        utility.levenshtein_ratio(
            query, 
            [f"{r['artists'][0]['name']} {r['name']}".lower() 
             for r in spotify_results]
        )
    )
]
audio_file.set_metadata_using_spotify(
    spotify_track, 
    audio_features=client_spotify.get_track_audio_features(spotify_track["id"])
)

# Private TIDAL API
tidal_results = client_tidal.search(query)["tracks"]["items"]
tidal_track = next((r for r in tidal_results if r["isrc"] == audio_file.isrc), 
                   None)
tidal_composers = client_tidal.get_track_composers(tidal_track["id"])
tidal_lyrics = client_tidal.get_track_lyrics(tidal_track["id"])
audio_file.set_metadata_using_tidal(tidal_track, composers=tidal_composers, 
                                    lyrics=tidal_lyrics)
WARNING:root:Either lyrics are not available for this track or the current account does not have an active TIDAL subscription.

Let’s take another look at the file’s metadata:

print_metadata(audio_file)
Album: Ncs: The Best Of 2022
Album artist: Various Artists
Artist: Tobu
Comment: None
Composer: Tobu & Toms Burkovskis
Copyright: ℗ 2022 NCS
Date: 2022-11-25T12:00:00Z
Genre: Electronic
ISRC: GB2LD2210368
Lyrics: None
Tempo: 98
Title: Back to You
Compilation: False
Disc number: 1
Disc count: 1
Track number: 15
Track count: 16
Artwork: <class 'bytes'>
Bit depth: 16
Bitrate: 1104053
Channel count: 2
Codec: flac
Sample rate: 44100

Voilà! The metadata has been updated. (Toms Burkovskis, otherwise known as Tobu, appears twice in the composer field because his stage name and his legal name are distinct strings and are therefore treated as separate entries. There is no elegant solution to this problem, unfortunately.)

As always, don’t forget to write the changes to the file:

audio_file.write_metadata()