I’ve finally found some time to realize my concept written here. In the meantime I’ve also managed to connect external hard drive to my Raspberry Pi. I have a lot of problems with it and tried lots of approaches but finally it’s working (I will describe it in the next post).
Now my music library is available in Raspberry Pi 🙂 It’s quite a lot of mp3s so I decided to write a quick little script in Python that will fetch basic data about file name, location, size, song duration, bitrate and data from ID3 tags. I’m also computing checksum of file to discover duplicates. All this data is later stored in MySQL database.
At first I prepared a SQL script with table creation:
CREATE TABLE IF NOT EXISTS `music` ( `id` INT(11) NOT NULL AUTO_INCREMENT, `file_name` text COLLATE utf8_unicode_ci NOT NULL, `file_format` text COLLATE utf8_unicode_ci NOT NULL, `file_location` text COLLATE utf8_unicode_ci NOT NULL, `file_checksum` text COLLATE utf8_unicode_ci NOT NULL, `file_size` INT(11) NOT NULL, `tag_title` text COLLATE utf8_unicode_ci NOT NULL, `tag_artist` text COLLATE utf8_unicode_ci NOT NULL, `tag_album` text COLLATE utf8_unicode_ci NOT NULL, `tag_year` text COLLATE utf8_unicode_ci NOT NULL, `tag_comment` text COLLATE utf8_unicode_ci NOT NULL, `tag_genre` text COLLATE utf8_unicode_ci NOT NULL, `tag_tracknumber` INT(11) NOT NULL, `song_length` INT(11) NOT NULL, `song_bitrate` INT(11) NOT NULL, `date_added` datetime NOT NULL, `health` INT(11) NOT NULL, PRIMARY KEY (`id`) ) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=1; |
The script is quite straightforward so I will explain it briefly. To fetch ID3 tags I’ve used Mutagen library and MySQLdb for database connection. Here is the code:
#!/usr/bin/env python # -*- coding: utf-8 -*- # @author Jaroslaw Szmigielski # @description Music Cataloger # @created 17.01.2013 import sys from os import listdir from os.path import isfile, isdir, join, getsize from mutagen.mp3 import MP3 import MySQLdb import subprocess import os import hashlib mysql_server = "localhost" mysql_user = "user" mysql_pass = "password" mysql_db = "dbname" def ConnectToDB(): try: conn = MySQLdb.connect(mysql_server, mysql_user, mysql_pass, mysql_db, charset='utf8', use_unicode=True) return conn.cursor() except MySQLdb.OperationalError, err: os._exit(1) def md5sum(filename): md5 = hashlib.md5() with open(filename,'rb') as f: for chunk in iter(lambda: f.read(128*md5.block_size), b''): md5.update(chunk) return md5.hexdigest() supported_extensions = ["mp3", "wav"] startpath = '/media/' directory_list = [] directory_list.append(startpath) if __name__ == "__main__": cursor = ConnectToDB() while(len(directory_list) > 0): popped = directory_list.pop() directories = [ f for f in listdir(popped) if isdir(join(popped, f)) ] for file in directories: directory_list.append(join(popped, file)) files = [ f for f in listdir(popped) if isfile(join(popped, f)) ] for file in files: if file[file.rfind(".")+1:] not in supported_extensions: continue file_location = join(popped,file) file_checksum = md5sum(file_location) #file_checksum = "" file_format = "mp3" file_name = file file_size = getsize(file_location) extension = file[file.rfind(".")+1:] audio = "" if(extension == "mp3"): audio = MP3(file_location) try: song_length = int(round(audio.info.length)) except: continue song_bitrate = int(round(audio.info.bitrate)) tag_title = "" tag_artist = "" tag_album = "" tag_year = "" tag_comment = "" tag_genre = "" tag_tracknumber = "" if "TIT2" in audio.keys(): tag_title = audio["TIT2"].text[0] if 'TPE1' in audio.keys(): tag_artist = audio['TPE1'].text[0] if 'TALB' in audio.keys(): tag_album = audio['TALB'].text[0] if 'TDRC' in audio.keys(): tag_year = audio["TDRC"].text[0] if "COMM" in audio.keys(): tag_comment = audio["COMM"].text[0] if "TCON" in audio.keys(): try: tag_genre = audio["TCON"].text[0] except IndexError,e: tag_genre = "" if "TRCK" in audio.keys(): tag_tracknumber = audio["TRCK"] try: tag_tracknumber = int(tag_tracknumber) except: tag_tracknumber = -1 file_location = file_location.decode('utf8').replace("'", u"\u0301") file_name = file_name.decode('utf8').replace("'", u"\u0301") tag_title = tag_title.replace("'", u"\u0301") tag_artist = tag_artist.replace("'", u"\u0301") tag_album = tag_album.replace("'", u"\u0301") tag_comment = tag_comment.replace("'", u"\u0301") tag_genre = tag_genre.replace("'", u"\u0301") execute = u"insert into music (file_name, file_format, file_location, file_checksum, file_size, tag_title, tag_artist, tag_album, tag_year, tag_comment, tag_genre, tag_tracknumber, song_length, song_bitrate, date_added, health) values " execute += u"('%s', '%s', '%s', '%s', %d, '%s', '%s', '%s', '%s' ,'%s', '%s', %d, %d, %d, NOW(), 0)" % (file_name, file_format, file_location, file_checksum, int(file_size), tag_title, tag_artist, tag_album, tag_year, tag_comment, tag_genre, int(tag_tracknumber), int(song_length), int(song_bitrate)) cursor.execute(execute) |
I needed to use utf-8 encoding because I have music in many different languages and the ID3 tags there vary a lot. Script uses simplified version of DFS algorithm to go through directories and it will visit all the locations in given path. I’m making replaces at the end to avoid >'< sign, that prevents from breaking SQL insert query. Sorry for the quality of source code, it had been written for about 20 min 😉
With this kind of data in your database you can easily get very interesting information about your library, for example amount of files, total time of songs or disk space that they use. To discover if there are any duplicates in your library use this SQL query:
SELECT c.* FROM music c JOIN (SELECT c.file_checksum FROM music c GROUP BY c.file_checksum HAVING COUNT(*) > 1) x ON x.file_checksum = c.file_checksum |
I’ve also set up mpd and Icecast server on Raspberry Pi to listen to my music collection wherever I am (together with MPDroid on Android device). I will describe it in later post.
I hope you’ve found this post useful 🙂 Try it on your own music library!