Apparently it’s been 4 weeks since my last post. Sadly, my music generation using an RNN/LSTM didn’t go as well as I expected, but I have made some minor progress, so I think it’s a good time to post an update.
First of all, I named this side project MeloDeep. Pun intended 🙂 Since it uses deep learning and the melody of a song to generate new music, I think that’s a fitting name haha (coming from a person who is very bad at naming, that’s my best shot).
As shared in the previous post, the main idea is to take existing music, extract its features (notes, rests, durations, etc.), and approach it as a supervised learning classification problem. Details of the code are in the GitHub link here. Some of the main challenges are:
- I am using .midi files as the training data input. MIDI stores music digitally as note events (compared to just a waveform in mp3, etc.), which makes it easier to parse, but I still felt some information was lost depending on the quality of the file.
- I haven’t thought of a good way to handle multiple tracks or channels (e.g. melody and harmony). Therefore the whole pipeline, from training to generation, only handles the melody notes. That results in a loss of information from the original songs, as well as generated songs that lack layers.
- I’m not sure how to incorporate the duration of a note into the generation step. In the end, all the notes I generate have a constant duration (I could assign random durations, but that doesn’t sound good either…).
- Using the same song’s notes as the seed for the generation step results in a song highly similar to the original one. Looks like an overfitting problem.
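To make the classification framing concrete, here’s a minimal sketch (plain Python with made-up note names; the vocabulary mapping and window length are my own illustrative choices, not necessarily what MeloDeep does) of turning a melody into supervised training pairs, where each window of notes is an input and the next note is the class to predict:

```python
# Turn a melody (sequence of note names) into supervised training pairs:
# each window of `seq_len` notes is an input, the following note is the label.

def make_training_pairs(melody, seq_len=4):
    # Map each distinct note to an integer class.
    vocab = sorted(set(melody))
    note_to_idx = {n: i for i, n in enumerate(vocab)}
    encoded = [note_to_idx[n] for n in melody]

    inputs, labels = [], []
    for i in range(len(encoded) - seq_len):
        inputs.append(encoded[i:i + seq_len])  # context window
        labels.append(encoded[i + seq_len])    # next note = class to predict
    return inputs, labels, note_to_idx

melody = ["C4", "D4", "E4", "C4", "D4", "E4", "G4", "E4"]
X, y, note_to_idx = make_training_pairs(melody, seq_len=4)
print(X[0], "->", y[0])  # [0, 1, 2, 0] -> 1
```

The same pairs would then be fed to the LSTM as sequences and one-hot labels.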
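On the duration problem, one idea I might try (just a sketch; the token format here is my own assumption) is to fold the duration into the class label itself, so the model predicts a (pitch, duration) pair as a single token instead of the pitch alone:

```python
# Encode each note as a single "pitch_duration" token so the classifier
# predicts both at once; decode the token back when generating.

def encode_token(pitch, quarter_length):
    return f"{pitch}_{quarter_length}"

def decode_token(token):
    pitch, dur = token.rsplit("_", 1)
    return pitch, float(dur)

notes = [("C4", 1.0), ("D4", 0.5), ("E4", 0.5), ("C4", 2.0)]
tokens = [encode_token(p, d) for p, d in notes]
print(tokens)                   # ['C4_1.0', 'D4_0.5', 'E4_0.5', 'C4_2.0']
print(decode_token(tokens[0]))  # ('C4', 1.0)
```

The trade-off is a larger vocabulary (each pitch appears once per distinct duration), but the generated notes would at least stop having a constant length.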
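And on the “generated song is almost the original” problem: besides regularisation, sampling from the model’s predicted distribution with a temperature, instead of always taking the most likely note, usually adds variety. A rough sketch, assuming the model outputs a probability per note class:

```python
import math
import random

def sample_with_temperature(probs, temperature=1.0, rng=random):
    # Re-weight the predicted distribution: temperature < 1 sharpens it
    # (closer to argmax, so more like the training song), while
    # temperature > 1 flattens it (more surprising notes).
    logits = [math.log(p + 1e-9) / temperature for p in probs]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    return rng.choices(range(len(probs)), weights=weights, k=1)[0]

# Model strongly prefers class 2; a higher temperature picks others more often.
probs = [0.05, 0.10, 0.80, 0.05]
print(sample_with_temperature(probs, temperature=0.5))
print(sample_with_temperature(probs, temperature=2.0))
```

A low temperature reproduces the training melody almost exactly, which is basically what I’m seeing now, so raising it might be the cheapest fix to try first.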
Nonetheless, this is my first attempt at using machine learning to generate music, so it can only get better from here… right? 🙂 If you want to listen to the output of MeloDeep v1, here’s a sample trained on Jay Chou’s 青花瓷 and generated with its first 10 notes.
It sounds very bad, but I promise future versions will be better!