For vocals? Vocal Dataset Recording Specifications
What to play
<aside>
Play songs or musical pieces only
The AI needs to learn the emotion, flow, dynamic changes, and articulations within the musical content.
❌ Don’t play:
- Scales, one-shot notes
- Whole tracks that use only one articulation throughout
</aside>
<aside>
Play like you’re on stage
Capture the expressiveness and dynamics of the performance as fully as possible in the recording, rather than interpreting the score precisely.
✅ Try your best to:
- Go beyond the score: we need the best performances rather than perfectly accurate notes.
- Play expressively, not like a robot. We’re not sampling for a sample library.
- Reach the extremes of key and dynamic range: very high and very low, very loud and very quiet.
</aside>
<aside>
Use articulations in your tracks
The AI needs to learn when and how to use articulations in a real performance.
We don’t need a separate track for each articulation. Instead, include as many articulations as you can in your playing, even if the original score doesn’t call for any.
- Make sure all articulations cover a wide key range.
- For main articulations, keep their share above 10% of the whole dataset.
</aside>
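The 10% guideline for main articulations can be checked once recordings are annotated. The sketch below is only an illustration: it assumes a hypothetical list of `(articulation, seconds)` annotations and measures share by duration, which the guideline itself does not specify. The articulation names are placeholders.

```python
from collections import defaultdict

MIN_SHARE = 0.10  # the "over 10%" threshold from the guideline


def articulation_shares(segments):
    """Return each articulation's fraction of the total annotated duration.

    `segments` is a hypothetical list of (articulation_name, seconds) pairs.
    """
    totals = defaultdict(float)
    for name, seconds in segments:
        totals[name] += seconds
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {name: t / grand_total for name, t in totals.items()}


def under_represented(segments, main_articulations, min_share=MIN_SHARE):
    """List main articulations whose share is not above `min_share`."""
    shares = articulation_shares(segments)
    return sorted(a for a in main_articulations
                  if shares.get(a, 0.0) <= min_share)


# Example with placeholder articulation names and durations in seconds.
segments = [("legato", 60), ("staccato", 30), ("pizzicato", 5), ("legato", 5)]
missing = under_represented(
    segments, ["legato", "staccato", "pizzicato", "tremolo"])
print(missing)
```

Any articulation the function returns needs more coverage in later takes.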
<aside>
Length of each track
80% of tracks should be a full piece of music.
20% of tracks should contain several short phrases, each 2 to 8 bars long.
We need 1 hour of recordings in total, with each track bounced as a single audio file.
</aside>
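To keep track of progress toward these targets, a small tally like the one below may help. It is a sketch under assumptions: the `Track` record and the `"full"` / `"phrases"` labels are hypothetical, and the 80/20 split is counted per track rather than by duration.

```python
from dataclasses import dataclass

TARGET_TOTAL_SECONDS = 3600  # 1 hour of recordings in total
TARGET_FULL_RATIO = 0.80     # 80% of tracks should be full pieces


@dataclass
class Track:
    name: str
    kind: str       # hypothetical labels: "full" or "phrases"
    seconds: float  # duration of the bounced audio file


def summarize(tracks):
    """Return total duration, share of full-piece tracks, and time still needed."""
    total = sum(t.seconds for t in tracks)
    full_count = sum(1 for t in tracks if t.kind == "full")
    full_ratio = full_count / len(tracks) if tracks else 0.0
    return {
        "total_seconds": total,
        "full_ratio": full_ratio,
        "remaining_seconds": max(0.0, TARGET_TOTAL_SECONDS - total),
    }


# Example: four full pieces of 10 minutes and one 5-minute phrase track.
tracks = [Track(f"piece_{i}", "full", 600.0) for i in range(4)]
tracks.append(Track("phrases_1", "phrases", 300.0))
report = summarize(tracks)
print(report)
```

Comparing `full_ratio` against `TARGET_FULL_RATIO` shows whether more full pieces or more phrase tracks are needed.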
<aside>
You are the performer behind the model
The AI model learns everything from your recordings. It plays as you play, feels as you feel. It is your digital avatar.
</aside>
How to record
- Tune your instrument before recording so that it matches the standard pitch of A = 440 Hz.
- Follow the click track of each song! A few small timing offsets are OK.
- Record dry instrument tracks: no reverb, no delay, and no backing tracks.
- For polyphonic instruments, feel free to play polyphonically.
- No background noise or big room reflections.
- No obvious bleed of the backing instrumental from your headphones. (Try lowering your headphone volume.)
- When joining two clips, use a cross-fade, and never cross-fade over words: cross-fade only over silence, a breath, or a consonant. There is no need to remove breaths.
- There should be at least 1 s of silence at the beginning and end of each track.
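
The 1-second silence requirement in the list above can be checked automatically before delivery. This is a minimal sketch, not a required tool: the amplitude threshold of 0.001 (about -60 dBFS) is an assumption to adjust for your noise floor, and the hypothetical `read_wav_mono16` helper assumes 16-bit PCM WAV files.

```python
import struct
import wave

MIN_SILENCE_SECONDS = 1.0  # from the guideline: at least 1 s at each end
SILENCE_THRESHOLD = 0.001  # assumed: about -60 dBFS on a [-1, 1] scale


def edge_silence_seconds(samples, sample_rate, threshold=SILENCE_THRESHOLD):
    """Return (leading, trailing) silence in seconds for samples in [-1, 1]."""
    n = len(samples)
    lead = 0
    while lead < n and abs(samples[lead]) <= threshold:
        lead += 1
    if lead == n:  # the whole track is silent
        return n / sample_rate, n / sample_rate
    trail = 0
    while abs(samples[n - 1 - trail]) <= threshold:
        trail += 1
    return lead / sample_rate, trail / sample_rate


def has_required_silence(samples, sample_rate,
                         min_seconds=MIN_SILENCE_SECONDS):
    """True if both ends of the track have at least `min_seconds` of silence."""
    lead, trail = edge_silence_seconds(samples, sample_rate)
    return lead >= min_seconds and trail >= min_seconds


def read_wav_mono16(path):
    """Read a 16-bit PCM WAV; return (first-channel samples, sample rate)."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM")
        channels = wf.getnchannels()
        rate = wf.getframerate()
        raw = wf.readframes(wf.getnframes())
    ints = struct.unpack("<%dh" % (len(raw) // 2), raw)
    return [v / 32768.0 for v in ints[::channels]], rate


# Synthetic example at 100 samples per second: 1 s silence, 0.5 s sound, 1 s silence.
ok_track = [0.0] * 100 + [0.5] * 50 + [0.0] * 100
short_lead = [0.0] * 50 + [0.5] * 50 + [0.0] * 100
print(has_required_silence(ok_track, 100), has_required_silence(short_lead, 100))
```

For a real file, pass the output of `read_wav_mono16("your_track.wav")` to `has_required_silence`; the file name here is a placeholder.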



Samples
Violin: