For vocals? Vocal Dataset Recording Specifications

What to play

<aside>

Play songs or musical pieces only

The AI needs to learn the emotion, flow, dynamic changes, and articulations within the music.

❌ Don’t play:

Scales, one-shot notes
Whole tracks that use only one articulation throughout </aside>

<aside>

Play like you’re on stage

The recording should capture the expressiveness and dynamics of a live performance as fully as possible, rather than a precise reading of the score.

✅ Try your best:

Don’t be limited by the score; we need the best performance rather than accurate notes.
We’re not sampling for a sample library, so please don’t play like a robot.
Try to reach the extremes of key and dynamic range: super high/low, super loud/quiet. </aside>

<aside>

Use articulations in your tracks

The AI needs to learn when and how to use articulations in a real performance.

We don’t need tracks dedicated to a single articulation. Instead, work as many articulations as you can into your playing, even if the original score doesn’t include any.

Make sure all articulations cover a wide key range.
For main articulations, keep the proportion above 10% of the whole dataset.

</aside>
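The 10% guideline above can be checked with a small script once the recordings are annotated. The following is only a sketch: the annotation format (a list of articulation names with durations), the articulation names, and the choice to measure share by duration are all assumptions, since the spec does not say how the proportion is counted.

```python
from collections import defaultdict

# Hypothetical annotation format: (articulation_name, duration_in_seconds).
# Measuring share by duration is an assumption; the spec only says "over 10%".
def articulation_shares(segments):
    """Return each articulation's fraction of the total annotated duration."""
    totals = defaultdict(float)
    for name, seconds in segments:
        totals[name] += seconds
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {name: seconds / grand_total for name, seconds in totals.items()}


def underrepresented(segments, main_articulations, minimum_share=0.10):
    """List main articulations whose share is not above the minimum."""
    shares = articulation_shares(segments)
    return sorted(
        name for name in main_articulations
        if shares.get(name, 0.0) <= minimum_share
    )


# Illustrative data only: 600 s of annotated playing in total.
example_segments = [
    ("legato", 300.0),
    ("staccato", 120.0),
    ("pizzicato", 30.0),
    ("legato", 150.0),
]
print(underrepresented(example_segments, ["legato", "staccato", "pizzicato"]))
# prints ['pizzicato']
```

Here pizzicato covers 30 of 600 seconds (5%), so it is flagged as needing more material.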

<aside>

Length of each track

80% of tracks should be full pieces of music.
20% of tracks should contain several short phrases, each 2 to 8 bars long.
We need 1 hour of recordings in total, but each track should be bounced as a single audio file.

</aside>
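As a rough way to check a session against the track mix above, the sketch below summarizes a list of bounced tracks. The track list format and the `kind` labels are hypothetical, and reading "80%" and "20%" as shares of the track count (rather than of total duration) is an assumption.

```python
# Hypothetical track list: (kind, duration_in_seconds), where kind is
# "full" for a complete piece or "phrases" for a track of short phrases.
# Reading "80%" and "20%" as shares of the track count is an assumption.
def summarize_tracks(tracks):
    """Return (share of full tracks by count, total duration in hours)."""
    if not tracks:
        return 0.0, 0.0
    full_count = sum(1 for kind, _ in tracks if kind == "full")
    total_seconds = sum(seconds for _, seconds in tracks)
    return full_count / len(tracks), total_seconds / 3600.0


# Illustrative data only: four full pieces and one phrase track.
example_tracks = [
    ("full", 900.0),
    ("full", 840.0),
    ("full", 780.0),
    ("full", 720.0),
    ("phrases", 360.0),
]
full_share, hours = summarize_tracks(example_tracks)
print(f"full tracks: {full_share:.0%}, total: {hours:.2f} h")
# prints "full tracks: 80%, total: 1.00 h"
```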

<aside>

You are the performer behind the model

The AI model learns everything from your recordings. It plays as you play, feels as you feel. It is your digital avatar.

</aside>

How to record

Tune your instrument before recording so that it is set to standard pitch (A = 440 Hz).
Follow the click track of each song! A few small timing offsets are OK.
Record dry instrument tracks with no reverb, delay, or backing tracks.
For polyphonic instruments, feel free to play polyphonically.
No background noise or big room reflections.
No obvious bleed of the backing instrumental from your headphones. (Try lowering your headphone volume.)
When joining two clips, use a cross-fade that does not cover any words: cross-fade only over silence, a breath, or a consonant. There is no need to remove breaths.
Leave at least 1 s of silence at the beginning and end of each track.
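The silence requirement at the start and end of each track is easy to verify before delivery. The sketch below works on mono samples already decoded to floats in the range -1.0 to 1.0; the function names and the -60 dBFS silence threshold are assumptions, since the spec does not define how quiet "silence" must be.

```python
# The -60 dBFS threshold is an assumption; the spec only asks for 1 s of silence.
def edge_silence_seconds(samples, sample_rate, threshold=0.001):
    """Return (leading, trailing) silence in seconds for mono float samples.

    A sample counts as silent when its absolute value is below `threshold`
    (0.001 corresponds to -60 dBFS for samples scaled to -1.0..1.0).
    """
    loud = [i for i, value in enumerate(samples) if abs(value) >= threshold]
    if not loud:
        # The whole track is silent, so both edges span the full length.
        total = len(samples) / sample_rate
        return total, total
    leading = loud[0] / sample_rate
    trailing = (len(samples) - 1 - loud[-1]) / sample_rate
    return leading, trailing


def has_required_padding(samples, sample_rate, minimum_seconds=1.0):
    """Check that both edges have at least `minimum_seconds` of silence."""
    leading, trailing = edge_silence_seconds(samples, sample_rate)
    return leading >= minimum_seconds and trailing >= minimum_seconds


# Synthetic example: 1.5 s silence, 1 s of signal, 0.5 s silence at 8 kHz.
rate = 8000
track = [0.0] * 12000 + [0.5] * 8000 + [0.0] * 4000
print(edge_silence_seconds(track, rate))  # prints (1.5, 0.5)
print(has_required_padding(track, rate))  # prints False
```

The synthetic track fails because its tail has only 0.5 s of silence. A real recording will have a noise floor, so the threshold may need adjusting for your setup.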

Samples

Violin: