JACK user documentation

What is JACK?

JACK is a low-latency audio server, written primarily for the GNU/Linux operating system. It can connect a number of different applications to an audio device, as well as allowing them to share audio between themselves. Its clients can run in their own processes (i.e. as normal applications), or they can run within the JACK server (i.e. as a "plugin").

JACK is different from other audio server efforts in that it has been designed from the ground up to be suitable for professional audio work. This means that it focuses on two key areas: synchronous execution of all clients, and low latency operation.

[Diagram: overview of how a JACKed Linux audio system works, using Ardour as an example.]

JACK has two sets of options. The first set is specific to running the JACK server. The second set consists of run-time options for how JACK interfaces with the sound driver, which is currently only ALSA.

The easiest way to start jack is to run this command:

jackd -d alsa -d hw:0

Of course, that gives you very little control over what JACK does to the audio stream and which device you use. You can specify a card name by setting up an .asoundrc file. Visit the online ALSA docs for your card/device to get one.

There are many useful options, which can be listed by typing either of:

jackd -h
jackd -d alsa -h

Example commandlines

Many people with SoundBlaster Live cards find that more appropriate settings are:

jackd -v -d alsa -d (cardnamehere) -p 512

People using RME cards have reported success with:

jackd -v -d alsa -d (cardnamehere) -p 64

A command line for starting at 44100 Hz with verbose output, realtime scheduling, hardware monitoring, and shaped dither enabled:

jackd -v -R -d alsa -d (cardnamehere) -r 44100 -H -z s
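
The command lines above all follow the same pattern: server options first, then "-d alsa", then options for the ALSA driver. As a rough illustration only, the following Python sketch assembles such a command line. The function name build_jackd_command and its parameter names are hypothetical and not part of JACK; the flags it emits are only the ones described in this document.

```python
def build_jackd_command(device, rate=None, period=None, nperiods=None,
                        verbose=False, realtime=False,
                        hw_monitor=False, dither=None):
    """Assemble a jackd argument list (hypothetical helper, not part of JACK)."""
    args = ["jackd"]
    # Server options come before the driver name.
    if verbose:
        args.append("-v")
    if realtime:
        args.append("-R")
    # Everything after "-d alsa" is handed to the ALSA driver.
    args += ["-d", "alsa", "-d", device]
    if rate is not None:
        args += ["-r", str(rate)]
    if period is not None:
        args += ["-p", str(period)]
    if nperiods is not None:
        args += ["-n", str(nperiods)]
    if hw_monitor:
        args.append("-H")
    if dither is not None:
        # r = rectangular, t = triangular, s = shaped, - = none
        if dither not in ("r", "t", "s", "-"):
            raise ValueError("dither must be one of r, t, s or -")
        args += ["-z", dither]
    return args


cmd = build_jackd_command("hw:0", rate=44100, verbose=True, realtime=True,
                          hw_monitor=True, dither="s")
print(" ".join(cmd))
```

Keeping the arguments in a list rather than a single string makes it easy to pass them to a process launcher without worrying about quoting.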

You can use ecasound to generate a pure sine wave tone for testing the sound quality of your device.

ecasound -f:32,1,48000 -i null -o jack_alsa,myport -b:1024 -el:sine_fcac,440,1

There are a few other options which you will find useful.

JACK specific options

The default settings for JACK are to run at 48000 Hz with a period size of 1024 frames and 2 periods per hardware buffer. JACK currently supports two sample formats. JACK's ALSA driver/client tries to use SND_PCM_FORMAT_S32_LE, which is the format used by all current 24 bit audio cards except for some USB interfaces that actually use 24 bits rather than 24-packed-in-32-bits. If the device can't do that, it tries for SND_PCM_FORMAT_S16_LE, which every audio interface should/does support. True 24 bit format wouldn't be a lot of work to support, but it's not trivial either.
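
To make the difference between the sample layouts concrete, here is a small Python sketch that stores the same 24 bit sample in three ways. It is an illustration, not JACK's or ALSA's code. The function names are hypothetical, and the sketch assumes that in the 32 bit layout the 24 significant bits occupy the upper three bytes, with the lowest byte used as padding.

```python
import struct


def pack_24_in_32(sample_24bit):
    """24 bit sample in a 32 bit little-endian word.

    Assumption: significant bits in the upper three bytes, low byte zero.
    """
    return struct.pack("<i", sample_24bit << 8)


def pack_true_24(sample_24bit):
    """24 bit sample in exactly three little-endian bytes."""
    return struct.pack("<i", sample_24bit)[:3]


def pack_16(sample_24bit):
    """Reduce a 24 bit sample to 16 bits by dropping the low 8 bits."""
    return struct.pack("<h", sample_24bit >> 8)


sample = 0x123456
print(pack_24_in_32(sample).hex())  # 4 bytes per sample
print(pack_true_24(sample).hex())   # 3 bytes per sample
print(pack_16(sample).hex())        # 2 bytes per sample
```

The three byte layout saves space but no longer lines up with native 32 bit words, which is one plausible reason why supporting it takes some extra work.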

The buffer size determines the latency between when the sound is received by JACK and when it is sent to the PCM device (the card output). The smaller the buffer size, the closer to realtime the response will be. Many people have found that the default setting is more than adequate for general-purpose use, but when you are recording you should set the buffer size as low as your card/device can handle without causing sound dropouts (xruns). Some people advocate using higher latency for recording to ensure smooth audio. This is a tradeoff between realtime response for monitoring and audio quality. It is recommended that you test your card and system to find out what the best setting is for your setup. 64 frames per interrupt is the lowest currently possible in any PC audio hardware. The frame count should be a power of 2, so double it at each step starting from 64.

For example: 64, 128, 256, 512, 1024, 2048, 4096, 8192....
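
The sequence above can be generated, and a candidate value checked, with a few lines of Python. This is only a sketch; the function names and the upper limit of 8192 are assumptions chosen to match the example list.

```python
def is_power_of_two(frames):
    """True if frames is a positive power of two."""
    return frames > 0 and (frames & (frames - 1)) == 0


def period_sizes(smallest=64, largest=8192):
    """List the period sizes obtained by doubling from smallest up to largest."""
    sizes = []
    frames = smallest
    while frames <= largest:
        sizes.append(frames)
        frames *= 2
    return sizes


print(period_sizes())
print(is_power_of_two(512), is_power_of_two(500))
```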

jackd -v -a -R -P -d

-v means verbose. It will output the actions that JACK is performing to a console. This is very useful for debugging.
-a means to use the inbuilt ASIO support. This can only be enabled on cards that support ASIO. ASIO is a protocol developed by Steinberg, the maker of many audio applications for Microsoft Windows. It allows for much lower latency performance internal to the soundcard/device.
-R means realtime. This allows you to take full advantage of the low latency patches for the Linux kernel. You should enable this if you are doing master recordings or want to ensure the applications will receive the audio stream as quickly as possible.
-P means priority. This complements the -R flag and allows the priority of JACK to be set as high as the maximum available. It is also useful when you need low latency.
-d means driver. This sets the sound driver which JACK interfaces with. Currently "alsa" is the only option.

Driver specific options

jackd -d alsa -d -r -p -n -H -C -D -P -z

Currently jack only has support for alsa as a sound driver. In the future there may be more driver options although it is not very likely.

-d means device. This allows you to specify a device other than hw:0.
-r means sample rate. Use this to set the number of samples per second at which the audio is streamed. 44100 Hz is CD quality, 48000 Hz (the default) is DAT quality, and anything between 44100 Hz and 192000 Hz is DVD quality. The higher the sample rate, the more audio data you capture per second and therefore the more space you use on your HDD. For many people CD quality is fine. The debate rages as to whether sample rates higher than 44100 Hz provide better sound quality or not. Currently it is at a standoff until someone conducts conclusive double blind tests in the tradition of Pepsi vs Coke.

Many people only work at 44100 Hz because resampling down from a higher sample rate is known to degrade the audio quality when compared to recording at 44100 Hz originally. It is also highly likely that sample libraries you may want to use are only available at 44100 Hz. That said, most people agree that acoustic recordings do generally sound better when recorded at higher sample rates. Unfortunately CDs are not going to disappear soon and DVD-RWs remain expensive, so if you want to distribute your recordings it is more than likely that they will be shipped at 44100 Hz.
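
The effect of the sample rate on disk usage is simple arithmetic: bytes per second = sample rate x channels x bytes per sample. The Python sketch below works this out per minute of uncompressed audio. The function name is hypothetical, and the defaults of 2 channels and 2 bytes (16 bits) per sample are assumptions chosen for illustration; adjust them for your own recording format.

```python
def bytes_per_minute(sample_rate, channels=2, bytes_per_sample=2):
    """Uncompressed audio size for one minute of recording."""
    return sample_rate * channels * bytes_per_sample * 60


for rate in (44100, 48000, 96000, 192000):
    mib = bytes_per_minute(rate) / (1024 * 1024)
    print(f"{rate} Hz: {mib:.1f} MiB per minute")
```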

-p means frames per period. This is the buffer size that JACK streams audio with. See above for an explanation of what this means.
-n means periods per hardware buffer. This sets the number of periods that make up the hardware buffer ALSA uses for your device. Most cards use two periods but some use 3, 4 or even 8 or 16 (delta 10/10).

What is the exact purpose of the p and n parameters?

There are several kinds of latency:

p affects input latency: how long from when a piece of data arrives at the audio interface connectors until user space software can use it?

p*n affects output latency: how long from when a piece of data is delivered by user space software until it leaves the audio interface connectors?

Roundtrip latency is the combination of these two.

Conventional low latency systems (e.g. ASIO) use n=2 all the time. ALSA is rather unusual in allowing other values.
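
To put numbers on this, the Python sketch below converts p and n into milliseconds for a given sample rate r. It is a rough estimate under stated assumptions, not a measurement: it takes the input latency to be one period (p / r), the output latency to be the full hardware buffer (p * n / r), and the roundtrip to be their sum, and it ignores any extra delay added by converters or the bus. The function names are hypothetical.

```python
def period_ms(frames_per_period, sample_rate):
    """Duration of one period in milliseconds."""
    return 1000.0 * frames_per_period / sample_rate


def latency_ms(p, n, r):
    """Estimate (input, output, roundtrip) latency in milliseconds.

    Assumes input latency = one period and output latency = n periods.
    """
    input_ms = period_ms(p, r)
    output_ms = n * input_ms
    return input_ms, output_ms, input_ms + output_ms


# Defaults named in this document: 1024 frames, 2 periods, 48000 Hz.
for p in (64, 256, 1024):
    i, o, total = latency_ms(p, 2, 48000)
    print(f"p={p}: in {i:.2f} ms, out {o:.2f} ms, roundtrip {total:.2f} ms")
```

Under these assumptions, halving p halves every figure, while raising n only lengthens the output side.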

-H means Hardware monitoring. This is only available with cards/devices that support this feature. Usually cards that support ASIO will support hardware monitoring. It allows you to hear the audio stream flowing through the pcm in/outs at that very moment. This is very good for hearing what you are recording as you are recording it.
-C means capture only. This opens the ALSA driver in read only mode which is useful for people who only want to record audio and don't have a need to hear what they are recording.
-D means duplex. This opens the ALSA driver in read/write mode which means that you can play and record at the same time. Most people will only want to use this which is the default mode anyway.
-P means playback only. This opens the ALSA driver in write only mode which is useful for people who have no inputs or only want to play audio not record. It can also reduce latency.

-z means dither. There are currently four options to the dither flag.

-z r means rectangular dither.

-z t means triangular dither.

-z s means shaped dither.

-z - means no dither (the default).

Dither is used to make the audio cleaner. The best way to describe it is to imagine a painting with many dots. If you view it up close you can see each dot and the image is not very clear. If you view it from far away the image becomes clearer because your eyes/brain dither the dots to smooth out the image. It is a murky subject and obviously a very personal choice as to what dither is the best. For most people it is just plain magic. Anyone running at 16 bit who cares about quality or has CPU cycles to spare should run with dither. Triangular is probably the best compromise of quality vs CPU cost (it's very fast), but shaped gives the best quality.
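
For readers who want to see the idea in code, here is a small Python sketch of rectangular and triangular dither applied while reducing a sample to 16 bits. It follows a common textbook formulation and is not JACK's implementation. The function name, the noise widths, and the scaling by 32767 are assumptions made for illustration. Shaped dither is left out because it also needs an error-feedback filter.

```python
import random


def quantize_16bit(sample, dither="-", rng=random):
    """Convert a float sample in [-1.0, 1.0] to a 16 bit integer.

    dither: "r" rectangular, "t" triangular, "-" none.
    """
    scaled = sample * 32767.0
    if dither == "r":
        # Rectangular: uniform noise spanning one quantization step.
        scaled += rng.random() - 0.5
    elif dither == "t":
        # Triangular: difference of two uniform values, spanning two steps.
        scaled += rng.random() - rng.random()
    elif dither != "-":
        raise ValueError("dither must be r, t or -")
    value = int(round(scaled))
    return max(-32768, min(32767, value))


# A signal smaller than one 16 bit step: lost without dither,
# preserved on average with dither.
rng = random.Random(0)
quiet = 0.3 / 32767.0
plain = [quantize_16bit(quiet) for _ in range(20000)]
dithered = [quantize_16bit(quiet, "t", rng) for _ in range(20000)]
print(sum(plain) / len(plain))
print(sum(dithered) / len(dithered))
```

Without dither every output value is 0, so the quiet signal disappears. With triangular dither the individual values are noisy, but their average stays close to the original 0.3 of a step.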

Document prepared by Patrick Shirkey <pshirkey_at_boosthardware.com>
Thanks to everyone who contributes, wittingly or not...