$ pv
pv: the classic
pv [flags] < floatsams > floatsams
N: fft length [1024]
R: sampling rate [44100]
M: window size in samples [2048]
D: decimation factor in samples [256]
I: interpolation factor in samples [256]
P: pitch factor [0.]
t: threshold generator [.005]
s: synthesize analysis input
a: time alteration factor
v: verbose report
The expression
pv [flags] < floatsams > floatsams
is a throwback to the CARL years. The term
floatsams is short for floating point samples, and the < > signs
indicate that the program in question (pv) receives a stream of floats
on its standard input and sends a stream of floats to its standard
output.
A few of these parameters appear regularly in POWERpv programs; we will briefly cover them here.
"N" is the FFT size in samples. Larger FFT sizes give better frequency resolution and worse time resolution.
"R" is the sampling rate of the sound to be analyzed.
"M" is the window size, which often works best as twice the FFT size. The window size must be at least as large as the FFT size, and must be related to it by a power of 2.
"D" is the decimation rate. This is the number of samples shifted into
the analysis window for each analysis/synthesis frame. In general the
lower this value, the higher the quality of synthesis (and the longer
the computation time). A good rule of thumb is that the decimation
rate should be no higher than M/8 for high quality
resynthesis. However, D could theoretically be as high as M, giving
poor-quality but perhaps sonically interesting resynthesis. Incidentally,
D need not be a power of 2.
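To make the analysis parameters concrete, here is a minimal Python sketch that derives a few quantities from N, R, M and D, using the defaults shown in the usage message. The function name and the dictionary keys are illustrative and are not part of pv. It only restates the arithmetic implied above: bin spacing R/N, window and hop durations, the overlap factor M/D, and the D <= M/8 rule of thumb.

```python
def analysis_summary(n_fft=1024, sample_rate=44100, window=2048, decimation=256):
    """Derive a few quantities from the pv analysis parameters N, R, M and D.

    Defaults mirror the bracketed values in the pv usage message.
    """
    return {
        "bin_spacing_hz": sample_rate / n_fft,        # frequency resolution, R/N
        "window_ms": 1000.0 * window / sample_rate,   # time spanned by the window, M/R
        "hop_ms": 1000.0 * decimation / sample_rate,  # time advanced per frame, D/R
        "overlap": window / decimation,               # overlap factor, M/D
        "meets_m_over_8": decimation * 8 <= window,   # rule of thumb: D <= M/8
    }


for name, value in analysis_summary().items():
    print(f"{name}: {value}")
```

With the defaults, the bin spacing is about 43 Hz and D sits exactly at the M/8 limit. Doubling N halves the bin spacing but doubles the stretch of time each FFT covers, which is the trade-off described for N.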
"I" is the interpolation rate. This is the number of samples shifted out for each analysis/synthesis frame. When I differs from D, the duration of the output sample relative to input will be altered by the ratio I/D.
"P" is a pitch-scaling factor. A factor of 2 results in transposition
up one octave. P of 0.5 puts the sound down one octave. The P flag
also determines the kind of synthesis used. If P is zero, the sound
will be resynthesized with an inverse FFT. Otherwise, the resynthesis
is done by oscillator bank, which can be computationally more
intensive, but can also provide better resynthesis. Thus it can make
sense to use P = 1.0, which leaves the pitch unchanged but selects the
oscillator bank. The P parameter is independent of D and I: it is
possible to time-stretch a sound while also raising its pitch.
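Pitch factors are frequency ratios, so it can help to convert them to and from musical intervals. The following Python sketch assumes twelve-tone equal temperament. The helper names are hypothetical and are not part of pv.

```python
import math


def factor_to_semitones(p):
    """Convert a pitch factor (frequency ratio) to equal-tempered semitones."""
    return 12.0 * math.log2(p)


def semitones_to_factor(semitones):
    """Convert a shift in equal-tempered semitones to a pitch factor for -P."""
    return 2.0 ** (semitones / 12.0)


for p in (2.0, 0.5, 0.8):
    print(f"P={p}: {factor_to_semitones(p):+.2f} semitones")

# A shift of 7 semitones (a fifth up) as a pitch factor.
print(f"+7 semitones: P={semitones_to_factor(7):.4f}")
```

A factor of 0.8 is the ratio 4/5, a just major third down, which is a little under four equal-tempered semitones.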
"t" is used to speed up processing when using oscillator bank
resynthesis. The value for t provides a varying gate threshold
relative to the maximum bin amplitude in a given FFT
frame. Oscillators are turned off for any bin that has an energy level
below this threshold. When t is set to zero, all oscillators are on,
all the time, which gives the best quality synthesis, at higher
processing cost. High values of t can create interesting gating
effects. A value greater than 1.0 is not likely to be useful.
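The gating rule can be sketched in a few lines of Python. This is an illustration of the description above, not pv's actual code: the function name is hypothetical, and whether pv compares amplitudes or energies, and how it treats a bin exactly at the threshold, are assumptions.

```python
def gate_bins(amplitudes, t):
    """Return one boolean per bin: True if that bin's oscillator stays on.

    The threshold is t times the largest bin amplitude in the frame, so it
    varies from frame to frame. Comparing amplitudes (rather than energies)
    and keeping bins exactly at the threshold are assumptions.
    """
    threshold = t * max(amplitudes)
    return [a >= threshold for a in amplitudes]


frame = [1.0, 0.5, 0.004, 0.0]   # made-up bin amplitudes for one frame
print(gate_bins(frame, 0.005))   # the two weakest bins fall below the gate
print(gate_bins(frame, 0.0))     # t = 0: every oscillator stays on
print(gate_bins(frame, 1.5))     # t > 1: nothing reaches the threshold
```

The last call shows why values above 1.0 are unlikely to be useful: the threshold then exceeds the loudest bin in every frame, so every oscillator is switched off.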
The "s" flag indicates that input to pv will be analysis frames
rather than samples. A utility called pvanal will later be introduced
to create stored files of FFT analysis data. It is up to you to make
sure that the same basic parameters used for analysis are also used
for synthesis. If you analyze with N = 1024, and resynthesize with N =
512, the results may be ... interesting (or not).
The "a" flag, specific to pv, is a convenience feature for specifying
the time alteration factor directly, rather than by calculating what D
and I must be. Values larger than 1.0 indicate time-stretch; values
less than 1.0 indicate time-shrink.
The "v" flag reports basic parameters of synthesis and possibly other
information, depending on the program. This could be useful here if
you use the "a" flag and wish to know the precise values of D and I
selected by pv.
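How pv turns the "a" value into D and I is not spelled out here. One plausible scheme, shown in the Python sketch below purely as an assumption, keeps D at its default of 256 and rounds I to the nearest whole sample. The function names are hypothetical. The point is that I must be an integer, so the ratio I/D actually achieved can differ slightly from the requested factor.

```python
def hops_from_alteration(a, decimation=256):
    """Guess (D, I) for a time alteration factor a.

    Assumption: D is held fixed and I is a*D rounded to the nearest
    integer (at least 1). pv itself may choose differently.
    """
    interpolation = max(1, round(a * decimation))
    return decimation, interpolation


def effective_alteration(decimation, interpolation):
    """Time alteration actually achieved, the ratio I/D."""
    return interpolation / decimation


for a in (0.5, 2.0, 0.3):
    d, i = hops_from_alteration(a)
    print(f"a={a}: D={d}, I={i}, effective I/D={effective_alteration(d, i):.4f}")
```

Under this assumption a request of 0.3 yields I = 77 and an effective factor slightly above 0.3, which is the kind of detail the verbose report would reveal.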
We conclude with an example.
fromsf -c1 my_snd.aif | pv -a.5 -P0.8 -t.001 -v | tosf -c1 my_pv_snd.aif
This run will lower the pitch by 20% (down a major third) while also
cutting the duration in half, so the result plays twice as fast as the
original. The verbose report will tell you that N is set to 1024, M is
set to 2048, D is set to 256 and I is set to 128.