Basic Parameters for POWERpv

We will continue with the POWERpv program pv for the moment. We have already seen the "-a" flag in action. What other parameters are available? We can find out by simply typing the program name.

$ pv
pv: the classic
pv [flags] < floatsams > floatsams
     N: fft length [1024]
     R: sampling rate [44100]
     M: window size in samples [2048]
     D: decimation factor in samples [256]
     I: interpolation factor in samples [256]
     P: pitch factor [0.]
     t: threshold generator [.005]
     s: synthesize analysis input
     a: time alteration factor
     v: verbose report

The expression

pv [flags] < floatsams > floatsams

is a throwback to the CARL years. The term floatsams is short for floating point samples, and the < > signs indicate that the program in question (pv) receives a stream of floats on its standard input and sends a stream of floats to its standard output.
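
To make the stdin/stdout convention concrete, here is a minimal sketch of a filter that could sit in such a pipeline. It assumes that floatsams are raw 32-bit floats in native byte order, which this section does not spell out, and the names `scale_floatsams` and `filter_stream` are hypothetical, not part of POWERpv.

```python
import struct
import sys

FLOAT_SIZE = 4  # assumption: one sample = one 32-bit float, native byte order


def scale_floatsams(raw: bytes, gain: float) -> bytes:
    """Multiply every sample in a raw float buffer by gain."""
    count = len(raw) // FLOAT_SIZE  # ignore any trailing partial sample
    samples = struct.unpack(f"={count}f", raw[: count * FLOAT_SIZE])
    return struct.pack(f"={count}f", *(s * gain for s in samples))


def filter_stream(inp, out, gain: float) -> None:
    """Read floats from a binary stream, scale them, write them back out."""
    out.write(scale_floatsams(inp.read(), gain))


# In a pipeline this would be called as:
#     filter_stream(sys.stdin.buffer, sys.stdout.buffer, 0.5)
```

A script built this way would be placed between fromsf and tosf just as pv is, reading floats on standard input and writing floats to standard output.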

A few of these parameters appear regularly in POWERpv programs; we will briefly cover them here.

  • "N" is the FFT size in samples. Larger FFT sizes give better frequency resolution and worse time resolution.

  • "R" is the sampling rate of the sound to be analyzed.

  • "M" is the window size, which often works best as twice the FFT size. The window size must be at least as large as the FFT size, and must be related to it by a power of 2.

  • "D" is the decimation rate. This is the number of samples shifted into the analysis window for each analysis/synthesis frame. In general, the lower this value, the higher the quality of synthesis (and the longer the computation time). A good rule of thumb is that the decimation rate should be no higher than M/8 for high quality resynthesis. In theory D could be as high as M, which gives poor quality but perhaps sonically interesting resynthesis. Incidentally, D need not be a power of 2.

  • "I" is the interpolation rate. This is the number of samples shifted out for each analysis/synthesis frame. When I differs from D, the duration of the output sound relative to the input is altered by the ratio I/D.

  • "P" is a pitch-scaling factor. A factor of 2 transposes the sound up one octave; a factor of 0.5 transposes it down one octave. The P flag also determines the kind of synthesis used. If P is zero, the sound will be resynthesized with an inverse FFT. Otherwise, the resynthesis is done by oscillator bank, which can be computationally more intensive, but can also provide better resynthesis. Thus it can make sense to use P = 1.0. The P parameter is independent of D and I, so it is possible to time-stretch a sound while also raising its pitch.

  • "t" is used to speed up processing when using oscillator bank resynthesis. The value for t provides a varying gate threshold relative to the maximum bin amplitude in a given FFT frame. Oscillators are turned off for any bin that has an energy level below this threshold. When t is set to zero, all oscillators are on, all the time, which gives the best quality synthesis, at higher processing cost. High values of t can create interesting gating effects. A value greater than 1.0 is not likely to be useful.

  • The "s" flag indicates that input to pv will be analysis frames rather than samples. A utility called pvanal will later be introduced to create stored files of FFT analysis data. It is up to you to make sure that the same basic parameters used for analysis are also used for synthesis. If you analyze with N = 1024, and resynthesize with N = 512, the results may be ... interesting (or not).

  • The "a" flag, specific to pv, is a convenience feature for specifying the time alteration factor directly, rather than by calculating what D and I must be. Values larger than 1.0 indicate time-stretch; values less than 1.0 indicate time-shrink.

  • The "v" flag reports basic parameters of synthesis and possibly other information, depending on the program. This could be useful here if you use the "a" flag and wish to know the precise values of D and I selected by pv.
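
The relationships among these parameters can be checked with a little arithmetic. The sketch below is only an illustration: the function names are hypothetical, and `gate_bins` is a guess at how a relative threshold could be applied to a frame of bin amplitudes, not pv's actual code.

```python
import math


def bin_spacing_hz(fft_size: int, sample_rate: float) -> float:
    """Spacing between adjacent FFT bins, R / N."""
    return sample_rate / fft_size


def time_scale(decimation: int, interpolation: int) -> float:
    """Output duration relative to input duration, I / D."""
    return interpolation / decimation


def pitch_in_semitones(pitch_factor: float) -> float:
    """Equal-tempered semitones for a frequency ratio P (P > 0)."""
    return 12.0 * math.log2(pitch_factor)


def meets_quality_rule(window_size: int, decimation: int) -> bool:
    """Rule of thumb from the text: D no higher than M / 8."""
    return decimation * 8 <= window_size


def gate_bins(amplitudes, threshold):
    """Zero every bin whose amplitude is below threshold * frame maximum."""
    peak = max(amplitudes)
    return [a if a >= threshold * peak else 0.0 for a in amplitudes]


# With the defaults N = 1024, R = 44100, M = 2048, D = 256:
print(bin_spacing_hz(1024, 44100))    # 43.06640625 Hz between bins
print(meets_quality_rule(2048, 256))  # True, since 256 is exactly M / 8
print(pitch_in_semitones(2.0))        # 12.0, one octave up
```

Doubling N to 2048 halves the bin spacing, which is the "better frequency resolution" mentioned for N; the cost is that each frame then covers twice as many samples in time.
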

We conclude with an example.

    fromsf -c1 my_snd.aif | pv -a.5 -P0.8 -t.001 -v | tosf -c1 my_pv_snd.aif

    This run will lower the pitch by 20% (down a major third) while also cutting the duration in half, so the result plays twice as fast as the original. The verbose report will tell you that N is set to 1024, M is set to 2048, D is set to 256 and I is set to 128.
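
The numbers in this run can be reproduced by hand. The sketch below assumes that pv keeps D at its default of 256 and derives I as a × D; that rule is inferred from the reported values rather than taken from the pv source, and `interpolation_for` is a hypothetical helper.

```python
import math

D_DEFAULT = 256  # default decimation factor from the usage listing


def interpolation_for(alteration: float, decimation: int = D_DEFAULT) -> int:
    # Assumption: D stays fixed and I = a * D, rounded to a whole sample count.
    return round(alteration * decimation)


a, P = 0.5, 0.8
I = interpolation_for(a)
semitones = 12.0 * math.log2(P)

print(f"D={D_DEFAULT} I={I} duration x{I / D_DEFAULT} pitch {semitones:.2f} semitones")
# prints: D=256 I=128 duration x0.5 pitch -3.86 semitones
```

A factor of 0.8 is the ratio 4/5, a just-intonation major third down, which is why the result lands slightly short of the four semitones of an equal-tempered major third.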
