Hawkesbury Radio Astronomy Observatory

Vela Pulsar Observations

Observing PSR B0833-45 with a Small Aperture Antenna

Technical Notes

Details of various technical activities in support of the observation activities as well as details of techniques used in processing the data.

RFI Survey

The presence of radio frequency interference (RFI) can mask the weak pulsar signals.  In-band RFI (within the observation bandwidth) can blot out the pulsar signals completely, especially pulsed RFI. Out-of-band RFI can also reduce the sensitivity if it is so strong that it passes through early-stage filtering and causes intermodulation products by overloading the front-end low noise amplifier and the following amplifier chain.  The frequency and strength of RFI signals determines the best frequency of observation.

Although the location of this observatory (HawkRAO) is semi-rural, the site is elevated on a 70m-high escarpment which has a good take-off to the city of Sydney (pop. 5 million) some 60 kms away.  While there are no close transmitters, the take-off to the city (and the nearby airforce base) provides a wide range of moderate-level RFI signals as is shown in the spectrum scan of 436 Mhz ±50 MHz to the right.

In the averaged spectrum scan (click to view larger graphic) the 20 MHz passband hump of the first LNA can be seen.  There are many carriers within the LNA passband as well as quite a few outside the passband.  There appears to be a clear gap centered on 436 MHz into which the 2.4 MHz passband of the RTL-SDR dongle receiver could fit.

Zooming in and viewing 436 MHz ±5 MHz (in peak hold mode to capture intermittent carriers) shows that in the range 436 MHz ±1.2 MHz (the nominal bandwidth of the RTL-SDR dongle receiver running at 2.4 Msps) there is a small number of peaks in the observation bandwidth (shaded green) over a time period of 30 minutes.  So while, not completely clear of RFI, this frequency (436 MHz) is relatively clear and so is chosen as the observation frequency.  If I lived in a rural location then the frequency chosen would likely be completely clear and I also would have a wider choice.

Digital AGC (RTL-SDR Receiver)

The RTL-SDR dongle drivers have the option to enable 'digital AGC'.  There has been a lot of discussion on whether this should be enabled or not for radio astronomy observations.  I have done tests using a simulated low SNR pulsar signal and have found no loss in sensitivity when digital AGC is enabled, despite warnings expressed elsewhere to the contrary.  Also I did one real data run (13th May) with 'digital AGC' disabled and noted there was no difference. However, care must still be exercised in setting the analog gain to ensure enough bits are exercised in the A−D converter.  The technique I use is to disable the digital AGC and set the gain to give a recording level which is roughly 1 to 2 dB below maximum.  Then when the digital AGC is enabled it lifts this up to the full range.  The time constant of the digital AGC is much longer than the on-pulse duration of any pulsar and so does not reduce the amplitude of the pulsar signal.   There is also the benefit that RFI noise spikes are clipped and small variations in the gain in the receiver chain are compensated for.

RTL-SDR Sampling Clock and Cooling Modifications

A high accuracy sampling clock is a critical requirement for sensitive detection of weak pulsar signals.  The sensitivity of the epoch-folding process is dependent on the sampling frequency being stable over the time of observation (sometimes up to 10 hours).

The actual accuracy need only be within a few ppm (this can be compensated for in the folding software), but the stability should be better than ±0.1 ppm over the observation time.  A good 0.5 ppm TCXO is sufficient - as long as it is located in a relatively stable temperature environment.

Test done years ago show that cooling the RTL2832U chip on the RTL-SDR dongle is essential for stable data sampling at the higher sample rates. To achieve this a piece of tinplate (from the lid of a salmon tin) fin has been soldered to the pad below the RTL2832U chip onto which a small 5V fan blows air.

The small PCB to the upper right holds a 9.6 MHz 0.5 ppm TCXO, followed by a frequency tripler to provide the 28.8 MHz dongle clock. It also holds the 5 V regulator for the dongle power supply.

Testing the stability and calibrating the accuracy of the sampling process is critical to success in detecting pulsars as well as providing the basis for solid verification of results.  In the case at HawkRAO, the sampling clock stability and accuracy have been tested against a Rubidium Frequency Source (RFS).

The stability of the sampling process is such that at 2.4 Msps the shape of a 2 ms wide 10 Hz simulated pulse derived from a Rubidium Frequency Source (RFS) is preserved over a 4 hours data run as shown below.

To prevent RFI coming down the USB cable from the PC, the dongle power is supplied through a separate 5 V regulator, rather than from the PC itself.

Modified 'RTL-SDR.EXE' Console Application

The original 'rtl-sdr.exe' console application was coded with 32-bit variables.  This placed a 4 Gbyte limit on the size of the output file.  This, in turn, placed a limit of about 15 minutes for the length of an observation run @ 2.4 Msps.  As observation runs for small aperture antennas are typically over 1 hour in duration, and possibly up to 12 hours, the code had to be modified to use 64-bit variables and re-compiled.  At the same time the 'Digital AGC' option was added (removed in some versions of the console application).

For reference, the recording time of 120 minutes @ 2.4 Msps used for observing Vela (B0833-45) for produces a daily IQ binary file about 35 Gb in size. Approximately 4 weeks observation fills up a 1 TByte drive.

Conversion of RTLSDR IQ Data to FILTERBANK Format

The conversion of the data to filterbank format is done for two main reasons, the first being that it allows de-dispersion to be done for different values of DM (i.e., a search in DM space for the best SNR), and the second being is that the FILTERBANK format is suitable for feeding to the SIGPROC and PRESTO application suites for producing a result familiar to professional astronomers.

A convenient side-effect is that the FILTERBANK file size is 1/10th the size of the raw IQ data file size (~3.5 Gb), allowing storage of years of data at a reasonable cost (< AUD$100/year).  The 32-channel FILTERBANK data contains all the information needed to document the daily observations. It allows frequency-based RFI excision, etc, to be done retrospectively, if so desired, in the future.

Filterbank Format

From the SIGPROC description document detailing the FILTERBANK conversion program (http://sigproc.sourceforge.net/sigproc.pdf)...

The filterbank program reads in the raw data files produced by the machine, dealing with the header information contained in the files and the channel ordering of the samples. The filterbank program outputs the data in the following way:
HEADER_START stream_of_header_parameters HEADER_END stream_of_data_values
The HEADER_START and HEADER_END literal character strings signal the start and finish of a stream of header parameters that describe the data. The default is to include these at the beginning of the data file.
The header variables have been restricted to key parameters for ease of use. Currently these are:
• telescope id (int): 0=fake data; 1=Arecibo; 2=Ooty... others to be added
• machine id (int): 0=FAKE; 1=PSPM; 2=WAPP; 3=OOTY... others to be added
• data type (int): 1=filterbank; 2=time series... others to be added
• rawdatafile (char []): the name of the original data file
• source name (char []): the name of the source being observed by the telescope
• barycentric (int): equals 1 if data are barycentric or 0 otherwise
• pulsarcentric (int): equals 1 if data are pulsarcentric or 0 otherwise
• az start (double): telescope azimuth at start of scan (degrees)
• za start (double): telescope zenith angle at start of scan (degrees)
• src raj (double): right ascension (J2000) of source (hhmmss.s)
• src dej (double): declination (J2000) of source (ddmmss.s)
• tstart (double): time stamp (MJD) of first sample
• tsamp (double): time interval between samples (s)
• nbits (int): number of bits per time sample
• nsamples (int): number of time samples in the data file (rarely used any more)
• fch1 (double): centre frequency (MHz) of first filterbank channel
• foff (double): filterbank channel bandwidth (MHz)
• nchans (int): number of filterbank channels
• nifs (int): number of separate IF channels
• refdm (double): reference dispersion measure (cm−3 pc)
• period (double): folding period (s)
A given header stream will contain most, but not necessarily all, of the above variables.

In the case of the HawkRAO data, only required data will be included in the header stream.

Replicating the Action of the SIGPROC Filterbank Program

The task of the FILTERBANK conversion code used at HawkRAO is to replicate the action of the SIGPROC filterbank program - producing a FILTERBANK format data file indistinguishable from that produced by the SIGPROC program.  This will allow further processing to be done by SIGPROC and PRESTO applications if needed.

The 'telescope id' identifies the source observatory of the data allowing the filterbank program to read the specific header information and data format for that particular observatory contained natively in the raw data stream from that observatory's hardware.  That is, the raw data stream from professional observatories already has a header prepending the raw data - the SIGPROC filterbank program needs only to input that header information and re-format it to the FILTERBANK header format.  In the case of the raw IQ data from the RTLSDR dongle, there is no header prepending the data - so the header data must be added 'artificially'.

Generating the Filterbank Header

In the case of the professional observatories, the required header information is 'married' to the observation data by the hardware settings being read and included in the raw data header.  The format of this raw data header is specific to each observatory. Example recognised machines are the Wide Band Arecibo Pulsar Processor (WAPP), the Penn State Pulsar Machine (PSPM), the Arecibo Observatory Fourier Transform Machine (AOFTM), the Berkeley Pulsar Processors (BPP), the Parkes/Jodrell 1-bit filterbanks (SCAMP) and the filterbank at the Ooty radio telescope (OOTY). Others have been added(?) since the documentation was written.  The SIGPROC filterbank program reads the data from these machines and searches for a specific observatory identification string or data bit pattern and, thus identified, includes a telescope ID and machine ID in the filterbank header.

For HawkRAO filterbank files, there needs to be a method of 'marrying' the artificially provided observatory header information to the observation data.  This is done by providing an external 'observation parameters' with a filename matching the observation data filename (except for the start MJD value - which of course is different for each observation).  For example a raw data IQ file might be acquired and assigned this filename...


...which contains the start MJD (57939.1104), observation frequency (436 MHz), target pulsar (B088-45), sample rate (2.4 Msps) and configuration ID (00).  At the beginning of the raw IQ -> filterbank format conversion a file named...


...is checked to exist and, if so, its contents are used to generate the filterbank file header.  An example observation parameter file (e.g., the file given above) contains this information in ASCII...

Source Name,B0833-45
Source RA (hhmmss.s),083520.6
Source DEC (ddmmss.s),-451034.8
Reference DM,67.99
Pulsar Period,0.089399
Number Of Channels,32
Highest Observation Frequency (MHz),437.1625
Channel Bandwidth (MHz),0.075
Channel Spacing (MHz),0.075
Observation Sampling Period (us),0.4166666666
Telescope ID,15
Machine ID,20
Clock Correction (ppm),1.301
Data Type,1

...which is parsed to fill the required fields in the filterbank header.  Note that this information is specific to HawkRAO B0833-45 pulsar observations and RTLSDR dongle frequency and sample rate settings.  The telescope ID = 15 is the value assigned to the HawkRAO observatory, while the machine ID = 20 refers to the RTLSDR dongle hardware.

The 'Observation Sampling Period' is expressed in us and is (sampling rate)-1 = (2.4 x 106)-1 = 0.4166666666 us.

The 'Pulsar Period' is a nominal value which is not used in calculations as the predicted period for the time of the observation is calculated and used instead.

The number of channels needs to be tailored to the combination of DM value for the pulsar and the 50% width (PW50) of the expected pulse profile.  The number of channels sets the delay resolution available during the de-dispersion process (described later).  For Vela DM = 67.99 the delay over the 2.4 MHz observation bandwidth (for the IQ quadrature sampling system of the RTLSDR dongle, the bandwidth = the sampling frequency) is ~16 ms.  The PW50 quoted for Vela is 2.1 ms, although due to scattering (a slightly different mechanism to dispersion) at 436 MHz it is likely to slightly wider - but using 2.1 ms is a conservative value.  In order for the dispersion delay resolution to be ~0.5 ms (fine enough for PW50 = 2 ms) the bandwidth needs to be split into 16 / 0.5 = 32 channels.  For other pulsar PW50 widths, DM values, and/or observation frequencies and bandwidths, a different number of channels will be optimum.  Note that the number of channels for the process used at HawkRAO needs to be a power of 2 - why will be described shortly.

Having determined the number of channels required, the values for 'Highest Observation Frequency' (437.1625 MHz), 'Channel Bandwidth' (0.075 MHz), and 'Channel Spacing' (0.075 MHz) need to be calculated by the following method...

In some system configurations it may be possible to space channels with a gap between them and so the channel spacing could be different to the channel bandwidth, but in the case of the HawkRAO data the channels are not physically different hardware data streams, but artificially produced in software.  In the HawkRAO case the channel spacing is equal to the channel bandwidth as the channels sit shoulder-to-shoulder across the observation bandwidth.

The channelised filterbank format is generated by taking the complex FFT of the IQ raw data.  Fourier analysis converts the time domain information into the frequency domain.  Fourier conversion of n-samples of time domain data outputs the total spectral power split into n-frequency bins (channels).  The bandwidth of each frequency bin is given by...

ChannelBW = (sampling period x number of samples)-1 = (0.4166666666 x 10-6 x 32)-1 = 0.075 MHz

The highest observation frequency is calculated by...

FrequencyHOF = RTLSDR tuned frequency + ((number of channels -1) / 2) x channel bandwidth)
= 436 + (15.5 x 0.075) = 437.1625 MHz.  Note that it is unclear in the documentation, unless I have missed it, whether the highest frequency refers to the top edge or the middle of the highest frequency channel.  I have taken it to be the latter.  In any case for the processing of Vela data in the present configuration (observation frequency and bandwidth) the difference is not significant.

Note that a clock correction value is included in the observation parameters file (e.g. 1.301 ppm).  This is not passed on to the filterbank header directly, but is used to pass a corrected sampling period to the header to account for errors in the TCXO-controlled sampling clock.  This means subsequent HawkRAO data processing does not have to account for the clock error.  in any case, the filterbank format header has no provision for passing a clock correction value as the sampling clock is assumed to be locked to some atomic standard and so the correction has to be applied to the passed sampling period.  Currently the sampling clock is not locked to an atomic source at HawkRAO, but in the future a Rubidium clock source will be implemented, in which case the clock correction would be 0.000 ppm.

Channelising the Data

Having generated and written out the header, the task of converting the data to 32 channels is performed.  This is done by the following process...

Note that the file size doesn't follow the data rate reduction directly as the format and precision of the data changes at each step.  That is, the original data file size for 120 minutes of observation time (which has two bytes/sample) is...

File SizeORIG = 120 x 60 x 2.4 x 106 x 2 = 3.456 x 1010 bytes = 32.19 Gb

..which is converted into filterbank format...

File SizeFB = (((File SizeORIG / number of samples in FFT block / number of bytes per IQ sample) * number of channels ) / downsample rate) * number of bytes in floating point value) = (((3.456 x 1010 / 32 / 2) x 32) / 20) x 4 = 3.456 x 109 = 3.19 Gb.  So while the data rate has been reduced by a factor of 640, the file size has been reduced by a factor of 10.

At this stage the filterbank format file can be input to PRESTO's 'prepfold' application to produce an output file of this format...

Click image for a larger view...

...which is produced from an early observation - 5th May 2017.

Note that PRESTO (and SIGPROC) applications need to be run under Linux.  I have set aside a small notebook running Lubuntu for this purpose.  Daily results are not fed to PRESTO and SIGPROC applications, but occasional runs are done for comparison with results from HawkRAO software processed data.

The next stage in processing involves de-dispersion and conversion into the time series format (TIM).  That process is described next.

De-dispersion of FILTERBANK Format Data to Time Series Format

Dispersion effects cause the pulse signal from the pulsar to arrive in the higher frequency channels before the lower frequency channels as the graphic (source unknown) for the Vela pulsar @ 435 MHz on the right shows.

The horizontal green lines show a bandwidth of 2.4 MHz centred on 435 MHz.  The graphic shows the pulse in the high end of the bandwidth arrives ~16 ms before the pulse in the low end (indicated by the vertical green lines).

If the channel data was summed as is, the resultant pulse profile would be as shown by the thin green trace.  Obviously the pulse shape has been smeared, but, more critically, the peak has been reduced, thereby significantly reducing the SNR.  In order to restore the pulse shape and SNR, the higher frequency channels data needs to be held for 0-16 ms to allow the lowest frequency data to arrive so that all the channels can be lined up to the same arrival time.

Programmatically this is done by constructing a floating point 2D array in memory with a width equal to the number of channels (32) and a depth equal to the number of samples covering the maximum delay (16 ms )...

Array depth = maximum delay x sample rate
= 16 x 10-3 x 3750 =  60.

Imagining the 2D array as one those box shelves like the image below...but 32 boxes across and 60 boxes high.

As each block of 32 channel sample data is read in from the filterbank format file, the contents of all the box rows are moved one level higher up (with the previous top row contents disappearing) and the newly read data is placed in the bottom row. Assuming the channels are in order from left-to-right from the lowest frequency channel to the highest, to line up all the channels in corrected arrival time (de-dispersed) the values in the boxes marked by the purple dotted line are summed and written out as one sample value (this is now the time series format). In this way the data from the higher frequency channels which have 'arrived' earlier are summed with the lowest frequency channel just 'arrived'.  This has the effect of rotating the skewed data position to being vertical on the time scale as shown in black on the dispersion graphic above where they all line up in time.

After this conversion the time series format (TIM) output file is 32 times smaller than the filterbank input file as 32 channels have been combined into 1 channel.

File SizeTIM = File SizeFB / number of channels = 3.456 x 109 / 32 = 1.08 x 108 = 103 Mb

The TIM format output file can be input to SIGPROC, but at HawkRAO the conversion goes one step further...

Conversion of TIM Format Data to Unsigned 8-bit Format

The de-dispersed data in the TIM format file can be fed to the Epoch Folding process, but in order to apply some simple RFI mitigation, the TIM format files are processed further.

RFI Mitigation

RFI mitigation can be done at any of the various stages of processing - right from the stage of processing the raw IQ data through to the TIM format.  What types of RFI mitigation techniques are available is dependent on the format of the data.  Simple clipping/blanking can be applied to all formats, but frequency-dependent RFI can only be addressed in the FILTERBANK format stage - where the data is split over a number of frequency channels.  RFI which presents itself at a fixed frequency in the observation bandwidth (i.e. restricted to only one or two channels) can be excised by discarding those channels and so preventing the RFI from progressing further down the processing chain.

At the time of writing (19 July 2017) only simple blanking RFI mitigation is applied to the data and is done at the very last stage of format conversion (TIM to unsigned 8-bit).  The TIM format data file is read and on the first pass the standard deviation and mean is calculated.  Then the TIM file is read a second time and where the data exceeds a set level of standard deviation (currently 3σ) the data is replaced by the mean.  The data is then offset and scaled to fit within the range 5 - 255 to be compatible with an unsigned 8-bit data value and written out to a 'filename.u8b' data file.  It is this unsigned 8-bit file that is used for the Epoch Folding process.

Note that the unsigned 8-bit format file does not have a header - it is data only.  Consequently the actual sample rate (with clock correction applied) is not included by default.  To ensure folding is done at the correct period, the corrected sample rate is pre-pended to the file as a double precision value (8-bytes).  The Epoch Folding process first reads that double precision value and stores it as the sample rate and then reads the unsigned 8-bit data for folding which follows it.

This last conversion drops the file size by a factor of 4 (single precision float to unsigned 8-bit) and so the final filesize is...

File SizeU8B = File SizeTIM / 4 = 1.08 x 108 / 4 = 2.7 x 107 = 25.75 Mb

Archiving Data

For archiving data the small size of the unsigned 8-bit files is attractive - however, that format has had de-dispersion and RFI mitigation processing done at set levels.  This processing cannot be 'unwound' and re-done at different settings at a later date - say in the case where it is found that different levels of RFI excision are optimum in the future.  Consequently, currently both the filterbank and time series format files are archived as well as the unsigned 8-bit data files to allow re-processing in the future.