Tile Health Monitor
Overview
tile_health_monitor.py contains a class with methods for monitoring and controlling TPM health monitoring.
Note
This class is inherited by the Tile class in tile.py, but is kept separate for increased readability.
Python Class & Methods Index
Hardware functions for monitoring of TPM hardware health status.
This depends heavily on the pyfabil low level software and specific hardware module plugins.
- class pyaavs.tile_health_monitor.TileHealthMonitor
Tile Health Monitor Mixin Class, must be inherited by Tile Class
- all_monitoring_categories()
Returns a list of all monitoring point ‘categories’. Here, categories are a superset of monitoring points and form the full list of strings accepted by set_monitoring_point_attr. For example, the monitoring points voltages.5V0 and io.udp_interface.crc_error_count.FPGA0
would have these associated categories: ‘voltages’, ‘voltages.5V0’, ‘io’, ‘io.udp_interface’, ‘io.udp_interface.crc_error_count’ and ‘io.udp_interface.crc_error_count.FPGA0’.
More info at https://confluence.skatelescope.org/x/nDhED
- Returns:
list of categories
- Return type:
list of strings
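The relationship between monitoring points and categories described above amounts to taking every ‘.’ delimited prefix of each monitoring point path. The sketch below shows that relationship; categories_from_points is a hypothetical helper written for illustration, not the pyaavs implementation (which walks the lookup dict instead).

```python
def categories_from_points(points):
    """Return every '.'-delimited prefix of each monitoring point, in first-seen order.

    Hypothetical helper for illustration; not part of pyaavs.
    """
    categories = []
    seen = set()
    for point in points:
        parts = point.split(".")
        # Each prefix of the key path, including the full path, is a category
        for i in range(1, len(parts) + 1):
            prefix = ".".join(parts[:i])
            if prefix not in seen:
                seen.add(prefix)
                categories.append(prefix)
    return categories


points = ["voltages.5V0", "io.udp_interface.crc_error_count.FPGA0"]
print(categories_from_points(points))
```

Every monitoring point is also its own category, which is why set_monitoring_point_attr accepts both leaf paths and their parents.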
- all_monitoring_points()
Returns a list of all monitoring points, found as the leaf nodes of the lookup dict that have a corresponding method field.
The monitoring points are returned as strings formed from ‘.’ delimited keys, for example ‘voltages.5V0’ and ‘io.udp_interface.crc_error_count.FPGA0’.
More info at https://confluence.skatelescope.org/x/nDhED
- Returns:
list of monitoring points
- Return type:
list of strings
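A recursive walk is one way to collect those leaf nodes. This sketch assumes, hypothetically, that a leaf is a dict containing a "method" key and that any other dict value is a nested category; the real lookup layout is defined in the tpm_1_X_monitoring_point_lookup modules and may differ.

```python
def find_monitoring_points(lookup, prefix=""):
    """Return '.'-joined paths of leaf nodes that carry a 'method' field."""
    points = []
    for key, node in lookup.items():
        path = f"{prefix}.{key}" if prefix else key
        if not isinstance(node, dict):
            continue  # plain values are not sub-categories
        if "method" in node:
            points.append(path)  # leaf with a method: a monitoring point
        else:
            points.extend(find_monitoring_points(node, path))
    return points


# Hypothetical lookup dict; only the nesting matters for this sketch
lookup = {
    "voltages": {
        "5V0": {"method": "get_voltage", "rate": "slow"},
    },
    "io": {
        "udp_interface": {
            "crc_error_count": {
                "FPGA0": {"method": "check_udp_crc_error_counter"},
                "FPGA1": {"method": "check_udp_crc_error_counter"},
            },
        },
    },
}
print(find_monitoring_points(lookup))
```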
- check_ad9528_pll_status()
Status of the TPM AD9528 PLL chip.
The returned lock status is True only if both PLLs in the AD9528 are locked. The lock loss counter increments on a loss of lock event on either PLL.
- Returns:
current lock status and lock loss counter value
- Return type:
tuple
- check_adc_pll_status(adc_id=None)
Status of the ADC PLL.
For each ADC, this method returns a tuple whose first element is True if the PLL is currently locked and whose second element is True if no loss of PLL lock has been observed.
NOTE: The AD9680 used in TPM 1.2 does not support a loss of lock bit, only the current lock status, so the loss of lock value will be None.
A dictionary is returned with an entry for each ADC.
- Parameters:
adc_id (integer) – Specify which ADC, 0-15, None for all ADCs
- Returns:
(True, True) for an ADC if lock is up and no loss of lock has occurred
- Return type:
dict of tuple
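A caller typically needs to reduce that dictionary to the ADCs that need attention, while not treating the unsupported TPM 1.2 loss of lock value as a failure. The sketch below assumes the dictionary maps an ADC identifier to a (lock_up, no_lock_loss) tuple and that the unsupported value is reported as None; failing_adcs is a hypothetical helper.

```python
def failing_adcs(pll_status):
    """Return the ADCs whose PLL is unlocked or has recorded a loss of lock.

    pll_status maps an ADC identifier to a (lock_up, no_lock_loss) tuple.
    no_lock_loss may be None on TPM 1.2, where the AD9680 has no loss of lock
    bit; None is treated as "unknown", not as a failure.
    """
    return [
        adc
        for adc, (lock_up, no_lock_loss) in pll_status.items()
        if not lock_up or no_lock_loss is False
    ]


# Hypothetical readings keyed by ADC index; the real keys may differ
status = {0: (True, True), 1: (True, False), 2: (False, None), 3: (True, None)}
print(failing_adcs(status))  # [1, 2]
```

Using `is False` rather than `not no_lock_loss` is what keeps a None (unsupported) value from being reported as a lock loss.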
- check_adc_sysref_counter(adc_id=None, show_info=True)
Checks that the ADC sysref counter is incrementing. The sysref counter increments on each sysref event and overflows at 255, approximately every 3.28 ms.
Returns True if the counter is incrementing for a given ADC, as a dictionary of bool with one entry per ADC.
Retries for up to 1 second until two readings can be taken less than 3 ms apart, so that the counter cannot complete a full overflow cycle between them.
For debugging, if show_info is enabled, each counter reading is displayed along with the elapsed time.
- Parameters:
adc_id (integer) – Specify which ADC, 0-15, None for all ADCs
show_info (bool) – displays info messages
- Returns:
True if sysref counter incrementing
- Return type:
dict of bool
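The retry logic above can be sketched as follows. Because fewer than 256 sysref events fit in a 3 ms window, a non-zero difference modulo 256 between the two readings means the counter advanced, even if it rolled over from 255 to 0 in between. read_counter is a hypothetical stand-in for the ADC register read, and returning False on timeout is an assumption of this sketch.

```python
import time

COUNTER_MODULUS = 256        # 8-bit counter: 255 rolls over to 0
MAX_PAIR_INTERVAL_S = 0.003  # shorter than the ~3.28 ms overflow period
RETRY_TIMEOUT_S = 1.0        # give up after 1 second of attempts


def sysref_counter_incrementing(read_counter, clock=time.monotonic):
    """Return True if two closely spaced readings show the counter advancing.

    read_counter: zero-argument callable returning the raw 8-bit counter value
    (a hypothetical stand-in for the register read).
    clock: callable returning the current time in seconds.
    Returns False if no pair of readings fits inside MAX_PAIR_INTERVAL_S
    before RETRY_TIMEOUT_S expires.
    """
    deadline = clock() + RETRY_TIMEOUT_S
    while clock() < deadline:
        start = clock()
        first = read_counter()
        second = read_counter()
        elapsed = clock() - start
        if elapsed < MAX_PAIR_INTERVAL_S:
            # Fewer than 256 sysref events fit in the window, so a non-zero
            # modular difference means the counter advanced, even across a rollover
            return (second - first) % COUNTER_MODULUS != 0
    return False


readings = iter([254, 3] * 100)  # fake readings that straddle a rollover
print(sysref_counter_incrementing(lambda: next(readings)))  # True
```

Injecting the clock keeps the timing rule testable without hardware.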
- check_adc_sysref_setup_and_hold(adc_id=None, show_info=True)
Status of the ADC setup and hold monitor. Returns True if there is no setup or hold error for a given ADC, as a dictionary of bool with one entry per ADC.
If show_info is enabled, descriptions from the AD9695/AD9680 documentation are also displayed to explain the value of the setup and hold monitor.
- Parameters:
adc_id (integer) – Specify which ADC, 0-15, None for all ADCs
show_info (bool) – displays info messages about current setup/hold
- Returns:
True if timing requirements OK
- Return type:
dict of bool
- check_clock_manager_status(fpga_id=None, name=None)
Check the status of named TPM clock manager cores (MMCM cores). Reports the lock status of each MMCM and its lock loss counter.
- Parameters:
fpga_id (integer) – Specify which FPGA, 0,1, or None for both FPGAs
name (string) – Specify name of clock manager (non case sensitive)
- Returns:
status and counter values
- Return type:
dict
- check_clock_status(fpga_id=None, clock_name=None)
Check the status of named TPM clocks. Options are ‘jesd’, ‘ddr’ and ‘udp’; input is not case sensitive. An FPGA ID can optionally be specified to check status on one FPGA only.
- Parameters:
fpga_id (integer) – Specify which FPGA, 0,1, or None for both FPGAs
clock_name (string) – Specify name of clock or None for all clocks
- Returns:
True when status is OK, no errors
- Return type:
bool
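Many methods in this class share the same argument convention: None means "all FPGAs" or "all clocks", and clock names are matched case-insensitively. The sketch below expands such arguments into explicit (fpga, clock) pairs; resolve_clock_targets is a hypothetical helper, and raising ValueError on bad input is an assumption, not documented pyaavs behaviour.

```python
VALID_CLOCKS = ("jesd", "ddr", "udp")  # clock options listed in the docstrings above
NUM_FPGAS = 2                          # "0,1, or None for both FPGAs"


def resolve_clock_targets(fpga_id=None, clock_name=None):
    """Expand fpga_id/clock_name arguments into explicit (fpga, clock) pairs.

    None selects every FPGA or every clock; clock names are matched
    case-insensitively. Hypothetical helper, not part of pyaavs.
    """
    if fpga_id is not None and fpga_id not in range(NUM_FPGAS):
        raise ValueError(f"fpga_id must be 0, 1 or None, got {fpga_id!r}")
    fpgas = range(NUM_FPGAS) if fpga_id is None else [fpga_id]

    if clock_name is None:
        clocks = VALID_CLOCKS
    else:
        name = clock_name.lower()
        if name not in VALID_CLOCKS:
            raise ValueError(f"unknown clock {clock_name!r}, expected one of {VALID_CLOCKS}")
        clocks = (name,)

    return [(fpga, clock) for fpga in fpgas for clock in clocks]


print(resolve_clock_targets(clock_name="DDR"))  # [(0, 'ddr'), (1, 'ddr')]
```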
- check_ddr_initialisation(fpga_id=None)
Check whether DDR has initialised.
- Parameters:
fpga_id (integer) – Specify which FPGA, 0,1, or None for both FPGAs
- Returns:
true if all OK
- Return type:
bool
- check_ddr_parity_error_counter(fpga_id=None)
Check the status of the DDR parity error counter (used only with the station beamformer).
- Parameters:
fpga_id (integer) – Specify which FPGA, 0,1, or None for both FPGAs
- Returns:
counter values
- Return type:
dict
- check_ddr_reset_counter(fpga_id=None, show_result=True)
Check status of DDR user reset counter - increments each falling edge of the DDR generated user logic reset.
- Parameters:
fpga_id (integer) – Specify which FPGA, 0,1, or None for both FPGAs
show_result (bool) – prints error counts on logger
- Returns:
counter values
- Return type:
dict
- check_f2f_hard_errors()
Check F2F for hard errors. The hard error flag remains asserted until the core is reset.
- Returns:
hard_err register value
- Return type:
integer
- check_f2f_pll_status(core_id=None, show_result=True)
Check current F2F PLL lock status and for loss of lock events.
- Parameters:
core_id (integer) – Specify which F2F Core, 0,1, or None for both cores
show_result (bool) – prints error counts on logger
- Returns:
current status and counter values
- Return type:
dict
- check_f2f_soft_errors()
Check F2F for soft errors. The soft error flag is asserted for a single user_clk period.
- Returns:
soft_err register value
- Return type:
integer
- check_global_status_alarms()
Wrapper for the TPM get_global_status_alarms method. Returns None if the TPM version is 1.2.
- Returns:
alarm status dict
- Return type:
dict
- check_jesd_lane_error_counter(fpga_id=None, core_id=None)
Check JESD204 lane errors. Checks the FPGA link error counter register.
- Parameters:
fpga_id (integer) – Specify which FPGA, 0,1, or None for both FPGAs
core_id (integer) – Specify which JESD Core, 0,1, or None for both cores
- Returns:
true if all OK
- Return type:
bool
- check_jesd_lane_status(fpga_id=None, core_id=None)
Check JESD204 lane errors. Checks the FPGA link error counter register.
- Parameters:
fpga_id (integer) – Specify which FPGA, 0,1, or None for both FPGAs
core_id (integer) – Specify which JESD Core, 0,1, or None for both cores
- Returns:
true if all error counters are 0
- Return type:
bool
- check_jesd_link_status(fpga_id=None, core_id=None)
Check if JESD204 lanes are synchronized. Checks the FPGA sync status register.
- Parameters:
fpga_id (integer) – Specify which FPGA, 0,1, or None for both FPGAs
core_id (integer) – Specify which JESD Core, 0,1, or None for both cores
- Returns:
true if all OK
- Return type:
bool
- check_jesd_qpll_status(fpga_id=None, show_result=True)
Check JESD204 current status and for loss of QPLL lock events. Checks the FPGA qpll lock and qpll lock loss counter registers (shared between JESD cores).
- Parameters:
fpga_id (integer) – Specify which FPGA, 0,1, or None for both FPGAs
show_result (bool) – prints error counts on logger
- Returns:
current status and counter value tuple
- Return type:
dict
- check_jesd_resync_counter(fpga_id=None, show_result=True)
Check JESD204 for resync events. Checks the FPGA resync counter register (shared between JESD cores).
- Parameters:
fpga_id (integer) – Specify which FPGA, 0,1, or None for both FPGAs
show_result (bool) – prints error counts on logger
- Returns:
counter values
- Return type:
dict
- check_pps_status(fpga_id=None)
Check that PPS is detected and that the PPS period is as expected. The firmware counts the number of cycles between PPS pulses and sets an error flag if the count does not match the pps_exp_tc register.
- Parameters:
fpga_id (integer) – Specify which FPGA, 0,1, or None for both FPGAs
- Returns:
true if all OK
- Return type:
bool
- check_station_beamformer_status(fpga_id=None)
Check the status of the Station Beamformer error flags and counters.
- Parameters:
fpga_id (integer) – Specify which FPGA, 0,1, or None for both FPGAs
show_result (bool) – prints error counts on logger
- Returns:
True when status is OK, no errors
- Return type:
bool
- check_tile_beamformer_status(fpga_id=None)
Check tile beamformer error flags.
- Parameters:
fpga_id (integer) – Specify which FPGA, 0,1, or None for both FPGAs
- Returns:
True when status is OK, no errors
- Return type:
bool
- check_udp_arp_table_status(fpga_id=None, show_result=True)
Check that the UDP ARP table has been populated correctly. This is a non-destructive version of the method check_arp_table.
- Parameters:
fpga_id (integer) – Specify which FPGA, 0,1, or None for both FPGAs
show_result (bool) – prints ARP table contents on logger
- Returns:
True if each FPGA has at least one entry that is valid and resolved
- Return type:
bool
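The pass criterion above ("each FPGA has at least one entry valid and resolved") is an all-of-any check. The sketch assumes, hypothetically, that the ARP table contents are available as a mapping from FPGA index to a list of entries, each a dict with boolean "valid" and "resolved" flags; the real register layout is not described here.

```python
def arp_tables_ok(arp_tables):
    """Return True if every FPGA has at least one ARP entry that is valid and resolved.

    arp_tables maps an FPGA index to a list of entries; each entry is assumed
    (hypothetically) to be a dict with boolean "valid" and "resolved" flags.
    An empty mapping is treated as a failure rather than vacuously passing.
    """
    return bool(arp_tables) and all(
        any(entry["valid"] and entry["resolved"] for entry in entries)
        for entries in arp_tables.values()
    )


tables = {
    0: [{"valid": True, "resolved": True}],
    1: [{"valid": True, "resolved": False}],  # FPGA1 has no resolved entry
}
print(arp_tables_ok(tables))  # False
```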
- check_udp_bip_error_counter(fpga_id=None)
Check UDP interface for BIP errors.
- Parameters:
fpga_id (integer) – Specify which FPGA, 0,1, or None for both FPGAs
- Returns:
counter values
- Return type:
dict
- check_udp_crc_error_counter(fpga_id=None)
Check UDP interface for CRC errors.
- Parameters:
fpga_id (integer) – Specify which FPGA, 0,1, or None for both FPGAs
- Returns:
counter values
- Return type:
dict
- check_udp_decode_error_counter(fpga_id=None)
Check UDP interface for 66b64b decoding errors.
- Parameters:
fpga_id (integer) – Specify which FPGA, 0,1, or None for both FPGAs
- Returns:
counter values
- Return type:
dict
- check_udp_linkup_loss_counter(fpga_id=None, show_result=True)
Check UDP interface for linkup loss events.
- Parameters:
fpga_id (integer) – Specify which FPGA, 0,1, or None for both FPGAs
show_result (bool) – prints error counts on logger
- Returns:
counter values
- Return type:
dict
- check_udp_status(fpga_id=None)
Check for 40G errors.
- Parameters:
fpga_id (integer) – Specify which FPGA, 0,1, or None for both FPGAs
- Returns:
true if all OK (all error counters are 0)
- Return type:
bool
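The "all error counters are 0" rule can be pictured as a reduction over the per-FPGA counter dicts returned by the individual UDP counter methods. Which counters check_udp_status actually reads is not stated here, so combining CRC, BIP and decode counters is an assumption of this sketch, as are the "FPGA0"/"FPGA1" key names.

```python
def udp_status_ok(*counter_dicts):
    """Return True only if every counter in every per-FPGA dict is zero.

    Each argument is a dict of counter values, e.g. {"FPGA0": 0, "FPGA1": 0};
    the key names are hypothetical.
    """
    return all(
        value == 0
        for counters in counter_dicts
        for value in counters.values()
    )


# Hypothetical counter readings
crc = {"FPGA0": 0, "FPGA1": 0}
bip = {"FPGA0": 0, "FPGA1": 3}
decode = {"FPGA0": 0, "FPGA1": 0}
print(udp_status_ok(crc, bip, decode))  # False
print(udp_status_ok(crc, decode))       # True
```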
- clear_clock_manager_status(fpga_id=None, name=None)
Clear status of named TPM clock manager cores (MMCM Core). Used to reset MMCM lock loss counters.
- Parameters:
fpga_id (integer) – Specify which FPGA, 0,1, or None for both FPGAs
name (string) – Specify name of clock manager (non case sensitive)
- clear_clock_status(fpga_id=None, clock_name=None)
Clear the status of named TPM clocks, used to clear error flags in the FPGA firmware. Options are ‘jesd’, ‘ddr’ and ‘udp’; input is not case sensitive. An FPGA ID can optionally be specified to clear status on one FPGA only.
- Parameters:
fpga_id (integer) – Specify which FPGA, 0,1, or None for both FPGAs
clock_name (string) – Specify name of clock or None for all clocks
- clear_ddr_reset_counter(fpga_id=None)
Reset value of DDR user reset counter.
- Parameters:
fpga_id (integer) – Specify which FPGA, 0,1, or None for both FPGAs
- clear_f2f_pll_lock_loss_counter(core_id=None)
Reset value of F2F PLL lock loss counter.
- Parameters:
core_id (integer) – Specify which F2F Core, 0,1, or None for both cores
- clear_jesd_error_counters(fpga_id=None)
Reset the following JESD error counters:
JESD Error Counter
JESD Resync Counter (shared between JESD cores)
JESD QPLL Lock Loss Counter (shared between JESD cores)
- Parameters:
fpga_id (integer) – Specify which FPGA, 0,1, or None for both FPGAs
- clear_pps_status(fpga_id=None)
Clear PPS error flags.
- Parameters:
fpga_id (integer) – Specify which FPGA, 0,1, or None for both FPGAs
- clear_station_beamformer_status(fpga_id=None)
Clear the Station Beamformer error flags and counters, including the DDR parity error counter.
- Parameters:
fpga_id (integer) – Specify which FPGA, 0,1, or None for both FPGAs
- clear_tile_beamformer_status(fpga_id=None)
Clear tile beamformer error flags.
- Parameters:
fpga_id (integer) – Specify which FPGA, 0,1, or None for both FPGAs
- Returns:
True when status is OK, no errors
- Return type:
bool
- clear_udp_status(fpga_id=None)
Reset 40G error counters.
- Parameters:
fpga_id (integer) – Specify which FPGA, 0,1, or None for both FPGAs
- disable_clock_monitoring(fpga_id=None, clock_name=None)
Disable clock monitoring of named TPM clocks. Options are ‘jesd’, ‘ddr’ and ‘udp’; input is not case sensitive. An FPGA ID can optionally be specified to disable monitoring on one FPGA only.
- Parameters:
fpga_id (integer) – Specify which FPGA, 0,1, or None for both FPGAs
clock_name (string) – Specify name of clock or None for all clocks
- enable_clock_monitoring(fpga_id=None, clock_name=None)
Enable clock monitoring of named TPM clocks. Options are ‘jesd’, ‘ddr’ and ‘udp’; input is not case sensitive. An FPGA ID can optionally be specified to enable monitoring on one FPGA only.
- Parameters:
fpga_id (integer) – Specify which FPGA, 0,1, or None for both FPGAs
clock_name (string) – Specify name of clock or None for all clocks
- get_available_clocks_to_monitor()
Get the list of clock names available to be monitored.
- Returns:
list of clock names available to be monitored
- Return type:
list of string
- get_available_currents(fpga_id=None)
Get list of available current measurements for TPM.
- Parameters:
fpga_id (integer) – Specify which FPGA, 0,1, or None for both FPGAs
- Returns:
TPM current names
- Return type:
list
- get_available_voltages(fpga_id=None)
Get list of available voltage measurements for TPM.
- Parameters:
fpga_id (integer) – Specify which FPGA, 0,1, or None for both FPGAs
- Returns:
TPM voltage names
- Return type:
list
- get_current(fpga_id=None, current_name=None)
Get current measurements for TPM.
- Parameters:
fpga_id (integer) – Specify which FPGA, 0,1, or None for both FPGAs
current_name (string) – Specify name of current, None for all currents
- Returns:
TPM currents
- Return type:
dict
- get_fpga_temperature(fpga_id=None)
Get FPGA temperature.
- Parameters:
fpga_id (integer) – Specify which FPGA, 0,1, or None for both FPGAs
- Returns:
FPGA temperature
- Return type:
dict
- get_health_status(**kwargs)
Returns the current value of the TPM monitoring points whose attributes match those set with set_monitoring_point_attr. If no arguments are given, the current value of every monitoring point is returned.
For example, if configured with: tile.set_monitoring_point_attr('io.udp_interface', my_category='yes', my_other_category=87)
subsequent calls to: tile.get_health_status(my_category='yes', my_other_category=87)
would return the health status for these monitoring points only: io.udp_interface.arp, io.udp_interface.status, io.udp_interface.crc_error_count.FPGA0, io.udp_interface.crc_error_count.FPGA1, io.udp_interface.bip_error_count.FPGA0, io.udp_interface.bip_error_count.FPGA1, io.udp_interface.decode_error_count.FPGA0, io.udp_interface.decode_error_count.FPGA1, io.udp_interface.linkup_loss_count.FPGA0, io.udp_interface.linkup_loss_count.FPGA1
A group attribute is provided by default; see tpm_1_X_monitoring_point_lookup. It can be used as in these examples: tile.get_health_status(group='temperatures'), tile.get_health_status(group='udp_interface'), tile.get_health_status(group='io')
Full documentation on usage available at https://confluence.skatelescope.org/x/nDhED
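The attribute matching described for get_health_status can be sketched as a filter over monitoring point paths. This sketch assumes attributes are held in a flat dict keyed by point path, and that a list-valued attribute (such as several groups) matches when it contains the requested value; select_points and the attribute table are hypothetical, and reading the actual values from hardware is left out.

```python
def _attr_matches(attrs, key, wanted):
    """True if attrs[key] equals wanted, or contains it when the attribute is a list."""
    if key not in attrs:
        return False
    value = attrs[key]
    return wanted in value if isinstance(value, list) else value == wanted


def select_points(point_attrs, **filters):
    """Return the monitoring point paths whose attributes match every filter.

    With no filters, every point is returned, mirroring get_health_status()
    called without arguments.
    """
    return [
        path
        for path, attrs in point_attrs.items()
        if all(_attr_matches(attrs, key, wanted) for key, wanted in filters.items())
    ]


# Hypothetical attribute table keyed by monitoring point path
point_attrs = {
    "io.udp_interface.arp": {"group": ["io", "udp_interface"], "my_category": "yes", "my_other_category": 87},
    "io.udp_interface.crc_error_count.FPGA0": {"group": ["io", "udp_interface"], "my_category": "yes", "my_other_category": 87},
    "temperatures.FPGA0": {"group": ["temperatures"]},
}
print(select_points(point_attrs, my_category="yes", my_other_category=87))
print(select_points(point_attrs, group="temperatures"))  # ['temperatures.FPGA0']
```

A point must satisfy every keyword filter, so adding filters can only narrow the selection.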
- get_voltage(fpga_id=None, voltage_name=None)
Get voltage measurements for TPM.
- Parameters:
fpga_id (integer) – Specify which FPGA, 0,1, or None for both FPGAs
voltage_name (string) – Specify name of voltage, None for all voltages
- Returns:
TPM voltages
- Return type:
dict
- init_health_monitoring()
Loads the monitoring point lookup dict into an attribute.
The TPM monitoring point format and lookup are loaded from: tpm_1_2_monitoring_point_lookup.py, tpm_1_6_monitoring_point_lookup.py
- set_monitoring_point_attr(path, override=True, **kwargs)
Specify attributes for a monitoring point or a subset of monitoring points. The target is selected by path, a string name formed from ‘.’ delimited keys of the lookup dict. All available options are returned by all_monitoring_categories().
See https://confluence.skatelescope.org/x/nDhED for example usage.
- Parameters:
path (str) – Monitoring point path (e.g. any of: 'io.udp_interface.crc_error_count', 'io.udp_interface', 'timing', 'io')
override (bool) – If True, overrides the specified attribute; if False, appends to it
**kwargs –
keyword args (e.g. rate='fast' or rate=8, group='my_group' or group=['my_group1', 'my_group2'])
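The override and append behaviour can be sketched on a flat attribute table keyed by monitoring point path. set_point_attr is a hypothetical helper; it assumes that appending turns an existing scalar into a list and that a path selects itself plus everything below it on whole ‘.’ delimited keys.

```python
def set_point_attr(point_attrs, path, override=True, **kwargs):
    """Apply kwargs as attributes to every monitoring point at or below path.

    override=True replaces an attribute's value; override=False appends to it,
    turning an existing scalar into a list first. Hypothetical helper.
    """
    for point, attrs in point_attrs.items():
        # Match whole '.'-delimited keys, so 'io.udp' does not select 'io.udp_interface'
        if point != path and not point.startswith(path + "."):
            continue
        for key, value in kwargs.items():
            if override:
                attrs[key] = value
                continue
            existing = attrs.get(key, [])
            merged = list(existing) if isinstance(existing, list) else [existing]
            merged.extend(value if isinstance(value, list) else [value])
            attrs[key] = merged


# Hypothetical attribute table keyed by monitoring point path
point_attrs = {
    "io.udp_interface.arp": {"group": ["io"]},
    "io.udp_interface.crc_error_count.FPGA0": {"group": ["io"]},
    "voltages.5V0": {"group": ["voltages"]},
}
set_point_attr(point_attrs, "io.udp_interface", rate="fast")
set_point_attr(point_attrs, "io.udp_interface", override=False, group="my_group")
print(point_attrs["io.udp_interface.arp"])  # {'group': ['io', 'my_group'], 'rate': 'fast'}
```

Copying the existing list before extending it keeps points that share a list object from being modified together.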
- pyaavs.tile_health_monitor.health_monitoring_compatible(func)
Decorator that checks whether the firmware in use supports TPM health monitoring, by attempting to access a register that was added for TPM health monitoring. Bitstreams generated before ~03/2023 do not support TPM health monitoring.
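The decorator pattern described above can be sketched as follows. has_health_register, HealthMonitoringUnsupported and FakeTile are hypothetical stand-ins for the real register probe, error handling and Tile class; this sketch raises an exception when support is missing, whereas the actual pyaavs decorator may handle that case differently.

```python
import functools


class HealthMonitoringUnsupported(RuntimeError):
    """Raised when the firmware lacks the health monitoring register (hypothetical)."""


def health_monitoring_compatible_sketch(func):
    """Only run func if the tile's firmware exposes the health monitoring register."""
    @functools.wraps(func)  # keep the wrapped method's name and docstring
    def wrapper(self, *args, **kwargs):
        # has_health_register() is a hypothetical probe standing in for the
        # register access the real decorator attempts
        if not self.has_health_register():
            raise HealthMonitoringUnsupported(
                f"{func.__name__} requires firmware with TPM health monitoring support"
            )
        return func(self, *args, **kwargs)
    return wrapper


class FakeTile:
    """Minimal stand-in for a Tile, for demonstration only."""

    def __init__(self, supported):
        self.supported = supported

    def has_health_register(self):
        return self.supported

    @health_monitoring_compatible_sketch
    def get_fpga_temperature(self, fpga_id=None):
        return "reading"


print(FakeTile(supported=True).get_fpga_temperature())  # reading
```

Probing once per call keeps the sketch simple; caching the probe result on the tile would avoid repeated register accesses.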