The expectation that practising behaviour analysts should collect and report reliability or interobserver agreement (IOA) data in behavioural assessments is reflected in the Behavior Analyst Certification Board's (BACB) assertion that behaviour analysts are responsible for the use of “different methods of evaluating the results of measurement methods such as inter-observer agreement, accuracy and reliability” (BACB, 2005). In addition, Vollmer, Sloman and St. Peter Pipkin (2008) argue that the exclusion of these data significantly limits any interpretation of the effectiveness of a behaviour change procedure. Claims of validity in a behavioural assessment study should therefore be conditional on the inclusion of these reliability data (Friman, 2009). In light of these considerations, it is not surprising that a recent review of articles in the Journal of Applied Behavior Analysis (JABA) from 1995 to 2005 (Mudford, Taylor, & Martin, 2009) found that 100% of articles reporting continuously recorded dependent variables contained IOA calculations. These data, together with previously published reports on reliability procedures in JABA (Kelly, 1977), suggest that the inclusion of IOA is in fact a hallmark, if not a standard, of behavioural assessment.

The three algorithms most frequently chosen by behaviour-analytic researchers to calculate interobserver agreement with continuous recording were used to assess the accuracy of data recorded by 12 observers from video samples on handheld computers. Rate and duration of responding were recorded for three samples. The data files were compared with criterion data sets to determine observer accuracy. The block-by-block and exact agreement algorithms were susceptible to inflated agreement and accuracy estimates at lower rates and durations. The exact agreement method appeared too stringent for responding at higher rates (23.5 responses per minute) and at higher relative duration (72% of the session).
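To make the two interval-based algorithms named above concrete, the following is a minimal sketch under commonly used definitions, which the text itself does not spell out. It assumes the session is split into fixed 10-second intervals, that exact agreement scores an interval as agreed only when both records contain the same count, and that block-by-block scores each interval as the smaller count divided by the larger. The function names, the interval length, and the sample timestamps are all hypothetical.

```python
import math


def interval_counts(event_times, session_length, interval=10.0):
    """Count events per fixed-length interval (10 s is an assumed default)."""
    n_intervals = math.ceil(session_length / interval)
    counts = [0] * n_intervals
    for t in event_times:
        # Clamp so an event stamped exactly at session end falls in the last interval.
        idx = min(int(t // interval), n_intervals - 1)
        counts[idx] += 1
    return counts


def exact_agreement(counts_a, counts_b):
    """Percentage of intervals in which both records hold the same count."""
    agreements = sum(1 for a, b in zip(counts_a, counts_b) if a == b)
    return 100.0 * agreements / len(counts_a)


def block_by_block(counts_a, counts_b):
    """Mean of smaller/larger count per interval, as a percentage."""
    scores = []
    for a, b in zip(counts_a, counts_b):
        if a == b:
            # Equal counts, including 0 and 0, are scored as full agreement.
            scores.append(1.0)
        else:
            scores.append(min(a, b) / max(a, b))
    return 100.0 * sum(scores) / len(scores)


# Hypothetical event timestamps (seconds) for an observer and a criterion record.
observer = [1.0, 12.0, 15.0, 31.0]
criterion = [2.0, 13.0, 33.0, 55.0]

obs_counts = interval_counts(observer, session_length=60.0)
crit_counts = interval_counts(criterion, session_length=60.0)

exact_pct = exact_agreement(obs_counts, crit_counts)
block_pct = block_by_block(obs_counts, crit_counts)
print(f"exact agreement: {exact_pct:.1f}%, block-by-block: {block_pct:.1f}%")
```

In this sketch, intervals in which neither record contains an event are scored as agreements, which suggests one way in which low-rate sessions could produce high scores under both methods.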
The time-window analysis appeared to inflate accuracy estimates at relatively high response rates and durations, but not at low ones (4.8 responses per minute and 8% of the session). Trial-by-trial: comparing agreement across the individual discrete trials, instead of the total number. We performed analyses similar to those of Repp et al. (1976) by examining the correlation between changes in response rate and variations in percentage accuracy across the computational algorithms.
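The third algorithm, which compares the two records within a time window rather than within fixed intervals, can be sketched for discrete events as follows. This is an illustrative assumption, not the procedure used in the study: it pairs each event in one record with at most one event in the other when their timestamps differ by no more than a tolerance, and reports matched pairs as a percentage of matched pairs plus unmatched events. The function name, the 2-second tolerance, and the sample timestamps are hypothetical.

```python
def time_window_agreement(times_a, times_b, tolerance=2.0):
    """Percentage agreement when events are paired within +/- tolerance seconds.

    Returns None when both records are empty, since the ratio is undefined.
    """
    a = sorted(times_a)
    b = sorted(times_b)
    i = j = matched = 0
    # Walk both sorted records, pairing each event at most once.
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= tolerance:
            matched += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1  # a[i] has no partner within the window
        else:
            j += 1  # b[j] has no partner within the window
    unmatched = (len(a) - matched) + (len(b) - matched)
    total = matched + unmatched
    if total == 0:
        return None
    return 100.0 * matched / total


# Hypothetical event timestamps (seconds) for an observer and a criterion record.
observer = [1.0, 12.0, 15.0, 31.0]
criterion = [2.0, 13.0, 33.0, 55.0]

wide = time_window_agreement(observer, criterion, tolerance=2.0)
narrow = time_window_agreement(observer, criterion, tolerance=0.5)
print(f"tolerance 2.0 s: {wide:.1f}%, tolerance 0.5 s: {narrow:.1f}%")
```

The choice of tolerance has a strong effect on the resulting percentage in this sketch, so any comparison across algorithms would need to state the window that was used.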