Introduction
For decades, within-household selection has been a standard component of probability-based telephone surveys. However, within-household selection methods that were designed to result in more representative samples from landline frames may no longer be as effective when combined with the demographically different cell phone frame. The present study evaluates 11 selection methods and their resulting response rates, accuracy of the selection, demographic representativeness, and substantive results. The landline samples were then combined with a cell phone sample to assess demographic and substantive differences between treatment groups.
SELECTION METHODS
Within-household selection techniques can be grouped into three main categories, with the probability component being the key distinction: probability methods, quasi-probability methods, and nonprobability methods (Gaziano 2005). Methods vary in terms of complexity and invasiveness, adherence to true probabilistic methodology, and their utility for obtaining interviews from traditionally underrepresented demographics.
Probability-based methods enumerate the adult members of the household, have a random component for selecting the respondent in each household, and every adult member of the household has a known probability of selection (Kish 1949, 1965). While probability-based methods may result in unbiased selection, the enumeration of all household members can seem intrusive and burdensome to respondents (Paisley and Parker 1965; Salmon and Nichols 1983) and may result in nonresponse bias (Binson, Canchola, and Catania 2000; O’Rourke and Blair 1983).
Quasi-probability selection traditionally employs a next or last birthday method. While birthday methods have a random component for selection, if any variables of interest are related to the date of birth, the selection may bias those estimates (Gaziano 2005).
Nonprobability methods include techniques that are designed to simplify the selection process, such as selecting from only the people at home and/or to improve the age and gender distribution of the sample. These methods include “in household” methods that select a person within the household and methods that select among only the people “at home.” All nonprobability methods lack a true random component, and certain members of the household may have no opportunity to be selected.
Data and Methods
SELECTION METHODS
This study compared one probability method, four quasi-probability methods, and six nonprobability methods.
The probability method that we used in this experiment was a variant of the Rizzo-Brick-Park (R-B-P) method (Rizzo, Brick, and Park 2004). Single-member households did not require a selection component, respondents in two-member households were selected randomly by a computer, and the Kish grid was used to select respondents in households with three or more members (Kish 1949, 1965).
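The branching in this variant can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the study's CATI implementation: the function names and the (gender, age) roster format are hypothetical, and the branch for three or more adults uses an equal-probability draw from a roster ordered in the way conventionally used with Kish grids (men, then women, oldest first) as a stand-in for an actual grid lookup.

```python
import random

def kish_order(adults):
    """Order a roster as men first, then women, each from oldest to youngest."""
    return sorted(adults, key=lambda a: (a[0] != "M", -a[1]))

def select_respondent(adults, rng=random):
    """Select one adult from a roster of (gender, age) tuples."""
    if not adults:
        raise ValueError("no eligible adults in household")
    if len(adults) == 1:
        # Single-member household: no selection step is needed.
        return adults[0]
    if len(adults) == 2:
        # Two-member household: the computer picks one adult at random.
        return rng.choice(adults)
    # Three or more adults: the study used a Kish grid at this point.
    # Assumption: an equal-probability draw from the ordered roster
    # stands in for the grid lookup.
    ordered = kish_order(adults)
    return ordered[rng.randrange(len(ordered))]
```

Note that only the last branch requires the full enumeration of the household, which is the step the literature describes as intrusive.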
The four quasi-probability methods we used were: most recent birthday, last birthday, next birthday, and a multipart most recent birthday.
We tested three “in-household” and three “at-home” nonprobability methods: youngest adult in-household, youngest male/youngest female (YMYF) in-household, YMYF at-home, multipart YMYF in-household, multipart YMYF at-home, and youngest male/oldest female (YMOF) at-home.
Gender was assigned at random in the YMYF methods, and if an adult of the selected gender did not reside in the household, the interviewer asked for the youngest adult of the other gender. For the in-household version of YMYF, if the selected person was not available, the interviewer scheduled a call back. In the at-home version, if the selected gender was not available, the interviewer asked to speak with the youngest adult of the other gender.
The YMOF method always asked for the youngest adult male first. If an adult male did not live in the household or was not at home, the interviewer asked to speak to the oldest female at home. If the male was at home but was not available to complete the interview, a call-back was scheduled.
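The YMYF and YMOF rules described above can be expressed as a short sketch. The `Adult` record, the function names, and the `at_home` flag are hypothetical conveniences for illustration; call-back scheduling is outside the sketch and is only noted in comments.

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class Adult:
    gender: str          # "M" or "F"
    age: int
    at_home: bool = True

def _youngest(adults, gender):
    pool = [a for a in adults if a.gender == gender]
    return min(pool, key=lambda a: a.age) if pool else None

def ymyf(adults, at_home_only=False, rng=random):
    """Youngest male/youngest female, with the gender asked first chosen at random."""
    pool = [a for a in adults if a.at_home] if at_home_only else list(adults)
    first = rng.choice(["M", "F"])
    other = "F" if first == "M" else "M"
    pick = _youngest(pool, first)
    if pick is None:
        # No adult of the selected gender: fall back to the other gender.
        pick = _youngest(pool, other)
    # In-household version: if `pick` is not at home, a call-back is scheduled.
    return pick

def ymof_at_home(adults):
    """Youngest male at home first; otherwise the oldest female at home."""
    home = [a for a in adults if a.at_home]
    male = _youngest(home, "M")
    if male is not None:
        return male
    females = [a for a in home if a.gender == "F"]
    return max(females, key=lambda a: a.age) if females else None
```

Because the at-home variants restrict the pool before selection, household members who are away at the time of the call cannot be chosen on that contact.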
Within these groups, we also tested single-question and multipart question formats. Figure 1 shows the exact wording we used for each of these methods. We hypothesized that respondents may find it difficult to process all the information that is typically included in within-household selection items. Take for example the following request: "May I please speak to the person, 18 years of age or older, living in this household, who had the most recent or last birthday?" In that single question, respondents must consider three components: the required age, residency in the household and birthday. Breaking the query into separate items may help the respondent accurately comprehend and report the information required.
DATA COLLECTION
Gallup conducted 10,999 random-digit-dial (RDD) landline interviews between June 24 and July 19, 2013. Overall, the study had an 11.3% response rate (AAPOR RR3). Interviews were conducted in English and Spanish with adults age 18 and older living in the United States. Sampled telephone numbers were randomly assigned to a treatment group prior to data collection. A three-call design was used, and the selection method remained the same for the household for all contact attempts.
The survey was approximately five minutes in length and included nine political and economic questions and 14 demographic items. The final question was a household roster. Respondents were asked to report the number of adults in the household and the age, gender and birth date (month and day) of each adult.
INTERVIEWERS
Selection methods were randomly assigned to the sample case, not the interviewer. A chi-square test of independence by interviewer and completed selection method was not significant (X2 = 525.683, df = 520, p = .422), confirming that the assignment mechanism worked and that the association of selection treatment group and an interviewer’s completed interviews did not depart from randomness. In other words, there were no instances in which some interviewers more successfully completed interviews with some treatment groups than with others.
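As a rough check, the upper-tail probability for a chi-square statistic with an even number of degrees of freedom can be computed with the standard library alone, using the identity that this tail equals a Poisson cumulative probability. The helper name below is hypothetical and the sketch is not the study's analysis code.

```python
import math

def chi2_sf_even_df(stat, df):
    """Upper-tail probability P(X > stat) for chi-square with even df.

    Uses P(chi2_{2k} > x) = P(Poisson(x / 2) <= k - 1), summed in log space
    to avoid overflow for large df.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this shortcut only applies to positive even df")
    if stat <= 0:
        return 1.0
    lam = stat / 2.0
    k = df // 2
    log_terms = [-lam + i * math.log(lam) - math.lgamma(i + 1) for i in range(k)]
    peak = max(log_terms)
    return math.exp(peak) * sum(math.exp(t - peak) for t in log_terms)

# Values reported in the text for the interviewer-by-method test.
p_value = chi2_sf_even_df(525.683, 520)
print(round(p_value, 3))
```

All of the degrees of freedom reported in this article are even, so the same helper could be applied to the other tests.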
CELL PHONE SAMPLE
To create a dual-frame sample for each of the 11 treatment groups, we randomly selected a sample of 1,000 completed cell phone interviews that were conducted on Gallup’s daily tracking survey between June 24 and July 19, 2013. This survey has an average response rate of 10.5% (AAPOR RR3) and uses methodology, survey content, and a call design similar to those of the within-household experiment, making it an ideal source of cellular cases. Cell phones were treated as personal devices, and the individual who answered the phone was asked to complete the survey.
Results
Response rates (AAPOR RR3), refusal rates (AAPOR REF2), cooperation rates (AAPOR COOP1), and contact rates (AAPOR CON2) are presented in Table 1 (American Association for Public Opinion Research 2016).
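For readers unfamiliar with these outcome rates, the following sketch computes them from final disposition counts using the AAPOR Standard Definitions formulas. The function and argument names are hypothetical, and the counts in the usage example are invented for illustration; they are not the study's dispositions.

```python
def aapor_rates(completes, partials, refusals, noncontacts, other,
                unknown_household, unknown_other, e):
    """Return RR3, REF2, COOP1, and CON2 from final disposition counts.

    `e` is the estimated share of unknown-eligibility cases that are eligible.
    """
    interviews = completes + partials
    est_unknown = e * (unknown_household + unknown_other)
    eligible = interviews + refusals + noncontacts + other + est_unknown
    contacted = interviews + refusals + other
    return {
        "RR3": completes / eligible,
        "REF2": refusals / eligible,
        "COOP1": completes / contacted,
        "CON2": contacted / eligible,
    }

# Hypothetical dispositions, for illustration only.
rates = aapor_rates(completes=100, partials=0, refusals=200, noncontacts=300,
                    other=0, unknown_household=400, unknown_other=0, e=0.5)
```

RR3, REF2, and CON2 share a denominator, so classifying more call-backs as refusals raises REF2 without changing RR3.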
All refusals occurred before selection was completed. Refusals include hard and soft refusals that were unresolved at the end of the field period. Although some respondents may desire a call-back, or the selected respondent is not available at the time of the call, some individuals may request a call-back as a polite way of declining the interview (an implied refusal), for example saying, “Now isn’t a good time,” instead of explicitly refusing. The sample management system did not distinguish between types of call-backs, and we elected to be conservative with refusal rates and include call-backs in refusal rates.
Most recent birthday and the three nonprobability at-home methods had the highest response rates—all exceeding 12%. These methods also had among the lowest refusal rates and highest cooperation rates. The R-B-P method and the YMYF in-household methods had lower response rates than the other methods, which can be explained in part by lower cooperation rates. This finding for R-B-P, which used a Kish grid for households with more than two adults, is consistent with past methods comparisons which found that respondents are more likely to break off when the Kish method is used (American Association for Public Opinion Research 2016; Gaziano 2005; O’Rourke and Blair 1983).
The response rates for multiquestion selection methods were nearly identical to those that used the same selection method with only a single question screener. It does not appear that breaking the selection process down into a series of shorter questions positively or negatively affected response rates.
ACCURACY
The birth month and day, age, and gender of household members collected on the household roster were compared against the selected household member and the date of interview to determine if the correct household member completed the interview. The selection check requires an assumption that the household roster is accurate, which may not always be the case.
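For the birthday methods, this check amounts to finding the rostered adult whose birthday falls closest before the interview date and comparing that person with the respondent. A minimal sketch for the most recent birthday rule follows; the function names and roster format are hypothetical. It assumes that a birthday on the interview date counts as the most recent one, and it uses a 366-day reference calendar so that February 29 birthdays can be represented.

```python
from datetime import date

def day_of_year(month, day):
    # Leap-year reference calendar so that a Feb 29 birthday is valid.
    return date(2000, month, day).timetuple().tm_yday

def days_since_birthday(birthday, interview):
    """Days since the last birthday on the reference calendar (0 = today)."""
    return (day_of_year(*interview) - day_of_year(*birthday)) % 366

def expected_most_recent(roster, interview):
    """Roster maps a person label to a (month, day) birthday.

    Ties (shared birthdays) resolve to the first person listed.
    """
    return min(roster, key=lambda person: days_since_birthday(roster[person], interview))

def selection_is_accurate(roster, respondent, interview):
    return respondent == expected_most_recent(roster, interview)
```

For example, with birthdays on July 1, December 25, and March 15 and an interview on July 10, the July 1 adult is the expected respondent; a week before July 1, the March 15 adult would be.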
At-home methods were excluded from the analysis because the roster did not ask which household members were home during the interview. The R-B-P method was also excluded from the analysis. Because the computer randomly selected the household member to be interviewed (in households with two or more members), a selection error under the R-B-P method was not possible. This does not mean that the R-B-P method results in perfect accuracy, because it still relies on an informant to give the interviewer accurate information. An informant may either be unable (Martin 2007) or unwilling (Tourangeau et al. 1997) to provide an accurate household roster, and previous work has found that errors do occur (Groves and Kahn 1979).
Accuracy results are displayed in Table 2. Households with only one member are, by default, accurate. Therefore, the table also includes an accuracy figure for households with two or more members.
The most recent birthday method was the least accurate, resulting in errors in nearly 30% of households with two or more members, which is consistent with other research on selection accuracy (Battaglia et al. 2008; Olson and Smyth 2014; Olson, Stange, and Smyth 2014). At the other end of the spectrum, the YMYF in-household methods, whether asked as a single question or a multipart question, resulted in near-perfect accuracy rates.
Breaking screening questions into multipart questions seemed to improve accuracy, although the findings are limited. The most recent birthday item had an accuracy rate of 71.9% when asked as a single item and 79.1% when broken out into a multipart question. The YMYF in-household method already had an extremely high accuracy rate, so the gains with the multipart version are minimal.
DEMOGRAPHICS
In general, selection method demographics for the landline frame did not vary significantly by treatment, unless the method was designed to attempt to oversample a certain demographic characteristic. In these cases, significant differences in gender and age were observed. YMOF at-home, which always asks for the available male first, yielded a sample that was 56% male (X2(10) = 73.552, p <.001). In contrast, the youngest adult in the household method yielded a sample that was 33% male.
The nonprobability methods designed to target younger adults all successfully recruited more young people into the survey than the probability and quasi-probability methods (X2(30) = 102.769, p <.001). This finding is consistent with recent within-household selection research (Olson, Stange, and Smyth 2014). However, all methods ultimately significantly underrepresented young people, which is consistent with other work comparing household selection techniques (Battaglia et al. 2008) and is not unexpected given the landline frame.
There were no significant differences between treatment groups by race and ethnicity (white: X2(10) = 11.199, p = .342; black: X2(10) = 10.737, p = .378; Hispanic: X2(10) = 31.880, p = .373) or educational attainment (X2(50) = 61.55, p = .127), but there were differences by marital status (X2(10) = 86.459, p = .001) and household size (X2(20) = 119.427, p < .001).
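Assuming the statistics in this section are Pearson chi-square tests on a treatment-group-by-category table, the statistic and its degrees of freedom can be computed as sketched below. The function name is hypothetical and the small table in the usage lines is invented for illustration.

```python
def pearson_chi_square(table):
    """Return (statistic, df) for a table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of row and column.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# Hypothetical 2 x 2 table (two groups by male/female), for illustration only.
stat, df = pearson_chi_square([[10, 20], [20, 10]])
```

With 11 treatment groups as rows, a two-category variable such as gender gives (11 - 1)(2 - 1) = 10 degrees of freedom, which matches the X2(10) notation used for the gender test.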
Integrating a Cell Phone Sample
To construct a sample that mimics a dual-frame study, a random sample of 1,000 cell phone interviews was drawn and combined with the landline samples. The same cell phone cases were combined with each of the within-household selection treatment samples to yield a sample of 50% cell phone and 50% landline completes. Since the same cell phone respondents were combined with each treatment group, any variation between treatment groups is due to the variability in the landline sample.
Two sets of weights were created for each of the 11 landline and cell phone samples, following procedures outlined by Kennedy (Kennedy 2008). The first weight included adjustments for probability of selection, nonresponse, and the overlapping dual frame design. The second weight is the final post-stratified weight, which matched targets from the U.S. Census Bureau by age, gender, ethnicity, race, population density, and region, and National Health Interview Survey (NHIS) estimates for phone status.
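To illustrate the second step in its simplest form, the sketch below post-stratifies base weights to population shares on a single variable. It is a deliberate simplification: the study matched several variables at once following Kennedy (2008), and the function name, cell labels, and target shares here are hypothetical.

```python
from collections import defaultdict

def poststratify(cases, targets):
    """Scale base weights so weighted cell shares match target shares.

    cases:   list of (cell, base_weight) pairs
    targets: dict mapping each cell to its population share (summing to 1)
    """
    total = sum(weight for _, weight in cases)
    cell_totals = defaultdict(float)
    for cell, weight in cases:
        cell_totals[cell] += weight
    adjusted = []
    for cell, weight in cases:
        sample_share = cell_totals[cell] / total
        adjusted.append(weight * targets[cell] / sample_share)
    return adjusted

# Hypothetical sample that underrepresents young adults.
cases = [("18-34", 1.0), ("35+", 1.0), ("35+", 1.0), ("35+", 1.0)]
weights = poststratify(cases, {"18-34": 0.5, "35+": 0.5})
```

In this toy example the single young respondent receives a weight of 2.0 and each older respondent a weight of about 0.67, so the weighted sample matches the target split while the total weight is unchanged.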
Results with dual frame weighting
Table 4 shows the demographic distributions for the landline/cell phone samples by each of the selection methods, prior to post-stratification. There are very few significant differences between methods when combined with the cell phone sample. Gender was significantly different across groups (X2(10) = 18.099, p = .05). YMOF resulted in significantly more males than any other treatment group. In fact, combining this method with a cell phone sample only exaggerated the overrepresentation of males. When combined with the cell phone frame, youngest adult in the household still underrepresented males. The differences in household size also remained significant across treatment groups (X2(20) = 45.199, p = .001) when combined with the cell phone sample.
Although youngest adult in the household continued to perform best in terms of age, this was not statistically different from the other treatment groups (X2(50) = 31.00, p = .98). Treatment group comparisons of Hispanic (X2(30) = 10.826, p = .999); white (X2(30) = 3.889, p = 1.00); black (X2(30) = 3.379, p = 1.00); education level (X2(50) = 23.441, p = 1.00); and marital status (X2(50) = 31.443, p = .981) were also not significantly different when combined with the cell phone sample.
Results with poststratification weighting
Although it is useful to look at characteristics of the underlying unweighted sample, the demographic differences between treatment groups may be of little consequence once poststratification weighting is applied. In fact, once the sample is poststratified, significant age, gender and household size differences no longer exist.
SUBSTANTIVE RESULTS
This analysis also explored differences by selection method in two key attitudinal variables: approval of the president and party identification.
There were no statistically significant differences between groups on job approval or party identification in the unweighted landline-only samples. This finding is consistent with Gaziano’s meta-analysis (Gaziano 2005), which also concluded that the various selection methods resulted in very few differences on key substantive items.
Although many of the presidential approval and party identification estimates changed when the landline sample was combined with the cell phone sample, there were no significant differences between treatment groups, for either the sample weighted for selection probability or with the final poststratification weight.
Conclusions
There is no clear evidence, based on the findings of this study, that one within-household selection method is preferable for landline frames when conducting a dual-frame telephone survey. When taking into account response rates, accuracy of selection, demographics, and substantive differences, there is not one method that clearly outperforms all others. A decision of which method to use may come down to other considerations not directly explored by this study, such as the target population, the need to adopt a probability-based method, and budgetary and timeline constraints.
Many nonprobabilistic methods attempt to compensate for nonresponse bias by purposefully selecting respondents who tend to be underrepresented, but after combining the landline sample with the cell phone sample, we found few significant demographic or substantive differences between the selection method groups. Given that the nonprobability methods had no clear advantage in terms of representativeness of the demographics, researchers may find it preferable to use a probability or quasi-probability based method, which adhere to the underlying probability assumptions of most sample surveys and statistical analyses.
Considering each method individually, we did find possible opportunities for improvement. Our modified R-B-P method deviated from the procedures outlined by Rizzo et al. (Rizzo, Brick, and Park 2004), and the refusal rates and average household size for this group indicate the Kish grid may have been too intrusive for households with three or more adults. Results may have differed if we had used a quasi-probability method for households with three or more adults, instead of the Kish grid.
If a quasi-probability method is used, it may be advisable to use next or last birthday, rather than most recent birthday, which had the lowest accuracy rate. Respondents may not have understood the “most recent” language as easily, a finding that may have practical application to within-household selection in other modes.
Of the three nonprobability at-home methods, the YMOF method significantly overrepresented males. Cell phone samples tend to result in a higher proportion of males than females (Kennedy, McGeeney, and Keeter 2016), and this overrepresentation persisted when YMOF was combined with the cell phone sample. Based on the findings of this study, it seems prudent to avoid YMOF when conducting a dual-frame study.
This study is not without limitations. We assigned selection methods to sampled cases, and interviewers administered all treatment groups. The advantage of this design is that interviewer variance is distributed across methods, rather than within a method. The downside of this approach is that interviewers switched methods with each contact, although we found no evidence in quality reviews that interviewers had difficulty adapting to this change.
Our method for obtaining cellular cases was another limitation. Rather than collecting cellular cases for this specific study, we randomly selected cases from a parallel study that used identical sampling and call design methodology but with a slightly different survey instrument. Our results may have been different if the cellular respondents had completed the same survey instrument as the landline respondents.
Although response rates for this study are in line with comparable surveys conducted by Gallup and other commercial firms (Keeter et al. 2017), response rates were considerably lower than what might be expected from studies with more rigorous call designs. It is unknown how a more extensive call design would change the results of this experiment.
It is conceivable that in the near future the majority of telephone surveys, if not all, will be completed from the cellular frame, making within-household selection irrelevant for telephone. However, the results of this study highlight designs that result in improved accuracy and cooperation. These comparisons have possible applications to the design of face-to-face and mail modes, which will continue to rely on within-household selection.