I. Introduction
The Census of Agriculture (COA) is one of the most important data collections conducted by the National Agricultural Statistics Service (NASS). In part, this is due to the type and amount of information collected. The COA takes an estimated 50 minutes to complete, on average, and contains 36 sections with multiple questions per section, ranging from the type of agricultural activity conducted to complex financial calculations of asset values and expenses, among others (a public link to the COA form is in the Appendix). In an effort to reduce respondent burden in the COA, researchers at NASS conducted an experiment in which respondents’ previously reported data (PRD) were prefilled into the answer spaces in the COA Content Test’s web mode.
The reasons to use PRD are well established in the literature: using PRD can improve data quality, enhance data collection efficiency, and reduce objective measures of respondent burden, such as response times (Holmberg 2004; Jäckle 2006). Reducing respondent burden is a particularly desirable feature of PRD. In survey settings where respondents’ participation is frequently requested, a common grievance is that they have to answer many of the same questions from survey to survey even though their circumstances have not changed (Hoogendoorn 2004). Some of the questions in the COA are also asked on other NASS surveys. Therefore, providing PRD on the COA is an attempt to assuage respondent frustration in this regard and to reduce this aspect of burden.
Reporting for some items on the COA can also be complex (e.g., questions requiring respondents to perform mathematical calculations). Mathiowetz and McGonagle (2000) highlight that providing respondents with their previous answers can aid comprehension by anchoring the context to their previous response, which in turn helps the respondent form a judgment about the proper response. The response task is thus made simpler by supporting the respondent’s memory with the cognitively easier task of recognition rather than recall (Mathiowetz and McGonagle 2000). Given these factors, and given that respondents can simply verify that the PRD is correct or update it if necessary, respondent burden may be reduced (Holmberg 2004).
In studies of the effects of PRD, however, little attention appears to have been paid to measuring or evaluating respondent perceptions of its impact on their total response experience. Studies have made great strides in utilizing PRD to increase data quality (e.g., Jäckle and Eckman 2020) and to decrease objective measures of burden, such as response times (Holmberg 2004; Jäckle 2008), and several studies have obtained insightful qualitative data on respondent perceptions of PRD with regard to burden for individual survey questions (e.g., Stanley and Safer 1997; Ridolfo and Edgar 2015). Measuring respondent perceptions of PRD’s effect on the total response experience nonetheless remains an important area of exploration.
Bradburn (1978) laid much of the theoretical groundwork on respondent perceptions of burden in surveys and their importance when considering the response experience. Hedlin et al. (2005) developed questions designed to ascertain the perceived length of response times and the degree to which responding was easy or difficult, among other perceptions (such as the respondents’ view that the survey data requested were important for society as a whole). Although measures have been designed to assess respondent burden in surveys generally, none have been designed (to our knowledge) to specifically assess respondents’ attitudes toward PRD in surveys, particularly whether its use is viewed as resulting in a less burdensome survey response experience.
This paper presents the results of a novel set of questions designed to assess respondent attitudes toward the use and presence of PRD in their survey experience. The analysis aims to answer several research questions: (1) Do respondents perceive that PRD made it easier to complete the survey? (2) Do respondents perceive that they completed the survey faster when their PRD was present? (3) Do respondents have a positive reaction to their PRD being used in the survey? We also explore whether certain PRD metadata, such as how much PRD was used and how long ago it was reported, are associated with the attitudes addressed in these research questions.
II. Methods
The data used in this paper come from an experiment in NASS’s 2020 Census of Agriculture Content Test, in which respondents’ answers to previous NASS surveys were prefilled into the answer spaces in the Content Test web instrument. The population of interest for this experiment was all agricultural operations on NASS’s list frame that had one or more items of interest on the Content Test that could be prefilled with their answers to the same questions on previous NASS web surveys (i.e., their PRD) and whose PRD was reported within three years of the start of the Content Test. A stratified random sample of n=9,000 operations, with a serpentine sort, was drawn from a population of 267,111. The sample frame was stratified by agricultural region, the amount of PRD a respondent had available for use, and the PRD’s recency. The sorting variables were predicted response propensity, agricultural operation type, and operation size (measured by total annual value of sales). After the sample was selected, respondents were placed into one of three experimental groups using stratified random assignment. The experiment consisted of two factors, as seen in Table 1: A) presence of PRD in the respondent’s web mode, and B) receipt of the standard contact mailings or experimental contact mailings emphasizing the use of PRD. This design was created to assess the impact of PRD on response rates compared to a control group (Figure A1 in the Appendix shows an example of one of the PRD-emphasized survey invites).
A few things are worth noting: even though their survey invites differed, all respondents with PRD in their web mode were alerted to PRD’s use on the web survey’s introductory page (see Figure A2 in the Appendix). On the first page on which PRD appeared, respondents in A2B1 and A2B2 also saw a message at the top of the screen highlighting the PRD and instructing them to change the prefilled answer box if their current answer was different; an example appears in Figure 1 below. Figure A3 in the Appendix shows an example of what a respondent without PRD (and all those in group A1B1) would see, which was identical to the example in Figure 1 but without the message at the top of the screen and with blank answer boxes.
To explore whether certain PRD metadata are associated with attitudes reported in the research questions above, two variables were created. The first, “PRD Amount,” was a binary variable categorizing the number of survey items that a respondent could have prefilled, split between “More PRD” (eight or more items) and “Less PRD” (one to seven). These groups were selected because the median number of items that could be prefilled with respondents’ PRD on the sample frame was seven. The maximum number of items that could be prefilled with PRD was seventeen.
The second, “PRD Recency,” was defined as the number of days that had elapsed between the start of data collection for the Content Test and the date on which the PRD was reported by the respondent. This was a binary variable categorizing operations as having “Recent PRD” (reported within 364 days) or “Older PRD” (365 days or more). This delineation was chosen for sample size reasons: the large majority of the sample universe had older PRD, and we wanted to ensure the selected sample included a large number of operations with more recent PRD. Research on respondent recall might suggest that “Recent PRD” be defined on a shorter timeframe, given what is known about the role of memory in survey responding (see Tourangeau, Rips, and Rasinski 2000). However, we did not have sufficient sample sizes at smaller intervals to achieve adequate power (approximately 80%) to detect small differences in the PRD recency comparisons until the threshold was set to 364 days or less.
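To make the construction concrete, the SAS data step below is a minimal sketch of how the two binary variables could be derived. The input data set and variable names (sample_frame, n_prd_items, prd_report_date, cct_start_date) are hypothetical placeholders rather than actual Content Test file names; only the cut points follow the definitions above.

/* Minimal sketch of the PRD Amount and PRD Recency variables.
   Data set and variable names are hypothetical placeholders;
   the cut points follow the definitions described in the text. */
data prd_meta;
   set sample_frame;
   length prd_amount $8 prd_recency $10;

   /* PRD Amount: split at the frame median of seven prefillable items */
   if n_prd_items >= 8 then prd_amount = 'More PRD';
   else if 1 <= n_prd_items <= 7 then prd_amount = 'Less PRD';

   /* PRD Recency: days elapsed between the date the PRD was reported
      and the start of Content Test data collection (SAS date values) */
   days_elapsed = cct_start_date - prd_report_date;
   if days_elapsed <= 364 then prd_recency = 'Recent PRD';
   else prd_recency = 'Older PRD';
run;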
Nonetheless, the recency of the PRD may be an important component influencing respondent perceptions of burden, and evaluating this dimension is still worthwhile despite this constraint. The usefulness of PRD may depend on respondents’ ability to recognize the data and on the data’s stability over time. The older the PRD, the less likely respondents may be to recognize it, and the less likely the previous answer will reflect the current value at the time of the Content Test. If the PRD prefilled in the survey is unrecognizable or is an inaccurate reflection of the current value, then the perceived burden of the response task may increase rather than decrease (Ridolfo and Edgar 2015). The number of items prefilled with PRD may also affect respondent perceptions of PRD’s presence in the survey. The more items prefilled with PRD that respondents recognize and view as accurate, the more likely they may be to perceive that they completed the survey faster and more easily, and to perceive a less burdensome experience overall.
To gauge web respondents’ perceptions of PRD use, we asked them to rate their level of agreement with seven statements. These statements appeared on screen after respondents in experimental groups A2B1 and A2B2 had completed the final question in the Content Test. The lead-in to this set of statements read: “Before submitting your data, please provide your opinions about pre-filled information used in some answer cells in this survey. If you do not wish to provide your opinions, please scroll to the bottom and press ‘next.’” The design of the statements is shown in Table 2.
Evidence in the survey methodology literature suggests that attitudinal questions with item-specific scales may produce better data quality than those with agree-disagree rating scales, and that there are limits to what can be inferred from respondents’ answers (Dykema et al. 2022; Fowler 1995). For example, it is hard to interpret what a rating of ‘3’ meant to the respondent, and similarly whether a “Don’t know” response indicates that they have no attitude toward that statement or that they did not understand the question. Furthermore, answers to subjective questions are always relative to the way the question is worded (Fowler 1995), so different wording could result in different distributions of responses to the subjective dimensions we attempted to measure here. Although these limitations exist, we believe the results may still be valuable to the survey research field, particularly among those utilizing PRD as part of their data collection methods.
For this paper, the main research questions stated in the Introduction are answered by evaluating the responses to PRD agreement statements AS4, AS5, and AS7 in Table 2. (The results for the other agreement statements are presented in the Appendix but are not discussed further here.) The results are presented by the PRD Amount and PRD Recency metadata, and tests of independence are shown to highlight associations between these characteristics and respondent answers to the agreement statements. We use PROC SURVEYFREQ in SAS 9.4, which incorporates sample design and nonresponse information in the production of frequency estimates and their Taylor-series linearized standard errors. In addition, we use the Rao-Scott (1984) design-adjusted test of independence, since failure to account for the complex sample design features can lead to the incorrect conclusion that associations exist (Heeringa, West, and Berglund 2017).
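As an illustration of this approach, the call below is a minimal sketch of how a weighted crosstabulation with Taylor-series standard errors and a Rao-Scott design-adjusted chi-square test can be requested in PROC SURVEYFREQ. The data set (resp_web) and variables (stratum, adj_wgt, prd_amount, as4_rating) are hypothetical placeholders, not the actual production names.

/* Minimal sketch: weighted frequencies, Taylor-series standard errors,
   and the Rao-Scott design-adjusted chi-square test for one agreement
   statement. Data set and variable names are hypothetical. */
proc surveyfreq data=resp_web;
   strata stratum;        /* design strata                        */
   weight adj_wgt;        /* nonresponse-adjusted analysis weight */
   tables prd_amount*as4_rating / chisq col cl;
   /* CHISQ requests the Rao-Scott test; Taylor-series linearization
      is the default variance estimation method in SURVEYFREQ. */
run;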
III. Results
In total, 8,866 operations were mailed survey invites. (A total of 134 records were removed from the original sample of 9,000 for logistical reasons.) In addition, 36 records in the sample were labeled undeliverable as addressed (UAA). Therefore, the denominators for the response rates (after removals and UAAs) were 2,924 (Control/A1B1), 2,939 (A2B1), and 2,967 (A2B2). Using AAPOR RR1 (The American Association for Public Opinion Research 2016), the response rates were 70.11% (A1B1), 71.49% (A2B1), and 70.58% (A2B2). The overall response rates were not significantly different from each other at α = 0.05 using the Rao-Scott (1984) design-adjusted chi-square test.
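For reference, and stated in its general form rather than with this study’s specific eligibility coding, AAPOR RR1 counts only complete interviews in the numerator and includes cases of unknown eligibility in the denominator:

RR1 = I / [(I + P) + (R + NC + O) + (UH + UO)],

where I denotes complete interviews, P partial interviews, R refusals and break-offs, NC non-contacts, O other non-interviews, and UH and UO cases of unknown eligibility.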
Responses were collected in three modes: mail/paper questionnaire, web, and computer-assisted telephone interview (CATI). Each version of the invite encouraged participation via the web, but paper questionnaires that could be filled out and mailed back accompanied several of the mailings, and CATI was used in the last month of data collection for nonresponse follow-up. Most responses were completed via the web: 78.24% for the A1B1 group, 77.72% for A2B1, and 82.28% for A2B2. A chi-square test yielded a test statistic of 25.62 and a p-value < .0001, indicating that web response is significantly associated with treatment group. This suggests that highlighting the use of PRD in the mailed invites increased the web response rate but did not affect the overall response rate.
The remaining analysis is focused on the 3,356 respondents that completed the survey via the web mode from the two PRD groups (1,633 from A2B1 + 1,723 from A2B2), their perceptions of PRD being used in their online surveys, and whether their perceptions are significantly associated with the amount and recency of their PRD.
Table 3 shows the distribution of responses to each of the agreement statements (AS) of interest. Table 4 provides the Rao-Scott (1984) design-adjusted tests of independence between PRD perceptions and the PRD Amount and PRD Recency variables, with the null hypothesis being that there is no association between these categorical variables.
Perhaps the clearest point of comparison in Table 3 is the contrast between those indicating “Strongly Disagree” and “Strongly Agree” for each AS. For AS4, the percentage strongly disagreeing was 3.9% (SE of % = 0.7) and the percentage strongly agreeing was 50.2% (1.7). We see similar contrasts between those strongly disagreeing and those strongly agreeing for AS5 and AS7, with far more respondents strongly agreeing than strongly disagreeing. Few respondents selected ‘2’ (which may reasonably be assumed to be on the disagree end of the ordinal scale), while a larger portion selected ‘4’ (which may be assumed to be on the agree end of the scale). Nonignorable portions of respondents selected ‘3’ or “Don’t know” for each AS, and as stated previously, it is difficult to gain much insight from these responses. With regard to the experimental design, responses to the agreement statements were not significantly associated with PRD experiment group (A2B1 vs. A2B2). The distributions and resulting chi-squares for the two PRD groups are shown in Tables A2-A8 in the Appendix.
Our next approach was to assess whether the amount and recency of PRD in respondents’ surveys may impact their burden perceptions and attitudes toward PRD being prefilled. The Amount and Recency of PRD (ARPRD) is a categorical variable describing the metadata of respondents’ PRD as “More and Recent,” “More and Older,” “Less and Older,” or “Less and Recent.” To address whether ARPRD was associated with perceptions and attitudes toward PRD in the survey, we performed seven Rao-Scott chi-square tests (one for each AS) of the null hypothesis that the ASk rating is independent of ARPRD. Table 4 shows the outcomes of the Rao-Scott tests of independence for AS4, AS5, and AS7; the full frequency distributions of the responses for each AS can be found in Tables A9-A15 in the Appendix.
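As a companion to the Methods sketch, the SAS fragment below is a minimal, hypothetical illustration of how the four-level ARPRD variable and the seven per-statement Rao-Scott tests could be set up; the data set and variable names (resp_web, stratum, adj_wgt, as1_rating through as7_rating) continue the earlier placeholders and are not the actual production names.

/* Minimal sketch: build the four-level ARPRD variable from the earlier
   placeholder variables and run one Rao-Scott test per agreement
   statement. All data set and variable names are hypothetical. */
data resp_web2;
   set resp_web;
   length arprd $16;
   if prd_amount = 'More PRD' and prd_recency = 'Recent PRD' then arprd = 'More and Recent';
   else if prd_amount = 'More PRD'    then arprd = 'More and Older';
   else if prd_recency = 'Recent PRD' then arprd = 'Less and Recent';
   else arprd = 'Less and Older';
run;

%macro arprd_tests;
   %do k = 1 %to 7;
      proc surveyfreq data=resp_web2;
         strata stratum;
         weight adj_wgt;
         tables arprd*as&k._rating / chisq;  /* Rao-Scott test for each AS */
      run;
   %end;
%mend arprd_tests;
%arprd_tests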
For AS4 (“The pre-filled information made it easier for me to complete the survey”), the Rao-Scott test of independence yielded a chi-square value of 25.88 and a p-value of 0.0394. Therefore, the null hypothesis of no association between ARPRD and agreement with this AS is rejected, meaning the metadata of respondents’ PRD (amount and recency) is associated with respondents’ ratings of the statement that the prefilled information made it easier to complete the survey. For AS5 (“The pre-filled information helped me finish the survey faster”), the Rao-Scott chi-square was 19.65 with a p-value of 0.1859, so the null hypothesis of no association was not rejected. For AS7 (“Overall, I have a positive reaction to pre-filled information being used in the survey”), the Rao-Scott chi-square was 24.59 with a p-value of 0.3705, so the null hypothesis was likewise not rejected. For AS5 and AS7, there does not seem to be much evidence that the metadata of the PRD significantly impacts respondents’ perceptions of faster completion or their favorable regard for its use in the survey.
IV. Discussion
As stated in the Introduction, research on PRD has largely focused on objective impacts on data quality and respondent burden (such as the measurement of response times), leaving the viewpoint of the respondent less well understood. Furthermore, the PRD literature offers little information regarding PRD’s impact on response rates. In this experiment, the impact of PRD on response rates yielded some interesting results. While we did not find an overall difference in response rates, we did find significantly more web responses when respondents were told their PRD would be used in the web survey. This may be an important finding with practical implications for data collection, and it should be studied further. This paper, however, mainly sought to answer three questions regarding respondents’ perceptions of PRD: 1) Do they view PRD as making survey completion easier? 2) Do they view PRD as making survey completion faster? 3) Overall, do they have a positive reaction to its use?
Based on the results in Table 3, the answer to all three research questions most often appears to be “yes,” considering that the majority of respondents selected “4” or “Strongly Agree” for each AS. On two dimensions of respondent burden (level of effort for survey completion and survey completion time), these results suggest that PRD led to the perception of an easier and faster experience for most respondents. However, nonignorable portions of respondents did not select “4” or “Strongly Agree” for these statements. Due to the limitations of the question design mentioned previously, we cannot determine why these respondents did not agree with these statements or what their nonagreement means. It could mean that the PRD made no difference to them on these dimensions, or perhaps that it made survey completion more difficult or longer. Given this lack of insight, it would be beneficial to redesign these questions to be item-specific in nature and to ask follow-up questions that probe deeper into the ‘why’ of respondents’ attitude formation toward PRD’s use.
One place to start could be designing questions that relate to the metadata characteristics of respondents’ PRD, given that we found respondents’ perceptions of the ease of survey completion were associated with the recency and amount of their PRD. More research on this topic with other types of PRD metadata characteristics could be warranted. For example, the stability of the actual value of a survey item over time may affect respondent perceptions of burden when PRD is used. Some survey items change quickly (e.g., livestock inventory), while others change more slowly over time (e.g., number of acres owned). A measure indicating an item’s relative stability over time could therefore provide greater insight into the PRD recency element studied here and into how respondents form attitudes toward PRD’s use.
Other areas of future research could include analysis of this topic within the framework of the business response process (Edwards and Cantor 2004; Willimack and Nichols 2001) and how PRD metadata (e.g., amount and recency, among others) affect certain stages of that process (e.g., gathering relevant information) that shape the response experience. Future research could also investigate the impact of PRD on the business’s decision to respond (Bavdaz 2010), and how different ways of communicating to respondents that their PRD are present affect response. Although the experimental design in this study did not yield different overall response rates between groups, we did see a significant difference in the proportion of respondents completing the survey via the web when their contact mailings emphasized PRD’s use in the online version, which has implications for survey cost reduction and increased data collection efficiency. Additional research on the wording and mode of contact for communicating the presence of PRD and its benefit to the respondent is recommended, along with an evaluation of the impact on response rates.
Finally, we hope this research inspires future studies of respondent perceptions of PRD in surveys where this feature is adopted. Important work has highlighted PRD’s benefits for data quality, but fewer studies have sought to measure and evaluate respondents’ perceptions of their total experience when PRD is used. The survey research field can only benefit from more studies of respondent perceptions of this data collection method.