Introduction and Research Questions
Cognitive interviewing is a qualitative method that aims to reveal information from respondents about the cognitive processes they use when answering survey questions and to identify problems with questions (Willis 2005). Conventionally, cognitive interviewing involves conducting face-to-face (f2f) interviews with small sample sizes of five to 30 respondents (Willis 2005). The semi-structured, in-depth interviews are conducted by specially trained cognitive interviewers on the basis of an interview protocol which contains the questions to be tested in the cognitive interview and the techniques to be adopted, in particular think-aloud and follow-up questions (probing). The technique of probing is used to elicit information about how respondents interpret questions or define specific terms and how respondents arrive at their answers. In addition to the scripted probing questions included in the interview protocol, emergent probes can be asked to follow up on respondents’ comments during the interview. Probing questions are administered either immediately after the subject has answered the survey question (concurrent) or at the end of the cognitive interview (retrospective; Willis 2005).
An alternative to conducting f2f cognitive interviews in the lab is to transfer the probing procedure into an online questionnaire, a method called online or web probing. Here, open and closed probing questions are developed for the questions to be tested and then implemented in an online questionnaire. In the concurrent probing format, respondents first answer a survey question and, after clicking the “Next” button, receive one or more probes on the next survey page. As web probing does not involve a cognitive interviewer, respondents have to answer the probing questions in a self-administered form. The method of web probing has recently been recognized as a promising tool for evaluating survey questions, both during the post-survey assessment of item validity (Behr et al. 2012, 2013) and as a pretesting method to collect data about response strategies (Edgar 2012).
In comparison to cognitive interviewing, web probing has several benefits. First, it allows researchers to recruit respondents more quickly and cost-effectively and thereby to realize larger sample sizes. This, in turn, allows researchers to quantify their pretest findings (Behr et al. 2012). Second, recruiting participants via the Internet widens the geographic reach of recruitment. Third, the self-administered mode rules out any interviewer effects and thus increases the reliability and comparability of the results (Conrad and Blair 2009). However, due to the absence of the interviewer, no one can probe for more information, follow up on incomplete answers, or provide clarification of the tasks. Probing is restricted to the scripted questions previously programmed and implemented in the Web survey. Moreover, no one can motivate the respondents during completion of the Web survey to answer the (open) probing questions thoughtfully and elaborately. This can result in more satisficing response behavior (Krosnick 1991) among respondents, who then do not provide the same depth of information as participants in a f2f cognitive interview (Meitinger and Behr 2016). Nevertheless, Behr and colleagues have shown that web respondents give meaningful answers to open-ended probing questions (Behr et al. 2012). Meitinger and Behr (2016) found an extensive overlap between the results of both methods with respect to identified error types and uncovered themes, although cognitive interview respondents provided, on average, more indications of errors than web probing respondents.
In the present study, we replicate the earlier research of Meitinger and Behr (2016) by examining whether web probing produces similar results to f2f cognitive interviewing with regard to the problems detected. In addition, we extend Meitinger and Behr’s (2016) research by examining whether both methods produce similar results concerning the item revisions suggested.
Methods
To examine these research questions, we embedded four items from the International Social Survey Programme (ISSP) 2013 and 2014 into a larger online questionnaire fielded in May 2014. The questionnaire included several methodological studies (of which only the present study applied web probing) and respondents required approximately 25 minutes to complete it. The four items examined in the present study had been tested previously via f2f cognitive interviewing in the GESIS pretest lab in August/September 2013 so that results on the performance of these four items were already available.[1]
The Web survey respondents were drawn from a respondent pool that was assembled during the set-up of the GESIS Online Panel Pilot; this pool is not representative of the German population. Of the 897 respondents who were invited, 534 participated in the survey and 508 completed it, resulting in a response rate of 56.6 percent (American Association for Public Opinion Research RR1). The 20 respondents participating in the f2f cognitive interviews were recruited from a respondent pool maintained by the GESIS pretest lab using quotas for age, education, and gender. The participants received a compensation of 5€ for completing the 25-minute web questionnaire and a compensation of 30€ for taking part in the 60-minute f2f cognitive interviews, respectively. Table 1 shows some demographic characteristics of both respondent groups. While the composition of both respondent groups was quite similar with regard to sex and age, they differed somewhat with regard to educational attainment: on average, participants in the web survey had a higher level of education than participants in the f2f cognitive interview.
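The response rate follows directly from the counts of invited, participating, and completing respondents given above. The following minimal Python sketch recomputes it under the simplifying assumption that all 897 invited panel members count as cases in the denominator, so that RR1 reduces to completed questionnaires divided by invitations. The function name `rate_percent` is illustrative and does not belong to any AAPOR tool.

```python
def rate_percent(numerator: int, invited: int) -> float:
    """Return numerator / invited as a percentage, rounded to one decimal."""
    if invited <= 0:
        raise ValueError("invited must be a positive count")
    return round(100.0 * numerator / invited, 1)


# Counts reported in the text.
invited, participated, completed = 897, 534, 508

# Assumption: RR1 is approximated as completes over all invited cases.
rr1 = rate_percent(completed, invited)

# Share of invited respondents who at least started the questionnaire.
participation = rate_percent(participated, invited)

print(rr1)            # 56.6
print(participation)  # 59.5
```

The two figures differ because 26 respondents started the questionnaire without completing it; only completed questionnaires enter the numerator of RR1.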
The four items to be tested were taken from the modules National Identity and Citizenship of the German questionnaires of the ISSP 2013 and 2014. (See Table 2 in the Results section for the English wording of these items and the Appendix for the original German version.) The items were evaluated by the same probing techniques in both methods, that is, by comprehension probes (“What does the term X mean to you?”), elaborative probes (“Could you please explain your answer a little further?”), and specific probes (“What kinds of elections did you think of when answering this question?”). However, in the f2f setting, the interviewers were also encouraged to apply additional probing questions if they deemed it necessary, and respondents often commented spontaneously on the items prior to the administration of any probe. Hence, the verbal data obtained by the f2f interviews are based on more information sources than the data obtained by the Web survey. In both groups, the probing questions were administered immediately after respondents answered the target questions (concurrent probing).
Before analyzing the respondents’ answers to the probing questions, the f2f interview data were transcribed from the video recordings of the interviews. Afterward, the data in both groups were analyzed by two researchers, working independently and each one reviewing both data sets, as follows: first, they openly coded respondents’ answers to the probes with regard to the kinds of information they provided. Second, they organized these codes into larger categories and specified the core themes and types of problems that emerged from the analysis. Finally, they developed draft revisions for the questions. The researchers then met to discuss the findings, to resolve minor discrepancies in the codings, and to make a final decision about the recommendations for revision.
Results
The results of our analyses are displayed in Table 2. All in all, the f2f cognitive interviews and the web probing method identified very similar question problems and led to identical suggestions for revising the items. Differences in the types of problems detected were only found for item 2 and item 4. In item 2, one respondent in the f2f setting said that he would rather answer whether or not he is in favor of the issue in question (i.e., whether long-term residents of a country, who are not citizens, have the right to vote in national elections) than rate how important he finds the issue. This problem was not found in the web probing data. In item 4, f2f cognitive interviewing revealed that some respondents misinterpreted the term “citizen of the world” as referring to people living in a multicultural society (e.g., ID 12: “Nowadays, people from all over the world are living here, and we have got so used to it that you could indeed say one feels rather connected to the whole world.”). Again, this interpretation was not found in the web probing data. By contrast, web probing revealed that the term “citizen of the world” was unfamiliar to some respondents who, as a consequence, were not able to answer the question meaningfully (e.g., ID 178: “What is a ‘citizen of the world’ supposed to be?”). Despite these minor differences, both methods resulted in the same recommendations for revising item 2 and item 4, namely replacing the term “national elections” with “nationwide elections” (item 2) and deleting the unclear term “citizen of the world” (item 4).
With regard to the prevalence of the problems detected, we found some substantial differences between the two methods. For example, while 30 percent of the f2f cognitive interview respondents said that the term “civil disobedience” in item 1 was unfamiliar to them, only 5 percent of the web respondents did so. This might be due to the fact that participants in the f2f setting often spontaneously commented on an item before answering one of the probing questions. Hence, some of these participants first said that they were unsure about the meaning of the term and afterward (in response to the probing question) explained what they thought the term most likely referred to. In the web probing setting, respondents had no means to comment on an item spontaneously and were thus more focused on answering the probing questions. Again, however, the differences in the prevalence of problems had no effect on the suggested item revisions. Irrespective of its prevalence, each problem was judged in the same way in both methods, either as warranting an item revision or as not warranting one.
Finally, we examined whether the problems detected had any effects on measurement quality, in particular, whether respondents misinterpreting an item or having any other difficulty answering it systematically erred in one direction when responding to the item. This response behavior was found in three of the four items (I1, I2, I4). In item 1, respondents who associated the term “civil disobedience” with violent behavior were more likely to rate the item as not important than respondents who (correctly) interpreted the term as referring to nonviolent behavior. In item 2, respondents who were primarily thinking of local elections when answering the item rated long-term residents’ right to vote as more important than respondents who thought of national elections. And finally, in item 4, respondents misinterpreting the term “citizen of the world” were more likely to agree that they “feel like a citizen of the world” than to disagree with this statement. Hence, the proportion of respondents who really hold cosmopolitan views might be overestimated when using this item. In sum, the problems detected by both methods were indeed severe enough to warrant revisions.
As a by-product of the analyses presented above, we additionally found some substantial differences between both methods regarding item nonresponse and the proportion of meaningful and interpretable answers respondents provided to the probes. While nearly all f2f respondents provided interpretable answers to the probing questions asked, many web respondents did not answer the probing questions meaningfully, but simply skipped these questions, provided unintelligible or very short answers or copied definitions from the Web. On average, this behavior occurred in 14 percent of the cases.
Discussion
In this study, we examined whether traditional f2f cognitive interviewing and web probing yield similar results in pretesting survey questions. Our findings indicate that both methods detect very similar problems and lead to the same suggestions for item revisions. Hence, web probing appears to be a promising method for pretesting questionnaires, and our findings suggest that it may be used as an alternative to standard cognitive interviewing.
On the positive side, web probing additionally allows researchers to quantify their pretest findings and to estimate the measurement error associated with the problems detected if large sample sizes are used. In addition, almost no staff resources are needed for recruiting participants and conducting interviews, and incentives are generally lower in online surveys than in f2f interviews. On the negative side, we found that a considerable number of web respondents did not provide meaningful answers to the probing questions. It therefore seems important that practitioners recruit larger samples than would otherwise be necessary when conducting a web probing pretest, so as to obtain a sufficient number of interpretable responses.
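The recommendation to recruit larger samples can be turned into a simple calculation. The following Python sketch inflates a target number of interpretable probe answers by the share of answers expected to be unusable. It assumes that the 14 percent average reported in the Results section carries over to a new pretest, which need not hold for other items or samples. The target of 100 interpretable answers and the function name `completes_needed` are hypothetical and serve only as an illustration.

```python
import math


def completes_needed(target_usable: int, unusable_share: float) -> int:
    """Completed questionnaires needed to expect `target_usable` usable answers.

    `unusable_share` is the expected proportion of skipped, unintelligible,
    very short, or copied probe answers (0 <= share < 1).
    """
    if target_usable < 0:
        raise ValueError("target_usable must not be negative")
    if not 0.0 <= unusable_share < 1.0:
        raise ValueError("unusable_share must lie in [0, 1)")
    raw = target_usable / (1.0 - unusable_share)
    # Rounding before ceil() guards against floating-point noise
    # such as 100.00000000000001 being pushed up to 101.
    return math.ceil(round(raw, 9))


# Hypothetical target: 100 interpretable answers per probe,
# with 14 percent of answers expected to be unusable.
print(completes_needed(100, 0.14))  # 117
```

The result refers to completed questionnaires. The number of invitations would have to be inflated further by the expected response rate.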
There are several limitations to this study that call for future research. First, it is important to note that we applied only one of several existing cognitive interviewing techniques (i.e., verbal probing) in both pretesting methods in this study. Thus, our findings are restricted to this particular technique and do not generalize to other techniques commonly used in f2f cognitive interviews, such as thinking aloud. Given that it is technically possible to do an audio and screen recording of the web respondents’ answering process, future studies should look into whether web respondents can be motivated to perform think-aloud tasks while answering the online questionnaire, and if so, whether the web and f2f settings again yield similar pretesting results. Second, it seems worthwhile to examine whether additional behavioral data, such as keystrokes, response times, and mouse movements, which can be collected easily in Web surveys, could provide further insights into response difficulties. Finally, our study focused exclusively on attitudinal questions and did not examine the performance of both methods in testing factual and behavioral questions. Hence, future research should ideally include a broader set of question types.
Given that the use of web probing as a pretesting method is still in its infancy, there are several other issues worth addressing in future studies. For example, future research should investigate the potential merit of implementing nonresponse probes into the online questionnaires, that is, motivating probes (e.g., “Please answer this question. It is of great importance to this study.”) automatically triggered by undesired respondent behavior (e.g., providing very short or no answers to probing questions). Moreover, it should be examined whether web respondents can be motivated to answer as many probing questions as f2f cognitive interview respondents, that is, to fill in a questionnaire for up to 60 minutes. And finally, future research should study the minimum sample size necessary to ensure a sufficiently high likelihood that a problem is being detected in a web probing pretest.
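The question of a minimum sample size can be illustrated with a simple binomial argument. This is a sketch under stated assumptions and not a result of the present study. If each respondent who gives an interpretable answer reveals a given problem independently with probability p, the probability that the problem surfaces at least once among n respondents is 1 - (1 - p)^n. The Python code below solves this expression for the smallest n that reaches a chosen detection probability. The 5 and 30 percent prevalences are the figures reported for “civil disobedience” in the Results section. The 95 percent target and the function names are hypothetical.

```python
import math


def detection_probability(prevalence: float, n: int) -> float:
    """Probability that a problem is revealed at least once among n respondents."""
    return 1.0 - (1.0 - prevalence) ** n


def min_sample_size(prevalence: float, target: float) -> int:
    """Smallest n for which the detection probability reaches `target`.

    Assumes independent respondents who each reveal the problem
    with the same probability `prevalence`.
    """
    if not (0.0 < prevalence < 1.0 and 0.0 < target < 1.0):
        raise ValueError("prevalence and target must lie strictly between 0 and 1")
    n = math.ceil(math.log(1.0 - target) / math.log(1.0 - prevalence))
    # Guard against floating-point edge cases in the closed-form solution.
    while detection_probability(prevalence, n) < target:
        n += 1
    return n


# Prevalence as observed among web respondents (5 percent).
print(min_sample_size(0.05, 0.95))  # 59

# Prevalence as observed among f2f respondents (30 percent).
print(min_sample_size(0.30, 0.95))  # 9
```

Under these assumptions, a problem that surfaces in only 5 percent of web answers requires roughly 59 interpretable answers to be detected with 95 percent probability. The independence assumption and the equal-probability assumption are idealizations, so the numbers should be read as rough orientation only.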
Acknowledgment
The authors wish to thank Hannah Soiné for her support in conducting this study.
Appendix
Original German version and English translations of items and suggested revisions
I1: Wie wichtig ist es für Sie, dass Bürger die Möglichkeit des zivilen Ungehorsams haben, um ihre deutliche Ablehnung gegenüber Regierungsentscheidungen zum Ausdruck zu bringen?
[How important is it that citizens may engage in acts of civil disobedience when they strictly oppose government actions?]
1 – Überhaupt nicht wichtig, 2, 3, 4, 5, 6, 7 – Sehr wichtig, Kann ich nicht sagen.
[1 – Not at all important, 2, 3, 4, 5, 6, 7 – Very important, Don’t know.]
Revision I1: Wie wichtig ist es für Sie, dass Bürger die Möglichkeit des gewaltlosen Protests haben, um ihre deutliche Ablehnung gegenüber Regierungsentscheidungen zum Ausdruck zu bringen?
[How important is it that citizens may engage in acts of nonviolent protest when they strictly oppose government actions?]
I2: Wie wichtig ist es für Sie, dass Menschen, die schon lange in einem Land leben, aber dort nicht eingebürgert sind, das Recht haben, bei nationalen Wahlen abzustimmen?
[How important is it that long-term residents of a country, who are not citizens, have the right to vote in that country’s national elections?]
1 – Überhaupt nicht wichtig, 2, 3, 4, 5, 6, 7 – Sehr wichtig, Kann ich nicht sagen.
[1 – Not at all important, 2, 3, 4, 5, 6, 7 – Very important, Don’t know.]
Revision I2: Wie wichtig ist es für Sie, dass Menschen, die schon lange in einem Land leben, aber dort nicht eingebürgert sind, das Recht haben, bei landesweiten Wahlen abzustimmen?
[How important is it that long-term residents of a country, who are not citizens, have the right to vote in that country’s nationwide elections?]
I3: Inwieweit stimmen Sie den folgenden Aussagen zu oder nicht zu? Die Welt wäre besser, wenn die Deutschen zugeben würden, dass in Deutschland nicht alles zum Besten steht.
[How much do you agree or disagree with the following statement: The world would be a better place if Germans acknowledged Germany’s shortcomings.]
Stimme voll und ganz zu, Stimme zu, Weder noch, Stimme nicht zu, Stimme überhaupt nicht zu, Kann ich nicht sagen.
[Agree strongly, Agree, Neither agree nor disagree, Disagree, Disagree strongly, Don’t know.]
Revision I3: Inwieweit stimmen Sie den folgenden Aussagen zu oder nicht zu? Die Welt wäre eine bessere, wenn Deutschland gegenüber anderen Ländern einräumen würde, dass hierzulande auch nicht alles zum Besten steht.
[How much do you agree or disagree with the following statement: The world would be a better one if Germany admitted to other countries that over here there are shortcomings too.]
I4: Inwieweit stimmen Sie den folgenden Aussagen zu oder nicht zu? Ich fühle mich eher als Weltbürger und somit verbunden mit der Welt insgesamt und weniger als Bürger eines bestimmten Landes.
[How much do you agree or disagree with the following statement: I feel more like a citizen of the world, and thus connected to the world as a whole, and less as a citizen of a particular country.]
Stimme voll und ganz zu, Stimme zu, Weder noch, Stimme nicht zu, Stimme überhaupt nicht zu, Kann ich nicht sagen.
[Agree strongly, Agree, Neither agree nor disagree, Disagree, Disagree strongly, Don’t know.]
Revision I4: Inwieweit stimmen Sie den folgenden Aussagen zu oder nicht zu? Ich fühle mich eher mit der Welt insgesamt verbunden als mit einem bestimmten Land.
[How much do you agree or disagree with the following statement: I feel more connected to the world as a whole than to a particular country.]
[1] The items in the online questionnaire were asked as part of an experiment that varied the number of probing questions asked (ranging from 4 to 7 probing questions), the number of nonresponse probes asked (also ranging from 4 to 7), and the number of probing questions presented per page (ranging from 1 to 2 questions per page). The results of this experiment will be presented elsewhere. In this paper, we restrict ourselves to the qualitative analysis of the respondents’ answers to the probing questions and the comparison of these results to the findings of the f2f cognitive interviews. Initial analyses comparing the results from the three experimental groups revealed no differences relevant to our research questions, so we combined data from the three sources in the analyses.