1. Introduction
Online surveys are increasingly completed on smartphones (Tourangeau et al. 2017). However, the smaller screens and touchscreen controls of mobile phones make responding less convenient and appear to have a negative effect on response quality compared to regular desktop PCs. For example, several researchers found that responses to open-ended questions are shorter, in terms of words and characters, in mobile surveys than in regular desktop PC surveys (Mavletova 2013; Revilla and Ochoa 2016). Furthermore, mobile phones are typically used for short messaging and can be used at any time and place. Antoun, Couper and Conrad (2017) showed that respondents multitask more frequently while completing a survey on a mobile phone than on a regular desktop PC. Sendelbah et al. (2016) showed that respondents who engaged in multitasking produced more item nonresponse than those who did not multitask. Despite differences in size and functionality, cognitive response processing seems to be similar for respondents on different devices (Peytchev and Hill 2010).
There are several ways to structure online surveys to create an optimal experience for any screen size. First, surveys can be designed specifically for small screens. Although most survey software uses a responsive, mobile-friendly design, most surveys are still not designed “mobile-first.” Second, surveys can be designed to be more conversational or gamified. For example, communication through applications (apps) such as WhatsApp and Snapchat closely resembles a natural turn-by-turn conversation between individuals. Service chatbots mimic that communication style as well; applied to data collection, this turns a survey into a WhatsApp-like survey. We conjecture that these design choices might improve the survey experience and lower respondent burden, and hence generate data of higher quality.
An innovative way to administer questions is via a research messenger: WhatsApp-like survey software that presents questions in the style of a chat conversation (see http://surveyfriendly.com/demos/chatbot-store/ for an example of a research messenger survey design). Since we increasingly live in a culture of texting, using a research messenger to administer questions might increase respondent motivation compared to a responsive design. In this study, we compare a research messenger layout to a responsive design in order to investigate data quality (measured by dropout, time of completion, use of the “back” button, number of nonsubstantive answers, primacy effects, and length of open-ended answers) and respondent satisfaction.
2. Methods
The experiment was carried out with panel members from Amazon Mechanical Turk in the United States. In this self-selected sample, we administered four survey batches between June and August 2018. Respondents could self-select into a particular device. For batches 3 and 4, we selected panel members who indicated that their main device for accessing the Internet is a mobile phone. We use all four batches in the analyses. The estimated time to complete the survey was 5 to 10 minutes. Overall, 1,728 respondents completed the survey.

Respondents were randomly assigned to the research messenger survey or to the responsive design (control condition). In addition, we randomly varied the type of questions (long answer scale, short answer scale, open-ended) to investigate whether the type of answer scale is related to the type of survey and to respondents’ evaluation of the survey. We used four blocks of questions on politics, news, sports, and health, presented in random order. The appendix shows screenshots of the survey in the research messenger and control conditions, taken on a smartphone and a regular desktop PC. Apart from the difference in layout between the research messenger and control condition, the two conditions also differed in how respondents navigated through the open questions: “next” buttons were used in the control condition, whereas respondents in the research messenger condition had to press “enter.” Autoforwarding was used for closed questions in both survey types. We investigate dropout, time of completion, use of the “back” button, number of nonsubstantive answers, primacy effects, length of open-ended answers, and respondent satisfaction with the survey.
3. Results
As can be seen in Table 1, most respondents completed the survey on a regular desktop PC (60.8% in the research messenger condition, 64.3% in the responsive design condition). Although panel members for two of the four batches (about 65% of our sample) had been selected because they indicated that their main device for accessing the Internet is a mobile phone, only 32.8% of respondents in the research messenger condition and 29.9% in the responsive design condition used a mobile phone. A small group of respondents completed the survey on a tablet (6.3% in the research messenger condition and 5.8% in the responsive design condition). For some respondents, the device used to access the survey could not be derived (0.3% in the research messenger condition and 1.4% in the responsive design condition). There was no significant difference in the type of device used to complete the survey between the research messenger design and the responsive design.
3.1 Dropout
Table 2 shows the dropout rate for both types of surveys. The dropout rate was 8.7% for the research messenger survey and 7.2% for the responsive design survey; the difference was not statistically significant.
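The paper does not report which test was used for this comparison; the following minimal sketch illustrates one way such a dropout comparison could be made, using a chi-square test of independence on a 2x2 completion table. The counts below are placeholders, not the study’s actual cell counts.

```python
# Hypothetical sketch: chi-square test of independence on a 2x2 table of
# completed vs. dropped-out respondents per condition. Counts are placeholders.
from scipy.stats import chi2_contingency

def compare_dropout(completed_rm, dropped_rm, completed_rd, dropped_rd):
    """Return the chi-square statistic and p-value for a 2x2 completion table."""
    table = [[completed_rm, dropped_rm],
             [completed_rd, dropped_rd]]
    chi2, p, dof, expected = chi2_contingency(table)
    return chi2, p

chi2, p = compare_dropout(820, 78, 830, 64)  # made-up counts for illustration
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")
```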
3.2 Time of completion
Table 3 shows the mean completion time for both survey types. The responsive design survey was on average 56 seconds shorter than the research messenger survey; this difference was significant (p = .005). There was no interaction effect between survey condition and device. Overall, the mean completion time was 760 seconds, or about 13 minutes.
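The paper does not specify how the condition-by-device interaction was tested; one way to do so is a two-way ANOVA, sketched below with assumed file and column names (completion_seconds, condition, device).

```python
# Illustrative sketch (not the authors' code): two-way ANOVA for completion time
# with a condition-by-device interaction term. File and column names are assumed.
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

df = pd.read_csv("survey_paradata.csv")  # hypothetical per-respondent file

model = smf.ols("completion_seconds ~ C(condition) * C(device)", data=df).fit()
print(anova_lm(model, typ=2))  # main effects and the interaction term
```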
3.3 Number of back actions
In both survey types, respondents could go back to change their answer to the previous question. The mean number of back actions was 1.54 in the research messenger design compared to 2.62 in the responsive design, as can be seen in Table 4. This difference was significant at an alpha level of .001. Since a ‘real’ WhatsApp conversation offers no function to go back, respondents in the research messenger condition may not have been aware that this function existed. A Tukey post-hoc test revealed three homogeneous subsets: respondents who completed the survey in the research messenger on a PC/tablet (M=1.49, n=583) and those who completed it in the research messenger on a mobile phone (M=1.64, n=285) had fewer back actions than respondents who completed the survey in the responsive design on a PC/tablet (M=2.48, n=592), who in turn had fewer back actions than respondents who completed the survey in the responsive design on a mobile phone (M=2.96, n=253).
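As an illustration of this post-hoc comparison, the sketch below (with assumed column names) runs a Tukey HSD test over the four condition-by-device groups; homogeneous subsets can then be read off from the pairwise results.

```python
# Illustrative sketch (column names assumed): Tukey HSD on back actions across
# the four condition-by-device groups; subsets follow from the pairwise results.
import pandas as pd
from statsmodels.stats.multicomp import pairwise_tukeyhsd

df = pd.read_csv("survey_paradata.csv")              # hypothetical per-respondent file
df["group"] = df["condition"] + "_" + df["device"]   # e.g. "messenger_mobile"

tukey = pairwise_tukeyhsd(endog=df["back_actions"], groups=df["group"])
print(tukey.summary())
```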
3.4 Nonsubstantive answers
For nonsubstantive answers, we looked at 10 items in the module on politics that contained the nonsubstantive answer options “don’t know” and “can’t choose.” Six of the 10 items were measured on a five-point Likert scale with a “don’t know” option; the other four were measured on a four-point scale with a “can’t choose” option. We investigated the number of respondents who gave at least one nonsubstantive answer across the 10 items. As can be seen in Table 5, there was no significant difference in the mean number of nonsubstantive answers between the two surveys. In the research messenger format, 9.1% of respondents selected at least one nonsubstantive answer in the list of 10, compared to 7.6% in the responsive design survey.
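A minimal sketch of how this indicator could be constructed, with assumed item and column names, is shown below.

```python
# Minimal sketch (item and column names assumed): flag respondents who gave at
# least one nonsubstantive answer across the 10 politics items.
import pandas as pd

df = pd.read_csv("responses.csv")                    # hypothetical response file
politics_items = [f"pol_{i}" for i in range(1, 11)]  # assumed item column names
nonsubstantive = {"don't know", "can't choose"}

df["any_nonsub"] = df[politics_items].isin(nonsubstantive).any(axis=1)
print(df.groupby("condition")["any_nonsub"].mean())  # share with >= 1 nonsubstantive answer
```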
3.5 Primacy effects
We define primacy effects as selecting (one of) the first answer categories in a list. We randomly selected three questions from the question modules. From the sports module, we used the question asking respondents which sports they watch on TV; choosing any of the first three answers was taken as an indication of a primacy effect. For the two other variables, only the first answer was taken as an indicator of a primacy effect (because of the shorter lists of answer options). As can be seen in Table 6, we did not find a significant difference between the two designs for any of the three questions, indicating no difference in primacy effects between the two surveys.
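The coding of these primacy indicators could look as follows. This is a sketch under the assumption that the sports question is check-all-that-apply (one 0/1 column per option) and the other two questions are single-choice with the first category coded 1; all column names are invented for illustration.

```python
# Sketch with assumed variable names and coding: primacy indicators per question.
import pandas as pd

df = pd.read_csv("responses.csv")  # hypothetical response file

# Long list (sports watched on TV): selecting any of the first three options.
df["primacy_sports"] = df[["sports_opt1", "sports_opt2", "sports_opt3"]].any(axis=1)

# Shorter lists: selecting the first answer category only.
df["primacy_news"] = df["news_item"] == 1
df["primacy_health"] = df["health_item"] == 1

print(df.groupby("condition")[["primacy_sports", "primacy_news", "primacy_health"]].mean())
```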
3.6 Length of open-ended answers
The survey included seven open-ended questions. We computed the total length of the open answers for each respondent as the sum of the number of characters across these seven questions. Table 7 shows that the average total open-answer length in the research messenger survey was 117 characters shorter than in the responsive design survey.
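As a sketch of this computation (with assumed column names), the sum score can be built by adding the character counts of the seven open answers per respondent.

```python
# Minimal sketch (column names assumed): total open-answer length in characters.
import pandas as pd

df = pd.read_csv("responses.csv")                # hypothetical response file
open_items = [f"open_{i}" for i in range(1, 8)]  # assumed columns for the 7 open questions

df["open_length_total"] = (
    df[open_items].fillna("").apply(lambda col: col.str.len()).sum(axis=1)
)
print(df.groupby("condition")["open_length_total"].mean())
```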
3.7 Evaluation of the survey
At the end of the survey, we asked respondents on a five-point Likert scale whether they thought the survey was difficult (M=1.75, std. dev.=1.10), enjoyable (M=3.88, std. dev.=.99), and interesting (M=4.04, std. dev.=.97). We used a linear regression analysis to predict the evaluation scores from the type of survey, the device used, the type of answer scale, gender, age, education, and a dummy for privacy (how concerned people feel about their own personal privacy on the Internet). We also tested an ordered logistic regression model to account for the ordinal measurement level of the scale; the results were similar, so we present the linear regression results, whose output is easier to interpret. Table 8 shows that there is no significant effect of the type of survey.
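A minimal sketch of this analysis, with invented variable names (enjoyable, condition, device, scale_type, and so on), could look as follows; the OrderedModel call illustrates the ordered logistic robustness check.

```python
# Illustrative sketch (variable names assumed): OLS on an evaluation item plus an
# ordered logit robustness check, mirroring the analysis described above.
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.miscmodels.ordinal_model import OrderedModel

df = pd.read_csv("responses.csv")  # hypothetical analysis file

predictors = ["condition", "device", "scale_type", "gender", "age_group",
              "education", "privacy_concern"]
formula = "enjoyable ~ " + " + ".join(f"C({p})" for p in predictors)
print(smf.ols(formula, data=df).fit().summary())

# Ordered logistic regression on the same (dummy-coded) predictors.
exog = pd.get_dummies(df[predictors], drop_first=True, dtype=float)
ordered = OrderedModel(df["enjoyable"], exog, distr="logit").fit(method="bfgs", disp=False)
print(ordered.summary())
```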
Whether respondents received the research messenger survey or the responsive design survey did not affect their evaluation of the survey. Respondents who used a mobile phone enjoyed the survey more than their regular desktop PC counterparts. Whether questions had an open or closed format had no effect on the evaluation questions. Female respondents rated the survey as less difficult than men did. Young respondents (30 and younger) rated the survey as less enjoyable and less interesting, and college graduates also found the survey less enjoyable. Respondents’ level of privacy concern had no effect on their evaluation of the survey.
The research messenger had no significant effect on the evaluation of the survey; however, manual coding of the open evaluation question showed 50 positive comments on the user interface of the survey and 7 negative comments. All these comments came from respondents in the research messenger condition; no comments were given about the user interface in the responsive design condition.
4. Conclusion
In this paper, we compared a research messenger design, which mimics communication in a messenger app, with a responsive design survey. We investigated whether the responses were similar in data quality and whether respondents were more positive in the evaluation questions. Our results show no differences with respect to primacy effects, the number of nonsubstantive answers such as “don’t know” and “can’t choose,” or the dropout rate. Answers to open-ended questions were shorter in the research messenger survey than in the responsive design survey, and completion time was about a minute longer in the research messenger survey. The evaluation questions at the end of the survey showed no significant differences between the research messenger survey and the responsive design survey, although the open comments at the end of the survey included many positive remarks about the style of the research messenger survey.

We conclude that a messenger-type survey yields results similar to a responsive design survey, but it takes longer to complete and respondents provide less text in open-ended questions. Future research should investigate which types of surveys can profit from a messenger format; for example, surveys in which a more conversational style of communication with respondents is warranted, such as customer-relation surveys. In addition, future research should focus on finding the right balance between the amount of interaction and data quality in research messenger surveys, since more interaction results in longer completion times. Research should also shed more light on how research messenger surveys affect respondent burden, positively (higher motivation) or negatively (longer completion time).