Background
Historically, survey data collection in sub-Saharan Africa (SSA) has taken place face-to-face through household- and location-based sampling. However, cell phone ownership in SSA has grown rapidly: 32% of the population were mobile phone subscribers in 2012, increasing to 43% in 2022 (GSMA 2019, 2023). Cell phone surveys grew alongside cell phone ownership (Gibson et al. 2017), and the first published account of cell phone surveys in SSA appeared in 2011 (Dillon 2011). In contrast, phone surveys in the United States began in the 1970s with landline phones (Lavrakas et al. 2017).
Phone surveys in SSA began by enrolling face-to-face survey participants who consented to be contacted by phone (Croke et al. 2012; Demombynes, Gubbins, and Romeo 2013; Hoogeveen et al. 2014; Etang-Ndip, Hoogeveen, and Lendorfer 2015; Dabalen et al. 2016). The first documented use of random digit dialing in SSA was in 2013, using computer-assisted telephone interviews (Larmarange et al. 2016), followed by interactive voice response in 2015 (Leo et al. 2015). Methodological studies have mainly focused on errors of representation and found that in SSA, women, non-English speakers, and young and rural residents were systematically excluded from phone surveys (Lau et al. 2019; Pariyo et al. 2019; Greenleaf et al. 2020; Brubaker, Kilic, and Wollburg 2021), likely because of lower phone ownership in these groups (Okano et al. 2022). Errors of representation appear to be the greatest drawback of phone surveys: many have struggled to produce population-representative estimates despite sampling and weighting efforts (Lau et al. 2019; Greenleaf et al. 2020, 2023). Despite these limitations, the use of cell phone surveys in SSA accelerated, in part because the COVID-19 pandemic restricted in-person data collection. Cell phones have been used to collect survey, surveillance, and monitoring data, among other uses (Greenleaf et al. 2017; Himelein et al. 2020; Arita et al. 2023).
To date, most data collected through phone surveys in SSA have been cross-sectional, but cell phone surveys offer a low-cost way to collect data repeatedly from the same person. Longitudinal data capture the timing of exposure and outcome, which supports causal analysis. Repeated measures can also reveal patterns of behavior, variations in exposure, and key time points or events for intervention. However, published examples of high-frequency remote data collection in SSA have mostly been monthly and have lasted no longer than a year (Croke et al. 2012; Ballivian et al. 2013; Trucano 2014; Gourlay et al. 2021; El Ayadi et al. 2020).
This paper presents patterns of participation in a surveillance system in Lesotho that called participants, enrolled during a face-to-face survey, weekly for two years. Each week, interviewers asked respondents about influenza-like illness symptoms (as a proxy for COVID-19 cases) among themselves and any household members they had seen in the past week. Direct reporting of an outcome via technology (mainly phones) by a population at risk, outside of the health system, is called “participatory surveillance” (Smolinski et al. 2017). The Lesotho Cell Phone Population-based HIV Impact Assessment (LeCellPHIA) survey is a participatory surveillance system. We present enrollment and attrition rates and patterns of participation from July 2020 to June 2022.
Methods
The LeCellPHIA sample was drawn from the Lesotho Population-based HIV Impact Assessment survey (LePHIA2020), a nationally representative, two-stage, cross-sectional household survey of adults aged 15 years and older that focused on HIV-related outcomes and was conducted from December 2019 through March 2020 (Sachathep et al. 2021). The sampling frame contained over 5,600 primary sampling units (PSUs) and 540,000 households. Using a probability-proportional-to-size method, 342 PSUs were selected across the 10 strata (districts), and about 21 households were randomly selected within each PSU (Lesotho Ministry of Health 2022).
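The two-stage design can be illustrated with systematic probability-proportional-to-size (PPS) selection of PSUs within one stratum, followed by simple random sampling of households within each selected PSU. This is a minimal sketch under stated assumptions: the PSU sizes are hypothetical household counts, systematic PPS is one common way to implement PPS selection (the text does not say which variant LePHIA2020 used), and the function names are illustrative. In practice, the frame is usually sorted geographically before systematic selection so that the interval also spreads the sample spatially.

```python
import random
from itertools import accumulate


def pps_systematic(sizes, n, seed=None):
    """Select n PSU indices with probability proportional to size.

    sizes: measure of size for each PSU (hypothetical household counts).
    A PSU larger than the sampling interval can be selected more than once.
    """
    rng = random.Random(seed)
    total = sum(sizes)
    interval = total / n
    start = rng.random() * interval  # random start in [0, interval)
    cumulative = list(accumulate(sizes))
    selected, i = [], 0
    for k in range(n):
        point = start + k * interval
        # advance to the first PSU whose cumulative size exceeds the point
        while i < len(cumulative) - 1 and cumulative[i] <= point:
            i += 1
        selected.append(i)
    return selected


def sample_households(n_households, n_select, seed=None):
    """Second stage: simple random sample of household indices within one PSU."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_households), n_select))


# Hypothetical stratum with six PSUs (household counts); select two PSUs,
# then about 21 households in each, as in the design described above
psu_sizes = [120, 80, 200, 150, 60, 90]
chosen = pps_systematic(psu_sizes, n=2, seed=7)
households = {psu: sample_households(psu_sizes[psu], 21, seed=psu) for psu in chosen}
```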
Of the 15,267 LePHIA2020 participants aged 18 or older[1], 735 did not consent to follow-up; among those who consented, 2,446 did not provide a phone number and 111 provided an invalid one. Thus, 11,975 participants (78%) were eligible to be sampled for LeCellPHIA.
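The exclusion counts above can be checked with a few lines of arithmetic. This is an illustrative check of the reported figures only; the function name is hypothetical.

```python
def followup_eligibility(n_adults, no_consent, no_phone, invalid_phone):
    """Return (eligible count, eligible share) after applying the exclusions in order."""
    consented = n_adults - no_consent
    eligible = consented - no_phone - invalid_phone
    return eligible, eligible / n_adults


eligible, share = followup_eligibility(15_267, 735, 2_446, 111)
print(eligible, f"{share:.0%}")  # 11975 78%
```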
Anticipating a 60% response rate, based on previous cell phone surveys in SSA and a small pilot with non-sampled LePHIA2020 phone numbers (Greenleaf et al. 2021), we randomly selected nine households in each PSU from among households with eligible participants, to obtain a target sample of five households per PSU. From each sampled household, one eligible LePHIA2020 participant aged 18 or older was sampled. Because older adults (aged 60 or older) face a higher risk from COVID-19 and have lower phone ownership, households with older adults were oversampled at a 2:1 ratio relative to households without older adults. Further details about sampling, procedures, and survey tools are available in the research protocol (Greenleaf et al. 2021). To increase the sample size, we asked each enrolled participant to give a weekly proxy report on the symptoms of the household members listed in LePHIA2020.
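The per-PSU draw follows from dividing the target by the expected response rate: 5 / 0.60 ≈ 8.3, rounded up to nine households. The sketch below shows that calculation plus one possible way to split the draws between households with and without older adults. The split is an assumption: it reads "2:1" as a ratio of household counts and fills any shortfall in one group from the other, which the text does not specify; all function names are hypothetical.

```python
import math
import random


def households_to_select(target_per_psu, expected_rr):
    """Households to draw so that about target_per_psu enroll at the expected rate."""
    return math.ceil(target_per_psu / expected_rr)


def allocate(n, n_older_avail, n_other_avail, ratio=(2, 1)):
    """Split n draws between households with and without adults aged 60+.

    Assumption: the ratio applies to counts, and a shortfall in one group
    is filled from the other group where possible.
    """
    want_older = round(n * ratio[0] / sum(ratio))
    n_older = min(want_older, n_older_avail)
    n_other = min(n - n_older, n_other_avail)
    n_older = min(n - n_other, n_older_avail)  # top up if the "other" group ran short
    return n_older, n_other


def select_psu(older_hh, other_hh, target=5, expected_rr=0.60, seed=None):
    """Randomly draw household IDs in one PSU (hypothetical ID lists)."""
    rng = random.Random(seed)
    n = households_to_select(target, expected_rr)
    k_older, k_other = allocate(n, len(older_hh), len(other_hh))
    return rng.sample(older_hh, k_older) + rng.sample(other_hh, k_other)


print(households_to_select(5, 0.60))  # 9
print(allocate(9, 10, 10))            # (6, 3)
```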
Interviewers enrolled participants by phone from July 1 to July 13, 2020. The interviewers, all of whom had worked on LePHIA2020, called each participant, confirmed their identity, established eligibility (whether the participant still lived in the PSU where LePHIA2020 took place, which was essential for creating weights), and then obtained consent. The interviewer then reviewed the LePHIA2020 household listing to determine who currently lived in the household with the participant. Next, the interviewer asked the participant to consent to weekly calls for the following year; if the participant agreed, the interviewer asked for the phone number of a family member or neighbor to use if the participant could not be reached, and for consent to send a WhatsApp or text message for the same purpose.
The weekly surveillance questions began immediately after enrollment and were repeated every week until the end of the study. Each week, respondents were asked whether they had any flu-like symptoms; if they said yes, they reported whether they had fever, dry cough, or shortness of breath. Respondents also answered these questions for each LePHIA2020 household member they had seen in the past week. Data were collected weekly, except from December 23 to 30, 2020. Because participants had consented to only one year of data collection, they were re-consented (asked to continue participating) over two weeks at the beginning of year 2. The following week, taking re-consent rates into account, we refreshed the sample for one week by calling a sample of eligible LePHIA2020 participants who had not previously been sampled. Data were collected every week in year 2.
Participants received an incentive at the end of each month, with the amount based on the number of calls they completed: approximately US$2.00 for all four calls, $1.50 for three, $1.00 for two, and so forth.
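The schedule reads as a linear payment of roughly $0.50 per completed call. The sketch below encodes that reading; the per-call rate is inferred from the listed amounts (with "and so forth" taken to mean about $0.50 for a single call), and months are assumed to have four scheduled calls. The function name is hypothetical.

```python
def monthly_incentive_usd(calls_completed, per_call=0.50, calls_per_month=4):
    """Approximate monthly incentive in US dollars.

    Assumes a linear schedule of about $0.50 per completed call, which matches
    the reported amounts ($2.00 for four calls, $1.50 for three, $1.00 for two).
    """
    if not 0 <= calls_completed <= calls_per_month:
        raise ValueError("calls_completed must be between 0 and calls_per_month")
    return per_call * calls_completed


for calls in range(4, 0, -1):
    print(calls, f"${monthly_incentive_usd(calls):.2f}")
# 4 $2.00
# 3 $1.50
# 2 $1.00
# 1 $0.50
```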
Surveillance results (estimates of influenza-like illness as a proxy for COVID-19 cases) were shared weekly, in near real time, with the Ministry of Health and other stakeholders, usually three days after data collection for the week had finished.
To calculate enrollment response rates in accordance with the American Association for Public Opinion Research (AAPOR) Standard Definitions (AAPOR 2016), we assigned a disposition code to each sampled case.[2] We used the final disposition codes to calculate AAPOR Response Rate 2 (RR2) and Refusal Rate 1 (REF1).
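For reference, the AAPOR rates named above can be written as small functions over final disposition counts. This is a sketch of the published formulas, not the study's analysis code; the class and the example counts are hypothetical. RR4, which the LePHIA2020 household response rate uses, is included because it differs from RR2 only in weighting the unknown-eligibility cases by an estimated eligibility share e.

```python
from dataclasses import dataclass


@dataclass
class Dispositions:
    """Final disposition counts, grouped as in the AAPOR Standard Definitions."""
    I: int   # complete interviews
    P: int   # partial interviews
    R: int   # refusals and break-offs
    NC: int  # non-contacts
    O: int   # other eligible non-interviews
    UH: int  # unknown eligibility: unknown if household/unit
    UO: int  # unknown eligibility: other
    # Known-ineligible cases are not stored: AAPOR excludes them from every denominator.

    def _denominator(self, e=1.0):
        return (self.I + self.P) + (self.R + self.NC + self.O) + e * (self.UH + self.UO)

    def rr2(self):
        """Response Rate 2: complete and partial interviews both count as responses."""
        return (self.I + self.P) / self._denominator()

    def ref1(self):
        """Refusal Rate 1: all unknown-eligibility cases are treated as eligible."""
        return self.R / self._denominator()

    def rr4(self, e):
        """Response Rate 4: like RR2, but only a share e of unknown cases is eligible."""
        return (self.I + self.P) / self._denominator(e)


# Hypothetical counts, for illustration only
d = Dispositions(I=60, P=2, R=1, NC=5, O=2, UH=20, UO=0)
print(f"RR2={d.rr2():.1%}  REF1={d.ref1():.1%}  RR4(e=0.5)={d.rr4(0.5):.1%}")
```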
Results
Enrollment and Attrition
Over two weeks, 16 interviewers called 3,020 eligible LePHIA2020 respondents, of whom 1,778 enrolled (62%; see Table 1). Ninety-four people consented to participate but were no longer living in the PSU where LePHIA2020 took place and intended to return within the year; these participants were called monthly to establish whether they had returned to their original enumeration area (EA). There were only eight direct refusals (<1%). Twenty-nine percent of the sample had unknown eligibility, including 22% of the sample who could not be reached because their phone was turned off or out of network. Nine percent were ineligible, including 4% whose phone numbers were disconnected. At enrollment, 1,461 participants (82%) reported at least one household member, and participants accounted for a total of 4,466 household members.
During year 1, 185 participants were removed from the study (withdrew = 66, moved out of EA = 85, ineligible = 34; see Table 2). After one year of data collection, we removed another 117 participants who either had not responded in the past two months or had indicated at enrollment that they would move back to their LePHIA2020 EA but had not done so.
From July 15 to 27, 2021, 1,476 participants were called and 1,287 (87%) were re-consented; of these, 1,265 were living in the same EA as in LePHIA2020 and 22 had moved but planned to return (and so were called monthly). In summary, 73% of participants remained in the study between enrollment and the beginning of year 2.
To add new participants to the sample, from July 29 to August 3, 2021, we called 1,094 eligible LePHIA2020 participants who had not been contacted at baseline, and 495 enrolled. A lower percentage of this refresh sample enrolled as respondents (45%; 487 of 1,094), and a larger percentage had unknown eligibility: 34% had their phone turned off or were out of network. The refusal rate (<1%) and the ineligibility rate (13%) were similar to year 1. Together, the re-consented and refresh samples gave 1,782 participants, accounting for 3,391 household members, at the start of year 2.
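The participant counts reported for the transition between years can be reconciled with simple bookkeeping. The sketch below, with hypothetical names, reproduces the 1,476 participants called for re-consent, the 87% re-consent rate, and the 1,782 participants at the start of year 2 from the year 1 enrollment, removal, re-consent, and refresh counts reported above.

```python
def sample_flow(enrolled, removed_during_y1, removed_end_y1, reconsented, refresh_enrolled):
    """Carry the year 1 cohort forward to the start of year 2."""
    called_for_reconsent = enrolled - sum(removed_during_y1.values()) - removed_end_y1
    return {
        "called_for_reconsent": called_for_reconsent,
        "reconsent_share": reconsented / called_for_reconsent,
        "start_year2": reconsented + refresh_enrolled,
    }


flow = sample_flow(
    enrolled=1_778,
    removed_during_y1={"withdrew": 66, "moved out of EA": 85, "ineligible": 34},
    removed_end_y1=117,
    reconsented=1_287,
    refresh_enrolled=495,
)
print(flow["called_for_reconsent"], f"{flow['reconsent_share']:.0%}", flow["start_year2"])
# 1476 87% 1782
```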
Throughout the second year, 62 participants were removed from the study (refusal = 40, moved out of EA = 22, and ineligible = 42), resulting in 96% retention.
The enrollment response rate (AAPOR RR2) was 68% in year 1 and 49% in year 2. Direct refusals were below 1% in both years.
Participation
The average quarterly response rate ranged from a low of 71% in year 1, quarter 2, to a high of 84% in year 2, quarter 3 (Figure 2). The average response rate was 78% in year 1 and 83% in year 2.
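The weekly response rates behind these quarterly figures follow the rule set out in the notes: a complete interview counts as a response, a partial interview counts only if the key flu-like-symptoms question was answered, and the denominator is everyone on that week's call list. A minimal sketch, assuming hypothetical record fields and status labels, a four-week non-response cutoff for the call list, and a quarterly average computed as the unweighted mean of weekly rates (the text does not say how weeks were combined):

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class WeeklyCall:
    participant_id: str
    week: int
    status: str                 # hypothetical labels: "complete", "partial", "none"
    answered_ili: bool = False  # answered the key flu-like-symptoms question


def is_respondent(call):
    """Complete interviews count; partials count only if the key question was answered."""
    return call.status == "complete" or (call.status == "partial" and call.answered_ili)


def on_call_list(weeks_without_response, max_missed=4):
    """Assumed rule: leave this week's list after max_missed consecutive non-responses."""
    return weeks_without_response < max_missed


def weekly_response_rates(calls):
    """calls: one record per participant on each week's call list (the denominator)."""
    by_week = defaultdict(list)
    for call in calls:
        by_week[call.week].append(call)
    return {week: sum(map(is_respondent, recs)) / len(recs)
            for week, recs in sorted(by_week.items())}


calls = [
    WeeklyCall("a", 1, "complete"),
    WeeklyCall("b", 1, "partial", answered_ili=True),
    WeeklyCall("c", 1, "partial"),
    WeeklyCall("d", 1, "none"),
    WeeklyCall("a", 2, "complete"),
    WeeklyCall("b", 2, "complete"),
]
rates = weekly_response_rates(calls)        # {1: 0.5, 2: 1.0}
quarter_average = mean(list(rates.values()))  # 0.75
```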
Among participants who answered at least one call in a given month in year 1 (i.e., not including those who answered zero calls that month), most (65%) answered all four calls, 19% answered three, 10% answered two, and 7% answered only one. Figure 3 shows the number of calls completed per month in year 1, and Figure 4 shows the same for year 2. In year 2, participants who answered at least one call in a month completed slightly more calls: 71% answered all four, 17% answered three, 7% answered two, and 4% answered one.
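This monthly breakdown conditions on answering at least one call, so participant-months with zero answered calls drop out of the denominator. A minimal sketch, assuming four scheduled calls per month and a hypothetical input format (a flat list of answered-call counts, one entry per participant per month):

```python
from collections import Counter


def calls_answered_distribution(calls_per_participant_month, calls_per_month=4):
    """Share of participant-months answering 1..calls_per_month calls,
    excluding participant-months with zero answered calls."""
    counts = Counter(c for c in calls_per_participant_month if c >= 1)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no participant-month with at least one answered call")
    return {k: counts[k] / total for k in range(1, calls_per_month + 1)}


# Hypothetical answered-call counts, one entry per participant per month
example = [4, 4, 3, 0, 1, 2, 4, 0]
shares = calls_answered_distribution(example)
# shares[4] == 0.5: three of the six non-zero months answered all four calls
```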
Over the two years, although the rural response rate was consistently lower than the urban and peri-urban rates, response rates did not differ significantly by location type (urban, rural, peri-urban)[3] or by gender (male, female).
The distribution of LeCellPHIA participants across Lesotho's 10 districts mirrored the population distribution in the 2021 Lesotho Demographic Survey (LDS) at both year 1 enrollment and year 2 (Table 3). The largest share of participants lived in the capital, Maseru (27.3% LDS, 27.2% year 1, 30.2% year 2), followed by Leribe (17.4% LDS, 16.4% year 1, 17.0% year 2).
Discussion
Over the two years of weekly surveillance, participation was high: in year 1, 73% of participants were retained and, on average, 65% of monthly participants answered all four calls in a month. Performance improved in year 2, with 96% retention and an average of 71% of monthly participants answering all four calls. Year 2 response rates were likely higher because participants who had been non-responsive were dropped at the end of year 1, and because of increased supervision of interviewers (ensuring they worked their assigned shifts) and monitoring (reviewing interviewer performance each week and tracking which participants had not been called each week).
Response rates did not differ significantly by location (urban, peri-urban, or rural) or gender, and the distribution of participants by district reflected the distribution of Lesotho's population. Enrollment rates decreased between year 1 (68% response rate) and year 2 (49% response rate) because more sampled participants had their phone turned off or were out of network (22% in year 1, 34% in year 2), a difference to be expected given the longer interval between participants reporting their phone numbers in LePHIA2020 and enrollment.
Our results suggest that the general population in Southern Africa is willing to participate in a high-frequency, longitudinal study. Feedback from interviewers and respondents indicates that incentives motivated respondents and that many respondents were interested in the survey topic (COVID-19). Rural response rates were likely lower because limited electricity access for charging phones made rural respondents less available. Anecdotally, interviewers noted that response rates were lower during rainy weeks, when rural respondents, many of whom relied on solar power, could not charge their phones.
The few other high-frequency phone surveys in SSA have also found that participants are willing to engage over time. In Tanzania, across 33 rounds of interviews (25 weekly and 8 biweekly rounds), the overall nonresponse rate was 25% (Hoogeveen et al. 2014). More recent research, also by the World Bank, in five SSA countries found response rates ranging from 60% in Ethiopia to 93% in Uganda (Gourlay et al. 2021).
A strength of the study is that we sampled from a large, population-based survey that reflects the population, in contrast to a random digit dialing sample, which would be less representative of the general population because men, younger people, and more educated people are more likely to answer a random digit dialing survey. The results are likely generalizable to other Southern African countries where phone ownership is high enough for phone samples to be presumed representative (80% phone ownership is usually considered sufficient). In countries with lower phone ownership, patterns may differ. A limitation of our analysis is that we cannot link the phone survey to LePHIA2020 data beyond location, age, and gender. Although attrition was minimal, it would be informative to learn more about non-respondents and those who were lost to follow-up.
LeCellPHIA provides an example of a high-frequency, longitudinal surveillance system with high retention and participation, demonstrating the feasibility of such an approach. High-frequency, longitudinal cell phone-based data collection, for participatory surveillance or surveys, is an expanding opportunity in SSA and allows for research questions that require multiple data points over time.
Lead author contact information
Abigail R Greenleaf, MPH PhD
60 Haven Avenue, New York NY, 10032.
arg2177@cumc.columbia.edu
[1] Among the 11,968 selected households, 10,282 were occupied and 9,665 were interviewed, for an overall household response rate of 93.2% (American Association for Public Opinion Research [AAPOR] Response Rate 4). This response rate assumes that the proportion of eligible households among those with unknown eligibility is the same as among households with known eligibility, which slightly lowers the response rate relative to the raw estimate. Of the 17,590 individuals (7,443 men and 10,147 women) eligible to participate in the survey (aged 18 or older, or emancipated minors aged 15-17, who slept in the household the night before the survey), 16,466 participated. The interview response rate was lower for men (89.8%) than for women (96.1%). Among those interviewed, 92.1% of men and 93.3% of women had their blood drawn. See Figure 1 for the study enrollment flowchart.
[2] To calculate weekly response rates, we counted as a respondent any participant whose interview the interviewer marked as complete, or marked as partially complete but with an answer to the key question “In the past week, have you had any flu-like symptoms (e.g., fever, dry cough, shortness of breath)?” All participants eligible to be called that week were included in the denominator. A participant could be excluded from a week's call list if they had not responded for 4 weeks or if an interviewer did not call them, which rarely happened. All response rates are at the individual level (i.e., they do not account for reports of household members' symptoms). A participant was “lost”, or removed from the sample, if they died, moved out of the LePHIA2020 EA, or withdrew. This approach to calculating weekly response rates was used for the overall quarterly estimates and for estimates disaggregated by sex (male and female) and location (urban, peri-urban, and rural).
[3] The Lesotho Bureau of Statistics defines urban areas as those with high population density or high levels of economic activity or infrastructure. Peri-urban areas have moderate population density or less economic activity or infrastructure, and rural areas have minimal population or little infrastructure or economic activity.