Comparing survey modes in contingent valuation: internet panels and mail surveys

Stated preference methods are extensively applied in health economics to elicit preferences. Although mailed surveys have commonly been used to collect data, internet panel (IP) surveys are increasingly used instead. This raises questions about the validity of responses and of estimated willingness-to-pay (WTP) values generated from IP surveys. We conduct the first study in health to compare a contingent valuation IP survey with a mailed survey using the electoral roll. Our IP survey has a higher response rate and a lower item non-response rate. The difference is reduced, but remains, when restricting comparisons to valid WTPs. Sample characteristics differ, with significant differences between modes for gender, age, income, and attitudes and knowledge. Although differences in WTP values exist, with the IP resulting in higher values, we find limited evidence that these differences are statistically significant. The mail survey has a lower initial cost per response; however, once samples are restricted to valid WTP responses with non-missing respondent information, the cost per response is similar across modes. Our results, suggesting that IPs generate valid and cost-effective values, are encouraging as researchers move increasingly to IPs to collect preference data.


Introduction
Stated preference methods are extensively used in applied economics to elicit preferences. With the emergence of internet-based surveys and falling response rates to mailed surveys, it has become increasingly common to collect data using internet panels (IPs). IPs have advantages over traditionally used modes, including avoidance of transcription and manual input errors, faster administration, and removal of interviewer bias (Dillman et al., 2009). However, it remains unclear whether different administration modes introduce biases.
Biases are an important consideration in health economics, where stated preference methods have found numerous applications. However, the literature on survey mode effects in health economics is limited in both the number of studies and the modes compared. Three studies have investigated mode effects within the health state/health outcome literature. Mulhern et al. (2013) compared a Computer Assisted Personal Interview (CAPI) survey (n=201) and an IP survey (n=221) valuing health states. Preferences were elicited using a discrete choice experiment (DCE). To ensure overall comparability of sample characteristics across administration modes, samples were recruited following procedures employed in typical surveys. Whilst significant differences in respondents' characteristics were found (online respondents were educated to a higher level, and CAPI respondents reported significantly better general health and health/life satisfaction), there was no significant difference in preferences across modes. Norman et al. (2010) compared online questionnaires and face-to-face interviews when eliciting health state preferences using the time trade-off method. Data were collected from a convenience sample of adults (n=135) randomised 1:1. They found the online survey produced valuations with higher variability, and more extreme values, than in-person interviews. Rowen et al. (2016) compared online questionnaires (n=302) and face-to-face interviews (n=69) in a pairwise comparison study of social preferences for burden of illness. Whilst the invited sample was representative of the UK for age and gender, the mode of administration affected respondent characteristics; compared to the general population, interview respondents were older and more likely to be retired, whereas online respondents were younger, less likely to be employed or retired, and in poorer health.
Whilst willingness to pay (WTP) is commonly used in health economics to assess values, only two studies have investigated how WTP is affected by survey mode; both used the DCE method. In a study eliciting preferences for pharmacy services, Watson et al. (2019) compared four modes: CAPI; a mailed survey; and two UK IP surveys (Ipsos Mori and ResearchNow). Samples were recruited following standard approaches and each mode targeted 1000 respondents. Modes were compared according to objective measures (response rate; sample representativeness compared to the UK population; elicited values; theoretical validity; and cost per response) and subjective/self-reported measures (time taken to complete the study; perceived study consequentiality; and stated attribute nonattendance). Sample characteristics differed across modes, as did estimated WTP. On most measures CAPI was superior but more expensive. On all measures except response rate, IPs outperformed the mail survey and were cheaper. In a DCE concerned with patient preferences for health insurance, Determann et al. (2017) compared responses from two samples drawn from an online panel: one sample completed the questionnaire online (n=533) whilst the other received a paper version (n=365). They found no evidence that online surveys yield inferior results compared to paper-based surveys, and the price per respondent was lower.
We present the first study comparing modes for the direct contingent valuation (CV) method, using an IP survey and a mailed survey based on the Electoral Roll (ER). We compare modes according to: response rates; item-response rates; WTP response classifications; cost per response; sample characteristics; and monetary values.

Context
The CV survey elicited willingness to pay (WTP) for treatment strategies for illicit drug users. For detailed information on the study design see Matheson et al. (2014). In summary, the questionnaire had four sections: demographics (e.g. gender, age); experience of drug misuse (e.g. experience with illegal drugs in the last year); knowledge and attitudes about drug users; and WTP for five treatment strategies (needle exchange; maintenance with oral methadone; community detoxification; residential detoxification; and maintenance with prescribed heroin). Respondents were asked to imagine the Government were considering expanding treatment programmes paid for from tax contributions. Respondents were then asked how much they would personally be willing to pay to expand each of the five treatments. Respondents who stated they were unwilling to contribute anything were asked to explain why; the aim was to distinguish 'protesters' (those who objected to the CV valuation method) from those who genuinely did not wish to contribute (those who did not value the treatment strategies and had a 'valid' zero). A section of the questionnaire invited open comments, which were analysed qualitatively to aid classification of WTP responses.

Modes of data collection and sample frames
We designed a single master CV survey using word-processing software; this was the basis for scripting the IP survey. The self-completion survey was administered to two different samples of the Scottish population, who either received a mailed paper self-complete questionnaire or were part of an internet panel (IP) and completed the survey online. Samples were recruited using standard approaches:
- The paper survey was mailed to a random sample of 3000 people aged over 18 years from the Electoral Roll (ER). The sample was provided by an independent sampling company and stratified by age and geographical location. Invitations to participate were addressed to named individuals. As an incentive, respondents entered a prize draw for a £100 shopping voucher. Two reminders were sent at three-weekly intervals; data collection was closed at eight weeks. Respondents were identified using ID numbers on reply-paid envelopes. Questionnaires were separated from envelopes prior to data entry to preserve anonymity whilst allowing identification of non-responders.
- For the IP mode, a stratified sample (age and geographical location) was drawn from a UK volunteer IP, ResearchNow (IP-RN), with a guaranteed 300 respondents. Potential participants were selected at random from those eligible within strata. Invitations to complete the questionnaire were emailed to the sample. When individuals responded to the invitation, they were screened against the quotas (for age and geographical location) until the target number of responses had been received. Following standard practice, IP respondents received the standard nominal incentive used by the company. Data collection took one week.

WTP response classifications
Combining stated WTP values and respondents' qualitative comments, WTP responses were classified as: protest (no value given and a protest against CV, e.g. "I think the government should be paying for this"); cost-based (respondents attempted to estimate the cost of services as opposed to their value, e.g. "no idea of cost"; see Ryan and San Miguel, 2000, for more on cost-based responses); missing (no WTP provided); or valid. Distinctions were drawn between 'full' and 'restricted' (i.e. responses without any missing values for covariates/sample characteristics) samples.
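The classification rule above can be sketched as a simple decision function. This is an illustrative sketch only: the study's actual classification combined stated values with qualitative analysis of open comments, and the comment codes used below ('protest', 'cost_based') are hypothetical labels for the qualitative coding.

```python
from typing import Optional

def classify_wtp(stated_value: Optional[float],
                 comment_code: Optional[str]) -> str:
    """Illustrative WTP response classification (hypothetical comment codes).

    comment_code: a code assigned to the respondent's open comment, e.g.
    'protest' ("the government should be paying for this") or
    'cost_based' ("no idea of cost").
    """
    if stated_value is None:
        # No value given: a protest comment distinguishes protesters
        # from genuinely missing responses.
        return "protest" if comment_code == "protest" else "missing"
    if comment_code == "cost_based":
        # Respondent estimated the cost of the service, not its value.
        return "cost-based"
    return "valid"  # includes genuine ('valid') zeros

print(classify_wtp(None, "protest"))    # protest
print(classify_wtp(None, None))         # missing
print(classify_wtp(10.0, "cost_based")) # cost-based
print(classify_wtp(0.0, None))          # valid (a genuine zero)
```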

Mode cost-effectiveness
Costs for the mailed questionnaire included: questionnaire printing; paper; postage (outgoing, reply-paid and reminders); administration; data input; and Electoral Roll access (a one-off payment). IP-RN sample costs comprised an initial fixed set-up cost for scripting the survey plus a cost per completed questionnaire. We calculated cost per WTP response and cost per 'valid' WTP response for both the 'full' and 'restricted' samples.
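The cost-per-response comparison reduces to dividing each mode's fixed plus variable costs by the number of (valid) responses. The figures below are hypothetical placeholders, not the study's actual costs; they only illustrate how a mode with a higher cost per completed questionnaire can close the gap once the denominator shrinks to valid responses.

```python
def cost_per_response(fixed: float, variable_per_unit: float,
                      units: int, responses: int) -> float:
    """Total cost (fixed plus variable) divided by the number of responses."""
    return (fixed + variable_per_unit * units) / responses

# Hypothetical mail survey: ER access/administration as fixed cost,
# print/postage per mail-out as variable cost, valid responses as denominator.
mail = cost_per_response(fixed=2000.0, variable_per_unit=2.5,
                         units=3000, responses=650)
# Hypothetical IP survey: scripting as fixed cost,
# panel fee per completed questionnaire as variable cost.
ip = cost_per_response(fixed=1500.0, variable_per_unit=6.0,
                       units=302, responses=240)
print(round(mail, 2), round(ip, 2))  # 14.62 13.8
```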

Comparison of sample characteristics
We compared sample characteristics across modes for both the 'full' sample and for individuals with valid WTP responses. This distinction provides insight into whether representativeness is conditional on validity.

Comparison of valid WTPs
We compared WTP values and non-zero (positive) WTP values across modes for all five WTP questions. We present results for the 'full' and 'restricted' samples; this latter group is likely to be of most interest to researchers.

Regression framework
Two or more-part models have been proposed to analyse data where behaviour is separable in parts (Duan et al., 1983). This has found applications in CV with two-part specification modelling WTP conditional on non-zero WTPs (Hammitt and Zhou, 2006). Others have incorporated higher dimensions to separate zero and missing responses (Yu and Abler, 2010). We adapted this framework and fitted a three-part model.
First, the probability of stating a valid (j=1), protest (j=2), cost-based (j=3) or missing (j=4) WTP follows a multinomial logit:

Pr(j_i = j | x_i) = exp(x_i'β_j) / Σ_k exp(x_i'β_k), j = 1, …, 4. (1)

Second, the probability of a positive WTP, conditional on a valid WTP, is a probit:

Pr(WTP_i > 0 | j_i = 1, x_i) = Φ(x_i'γ). (2)

Third, average WTP, conditional on a positive, valid WTP, was modelled as a generalised linear model with a gamma distribution and log link, which captures the potentially long tail in the WTP distribution and has often been proposed for health care cost modelling (Basu et al., 2004):

E(WTP_i | WTP_i > 0, j_i = 1, x_i) = exp(x_i'δ), (3)

where x_i is a vector of individual characteristics.
Finally, predicted WTPs were calculated as E(WTP_i | x_i, j_i = 1, WTP_i > 0) for both modes, and differences across modes were bootstrapped with 500 repetitions.


Response rates, WTP classifications and cost-effectiveness comparisons across modes
Table 1 presents response rates, item-response rates, WTP classifications and cost-effectiveness across modes for the full and restricted samples. We find:
- The mail and IP-RN surveys achieved sample sizes of 1067 (a 38% response rate) and 302 (guaranteed), respectively.
- For both the full and restricted samples, around two-thirds of respondents (61% and 66%, respectively) provided a valid WTP in the mail survey, with the remainder consisting mainly of missing WTP responses (as opposed to protest or cost-based responses). The IP-RN mode had a higher valid response rate (79% in both groups) and a lower missing item-response rate (6%).

Sample characteristics comparisons
Tables 2 and 3 present sample characteristics by mode for the full and valid-WTP samples. Differences exist for: age (IP-RN had the largest proportion of respondents (34%) in the 55 to 64 years category, while mail had the largest proportion (27%) aged over 64 years); income (mail had a larger proportion of low-income earners (29% cf 19%)); proportion of alcohol drinkers (IP-RN 88% cf mail 77%); and knowledge and attitude scores (IP-RN scoring higher on both), with the difference in attitude scores no longer significant in the valid-WTP sample. The mailed survey has a significantly higher number of missing item responses, ranging from 0.7% of the full sample for age to 13.3% for knowledge scores, compared with no missing values for the IP-RN. These percentages are reduced in the valid-WTP sample (range 0%-10.6%).

WTP mode comparisons
Except for residential detoxification, we find no statistically significant difference in WTP between modes (Table 4). When considering only responses with a positive WTP, IP-RN respondents display significantly larger values for three programmes (maintenance with oral methadone, community detoxification and residential detoxification), albeit for two of them only at the 10% level; this implies a larger proportion of zero WTP values in the IP-RN sample.

Concluding comments
Our IP-RN survey had a higher valid response rate; this may reflect the quality/updating of IP mailing lists (compared to the Electoral Roll). The IP-RN also had lower missing item-response rates; the gap relative to the mailed survey shrinks, but remains, when restricting comparisons to valid WTPs. This finding is a consequence of the IP-RN requiring all questions to be answered. Whilst a mail-out with an online link could potentially overcome this problem, in pilot work Watson et al. (2013) found the response rate for mail-to-online surveys to be much lower than for all other modes compared (14/1200; 1.2%) and subsequently did not take this mode forward to their main study.
Sample characteristic comparisons show statistically significant differences. Whilst differences in WTP values exist, with the IP resulting in higher values, we find limited evidence that such differences are statistically significant. Questions may be raised about the generalisability of our results. The mail survey had a larger percentage of over-64-year-olds. In this application, illicit drug use, the sample characteristics may not be driving differences in values. However, for applications where the condition is more likely to affect older people (e.g. social care), IPs and mailed surveys may result in differences in WTP. Future research should explore this. Further, this difference may reduce over time as older people become more computer literate.
Our mailed survey had a lower initial cost per response; however, for the restricted sample the cost per response was similar across modes and not significantly different. It is recognised that for larger sample sizes the mail survey may become more cost-effective.
Our finding that IPs generate values similar to those from mailed surveys, and are equally cost-effective, is encouraging as researchers move increasingly to IPs to collect preference data. It is also relevant for researchers collecting health-related quality of life or other health-related questionnaires. IPs have potential advantages when designing studies, including: animations/interactive explanations; more complex questions/bidding games; and pivot designs where starting values are specific to the respondent. They also allow the collection of response-time data (useful when exploring data quality) and quicker data collection (useful when quantitative data from pilots are required to inform the main design). Given the ease with which IP surveys can be undertaken, researchers should look to exploit their many advantages when designing surveys.