Flipping the classroom: is it the type of flipping that adds value?

ABSTRACT Interest in the flipped classroom in higher education has burgeoned despite the literature revealing that the effects on assessment outcomes remain elusive. In this paper, we present the results of an empirical analysis designed to compare the impact on assessment outcomes of different approaches to the flipped classroom (didactic and non-didactic). Focusing on a cohort of Intermediate Economics students we investigated the influence of these approaches on their examination results by utilising an OLS (Ordinary Least Squares) regression and probit followed by quantile regression. Our analysis revealed small positive effects when students were exposed to the ‘non-didactic’ flipped classroom but no effect when pre-lecture materials were used didactically to mimic the material given in traditional lectures. This study demonstrates the need for further meta-analysis and longitudinal studies to investigate the relationship between different forms of the flipped classroom and student assessment outcomes.


Introduction
'Some people talk in their sleep. Lecturers talk while other people sleep.' Camus (Long and Lock 2014)
During the past couple of decades, there has been increasing pressure to focus on teaching methods and effective student learning. In the UK, this can be traced back to the mid-1990s when Higher Education (HE) began to move from being the preserve of the elite to a mass system and to the introduction of tuition fees (Webb et al. 2017). As a result, research exploring the costs and benefits of degree courses has burgeoned, with contributions from a variety of disciplines including physiotherapy, dentistry and law (for example, Asgary and Robbert 2010; Rivers et al. 2015; Stafford et al. 2014; Tamanaha 2013). This changing milieu has driven UK universities to focus on metrics such as reputational standing, league tables and student feedback, especially the National Student Survey. Importantly, the sector is placing increasing importance on training for HE staff, teaching methods and teaching quality, as evidenced by the recent introduction of the Teaching Excellence Framework (HEA 2018).
With focus more heavily on teaching, you do not have to go far before discovering that the reputation of the lecture to deliver module learning outcomes and assurance of learning is being somewhat derided. With Bligh (2000) typically used as the backbone for this anti-lecture stance, the rhetoric used tends to be rather combative in tone, for example, Folley's (2010) article includes in the title 'The Lecture is Dead … ' and Gibbs (2013) refers to 'Drone Warfare', going on to state: 'more than 700 studies have confirmed that lectures are less effective than a wide range of methods for achieving almost every educational goal you can think of'. In addition, Schmidt et al. (2015) found lectures encourage a 'superficial and abbreviated' outlook inconsistent with the promotion of critical thinking. They suggest time constraints can lead to an overly regimented approach, destroying any possibility of catering for the diverse variation in the students' skills set. Similarly, McFarlin (2008) proposes students prefer self-pacing in learning, implying this is incompatible with the traditional lecture.
Despite the many issues associated with the lecture, the available pedagogical literature (negative or otherwise) should be treated with caution, not least because providing a definition of a lecture is not as straightforward as it may first appear. Bligh (2000, 4) writes that lectures 'are more or less continuous expositions by a speaker who wants the audience to learn something'. In reality, there is no clear-cut definition of a lecture that would enable support for an anti-lecture consensus: the lecture can take many forms, and quality and perception are reliant on a myriad of (sometimes unquantifiable) factors, from the charisma of the lecturer to the diversity in pedagogical approaches adopted. There is such disparity that reaching an unambiguous conclusion of the relative failure of the lecture format is almost impossible (this should be borne in mind while reading this paper).
Although problems of definition and comparison abound, this has not slowed a consensus forming that the lecture is flawed and that innovative new approaches are the future, often despite the dearth of empirical evidence to support such claims. Of resonance is the ongoing popularity of the so-called 'pyramid of learning'. This evolved from Edgar Dale's 'cone of experience' (Dale 1969) and categorises learning gains very precisely, but without robust empirical support: we supposedly remember 5% of information from lectures; 10% from what we read; 20% from what is presented through audiovisual mechanisms; 30% from demonstrations; and 50% through participation in discussions.
As the bad press for lectures has gathered pace, technology has enabled radical change in the (possible) approaches to teaching, with a rapid rise in the use of 'blended learning' methods within HE. Such methods combine traditional classroom activities with the virtual environment and web-based technologies. For example, video capture software has become an increasingly common tool across HE. The Campus Computing Project (2014) reported that 9.4% of classes in US public universities use such software. Such technologies are also highly incentivised within UK higher education. As Joseph-Richard et al. (2018, 377) observed, the recently introduced Teaching Excellence Framework (TEF) 'rewards institutions for providing state-of-the-art technology and lecture recording systems' despite inconclusive empirical evidence for the efficacy of such technologies in enhancing student learning. At the forefront of these new approaches, and one of the most discussed and utilised methods over the past decade (portrayed as the panacea to the lecture) is the 'flipped classroom' as pioneered by the likes of Bergmann and Sams (2012) and Berrett (2012).
This study has two key aims. First, it explores the academic evidence for the effects of the flipped classroom on student learning outcomes compared to a traditional lecture. Accordingly, we begin the paper with a brief introduction to the flipped classroom followed by a literature review of studies that have investigated the effects of the flipped classroom on student assessment outcomes. This revealed the effects of the flipped classroom on assessment outcomes remain inconclusive. Second, the empirical analysis in this study tests the effects of different approaches to the flipped classroom (didactic and non-didactic) with the traditional lecture through an empirical analysis of examination outcomes for a 2015-2016 cohort of Intermediate Economics students. We found small positive effects of nearly 4% when students were exposed to the non-didactic flipped classroom but no effect when pre-lecture materials were used didactically to mimic the material given in traditional lectures. We conclude by discussing the main contributions of our study and implications for future research and practice.

The flipped classroom
The flipped classroom is 'an educational technique that consists of two parts: interactive group learning activities inside the classroom, and direct computer-based individual instruction outside the classroom' (Bishop and Verleger 2013, 4). Students are asked to prepare, prior to the in-class session, by engaging with video lectures, podcasts, reading materials, PowerPoint slides and so forth. This enables the lecturer to change their pedagogical approach such that the student is placed at the centre of the learning experience during the session, which replaces the traditional lecture.
The flipped classroom encourages students to focus on knowledge and comprehension prior to attending the sessions, so the content of these sessions can be focused on higher levels of Bloom's (Anderson et al. 2001) taxonomy. In this respect, it shares similarities with concepts such as the 'inverted classroom' (Lage, Platt, and Treglia 2000) and 'peer instruction' (Crouch and Mazur 2001; Mazur 2009). While the flipped classroom remains quite diverse in execution, a common approach is to record the standard lecture content beforehand and then upload these videos to a virtual learning environment. This then allows, commonly in conjunction with audience response systems, the face-to-face session (during the time that would have been used for the traditional lecture) to be used for more interactive exercises, including quizzes or assignments (see Chen and Lin 2012; Gulley and Jackson 2016; Wozny, Balser, and Ives 2018 for examples). Advocates of the flipped classroom argue it provides a more interactive session, with opportunities for self-pacing of the didactic transmission of key knowledge, and the development of critical reasoning skills (for example, see Roach 2014; Kong 2014, 2015). It has also been claimed that, by facilitating greater discussion between students and lecturers, it allows clearer identification of the material students find understandable and where they require help. For example, Berrett (2012) contends 'the immediacy of teaching in this way enables students' misconceptions to be corrected well before they emerge on a midterm or final exam'.
The arguments for the benefits of the flipped classroom, taken at face value, seem convincing. For example, in a massified HE environment it is difficult to argue that flipping will not promote more self-confidence and knowledge transfer than the superficial talk through a set of lecture slides. However, despite its intuitive appeal, the empirical evidence for its advantages over the traditional lecture is inconclusive, with, for example, both Bishop and Verleger (2013) and Zuber (2016) pointing to a lack of concrete evidence to support the many claims of the benefits of the flipped classroom. In addition, there are studies that allude to improvements in student satisfaction but that find no significant impact on students' assessment performance (for example, see Blair, Maharaj, and Primus 2016; Guerrero et al. 2015; or Sparks 2013). Kwak, Menezes, and Sherwood (2015) even report strong negative effects when learning is deemed to be cumulative, whilst Wozny, Balser, and Ives (2018, 115) observe that 'despite the recent interest in flipped classrooms, rigorous research evaluating their effectiveness is sparse'.
To further investigate the evidence regarding the effects the flipped classroom may have on student performance, we searched ISI Web of Science Core Collection (selecting all indexes) for journal articles, review articles, books and book chapters with the terms 'flipping the classroom', 'flipped classroom', 'inverted classroom', 'flipped learning', 'flipped education', 'reverse classroom', 'backward classroom' or 'inverse classroom' in the title, abstract or keywords through until 8th May 2019.
With respect to the disciplinary setting for this paper, economics education, there are only a handful of articles focused on evaluating the effects of the flipped classroom compared to traditional lectures (see, for example, Wozny, Balser, and Ives 2018). Within these, findings are again mixed with respect to the impact on student performance. Researchers have provided evidence of small positive gains for flipping the classroom relative to traditional lectures on student performance in assessments (see Balaban, Gilleskie, and Tran 2016; Calimeris and Sauer 2015; Caviglia-Harris 2016; Ficano 2019; Olitsky and Cosgrove 2016). Chen and Lin (2012) found that supplemental video lectures generate an overall improvement in exam performance of around 4 percentage points. However, other scholars have found non-statistically significant differences when such methods are directly compared with traditional teaching methods (for example, Brown and Liedholm 2002; Terry and Lewer 2003; Olitsky and Cosgrove 2013).
Literature reviews of the impact of the flipped classroom on assessment outcomes have also reported contradictory findings. For example, in engineering education, a systematic review by Karabulut-Ilgu, Cherrez, and Jahren (2018) highlighted that whilst some studies report beneficial effects on learning outcomes, others report no benefits, or that the flipped classroom is less effective. Similarly, in the field of nursing, Betihavas et al. (2016) and Ward, Knowlton, and Laney (2018) identified neutral or positive effects of flipped learning, whilst a review by Evans et al. (2019, 74) 'did not reveal compelling evidence for the effectiveness of the method in improving academic outcomes above that of traditional classroom approaches'. Further, Gill, Andersen, and Hilsmann (2019) reported the flipped classroom as one of the least effective strategies for teaching pharmacology to undergraduate nursing students. Similarly, a systematic review of studies in medical education reported inconclusive evidence for flipped classrooms in promoting knowledge acquisition compared to the traditional lecture (Chen, Lui, and Martinelli 2017).
Whilst the majority of literature reviews have understandably been limited to a particular discipline, Akcayir and Akcayir (2018) presented an excellent multidisciplinary, systematic review of 71 articles, based on an initial search that identified 206 articles from the Web of Science Social Sciences Citation Index (SSCI) published up until the end of 2016. They noted the most frequently reported advantage of the flipped classroom was the improvement in student learning performance. However, as the results of our own literature search highlight, using Web of Science Core Collection to search the literature up until May 2019 (including all indexes in the search rather than focusing on the SSCI) reveals well over four times the number of publications on flipped learning with 701 (62%) published since the beginning of 2017.
Recent times have also seen the publication of meta-analyses centred in the field of nursing (Hu et al. 2018; Tan, Yue, and Fu 2017), mathematics education in HE (Lo, Hew, and Chen 2017) and across disciplines (Chen et al. 2018). Chen et al. (2018) present an excellent analysis of 46 studies published through until June 2016, principally drawn from the field of health sciences but also including other disciplines. They reported higher academic achievement for flipped classrooms relative to the traditional lecture. However, given the large number of studies published since 2017 and the databases Chen et al. (2018) used to identify articles for inclusion in their analysis (e.g. MEDLINE, PubMed, CINAHL, ERIC, EMBASE), it is unclear to what extent their findings, and those of other meta-analyses, reflect recent empirical studies across all disciplines.
In summary, despite the valuable contributions of recent studies, literature reviews and meta-analyses, there arguably remains 'inconclusive evidence of an improvement of assessment outcomes for students' relative to the traditional lecture (Zuber 2016, 97). The inconclusive evidence with respect to the impact of flipping on student assessment performance is unsurprising as performance is likely to vary according to a number of factors including the topic area; the institutional setting; the specific design of pre-class and in-class activities; the students' prior performance and the timing of the assessment (see Wozny, Balser, and Ives 2018 for a discussion in economics). In relation to prior performance, Wozny and colleagues found slightly larger impacts of the flipped classroom on assessment performance for students with above-median GPAs compared to below-median GPA students, for medium-term assessments. They also found positive effects of the flipped classroom on long-term assessments, but only for above-median GPA students. Similarly, Ficano (2019) found positive effects for students with strong mathematics literacy and non-minority students. Negative effects were reported for students with weaker mathematical literacy and from minority backgrounds.
Relatedly, student motivation may also cloud analysis of the impact of flipping on assessment performance. Indeed, recent studies outside of economics have begun to explore the link between motivation and performance (see Abeysekera and Dawson 2015; Ngoc, De Wever, and Valcke 2017; Liu, Raker, and Lewis 2018). Within the UK HE context, data from the National Union of Students (see Table 1 below) shows that the majority of students are career motivated, and while the 'option openers' may share some traits, the percentage of 'academics' suggests that mastery goals are unlikely to have a substantive effect on performance goals.
As the evidence points to students attending university to assist them with their career goals, any newly introduced teaching innovation may be captured by students who use the innovation to target the grade they desire, thus skewing impact on assessment performance. For example, in modules that students believe to be less important or less challenging, or to have less impact on future employment prospects, students may well satisfice, targeting a specific (lower) grade and expending less effort: in effect, 'grade targeting' (see Allgood 2001). Allgood's idea, which builds upon the framework first introduced by Harackiewicz et al. (1997), is that student motivations are dependent on two goals: a mastery goal, which is linked to 'the desire to develop competency' (that is, to learn as much as possible); and a performance goal, 'to evaluate performance relative to some benchmark' (for example, to complete work whilst minimising effort). The relative importance of these goals is likely to be dependent on students' motivation for attending university:
… the primary reason many students enter college is to get a job with a desired set of characteristics. Students who view their education in this manner are likely to be motivated to set goals that allow them to reach a desired level of performance with minimum effort. (Allgood 2001, 486)
Grade targeting adds an element of noise into any attempt to estimate the benefit of the newly introduced pedagogical technique. What is potentially being captured is the ability of students to use the technique to more accurately target the result they want. As a consequence, when the results of the entire cohort are analysed, there is no effect on overall assessment performance. This may also help explain positive student feedback regarding innovative new techniques, in that it is linked to student motivation(s).
As such, by lowering the time costs associated with acquiring the required knowledge, the teaching innovation may simply enable students to devote more time to other, more challenging modules, or to pursue alternative non-academic activities. Lending support to this notion, Overbaugh and Nickel (2011) find that:
When a blended course is developed, supported and implemented well, researchers have found that a majority of the students will be as satisfied or more satisfied with the blended course as they have with previous face-to-face courses. (165)
This may also help to explain anecdotal evidence of students valuing video content being provided prior to in-class sessions, despite the absence of significant effects compared to traditional lectures on outcomes such as retention and understanding of course material, or effective use of time in class (see Gulley and Jackson 2016).
As described in the methodology section below, our paper builds on recent contributions that have recognised the importance of exploring the differential impacts of various forms of flipping on student performance (for example, Caviglia-Harris 2016; Zhu and Xie 2018). Specifically, we adopt an empirical approach that explores the links between the mode of delivery (traditional lectures compared with different forms of the flipped classroom), students' motivation to learn and student examination outcomes. This enables a more in-depth investigation that goes beyond the binary 'flipping versus traditional' approach used in the majority of previous studies (e.g. Gulley and Jackson 2016; Wozny, Balser, and Ives 2018).

Methodology
This study compares the effects of two approaches to the flipped classroom on the examination grades of a cohort of 127 intermediate microeconomics students studying at a UK university. The first approach we define as 'didactic' flipping and the second as 'non-didactic' flipping. We define 'didactic' flipping, in line with earlier discussions, as the commonly used approach whereby lecturers utilise material that has been recorded when delivering lectures to a previous cohort of students. The 'didactic' pre-lecture materials involved using a variety of screen-casting and multimedia software in order to upload, on a weekly basis, complementary materials for a standard intermediate microeconomics textbook onto the students' Virtual Learning Environment (VLE). The term didactic is adopted as the material is constructed in a systematic way, with the student being led through theoretical material starting from first principles. This approach develops definitive conclusions and provides a clear narrative, with resources being used to fully describe and cover the technical demands of the subject material. The session now differs from the traditional lecture and is more open to the mathematical application of the theory, with problem-based questions considered to further assist student understanding. Rather than changing the learning process, the approach adopted focuses on the so-called 'multimedia principle', which asserts that the student achieves greater and deeper learning from studying a variety of different forms of material, as illustrated by Mayer (2014, 43):
The cognitive theory of multimedia learning is based on three cognitive science principles of learning: the human information processing system includes dual channels for visual/pictorial and auditory/verbal processing, each channel has a limited capacity for processing, and active learning entails carrying out a coordinated set of cognitive processes during learning.
The 'flipped classroom' can then provide a more interactive environment, replacing the traditional lecture and subsequently engineering two key learning gains. First, it provides opportunities for further discussion of any material that students have found confusing. Second, and arguably integral to the pedagogical approach, there can be a focus on any perceived theoretical limitations in the material that has been covered in the pre-session resources. The 'flipped classroom' then enables the delivery of both technical analysis and the critical outlook required in essays.
In contrast, for our second approach, 'non-didactic' pre-lecture flipping, the e-learning materials were designed so that there was no single way of engaging with the material and no definitive conclusions were given to the students. Rather, numerous stand-alone perspectives were provided, allowing students to decide for themselves how they considered the information. This non-didactic 'flipped classroom' approach hypothetically engineers a further learning gain, by encouraging students to reflect on the material and critically appraise the relevance of often conflicting perspectives (cf. Kong 2014, 2015). To assist this non-didactic approach, in-class sessions are designed to include 'devil's advocate' games: for example, using audience response systems, students are asked to consider real-world outcomes and assess the relevance of particular perspectives. To construct these non-didactic resources, we take advantage of the pluralism debate in economics. Following Denis (2013), we accept the stance that pluralism requires the recognition that there are multiple schools of thought within the economics discipline. Materials are therefore written to recognise the coexistence of approaches which, unless theoretically or empirically refuted, can be presumed to be equally valid. These different perspectives include Neoclassical Economics, Marxism, Institutionalism, Feminism and Post-Keynesianism. There is, therefore, no insistence that there exists a single doctrine, as these pre-session materials are not written to indicate any preference for the replacement of any mainstream thought with a specific heterodox alternative. Instead, to ensure a focus on academic freedom, the student is simply asked to consider the existence of competing paradigms. As they themselves compare across these conflicting approaches, it is envisaged that pluralism could encourage a deeper level of learning. Students would have to make their own critical evaluation over which theoretical approaches are cogent for any real-world application.
The particular context in which the study is conducted presented a number of challenges. As discrimination across students over the availability of resources is not possible, and we were not able to randomly allocate students between our two types of pre-lecture material, we minimise empirical issues by adopting the following procedure. First, to ensure that we could test for the differential effects of the learning resource, the lecture topics are split equally, and then randomly assigned, across the two forms of pre-lecture materials. There are therefore four didactic and four non-didactic e-learning resources, all of which are accompanied by their own online quizzes as mechanisms to test knowledge and understanding. Didactic resources are constructed for lectures in Consumer Theory, Producer Theory, General Equilibrium and Market Failure. Non-didactic resources are instead provided for Theory of the Firm, Monopoly, Oligopoly and Labour Markets. This ensures that they are equally relevant as revision aids for the final examination. For cohorts prior to the development of these resources, no significant difference in student performance across the questions on these topics is found. Second, the general structure of the pre-session materials is communicated to students. Given it is unlikely that all students will comply and watch every pre-session resource, student preferences are likely to lead to pre-selection.
We hypothesise that the 'didactic' pre-session material would make it simpler for the target-orientated student to meet their objective. This suggests that these methods are less likely to positively impact on student outcomes. In contrast, we propose that the 'non-didactic' pre-session is driven by a commitment to encouraging debate from the outset: perceptions of a simple truth are avoided and, from the beginning of their learning, students are encouraged to maintain an open mind. Hypothetically, this could change student motivation for attending the lecture/session. As students form their own opinions over the conflicting approaches, it can be argued that the role of the lecturer is transformed. Rather than being the 'instructor', they become an 'information provider' who cedes their hierarchical position in order to take the role of assisting navigation through the various debates. As such we hypothesise:
Hypothesis: A non-didactic approach to the flipped classroom, rather than a didactic one, will improve student performance.
Our primary aim is, therefore, to test for differences in assessment outcomes according to student engagement with didactic and non-didactic lecture methods. As discussed in Gelman and Stern (2006), this must go beyond simply comparing statistical significance on assessment outcomes. We, therefore, adopt a confidence interval approach to investigate two issues. First, the extent that we can recommend the adoption of flipped classroom methods. Reporting small effects, for example, might indicate that the additional staff costs imposed by transition to these alternative teaching methods are unjustified. Second, the extent that there is crossover between the confidence intervals for the estimated effects from our two types of pre-session materials. Such a comparison will help determine whether it is possible to conclude that one approach does outperform the other.
To meet our objective, we conduct OLS and probit analyses before employing quantile regression to provide a more complete understanding of the impact of non-didactic flipping on student performance (following the work of Moffatt and Robinson 2015; Ng, Pinto, and Williams 2011; and Siriopoulos and Pomonis 2009). While Ordinary Least Squares (OLS) provides estimates of the mean effect on student performance, quantile regression allows us to differentiate between effects on distinct groups of students, in our case, relatively low performers and relatively high performers. Using this approach, Moffatt and Robinson (2015) dismiss any significant positive effect of online multiple-choice revision quizzes within the VLE. As our online resources also include quizzes, our approach also provides a check on the findings of Moffatt and Robinson. We develop our model and test two dependent variables. The first is the final exam mark in the intermediate microeconomics exam; the second, given its use as a key performance indicator for grade improvement, is a dummy for achieving a 'good honours' outcome (that is, scoring an exam mark of 60% or more). We then regress these dependent variables on various independent variables. First, in line with recent research, we proxy for ability levels by including dummy variables for previous assessment performance (e.g. Wozny, Balser, and Ives 2018). For this, we use the performance achieved in a previous introductory microeconomics module in addition to the mark that the student achieved in a discussion board for critical evaluation skills. We define the didactic student by the number of times they access those resources in the VLE and the non-didactic student by the number of times they access the broader resources made available to students via the VLE.
We also include variables for the number of quizzes that a student completed while on the module, and dummy variables to control for gender, overseas students and those undertaking a joint honours degree. We expect that our didactic variable would be insignificant and that students embracing the non-didactic approach would achieve a significantly higher grade. To confirm, our OLS and probit model is specified as below:
$$\mathrm{Performance} = f(\mathrm{Ability}, \mathrm{XD}, \mathrm{XND}, \mathrm{XQuiz}, \mathrm{Gender}, \mathrm{Overseas}, \mathrm{JointH})$$
where Performance is either Exam (i.e. the mark received in the final intermediate microeconomics exam) or GoodH (i.e. a dummy that equals 1 if a 'good honours' outcome is achieved, with a mark of 60% or more). Ability is proxied by two variables: the exam outcome in Introductory Microeconomics, as students build on their knowledge, and the mark achieved in a discussion board for critical evaluation skills. XD is the number of 'didactic' pre-session resources that have been read by the student. XND is the number of 'non-didactic' pre-session resources that have been read by the student. XQuiz is the number of quizzes completed by the student. Gender is a dummy that equals 1 if the student is male. Overseas is a dummy that equals 1 if the student is an overseas student. JointH is a dummy that equals 1 if the student is on a joint honours scheme.
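For readers who prefer an explicit functional form, the three estimators named in the text can be sketched as follows. This is an illustrative reading only, not the authors' stated specification: it assumes a linear index in which the two ability proxies enter as single regressors (the text mentions dummy variables, so the exact coding may differ), and the names IntroMicro and Discussion, the coefficient labels and the regressor vector x_i are our own notation.

```latex
% Assumed linear index; x_i stacks a constant and the regressors listed in the text.
\begin{align}
\mathbf{x}_i'\boldsymbol{\beta} &= \beta_0 + \beta_1\,\mathrm{IntroMicro}_i + \beta_2\,\mathrm{Discussion}_i
  + \beta_3\,\mathrm{XD}_i + \beta_4\,\mathrm{XND}_i + \beta_5\,\mathrm{XQuiz}_i \notag \\
  &\quad + \beta_6\,\mathrm{Gender}_i + \beta_7\,\mathrm{Overseas}_i + \beta_8\,\mathrm{JointH}_i \\
% OLS: conditional mean of the exam mark
\mathrm{Exam}_i &= \mathbf{x}_i'\boldsymbol{\beta} + \varepsilon_i \\
% Probit: probability of a 'good honours' outcome, with \Phi the standard normal CDF
\Pr(\mathrm{GoodH}_i = 1 \mid \mathbf{x}_i) &= \Phi(\mathbf{x}_i'\boldsymbol{\beta}) \\
% Quantile regression at quantile \tau, using the check loss \rho_\tau
\hat{\boldsymbol{\beta}}(\tau) &= \arg\min_{\boldsymbol{\beta}} \sum_i \rho_\tau\!\left(\mathrm{Exam}_i - \mathbf{x}_i'\boldsymbol{\beta}\right),
\qquad \rho_\tau(u) = u\left(\tau - \mathbf{1}\{u < 0\}\right)
\end{align}
```

Under this reading, the hypothesis corresponds to a positive coefficient on XND alongside a coefficient on XD that is indistinguishable from zero, while the quantile estimates at low and high values of τ indicate whether any effect differs between relatively low and relatively high performers.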
Given we are unable to separate our cohort such that they only have access to one form of pre-session material, we are concerned that the results may be confounded by a close correlation between reading of the didactic and non-didactic resources. To resolve this issue, we added a measure of engagement, with the engaged utilising both forms and the non-engaged ignoring both types. The partial correlation, however, was found to be 0.579 and in none of the multiple specifications we investigated is the coefficient on XD found to be statistically significant. This includes a simple specification that excludes both XND and XQuiz variables. We are also concerned that there may be issues of endogeneity. In particular, rather than being a causal relationship of the learning materials, it could simply be that the more highly motivated students exhibit both higher exam marks and also higher online engagement with the VLE. However, prior to any modelling, we find no evidence that this is a significant concern. There is, for example, no substantive correlation between use of the non-didactic resources and either overall VLE use or overall degree classification.

Findings and discussion
Despite there being one hundred and twenty students on the module, the data shows disappointing student engagement with the flipped online materials. As the descriptive data in Table 2 shows, on average, less than half of the materials are read, whether they were the didactic materials (just under 48%) or the non-didactic materials (just over 41%). This is despite the fact that flipped classroom students are expected to read these materials before attending the session where this subject matter will be discussed and dissected by them. In addition, just under 5% of the quizzes were completed.
The OLS results in Table 3 below are broadly consistent with Moffatt and Robinson (2015) and our control variables are as expected. The variable for previous exam results, proxying differences in overall microeconomic knowledge, is positive and statistically significant. The discussion board outcome, controlling for differences in critical evaluation skills, is similarly positive and significant. There are also significant differences between students on single and joint honours, perhaps reflecting difficulties created by any reduced exposure to general microeconomic tools. The significant difference between home and overseas students is also notable. Given that exam questions tend to be more essay-orientated, this highlights the general need for continuing investment in the provision of generic skills. As with Moffatt and Robinson (2015), we find no evidence that completing online quizzes positively affects student performance levels.
Our key result, however, is that there is evidence of a substantial difference between the impact of our 'didactic' and 'non-didactic' pre-session materials. The 'didactic' pre-session variable is statistically insignificant; in contrast, the 'non-didactic' pre-session variable is statistically significant. However, it is through the 95% confidence intervals that we can more fully compare the impact of these two flipped classroom methodologies. The 'didactic' interval suggests that engagement with one of these resources will change the student mark by between −3.42 and 1.28 percentage points. At the higher end, this suggests a small improvement in performance. Arguably such small effects provide little support for the use of flipped classroom methods. At the lower end, perhaps reflecting overreliance on such directed resources, reductions in student performance are predicted. In contrast, the 'non-didactic' interval ranges from 1.36 to 5.76 percentage points. At the higher end, this implies that sizable performance improvements are available. Moreover, it is noticeable that there is no crossover between the two confidence intervals. Our results are therefore supportive of the hypothesis that non-didactic methods produce superior outcomes compared to the didactic alternatives.
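The interval comparison above can be reproduced with simple arithmetic. The sketch below takes only the two reported intervals as inputs. It assumes that each interval is symmetric around its point estimate and was built with a normal critical value of 1.96; neither detail is stated in the text, so the implied midpoints and standard errors are approximate back-calculations rather than reported results. The function names are hypothetical.

```python
# Back-of-envelope reading of the two reported 95% confidence intervals.
# Assumption: intervals are symmetric and use a normal critical value.
Z_95 = 1.96


def summarise(lower, upper):
    """Return (midpoint, half_width, implied_se) for a symmetric interval."""
    midpoint = (lower + upper) / 2
    half_width = (upper - lower) / 2
    implied_se = half_width / Z_95  # approximate, under the stated assumption
    return midpoint, half_width, implied_se


def intervals_overlap(a, b):
    """True if the closed intervals a and b share at least one point."""
    return max(a[0], b[0]) <= min(a[1], b[1])


# Interval bounds as reported in the text (percentage points per resource read)
didactic = (-3.42, 1.28)
non_didactic = (1.36, 5.76)

for label, interval in (("didactic", didactic), ("non-didactic", non_didactic)):
    mid, half, se = summarise(*interval)
    print(f"{label}: midpoint={mid:.2f}, half-width={half:.2f}, implied SE={se:.2f}")

print("overlap:", intervals_overlap(didactic, non_didactic))
print(f"gap between intervals: {non_didactic[0] - didactic[1]:.2f}")
```

On these assumptions the implied midpoints are about −1.07 and 3.56 percentage points, and the gap separating the two intervals is about 0.08 percentage points.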
To further investigate these effects, we now turn to the quantile regression approach. Overall there was no substantial evidence that this approach is more revealing than standard OLS. For brevity, as shown in Figure 1 above, we focus on the coefficient estimates on our pre-session resources. The results for 'didactic' pre-session broadly confirm our previous results. However, at least for some lower quantiles, there is evidence of a small negative effect. Understanding this loss in performance is difficult, but it confirms that using previous lecture materials for the 'flipped classroom' environment may be counter-productive. Our 'non-didactic' pre-session outcome indicates that the effects are smaller for the higher quantiles.
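As a reminder of what distinguishes quantile regression from OLS, the sketch below illustrates the asymmetric 'check' (pinball) loss that quantile regression minimises, in the simplest case of a constant-only model. The marks are invented for illustration and are not the study's data, and the function names are hypothetical; the models in the paper also include covariates, which this sketch omits.

```python
# Illustration of the check (pinball) loss behind quantile regression.
# The marks below are hypothetical, not taken from the study.

def pinball_loss(residual, tau):
    """Check loss: weight tau on positive residuals, (1 - tau) on negative ones."""
    return tau * residual if residual >= 0 else (tau - 1) * residual


def best_constant(values, tau):
    """Constant-only quantile fit: the data point minimising total check loss."""
    return min(
        sorted(values),
        key=lambda c: sum(pinball_loss(v - c, tau) for v in values),
    )


marks = [35, 42, 48, 52, 55, 58, 61, 64, 70, 78, 85]  # hypothetical exam marks

lower_quartile = best_constant(marks, 0.25)
median = best_constant(marks, 0.50)
upper_quartile = best_constant(marks, 0.75)
print(lower_quartile, median, upper_quartile)
```

Varying tau shifts the fitted value along the distribution of marks. With covariates, the same loss yields a separate coefficient for each quantile, which is what allows the effect of pre-session resources to differ between weaker and stronger students.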

Conclusions
In this paper, we set out to explore whether the flipped classroom positively impacts upon student performance in assessments, firstly through a multidisciplinary review of the existing literature and secondly through an empirical analysis. The design of our study is consistent with recent calls for attention to be paid to how students' prior performance, motivation and the specific design of pre-class and in-class activities in flipped classrooms may impact upon student assessment performance (see Abeysekera and Dawson 2015; Ngoc, De Wever, and Valcke 2017; Liu, Raker, and Lewis 2018; Taylor 2018; Wozny, Balser, and Ives 2018). Specifically, we investigated the effects of two different approaches to the flipped classroom on student examination performance relative to the traditional lecture. Using students enrolled on an Intermediate Microeconomics module, we constructed two forms of pre-session 'flipped' materials: 'didactic' and 'non-didactic'. We hypothesised that flipping relies upon well-planned, carefully constructed pre-session learning resources and that, without these, students' motivation to learn would be unchanged and the pedagogical impact on student assessment outcomes would be minimal compared to a traditional lecture. We investigated the effects of these different approaches to flipping upon examination results, first by utilising an OLS regression and probit and then by adopting quantile regression following the work of Moffatt and Robinson (2015).
Our paper makes four main contributions. First, the literature search identified 1136 journal articles, review papers, books and book chapters in ISI Web of Science Core Collection concerned with the flipped classroom. 1073 of these articles (95%) have been published since the beginning of 2015 with 416 (37%) published since 2018 alone. This confirms the burgeoning interest in the flipped classroom.
Second, the review corroborated previous research which has argued there is 'inconclusive evidence of an improvement of assessment outcomes for students' relative to the traditional lecture (Zuber 2016, 97). Whilst studies, across different disciplines, have reported positive effects of flipping on student performance (see Hew and Lo 2018; Lundin et al. 2018; Maciejewski 2016; Ngoc, De Wever, and Valcke 2017; Ward, Knowlton, and Laney 2018) others found no significant effects, or even significant negative effects, on student performance when compared with lectures (for example, Hsiao et al. 2019; McCabe, Smith, and Ferreri 2017). This finding was mirrored in the disciplinary context for this study, economics education. Although few studies have evaluated the effects of the flipped classroom compared to traditional lectures, some scholars have reported small positive impacts of flipping the classroom on student performance in assessments relative to traditional lectures (see Balaban, Gilleskie, and Tran 2016; Calimeris and Sauer 2015; Caviglia-Harris 2016; Chen and Lin 2012; Ficano 2019; Olitsky and Cosgrove 2016) whilst others found non-statistically significant differences when directly compared with traditional teaching methods (for example, Brown and Liedholm 2002; Terry and Lewer 2003; Olitsky and Cosgrove 2013). Whilst there have been some excellent literature reviews and meta-analyses in recent times (e.g. Betihavas et al. 2016; Gill, Andersen, and Hilsmann 2019; Hu et al. 2018; Karabulut-Ilgu, Cherrez, and Jahren 2018; Lo, Hew, and Chen 2017; Tan, Yue, and Fu 2017; Ward, Knowlton, and Laney 2018) the majority are understandably focused within specific disciplines (see Akcayir and Akcayir 2018 and Chen et al. 2018 for notable exceptions). Moreover, as highlighted in our review, the literature into the flipped classroom has burgeoned recently with 62% of the sources we identified published since the start of 2017.
This presents an opportunity for further, multidisciplinary meta-analyses to explore the impact of the approaches to the flipped classroom on student assessment outcomes.
Third, with respect to the findings of our empirical analysis, it was interesting to note that on average, less than half of the online materials provided to students on the Intermediate Microeconomics module were read. This suggests lecturers must, in the first instance, encourage students to interact with the subject and the materials if they are planning to flip the classroom.
Fourth, returning to the National Union of Students (2008) breakdown of student motivations, the higher quantiles are likely to include the students looking to be stretched intellectually. It is therefore probable that the change in resources will have less impact on the academically motivated students, given they are typically already committed to reading around the subject area. The larger effects on the lower quantiles are therefore consistent with a change in the motivation to learn of the 'grade targeting' student. There could also be positive effects from higher engagement, as the technical material that is traditionally covered in intermediate economics can be understood better when it is embedded within wider economic debates.
Finally, in relation to the impact of flipping on student examination performance, our analysis indicates assessment outcome gains from flipping the classroom cannot be taken for granted and that the design of pre-class and in-class materials is critical. In particular, if the 'flipped classroom' is constructed around 'lecture capture' based on past/older material, and the approach is didactic in the treatment of the subject area, our study suggests a positive impact on student assessment outcomes is less likely to be observed. By mimicking the standard lecture approach, the satisficing student will simply use the additional resources as a time-saving opportunity. In contrast, encouragingly we found that the 'non-didactic flipped classroom' generated small positive effects on examination performance of nearly 4%. Of course, we need to remain mindful of our limitations. This is only a one-year snapshot. To fully disentangle the effects, and offer a more detailed response as to why student performance can deteriorate, we would need to test it over several years. Ideally, we would advocate an approach closer to a natural experiment. For example, rather than offering two types of flipped materials, we would switch over in alternate years to confirm differences in outcome. In summary, it is premature to recommend introducing the flipped classroom as a mechanism for enhancing student assessment outcomes compared to the traditional lecture. As described above, there is an opportunity for further meta-analysis and longitudinal studies to explore the relationship between different forms of the flipped classroom and assessment outcomes.

Disclosure statement
No potential conflict of interest was reported by the authors.