Dialects, Cultural Identity, and Economic Exchange

We investigate whether time-persistent cultural borders impede economic exchange across regions of the same country. To measure cultural differences we evaluate, for the first time in economics, linguistic micro-data about phonological and grammatical features of German dialects. These data are taken from a unique linguistic survey conducted between 1879 and 1888 in 45,000 schools. Matching this information to 439 current German regions, we construct a dialect similarity matrix. Using a gravity analysis, we show that current cross-regional migration is positively affected by historical dialect similarity. This suggests that cultural identities formed in the past still influence economic exchange today.


Introduction
Nations are by no means linguistically monolithic; typically, there are hundreds of regional dialects within the same language. These dialects reflect the everyday experience of individuals living in different parts of the country and strongly shape their cultural identity.
Someone from Boston, say, sounds very different from someone from Texas, and if they speak to each other, they will have a good guess as to where the other is from. Some dialects are more closely related than others. For example, the Liverpool dialect ("Scouse") has many Irish and Welsh influences, but it is quite distinct from the English spoken in other parts of the United Kingdom, including the neighboring regions of Cheshire and Lancashire. What is more, depending on their own regional provenance, people tend to associate certain images and stereotypes with particular dialects; as George Bernard Shaw puts it: "It is impossible for an Englishman to open his mouth without making some other Englishman hate or despise him" (Pygmalion, 1916). Similar phenomena exist in many other languages, but the economic consequences of dialect differences are poorly understood.
In this paper we investigate whether dialect differences across regions of the same country pose barriers to economic exchange. We evaluate, for the first time in the economics literature, detailed linguistic micro-data about the intra-national variation of phonological and grammatical attributes. We then analyze the effect of dialect similarity on gross regional migration flows in a gravity analysis.
Specifically, we study the case of German, which, from a linguistic point of view, is one of the best documented languages worldwide. The data on dialects are taken from a unique language survey conducted by the linguist Georg Wenker between 1879 and 1888. By the order of the just established German Empire, Wenker collected detailed data about the language characteristics of pupils from about 45,000 schools across the Empire during a period when dialect use was common and a standardized national language had not yet become prevalent. 1 Based on these data, we construct a dialect similarity matrix between 439 German districts, the current NUTS3 regions (Landkreise). The characterization of each district's dialect is based on 383 linguistic features having to do with the pronunciation of consonants and vowels as well as with grammar. We then analyze pair-wise gross migration flows across German districts over the period 2000-2006. Our central result is that current regional migration is significantly positively affected by similarity of the dialects prevalent in the source and destination areas in the late 19th century. This result remains robust even after controlling for physical distance and travel time across regions and for origin and destination fixed effects, as well as for a host of region-pair-specific characteristics.
1 To this day, the Wenker survey is the most complete documentation ever of a nation's language and has defined standards in the linguistics discipline (for a detailed introduction, see Lameli 2008). A "language" can be defined as a symbolic representation of social groups with an official status, such as nations. Languages can be subdivided into related variants. If such variants depend on their geographical distribution we refer to them as "dialects." There are also variants without geographical relevance ("styles"), which we do not discuss here. See Crystal (1987) for a detailed discussion of these linguistic concepts.
How should this finding be interpreted? First of all, it should be noted that the local dialects as recorded in the 19th century were clearly shaped by past (i.e., pre-19th century) interactions, including prior mass migration waves, religious and political divisions, ancient routes and transportation networks, and so forth. Almost like a genome, language acts as a sort of memory that stores such information, a point made by anthropologists such as Cavalli-Sforza (2000), who stresses the close resemblance between linguistic and genetic evolution. Phonological and grammatical variations across space are thus by no means random; they are imprints from the past. 2 Why does an individual who decides to migrate today, all else equal, prefer destinations with a dialect similar to that found in the source region more than 120 years ago? We argue that the likely interpretation is that cultural differences at the regional level are persistent over time and have long-lasting causal effects on economic behavior, such as migration decisions. Individuals seem to dislike moving to culturally unfamiliar environments, and the perception of today's cultural differences between German regions can be well measured by such historical dialect differences.
Using different empirical strategies, we argue that our main finding is unlikely to be due to a persistence of cross-regional migration flows that in turn led to dialect assimilation.
Furthermore, we show that the effect of dialect similarity is not confounded with other types of region-pair-specific cultural congruencies, like a common religious or political history. Of course, we cannot capture a causal effect of language, in the sense of asking a question such as: What is the effect of historical dialect similarity on current migration that does not reflect any other persistent cultural difference across regions? Indeed, we argue in this paper that dialects are a good comprehensive measure of regional cultural identity that goes beyond capturing single influences like religious or political divisions, but that also includes many more otherwise immeasurable domains. Hence, our empirical results may answer the broader question: How much is current economic exchange across regions impeded by persistent intangible cultural borders?
Related literature: There is an extensive literature arguing that language commonalities are essential in saving transaction costs. For example, Lazear (1999) develops a model of a multi-lingual society where individuals can conduct economic transactions only when they speak a common language. The focus of our paper is different because we study historical spatial variation of the same language, rather than the current coexistence of domestic and foreign languages within one country. 3 Our finding that even small dialect differences matter for internal migration decisions is therefore unlikely to be caused by a transaction cost mechanism similar to that in Lazear's (1999) model. Dialect differences matter, not because people would be unable to communicate in different regions, but because they seem to have a preference for living in culturally familiar environments.
This insight is consistent with previous research on the effects of cultural similarity between different countries. For example, Guiso et al. (2009) show that common cultural and linguistic roots enhance trust between countries, which in turn boosts international trade and investment. 4 Our analysis adds to this literature by showing that intangible borders that impede economic exchange also exist within nations and thus on a much finer geographical scale. Our study is also related to a few recent contributions that consider the economic effects of genetic differences across countries. Spolaore and Wacziarg (2009) find a positive relationship with differences in current income, as populations more closely genetically related are more apt to learn from each other, and Desmet et al. (2009) show that countries with more distant gene profiles exhibit stronger cultural differences. These papers thus emphasize that groups that are more closely related genetically tend to have closer economic contacts. We obtain a consistent result for linguistically related groups, even on a more finely disaggregated geographical level. Below, we provide some further discussion about the relationship between genetic and cultural differences across populations.
The remainder of this paper is organized as follows. In Section 2 we describe our linguistic data and discuss in greater detail the meaning of dialects, especially in the historical context of our study. Section 3 sets out a simple gravity model for current migration flows that serves as the underlying framework for the empirical analysis. Section 4 presents our estimation results. Section 5 concludes.
3 Other important contributions to the literature on multi-lingual countries include Alesina and La Ferrara (2005), who study the effects of the diversity of foreign languages and ethnicities on the economic performance of the host country. Melitz (2008) provides a detailed gravity analysis on the effects of language commonalities on cross-country trade flows by distinguishing different modes of communication, whereas Rauch (1999) and Rauch and Trindade (2002) show that immigrant networks help overcome communication barriers when the host country trades with the immigrants' native country.
4 Numerous studies show that individuals exchange and cooperate more the more they trust each other. See, among others, Glaeser et al. (2002), Knack and Keefer (1997), and Watson (1999).

Historical background and the measurement of linguistic characteristics
In the centuries following Charlemagne, France, Spain, England, and Habsburg Austria developed into states where power was wielded by a centralized sovereign. In contrast, the
The Wenker data: Between 1879 and 1888, Wenker asked teachers and pupils in more than 45,000 schools to translate 40 German sentences into their local dialect. These sentences were especially designed to reveal specific dialect characteristics. The survey covered the entire area of the German Empire and revealed pronounced differentiation of local language variants, since at that time (more so than today) dialects were the people's common everyday speech.
Wenker's surviving material contains millions of phonological and grammatical observations in the form of handwritten protocols of the language characteristics recorded in the individual schools (see Figure 1a for an example). These raw data were integrated by Wenker and collaborators into a linguistic atlas of the German Empire (Sprachatlas des Deutschen Reichs). The Sprachatlas was developed between 1889 and 1923 and contains more than 1,600 hand-drawn maps showing the detailed geographical distribution of particular language characteristics across the German Empire (see Figure 1b for an example). In an evaluation process that spanned several decades, Ferdinand Wrede, one of Wenker's collaborators, determined the prototypical characteristics most relevant for the
Sprachatlas do not conform to this classification system. We therefore use GIS (Geographical Information System) technology to juxtapose digitized versions of these linguistic maps and the map of the current administrative districts. We then quantify the dialect of each district in the form of binary variables.
The following example illustrates this approach. One of the linguistic attributes is the German word for pound. Depending on the dialect, it is pronounced as "Pfund," "Pund," or "Fund." The corresponding map in the Sprachatlas shows the variant "Fund" mostly in the eastern parts of Germany, "Pund" mostly in the northern areas, and "Pfund" mostly in the southern parts. These variants are then transferred into a binary coding of the type: "Fund" = {1 0 0}; "Pund" = {0 1 0}; "Pfund" = {0 0 1}. Comparing the individual linguistic map for the word pound and the current administrative map of Germany, we assign one of these codes to each of the 439 districts. This approach is unambiguous when there is no intraregional variation of this particular language characteristic, i.e., when the entire area of some district r exhibited the same pronunciation according to the map in the Sprachatlas.
Typically this has been the case. However, the spatial distribution of this particular language attribute and the current boundaries of the districts are not in all cases perfectly coincident. If we found intra-regional variation of pronunciation, we then chose the most frequent variant within the district as representative. The entire matching procedure was accompanied by several linguistic plausibility tests and cross-checks with the underlying raw data on the phonetic protocols from the Wenker survey.
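The coding and the majority rule described above can be illustrated with a minimal sketch. The function names and the representation of a district as a list of observed variants are assumptions made for illustration; they are not taken from the GIS procedure actually used.

```python
from collections import Counter

# Variants of one linguistic attribute (the word for "pound"), as in the text.
VARIANTS = ["Fund", "Pund", "Pfund"]


def one_hot(variant, variants=VARIANTS):
    """Return the binary code of a variant, e.g. "Pund" -> [0, 1, 0]."""
    if variant not in variants:
        raise ValueError(f"unknown variant: {variant}")
    return [1 if v == variant else 0 for v in variants]


def district_code(observed, variants=VARIANTS):
    """Code a district by its most frequent variant (majority rule).

    `observed` is a hypothetical list of the variants found inside one
    district, e.g. one entry per map cell within its boundary. Ties go to
    the variant encountered first.
    """
    most_common_variant, _ = Counter(observed).most_common(1)[0]
    return one_hot(most_common_variant, variants)


# A district without intra-regional variation, and one with variation.
uniform = district_code(["Pfund", "Pfund", "Pfund"])  # [0, 0, 1]
mixed = district_code(["Pund", "Pund", "Pfund"])      # [0, 1, 0]
```

Concatenating such codes over all attributes would give one binary vector per district.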
Repeating this procedure for all 66 language characteristics, we end up with K=383 binary variables representing the dialect that was spoken in the area of a district in the late 19th century. More formally, the historical dialect of the current district r is represented by a
5 Wrede combined local extractions of variants to a dialect classification (see Wrede et al. 1927-1956). One advantage of this classification over more recent categorizations of the Wenker data (e.g., Wiesinger 1983b) is that it lends itself quite easily to a mathematical representation of dialects (see below

What does dialect similarity capture?
In this subsection we discuss some examples suggesting that the geography of dialect similarity as recorded in the 19th century is far from random, but instead reflects long-term evolutionary processes of region-pair-specific congruencies and past (i.e., pre-19th century) interactions.
Before turning to these examples, it is worth pointing out that anthropologists have long been aware of the coherence between genetic, cultural, and linguistic evolution. As a thought experiment, albeit an extreme one, consider a number of initially identical populations that became separated from each other at a certain point in time and have henceforth no contact with each other. The genetic profile of each isolated population evolves over time as a result of mutation, natural selection, and genetic drift, and the DNA profiles of any two groups are likely to drift apart due to the random elements of evolution. As forcefully argued in Cavalli-Sforza (2000), the same phenomenon is likely to occur in regard to cultures and languages.
Isolated populations, even if initially identical, develop idiosyncratic habits and expressions.
After the passage of a certain amount of time, it would be difficult for members of two initially identical groups to even understand each other if they had the chance to meet. In fact, linguistic evolution would be much faster and more drastic than genetic evolution, i.e., language differences across groups would become visible earlier and be clearer than DNA differences in this hypothetical scenario. Next, imagine that our now differentiated populations initiate cross-border contact. This exchange, which may occur through migration, is one major force behind diffusion. The more intensively two populations interact, the more diffusion occurs and the more similar these groups will once more become. Linguistic and cultural diffusion (adaption of words, habits, etc.) would again be faster and more intensive than genetic diffusion, but it would still occur slowly.
7 As a robustness check we also calculated two different similarity indices. First, Jaccard's (1901) similarity index is computed as follows: Given the two vectors $i_r$ and $i_s$ of length K, let $M_{11}$ be the number of vector columns where both $i_r$ and $i_s$ have the value 1, $M_{10}$ the number of cases where $i_r$ has a 1 and $i_s$ has a 0, $M_{01}$ the number of cases where $i_r$ has a 0 and $i_s$ has a 1, and $M_{00}$ the number of cases where both vectors have a 0. The Jaccard similarity index is then defined as $M_{11}/(M_{11}+M_{10}+M_{01})$. Second, Kulczynski's (1927)
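The Jaccard index described in footnote 7 can be sketched in a few lines. The function name and the handling of two all-zero vectors are assumptions for illustration; the footnote does not specify the latter case.

```python
def jaccard_similarity(i_r, i_s):
    """Jaccard index M11 / (M11 + M10 + M01) for two binary vectors."""
    if len(i_r) != len(i_s):
        raise ValueError("vectors must have the same length K")
    m11 = sum(1 for a, b in zip(i_r, i_s) if a == 1 and b == 1)
    m10 = sum(1 for a, b in zip(i_r, i_s) if a == 1 and b == 0)
    m01 = sum(1 for a, b in zip(i_r, i_s) if a == 0 and b == 1)
    denominator = m11 + m10 + m01
    if denominator == 0:
        # Assumption: two all-zero vectors are assigned similarity 0.
        return 0.0
    return m11 / denominator


# Two hypothetical districts that share one of two coded attributes.
district_r = [1, 0, 0, 1, 0]
district_s = [1, 0, 0, 0, 1]
similarity = jaccard_similarity(district_r, district_s)  # 1 / (1 + 1 + 1)
```

Positions where both vectors are zero ($M_{00}$) do not enter the index, so it is unaffected by the number of variants that neither district uses.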
In short, as already noted by Charles Darwin himself, both genes and languages are the product of evolution and are persistent over time. 8 In this paper we characterize long-term differences between local German populations by using comprehensive linguistic data.
Comparable genetic data on the DNA profiles of local populations are not available to the best of our knowledge, but Darwin's argument suggests that if such data did exist, one would probably find a strong correlation between genetic and linguistic differences across regions.
We now turn to some specific examples of linguistic evolution in Germany.

Religion:
The map on the left in Figure 2 illustrates the regional similarities to the dialect spoken in Waldshut, a district located in the southwest of Germany (Baden-Württemberg).
The reference point Waldshut is marked. Warm colors indicate a high, and cold colors a low, degree of similarity with the dialect in Waldshut. The map on the right in Figure 2 zooms in on Baden-Württemberg and compares the spatial pattern of dialect similarity with the religious geography of that area.
As is well known, the Reformation of the 16th century resulted in distinct Protestant and Catholic localities in Germany (see also Becker and Woessmann 2009
The main message conveyed by Figure 2, however, is that the geography of dialect similarity is strikingly similar to religious geography. Waldshut itself was and always remained Catholic, and it can be seen that the dialects of other Catholic districts resemble the one in Waldshut more closely than do the dialects of Protestant districts. This finding aligns itself nicely with the discussion on linguistic evolution. Catholic localities are in closer contact with other Catholic localities; Protestants are more in contact with Protestants. Hence, religious and linguistic similarities co-evolve, and they do so until today (Stoeckle, in press).
Barbujani et al. (1996), Dupanloup de Ceuninck et al. (2000), and Manni (in press).
9 This stability is even more remarkable in light of the fact that it was not until after the Peace of Westphalia (1648) that a newly-converted ruler became prohibited from forcing his new religion on his subjects, which had been common practice ever since the Peace of Augsburg in 1555 (see Cantoni 2009). Other factors apart from social practice that might have a stabilizing effect on religious orientation include natural boundaries such as the Black Forest or the Rhine, or national and administrative borders, in this case the border of the archbishopric Freiburg.

Figure 2: Distribution of Religious Denomination in Southern Germany
Notes: Similarity of all districts to the reference point Waldshut (marked). Red indicates highest familiarity and yellow indicates higher familiarity, while the green and blue indicate less familiarity. Data on religious denomination are taken from Steger et al. (1989).

Mass migrations:
Language is also reflective of previous migration waves. To illustrate this point, let us consider the example of the Goslar district. The map in Figure 3 illustrates the dialect similarity between Goslar (white) and all other German districts.
Linguists view the Harz Mountains in Goslar as a language enclave in the sense that the dialect spoken there is not similar to dialects spoken in neighboring districts but instead more resembles a dialect spoken about 300 kilometers away in the mountainous Erzgebirge, where, in Figure 3, we find an accumulation of warm colors (indicating high similarity). The historical explanation for this phenomenon is the revival of silver mining in the Goslar area between 1520 and 1620, motivating migration to that area by starving miners in Saxony.
This 16th-century relationship between the two regions is still visible in dialect data from the late 19th century (also see Wiesinger 1983a), which illustrates the degree of inertia inherent to evolutionary processes.
An important aspect of pre-modern migration is that it was nearly always a social or mass phenomenon, and thus much different from current migration, which is strongly based in individual economic motives. With very few exceptions, these mass migrations in Germany ended during the 18th century (Wiesinger 1983a). Therefore, at the time Wenker conducted his language survey (1879-1888), roughly one and a half centuries had elapsed without such major perturbations. 10 The local cultures and dialects had thus some time to harden.
Distance: Geographical distance certainly plays a role in dialect similarity. As seen in Figure 2, the districts adjacent to Waldshut tended to have similar dialects. However, we also find districts relatively close to Waldshut that are less similar than districts that are farther away. This suggests that our dialect data contain information that goes beyond what can be explained by mere physical distance, a point made clearly by the Goslar example (Figure 3), where there is virtually no relationship between geographical distance and dialect similarity.
10 The last incident known to us that can be classified, albeit rather broadly, as a mass migration occurred between 1749 and 1832. Initially, a rather small community of people from the Palatinate decided to immigrate to America, but ended up as settlers in a region near the city of Kleve. The reason for migrating was hunger caused by a poor harvest. Once settled in that area, other families from the Palatinate followed.

Notes: Similarity of all districts to the reference point Goslar (white spot). Red indicates highest familiarity and warmer tints (yellow and green) indicate higher familiarity, while the bluish tints indicate less familiarity.
Dialect similarity could, however, still reflect the existence of old trading routes, which, by taking advantage of rivers, natural passages, and forts, historically led to more contact between certain regions. And, indeed, the importance of transport routes for the spatial structuring of language attributes is made evident by the example of the so-called Rheinstaffel. Klausmann (1990) notes a difference in linguistic development depending on the topological relation of individual locations to the Rhine river, i.e., dialect similarity may also be influenced by ancient transportation networks.
Historical borders: At the time Wenker collected the data, the German Empire had just been created out of formerly independent territories. These territories had previously been in existence for centuries, and thereby also contributed to linguistic evolution. In fact, dialectologists have been aware since the 19th century of the congruencies between the areal distribution of historical territories and language (see Haag 1898; Aubin et al. 1926; and, more recently, Barbour and Stevenson 1990). One reason for this persistence may be that the territories tended to encourage internal traffic, and discourage, or at least not improve the means for, travel external to their borders. Hence, communication and exchange between territories was somewhat hindered (Bach 1950:81). From an evolutionary perspective, such limitations can lead to a higher degree of dialect similarities among regions that formerly belonged to the same historic territory.
Taken together, these examples suggest that dialect similarity between regions is higher the more intensive was their interaction and exchange in the course of history. The various influences that have been discussed, such as common religious and historical political borders, distance and the influences of ancient transportation networks, as well as unique historical events and previous migration waves, all left some long-lasting imprints on the local dialects. Dialect similarities between regions are correlated with these other types of regional congruency, but are likely to capture other (and less well measurable) aspects of cultural similarity and emotions (see Schifferle 1990). The dialects should therefore be interpreted more broadly as comprehensive measures of local cultural identity.
Culture, of course, is not restricted to language, but occurs in many other domains such as art, traditions, habits, etc. However, regional differences within these cultural domains are likely to be reflected in dialect differences, as cultural and linguistic evolution proceeds in parallel. Put differently, as argued in the sociology literature by Brewer (1991) and in the linguistics literature by Chambers and Trudgill (1998), language is the strongest marker of cultural identity. It has the added advantage of being an overt one; people can disguise their true norms and values, but not their regional dialect, which is formed during early childhood and is enormously difficult to suppress. Finally, dialects are relatively easily measurable using linguistic techniques.

A gravity model of current regional migration
The main aim of this paper is to investigate to what extent historical dialect differences affect current bilateral economic exchange. Specifically, we investigate the effects on current cross-regional migration flows. To this aim, we derive a theoretically grounded gravity equation for migration flows in this section, which serves as the underlying framework for our empirical analysis.
Gravity equations are a standard tool for analyzing trade flows across regions or countries (see, e.g., Anderson and van Wincoop 2003), but the conceptual idea behind gravity can be applied to migration flows as well. 11 There are two main reasons why we focus on current migration rather than on current trade (or other cross-regional flows) as the outcome variable. The first issue is data availability. While there are accurate and highly disaggregated current regional migration data for Germany, there is no information at the regional level about commodity flows, goods or service trade, or financial flows. Second, while trade flows would certainly be an interesting region-pair-specific outcome variable for studying the effects of intangible cultural borders, we believe that migration flows are at least equally well suited for this purpose. Individuals do not migrate very often during a lifetime, even at the regional level. 12 Hence, moving from one region to another is a substantial act, and cultural biases may influence such a decision even more strongly than, say, they would the decision to trade goods with someone from a different region.

Current regional migration data
We use data on pair-wise gross migration flows for the 439 German districts averaged over the period 2000-2006 as provided by the German Federal Statistical Office. 13 Table 1 provides an overview of these data and points out two basic facts about internal migration flows in Germany. First, across all regional pairs, there has been some gross migration in more than 96% of all cases. That is, migration occurs not only from economically poor to rich regions, but also in the other direction. 14 This suggests that individuals are heterogeneous in their perceptions of different regional characteristics when making location decisions. Second, Table 1 indicates that migration flows in Germany are rather small. The average annual gross migration flow between a pair of regions was seven migrants per 100,000 inhabitants in the district of origin, which implies a total gross emigration rate of only 3% for the typical German district. This low number suggests that the costs of cross-regional migration are substantial. In particular, these migration costs are distance dependent as the data clearly indicate larger flows over short than over long distances.
11 In fact, gravity was applied to migration flows even before it was used to investigate trade flows. The earliest reference is Ravenstein (1885). Other important contributions include Schwartz (1973) and Greenwood (1975).
12 Using Japanese data at the prefecture level from 1954-2005, Nakayima and Tabuchi (2008) report that individuals in Japan move on average only 2.3 times during their lifetime.
13 In Germany, every person who changes his or her place of residence is legally required to register at the new residence within at most two weeks (even earlier in some states). The migration data are thus very accurate.
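The step from the average pair-wise flow to the 3% emigration rate can be made explicit. The sketch below assumes that the rate is obtained by summing the average flow of seven migrants per 100,000 inhabitants over the 438 possible destinations of a district; the function name is illustrative.

```python
N_DISTRICTS = 439            # number of German districts in the data
AVG_PAIR_FLOW_PER_100K = 7   # average annual flow per origin-destination pair


def total_emigration_rate(avg_pair_flow_per_100k, n_districts):
    """Share of a district's inhabitants leaving for any other district."""
    destinations = n_districts - 1
    return avg_pair_flow_per_100k * destinations / 100_000


rate = total_emigration_rate(AVG_PAIR_FLOW_PER_100K, N_DISTRICTS)
print(f"{rate:.2%}")  # 3.07%
```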
The simple gravity equation accounts for both these basic facts of internal migration: it features two-way gross flows (which can be larger than net flows), and it takes into account that individuals are heterogeneous and face distance-dependent mobility costs should they decide to move.

The model
Our gravity equation for gross migration flows is derived from a simplified version of the economic geography models with locational taste heterogeneity by Murata (2003) and Tabuchi and Thisse (2002). Consider a country that consists of
The variable $u_r$ stands for the economic level of well-being in region r. This includes the local wage level, unemployment rate, price level, etc. This economic level of well-being is the same for all individuals in a region. For our purposes it suffices to think of $u_r$ as being exogenously given. That is, we abstract from market interactions and assume for the sake of simplicity that the regional levels of economic well-being do not respond to the location decisions of the workers. The term $\varepsilon_r^h$ in Equation (1) is an idiosyncratic term for individual h and region r capturing his or her perception of the attributes and characteristics associated with that particular region.
14 The presence of two-way gross migration flows is not easily reconciled with standard models of regional labor mobility (e.g., Krugman 1991) that predict only one-way migration flows. Furthermore, there is a large literature on net internal migration flows (e.g., Pissarides and McMaster 1990) showing that net flows tend to be directed toward areas that offer good job prospects, high wages, low unemployment rates, etc.
As shown in Anderson et al. (1992:ch. 3), this type of individual taste heterogeneity can be modeled such that the actual matching value between a worker and region is the realization of a random variable. We follow this modeling strategy and assume that $\varepsilon_r^h$ is distributed i.i.d. across individuals and regions. Furthermore, we adopt the standard parameterization of a double exponential distribution,
The larger $\beta$, the more heterogeneous are the individual attachments to the regions. If $\beta \to 0$,
The mobility costs are region-pair-specific. They include standard pecuniary mobility costs (for moving furniture, finding accommodation, etc.), which are denoted by $d_{rs}$ and will be approximated by physical distances or travel time across regions. We also incorporate, in the spirit of Sjaastad (1962), non-pecuniary costs of migration at the region-pair level, denoted $l_{rs}$, which capture the psychic costs of moving to a culturally unfamiliar environment. In the empirical analysis, we measure cultural mobility costs by the historical dialect similarity. We assume the following specification: [ ] where we add a standard error term $e_{rs}$. Notice that
15 Such a specification is standard practice in the gravity literature in international trade. The fixed effects capture all impact variables that vary only at the regional level in our cross-sectional analysis, such as wages and housing prices, as well as time-invariant unobservable regional features. This fixed effects specification also takes into account the problem of interdependent flows in a multi-region economy (Anderson and van Wincoop 2003). As shown by Feenstra (2004) in the context of trade flow analysis, this fixed effects specification allows for a consistent estimation of region-pair-specific impacts such as mobility costs.
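To illustrate how double exponential taste heterogeneity translates into two-way flows, the following sketch computes location choice probabilities of the standard multinomial logit form. This is an assumption on our part: the model equations themselves are not reproduced in this excerpt, so the functional form, the additive treatment of mobility costs, and all names and numbers below are hypothetical.

```python
import math


def migration_shares(u, costs_from_r, beta):
    """Logit shares of residents of one origin choosing each region s.

    u[s]            : economic well-being of region s
    costs_from_r[s] : mobility cost from the origin to s (0 for staying)
    beta            : scale of the taste shocks (beta > 0); a larger beta
                      means more heterogeneous attachments to regions
    """
    if beta <= 0:
        raise ValueError("beta must be positive")
    v = [(u_s - c) / beta for u_s, c in zip(u, costs_from_r)]
    v_max = max(v)  # subtract the maximum for numerical stability
    weights = [math.exp(x - v_max) for x in v]
    total = sum(weights)
    return [w / total for w in weights]


# Three hypothetical regions; the origin is region 0 (zero cost of staying).
u = [1.0, 1.2, 1.2]
costs = [0.0, 0.5, 1.5]  # region 2 is the costlier destination
low_heterogeneity = migration_shares(u, costs, beta=0.2)
high_heterogeneity = migration_shares(u, costs, beta=2.0)
```

With a small `beta` almost everyone picks the region with the highest net payoff, whereas a larger `beta` spreads residents across all regions, so that gross flows occur in both directions even between regions that differ in economic well-being.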

Baseline results
We estimate the gravity equation (Equation (3)) by ordinary least squares with origin and destination fixed effects. Table 2 reports the results. They show that dialect similarity has a positive and highly statistically significant effect on gross regional migration flows. When including only dialect similarity without controlling for geographical distance, as in specification 1, we find a sizable (scaled) elasticity with a value around 2.2. That is, doubling the historical dialect similarity between two districts, all else equal, would lead to an increase of the gross migration flows between those regions by more than 220%. This specification thus indicates that there are sizable cultural mobility costs that impede internal migration in Germany. The results are similar for working-age migration (see panel b).
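As an illustration of the estimation strategy, the following sketch fits a log-linear gravity equation with origin and destination dummies by ordinary least squares on synthetic data. Everything in it is an assumption made for the example: the number of regions, the data-generating process, and the "true" coefficients `true_a1` (distance) and `true_a2` (dialect similarity) are invented and are not the paper's data or estimates.

```python
import numpy as np

rng = np.random.default_rng(0)
R = 12                                        # synthetic number of regions
pairs = [(r, s) for r in range(R) for s in range(R) if r != s]
n = len(pairs)

# Hypothetical data-generating process (all variables in logs)
fe_o = rng.normal(size=R)                     # origin fixed effects
fe_d = rng.normal(size=R)                     # destination fixed effects
log_dist = rng.normal(size=n)
log_sim = -0.5 * log_dist + rng.normal(size=n)   # similarity falls with distance
true_a1, true_a2 = -1.4, 0.2                  # invented coefficients
log_flow = (
    np.array([fe_o[r] + fe_d[s] for r, s in pairs])
    + true_a1 * log_dist
    + true_a2 * log_sim
    + 0.1 * rng.normal(size=n)
)

# Design matrix: pair-level regressors plus origin and destination dummies.
# One destination dummy is dropped to avoid perfect collinearity.
D_o = np.zeros((n, R))
D_d = np.zeros((n, R))
for i, (r, s) in enumerate(pairs):
    D_o[i, r] = 1.0
    D_d[i, s] = 1.0
X = np.column_stack([log_dist, log_sim, D_o, D_d[:, 1:]])

coef, *_ = np.linalg.lstsq(X, log_flow, rcond=None)
a1_hat, a2_hat = coef[0], coef[1]             # distance and similarity elasticities
```

The dummies absorb anything that varies only by origin or only by destination, so the pair-level coefficients are identified from variation across region pairs alone.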
Geographical distance: As illustrated by the examples in Section 2, dialect similarity is correlated with geographical distance, which per se is likely to have a negative impact on migration flows. To address this issue we first separately study the impact of geographical distance without considering dialect similarity. In specification 2 we use the linear physical distance between the centers of the source and the destination district as our proxy for pecuniary mobility costs. The results show that doubling the physical distance between two regions, all else equal, drives down gross migration flows by roughly 140-150%. In specification 3 we use an alternative distance measure, namely, the travel time by car between any pair of regions (in minutes), which may better capture the true regional accessibility. The results indicate that the elasticity with respect to travel time (176-178%) is a bit larger than for physical distance, which is intuitive as the latter might not always match the shortest travel distance due to natural barriers like rivers or mountains. When including both measures at the same time (as in specification 4), it turns out that most of the negative impact is captured by physical distances, with travel time having some small additional impact. Altogether, these findings on the detrimental effect of geographical distance on migration flows are consistent with the previous literature on internal migration (see, e.g., Greenwood 1975).
Notes (Table 2): [...] (6) and (7). Geographical distance, travel time, and dialect similarity are in logs in all specifications. Robust standard errors are reported in parentheses. *** statistically significant at the 1% level; ** statistically significant at the 5% level; * statistically significant at the 10% level.

Dialect similarity and geographical distance:
The important question is whether the positive effect of dialect similarity on migration flows prevails once we control for geographical distance. In specification 5 we simultaneously include dialect similarity and both proxies of pecuniary mobility costs. As can be seen, the coefficient $\alpha_2$ drops substantially compared to column 1, which is due to the correlation of linguistic and geographical distance. However, even conditional on geographical distance (and origin and destination fixed effects), we find a positive and highly significant effect of dialect similarity on gross migration flows. 16 The estimated elasticity ranges between 18% and 20% and is similar for total and for working-age migration. This elasticity in column 5 of Table 2 is the benchmark result of our empirical analysis. 17
Heteroskedasticity and zero flows: Columns 6 and 7 of Table 2 [...] in the estimation of gravity equations (see Disdier and Head 2008; Helpman et al. 2008). As shown in Section 3, zero gross migration flows across German districts account for less than 4% of all cases and therefore would appear to be a minor issue. Nevertheless, we tackle this potential problem by employing a two-stage Heckman procedure that uses a non-linear probit equation for selection into migration in the first stage, and then estimates Equation (3) in the second stage. 18 In the PPML estimation (see column 6), the elasticity with respect to dialect similarity is around 11% and thus somewhat lower than in the benchmark specification. The two-step Heckman selection model (column 7) yields estimates that are very similar to the benchmark. All in all, these specifications confirm the positive and significant effect of historical dialect similarity on current bilateral migration flows across German regions.
16 In the literature on how genetic similarities affect international trade flows, Giuliano et al. (2006) argue that there may actually be no such effects once transport costs across countries are properly controlled for. Our estimation in column 5 takes such issues into account because actual travel time across regions can be thought of as an analogue of actual transport costs for goods.
17 Crossing the border of a federal state (Bundesland, NUTS1-region) may systematically increase pecuniary mobility costs, e.g., because of different regulations and laws applicable to various occupational groups. It is, for instance, more difficult for teachers or lawyers to change jobs across than within a state. To take this issue into account, we also considered a specification with a dummy variable that equals unity if the source and the destination region are not located within the same state. Results show that state borders significantly reduce gross migration flows. The effect of historical dialect similarity hardly changes, however.
18 We thus rely on the normality assumption for identification of our second-stage estimates.

Linguistic similarity index:
Table 2 additionally shows that our results are also robust with respect to the linguistic similarity index. We replace the simple count index with the similarity index by Jaccard (1901) in column 8, and with the similarity index by Kulczynski (1927) in column 9, while returning to ordinary least squares estimation. 19 Regardless of which similarity index we use, our results are very similar to the benchmark specification.
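To make the alternative similarity indices used in the robustness check concrete, the sketch below computes a simple count index, the Jaccard index, and a Kulczynski-type index for two regions. It rests on several assumptions: dialect features are coded here as hypothetical binary presence/absence vectors, the function names are invented, and the Kulczynski variant shown is the second Kulczynski coefficient. This section does not state which variant or which feature coding the paper uses.

```python
def count_similarity(x, y):
    """Number of features on which two regions share the same realization."""
    return sum(1 for a, b in zip(x, y) if a == b)

def jaccard(x, y):
    """Jaccard index for binary vectors: shared features / features present in either."""
    both = sum(1 for a, b in zip(x, y) if a and b)
    either = sum(1 for a, b in zip(x, y) if a or b)
    return both / either if either else 0.0

def kulczynski(x, y):
    """Second Kulczynski coefficient (assumed variant): mean of the two match rates."""
    both = sum(1 for a, b in zip(x, y) if a and b)
    nx = sum(1 for a in x if a)
    ny = sum(1 for b in y if b)
    if nx == 0 or ny == 0:
        return 0.0
    return 0.5 * (both / nx + both / ny)

# Hypothetical presence (1) / absence (0) of six dialect features in two regions
region_r = [1, 1, 0, 1, 0, 0]
region_s = [1, 0, 0, 1, 1, 0]

sim_count = count_similarity(region_r, region_s)
sim_jaccard = jaccard(region_r, region_s)
sim_kulczynski = kulczynski(region_r, region_s)
```

The count index treats joint absence of a feature as a match, whereas the Jaccard and Kulczynski indices only reward jointly present features, which is why swapping indices is a meaningful robustness check.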
Effect heterogeneity: In Table 3 we investigate the effect of dialect similarity on migration flows for different types of regional pairs, where local populations may vary systematically in their view of cultural differences. In particular, we divide the 439 German districts into 178 urban and 261 peripheral regions. Since we can observe two-way gross migration flows for each pair of regions, we can create four categories of flows: urban-to-urban (U-U), peripheral-to-peripheral (P-P), urban-to-peripheral (U-P), and peripheral-to-urban (P-U). We then estimate Equation (3) separately for each sample.
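The construction of the four subsamples can be sketched as follows. The district identifiers and their urban/peripheral status are hypothetical; only the labeling of directed flows follows the description above.

```python
from collections import defaultdict

def flow_category(origin_is_urban, dest_is_urban):
    """Label a directed flow as U-U, P-P, U-P, or P-U."""
    origin = "U" if origin_is_urban else "P"
    dest = "U" if dest_is_urban else "P"
    return f"{origin}-{dest}"

# Hypothetical districts and their urban (True) / peripheral (False) status
is_urban = {"d1": True, "d2": True, "d3": False, "d4": False}

# Every ordered pair of distinct districts is one gross migration flow
samples = defaultdict(list)
for r in is_urban:
    for s in is_urban:
        if r != s:
            samples[flow_category(is_urban[r], is_urban[s])].append((r, s))
```

Because flows are directed, the U-P and P-U samples contain the same region pairs in opposite directions, and the gravity equation would then be estimated on each of the four lists separately.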
Notice that the U-U and P-P samples consist of more homogeneous pairs of regions than the U-P and P-U samples. These four different samples thus permit us to investigate whether the impact of dialect (cultural) similarity on migration decisions is dependent on whether the source and the destination area are heterogeneous or homogeneous, and the distinction of urban and peripheral regions seems to be the most natural division to capture this type of effect heterogeneity. The results in Table 3 suggest that the impact of dialect similarity on migration is rather similar in all cases. It is a bit lower for the P-P group, but we consistently find a positive and significant impact of cultural similarity for all types of cross-regional migration flows. 20 Cultural differences therefore seem to affect all types of migration decisions in a similar way.

Discussion:
The results reported in Tables 2 and 3 imply that an individual who decides to migrate today, all else equal, will prefer a destination characterized by a dialect similar to the one prevalent in his or her source region more than 120 years ago. How to interpret this finding? We argue that these results point to significant cultural mobility costs, which impede internal migration flows in Germany. That is, our empirical findings indicate that individuals dislike moving to culturally unfamiliar environments, and current cultural differences between German regions can be well measured by historical dialect differences.
Notes (Table 3): [...] column (2) we consider migration flows where the origin and destination are both "peripheral" regions. In column (3) we consider urban-to-peripheral, and in column (4) we consider peripheral-to-urban migration flows. "Urban" regions are defined as regional types 1-5 in the classification system of the German Federal Board for Regional Planning (BBR). "Peripheral" areas are defined as regional types 6-9. Robust standard errors are reported in parentheses. *** statistically significant at the 1% level; ** statistically significant at the 5% level; * statistically significant at the 10% level.
This interpretation rests on two important conditions. First, it requires that dialect differences are a good measure for cultural differences across regions that are persistent over time. Second, it supposes a causal effect of dialect (cultural) differences on migration, rather than a persistence of migration flows that has affected the geography of dialects. We now turn to several robustness checks that specifically address these estimation concerns and shed light on the economic interpretation of our results.

Omitted region-pair-specific and region-specific characteristics
With respect to the first estimation issue, it should be noted that time persistence of dialect differences per se seems to be a very reasonable supposition. Certainly, there has been some linguistic diffusion during the 20th century, and dialect use is less common today than it was when Georg Wenker collected the linguistic data. One factor behind this diffusion is the migration that has occurred since that time. During the 20th century, migration became an increasingly individual phenomenon, and even if the migration of individuals does not cause perturbations as major as those that resulted from the mass migrations of earlier times, it still contributes at least something to the local language mix. The ubiquity of modern mass media may be another factor that has facilitated linguistic diffusion. However, even if these developments led to some assimilation across regions, they have certainly not completely nullified local dialect differences. 21 It is therefore not surprising that linguists frequently note a close correspondence between current and historical dialect characteristics in Germany (see, e.g., Bellmann 1985:213). What is more, dialect differences today may be absolutely smaller than they were in the 19th century, but the diffusion processes described above are not markedly region-pair-specific. That is, the relative linguistic differences across regions are particularly likely to have endured.
21 Although cultural evolution progresses faster than genetic evolution, a period of 120 years is still much too short to erase all regional cultural differences given the enormous degree of inertia inherent in evolutionary processes. Recall the Waldshut example from Section 2, which illustrated the stability of religious orientation over the period 1546-1820. If one were to draw a map of the religious geography of that area today, one would find a spatial pattern that is still strikingly similar to the one from 1546.
However, even if dialect differences are persistent over time, their impact may still be confounded with the effects of other persistent, but omitted, factors that drive contemporary migration and that are also correlated with historical dialect patterns. In that case our estimations would suffer from an omitted variable bias. Notice that our estimate for the dialect similarity elasticity should still be consistent as long as omitted variables are purely region-specific, as the fixed effects should take into account all persistent factors for the source and the destination area. A problem would clearly arise, however, if we omit relevant region-pair-specific variables. We therefore introduce additional region-pair-specific control variables in order to address this estimation concern.
Region-pair-specific control variables: We argued in Section 2 that dialect similarity reveals a spatial pattern that often corresponds to other types of historically determined congruencies between the regions, including religious orientation as illustrated by the Waldshut case. Another possible confounding factor is former administrative borders, since we emphasized above that the geography of dialect similarity is also correlated with the borders between the territories out of which the German Empire was created (as noted, e.g., by Barbour and Stevenson 1990). Dialect differences may thus simply capture the persistent effects these regional differences have on current migration flows.
To address this possibility, we control for differences in religious denominations in 1890, roughly the same time at which the linguistic data were collected. We define a dummy variable that equals unity if the majority of the population in the source region had a different religion than those in the destination region in the late 19th century. Furthermore, we include a dummy that equals unity if the current migration flow extends across a historical administrative border. More specifically, we consider the borders of 38 member states and 4 independent cities that were part of the German Confederation at the time of its foundation in 1815. These borders are a good representation of the politically fragmented environment that prevailed until the German Empire was established.
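The two region-pair-specific controls can be sketched as simple indicator functions. The region identifiers, attribute names, and values below are hypothetical placeholders; only the rule (dummy equals one when the two regions differ) follows the definitions above.

```python
# Hypothetical region attributes: majority denomination in 1890 and
# the 1815 German Confederation territory the region belonged to.
regions = {
    "r1": {"religion_1890": "Protestant", "state_1815": "state_A"},
    "r2": {"religion_1890": "Catholic", "state_1815": "state_A"},
    "r3": {"religion_1890": "Catholic", "state_1815": "state_B"},
}

def pair_controls(r, s):
    """Return the two dummies for a migration flow from region r to region s."""
    a, b = regions[r], regions[s]
    return {
        # 1 if the majority denominations differed in 1890
        "religion_differs": int(a["religion_1890"] != b["religion_1890"]),
        # 1 if the flow crosses a historical (1815) administrative border
        "historical_border": int(a["state_1815"] != b["state_1815"]),
    }

controls_12 = pair_controls("r1", "r2")
controls_23 = pair_controls("r2", "r3")
```

Both dummies are symmetric in the pair, so they take the same value for the flow from r to s and for the flow from s to r.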
If cultural differences between current German regions are manifested mainly along those religious and political lines, and if dialects simply pick up these persistent effects, we would expect the elasticity of migration with respect to dialect similarity to turn insignificant (or at least to drop substantially) once we include these additional control variables.

Results and discussion:
In columns 1 and 2 of Table 4 we control for the new variables separately; they are considered jointly in column 3. The results suggest that there is significantly more current migration between regions with historically different religious denominations, while historical administrative borders exert a negative impact on current migration flows. The main insight of Table 4, however, is that the effect of historical dialect similarity hardly changes. These results underline our previous argument that dialect similarity is a well-suited comprehensive measure of regional cultural similarity. Our linguistic measure does not merely reflect obvious religious or political congruencies that are correlated with the geography of dialects, but seems to capture many more dimensions of cultural similarity across German regions. 22 Thus, although we can never be sure that we have ruled out all possible omitted variables at the region-pair level, our empirical approach seems to come as close as possible to correctly measuring persistent cultural differences across German regions.
Notes (Table 4): This table reports OLS results with fixed effects for both origin region r and target region s. In columns (1) and (3) we control for differences in religious denominations in 1890 by including a dummy variable that equals unity if the majority of the population in the source region had a different religion than those in the destination region. In columns (2) and (3) we include a dummy that equals unity if the current migration flow extends across a historical administrative border between 38 member states and 4 independent cities that were part of the German Confederation at the time of its foundation in 1815. Robust standard errors are reported in parentheses. *** statistically significant at the 1% level; ** statistically significant at the 5% level; * statistically significant at the 10% level.

Persistence of migration flows
Turning now to the second estimation concern discussed in Section 4.1, the question remains whether we can interpret our main finding as a causal effect of cultural similarity on internal migration. Even though our estimation certainly does not suffer from a simultaneity problem, due to the long time lag between the dialect and the contemporary migration data, there is still the concern that migration flows may be persistent over time and have, inter alia, shaped the geography of dialects.

Network effects and social interactions:
One intuition for such a persistence can be network effects and social interactions in migration. 23 In a long-run dynamic perspective, social interactions may result in a clustering of migrants from the same source region at the same destination region. Suppose that at the time Georg Wenker collected the linguistic data (in the late 19th century) there was already a previously established migration connection between particular pairs of regions. Say, families in some region r can draw on an already existing network of social contacts in some other region s, as well as vice versa, and these network effects constantly influence migration decisions. This would lead to a correlation of current region-pair-specific migration flows with the flows from 120 years ago and, in turn, even with flows from earlier times. If this is so, the prediction would be that dialect distance slowly disappears between the source and destination regions experiencing high migration exchange. Dialect similarity would then not actually cause contemporary migration, but persistent migration would lead to dialect assimilation. Our estimations would then capture a spurious correlation.
22 The other time-persistent factors may influence today's regional migration via other channels than cultural identity. In particular, the positive effect of religious differences on migration may capture an enduring prosperity difference between Catholic and Protestant areas, which was recognized early on by Max Weber and studied further by Becker and Woessmann (2009). Moreover, we find that the historical border dummy turns insignificant when we add current administrative borders in the same way as described in footnote 17. This suggests that current and historical borders overlap, so that the historical borders partly capture the negative impact of Federal State borders on migration that operates via an increase in pecuniary mobility costs.
23 Network effects in migration are extensively studied both theoretically (Carrington et al. 1996) and empirically (Munshi 2003; McKenzie and Rapoport 2007; Woodruff and Zenteno 2007; Chen et al. 2010).
To answer the question of whether the positive effect of historical dialect similarity on current migration flows can be attributed to persistent cultural differences rather than persistent migration flows, we can turn to a quasi-natural experiment in German history. Put differently, if our baseline findings only reflect the persistence of migration flows, we would expect to find no (or at least substantially lower) effects of dialect similarity on contemporary migration flows within a subsample of migration flows across the inner German border only. By contrast, if we still find a positive effect of dialect similarity on contemporary migration flows for these cases, this would suggest that cultural identity at the regional level really is persistent over time and actually does affect migration decisions. Table 5 shows the results for the East-West and the West-East subsamples and, in fact, the coefficient of language similarity is still significantly positive and of similar magnitude to that in the benchmark specification. These results are thus much more in line with a persistent causal effect of cultural similarity on migration flows than with the opposite causality of persistence in migration flows.
Geological regional features and persistence over the very long run: In the last step of the analysis, we investigate another possible source of persistence in migration flows that may have caused the geography of dialects. Specifically, there may be deep regional differences that have persistently driven migration flows over the course of history and, thereby, also linguistic development. In particular, think of first-nature geographical features which have determined the economic prosperity of the regions over the very long run.
Salient candidates are indicators of a region's suitability for agriculture and forestry, both of which were major sources of wealth before the Industrial Revolution. As argued by Combes et al. (in press), soil characteristics can be regarded as a major determinant of local labor demand in an agrarian society. Accordingly, geological indicators for the suitability of the soil for agriculture and forestry should provide a meaningful insight into the distribution of regional wealth before the heyday of industrialization. These soil characteristics should then be related to ancient migration patterns. As regions with good soil tended to be economically prosperous, they were likely to attract mass migration waves, particularly from areas with bad soil characteristics. A similar point can be made for the slope of a region, which is also likely to have influenced agricultural productivity, hence regional prosperity, in former times. Slope may have had another effect on ancient migration patterns: transport routes probably avoided large differences in steepness or ruggedness.
If these very basic geological factors have affected migration waves over the very long run, they could also have influenced the spatial pattern of dialects in Germany. Specifically, the smaller the difference in soil quality and the larger the slope difference between two regions, the lower the probability that local populations interacted very often. This, in turn, may have resulted in less similar dialects between such regions. To the extent that these geological features still affect current regional migration, our estimations may be capturing a spurious correlation between dialect similarity and migration flows.
As argued in Section 4.2, the fixed effects specification of the gravity model should, in principle, take into account this potential problem. Consider a region with very favorable geographical features. The resulting pull effects on migration into that region, which have persistently occurred across time and may still occur today, should be captured in the estimation: The fixed effects should level all actual differences in economic prosperity between the origin and the destination, regardless of whether these differences have their origin in history or are the result of current developments. However, to complement this approach, we again create different subsamples of regions that limit the degree of heterogeneity of the respective source and destination areas. For pairs of regions with similar soil and slope characteristics, we may expect very long-run push and pull effects to matter relatively little. This may have led to few cross-regional contacts and therefore to little dialect assimilation over the very long run. In other words, if we find that dialect similarity matters for current migration also for these homogeneous pairs of regions, then a long-run persistence of migration flows is unlikely to be the reason. Such a finding would rather suggest that we actually capture a causal effect of cultural similarity on migration decisions.
To address this issue, we sort regions into those with "good" soil and those with "bad" soil. Good soil is suitable and imposes no limitations for agriculture, whereas bad soil imposes such limits because the soil is overly gravelly, stony, or lithic. 25 Using this classification scheme, we can create subsamples of regional pairs and separately study migration flows for cases where both the source and the destination area have good soil, where the source has bad but the destination has good soil, etc. A similar approach is adopted to distinguish between regions with different slope characteristics. Slope is measured as the difference between the maximum and minimum elevation in meters within a region. We can then classify "steep" (above average) and "flat" (below average) regions and create appropriate samples of regional pairs. The results of our gravity estimation for these samples of regional pairs are reported in Tables 6a and 6b, respectively.
As can be seen, the results are qualitatively similar for all the considered samples. That is, even for those cases where source and destination area are relatively homogeneous in their geographical features, we find a positive and significant impact of dialect (cultural) similarity on current gross migration flows (see columns 1 and 2 of Tables 6a and 6b). This again suggests that our estimation results are not capturing a spurious correlation, but reflect a causal effect of persistent cultural differences on current gross migration flows across German regions.
25 We are deeply indebted to Gilles Duranton for providing the data for these indicators (see the Appendix and Combes et al. for a more detailed description). To use current indicators of soil quality we need to assume that soil characteristics have not changed during the past centuries, and there are good reasons to believe that this condition is met by our binary distinction between good and bad soil. We also tried a variety of other indicators related to the climate and soil of a region, but this did not crucially affect our empirical results.
Notes (Table 6a): [...] column (2) we consider migration flows where the origin and destination both have bad soil quality. In column (3) we consider migration flows from regions with good to regions with bad soil quality, and in column (4) we consider migration flows from regions with bad to regions with good soil quality. "Good soil quality" refers to regions with no limitations to agricultural use according to the European Soil Database (esdb) compiled by the European Soil Data Centre. "Bad soil quality" refers to regions with one or more limitations to agricultural use. Robust standard errors are reported in parentheses. *** statistically significant at the 1% level; ** statistically significant at the 5% level; * statistically significant at the 10% level.
Notes (Table 6b): [...] column (2) we consider migration flows where the origin and destination both are flat regions. In column (3) we consider migration flows from regions with steep slope to regions with flat slope, and in column (4) we consider migration flows from regions with flat slope to regions with steep slope. For each region, slope is measured as the difference between the maximum and minimum elevation in meters. We then classify a region with above-average slope as "steep", and with below-average slope as "flat". Robust standard errors are reported in parentheses. *** statistically significant at the 1% level; ** statistically significant at the 5% level; * statistically significant at the 10% level.

Conclusion
In this paper, we evaluate detailed linguistic micro-data from the 19th century on the intranational variation of phonological and grammatical attributes within the German language. We find an economically meaningful effect of historical dialect similarity on current regional migration flows. As illustrated above, dialects were shaped by past interactions, prior mass migration waves, religious and political divisions, ancient routes and transportation networks, and so forth. Dialects act as a sort of regional memory that comprehensively stores such information. Consequently, language variation is probably the best measurable indicator of cultural differences that one can come up with.
Our findings imply that there are intangible cultural borders within a country that impede economic exchange across its regions. These intangible borders are enormously persistent over time; they have been developed over centuries, and so they are likely to be there also tomorrow. Even on a low geographical level people seem to be unwilling to move to culturally unfamiliar environments. The average Bavarian will not easily move to Saxony, nor vice versa, unless he or she is compensated by considerably better economic prospects or job opportunities in the other region. The existence of cultural borders thus clearly limits the integration of the national labor market.
It is beyond the scope of this paper to discuss whether it is possible, or desirable, to downsize such borders. Policy initiatives in the European Union aiming for a preservation of regional languages tend to suggest that there is currently no interest in cultural equalization, but rather that linguistic diversity is perceived as valuable for a society. It is thus a natural extension for future research to explore the welfare consequences of cultural differences at a low geographical level in greater detail.

Geographical Distance
The geographical distance between two districts is calculated as the Euclidean distance between each pair of districts' centroids.
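A minimal sketch of this distance calculation is given below. It assumes that centroids are available as planar (projected) coordinates in kilometers; the coordinate values and the function name are hypothetical.

```python
import math

def euclidean_distance(centroid_r, centroid_s):
    """Straight-line distance between two centroids given as (x, y) tuples."""
    dx = centroid_r[0] - centroid_s[0]
    dy = centroid_r[1] - centroid_s[1]
    return math.hypot(dx, dy)

# Hypothetical projected centroid coordinates in kilometers
centroid_a = (0.0, 0.0)
centroid_b = (3.0, 4.0)
distance_ab = euclidean_distance(centroid_a, centroid_b)
```

The measure is symmetric, so the same value applies to the flow from a to b and from b to a.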

Historical Border Dummy
Historic borders refer to 38 member states and 4 independent cities that were part of the German Confederation at its foundation in 1815. Data are taken from a map in Putzger: Historischer Weltatlas, 89th edition, 1965. The dummy equals unity if a region pair does not belong to the same historic state.

Religious border dummy (1890)
The districts' historic shares of Catholics and Protestants in 1890 are calculated from a map in Meyers Konversations Lexikon, 4th edition, 1885-1892. The dummy equals unity if a region pair has different religious affiliations, i.e., an above-average share of Catholics in one region and an above-average share of Protestants in the other.

Soil
Soil concerns the main limitation to agricultural exploitation. The variable distinguishes between regions that have no limitation to agriculture and regions that have limitations due to less suitable soil characteristics:
1 no limitation to agricultural use
2 gravelly (over 35% gravels diameter < 7.5 cm)
3 stony (presence of stones diameter > 7.5 cm, impracticable mechanization)
4 lithic (coherent and hard rock within 50 cm)
5 concretionary (over 35% concretions diameter < 7.5 cm near the surface)
6 saline (electric conductivity > 4 mS/cm within 100 cm)
7 others
For our purpose, we collapse all limitations and create a binary variable that distinguishes regions that are more or less suitable for agriculture. The data stem from the European Soil Database (esdb) and are compiled by the European Soil Data Centre.

Slope
Slope is measured as the difference between the maximum and minimum elevations in meters. Flat regions are regions with a below average slope while steep regions are characterized by an above average slope.
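The slope classification can be sketched as follows. The region names and elevation values are hypothetical, and assigning a region whose slope exactly equals the average to the "flat" group is an assumption, since the definition above does not cover that case.

```python
# Hypothetical (maximum elevation, minimum elevation) in meters per region
elevations = {
    "region_a": (900.0, 300.0),
    "region_b": (120.0, 20.0),
    "region_c": (450.0, 250.0),
}

# Slope = maximum minus minimum elevation within the region
slopes = {name: hi - lo for name, (hi, lo) in elevations.items()}
mean_slope = sum(slopes.values()) / len(slopes)

# Above-average slope -> "steep"; otherwise "flat" (ties go to "flat" by assumption)
slope_class = {
    name: "steep" if slope > mean_slope else "flat"
    for name, slope in slopes.items()
}
```

Pairs of regions can then be grouped by the classes of their origin and destination, analogous to the soil subsamples.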

Travel Distance
The travel distance is calculated in car minutes from one district's capital to the other.

Urban
This variable is based on a standard classification of German districts (siedlungsstrukturelle Kreistypen) according to their density and their spatial status (cf. Federal Office for Building and Regional Planning 2003). For our purpose, urban areas are districts characterized by a minimum city size of 100,000 inhabitants or a population density larger than 150 inhabitants per km². All other regions are classified as peripheral areas.