Original Article

Combining Labour Force Survey data to estimate migration flows: the case of migration from Poland to the UK

First published: 10 March 2016

Summary

In May 2004, Poland and seven other countries from central and eastern Europe joined the European Union. This led to massive emigration from Poland, especially to the UK. However, relatively little is known about the magnitude of migration flows after the 2004 enlargement of the European Union. In this paper, Labour Force Survey data from the sending and receiving countries are utilized in a Bayesian model to estimate migration flows. The estimates are further combined with the output of the ‘Integrated modelling of European migration’ model. The combined results, with accompanying measures of uncertainty, can be used to validate other reported estimates of migration flows from Poland to the UK.

1 Introduction

Increasing international migration is an important factor in shaping population structures, influencing social, cultural, political and economic aspects of developed countries, including those of the European Union (EU). After the enlargement of the EU in May 2004, the UK, along with the Republic of Ireland and Sweden, fully opened its labour market to the citizens of the new member states. Together with an estimated labour shortage of 600000 people in the UK (Fihel and Piętka, 2007), this opening triggered a massive influx of Poles, primarily to the UK (Grabowska-Lusińska and Okólski, 2009). For example, according to the UK's 2011 census, there were 579000 Polish-born residents in England and Wales, compared with 58000 in the 2001 census (Office for National Statistics, 2012a).

Little is known about migration dynamics in between the censuses and after the 2004 enlargement of the EU. Our knowledge is limited by the poor quality of the data collected on migration flows (Poulain et al., 2006; Kupiszewska and Nowok, 2008). The other available estimates, such as those provided by De Beer et al. (2010) or Grabowska-Lusińska and Okólski (2009), are not free of shortcomings, such as a lack of measures of uncertainty.

The contribution of this paper is threefold. First, a Bayesian model for estimating international migration flows with UK Labour Force Survey (LFS) data is developed. The model is applied to estimate migration flows from Poland to the UK in the years 2002–2007. Second, it is demonstrated how the estimates based on the LFS data can be combined with other estimates obtained in a Bayesian model, such as those developed in the project ‘Integrated modelling of European migration’ (IMEM) (Raymer et al., 2013; Wiśniowski et al., 2013). Third, the combined estimates are used to assess the uncertainty about the size of migration flows from Poland to the UK after the enlargement of the EU and to validate figures that have been obtained from other sources, such as official migration statistics, the worker registration scheme (WRS), the national insurance number (NINO) database and the estimates of Grabowska-Lusińska and Okólski (2009).

In general, the method for estimating migration flows based on the LFS data can be used as a tool for validating and supplementing officially produced statistics. The results should not be treated as an alternative to the official data, but they have considerable potential for enhancing the available information about migration or imputing missing observations. Moreover, it may be expected that the quality of the data on migration from large-scale surveys, such as the LFS, will improve over time and allow more trustworthy estimates of migration flows to be produced (Knauth, 2012).

The paper is structured as follows. In Section 2, an overview of measurement of migration in Poland and the UK is presented, alongside a review of recent efforts in improving migration statistics and the use of the LFS in measuring migration. Section 3 contains a description of the data on migration flows that can be identified in the LFS. In Section 4, a model for estimating migration flows based on the LFS data is developed. The outcome is further combined with the output of the IMEM model. Section 5 provides a summary of findings with an outline of further research.

2 Background

2.1 Review of migration statistics

Collection and compilation of migration data in the EU are governed by Regulation (EC) No. 862/2007 of the European Parliament and of the Council of July 11th, 2007. It introduces a set of common rules and definitions for statistics on migration. It also allows the use of statistical methods for producing migration statistics (Article 9). The definition of a migrant corresponds to the definition that was recommended by the United Nations (1998). According to it, migration occurs when a migrant changes his or her place of usual residence for between 3 and 12 months (short-term migration) or at least 12 months (long-term migration).

In general, availability and quality of migration statistics in the EU are problematic (e.g. Bilsborrow et al. (1997), Poulain et al. (2006), Kupiszewska and Nowok (2008) and Kupiszewska et al. (2010)). The most important reasons for this are problems with
  1. availability, i.e. data not being collected or disseminated,
  2. reliability, resulting from underreporting of migration or imperfect coverage, and
  3. comparability, i.e. different definitions of migration being used by various countries or over time.

A lack of high quality statistics considerably limits the ability to analyse migration in the EU and its contribution to the population change.

Migration flow data collected by the UK's Office for National Statistics and the Polish Central Statistical Office are not free of shortcomings. In the UK, except for the census, the methods that are currently applied to measure flows fail to provide reliable data (Singleton et al., 2010), despite the efforts that have been undertaken by the authorities to improve the quality of data (Singleton et al., 2010; Raymer et al., 2011, 2012; Office for National Statistics, 2012b). The most prominent reason is the small sample size of the International Passenger Survey (IPS), the survey used to collect flow data, which hinders estimation of the required characteristics of migrants. The alternative sources are the database of national insurance numbers and, historically, the WRS. However, both suffer from delayed registrations and underreporting (Raymer et al., 2012).

The availability and quality of migration statistics in the Polish population register are considered ‘very poor’ (Kupiszewska, 2009). The most important reason for this is the definition of a migrant, which is based on the permanent duration of stay criterion, in comparison with the 12 months or more in the UK. Further, data are deemed to be heavily biased by the lack of deregistration of emigrants (Nowak et al., 2007). Even though various initiatives have been undertaken to improve the quality of statistics (e.g. Nowak et al. (2007) and Kostrzewa et al. (2010)), the above shortcomings of the Polish and the UK's migration statistics lead to huge discrepancies between the two sources. For instance, in 2004–2006, total emigration recorded by the Polish register was 22000, whereas the official flow of Polish immigrants to the UK was estimated to be 127000.

2.2 Improving migration statistics

Recently, there have been several attempts to improve the quality and comparability of the data on international migration in the EU. First, legislative attempts have been undertaken, which aim at introducing required changes in the legislation regarding collection, processing and dissemination of migration statistics. Here, the above-mentioned United Nations recommendations (United Nations, 1998) and Regulation No. 862/2007 are the best examples. Another endeavour is the ‘Migration statistics mainstreaming’ programme, which was adopted by Eurostat (Knauth, 2012; Kostrzewa et al., 2010) and devoted to improving sampling frames of large-scale surveys in the EU to enhance the quality of information that is collected on migrants.

The second approach to improving comparability of migration statistics is based on the available data on migration. Methods vary in scope and complexity. The method of Lemaître (2005) and Lemaître et al. (2006), partially implemented in the International Migration Outlook (e.g. Organisation for Economic Co-operation and Development (2006)), utilizes residence permit data to produce standardized statistics. These statistics cannot, however, capture returning nationals, movements within the EU or emigration (Kupiszewska et al., 2010).

Third, optimization procedures can be applied to migration data, such as in the project ‘Migration modelling for statistical analyses’ (MIMOSA) (De Beer et al., 2010; Raymer et al., 2011). MIMOSA provided a harmonized table of flows between the 31 countries in the EU and European Free Trade Association for 2002–2007. The method was refined by Abel (2010) to estimate completely missing data and to provide measures of uncertainty for them, and by DeWaard et al. (2012) to relax the assumption about the relative quality of data in various countries. In general, this approach relies on many ad hoc decisions, such as the choice of a benchmark country deemed to have reliable migration statistics. MIMOSA estimates of the Poland–UK flows remain at virtually the same level before and after the enlargement of the EU, which contradicts other observations, such as the official data or the 2011 census.

In the IMEM project, harmonized statistics with measures of uncertainty are produced by a Bayesian model (Raymer et al., 2013). The methodology integrates the available data on migration from both sending and receiving countries, covariate information and expert judgement. The IMEM results include interpretable parameters, which correct the inadequacies of the data and measurement problems: undercount, varying duration of stay criteria, coverage and accuracy of the data collection method. Similarly to the MIMOSA output, the IMEM estimates relate to migration for at least 12 months. In this paper, the IMEM results are subsequently combined with the proposed estimates that are based on the LFS data.

2.3 Labour Force Survey in measuring migration

Since the LFSs include questions about country of birth, nationality and year of arrival, they have been extensively utilized to measure migration stocks and to analyse the situation of migrants on the labour markets (e.g. Fihel and Piętka (2007), Anacka (2008), Shields and Price (1998), Blackaby et al. (2005), Dustmann et al. (2005), Drinkwater et al. (2009) and Khan (2009)).

The LFS data from Poland and the UK were utilized by Grabowska-Lusińska and Okólski (2009) to estimate the size of emigration from Poland to the UK and explain the reasons and possible consequences of the large outflow after the enlargement of the EU in 2004. They evaluated the size and structure of the flows, as well as changes in the composition of the stocks of migrants in the receiving countries. However, a scarcity of migrants in the samples may lead to a large sampling error of their estimates, which is not reported.

The quality and comparability of migration statistics on stocks and flows, based on the LFS data from the 15 EU countries, were assessed by Martí and Ródenas (2007). They concluded that the LFS data can be used for compiling statistics on stocks of migrants, but not flows, which are usually underestimated. However, they seem to neglect differences in the definitions of a migrant that are used in official statistics and the consequences of applying different data collection techniques in various countries, as well as possible biases in the official data. Moreover, Kupiszewska et al. (2010) criticized the LFS for
  1. small sample sizes,
  2. high non-response rates, especially among foreigners, and
  3. the fact that immigrants may live in collective households which are usually not included in the sampling frames (see also Eurostat (2003)).

However, it is suggested that the LFS in Germany, Italy, Estonia and Switzerland may be suitable for measuring flows, as the sampling frames in these countries can potentially permit capturing migrants without bias.

In contrast, Rendall et al. (2003) concluded that the estimates based on the LFS samples correctly reflect the patterns of migration over time, but the levels are systematically underestimated by about 15–30%. Further, flows by a single country of origin may not be sufficiently accurate; hence, it is advised to aggregate the LFS data over groups of countries or over time.

In this paper, migration flows from Poland to the UK are estimated with the data from the corresponding LFSs in both countries. These flows are assumed to be sufficiently large to be captured in the samples. The model takes the possible bias into account. Finally, Bayesian inference provides a natural probabilistic framework for assessing the uncertainty of the estimates. Hence, it allows for validation of the existing statistics and estimates.

3 Data on migration flows in the Labour Force Survey

The LFS is one of the largest and oldest surveys carried out in the EU. Its primary objective is to obtain information on the labour market across all sectors of the economy. The survey targets a sample of individuals or households which is meant to be representative of the population in a given country (for details see, for example, Eurostat (2003)). Hereinafter, the abbreviation LFS denotes the survey without reference to a particular country; the LFS in Poland is denoted by its Polish abbreviation BAEL (Badanie Aktywności Ekonomicznej Ludności), whereas in the UK it is the British LFS (BLFS).

3.1 Measuring migration flows in the Labour Force Survey

Migration flows in the LFS can be measured by using the so-called transition-based, rather than event-based, approach (Rees, 1977). In the transition method, the population of a country is compared at two points in time, say t and t+1, rather than over a given period (t,t+1] as in the event approach. A person is considered an emigrant when he or she is present in the country at t but absent at t+1, unless this person died. Analogously, a person who is absent at t but counted in the country at t+1 is considered an immigrant, unless that is a newborn infant.
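The transition rule described above can be illustrated with a minimal sketch. The person identifiers and the function name `classify_transitions` are hypothetical and serve only to show the comparison of two population snapshots; this is not the procedure used to process the LFS files.

```python
def classify_transitions(present_t, present_t1, born=frozenset(), died=frozenset()):
    """Classify people by comparing who is in the country at t and at t+1."""
    present_t, present_t1 = set(present_t), set(present_t1)
    # Present at t, absent at t+1 and not deceased: emigrant.
    emigrants = present_t - present_t1 - set(died)
    # Absent at t, present at t+1 and not a newborn infant: immigrant.
    immigrants = present_t1 - present_t - set(born)
    return emigrants, immigrants


# Hypothetical snapshots of a tiny population.
emigrants, immigrants = classify_transitions(
    present_t={"anna", "bob", "carl"},
    present_t1={"bob", "dana", "eve"},
    born={"eve"},   # born between t and t+1
    died={"carl"},  # died between t and t+1
)
print(sorted(emigrants), sorted(immigrants))
```

A person who leaves and returns between the two snapshots leaves no trace in this comparison, which is the main practical difference from the event-based approach.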

Despite the large scale of the survey, migrants in the LFS usually constitute a tiny fraction of the total sample size. Consequently, migrations that can be captured in the LFS are very rare, which leads to a large sampling error. Apart from that, there are at least five other issues that may influence the measurement of flows in the LFS:
  1. non-response,
  2. imperfect coverage in the sampling frame,
  3. undercount due to emigration of entire households,
  4. using censuses for updating sampling frames and slow entry of migrants to it and
  5. non-sampling variability

(Eurostat, 2003; Rendall et al., 2003; Martí and Ródenas, 2007; Kupiszewska et al., 2010; Office for National Statistics, 2011). Shortcomings 1–4 usually lead to bias in the estimates, whereas shortcoming 5, as well as sampling variability, affects their accuracy. The other limitations of the LFS relate to differences between countries and over time in the survey design and sampling frames. For details on differences between the design of the BAEL and the BLFS see Eurostat (2012), Central Statistical Office (2009), Ker et al. (2009) or Office for National Statistics (2011).

3.2 Finding migrants in the Labour Force Survey

Both in the BAEL and in the BLFS, basic information about all members of a sampled household, irrespective of their age, is collected. Hence, information about migrants relates to the entire population. Immigrants can be identified in the BLFS by using country of residence 12 months before the survey. A person who resided in Poland 12 months before the survey is considered an immigrant. It can also be assumed that relocation to the UK took place within the last 12 months.

This information became available in the BLFS for all years under study around July 2015; previously, the variable had not been disseminated for years before 2005. However, there are discrepancies between the current metadata related to the updated variables and their descriptions in the data set. Also, there is a noticeable break in the series between the years before 2005 and those after, which may suggest that the undercount of Polish migrants in the BLFS was much larger before the expansion of the EU, or may result from the relatively large amount of missing data. After 2005, the question on country of residence 12 months before the survey is deemed to be more appropriate for the identification of immigrants. An alternative method of measuring migrants relies on country of birth and year of first arrival in the UK. Specifically, immigrants are people who are interviewed in the first quarter of a given year and who arrived within the previous year. These people have stayed in the UK for at most 12 months. The main flaw of this method is that it measures flows by country of birth of a migrant, rather than country of previous residence.

Until 2008 interviewers in the BLFS were instructed to survey only those who remained in the UK for more than 6 months (Ker et al. (2009), page 10). This introduces a lower limit of the minimum duration of stay. However, it can be argued that this criterion had no effect on the collection of basic information and, thus, the number of migrants captured. Additional confusion is caused by the fact that people who resided in the UK for less than 3 months had been interviewed in the BLFS before the minimal criterion was removed. The use of this criterion may introduce bias, the magnitude of which is difficult to assess.

To identify emigrants in the BAEL, information about country of residence of a person who is absent from the household is utilized, along with the time of absence, which can be either less than or more than 12 months. Information about absent members is gathered if they are absent for more than 2 or 3 months, as in 2007 a minimum of 3 months replaced the 2-month criterion. Further, a simplifying assumption is made that flows that are measured in this way relate to migration for between 3 and 12 months.

Immigrants for less than 3 months in the years 2002–2004 can be identified in a similar way to that described above. Specifically, in the sample from the second quarter of a reference year, people whose country of birth is Poland and who arrived in the UK in the year of the survey are selected as immigrants. After 2005, immigrants can be identified by using country of residence 3 months before the survey. Emigrants for less than 3 months in the BAEL can be identified by comparing the country of residence of a person between two surveys carried out in the two subsequent quarters in a given year. Information for this identification procedure is available since 2006.

Counts of migrants and sample sizes from the BLFS and BAEL, together with the population sizes, are presented in Table 1. The officially reported population sizes in Poland, urn:x-wiley:09641998:media:rssa12189:rssa12189-math-0001, and the UK, urn:x-wiley:09641998:media:rssa12189:rssa12189-math-0002, were obtained from the Eurostat database. The population size for Poland is likely to be overestimated because of the restrictive migration definition. However, sampling frames for the period analysed are based on extrapolations from the 2001 census (Central Statistical Office, 2009); thus, they match the sizes of the population reported to Eurostat. The number of emigrants from Poland to the UK captured in the BAEL, urn:x-wiley:09641998:media:rssa12189:rssa12189-math-0003, is similar to the counts urn:x-wiley:09641998:media:rssa12189:rssa12189-math-0004 of immigrants yielded by the BLFS, but the counts of emigrants are obtained from samples around half the size of those in the BLFS (the sample sizes are denoted by urn:x-wiley:09641998:media:rssa12189:rssa12189-math-0005 and urn:x-wiley:09641998:media:rssa12189:rssa12189-math-0006 for the BAEL and the BLFS respectively). Comparing the resulting rates of emigration and immigration from the BAEL and the BLFS respectively shows that the emigration rate is larger than the immigration rate. This may be a consequence of the 6 months’ minimal duration criterion, or of other biases in the BLFS data.

Table 1. Migrants for 3–12 months and less than 3 months in the BAEL and BLFS†

Each pair of columns gives the raw count of migrants and the sample size; population sizes are in millions.

Year | BLFS <3 months   | BLFS 3–12 months | UK pop.    | BAEL <3 months   | BAEL 3–12 months | Poland pop. | IPS ⩾12 months
     | migrants  sample | migrants  sample | (millions) | migrants  sample | migrants  sample | (millions)  | migrants  sample
2001 |        –       – |        4  140424 |      59.00 |        –       – |       17   60440 |       38.25 |        4  255184
2002 |        2  138816 |        7  134833 |      59.24 |        –       – |       15   60170 |       38.24 |        3  255307
2003 |        3  133473 |       28  130082 |      59.50 |        –       – |       32   59651 |       38.22 |        2  250860
2004 |        5  128524 |       39  126587 |      59.79 |        –       – |       70   58794 |       38.19 |       10  258758
2005 |       17  126587 |       80  124106 |      60.18 |        –       – |      133   57223 |       38.17 |       30  279352
2006 |       24  124106 |      153  123715 |      60.62 |       22   27770 |      154   54893 |       38.16 |       48  279299
2007 |       19  123715 |       93  122049 |      61.07 |       22   26808 |       93   53482 |       38.13 |       72  259068
2008 |        7  122049 |        –       – |      61.57 |       12   27022 |        –       – |       38.12 |       80  244455

†Source: own elaboration based on the BAEL, BLFS, IPS and Eurostat data. A dash marks a cell with no entry.

As mentioned above, the counts of migrants that are captured in the BLFS and BAEL are relatively small compared with the respective sample sizes. However, the official estimates of the number of people arriving from Poland and intending to stay in the UK for at least 12 months are based on the weighted numbers of respondents captured by the IPS. For comparison, raw counts of those migrants and the corresponding sample sizes are presented in the last two columns of Table 1.
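The comparison of raw rates from the two surveys can be reproduced from the Table 1 counts. The sketch below uses the 2006 figures for migration of 3–12 months; the dictionary layout and the function names `raw_rate` and `scaled_flow` are illustrative assumptions, and the scaling is a naive unweighted calculation, not the model developed in Section 4.

```python
# 2006 counts for migration of 3-12 months, taken from Table 1.
bael = {"migrants": 154, "sample": 54893, "population": 38.16e6}   # Poland (emigrants)
blfs = {"migrants": 153, "sample": 123715, "population": 60.62e6}  # UK (immigrants)


def raw_rate(survey):
    """Unweighted share of migrants in the survey sample."""
    return survey["migrants"] / survey["sample"]


def scaled_flow(survey):
    """Naive scaling: sample rate times population size, with no bias correction."""
    return raw_rate(survey) * survey["population"]


for name, survey in (("BAEL", bael), ("BLFS", blfs)):
    print(f"{name}: rate per 1000 = {1000 * raw_rate(survey):.2f}, "
          f"naive flow = {scaled_flow(survey):,.0f}")
```

For 2006 the BAEL-based figure is roughly 107000 and the BLFS-based figure roughly 75000, which illustrates why the two sources cannot simply be scaled up and treated as interchangeable.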

4 Estimating flows from Poland to the UK

The data from the BLFS and BAEL can be used to estimate migration flows from Poland to the UK. In the model that is proposed in this section, weights for observations in the LFS data are not utilized. There are two reasons for this choice. First, the weights in the BAEL are computed with stratification variables and techniques that are different from those in the BLFS. Second, the weights are available for the immigrants who were identified in the BLFS, but not for the emigrants who were identified in the BAEL, as these people are missing from the sample and information about their place of residence is provided by the other members of the household. Hence, those who are not in the sample cannot represent the population and are assigned a null weight.

4.1 Bayesian model of migration flows

The model of migration flows is based on the population balance equation. In that equation, the population size in year t+1 is calculated by using the transition approach. In this paper, an example of two countries is considered. The migration flow from the first country to the second appears in the balance equations of both the sending and the receiving country; thus, it is assumed that this flow is measured in both of them.

Let urn:x-wiley:09641998:media:rssa12189:rssa12189-math-0019 denote the count of people who resided in the sending country S on January 1st of year t−1 and who survive in the receiving country R on January 1st of year t. In the transition approach, this count represents the flow between January 1st of t−1 and January 1st of t. Then, the population size urn:x-wiley:09641998:media:rssa12189:rssa12189-math-0020 in year t in country S can be calculated by using this approach as
urn:x-wiley:09641998:media:rssa12189:rssa12189-math-0021(1)
urn:x-wiley:09641998:media:rssa12189:rssa12189-math-0022(2)
A dot ‘·’ denotes summation over the relevant index, 0 in the superscript represents the rest of the world and urn:x-wiley:09641998:media:rssa12189:rssa12189-math-0023 is the count of infants who were born in all countries since t−1 and who survive in country S at t. Analogously, urn:x-wiley:09641998:media:rssa12189:rssa12189-math-0024 denotes the count of deceased between t and t+1. Further, the population size in country R can be calculated as
urn:x-wiley:09641998:media:rssa12189:rssa12189-math-0025(3)
urn:x-wiley:09641998:media:rssa12189:rssa12189-math-0026(4)
Then, the population in country S for year t−1 can be calculated by using equation 2:
urn:x-wiley:09641998:media:rssa12189:rssa12189-math-0027(5)
It can now be noted in equation 5 that counts urn:x-wiley:09641998:media:rssa12189:rssa12189-math-0028, which for the sending country S represent emigration in year t, are the same as counts of immigrants in year t for the receiving country R in the balance equation 3.
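The observation that the same flow enters both balance equations can be illustrated with a generic accounting sketch. It is not a transcription of equations 1–5: the function `next_population`, its arguments and all numbers are hypothetical, chosen only to show the S-to-R flow leaving one balance and entering the other.

```python
def next_population(pop, inflows, outflows, births, deaths):
    """Generic population balance: stock plus arrivals and births, minus departures and deaths."""
    return pop + sum(inflows.values()) - sum(outflows.values()) + births - deaths


flow_S_to_R = 100_000  # hypothetical flow from sending country S to receiving country R

# The same number is an outflow for S ...
pop_S = next_population(
    38_000_000,
    inflows={"R": 10_000, "rest": 5_000},
    outflows={"R": flow_S_to_R, "rest": 20_000},
    births=350_000,
    deaths=360_000,
)
# ... and an inflow for R.
pop_R = next_population(
    60_000_000,
    inflows={"S": flow_S_to_R, "rest": 300_000},
    outflows={"S": 10_000, "rest": 250_000},
    births=700_000,
    deaths=600_000,
)
print(pop_S, pop_R)
```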
Further, the counts of migrants in the LFS samples of sending and receiving countries, which are denoted by urn:x-wiley:09641998:media:rssa12189:rssa12189-math-0029 and urn:x-wiley:09641998:media:rssa12189:rssa12189-math-0030 respectively, are assumed to be realizations of independent Poisson distributions with means urn:x-wiley:09641998:media:rssa12189:rssa12189-math-0031 and urn:x-wiley:09641998:media:rssa12189:rssa12189-math-0032 respectively:
urn:x-wiley:09641998:media:rssa12189:rssa12189-math-0033(6)
where urn:x-wiley:09641998:media:rssa12189:rssa12189-math-0034 and urn:x-wiley:09641998:media:rssa12189:rssa12189-math-0035 represent the undercount of emigration and immigration respectively. Parameter urn:x-wiley:09641998:media:rssa12189:rssa12189-math-0036 relates mainly to the emigration of entire households and their ‘escape’ from the BAEL sample. Immigration undercount, which is reflected by urn:x-wiley:09641998:media:rssa12189:rssa12189-math-0037, captures bias resulting potentially from
  1. the requirement of interviewing people who have stayed in the UK for at least 6 months (Ker et al., 2009),
  2. exclusion of institutional households from the sampling frame,
  3. relatively high rates of non-response and refusal to answer (Rendall et al., 2003) and
  4. inadequate construction of the sampling frame of the BLFS for capturing immigrants.
Next, the ratios of the expected unbiased LFS migration counts urn:x-wiley:09641998:media:rssa12189:rssa12189-math-0038 and urn:x-wiley:09641998:media:rssa12189:rssa12189-math-0039 to the corresponding sample sizes urn:x-wiley:09641998:media:rssa12189:rssa12189-math-0040 and urn:x-wiley:09641998:media:rssa12189:rssa12189-math-0041 are assumed to be imperfect equivalents of the ratios of the true counts of migrants, urn:x-wiley:09641998:media:rssa12189:rssa12189-math-0042, to the respective population sizes of the sending (equation 5) and receiving (equation 3) country respectively, i.e.
urn:x-wiley:09641998:media:rssa12189:rssa12189-math-0043(7)
urn:x-wiley:09641998:media:rssa12189:rssa12189-math-0044(8)
where τ denotes precision (inverse variance). The normal distribution that is assumed for the logarithms of the expected ratios in the LFS, urn:x-wiley:09641998:media:rssa12189:rssa12189-math-0045 where m ∈ {S, R}, reflects the overdispersion due to the fact that these ratios are imperfect measures of the true unobserved ratios of migrant counts to the population size urn:x-wiley:09641998:media:rssa12189:rssa12189-math-0046. Imperfect measurement includes the sampling error, error resulting from the sampling scheme being inadequate for capturing migrants in the LFS and other non-sampling variability such as differences in timing, i.e. the different quarters from which the samples are taken within a survey from one country, as well as differences between the sending and receiving countries. Further, urn:x-wiley:09641998:media:rssa12189:rssa12189-math-0047 is simplified to urn:x-wiley:09641998:media:rssa12189:rssa12189-math-0048.
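A forward simulation can make the structure of the measurement model concrete. The sketch below rests on an assumed reading of the text, since the displayed equations are not reproduced here: the log of the expected sample ratio is drawn from a normal distribution centred on log(flow/population) with precision `precision`, and the observed count is Poisson with mean equal to the undercount times the expected unbiased count. The names `simulate_count`, `flow`, `undercount` and `precision`, and the values passed to them, are hypothetical.

```python
import math
import random


def poisson_draw(mean, rng):
    """Knuth's multiplication method; adequate for the moderate means used here."""
    limit, k, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_count(flow, population, sample_size, undercount, precision, rng):
    # Overdispersion: the log of the expected sample ratio scatters around
    # log(flow / population) with standard deviation 1 / sqrt(precision).
    log_ratio = rng.gauss(math.log(flow / population), 1.0 / math.sqrt(precision))
    unbiased_mean = sample_size * math.exp(log_ratio)
    # Undercount scales the expected count before Poisson sampling (assumed form).
    return poisson_draw(undercount * unbiased_mean, rng)


rng = random.Random(2016)
flow = 100_000  # hypothetical true flow from Poland to the UK
z_S = simulate_count(flow, 38.16e6, 54893, undercount=1.0, precision=100.0, rng=rng)
z_R = simulate_count(flow, 60.62e6, 123715, undercount=0.7, precision=100.0, rng=rng)
print(z_S, z_R)
```

Simulating from assumed parameter values in this way mirrors the kind of simulation study that the paper uses to select prior distributions.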

Since the reasons for undercounting migrants differ between the BAEL and the BLFS, it cannot be assumed a priori that one of the parameters urn:x-wiley:09641998:media:rssa12189:rssa12189-math-0049 and urn:x-wiley:09641998:media:rssa12189:rssa12189-math-0050 is necessarily larger than the other. Equations 7 and 8 ensure identification of the ratio of the undercount parameters, but not of their absolute magnitude. To address this problem, two sets of specifications of the model are analysed. In the first, it is assumed that the immigration data are measured perfectly, i.e. urn:x-wiley:09641998:media:rssa12189:rssa12189-math-0051, whereas urn:x-wiley:09641998:media:rssa12189:rssa12189-math-0052 is estimated. In the second specification the role of undercount parameters is reversed. In this way, the data from the BLFS and BAEL can be compared and undercount can be measured in relation to the reference data.
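The identification problem can be checked numerically. Assuming, as an illustration only, that the expected count in each survey is the product of the undercount, the sample size and the ratio of the flow to the population, halving both undercount parameters while doubling the flow leaves the expected counts unchanged. The function `expected_counts` and its default arguments (the 2006 sample and population sizes from Table 1) are hypothetical choices.

```python
import math


def expected_counts(flow, kappa_S, kappa_R,
                    n_S=54893, P_S=38.16e6, n_R=123715, P_R=60.62e6):
    """Expected sample counts under an assumed multiplicative undercount."""
    return (kappa_S * n_S * flow / P_S, kappa_R * n_R * flow / P_R)


a = expected_counts(flow=100_000, kappa_S=1.0, kappa_R=0.7)
b = expected_counts(flow=200_000, kappa_S=0.5, kappa_R=0.35)

# Same expected data from two different parameter sets: only the ratio
# kappa_S / kappa_R is pinned down, not the level of either parameter.
print(all(math.isclose(x, y) for x, y in zip(a, b)))
```

Fixing one of the undercount parameters at 1 removes this ambiguity, at the cost of measuring undercount only relative to the survey chosen as the reference.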

The obvious shortcoming of this method is that the entire undercount cannot be estimated. For this, the true number of migrants would have to be known. This information can be obtained in the form of the above-mentioned subjective prior distributions elicited from the experts, as in the IMEM model (Wiśniowski et al., 2013), or by using other unbiased sources of data, such as the census.

4.2 Prior distributions

The selection of the prior distributions for the model parameters is based on a simulation study, in which samples of various sizes were first simulated from the model with different sets of assumed known parameters and then analysed by using the model proposed. For the migration counts urn:x-wiley:09641998:media:rssa12189:rssa12189-math-0053, results under uniform distributions are insensitive to the specification of the hyperparameters, in contrast with those under generalized beta distributions. When conjugate gamma prior distributions are assumed for precision parameters urn:x-wiley:09641998:media:rssa12189:rssa12189-math-0054 and urn:x-wiley:09641998:media:rssa12189:rssa12189-math-0055, the results stabilize for values of the rate and shape hyperparameters lower than urn:x-wiley:09641998:media:rssa12189:rssa12189-math-0056, which imply very vague, approximately non-informative, prior information. Hence, vaguely informative gamma priors are assumed:
urn:x-wiley:09641998:media:rssa12189:rssa12189-math-0057
For the undercount parameter it is assumed that
urn:x-wiley:09641998:media:rssa12189:rssa12189-math-0058
which is non-informative in the sense that it assigns equal probability density to any value of the undercount in the range (0,1). For urn:x-wiley:09641998:media:rssa12189:rssa12189-math-0059, a uniform prior on a range that is large relative to the sending country's population size is assumed:
urn:x-wiley:09641998:media:rssa12189:rssa12189-math-0060
Again, this prior is selected on the basis of the simulation study. This specification allows relatively easy inclusion of expert information by means of a constant c, which denotes a maximal fraction of the population in the sending country that can emigrate in a given year. Here, it is set to c=0.02. In the sensitivity analysis, c=0.02 led to virtually the same results as larger values.
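Drawing from the priors gives a feel for how vague they are. In the sketch below the uniform prior for the flow is assumed to run from 0 to c times the sending country's population, and the gamma shape and rate of 0.01 are placeholders, because the hyperparameter values are not legible in this text. The function `draw_from_priors` is hypothetical.

```python
import random


def draw_from_priors(population_S, c=0.02, gamma_shape=0.01, gamma_rate=0.01, rng=None):
    """One joint draw from the (assumed) prior distributions of the model parameters."""
    rng = rng or random.Random()
    return {
        # random.gammavariate takes shape and SCALE, so scale = 1 / rate.
        "tau_S": rng.gammavariate(gamma_shape, 1.0 / gamma_rate),
        "tau_R": rng.gammavariate(gamma_shape, 1.0 / gamma_rate),
        # Undercount: uniform on (0, 1), as stated in the text.
        "kappa": rng.uniform(0.0, 1.0),
        # Flow: uniform up to a fraction c of the sending population (assumed lower bound 0).
        "flow": rng.uniform(0.0, c * population_S),
    }


draw = draw_from_priors(38.16e6, rng=random.Random(0))
print(draw)
```

With c=0.02 and a Polish population of 38.16 million, the upper bound on the annual flow is about 763000 people.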

Samples from the posterior distributions of model parameters were obtained with a Markov chain Monte Carlo slice sampler (Neal, 2003) implemented in R (R Core Team, 2014). The code and data that were used for computations are available from http://wileyonlinelibrary.com/journal/rss-datasets
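For readers unfamiliar with slice sampling, a univariate version with stepping out and shrinkage in the style of Neal (2003) can be sketched as follows. This is a generic illustration in Python, not the R code distributed with the paper; the function `slice_sample`, its default `width` and the standard normal target are assumptions made for the example, and the target is assumed to be a proper density whose tails decay to zero.

```python
import math
import random


def slice_sample(log_density, x0, n_draws, width=1.0, rng=None):
    """Univariate slice sampler with stepping out and shrinkage."""
    rng = rng or random.Random()
    x, draws = x0, []
    for _ in range(n_draws):
        # Vertical step: a level drawn uniformly under the density at x (log scale).
        log_level = log_density(x) + math.log(1.0 - rng.random())
        # Stepping out: place an interval of the given width at random around x,
        # then expand each end until it lies outside the slice.
        left = x - width * rng.random()
        right = left + width
        while log_density(left) > log_level:
            left -= width
        while log_density(right) > log_level:
            right += width
        # Shrinkage: propose uniformly in the interval, shrinking it on rejection.
        while True:
            proposal = left + (right - left) * rng.random()
            if log_density(proposal) >= log_level:
                x = proposal
                break
            if proposal < x:
                left = proposal
            else:
                right = proposal
        draws.append(x)
    return draws


# Example target: standard normal, specified up to an additive constant.
sample = slice_sample(lambda v: -0.5 * v * v, x0=0.0, n_draws=4000, rng=random.Random(7))
print(sum(sample) / len(sample))
```

Only the log-density up to a constant is needed, which is convenient for posterior distributions known only up to normalization.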

Convergence was assessed by visual inspection of the Markov chain Monte Carlo samples and by the cumulative sum statistic (Yu and Mykland, 1998). The Markov chain Monte Carlo algorithm was run for 500000 iterations after a burn-in of the same length. Thinning of order q=50 leaves a retained sample of 10000 draws.
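The bookkeeping of burn-in and thinning, and the cumulative sum path that underlies the diagnostic of Yu and Mykland (1998), can be sketched in a few lines. The function names `retained_indices` and `cusum` are hypothetical, and the cumulative sum is shown in its basic form as partial sums of deviations from the chain mean.

```python
def retained_indices(n_burn_in, n_iter, thin):
    """Indices of the draws kept after discarding burn-in and thinning."""
    return range(n_burn_in, n_burn_in + n_iter, thin)


def cusum(draws):
    """Partial sums of deviations from the sample mean of the chain."""
    mean = sum(draws) / len(draws)
    path, total = [], 0.0
    for value in draws:
        total += value - mean
        path.append(total)
    return path


kept = retained_indices(n_burn_in=500_000, n_iter=500_000, thin=50)
print(len(kept))                # 10000
print(cusum([1.0, 2.0, 3.0]))   # [-1.0, -1.0, 0.0]
```

A cumulative sum path that wanders smoothly with long excursions away from zero suggests slow mixing, whereas an irregular path that stays close to zero suggests that the chain mixes well.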

In terms of the point estimator of flows, the posterior median is recommended and utilized. In the simulation study, this estimator showed no or only very mild sampling error compared with the mean. However, the Bayesian approach offers the end users of the models a formal approach for computing point estimators through Bayesian decision theory; for introductions to applications in demography see, for example, Alho and Spencer (2005) or Bijak (2010). In this approach, a so-called loss function must be specified. For example, the median and the mean are the optimal estimators under the absolute value and quadratic loss functions respectively. If an asymmetric loss function is utilized, it can lead to a specific percentile as the optimal point estimator. The most challenging part is the specification of this function (Bijak, 2010).
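The link between loss functions and point estimators can be checked numerically. The sketch below uses a skewed toy sample as a stand-in for a posterior (an assumption, not the paper's output) and finds, by grid search, the estimate that minimizes the expected loss under three loss functions; the asymmetric piecewise linear ("pinball") loss is one possible choice of asymmetric loss, not one prescribed by the paper.

```python
import numpy as np


def expected_loss(samples, estimate, loss):
    """Posterior expected loss, with error defined as true value minus estimate."""
    return float(np.mean(loss(samples - estimate)))


def best_estimate(samples, loss, grid):
    """Grid point that minimizes the posterior expected loss."""
    losses = [expected_loss(samples, g, loss) for g in grid]
    return float(grid[int(np.argmin(losses))])


def absolute_loss(error):
    return np.abs(error)


def quadratic_loss(error):
    return error ** 2


def pinball_loss(tau):
    """Asymmetric linear loss: underestimation costs tau, overestimation 1 - tau."""
    def loss(error):
        return np.where(error >= 0, tau * error, (tau - 1.0) * error)
    return loss


rng = np.random.default_rng(1)
samples = rng.lognormal(mean=5.0, sigma=0.5, size=5_000)   # skewed toy "posterior"
grid = np.linspace(samples.min(), samples.max(), 1_001)
step = grid[1] - grid[0]

est_abs = best_estimate(samples, absolute_loss, grid)
est_quad = best_estimate(samples, quadratic_loss, grid)
est_asym = best_estimate(samples, pinball_loss(0.9), grid)

print("absolute loss  :", est_abs, "median:", np.median(samples))
print("quadratic loss :", est_quad, "mean  :", samples.mean())
print("pinball, 0.9   :", est_asym, "90th percentile:", np.quantile(samples, 0.9))
```

Up to the resolution of the grid, the three minimizers coincide with the sample median, the mean and the 90th percentile respectively, which is the sense in which an asymmetric loss leads to a specific percentile.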

4.3 Results

4.3.1 Migration flows for 3–12 months

Posterior characteristics of migration flows resulting from the model with assumed emigration undercount, i.e. with immigration measured perfectly, are presented in Fig. 1(a), whereas the results based on the BAEL emigration data as benchmark, i.e. with assumed immigration undercount, are presented in Fig. 1(b). A table with underlying posterior characteristics can be found in the on-line supplementary material (Table A.1). The posterior medians from both approaches follow a similar trend over time. The results with assumed immigration undercount are larger by around 20–25% and their 95% predictive intervals are wider than those with assumed emigration undercount. However, the posterior distributions with assumed emigration undercount are heavy tailed: their 99th percentiles are larger than those for assumed immigration undercount by 10–50%. Also, in both cases, a large increase in migration since 2004 is observed.

Fig. 1. Posterior characteristics of Poland–UK flows for 3–12 months (median, 95% predictive interval and interquartile range): (a) assumed emigration undercount; (b) assumed immigration undercount

Posterior densities for the emigration and immigration undercount parameters are presented in Fig. 2(c) and Fig. 2(d) respectively. The posterior median of the emigration undercount parameter is 0.9, whereas for immigration it is 0.53. This suggests that, when immigration is assumed to be measured without bias, around 10% of the emigrants from Poland to the UK relocated with their whole households (and hence were not covered by the BAEL surveys). In contrast, when the BAEL emigration data are assumed to be unbiased, only 53% of Polish immigrants are captured by the BLFS. The comparison of the magnitudes of undercount suggests that the bias in the BLFS data is larger than the undercount of emigration in the BAEL. This observation is further supported by the mode of the posterior distribution of the emigration undercount parameter being equal to 1 (the same result was obtained for all prior specifications tested).

Fig. 2. Histograms of the posterior distributions of the undercount parameters λ: (a) emigration undercount, flows for less than 3 months (p(1%)=0.08; p(25%)=0.58; median =0.78; p(75%)=0.91; p(99%)=1); (b) immigration undercount, flows for less than 3 months (p(1%)=0.08; p(25%)=0.28; median =0.33; p(75%)=0.38; p(99%)=1); (c) emigration undercount, flows for 3–12 months (p(1%)=0.41; p(25%)=0.81; median =0.9; p(75%)=0.96; p(99%)=1); (d) immigration undercount, flows for 3–12 months (p(1%)=0.28; p(25%)=0.48; median =0.53; p(75%)=0.58; p(99%)=0.99)

4.3.2 Migration flows for less than 3 months

Posterior characteristics of the flows for less than 3 months are presented in Fig. 3 and Table A.2 in the on-line supplementary material. The medians follow a similar trend to that observed for flows for 3–12 months. Nevertheless, the differences between the results for the two undercount assumptions are even more striking, which can be attributed to the strong undercount of immigrants in the BLFS relative to the BAEL. Medians of the flows with assumed immigration undercount are around 2–3 times larger, whereas their 99th percentiles are around 70–85% smaller. Large uncertainty can also be attributed to a smaller sample size: there are only three observations of emigration flows in the BAEL.

Fig. 3. Posterior characteristics of the migration flows for less than 3 months (median, 95% predictive interval and interquartile range): (a) assumed emigration undercount; (b) assumed immigration undercount

Histograms of the posterior distributions of the emigration and immigration undercount parameters for flows for less than 3 months are presented in Figs 2(a) and 2(b) respectively. Again, the posterior characteristics are similar to those obtained for flows for 3–12 months. The posterior median of the emigration undercount parameter (Fig. 2(a)) is 0.78, but the distribution is very flat and the mode is again at 1. In contrast, the posterior median of the immigration undercount parameter (Fig. 2(b)) is 0.33, which indicates a very large undercount of immigrants in the BLFS in relation to the BAEL data.

4.3.3 Undercount of migration flows—a discussion

A conclusion that can be drawn from these results is that the BLFS immigration data are biased in relation to the BAEL data on emigration. Imposing no bias in the immigration data leads to relatively low median estimates of the flows, but with inflated uncertainty. Therefore, the estimates that are based on the assumption that the measurement of immigration is biased provide a more realistic picture of flows from Poland to the UK. These estimates are also utilized in further analyses. Nevertheless, these results are still flawed, as the undercount of the number of emigrants in the BAEL cannot be identified and estimated, and the entire immigration undercount remains unknown.

The most obvious explanation of the immigration bias is the minimal duration of the stay criterion of 6 months. However, imperfect coverage of the sampling frame also seems to be important. It is likely that immigrants for a relatively short time, e.g. less than 12 months, will more often be accommodated in institutional households, such as boarding houses or caravan sites provided by their employers (Robinson et al., 2007; Audit Commission for Local Authorities and the National Health Service in England and Wales, 2007). Moreover, entry to the sampling frame by the immigrants, based on the Postcode Address File, may be significantly affected by their high internal mobility within the UK, especially at the initial stages of their migration experience (Robinson et al., 2007; Trevena et al., 2013).

Another explanation for the lack of participation of migrants in the BLFS can be non-response and refusal to answer, possibly due to the language barrier (Thomas, 2008). Also, when census results are used as a basis for updating the sampling frame, the areas that are considerably affected by migration can be misrepresented. For instance, immigrants to the UK tend to cluster in particular localities (e.g. Trevena et al. (2013)). Hence, migrants can enter the sampling frame only when the census data are updated every 10 years. A review of measures undertaken or considered in the EU countries to increase participation of migrants and to reduce their non-response in the LFS was carried out by Barnes (2008). They include increasing coverage by conducting focus groups, linking population registers with census data and using incentives, as well as improving the response rate by computer-aided interviewing, translating questionnaires into foreign languages, engaging interpreters or multilingual interviewers and providing information about the survey in the media.

4.4 Combining estimates based on the Labour Force Survey with ‘Integrated modelling of European migration’ output

Combining estimates from the IMEM model with the LFS-based results is in line with the recommendations by Willekens (1994) and Raymer et al. (2012) for utilizing data from several sources to compile statistics on migration. In this paper, the data being combined include the officially reported statistics on migration flows between the 31 countries in the EU and European Free Trade Association and the LFS data from Poland and the UK.

The combined IMEM–LFS estimates can facilitate comparisons with the other sources of data and estimates, as they can be constructed to include either all migrants who stay in the UK regardless of the duration of stay or only those who stay longer than 3 months. Hence, these estimates can be compared with the figures from the WRS and NINO-data, which practically disregard the duration criterion, as well as with the estimate that was provided by Grabowska-Lusińska and Okólski (2009), which relates to migrants for more than 3 months. These figures can then be validated by using measures of uncertainty of the combined IMEM–LFS estimates.

Posterior distributions of the entire flows from Poland to the UK are constructed by summing elementwise the samples of the same length from the posterior distributions of the flows obtained from both models. That is, flows for more than 12 months (IMEM) and flows for shorter periods based on the LFS data with assumed immigration undercount are added to each other. The resulting sample of sums is treated as a sample from the posterior distribution of the total flow regardless of the duration criterion in the years 2002–2007. Characteristics of this distribution are presented in Figs 4 and 5 and Table A.3 in the on-line supplementary material.
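The elementwise summation can be sketched as follows. The three input samples are simulated stand-ins with arbitrary gamma parameters, not the IMEM or LFS-based posterior samples, and the function names are hypothetical; the point is only the mechanics of combining equal-length samples and summarizing the result.

```python
import numpy as np


def combine_flows(*samples):
    """Sum equal-length posterior samples elementwise to obtain a sample of totals."""
    arrays = [np.asarray(s, dtype=float) for s in samples]
    if len({a.shape for a in arrays}) != 1:
        raise ValueError("all posterior samples must have the same length")
    return np.sum(arrays, axis=0)


def summarize(sample):
    """Median, interquartile range and 95% predictive interval of a sample."""
    q025, q25, q50, q75, q975 = np.percentile(sample, [2.5, 25, 50, 75, 97.5])
    return {"median": q50, "iqr": (q25, q75), "pi95": (q025, q975)}


rng = np.random.default_rng(0)
n_draws = 10_000
# Hypothetical stand-ins (in thousands) for one year's flows by duration of stay.
flow_over_12m = rng.gamma(shape=40.0, scale=1.5, size=n_draws)   # IMEM-type output
flow_3_to_12m = rng.gamma(shape=20.0, scale=3.0, size=n_draws)   # LFS-based output
flow_under_3m = rng.gamma(shape=10.0, scale=4.0, size=n_draws)   # LFS-based output

total_flow = combine_flows(flow_over_12m, flow_3_to_12m, flow_under_3m)
over_3m_flow = combine_flows(flow_over_12m, flow_3_to_12m)

print(summarize(total_flow))
print(summarize(over_3m_flow))
```

Pairing the i-th draw of one sample with the i-th draw of another yields a valid sample of the sum only if the two sets of draws are independent.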

Fig. 4. Flows from Poland to the UK (in thousands) regardless of the duration of stay criterion (source: own elaboration, as well as De Beer et al. (2010), MIMOSA estimates, Raymer et al. (2013), IMEM estimates, UK Office for National Statistics, Polish Central Statistical Office, WRS and national insurance number data). Series shown: median, predictive interval and interquartile range of the combined IMEM–LFS estimates; median LFS estimates, 3–12 months; ▵, reported immigration in the UK; ×, reported emigration in Poland; ◯, MIMOSA; □, WRS; national insurance numbers issued to Polish citizens
Fig. 5. Posterior characteristics of migration flows from Poland to the UK (combined IMEM and LFS-based results) (median, 95% predictive interval and interquartile range): (a) migration flows for more than 3 months; (b) migration flows regardless of the duration criterion

The assumption underlying the summation of the Markov chain Monte Carlo samples from two models is that these samples are independent. This can be justified by the fact that the IMEM and LFS-based models analyse two different types of migration data: officially reported and LFS respectively. The measurement of migrants by the IPS (official source) is independent of the measurement in the LFS samples. Also, the two surveys rely on different sampling frames. In practice, it may happen that a migrant who is surveyed in the IPS declares an intention to remain in the UK longer than 12 months and later in the same year takes part in the BLFS and is considered a migrant for less than 12 months. However, the reported IPS estimates are corrected to represent actual migration. Moreover, the WRS data can be used to justify, at least partially, the assumption of independence. At registration with the WRS, on average, only 11% of migrants from countries that joined the EU in 2004 intended to stay in the UK for more than 12 months.

For the total migration flows before the enlargement of the EU, i.e. summed over 2002 and 2003, the posterior median is 57000, with 95% predictive interval (43000, 80000) people. A majority of migrants stayed in the UK for less than 12 months (median 37000). Meanwhile, there were 14000 new applications of Polish citizens for national insurance numbers. Moreover, NINO-figures before 2005 fall below the 95% predictive interval of the combined IMEM–LFS estimate. This may indicate that the number of immigrants before the enlargement of the EU could have been underreported in this source. Alternatively, migrants intending a short stay may not have applied for NINOs before the enlargement of the EU. Grabowska-Lusińska and Okólski (2009), page 74, estimated that the ‘net emigration’ from Poland between May 2002 and May 2004 was around 118000 people, 10% of whom emigrated to the UK. In the light of the combined LFS-based and IMEM result, both estimates seem to undercount the overall number of Polish emigrants.

For 2004–2007, the overall median flow is 720000, with 95% predictive interval (575000, 951000). In the same period, the total number of unique Polish applicants to the WRS was 484000, whereas 609000 applied for national insurance numbers. Both figures are lower than the median estimate, and only the NINO-data fall within the predictive interval of the combined IMEM–LFS statistic. The WRS and NINO-data can, however, be biased. The main reasons are that spouses of workers were not registered, nor were those who had already worked in the UK for more than 12 months (especially in the WRS), together with delays in registration and the fact that a given person is recorded only once, regardless of any further migration history. In general, the undercount of migrants from eastern and central Europe in these sources is estimated to be around 25–33% (Pollard et al. (2008), pages 18–19).

Grabowska-Lusińska and Okólski (2009), page 74, compared the stock data on migrants from the Polish 2002 census and the population register in 2004, and supplemented them with the BAEL data. Their result is that 1.1 million people left Poland for the other EU countries within 24 months after May 2004 (their figure relates to migrants for 3 months or more). These people migrated predominantly to the UK, the Republic of Ireland and Germany. According to the Polish Central Statistical Office (after Grabowska-Lusińska and Okólski (2009), page 84), the share of migrants to the UK in the total outflow from Poland within 24 months after enlargement of the EU was 31.3%, i.e. 343000. This result seems to be supported by the data from the WRS and NINO-sources, in which, from May 2004 to the end of 2006, there were 340000 and 368000 Polish applicants respectively.

Another source of data for comparison is the Polish 2011 census (Central Statistical Office, 2012). In 2002, it was estimated that 780000 people with permanent residence in Poland stayed abroad for more than 2 months. In 2011, the number of residents staying abroad for more than 3 months was 1.94 million. The difference between the two figures is 1.16 million, which is very close to the above-mentioned figure of Grabowska-Lusińska and Okólski (2009), but the figure based on the 2011 census seems to be more trustworthy. Theoretically, it should be free of the underreporting that is common for the population register. However, it refers to a much longer period within which migration events could have taken place.

Both the estimate of Grabowska-Lusińska and Okólski (2009) and the estimate based on comparing successive censuses are constructed by taking the difference between the stocks of people counted at two distant points in time. However, this difference neglects the number of possible return migrants, as well as natural change; thus, it cannot reflect the true dynamics of the migration process. Also, the precision of these estimates remains unknown. In this context, the combined IMEM–LFS results, though not directly comparable, provide measures of uncertainty that can be used to validate these estimates. For example, the posterior median of the IMEM–LFS total flows for more than 3 months for 2004–2006 is 463000, whereas the estimate by Grabowska-Lusińska and Okólski (2009) of 343000 is smaller by around 25%. It also lies below the 95% predictive interval of the IMEM–LFS estimates, which is (359000, 619000).
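The validation logic used in these comparisons amounts to locating an external figure relative to a predictive interval. A minimal sketch, using only the figures quoted in the text and hypothetical helper names, is:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PosteriorSummary:
    """Median and 95% predictive interval of a combined IMEM-LFS estimate."""
    median: float
    lower: float
    upper: float


def position(value, summary):
    """Locate an external figure relative to the 95% predictive interval."""
    if value < summary.lower:
        return "below"
    if value > summary.upper:
        return "above"
    return "within"


def relative_gap(value, summary):
    """Shortfall of the external figure relative to the posterior median."""
    return (summary.median - value) / summary.median


# Figures quoted in the text.
total_2004_2007 = PosteriorSummary(median=720_000, lower=575_000, upper=951_000)
over_3m_2004_2006 = PosteriorSummary(median=463_000, lower=359_000, upper=619_000)

comparisons = [
    ("WRS applicants, 2004-2007", 484_000, total_2004_2007),
    ("NINO applicants, 2004-2007", 609_000, total_2004_2007),
    ("Grabowska-Lusinska and Okolski (2009)", 343_000, over_3m_2004_2006),
]
for label, value, summary in comparisons:
    print(f"{label}: {position(value, summary)} the interval, "
          f"{relative_gap(value, summary):.1%} below the median")
```

Such a check does not make the sources comparable in definition; it only shows whether a reported figure is compatible with the uncertainty of the combined estimate.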

In Fig. 4 it is also observed that the officially reported data from both the UK and Poland, as well as the MIMOSA estimates, lie well below the combined IMEM–LFS outcome. The difference can be explained by different duration of stay criteria: 12 months or more in the UK's data and MIMOSA and permanent stay in the Polish register. Here, the IMEM estimates, which relate to migration for 12 months or more, should be used for validation.

The combined estimates are not free of limitations. Whereas the IMEM model mitigates bias resulting from the undercount in the official migration data by using prior information elicited from experts, the undercount that was identified in the LFS-based estimates remains uncorrected. There is also an inconsistency between the measurement of migration flows in the LFS and in the official statistics. The former rely on the transition approach (people), whereas the latter relate to migration events. However, for the LFS data and migrants for less than 12 months, the difference between the number of people and the number of migration events experienced by them can be assumed to be negligible, as it is not likely that these people changed their country of residence multiple times within 12 months.

5 Conclusions

In this paper, a Bayesian model for estimating migration flows between two countries with LFS data has been developed. Results can be used as a source of auxiliary information about migration flows, or to validate other estimates externally. In the context of missing or unreliable official data, as well as the requirement of Eurostat to provide harmonized statistics on migration, this tool can be applied to produce, harmonize or enhance the statistics on migration flows. The advantage of the model is the fact that the LFS is carried out in all member states of the EU and European Free Trade Association as well as the EU candidates. Thus, the model can be used for monitoring current patterns in international migration in the EU. It has also been demonstrated how the LFS-based estimates can be combined with the outcome of the IMEM model to construct estimates of flows with various duration criteria.

The combined estimates have been used to assess the size of migration flow from Poland to the UK. It is estimated that flows between 2004 and 2007 were 720000 people, with a 95% predictive interval of 575000–951000. These results are larger than the figures reported officially in Poland and the UK, as well as in sources such as the WRS or applications for national insurance numbers, but they follow a similar trend over time. The results also suggest that the flows under consideration are significantly larger than the estimates that were obtained by Grabowska-Lusińska and Okólski (2009). Still, the LFS-based results are not free of shortcomings: they can be characterized by large sampling variability and are biased because of the emigration of entire households, and immigration undercount resulting from the duration criteria used in the UK's LFS, non-response and slow entry of migrants to the sampling frame.

Possible paths of further exploration relate to the modelling framework and the scope of application. First, the opposite direction of migration, from the UK to Poland, can be estimated. This estimation would rely on the data from the Polish LFS, as the information about absent members of the household is not collected in the British LFS. Second, the model can be extended to analyse more recent periods for which larger samples are available. This analysis could shed light on migration dynamics after the financial crisis of 2007–2008. However, owing to the changes in methodology in the Polish LFS in 2012, the results may not be directly comparable with those for previous years (Saczuk, 2014). Third, the model can be extended to capture flows between many countries or their groups. Here, Swedish, Swiss, Italian and German LFS data seem to be of interest because their sampling frames are more adequate for measuring migration. Using these data, as well as recent data from the UK, the undercount can be estimated. Fourth, when the analysis is extended to more than two countries, it would be possible to include additional explanatory variables for the undercount parameter or the count of migrants. Fifth, population sizes can be treated as estimates with uncertainty (Wheldon et al., 2013). Sixth, elicited expert opinion can be used to inform model parameters, especially related to the unknown undercount. Seventh, the Bayesian framework offers tools for model selection or averaging, when a variety of models is available. Finally, Bayesian decision analysis can be used to obtain point estimates driven by an elicited loss function.

To sum up, exploration of the LFS, as well as other large-scale surveys carried out in Europe, can contribute to the analysis of reasons and consequences of migration. The results can be used to monitor changes in the population as well as liquid, temporal or officially unrecorded movements, especially when major innovations in social or migration policies are introduced, such as in the case of the expansion of the EU.

Acknowledgements

The author thanks Marek Kupiszewski, Jacek Osiewalski, Dorota Kupiszewska, Jakub Bijak, James Raymer, Peter W. F. Smith and Jonathan J. Forster for their help and suggestions throughout this work, as well as three reviewers who greatly helped to improve the manuscript. Part of this research was carried out at the Economic and Social Research Council Centre for Population Change and Southampton Statistical Sciences Research Institute, University of Southampton. The LFS and IPS data were downloaded from the UK Data Service; the BAEL data were obtained from the Warsaw School of Economics, courtesy of Irena E. Kotowska and Paweł Strzelecki.