Official statistics are public information. Statistics are produced for the benefit of society on the basis of a national or European Union statistical programme. Official statistics are accessible to all and help everyone make decisions in their private or work lives. Official statistics comply with international classifications and methodologies and with the principles of impartiality, reliability, relevance, cost-effectiveness, confidentiality and transparency. In Estonia, the producers of official statistics are Statistics Estonia and Eesti Pank.
One year in Statistics Estonia
- Over 150 statistical activities
- 90,000 data providers
- 420,000 answered questionnaires
- 65,000 calls and e-mails answered by customer service
- 132,000 variables collected in questionnaires
- 1.3 million website visits
- Nearly 2 million statistical database visits
- Over 3,500 requests for information
1. Specifying needs
People have always been curious – it is the basis of human development and rational behaviour. Many questions start with the words “how many” or “how much”. Today, questions such as the following are often put to Statistics Estonia:
- How many children in county N are going to school next year?
- How many households are there in which the partners are not officially married?
- How much do people earn on average in a month in Valga county, Ida-Viru county or in Estonia as a whole?
Answers to questions of significant public interest can often be found in the statistical database on the website of Statistics Estonia. We regularly collect feedback from users about the kind of information society needs. The main users of statistics are public authorities, industry associations, research and educational institutions and local government associations.
2. Production system design
Today, relevant information is often partly or fully collected in state databases. In the production of statistics, existing information is used as much as possible, including information from state databases such as the population register, the commercial register, the Estonian Education Information System and the register of buildings.
Information generated by automated processes, such as mobile positioning data, social networking data and satellite images, is of growing interest in the production of statistics. If the characteristics studied include assessments (e.g. satisfaction) or information that cannot be obtained from databases, it is necessary to interview people or economic entities. Compiling a questionnaire is one of the more labour-intensive stages in the preparation of a survey.
3. Building the production system
Personal interview methods have improved over time, but traditional face-to-face interviews are still in use even though they are time-consuming and expensive. Telephone interviews and web interviews are increasingly used to collect data. Usually, the best results are achieved by combining several methods. Most of the surveys of economic entities are carried out online, which is the most convenient and flexible method for respondents.
4. Data collection
When setting up a research task, the population is specified, i.e. the set of persons or objects about which conclusions are to be drawn. In a sample survey, only a part of the population is selected: a sample is drawn and its size is determined. Each person or entity in the sample therefore represents a whole range of similar persons or economic entities.
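As a rough illustration of drawing a sample from a specified population, the sketch below uses simple random sampling without replacement. This is only one of many possible designs and is not necessarily the one used in any particular survey; the frame of 1,000 identifiers, the sample size and the function name are all hypothetical.

```python
import random


def draw_simple_random_sample(frame, sample_size, seed=None):
    """Draw a simple random sample without replacement from a sampling frame."""
    if sample_size > len(frame):
        raise ValueError("sample size cannot exceed the population size")
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(frame, sample_size)


# Hypothetical sampling frame: identifiers of 1,000 persons.
population = [f"person_{i}" for i in range(1000)]
sample = draw_simple_random_sample(population, sample_size=50, seed=42)

# Under this design every sampled person stands for N / n persons.
units_represented = len(population) / len(sample)
```

With these assumed numbers, each of the 50 sampled persons represents 20 persons in the population.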
Data collection is the most expensive and time-consuming stage. If it fails because the subjects do not respond or their responses are illogical, the study will not reveal anything, as there cannot be reliable results without reliable data.
5. Data processing
The preparation of data for analysis has become much quicker due to technical progress and checks in data collection programmes, which do not allow logically inconsistent responses (e.g., a 17-year-old respondent cannot have higher education). Data are also mostly electronically coded.
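A logical check of the kind mentioned above could look roughly like the following sketch. The field names (`age`, `education`), the age limits and the function name are assumptions made for illustration and do not describe any actual data collection programme.

```python
def check_response(response):
    """Return a list of messages describing logically inconsistent answers."""
    errors = []
    age = response.get("age")
    education = response.get("education")

    if age is None or not 0 <= age <= 120:
        # Assumed plausibility range for age.
        errors.append("age must be between 0 and 120")
    elif education == "higher" and age < 18:
        # Cross-field check: the two answers contradict each other.
        errors.append("respondent under 18 cannot have higher education")
    return errors


inconsistent = check_response({"age": 17, "education": "higher"})
consistent = check_response({"age": 35, "education": "higher"})
```

A questionnaire application would typically show such a message to the respondent straight away, so that the answer can be corrected before it is submitted.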
However, electronically collected data are not always correct. Anyone who handles data knows that datasets include typos. Hidden errors are detected when complex checks are run and comparisons are made with other sources. A major problem in datasets is data gaps, which interfere with data processing, especially when more sophisticated models are to be applied. To replace missing values, different imputation methods are used, in which the missing values of an object are replaced with those of similar objects.
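One of the simplest imputation approaches is class-mean imputation, where a missing value is filled with the mean of the observed values of similar objects. The sketch below uses it purely as an illustration; it is not claimed to be the method used by Statistics Estonia, and the record layout (`sector`, `turnover`) and the values are hypothetical.

```python
from statistics import mean


def impute_group_mean(records, group_key, value_key):
    """Fill missing values (None) with the mean of observed values in the same group."""
    observed = {}
    for rec in records:
        if rec[value_key] is not None:
            observed.setdefault(rec[group_key], []).append(rec[value_key])
    group_means = {group: mean(vals) for group, vals in observed.items()}

    completed = []
    for rec in records:
        new_rec = dict(rec)
        if rec[value_key] is None and rec[group_key] in group_means:
            new_rec[value_key] = group_means[rec[group_key]]
            new_rec["imputed"] = True   # flag kept so the imputation rate can be reported
        else:
            new_rec["imputed"] = False  # observed value, or no similar object to borrow from
        completed.append(new_rec)
    return completed


# Hypothetical enterprise records; one turnover value is missing.
records = [
    {"sector": "retail", "turnover": 100.0},
    {"sector": "retail", "turnover": 140.0},
    {"sector": "retail", "turnover": None},
    {"sector": "transport", "turnover": 300.0},
]
completed = impute_group_mean(records, "sector", "turnover")
imputation_rate = sum(rec["imputed"] for rec in completed) / len(completed)
```

Keeping the `imputed` flag next to each value makes it possible to report how much of the final dataset was estimated rather than observed.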
For the sample data to represent the population, a weight or expansion factor must be calculated for each object in the sample. The weight of a sample object shows how many similar population objects this sample object represents. The weight of a sample object is always one or more. In business entity surveys, the weight of large entities is often one because they are sufficiently unique in Estonia and there are no similar entities in the population.
6. Calculation and analysis of data
The information collected during the data collection phase must be converted to a format that allows answering the questions raised at the beginning of the study. Quite often, average values (average gross monthly wages) or totals (total number of unemployed persons) as well as various indices are calculated. Price indices indicate price developments over time. The most complex calculation, combining statistics from various domains, is that of the gross domestic product.
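To show how sample weights feed into totals and averages, here is a minimal sketch. It assumes a stratified design in which the weight of every sampled object is the number of population objects in its stratum divided by the number sampled from it; real weighting usually also adjusts for non-response and calibration, which is left out here. The strata, counts and values are invented for illustration.

```python
def stratum_weights(population_sizes, sample_sizes):
    """Expansion weight per stratum: population count divided by sample count."""
    return {s: population_sizes[s] / sample_sizes[s] for s in sample_sizes}


def weighted_total(values, weights):
    """Estimated population total: each sampled value multiplied by its weight."""
    return sum(v * w for v, w in zip(values, weights))


def weighted_mean(values, weights):
    """Estimated population mean: weighted total divided by the sum of weights."""
    return weighted_total(values, weights) / sum(weights)


# Hypothetical enterprise survey: both large enterprises are surveyed (weight 1),
# while 4 of the 100 small enterprises are sampled (weight 25).
population_sizes = {"large": 2, "small": 100}
sample_sizes = {"large": 2, "small": 4}
weights_by_stratum = stratum_weights(population_sizes, sample_sizes)

# (stratum, number of employees) for each sampled enterprise.
sample = [("large", 500), ("large", 300),
          ("small", 10), ("small", 12), ("small", 8), ("small", 10)]
values = [employees for _, employees in sample]
weights = [weights_by_stratum[stratum] for stratum, _ in sample]

estimated_total = weighted_total(values, weights)  # estimated employees in all 102 enterprises
estimated_mean = weighted_mean(values, weights)    # estimated employees per enterprise
```

The large enterprises count only for themselves, while each small enterprise in the sample stands for 25 enterprises, so the estimated total is 800 + 25 × 40 = 1,800 employees.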
When publishing statistics, Statistics Estonia must ensure that the information in data tables does not allow any economic entity, person or other object to be identified. Various mathematical methods are applied for this purpose.
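One widely known protection technique is a threshold rule, under which a table cell is not published if too few units contribute to it. The sketch below illustrates only this idea; the threshold of three contributors, the table layout and the figures are assumptions and do not describe the rules Statistics Estonia actually applies.

```python
def suppress_small_cells(table, min_contributors=3):
    """Replace a cell value with None when fewer than min_contributors units contribute."""
    protected = {}
    for cell, (contributors, value) in table.items():
        protected[cell] = value if contributors >= min_contributors else None
    return protected


# Hypothetical table: cell label -> (number of contributing enterprises, total turnover).
table = {
    "county A": (120, 54000.0),
    "county B": (2, 900.0),
}
published = suppress_small_cells(table)
```

In practice, suppressing one cell is often not enough: if row or column totals are published, further cells may need to be hidden so that the suppressed value cannot be recovered by subtraction.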
A large part of survey results is presented in breakdowns and tables, but sometimes further analysis is carried out and more sophisticated statistical methods are applied. Models are increasingly used in statistical activities. These help to find relationships and causes. One might ask, for example, what influences the size of income. Complex models are used to analyse time series.
7. Publication of results and statistical dissemination
Analyses are of value only if the results are made available to the users. Today, electronic dissemination is the prevailing dissemination method. The advantage of electronic tables and charts is that they are interactive: the users can choose tables and design graphs on the basis of their own interests and needs. A major improvement in spatial statistics is interactive maps with multiple layers. The number of printed publications has decreased considerably, but paper books and magazines have not disappeared. Instead, they now present more specific information in compact form.
8. Data evaluation
Quality evaluation of published statistics is based on five principles: relevance, accuracy and reliability, timeliness and punctuality, coherence and comparability, accessibility and clarity.
There are many quality indicators for measuring accuracy and reliability, the most important of which are the standard error and the coefficient of variation. The totals and mean values calculated on the basis of samples are estimates of the actual totals and mean values, which we generally do not know unless we interview all the objects in the population. The standard error and the coefficient of variation indicate how far the estimate is likely to be from the actual value. The smaller they are, the more accurate the calculated estimate. Other quality indicators of statistics are the response rate, the imputation rate and the bias due to under- and over-coverage.
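As a worked illustration, the sketch below computes the standard error and the coefficient of variation of an estimated mean. It assumes simple random sampling without replacement, for which the standard error of the mean is sqrt((1 - n/N) · s² / n); more complex sample designs need different variance formulas. The wage figures and the population size are hypothetical.

```python
import math
from statistics import mean, variance


def mean_standard_error(sample, population_size):
    """Standard error of the sample mean under simple random sampling without replacement."""
    n = len(sample)
    finite_population_correction = 1 - n / population_size
    # variance() is the sample variance with the n - 1 divisor.
    return math.sqrt(finite_population_correction * variance(sample) / n)


def coefficient_of_variation(estimate, standard_error):
    """Standard error relative to the estimate (often reported as a percentage)."""
    return standard_error / estimate


# Hypothetical gross monthly wages of 4 persons sampled from a population of 400.
wages = [1200, 1500, 1800, 1500]
estimate = mean(wages)
se = mean_standard_error(wages, population_size=400)
cv = coefficient_of_variation(estimate, se)
```

Here the estimated mean wage is 1,500, the standard error is about 122 and the coefficient of variation is about 8%. A larger sample would reduce both indicators.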