In our quest to create tailor-made solutions that fit perfectly, "style counsel" and meticulous measuring are essential. That is why we start each project with an in-depth needs and structure analysis of the data sources to be processed and of the planned data production system.
In close contact with our customers, we also develop evaluation benchmarks for ongoing quality assurance. We regard the derivation of suitable reference figures and target specifications as a decisive building block for successful modelling.
Drawing on many years of experience in media research, we take special care to ensure that the external and internal structures of a data set remain as stable and consistent as possible during the modelling process.
The data structure must be decomposed to a level of detail that allows us to identify the anchor that ties one data set to another.
The framework and the setting of reference quotas must be designed to maintain the relationships within the data. In media research, that means understanding the specific structure of media products and their audiences, which is a prerequisite for designing proper modelling. Once that is given, it is secondary where the specific data is derived from (household or individual samples, surveys, tracking, log file data, etc.).
The design of the data processing can then follow this condition to optimize results within a multi-dimensional structure:
• usage patterns (such as devices and channels)
Facing the challenges of our clients’ projects, we are used to thinking holistically, from the preparation of the data input to the assessment of the data output. That is why we consider right from the beginning how the modelling work will be integrated into regular data production.
ANKORDATAscience framework – WEIGHTING
Considering the methods of data transformation we love to apply, it is essential to know the target figures that form the boundaries of the modelling work. Thus, weighting and extrapolation become quite important.
Weighting helps to correct imbalances within the processed data sets, which often requires a complex weighting scheme. We are also used to taking into account the specific sample designs, the pitfalls of data collection and the requirements of continuous reporting. In tailoring a customized data model, this step puts the “fabric” into shape and balance.
As a supplier to AGOF and AGF, we have acquired unrivaled expertise in the development and application of weighting and extrapolation methods. We perform these complex weightings automatically and on a daily schedule in our pipeline. The weighting schemes consider socio-demographic specifications as well as usage parameters from census measurement.
But we do not stop at the implementation of iterative weighting procedures. For various projects we are also in charge of analyzing and optimizing existing weighting models. Here we can rely on the experience gained from our projects calibrating data sets that follow a hierarchical structure of media content and ads. This helps us preserve the intricate relationships within the statistical distribution of media and their audiences.
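An iterative weighting procedure of this kind can be sketched as iterative proportional fitting (raking): respondent weights are repeatedly rescaled until the weighted marginals match external targets. The variables, categories and target figures below are illustrative assumptions, not an actual production scheme.

```python
# Minimal raking (iterative proportional fitting) sketch.
# Respondents carry start weights; we adjust them so that the weighted
# marginals match external targets for each weighting dimension in turn.
# All category labels and target figures are illustrative.

def rake(records, targets, dims, iterations=50, tol=1e-9):
    """records: list of dicts with a 'weight' key and one key per dimension.
    targets: {dim: {category: target_total}}"""
    for _ in range(iterations):
        max_shift = 0.0
        for dim in dims:
            # current weighted totals per category of this dimension
            totals = {}
            for r in records:
                totals[r[dim]] = totals.get(r[dim], 0.0) + r["weight"]
            # scale every record so each category hits its target total
            for r in records:
                factor = targets[dim][r[dim]] / totals[r[dim]]
                r["weight"] *= factor
                max_shift = max(max_shift, abs(factor - 1.0))
        if max_shift < tol:  # converged: all marginals match
            break
    return records

sample = [
    {"sex": "f", "age": "<40", "weight": 1.0},
    {"sex": "f", "age": "40+", "weight": 1.0},
    {"sex": "m", "age": "<40", "weight": 1.0},
    {"sex": "m", "age": "40+", "weight": 1.0},
    {"sex": "m", "age": "40+", "weight": 1.0},
]
targets = {"sex": {"f": 50.0, "m": 50.0},
           "age": {"<40": 45.0, "40+": 55.0}}
weighted = rake(sample, targets, dims=["sex", "age"])
```

In production, such a loop is extended with weight caps and additional dimensions; the sketch only shows the core iteration.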
ANKORDATAscience framework – FUSION
We design and implement various data fusion methods for merging several incomplete data sets into an overall evaluation basis. In mathematical terms, we are dealing with ‘discrete optimization procedures’ with a variety of optimization criteria.
Research surveys of our clients, such as AGF, AGOF and agma, often comprise an exhaustive catalog of characteristics for evaluation. We distinguish these characteristics as:
Common characteristics - so-called anchor variables, which are available in all partial data sets
Characteristics to be transferred, which are only retrievable from partial data sets (donors) and must be transferred to the remaining records (recipients).
The common characteristics and their correlation with the characteristics recorded in the donor data set are the most important basis for joining the elements. For the majority of projects we seek to map the distributions observed among the donors onto the recipient data set. Our fusion logic derives appropriate reference data from the donors and replicates that structure within the recipients. Existing correlations must be preserved during this merger in order to create a consistent data set for media planning. Fused in this way, the resulting data set can be evaluated across all information and dimensions.
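One common family of such procedures is statistical matching in the style of a hot deck: each recipient receives the transfer variables of the donor that is closest on the anchor variables. The sketch below uses a simple mismatch count as the distance; the variable names and data are illustrative assumptions, not an actual fusion design.

```python
# Hot-deck style data fusion sketch: recipients inherit the transfer
# variables of their nearest donor, measured on the anchor variables.

def fuse(donors, recipients, anchors, transfer):
    """donors/recipients: lists of dicts; anchors: keys present in both
    data sets; transfer: keys present only in donors, copied over."""
    def distance(a, b):
        # simple mismatch count over categorical anchor variables
        return sum(a[k] != b[k] for k in anchors)

    for rec in recipients:
        best = min(donors, key=lambda d: distance(d, rec))
        for k in transfer:
            rec[k] = best[k]
    return recipients

donors = [
    {"sex": "f", "age": "<40", "tv_minutes": 95},
    {"sex": "m", "age": "40+", "tv_minutes": 180},
]
recipients = [{"sex": "f", "age": "<40"}, {"sex": "m", "age": "40+"}]
fused = fuse(donors, recipients,
             anchors=["sex", "age"], transfer=["tv_minutes"])
```

Real fusion designs add weighted distances, donor-usage limits and constraints that preserve the donors' distributions; the sketch only shows the matching core.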
ANKORDATAscience framework – IMPUTATION
What does a talented tailor do with a hole in his trousers? You are right: he darns it instead of simply patching it with another piece of fabric. This is what we do with imputation:
In audience research, data sets often remain incomplete after surveying. This can be caused by the technical measurement of media consumption, which cannot be deployed identically to all households, or by a lack of acceptance of certain questions, such as those on net income or economic situation.
Dealing with this situation, we design algorithms and processes to impute the missing data. As an example, we supplement responses to the household income question for surveys of agma and Radiotest. Parameters are derived from the existing data in a bespoke model, and we inject them into the gaps. In contrast to the method of fusion, the distribution of the results may differ from that of the originally collected data.
Our models are mainly designed to achieve a high plausibility of the imputed data. During integration into the pipelines and the automated processing, it is important to provide measures for persistent quality assurance and validation.
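A minimal sketch of such an imputation, assuming a missing income answer is drawn from the reported incomes of respondents in the same socio-demographic cell; the cell definition, variable names and data are purely illustrative and do not reflect the actual agma or Radiotest model.

```python
# Cell-conditional imputation sketch: a missing income value is drawn
# at random from the observed incomes of the same socio-demographic cell.
import random

def impute_income(records, cell_keys, seed=0):
    rng = random.Random(seed)  # seeded for reproducible production runs
    # collect observed incomes per cell
    observed = {}
    for r in records:
        if r["income"] is not None:
            cell = tuple(r[k] for k in cell_keys)
            observed.setdefault(cell, []).append(r["income"])
    # fill gaps by drawing from the matching cell's distribution
    for r in records:
        if r["income"] is None:
            cell = tuple(r[k] for k in cell_keys)
            donors = observed.get(cell)
            if donors:
                r["income"] = rng.choice(donors)
    return records

records = [
    {"region": "north", "income": 2500},
    {"region": "north", "income": None},
    {"region": "south", "income": 1800},
]
filled = impute_income(records, cell_keys=["region"])
```

Drawing from the observed distribution, rather than inserting a cell mean, keeps the variance of the imputed variable plausible.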
ANKORDATAscience framework – NET REACH MODELLING
Our clients usually see the reliable determination of reach and frequency in media as the major constituent of their work.
We therefore aim to exploit the measurement results on media consumption for the planning of advertising campaigns. The task is to develop prospective data models that can be applied to the probability of usage. We focus on validly determining the accurate number of consumers (net reach) and seek to keep the overlaps between various audiences reliable.
There are essentially two major use cases when forming the net reach with a data model:
• occasionally collected survey data
• continuously recorded measurement data
Survey: questionnaires on media consumption often contain specifically formulated questions on the last media contact (chosen for good recall) and on the frequency of usage. Assuming a sufficient sample size, we can employ segmentation as the method for calculating probabilities of usage and the net reach. This is a proven model we have established for some of agma’s projects. We differentiate media users into segments with individual probabilities of usage, from infrequent to regular use. This information can be derived from additionally surveyed variables that correlate with media usage.
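A segmentation of this kind can be sketched as follows: each respondent belongs to a frequency segment carrying an individual probability of usage, and the net reach is the weighted average of those probabilities. The segment labels, probabilities and panel data below are illustrative assumptions, not the agma model.

```python
# Segment-based net reach sketch: respondents are grouped into usage
# segments, each carrying a probability of use; net reach is the
# weighted sum of those probabilities over the weighted population.

# illustrative segment probabilities, normally estimated from the survey
SEGMENT_PROB = {"regular": 0.9, "occasional": 0.4, "infrequent": 0.1}

def net_reach(respondents):
    """respondents: list of (weight, segment) tuples; returns the
    weighted net reach as a share of the weighted population."""
    total = sum(w for w, _ in respondents)
    reached = sum(w * SEGMENT_PROB[s] for w, s in respondents)
    return reached / total

panel = [(1.0, "regular"), (1.0, "occasional"), (2.0, "infrequent")]
share = net_reach(panel)
```

With the figures above, the weighted reach is (0.9 + 0.4 + 0.2) / 4 of the population.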
In the case of smaller samples, we rely on a lean procedure such as the optimization of frequency clusters. For this method, we derive the probability of usage by optimizing the statistical distribution of target groups across the various frequency clusters.
Continuous measurement: if this type of data is available, we apply discrete modelling methods, such as (negative) binomial or Poisson models. The measurement data is considered an external reference in this case. In a multi-step approach, we determine the net reach for various dimensions (e.g., devices, audiences) and several levels of hierarchy (e.g., channels, programmes, content, ad inventory). The key point is that the modelling results must fit within the boundaries of the measurement results.
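The role of such discrete models can be sketched through their zero term: the net reach is one minus the probability of zero contacts. Below, a Poisson model and a negative binomial model fitted by the method of moments; the mean and variance figures are illustrative, and the multi-step hierarchy fitting of the production pipeline is not shown.

```python
# Net-reach sketch from aggregated contact data via the zero term of a
# Poisson and a negative binomial distribution.
import math

def reach_poisson(mean_contacts):
    # P(at least one contact) = 1 - P(0) = 1 - exp(-lambda)
    return 1.0 - math.exp(-mean_contacts)

def reach_negative_binomial(mean_contacts, variance):
    # method-of-moments fit: variance = m + m^2/k  =>  k = m^2/(var - m)
    if variance <= mean_contacts:
        return reach_poisson(mean_contacts)  # no overdispersion observed
    k = mean_contacts ** 2 / (variance - mean_contacts)
    # zero term of the negative binomial: P(0) = (k / (k + m))^k
    p_zero = (k / (k + mean_contacts)) ** k
    return 1.0 - p_zero

p_reach = reach_poisson(1.0)
nb_reach = reach_negative_binomial(1.0, 2.0)
```

At an equal mean contact count, the overdispersed negative binomial concentrates contacts on fewer people and therefore yields a lower net reach than the Poisson model.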
These methods from our toolset can be applied to form data sets at respondent level (micro modelling) as well as for the statistical projection of aggregated results across content and ads (macro modelling). For many media research projects, the combination of the two perspectives is the best approach to net-reach modelling: we adjust the net reach at the respondent level to specifications derived from aggregated results (calibration).
ANKORDATAscience framework – CALIBRATION
Similar to tailoring customized fashion pieces, data modelling in media research gets its final shape from the last adjustment.
From our observation, however, calibration is a truly underestimated capability that quite often helps to safeguard the investments in measurement and data modelling. Calibration here means ensuring the stability, consistency and plausibility of the results to the level required for the purposes of media planning.
In a first step, reliable target specifications or references need to be determined. These might be sourced from a technical census measurement of media consumption. Next, the data at respondent level, such as from a panel survey, is compared against them and discrepancies are detected. Such discrepancies can have many root causes:
• Insufficient panel sizes for long-tail content with low panel usage
• Selection bias resulting in skewed usage figures
• Recruitment issues in attracting some target groups
• Panel mortality with corresponding gaps in the usage data
• Device fragmentation (incomplete coverage of all devices used by a panelist, @work devices, secondary devices, ...)
For the sake of a coherent resulting data set, it is our conviction that such "measurement gaps" cannot be eliminated simply by correcting the frequency data. Biases in the observation of usage frequencies are likely to affect the net-reach results as well. Rather, the internal structure and the overlaps between the measured media and advertising must become a material part of the data model.
Considering those multi-dimensional boundaries between media and advertising, we apply individually adapted "gradient descent" methods to optimize the calibration. As a blueprint, this approach has proven itself and is persistently maintained in our solutions for AGF and AGOF.
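A gradient descent calibration of this kind can be sketched as follows: respondent weights are nudged so that the weighted totals approach external reference figures, while a penalty term keeps each weight close to its starting value. The step size, penalty strength and data below are illustrative assumptions, not our production configuration.

```python
# Weight calibration sketch via gradient descent on a squared-error
# objective with a ridge penalty towards the start weights.

def calibrate(weights, usage, targets, lr=0.05, penalty=0.01, steps=2000):
    """weights: start weights per respondent; usage[i][j] = 1 if
    respondent i used medium j; targets: reference totals per medium."""
    w = list(weights)
    n_media = len(targets)
    for _ in range(steps):
        # residuals between modelled and reference weighted totals
        resid = [sum(w[i] * usage[i][j] for i in range(len(w))) - targets[j]
                 for j in range(n_media)]
        for i in range(len(w)):
            # gradient of 0.5*sum(resid^2) + 0.5*penalty*(w - w0)^2
            grad = sum(resid[j] * usage[i][j] for j in range(n_media))
            grad += penalty * (w[i] - weights[i])
            w[i] = max(w[i] - lr * grad, 0.0)  # keep weights non-negative
    return w

start_weights = [1.0, 1.0, 1.0]
usage = [[1, 0], [1, 1], [0, 1]]   # respondent x medium contact flags
reference = [1.8, 2.2]             # external reference totals per medium
calibrated = calibrate(start_weights, usage, reference)
```

The penalty term is the design choice that matters here: it resolves the underdetermined system in favour of the smallest weight changes, preserving the internal structure of the panel data while the totals move onto the reference.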