Limitations of using simple indicators for evaluating agricultural emission reductions at farm level — evidence from Kenyan smallholder dairy production



INTRODUCTION
Carbon footprints, an expression of greenhouse gas (GHG) balances associated with production activities, provide information to support the transition towards a "Net Zero" economy that does not emit more greenhouse gases than it removes from the atmosphere. Agriculture, which is directly responsible for about 14% of annual GHG emissions [1] and up to one third when pre- and post-farmgate emissions are included [2,3], has been quick to adopt the concept of carbon footprints, as evidenced by the existence of more than 18 carbon footprint calculators and countless associated methodologies [4]. The quantification tools typically apply Intergovernmental Panel on Climate Change (IPCC) equations to identify opportunities to lower farm emissions, monitor and report on changes, and document stories of progress. Many tools can be applied using Tier 2 methods that use farm-specific data and are able to reflect differences at the farm level. However, such tools rarely account for uncertainties involved in carbon footprint calculations [5], even though their consideration is explicitly recommended in the IPCC guidelines [6]. The absence of uncertainty estimates makes it difficult to gauge the accuracy and robustness of calculated carbon footprints and any related emission reduction claims. Quantifying the change in farm-level carbon footprints with a known level of confidence is particularly relevant where estimates are used to generate tradeable carbon credits with which to incentivize improved farm performance.
Emissions of methane, most of which arise from livestock production, make a significant contribution to the global warming potential of the agriculture sector [7,8] . Cattle produce methane during digestion, and this enteric methane is often the most significant single agricultural emission source in African countries. For example, Kenya's approximately 17.5 million cattle contribute about 30% of the greenhouse gas emissions in the country. Mitigating the climatic impact of cattle therefore features prominently in the country's Climate Smart Agriculture Strategy 2017-2026 [9] , which describes the agricultural ambitions of Kenya's Nationally Determined Contribution (NDC) to curb climate change. Although dairy cattle comprise a relatively small proportion of Africa's total cattle population, dairy cattle production has been identified as a sub-sector with high growth potential in some regions, including East Africa [10] . The East African dairy sector is largely smallholder-based and contributes significantly to farm incomes, off-farm employment, agriculture sector value added, and export earnings. Significant growth in demand is projected due to increasing urbanization and rising incomes [11] . Current low levels of productivity, associated with poor animal genetics and inadequate availability of quality feed, indicate a high potential to increase productivity on many smallholder farms [12] .
The Gold Standard Smallholder Dairy GHG Quantification Methodology (SDM) [13] represents one effort to provide a standardized method for baseline setting in a sector with strong potential to achieve sustainable development benefits, particularly in the least developed countries [14,15] . The SDM approach differs from other existing Clean Development Mechanism (CDM) methodologies that require detailed data to characterize the baseline on each participating farm as well as annual monitoring data. Instead, SDM uses a representative sample survey to quantify the relationship between milk output per farm and the GHG emission intensity of milk production in the project region [13] . Using this relationship, the emission intensity on individual farms is estimated using data on milk production of participating farms (already collected by most dairy development initiatives). While this approach may reduce the transaction costs of project monitoring, the simplification and standardization it introduces may come at the cost of weaker environmental integrity [16,17] .
The SDM methodology strongly relies on the GHG emission accounting equations recommended by the IPCC [18] , which are used to estimate emissions from all animals encountered during the representative survey. These estimates are then aggregated at the farm scale to quantify the relationship between milk production per head and the emission intensity of milk production [13] . According to the methodology, this relationship may allow crediting farms for emission reductions based solely on an evaluation of the change in milk production per head. In this context, it is worth noting that the IPCC equations were originally intended to be used at national or other high-level scales, where variation among individual farms or animals is smoothed out by aggregating large numbers of animals. Applying the equations at finer scales or deriving information about farm-level emissions from higher-scale estimates may lead to considerable errors and uncertainties.
An important source of uncertainty related to the SDM methodology is model uncertainty derived from the formulation of the IPCC equations and default parameters used to estimate the carbon footprint of milk production. A second source is measurement uncertainty for farm production parameters due to the measurement methods or due to enumerator or farmer recall errors. Finally, there is uncertainty in the fundamental relationship between milk yield and greenhouse gas emission intensity. The SDM does not require the estimation of uncertainty, and the effects of these sources of uncertainty on the environmental integrity of emission reduction estimates are not considered.
Here, we identified the sources of uncertainty in the calculations used in the SDM based on a production survey of 414 households that supported the development of the Kenyan national GHG inventory using the IPCC Tier 2 approach [19,20]. We used uncertainty analysis, in the form of a Monte Carlo simulation, to answer four questions:
(1) What is the variation of emissions and emission intensity across households?
(2) What is the minimum level of increase in milk yield required to be able to state with high confidence that emission intensity reductions have occurred, based on an empirically derived relationship between milk yield and emission intensity?
(3) Given the main sources of uncertainty, what are the priorities (and potential methods) for reducing the uncertainty that would optimally support carbon credit and mitigation incentive schemes while ensuring the environmental integrity of the crediting program?
(4) Given the uncertainty associated with estimated emission reductions, what methods could be used to avoid rewarding households that have not achieved emission reductions (false positives, also referred to as Type I errors) or failing to recognize households that have achieved emission reductions (false negatives, also known as Type II errors)?

METHODS
To assess greenhouse gas emissions from livestock, we used recognized good practices for GHG quantification regarding the estimation model and available information. The model's equations were taken from Chapter 10 of Volume 4 of the 2006 IPCC Guidelines for National Greenhouse Gas Inventories [18] . The model included all equations required to calculate emissions from individual animals, as well as feed production and transport emissions. It was originally intended for use on a national scale, but it has frequently been applied to estimate finer-scale emissions. The dataset we used is based on a survey of 414 farms reported by Wilkes and colleagues [21] . The survey collected information on milk yields and all aspects of farm management of milk-producing farms that are required to calculate carbon footprints (e.g., herd size, feed, and milk yield). Details on the survey methodology are provided in the Supplementary Materials and in the work by Wilkes et al. [20] . Thirty-one households had dairy cattle but did not produce milk in the year prior to the survey, most commonly because their cows did not lactate at any point during the year or farmers only recently purchased a heifer. These farms were excluded, leaving 383 with 1294 individual cattle for further analysis.

Simulating milk yield and greenhouse gas emissions
The IPCC provides a general estimate of the likely uncertainty range for livestock GHG emissions calculated using the Tier 2 equations and for some of the emission factors and parameters used [18] . Other studies have provided estimates of the uncertainty of intermediate parameters in the IPCC Tier 2 equations [22,23] and estimates of measurement error for key input parameters [24,25] . We expanded these estimations by assessing the uncertainty of all inputs to the IPCC model and evaluating the implications of this uncertainty for estimates of milk production and greenhouse gas emissions.
To include error estimates in assessing the GHG emissions of cattle, we implemented the IPCC equations [18] as a Monte Carlo simulation, a method that allows accounting for uncertainty in model input variables and model parameters [26] . In a Monte Carlo simulation, uncertain model inputs are described by probability distributions that express the likelihood of the variables assuming particular values [27] . The simulation is then run by drawing many sets of random values for each variable and executing the model for each such combination. The result is a distribution of model output values that expresses the probability of obtaining particular outcomes, given the uncertainty about input data. We implemented this simulation using functions of the decisionSupport package [28] in the R programming language [29] .
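The Monte Carlo procedure described above can be illustrated with a minimal Python sketch. It is not the study's R implementation: the function toy_emissions is a hypothetical stand-in for the IPCC equations, the input values for one farm are invented, and the use of normal distributions whose central 90% interval matches the stated relative error is an assumption made for illustration.

```python
import random
import statistics

Z90 = 1.645  # approximate standard normal quantile bounding a central 90% interval


def draw(estimate, rel_half_width, rng):
    """Draw one value from a normal distribution whose central 90% interval
    is estimate * (1 +/- rel_half_width). Normality is an assumption."""
    sd = estimate * rel_half_width / Z90
    return rng.gauss(estimate, sd)


def toy_emissions(milk_kg, live_weight_kg):
    """Hypothetical stand-in for the emission model (NOT the IPCC equations).
    Returns annual emissions in kg CO2-eq per head."""
    return 2.0 * live_weight_kg + 0.5 * milk_kg


def simulate(milk_kg, live_weight_kg, n_runs=1000, seed=1):
    """Run the toy model n_runs times with randomly drawn inputs and
    return the simulated emission intensities (kg CO2-eq per kg milk)."""
    rng = random.Random(seed)
    intensities = []
    for _ in range(n_runs):
        # Relative errors of 27.5% (milk yield) and 14.5% (live weight)
        milk = max(draw(milk_kg, 0.275, rng), 1e-9)
        weight = max(draw(live_weight_kg, 0.145, rng), 1e-9)
        intensities.append(toy_emissions(milk, weight) / milk)
    return intensities


# Invented inputs for a single farm: 1500 kg milk per head, 350 kg live weight
intensities = simulate(1500.0, 350.0)
cuts = statistics.quantiles(intensities, n=20)  # 19 cut points at 5% steps
lower, median, upper = cuts[0], statistics.median(intensities), cuts[-1]
print(f"90% interval: {lower:.2f} to {upper:.2f}, median {median:.2f}")
```

The output is a distribution of emission intensities rather than a single number, which is the property the subsequent analyses rely on.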

Estimating input data
We estimated GHG emissions for the population of 1,294 individual cattle surveyed across 383 households by Wilkes et al. [20] . According to Dong et al. [18] , emissions from each individual animal are strongly influenced by the feeding system and the nature and source of feed. In terms of feeding systems, the IPCC guidelines distinguish between stall-fed cattle, pasture-fed animals, and cattle that graze freely on large ranges. The effect of feed quality on greenhouse gas emissions is estimated via the digestibility and crude protein content of the available feed. For feed that is not obtained via grazing, calculating a carbon footprint requires that emissions during feed production and transport are considered. Characteristics of each individual animal impact emission estimates, which differ according to breed, animal type (lactating cow, mature bull, calf, etc.), live weight, mature weight, and weight gain. For cows, pregnancy status, milk production, and milk fat content were considered. Emissions from animal manure were also included, with emissions depending on the manure management system. Since our accounting was limited to on-farm emissions, emissions from manure that was sold (28 households sold, on average, 22% of their manure) were not considered.
While some of the characteristics of animals and farms listed above can easily be observed in the field (e.g., the animal type), others involve possible measurement errors (e.g., live weight or weight gain). Additional errors arise during subsequent calculations, when field data are translated into technical coefficients such as the methane conversion factor of the manure management system or the feeding system-specific animal activity coefficient. The recommended equations also contain various empirical constants that are typically used without considering possible errors.
We evaluated the impact of all these uncertainties on greenhouse gas emission estimates per animal for all surveyed farms. We assumed that enumerators accurately evaluated animal types, feeding systems, and manure management systems, but that all other input data were subject to measurement uncertainty. Wherever possible, we used published indications of these uncertainties, e.g., from subsequent refinements of the IPCC guidelines [18,30] [Supplementary Table 1]. Where no such evidence could be identified, we used our expert judgment to estimate plausible value distributions [ Table 1].
To implement data-based farm scenarios, we added functionality to the decisionSupport analysis package [28] to allow the specification of distinct scenarios when running the simulation (the scenario_mc function). Such scenarios were specified for all farms contained in the survey (survey results are provided in Supplementary Table 2; farm scenario specifications are presented in Supplementary Table 3). For most other model input variables, we amended the numbers returned by the survey by error estimates [ Table 1]. We assumed that distributions of all variables were independent of each other. We also computed a benchmark estimate by running the model a single time, with inputs based on precise numbers returned by the survey, as well as default values recommended by the IPCC guidelines [Supplementary Table 4]. This benchmark is used to illustrate the possible errors in emissions and emission intensity estimates that may arise from failure to consider uncertainty (illustrated for a single farm in Figure 1).
To make emission estimates comparable across farms, we expressed emissions on a per-head basis by dividing overall emissions by the total number of animals recorded during the survey. We also computed the emission intensity of milk production by dividing these per-head emissions by milk production per head. For each of the 383 retained farms, we ran 10 simulations, with inputs generated by randomly drawing values from the farm-specific input distributions. This resulted in a total population of 3,830 simulated farms.
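The two farm-level metrics can be written out as a short Python sketch. The function names and the example numbers are hypothetical and serve only to make the arithmetic concrete.

```python
def per_head_emissions(total_emissions_kg, n_animals):
    """Farm emissions (kg CO2-eq per year) divided by herd size."""
    return total_emissions_kg / n_animals


def emission_intensity(total_emissions_kg, n_animals, total_milk_kg):
    """Per-head emissions divided by per-head milk production
    (kg CO2-eq per kg milk)."""
    milk_per_head = total_milk_kg / n_animals
    return per_head_emissions(total_emissions_kg, n_animals) / milk_per_head


# Invented farm: 7500 kg CO2-eq per year, 3 animals, 3000 kg milk per year
print(per_head_emissions(7500.0, 3))          # 2500.0
print(emission_intensity(7500.0, 3, 3000.0))  # 2.5
```

Because herd size appears in both the numerator and the denominator, the emission intensity equals total emissions divided by total milk. Herd size affects this metric only through per-head milk production, which is why unproductive animals raise the intensity.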

Evaluating simulation results at the population scale
Total greenhouse gas emissions from livestock production as well as the emission intensity of milk production vary considerably across farms (illustrated for the present dataset in Figure 2; emissions per head of cattle are shown in Supplementary Figure 1 and summary statistics in Supplementary Table 5). A key motivation of this work is to explore the relationship between these climate impact indicators and farm-scale milk production per head, because the latter constitutes an easy-to-measure indicator that would be useful for monitoring the success of mitigation actions in the Kenyan livestock sector.
From all farm-scale data pairs of milk production per head and emission intensity [ Figure 3A], we interpolated a probability surface to generalize this relationship [ Figure 3B]. For each level of per-head milk production, the corresponding emission intensity can be described by a slice through the density surface at the respective production level (indicated by the vertical lines in Figure 3B). To generate such slices, we used the uncertainty R package [32] , to which we added a function (varslice_resample) that allows drawing random samples from the resulting slice. We used this function to characterize the emission intensity at specified per-head milk production levels [ Figure 4A]. Note that we restricted these population-scale analyses to farms for which we simulated a per-head annual milk production of > 500 kg. This step, which excluded 501 simulated farms that produced some milk but did not appear focused on dairy production, was necessary to avoid spurious emission intensity outliers resulting from low milk yields. We retained 3329 simulated farms, for which we evaluated the relationship between emission intensity and milk yield per head.

Comparing emission intensity across productivity levels
Based on the slices through the probability distributions, we compared the emission intensity of milk production across different productivity levels. For many such comparisons, the distributions of plausible values for different productivity levels showed considerable overlap, due to differences between the farms and additional variation introduced by the Monte Carlo procedure [Figure 4B]. The data used to characterize the slices consist of a large number (1000 in this case) of random samples from the corresponding emission density distributions. We could therefore estimate the probability that emissions at a specific productivity level are lower than those at another level by comparing the respective sets of random values that correspond to these two levels. We implemented this evaluation by pairwise comparisons between all elements of the two sets of random values. The share of comparisons that featured lower values for a specific milk production level was interpreted as the probability of emission intensity at this production level being lower than that of the other production level.

Table 1

Model input variable              Bounds of 90% confidence intervals   Source
Milk yield per cow                ± 27.5%                              Migose et al. [25]
Emissions from feed production    ± 25%                                Vellinga et al. [31]
Emissions from feed transport     ± 25%                                Vellinga et al. [31]
Animal live weight                ± 14.5%                              Goopy et al. [24]
Animal weight gain                ± 14.5%                              Goopy et al. [24]
Feed digestible energy content    ± 10 percentage points               Own estimate
Feed crude protein content        ± 10 percentage points               Own estimate

We generalized this procedure across the whole spectrum of milk production levels encountered in the dataset. We produced probability density slices for production levels within this range, at intervals of 100 kg milk head⁻¹ year⁻¹, followed by pairwise comparisons across all combinations of productivity levels. In these comparisons, we randomized the order of samples for one of the respective distributions (to avoid comparison of identical distributions for similar productivity levels). From the population of all these pairwise comparisons, we produced a generalization of the probability of being able to use data on the milk production level to decide which of a pair of two farms features a lower emission intensity [Figure 5]. We also compared these results to the use of the 95% confidence interval of a LOESS regression between milk yield per head and emission intensity (shown in Figure 3A) as a criterion to evaluate whether emission reductions have occurred. However, since this measure represents the error of the regression equation rather than the error of the prediction, the result of this comparison is only shown in Supplementary Materials [Supplementary Figure 2].
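The pairwise comparison between two sets of random samples can be sketched as follows. This is an illustrative Python version, not the code used in the study: the function name prob_lower is hypothetical, and the two sample sets are drawn from invented normal distributions instead of slices through the density surface.

```python
import random
from itertools import product


def prob_lower(samples_a, samples_b):
    """Share of all pairs (a, b) in which a is lower than b, interpreted as
    the probability that level A has the lower emission intensity."""
    n_pairs = len(samples_a) * len(samples_b)
    n_lower = sum(1 for a, b in product(samples_a, samples_b) if a < b)
    return n_lower / n_pairs


rng = random.Random(42)
# Invented emission intensity samples (kg CO2-eq per kg milk) for two
# productivity levels, 1000 samples each as in the text
high_yield = [rng.gauss(2.0, 0.6) for _ in range(1000)]
low_yield = [rng.gauss(2.6, 0.6) for _ in range(1000)]

p = prob_lower(high_yield, low_yield)
print(f"P(intensity at high yield < intensity at low yield) = {p:.2f}")
```

Repeating this calculation for every pair of productivity levels would yield the kind of probability surface described above. Overlapping distributions give values well below 1 even when the medians clearly differ.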

Emission-reducing interventions
Greenhouse gas emissions from dairy cattle can be reduced through a number of measures, such as improved feeding or changes to the herd structure [33] . An appropriate indicator of farm-scale emission intensity should be able to identify farms that have taken such measures. To evaluate whether per-head milk production can serve this purpose, we simulated the impacts of five farm-scale interventions.

Breed change
Within the present population of dairy cows, eight different breeds were identified, including a category for cross-bred cows of unknown pedigree. Comparison of milk yields across these breeds indicated large differences [Supplementary Figure 3], with Holstein Friesian cows producing an average of 6.7 kg milk per day, whereas cows within the "unknown cross-bred" category only produced 3.0 kg milk per day [Supplementary Table 6]. We simulated the impacts of an intervention that encouraged farmers to replace cows of low-yielding breeds with Holstein Friesian cows, assuming that this would raise milk yields by a factor that corresponds to the ratio between the median milk yields of the two breeds involved in this breed change. Note that we did not consider the "Boran" breed for this intervention, since only a single cow of this breed was encountered. We divided the milk yield of all dairy cows by an intervention factor consisting of the ratio of the median milk yield of the present breed divided by the median milk yield of a Holstein Friesian cow.
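The intervention factor can be made concrete with a small Python sketch. The function name, the data layout, and the milk yields are hypothetical; only the rule (dividing each yield by the ratio of the present breed's median to the Holstein Friesian median, and leaving the Boran cow unchanged) follows the description above.

```python
import statistics


def breed_change_yields(cows, target="Holstein Friesian", skip=("Boran",)):
    """cows is a list of (breed, daily_milk_kg) pairs. Returns the yields
    after dividing each by median(present breed) / median(target breed)."""
    by_breed = {}
    for breed, milk in cows:
        by_breed.setdefault(breed, []).append(milk)
    medians = {breed: statistics.median(v) for breed, v in by_breed.items()}

    new_yields = []
    for breed, milk in cows:
        if breed in skip:
            new_yields.append(milk)  # breed not considered for the intervention
        else:
            factor = medians[breed] / medians[target]  # equals 1 for the target
            new_yields.append(milk / factor)
    return new_yields


# Invented herd data (kg milk per day)
cows = [
    ("Holstein Friesian", 6.0),
    ("Holstein Friesian", 8.0),
    ("Unknown cross-bred", 3.0),
    ("Unknown cross-bred", 4.0),
    ("Boran", 2.0),
]
print(breed_change_yields(cows))  # [6.0, 8.0, 6.0, 8.0, 2.0]
```

In this invented example the cross-bred median is half the Holstein Friesian median, so the factor is 0.5 and cross-bred yields double, while differences between individual cows within a breed are preserved.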

Retiring unproductive bulls
A small proportion of households in the sample kept bulls used for reproductive purposes. We simulated a scenario where all bulls are removed from the herds, such as might occur if effective artificial insemination services were accessible and replaced the use of bulls for breeding.

Fewer replacement males
In the absence of artificial insemination, which is not routinely applied on most smallholder livestock farms in Kenya, farms need to raise, borrow, or purchase bulls to ensure that their cows remain productive. Many farms keep more than the necessary number of replacement males. We simulated an intervention that corresponded to a reduction in the number of replacement males by 23%, a percentage that we estimated as a reasonable reduction based on the present herd structures. We implemented this intervention by making approximately 23% (randomly generated with an adoption probability of 23%) of farms remove all replacement males from their herds.
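The random assignment of adopters can be sketched in Python as below. The herd representation and the function name are hypothetical; the only element taken from the text is the adoption probability of 23% and the removal of all replacement males on adopting farms.

```python
import random


def remove_replacement_males(herds, adoption_prob=0.23, seed=7):
    """Each farm adopts independently with probability adoption_prob.
    Adopting farms lose all replacement males. Returns a list of
    (herd, adopted) pairs and leaves the input untouched."""
    rng = random.Random(seed)
    updated = []
    for herd in herds:
        adopted = rng.random() < adoption_prob
        new_herd = dict(herd)
        if adopted:
            new_herd["replacement_males"] = 0
        updated.append((new_herd, adopted))
    return updated


# Invented population of identical herds
herds = [{"cows": 2, "replacement_males": 1} for _ in range(10000)]
result = remove_replacement_males(herds)
share = sum(adopted for _, adopted in result) / len(result)
print(f"Share of adopting farms: {share:.3f}")
```

Because adoption is drawn per farm, the realized share of adopters is only approximately 23%, and the reduction in the number of replacement males depends on how many such animals the adopting farms happened to keep.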

Calliandra feeding
Greenhouse gas emissions can also be influenced by changing animal diets. Makau et al. [34] reported that adding a modest amount of foliage from the leguminous shrub Calliandra calothyrsus to the animal diet can raise milk yields. We simulated an intervention in which 30% of farmers adopted supplementary Calliandra feeding, which increased milk yields by 0.156 kg head⁻¹ day⁻¹ for lactating cows, increased the digestible energy content of animal diets by 0.14 percentage points, and decreased the crude protein content of the diet by 0.07 percentage points. Assuming that Calliandra replaces maize stover, we reduced feed production emissions by 0.726 kg CO₂-eq animal⁻¹ year⁻¹, which corresponds to the emissions from production and transport of the same amount of maize stover.

Balanced diets
Inappropriate use of concentrate feeds can be a major driver of greenhouse gas emissions [21] . Evidence suggests that changing feeding regimens to ensure that nutrients are supplied according to animal needs can have beneficial impacts on milk production and greenhouse gas emissions [35] . We simulated the impacts of such an intervention by assuming that milk yields on adopting farms increased by 0.75 kg per day, and greenhouse gas emissions associated with purchased feeds decreased by 328.5 kg per animal and year. The assumed level of increase in milk yields is based on the results of a meta-analysis by Bateki et al. [12] , and the assumed level of decrease in feed emissions is based on decreased rates of concentrate use achieved in other balanced feed ration programs [36] and a country-specific estimate of the carbon footprint of feed concentrate [21] .

Intervention impacts and ability to detect adopters
To test the impact of the five interventions, we produced an R function (run_model) that allowed running the greenhouse gas emission model based on pre-defined sets of input values. Since in a typical Monte Carlo simulation run with the decisionSupport package all model inputs are stored, this run_model function allows precise reproduction of the simulation results based on these stored values (provided that no additional random effects are introduced by the model). The run_model function also allows modifying the input table to implement the emission-reducing interventions. Based on model runs for the original and the modified input tables, we were able to simulate intervention impacts against the backdrop of a true counterfactual scenario, in which all boundary conditions were identical, except for those affected by the interventions.
In reality, efforts to monitor greenhouse gas emissions from livestock-producing farms cannot be based on true counterfactuals. To assess whether a particular farm has achieved a reduction in greenhouse gas emission intensity, evaluators would need to compare farm-level indicators at two points in time, with the estimate at each of these times being affected by random effects. To simulate this situation, we executed a second run of the Monte Carlo simulation based on new random input values for all farms. This simulation provided a second plausible distribution of farm-scale emissions and productivity for the same population of farms. We followed the same procedure as in the first Monte Carlo simulation to simulate impacts of the same interventions on the same farms, obtaining an alternative intervention scenario in which farms were affected both by the interventions and by random effects that differed from those in the first simulation.

Classification errors
A suitable indicator of farm-scale emission intensity should be able to identify farms that have taken steps to reduce emissions, and it should be able to distinguish these farms from others that have not adopted any such measures. Where farms are classified incorrectly, it is common to distinguish between false positives (Type I errors) and false negatives (Type II errors). The Type I error expresses the percentage of farms that have made no changes to their production practices but are nevertheless classified as adopters by the respective indicator. The Type II error, in contrast, is the percentage of farms that have implemented an intervention but are not identified as innovators because their per-head milk production does not show sufficient improvement.
In the presence of uncertainty, farms can never be classified as adopters or non-adopters with complete certainty, but the confidence of such classifications can be quantified based on an analysis of the population-level relationship between the indicator and the outcome measure. The population-level surface of the probability of detecting differences in emission intensity based on differences in milk yield per head [Figure 5] allows determining, for each innovator farm, the highest required level of confidence (that emission intensity has decreased) at which the household would still be correctly labeled as an adopter. Since this confidence level can be determined for each adopter, the reliability of detecting an innovating farm can be expressed as a function of the confidence level required by the evaluating authority. Naturally, this reliability is high when the required confidence is low and lower when stricter classification standards are applied. Farms that are falsely classified as adopters (Type I errors) can be identified using the same logic, by extracting the confidence level that would lead to classification as an adopter for each non-adopting farm. Evaluating the whole population of farms allowed computing the percentage of non-adopters that are falsely classified as adopters as a function of the confidence level required by the evaluator.
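The dependence of both error rates on the required confidence level can be illustrated with a short Python sketch. The per-farm confidence values below are invented; in the study they would be read from the probability surface for each farm's change in milk yield per head. The function name error_rates is hypothetical.

```python
def error_rates(confidences, is_adopter, required_confidence):
    """A farm is labeled an adopter if its confidence of reduced emission
    intensity reaches the required level. Returns (type_1, type_2):
    type_1 = share of non-adopters labeled as adopters (false positives),
    type_2 = share of adopters not labeled as adopters (false negatives)."""
    labeled = [c >= required_confidence for c in confidences]
    non_adopters = [lab for lab, a in zip(labeled, is_adopter) if not a]
    adopters = [lab for lab, a in zip(labeled, is_adopter) if a]
    type_1 = sum(non_adopters) / len(non_adopters)
    type_2 = sum(not lab for lab in adopters) / len(adopters)
    return type_1, type_2


# Invented confidence levels for three adopters and three non-adopters
confidences = [0.90, 0.60, 0.40, 0.80, 0.30, 0.55]
is_adopter = [True, True, True, False, False, False]

for level in (0.50, 0.85):
    t1, t2 = error_rates(confidences, is_adopter, level)
    print(f"required confidence {level:.2f}: Type I {t1:.2f}, Type II {t2:.2f}")
```

With these invented values, raising the required confidence from 0.50 to 0.85 lowers the Type I error from 2/3 to 0 but raises the Type II error from 1/3 to 2/3, which is the trade-off an evaluator has to weigh.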

Sensitivity analysis
As expected, the emission intensity estimates generated through Monte Carlo simulation featured considerable uncertainty. Some of this uncertainty may be reduced by improved procedures to collect data or by specific studies on particular input parameters that might increase clarity on their regionally appropriate values [37] . Given that most estimates of uncertainty were derived from general recommendations or relatively uninformed expert estimates, such uncertainty reductions seem quite possible. Through additional information, the widths of input distributions could be reduced, reducing the extent of the parameter space and leading to more precise emission intensity estimates.
To identify variables with high potential for information gains, we conducted a sensitivity analysis that related variation in the distributions of input parameters to variation in model outputs. This analysis was implemented via Partial Least Squares regression analysis [38,39] between model inputs and outputs, with the variable-importance-in-the-projection (VIP) [40] score serving as an indicator of the influence of each input variable.
Based on the most influential uncertainties identified by the sensitivity analysis, we simulated the impact of gaining precision on the respective variables. We repeated all analyses described above based on an updated parameter set, in which the measurement error in the assessment of milk yield per dairy cow was reduced from 27.5% to 15%. We then compared these results with those from the model runs with the initial uncertainty and evaluated the impact of this simulated knowledge gain on the suitability of milk yield per head as an indicator of the emission intensity of milk production.

Variation in greenhouse gas emissions from livestock
Across the population of all simulated farms, greenhouse gas emissions per farm ranged from 0.7 to 58.2 Mg CO₂-eq year⁻¹ [Figure 2A]. The median of farm-scale emissions was 7.5 Mg CO₂-eq year⁻¹, with emissions on 90% of farms ranging between 2.5 and 24.0 Mg CO₂-eq year⁻¹. On a per-head basis, emissions ranged from 0.6 to 8.8 Mg CO₂-eq head⁻¹ year⁻¹, with a median of 2.8 Mg CO₂-eq head⁻¹ year⁻¹ and 90% of emissions between 1.6 and 4.8 Mg CO₂-eq head⁻¹ year⁻¹ [Supplementary Figure 1]. On dairy-focused farms (minimum milk production of 500 kg head⁻¹ year⁻¹), the median emission intensity of milk production was 2.2 kg CO₂-eq kg⁻¹, with emission intensity ranging from 0.6 to 8.6 kg CO₂-eq kg⁻¹ across all farms and between 1.2 and 4.4 kg CO₂-eq kg⁻¹ for 90% of all farms [Figure 2B]. Variation was thus substantial for all emission metrics, indicating the considerable potential for emission-reducing interventions.

Impact of interventions
All simulated interventions reduced the greenhouse gas emission intensity of milk production compared to a counterfactual situation, in which the respective farms made no changes in herd composition or management ( Figure 6; see Supplementary Figure 4 for absolute changes). In quantitative terms, however, interventions differed greatly in effectiveness, with changes in dairy cow breeds and reductions in the number of unproductive males and surplus replacement males potentially more than doubling per-head milk production on some farms and cutting emission intensity in half. Such major impacts were only observed for a relatively small number of farms that initially featured particularly unfavorable production settings, such as exclusive reliance on low-performance dairy breeds or a large percentage of unproductive bulls or replacement males [Supplementary .
Compared to the herd structure interventions, feed interventions had a more consistent but relatively modest impact. The greatest simulated changes caused by adding Calliandra foliage to animal diets were an increase in milk yield by 22% and an 18% reduction in emission intensity [Figure 6], but 95% of adopting farms experienced per-head milk yield increases of less than 4% and emission reductions of approximately the same magnitude [Supplementary Table 7]. Similarly, the maximum simulated impacts of balancing animal diets were a milk yield increase of 70% and an emission intensity reduction of 53%, but 95% of adopting farms experienced milk yield increases of less than 17% and a drop in emission intensity of less than 23%.
It is worth pointing out that all interventions promise to cause reductions in emission intensity, so they appear worthy of implementation from a mitigation standpoint. It is also apparent that, for all farms that adopted an intervention, reductions in emission intensity were strongly associated with increases in per-head milk yield.
Figure 6. Intervention impacts on relative per-head milk production and relative greenhouse gas emission intensity, compared to a counterfactual farm with similar properties that does not implement the respective intervention. Note that the distinct-value pattern for the herd structure interventions (top row) results from all intervention impacts arising from the removal or replacement of individual animals in relatively small herds.

Expected changes on innovative farms
Intervention impacts are easy to detect when outcomes for farms that adopt an intervention can be directly compared with those of an identical farm that does not adopt it. Such true counterfactual situations can be simulated [ Figure 6], but they cannot be observed in the real world, where farms are subject to random changes in addition to the systematic effects of interventions. To generate a more realistic setting for evaluating the relationship between milk yield per head and emission intensity, we therefore ran a second Monte Carlo simulation, in which we implemented the same interventions on the same farms as in the original simulation. For farms that were selected as adopters of the five interventions, we compared outcomes for the first Monte Carlo simulation without the interventions (the baseline) with outcomes for the second Monte Carlo simulation with the interventions (the endline). The changes detected between these simulations represent the combination of systematic intervention effects and random variation.
Unsurprisingly, the impacts of the interventions were less clearly visible when the random variation was added to the endline (Figure 7; see Supplementary Figure 8 for absolute changes) than for the true counterfactual [ Figure 6]. We still detected significant positive effects on milk yields and negative effects on emission intensity [Supplementary Table 8], but these were difficult to distinguish from the random changes that also affected all other farms.
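The masking effect of random variation can be illustrated with a minimal sketch. This is not the SDM model: it assumes a single outcome (relative emission intensity) with multiplicative lognormal noise at both evaluation times, and the effect size and noise level are hypothetical placeholders rather than values from this study.

```python
import random
import statistics

TRUE_EFFECT = -0.04   # hypothetical systematic change in emission intensity
NOISE_SIGMA = 0.15    # hypothetical log-scale noise at each evaluation time
N_FARMS = 10_000


def observed_change(true_effect, sigma, rng):
    """Relative change between a noisy baseline and a noisy endline."""
    baseline = rng.lognormvariate(0.0, sigma)
    endline = (1 + true_effect) * rng.lognormvariate(0.0, sigma)
    return endline / baseline - 1


rng = random.Random(42)
adopters = [observed_change(TRUE_EFFECT, NOISE_SIGMA, rng) for _ in range(N_FARMS)]
non_adopters = [observed_change(0.0, NOISE_SIGMA, rng) for _ in range(N_FARMS)]

median_adopters = statistics.median(adopters)
median_non_adopters = statistics.median(non_adopters)
# Share of adopters whose observed intensity rose despite the intervention.
share_worse = sum(change > 0 for change in adopters) / N_FARMS

print(f"median change, adopters:     {median_adopters:+.3f}")
print(f"median change, non-adopters: {median_non_adopters:+.3f}")
print(f"adopters with apparent increase: {share_worse:.1%}")
```

In such a setting, the group medians still differ, but a sizeable share of individual adopters shows an apparent increase, which mirrors the pattern described above.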

Ability to detect adopters and classification errors
Since, in the presence of random effects, milk productivity and emission intensity outcomes of farms that adopted any of the five interventions did not differ strongly from those of farms that made no changes to breed composition, herd structure, or feeding regimen, reliable detection of such farms through the easy-to-measure indicator of milk yield per head proved challenging. Very few of the intervention farms exhibited changes in milk yields that were large enough to allow a confident call that emission reductions had occurred [ Figure 8]. Relaxing the confidence requirements for making such calls increases the percentage of innovator farms that are correctly identified, but many would still be missed even at confidence levels > 50%. In fact, many innovator farms experienced increasing emission intensities despite their mitigation efforts, since the positive impacts of these efforts were overcompensated by unfavorable random effects.
Figure 7. Intervention impacts on milk production per head and greenhouse gas emission intensity, in comparison to random changes between two evaluation times. The gray dots illustrate random changes that occurred on farms that made no management changes, whereas the colored dots show changes on farms that adopted the respective interventions.
The difficulty in distinguishing adopters from non-adopters is reflected in high rates of either Type I or Type II errors [ Figure 9] at all levels of required confidence. While a low confidence requirement allows detecting most innovators (low Type II errors), such standards would lead to unwarranted attribution of mitigation efforts to large numbers of non-adopters (high Type I error). Increasing the required confidence level to the point where less than 50% of non-adopters would be falsely recognized for their mitigation efforts would deny such recognition to approximately half of the farms that adopted any of the interventions. Reflecting the greater impacts of the herd structure interventions compared to the feed interventions, herd structure intervention adopters are more easily detected, but, even for these interventions, high error rates should be expected.
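The trade-off between the two error types can be made concrete with a short sketch. It is a simplification: the analysis above classifies farms by the required confidence that emission reductions occurred, whereas the sketch uses a plain cut-off on the observed change in milk yield as a stand-in. The function name `classification_errors` and the example values are hypothetical.

```python
def classification_errors(adopter_changes, non_adopter_changes, cutoff):
    """Return (type_1, type_2) error rates for a yield-change cut-off.

    A farm is recognized as an adopter if its observed relative change in
    milk yield exceeds `cutoff`.
    Type I: share of non-adopters that are recognized anyway.
    Type II: share of adopters that are not recognized.
    """
    false_positives = sum(change > cutoff for change in non_adopter_changes)
    false_negatives = sum(change <= cutoff for change in adopter_changes)
    type_1 = false_positives / len(non_adopter_changes)
    type_2 = false_negatives / len(adopter_changes)
    return type_1, type_2


# Hypothetical observed relative changes in milk yield per head.
adopter_changes = [0.10, 0.02, -0.05, 0.20]
non_adopter_changes = [0.03, -0.10, 0.12, -0.02]

lenient = classification_errors(adopter_changes, non_adopter_changes, cutoff=0.0)
strict = classification_errors(adopter_changes, non_adopter_changes, cutoff=0.15)

print("lenient cut-off (Type I, Type II):", lenient)
print("strict cut-off  (Type I, Type II):", strict)
```

Raising the cut-off lowers the Type I error at the expense of the Type II error, which is the same tension that arises when the required confidence level is increased.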

Major determinants of emission intensity
Several model input variables showed strong relationships with the emission intensity of milk production [ Figure 10]. Particularly strong influences were detected for the average milk yields of the three cow types, as well as the digestible energy content of the animal diet. The respective coefficients of the PLS model were negative, indicating that high values for the respective variables were associated with low emission intensities. The number of mature bulls in the herd and greenhouse gas emissions from feed production were also found to be important, with high values associated with high emission intensities. These findings lend support to the interventions that were simulated, all of which impacted one or more of these influential variables. The low detectability of the innovating farms may be related to the influence of several other important variables, which were also strongly associated with emission intensity but not targeted by the interventions. Among these were the number of cows that experienced their first pregnancy, the number of lactating cows, the number of young heifers, and the methane conversion rate. While not all of these are amenable to modification through interventions, they nevertheless introduce effects that dilute the impact of the simulated interventions.
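The way the sign of a PLS coefficient indicates the direction of an association can be illustrated with a one-component PLS regression on synthetic data. This is a sketch under stated assumptions, not the analysis performed in this study: the data-generating relationship, the variable names, and the function `pls1_first_component` are hypothetical, and only the first latent component is extracted.

```python
import random
import statistics


def standardize(values):
    """Center to zero mean and scale to unit standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def pls1_first_component(columns, response):
    """Coefficients of a one-component PLS regression on standardized data."""
    xs = [standardize(col) for col in columns]
    y = standardize(response)
    n = len(y)
    # Weight vector w is proportional to X^T y.
    cov = [sum(x[i] * y[i] for i in range(n)) for x in xs]
    norm = sum(c * c for c in cov) ** 0.5
    w = [c / norm for c in cov]
    # Scores t = X w, followed by a regression of y on t.
    t = [sum(w[j] * xs[j][i] for j in range(len(xs))) for i in range(n)]
    q = sum(t[i] * y[i] for i in range(n)) / sum(v * v for v in t)
    return [wj * q for wj in w]


# Synthetic farms (hypothetical units): intensity falls with milk yield
# and rises with the number of mature bulls.
rng = random.Random(1)
n_farms = 500
milk_yield = [rng.uniform(3, 15) for _ in range(n_farms)]
bulls = [rng.randint(0, 4) for _ in range(n_farms)]
intensity = [30 / m + 1.0 * b + rng.gauss(0, 0.5) for m, b in zip(milk_yield, bulls)]

coef_milk, coef_bulls = pls1_first_component([milk_yield, bulls], intensity)
print(f"milk yield coefficient: {coef_milk:+.2f}")
print(f"bulls coefficient:      {coef_bulls:+.2f}")
```

A negative coefficient for milk yield and a positive one for the number of bulls correspond to the directions of association reported above.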

Impact of improved milk yield measurement
The sensitivity analysis indicated uncertainty about milk yields per dairy cow as a major driver of uncertainty in emission intensity estimates. Assuming that the measurement error of this variable can be reduced from 27.5% to 15%, as indicated by Migose et al. [25] , we updated the input variable distributions accordingly and repeated all analysis steps (see Supplementary Figures 9-13 for detailed results). While all results were still characterized by considerable uncertainty and variation, the prospects of using milk yield per head as an indicator of the emission intensity of milk production improved markedly, in particular for the "breed change" and "retiring unproductive bulls" interventions [ Figure 11]. At a required confidence level that emission reductions have occurred (x-axis in Figure 11) of approximately 50%-55%, Type I and Type II errors reached a common minimum of about 25% for the "breed change" and about 15% for the "retiring unproductive bulls" intervention. While errors of this magnitude may still be large for an evaluation scheme, they represent substantial progress compared to the simulation with initial uncertainty. Further precision gains regarding key uncertainties, which still include dairy cow milk yields and the digestible energy percentage of animal feed [ Supplementary Figure 13], may further improve the feasibility of inferring emission intensity from milk yields.
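The effect of tighter milk yield measurements on classification errors can be sketched with a simple simulation. This is not the SDM model: the two error levels (27.5% and 15%) are treated here as log-scale standard deviations of multiplicative noise, which is an assumption about how the error enters, and the yield effect and cut-off are hypothetical. The function name `error_rates` is also hypothetical.

```python
import random


def error_rates(effect, noise_sigma, cutoff, n_farms, seed):
    """Simulate (type_1, type_2) error rates for a yield-change cut-off.

    Assumptions: lognormal multiplicative measurement noise at baseline and
    endline; adopters gain `effect` in true milk yield, non-adopters gain 0.
    """
    rng = random.Random(seed)

    def observed_change(true_effect):
        baseline = rng.lognormvariate(0.0, noise_sigma)
        endline = (1 + true_effect) * rng.lognormvariate(0.0, noise_sigma)
        return endline / baseline - 1

    adopters = [observed_change(effect) for _ in range(n_farms)]
    non_adopters = [observed_change(0.0) for _ in range(n_farms)]
    type_1 = sum(change > cutoff for change in non_adopters) / n_farms
    type_2 = sum(change <= cutoff for change in adopters) / n_farms
    return type_1, type_2


# Hypothetical intervention raising milk yield per head by 50%,
# recognized when the observed increase exceeds 25%.
initial = error_rates(0.5, 0.275, 0.25, 20_000, seed=7)
improved = error_rates(0.5, 0.15, 0.25, 20_000, seed=7)

print("initial precision  (Type I, Type II):", initial)
print("improved precision (Type I, Type II):", improved)
```

With less measurement noise, both error rates fall at the same cut-off, which is the qualitative pattern described above. The absolute values depend entirely on the assumed effect size and noise model.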

Milk yield and emission intensity projections and their uncertainty
Our model-based estimations of emissions from livestock herds in Kenya demonstrate the extent of variation in farm performance [ Figure 2]. The calculations also illustrate that uncertainty about input parameters to the IPCC's greenhouse gas emission equations potentially introduces critical errors into farm-scale estimates [ Figure 3]. This uncertainty adds noise to the correlation between milk yield per head and emission intensity, which raises concerns about the usefulness of milk yield as an indicator of emission intensity. These findings extend results from an earlier study on the same dataset by Wilkes et al. [21], who estimated an error of −22.8% to +28.2% for the carbon footprint of milk in central Kenya, by highlighting the implications of such uncertainty for development programs aiming to reward farms that adopt emission-reducing innovations. The strong influence of uncertain model input parameters on estimated farm-scale outcomes lends support to warnings expressed in earlier studies [41,42] that uncertainty limits the feasibility of comparing point estimates of different systems or ranking farms according to their carbon footprint. It also highlights key leverage points on which future efforts should be focused to enable robust monitoring of emission reductions.
Figure 9. Probability of Type I and Type II errors incurred by choosing specific levels of confidence that emission intensity changes have actually occurred. Type I errors indicate the probability of recognizing households for reducing emissions, even though they did not adopt an innovation, whereas Type II errors represent the percentage of households that made changes but are not recognized for them.
Probability of Type I and Type II errors incurred by choosing specific levels of confidence that emission intensity changes have actually occurred, comparing initial uncertainty with improved measurements of milk yields (assuming a reduction of measurement errors to 15%). Type I errors indicate the probability of recognizing households for reducing emissions, even though they did not adopt an innovation, whereas Type II errors represent the percentage of households that made changes but are not recognized for them.

Impacts of mitigation and productivity interventions
All interventions we simulated had a marked impact on milk production and emission intensity [ Figure 6]. Milk production per head is expected to rise by up to 25% in response to feed interventions and by up to 150% following breed or herd structure interventions. These results are consistent with the results of previous assessments [33] . FAO and the New Zealand Agricultural Greenhouse Gas Research Centre [33] estimated a 6.5%-12.3% reduction in enteric methane emission intensity and a 4.3%-12.4% increase in milk production from supplementation with concentrates. For leguminous fodder supplements, they estimated a 7.3%-24.4% reduction in enteric emission intensity and a 7.7%-32.2% increase in milk yield.
Emission reduction potentials varied considerably among farms, especially for breed or herd structure interventions, depending mostly on the initial composition of cattle herds [ Supplementary Figures 5-7]. Nevertheless, meaningful improvements were detected on almost all farms that adopted the interventions, compared to a counterfactual situation in which identical farms did not implement them [ Figure 6]. Detecting these changes in a more realistic setting where all model input variables were subject to random variation within the expected error ranges proved considerably more difficult [ Figure 7]. This noise severely limited the feasibility of selecting a convincing threshold for confidently concluding that a particular farm had successfully reduced its carbon footprint through the proposed interventions [ Figure 8]. High confidence that emission reductions occurred was only achieved for very few farms, and this goal was only within reach for farms with particularly unfavorable initial conditions, e.g., for farms that almost exclusively relied on cross-bred cows of unknown pedigree or farms that featured a large proportion of unproductive male animals [ Supplementary Figures 5-7].
The influence of random noise on emission estimates precluded the definition of a convincing farm classification scheme based on milk yields per head. Requiring high confidence that emission reductions have occurred would have placed very few of the intervention farms in the "adopter" category [ Figure 9]. A reduction in the required confidence level to 60%-70% for the breed and herd structure interventions and almost to 50% for the feed interventions would be required to ensure that at least half of the innovator farms are recognized. Such a reduction in required confidence, however, would strongly increase the rate of non-innovating farms that are recognized as adopters despite having made no efforts to reduce their carbon footprints (Type I error in Figure 9). Since a convincing carbon crediting scheme should be able to reliably classify all farms, not only the ones that start off with particularly unfavorable conditions, we see considerable challenges in the use of milk production per head as an indicator of emission intensity, unless the precision of model input parameter estimates can be improved.

Improving the precision of model inputs
Estimates of farm-scale milk production and carbon footprint are uncertain because of our limited knowledge about the values of many input variables. The influence of these knowledge gaps varied considerably across the spectrum of model inputs [ Figure 10], highlighting opportunities for strategic measurement efforts that might enhance the precision of the assessment. In this context, a measurement is any effort to reduce uncertainty [43] . Such efforts may consist of additional survey or monitoring activities, but given the generic nature of many of the initial error estimates, which were largely derived from recommendations in IPCC guidelines, elaboration of more context-specific uncertainty bands for input variables may also be a feasible option.
Addressing critical knowledge gaps showed the potential to raise the feasibility of using milk yield per head as a carbon footprint indicator and to correctly identify adopters of mitigation interventions [ Figure 11].
Our assumption of improved precision in milk yield estimates reduced classification errors and improved the ability to distinguish adopters from non-adopters at certain levels of required confidence. However, given the high sensitivity of the classification scheme to the choice of required confidence level and the considerable classification errors that would remain, just enhancing milk yield measurement precision may not be sufficient for reliable approximation of carbon footprints through milk yield measurements. Further precision gains, especially regarding milk yields of dairy cows and the composition of animal feed, would improve the prospects for using this easy-to-assess metric for carbon footprint estimations. It is important to consider, however, that practically all possible precision gains come at a cost, which must be carefully weighed against the value of information gains.

System boundaries and leakage risks
The system boundary in the SDM is the smallholder farm household. For farms with insufficient replacement rates to maintain the current herd size, emissions from replacement heifers raised off-farm are considered in the SDM methodology. Given that almost all farms in our sample had sufficient replacement rates, this was not modeled here. Other potential impacts of farm-scale productivity increases that occur beyond the farm boundary (e.g., market leakage and supply chain emissions) are not considered in the SDM. However, we note that such off-farm impacts may be quite relevant if dairy development interventions were implemented at larger scale (e.g., at national level). To ensure that emission reductions are achieved at such scales, potential leakage risks need careful consideration and may require expansion of the SDM methodology beyond the farm gate.

CONCLUSION
The relationship between milk production per head and the greenhouse gas emission intensity of milk production on smallholder dairy farms in Kenya is subject to considerable uncertainty, caused mainly by limited knowledge of farm and animal characteristics. This uncertainty severely compromises the reliable identification of households that adopt carbon footprint reduction interventions based on changes in milk yield per head. Nevertheless, we remain cautiously optimistic about the use of simple indicators of complex farm-scale outcomes since such indicators often represent the only feasible assessment criteria in diverse and dispersed agricultural settings. Our analysis highlights the importance of understanding the nature of the relationships between such metrics and the outcome measures of interest, as well as the difficulty of detecting potentially weak signals among the significant noise that is introduced by real-world variability and uncertainty. This calls attention to the need to thoroughly investigate what can and what cannot be inferred from such indicators. In particular, further efforts are required to identify cost-effective ways to reduce the uncertainty of milk yield and diet composition estimates at individual animal and farm levels. Insights from such investigations should be taken into account when designing reward schemes for climate change mitigation programs.