The capabilities of generative AI continue to grow. In the latest example, some enterprising economists have developed an agentic AI that can complete tasks using econometrics (the economist's statistical toolset) - 'mastering 'metrics', as Angrist and Pischke would say (see my review of their excellent econometrics text). The agentic AI approach is outlined in this new working paper by Qiang Chen (Shandong University) and co-authors. As they explain:
We propose and implement a zero-shot learning framework, called Econometrics AI Agent, that enables AI agents to acquire domain knowledge without costly LLM fine-tuning. The framework’s core component is an econometrics “tool library” implementing popular econometric methods, including IV-2SLS, DID, and RDD.
...we augment each econometric tool with detailed “prompts”—comprehensive method descriptions that specify inputs, hyperparameters, and outputs. These prompts are provided alongside corresponding Python implementations, creating a standardized interface between the econometric methods and the AI agent. This design allows the LLM to leverage both its general econometric knowledge and the specifically crafted prompts and tools, enabling it to conduct complex econometric analyses through multi-round interactions with users. The resulting framework empowers Econometrics AI Agent to independently handle applied econometric tasks, delivering comprehensive results that include parameter estimation, inference, and analytical discussions.
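To make the "tool library" idea concrete, here is a minimal sketch (my own illustration, not code from Chen et al.'s repository) of how an econometric tool and its accompanying prompt might be packaged for an agent. The function name, prompt wording, and dictionary structure are all assumptions for illustration:

```python
# Illustrative sketch only, not code from Chen et al.'s repository: one way a
# "tool library" entry could pair a method description (the prompt the LLM
# reads) with a Python implementation the agent can call.
import statsmodels.formula.api as smf

def difference_in_differences(df, outcome, treated, post):
    """Basic 2x2 difference-in-differences via OLS with an interaction term."""
    res = smf.ols(f"{outcome} ~ {treated} * {post}", data=df).fit(cov_type="HC1")
    term = f"{treated}:{post}"  # the DID estimate is the interaction coefficient
    return {"coef": res.params[term],
            "std_err": res.bse[term],
            "p_value": res.pvalues[term]}

DID_TOOL = {
    "name": "difference_in_differences",  # hypothetical tool name
    "prompt": ("Estimate a 2x2 difference-in-differences model. Inputs: a pandas "
               "DataFrame and the column names of the outcome, a 0/1 treatment "
               "indicator, and a 0/1 post-period indicator. Output: the DID "
               "coefficient with its standard error and p-value."),
    "implementation": difference_in_differences,
}
```

The point of pairing the prompt with the implementation is that the LLM never has to write the estimator itself; it only has to recognise which tool the user's question calls for and supply the right inputs.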
The Econometrics AI Agent that Chen et al. created is available here. That site also includes detailed installation instructions, and a helpful demonstration video. Coming back to the paper, Chen et al. show the capabilities of the model by testing it on several real-world problems:
We evaluate the Econometrics AI Agent through two sets of inquiries. The first comprises 18 exercises from the coursework assignments of a doctoral-level course titled “Applied Econometrics” at the University of Hong Kong, with Python-generated standard solutions. These exercises cover OLS & PanelOLS regression, propensity score matching, IV-2SLS regression, Difference-in-Differences (DID) analysis, and Regression Discontinuity Design. The second set consists of test datasets from randomly selected seminal articles in reputable journals, primarily accompanied by Stata-based replication packages.
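The coursework exercises come with "Python-generated standard solutions", so for readers who have not run these methods before, here is a hedged sketch of the IV-2SLS piece using the linearmodels package (my choice of library, file name, and variable names; the paper does not say which packages the standard solutions use):

```python
# Illustrative only: a typical IV-2SLS exercise of the kind described above.
import pandas as pd
from linearmodels.iv import IV2SLS

# Hypothetical dataset with an endogenous regressor and an instrument.
df = pd.read_csv("example_data.csv")  # placeholder file name
df["const"] = 1.0  # add a constant term

# y: outcome, x_endog: endogenous regressor, z: instrument, control1: exogenous control
model = IV2SLS(
    dependent=df["y"],
    exog=df[["const", "control1"]],
    endog=df["x_endog"],
    instruments=df["z"],
)
results = model.fit(cov_type="robust")
print(results.summary)
```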
They compare the performance of their agent against three alternatives, both in terms of generating code that runs correctly and in terms of the resulting estimated coefficient of interest:
...(i) direct LLM generation in Python code, (ii) direct LLM generation in Stata code, and (iii) baseline general-purpose AI agents without specialized econometric tools and domain knowledge.
So, this is also an approach to replication, which is important (for example, see here), a point I will return to later. Overall, in comparison with LLMs and general-purpose AI agents, the Econometrics AI Agent performs much better. In terms of the econometrics coursework assignments, Chen et al. find that:
The Econometrics AI Agent demonstrates superior performance with a 95% directional replication rate and average coefficient value errors below 3%. In contrast, both GPT-generated Python and Stata control groups show incorrect directions in over half of test cases. While the general AI Agent achieves a 78% directional replication rate, its coefficient values frequently deviate significantly from true values.
The rate of 'perfect replication' (which Chen et al. defined as the errors in the coefficient, standard error, and p-value all being within 1% of the 'true' values) was 51.85 percent for the Econometrics AI Agent, but less than 30 percent for the other models. Turning to the published paper replications, Chen et al. find that the rate of 'perfect replication' was just 27.41 percent for the Econometrics AI Agent, but that was still far higher than for the other models, which all had rates under 18 percent. In relation to those results, Chen et al. note that:
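To make that criterion concrete, a check along those lines (this is my reading of the definition as a relative error; Chen et al. may compute the tolerance slightly differently) could look like this:

```python
# Sketch of the 'perfect replication' check as described in the post:
# coefficient, standard error, and p-value must each be within 1% of the
# 'true' (published / standard-solution) values. The relative-error form
# used here is my assumption.
def is_perfect_replication(est, true, tol=0.01):
    """est, true: dicts with keys 'coef', 'std_err', 'p_value'."""
    for key in ("coef", "std_err", "p_value"):
        rel_error = abs(est[key] - true[key]) / abs(true[key])
        if rel_error > tol:
            return False
    return True

# Example usage with made-up numbers:
estimated = {"coef": -0.512, "std_err": 0.1005, "p_value": 0.0001}
published = {"coef": -0.510, "std_err": 0.1000, "p_value": 0.0001}
print(is_perfect_replication(estimated, published))  # True
```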
...the Econometrics AI Agent does show room for improvement. For example, its performance declines for complex econometric methods like DID and RDD compared to simpler approaches such as OLS and IV-2SLS. Similarly, results slightly deteriorate when moving from straightforward coursework problems to more sophisticated paper replication tasks. However, these limitations can be addressed through the AI agent’s domain knowledge architecture—specifically by developing customized tools and enhancing prompt instructions to better support complex algorithms and detailed requirements.
Indeed, it is the modular nature of the agent's architecture that may be its key advantage, allowing modules relevant to each econometric task to be added or updated over time. On this point, Chen et al. note that:
Unlike the costly and often infeasible process of fine-tuning an LLM to keep pace with rapid academic advances in developing new techniques, our agent can be updated simply by adding new tool functions and descriptions to the prompt library. This modularity allows the agent’s knowledge base to expand alongside the field’s developments, making the integration of recently published procedures as straightforward as adding new modules.
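To illustrate what that modularity might look like in code, here is a hedged sketch of registering a new tool (a simple sharp RDD estimator) in the kind of tool library sketched earlier; the registry structure, function names, and the local-linear implementation are my own assumptions, not the agent's actual API:

```python
# Illustration of the modularity idea: extending a hypothetical tool registry
# with a new method, with no LLM fine-tuning involved. All names are placeholders.
import statsmodels.formula.api as smf

TOOL_REGISTRY = {}  # maps tool name -> {"prompt": ..., "implementation": ...}

def register_tool(name, prompt, implementation):
    """Add a new econometric tool and its natural-language description."""
    TOOL_REGISTRY[name] = {"prompt": prompt, "implementation": implementation}

def sharp_rdd(df, outcome, running_var, cutoff, bandwidth):
    """Toy local-linear sharp RDD estimate inside a fixed bandwidth."""
    local = df[(df[running_var] - cutoff).abs() <= bandwidth].copy()
    local["above"] = (local[running_var] >= cutoff).astype(int)
    local["dist"] = local[running_var] - cutoff
    res = smf.ols(f"{outcome} ~ above * dist", data=local).fit(cov_type="HC1")
    return {"coef": res.params["above"],  # jump in the outcome at the cutoff
            "std_err": res.bse["above"],
            "p_value": res.pvalues["above"]}

register_tool(
    "sharp_rdd",
    prompt=("Estimate a sharp regression discontinuity design by local linear "
            "regression within a user-supplied bandwidth. Inputs: DataFrame, "
            "outcome, running variable, cutoff, bandwidth. Output: the treatment "
            "effect at the cutoff with its standard error and p-value."),
    implementation=sharp_rdd,
)
```

Adding support for a newly published estimator would then amount to writing one more function and one more description, rather than retraining or fine-tuning anything.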
So, we can expect that the agent's capabilities, and its accuracy, will only improve over time. We may not be very far away from a time when empirical economists spend far less of their time writing esoteric econometric code in order to extract meaningful results. What will we do with our free time? Maybe we'll be able to turn our attention to a greater variety of research questions.
There is a further positive aspect to these results. The replication crisis is real in many disciplines, including economics. Having AI agents that can automate the steps required to generate econometric results will decrease the time cost of completing replications. That means we may expect more paper replications in the future (at least, more of the type of replication that Miguel and Christensen call a 'verification'). That would certainly be a positive development, leading to improvements in the quality of research in the future.
[HT: Marginal Revolution]