Wednesday, 1 January 2025

An update on using ChatGPT for economics research

The blog has been a bit quiet over the last few weeks, as I've taken a break from writing. I haven't taken a break from reading or other related activities though, and in particular I've spent a bit of time thinking about how to utilise generative AI to improve my teaching practice next year. That thinking was inspired by two seminars - this seminar by Justin Wolfers, and this seminar by Kevin Bryan.

I've also been thinking about how to be more intentional in my use of generative AI in supporting my research. As those who interact with me often in real life will know, I have far too many research projects on the go at various stages of incompletion, as well as many research ideas gathering dust because I have no time to progress them. That has left me severely disappointed on a couple of occasions over the last two years in particular, when cool ideas of mine appeared in the literature before I even had a chance to try them out for myself!

Given my aims to improve productivity through generative AI, I was delighted to see Anton Korinek's (University of Virginia) recent NBER Working Paper (ungated alternative version here) that updates his earlier work on using generative AI for economics research (I wrote about his earlier paper here). The update covers use cases for generative AI, but importantly highlights the advances since the earlier paper. Specifically, Korinek notes that:

A key focus of this update is the new access modes enabled by the cumulative performance gains over the past year. Workspaces for interactive collaboration, such as Anthropic's Claude Artifacts and OpenAI's ChatGPT Canvas, Microsoft Copilot or Cursor, are changing how we interact with LLMs. They create an environment where users can iteratively develop and refine ideas, shifting away from static chat-style interactions towards more dynamic, document-oriented collaboration. They allow users to work in tandem with LLMs, offering real-time feedback and allowing for iterative editing. Another example is a new generation of real-time voice assistants that can also assist with research tasks.

A second recent breakthrough is LLM-based reasoning capabilities, exemplified by OpenAI's o1 series. I describe why traditional LLMs were not very good at reasoning and how work on a new generation of reasoning models is helping to overcome these barriers, enabling AI models to engage in multi-step problem-solving and logical deduction. This advancement opens new avenues for LLM use in economic research.

Finally, LLM-powered search, newly integrated into ChatGPT in November 2024 and also offered by Google Gemini and startups like Perplexity, is starting to become a useful tool to provide up-to-date answers to questions that are grounded in facts found on the internet, together with the requisite citations - a crucial capability for researchers...

In addition to describing these new developments, I also cover several other novel examples and use cases for LLMs in the subsections below, many of which center around new ways of promoting research outputs - the current generation of LLMs is highly capable of processing the main insights of research papers and accurately translating and conveying them into a wide range of formats:

  • Advanced mathematical derivations using o1-preview
  • Sophisticated coding tasks using o1-preview
  • Creating a 'deep dive' podcast of your research paper
  • AI-powered search
  • Creating presentation slides
  • Drafting blog posts
  • Conducting interviews 

On the new access modes, Korinek writes that:

Enabled by longer context windows and greater speeds, LLM providers have introduced innovative workspace environments in recent months that allow users to interactively collaborate with LLMs on content... The resulting products allow users to move beyond the limitations of traditional chat interfaces and to interact with AI assistants in a more dynamic and collaborative environment for content creation, editing, and interaction.

And on voice assistants:

A significant innovation in access modes is a new generation of real-time voice assistants. Earlier voice assistants transcribed a user's spoken language into text that was fed into an LLM; after processing the request, the LLM's response was translated into audio again. By contrast, the new generation natively processes spoken text with all its nuances in tone and emotional expression and responds accordingly. Moreover, it also allows users to jump in and interrupt the flow mid-sentence in a way that allows for a more natural and fluid conversation.

The improvement in reasoning capability is obvious to even casual users, and is both really positive (from a research perspective) and really negative (from a teaching, learning, and assessment perspective). As Korinek notes:

An influential mechanism to obtain better-reasoned results has been chain-of-thought prompting, which instructs LLMs to proceed step-by-step when generating responses to a prompt. This technique has delivered significant performance gains by guiding LLMs to break down complex questions into smaller logical steps that are easier to accomplish - akin to a student who performs better on an exam when asked to report his intermediate steps.
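As a concrete illustration (my own, not Korinek's), here is a minimal sketch of what chain-of-thought prompting looks like in practice, using the OpenAI Python client. The model name and the microeconomics question are placeholders, and the same idea carries over to any chat-style LLM API.

    # A minimal sketch of chain-of-thought prompting via the OpenAI Python client.
    # The model name and the economics question below are placeholders.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    question = (
        "A consumer has utility U(x, y) = x^0.5 * y^0.5, income of 100, "
        "and faces prices p_x = 2 and p_y = 5. Find the utility-maximising bundle."
    )

    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder; substitute whichever model you use
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a careful microeconomics tutor. Reason step by step: "
                    "set up the Lagrangian, derive the first-order conditions, "
                    "and solve them before stating the final answer."
                ),
            },
            {"role": "user", "content": question},
        ],
    )

    print(response.choices[0].message.content)

Note that the o1-series reasoning models that Korinek discusses build this kind of step-by-step deliberation into the model itself, so the explicit "reason step by step" instruction matters less there; the sketch simply shows the prompting technique with a standard chat model.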

This improved reasoning extends to mathematical problems, which is of particular value for economists (and economics students). However, on LLM-powered search, Korinek notes that the models are improving but are still not good enough on their own:

The recent advances in LLM-powered search have significantly improved the ability of LLMs to help with literature research... Moreover, dedicated websites such as https://elicit.org allow literature searches and compile lists of papers on the requested topics, together with LLM-generated summaries.

However, when performing literature reviews, LLMs still struggle with synthesizing the state of the art on a given research topic - they lack the deep background knowledge that an experienced scholar has regarding the relative usefulness of different papers in their field of expertise and over-emphasize papers with titles that closely match the requested literature search.

In terms of promoting research outputs, I really like this example (which I was already aware of and had seen in action):

In September 2024, NotebookLM introduced a groundbreaking feature called "Deep dive conversation," which allows users to generate podcasts based on the provided materials. These podcasts feature an insightful conversation between two virtual agents who explain the essence of the material in simple and clear terms for non-experts, narrated in a highly engaging manner. For research use, this is one of the most impressive capabilities that have emerged in recent months. It offers a novel way to disseminate complex economic research to a broader audience, potentially increasing the impact and accessibility of economic studies. The tool is available for free at https://notebook.google.

I can also see a use case for economics teaching, since this would allow students to engage with economics research in a novel way, without the effort required to read the underlying paper (there are both pros and cons to this, of course). I'm thinking that we can use this in the Waikato Economics Discussion Group in 2025, particularly to encourage more engagement from first-year economics students in the group.

Korinek's paper is the go-to resource for a succinct summary of use cases for generative AI in economics research. He is even maintaining a website that has the paper as well as lots of examples of the use cases in action. If you, like me, are interested in improving your research productivity (or just interested in general), you should check it out!
