Who published the most influential AI research? Google is far ahead, and OpenAI's conversion rate beats DeepMind's


Source: Heart of the Machine
This article is about 2,900 words and takes roughly 9 minutes to read. After tallying the 100 most cited AI papers of each of the past three years, we found that...

Who is publishing the most influential AI research? In today's era when "a hundred flowers bloom," this question leaves plenty of room for exploration.

You may already guess some of the conclusions, for example that top institutions such as Google, Microsoft, OpenAI, and DeepMind lead the pack. But such guesses are only half correct; the data also reveals conclusions that were previously unknown.

With AI innovation advancing so rapidly, it is crucial to get some "intelligence" as early as possible. After all, few people have time to read everything, but one thing is certain: the papers compiled in this article have the potential to change the direction of AI development.

The real test of an R&D team's influence is, of course, how its technology lands in products. OpenAI released ChatGPT at the end of November 2022, shocking the entire field; it was another breakthrough following the paper "Training language models to follow instructions with human feedback."

It is rare for a product to land so quickly. Therefore, to dig deeper, Zeta Alpha recently turned to a classic academic indicator: citations.

A detailed analysis of the 100 most cited papers of each of 2022, 2021, and 2020 offers insight into which institutions and countries are currently publishing the most influential AI research. Some preliminary conclusions: the US and Google still dominate, and DeepMind also had a stellar year; but relative to its output volume, OpenAI stands at the forefront in product impact and in research that is quickly and widely cited.

[Figure: citation impact of top-100 AI research by country]

Source: Zeta Alpha

As the figure above shows, another important conclusion is that China ranks second in research-citation influence, but a gap with the United States remains; China has not "caught up with or even surpassed" the US, as many reports have described.

Using data from the Zeta Alpha platform combined with human curation, this article collects the most cited AI papers of 2022, 2021, and 2020 and analyzes the authors' affiliations and countries. This makes it possible to rank by R&D impact rather than by raw publication counts.

To create the analysis, we first collected the most cited papers for each year on the Zeta Alpha platform, then manually checked each paper's first publication date (usually an arXiv preprint) so as to place it in the correct year. The list was then supplemented by mining highly cited AI papers on Semantic Scholar, which has wider coverage and can sort by citation count; this step mainly surfaced papers from high-impact publishers such as Nature, Elsevier, and Springer that lie outside the platform's core coverage. Each paper's citation count on Google Scholar was then used as a proxy metric, and papers were sorted by this number to arrive at the year's top 100. For these papers, we used GPT-3 to extract authors, affiliations, and countries, and manually checked the results (taking the country where an organization is headquartered when the country was not obvious from the publication). A paper with authors from multiple institutions counts once for each institution.
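For readers who want to reproduce the counting logic, here is a minimal sketch of the pipeline described above. The record fields and sample values are assumptions for illustration; Zeta Alpha's actual tooling and schema are not public.

```python
# A minimal sketch of the curation pipeline described above, using
# hypothetical input records rather than Zeta Alpha's real data.
from collections import Counter

# Each record: paper metadata after manual checks. Field names are
# assumptions based on the description in the text.
papers = [
    {
        "title": "Highly accurate protein structure prediction with AlphaFold",
        "first_published_year": 2021,  # first appearance (often the arXiv preprint)
        "citations": 8965,             # Google Scholar count used as a proxy metric
        "affiliations": ["DeepMind"],
        "countries": ["UK"],
    },
    # ... more records merged from the Zeta Alpha platform and Semantic Scholar
]

def top100_for_year(papers, year):
    """Keep papers first published in `year`, sorted by citation count."""
    in_year = [p for p in papers if p["first_published_year"] == year]
    return sorted(in_year, key=lambda p: p["citations"], reverse=True)[:100]

def institution_counts(top_papers):
    """Count each institution once per paper, as the methodology specifies."""
    counts = Counter()
    for p in top_papers:
        for org in set(p["affiliations"]):
            counts[org] += 1
    return counts

for year in (2020, 2021, 2022):
    top = top100_for_year(papers, year)
    print(year, institution_counts(top).most_common(5))
```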

After reading the list, Meta's Chief AI Scientist Yann LeCun expressed his satisfaction: "At Meta AI, we prefer to publish quality rather than quantity. That's why among the 100 most cited AI papers of 2022, Meta AI wrote (or co-wrote) 16 papers, second only to Google's 22. Our research is having a huge impact on society. (Plus, NYU ranks very well.)"


So, what are these top papers we have been talking about?

Before diving into the numbers, let’s take a look at the top papers of the past three years. I’m sure you’ll recognize a few of them.

Hot Papers of 2022

1. AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models

Paper link: https://academic.oup.com/nar/article/50/D1/D439/6430488

Organization: DeepMind

Citations: 1372

Topic: Using AlphaFold to augment protein structure database coverage.

2. ColabFold: making protein folding accessible to all

Paper link: https://www.nature.com/articles/s41592-022-01488-1

Citations: 1162

Topic: An open-source and efficient protein folding model.

3. Hierarchical Text-Conditional Image Generation with CLIP Latents

Paper link: https://arxiv.org/abs/2204.06125

Organization: OpenAI

Citations: 718

Topic: DALL·E 2, complex prompt-conditioned image generation that left most in awe

4. A ConvNet for the 2020s

Paper link: https://arxiv.org/abs/2201.03545

Organization: Meta, UC Berkeley

Citations: 690

Topic: A successful modernization of CNNs at a time of boom for Transformers in Computer Vision

5. PaLM: Scaling Language Modeling with Pathways

Paper link: https://arxiv.org/abs/2204.02311

Organization: Google

Citations: 452

Topic: Google’s mammoth 540B Large Language Model, a new MLOps infrastructure, and how it performs

Hot Papers of 2021

1. “Highly accurate protein structure prediction with AlphaFold”

Paper link: https://www.nature.com/articles/s41586-021-03819-2

Organization: DeepMind

Citations: 8965

Topic: AlphaFold, a breakthrough in protein structure prediction using Deep Learning

2. “Swin Transformer: Hierarchical Vision Transformer using Shifted Windows”

Paper link: https://arxiv.org/abs/2103.14030

Organization: Microsoft

Citations: 4810

Topic: A robust variant of Transformers for Vision

3. “Learning Transferable Visual Models From Natural Language Supervision”

Paper link: https://arxiv.org/abs/2103.00020

Organization: OpenAI

Citations: 3204

Topic: CLIP, using image-text pairs at scale to learn joint image-text representations in a self-supervised fashion

4. “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?”

Paper link: https://dl.acm.org/doi/10.1145/3442188.3445922

Organization: U. Washington, Black in AI, The Aether

Citations: 1266

Topic: Famous position paper very critical of the trend of ever-growing language models, highlighting their limitations and dangers

5. “Emerging Properties in Self-Supervised Vision Transformers”

Paper link: https://arxiv.org/pdf/2104.14294.pdf

Organization: Meta

Citations: 1219

Topic: DINO, showing how self-supervision on images led to the emergence of some sort of proto-object segmentation in Transformers

Hot Papers of 2020

1. “An Image is Worth 16×16 Words: Transformers for Image Recognition at Scale”

Paper link: https://arxiv.org/abs/2010.11929

Organization: Google

Citations: 11914

Topic: The first work showing how a plain Transformer could do great in Computer Vision

2. “Language Models are Few-Shot Learners”

Paper link: https://arxiv.org/abs/2005.14165

Organization: OpenAI

Citations: 8070

Topic: GPT-3; this paper needs no further explanation at this point

3. “YOLOv4: Optimal Speed and Accuracy of Object Detection”

Paper link: https://arxiv.org/abs/2004.10934

Organization: Academia Sinica, Taiwan

Citations: 8014

Topic: Robust and fast object detection sells like hotcakes

4. “Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer”

Paper link: https://arxiv.org/abs/1910.10683

Organization: Google

Citations: 5906

Topic: A rigorous study of transfer learning with Transformers, resulting in the famous T5

5. “Bootstrap your own latent: A new approach to self-supervised Learning”

Paper link: https://arxiv.org/abs/2006.07733

Organization: DeepMind, Imperial College

Citations: 2873

Topic: Showing that negatives are not even necessary for representation learning

Leading Institution Ranking

Let’s take a look at how some leading institutions rank in the top 100 for number of papers:

[Figure: leading organizations by number of papers in the yearly top 100]

Google has been the strongest player, followed by Meta, Microsoft, UC Berkeley, DeepMind, and Stanford. While industry dominates AI research today and no single academic institution matches the big labs' impact, academia has a much longer tail, so aggregating by organization type brings the two sides into rough balance.

[Figure: top-100 papers aggregated by organization type (industry vs. academia)]

In terms of total research volume over the past three years, Google ranks first and Microsoft third, with universities such as Tsinghua, Carnegie Mellon, MIT, and Stanford also ranking high. Overall, academic institutions publish more research than industry, though the two tech giants Google and Microsoft have maintained very high output throughout the period.

[Figure: total research volume by organization, 2020-2022]

In fact, Google's research strength has long been evident. In 2017, Google published "Attention Is All You Need," marking the advent of the Transformer. To this day, Transformers remain the architectural basis of most NLP and CV models, including ChatGPT.

Last month, on the occasion of Bard's release, Google CEO Sundar Pichai stated in an open letter: "Google AI and DeepMind have pushed the state of the art. Our Transformer research project and our field-defining paper in 2017, as well as our important advances in diffusion models, now form the basis of many of the generative AI applications seen today."

Of course, as the company behind ChatGPT, OpenAI holds an absolute advantage in research conversion rate over the past three years, that is, the share of its published research that becomes highly cited. In recent years, most of OpenAI's results have attracted great attention, especially those on large language models.

[Figure: research conversion rate by organization]
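The article does not spell out how conversion rate is computed. A plausible reading, assuming it is the share of an organization's published papers that reach the yearly top-100 list, looks like this (the numbers below are illustrative, not the article's data):

```python
def conversion_rate(top100_count: int, total_papers: int) -> float:
    """Hypothetical metric: fraction of an organization's published
    papers that land in the yearly top-100 most cited list."""
    return top100_count / total_papers if total_papers else 0.0

# Illustrative numbers only: a small lab with few papers but many hits
# scores far higher than a giant with huge output.
print(f"{conversion_rate(7, 45):.1%}")    # ~15.6%
print(f"{conversion_rate(22, 900):.1%}")  # ~2.4%
```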

In 2020, OpenAI released GPT-3, a large language model with 175 billion parameters, which to some extent rewrote the rules of the field by overcoming many limitations of earlier language models. GPT-3 set off a frenzy around large language models: in the years since, parameter records have been broken again and again as people explore what these models can do.

At the end of 2022, ChatGPT arrived, drawing enormous attention to text generation and AI dialogue systems. ChatGPT has shown especially strong capabilities in generating knowledge-based content and code. After Google and Microsoft both announced that ChatGPT-like functions would be integrated into their next-generation search engines, ChatGPT came to be seen as leading a new revolution in AIGC and intelligent tools.

Finally, let’s take a look at the 100 most cited papers in 2022:

[Table: the 100 most cited AI papers of 2022; see the original link below]

Tweet mentions, sometimes seen as an early indicator of impact, are also on the rise. So far, however, their correlation with citations appears weak; further work is needed. One way to run such a check is sketched below.
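As a rough illustration, this sketch computes a Spearman rank correlation between early tweet mentions and later citation counts. The numbers are made up, and this is not Zeta Alpha's code.

```python
# Hypothetical data: early tweet mentions vs. eventual citation counts.
from scipy.stats import spearmanr

tweet_mentions = [310, 12, 95, 540, 7, 60]
citations = [718, 40, 452, 690, 15, 88]

# Spearman is preferred over Pearson here because both measures are
# heavy-tailed; rank correlation is less sensitive to outliers.
rho, p_value = spearmanr(tweet_mentions, citations)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")
```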

Original link: https://www.zeta-alpha.com/post/must-read-the-100-most-cited-ai-papers-in-2022

Editor: Huang Jiyan
