GitHub's black market exposed! Premium fake stars cost 6 yuan each, and 97% of one egregious open source project's stars turn out to be fake!


Mengchen, Crexi, reporting from Aofei Temple
Reprinted from: Qubit (QbitAI)

How much does it cost to buy GitHub stars on the black market?

The priciest go for as much as 6 yuan apiece.

Entrepreneur Yassin Eldeeb tested this with his own money. He paid 20 euros (about 156 yuan) and got only 25 "premium stars".


Yes, fake GitHub stars on the black market come in different tiers.

Premium stars come from accounts registered more than a year earlier. Their profile details, such as nicknames, avatars and employers, look natural and never repeat.

These accounts even have at least one contribution to another open source project. Algorithms cannot detect them, and they look perfectly normal to the naked eye.


The cheapest stars cost as little as 0.4 to 0.88 yuan each. They come from the most basic accounts: newly registered, with a default avatar and a randomly generated nickname.

A month after the purchase, those accounts had been banned by the platform, and the purchased stars had disappeared with them.

The craziest part of this cheap service is that if the stars vanish, you can contact the seller and get them refilled for free.

There is no way to know exactly how many people have bought this service. However, Eldeeb noticed that his order number was #57189, which suggests the transaction volume is far from small.


"Black market" star-farming services like this have been exposed repeatedly in recent months and have become a hot topic in the developer community.


Amid this discussion, one particularly egregious open source project surfaced: 97% of its stars turned out to be fake.

Fake star detector

The project was uncovered by another entrepreneur, Fraser Marlow, who had stumbled upon the GitHub black market by accident.

He also noticed that investors increasingly treat GitHub star counts as a metric for evaluating open source products.

However, as head of growth at Dagster, a data pipeline company, he did not buy stars for his own product. Instead, his team worked with spam-detection experts to collect data and build a fake star detector.


The detector consists of two algorithms. The simple one catches only accounts that are obviously fake at first glance.

A typical case looks like this. A large batch of accounts all star the same two projects, and none of them has any contribution history. Apart from the avatar and username, the accounts are identical in every respect.
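The simple rule above can be sketched as follows. This is a minimal illustration, not Dagster's actual detector. The `Account` record, its field names, the `flag_obvious_fakes` helper and the `min_batch` threshold are all hypothetical. The sketch groups zero-contribution accounts by everything except username and avatar. It then flags any group large enough to look like a batch.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Account:
    # Hypothetical account record; field names are illustrative, not GitHub's API.
    username: str
    avatar_url: str
    starred_repos: frozenset
    contributions: int
    bio: str = ""
    company: str = ""
    location: str = ""


def flag_obvious_fakes(accounts, min_batch=3):
    """Flag accounts that belong to a batch of look-alike, zero-contribution stargazers.

    Accounts are grouped by every field except username and avatar; any group of at
    least `min_batch` accounts with no contributions is treated as "fake at first glance".
    """
    groups = defaultdict(list)
    for acct in accounts:
        if acct.contributions > 0:
            continue  # any contribution history exempts the account from this rule
        key = (acct.starred_repos, acct.bio, acct.company, acct.location)
        groups[key].append(acct.username)
    flagged = set()
    for usernames in groups.values():
        if len(usernames) >= min_batch:
            flagged.update(usernames)
    return flagged


if __name__ == "__main__":
    target = frozenset({"org/project-a", "org/project-b"})
    bots = [Account(f"user{i}", f"avatar{i}.png", target, 0) for i in range(5)]
    human = Account("alice", "alice.png", frozenset({"org/project-a", "x/y"}), 42, bio="dev")
    print(sorted(flag_obvious_fakes(bots + [human])))
```

A rule like this only works because the cheap accounts are so uniform. Give each account one contribution and a distinct bio, and it slips through, which is exactly the weakness described next.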


Against the premium 6-yuan-per-star accounts mentioned at the beginning, however, the simple algorithm is powerless.

For these, Dagster designed a more sophisticated supervised clustering algorithm.

The principle is simple. Fake accounts created in a batch share similar characteristics, so they cluster together when visualized.

Normal users, by contrast, have fairly unique characteristics that are statistically scattered, so they should not belong to any large group.

For example, normal GitHub accounts are not active every day. If a group of accounts is active on exactly the same dates, those accounts are likely controlled by the same script.
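Here is a minimal sketch that makes the shared-activity idea concrete. It swaps in a simple technique and is not Dagster's supervised clustering:

- Each account is represented by the set of dates on which it was active.
- Pairs of accounts with high Jaccard similarity between their date sets are linked.
- Linked groups above a size threshold are flagged.

The helper names, the 0.8 similarity threshold and the minimum cluster size are assumptions chosen for illustration.

```python
from datetime import date
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_by_activity(activity, threshold=0.8, min_cluster=3):
    """Group accounts whose active-date sets are near-identical.

    `activity` maps username -> set of dates the account was active.
    Pairs with Jaccard similarity >= threshold are linked (single linkage via
    union-find); clusters with at least `min_cluster` members are returned.
    """
    parent = {user: user for user in activity}

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving keeps the trees shallow
            u = parent[u]
        return u

    for u, v in combinations(activity, 2):
        if jaccard(activity[u], activity[v]) >= threshold:
            parent[find(u)] = find(v)

    clusters = {}
    for user in activity:
        clusters.setdefault(find(user), set()).add(user)
    return [members for members in clusters.values() if len(members) >= min_cluster]


if __name__ == "__main__":
    script_days = {date(2023, 3, 1), date(2023, 3, 2), date(2023, 3, 9)}
    activity = {f"bot{i}": set(script_days) for i in range(4)}
    activity["alice"] = {date(2023, 1, 5), date(2023, 3, 2)}
    activity["bob"] = {date(2023, 2, 14)}
    print(cluster_by_activity(activity))
```

The four scripted accounts end up in one cluster. The two ordinary users share too few dates with them, or with each other, to be linked. A pairwise comparison like this grows quadratically with the number of accounts, so a real system would need something more scalable.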

To validate the algorithm, they created a test repository and actually bought fake stars for it.

The clustering algorithm performed very well in this test, matching close to 100% of the purchased stars.

On messier real-world data, it still achieved 98% precision and 85% recall.
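Precision and recall measure different failure modes:

- Precision is the share of flagged stars that are actually fake.
- Recall is the share of all fake stars that get flagged.

The counts below are hypothetical. They were picked only so that the arithmetic reproduces the reported 98% and 85%.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) for a binary fake-star classifier."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return precision, recall


# Hypothetical counts, for illustration only:
# 833 fake stars correctly flagged, 17 real stars wrongly flagged, 147 fake stars missed.
p, r = precision_recall(833, 17, 147)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.98 recall=0.85
```

In other words, a detector with these figures rarely accuses a real user, but it still lets roughly one fake star in seven slip through.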


Next, the team ran both algorithms over the public GitHub Archive dataset.

The worst offender was okcash, with 759 stars in total. The simple algorithm flagged only one suspected fake star. Combined with the clustering algorithm, the flagged share jumped to 97%.

Due to the high computational cost, only stars obtained on or after January 1, 2022 were analyzed in the test.

In other words, fake stars that projects received before 2022 remain undetected.


For comparison, they also checked Dagster itself and several competitors, and found relatively low fake-star rates. The data pipeline space, it seems, is still fairly healthy.


They then shared the findings with GitHub and open-sourced the detector.

Within 48 hours of the disclosure, both GitHub and the fake-star vendors took action, and all the fake stars on the team's test repository disappeared.

GitHub responded that it has been actively cracking down on star farming for years, yet the practice persists despite repeated crackdowns.

Earlier academic research identified 63,872 suspicious accounts through data analysis. Fewer than 5% of them had been detected and banned by GitHub itself.


The same study estimated that star-farming services had already earned between US$3.41 million and US$4.37 million in profit back in 2018-2019.

So why do people pay good money for GitHub stars? Do they bring any real benefit?

Investors: We just love to see stars

A key reason open source teams turn to fake stars is to catch investors' attention.

Pratima Aiyagari, a partner at a venture capital firm, noted that an open source project may well go a long time without making money.

Since revenue cannot serve as a reference, investors have to look more closely at the state of the product itself.

The most accurate way to evaluate an open source project is to read its code. However, that is complex, tedious and demands deep expertise, so it has not become investors' method of choice.

So investors settled on a proxy: stars. In effect, they are hard-wired to look for fast-growing newcomers.


Beyond absolute counts, the venture capital firm Runa even created a dedicated metric, the ROSS Index, which ranks projects by the annual growth rate of their stars.
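The exact ROSS Index methodology is not described here. The sketch below assumes a simple compound annualization between two star-count snapshots, which may differ from how Runa actually computes its index. The function name and the sample projects are hypothetical.

```python
def annualized_growth_rate(stars_start, stars_end, days):
    """Compound annualized growth rate of a star count over `days` days.

    Returns 1.0 for a doubling over exactly one year (an assumed formula,
    not necessarily the one used by the ROSS Index).
    """
    if stars_start <= 0 or days <= 0:
        raise ValueError("stars_start and days must be positive")
    return (stars_end / stars_start) ** (365 / days) - 1


# Hypothetical projects: (stars at start, stars at end, days elapsed).
projects = {"project-a": (1_000, 1_500, 90), "project-b": (20_000, 24_000, 365)}
ranking = sorted(projects, key=lambda name: annualized_growth_rate(*projects[name]), reverse=True)
print(ranking)  # ['project-a', 'project-b']
```

A growth-rate ranking favors small, fast-rising projects over large established ones. That is also why a sudden injection of purchased stars can move a young project up such a list so easily.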

Konstantin Vinogradov, a partner at Runa, said the ROSS Index has become an important benchmark for open source projects. He added that one-third of the top-ranked projects have gone on to raise funding.

However, as star farming has spread, investors have begun to put less weight on star counts.

Investor Kevin Zhang said a high star count may serve as a "stepping stone". However, it will not by itself persuade investors to take a second meeting with a team.

This echoes the academic view. Stuart Geiger, an assistant professor at the University of California, San Diego, said that over time such metrics (star counts) may lose their meaning on their own.


Two social-science principles are at play here: Campbell's law and Goodhart's law.

Campbell's law states that the more heavily an indicator is relied on in decision-making, the more prone it is to manipulation.

Online shopping is a familiar example. Because we cannot see or touch the actual item, we naturally rely on other buyers' reviews, and that is how fake orders came about.

Goodhart's law holds that when a measure becomes a target, it ceases to be a good measure.

Without better alternative metrics, though, the data must be kept honest, just as exam authorities spare no effort to fight cheating.


Teams courting investors are not the only ones buying stars; many individual developers do it too.

Their goal is similar to that of startups, except that the audience is recruiters rather than investors. They hope high-star projects will give them an edge when job hunting.

In fact, some companies do use GitHub profiles to evaluate candidates. Some people have even landed offers from Google on the strength of hollow "shell" projects.


The same goes for choosing technology as for choosing people: many users, especially non-specialists, tend to pick high-star projects.


Besides GitHub, investors are paying growing attention to other platforms:

- Product Hunt, a product launch site.
- Kaggle, a data science platform.
- Stack Overflow, an IT Q&A site.

Without effective anti-manipulation measures, however, these platforms risk a change in form but not in substance.

Some have summed up the phenomenon as "fake it till you make it". It is like boarding the bus first and paying later: you pretend you have succeeded until you actually do.

One More Thing

The development of AI, especially large models, makes it increasingly difficult to detect fake accounts.

In the past, faking activity mostly meant handing out stars and likes, so judging whether a user was real came down to the characteristics of the account itself.

Since ChatGPT arrived, however, fake accounts can also post comments, each one worded differently.

Suppose an account matches the profile of a fake account, but its replies are indistinguishable from a real person's. How do you tell whether it is genuine?

Reference links:
[1] https://www.wired.com/story/github-stars-black-market-coders-cheat/
[2] https://news.ycombinator.com/item?id=37990338
[3] https://dagster.io/blog/fake-stars
[4] https://dl.acm.org/doi/10.1145/3427228.3427258
