[Repost] Why Can Some Computer Science Undergraduates Publish at Top Conferences While Many PhD Students Have None?

Original article on Zhihu, reposted by Thoughts Memo.

Thanks for the invite. I'm currently in Qingyuan and just turned on my computer. During my senior year, I published one paper at the top conference ACM KDD [1], one paper in a CCF T1 Chinese journal [2], and one paper in the top journal IEEE TKDE [3].

KDD paper in Chinese: KDD’22 | Momo Vocabulary: A Memory Algorithm Based on Temporal Models and Optimal Control [AI + Education]
TKDE paper in Chinese: IEEE TKDE 2023 | Momo Vocabulary: Optimizing Spaced Repetition Scheduling by Capturing Memory Dynamics

Because my research path was rather strange (unorthodox), I don't really understand how most people conduct research, let alone how they publish papers.

I'm not like those outstanding undergraduates: I have no competition experience, I'm not the disciple of a top advisor, and I have no impressive senior labmates. I'm immersed in my own research world, and if I hadn't published papers, it wouldn't be surprising if people took me for a folk scientist.

Below is my record of publishing two papers, which might inspire you:

Ye Junyao: How I Published a Top Conference Paper as an Undergraduate (Includes Open Source Code and Datasets)

My research direction is optimizing spaced repetition [4] scheduling (in plain words: improving students' memory efficiency by adjusting how reviews are arranged). More specifically, I build memory models of students, then design algorithms based on the model and the optimization goal to find the optimal memory strategy. Writing these two papers took less than a month, but behind them were four years of exploration and thought. Compared with other researchers, you could say I am an "outlier": in today's peer-reviewed, up-or-out academic culture, who would dare to go looking for a research direction in non-academic material, at the high risk of never publishing? So my experience may have little reference value. But I believe there are people out there whose situations resemble mine, and I hope this memoir can help them.

Sparking Interest

To trace the origin of these two papers, I have to start from my interest in the field of spaced repetition. Although research seems abstract and mysterious to most people, research as a human activity is always connected with concrete people, concrete lives, and concrete interests.

I first encountered spaced repetition around 2017, when I gradually transformed from a high school student who hated reviewing into one who wanted to improve review efficiency. In April 2017, I searched Zhihu for review-scheduling software and, through @Yu Shixing's answer, miraculously found Anki, an open-source spaced repetition app. Using Anki for over a year helped me improve my exam score by more than 100 points [5] and eventually get admitted to the Computer Science department of Harbin Institute of Technology, Shenzhen. That got me interested in the principles that allow this niche software to improve review efficiency.

Afterward, through Anki, I learned about spaced repetition, the spacing effect, the testing effect, and other cognitive psychology concepts. Conveniently, Anki is open source, and I happened to see questions on Zhihu about Anki's algorithm, so I read Anki's source code and wrote some articles about it [6]. That was around August 2018.

Understanding Tradition

By reading the source code and the manual, I learned that Anki's review algorithm is a variant of the SuperMemo 2 algorithm, which sparked my curiosity about SuperMemo itself. But at the time I was busy promoting Anki, selling Anki decks, and designing products that mimicked Anki, so these explorations were put on hold. Then at the end of 2019, after a failed startup made me feel my inadequacy deeply, I wanted to focus on things I could handle independently. I began "translating" (DeepL machine translation is amazing) the History of Spaced Repetition on the SuperMemo website [7], learning about the practical tradition of spaced repetition, which is inseparable from its creator, Dr. Wozniak [8].

The creator of SuperMemo, Wozniak, is a legendary character: passionate yet stubborn, and brilliant. The History of Spaced Repetition is almost his autobiography. Coincidentally, the problems he encountered in learning are strikingly consistent with mine: for an efficiency fanatic like him, random forgetting and inefficient reviewing were intolerable. Starting from there, Wozniak embarked on a thirty-year quest for memory algorithms. His work inspired me and made me feel this could be a lifelong career.

Woz’s memory theory is not complicated. In his dual-component memory model [9], the state of a single atomic memory in the brain can be described by two variables:

  1. Memory retrievability
  2. Memory stability

Memory retrievability is the probability of recalling a certain memory at this moment. Why a probability? Think about real life: forgetting is not deterministic. If I read a point in a new book today, I may still remember it now, but 10 days later I cannot be sure whether I will definitely remember it or definitely forget it. It is a random event, and thus describable by probability.

But memory retrievability alone cannot fully describe memory, just as knowing how far a car has traveled cannot tell you when it will arrive at its destination. Knowing the retrieval probability at this moment does not tell us when forgetting will be complete. Therefore, Woz introduced memory stability to measure the speed of forgetting. The relationship among retrievability, stability, and time can be described by the formula:

$$R = e^{-t/S}$$

where ( R ) is memory retrievability, ( S ) is memory stability, and ( t ) is the time since the last review. This formula is the mathematical model of the forgetting curve. Analyzing data from SuperMemo, Woz found that an exponential function fits memory decay over time best. From the formula, a few simple conclusions follow (checked numerically in the sketch after the list):

  • The longer the elapsed time, the lower the recall probability;
  • Forgetting is fastest right after a review and slows down over time;
  • The higher the memory stability, the slower the forgetting.
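These conclusions are easy to verify numerically. A minimal sketch, assuming the exponential form above and measuring both ( t ) and ( S ) in days (an illustrative choice):

```python
import numpy as np

def retrievability(t: float, S: float) -> float:
    """Probability of recall after t days, given memory stability S (in days)."""
    return np.exp(-t / S)

# the three conclusions above, numerically:
print(retrievability(1, 10))   # ~0.905: short delay, high recall probability
print(retrievability(10, 10))  # ~0.368: longer delay, lower recall probability
print(retrievability(10, 30))  # ~0.717: same delay but higher stability, slower forgetting
```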

Thus, memory forgetting is characterized. However, this model cannot guide review scheduling yet, because it omits review’s effect on memory. On this basis, Woz proposed the three-variable memory model:

Observing the data, Woz found that a successful recall during review increases memory stability, while forgetting reduces it. But this qualitative analysis is not enough to guide review planning: we want to know how much it increases or decreases. For convenience, call this the stability gain (denoted ( C ) in the formula). Woz believed the stability gain depends only on the current memory state; that is, given the memory stability and retrievability, the stability gain is determined (essentially a Markov chain). But ideals ran into reality: the data did not perfectly support Woz's theory, so he patched it by introducing a third variable, memory difficulty. With difficulty held fixed, the relation between stability gain and memory state became much clearer.

In my view, this merely shifted the contradiction, because now one must find the law governing how memory difficulty varies. Woz's subsequent research focused on predicting difficulty, but frankly, it contained a lot of mysticism and little mathematics. Introducing difficulty was not wrong; essentially, it turned the original Markov model into a hidden Markov model. (I first learned about hidden Markov models from a Zhihu answer: How to explain Hidden Markov Model with simple examples?)

By the way, here is something I found rather absurd: Woz shares personal stories from his research in his theoretical papers, like disappearing into a mountain cabin for a hundred days to study [10]. Very interesting. His passion infected me as I read his papers.

Entering the Industry

But passion alone accomplishes nothing, and merely studying the traditional theory makes it hard to conceive of innovations or improvements. Locking yourself inside a tradition for theoretical amusement cannot effectively change the world. Most embarrassingly, memory is full of uncertainty, and without massive data there is no way to verify theories or ideas. By chance, in the summer of 2020, I applied for a data mining internship at ByteDance and was rejected, but the next day I received an internship invitation on Zhihu from the head of data algorithms at Momo Vocabulary: he had also been researching SuperMemo and had found my translations on Zhihu. Freshly rejected by ByteDance, and seeing that Momo held tens of billions of review records, I immediately went to the city where Momo is based, Qingyuan (a place much like my hometown, Ningde), and continued my exploration in industry.

First, I validated a lot of Momo's data against SuperMemo's theory and found most patterns consistent, which comforted me and gave me great confidence: at least I hadn't wasted my time on theory, and the laws of memory don't change with the learner or the material. Then, to study memory better, I pushed for changes to Momo's data collection system. Research is unpredictable and cannot be fully planned; if the questions are posed wrongly, the data collection will be flawed. Woz always obsessed over memory stability, retrievability, and difficulty, but I think he simplified the model too early. For deeper analysis of memory, recording the rawest possible data is essential, so I argued for comprehensive logging: every user's every review, with its content and timestamp, success or failure, the complete review history, and so on. Later I realized that what I really needed was the sequential information of memory, which became the foundation supporting my research.

Independent Thinking

While digging deep into the data, I gradually found flaws in Wozniak's theory. In his three-variable memory model, memory stability and retrievability matched the patterns in Momo's data, but memory difficulty remained elusive. He also lacked a definition of effective reviewing. This pushed me toward independent thinking and toward trying to fill in the gaps of the puzzle.

My first approach was to simulate the memory process to find the most efficient review strategy. Based on Woz's theory and Momo's data, I wrote a spaced repetition simulator in early 2021 [11], then simulated memory under various strategies to verify their effectiveness [12] (a sketch of the idea follows below). I imposed many constraints on the review process, such as fixing the amount of new material per day and the simulation's time span, then measured a strategy's effect by the total amount remembered at the end of the simulation. This is essentially Monte Carlo simulation; I had never formally studied it, but that didn't stop me from using it.
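The real simulator in [11] is more elaborate, but its skeleton looks roughly like this. A minimal sketch: the stability gain on success, the reset on failure, and all the constants are illustrative assumptions, not the actual rules fitted from Momo's data:

```python
import math
import random

def simulate(days=365, new_per_day=10, interval_rule=lambda S: S):
    """Simulate one learner under a given review strategy.

    interval_rule maps the current stability S to the next review interval;
    returns the 'total memory amount' at the end of the simulation.
    """
    cards = []  # per card: [stability S, day of last review, day of next review]
    for day in range(days):
        for _ in range(new_per_day):           # fixed amount of new material per day
            cards.append([1.0, day, day + 1])
        for card in cards:
            S, last, due = card
            if due != day:
                continue
            R = math.exp(-(day - last) / S)    # forgetting curve from above
            if random.random() < R:
                S *= 2.0                       # illustrative stability gain on success
            else:
                S = 1.0                        # illustrative reset on failure
            card[:] = [S, day, day + max(1, round(interval_rule(S)))]
    # total memory amount: summed recall probability on the final day
    return sum(math.exp(-(days - last) / S) for S, last, _ in cards)

# compare two strategies: review at t = S versus t = 2S
print(simulate(interval_rule=lambda S: S))
print(simulate(interval_rule=lambda S: 2 * S))
```

Fixing the workload and comparing the final totals across strategies is exactly the Monte Carlo comparison described above.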

In these simulations, I found that Woz's strategy of maximizing memory stability was not the most efficient. This led me into uncharted research territory. I realized the three-variable memory model could not fully support my search for efficient memory strategies, because it lacked any constraint on memory cost. So, following a vague intuition, I defined a new concept: review pressure.

I think efficient reviewing means forming long-term memories under minimal review pressure. Some intuition: with each successful review, stability grows and the review pressure lessens, so review pressure must relate to the number of reviews. But exactly how to express this mathematically was beyond my skills at the time, so I kept experimenting on scratch paper. In this exploration, I realized that review pressure is actually the expected number of reviews, which requires knowing how the memory state changes and what the review strategy is. The process involves deterministic models and policies as well as stochastic forgetting. I remembered reading on Zhihu about a mathematical tool that could describe exactly this: the stochastic process.

Finding this tool thrilled me. I immediately got Ross's Introduction to Probability Models (actually, I first picked up his Stochastic Processes, but it was hard to follow; then from the answer to "Which edition of Ross's stochastic processes book is better?" I learned the author had an easier one) and started studying. Soon I found a worked example very close to my scratch-paper calculation of review pressure:

The problem: a miner faces three doors. One door leads to the destination in two hours; the other two doors lead back to the starting point after three and five hours, respectively. If each door is chosen with equal probability, what is the expected time to reach the destination?
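Conditioning on the first door chosen gives a one-line equation for the expected time ( E[T] ); each wrong door returns the miner to the starting point, so the same ( E[T] ) reappears on the right-hand side:

$$E[T] = \tfrac{1}{3}(2) + \tfrac{1}{3}\bigl(3 + E[T]\bigr) + \tfrac{1}{3}\bigl(5 + E[T]\bigr) \;\Rightarrow\; \tfrac{1}{3}E[T] = \tfrac{10}{3} \;\Rightarrow\; E[T] = 10 \text{ hours.}$$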

You might wonder what this has to do with review pressure. Let me make a simple analogy. Suppose there are only two doors: one called "successful review" and one called "failed review". A successful review increases memory stability, bringing the goal of forming a long-term memory one step closer; a failed review does not increase stability (or even reduces it), so you make no progress or fall back. Choosing either door costs time (reviews take time). Review pressure is then the total expected time cost of forming a long-term memory, analogous to the miner's expected arrival time. Memory scenarios are more complex: the probability of each door changes with the timing of the review, and the doors differ at every step. Still, the essence is the same, and I soon used this mathematical tool to state the problem clearly.

When a fuzzy real-world problem turns into a clear mathematical problem, half the battle is won. The problem above computes the expected review pressure under a given review strategy. Is there a reverse problem, finding the strategy that minimizes the expected review pressure? I kept looking for tools in stochastic processes and found Markov Decision Processes.

Essentially, optimizing review scheduling is decision optimization: given a known memory model, minimize the review pressure. But Ross's book does not detail how to optimize decisions, so I searched Zhihu for articles about Markov Decision Processes (e.g., 2.1. Markov Decision Processes (MDP)). There I learned about the Bellman equation (Bellman Equation for Markov Decision Processes) and found that it is basically dynamic programming with some randomness added: you compute expectations.
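In the cost-minimization form relevant here (mapping the symbols onto the review problem is my gloss), the Bellman optimality equation reads:

$$J^*(s) = \min_{a}\Bigl[c(s,a) + \sum_{s'} P(s' \mid s,a)\,J^*(s')\Bigr]$$

where ( s ) is the current memory state, ( a ) the chosen review action (the interval), ( c(s,a) ) the immediate review cost, and ( P(s' | s,a) ) the probability of landing in state ( s' ) (recall succeeded or failed). The sum is simply an expectation over successor states, which is exactly the "randomness added" to ordinary dynamic programming.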

Then I modeled memory stability as a series of discrete states, review cost as the weights of the paths between them, and recall probability as the success probability along each path, turning the problem into a stochastic shortest path problem. Combining my existing knowledge of dynamic programming and stochastic processes, I quickly wrote the first version of an iterative algorithm that computes, for each memory stability, the optimal review interval and the corresponding recall probability.

At this point, I thought I’d solved the problem of optimizing review strategies, and began overhauling the whole memory scheduling system.

Starting to Write

First Paper

Essentially, optimizing spaced repetition scheduling has two parts: one is memory modeling, the other strategy optimization. My first paper, ā€œLong-term Language Learning Memory Prediction Model Based on LSTM,ā€ focused on the former.

It was September 2021; I had been pulled back to school for my thesis but wanted to continue my research at Momo, so I made a deal with my advisor: I publish papers, and he lets me keep the internship. I then spent about a month writing the first paper. Since it was my first academic paper, I started by reading the related literature. I found that the most recent memory-modeling paper was Duolingo's from 2016, and I summarized my notes on Zhihu [13]. They proposed half-life regression, a model that uses statistical features to predict the memory half-life, which is exactly memory stability. Reading through their paper, I saw huge room for improvement: they used statistical features, but the memory process is essentially a sequence of events in time, so time-series features plus a time-series model should make for an easy paper. And the most common neural time-series model is none other than the LSTM.

So I started reading the PyTorch docs, adapted LSTM boilerplate from GitHub, and used Pandas and NumPy to organize Momo's data into tensor inputs for the LSTM, finishing the experiments in one week. This took a toll on my MacBook (bought with my Momo internship salary), since I had no GPU… To connect my paper with the orthodox academic literature (Woz's papers read like folk science), I replaced "memory stability" with "half-life" and "memory retrievability" with "recall probability" in the paper, and named the model LSTM-HLR to signal that it improves on Duolingo's work.

I then spent about another week finishing the first draft of the thesis (in Chinese), with its structure also modeled on the Duolingo paper. My advisor asked me to research venues to submit to, and the funny thing was that almost none of the AI + education interdisciplinary conferences were recognized by the CCF [14]. In the end, my advisor had me submit to the Journal of Chinese Information Processing: submitted in October, one rebuttal and one format revision, accepted in November. Then I happily went back to work at Momo, under a perfectly legitimate name of course: an "off-campus graduation project", lol.
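As an aside, here is a minimal sketch of what such an LSTM half-life regressor can look like in PyTorch. The layer sizes, the three per-review features, and the softplus head are illustrative assumptions, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class LSTMHLR(nn.Module):
    """Predict memory half-life from a sequence of review events."""

    def __init__(self, n_features: int = 3, hidden: int = 32):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, reviews: torch.Tensor) -> torch.Tensor:
        # reviews: (batch, seq_len, n_features), e.g. per review:
        # (interval since last review, recalled or not, response time)
        _, (h_n, _) = self.lstm(reviews)
        # softplus keeps the predicted half-life positive
        return nn.functional.softplus(self.head(h_n[-1]))

model = LSTMHLR()
batch = torch.randn(8, 5, 3)   # 8 items, 5 reviews each, 3 features per review
half_life = model(batch)       # shape (8, 1)
# a half-life h gives the recall probability after t days as p = 2 ** (-t / h)
```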

Second Paper

After returning to Momo, I applied the review strategy I had been thinking about to the real product and ran an A/B test. The results were gratifying: years of research finally got a response and helped millions of users improve their review efficiency. I was planning to keep optimizing the algorithm when my advisor came to me again around January 2022. I thought he wanted to ask how my graduation thesis was going, but it turned out to be an ambush: he wanted me to write another paper. He said that since the previous paper had been accepted by a CCF B-level Chinese core journal, it would help to have another accepted by a CCF A-level English conference. Honestly, I wasn't keen on writing more, but considering that an international paper could let my research reach and help more users, I pushed myself to start the second paper, "A Stochastic Shortest Path Algorithm for Optimizing Spaced Repetition Scheduling."

This paper focused on optimizing the review strategy. I read a lot of literature for it and found that many people liked reinforcement learning. I wasn't very interested in reinforcement learning: setting rewards is difficult, and running online reinforcement learning inside Momo's vocabulary app would be hard to implement. Then I came across a paper using Markov processes and optimal control whose mathematics was very elegant (though I couldn't understand the stochastic partial differential equations), and I began thinking about how to package the minimal-review-pressure iterative algorithm I had conceived earlier in a more elegant mathematical language. Coincidentally, while searching Zhihu for articles on the stochastic shortest path problem, I found notes on "Reinforcement Learning and Optimal Control" (for example: [Reinforcement Learning and Optimal Control] Notes (2): Dynamic Programming for Stochastic Problems). It was like striking gold: optimal control theory already had the mathematical tools to describe this problem, so I rewrote my earlier draft in the form of the Bellman equation.

However, I then ran into a problem: which memory model to use. Although my LSTM-HLR model had been accepted, it had not been published yet, and I would have had to introduce it all over again in the new paper, which felt problematic. So I adopted Woz's three-variable model and tried fitting its parameters with machine learning. In the process, to better observe the regularities in the memory data, I first considered how to reduce the dimensionality of the memory time series. Woz's SInc (stability increase) matrix inspired me: each review event can be represented by stability, retrievability, difficulty, and post-review stability. If difficulty is shown as a color, the other three attributes can be projected into a 3D space. On a netizen's recommendation, I learned the data visualization library plotly and worked over Momo's data:
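A sketch of that kind of plot. The data here is synthetic (Momo's schema is not public), with a toy rule standing in for the real stability growth:

```python
import numpy as np
import pandas as pd
import plotly.express as px

# synthetic stand-in for a review log: one row per review event
rng = np.random.default_rng(42)
n = 1000
df = pd.DataFrame({
    "stability": rng.uniform(1, 100, n),
    "retrievability": rng.uniform(0.7, 0.99, n),
    "difficulty": rng.integers(1, 11, n),
})
# toy rule: post-review stability grows more when retrievability is low
df["post_review_stability"] = df["stability"] * (1 + 2 * (1 - df["retrievability"]))

# three attributes as axes, difficulty as the fourth dimension via color
fig = px.scatter_3d(df, x="stability", y="retrievability",
                    z="post_review_stability", color="difficulty")
fig.show()
```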

After visualizing the data, I found that a simple linear regression model was sufficient, perhaps with some nonlinear transformations of the features. Here I borrowed the form of Woz's stability growth function.

With that, the handcrafted time series model was complete. I compared it against the HLR model and found that it did indeed perform better, which shows that statistical features have inherent flaws. It is easy to see why: forgetting once and then remembering is not the same as remembering once and then forgetting. Statistical features throw away temporal information, and temporal information matters a great deal for how memory states change, so the better performance was no surprise. The surprise was that no one had used temporal information for memory modeling and prediction before (actually, some had, but the work was very old and did not use machine learning, and I couldn't figure out how they estimated their parameters). I later found that the only open dataset in the memory field was Duolingo's. To facilitate replication, I discussed open-sourcing our dataset with the company's leadership, hoping to push the field forward, and we eventually decided to release the dataset and the experiment code:

Ye, Junyao, 2022, ā€œReplication Data for: A Stochastic Shortest Path Algorithm for Optimizing Spaced Repetition Schedulingā€, https://doi.org/10.7910/DVN/VAGUL0, Harvard Dataverse, V1

maimemo/SSP-MMC: A Stochastic Shortest Path Algorithm for Optimizing Spaced Repetition Scheduling (github.com)

Back to the time series model: I went through "Reinforcement Learning and Optimal Control" again to find out how to solve for the optimal policy given known state transition equations, and discovered that my iterative method was actually the value iteration method, so I wrote it up as pseudocode:

It looked pretty decent: J is the cost matrix, π is the policy matrix, f is the memory state transition function, d is difficulty, h is the memory half-life, p is recall probability, and a and b are the costs of a successful and a failed review. By repeatedly traversing the candidate review intervals in each memory state, computing the expected review cost, picking the interval that minimizes that cost, and iterating, the algorithm converges to the optimal cost and policy matrices.
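The real implementation is in the SSP-MMC repository linked above; what follows is only a minimal sketch of the same value-iteration loop, where the state grid, the costs a and b, and the transition rule standing in for f are all illustrative assumptions:

```python
import math
import numpy as np

H = np.arange(1, 361)          # discretized half-life states, in days
A_COST, B_COST = 1.0, 2.5      # a, b: costs of a successful / failed review
# the last state plays the role of "long-term memory formed" (absorbing, cost 0)

def next_half_life(h, p):
    """Stand-in for the transition f: the half-life gain is larger when
    the recall probability p at review time is lower."""
    return h * (1 + 2 * (1 - p))

J = np.zeros(len(H))           # expected review cost-to-go per state
policy = np.zeros(len(H))      # optimal recall probability to review at

for _ in range(500):           # value iteration
    J_prev = J.copy()
    for i, h in enumerate(H[:-1]):
        best = math.inf
        for p in np.arange(0.70, 0.99, 0.01):  # candidate review timings
            nxt = min(math.ceil(next_half_life(h, p)), len(H)) - 1
            # expected cost: success advances the state, failure resets it
            cost = p * (A_COST + J_prev[nxt]) + (1 - p) * (B_COST + J_prev[0])
            if cost < best:
                best, policy[i] = cost, p
        J[i] = best
    if np.allclose(J, J_prev):
        break

# the interval realizing recall probability p at half-life h: p = 2^(-t/h)
intervals = -H[:-1] * np.log2(policy[:-1])
```

Each sweep re-evaluates every state against every candidate interval and keeps the cheapest; once J stops changing, the policy matrix gives, for every half-life, the recall probability (equivalently, the interval) at which to schedule the next review.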

After writing most of it, I updated my advisor on the progress and mentioned that I hadn't used deep learning, wondering whether that would make it hard to place. He said: then submit to KDD. I looked into KDD and found that many companies published there and that it has a dedicated applied data science track, so I started revising the paper to KDD's requirements. I hadn't expected KDD to demand a 9-page, two-column format; I had to add some cool visualizations, because a senior labmate said that if the paper doesn't look full enough, it might be seen as not solid. Haha.

Then I needed a legitimate reason to add visualizations. I recalled that many earlier deep reinforcement learning methods were black boxes lacking interpretability, so I could visualize the model weights and the patterns of the optimal policy to explain the model's mechanism:

Then I just had to explain the figures. Here are some conclusions read off the memory model:

  1. As difficulty increases, the potential for half-life growth continuously decreases
  2. As recall probability decreases, the potential for half-life growth increases, indicating ā€œdesirable difficultiesā€
  3. As half-life increases, the potential for half-life growth declines, meaning memory cannot be consolidated indefinitely

And phenomena observed from the optimal review strategy:

  1. Higher difficulty corresponds to higher expected review cost
  2. Expected review cost decreases as half-life increases
  3. With the same half-life, the optimal review interval increases as difficulty grows

Later, to intuitively show the time series model and stochastic shortest path problem, I made a flowchart with http://draw.io:

By the way, this was my first time writing an academic paper in English, which was very painful. I read many paper-writing tutorials on Zhihu (such as "How to write a good introduction section of an English paper?", "DrustZ's paper class [Related Work]", and "Are there English proofreading and polishing tools more advanced than Grammarly?"). DeepL + Grammarly + QuillBot saved me from disaster!

Then I revised the paper with my advisor, finally met the deadline before the Lunar New Year in February, and submitted it. In May, we received the acceptance notice, and that is roughly where the story ends.

Bonus

But the story wasn't over. At the end of April, my advisor came to me again, saying that KDD is quite competitive and asking me to prepare a NeurIPS paper… Only half a month later, just as I finished the draft, the KDD acceptance arrived, and my advisor changed his mind: expand the work into an English journal paper instead. So I spent two months writing a journal paper, 12 pages in two-column format, enough to make my scalp tingle. But I finished it and even worked in some earlier ideas. Hopefully it gets accepted this year.

Summary

Looking back on my research journey, it was full of coincidences and surprises. I dare not claim to have made any outstanding contributions. Without the help of Anki, Wozniak’s theory, Momo’s vocabulary data, and various learning resources produced by Zhihu users, this research could not have continued.

But one thing is certain: I chose my own path, learning freely through exploration, driven by passion, constantly moving forward, and doing everything possible to break free from school control.

I firmly believe that memory research can advance educational technology and bring low-cost learning tools to every learner.

Through free learning, I found my life path: transforming from a small-town problem-solver to an algorithm engineer in an educational technology company, becoming a researcher in the memory field. I believe in the future, every student can break free from the shackles of the decaying compulsory school system [15], achieve self-realization through free learning [16], and live a good life in the secular sense.

PS: By the way, here’s the open-source project based on my paper:

open-spaced-repetition/fsrs4anki: A modern Anki custom scheduling based on free spaced repetition scheduler algorithm (github.com)

References

  1. A Stochastic Shortest Path Algorithm for Optimizing Spaced Repetition Scheduling https://www.maimemo.com/paper/
  2. LSTM-based Language Learning Long-term Memory Prediction Model http://jcip.cipsc.org.cn/CN/Y2022/V36/I12/133
  3. Optimizing Spaced Repetition Schedule by Capturing the Dynamics of Memory https://ieeexplore.ieee.org/document/10059206/
  4. Efficient Spaced Repetition Learning https://zhuanlan.zhihu.com/p/420105707
  5. Want to Know How to Succeed in the College Entrance Exam? See How I Increased My Score by 157 Points Using Anki! https://zhuanlan.zhihu.com/p/41568928
  6. A Brief Discussion of Anki's Algorithms and Terminology https://zhuanlan.zhihu.com/p/42921090
  7. "History of Spaced Repetition": Table of Contents https://zhuanlan.zhihu.com/p/375379522
  8. Peter Wozniak https://zhuanlan.zhihu.com/p/303204832
  9. 05 1988: The Two Components of Memory https://zhuanlan.zhihu.com/p/99505568
  10. 10 1995: Hypermedia SuperMemo https://zhuanlan.zhihu.com/p/447640544
  11. L-M-Sherlock/space_repetition_simulators https://github.com/L-M-Sherlock/space_repetition_simulators
  12. Analysis of the Impact of Anki Card Creation on Review Memory Efficiency https://zhuanlan.zhihu.com/p/346463057
  13. Starting from Duolingo's Machine Learning Algorithm: Analyzing Memory Data Feature Engineering https://zhuanlan.zhihu.com/p/345172257
  14. Research on (and Rant about) AI + Education Interdisciplinary Academic Conferences (2022) https://zhuanlan.zhihu.com/p/419815179
  15. Why Compulsory Schooling Should Be Removed from This World https://zhuanlan.zhihu.com/p/544801748
  16. Free Learning https://zhuanlan.zhihu.com/p/272543239

PearsonChen

Many people think intelligence differences cause this, but the most despairing thing is actually resource inequality.

My first PhD paper won the best paper award, was featured in the university news, and was even recommended by the editor to be republished in a journal, but was I really that smart?

I'm actually not very smart. I never scored full marks on exams; others easily got a 4.0 while slacking off and playing games every day, whereas I attended office hours and kept struggling with the textbooks. Others finished programming assignments in one or two days, but I had to look up how each function was written, going through beginner tutorials at least 20 times; one assignment took me two weeks, and my GPA was about 3.7. If I had given up, I probably would have dropped below 3.2. In a community like Zhihu, where everyone is a genius, a purely hard-working PhD student like me was basically at risk of expulsion.

But at the same time, many PhD students were struggling to publish papers, while I kept publishing papers and winning many awards and scholarships. The reason was that the lab had abundant resources.

In fact, most of the time, publishing a top conference paper (including IROS, ICRA, CVPR, AAAI, etc.) only requires an open-sourced method plus a rare dataset. Especially in medical robotics, real human data is much better than those simulated models. Many articles published in Nature and Science appeared because the dataset was rare, while the methods were nothing extraordinary. So most of the time, you just need to know how to git clone and write a good application to get a top conference paper.

We have collaborating partners who provide exactly such high-quality datasets. Even our medical models are built from real human data provided by partners, as realistic as it gets, with every detail maxed out. For example, our heart model beats with the correct rhythm and movement principles. These models are not available on the market and have to be custom-made (the cheapest still costs over a thousand dollars).

With so many resources, even a rookie like me could rise and survive the hardest first year.

Moreover, my boss is an editor or chair at various big and top conferences, so basically, if he is involved with a conference, you just submit and it's done: if he says your paper will be accepted, it will be, with 100% certainty.

Most undergraduates publishing at top conferences have similar luck: big-name backing plus abundant resources. Even for algorithm papers, the ideas mostly come from senior students or advisors, and if executed well, they go to top conferences. Essentially, it is still a matter of resources.

Boulder

Because the gap between people is even greater than the gap between people and dogs.

Undergraduates who publish at top conferences look like this:

  • You were a star in academic competitions in high school and easily got into a top 985 university through the college entrance examination. The undergraduate courses come easily to you.
  • In your junior year, you excel in a machine learning / computer vision / AI course, and a top professor notices your potential and sincerely invites you to join the research group.
  • On the first day in the group, the professor gives you a page listing six or seven research directions, saying, ā€œKid, choose one direction to work on. Don’t worry, each direction has senior students to guide you.ā€
  • Your senior, Senior A, is kind and approachable. At the first meeting, they give you a reading list, saying, ā€œThere are countless papers in this direction, but here are twenty that are truly worth reading. I have arranged them chronologically. If you finish reading them in half a month, your foundation will be solid. We meet twice a week, and you can ask me if you don’t understand anything.ā€
  • When you leave, you notice Senior A’s bookshelf displaying several awards such as XXXX Best Student Paper Award and XXXX Outstanding Reviewer Award.
  • Communicating with Senior A boosts your progress rapidly. After half a month, you read top conference papers as easily as cutting a melon.
  • While watching a movie with your girlfriend, you come up with a brilliant idea and excitedly call Senior A to share it. Senior A smiles and says, ā€œDefinitely a promising talent, let’s quickly run experiments!ā€
  • When you rush to the lab ready to work hard coding, Senior A has already applied for eight 3090 GPUs for you, saying, ā€œI estimate this will be enough.ā€
  • You work intensively for a month, and your idea works.
  • You share your preliminary experimental results at the group meeting and receive 12 constructive comments. After improving the experiments, your results become stronger.
  • Before the deadline of a top conference, you finish writing a paper. Senior A sends revision comments within three hours, saying, ā€œYou’re very talented, but the storytelling is a bit immature.ā€ Senior A repackages your paper, making it instantly reach a new level.
  • One week before the acceptance results, the professor casually mentions your paper while having tea with the chair. The chair says, ā€œI suspected it was your work, definitely no problem.ā€
  • You get accepted to your first top conference and excitedly report to Senior A and your professor, who say, ā€œWe knew you would make it.ā€
  • You are about to graduate. The professor originally planned to recruit you for a direct Ph.D. program, but you decide to apply for a doctoral program in country M with your girlfriend. The professor regrets to see you go but writes recommendation letters for both of you.

A Ph.D. student without any top conference papers looks like this:

  • From elementary school to undergraduate, you have been an unremarkable, routine student.
  • Because of a poor job market, you continue to a master’s and then a Ph.D.
  • Your school, research group, and advisor are all ordinary.
  • You changed research directions several times during your master’s and Ph.D.
  • Coincidentally, your advisor was the same way in the past.
  • Your advisor assigns you to do non-core projects. Your advisor makes you teach undergraduate classes in his place. Your advisor asks you to handle travel reimbursements. Your advisor even asks you to pick up his child.
  • You communicate with your advisor academically once a month. The most common things he says are ā€œNot bad, you try it,ā€ and ā€œI don’t understand, you handle it yourself.ā€
  • Your whole group has only one 8-GPU server. You almost fight with a new junior because he killed your process to get the machine.
  • Then you stopped being mad because he secretly added your name on his paper.
  • You excitedly share new ideas at the group meeting, but the audience is stunned speechless.
  • After more than half of your Ph.D., you finally muster up the courage to submit to a top conference.
  • You wait a week for the professor's revision comments, which contain nothing beyond a few typo fixes.
  • Shaking, you open the review comments and faint from seeing terms like trivial, incremental, lack of novelty, technical soundness, bad English, poor writing, and ā€œyou should cite XXXX.ā€
  • When you wake, the lab is locked. You stay up all night writing a rebuttal.
  • Unsurprisingly, the top conference rejects it.
  • After several rejections, your paper is finally accepted by a C-class conference.
  • You report to the professor that you finally got accepted by an XXXX conference. The professor looks puzzled and asks when you submitted; he doesn’t remember.
  • At night, you scroll WeChat and see a push notification from X Zhiyuan: ā€œUndergraduate academic star from XXX University wins top conference, achieves new SoTA in object detection! Code to be open sourced soon!ā€
  • Then you lose sleep again.

It hurts to talk about it, it really hurts to talk about it.

It's so real. The current academic circle is essentially a feudal-style guild. When we do medical deep learning, the budget for a dual-4090 GPU server takes several months of back-and-forth with our institution's logistics and procurement departments, while the big shots at our partner organizations use supercomputers directly, without any queuing.

I recently discovered this expert, Ye Junyao, who has been researching memory algorithms out of personal interest and is now an algorithm engineer at Momo Vocabulary. He developed his own FSRS algorithm, which is now built into Anki.

Gained a lot from his sharing.

The world is uneven, hahaha