Reflections on Reaching 1 Million People on Stackoverflow
This week, I have officially reached more than 1 million people on the website stackoverflow.com! I wanted to take a moment to reflect on this “achievement”, what it means for my professional career, and why I simultaneously believe that it is sheer luck (and a LOT of procrastination) that got me here.
As always, if you have any questions or comments, feel free to message me at firstname.lastname@example.org or reach out on Twitter.
As an “aspiring academic”, for a long time, my expectation was that (if at all), recognition of my work would come in the form of citations on my published research. This would indicate to me that people have meaningful interactions with my published work, and build on the ideas that were presented there. Of course, anyone more familiar with the academic world, specifically for Computer Science, will know that papers (and citations) are often a poor indicator of “success”. Managing to create a truly long-lasting or impactful contribution is an extremely rare occurrence for most, and requires lots of luck as part of the process. There are surely exceptions to the rule, but for the vast majority, having a few citations here and there is the norm (and totally fine for a productive and satisfactory career, I should add). In many ways, the ideal of citations and “reach” on any social media are in that sense very similar in my experience.
How it Started
For the remainder of this article, it is very important to consider the “historical” context of platforms like Stackoverflow, and of programming in general. The ability to find reliable answers to common programming problems was (and still is) one of the core needs in programming. Before tools like ChatGPT were able to give direct answers to questions, the main source of information was likely platforms like Stackoverflow.
As a Computer Science student during the 2010s, Stackoverflow was for a long time just that: a way to source help for my problems, which most of the time had already been answered before, and just required me to correctly find the right question (and answer).
However, in 2018, while I was interning at SAP, I worked on a problem that required me to run experiments with frequent re-builds of Docker images, coupled with spinning up (and tearing down) entire Kubernetes clusters. This was frustratingly slow, and the fact that we had very little to do while waiting for results meant that I spent a good while just idling around. Being the good intern that I was, I did not want to appear to be slacking off by browsing a social media site, so I started more actively looking at Stackoverflow as a means of procrastination.
As many of you can probably relate, I initially felt very overwhelmed by the prospect of giving my own answers: most of the easy problems were already asked and answered, it seemed like every other person would be more qualified to answer a question than myself, and ultimately answering also meant “putting myself out there”.
Particularly that last point is critical because Stackoverflow is not necessarily a welcoming place for beginners. It adheres to very strict guidelines and unusual formalisms (e.g., how questions should be formatted, tag etiquette, up/down votes), all of which get quickly enforced. With that being said, there is a real concern about feeling hurt just because some internet person decides to criticize your answers, taking a lot of the initial enthusiasm about contributing away.
However, I quickly realized two things. First, different communities (usually associated with a particular question tag) respond very differently and have totally orthogonal dynamics in some cases. For example, the C/C++ communities are notorious for (sometimes harshly) responding to bad questions or formatting, with a hint of “being part of the group” of established contributors. On the other hand, Python as a whole felt much bigger to me, just by the number of questions being asked at the time, and as such had no such strict code being enforced. Second, the main advantage for me was that I had all that free time, and could regularly refresh the new questions for a particular tag, which allowed me to look for questions that I felt comfortable answering. Looking back, some of my early answers were rather short and highly specific. They revolved around topics that I was working with at the time (a lot of basic Python, but also highly specific libraries, such as BLAS integrations, etc.). This gave me the confidence to provide an answer, without feeling the overwhelming sentiment of “someone else can do it better”. What also helped was the fact that these questions were not necessarily getting a lot of attention. This was partially because they might not have properly addressed a single core question, but again also because of the specificity of some of these subjects. That usually meant that I was the first to answer, and sometimes even the only one bothering to write something up.
I think these early questions helped shape my basic understanding of how Stackoverflow works as a contributor. I could get my feet wet, without having to worry about the quality expected for a highly popular post.
However, as time went on, I discovered that the specific idea of focusing on a more niche set of tags allowed me to get a similar feeling of contributing to unanswered questions, without the constant need of refreshing the tag site (as it was the case for the generic
My Main Contributions
Out of the areas that I am most familiar with in Python, many of the Machine Learning-related tools were still growing at the time. Tensorflow had already established itself as an industry-leading framework, but people were slowly switching to the much easier interfaces of Keras and Pytorch around that time.
Simultaneously, many of the questions in ML tags were often more theoretical in nature. They didn’t purely require code understanding to debug, which made them less likely to be answered, but also more broadly approachable without having a deep understanding of said frameworks.
I strongly believe that it was the fortunate combination of three factors, 1) time, 2) framework familiarity (I had worked with Pytorch for quite a few lectures), and 3) the ability to conceptualize some of the ML problems, that allowed me to shift more and more towards these particular focus areas and start answering more regularly. The psychological feedback loop of also getting some upvotes on my answers eventually made me spend the first half hour of my day answering questions (at this point, sorry to my former supervisor, David!). Had I not spent (procrastinated) so much time on answering during my early phase, I think the whole story would have turned out quite differently!
By the end of my internship later that year, I had already answered more than 100 questions, about 40% of my current total (230 answers). Many of them had received no upvotes or only a single one. But quite a few were accepted as the eventual answer by the original question askers, and some of them had seen moderate attention over time, leaving me quite motivated to continue spending time on the site.
My rate of answers still slowed down quite a bit; the combination of having less time to focus on answering and having my own thesis to write meant that I often found myself too tired to write answers on Stackoverflow. I still continued to survey tag-specific questions during (or after) lunch breaks, and had a minor answer here and there, but nothing much worth mentioning during practically all of 2018 and 2019.
However, in 2020, Covid happened and Germany went into lockdown. This meant a sudden increase of time spent at my computer. Coincidentally, around the end of 2019/beginning of 2020, I also started working more and more with a library called transformers, made by the awesome people at Huggingface.
It was revolutionary for me (having stayed on as a PhD student at university since mid-2019), because it allowed for a drastically more reproducible way of interacting with research artifacts at the bleeding edge of Machine Learning science.
Especially during those early days, transformers was still lacking a lot of important documentation, but the library was also small enough that it was possible to understand the key components quite well as a student, especially because I had to interact quite a bit with it anyways.
And particularly around that time in early 2020, I noticed that a lot of these trivial questions for the library were still unanswered: cache location (my most upvoted answer to this day), some functional differences that are hard to tell, or just model-specific particularities that I could only answer because I had read the original research paper.
So it was again thanks to a lucky series of coincidences (timing, framework familiarity, and it being a niche topic at the time) that I was able to answer some of the more fundamental questions related to a framework.
After the initial lockdown was over, and I found motivation to again focus more on my own research in these spare moments, my answer frequency also went down again. I continued to answer (mostly transformers-related) questions up until some time in 2022, when I simply became more focused on my own research.
This hasn’t particularly changed since I moved to my current full-time role, as my PhD thesis is still unfinished (and in some ways, this post is procrastinating on writing the thesis ;-)), so there really is not much time left in a day to hop on Stackoverflow. I noticed that my overall “contributions” have also shifted more towards repositories on Github itself, and trying to directly interact with authors to solve problems (e.g., by improving documentation or fixing issues), instead of going through the “proxy” of answering questions about that particular problem on Stackoverflow. But this is probably worth exploring in a separate blog post, as I strongly suspect this has something to do with the shift in focus that came with experience!
On the other hand, Stackoverflow (as the company behind the site) has also made several questionable decisions, leading many far more talented and highly motivated folks to depart the Stackexchange network, and causing a ripple effect of lowered morale for others. I think the effect is only slowly starting to be felt across the platform, and tools like ChatGPT have similarly stirred another round of discussions among the community, to name only one of the more recent issues. Whether this has also affected my own stance towards the site is hard to tell, but it certainly did not increase my motivation to contribute overall. In that sense, I am not entirely sure where the site is headed in the upcoming years, and can only hope that it will remain something close to being the page to look for programming answers moving forward.
Finally, I also want to call out some of the positive effects that my contributions have had in other areas.
I still fondly remember the “Jobs” section on Stackoverflow, which, unlike many others, I found genuinely helpful.
When I was still unsure about pursuing a PhD in late 2018, some of my first job interviews came through contacts that found me on Stackoverflow.
My profile also clearly demonstrated my ability to navigate a particular language (in my case, Python), and many of the job ads there clearly (or at the very least, broadly) specified the technologies I would be working with at the job. This is something that still does not really show on sites like Linkedin a couple of years later!
I sometimes even get personal e-mails asking me new questions, or whether I could answer a person’s problem directly. While I certainly feel flattered at the attention this is getting me, I also want to publicly state that I will either answer on a public forum like Stackoverflow (if the question is relevant to others), or charge a consulting fee for any matters that require my detailed attention for problems that cannot be shared. Because, in the end, while I am happy to help, it does not really help a lot of folks if I do it through mail.
Another time, I even got an invitation to be a consultant, although I have to admit that I blundered on that one, because of a series of unfortunate personal circumstances at the time. And, without having any real confirmation for this, I strongly suspect that my activity on the site helped me both directly as an applicant, by bolstering my early resume, and indirectly, through the “soft skills” that it taught me. Quickly familiarizing myself with a given problem context and trying to give a concise answer is definitely useful in coding interviews, even if the pretext is a different one.
None of my other work has had anywhere close to the same impact as my contributions on Stackoverflow. Sure, some of my code repositories have collected a few stars, and I do have a small trickle of citations flowing in, but nothing even comes close to the number of people reached this way. In some way, having achieved something like this, stemming from a place of procrastination, is extremely motivating, and I hope that I will still find the time to contribute answers (or questions) from time to time, whether it be on Stackoverflow or elsewhere :)