13,000 Future Founders
Analysis of 13k founders on Y Combinator's co-founder matching service.
👋 Hey, I’m Abe (@abe_clark). I recently quit my job as Director of Engineering at LoanSnap to go build a startup. Since killing my first startup experiment, I have interviewed over 70 non-technical people in search of my future co-founder.
I recently joined Y Combinator’s co-founder matching service. When I completed my profile, they presented me with this very enticing banner. 🤤
Wow. 13k potential co-founders? Amazing.
But, also completely unrealistic to actually engage with that many.
The data lover in me was pulling at the leash. Please let me analyze! So, I took a quick peek at the underlying data format of the site. It seemed relatively easy to get standardized information about these candidates programmatically (more technical details at the end of the article 👇).
I was able to gather basic data on 13,008 potential matches.
Why does this data matter?
At the most basic level, it satisfies an elemental desire of mine to know more about people like me (founders early in their journey).
But, more than that, it offers an inside perspective to one of the biggest overlooked advantages handed to top venture capital firms. Firms like YC have thousands of people begging to tell them their version of what the future of society looks like. A single person is relatively insignificant, but at scale the data translates into power when making investment decisions.
Additionally, the sample size (13,008) is quite large. It is biased toward one venture capital firm (Y Combinator), but YC is arguably the best-known. I consider these data points to be strongly indicative of the landscape of future founders globally.
In this article, I outline the data I’ve collected. I also point out a few observations that stuck out to me. Lastly, I propose some questions that came up for me. I’d love to hear what questions and insights this data creates for you! Consider joining the discussion on Hacker News.
The data is divided into 3 major categories:
Where are these future founders located?
What are these future founders like?
What are these future founders looking for in a co-founder?
China and Brazil are two of the biggest countries in the world in terms of population. Their relative representation on this map is staggeringly low.
Globalization / the internet has paved the way for a largely US-based venture firm to interact meaningfully with entrepreneurs from Siberia to Tasmania and everywhere in between!
⚠️ Warning … mini-existential crisis here. With regard to the heavy concentration of founders in Europe and the USA — Did colonialism ever end, or did we move from taxing land/people directly to taxing them indirectly via technology?
Nigeria was a big surprise to see in the #5 spot. Fantastic to see the startup fervor in Africa.
I’m excited to watch the progression of India as a technology powerhouse. This decade I’m convinced India will continue to move rapidly out of the “offshore” category and into a position of strong technology leadership, especially in Asia.
What is the best way to ensure tech creates a positive upward cycle in emerging markets?
Los Angeles is quickly gaining ground on SF and NY as a startup center in the USA.
North America, Europe, and Asia all have multiple startup powerhouses. Africa has only one and there are zero in Australia or LATAM.
Where will the next generation of startup enthusiasts live? Will the success of Nubank and others create new LATAM hotspots?
Are there any real estate funds monitoring these movements in emerging markets? These data points feel like strong leading indicators of residential and commercial price appreciation in the next 10-20 years. Imagine buying in Seattle pre-Microsoft/pre-Amazon.
No big surprises in the current rankings. Massachusetts ranked higher than expected, most likely due to the colleges in and around Boston.
COVID bumped tech salaries outside of historical tech hotspots. Assuming the trend of remote-friendly work continues (it will), will this list look drastically different in the next 5-10 years?
85% Male (15% Female) is sadly not surprising, but very sobering. It’s difficult to see an increase in women-founded companies if the top of that funnel (women actively interested in starting a company) is so out of balance.
39% technical means that technical people have strong leverage in their co-founder hunt. Technical people don’t even need a great idea, just time and willingness to sort through potential co-founders. For reference, 78% of people said they are looking for a technical co-founder.
What is the most meaningful lever to help increase female founder representation?
~40% stated they are technical, but ~80% are looking for a technical co-founder. Does this signal that 50% of technical people want their co-founder to be technical as well? Or, does it mean that ~50% of people who stated they are technical are overstating their skills?
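The first reading can be sanity-checked with simple arithmetic. The sketch below assumes that every non-technical person wants a technical co-founder (an assumption, not something the data states), so whatever is left of the "looking for technical" share must come from technical people. The function name is made up for illustration.

```python
def implied_share_of_technical(technical: float, want_technical: float) -> float:
    """Share of technical people who must want a technical co-founder.

    Assumption: every non-technical person wants a technical co-founder.
    Inputs are fractions of all candidates (e.g. 0.39 for 39%).
    """
    non_technical = 1.0 - technical
    # Whatever the non-technical group does not account for
    # has to come from the technical group.
    return (want_technical - non_technical) / technical
```

With the rounded figures (40% and 80%) this gives 50%. With the more precise figures quoted earlier (39% and 78%) it gives roughly 44%.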
I’m so proud of the 21% who have a company name but not a domain name. Such discipline! I’ve spent far too much money on domain leases that never see any movement. 😂
Why is it that we are drawn so heavily to branding over team building and idea validation? Do we overstate the importance of brand in building an initial team and validating the idea, or is it a crucial component at this stage?
Note: Candidates are able to select multiple interest areas. The bars represent the percentage of that group who expressed interest in a given area. For example, ~55% of women stated an interest in AI, vs ~62% of men.
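For readers who want to reproduce this kind of breakdown, here is a minimal sketch of the calculation. The record layout (a `gender` field and an `interests` list) is hypothetical and is not the actual shape of the YC data.

```python
from collections import Counter


def interest_share(candidates: list, group_key: str, group_value: str) -> dict:
    """Percent of one group that selected each interest area.

    Candidates can pick several areas, so the percentages
    for a group can add up to more than 100.
    """
    group = [c for c in candidates if c[group_key] == group_value]
    # set() guards against the same area being listed twice for one person.
    counts = Counter(area for c in group for area in set(c["interests"]))
    return {area: round(100 * n / len(group), 1) for area, n in counts.items()}
```

The denominator is the size of the group, not the total number of selections, which is why each bar can be read as "X% of women" or "X% of men".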
Women are proportionally more interested than men in a handful of areas: Consumer, Education, Healthcare, Non-Profit.
The biggest proportional differences in interest between women and men are in: AI, Fintech, Blockchain, Energy, Hardware, and Hard Tech.
Blockchain is still middle-of-the-pack. But, given it hasn’t been mainstream for more than about five years, it holds impressive mindshare (ranking above Healthcare, Entertainment, AR/VR, and many more).
Technical people match or exceed the aggregate interest levels in all categories except: Consumer, Marketplace, and E-Commerce. Finding a technical co-founder in these categories will be incrementally harder.
“I want to change the world” is a common refrain from founders. Yet, interests skew heavily away from objectively high-impact areas toward high-reward areas. What needs to change to incentivize builders toward social-positive pursuits?
Does knowing how popular a space is dissuade entrepreneurs from building there (i.e., push them to look for less competition), or are interests innate and inescapable against all odds?
25% of technical people don’t want to be in charge of engineering. This could be further evidence of people overstating their technical abilities.
Technical people feel like they are highly skilled at product management (in my experience this is sometimes true, but not anywhere close to 80%).
Technical people feel most deficient at sales and marketing. Put another way, if you want to impress a potential technical co-founder, illustrate your proficiency in this category.
Female founders feel more confident than men in design, sales and marketing, and operations.
Setting engineering aside, design is the category where everyone feels weakest. Does this signal high demand for design-savvy co-founders, or do people write off design as a “down-the-road” or outsourced priority?
How can I find the confidence of the 5% of non-technical people that expect to run engineering? 😂
Prospective founders are overwhelmingly remote-friendly.
Being from the same country seems to provide an additional level of comfort or an assumption of less overhead.
Are the people in the “anywhere” column misguided? Is it really possible to have a successful startup when your co-founder is more than three time zones away?
Engineering is in extremely high demand (no surprise there).
Non-technical people see adding an additional co-founder with skills in sales and marketing or operations as redundant.
Almost 45% of technical people are looking for a co-founder to run engineering. Do these people want the experience of being CEO? Do they feel under-qualified to be a CTO? Do they trust other technical people more than those coming from business backgrounds?
I hope you found these statistics as interesting as I did (But even if you didn’t, ✌️).
I’d love to hear your thoughts on the questions I propose as well as any insights from the data that I missed.
I also have a few freeform text fields (Personal description, company pitch, etc.) as well as LinkedIn profiles that I did not turn into charts. I’m still grappling with the right way to analyze the data. Ideas / approaches welcome!
If you liked this article, please consider subscribing, following me on Twitter (@abe_clark), and sharing with your network.
As promised, here are a few notes on my approach to gathering this data.
If you’re non-technical, probably best to close the tab now.
When I popped open Chrome DevTools I noticed that YC uses GraphQL. Luckily for me, they pool all of their queries into one large request each time a user is presented with a new potential match.
I grabbed this query, cleaned it up (they use several fragments, which I needed to inline), and then used Postman to verify I could replicate the same response.
I checked to see if they happened to leave GraphQL introspection on in production (nope, good job YC tech team). Instead, I spent a bit of time trying other variable names in the schema to see if I could unlock better data. None of my first guesses worked, so I decided to run with the data I had.
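For context, an introspection check amounts to sending the standard `__schema` query and seeing whether schema data comes back. Below is a minimal sketch in Python (the original work was done by hand and in Node.js). The helper name is made up, and the exact error a server returns when introspection is disabled varies by implementation.

```python
import json

# Standard introspection query defined by the GraphQL specification.
INTROSPECTION_QUERY = "{ __schema { types { name } } }"


def introspection_enabled(response_body: str) -> bool:
    """Return True if the server answered the introspection query with schema data."""
    parsed = json.loads(response_body)
    # "data" may be missing or null when the server rejects the query.
    data = parsed.get("data") or {}
    return bool(data.get("__schema"))
```

A server with introspection turned off typically answers with an `errors` array and no `__schema` data, which this helper reports as `False`.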
The query takes in one parameter: slug. I noticed that slug could take the exact slug of a profile OR it could simply take the word “next”. The frontend passes in “next” when the client advances to the next potential match.
I tested this out a few times and confirmed I was receiving new data.
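A GraphQL call like this is an HTTP POST with a JSON body holding the query text and its variables. The sketch below shows the general shape in Python. The operation name and field names are placeholders invented for illustration; the real query is not reproduced in this article.

```python
import json

# Hypothetical query: operation and field names are placeholders,
# not YC's actual schema. Fragments are assumed to be inlined already.
PROFILE_QUERY = """
query Candidate($slug: String!) {
  candidate(slug: $slug) {
    slug
    location
    isTechnical
  }
}
"""


def build_payload(slug: str) -> str:
    """Serialize the JSON request body for one profile lookup."""
    return json.dumps({"query": PROFILE_QUERY, "variables": {"slug": slug}})
```

Calling `build_payload("next")` produces the body for "give me the next match", while passing an exact profile slug would request that specific profile.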
I wrote a short Node.js script to request the data, sleep for five seconds, then request the data again, passing in “next”. I added the sleep because a couple of years back I nearly got banned from LinkedIn for doing some automated browsing. I am very sure YC’s security is nothing like LinkedIn’s, but I was in no rush and a five-second sleep would almost certainly ensure I wouldn’t hit a rate limit.
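The request-sleep-request loop can be sketched as follows. This is Python rather than the original Node.js, and `fetch` is a hypothetical stand-in for the authenticated GraphQL call, passed in so the loop itself stays easy to test.

```python
import time
from typing import Callable, Iterator


def crawl(fetch: Callable[[str], dict],
          max_requests: int,
          delay_seconds: float = 5.0,
          sleep: Callable[[float], None] = time.sleep) -> Iterator[dict]:
    """Yield one profile per request, always asking for the "next" match.

    `fetch` is a hypothetical callable that performs the actual request.
    """
    for i in range(max_requests):
        yield fetch("next")
        if i < max_requests - 1:
            # Pause between requests to stay well under any rate limit.
            sleep(delay_seconds)
```

Making `sleep` injectable is a design choice for the sketch only: it lets the loop run instantly in tests while defaulting to a real five-second pause.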
I set up a MongoDB Atlas instance (free!) and simply pushed the GraphQL result directly into the DB. I added a simple check to see if the slug existed in the database already and skipped saving in that case.
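The skip-if-already-saved check is simple enough to show with a plain dictionary standing in for the MongoDB collection. This is an illustrative Python sketch, not the original Node.js code, and it assumes each result carries a `slug` field.

```python
def save_if_new(collection: dict, profile: dict) -> bool:
    """Store the profile keyed by slug; return False if it was already saved.

    `collection` is a plain dict standing in for the MongoDB collection.
    """
    slug = profile["slug"]
    if slug in collection:
        return False  # already stored, skip the write
    collection[slug] = profile
    return True
```

With a real MongoDB collection, a unique index on the slug field would be another way to get the same guarantee from the database itself.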
Then, I let the script run. I ran it in several batches while I was working on other projects.
After I got to 13k documents, I noticed every new query was already cached in MongoDB.
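I spotted this by eye, but the same stopping rule could be automated: stop once a long run of consecutive results are all repeats. A minimal Python sketch follows, where the function name and the `patience` threshold are arbitrary choices for illustration.

```python
from typing import Iterable


def saved_until_exhausted(slugs: Iterable[str], patience: int = 50) -> set:
    """Collect unique slugs, stopping after `patience` consecutive repeats."""
    seen = set()
    repeats = 0
    for slug in slugs:
        if slug in seen:
            repeats += 1
            if repeats >= patience:
                break  # nothing new for a while, assume the pool is exhausted
        else:
            seen.add(slug)
            repeats = 0  # a new profile resets the streak
    return seen
```

A higher `patience` lowers the chance of stopping early on an unlucky streak, at the cost of more wasted requests at the end.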
From there, I dumped the data to CSV and moved things to Google Sheets to chart it.
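Going from nested documents to a flat CSV usually means flattening each document first. Here is a sketch of one way to do that in Python. The field names in the usage note are invented, and this is not necessarily how the original export was done.

```python
import csv
import io


def flatten(doc: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into dotted column names; join lists with '; '."""
    row = {}
    for key, value in doc.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            row.update(flatten(value, prefix=f"{name}."))
        elif isinstance(value, list):
            row[name] = "; ".join(str(v) for v in value)
        else:
            row[name] = value
    return row


def to_csv(docs: list) -> str:
    """Render documents as CSV text with a header covering every column seen."""
    rows = [flatten(d) for d in docs]
    columns = sorted({col for row in rows for col in row})
    buffer = io.StringIO()
    # restval fills in columns that a given document does not have.
    writer = csv.DictWriter(buffer, fieldnames=columns, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

For example, a document like `{"slug": "a", "location": {"country": "US"}}` ends up with the columns `location.country` and `slug`, which import cleanly into a spreadsheet.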
The last step was to run the addresses through a short script that called the Google Geocoding API to get the lat/long and also standardize the data (YC data was formatted OK, but there was a long tail of strange formats).
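As a rough illustration of that step, the sketch below builds a Geocoding API request URL and pulls the coordinates and standardized address out of the JSON response. It is written in Python, the helper names are made up, and the HTTP call itself is left out so the example stays self-contained.

```python
from typing import Optional
from urllib.parse import urlencode

GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"


def geocode_request_url(address: str, api_key: str) -> str:
    """Build the Geocoding API request URL for a free-form address."""
    return f"{GEOCODE_URL}?{urlencode({'address': address, 'key': api_key})}"


def parse_geocode(response: dict) -> Optional[dict]:
    """Extract coordinates and the standardized address from the top result."""
    if response.get("status") != "OK" or not response.get("results"):
        return None  # e.g. ZERO_RESULTS for an address that cannot be resolved
    top = response["results"][0]
    location = top["geometry"]["location"]
    return {
        "lat": location["lat"],
        "lng": location["lng"],
        "formatted_address": top["formatted_address"],
    }
```

The `formatted_address` field is what makes the standardization possible: differently typed versions of the same place should generally come back with the same canonical string.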