An approach based on Stack Overflow’s developer survey data from 2011 to 2018.
I have a question permanently going around in my head: I simply don’t understand how it is possible that women are closing the gender gap in engineering and maths, yet remain extremely underrepresented in the tech industry. Is it that they simply don’t like it? Are they discouraged from breaking into the field because men so heavily outnumber them?
It is certainly hard to answer, but one fact is clear: the absence of women in tech is an evident waste of talent.
Given the fast growth of Big Data, intuition is becoming one of the most in-demand skills (though not an easy one to measure), and it is easy to understand that women can contribute a great deal of it. Data holds stories and patterns, so an intuitive mind is needed to extract the valuable information. Of course it involves programming, statistics and so on, but the key point is asking the right questions, and it’s a lot of fun.
So, what does the data suggest? To understand the real situation and what could happen in the near future, the best places to look are those where people currently in the field ask questions and share ideas. There are several sites that everyone involved in programming visits constantly to solve issues, so they could be good sources of data about the tech field. If you have worked on any programming project, you certainly know what Stack Overflow is.
That’s why I used data from the Annual Developer Survey that Stack Overflow has run since 2011. Each year the number of developers surveyed and questions asked has increased, with the main focus on finding trends in the countries where developers work, gender, salary, hobbies, formal education, programming languages used at work or outside it, field of development,… Ultimately, the surveys give an interesting overview of the tech field from the developers’ point of view.
Part I. Where are we?
We can start by asking questions such as:
What is the actual proportion of women in tech? Is it increasing over time?
Is the proportion of students (non-professionals) different between genders?
Which technologies are growing faster regardless of gender?
The first assumption one might make by intuition is that the representation of women is probably still quite low, but that it has been growing over the years. The survey data shows that my assumption was wrong:
Please read ‘Others’ as ‘those who define themselves as neither male nor female, or who did not specify their gender’. Apart from a slight upward trend in women between 2016 and 2017, and despite the boom of Big Data and the tech industry over the period analyzed, our data shows that males dominate the field: out of 10 tech workers, fewer than 1 is a woman, a figure that has remained almost constant since 2014.
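For anyone who wants to reproduce this kind of yearly breakdown, here is a minimal sketch of how the shares could be computed. It is not the code behind the plot: the function name `gender_share_by_year`, the `(year, gender)` input format and the sample values are all assumptions made for illustration, and real survey files would first need their gender column mapped to these labels.

```python
from collections import Counter, defaultdict


def gender_share_by_year(responses):
    """Return {year: {label: percentage}} from (year, gender) pairs."""
    counts = defaultdict(Counter)
    for year, gender in responses:
        # Anything other than "Male"/"Female" (including missing) goes to "Others".
        label = gender if gender in ("Male", "Female") else "Others"
        counts[year][label] += 1

    shares = {}
    for year, counter in counts.items():
        total = sum(counter.values())
        shares[year] = {label: 100 * n / total for label, n in counter.items()}
    return shares


# Made-up sample, NOT survey data: 10 answers for a single year.
sample = [(2017, "Male")] * 8 + [(2017, "Female")] + [(2017, None)]
shares = gender_share_by_year(sample)
print(shares[2017])
```

Grouping every unspecified or non-binary answer under ‘Others’ mirrors the definition given above; a finer breakdown would only need a different labelling rule.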
This analysis leads us to check the proportion of students per gender. In other words, what percentage of women are working in tech, and what percentage are still studying or coding as a hobby?
There is a clear inequality: compared with males, a higher proportion of females are studying and not working. However, the difference is not large, which suggests that females who study eventually land a job much as males do, perhaps with a bit more difficulty. We can infer from this that the underrepresentation of women in tech is closely related to their traditionally low interest in the field. Nevertheless, the student-to-worker proportions show that current opportunities are close to equal.
From this point I asked myself how the professional use of technologies has evolved, regardless of gender. This could be a useful point of view for advising anybody wanting to get into the tech field. It takes time to learn to code, so it would be smart to get on board with some of the growing and most in-demand programming languages.
As a starting point, it is more efficient to learn what will be in use two years from now than a language that will not be widely used by then.
- Python is probably the best choice to start with. It’s simple, fast to learn and versatile, with many different applications such as Data Science, backend development… The revolution in Machine Learning, Deep Learning and other Data Science fields is probably strongly related to its growth.
- Ruby. Although our data doesn’t show that it is growing, it doesn’t show the opposite either. It is another object-oriented language that is becoming really popular in backend development.
- Go. Developed by Google in 2009, it is the youngest of the three but has a promising near future. Designed to replace C++ in the company, it is used in server-side (backend) development, but can be used for almost anything.
HTML and CSS have been intentionally excluded from the plot because, as most developers need to handle web content, you will need to use them anyway regardless of which interpreted language you learn.
Part II. Where to focus to increase the chances of landing a job?
With the general picture in mind, let’s dive into what our data can show about people currently working in tech. I used data from the 2017 survey, keeping only respondents who declared an actual salary (working people), then calculated the proportion of each gender across different features. In other words, which countries, types of education, years of experience or tech areas should a woman look at when trying to start a career?
The plots below show the percentage of women out of the total number of workers. For example, under Countries, females represent around 9% of the workforce in the United States.
In terms of education, our data clearly shows that higher degrees increase the chances for women to land a job, with a lower proportion for other education profiles. Meanwhile, females represent around 15% of working people who have been coding for less than two years, showing the increasing trend of their involvement and success in the field, at least in ‘junior’ or ‘entry-level’ positions. This makes sense when looking at the fields where women are better represented: Marketing, Quality Assurance, Machine Learning, Data Analysis and BI, all of them among the fastest-growing areas in tech in recent years.
Geographically, Europe, North America and Australia are probably the places where you want to be as a woman to get into the tech field, though places such as India or Brazil are emerging as growing markets.
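The percentages discussed in this part all follow the same recipe: keep only respondents with a declared salary, group them by one feature, and compute the share of women in each group. A rough sketch is shown below. The dictionary keys (`"salary"`, `"gender"`, `"country"`), the function name `female_share_by` and the rows are hypothetical and do not match the real survey column names.

```python
from collections import Counter


def female_share_by(rows, feature):
    """Percentage of women among salaried respondents, per value of `feature`."""
    totals = Counter()
    women = Counter()
    for row in rows:
        # Respondents without a declared salary are treated as not working.
        if row.get("salary") is None:
            continue
        totals[row[feature]] += 1
        if row.get("gender") == "Female":
            women[row[feature]] += 1
    return {key: 100 * women[key] / totals[key] for key in totals}


# Made-up rows, NOT survey data.
rows = [
    {"country": "A", "gender": "Female", "salary": 50000},
    {"country": "A", "gender": "Male", "salary": 60000},
    {"country": "A", "gender": "Male", "salary": 55000},
    {"country": "A", "gender": "Male", "salary": 40000},
    {"country": "B", "gender": "Female", "salary": None},
    {"country": "B", "gender": "Male", "salary": 30000},
]
share = female_share_by(rows, "country")
print(share)
```

The same function could be called with an education, experience or tech-area key instead of `"country"`, as long as each row carries that field.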
Part III. Which features have the most influence on an individual’s salary? Is it different between genders?
Probably one of the first approaches anyone would try in order to spot differences between genders is a comparison of absolute salary, only to realize that, sadly, it is often higher for men even with equal conditions and experience. We will do that later, but first let’s have a look at the influence of different features on salary for both genders. What matters most when predicting a salary: country, education, experience, communication skills,…?
For that purpose we split the data into working females and males, built a predictive model for each group, optimized its performance and plotted the coefficients of the top 15 features. In other words, the bar plots below show the relevance of the most influential features in salary prediction.
Currency and country seem to be very relevant for predicting salary for both genders, along with years of coding experience. It is logical to think that the more coding experience you have, the better your salary.
“Sadly, the lack of information, especially in the females dataset (fewer than 1,000 surveyed females with a declared salary), results in a very noisy model, where a feature such as how you pronounce the word ‘gif’ emerges as the second most influential, which doesn’t make sense in real life.”
Despite the inaccuracy of the females model, we can assume that the factors that define a salary, such as age or programming language, do not differ hugely between men and women, which is relevant to our case study.
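To make the idea of ranking features by their coefficients more concrete, here is a small sketch. The article does not say which model was used, so this assumes a plain linear regression fitted by gradient descent on standardized numeric features; the function names, the feature names and the data are all invented, and categorical features such as country or currency would need to be encoded as numbers first.

```python
import statistics


def standardize(column):
    """Rescale a numeric column to zero mean and unit (population) deviation."""
    mean = statistics.fmean(column)
    sd = statistics.pstdev(column)
    if sd == 0:
        return [0.0 for _ in column]  # a constant column carries no signal
    return [(v - mean) / sd for v in column]


def fit_linear(columns, target, lr=0.1, steps=2000):
    """Least-squares fit by gradient descent; returns {feature: coefficient}."""
    names = list(columns)
    X = [standardize(columns[name]) for name in names]
    n = len(target)
    y_mean = statistics.fmean(target)
    y = [t - y_mean for t in target]  # centering removes the intercept
    w = [0.0] * len(names)
    for _ in range(steps):
        errors = [
            sum(w[j] * X[j][i] for j in range(len(names))) - y[i]
            for i in range(n)
        ]
        for j in range(len(names)):
            grad = sum(errors[i] * X[j][i] for i in range(n)) / n
            w[j] -= lr * grad
    return dict(zip(names, w))


def top_features(coefs, k=15):
    """Features sorted by absolute coefficient, largest first."""
    return sorted(coefs.items(), key=lambda kv: abs(kv[1]), reverse=True)[:k]


# Made-up data, NOT survey data: salary depends mostly on years of coding.
years = [1, 2, 3, 4, 5, 6]
other = [3, 1, 2, 3, 1, 2]
salary = [10000 * y + 100 * o for y, o in zip(years, other)]
coefs = fit_linear({"years_coding": years, "other_feature": other}, salary)
ranking = top_features(coefs, k=15)
print(ranking)
```

Standardizing the features first is what makes the coefficients comparable with each other. One such model would be fitted per gender, and with few rows the ranking can easily be dominated by noise, as the ‘gif’ example above illustrates.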
We need to take into account the different salary ranges (people earning over 50K are not in the same situation as those earning below it) in order to have a more realistic picture. I divided each dataset at its median value, then calculated the average for each subgroup. The results are as follows:
- Female High Salaries Average: $90,140
- Male High Salaries Average: $88,217
- Female Low Salaries Average: $27,676
- Male Low Salaries Average: $25,832
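The median split behind these four numbers can be sketched as follows. This is an illustration only: the function name `split_averages` and the salaries are made up, and the handling of values exactly equal to the median (here they go to the low group) is an assumption, since the article does not specify it.

```python
import statistics


def split_averages(salaries):
    """Split at the median and return (low_average, high_average)."""
    median = statistics.median(salaries)
    low = [s for s in salaries if s <= median]  # ties go to the low group
    high = [s for s in salaries if s > median]
    if not high:
        raise ValueError("all salaries are equal, so there is no high group")
    return statistics.fmean(low), statistics.fmean(high)


# Made-up salaries, NOT survey data.
low_avg, high_avg = split_averages([10_000, 20_000, 30_000, 40_000])
print(low_avg, high_avg)
```

The function would be called once on the female salaries and once on the male salaries, so each gender is split at its own median.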
Our data shows that, on average, women’s salaries are higher than men’s in the tech industry. We need to take into account the unequal number of salaries used in the analysis (almost 10 times more declared male salaries), so this is not a fully realistic picture, although it does indicate that the salary gap (in global terms) is nonexistent or quite small.
Throughout this article we have answered some questions about the situation and evolution of women in the tech industry.
- We gathered information from several years to visualize the evolution of women working in tech, compared the proportion of students and workers for both genders, and analyzed the change in the use of different programming languages over time.
- We analyzed the proportion of each gender currently working in tech across several features, such as country, area within the tech industry, years of coding and education.
- Finally, we looked at the most influential features when predicting a salary for both males and females, and compared salaries.
Things are changing very fast in almost every field, and tech will be responsible for a great part of it. Talent, passion and intuition will be fundamental skills in the near future, and thousands of new roles will pop up, especially in data-related fields.
Women represent just around 15% of the global tech workforce, meaning there is plenty of room for them to come on board. Based on our analysis, we are happy to forecast that closing the gender gap is a matter of time, as women entered the field much later than men. Indeed, it seems that women are gaining traction, especially in entry-level positions, and there isn’t a great difference in terms of salary and opportunities to access the industry. We hope that the picture will be quite different in a couple of years.
The findings reached here are merely observational; they are not part of a formal study.
Does your data say something different? What about your experience? Please share it and help this analysis grow.