[Outreachy] Why I took up a non-technical internship after getting a Math and CS (STEM) degree

I have talked about Outreachy in the past and how the Outreachy program is helping grow diversity in the Open Source community. However, some people have been asking me why I opted for a non-technical internship even though I know programming and have studied Math and CS. Isn’t this directly in opposition to the cause I am championing – bringing more women into STEM and tech?

I don’t agree. At all. I have always been interested in understanding people: how they process information, make decisions, behave in a certain way, engage with others and so on. My STEM degree hasn’t inhibited my primary interest in understanding humans – rather, it has helped me develop my rational thinking abilities and grow my passion for it. Data Science has empowered me to look for patterns in behavior and use them not only to learn and grow but also to help others. I feel that my internship at Mozilla is a natural follow-up to my previous decisions in my quest to develop my understanding of people. Plus, how can I pass up an opportunity to bring more women and other minorities into tech?!

At Mozilla, I haven’t given up my work related to data analytics. Instead, I am using data to derive insights into Diversity and Inclusion in the Mozilla community and drive strategic decision making. The satisfaction of working on something so impactful for future generations is what drives me every day. Learning things ranging from successful interviewing and communication to management – things I would not get the opportunity to learn in a technical role – is just one of the additional perks along the way 🙂


So what do I do all day then?

Mozilla’s mission is to ensure the Internet is a global public resource, open and accessible to all. An Internet that truly puts people first, where individuals can shape their own experience and are empowered, safe and independent [1].

At the heart of Mozilla is people. Mozilla is committed to a community that invites in and empowers people to participate fully, introduce new ideas and inspire others, regardless of background, family status, gender, gender identity or expression, sex, sexual orientation, native language, age, ability, race and ethnicity, national origin, socioeconomic status, religion, geographic location or any other dimension of diversity [2]. In line with this, Mozilla is currently working towards creating a Diversity and Inclusion strategy for Participation.

Focus Groups and Interviewing for Mozilla

In the first phase, Mozilla is asking Mozillians to self-nominate, or nominate others, for a series of focus groups with D&I topics relevant to regional leadership, events, project design and participation in projects and beyond. These insights will generate initiatives and experiments that lead to a first version of the strategy. I have been working with Emma Irwin, who has been leading this project on the Participation side for Mozilla, on understanding Focus Groups, their importance and how to conduct them. In short, surveys assume that people know how they feel. But sometimes they really don’t. Sometimes it takes listening to the opinions of others in a small and safe group setting before they form thoughts and opinions. Focus groups are well suited for those situations. You can read more about Focus Groups here.

We also had a mock Focus Group session, reviewed the script for Focus Groups and learnt about the best practices for interviewing. Apart from English, we are also trying to conduct Focus Groups in participants’ first language in some regions so that language doesn’t lead to exclusion. It is highly important for the interviewees/focus group candidates to feel connected and comfortable with the interviewer, and this has been my prime focus in my research and contributions related to Focus Groups to date. I am also working on conducting Focus Groups in and around India – especially in person in Bangalore, if possible.

Research on successful Diversity and Inclusion initiatives in India

On the Mozilla D&I team, we are working towards building a library of the best curated resources for Diversity and Inclusion from different parts of the world. To have a worldwide impact on Diversity and Inclusion, we need to understand each community’s cultural, historical, national and language contexts and tailor the initiative accordingly. We need to learn from programs beyond FOSS and Open Source and bring the learnings from those into FOSS constructs. India, being such a vast and diverse country, offers immense opportunity to learn from different programs and ongoing initiatives – to understand their successes and failures. Currently, I have divided my research into two main focus areas:

  1. Diversity and Inclusion Initiatives started in India.
  2. Diversity and Inclusion Initiatives started outside India, adopted in Indian context.

Programs like ‘Beti Bachao, Beti Padhao’ by the Government of India, which encourages girl child education, and the Loreal India Young Women in Science program, which offers scholarships for women pursuing STEM degrees, fall in the first category.

Other programs like Grace Hopper Conference India, Girls in Tech, Women Who Code and so on, which were mainly started in the US and have successful chapters or initiatives in India, fall in the second category. I am also especially interested in understanding the huge success of Outreachy in India.

If you know of any other active/inactive tech/non-tech Diversity and Inclusion programs in India, do let me know and I would be happy to include them in my research. To know more about my findings, stay tuned – I plan to release a blog post every week.

I have also been working on Diversity and Inclusion oriented community metrics. However, I plan to write a more in-depth post on it and hence I have decided to cover this topic in my next post.

Till then, Sayonara !

Diversity in FOSS : Outreachy


tl;dr: As a part of my work with Mozilla, I try to analyse different programs working towards diversity and inclusion in FOSS and their successes and failures. I wanted to collect some statistics for Outreachy and understand how it has helped increase the diversity in FOSS projects.

From Outreach Program for Women (OPW) to Outreachy..

Like in the rest of the tech industry, the number of women participating in FOSS projects is generally low. The Outreach Program for Women was started to bridge this gap in FOSS projects and bring more women onboard. Women contact the FOSS organizations in the program they want to work with and write up a project proposal for the summer. If accepted, they are mentored by organization members on their project over the duration of the program.

The GNOME Foundation first started the internship program with one round in 2006, and then resumed the effort in 2010 with rounds organized twice a year; the program is currently in its thirteenth round. For the May 2015 round, the program was renamed to Outreachy with the goal of expanding to engage people from various underrepresented groups and was moved to Software Freedom Conservancy as its organizational home. In the December 2015 round, the program opened to people of color from groups underrepresented in technology in the United States, in addition to being open to women (cis and trans), trans men, and genderqueer people internationally.

How Outreachy has helped women

By having a program targeted specifically towards women, the Outreachy organizers have found that they reached talented and passionate participants, who were uncertain about how to start otherwise.

According to the organizers, the program is a welcoming link that will connect you with people working on individual projects in various FOSS organizations and guide you through your first contribution.

Personally, I couldn’t agree more with these two points. Outreachy has surely helped reduce the apprehension women face when first contributing to a FOSS project and has helped them feel more included. Additionally, I feel that the program provides you with a nurturing community and network which remains with you beyond the program.

Impact of Outreachy in numbers and graphs

I wanted to collect some statistics for Outreachy and how it has helped increase the diversity in FOSS projects.

Till now, there have been 13 rounds of Outreachy (including Outreach Program for Women) and 368 women have taken part in the program and worked with FOSS organizations.

To understand the growth of the program and its impact in introducing women to FOSS projects, I created a graph showing the number of organizations participating in each round and number of selected participants.


Some numbers for Outreachy Round 13..

45 participants were selected to work on 41 different projects offered by 14 different organizations in this round like Linux Kernel, Fedora, Mozilla, OpenStack, Wikimedia, Zulip etc.

The following map shows the distribution of Outreachy participants in Round 13 according to their location.


India topped the list with 16 women selected in the program (40% of total selections), followed by North America with 9. I was particularly astonished to see Brazil 4th in the list with 3 participants! The selected participants truly form a diverse community, coming from all parts of the globe including Australia, Africa (Cameroon), Russia and even smaller countries like Albania and the Philippines. Altogether there were 9 selections from the European continent!

Here is a table showing the number of selected participants by country:

1 INDIA 16
12 UK 1
18 SPAIN 1


Diversity in projects

From what I could understand, projects were mainly offered in these five categories: software development, research, UI – UX/design, documentation and data analytics. UI – UX and data analytics projects involved some coding but didn’t seem to be completely development based, and hence I have listed them separately.

Development 32
UI – UX 5
Research 4
Documentation 2
Data Analytics 2


Does this diversity extend to mentors and program co-ordinators?

Two-thirds of the program co-ordinators across the organizations (12 out of 18) were women. Overall, 12 mentors for different projects were also women. Note that although the two numbers are the same and there is some overlap, not all women mentors are co-ordinators and vice versa 🙂


In a short span of six years, the program has successfully increased the participation of women in FOSS projects and, I feel, has played a major role in working towards bridging the gender gap in the FOSS community.



Fedora – A peek into IRC meetings using meetbot data

Many Fedora projects and groups use IRC channels for their regular meetings. (Know more about IRC here) Generally, meetings take place in one of the three fedora-meeting channels: #fedora-meeting, #fedora-meeting-1 and #fedora-meeting-2. However, there is no requirement that a meeting take place in these channels only. Many ad-hoc or one-time meetings take place in other channels. Such meetings in IRC channels are normally logged. There is a Meetbot IRC bot in every channel to assist with running meetings, meeting summaries and logging. (Know more about meetbot here and check out the summaries and logs of past meetings on the Fedoraproject Meetbot page here.) To help meeting attendees, Meetbot provides a set of commands like #startmeeting, #endmeeting, #info, #help, #link etc.

With the aim of gathering information about Fedora IRC meetings, and especially of understanding how Fedora contributors interact in these meetings, I turned to Datagrepper. Datagrepper is a JSON API that lets you query the history of the Fedora message bus, or fedmsg, for corresponding data. (Know more about Datagrepper here.) Here is a quick look at the raw Datagrepper feed from the fedmsg bus, with messages for topics like buildsys.rpm.sign and buildsys.task.state.change:

[Screenshot from 2015-10-23 21:47:06: raw Datagrepper feed]

fedmsg has a few meetbot-related topics corresponding to meetbot commands, which I use to gather daily, weekly and monthly IRC meeting data. You can construct queries for a time period by specifying the start and end parameters for the query. Use the count variable from the JSON data dump to get the total number of messages pertaining to the query. (Check out the meetbot-related fedmsg topics here and the documentation for constructing Datagrepper queries here.) You can also use the Datagrepper Charts API for some basic visualizations. (Check it out here.)
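As a rough illustration of the query construction described above, here is a minimal Python sketch. The endpoint URL and the fully qualified topic prefix are assumptions made for the example (they are not given in this post and may not match the current Fedora deployment), and the response is a hand-written stand-in so that the sketch runs without network access.

```python
import json
from datetime import datetime, timezone
from urllib.parse import urlencode

# Assumed values for illustration only: the endpoint and the topic prefix
# may differ from what the real Datagrepper instance uses.
DATAGREPPER_RAW = "https://apps.fedoraproject.org/datagrepper/raw"
TOPIC_PREFIX = "org.fedoraproject.prod."


def build_query_url(topic, start, end):
    """Return a query URL for one topic between two timezone-aware datetimes."""
    params = {
        "topic": TOPIC_PREFIX + topic,
        "start": int(start.timestamp()),  # start of the period, epoch seconds
        "end": int(end.timestamp()),      # end of the period, epoch seconds
    }
    return DATAGREPPER_RAW + "?" + urlencode(params)


def message_count(response_text):
    """Read the count variable from a JSON response body."""
    return json.loads(response_text)["count"]


url = build_query_url(
    "meetbot.meeting.start",
    datetime(2015, 10, 1, tzinfo=timezone.utc),
    datetime(2015, 11, 1, tzinfo=timezone.utc),
)

# Hand-written stand-in for a real response (not actual Fedora data).
sample_response = '{"count": 3, "raw_messages": []}'

print(url)
print(message_count(sample_response))
```

To query the live service, the URL could be fetched with any HTTP client and the body passed to message_count.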

meetbot.meeting.start : Messages on this topic get published when an IRC meeting starts (using the #startmeeting meetbot command).

meetbot.meeting.complete : Messages on this topic get published when an IRC meeting ends (using the #endmeeting meetbot command).

[Figures: mcomplete, mstart]

On average, 99 IRC meetings take place per month across different channels (the monthly mean is 98 for meetings started and 100 for meetings completed). During December – February, this value dropped considerably. Looking at the weekly number of IRC meetings started and completed, we can see that the drop in December can be attributed to the two weeks around the Christmas season (the number of meetings started begins dropping approximately a week before Christmas and stays low until after New Year’s).


The weekly mean for IRC meetings started is 23.05 (median 26, highest 33), while that for IRC meetings completed is 23.51 (median 27, highest 35). Also, the number of IRC meetings is particularly low (mostly zero meetings started/completed per day) during Mar 11-18 2015, Jan 28-Feb 1 and Feb 7-15 2015 (bot outage?).

[Figures: dcomplete, dstart]

On a normal weekday, generally 3-4 meetings are started/completed. Saturdays have lower values (~1-2) and no meetings are generally held on Sundays. (The average number of IRC meetings started per day is 3 while that of IRC meetings completed is 3.3, and the median for both is 4.) The highest value for IRC meetings started as well as completed across different channels occurred on 23rd March 2015 (started 11, completed 14) – the Monday (next working day) after a week with a particularly low number of IRC meetings (Mar 11-18 2015).
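The monthly, weekly and daily figures above come down to grouping message timestamps into periods and summarizing the counts. The following is a small Python sketch of that idea, assuming each message carries an epoch timestamp; the sample timestamps are made up for illustration and are not real meeting data.

```python
from collections import Counter
from datetime import datetime, timezone
from statistics import mean, median


def counts_per_period(timestamps, period):
    """Count messages per day, ISO week, or month from epoch timestamps (UTC)."""
    buckets = Counter()
    for ts in timestamps:
        moment = datetime.fromtimestamp(ts, tz=timezone.utc)
        if period == "day":
            key = moment.date().isoformat()
        elif period == "week":
            iso = moment.isocalendar()
            key = (iso[0], iso[1])  # (ISO year, ISO week number)
        elif period == "month":
            key = (moment.year, moment.month)
        else:
            raise ValueError("period must be 'day', 'week' or 'month'")
        buckets[key] += 1
    return buckets


# Hypothetical timestamps: three messages on 2015-03-23, one on 2015-03-24.
sample = [1427101200, 1427104800, 1427108400, 1427187600]

daily = counts_per_period(sample, "day")
print(dict(daily))
print(mean(list(daily.values())), median(list(daily.values())))
```

One caveat with this approach: periods with no messages at all never appear in the counter, so they would have to be filled in with zeros before averaging if quiet days (such as Sundays) should pull the mean down.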


Using a day-wise percentage stacked representation, we see that the numbers of IRC meetings started and completed are generally the same (started and completed have equal percentages), which lets us conclude that meetings are generally short (less than 24 hours). The small delta between meetings started and completed can be attributed to meetings that overlap two periods. The larger deviations occur in the weeks where the number of meetings started/completed is very low, which produces a large percentage value: for Mar 11-18, 1 meeting was started but 0 were completed, so 100% of the total is due to meetings started (a complete blue streak in the graph).
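The percentage stacked representation can be sketched in a few lines of Python. This is only an illustration of the arithmetic, with a hypothetical function name, not the code used to produce the graphs.

```python
def stacked_percentages(started, completed):
    """Return (started %, completed %) of the combined total for one period."""
    total = started + completed
    if total == 0:
        return (0.0, 0.0)  # nothing to plot for an empty period
    return (100.0 * started / total, 100.0 * completed / total)


# A typical day: as many meetings completed as started.
print(stacked_percentages(3, 3))

# The Mar 11-18 case from the text: 1 started, 0 completed.
print(stacked_percentages(1, 0))
```

With very small counts a single meeting swings the split completely, which is why the low-activity weeks stand out in the graph.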

For visualizations generated using the Datagrepper Charts API:

Check here for meetbot.meeting.start

Check here for meetbot.meeting.complete

meetbot.meeting.topic.update : Messages on this topic get published when the meeting topic is updated (using the #topic command).


This is correlated with the number of IRC meetings: very low values occur in the December to March period and in the weeks where the number of IRC meetings (started/completed) is particularly low. The monthly average for topic update messages generated during IRC meetings is 556.16 (the median is 618 and the highest number of topic.update messages in a month is 708). The weekly mean is 130 messages (the median is 143 and the highest number of topic.update messages in a week is 202).


On average, 18 topic.update messages are published per day (median 20), with the highest number published on July 19, 2015 (56 messages). Plotting the day-wise average number of topic.update messages per IRC meeting (we divide the number of topic.update messages by the number of IRC meetings started, since meeting duration is generally less than a day), we can see that meeting topics are generally updated 4-5 times per meeting (mean 4.07, median 4.71), but there have also been days with an average of 11-12 topic updates per IRC meeting.

You can also find visualizations generated using the Datagrepper Charts API for meetbot.meeting.topic.update here.

Topic for the #help command : Messages on this topic get published when attendees call for help on items (using the #help meetbot command). This topic was introduced at the end of March and hence earlier values are not available.


The help command, as seen in the graphs, is rarely used by IRC meeting attendees: it was used only once per month in the past two months.

For visualizations of this topic generated using the Datagrepper Charts API, check here.

Topic for the #link command : Messages on this topic get published when attendees link information to an item (using the #link meetbot command). This topic was introduced at the end of March and hence earlier values are not available.


The monthly average number of items linked is 404 (median 465), and the highest number of items linked in a month over the past year is 567. The weekly average number of items linked is 104.25 (median 109), and the highest number of items linked in a week over the past year is 183.


On average, 14 items are linked to in IRC meetings per day, with the highest being 61 items linked within a single day. Also, generally 3-4 items are linked to in an IRC meeting, with 14 being the highest number of items linked to in a single meeting.

For visualizations of this topic generated using the Datagrepper Charts API, check here.

Meeting Attendees and Chairs : To get an overview of statistics related to the Fedora contributors attending IRC meetings (both attendees and chairs), I used the meetbot.meeting.complete messages (meetbot.meeting.start messages only show the initial attendees). I used the data for the past three months (Aug-Oct 2015).


During the past 3 months, 337 IRC meetings have taken place. On average, 10 people attended an IRC meeting, including the chairs, and the mean size of the group of chairs for an IRC meeting was 4.67 (median 5). Also, the largest meeting in the past three months had 27 attendees and the largest group of chairs included 10 Fedora contributors.
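A summary like the one above could be computed along the following lines. This is a sketch under assumptions: it supposes that each meetbot.meeting.complete message body exposes attendees and chairs as mappings keyed by nick, and that the attendees mapping already includes the chairs. Those field names are not confirmed by this post, and the sample messages and nicks below are invented.

```python
from statistics import mean, median


def attendance_summary(messages):
    """Summarize attendee and chair group sizes over a list of message bodies."""
    # Assumed fields: "attendees" and "chairs" are dicts keyed by IRC nick.
    attendee_counts = [len(m.get("attendees", {})) for m in messages]
    chair_counts = [len(m.get("chairs", {})) for m in messages]
    return {
        "meetings": len(messages),
        "mean_attendees": mean(attendee_counts),
        "max_attendees": max(attendee_counts),
        "mean_chairs": mean(chair_counts),
        "median_chairs": median(chair_counts),
        "max_chairs": max(chair_counts),
    }


# Invented sample messages (hypothetical nicks, not real Fedora data).
sample_messages = [
    {"attendees": {"alice": 12, "bob": 5, "carol": 1}, "chairs": {"alice": True}},
    {"attendees": {"alice": 3, "dave": 7}, "chairs": {"alice": True, "dave": True}},
]

summary = attendance_summary(sample_messages)
print(summary)
```

The same per-message mappings could also feed a frequency count of nicks, which is the kind of input a word cloud needs.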

Other questions to ask:

1. Is there any specific time period in the day when IRC meetings generally occur?

2. Are any channels specifically used? In particular, what % of meetings are conducted on the channels #fedora-meeting, #fedora-meeting-1 and #fedora-meeting-2? Is the distribution of meetings across these channels equal?

3. Are messages generated equally by both chairs and non-chairs, or is message generation skewed towards one group?

4. Were past messages generated only by a specific set of users who always use a given command?

Also check out @threebean’s blog posts on Datagrepper here. He is one of the super awesome people behind fedmsg and Datagrepper.

Here is a fun word cloud visualization of IRC meeting attendees over the past three months (Fedora CommOps seems to be very active – I can see a lot of CommOps members here: @decause, @threebean, @mattdm, @lmacken, @jflory7 and @mailga too!! Yayy!!)