
[Outreachy] Why I took up a non-technical internship after getting a Math and CS (STEM) degree

I have talked about Outreachy in the past and how the Outreachy program is helping grow diversity in the Open Source community. However, some people have been asking me why I opted for a non-technical internship even though I know programming and have studied Math and CS. Isn’t this directly in opposition to the cause I am championing – bringing more women into STEM and tech?

I don’t agree. At all. I have always been interested in understanding people: how they process information, make decisions, behave in certain ways, engage with others, etc. My STEM degree hasn’t inhibited my primary interest in understanding humans – rather, it has helped me develop my rational thinking abilities and grow my passion for it. Data Science has empowered me to find patterns in behavior and use them not only to learn and grow but also to help others. I feel that my internship at Mozilla is a natural follow-up to my previous decisions in my quest to understand people. Plus, how could I pass up an opportunity to bring more women and other minorities into tech?!

At Mozilla, I haven’t given up my work related to data analytics. Instead, I am using data to derive insights into Diversity and Inclusion in the Mozilla community and to drive strategic decision-making. The satisfaction of working on something so impactful for future generations is what drives me every day, and learning things ranging from successful interviewing and communication to management (things I would not get the opportunity to learn in a technical role) is just one of the additional perks along the way 🙂

So what do I do all day then?

Mozilla’s mission is to ensure the Internet is a global public resource, open and accessible to all. An Internet that truly puts people first, where individuals can shape their own experience and are empowered, safe and independent [1].

At the heart of Mozilla are people: Mozilla is committed to a community that invites in and empowers people to participate fully, introduce new ideas and inspire others, regardless of background, family status, gender, gender identity or expression, sex, sexual orientation, native language, age, ability, race and ethnicity, national origin, socioeconomic status, religion, geographic location or any other dimension of diversity [2]. In line with this, Mozilla is currently working towards creating a Diversity and Inclusion strategy for Participation.

Focus Groups and Interviewing for Mozilla

In the first phase, Mozilla is asking Mozillians to self-nominate, or nominate others, for a series of focus groups on D&I topics relevant to regional leadership, events, project design, participation in projects and beyond. These insights will generate initiatives and experiments that lead to a first version of the strategy. I have been working with Emma Irwin, who leads this project on the Participation side at Mozilla, to understand focus groups: why they matter and how to conduct them. In short, surveys assume that people know how they feel. But sometimes they really don’t. Sometimes people need to listen to the opinions of others in a small, safe group setting before they form their own thoughts and opinions. Focus groups are well suited to those situations. You can read more about Focus Groups here.

We also held mock Focus Group sessions, reviewed the script for the Focus Groups and learnt about best practices for interviewing. Apart from English, we are also trying to conduct Focus Groups in participants’ first languages in some regions so that language doesn’t lead to exclusion. It is very important for interviewees and focus group participants to feel connected to and comfortable with the interviewer, and this has been the main focus of my research and contributions related to Focus Groups to date. I am also working on conducting Focus Groups in and around India, especially in person in Bangalore, if possible.

Research on successful Diversity and Inclusion initiatives in India

At the Mozilla D&I team, we are working towards building a curated library of the best Diversity and Inclusion resources from different parts of the world. To have a worldwide impact on Diversity and Inclusion, we need to understand each community’s cultural, historical, national and language contexts and tailor initiatives accordingly. We need to learn from programs beyond FOSS and Open Source and bring their lessons into FOSS constructs. India, being such a vast and diverse country, offers immense opportunities to learn from different programs and ongoing initiatives and to understand their successes and failures. Currently, I have divided my research into two main focus areas:

  1. Diversity and Inclusion Initiatives started in India.
  2. Diversity and Inclusion Initiatives started outside India, adopted in Indian context.

Programs like ‘Beti Bachao, Beti Padhao’ by the Government of India, which encourages girl child education, and the L’Oréal India Young Women in Science program, which offers scholarships to women pursuing STEM degrees, fall in the first category.

Other programs like the Grace Hopper Conference India, Girls in Tech, Women Who Code and so on, which started in the US and have successful chapters or initiatives in India, fall in the second category. I am also especially interested in understanding the huge success of Outreachy in India.

If you know of any other active or inactive, tech or non-tech Diversity and Inclusion programs in India, do let me know and I would be happy to include them in my research. To learn more about my findings, stay tuned – I plan to release a blog post every week.

I have also been working on Diversity and Inclusion oriented community metrics. However, since I plan to write a more in-depth post on them, I have decided to cover this topic in my next post.

Till then, Sayonara!

2015 in Numbers: Fedora CommOps

CommOps is the newest official sub-project in Fedora, and the team’s role is to assist other sub-projects in Fedora. This is done by building and improving interactions within the internal Fedora community, as well as by increasing communication across the Project as a whole. 2015 was an important milestone for the Fedora Community Operations (CommOps) team in so many ways. Remy DeCausemaker, the Fedora Community Action and Impact Lead, Justin Flory (jflory7) and the CommOps team as a whole recently published an excellent Year-in-Review article on the Fedora Community Blog describing the CommOps team’s highlights of 2015 and their vision for 2016.

I crunched some numbers about the growth of CommOps in 2015, but they could not be included in the article. That is primarily my fault: I added them to the etherpad but couldn’t add them to the post, since I was out of town and couldn’t find a WiFi connection with reasonable speed. Nonetheless, I feel the analytics are worth sharing, hence this article.

Fedmsg Activity of CommOps Team
The image shows CommOps fedmsg activity for 2015. It was captured in January, so the sudden drop a month earlier is due to the holiday season – but boy, are we rising!
[Screenshot: CommOps fedmsg activity, 2015]
Check out the raw fedmsg activity of CommOps here and the Datagrepper visualization here. Other teams can generate this graph by replacing commops in the link with their most frequently used team name, i.e. ….&contains=commops becomes ….&contains=<TEAM_NAME_HERE>
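If you want the graph for several teams, the same substitution can be scripted. A minimal sketch in Python using only the standard library; `for_team` is a hypothetical helper, and the link below is a made-up placeholder, not the actual CommOps link from this post. It rewrites the `contains` query parameter and appends one if the link has none.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def for_team(url: str, team: str) -> str:
    """Return a copy of `url` whose `contains` query parameter is set to `team`.

    Other query parameters keep their order; if there is no `contains`
    parameter yet, one is appended.
    """
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    if any(key == "contains" for key, _ in pairs):
        pairs = [(k, team if k == "contains" else v) for k, v in pairs]
    else:
        pairs.append(("contains", team))
    return urlunsplit(parts._replace(query=urlencode(pairs)))


# Placeholder link: substitute the real Datagrepper link you want to adapt.
link = "https://apps.fedoraproject.org/datagrepper/raw?delta=2592000&contains=commops"
print(for_team(link, "marketing"))
```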

Mailing List Activity of CommOps Team

You can check out the mailing list archives of CommOps here. Here is a quick graph of the activity on the CommOps mailing list for 2015:

[Graph: CommOps mailing list activity, 2015]

Some of the longest discussions on CommOps ML have revolved around :

5ftw article (11 comments, 4 participants) – CommOps started contributing to an etherpad of possible ideas for mattdm’s 5ftw article

Marketing meeting timings for 2016 discussion (8 comments, 5 participants) – a good number of CommOps team members are part of the marketing team too

Onboarding new contributors via Outreachy (7 comments, 4 participants) – Outreachy is a program which aims at increasing diversity in FOSS. Fedora participated in the Dec – March 2016 round with slots for CommOps and Hubs.

Community Blog status (7 comments, 3 participants) – one of the biggest milestones of 2015, with the Community Blog being launched! Yaay! 🙂

Some of the most participated threads have been :

Trac Guide (6 comments, 6 participants) – CommOps moved to ticket-based meetings

Fedora Elections (5 comments, 5 participants) – CommOps helped organize the Fedora Elections, making this the 4th most participated election of all time – Yaay! I helped jkurik organize this round of elections and learnt so much!

Design Team article on CommBlog (5 comments 5 participants)

FLOCK Bids (5 comments, 4 participants)
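The two lists rank threads differently: the longest discussions by number of messages, the most participated threads by number of distinct people. A minimal sketch of both rankings, assuming the archive has already been reduced to (thread subject, sender) pairs and counting every message in a thread as a comment; the post does not say exactly how its counts were produced, and the function names and sample data are hypothetical.

```python
from collections import defaultdict


def thread_stats(messages):
    """Map each thread subject to (message count, distinct sender count).

    `messages` is an iterable of (thread_subject, sender) pairs.
    """
    counts = defaultdict(int)
    senders = defaultdict(set)
    for subject, sender in messages:
        counts[subject] += 1
        senders[subject].add(sender)
    return {subject: (counts[subject], len(senders[subject])) for subject in counts}


def longest(stats, n=3):
    """Threads with the most messages; distinct senders break ties."""
    return sorted(stats, key=lambda s: stats[s], reverse=True)[:n]


def most_participated(stats, n=3):
    """Threads with the most distinct senders; message count breaks ties."""
    return sorted(stats, key=lambda s: (stats[s][1], stats[s][0]), reverse=True)[:n]


# Hypothetical sample: thread A is long, thread B draws more people.
sample = [("A", "x"), ("A", "y"), ("A", "x"), ("A", "y"),
          ("B", "x"), ("B", "y"), ("B", "z")]
stats = thread_stats(sample)
print(longest(stats, 1), most_participated(stats, 1))
```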

IRC Activity of CommOps Team

While interactions in team IRC channels are not recorded, meetings in open channels (#fedora-meeting, #fedora-meeting-1, #fedora-meeting-2) are recorded by meetbot. Worth mentioning here is that CommOps just became an official subproject in Fedora – so we now have our own place in meetbot logs.

CommOps started 7 IRC meetings in 2015 in the #fedora-meeting-2 channel. You can find some of the meeting logs here and here.

[Graph: attendees and chairs in CommOps IRC meetings, 2015]

The above graph shows the number of attendees, and of chairs among them, in IRC meetings. While CommOps team meetings have grown gradually in size, it is interesting to note that the number of chairs has grown too – perhaps because team members are taking more permanent roles in the workings of CommOps and are here to stay 🙂

Another interesting statistic is the number of lines spoken in meetings by attendees, which shows that attendees are not just idle and that CommOps has very interactive meetings 🙂

[Graph: lines spoken by attendees in CommOps IRC meetings, 2015]

All in all, the numbers assert that CommOps is a growing community with high interaction amongst its members.

Fedora Community Blog (CommBlog)

The first major accomplishment of CommOps as a sub-project was on November 9th with the announcement of the Community Blog! Within a short span of three months, CommBlog has published 53 posts, with 11977 views and 38 comments to date.

CommBlog has 62 users in the Fedora Community, including 48 contributors, 4 editors and 1 author, with the top contributor to CommBlog being bee2502 with 4 posts (that’s me – but wait, what? Where is jflory7?)

CommBlog had the most views in a single day on 10 Nov 2015, with 1168 views in all. The Fedora 24 release dates and schedule article was published on CommBlog that day and generated 519 views on the first day alone. It is also the most viewed article on CommBlog, with 1727 views in all. The Wayland and Porting python packages to python-3 articles come in a close second and third, respectively.

The Elections Retrospective article has the 5th highest views, and other election posts and candidate interviews were commented on too 🙂 In terms of comments, the IRC analytics article had the most (4 comments), while the Porting python packages to python-3 article had the most pingbacks (5 pingbacks).

Another rising post on CommBlog is the Share your Year in Review article, which is garnering a lot of attention 🙂

Check here for a more detailed version of the CommBlog analytics, with insights on CommBlog viewers, their locations and their search activity.

Want to Help?

  1. Join our team in #fedora-commops on Freenode
  2. Join the Community Operations Mailing List
  3. Participate in our weekly meetings

2015 in Numbers: Fedora Community Blog

CommOps is the newest official sub-project in Fedora, and the team’s role is to assist other sub-projects in Fedora. This is done by building and improving interactions within the internal Fedora community, as well as by increasing communication across the Project as a whole. 2015 was an important milestone for the Fedora Community Operations (CommOps) team in so many ways.

Fedora Community Blog (CommBlog)

The first major accomplishment of CommOps as a sub-project was on November 9th with the announcement of the Community Blog! Within a short span of three months, CommBlog has published 53 posts, with 11977 views and 38 comments to date.

CommBlog Users

CommBlog has 62 users in the Fedora Community, including 48 contributors, 4 editors and 1 author, with the top contributor to CommBlog being bee2502 with 4 posts (that’s me – but wait, what? Where is jflory7?)

Views and Comments on CommBlog articles

CommBlog had the most views in a single day on 10 Nov 2015, with 1168 views in all. The Fedora 24 release dates and schedule article was published on CommBlog that day and generated 519 views on the first day alone. It is also the most viewed article on CommBlog, with 1727 views in all. The Wayland and Porting python packages to python-3 articles come in a close second and third, respectively.

A few things I found interesting here are:

  1. These top 3 articles each have more than 1000 views, while the others have fewer than 300 views – which is still nice, but a HUGE difference!
  2. The Wayland article is not SEO-optimized and still has the second highest views. (Perhaps many of our viewers do not come from search engines? Or somehow already know the article links? More on this later in the post.)

I would also like to mention here that election-related posts and interviews also gathered a lot of attention on CommBlog, in terms of views as well as comments.

The Elections Retrospective article has the 5th highest views, and other election posts and candidate interviews were commented on too 🙂 In terms of comments, the IRC analytics article had the most (4 comments), while the Porting python packages to python-3 article had the most pingbacks (5 pingbacks).

Another rising post on CommBlog is the Share your Year in Review article, which is garnering a lot of attention 🙂

CommBlog Traffic and Social Media

It is also interesting to note that the number of viewers coming in through search engines is only a bit more than the number coming from Fedora Magazine and Twitter.

[Graph: CommBlog traffic sources]

Geographical Location of Viewers 

[Figures: CommBlog views by geographic location]

Viewer Clicks and Search Terms  

Viewers seem to be generally searching for Fedora 24 release dates or election-related updates.

[Figures: CommBlog search terms and viewer clicks]

Some things to ponder on / Future work:

  1. A similar-posts suggestion feature, even better if personalized
  2. What type of posts are getting more traction, and why?
  3. How to get contributors to engage more?

Fedora – A peek into IRC meetings using meetbot data

Many Fedora projects and groups use IRC channels on irc.freenode.net for their regular meetings. (Know more about IRC here.) Generally, meetings take place in one of the three fedora-meeting channels: #fedora-meeting, #fedora-meeting-1 and #fedora-meeting-2. However, there is no requirement that meetings take place only in these channels, and many ad-hoc or one-time meetings take place in other channels. Such meetings in IRC channels are normally logged. A Meetbot IRC bot is present in every channel to assist with running meetings, producing meeting summaries and logging. (Know more about meetbot here and check out the summaries and logs of past meetings on the Fedora Project Meetbot page here.) To help meeting attendees, Meetbot provides a set of commands like #startmeeting, #endmeeting, #info, #help, #link, etc.

With the aim of gathering information about Fedora IRC meetings, and especially of understanding how Fedora contributors interact in these meetings, I turn to Datagrepper. Datagrepper is a JSON API that lets you query the history of the Fedora Message bus (fedmsg). (Know more about Datagrepper here.) Here is a quick look at the raw Datagrepper feed of fedmsg messages, for topics like buildsys.rpm.sign and buildsys.task.state.change:

[Screenshot: raw Datagrepper feed of fedmsg messages]

fedmsg has a few meetbot-related topics corresponding to meetbot commands, which I use to gather daily, weekly and monthly IRC meeting data. You can construct queries for a time period by specifying the start and end parameters of the query. Use the count variable from the JSON data dump to get the total number of messages matching your query. (Check out the meetbot-related fedmsg topics here and the documentation for constructing Datagrepper queries here.) You can also use the Datagrepper Charts API for some basic visualizations. (Check it out here.)
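A query per time window can be scripted rather than typed by hand. A minimal sketch in Python that builds one Datagrepper /raw URL per window from the `topic`, `start` and `end` parameters. The base URL (the 2015-era public instance), the `org.fedoraproject.prod.` topic prefix and the epoch-seconds format for `start`/`end` are assumptions to check against the Datagrepper documentation, and the helper names are hypothetical. Fetching the URLs is left out.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

# Assumptions: 2015-era public endpoint and production topic prefix.
DATAGREPPER_RAW = "https://apps.fedoraproject.org/datagrepper/raw"
TOPIC_PREFIX = "org.fedoraproject.prod."


def query_url(topic, start, end):
    """Datagrepper /raw URL for one fedmsg topic between two aware datetimes."""
    params = {
        "topic": TOPIC_PREFIX + topic,
        "start": int(start.timestamp()),  # Unix epoch seconds (assumed format)
        "end": int(end.timestamp()),
    }
    return DATAGREPPER_RAW + "?" + urlencode(params)


def windows(first, last, step):
    """Consecutive (start, end) windows of width `step` covering [first, last)."""
    current = first
    while current < last:
        nxt = min(current + step, last)
        yield current, nxt
        current = nxt


def message_count(payload):
    """Read the `count` field of a decoded JSON response, as the post describes."""
    return payload["count"]


# One URL per week for meetings started in the first two weeks of 2015.
jan1 = datetime(2015, 1, 1, tzinfo=timezone.utc)
for start, end in windows(jan1, jan1 + timedelta(days=14), timedelta(days=7)):
    print(query_url("meetbot.meeting.start", start, end))
```

Daily and weekly windows have a fixed width, so a `timedelta` step works; monthly windows would need calendar-aware boundaries (the first of each month) instead.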

meetbot.meeting.start: Messages on this topic get published when an IRC meeting starts (using the #startmeeting meetbot command).

meetbot.meeting.complete: Messages on this topic get published when an IRC meeting ends (using the #endmeeting meetbot command).

[Graphs: monthly IRC meetings completed and started]

On average, 99 IRC meetings take place in a month across different channels (the mean number of IRC meetings started monthly is 98, while the mean number completed monthly is 100). During December – February, this value dropped considerably. Looking at the weekly number of IRC meetings started and completed, we can see that the drop in December can be attributed to the two weeks of the Christmas season (the number of IRC meetings started begins dropping approximately a week before Christmas and stays low until after New Year).

[Graphs: weekly IRC meetings completed and started]

The weekly mean for IRC meetings started is 23.05 (median 26, highest 33), while that for IRC meetings completed is 23.51 (median 27, highest 35). Also, the number of IRC meetings is particularly low (mostly zero IRC meetings started/completed per day) during Mar 11-18 2015, and Jan 28-Feb 1 and Feb 7-15 2015 (bot outage?).

[Graphs: daily IRC meetings completed and started]

On a normal weekday, generally 3-4 meetings are started/completed. Saturdays have lower values (~1-2), and generally no meetings are held on Sundays. (The average number of IRC meetings started per day is 3, while that of IRC meetings completed is 3.3, and the median for both is 4.) The highest number of IRC meetings started as well as completed across different channels occurred on 23rd March 2015 (11 started, 14 completed), the Monday following a week with particularly few IRC meetings (Mar 11-18 2015).
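The monthly, weekly and daily figures above are means, medians and maxima over per-period counts. A minimal sketch of that aggregation in Python, assuming per-day counts have already been collected, that weeks are ISO calendar weeks (Monday to Sunday), and that days with zero meetings are recorded explicitly so they pull the averages down; the helper names and sample data are hypothetical.

```python
from collections import Counter
from datetime import date
from statistics import mean, median


def weekly_totals(daily):
    """Sum a {date: count} mapping into ISO (year, week) buckets."""
    weeks = Counter()
    for day, count in daily.items():
        year, week, _ = day.isocalendar()
        weeks[(year, week)] += count
    return weeks


def summarize(counts):
    """Mean, median and highest value of a series of per-period counts."""
    values = list(counts)
    return {"mean": mean(values), "median": median(values), "highest": max(values)}


# Hypothetical daily meeting counts spanning three ISO weeks of March 2015.
daily = {date(2015, 3, 2): 4, date(2015, 3, 3): 3,
         date(2015, 3, 9): 2, date(2015, 3, 16): 5}
print(summarize(weekly_totals(daily).values()))
```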

[Graph: day-wise percentage stacked chart of IRC meetings started and completed]

Using a day-wise percentage stacked representation, we see that the numbers of IRC meetings started and completed are generally the same (started and completed have equal percentages), which allows us to conclude that meetings are generally short (less than 24 hours). The small delta between IRC meetings started and completed can be attributed to meetings that overlap two periods. The larger deviations occur in weeks where very few IRC meetings were started/completed, which inflates the percentages: e.g. during Mar 11-18, 1 IRC meeting was started but 0 were completed, so 100% of the total is due to meetings started (a complete blue streak in the graph for such a case).

For visualizations generated using the Datagrepper Charts API:

Check here for meetbot.meeting.start

Check here for meetbot.meeting.complete

meetbot.meeting.topic.update: Messages on this topic get published when a meeting topic is updated (using the #topic command).

[Graphs: monthly and weekly meetbot.meeting.topic.update messages]

This is correlated with the number of IRC meetings: very low values occur in the December to March period and in the weeks where the number of IRC meetings (started/completed) is particularly low. The monthly average for topic update messages generated during IRC meetings is 556.16 (median 618; the highest number of topic.update messages in a month is 708). The weekly mean is 130 messages (median 143; the highest number of topic.update messages in a week is 202).

[Graphs: daily topic.update messages and average topic updates per IRC meeting]

On average, 18 topic.update messages are published per day (median 20), with the most messages published on July 19, 2015 (56 messages). Plotting the day-wise average number of topic.update messages per IRC meeting (computed as topic.update messages / IRC meetings started, since meetings generally last less than a day), we can see that meeting topics are generally updated 4-5 times per meeting (mean 4.07, median 4.71), though there have also been days averaging 11-12 topic updates per IRC meeting.
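The per-meeting averages divide a day's message count by the number of meetings started that day. A minimal sketch, assuming per-day counts are available as dictionaries keyed by day; days without any meeting started are skipped because the ratio is undefined there. The same helper would apply to the item.link counts discussed below; the names and sample numbers are hypothetical.

```python
from statistics import mean, median


def per_meeting(daily_events, daily_started):
    """Day-wise ratio of events (e.g. topic.update messages) to meetings started.

    Both arguments map a day to a count. Days with no meetings started are
    skipped, since the ratio is undefined there.
    """
    return {
        day: daily_events.get(day, 0) / started
        for day, started in daily_started.items()
        if started > 0
    }


# Hypothetical counts for three days; the third day had no meetings.
topic_updates = {"d1": 12, "d2": 20, "d3": 5}
meetings_started = {"d1": 3, "d2": 4, "d3": 0}
ratios = per_meeting(topic_updates, meetings_started)
print(ratios, mean(ratios.values()), median(ratios.values()))
```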

You can also find visualizations generated using Datagrepper Charts API for meetbot.meeting.topic.update here.

meetbot.meeting.item.help: Messages on this topic get published when attendees call for help on items (using the #help meetbot command). This topic was introduced at the end of March, hence earlier values are not available.

[Graphs: monthly and weekly meetbot.meeting.item.help messages]

As seen in the graphs, the help command is rarely used by IRC meeting attendees: it has been used only once per month in the past two months.

For visualizations generated using the Datagrepper Charts API for meetbot.meeting.item.help, check here.

meetbot.meeting.item.link: Messages on this topic get published when attendees link information to an item (using the #link meetbot command). This topic was introduced at the end of March, hence earlier values are not available.

[Graphs: monthly and weekly meetbot.meeting.item.link messages]

The monthly average number of items linked is 404 (median 465), and the highest monthly number of items linked in the past year is 567. The weekly average number of items linked is 104.25 (median 109), and the highest weekly number in the past year is 183.

[Graphs: daily item.link messages and average items linked per IRC meeting]

On average, 14 items are linked in IRC meetings per day, with the highest being 61 items linked within a single day. Also, generally 3-4 items are linked in an IRC meeting, with 14 being the highest number of items linked in a single IRC meeting.

For visualizations generated using the Datagrepper Charts API for meetbot.meeting.item.link, check here.

Meeting Attendees and Chairs: To get an overview of statistics related to the Fedora contributors attending IRC meetings (both attendees and chairs), I used the meetbot.meeting.complete messages (meetbot.meeting.start messages only show the initial attendees). I used the data for the past three months (Aug-Oct 2015).

[Graph: attendees and chairs per IRC meeting, Aug-Oct 2015]

During the past 3 months, 337 IRC meetings took place. On average, 10 people attended an IRC meeting, including the chairs, and the mean size of the group of chairs was 4.67 per IRC meeting (median 5). Also, the largest meeting in the past three months had 27 attendees, and the largest group of chairs included 10 Fedora contributors.
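The attendee and chair statistics above can be derived from meetbot.meeting.complete messages. A minimal sketch, assuming each message is a dict whose "msg" body has an "attendees" mapping of nick to lines spoken and a "chairs" collection of nicks; these field names are an assumption to verify against real Datagrepper output, and the helper names and sample messages are hypothetical. The same per-nick counts could serve as weights for a word cloud like the one at the end of this post.

```python
from collections import Counter
from statistics import mean, median


def meeting_sizes(messages):
    """(attendee count, chair count) for each meeting.

    Assumption: each message is a dict whose "msg" body has an "attendees"
    mapping (nick -> lines spoken) and a "chairs" collection of nicks.
    """
    return [(len(m["msg"]["attendees"]), len(m["msg"]["chairs"])) for m in messages]


def nick_frequencies(messages):
    """Number of meetings each nick attended; usable as word-cloud weights."""
    return Counter(nick for m in messages for nick in m["msg"]["attendees"])


# Two hypothetical meetbot.meeting.complete message bodies.
sample = [
    {"msg": {"attendees": {"alice": 40, "bob": 12, "carol": 3}, "chairs": ["alice", "bob"]}},
    {"msg": {"attendees": {"alice": 25, "dave": 9}, "chairs": ["alice"]}},
]
sizes = meeting_sizes(sample)
attendees = [a for a, _ in sizes]
chairs = [c for _, c in sizes]
print(mean(attendees), median(chairs), max(attendees))
print(nick_frequencies(sample).most_common(1))
```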

Other Questions to ask:

1. Is there any specific time of day when IRC meetings generally occur?

2. Are any channels used more than others? In particular, what % of meetings are conducted on #fedora-meeting, #fedora-meeting-1 and #fedora-meeting-2? Is the distribution of meetings across these channels equal?

3. Are item.link messages generated equally by chairs and non-chairs, or is their generation skewed towards one group?

4. Were past item.help messages generated only because a specific set of users always uses this command?
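Questions 1 and 2 could be explored with the meetbot.meeting.start messages. A minimal sketch, assuming each meeting's start time (as a Unix timestamp) and channel name have already been extracted from those messages; hours are bucketed in UTC, and the helper names and sample data are hypothetical.

```python
from collections import Counter
from datetime import datetime, timezone


def start_hours(timestamps):
    """Histogram of meeting start times by UTC hour of day (0-23)."""
    return Counter(datetime.fromtimestamp(ts, tz=timezone.utc).hour for ts in timestamps)


def channel_shares(channels):
    """Percentage of meetings held in each channel."""
    counts = Counter(channels)
    total = sum(counts.values())
    return {channel: 100 * n / total for channel, n in counts.items()}


# Hypothetical start times (Unix seconds) and channels for four meetings.
starts = [1420070400, 1420074000, 1420113600, 1420117200]
channels = ["#fedora-meeting", "#fedora-meeting", "#fedora-meeting-1", "#fedora-commops"]
print(start_hours(starts))
print(channel_shares(channels))
```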

Also check out @threebean’s blog posts on Datagrepper here. He is one of the super awesome people behind fedmsg and Datagrepper.

Here is a fun word cloud visualization of IRC meeting attendees over the past three months (Fedora CommOps seems to be very active – I can see a lot of CommOps members here: @decause, @threebean, @mattdm, @lmacken, @jflory7 and @mailga too!! Yayy!!)

[Word cloud: IRC meeting attendees, Aug-Oct 2015]