Sep 24 2015

R and Google Spreadsheets (and the context)

R and Google spreadsheets are powerful partners for verifying, exploring, and delivering data to people in communities and organizations. Recently I wrote a convenience wrapper around the gs_upload function in Jenny Bryan’s wonderful googlesheets package. But it takes some context to explain how I use it and why I think it’s important, so this post has a little bit of R code and a lot of context. It expands on a Lightning Talk I gave at the Portland R User Group last Wednesday.

In general I’ve found that getting tables or data frames from R into a document is fairly clumsy. There are several packages designed for the purpose, but to me they seem like more trouble than they are worth. Often they are easy on the R side, but working with the output on the word processing side ends up being laborious. However, once a data frame is uploaded to a Google spreadsheet, cutting and pasting some or all of it into a document is easy and fast, and it doesn’t add junk that you have to remove later on. Apart from that, a Google spreadsheet is sometimes even handier than RStudio’s data viewer (which is saying a lot) for inspecting and pondering what’s actually going on in a data frame.

But the real reason for the small bit of R code that I am sharing is that it helps with the verification, exploration, delivery, and use of data in a social context. A Google spreadsheet makes it easy to control access to your data: you send someone a URL and give them access to it. It’s very easy for two people to look at it together, say, on a phone call. One or more people can annotate, sort, filter, plot, or pivot the data to their hearts’ content (depending on their level of skill). If the conversation involves annotating the data, a Google spreadsheet is nice because you know that everyone is looking at the same version. In addition, it actually works as a delivery method, in that the data can then be downloaded into whatever environment someone wants.

The context of data and of analysis

But back to the subject of context. Back in the 1970s I read a lot of books and articles by statistical visionary John Tukey. (My copy of EDA is one of my most thumbed, battered and annotated books.) I can’t find the exact quote, but I’m sure that somewhere Tukey said that “Domain knowledge is essential for data analysis.” I found the statement particularly troubling because I didn’t think I had any domain knowledge (certainly not enough). I wanted techniques for data analysis that would give me insight in domains where I was pretty sure I was clueless. I wanted to be able to analyze data without knowing much about the context. Disappointingly, it turns out that, despite what you might think judging by most of the discussion about them, statistical techniques and software are no context-free silver bullet! In retrospect, I was learning to appreciate the importance of domain knowledge in those years when I would always take a sick day to read an entire SUGI proceedings the day after it arrived, in part to check out what contexts framed the work of other SAS users.

As I think about that issue decades later, it seems to me that domain knowledge is indeed involved in at least these facets of sense-making with data:

  • Data collection – where and how the data was collected and what it means
  • Coding – how it was recorded and coded (and why and how to decode it)
  • Retrieval – what kinds of ethical and technical issues there are around getting and using the data
  • Munging – the pain and pleasure of cleaning, combining, and organizing the data for use
  • Exploration – having some intuition as to where to look and how to look at it
  • Analysis – understanding what kind of data reduction or analysis is relevant or customary
  • Value assessment – judging the value of the data and the results of an analytical effort
  • Communication for action – communicating the results to people who can take action

These contextual facets (I just made up or unconsciously stole this list, and would love to hear about your list) seem very important to me. Part of my motivation here is to argue that we need to bring more context into R Group discussions and presentations. In Why Information Grows: The Evolution of Order, from Atoms to Economies, César Hidalgo (2015) points to this same issue:

It is hard for us humans to separate information from meaning because we cannot help interpreting messages. We infuse messages with meaning automatically, fooling ourselves to believe that the meaning of a message is carried in the message. But it is not. This is only an illusion. Meaning is derived from context and prior knowledge.

Although we manipulate information with R, what we care about and what we seek are the messages that it encodes. Therefore we always need to be aware of context, drawing on whatever domain knowledge we have that is, consciously or not, turning our information into messages.

Communities and their technologies shape data and message

Slow forward 35 or 40 years, and I do have some domain knowledge about stewarding technology for communities. How communities, organizations, and technology interact has been a long-term interest: their interaction is important and determines what data the community might have or need about itself. I think each item on my list of sense-making facets interacts with issues of community, organization, and technology.

In the following diagram from a blog post two years ago, the red lines represent organizational boundaries and the ochre lines suggest community boundaries. I’m still fascinated by the two examples on the far right of this diagram, where an organization (linked “personbytes” of knowledge, in Hidalgo’s terms) is present and plays a role but does not contain the community:
[Diagram: configurations of communities and organizations, annotated]

Although the landscape changes constantly, combining technologies, understanding how they support memory practices, how they do or don’t work together, and how they support being and learning together are a big challenge for communities and for organizations, just as much as they were when we were writing Digital Habitats. Here are some examples of the data issues that pop up for free-standing communities that depend on technology for their existence but are not contained by an organization:

  • The Portland R User Group is just fine using Meetup for chit-chat, scheduling, and sharing resources. A nice meeting room like the one that Simple has provided is a key resource. And of course people’s willingness to give talks is what keeps it alive. Although members may be affiliated with an organization, the community’s “organization” is the determination of one individual who keeps the conversation going (and brings pizza!). Meetup’s clever “Good to see you!” follow-up emails accomplish an individual purpose (recognizing and greeting other people) at the same time that they gather data about the community: attendance and social network information. The data on participants’ ratings of a session or on whom they greeted are available through an API and may be used by Meetup Inc., but they are not readily available to the community itself. In effect, much of the community’s “organization” is supplied by Meetup.
  • An open source project like OpenRefine relies on GitHub for its front door, to manage its code, its binary downloads, and its documentation wiki. An email list, a custom search engine covering community blogs, and a Twitter hash tag complete the community’s basic technology infrastructure. Its “organization” basically consists of the list of contributors. Although that list is preserved, imagine how much community history was lost as the tool passed from its original creators, Metaweb Technologies, Inc., to Google, which acquired Metaweb and then spun the tool off as a community-supported product. Communities are often capable of holding long histories, but it’s not automatic, nor is it necessarily supported by community infrastructure.
  • KM4Dev sprawls over an email list, a wiki, a Ning site, a G+ community, a hash tag, group meetings, Skype group meetings, and other platforms and venues that are unknowable unless you were there or heard about them from someone who was. Its “organization” is constantly in question, but it has managed to survive a long time; it obtains funding to study itself and accepts donations even though it isn’t a formal entity. Each of its platforms has a different way to keep track of a member and her activity, so data integration is very difficult. That means the community can’t use its data to argue to the big employers where its members work that KM4Dev is a key part of their professional infrastructure. It may be that its anti-organization stance, which is reflected in the loose coupling between its tools, is a response to the over-organization of large development bureaucracies.

These three free-standing communities are viable and productive with a minimum of organizational structure. Their data resources mostly serve their needs and are aligned with community energy. However, as communities grow in size, complexity, ambition, or age, they need some kind of organization at their center (as depicted on the right in the diagram above). The churches, temples, and meditation centers that I’ve been studying and working with over the last 5 years all eventually need some kind of organization to carry out administrative functions on behalf of their communities. The question is always: how much “organization” is enough, or too much?

Collecting data and using it depends on the organized activities that typically happen in an organization. The arduous task of collecting complex data and using it for diverse purposes, across time, across platforms, and across diverse social contexts requires even more organization. But what the data depict, what the messages are about, is often community participation: the voluntary and more chaotic side of life that can’t be captured. The question today is: how much data resource is enough, or too much?

To get at the messages in the information that one of these organizations keeps, we need to remember that the organization and the community jointly frame data gathering, storage, integration, use, and meaning.  To understand data issues, we have to consider questions such as:

  • Balance between community and organization: are the ways that one serves the other effective and well-understood?  are parts of the community or the organization more important (or out of reach)?
  • Life span and length of memory: how important is it to remember participation, adherence or contribution? how far back does history go?  how much change is required to “stay the same”?
  • Social context and diversity: what locations, languages, or different purposes are represented? how consistently are messages encoded and decoded?
  • Technology dependency, diversity and integration: what parts of the organization or community’s life need to take place on technology owned by the organization itself versus platforms like Facebook or LinkedIn?  how spread out over multiple technologies is the community and how important is integration?

These questions might sound ponderous if we’re just talking about one query or data project but I think they emerge when we do more with data resources in a community-related organization.  We need to deal with all the traditional organizational issues as well as the kind of sense-making issues that communities are always engaged in.

From my R data frame to a distributed Google spreadsheet

Moving information from R to a Google spreadsheet is fairly straightforward.  Taking care to transfer the messages requires some extra steps. For example, what’s a convenient, clear, and consistent name for an object in my R code is not necessarily helpful when delivered to someone else.  Here are some changes I make to names as I upload an R data frame to a Google spreadsheet:

  • A terse data frame name becomes a longer and more descriptive spreadsheet name
  • Variable names are expanded to be more descriptive column headers
  • I never use capitals in variable names but I find that they make column headers easier to read
  • I replace underscores and dots in variable names with spaces, so that column headers consist of words that easily flow into more than one line
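My wrapper is written in R around gs_upload, and its code isn’t reproduced here. As a rough illustration of just the renaming rules listed above, here is a minimal Python sketch; the function names, the sample variable names, and the optional descriptions lookup are hypothetical and not part of the googlesheets package.

```python
import re


def to_header(name, descriptions=None):
    """Turn a terse variable name into a readable column header.

    `descriptions` is an optional, hand-maintained dict (hypothetical) that
    maps a variable name to a longer, more descriptive header.
    """
    if descriptions and name in descriptions:
        return descriptions[name]
    # Split on underscores and dots so the header becomes separate words
    # that can wrap onto more than one line in the spreadsheet.
    words = [w for w in re.split(r"[._]+", name) if w]
    # Capitalize each word; capitalize() also lower-cases the rest of the
    # word, which is harmless for all-lowercase variable names.
    return " ".join(w.capitalize() for w in words)


def rename_columns(columns, descriptions=None):
    """Apply to_header to every column name, preserving order."""
    return [to_header(c, descriptions) for c in columns]


if __name__ == "__main__":
    print(rename_columns(["center_name", "n.members"],
                         {"n.members": "Number of registered members"}))
```

A hand-written lookup wins over the mechanical rule, so the terse names that need real explanation can get it while the rest are tidied automatically.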

Here is an example where I upload a small data frame to a Google spreadsheet:
[Screenshot: R code uploading a small data frame]

Here’s a snapshot of the resulting Google spreadsheet:

[Screenshot: the resulting Google spreadsheet]
Once the data frame is in a Google spreadsheet it’s helpful (and very easy) to:

  • Freeze the first row, so that column labels don’t disappear when you scroll down
  • Bold the first row, so it stands out clearly
  • Center and flow the text in the first row, so that the longer column header isn’t cut off
  • Set the width and formatting of each column appropriately (e.g., set decimal places)
  • Turn “filter” on to allow subsetting at the click of a button
  • Sometimes, specific rows or columns are set to a different color to call attention to a specific issue

Getting to the community’s message

So what domain knowledge is relevant to data about, by, and for a community with an organization at its center? Despite years of accumulating domain knowledge about communities, organizations, and data analysis, there is a lot that I don’t know about the creation and use of the data I’m interested in. Working on behalf of some 250 centers of different sizes, nationalities, and levels of maturity around the globe means that, even when I narrow my focus to one database, there is not one context but many. On the data creation side, I find that there are different data entry practices and the volunteers who enter the data turn over regularly; learning about the data and its context is an ongoing process for new volunteers and therefore for me. Despite common intentions, many inconsistencies and blind spots aren’t visible until people can see the results of their work in a larger or comparative context, like a handy Google spreadsheet.

When a data frame is a report that involves greater complexity than a simple list, it requires additional explanation, such as I suggest in the following example. Hints and suggestions expand on this column-by-column documentation:
[Screenshot: column-by-column documentation of a report]

Although some of the data frames I upload to Google spreadsheets are single-use (look once, copy once, and throw away), some of them are longer-lived. When R is joining information from many different sources (e.g., MySQL, Google Analytics, MailChimp, web scraping) or is replicating a report many times over, a complete description of the data and its context is worth the time.
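Keeping that column-by-column description next to the data can be partly automated. Here is a minimal Python sketch, assuming a hand-written dict of descriptions (the column names and descriptions below are hypothetical): it builds the rows for a documentation tab, reports columns that nobody has described yet, and renders the result as CSV text that could be uploaded alongside the data.

```python
import csv
import io


def build_data_dictionary(columns, descriptions):
    """Pair each column with its description and list undocumented columns."""
    rows = [(col, descriptions.get(col, "")) for col in columns]
    missing = [col for col, text in rows if not text.strip()]
    return rows, missing


def dictionary_csv(rows):
    """Render (column, description) rows as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["Column", "Description"])
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    columns = ["Center Name", "Number of Members", "Last Update"]
    descriptions = {
        "Center Name": "Name of the center as entered by its volunteers",
        "Number of Members": "Members with an active record",
    }
    rows, missing = build_data_dictionary(columns, descriptions)
    print("Still undocumented:", missing)
    print(dictionary_csv(rows))
```

Listing the undocumented columns is the useful part for long-lived reports: when R adds a new source, the gap shows up before the report goes out.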

But nothing I write about the data is the last word.  Eliciting knowledge from sense-making partners in a community and its organization is a key step in making the data resource useful. A Google spreadsheet seems like an ideal vehicle for negotiating and understanding the different assumptions and meanings that transform the information that I have into a meaningful message for my partners.  I find that to them “information” is boring, but messages about people, processes and possibilities are interesting because they can lead to growth and benefit.



Jan 16 2014

Learning about fast learning

Sean Murphy and I have been experimenting with a form of fast learning — and figuring out what we’re doing as we go along.  We’ve evolved from thinking of it as a clinic to something else. The current tag that I’m using on Twitter and in Evernote is mvpOODA.

At the very beginning the OODA idea wasn’t part of it.  But after sessions with Dixie Good, Terry Frazier, Phillip Grunewald, Eugene Chuvyrov, and a team from NWEI, we saw that the sense-making we were exploring in these sessions had a larger context.   OODA for Observe -> Orient -> Decide -> Act seems to fit the bill.

We see the “Orient” step as a kind of sensemaking that is crucial but subject to our cultural biases and blind spots. In complex situations (e.g., all situations where learning is required), orienting or re-orienting either takes a long time or forces us to rely on habits or existing maps. What’s interesting about the people we’ve talked to in this series is that they are under harsh time pressure at the same time that they are trying to break the mold somehow. They are entrepreneurs.

Just throwing ideas about solutions at them doesn’t seem to help so much, so we are asking them to:

  • Describe what they see
  • Describe the vantage points or sources of data that might be available (or could be constructed)
  • Describe the interactions (including between people, process, platform, and practices) as they understand them
  • Talk about goals and outcomes

And we avoid giving advice: the Decide and Act steps are really not up to us.

Tomorrow we’re meeting with Rob Callaghan and Herbie Winsted to talk about a very complex business growth issue.

Hopefully we are helping our panelists and they are clearly helping us figure out what this process is about.

photo credit: Georgie Pauwels via photopin cc


Dec 03 2013

Minimum viable process-practice-platform-product

Sean Murphy and I are continuing to explore the process whereby entrepreneurs and innovators figure out what the minimum viable P (process or practice or platform or product) could be.  The idea is that the viable P enables further development and learning.  Without a viable P (that’s fully social) we have blind spots that are insurmountable.  So that first step is really important.  Sean has just published the Recap from our November 20, 2013 MVP Clinic with Phillip Grunewald and Eugene Chuvyrov.  Here’s a recap of the recap:

Overview: exploring how to identify some key problems in communities where the presenters are members, trying to understand how to research them, and how to contribute to solving those problems. Two very different people facing analogous situations: one is a researcher looking for action research topics in the KM4Dev community; the other is an entrepreneur who wants to make athletic contests more engaging for contestants and the audience by providing more information that is mobile-device friendly.

(You can listen to the audio recording from

We have scheduled two more MVP Clinics — please plan to join us by registering now!

Commonalities between the two cases that were presented on November 20, 2013

  • Challenges in understanding the embedded (often invisible) interests, incentives and assumptions of different groups
  • Assumptions about boundaries of organizations that interact with those communities
  • Change management perspective is necessary but is challenging to apply in a community context — it is more of an organizational term, based on a high degree of control
  • Watching a school of fish, trying to determine how they decide to change direction
  • Both were familiar with communities but may not have appreciated the impact of incentives

Panelists for this session:


Dec 01 2013

Mapping a community – easy and not-so-easy

I’m resonating with how Joitske Hulsebosch has organized “Tools for social network analysis from beginners to advanced levels.”  It’s always safer when you can start at “the shallow end of the pool” and get into deeper and deeper water as you go.  In diving into the real-life river of data, we may not know just how deep things are — and only discover the shallow end (meaning “easier”) methods later! This is the second in a series of posts about my attempt to hold a mirror up to KM4Dev in a project funded by a grant from IFAD.

As I mentioned in my previous post, sometimes there is a ready-made visualization, if we can turn it to our current purpose.  Figure 4 in my Phase One Report is almost lifted straight out of Dgroups – the host for KM4Dev’s email discussion list. Here is a world map showing where KM4Dev members are from:


If you are a member of KM4Dev you can have a look for yourself, as membership changes and the screenshot above is now out of date:

However, I found that map hard to read, even though it’s a nifty way of looking at the community’s geographic distribution. So I used Snagit to capture it and make it more legible for inclusion in the report. Snagit has a color substitution tool that I used to make the grey and the green darker:
[Figure 4: Dgroups membership map, October 2013]


To me, the number of countries and people registered from “the developing South” is one of the indicators of KM4Dev’s value – it brings practitioners from all over the world into an ongoing conversation. The Dgroups platform has other depictions but this is the most vivid one.  Increasing the color contrast is very easy and it’s a big improvement.

The next example is a good bit more involved, using the wonderful and free data cleaning tool OpenRefine to standardize the names of the countries people give when they register on the KM4Dev Ning site.  One of OpenRefine’s most remarkable features is its ability to compare and reconcile values in your dataset to a public source of “standard names”, in this case country names:

Original value | Modified value | Count
Armênia | Armenia | 1
Côte d’Ivoire | Cote d Ivoire | 3
China | China, mainland | 6
Colômbia | Colombia | 39
Congo | Congo, Republic of the | 1
 | Congo, The Democratic Republic Of The | 2
Laos | Lao Peoples Democratic Republic | 3
Moldova | Moldova, Republic of | 1
Palestinian territories | Palestinian Territory, Occupied | 3
Russia | Russian Federation | 7
South Korea | Korea, Republic of | 6
Syria | Syrian Arab Republic | 5
Tanzania | Tanzania, United Republic Of | 13

When you have almost 3,500 rows (representing one member in each row) a tool like OpenRefine is pretty important. This was my first time using OpenRefine’s reconciliation process; it felt like I was stumbling a bit, so I’m not going to try to describe it.  But once the country names have been standardized, a Pivot table (in Excel or Google spreadsheet) gives you the counts by country.  Google Spreadsheets has a nifty tool to chart countries on a map.  I presumed that standard country names were required.
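OpenRefine’s reconciliation matches values against an external source of standard names, which I won’t try to reproduce here. For a fixed set of known corrections, though, a plain lookup table followed by a count does the same job as the pivot table. Here is a minimal Python sketch using a few of the mappings from the table above; it is a simple lookup, not OpenRefine’s matching, and the sample rows are hypothetical.

```python
from collections import Counter

# A few original -> standardized pairs taken from the table above.
STANDARD_NAMES = {
    "Armênia": "Armenia",
    "Colômbia": "Colombia",
    "Russia": "Russian Federation",
    "South Korea": "Korea, Republic of",
    "Tanzania": "Tanzania, United Republic Of",
}


def standardize(country):
    """Trim whitespace and replace known variants with the standard name."""
    cleaned = country.strip()
    return STANDARD_NAMES.get(cleaned, cleaned)


def counts_by_country(registrations):
    """Pivot-table equivalent: registrations per standardized country."""
    return Counter(standardize(c) for c in registrations)


if __name__ == "__main__":
    sample = ["Colômbia", "Colombia", " Russia", "Kenya"]  # hypothetical rows
    for country, n in counts_by_country(sample).most_common():
        print(country, n)
```

Values without an entry pass through unchanged, so the counts still include every row; the lookup table only has to cover the variants that actually occur.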


The dataset and this map are available.  Larger and darker dots represent a greater number of registered members and I’ve made the ocean bluer than Google’s version. A bit of playing around with the mapping tool suggested that mapping the logarithm of the count of members was more useful than mapping the raw count.  Note that with this many countries in the table, the map takes a while to render.  Until I figured out it was just slow, I was worried that I had somehow dropped the counts for the United States, which has a lot of members but is alphabetically toward the end.
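A log scale helps because a handful of countries with many members would otherwise dominate the scale, leaving most countries looking alike. Here is a minimal Python sketch of the transformation, assuming counts like those produced by the pivot table; base 10 is my choice for illustration, since the choice of base only rescales the map.

```python
import math


def log_scaled(counts):
    """Replace each positive count with its base-10 logarithm.

    log10(1) is 0, so countries with a single member sit at the bottom of
    the scale, and each step of 1 means ten times as many members.
    Zero counts are dropped because the logarithm of 0 is undefined.
    """
    return {country: math.log10(n) for country, n in counts.items() if n > 0}
```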

Although this map takes a lot more effort to produce than the Dgroups example, it doesn’t suggest a radically different distribution of KM4Dev members around the world, even though it depicts data from a completely separate registration process (people register on Dgroups but not Ning, and vice versa). However, there’s nothing like having the detailed data in hand to permit further analysis. I did not have time to do so, but several possibilities spring to mind immediately:

  • Amount of activity: are people in all countries just as active?
  • Recency: how has Ning registration spread over time?
  • Role: are leadership roles spread as evenly as membership is?

It might be hard to stop.  So many questions, so little time.

The last example looks at yet another dataset that represents the KM4Dev community, this time a survey of members reported in the KM4Dev Baseline L&M Survey 2013. The authors of the survey were kind enough to give me a spreadsheet from Survey Monkey. I was wondering whether survey respondents were as widely distributed geographically as KM4Dev members (as suggested in the other maps above). Let’s start with what I found and look at the details afterward. The survey did not ask for any details that would identify an individual respondent, so cross-referencing of any sort was not possible. However, I used the IP address that Survey Monkey collects to give me some assurance that the survey includes people from around the world. Here are the results in tabular and map form:

Many responses per country: United States: 29; United Kingdom: 17; Netherlands: 10; Switzerland: 7; Canada: 6; Ethiopia: 6; India: 5.
Four responses per country: Colombia, Lithuania, and Uganda.
Three responses per country: Brazil, Nepal, South Africa, and Spain.
Two responses per country: Bangladesh, Belgium, Costa Rica, Ecuador, France, Germany, Kenya, Malaysia, Mexico, Nigeria, and Philippines.
One response per country: Australia, Botswana, Burkina Faso, Chile, Denmark, Djibouti, Fiji, Indonesia, Jordan, Pakistan, Paraguay, Peru, Senegal, Trinidad and Tobago, and Tunisia.
Three responses had blank IP addresses so no country information is available.

This map was produced by the Google Spreadsheet map chart tool using the same data:
[Map: locations of L&M survey respondents, 2013]

That seems like a nice footnote to a good study: geographic distribution makes us trust the survey’s conclusions more. But this last example is also a reminder that mapping in particular, and data analysis in general, can take a lot of effort to gain modest insights. But you don’t know until you look, so you ought to look.

Specifically, I was interested in whether people’s satisfaction with KM4Dev’s several platforms had a geographic pattern. Does lack of bandwidth affect satisfaction with more bandwidth-intensive means of communication (e.g., Ning compared to Dgroups)? The received wisdom is that email is more accessible when you are in a low-bandwidth situation or where Internet access tends to be intermittent. Here’s the final table I arrived at, which compares “South” and “North” in terms of “satisfaction with the Ning platform.” “North” here includes Europe, North America and Australia, where roughly speaking we can assume higher and more stable bandwidth than what is found in other countries (here, following European custom, labeled “the South”):

Satisfaction with Ning | South (count) | North (count) | South % | North %
Very much | 1 | 4 | 3% | 7%
Much | 2 | 9 | 5% | 16%
Sufficiently | 12 | 11 | 30% | 20%
Not very much | 15 | 21 | 38% | 38%
Not at all | 10 | 10 | 25% | 18%
Total | 40 | 55 | 100% | 100%
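The percentage columns are each count divided by its column total. A short Python sketch reproduces them from the counts in the table; rounding half up to whole percents matches the published figures, whereas Python’s built-in round() rounds halves to even and would give 2% for the first South cell.

```python
import math

LEVELS = ["Very much", "Much", "Sufficiently", "Not very much", "Not at all"]
SOUTH = [1, 2, 12, 15, 10]  # counts from the table above
NORTH = [4, 9, 11, 21, 10]


def column_percentages(counts):
    """Share of the column total for each count, rounded half up to a whole percent."""
    total = sum(counts)
    return [math.floor(100 * n / total + 0.5) for n in counts]


if __name__ == "__main__":
    rows = zip(LEVELS, column_percentages(SOUTH), column_percentages(NORTH))
    for level, south_pct, north_pct in rows:
        print(f"{level:<15} {south_pct:>3}% {north_pct:>3}%")
```

Because each cell is rounded separately, the rounded South column actually adds up to 101 and the North column to 99, which is normal for rounded percentages.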

I don’t see a systematic difference between the South and the North columns, do you?
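One way to put a number on that impression is Pearson’s chi-square test of independence between region and satisfaction level. This is an addition for illustration, not part of the original analysis, and it comes with caveats: several expected counts are below 5, which makes the chi-square approximation shaky, and the test ignores the ordering of the satisfaction levels. Here is a minimal Python sketch using only the standard library:

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if region and satisfaction were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Rows: Very much, Much, Sufficiently, Not very much, Not at all.
# Columns: South, North (counts from the table above).
NING_SATISFACTION = [[1, 4], [2, 9], [12, 11], [15, 21], [10, 10]]

if __name__ == "__main__":
    stat, dof = chi_square(NING_SATISFACTION)
    print(f"chi-square = {stat:.2f} on {dof} degrees of freedom")
```

By my arithmetic the statistic comes out at about 5.06 on 4 degrees of freedom, below the 9.49 needed for significance at the 5% level. That is consistent with not seeing a systematic difference, with the caveats above.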

Of course this table is more of a provocation than “an answer” because of who chose to respond to the survey and many other factors.  For example, IP address tells us where respondents were when they completed the survey, not when they formed usability or satisfaction opinions.  It is interesting that one of the more active groups on the KM4Dev Ning site is “KM4Dev for Africa“!  It seems meaningful to me that the membership of KM4Dev is spread around and members are in (or come from) so many different countries. But the fact that opinions (and possibly behaviors) do not seem to be correlated with geography in this case is a reminder that people are people, that contribution and participation are happening all over the world. There are probably more differences with regard to knowledge management,  knowledge sharing, and access to the Internet within any given country than there are between countries.

I have found that it can take some thought to turn the Excel spreadsheet that Survey Monkey generates into something that is easy to use, but it’s not too difficult. But how can you get a location from an IP (Internet Protocol) address? Well, you guessed it: OpenRefine to the rescue. Once you have simplified Survey Monkey’s spreadsheet, you can import it into OpenRefine and look up each IP address to find a location. OpenRefine can combine a base URL string with a string holding an IP address (in my example, an address belonging to Google) to form a lookup URL, and it does this for all the rows in an OpenRefine “project” (in our case, 144 responses to the survey).

OpenRefine has a nifty command to create a new column with the results of retrieving a URL in an existing column.  When you do that with the URL above, you get geographic details for Google in JSON format:

{"ip":"","country_code":"US","country_name":"United States","region_code":"CA","region_name":"California","city":"Mountain View","zipcode":"94043","latitude":37.4192,"longitude":-122.0574,"metro_code":"807","areacode":"650"}

Freegeoip can return the same information in other formats and will show you your own IP address. Once Freegeoip has done its work, OpenRefine makes it easy to create new columns that split out the details, using expressions such as value.parseJson().country_name
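Outside OpenRefine, the same two steps (build a lookup URL from each IP address, then pull fields out of the JSON that comes back) take only a few lines. Here is a minimal Python sketch: BASE_URL is a made-up placeholder rather than the Freegeoip address, and the sample response is the Google example shown above, so nothing here touches the network.

```python
import json

# Hypothetical placeholder; substitute the base URL of whatever
# geolocation service you actually use.
BASE_URL = "https://geoip.example.com/json/"


def lookup_url(ip_address):
    """Append an IP address to the base URL, as the OpenRefine column expression does."""
    return BASE_URL + ip_address.strip()


def location_fields(response_text):
    """Extract the fields used in the post from a JSON response.

    Plays the same role as value.parseJson().country_name in OpenRefine.
    """
    data = json.loads(response_text)
    return {key: data.get(key) for key in ("country_name", "latitude", "longitude")}


SAMPLE_RESPONSE = (
    '{"ip":"","country_code":"US","country_name":"United States",'
    '"region_code":"CA","region_name":"California","city":"Mountain View",'
    '"zipcode":"94043","latitude":37.4192,"longitude":-122.0574,'
    '"metro_code":"807","areacode":"650"}'
)

if __name__ == "__main__":
    print(lookup_url("192.0.2.1"))  # 192.0.2.1 is reserved for documentation
    print(location_fields(SAMPLE_RESPONSE))
```

Using data.get() rather than indexing means a response that lacks a field yields None instead of stopping the whole run, which matters when some survey rows have blank IP addresses.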

Similar commands are used to extract longitude and latitude, etc.  Finally, latitude and longitude are handy both for grouping countries into continents or other aggregations (as above, where I grouped countries into “north” and “south”) as well as for double-checking your work, so that “Belgium” doesn’t end up in the “Developing South” group:
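Here is a minimal Python sketch of that kind of double-check, under two stated assumptions: the set of “North” countries is a hypothetical, deliberately incomplete list, and the box around Europe (latitude 35 to 72, longitude -25 to 45) is a crude approximation. Any country labeled “South” whose coordinates fall inside the box is flagged for a human to look at. The box is rough enough to flag Tunisia as well, so flagged rows are prompts rather than verdicts.

```python
# Hypothetical and incomplete: countries treated as "North" (Europe,
# North America, Australia). A real list would be longer.
NORTH_COUNTRIES = {
    "United States", "Canada", "United Kingdom", "Netherlands",
    "Switzerland", "Belgium", "Germany", "France", "Spain", "Australia",
}


def region(country):
    """Label a country North or South using the list above."""
    return "North" if country in NORTH_COUNTRIES else "South"


def in_rough_europe_box(lat, lon):
    """Crude bounding box around Europe; an assumption, not a real border."""
    return 35.0 <= lat <= 72.0 and -25.0 <= lon <= 45.0


def suspicious_south(rows):
    """Countries labeled South whose coordinates fall inside the Europe box."""
    return sorted({
        row["country"] for row in rows
        if region(row["country"]) == "South"
        and in_rough_europe_box(row["latitude"], row["longitude"])
    })


if __name__ == "__main__":
    # Approximate capital-city coordinates, for illustration only.
    rows = [
        {"country": "Belgium", "latitude": 50.85, "longitude": 4.35},
        {"country": "Lithuania", "latitude": 54.69, "longitude": 25.28},
        {"country": "Kenya", "latitude": -1.29, "longitude": 36.82},
    ]
    print(suspicious_south(rows))
```

In this sample the check catches Lithuania, which the hypothetical list forgot to include.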


OpenRefine advertises itself as a tool for working with “messy data” and data about communities will almost always be messy!  It’s worth learning how to use.


Nov 22 2013

Community questions and possibilities from a little data

Do we ever use all the data that’s out there (and that we generate every day) for the benefit of our communities?  How could the data that we generate help our communities see themselves in new or more useful ways?  The burgeoning new industry around Big Data is built on the premise that collection is ubiquitous, cost of access and processing are falling rapidly, and our knowledge about creating useful visualizations is growing.  For the last year I’ve been mulling whether these assumptions apply to the communities in which I participate (as a member or as a consultant).

There are two characteristics of KM4Dev, where I’m both a member and (for a brief period of time) a paid consultant, that make the community and its potential use of data significant. The first is that the community lives on multiple platforms that are completely independent of each other, so there is always some uncertainty about community boundaries and about who is or isn’t included in any given data frame. Many questions we might ask would require combining data from several different platforms, which can take a lot of effort or be impossible for various reasons. All of that makes using data for a community more challenging. Communities like KM4Dev that use multiple platforms are really the norm; one-platform communities are the exceptional (but well-studied) case. The second characteristic of KM4Dev that is important here is that it is independent and self-funding. Often it is sponsors and advertisers that drive the collection and interpretation of data about communities; since I am both a member and funded to look at KM4Dev at the moment, it seems important to dig into the issues and explain myself. I’ve put together several blog posts from the current project I’m doing for KM4Dev.
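To make the integration problem concrete: even the simplest cross-platform question, how many people are members of both the email list and the Ning site, needs a shared key. Here is a minimal Python sketch, assuming (hypothetically) that both member exports include email addresses. It will undercount the overlap whenever someone registers with different addresses on different platforms, which is exactly the kind of uncertainty about community boundaries described above.

```python
def normalize_email(address):
    """Lower-case and trim an address so trivial differences don't hide a match."""
    return address.strip().lower()


def platform_overlap(members_a, members_b):
    """Count people found on both platforms, or on only one, matched by email."""
    a = {normalize_email(e) for e in members_a}
    b = {normalize_email(e) for e in members_b}
    return {"both": len(a & b), "only_a": len(a - b), "only_b": len(b - a)}


if __name__ == "__main__":
    # Hypothetical exports from two platforms.
    email_list = ["Ana@example.org", "bo@example.org"]
    ning_site = ["ana@example.org ", "cy@example.org"]
    print(platform_overlap(email_list, ning_site))
```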

I’ve pulled examples from my phase 1 report and added some discussion and details that don’t really belong in the report.  I’m trying to do several things in this series of posts:

  • Unpack some of the data access issues
  • Illustrate some of the steps involved in producing a graph or table output
  • Say what I see in the output from a community development perspective
  • Describe what next steps could or should be and why I stopped where I did

The first step of course is recognizing that there might be data that’s relevant and helpful. Community development folks like me tend to be a very intuitive lot that’s oriented toward person-to-person interaction, so it’s a bit of a leap to reach for “hard data.” As I’ll show in the case of KM4Dev, figuring out where the useful data might be and how to get access can be a bit of a project. One obstacle is that data access and analysis is quite professionalized, involving specialized tools and invoking high standards.  Some day I can show you the scars from tongue lashings from statisticians telling me about my crimes against good practice: painful and discouraging at the time. I’m going to argue that rough, quick and dirty is pretty easy and good enough.

Sometimes there is no obstacle to getting access to the data. You have access because you are a member and it’s already summarized for you. LinkedIn, for example, has ready-made statistics for its groups:


It has to be said that LinkedIn isn’t where most KM4Dev interactions occur, so we have to consider whether the bar chart in Figure 7 represents the general KM4Dev population.

Figure 7



The answer is: probably not, but the graphic gets us thinking about the organizational affiliation of KM4Dev community members. When I think about it, people who participate in KM4Dev conversations are fairly senior and quite accomplished. All I had to do to produce Figure 7 was capture a screen-print (with Snagit) and paste it into my report. Access was easy; interpretation, not so much. Still, the point is that no data or graphic is going to be the last word: suggestions, pointers and indications are all we should expect. If a graphic produces a good question, it has done its job.

Things aren’t always that easy; hopefully our curiosity is stimulated and we keep looking. In this case, it turns out that the KM4Dev community’s front door is a Ning site that has been set up to collect interesting data from people when they register. Several years ago I had offered to help with a chore that required admin access. I had forgotten about it, but it turned out that I still had admin access to the site, so I could simply download a CSV file that could be easily opened in Excel or manipulated by other programs. I have found that having to explain to someone what you are going to do with the data before you’ve actually seen it can be difficult: you don’t know what you are going to find or what you might do with the data until you start interacting with it. And getting authorization, getting your hands on the data and certainly getting it into a form that allows you to think about your community can take more time than you expect, or more than you might think it’s worth. In fact, having access to the data will almost always raise more questions than it answers and could even be misleading, so this first step is one where it’s easy to give up.

Again, Figure 6 might reflect a slightly different group than the one that interacts on the KM4Dev discussion list, but this Wordle from the Ning members dataset is more interesting and more specific than what we saw in LinkedIn:

Figure 6 – Job descriptions



The member dataset in the Ning site has a column labeled “Occupation/Title”. When they register, people can describe themselves using whatever terms make sense to them. Looking at these descriptors together reveals variation in punctuation and capitalization, which we can standardize with a text editor. A Wordle is a nice way to visualize the several words that go with the main ones, “knowledge” and “management.” The next step with this bit of data might be to standardize all the job titles so as to group community members and, possibly, understand what those differences might mean.
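Standardizing punctuation and capitalization and then counting words is roughly what a word cloud does with these titles. Here is a minimal sketch in JavaScript. It uses a few made-up titles rather than the real member data, and it assumes a short, illustrative stopword list:

```javascript
// Hypothetical sample of free-text "Occupation/Title" entries, not real member data.
const titles = [
  "Knowledge Management Officer",
  "knowledge management officer",
  "Knowledge-Management Specialist",
  "Information & Knowledge Manager",
];

// Words too common to be informative in a word cloud (an assumed, minimal list).
const STOPWORDS = new Set(["and", "of", "the", "for", "in"]);

// Lowercase a title, turn each run of punctuation into a single space, and split
// it into words. \p{L} and \p{N} keep accented letters in non-English titles.
function normalizeTitle(title) {
  return title
    .toLowerCase()
    .replace(/[^\p{L}\p{N}]+/gu, " ")
    .trim()
    .split(" ")
    .filter((w) => w.length > 0 && !STOPWORDS.has(w));
}

// Count how often each word appears across all titles, most frequent first
// (ties broken alphabetically so the output order is stable).
function wordFrequencies(entries) {
  const counts = new Map();
  for (const entry of entries) {
    for (const word of normalizeTitle(entry)) {
      counts.set(word, (counts.get(word) || 0) + 1);
    }
  }
  return [...counts.entries()].sort(
    (a, b) => b[1] - a[1] || a[0].localeCompare(b[0])
  );
}

console.log(wordFrequencies(titles));
```

Even after normalization, “manager” and “management” are still counted separately. Grouping them would take stemming or the kind of manual standardization of job titles described above.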

Here is a simple display based on calculating a member’s age from the birth date they provided when registering on the Ning platform.

Figure 5 – age distribution


This basic histogram, which shows the distribution of ages, is trivial to produce with a statistical package like R, which I’ve been learning to use over the past year.

What is striking to me about the age distribution (apart from the several cases where the age calculation results in a number greater than 90) is how very young KM4Dev membership is. The next step might be to break the dataset down into age bands to see whether there are any patterns of participation that vary with age. For example, do older members post to the Dgroup discussion more frequently than younger ones?
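Here is a sketch of the age calculation and the age-band step suggested above. It is written in JavaScript rather than R. It assumes birth dates arrive as ISO strings such as “1980-06-15”, and the actual format of the Ning export may differ. Implausible ages over 90 are set aside for checking rather than placed in a band:

```javascript
// Age in whole years on a reference date. Assumes "YYYY-MM-DD" strings
// (an assumption about the export format, not a documented Ning guarantee).
function ageOn(birthDateIso, referenceDate) {
  const [y, m, d] = birthDateIso.split("-").map(Number);
  let age = referenceDate.getUTCFullYear() - y;
  // Subtract a year if the birthday hasn't come around yet in the reference year.
  const refMonth = referenceDate.getUTCMonth() + 1;
  const refDay = referenceDate.getUTCDate();
  if (refMonth < m || (refMonth === m && refDay < d)) age -= 1;
  return age;
}

// Group ages into 10-year bands. Ages above maxPlausible (or below 0) are
// collected separately so they can be checked by hand instead of skewing the bands.
function ageBands(birthDates, referenceDate, maxPlausible = 90) {
  const bands = new Map();
  const suspect = [];
  for (const bd of birthDates) {
    const age = ageOn(bd, referenceDate);
    if (age > maxPlausible || age < 0) {
      suspect.push({ birthDate: bd, age });
      continue;
    }
    const lower = Math.floor(age / 10) * 10;
    const label = `${lower}-${lower + 9}`;
    bands.set(label, (bands.get(label) || 0) + 1);
  }
  return { bands, suspect };
}

// Hypothetical birth dates, not real KM4Dev members.
const ref = new Date(Date.UTC(2013, 9, 1)); // 1 October 2013 (months are 0-based)
const { bands, suspect } = ageBands(
  ["1980-06-15", "1985-11-02", "1972-01-20", "1900-01-01"],
  ref
);
console.log([...bands.entries()], suspect);
```

Joining each member’s band to a count of their posts would be one way to start on the question about older and younger members.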

The final example in this post involved quite a bit of effort to produce a simple result, showing the percentage of members from the different types of organization:

Table 1 – Organizational affiliation
Organization type            Count  Percent
NGO/INGO                       912      24%
Academic/Research              669      17%
Individual/Consultant          575      15%
UN/Multilateral                559      15%
Private Sector                 451      12%
Government agency/Bilateral    420      11%
Other                          216       6%
CBO                             32       1%
Total people registered       3834     100%

Although the profile form on the Ning site offers a limited set of organization types, the membership data looks like this:
  1. INGO
  2. INGO,Academic/Research
  3. Government agency/Bilateral,INGO

In these three samples, you can see that some people chose just one category while others chose many (up to 5), and the categories were recorded in different orders. In addition, the categories were sometimes separated by a comma and sometimes by a vertical bar character (“|”). Using a wonderful and free data cleaning tool named OpenRefine, it takes six steps to count the three examples above as two people in the INGO category and one half person each in Academic/Research and Government agency/Bilateral. I also collapsed the counts for NGO and INGO. OpenRefine does a great job of saving the code that you construct interactively, and it also keeps a step-by-step description of what you’ve done. Here is the partly intelligible description of the six steps:

  1. Create column n-cats at index 1 based on column Row Labels using expression grel:value.split(/,|\|/).length()
  2. Split column Row Labels by separator
  3. Create column weighted-once at index 13 based on column once using expression grel:value / cells["n-cats"].value
  4. Create column weighted-many at index 15 based on column many using expression grel:value / cells["n-cats"].value
  5. Transpose cells in 11 column(s) starting with Row Labels 1 into rows in two new columns named column-key and column-value
  6. Remove column column-key

Producing this table also involved some additional steps in Excel, but you can see how much complexity goes into producing something that is simpler to look at and easier to think about.
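The split-and-weight logic from the OpenRefine steps, together with the tally and percentage step done in Excel, can be sketched in JavaScript. This is an illustration, not the OpenRefine project itself. It assumes each cell holds categories separated by commas or vertical bars. Each person gets a total weight of 1, shared equally among the categories they chose, and NGO and INGO are then collapsed into one row. Run on the three sample rows, it gives two people in NGO/INGO and half a person in each of the other two categories:

```javascript
// Split a multi-valued "organization type" cell on commas or vertical bars.
// "/" is deliberately NOT a separator, because labels like "Academic/Research" contain it.
function splitCategories(cell) {
  return cell
    .split(/[,|]/)
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}

// Collapse NGO and INGO into one row, as in the post; this mapping is illustrative.
const MERGE = new Map([
  ["NGO", "NGO/INGO"],
  ["INGO", "NGO/INGO"],
]);

// Each person contributes a total weight of 1, shared equally among the
// categories they chose; the merge happens after the share is fixed.
function weightedCounts(cells) {
  const totals = new Map();
  for (const cell of cells) {
    const cats = splitCategories(cell).map((c) => MERGE.get(c) || c);
    if (cats.length === 0) continue;
    const share = 1 / cats.length;
    for (const c of cats) {
      totals.set(c, (totals.get(c) || 0) + share);
    }
  }
  return totals;
}

// Turn weighted totals into rows of (category, count, whole-number percent), largest first.
function toTable(totals) {
  const grand = [...totals.values()].reduce((a, b) => a + b, 0);
  return [...totals.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([category, n]) => ({
      category,
      count: Math.round(n * 100) / 100,
      percent: Math.round((100 * n) / grand),
    }));
}

// The three sample rows from the post.
const rows = [
  "INGO",
  "INGO,Academic/Research",
  "Government agency/Bilateral,INGO",
];
console.log(toTable(weightedCounts(rows)));
```

Because each percentage is rounded separately, the column can add up to slightly more or less than 100. Table 1 shows this: 24 + 17 + 15 + 15 + 12 + 11 + 6 + 1 = 101.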

What I see in this table is a very productive diversity of organizations. That diversity is a source of fruitful dialog, but it also creates a challenge: KM4Dev probably has a different value proposition for people from different types of organizations, and people from those different types of organizations are able to make different kinds of contributions to the community. Garnering support for infrastructure or other necessary investments (like the occasional bout of data analysis) from such different kinds of organizations is likely to be very tricky. A further step would be to compare the amount and kind of contributions that come from people in those different categories. It would also be interesting to look for common characteristics among people who combine categories in different ways (e.g., one foot in academia and one foot in any of the several other categories of organization).


Nov 12 2013

Walking around it to find a problem’s shape

Working together and staying in touch over more than ten years, Sean Murphy and I have tried a lot of different ways of learning together, from each other and from others. We kept at it and learned from our experience, and we’ve been able to help clients learn. We’ve even helped clients learn to learn. We’ve focused on the development of a “minimum viable product” (or process, platform, or practice), the idea being that there’s a lot about a new offering that you can’t learn until you are actually offering it. Recently we decided to explore this practice further with people in public, on a regular basis. Essentially we are exploring our own MVP in a series of webinars. Here is what we came up with for our initial invitation:

  • If you are planning a new service offering, involving technologies and social interactions between customers, this clinic on minimum viable service can help you learn your way out of conflicting assumptions, lack of relevant data, difficulty understanding service value, and resource constraints. This is especially the case if you need to get adoption by a newly forming or an existing community that may be contained within one firm or span many. Drawing on their experience in new product introduction and communities of practice, Sean Murphy of SKMurphy and John David Smith of Learning Alliances will demonstrate the value of a “walking around the problem” technique for early service design that they have developed individually and together over many years.

Terry Frazier and Dixie Griffin Good were the panelists for our first effort, and we posted a recording and our meeting notes on Sean’s blog. In making plans for future sessions, I found some notes describing what we were trying to do. I’ve edited them here as an overview of our process:


When “walking around a problem” we work to:

  • Deliberately avoid “jumping to conclusions” too soon
  • Enlarge the scope of solution-finding by getting to know more about what the problem looks like from as many sides as we can
  • Create a safe and non-confrontational inquiry process that doesn’t inadvertently close off aspects of a problem or potential solutions
  • Use informal tools that are the electronic equivalent of a shared napkin so that we create a resource that people can come back to
  • Acknowledge time boundaries by identifying experiments or sources of information that could make the problem and its solution clearer

Experimentation is now a standard business process for market exploration and customer discovery. When designing experiments (or identifying new sources of information) that support leveraging a minimum viable product or process, we seek to identify:

  • Real world interactions with real people who could be real customers
  • Blind spots that can hide possible experiments from consideration
  • The lowest investment & highest yield

When taking notes in real time as we explore an issue, we:

  • Balance expansive conversation with retention (notes can be referred to later)
  • Notice many more possibilities than could possibly be explored in the short term
  • Involve everybody in taking or editing the notes to make corrections and add references
  • Follow up with a summary that highlights follow-up actions (including things to avoid).
If you would like to be a panelist, contact me or Sean.

We have planned three more MVP Clinics for Social/Community Applications.

photo credit: CarbonNYC via photopin cc


Oct 16 2013

Some ideas about a tool for community reflection

One initiative I took on as part of the IFAD synthesis project started with a conversation with Philipp Grunewald in which we wondered how a community like KM4Dev could guide a researcher’s activities. Philipp was willing to give 2 hours a week to the community in some kind of research effort, but wanted to find a topic that was useful or valued by the community, rather than pushing a researcher’s perspective on it.

We experimented with All Our Ideas (or “wiki surveys”), a tool for public opinion and consultation, and indeed we got new ideas and some interesting guidance on the most popular and resonant ideas. The ideas that we came up with to seed the inquiry seemed to stimulate new ones but did not swamp or block the thinking. I think that this tool could be useful for KM4Dev in the future, and it might play a useful role for other free-standing communities that are long-lived and have grown large enough that too much conversation is a bigger problem than not enough.

Monthly KM4Dev Dgroups postings, September 2000 to February 2012

If all of us shared all of our ideas on the Dgroup and then started working on ranking them, we would all go mad, end up devoting all our time to KM4Dev, or (most likely) just give up. And the main topic of KM4Dev is knowledge management and sharing in development organizations, not how one researcher can help out. But the KM4Dev community has grown so large that the “thinking together” that was its hallmark 5 or 10 years ago is sometimes difficult with the existing set of tools; this is not a reflection on present membership, leadership or community culture. It’s just a limitation of the bandwidth that an email list affords.

If you think growing beyond what an email list will comfortably allow is a problem, consider the alternatives. Should the community grow smaller, somehow? Divide up? Shut out new people? Only talk about the “truly important stuff”? Move to another platform entirely? Because the KM4Dev community is bigger, it’s also more diverse, which is reflected in the different languages (in the sense of the different jargons that show up), interests, disciplines, knowledge, work contexts, and motivations that you can see by following the discussions. This diversity also makes the community all the more valuable, so dealing with growth is all the more important.

More about Wiki Surveys as a tool for communities

Basically, Wiki Surveys is a tool that lies between a closed-form survey and an open-ended interview. Open-ended interviews can yield rich detail and new categories but get very expensive fairly quickly. Closed-form surveys (e.g., Survey Monkey or Google Forms) are limited by the questions we ask, which so easily miss the insight that a community holds. Wiki Surveys is easy to use and, I think, solves some of the dilemmas between the two modes of information gathering. It is obviously a more involved and sophisticated tool, but it could become as commonplace as the tools that solve the problem of scheduling a large group, provided there’s some understanding of it (which is up to us to help with).

All Our Ideas is a research project based at Princeton University that is dedicated to creating new ways of collecting social data. You can learn more about the theory and methods behind our project by reading Matthew J. Salganik and Karen E.C. Levy, Wiki surveys: Open and quantifiable social data collection or watching a talk that explains the logic behind the tool.  The authors cite three characteristics of the Wiki Surveys tool that resonate with a community of practice context:

  • greedy: it incorporates information from prolific as well as non-prolific contributors, whereas other tools force us to choose between the two; communities of practice inherently have to balance both ends of such a spectrum, so why restrict contributions or guidance to one end or the other?
  • collaborative: it allows people to suggest new ideas that had not been anticipated when the original question was formed; communities produce new ideas through their interactions, so shouldn’t the tools we use to consult with a community do just that?
  • adaptive: it uses what is known so far to guide the inquiry, ensuring that new ideas are tested against previously submitted ones (and it tries to give new ideas as close to “an equal chance” as possible). Although deep and old roots make a community solid, it’s today’s insights that should guide the future, so why would we allow our tools to be too constrained by past terminology or question framing?

The tool produces a score for each idea that’s submitted, based on pairwise comparisons with other ideas. User-submitted ideas have an equal chance to “catch up” because they are considered somewhat more frequently. The new, user-submitted ideas turn out to be quite interesting. Salganik and Levy suggest that they frequently contain two useful kinds of ideas:

  • alternative framing of a problem, where an idea is expressed in natural language with a context that is different and useful. For example, in an OECD survey about education, the top idea was the pithy and vivid “Teach to think, not to regurgitate.” It seems to me that the discourse in a community of practice is as much about framing problems in a new way as it is about “solving” problems.
  • novel information where information is brought to the process that is really new or different.  For example, a study for New York City suggested that docking ships that were not plugged into the electrical grid produced emissions equivalent to 12,000 cars per ship.  I’ve written about communities as effective mechanisms for gathering complex information from the landscape.
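All Our Ideas computes its scores with its own estimation method, and the sketch below is not that method. It is a rough illustration of how a score can come out of pairwise votes. It uses a smoothed win rate that adds one imaginary win and one imaginary loss to each idea, so an idea seen only a few times doesn’t start at 0 or 100. The votes and idea names are hypothetical:

```javascript
// Each vote records which of two ideas a participant preferred (hypothetical data).
const votes = [
  { winner: "A", loser: "B" },
  { winner: "A", loser: "C" },
  { winner: "B", loser: "C" },
  { winner: "C", loser: "A" },
  { winner: "A", loser: "B" },
];

// Score = smoothed share of comparisons won, on a 0-100 scale.
// (wins + 1) / (comparisons + 2) is Laplace smoothing: an idea with no
// comparisons yet scores 50, and extreme scores need many votes.
function scoreIdeas(votes) {
  const wins = new Map();
  const seen = new Map();
  const bump = (m, k) => m.set(k, (m.get(k) || 0) + 1);
  for (const { winner, loser } of votes) {
    bump(wins, winner);
    bump(seen, winner);
    bump(seen, loser);
  }
  const scores = {};
  for (const [idea, n] of seen) {
    scores[idea] = (100 * ((wins.get(idea) || 0) + 1)) / (n + 2);
  }
  return scores;
}

console.log(scoreIdeas(votes));
```

With smoothing like this, a newly added idea stays near 50 until votes accumulate. That is one way to see why new ideas benefit from being shown more often.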


To consult the KM4Dev community, we asked, “What topic would you like to explore with Philipp on behalf of KM4Dev in the next 7 months?” We came up with a series of “seed” ideas that seemed reasonable, but found that some user-submitted ideas have higher scores than the ones we submitted. User-submitted ideas also often have much lower scores than the seed ideas, because there is more variance among them: some are very good (popular) and others are unpopular.


Red dots on the top panel are the scores for user-submitted ideas and the blue dots are “seed” ideas.

The voting is now closed, but you can still see the results online. I think there is a lot to think about in the results. There’s no substitute for thinking about why, of the 32 ideas that were considered, two of the most unpopular ideas were among the ones that we had originally proposed and that seemed quite sensible at the time:

  • Self-awareness of community relevance
  • Relationships between KM4Dev and KBF (or other adjacent communities)

The two most popular ideas were both user-contributed:

  • Does KM4Dev have to be a community? Or is it good enough if it is a platform that provides desired services?
  • Strategic KM4Dev — examples, analysis, orientation

It’s worth noting that All Our Ideas uses the same strategy for tracking and showing geographic diversity that I used when I re-analyzed the data in a KM4Dev survey.


I think wiki surveys show great promise for reflection and inquiry in a community setting.


Oct 10 2013

Learning in and around a “known community”

I was recently hired by the KM4Dev core group to synthesise the results of the three completed tasks and to provide an analysis of insights and recommendations to KM4Dev as to how it could further develop. We agreed that I would write up process notes on my blog, so I expect to write several posts with observations about KM4Dev and what I learn doing this project.

Because I’ve been involved with KM4Dev off and on over the years, the community is somewhat familiar and I know some of the people; at the same time, I recognize just how much I don’t know about the community, the people or the work they do. To me that’s an exciting mix.

The first thing about this project is that the client is a community, and that’s different from a client that’s an organization. The Core Team that hired me is struggling with this distinction, which I see in many settings: whether human interactions enact community or achieve organizational ends. This issue surfaced right away when I read a recommendation in one of the reports that “KM4Dev should consider [several actions here] …” Since no one individual fully represents the community or can really speak for it (much less act on its behalf), I decided that working for a client that is a community must include some effort to bring findings and suggestions to people’s awareness. So part of what I’m doing is offering “provocations” or suggestions in the context of ongoing KM4Dev discussions. So far one provocation is about “a newsletter for KM4Dev” and the other is about “research on behalf of KM4Dev,” using All Our Ideas as an idea collection and ranking mechanism. Even though I’ve been thinking about this balance between community and organization as frameworks for human interaction, exactly how it works for KM4Dev is unknown to me.

A second area that I noticed as simultaneously familiar and un-familiar is the email list platform that KM4Dev is built on.  What could be more ordinary or more invisible as community infrastructure than an email list?  Well, to my surprise, Dgroups requires each post to be approved! I hadn’t posted to KM4Dev for a while and so when I got a response saying, “your posting has been accepted into a moderation queue,” I thought, “Oh, I think I’ll be posting more frequently, I need to get on the ‘pre-approved people’ list!” When I asked the moderator, I got nice emails from Nancy White and Lucie Lamoreaux explaining that everyone’s posts had to be moderated.  Go figure.

The job of curation in a community is unending, but in this project, starting from where I was meant that the first job was to gather all the artifacts from the previous studies in one place. In the spirit of “there will be nobody to clean up or curate after me” and “I’m reporting to a very distributed community,” I decided to create a little template for the project to link together all of the stuff I produce on the Wiki. The unfamiliar part, and the moment of truth (when I recognized that I didn’t want to delve further), came when I began using the Form and Discussion Template that Davide Piga had created for the community.

Even though I know some people in KM4Dev from attending face-to-face meetings, from occasional participation in the email list discussions, and because there is a certain amount of membership overlap between KM4Dev and CPsquare, in any given conversation I know some of the people who participate and don’t know others.  Part of my provocations strategy has been to strike up side conversations with people like Philipp Grunewald, Tina Hetzel, and Anna Downie.

I’ve always been struck by how there are pockets of competence (technology related and otherwise) in a community of practice — not everyone knows the same stuff nor can do the same things. One of the online collaboration practices that I hold dear is using a Google Doc or etherpad for writing real-time meeting notes.  I was heartened to find in my first meeting with the core group that people jumped right on it and started modifying the agenda I had prepared, quickly added to the notes and actually jotted down things that I was saying on the Skype call.  After the meeting, people who couldn’t make it to the meeting left comments and responded to each other in the Google Doc.  Sharing communications practices with a client (and representatives of a client) is a solid foundation for learning more about a rich and complex community like KM4Dev.


Oct 08 2013

Reporting on my sabbatical in Shambhala

Last week Nancy White and I were in “the hot seat” for the Networked Learning Conference. We decided to talk about blind spots – those things that are right in front of us but for some reason we just don’t see them. Actions built on untested or aged assumptions. Actions based on our own preferences and perceptions which make no sense to others, yet we count them as “common sense.”

One way that blind spots become vivid for me is when I change roles or “social place”.  This blog post elaborates on some thinking that started in that “hot seat.”

After some 14 years working on communities of practice and how technology can support them, I needed a break. Last December I took one, working pro bono for a very different kind of community of practice. From then until September I worked almost full time for the Portland Shambhala Meditation Center as its communications director. I’ve been on the Shambhala Center’s governing council in an advisory, greybeard role for the last 7-8 years, but with a transition to new Center Directors it seemed important to take on a “real job,” so I became Communications Director. Now that my full-time stint is over, it’s time to think about the blind spots I noticed.

The first is surely too much focus on the front end of a community’s life. Too much focus on the front end besets consultants (who are hired to help launch a community) as well as people who study communities. When I took over as communications director, there was a lot of infrastructure in place, with many work patterns defined, and my work had to serve an existing community that was shifting. A lot of infrastructure, assumptions and practices had to go by the wayside before replacements could really be visible. It is clear to me that others will come after me who will take up (and modify) what I have set up. A longer view changes our perspective, especially when you consider that this particular Shambhala Community has roots going back 2,600 years.

Community observer as author-god. Over the years I’ve seen many master’s theses, PhD projects and even formal research agendas that propose to “create a community of practice” in order to study it. My problem with that strategy is that it distorts what we observe, it sets up the researcher/observer as community arbiter of last resort, and it leaves community participants behind after the creator’s degree is granted or the research is completed. Several times during my sabbatical I had to deal with the fact that people were not interested in anything new or better: they wanted help figuring out passwords! I had to work with their concerns and satisfy their needs; it was clear that the community was in charge, not me.

Community as “a first date”. A major blind spot that working for the Shambhala Center revealed to me is that working to “create community” not only focuses attention on the front end of a community’s life, it also implies recruitment and, as a result, a very tentative commitment. How much of our understanding of communities, social learning, and engagement is based on communities where the observers are complete outsiders or newcomers, or where we assume that participation is somehow optional? What effect does it have when we participate only because we are scavenging for insights to be used elsewhere? On the other hand, what difference does it make in our actions or in our understanding when we feel completely identified with a community – when we are involved for the long haul? I think the relevance of these questions depends on whether you think depth of understanding matters. I’ve realized that for me, when it comes to community, it does. The interesting communities are the ones that persist.

Technology is a necessary prerequisite. One of the ideas that we explored in Digital Habitats was technology stewardship as a role for someone who knows enough about a community to see the community (and learning) implications of technology choices. This is certainly true for mostly distributed, mostly technology-mediated communities. My sabbatical emphasized just how true it is for a mostly face-to-face community, too. The role of technology in a Shambhala Center (and it’s a big one) is primarily to enable face-to-face interaction and meditation practice. In a mostly face-to-face community it is still the case that many people who know the community well don’t see technology-based opportunities and, conversely, I think, that many people who know about the technological possibilities don’t see the opportunities for community benefit. Going back to the password-confused and techno-phobic end of the spectrum: the blind spot in my work with communities over the past 14 years is that I tended to come into contact with people who had already crossed big technology hurdles. To some extent, if they couldn’t handle the technology basics, they weren’t “present” or were even invisible. I’m seeing that technology is a key thread, but only one of many.

Forget about a community-wide platform. I continue to observe and reflect on a myriad of small technology balances and styles of interaction. I am struck by how much the Shambhala Community in Portland uses email. There is a lot of it! In distributed communities so many side conversations in email might threaten community cohesiveness or integrity, but since this community can count on its face-to-face venues and interactions, the centrifugal quality of email is not a problem. I see almost no interest in synchronous meeting technologies, even though they are the lifeblood of so many distributed communities and relationships. A notable practice that developed before I began my sabbatical is for the governing council to develop its meeting agendas in a Google Doc, inserting all the “information items,” such as monthly reports, in the document for people to read before the meeting. The first agenda item is always a simple, “Any questions about the reports that people put in the Google Doc?” After that, the conversation can focus on decisions or questions that really need discussion. That’s a nice example of using technology to remove some of the dross from our “being together.” What seems most important to me is that the Portland Shambhala Community doesn’t respect technology boundaries: people use whatever tools they have at hand with whoever they are working with, and they don’t worry much about the rest of the community.

That’s all for the moment.

Photo credit: Unhindered by Talent via photopin cc


Jul 06 2013

Adding email notification to Google Forms and Google Spreadsheets

This idea and the code to implement it may be useful in other settings, but I developed it in the context of supporting a community with a largely volunteer organization at its center. At least in my mind, the code comes from a point of view.

Here’s a picture of what it does:


Adding email notifications to a Google Form and a Google Spreadsheet.

Here are the functions that are involved (shown in red), starting from the upper right-hand corner:

  1. If the person completing the form provides their email address, the Google Script will send them a snapshot of what they entered into the form in a simple but legible format that I call a “row snapshot.”
  2. Obviously Google Forms are very handy in the context of a community or a small organization because they make it easy to collect information from a large number of un-identified people into an orderly and useful format.
  3. Once data is entered into a Google Spreadsheet, many manipulations, edits, sorts, and analyses are relatively easy.   Controlling access to the Google Spreadsheet so that either few or many people can see or edit the Spreadsheet is easy to do.
  4. People who are looking at the Spreadsheet while logged on (as opposed to anonymous users) can use a drop-down menu created by this program to request a “row snapshot” report for any given row. The first time you run the program, you need to authorize it to send email on your behalf.
  5. Once an authorized user has received the row snapshot in an email message, they can share it by forwarding it to other people who are involved or who need to be alerted to the status of whatever that row represents.

In developing this, I was adapting code from a Google Script tutorial titled “Tutorial: Automating a Help Desk Workflow“.  What was crucial but easy to miss was this little snippet:

Now we need to set up a trigger so that this function will be called each time a form response is submitted. In the Script Editor, go to Resources > Current project’s triggers. Click the link that says ‘No triggers set up. Click here to add one now.’

Once that was working, the code evolved over the past few months to the point where I think it’s worth sharing.
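Here is a minimal sketch of the same pieces. It is not the script the post refers to. It has four parts: a formatter that turns a row into a “row snapshot,” a form-submit handler that emails the snapshot to the respondent, a menu item that emails the selected row to the signed-in user, and a helper that installs the form-submit trigger in code instead of through the triggers dialog. It assumes the V8 Apps Script runtime, a header row in row 1, and a hypothetical column named “Email Address.” MailApp, SpreadsheetApp, Session and ScriptApp exist only inside Apps Script:

```javascript
// Plain JavaScript: format one row as "Header: value" lines for an email body.
function formatRowSnapshot(headers, row) {
  return headers
    .map((h, i) => `${h}: ${row[i] === undefined || row[i] === "" ? "(blank)" : row[i]}`)
    .join("\n");
}

// Hypothetical name of the form question that collects the respondent's address.
const EMAIL_HEADER = "Email Address";

// Apps Script only: called by an installable "On form submit" trigger.
// e.range is the row the response landed in; e.values holds its values.
function onFormSubmit(e) {
  const sheet = e.range.getSheet();
  const headers = sheet.getRange(1, 1, 1, sheet.getLastColumn()).getValues()[0];
  const row = e.values;
  const idx = headers.indexOf(EMAIL_HEADER);
  const address = idx >= 0 ? row[idx] : "";
  if (address) {
    MailApp.sendEmail(address, "Your form submission", formatRowSnapshot(headers, row));
  }
}

// Apps Script only: the onOpen simple trigger adds a custom menu to the spreadsheet.
function onOpen() {
  SpreadsheetApp.getUi()
    .createMenu("Row snapshot")
    .addItem("Email me the selected row", "emailSelectedRow")
    .addToUi();
}

// Apps Script only: email the row containing the current selection to the signed-in user.
function emailSelectedRow() {
  const sheet = SpreadsheetApp.getActiveSheet();
  const rowNum = sheet.getActiveRange().getRow();
  const lastCol = sheet.getLastColumn();
  const headers = sheet.getRange(1, 1, 1, lastCol).getValues()[0];
  const row = sheet.getRange(rowNum, 1, 1, lastCol).getValues()[0];
  const me = Session.getActiveUser().getEmail();
  MailApp.sendEmail(me, "Row " + rowNum + " snapshot", formatRowSnapshot(headers, row));
}

// Apps Script only: run once by hand to install the form-submit trigger.
function installTrigger() {
  ScriptApp.newTrigger("onFormSubmit")
    .forSpreadsheet(SpreadsheetApp.getActive())
    .onFormSubmit()
    .create();
}
```

Only formatRowSnapshot runs as plain JavaScript outside Google’s environment. The other functions belong in the script editor bound to the spreadsheet, and the first run asks for permission to send email on your behalf.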

