Apr 30 2020

A conversation about chats and virtual events

Published by under Uncategorized

My friend Caren and I recorded a conversation about an online (Zoom) event that I hosted today. I described my process and we explored topics such as:

  • Planning events
  • Setting up chat ice-breakers
  • Knitting together the different elements, tools, events across time
  • Creating safe – and sacred – spaces
  • and more

Here’s the video! An un-edited 45 minute discussion. Let me know if it’s useful…


I wanted to demonstrate how to use the automatic YouTube transcription to show a word cloud of our discussion, but for some reason I can’t get at the transcription!

No responses yet

Apr 20 2020

Chat for reflection and inquiry

The coronavirus is offering more opportunities to facilitate online meetings, and therefore to reflect on how I use chat to bring people together and help them connect more deeply.

I have a little chat structure that I’m using again and again, with variations. It involves a series of chat prompts, where everyone is instructed to reflect on a question, write a response, NOT hit enter till everyone else is done, and then hit enter simultaneously. I give people a moment of silence to scan what the group has written. Then I ask one person to pick out a few of the posts and read them out loud. Sometimes a discussion ensues.

Before a meeting I prepare a set of questions in a text editor that I can quickly copy and paste into the chat. Having the questions written down like that provides a fence between sections of the chat and helps people see the question that they are responding to. I find it’s best to start with a simple, straightforward question and gradually move on to more consequential ones that set the stage for the work the group may be engaged in.

Easy warm up question: location, weather, etc.

More substantive question: like "How are you feeling right now?"

More challenging question: like "What is the big issue we are tackling?"

Of course, having the enhanced chat transcript to share afterwards is a bridge to future interactions and collaborations. I use tick marks (single quotes) so that the Google Spreadsheet that I use to set up the HTML table won’t throw an error, because an initial equal sign usually indicates the start of a formula. Multi-line chat entries always need a bit of extra attention to get them in the right column.
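
The tick-mark trick can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `escape_for_sheet` is made up, and it assumes that a leading equal sign is the only thing that needs guarding against.

```python
def escape_for_sheet(line: str) -> str:
    """Prefix a single quote when a line would otherwise be read as a formula."""
    # Assumption: only a leading "=" is treated as risky in this sketch.
    if line.startswith("="):
        return "'" + line
    return line


# Two sample chat lines: a fence and an ordinary entry.
chat_lines = [
    "==========================",
    "Transformation steps…",
]

for line in chat_lines:
    print(escape_for_sheet(line))
```

Only the fence line gets the tick mark; ordinary chat entries pass through unchanged.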

All of this effort and protocol is because I think that:

  • This way of using chat allows a group to “see itself” all at once, and the conversation can address individual (or diverging) views together with widely held ones.
  • Chat records people’s “verbatim” thinking – in their own words, with the terms they use to think.
  • Chat is scannable – much faster than having people fumble to find the mute/unmute button.
  • I think people are able to get the gist from written statements in other languages better than they can by listening to another language.
  • And, finally, you can use a Wordle afterwards! (Like the following one that represents the words on a Wikipedia page about Chat.)
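
The counting step behind a word cloud can be sketched in Python. This is not how Wordle works internally, just the general idea: the function name `word_counts`, the tiny stop-word list, and the sample sentence are all made up for the example.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "we", "i"}


def word_counts(text: str) -> Counter:
    """Count how often each word occurs, ignoring case and stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


# Made-up sample text standing in for a chat transcript.
sample = "Distributed leadership is MORE important! Leadership is shared."

for word, count in word_counts(sample).most_common(3):
    print(word, count)
```

The resulting counts are what a word cloud tool would turn into font sizes.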

Of course I usually can’t help myself, so I often take public notes right in the chat during a conversation — with appropriate fences between sections so that the structure of the conversation is visible.
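
Because the fences make the structure visible, a saved chat can also be split back into sections afterwards. Here is a minimal Python sketch, assuming a fence is any line made up of five or more equal signs (optionally prefixed with a tick mark); the name `split_sections` and the sample lines are hypothetical.

```python
import re

# Assumption: a fence is a line of five or more "=", optionally tick-prefixed.
FENCE = re.compile(r"^'?={5,}\s*$")


def split_sections(lines):
    """Group chat lines into sections, starting a new one at each fence line."""
    sections = [[]]
    for line in lines:
        if FENCE.match(line):
            # Start a new section only if the current one has content.
            if sections[-1]:
                sections.append([])
        else:
            sections[-1].append(line)
    return [section for section in sections if section]


# Made-up sample: two topics separated by a fence.
chat = ["Sunny in Portland", "Raining here", "==========", "Feeling hopeful"]

for number, section in enumerate(split_sections(chat), start=1):
    print(number, section)
```

A title sandwiched between two fences ends up as its own one-line section, which may or may not be what you want.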

Apr 16 2020

Putting chat transcripts to work

I’ve been writing about how to get more from chat transcripts for a long time. And a recent count in Evernote shows chat history back to 2013 (105 of them containing notes from one-to-one conversations with learning partners). Recently I’ve found a couple of methods that make a chat even more useful.

I want to throw away the chaff and keep the wheat. (I’ll use a few lines from a recent chat in this example, showing only the things that I wrote; in most cases there would be postings from many people interleaved.) Here’s an example of what Zoom saves:

14:23:31	 From John David Smith : https://fullcirc.com/2020/03/31/moving-online-in-pandemic-5-this-is-the-time-of-creative-destruction/
14:26:15	 From John David Smith : Distributed leadership is MORE important!
14:27:52	 From John David Smith : Transformation steps…
14:27:52	 From John David Smith : ==========================
14:28:22	 From John David Smith : Being able to observe services…
14:29:47	 From John David Smith : Emotional impact of being in a breakout with someone from Bangkok

Here’s what I want to save:

JDS: 	https://fullcirc.com/2020/03/31/moving-online-in-pandemic-5-this-is-the-time-of-creative-destruction/
JDS: 	Distributed leadership is MORE important!
JDS: 	Transformation steps…
JDS: 	==========================
JDS: 	Being able to observe services…
JDS: 	Emotional impact of being in a breakout with someone from Bangkok

Putting the same information in a table (e.g., in a Google Doc) makes it much more useful:

Chat in a table

In a Google Doc several people might go through the chat and make comments, noting insights from different perspectives:

Chat in a table with comments after the fact

Sometimes I add a column to the table and gather comments in a third column:

Chat in a table with a 3rd column

To do all of this, currently I use a TextMate macro as follows:

(
  { argument = {
      action = 'replaceAll';
      findString = '^[0-9\:]*\t From ';
      regularExpression = :true;
      replaceString = '';
      wrapAround = :true;
    };
    command = 'findWithOptions:';
  },
  { argument = {
      action = 'replaceAll';
      findString = ' \: ';
      regularExpression = :true;
      replaceString = ': \t';
      wrapAround = :true;
    };
    command = 'findWithOptions:';
  },
  { argument = {
      action = 'replaceAll';
      findString = 'John David Smith: ';
      regularExpression = :true;
      replaceString = 'JDS: ';
      wrapAround = :true;
    };
    command = 'findWithOptions:';
  },
)

Not so pretty. And the formats for different chat systems have changed over the years. 🙁

And that won’t work for you if you don’t use my text editor. I’ve struggled to figure out how to share this technique of saving and formatting the wheat of a chat transcript. I found that the Unix sed utility, for example, is not standard across platforms when it comes to inserting the tab you need to plop the text you want into a Google Spreadsheet. Then lo and behold, Louis Sweeny, a member of the Liberating Structures community, figured out how to implement a snippet that I’d shared in the Slack space! Put the following snippet into a Google Sheet and copy it down as long as needed:

=regexreplace(Regexextract(REGEXREPLACE(B3,"^From ",),".*?: ")," : ",)

When you put a Zoom chat transcript in a spreadsheet, the above code cleans up your transcript and you have a table that you can cut and paste into a text file. Click here to get a copy of the magic spreadsheet to do this. It comes with instructions and everything!
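
For anyone who would rather not depend on a particular editor or spreadsheet, the same cleanup can be sketched in Python. This is not the macro or the formula above, just one possible equivalent. It assumes the Zoom line shape shown in the sample earlier in this post (time, tab, " From ", name, " : ", message); the names `ZOOM_LINE`, `INITIALS` and `clean_line` are made up for illustration.

```python
import re

# Assumed line shape, taken from the sample in this post:
#   HH:MM:SS<TAB> From <name> : <message>
ZOOM_LINE = re.compile(r"^\d{2}:\d{2}:\d{2}\t From (?P<name>.+?) : (?P<text>.*)$")

# Mapping of full names to initials; extend it for other participants.
INITIALS = {"John David Smith": "JDS"}


def clean_line(line: str) -> str:
    """Turn one saved Zoom chat line into 'initials: <TAB>message'."""
    match = ZOOM_LINE.match(line)
    if match is None:
        # Probably the continuation of a multi-line entry: leave it alone.
        return line
    name = INITIALS.get(match.group("name"), match.group("name"))
    return f"{name}: \t{match.group('text')}"


raw = "14:26:15\t From John David Smith : Distributed leadership is MORE important!"
print(clean_line(raw))
```

Lines that do not match the pattern are passed through untouched, so multi-line chat entries still need the bit of manual attention mentioned in the earlier post.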

2 responses so far

Mar 10 2020

A field trip agenda – for better meetings

I’m always inspired by Nancy White, and this effort to respond to COVID-19 is a perfect example of why.

It reminded me that I’ve been sitting on a “field trip agenda” that I developed to help people on the Shambhala Process Team hold better meetings on Zoom. I’m sharing it here, as is, since perfection is the enemy of the good. It has a good dose of Shambhala terminology, such as Ground, Path and Fruition (roughly, the context, the process, and the outcome of whatever you are talking about). It owes a lot to conversations with Susan Skjei and Liza Smith. They are the best thinking partners in my world.

You may find the following material useful if you imagine it as notes to yourself — what to do during a 90 minute training and demonstration session.

Overall Goal: (model setting a clear agenda)

  • Quick, easy boost for Shambhala conveners of all sorts
  • It also provides further resources (such as this document)

Agenda – 90 minute Schedule:

Model good practice, Invitation

(This is what I would send out to participants:)

  • Do you find yourself facilitating lots of online meetings without having received much training on how to do them?  Are you anxious about making a meeting really fun and productive? Have you ever struggled to keep people engaged during an online meeting? Have you experienced lots of silence or people talking, but not listening? (ground)
  • Join us for a 90 minute interactive training session that will explore design frameworks and techniques, do some exercises and demonstrate good practices for making meetings fun and productive. (path)
  • Join us on such and such a date.  This webinar will:
    • Build your skills to develop agendas and facilitate online meetings that lead to action.
    • Develop meeting designs that leave people feeling connected and wanting to connect more.
    • Put the technology (Zoom and some other tools) to work for you rather than being a distraction or an obstacle. (fruition)
  • Consider the time zones of your participants and send out an invitation that includes:
    • The topic and purpose of the meeting
    • Who’s invited (and why you should attend)
    • Start and stop time
    • The Zoom information and phone alternatives
  • Model good practice: Meditation – gathering the group and our minds 
    • Purpose: help people settle and allow latecomers to join without having to catch up.
    • Instruction: “Turn away from your computer…” 
    • How: Make it short. 3 minutes.  Share a screen explaining what’s happening.  Guided meditation is recommended so that anyone participating in audio-only mode is reassured that something is happening.
  • Model good practice: BRIEF Welcome 
    • Purpose: Welcome and explain the meeting purpose: “In this 90 minute session we are trying to model good practice, provide a framework for effective use of Zoom, and gather some useful tips”.  Could include an overview of the agenda.
    • This session will be recorded (as announced). In most meetings it needs to be a formal agreement step.  Recording would be shared within Shambhala but can’t be restricted. Note that recording to the cloud can become expensive very quickly. If you save the recording on your own machine you can upload it to YouTube for free and you have more control over what’s recorded.
    • We are taking notes in this Google Doc during the meeting.  Multiple participants can help and all should be able to view. Meeting notes template??
    • Important to assess people’s comfort with the tool (e.g., Zoom) by one of these:
      • Have people self-assess and ask for help beforehand if they need it
      • Assume level of comfort and follow up afterward if necessary
      • Poll in real time (good if the group has not met this way before)
      • Do people understand the difference between “Speaker View” and “Gallery View” — and can they swap when they want?
      • Do people know how to identify themselves (by clicking on their name on the lower-right-hand corner of their image)?
  • Model good practice: Check-in 
    • Purpose: get everyone’s voice & practice turning their microphone off and on
    • Announce an arbitrary sequence & call on people; or each person invites the next person to go.
    • Set and enforce time limits. On-the-spot decision.
    • Check-in topic should connect to the meeting purpose and prime the conversation — without being too heavy
    • Check-in question: “Name, Place, what kind of online meetings do you need to run?”
    • For large groups, check-in might be in a breakout
  • Model good practice: Breakouts 
    • Purpose: create small group conversations, give people more opportunities to talk and listen and get to know each other
    • Pose the question as simply as possible, then repeat it and broadcast it during the breakout
    • Timing 8 minutes
    • Groups of 3 (consider different sizes for different purposes)
    • Reconvening & segue 
    • Roles to consider in each breakout group: facilitator, timer, recorder, reporter
    • Question: “What has been your experience of online meetings? Share some highlights and challenges”
  • Model good practice: Chat Debrief: We’re going to use one way of using the chat for group debrief and later we’ll explore another way
    • Purpose: listen to everyone and create a group transcript that can be used later.
    • Wait to hit enter till everyone is finished writing, then all hit enter at the same time (use a gong to indicate “enter”), or go free form.
    • Read out some highlights & reflect.
  • Present: Some tips on meeting design 
    • Think of the meeting’s purpose and see how it sounds when you explain it to a partner.  Model good practice: Thinking back to the meetings that you talked about in your breakout session and chat, what were the different kinds of purposes of those meetings? Why were people coming together?
      • Divergent thinking (brainstorming)
      • Convergent focus on agreement or a specific outcome (decision-making)
      • The feeling of asserting cohesion or “we are a ____ group”
      • The sense of “presencing” (resting in awareness of possible future) is important
      • (Add other goals here…)
    • Meeting organizers need a partner; 
      • You can take turns speaking and managing technical details 
      • or have specific roles
      • Stick to the plan or improvise or respond to what comes up?
    • Model good practice: Write out a script or an agenda and share it with participants so they know where they are in the process.  That’s what this Google Doc is.
    • Model good practice: Use screen-sharing skillfully: briefly show a PowerPoint or Google Doc or even a video
    • Screen capture to create a group portrait: 
      • Can be a nice record of “being together”
      • Easy way of taking attendance
      • Remembering names and faces
  • Alternate between gallery and “speaker” (single-person) view – controlled by host and determines what’s recorded
  • Model good practice: Liberating Structures framework or discipline: how have we used these micro-structure design elements today? At each step, consider how to organize
    • a structuring invitation (what a sequence or an individual step is for, what participants can expect);
    • how the “space” is arranged and what materials are needed (“Hollywood Squares” vs. “Speaker mode” vs screen-sharing; using chat; shared Google Doc);
    • how participation is distributed (percentage of the time in different modes of participation — from passive listening, interacting, collaborating, gathering insights);
    • how groups are configured (single speaker, breakouts of different sizes, assigning specific people to specific breakout rooms, sequential speaking like a check-in, speaking all at once, guided meditation or individual note-taking); and,
    • a sequence of steps and time allocation (for the whole and for each chunk). 
  • Your role as facilitator 
    • Your contract with the group: you “protect the conversation” on behalf of the group; that gives you the right to exercise control.
    • “Reading the room” means paying attention to participation (verbally and in the chat), providing different ways for people to stay engaged, offering check points to express discomfort, and (sometimes) interpreting signals out loud.  Having a partner for this part can be extremely helpful.
    • The danger of being “too helpful” or playing the role of “summarizer” with unconscious power and unconscious biases.  Issues of projection. 
    • Mixing or switching between distinct roles (between facilitator, expert, elder, fool, etc.) can be confusing (to others and to ourselves).
    • Fundamental importance of mindfulness and transparency.
    • Any facilitator (or speaker) whose internet speed might be slow should consider using the phone audio option that Zoom provides illustrated in this screen-shot: 
  • Some other tools to use with Zoom:
  • Model good practice: Breakout / whole group summary sequence
    • Reflection: what are meetings you organize about?  What are we really doing here?
      • Some moments of individual reflection & note-taking (silence): think about an upcoming meeting that you will take part in or facilitate. What is the meeting’s purpose? Can you define it in 1-2 sentences?  
      • Pairs: Share your meeting’s purpose.  Is it clear? Does it make sense? Share one thing you would do to improve the next meeting
      • Groups of four: 1) share your purpose 2) your improvement goal and 3) how does that translate into an agenda? What choices might you make?
    • Back in the whole group. Themes & patterns in what you’ve heard. Feel free to speak or write in the chat or listen
      • Images that have come up
      • Unmute & talk 
      • This is a sense-making practice: Assessing where we are, gathering possibilities, imagining next steps
  • More about using Chats
    • Model good practice: Simultaneous responses and brainstorming organized around a specific question where everyone participates. Facilitators pull out a few themes coming up in the chat.
    • Zoom chat can be a private backchannel
    • Insert markers in the chat at the right time to indicate breaks or changes of topic or meeting format along the lines of:
      • ======================================
      • Core Values Brainstorm
      • ======================================
    • Process and publish the chat transcript afterwards.  Can be the basis of meeting notes or a communiqué.  Append to this agenda document.
      Techniques for making the chat an effective meeting record (have a designated note-taker)
    • Leverage the Zoom Chat transcript for sharing or further use
  • Model good practice: Check-out — everyone share:
    • My key learnings, next steps, suggestions.
    • Question: What’s one practice that I’m going to experiment with?

Some resources (annotated & shared during or after an event)

One response so far

Nov 22 2019

Reconnecting a community — in public

I sat down with my friend Howard Silverman to talk about prototyping methods for connecting a community to itself. The particular community of practice that I’m working with connects face-to-face (or doesn’t connect sometimes) but is also connecting via the Internet: via the Presencing Institute’s U.Lab MOOC, through a global network of similar meditation centers that are in conversations on Zoom, and through these short video interviews.

This word cloud gives a flavor of the 12 minute discussion.

Dec 05 2018

Community and organization intertwine

I’ve been thinking about how community and organization are intertwined, especially when they are interdependent like they are in churches, synagogues, or mission-driven organizations like Amnesty International.  The formation of a process team that’s focused on governance in Shambhala prompts me to write some of my thoughts down.

  • Community and organization are different social entities that represent different ways of participating in our human world.  One of them can do things that the other can’t.  For example, organizations can own assets like websites and other technologies, have payrolls, are bound by law, and have clear accountability. On the other hand, communities can be informal and don’t even exist unless we participate in them,  but they are personal and meaningful in ways that organizations rarely are. 
  • The two interact and we switch back and forth between an organizational and community view often without noticing.  I’ve thought about the interactions a lot and I get confused.  Your mileage will vary.
  • The most important point is that community and organization can support and augment each other, or can hobble each other.  Therefore it’s worth thinking about their interactions and how they are intertwined.

In his book on church governance, Dan Hotchkiss gives a simple and compelling argument for how organization and community interact:

The most important factor in deciding how to organize a congregation for decision-making is its size because no fact about a group of human beings says more about it than its size.

Dan Hotchkiss, Governance and Ministry: Rethinking Board Leadership (Lanham, MD: Rowman & Littlefield Pub Inc, 2016), p 99.

In the following table, I lay out some contrasts, describing how each side answers a general question like, “What is it?” In each cell I add in italics what that side can do to support the other.  Afterwards I give some examples where community and organization seem to harm each other.

| | Participating with an organizational perspective | Participating with a community perspective |
| --- | --- | --- |
| What is it? | An organization is a recognized legal person that has formal rules of operation. Can provide venues and infrastructure for community gathering. | A community is a history of clustered relationships and events. Community interactions can keep alive the memories and values that make organizations honest — and valuable to society. |
| What resources are required? | Organizations depend on a larger social or economic system, laws, money, and produce “outcomes”. Organization can scale or extend community. | Communities depend on the larger society — a fabric of social relationships. Community can validate organization life in (local) practice. |
| How are roles defined? | Organizational roles are contracted or appointed. Can recognize & formalize community values & provide focus. | Community roles evolve, are negotiated informally over time, are based on participation. Community provides a reservoir of committed talent; members return to community after serving the organization. |
| What separates the inside from the outside? | Can buy or sell assets; can sell or procure work externally. Can extend community reach or protect it from external threats. | Legitimate peripheral participation enables outsiders to experience community norms and values gradually. Can bring new vitality into an organization including people, ideas, and resources. |
| How does it visualize itself? | Organization charts and protocols codify relationships and power. Can simplify or close off intractable debates or conflicts that suck energy from the community. | Stories, events, memories are part of individual sense-making and are shared in community life. The community’s memory can keep an organization true to its purpose and its conversations can alert the organization to emerging needs. |
| How is communication organized? | Messages go through formal, legitimate channels. Can reduce the noise of community chatter and be purposeful about listening to widely separated perspectives. | Conversations are ad hoc, shaped by individual relationships, opportunity, and feeling of relevance. Can provide a grapevine that tells truths that illuminate organizational blind spots. |
| How does a collective “voice” express something to the world? | Formality enables “singing from the same song book.” Can gather a community’s message and broadcast it. | Shapes multiple, opposing voices into a dialog. Can add depth and breadth to an organization’s point of view. |

Here are a few illustrative stories of negative interactions. 

  • In a story about a young pastor who fired a church organist only to be fired himself, Dan Hotchkiss writes: “Informal networks kill silently, so it is not easy to retrace their steps.  No doubt Gladys, like most church staff members, had a political constituency all her own.  Her supporters did not speak up in the deliberations of the formal church — the first board meeting, where the focus was on her competence as organist.  In that setting, it would have felt out of place to speak of personal affection or the fact that Gladys had provided music for hundreds of funerals and weddings and had woven herself deep into the fabric of the church’s life.  But in the informal congregation, such considerations no doubt dominated the agenda. In this case it was the informal congregation whose priorities won out. Gemeinschaft is more important in small congregations than in large ones, but it never quite goes away!” — Dan Hotchkiss, 2016, pp 100-101.
  • Pedophiles in the Catholic church were bound to each other by codes of silence and enabled by an organization that provided a setting and cover for their activities.  When their activities were exposed and the organization’s complicity in the cover-up was also exposed, the cost to the organization was enormous.  We don’t know details of this story, but these elements must have been there and the costs were real.
  • When an organization must draw leaders from the community that surrounds it, recruitment can’t be just a formal process to fill leadership positions.  In a leadership development project with Juan Carlos de la Puente 2 years ago, we found that Amnesty International’s leadership recruitment and development process had become too procedural and rule-bound, using only “organizational” logic.  Their best leaders were deeply aligned with the AI community, but were also quite critical of organizational bureaucracy. We recommended that they treat leaders in Latin America as part of a community and become more purposeful about befriending prospective leaders to get to know them before proposing specific organizational roles.

If we participate in an organization or community that depends on the other way of participating, we need to alternate between the two perspectives.   I don’t think there is a formula for balancing community and organization.  You have to be there.

Jun 16 2018

cRaggy 2018: design, feedback & reflections

This blog post describes the cRaggy event  at the June 2, 2018 Cascadia R Conf, its design, the logic behind its design, feedback from participants and reflections on how such an event might be better in the future.

Here’s the pith of how we learn R: The R ecosystem is a marvel made up of a global cloud of people, their connections, their know-how, and their tools.  Learning in this ecosystem involves choosing between specific pathways: a place and time, with certain people, using specific boundary objects — in some reasonable sequence of steps.  The best instructional, event, or conference design results in increased excitement, inspiration, enjoyment, personal connections, and know-how.  In a way, the event pieces that we string together to make up a conference are just like a bunch of R statements that aren’t useful till we put them together with human intention, skill, and passion.

Down on the ground cRaggy started with not much more than the half-baked idea depicted in this drawing:

From the beginning I was thinking of cRaggy as a sequence of steps strung together to structure individual and collective experience, along the lines of liberating structures, with learning as the main goal.

The cRaggy design process was itself a string of collaborations with the conference organizing committee.  They helped by validating the original idea and by building on it to produce the final event. Chester Ismay and Ted Laderas, in particular, had lots of specific suggestions for datasets, which was a key element of the design.  Chester mentioned that Andrew Bray’s students were doing a lot with local, civic data; and one of Andrew’s suggestions was the BIKETOWN dataset, which was the one we ended up using. Chester also put me in touch with Thomas Mock, who’s been running the Tidy Tuesday events.  We borrowed a lot of ideas from Tidy Tuesday and email exchanges with Thomas were very helpful in evolving the final design.  Ted has written up some reflections about the overall conference design.

This year’s cRaggy event

We announced the cRaggy event in January, without very many specifics.  As the conference approached, we published a set of instructions for participants, calling it the cRaggy gRaphics show-and-tell.  Here is the super-simple form that people completed when they submitted their entry on the day of the conference:

cRaggy entries were all posted in one corner of the 360 person capacity room where the conference was held.  The beer and food were served in the same corner at the end of the day. People could stand around discussing the entries during the whole day.

The three entries that received the most votes gave a 5 minute lightning talk at the end of the day:

Design  to Balance Opposing Factors

During the design phase and on the conference day, I was aware that “design with social learning in mind” meant balancing two opposing forces.  This table suggests how those forces alternated, more or less in chronological order, as a kind of learning peristalsis.

Concentrate, constrain, narrow it down Open up, expand, broadcast
Gather design ideas and suggestions from many people to build on a half-baked idea
Announce the cRaggy event and then the rules early on
Identify hundreds of possible datasets that would be interesting.
Select one dataset that was local, topical, accessible, and the right size Dataset is highly “mergeable” with other datasets because it has “universal keys” (time and place)
Produce a minimal example demonstrating how to access the data Example is important for lowering the barrier to entry
Advertise the cRaggy dataset two weeks before the conference; encourage everyone to participate
Participants pose their own analytical question
Post entries in one corner before 9 am on conference day Last minute entries are acceptable
Entries have contact info, github link
Entries posted near the food & beer
Time in conference schedule to browse entries; everyone invited to vote
Each person has one vote to “hear more” about one entry
Sticky notes and authors available to stimulate conversations
Three submitters contacted to give lightning talks
Lightning talks at the end of the day to share backstory, dead ends, next steps
Follow up on Twitter: #TidyTuesday

Overall feedback from conference participants

In the conference feedback questionnaire, several people said that cRaggy was their favorite part of the conference.  Some said that the lightning talks they liked most were the cRaggy talks. One said, “I didn’t participate in cRaggy this year, BUT I LOVED IT! Please do it again!”

Feedback from cRaggy participants

I wrote to the twelve people who submitted an entry and got really thoughtful and interesting feedback from many of them.

Participants agreed that cRaggy was really fun.  Sample comments were:

 “It was a fun, no-pressure way to feel a bit more involved in the conference and see how other people approached the dataset.”

 “I can’t think of anything more fun than exploring data and creating visualizations.”

Participants especially liked the BIKETOWN dataset because:

 “[it] struck a wonderful balance of being interesting, big-but-not-too-big, in pretty good shape tidy-wise (but not perfect) and fun to explore.”

They liked the fact that the dataset had both dates and geolocation features, which made it “really easy to join up with other sets.”

Part of cRaggy’s value was that the dataset forced people to work outside of their usual professional domain.  For example, two different respondents said,

 “I work in anthropology, specifically archaeology, and so it was really fun to branch out to a very different kind of dataset that has time stamps in the minutes and not in the tens or hundreds of years.”

 “I am a transportation professional and found myself overthinking what to do with the data set a lot [and that was good].”

One participant summarized it,

 “… a big value in the event is exposing people to ideas beyond those directly relating to R code they might not come across otherwise.”

cRaggy was a way to encourage people to dive into the R ecosystem.  One participant was impressed with

 “… how helpful and active the R community is in Stack Overflow, GitHub, CRAN, Reddit, etc. In essence, I am super grateful of R’s passionate developers and user base (in real life and online).”

As a bit of an #rstats glutton, I was struck that one very interesting cRaggy entry was from someone who admitted that they weren’t even on Twitter!  Talk about diversity!

Suggestions for next time

The original idea was to share and think about graphics, but clearly participants thought it could go further.  They thought that cRaggy focused “more on presentation and communication than on coding and data analysis.”  Ed Borasky put his finger on the fact that voting missed thoughtful examination of data problems that weren’t as recognizable as flashy graphics.  He said

 “I spent a *lot* of time cleaning the data. See http://rpubs.com/znmeb/biketown.”

Other suggestions included:

 “It would be cool to easily see links to github repos from the other entrants.”

 “Switch to a virtual format – the “paste on the wall” thing really doesn’t cut it.”

Charlotte Wickham had several interesting suggestions:

 “It might also be nice to somehow celebrate the learning side of the event, i.e. each entrant must also provide a sentence describing something new they learnt or tried in the process of entering, that could be displayed independently of the actual entries.”

 “I’d love to see some more support for those who might be on the edge of entering.  I’m not sure what this might look like, but maybe a pre-conference hack event, a online forum (Slack or something), or just a few more people posting starts they’ve made or questions they’d like to answer.  I’d imagine the primary focus would be on encouraging people to post something on the day regardless of where they get to.”

 How can we keep the event approachable and comfortable for people across all sorts of skill levels?

We wanted cRaggy to result in the selection of people who would give a lightning talk, but participants thought that the voting could be improved.

 “I would suggest that voting NOT be publicly presented via stickers. I would use a ballot box or online kind of voting system that’s anonymous to the voters and participants. As a social network analyst, I would posit that there seemed to be a preferential attachment (i.e. “rich get richer”) effect with the stickers.”

 “Have more categories of winners, such as most creative, most artistic visual / graph, most last-minute (maybe), etc.”

I had thought of having different categories of votes, but never quite figured out the logistics.  In the heat of the conference (after all I was a participant first and an organizer second!) I even forgot to record the number of votes that each entry received.  Next time I would display the entry form in advance so that people would expect to provide additional information such as

  • How much time did you spend?
  • What was your question?
  • What packages did you use?
  • What did you learn?
  • What would you have done with more time?

Beyond that, the cRaggy idea could evolve by somehow mapping the steps people go through as participants to a model of the steps in a data analysis project, either Hadley Wickham’s model from R for Data Science or something along the lines of John Tukey’s (1982) “Introduction to styles of data analysis techniques” ( PDF) that proposes stringing data analysis steps together along the lines of:


Feb 12 2018

Computing on R

R is not just software.  It’s actually a global organism that grows information: insights, discussions, work methods, human relationships, and open questions as well as a massive amount of software and all the resources that document or support it.  Cesar Hidalgo equates growing information with “computing”, claiming, “It is the growth of information that unifies the emergence of life with the growth of economies, and the emergence of complexity with the origins of wealth.”

It’s a big deal that R code can compute on R code because R code is just as much a “first class thing” as the data we compute on. Both are first class objects, as Hadley Wickham points out in the rlang package documentation. But it’s an even bigger deal that the R organism computes on itself, as well.  Hidalgo explores how social structures process information, “We form social structures to compensate for our limited capacities, and these social structures learn how to process information.”  My argument in this post is that Hadley’s model for data analysis describes both computation on data and computation on the R organism. Trying to point to instances of social structure helps organize observations of how information grew at rstudio::conf in San Diego and how we are collectively learning how to process information.

Hadley Wickham's model applied to the R Organism

You can observe a lot just by gathering tweets. I didn’t create any persistent information during the conference. I just soaked it all in, waiting to write up some reflections until I got on the plane home.  Wanting to add detail, I downloaded Tweets with the rstudioconf hashtag.  In that batch, I found two cool efforts to gather and analyze Tweets that were produced during the conference that are worth reading:

When I looked at all the rstudioconf Tweets I had downloaded, I found many gems that had URLs in them and had been retweeted 10 times or more. (Here are the retrieval details.)  Here are some examples of people participating in the R organism “computing on itself” at rstudio::conf 2018:

Hadley Wickham's diagram with the names of people who compute on R

Import: Open, welcome.  Importing data can be hard work, right? Welcoming new people and bringing in new ideas is often not recognized as the important and challenging work that it is.  Mara Averick performs a valuable service on Twitter as @dataandme.  Her talk at the conference about contributing to the tidyverse unpacked the process of entering the organism and becoming a contributor to it. Marco Blume talked about using DataCamp to develop R skills and data literacy in everyone at his company who wanted to skill up.  But he was also bringing a new idea into the R organism’s conversation with itself: what does a social transformation around data literacy look like?  What makes it happen?  How do you assess progress?

Tidy: Configure.  Making your data tidy can be a big undertaking. “Tidy” is more or less obvious when you get there, but can be very challenging at the beginning of the process. In his keynote about deep learning and TensorFlow in R, JJ Allaire took a huge step toward making TensorFlow look like a native part of R. His talk included a set of tools, a gallery and a book.  It represented a massive effort to bring a whole new domain into “normal R.” And in a remarkable “no bullshit” fashion, he mentioned a recent paper that casts an interesting shadow of doubt over every data scientist’s valiant effort to “clean the data.” The paper, “Scalable and accurate deep learning for electronic health records,” suggests that at large scale “dirty but complete” data may have more useful predictive value than data that has been “reduced by intelligent cleaning”.

Transform: Reinterpret. Di Cook explored how to take the “traditional, ubiquitous tools” like the tidyverse and ggplot2 and connect them with another set of ubiquitous tools: randomization and replication.  She argued that graphs are just the result of calculations on data and so evaluating that output requires rigorous methods and more calculation. Here are her slides.  And you can use and build on a package (that she maintains) to do it yourself.  As if to remind us of the iterative nature of transformation & reinterpretation, Carson Sievert discussed graphs for exploration using a JavaScript library, referring to his book and R package, an approach that is more or less at the opposite end of the spectrum from Di Cook’s talk.

Visualize: Contextualize, metabolize.  Jenny Bryan’s workshop on “What They Forgot to Teach You About R” was really about helping you visualize how you work and think through how to make your work processes more orderly and rational. Rethinking a common tool like GitHub and adapting it to a data analysis use case is a work in progress. Jessica Minnier’s workshop notes were great!  I was really surprised to notice how much I just “jump outside of R / RStudio” to do stuff in the course of a project and how that makes my work so much less reproducible. Of course there’s a book about the GitHub part of the data analyst’s workflow.

Model: Simplify, standardize. Modeling, to use John Tukey’s words, is separating a fit from the residuals. The R community makes a great effort to name things well and (particularly in the Tidyverse) to have some consistency in package APIs.  Nicholas Tierney’s ePoster on his naniar package to profile missing data is a perfect example of taking a messy subject and wrapping it up in a neat package with a neat name and a good joke about Narnia thrown in.  Jim Hester’s talk “You can make your own package in 20 minutes” simplifies the process down to the bare minimum so that we can standardize code and are free to go through the Transform, Visualize, Model cycle again.

Communicate: Share, continue.  The emphasis on that “last mile” of communication and sharing results is something remarkable about the R organism.  It has always given me great confidence and it was much in evidence at the RStudio Conference.  One great example of that was Yihui Xie’s “Creating Websites with R Markdown and blogdown.”  Of course having enough extra compute power to crack a good joke every other slide is also personally inspiring to me.  Petr Simecek’s collection of all the conference slides organizes the whole thing and sets the ground for the next cycle of computing on R.  I already got a ticket for next year’s conference.
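
The tweet filter described earlier in this post (Tweets that contain a URL and were retweeted 10 times or more) can be sketched in a few lines of base R. This is only an illustration: the `tweets` data frame, its column names `text` and `retweet_count`, and the example rows are all made up for the sketch and are not the data I actually downloaded.

```r
# Hypothetical stand-in for a table of downloaded tweets.
# Column names and rows are assumptions made for this sketch.
tweets <- data.frame(
  text = c(
    "Slides from my talk https://example.com/slides",
    "Great keynote this morning!",
    "Workshop notes are up at https://example.com/notes",
    "New package announced: https://example.com/pkg"
  ),
  retweet_count = c(25, 40, 3, 12),
  stringsAsFactors = FALSE
)

# Flag tweets whose text contains an http or https link.
has_url <- grepl("https?://", tweets$text)

# Keep tweets that have a URL and at least 10 retweets.
gems <- tweets[has_url & tweets$retweet_count >= 10, ]

# Show the most retweeted first.
gems <- gems[order(-gems$retweet_count), ]
gems
```

With real data the same two conditions would apply; only the way the tweets are loaded would change.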

Recommended reading: Cesar Hidalgo. Why Information Grows: The Evolution of Order, from Atoms to Economies. Basic Books, 2015.  http://isbn.nu/9780465048991


Jun 12 2017

Feedback from the Cascadia R Conference participants

I helped organize the Cascadia R Conference on June 3, 2017.

About 190 people attended the all-day conference.  I volunteered to do the conference evaluation questionnaire and to analyze the results. We adapted a questionnaire that csv,conf,v3 had used and used a Google Form to gather feedback at the end of the day.  We got 59 responses, or 32% of the conference participants, which seemed like an encouraging response rate.

The week after the conference, the organizing committee (which included Chester Ismay, Jessica Minnier, Lilly Winfree, Oliver Keyes, Scott Chamberlain, Ted Laderas, and me) chatted about the open-ended responses and discussed how things went in our Slack channel.  That actually seemed like an excellent, informal sense-making strategy for when an entire committee is data-oriented. This post combines some of their reflections (without attributing specific comments or contributions to them) with a more systematic summarization of the data that I’ve done afterward.  Naturally I had to horse around with the response data in R and produce some graphs to depict what I thought was important. Ted Laderas wrote up his reflections on the whole project in another post.

In scheduling a full day’s sessions – on a Saturday – one of our goals was to bridge across communities, specifically geographic communities (north and south along I-5).  We succeeded beyond our expectations.  About a third of the respondents came from more than 50 miles away with a bunch of people from more than 100 miles away.

Respondents asked for more social time: we know but probably always need to be reminded that R users are very social. (That’s an essential ingredient of R’s secret sauce.)  Definitely the day’s schedule was action packed, which we thought was a good thing. But, as one participant said, “The lightning talks were kind of rushed”,  as they were supposed to be.  When asked what their favorite thing was about the conference, 22 participants said “Workshops!”:

  • Workshops – 22
  • Lightning talks – 10
  • Meeting people – 7
  • Keynotes – 7

Two days might provide more social breathing room (for everyone except the organizers).  However a two-day format might be a challenge for people driving to Portland from far away.  In the future we could consider having a full “pre-conference” day for workshops and a separate day for talks. (We didn’t really know how popular the workshops would be.)  The useR conference uses the “pre-conference” structure for workshops.  They had 15 minutes for talks in 5 different locations with 3 minutes in between to transfer between talks so our setup for the talks was in line with their practice.

In the future we should just say upfront that this is not a traditional conference.  “To keep the cost of registration low we sacrifice <this> and <this> and <this>.”  And one of those is that you are on your own for housing.  A longer conference might tempt us to try to deal with an “official conference hotel”, but that would probably become a big headache.

Mimicking other conference formats and organizations like useR or the csv,conf is an important strategy for a small group like ours.  Afterward we noticed that the Open Source Bridge has the volunteer thing figured out, for example.  Some of us are going to that to pick up tips and strategies.

Here are the histograms from the new R skimr package, which suggest that everybody thought the location was great, a few people had problems with wifi, and ratings of keynote topics and overall talk quality were very (but not completely) positive.

With really outstanding support from OHSU, rOpenSci, and RStudio we did pretty well.

Interesting that we had a good mix of R mastery — from beginners to masters.  People who identified themselves as 3’s or 4’s were unusually enthusiastic about the keynote topics.

Respondents who said they thought participating in the conference would lead to future collaborations were more likely to say they would be willing to help organize (or volunteer) or that maybe they would be.  The cross-tabulation is below.  This pattern is clearer when you look at a Tukey median polish with residuals in italics and the fit in bold.

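[Table not recoverable: median polish of the cross-tabulation, with columns for “Willing to help organize?”.  Surviving row labels: “Met people”, “I'm likely”, and “Column fit”.]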

Here is the code for fitting a median polish:

xtab <- xtabs(~ will_collab + help_org, data = feedback)
medpolish(xtab)
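
To show how the fit and the residuals of a median polish relate to the original table, here is a minimal, self-contained sketch. The counts below are invented for illustration and are NOT the conference responses; the row and column labels are also assumptions. It uses `medpolish()` from the stats package that ships with R.

```r
# Hypothetical counts (not the conference data): rows are answers to the
# collaboration question, columns are answers to the organizing question.
counts <- matrix(
  c(12, 6, 2,
     5, 9, 4,
     1, 3, 8),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    will_collab = c("Yes", "Maybe", "No"),
    help_org    = c("Yes", "Maybe", "No")
  )
)

# Fit the median polish; trace.iter = FALSE silences the iteration log.
fit <- medpolish(counts, trace.iter = FALSE)

fit$overall    # common value for the whole table
fit$row        # row effects
fit$col        # column effects
fit$residuals  # what the additive fit leaves unexplained

# The fitted part of each cell is overall + row effect + column effect.
fitted_part <- fit$overall + outer(fit$row, fit$col, "+")

# Fit plus residuals gives back the original table.
reconstructed <- fitted_part + fit$residuals
```

Large residuals mark the cells where the row and column answers go together more (or less) than an additive pattern would predict, which is the kind of pattern described above.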

Also, respondents who said they were early or in the middle of their careers more frequently said they would be (or maybe would be) willing to volunteer or help organize a conference in the future.

Some selected comments from the open-ended responses:

  • Amazing conference! love the one day thing, the schedule, and the location!
  • I think the cheap cost and central location really contributed to the crowd size.
  • It would be great to have a keynote about ethics in data science.

Final suggestions seemed to be all over the place:

  • Less talks; more interactive activities
  • Longer
  • Longer breaks
  • More lightning talks, fewer full talks!
  • More talks and less workshops
  • Fewer, more in-depth talks
  • Explicit tracks + more than 2 tracks, maybe? “R in Bio” / “Stat Computing” / ???
  • Have three workshop levels next time – total beginner, intermediate, and advanced.
  • Notify the food carts ahead of time; they were really unprepared to be so slammed, and I think giving them a heads-up would’ve helped everyone

Finally, here are the topics we asked respondents about:

  • Satisfaction with Registration
  • Satisfaction with Location/Meeting space
  • Satisfaction with Timing: Please rate the overall distribution of talk schedules and unstructured time
  • Satisfaction with Keynote topics
  • Satisfaction with Talk quality
  • Satisfaction with Conference agenda overall
  • Satisfaction with Quality of the wifi/internet access
  • Satisfaction with Snacks and drinks
  • Please add additional comments on the overall conference organization.
  • Where are you in your career?
  • On your way to R mastery, where are you now?
  • How many miles did you travel to get here?
  • Do you think you will form new collaborations as a result of attending?
  • Overall, the conference _________.
  • I considered the conference a _________.
  • I would like this conference to be held _________.
  • Are you able to help organize or volunteer next time?
  • If you’re able to help next time, please add your email.
  • What were your favorite parts of the conference?
  • What would you change or improve about the conference?
  • Please provide any additional comments below.

Being able to participate in the whole process, from hatching the idea to helping it actually happen, to participating in the whole conference and finally to thinking about what worked and what could be improved afterwards in this blog post was a great experience.  I’d happily do it again!

I’m even more convinced, if I needed to be, that the sociability and community-orientation that’s baked into R is profoundly important.  When we are struggling with a bit of code, gnarly data, or a graph that doesn’t quite look right, we tend to think of ourselves as working alone.  R provides many reminders that we are not alone and that data analysis is a social act.


Feb 06 2017

Dogged pursuit of data quality and use

Published by under pdxdata

Data Dogs notes – Jan 2, 2017 – https://www.meetup.com/Portland-Data-Science-Group/events/236078570/

Challenge question: describe three measurements that you are familiar with at each of these three levels 1) personal, 2) very large scale, and 3) in between.  For each level describe what’s measured, why, how, and who does the measurement.  How does it all add up to good or bad measures?

Group 1 report out

Personal examples:

  • Observing the health of a great aunt
  • Health data such as using a fitbit and syncing it with a phone
  • Tracking blood donations, keeping track of which arm is used. Noticing how being lazy about collection enters into it.

Large scale:  

  • Google in general.  
  • Noticing how adverts follow us around.  Avoiding them by using a VPN or going incognito.  All of it makes paranoia more admirable.

In between:

  • Have a customer that generates lots of data: so much that paper doesn’t work anymore. Trying an electronic version.  Find that clients collect data but don’t analyze it, don’t want to hear conclusions.  They really want to hear “their story” reflected.
  • Similar to “egoless programming,” it’s a challenge to try to disassociate yourself from the product.  So can look at the data.  
  • Example from the Kim Kaners paper: not collecting metrics because data can be used against you.  
  • Quote from a client: “You can’t trust the data because things have improved [since it was collected].”  If the business is going well, maybe you don’t need to improve?
  • Which is most frustrating: getting clear, clean data or getting clients to listen or use the data?
  • Like Web Server statistics.  You can control: what’s on the website, how to advertise.  Control what’s to be measured.  Where allow advertisements.

Group 2 report out

  • A difficult topic: “Usually people have some role in measurement. So psychology always has an impact. Feelings matter in measurement.”
  • Another topic: accuracy vs precision.  Measure something vs estimate.  Is every measurement a bit of an estimate?
  • Gripe about data we depend on but where we can’t influence measurement methods: people don’t reveal their methodology, even though it’s important.
  • Spent time on last: formalized process, change methods, be more precise, summarize.  Subjective data.  Tracking assessments of hires.  Categorical data.  Journal entries.  Different people using different codes.  Severity codes.  What does a 4 severity mean?
  • When you formalize the process, it seems like you spend a lot of time on the measurement.  
  • GitPrime.com: “Data-driven visibility for modern software teams.” A new githost company that sells you statistics about your developers.  Measuring programming productivity.  “Software teams have never had a shred of hard data to bring into discussions. Forward-thinking organizations are moving beyond subjective measures like tickets-cleared, and evaluating their work patterns based on concrete data. Knowing immediately when an engineer is stuck or spinning their wheels helps managers do their jobs in a way that has never been possible. Measuring things like churn, codebase impact, and true cost of paying down technical debt, allows engineering to demonstrate how they’re meeting the business objectives at hand.”  cdn2.hubspot.net/hubfs/2494207/Content/whitepaper/GP-DataDrivenTeams.pdf
  • Goodhart’s law: “when a measure becomes a target, it ceases to be a good measure.”

Group 3 report out

Personal:

  • Body weight is a big one.
  • Getting things done. Todo lists.  Tracked or not.  

Large scale:

  • Statistics around gun violence.  How get the data, understand it.  Measurement problems: there are laws against measurement!   Scraping the net for data.  Washington Post efforts to collect data about gun violence.  
  • Data about electronic components not shared.

In between:

  • Own data science project: sampling size. Counting criteria. What measuring  affects accuracy, calculations.  More samples means more accuracy but also more time gathering data.  Facing the tradeoffs gathering cell counts & brain samples. Counting standards. A project on dementia.  Looking through a microscope.  Cells picked out.  Candy turtles.  
  • Trying to get out of manually measuring (or manually entering data about) anything, mechanising all data collection.  Practices vary by industry.  In the electronics industry the unit is “defects per million”.  Different in a new ERP system.  
  • Economic incentive to do as little manual entry as possible: it takes time and money.
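
The sampling tradeoff in the notes above (more samples give a tighter estimate but cost more time) can be illustrated with the standard error of a mean, which shrinks with the square root of the sample size. This is a rough sketch under stated assumptions: the samples are independent, and the spread `sigma` and the sample sizes are made-up numbers, not figures from anyone's project.

```r
# Assumed spread of individual measurements (hypothetical value).
sigma <- 10

# Candidate sample sizes, each four times the previous one.
n <- c(25, 100, 400)

# Standard error of the mean for independent samples: sigma / sqrt(n).
std_error <- sigma / sqrt(n)

# Quadrupling the number of samples only halves the standard error.
data.frame(samples = n, std_error = std_error)
```

Strictly speaking this is about precision rather than accuracy: more samples reduce random error but do nothing about a biased counting standard.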

Group 4 report out


  • Measuring the amount of time spent executing a job during the day.  Getting at “what’s personal efficiency?” How measure it? Difficulties and vagaries of measurement.  Have to remember to measure it throughout the day.  Bottom line: billing clients with it.  
  • Anecdotal (“I had a bad Tuesday”) vs. data-driven decisions.
  • When business doubled, who is wondering about capacity in the factory?


  • How businesses handle item/master data is frequently messy.  People use a field in the system for very different purposes than what was intended or for what others use it.  More communication can lead to more standardization.  What a field “means” and how you are supposed to use it.  Variant uses of a field can also be clever.
  • An example of data variability reduction: Subway, Inc. studied supplier data provided by its franchisees.  They found that 80% of supplier data was inaccurate across the whole supplier chain.  Now they get purchase amount data from suppliers and push it to the individual franchisees to standardize reporting about franchise performance.

We had a general discussion about assessing causation when direct experimentation is impossible using the “Hill Criteria” that Roger Peng mentioned in a recent podcast: https://en.wikipedia.org/wiki/Bradford_Hill_criteria

Thomas E. Kida, Don’t Believe Everything You Think: The 6 Basic Mistakes We Make in Thinking

  • We prefer stories to statistics
  • We seek to confirm
  • We rarely appreciate the role of chance and coincidence in life
  • We can misperceive our world
  • We oversimplify
  • We have faulty memories

