Donato Ricci spoke with Pedro Miguel Cruz about his conception and practice of data visualization. This interview was conducted as an extension of seminar session #6, held on 15.05.14, where Cruz spoke about “Visualizing Complexity.”
Donato Ricci [DR]: Would you describe yourself as a computer scientist or a designer? Do you prefer one to the other? If not, why not? Or perhaps you consider yourself a storyteller?
Pedro Miguel Cruz [PMC]: That goes to the core definition of infoviz as an intersection of several fields. I started in Physics Engineering, so I had some training in solving analytical problems with analytical tools. But in the end, what I wanted was to solve concrete problems, with concrete tools and solutions. Concrete in the sense that I can point to an image and tell you “that’s the solution”. In informatics engineering, my background is more on an engineering perspective of computer science, so I cannot say that I’m a computer scientist. I often use solutions to computer science problems when building a visualization, but I rarely innovate upon those solutions (think of standard algorithms or computer graphics techniques; I have nevertheless used some solutions that could be considered new from a technical perspective in computer graphics and scientific visualization, but I don’t invest in making them work as general-purpose solutions outside of the context they were created for). So I see my reasonable knowledge of computer science and graphics more as a tool. Where I innovate, maybe, is in the amalgam of techniques that I bring to problems in different domains where they aren’t generally used.
I think it’s fair to be called a storyteller. The story comes from several choices that you make. First of all, if I’m working on a subject of my own choosing (rather than as a designer on another person’s project), then I’m making very direct editorial decisions about which aspects or issues of the subject I consider a priority, and about how to treat them in a way that best brings them to life for the reader/viewer. This involves decisions about which data dimensions to highlight and how much emphasis to give those dimensions by assigning them visual representations. Furthermore, the story is carried by the rhetoric of the visualization; mainly, the designer can push a more normative or explicit discourse than might otherwise be present through more standard approaches. I also make very concrete choices at the execution level in order to render such stories and discourses very clear. In this sense, design plays its role as a process of message clarification. You can say that I’m a designer and a storyteller. You can say that I’m a designer of stories (because the story is the shape that I give to a series of “events”, the data).
DR: How do you explain what you do to your family?
PMC: This is funny because if I go with “I do information visualization” they don’t have a clue. Even talking about data, design, discourses… But if I address what I do by talking about social or historical matters, and then tell them that I can “show” them so they can better understand, they pretty much get it (of course, after showing the examples). They understood my projects well enough that they could start asking really tough questions about them and their construction. And that I never thought would happen.
DR: Could we say that part of visualizers’ work is to observe and describe the shape of social phenomena? Do you have any particular strategy for exploring different visualization solutions that lead you to the most compelling solution? Is there always “one” solution that is the most compelling and how do you decide?
For example, the choice to disentangle a network from a graph to better grasp the data? In other words, how do you describe your way of working? Is it linked somehow to what Moritz Stefaner labels as bootstrapping?
PMC: Finding a shape for social or other phenomena, giving them form, is definitely part of a visualizer’s work. The strategy that I use depends on the nature of the project. If it has very concrete objectives defined, then there are often solutions used in the same domain that work pretty well and can be reused. Or often you just merge several solutions in the same visualization in order to have complementary views of the data. On the contrary, if the project is on a subject of my own choosing, I actively avoid using an approach that has already been seen for that problem, and for that matter, I also try not to build the core of the visualization around a standardized analytical graphical strategy for depicting the dataset in general. As for whether it is the most compelling, that is always an open debate. Even when choosing from a catalogue of standardized graphical methods, one can argue that several are equally effective. When working on my own material rather than an external assignment, I can at least say that the solution I choose is the one most compelling to me.
I’m not familiar with the bootstrapping strategy from Moritz, so I’m not sure I completely get it, but if it refers to building a set of basic visualizations to help you figure out the data, and then working on a next step to build a more elaborate, compelling one, then for sure. It’s part of the visualization building process to have at least basic static figures that illustrate the data for you. They point in directions where you could extract the most valuable narratives from the data, or tell you whether the visualization model that you are thinking of will accommodate that dataset. For example, several solutions won’t work with a very high number of data points, so you have to study how many you are displaying at the same time, or whether the extremely elaborate solution that you came up with to depict relations among data points will make the visualization unrenderable from a computer processing perspective (and unreadable, for that matter).
This brings me to another matter: details, execution and aesthetics. For me it is of the utmost importance to work every detail of the visualization with the same care that you would if you were tailoring a suit: animations should be smooth, transitions should feel natural, it should run smoothly in real time (if it’s 30fps, it’s 30fps, not 15), there is a layout and things should be aligned, choosing the typography is also crucial since it sets the tone of the visualization, there is a typographical grid, color theory exists and it should be used, etc. On the aesthetics side, I’m talking about clarity and elegance. Not every color is to be used in every visualization, nor every shape. I look for one of two things: either to show clear patterns from visual complexity, as if we had a synthesized unique visual form that portrays such patterns but where we can still search for each of its tiny constituent parts (I often feel that I’m working with textures and patterns, and trying to emphasize differences in their density, for example); or to apply a Swiss attention to complexity: it’s grid-based, it’s ordered, it’s elementary in its shapes, and it’s so much easier to achieve visual elegance through it.
But let me tell you a bit more about my approach: I research, research, research, because I’m always interested in making something new! I also rarely use straightforward approaches. Let’s put some salt in all those boring graphs, please. Even with network visualizations, I find myself half asleep because the graph itself is already an abstraction, and they so often have that same hairy ball aesthetic… I try to make the visualizations closer to what I think is the imaginary representation people intuitively have of a given dataset. That’s my favorite approach: e.g. if you have a person traveling from A to B, I don’t draw a static line from A to B. I make a person actually move from A to B. This takes me to the core of my approach: I try to build systems that react to data. You have agents/actors in the visualization and they react to data, while having other properties and constraints that are not data related, but more related to that imaginary that I previously mentioned. If those additional behaviors imply that they don’t always portray the data in exactly the same way, well, that just makes things more interesting. I create agents and behaviors, and I set them free, and there they are, feeding on their little world of data while enabling us to look at that same data just by observing their behavior. In this sense I often say that my approaches are nature-inspired.
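The A-to-B idea above could be sketched roughly as follows. This is a minimal illustration, not Cruz's actual implementation: the names `Traveler` and `simulate`, the `speed` parameter, and the straight-line steering are all assumptions. Each agent takes its origin and destination from the data, while its walking speed is a property that is not data related.

```python
import math
from dataclasses import dataclass


@dataclass
class Traveler:
    """Hypothetical agent: one person travelling toward a data-defined target."""
    x: float
    y: float
    target: tuple          # destination B, taken from the data
    speed: float = 1.0     # non-data property (assumed): distance per frame

    def step(self) -> None:
        """Move one frame toward the target, snapping to it instead of overshooting."""
        dx = self.target[0] - self.x
        dy = self.target[1] - self.y
        dist = math.hypot(dx, dy)
        if dist <= self.speed:
            self.x, self.y = self.target
        else:
            self.x += self.speed * dx / dist
            self.y += self.speed * dy / dist

    @property
    def arrived(self) -> bool:
        return (self.x, self.y) == self.target


def simulate(trips, speed=1.0, max_frames=1000):
    """Advance all agents frame by frame and return the positions for each frame.

    `trips` is a list of ((ax, ay), (bx, by)) pairs, i.e. the dataset.
    """
    agents = [
        Traveler(float(a[0]), float(a[1]), (float(b[0]), float(b[1])), speed)
        for a, b in trips
    ]
    frames = [[(ag.x, ag.y) for ag in agents]]
    while not all(ag.arrived for ag in agents) and len(frames) <= max_frames:
        for ag in agents:
            ag.step()
        frames.append([(ag.x, ag.y) for ag in agents])
    return frames
```

A renderer would draw one image per entry of the returned list, so that the viewer reads the data by watching the agents move rather than by looking at a static line. Richer non-data behaviors (avoiding other agents, wandering) would be added inside `step`.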
DR: More and more trans-disciplinary teams are set up to observe and explore complex social phenomena. It is my impression that the more designers are involved in these teams, the more they learn about analysis rather than about data visualization. Furthermore, the more they work side by side with other disciplines, the more the visualizations become standardized, so one could ask: “in data visualization, is innovation possible?” So then, what is innovation in data visualization?
PMC: Ahah! That is so true. When you have a large team, normalization reigns. There are so many insights that you want to guarantee that you can’t do it with a single strong, concrete graphical approach. In the end you are left with a series of standardized ones that can be used to decode all the discourses that were put in by that large team during the process, so that everyone is happy. Furthermore, trans-disciplinary teams are often not very keen on very bold approaches, or on approaches that they are not familiar with.
Of course there can be innovation in dataviz! But you need a vision. And you need to choose whether you are doing business reports, or visualization that is to be consumed by a very large audience in a short amount of time while engaging them. Naturally you cannot convey every vicissitude in the data, but you can open a door to the awareness of their presence, or directly to their exploration. There should be a vision that may or may not work, that may or may not change along the way with the team’s input, but that shouldn’t lose its essence. And I’m talking about a vision because that’s the thing that can create some room to innovate; otherwise innovation will be dragged down by the mighty forces of normalization. When innovating you take some risks, you are not sure that it will work, and you are well aware that some of the chosen solutions are not the best from an analytical perspective. But in the end: did you make the dataset more interesting than the dataset itself? Did you engage a large audience and create awareness of that dataset? As for the “what is innovation in dataviz” question, you can look at it in several ways. Take, for example, all the existing ways to make a network layout. They have been around for a long time, but only recently have we seen them synthesized. They are more often old than freshly cooked. Other times they were just buried in the past, but they were not new. But their application to a new domain problem? That might be new, and that might constitute the innovation per se. I hope that in the near future we will be able to address this question as naturally as we do for “what is innovation in graphic design?”. It’s about history, applications to concrete problems and philosophies. Now, we can also talk about approaches to data visualization as tendencies that shape the field and generate what we perceive as innovation in data visualization.
For example, you will have already noticed that I try to make my visualizations ludic and playful in order to engage the public. But this approach has clear gamification roots that I haven’t yet fully implemented. Full-featured gamification of dataviz (and I mean even very simple games) could be a major milestone in the field.
DR: A lot of these trans-disciplinary teams are dealing with the ‘second computational turn’ in social and cultural research. Here two main empirical approaches can be identified: Big Data as opposed to Digital Methods.
The first one focuses on very large datasets and is often characterized as ‘data-driven’. It is mainly exploratory in orientation, tending to identify patterns where the nature of the patterns recognized may seem of secondary importance compared to a general demonstration of the potential analytic capacity of this type of research.
By contrast, Digital Methods highlights the formatted nature of much digital data. It stresses that there is no such thing as ‘raw data’. In practice, data and analysis cannot be distinguished in any easy or straightforward way. In other words, it is to a large extent driven by research design (formulating good research questions, delineating source sets, developing a narrative and findings).
Is there, in your experience, any evident way of visually presenting these two ways of dealing with the data, the analytical one and the interpretative one? In my experience the Big Data approach often leads to an anecdotal use of visualization, while Digital Methods tends to “use” visualizations as argumentative devices. Does this make any sense to you?
PMC: Yes, I completely agree. You have some very poor uses of visualization (or sometimes no visualization at all) when dealing with some datasets and favoring only their analytical description (with some simple drawings, here and there…). We are talking about two different ways of working; it even seems that they employ different types of professionals. Of course, that “big data approach” will typically lead to poor representations. When choosing a narrative you have to apply certain constraints, perhaps leave some aspects of the data out of the picture, and not everyone likes to do this a priori. It may also be much harder to do, since the data can be less structured than the data used in the “digital methods approach”, and then it is harder to extract semantic contexts from it. What you can do is tag the data yourself, structure it yourself. But even this will have you making certain assumptions that not everyone is comfortable making so early in the process. Of course the visualizations in “digital methods” are richer, more visually elaborate, but less complex in the problem delineation. They say more to people. But let me tell you, sometimes you have certain subjects or data that will always be contextualized as “big data”, and these subjects shouldn’t be treated using the “digital methods” approaches: the complexity of the visualization may actually alienate the public you are trying to reach.
DR: Both in crafting an anecdote and an argument, the design choices implied in doing visualization lead to the notion of the researcher as an author. How do you make this role visible in your visualization, if it is possible? How, as an author, do you make your design choices in connection with your foreseen public?
Santiago Ortiz, in one of his workshops, asked the students to find a way to visualize two numbers, and they ended up with more than 40 ways, all equally valid. Is the notion of authorship, linked to a principle of responsibility, a way to escape the urge to evaluate visualizations in ergonomic terms, for example by the cognitive load or the speed of decoding information from their visual representation?
PMC: Of course it is possible. You have several author roles in the process of visualization: you choose a story (and extract it from the data), you choose an audience, and you even set up those additional visual cues that add to your discourse/story but are not necessarily in the data at all. This is visible in the visualizations that I do, because they have clear graphical and functional constraints that I embrace, and solutions that are purposely tailored to that dataset and would hardly work with others on different subjects.
Well, it seems you have more than 40 ways to represent two numbers, but are they equally valid considering your audience and story? If you take the semantic nature of those numbers into account, you will see they are not. So here you are already making choices, and what remains to be known is whether the choice of story and audience is also yours.
As for relating authorship to the lack of scientific analysis of the perceptual effectiveness of a visualization, I do not think it should be seen like that at all. I can say that those types of empirical evaluations are not my approach, but they are needed, and I certainly base my own work on the knowledge from others’ analyses. The important thing here is not to discard all that we already know about the rules for the perceptual decoding of information on the pretext that we don’t want to compromise our “artist’s integrity”… First things first. Information first. You are not an artist, you are an author of a very well defined discourse that you want to communicate, a discourse that you logically built from a dataset, and the author of a design that clearly and effectively communicates that discourse.
Pedro Miguel Cruz is an independent designer and is currently enrolled as a PhD student in the Doctoral Program for Information Science and Technology at the University of Coimbra in Portugal.