Data Visualization in the Petabyte Age

Paulina M. Paiz
11 min read · Feb 17, 2021

In 1758, Carl Linnaeus, also known as “the founder of modern taxonomy,” was exploring the best ways to deal with information overload. As a botanist and zoologist, he used tables and charts to organize and classify the “confusio rerum” of the data he collected. Not long after that, Alexander von Humboldt, a geographer, and William Whewell, a mathematician, decided to leave the columns and the charts in favor of a more horizontal science that would unveil patterns hidden in the data. Linnaeus, Humboldt, and Whewell all used visualization to try to contain information. However, as in most cases, visualizations actually fueled further production of information. Many scientists and academics today are well aware of this phenomenon and thus support the partial, if not total, abandonment of visualization practices in favor of augmented analytics. Data visualization, they argue, is not useful in this petabyte age, where the amount of information and relationships is simply impossible to represent in a model.

As we move away from models of knowledge production that strive to find causation, and towards algorithms of correlation, it is true that traditional information visualization might take a back seat. Yet Google still claims that “Data visualization has arguably never been more in-demand.” Why is that? Scientists, designers, and academics have found that the uses of data visualization range from helping us understand what goes on inside algorithms to increasing citizen engagement and sometimes even producing works of art. In this paper, I hope to explore how data scientists and producers of information graphics have adapted to using Big Data and how data visualization is changing vis-à-vis other trends in the data space.

Ironically, data visualizations are stepping in to help smooth the transition towards augmented analytics. It seems that we are not yet ready to trust algorithms and statistics to make the decisions for us. We now seek to organize and comprehend not only nature (like Linnaeus, Humboldt, and Whewell) but also the ways in which algorithms manipulate nature. In fact, computational techniques and procedures are often compared to a “black box” because we cannot understand the ways in which they act on the information we input. As a result, there is a lot of talk about explainable artificial intelligence (XAI), or human-interpretable machine learning. Data visualizations are instrumental in helping advance these kinds of technologies. For example, Google’s “What-If” tool allows people to visualize inference results and showcases some of the biases behind machine learning. Graphical representation is also valuable in the process of correcting inaccurate observations or data points made by machine learning systems. One of the key visualizations that already allows programmers to understand and correct their ML code is “The Embedding Projector.” Going forward, the cognition-oriented perspective of Big Data predicts that we can expect to see data visualization mediating our interactions with data and enhancing the interpretability of algorithms.

It is important to keep in mind, though, that data visualizations, although easier to understand, are not free of error and bias. As Alexander Galloway asserts, “Any data visualization is first and foremost a visualization of the conversion rules themselves, and only secondarily a visualization of the raw data.” The concept of raw data is interesting to explore when discussing error and bias because it implies neutrality and objectivity, while in reality there is a growing literature arguing that “data is anything but raw.” Seeing raw data as an oxymoron is helpful in understanding how nature is transformed in each layer of knowledge production. If data is not a direct representation of nature, then data visualizations and algorithms cannot be either. The collection of data, as well as the production of visualizations, carries with it a set of human decisions that in many ways modify nature. For instance, French statisticians in the 19th century realized that people cannot easily be reduced to statistics because so much, such as culture and context, is lost in the process. In a similar way, it is difficult to assert the validity of data visualizations considering that designers decide which information to make relevant and which to discard as noise. The design of visual communications is usually done with (often causal) relationships in mind, and there are many ways in which such designs can be misleading. The academic and scientific communities should give more careful thought to the ways in which they can correct for error and bias in design, considering how many people currently rely on this medium for information.

Data visualizations are powerful tools of persuasion and communication. They give information shape and form, inviting it into the world of reality. In a recent report, Stephanie Evergreen, a famous graphic designer, said: “If seeing is believing, then visualizing data can exacerbate the fallacies perpetuated by questionable statistics such as spurious correlations.” There is a need to further explore the relationship between objects and data as well as the implications of “data-as-objects” having mass and momentum. Colin Ware, the Director of the Data Visualization Research Lab at the University of New Hampshire, sees the materiality of data as a potential danger: “The problem is that once data is represented as a visual object, it attains a kind of literal concrete quality that makes the viewer think it is accurate.” It is also dangerous to look at data as material because that implies it cannot be molded, when in reality the data visualization community is moving towards seeing its work as drafts that are open for evaluation and improvement.

Fernanda Viégas, from Google’s Big Picture Lab, believes “Criticism through redesign may be one of the most powerful tools we have for moving the field of visualization forward.” The idea of engaging in continuous redesign and enhancement is not novel. Both Humboldt and Whewell were well aware of the imperfections of their tableau and cotidal maps, respectively, and they acknowledged the need for seeing things as works in progress. Now we have better tools and digital platforms to promote collaboration among designers: IBM’s “Many Eyes,” Tableau Public, and Gapminder are just a few examples. On top of this, many of these platforms take advantage of crowdsourcing, so they allow anyone to chime in and correct or add to someone else’s work. This culture of collaboration within the field of data visualization will continue to expand because of two important trends: open data and civic technoscience. With the move towards open data, we can expect to see more people participating in the design and production of data visualizations, not only because they will have access to more information but also because the “need to analyze and visualize this information will be more common.” The second trend, civic technoscience, is defined as “the practice, research, and design space that enables each of us to question the state of things around us and to share that information for public good.” As cultures of civic technoscience become more prevalent, the need for simple tools to create data visualizations is increasing. In order for individuals with no scientific or design background to participate in critical making, it is important that virtual collaboration platforms are established to further the development of graphical information production. Moreover, data visualization specialists should share their expertise online so that it is accessible to those who want to contribute to the scientific process.

The ability to graph used to be a display of status and intelligence. What will happen when everyone can do it? One thing is for sure: more people will move closer to the first layers of knowledge production, gaining the ability to conduct rigorous analysis for themselves. By giving the everyday man or woman the power to map and mold the way he or she sees the world, data visualization will change the power landscape. Gone will be the days in which governments and corporations display statistical graphics to govern and control. We will be entering another type of Golden Age for statistics in which visual tools empower citizens to be more informed members of their community who contribute to the development of scientific analysis as well as the advancement of democracy.

The appeal of data visualization and its effectiveness in engaging people is nothing new. In creating his tableau, Humboldt said that “By speaking both to our imagination and our spirit at the same time, a physical tableau of the equatorial regions could not only be of interest to those in the field of physical sciences but could also stimulate people to study it who do not yet know all the pleasures associated with developing our intelligence.” Whewell recognized the advantages of the graphical model just as well. He claimed that “order and regularity are more readily and clearly recognized, when thus exhibited to the eye in a picture, than they are when presented to the mind in any other manner.” And this seems to be backed by data. The Danish physicist Tor Norretranders did a study called “Bandwidth of the senses” in which he converted the bandwidth of the senses into computer terms and found that sight is the fastest sense in terms of information processing (having the bandwidth of a computer). It is interesting to see all the ways in which people are taking advantage of this powerful medium of communication and continuing to innovate with it.

With the introduction of computers and digital interfaces, data visualization is changing from having observing subjects to having active users. The concept of data exploration is being completely revamped with graphics that are touch-friendly and interactive. The designer is no longer the sole commander of information: now people have the ability to play with data and models. Although this will help in removing some of the biases that designers weave into their visualizations, it will also introduce new kinds of problems because people will modify the visualization to suit their needs, which do not always reflect the facts. More research must be conducted to guide both designers and users towards better manipulation of data visualizations. Another way that data visualizations are being transformed in the 21st century is through virtual reality. This technology, which is already being used by Microsoft Business Intelligence, allows users to interact with data visualizations on a whole new level. It would not be surprising to see gamification more widely implemented in visualizations now that users can fully immerse themselves in them. Designers will continue to explore and invent ways to smooth the transmission of information, enhance analysis, and improve retention by taking insights from psychology and neuroscience. Already, there is a growing number of “User-Experience Designers” who are tasked with organizing information so that users have a more seamless experience. As the attention economy continues to pervade almost every industry, UX designers will have to find new ways to make their visualizations more appealing to the general public.

Information graphics designed for the general public and about the general public have a history of helping people understand how they fit into society. In the 19th century, U.S. Army officer and doctor Jedediah Hyde Baxter created a map of hernia prevalence that “crafted an image of a nation.” By blending medical knowledge with political knowledge in an information graphic, he helped make the American population “legible.” About 100 years later, statisticians looking to quantify the nation used information graphics to “canvass a ‘typical’ community” and identify “The Averaged American.” It can be argued that information graphic designers have just as much (or perhaps even more) “creative potential to define societies” as statisticians because they transcribe the numbers into visuals that have the ability not only to reach wider audiences but also to evoke deeper thought and feelings. By being both a science and an art, information visualization can help advance questions surrounding the individual’s role in a societal context and what it means to be human.

Trevor Paglen is an American artist who has explored these kinds of questions at length by using his work to blur the lines between science and art. In a current exhibition at the Smithsonian American Art Museum titled “Machine Vision,” Paglen invites his audience to see through the eyes of computer programs and algorithms. What concerns Paglen is the transition from human-seeable to machine-readable images, which are not only processed by computer programs but also usually presented in digital interfaces. He believes that our increasing reliance on digital information and visualizations is leaving humans with less agency in the knowledge production process and machine images with more power to redefine humanity.

Data visualization is taking a new form in the world of art, where they refer to it as “artistic visualization.” Two prominent designers in the field, Fernanda Viégas and Martin Wattenberg, define artistic visualizations as “visualizations of data done by artists with the intent of making art.” This unique approach allows artists to avoid the scientific burden of the search for truth and instead focus on aesthetics and expression. Even researchers at Google have explored this kind of approach. In a recent study about visualizing pollution, a group of them said: “We tried to take it as far as we could away from pie charts and line graphs. We wanted to make a piece of art, something that would capture people’s attention and get the data in front of a new audience.” Unfortunately, there are often tradeoffs between a visualization’s informational accuracy and its aesthetic appeal. Andrew Vande Moere and Andrea Lau are two scholars who have noted this tension between the informational and aesthetic uses of data visualization and have created a matrix to illustrate it (see figure). Producers of data visualizations are not blind to this tension. Even in the early 1800s, scientists like Humboldt recognized that “A drawing that by nature is bound to respect scales cannot be done in a very picturesque fashion: all the demands of geodetic precision are contrary to this.” Nevertheless, Fernanda Viégas and Martin Wattenberg still think it is worth asking what kinds of insights the artistic community can contribute to the field of information visualization, given that in the end, it is a field rooted in interdisciplinary tradition. They suggest that designers look into how artists advance a particular point of view and into tools for making persuasive graphics. To put it simply, “Traditional analytic visualization tools have sought to minimize distortions, since these may interfere with dispassionate analysis… Should data visualization researchers investigate ways to support making a point, as well as disinterested analysis?” That is a question that is definitely worth asking in a world where the objectivity of facts is continually being questioned.

But even if it is not through art, the data visualization community must continue searching for ways to innovate and adapt to Big Data and emerging technologies. According to Gartner’s Hype Cycle, which measures the maturity of technology and its applications, data visualization is already on the plateau of productivity. Although this means that we are taking advantage of this tool, we have to be careful that this does not become a plateau in the sense that innovation slows down and the practices behind data visualization stop improving. Research suggests, however, that this will not be the case: “Historians of science have shown how scientific ideas, theories, and models have historically developed along with the techniques and technologies of imaging and visualization.” Going forward, it is important for more academics and scholars to join in the analysis of data visualization’s applications and societal effects, given that current research is mostly spearheaded by the designers themselves. As people find uses for data visualization beyond complexity reduction, such as understanding the black box of algorithms and producing art, it is safe to say that data visualization is here to stay.
