PhenoPlot employs many visual elements, such as differently sized, coloured and structured objects, to represent multiple dimensions independently of XY coordinates. Supplementary Table 1 lists all PhenoPlot elements that the user can choose for plotting depending on the features measured. As with other visualization tools, such as heatmaps and star and facial glyphs, PhenoPlot requires data scaling. In the example shown (Fig. 1a), the cell body, nucleus, and perinuclear regions are represented using ellipses. The length and width of each of these objects are represented as the major and minor dimensions of the ellipse, respectively. Dimensional variables (that is, length and width) should be scaled together to a 0.1–1 interval to maintain the aspect ratio between different dimensions and implicitly represent additional dimensions (for example, cell width-to-length ratio). In Fig. 1a, the number of nuclei is plotted as subcircles within the nuclear ellipse. The relative area of cell protrusions, such as lamellipodia, is represented at the top of the cell as a half-ellipse whose major dimension is proportional to the relative protrusion area (Fig. 1a). Intensities of the cell, nucleus and perinuclear regions are represented by mapping average intensity values of fluorescent markers to different colour hues.
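The joint scaling of dimensional variables can be illustrated with a minimal Python sketch. It is not the PhenoPlot implementation: the function name `scale_together`, the feature names and the measurements are hypothetical, and the only assumption taken from the text is that length and width share one linear transform onto the 0.1–1 interval.

```python
def scale_together(columns, low=0.1, high=1.0):
    """Scale several same-unit variables with one shared minimum and maximum.

    `columns` maps a variable name to its list of raw values. A single
    global minimum and maximum (rather than one pair per variable) means
    every variable goes through the same linear transform.
    """
    all_values = [v for values in columns.values() for v in values]
    lo, hi = min(all_values), max(all_values)
    if hi == lo:
        # Degenerate case: all values are identical, so map everything to `high`.
        return {name: [high] * len(values) for name, values in columns.items()}
    span = hi - lo
    return {
        name: [low + (v - lo) * (high - low) / span for v in values]
        for name, values in columns.items()
    }


# Hypothetical measurements (arbitrary units) for three cells.
raw = {
    "cell_length": [40.0, 100.0, 70.0],
    "cell_width": [10.0, 50.0, 40.0],
}
scaled = scale_together(raw)
```

With these illustrative numbers the shared transform reduces to division by 100, so width-to-length ratios are unchanged. In general the transform includes an offset, and ratios are then preserved only approximately. Scaling each variable with its own minimum and maximum would instead stretch every variable to the full interval and distort the apparent shape.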
To increase the number of dimensions that can be represented in PhenoPlot, we devised the concept of ‘Proportional Filling’, which exploits the principle of visual closure, whereby humans can easily perceive the value of a partially filled object (ref. 11). Given a variable scaled between 0 and 1, we represent the feature using a glyph and the value by filling part of the glyph in proportion to the variable value with a specified symbol or colour. For example, if we measure the neighbour fraction (NF), that is, the fraction of the cell border that is in contact with other cells, then we can represent NF as the fraction of the cell ellipse border that is thickened or overlaid by a symbol (Fig. 1a). Other representations include the proportion of the cell ellipse that is filled with a symbol, which can be used to represent cellular texture, the number of mitochondria or the number of vesicles (Fig. 1a). Similarly, the proportion of the nucleus ellipse filled with a symbol can be used to represent nuclear texture. We also added three organelle glyphs (ellipse, rectangle and line), where the height of the filled portion of the organelle is proportional to the variable value (Fig. 1a,b). These organelle glyphs can be used to represent an organelle intensity, quantity or texture. In total, eight features are provided that exploit proportional filling.
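The arithmetic behind proportional filling can be sketched in a few lines of Python. The helper names `filled_arc_degrees` and `filled_height` are hypothetical and do not come from PhenoPlot. The sketch also assumes that the border fraction can be approximated by an angular fraction, which is exact for a circle but only approximate for an elongated ellipse, where equal angles do not span equal arc lengths.

```python
def _check_unit_interval(value):
    """Reject values that have not been scaled to the interval [0, 1]."""
    if not 0.0 <= value <= 1.0:
        raise ValueError("value must be scaled to the interval [0, 1]")


def filled_arc_degrees(value, start_deg=90.0):
    """Return the (start, end) angles of the border arc to thicken.

    A value of 0 gives an empty arc and a value of 1 covers the full
    360 degrees of the glyph border.
    """
    _check_unit_interval(value)
    return start_deg, start_deg + 360.0 * value


def filled_height(value, glyph_height):
    """Return the height of the filled portion of an organelle glyph."""
    _check_unit_interval(value)
    return value * glyph_height


# A neighbour fraction of 0.25 thickens a quarter of the border.
nf_arc = filled_arc_degrees(0.25)
# An organelle value of 0.5 fills half of a glyph that is 2 units tall.
organelle_fill = filled_height(0.5, 2.0)
```

Rejecting out-of-range input keeps the glyph honest: a value outside 0 to 1 would otherwise overfill the glyph or draw a negative extent.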
PhenoPlot allows the customization of different element colours and line styles and the specification of cell positions in a two-dimensional plane. Importantly, many PhenoPlot elements are colour independent, which increases its usability. A figure legend will be drawn automatically using the user input for feature names. Figure 1b shows the appearance of PhenoPlot elements representing different values for 15 variables (Supplementary Table 2). Unlike other visualization methods such as bar charts (Supplementary Fig. 1a), heatmaps (Supplementary Fig. 1b), star glyphs (Supplementary Fig. 1c and Supplementary Table 3) and Chernoff faces (Supplementary Fig. 1d and Supplementary Table 3), PhenoPlot represents particular cellular features intuitively (for example, cell shape, texture features or nuclear morphology).
Profiling breast cancer cell line morphology with PhenoPlot
To demonstrate the utility of PhenoPlot, we generated PhenoPlots to describe the phenotypes of 19 breast cell lines, which are predominantly derived from human tumours (Supplementary Table 4). For each cell line, nuclei and cell bodies were fluorescently labelled, fixed, and imaged by confocal microscopy (Methods). Nine features were plotted for each cell, including the length and the width of the cells and nuclei; the area of cellular protrusions; NF, which measures the fraction of cell border in contact with other cells; cellular ruffliness, which reflects the irregularity of the cell border; and the cellular and nuclear textures, which describe the distribution of pixel intensity in these regions (see Methods). Hierarchical clustering was used to group cell lines with similar morphologies into five clusters (Supplementary Fig. 2). We used PhenoPlot to visualize the average measurements for each cluster and produce intuitive representations based on the measurements of 155,811 cells (Fig. 2a, top row). Using PhenoPlot, we are able to better visualize aspects of cell morphology that are otherwise difficult for the human observer to appreciate. For example, the PhenoPlot of cells in cluster 1 shows that they are round, poorly spread, have high NF, low nuclear texture index and do not form protrusions (Fig. 2a). In contrast, the PhenoPlot of cells in cluster 2 shows that cells have extensive ruffles, low NF and high values of cellular and nuclear texture index. On the basis of the high value of protrusiveness, ruffliness and texture, we infer that the cells in cluster 2 are likely to be highly motile. This notion is consistent with the fact that Hs578T and MDA-MB-157 cells are derived from metastatic breast cancer and are known to be invasive (ref. 12). PhenoPlot shows that cells in cluster 3 are far less ruffly and textured and have higher NF than cells in cluster 2, suggesting that they are less motile.
On the basis of their PhenoPlots, cells in cluster 4 appear to have an intermediate phenotype between clusters 1 and 2, while cells in cluster 5 seem to be similar to cells in cluster 3, but less spread. Thus, PhenoPlots provide effective and intuitive pictorial representations of cellular phenotypes that allow the interpretation of quantitative results and their relation to cellular images.
Discriminating between phenotypes of different clusters and making inferences regarding underlying biological processes is challenging when using either images of a ‘representative cell’ (which is a cell with features closest to the average of all cells in the cluster), or images containing many cells (Fig. 2a, middle and bottom rows, and Supplementary Fig. 3). For example, cells in clusters 2 and 3 appear to have similar large, spread, flat shapes as determined by raw images (Supplementary Fig. 3), even though cells in cluster 2 exhibit far more ruffles than cells in cluster 3 (Fig. 2a, top row, and Fig. 2b,c). Moreover, it is difficult for humans to appreciate from raw images that cluster 4 cells are the most ‘textured’ of all cells in the data set (Supplementary Fig. 3). It is also difficult for humans to appreciate the relationships between variables using raw images. For example, cluster 2 and 3 cells are both spread, but the ratio of ruffles to protrusiveness is very different (Fig. 2a, top row).
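The notion of a ‘representative cell’ can be made concrete with a short Python sketch. The text states only that this cell has features closest to the cluster average. The use of Euclidean distance on scaled features, the function name `representative_index` and the example values are assumptions made for illustration.

```python
import math


def representative_index(cells):
    """Return the index of the cell closest to the cluster mean.

    `cells` is a list of equal-length feature vectors, assumed to be
    scaled already so that no single feature dominates the distance.
    """
    n_features = len(cells[0])
    mean = [sum(cell[j] for cell in cells) / len(cells) for j in range(n_features)]
    # Euclidean distance of each cell to the mean feature vector.
    distances = [math.dist(cell, mean) for cell in cells]
    return distances.index(min(distances))


# Hypothetical scaled features for three cells in one cluster.
cluster = [
    [0.2, 0.9, 0.1],
    [0.5, 0.5, 0.4],
    [0.8, 0.4, 0.7],
]
idx = representative_index(cluster)
```

Here the mean feature vector is approximately (0.5, 0.6, 0.4), so the second cell is selected. Even so, a single image of that cell cannot show how widely the remaining cells vary around the mean.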
By comparison, typical visualization methods such as heatmaps or bar charts are not intuitive representations of phenotypes and are not easy to relate to cell images. For example, heatmaps represent the variables using colour shades of boxes (Fig. 2b), but these boxes do not reflect the visual appearance of the feature. Thus, it is difficult to picture how cells look from a heatmap, especially when many dimensions are displayed. Although bar charts are effective in identifying differences between the values of a few variables, it is difficult for the analyst to interpret a biological phenotype from this representation (Fig. 2c). Furthermore, it is difficult to understand the relationship between variables using heatmaps or bar charts, because features are compared individually.
PhenoPlot is a flexible visualization method
Like other glyph-based approaches, PhenoPlots are independent of XY coordinates. This makes PhenoPlot a flexible tool that can be combined with other visualization methods. Furthermore, extra dimensions can be visualized using the position of PhenoPlots in a two-dimensional plane. For example, projecting PhenoPlots of average measurements for the different breast cell lines onto the first two principal components (PCs) of the data facilitates the identification of phenotypic similarities and differences between cell lines (Fig. 3a). Figure 3a shows that cell lines on the left-hand side have epithelial-like shapes (low protrusiveness, less spread, and high NF), cell lines on the right-hand side have mesenchymal-like shapes (highly protrusive and ruffly, more spread, and low NF), while cell lines with intermediate morphologies are in the middle. Moreover, interesting relationships can be easily identified from this representation. For example, mesenchymal-like cell lines have higher nuclear texture than epithelial-like cell lines except for MCF10A and SUM159, and some of the cell lines with intermediate morphology have increased nuclear texture values. This observation can trigger further experiments to investigate the nature of nuclear texture differences between epithelial and mesenchymal phenotypes. By contrast, a typical scatter plot provides no information on the nature of differences between cell lines (Fig. 3b). Thus, PhenoPlot is a flexible method that can assist data analysis and identification of new hypotheses and complement other analysis and visualization techniques.
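The positioning of glyphs in principal component space can be sketched as follows. This is a generic PCA computed with NumPy's singular value decomposition, not the procedure used to produce Fig. 3a. The function name `first_two_pcs` and the feature matrix are hypothetical.

```python
import numpy as np


def first_two_pcs(features):
    """Project rows of `features` onto their first two principal components.

    Each row is one cell line and each column one scaled feature. The
    returned array has one (PC1, PC2) coordinate pair per row, which can
    serve as the XY position at which that cell line's glyph is drawn.
    """
    X = np.asarray(features, dtype=float)
    centered = X - X.mean(axis=0)
    # Rows of `vt` are the principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


# Hypothetical average profiles: 4 cell lines x 3 scaled features.
profiles = np.array([
    [0.2, 0.9, 0.1],
    [0.3, 0.8, 0.2],
    [0.8, 0.2, 0.7],
    [0.9, 0.1, 0.9],
])
xy = first_two_pcs(profiles)
```

Because the glyph itself carries the feature values, the two plot axes remain free to encode similarity between cell lines. A plain scatter plot of the same coordinates shows only that cell lines differ, not how.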