Customizing the Sankey Diagram

dave.metza · September 4, 2018, 4:01pm

Hi - I am looking for a way to visualize the flow of data through a chronological sequence of categorical axes. Specifically, I want to show the number of people (measure) as they move through various locations (dimensions) from a start point to a finish point. I was able to get basically what I want with the alluvial plot in R (image 1), and am wondering if it is possible to replicate this in Dundas. I have been trying to do this with a Sankey diagram, but it will only show the data hierarchically, which causes repeating dimension attributes in the same node (image 2). If I use source-target node pairs instead, it seems to lay out the nodes randomly, ignoring the order (image 3). Any help would be appreciated, as would suggestions for other visualization types that might work.

image 1 - the desired result achieved in R:

image 2: Dundas Sankey hierarchical layout with repeating attributes

image 3: Dundas Sankey source-target with (apparently) random layout (this is only showing the first 2 of the 4 dimensions)

dave.metza · September 4, 2018, 4:01pm

Update:

I was able to get a bit closer to what I want by remodeling the data so that every node is uniquely labeled (based on the location and its index in the sequence), rather than just source-target pairs.
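The remodeling described above can be sketched outside Dundas. The following Python is only an illustration of the data shape, not of any Dundas feature: the function name, the label format, and all sample locations other than "EAU" are assumptions made for the example.

```python
from collections import Counter

def unique_node_links(paths):
    """Count links between position-tagged nodes.

    Each path is the ordered list of locations one person visited.
    A node label combines the step index and the location name, so the
    same location at different steps becomes a different node.
    """
    links = Counter()
    for path in paths:
        labels = [f"{step}: {loc}" for step, loc in enumerate(path)]
        # Pair each label with the one that follows it in the sequence.
        for source, target in zip(labels, labels[1:]):
            links[(source, target)] += 1
    return links

# Hypothetical sample data: three people with paths of different lengths.
paths = [
    ["ER", "EAU", "Ward A", "Home"],
    ["ER", "EAU", "Home"],
    ["EAU", "ER"],
]
for (source, target), people in sorted(unique_node_links(paths).items()):
    print(source, "->", target, people)
```

Because the step index is part of the label, "EAU" at step 0 and "EAU" at step 1 stay separate nodes, which is what keeps the links from folding back on themselves.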

However, the problem remains of how to have locations ordered horizontally across, such that nodes with the same index are stacked in sequence (0, 1, 2, 3, etc.). Currently it orders locations vertically within each stack, and then makes an attempt to order across, before apparently giving up (as evidenced by all the neglected crap in the red circle). I am then able to manually drag and drop each individual node to put it back in the correct order in view mode, but that's dumb. Am I missing a property here that forces them to stay in order?

Inactive-Member-45034791 · September 4, 2018, 4:01pm

Hi Dave,

With a Sankey diagram, you would need to have data where each value refers to the name of a source or target node. When there is no target set in the Visualization tab of a Sankey diagram’s Data Analysis Panel, data will be grouped by each hierarchy value and can be repeated, as you’ve seen. It should be possible to turn any data for a Sankey diagram into source-target pairs. If your data is made up of four hierarchies, you could turn those into three pairs of links which make up a single chain, something like H1-H2, H2-H3, H3-H4.
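As a rough sketch of the H1-H2, H2-H3, H3-H4 idea, the Python below splits four-column rows into three counted pairs. It only illustrates the reshaping and is not Dundas functionality. The function name and the sample rows are hypothetical.

```python
from collections import Counter

def chain_links(rows):
    """Turn rows of (H1, H2, H3, H4) values into counted source-target pairs."""
    links = Counter()
    for row in rows:
        # Adjacent columns form the pairs H1-H2, H2-H3, H3-H4.
        for level, (source, target) in enumerate(zip(row, row[1:]), start=1):
            pair_name = f"H{level}-H{level + 1}"
            links[(pair_name, source, target)] += 1
    return links

# Hypothetical rows: one per person, one column per hierarchy.
rows = [
    ("ER", "EAU", "Ward A", "Home"),
    ("ER", "EAU", "Ward B", "Home"),
]
for (pair_name, source, target), people in sorted(chain_links(rows).items()):
    print(pair_name, source, "->", target, people)
```

Keeping the pair name with each link records which step of the chain the link belongs to, so identical location names in different columns are not merged.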

Also, in R’s alluvial diagram, each column contains its own values, distinct from the values of other columns, but your data seems to be constructed a bit differently. For instance, if you wanted each “EAU” value to be displayed as a separate node, it would have to be uniquely identified, which can be done by using a pair of hierarchies to identify each node. Something like the screenshot below.

Also, in terms of the updated image you posted, it looks like your final column is showing “leaf nodes”, which have no outgoing links to the right. The Sankey diagram will line up the leaf nodes to the far right, but your original data shouldn’t contain leaf nodes if it was made up of complete rows of values in four columns.

I hope this provides a little more insight on how you could achieve this.

Thanks,

dave.metza · September 4, 2018, 3:56pm

Thanks for the reply. It seems the Sankey diagram only works if every member of the hierarchy (Source Loc, Target Loc) contains an equal total number of records. In my case I expect (and want) there to be leaf nodes, because not every client necessarily moves through 4 locations; some only go to 3 or 2. You handled that above by adding null locations, which I was hoping to avoid because it kind of ruins the ability to visualize the flow from a start to an end point.

The desired result would include the leaf nodes, but have them aligned with the rest of the diagram - so the sequential nature of this data is obvious to the viewer. If I use the null method like you did above, is there a way via scripting to hide the "null" nodes from displaying on the diagram?

Inactive-Member-45034791 · September 4, 2018, 4:01pm

Hi Dave,

Thanks for the information. You should still be able to achieve the desired result with leaf nodes by using NULL values, as each record should contain a reference to its source and target location. In your case, each record should have 4 columns, where some of the columns can be NULL values, as you had in your very first diagram. If you’d like to hide the NULL nodes as well, you could set up states on the metric set and set the transparency of the nodes and/or links to 0%. My final result would end up looking like below (original on the bottom, with states configured with 0% transparency above it).

You would first need to use a NULL replacement transform in the data cube and set the NULL values of your hierarchies to any value of your choosing, as this will be used to set up our states. Once you have done that, you would need to set up states but don’t select a measure value when prompted. You would then select the null replacement value when configuring the state.
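To make the NULL replacement idea concrete, here is a minimal Python sketch of what the data looks like before and after the replacement. It is not the Dundas transform itself. The placeholder value, function names, and sample rows are all hypothetical.

```python
PLACEHOLDER = "(none)"  # hypothetical replacement value; any unused label works

def replace_nulls(rows, placeholder=PLACEHOLDER):
    """Return copies of the rows with None replaced by the placeholder."""
    return [[placeholder if value is None else value for value in row]
            for row in rows]

def is_placeholder_link(source, target, placeholder=PLACEHOLDER):
    """True when a link touches a placeholder node and is a candidate for hiding."""
    return placeholder in (source, target)

# Hypothetical rows: four columns each, with None where a person stopped early.
rows = [
    ["ER", "EAU", "Ward A", "Home"],
    ["ER", "EAU", "Home", None],
    ["EAU", "ER", None, None],
]
for row in replace_nulls(rows):
    for source, target in zip(row, row[1:]):
        status = "hide" if is_placeholder_link(source, target) else "show"
        print(source, "->", target, status)
```

The links marked "hide" correspond to the ones a state keyed on the replacement value would match, which is why the replacement value should be a label that never occurs as a real location.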

You should now notice that the state coloring is applied only to the links of the node. To apply the state to the nodes as well, add a second metric set and set up states on that metric set too.

When adding the state group as before, the measure would be set to none.

Your diagram should now be colored correctly for the state. You can then set the transparency of the nodes and/or links to 0%, and they shouldn’t be shown on your diagram.

Thanks,

james.davis · September 4, 2018, 3:56pm

This post has just been a pleasure to watch and read.

Great Question Dave.

Awesome answer, Trevenn.