7.2 Pivot Table and Pivot Chart Data Module (VisActor/VTable Contributing Documents)
Overview
Data processing is one of the core steps of data visualization. This section describes how PivotTable organizes and processes data so that the data both supports efficient rendering and provides the pivot table's data analysis capabilities.
Automatic Organization of Dimension Tree
Background of the Requirement
Suppose we want to implement the multidimensional table shown in the diagram. Generally speaking, the parameters we would expect from the business side are:
Well-organized dimension trees rowTree and columnTree (similar to timeTree and channelTree)
Specific data records under the various dimensions and indicators
In theory this works, but the drawback is obvious: the business side has to assemble the data into this structure themselves, which makes integration costly. We would rather have the business side pass only concise Records plus some simple configuration, and let us parse the data and render it as a multidimensional table. For example, the Records passed in are the raw data queried from the database:
Objective: Transform the original dataset `Records` through data processing to obtain a data structure that supports display in pivot table format
Implementation Approach
Analysis
Given this background and objective, a few questions arise naturally:
How do we generate rowTree and columnTree from raw data?
Answer: Group aggregation. Much like SQL's GROUP BY, we can extract the values of each dimension from records by grouping (e.g., grouping on platform yields the dimension values "Taobao" | "JD" | "Douyin").
How do we keep time complexity as low as possible and maintain performance when records is large?
Approach
Convention for user-provided data & data structure
Collect dimension members (e.g., the platform dimension has three members: "Taobao" | "JD" | "Douyin")
Assemble rowTree, columnTree
When rendering, quickly look up a cell's data from records (as shown in the figure)
Based on the tasks above:
A single traversal of records is enough to collect the dimension members. From the collected members and the columns, rows, and indicatorKeys passed in by the user, it is then possible in principle to assemble rowTree and columnTree.
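The single-pass collection described above can be sketched as follows. This is a minimal illustration, not VTable's actual implementation: the `DataRecord` type, the `collectDimensionMembers` function, and the sample data are all hypothetical names and values introduced for this example.

```typescript
// Hypothetical record shape: each field is a plain string or number.
type DataRecord = Record<string, string | number>;

// Collect the distinct members of every dimension in one pass over records.
function collectDimensionMembers(
  records: DataRecord[],
  dimensionKeys: string[]
): Map<string, Set<string>> {
  const members = new Map<string, Set<string>>();
  for (const key of dimensionKeys) {
    members.set(key, new Set<string>());
  }
  for (const record of records) {
    for (const key of dimensionKeys) {
      const value = record[key];
      if (value !== undefined) {
        // A Set ignores duplicates, so each member is stored once.
        members.get(key)!.add(String(value));
      }
    }
  }
  return members;
}

// Made-up sample data, for illustration only.
const sampleRecords: DataRecord[] = [
  { platform: "Taobao", shop: "Shop A", sales: 100 },
  { platform: "JD", shop: "Shop B", sales: 80 },
  { platform: "Taobao", shop: "Shop C", sales: 60 },
];

const members = collectDimensionMembers(sampleRecords, ["platform", "shop"]);
const platformMembers = members.get("platform")!; // contains "Taobao" and "JD"
```

The cost is proportional to the number of records times the number of dimensions, and no record is visited twice.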
But how do we know the parent-child relationships between these dimensions? For instance, how do we know that the shop dimension is a sub-dimension of the platform dimension?
By convention: when users pass columns, a parent dimension must be listed before its child dimensions (for example, platform before shop).
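To illustrate how this ordering convention makes tree assembly possible, here is a hedged sketch that builds a dimension tree from records, given dimension keys ordered parent-first. The `TreeNode` shape, the `buildDimensionTree` function, the `PATH_SEPARATOR` constant, and the sample data are assumptions made for this example; the real rowTree/columnTree node structure may differ.

```typescript
type DataRecord = Record<string, string | number>;

// Hypothetical tree node; not necessarily the shape VTable uses.
interface TreeNode {
  dimensionKey: string;
  value: string;
  children: TreeNode[];
}

// Hypothetical separator; assumes dimension values never contain it.
const PATH_SEPARATOR = "|";

// orderedKeys must list parent dimensions before child dimensions.
function buildDimensionTree(records: DataRecord[], orderedKeys: string[]): TreeNode[] {
  const roots: TreeNode[] = [];
  // Maps a root-to-node path string to its node, so existing nodes are found in O(1).
  const nodeByPath = new Map<string, TreeNode>();
  for (const record of records) {
    let siblings = roots;
    const pathParts: string[] = [];
    for (const key of orderedKeys) {
      const value = String(record[key]);
      pathParts.push(value);
      const path = pathParts.join(PATH_SEPARATOR);
      let node = nodeByPath.get(path);
      if (!node) {
        node = { dimensionKey: key, value, children: [] };
        nodeByPath.set(path, node);
        siblings.push(node);
      }
      // Descend one level: the next dimension's nodes hang under this node.
      siblings = node.children;
    }
  }
  return roots;
}

// Made-up sample data, for illustration only.
const treeRecords: DataRecord[] = [
  { platform: "Taobao", shop: "Shop A", sales: 100 },
  { platform: "JD", shop: "Shop B", sales: 80 },
  { platform: "Taobao", shop: "Shop C", sales: 60 },
];

// "platform" comes before "shop", so shops become children of platforms.
const rowTree = buildDimensionTree(treeRecords, ["platform", "shop"]);
```

With this sample data, `rowTree` has two roots, "Taobao" (with children "Shop A" and "Shop C") and "JD" (with child "Shop B"). Reversing the key order would instead nest platforms under shops, which is why the parent-first convention matters.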
The problem of "quickly finding the corresponding data from records when rendering" is harder. Suppose we know a cell's row dimensions and column dimensions and need to implement the getCellValue(col: number, row: number) function. Do we have to iterate over records again for every cell? That would be far too expensive.
The most efficient approach is to use a **hash map**, which reduces the time complexity of a lookup to O(1). So how should the hash map be structured?
The data area is in fact a two-dimensional matrix, so (row, col) locates each cell. A two-level hash map with roughly the following structure can therefore be used to look up cell data.
// The first-level key of the HashMap is the row; the second-level key is the col
type HashMap = Record<string, Record<string, IndicatorValue[]>>;
// Indicator value
type IndicatorValue = {
  indicatorKey: string;
  value: string;
};
For our requirement, how should the two levels of keys be defined? To guarantee uniqueness, we can use the string formed by the path from the root to a leaf node in rowTree and columnTree as the key (as shown in the diagram and code below).
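As a rough sketch of the path-as-key idea, the example below builds the two-level hash map in one pass over records and then looks up a cell by its row path and column path. The helper names (`pathKey`, `buildCellIndex`, `lookupCell`), the `PATH_SEPARATOR` value, and the sample data are hypothetical and are not taken from the VTable source. Aggregating multiple records that fall into the same cell is out of scope here; they are simply appended.

```typescript
type IndicatorValue = { indicatorKey: string; value: string };
type HashMap = Record<string, Record<string, IndicatorValue[]>>;
type DataRecord = Record<string, string | number>;

// Hypothetical separator; assumes dimension values never contain it.
const PATH_SEPARATOR = "|";

// Join a record's dimension values (parent first) into a root-to-leaf path key.
function pathKey(record: DataRecord, dimensionKeys: string[]): string {
  return dimensionKeys.map((key) => String(record[key])).join(PATH_SEPARATOR);
}

// Build the two-level index: first key is the row path, second is the column path.
function buildCellIndex(
  records: DataRecord[],
  rowKeys: string[],
  columnKeys: string[],
  indicatorKeys: string[]
): HashMap {
  const index: HashMap = {};
  for (const record of records) {
    const rowPath = pathKey(record, rowKeys);
    const colPath = pathKey(record, columnKeys);
    if (!index[rowPath]) index[rowPath] = {};
    if (!index[rowPath][colPath]) index[rowPath][colPath] = [];
    for (const indicatorKey of indicatorKeys) {
      const raw = record[indicatorKey];
      if (raw !== undefined) {
        index[rowPath][colPath].push({ indicatorKey, value: String(raw) });
      }
    }
  }
  return index;
}

// Two hash lookups, with no scan over records.
function lookupCell(index: HashMap, rowPath: string, colPath: string): IndicatorValue[] {
  const row = index[rowPath];
  return row && row[colPath] ? row[colPath] : [];
}

// Made-up sample data, for illustration only.
const cellRecords: DataRecord[] = [
  { platform: "Taobao", shop: "Shop A", year: 2023, sales: 100 },
  { platform: "JD", shop: "Shop B", year: 2023, sales: 80 },
];

const cellIndex = buildCellIndex(cellRecords, ["platform", "shop"], ["year"], ["sales"]);
const cell = lookupCell(cellIndex, "Taobao|Shop A", "2023");
// cell is [{ indicatorKey: "sales", value: "100" }]
```

Because the key is the full root-to-leaf path rather than the leaf value alone, two shops with the same name under different platforms would still map to different rows.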