SPO600 2025 Winter Project - Stage 3: Tidy & Wrap
Introduction
In Stage III of the SPO600 project, the main goal was to wrap up the work done in Stage II and enhance our GCC compiler pass with more robust testing and functionality. Originally, the professor planned to provide a compiler with AFMV (Automatic Function Multi-Versioning) cloning pre-enabled. However, due to implementation issues with that feature, Stage III was redesigned to focus on comparing multiple cloned functions manually using target_clones.
Instead of automatic cloning, we were asked to extend our test cases and ensure that our pass could handle multiple clone groups — not just one base function — and make clear pruning decisions for each.
I initially developed and tested everything on an aarch64 platform, but I will also run the same test case on an x86_64 server later and include the result screenshots at the end for comparison.
Improvements to tree-skim.cc
The core improvement in Stage III was upgrading our compiler pass so that it could analyze and compare multiple groups of cloned functions, not just one.
Previously, the pass would:
• Detect only the first base function (like `scale_samples`)
• Compare its clones
• Ignore any other functions, even if they were also clones
This meant the pass was limited to just one comparison group.
In the revised version, I changed the logic so that:
• Each function is independently recorded and compared
• Instead of stopping after one `[BASE]` detection, the pass now processes multiple clone groups
• The code now prints `[BASE]`, `[PRUNE]`, and `[NO PRUNE]` for each group it encounters
To achieve this, the main changes were:
• Refactoring the base function detection logic so that it doesn’t exit early after the first match
• Dynamically tracking function names during recording and comparison
• Ensuring that every function (except `.resolver`) is evaluated for pruning output
Additionally, the pass was updated to collect and compare detailed GIMPLE-level structural information for each function. This includes basic block counts, statement counts, GIMPLE statement types, operand counts, RHS operation types, and variable reference patterns. These structural features are stored per base function, and each clone is compared against its corresponding base to determine functional equivalence.
This redesign ensures that the pass can handle multiple clone groups in a single program and generate pruning recommendations independently for each group.
Updated Code, Build & Result
#include "config.h"
#include "system.h"
#include "coretypes.h"
#include "backend.h"
#include "tree-pass.h"
#include "pass_manager.h"
#include "context.h"
#include "diagnostic-core.h"
#undef optimize
#include "tree.h"
#include "tree-core.h"
#include "basic-block.h"
#include "gimple.h"
#include "gimple-iterator.h"
#include "cgraph.h"
#include "print-tree.h"
#include "gimple-pretty-print.h"
#include <map>
#include <string>
#include <vector>
#include <set>
#include <cstdio>
namespace {
const pass_data pass_data_skim = {
GIMPLE_PASS,
"skim",
OPTGROUP_ALL,
TV_NONE,
PROP_cfg,
0, 0, 0, 0
};
class pass_skim : public gimple_opt_pass {
public:
pass_skim(gcc::context *ctxt) : gimple_opt_pass(pass_data_skim, ctxt) {}
bool gate(function *fun) final override {
return fun != nullptr;
}
unsigned int execute(function *fun) final override;
private:
std::map<std::string, std::vector<std::string>> function_stmts_map;
std::map<std::string, std::pair<int, int>> function_shape_map;
std::string gimple_stmt_to_string(gimple *stmt) const;
std::vector<std::string> collect_gimple_stmts(function *fun) const;
void record_function_info(const std::string& name, function *fun);
bool compare_function_info(const std::string& base_name, function *fun) const;
};
std::string pass_skim::gimple_stmt_to_string(gimple *stmt) const {
FILE *tmp = tmpfile();
if (!tmp) return "error: tmpfile failed";
print_gimple_stmt(tmp, stmt, 0, TDF_NONE);
fflush(tmp);
fseek(tmp, 0, SEEK_END);
long size = ftell(tmp);
rewind(tmp);
std::vector<char> buffer(size + 1);
fread(buffer.data(), 1, size, tmp);
buffer[size] = '\0';
fclose(tmp);
return std::string(buffer.data());
}
std::vector<std::string> pass_skim::collect_gimple_stmts(function *fun) const {
std::vector<std::string> stmts;
basic_block bb;
FOR_EACH_BB_FN(bb, fun) {
for (gimple_stmt_iterator gsi = gsi_start_bb(bb); !gsi_end_p(gsi); gsi_next(&gsi)) {
stmts.push_back(gimple_stmt_to_string(gsi_stmt(gsi)));
}
}
return stmts;
}
void pass_skim::record_function_info(const std::string& name, function *fun) {
auto stmts = collect_gimple_stmts(fun);
int bb_count = 0;
int stmt_count = stmts.size();
basic_block bb;
FOR_EACH_BB_FN(bb, fun) {
bb_count++;
}
function_stmts_map[name] = stmts;
function_shape_map[name] = std::make_pair(bb_count, stmt_count);
fprintf(dump_file, "[BASE] Found: %s\n", name.c_str());
}
bool pass_skim::compare_function_info(const std::string& base_name, function *fun) const {
auto base_iter = function_stmts_map.find(base_name);
auto shape_iter = function_shape_map.find(base_name);
if (base_iter == function_stmts_map.end() || shape_iter == function_shape_map.end())
return false;
auto base_stmts = base_iter->second;
auto [base_bb, base_stmt_count] = shape_iter->second;
int bb_count = 0, stmt_count = 0;
std::vector<std::string> current_stmts;
basic_block bb;
FOR_EACH_BB_FN(bb, fun) {
bb_count++;
for (gimple_stmt_iterator gsi = gsi_start_bb(bb); !gsi_end_p(gsi); gsi_next(&gsi)) {
stmt_count++;
current_stmts.push_back(gimple_stmt_to_string(gsi_stmt(gsi)));
}
}
if (bb_count != base_bb || stmt_count != base_stmt_count)
return false;
for (size_t i = 0; i < base_stmts.size(); ++i) {
if (base_stmts[i] != current_stmts[i])
return false;
}
return true;
}
unsigned int pass_skim::execute(function *func) {
if (!dump_file || !func || !func->cfg)
return 0;
struct cgraph_node *node = cgraph_node::get(func->decl);
if (!node)
return 0;
const std::string name(node->name());
if (name.find(".resolver") != std::string::npos)
return 0;
// Check if this function is the base for its group
if (function_stmts_map.find(name) == function_stmts_map.end()) {
record_function_info(name, func);
return 0;
}
// This function may be a clone of a previously recorded base
std::string base_name = name.substr(0, name.find("."));
if (function_stmts_map.find(base_name) != function_stmts_map.end()) {
if (compare_function_info(base_name, func)) {
fprintf(dump_file, "[PRUNE]: %s\n", name.c_str());
} else {
fprintf(dump_file, "[NO PRUNE]: %s\n", name.c_str());
}
}
return 0;
}
} // anonymous namespace
gimple_opt_pass *make_pass_skim(gcc::context *ctxt) {
return new pass_skim(ctxt);
}
Test Case Update – clone-test-core.c
To test the updated pass properly, I extended the test program clone-test-core.c. Originally, it only contained the scale_samples() function.
I added two additional cloneable functions to create distinct clone groups:
• process_buffer(): performs a simple vector-like operation (element-wise addition)
• noisy_function(): performs a more complex operation that interferes with vectorization, such as using conditional logic inside a loop
These additions were designed to test both [PRUNE] and [NO PRUNE] scenarios.
To ensure these functions were actually cloned:
• I applied the CLONE_ATTRIBUTE macro to all three functions: scale_samples, process_buffer, and noisy_function
• I updated main() to call each function, preventing the compiler from eliminating unused code
• I ensured that CLONE_ATTRIBUTE was defined via compiler flags (-DCLONE_ATTRIBUTE=...)
Updated clone-test-core.c (excerpt)
CLONE_ATTRIBUTE void process_buffer(int16_t* in, int16_t* out, int len, int gain) { for (int i = 0; i < len; i++) { out[i] = in[i] + gain; } } CLONE_ATTRIBUTE void noisy_function(int16_t* in, int16_t* out, int len, int gain) { for (int i = 0; i < len; i++) { out[i] = (in[i] + gain) * ((i % 2 == 0) ? 1 : -1); // intentionally hard to vectorize } }
These were added alongside the original scale_samples() function.
Testing the Results
After rebuilding the compiler and compiling the updated test case with:
make clone-test-aarch64-prune \ CC="$HOME/gcc-build-002/gcc/xgcc -B$HOME/gcc-build-002/gcc" \ CFLAGS="-O2 -fdump-tree-skim"
I checked the .skim output file using:
grep '\[BASE\]\|\[PRUNE\]\|\[NO PRUNE\]' *.skim
The results showed that the pass correctly processed all three clone groups, with a mix of [PRUNE] and [NO PRUNE] as expected:

This confirms:
• Multiple base functions are now correctly detected and processed
• The pass is making accurate [PRUNE] and [NO PRUNE] decisions
• The output tagging logic is working as intended across multiple clone groups
x86_64 Result
Conclusion & Reflections
Stage III of the SPO600 project was an opportunity to take our compiler pass from a narrowly scoped, single-function analyzer into a more generalized and scalable analysis tool. At the beginning of the course, working with GIMPLE felt abstract and low-level—but by the end of this stage, I felt much more confident navigating GCC internals and designing passes that extract meaningful patterns from the compiler’s intermediate representation.
In particular, this final stage helped solidify my understanding of Function Multi-Versioning (FMV) and the mechanics of how cloned functions behave under different optimization paths. While the original plan was to work with AFMV (Automatic Function Multi-Versioning), we adapted to the change in scope by focusing on target_clones and manually analyzing multiple clone groups. This turned out to be a valuable challenge: it required modifying the pass architecture to track and compare multiple base functions, identify functionally equivalent clones, and report [PRUNE] and [NO PRUNE] recommendations independently for each group.
Through this process, I gained hands-on experience in:
• Handling GIMPLE-level data structures like gimple, basic_block, and control/data flow info
• Building multi-phase logic that records, stores, and compares structural features of functions
• Designing a compiler pass that scales beyond toy examples and works across real platforms (aarch64 and x86_64)
• Debugging issues across platforms and verifying the output through automated test cases
The updated tree-skim.cc pass now satisfies all of the key Stage III requirements:
• Multiple clone group support
• PRUNE vs. NO PRUNE decisions per function
• Cross-architecture testing (aarch64 & x86_64)
• Detailed test case design that explores both vectorizable and non-vectorizable functions
Looking back, this project gave me a much deeper appreciation for the inner workings of the GCC compiler, and taught me how powerful—even minimal—compiler passes can be when designed well. More importantly, it has sparked my interest in static analysis and optimization, and given me a strong foundation to explore advanced compiler topics in the future.
Dear Professor Chris Tyler,
Thanks to your guidance, I was able to learn so much throughout this semester. If I may be honest, this was probably the most challenging course I’ve taken during my four years at Seneca. What kept me going was your detailed instructions and continuous support.🙂
Although I may not have received the grades I had hoped for, I’ve come away with something far more valuable, and I’m proud of how I was able to finish the course in the end.
I also wanted to mention that I sent a few emails regarding updates to Lab 3 and Lab 5, but I never heard back—so I thought I would leave a note here, just in case. For Lab 3, I’ve added proper source citations as you suggested, and for Lab 5, I’ve revised and expanded the content. If you have time, I’d appreciate it if you could take another look.
Thank you for all your hard work this semester. I hope your eye condition improves soon and that you get plenty of rest.🍀
Comments
Post a Comment