SPO600 2025 Winter Project - Stage 2: Clone Group Detection (part3)
Continuing from my previous post, I started working on the next step of my custom GIMPLE pass in GCC — identifying clone function groups via function name patterns like .default, .resolver, and .variant.N.
Implementation Plan
1. Iterate over all functions using FOR_EACH_FUNCTION(node)
2. Retrieve each function’s name via IDENTIFIER_POINTER(DECL_NAME(node->decl)), since DECL_NAME returns an identifier tree node rather than a C string
3. Use regex to filter names like .default, .resolver, .variant.*
4. Group clone candidates using std::map<std::string, std::vector<std::string>>
5. Output clone groups to dump_file
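Steps 3 to 5 can be tried outside GCC before wiring them into the pass. The following is a minimal standalone sketch, assuming the function names arrive as plain strings; group_clones is a hypothetical helper name for illustration, not something from GCC or from the pass itself.

```cpp
#include <map>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: groups names such as "foo.default" or
// "foo.variant.1" under their base name ("foo"). Names without a
// recognised clone suffix are skipped.
std::map<std::string, std::vector<std::string>>
group_clones(const std::vector<std::string> &names) {
    // Same pattern as used in the pass; built once and reused.
    static const std::regex fmv_regex(
        R"(^(.*)\.(default|resolver|variant\.\d+)$)");

    std::map<std::string, std::vector<std::string>> groups;
    for (const std::string &name : names) {
        std::smatch match;
        if (std::regex_match(name, match, fmv_regex)) {
            // match[1] is the base name captured by (.*)
            groups[match[1].str()].push_back(name);
        }
    }
    return groups;
}
```

Because std::map keeps its keys sorted, the groups come out in alphabetical order of base name, which keeps the dump output stable between runs.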
Editing tree-skim.cc
To support this logic, I updated tree-skim.cc to include:
#include <map>
#include <string>
#include <vector>
#include <regex>
These headers are essential for:
• <map> – to group function clones by base name
• <vector> – to store function names within a group
• <string> – for safe C++ string manipulation
• <regex> – to match clone naming patterns like foo.default or foo.variant.1
I defined the regex as:
std::regex fmv_regex(R"(^(.*)\.(default|resolver|variant\.\d+)$)");
This allows me to extract the base name (e.g., foo) and group all related function clones under it.
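To see what the pattern accepts, here is a small standalone sketch that splits a name into base and suffix. The helper name split_clone_name is an assumption made for this example. Two edge cases are worth knowing: the greedy (.*) keeps any earlier dots in the base name, and it also accepts an empty base such as ".resolver".

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns true and fills `base` and `suffix`
// when `name` ends in .default, .resolver, or .variant.<digits>.
bool split_clone_name(const std::string &name,
                      std::string &base, std::string &suffix) {
    static const std::regex fmv_regex(
        R"(^(.*)\.(default|resolver|variant\.\d+)$)");
    std::smatch match;
    if (!std::regex_match(name, match, fmv_regex)) {
        return false;
    }
    base = match[1].str();    // everything before the clone suffix
    suffix = match[2].str();  // "default", "resolver", or "variant.N"
    return true;
}
```

For example, "foo.variant.12" splits into base "foo" and suffix "variant.12", "a.b.default" keeps "a.b" as the base, and "foo.variant" (no number) is rejected.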
Here’s an example of the desired output:
Clone group: foo
-> foo.default
-> foo.variant.1
Why is this important?
This setup is the foundation for Stage 2 — I will later compare these functions’ GIMPLE representations to determine whether they are functionally identical (i.e., real clones worth pruning). Grouping first helps scope the comparison efficiently.
Build Problems Encountered
After modifying tree-skim.cc, I ran into a classic GCC macro conflict:
./options.h:311:32: error: expected initializer before ‘.’ token
#define optimize global_options.x_optimize
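I wasn’t using optimize in my code, but the macro rewrites every later use of that identifier, and <regex> declares std::regex_constants::optimize, so the standard header itself was the most likely victim. I tried: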
• Ensuring C++17 standard was used (for structured bindings)
• Adding #undef optimize before #include "tree.h"
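The conflict can be reproduced in isolation. The sketch below uses a stand-in struct instead of GCC's real options.h (fake_options and the helper functions are assumptions for illustration only). The standard library declares a constant named std::regex_constants::optimize, so an object-like macro with that name must be gone before <regex> is included.

```cpp
// Stand-in for GCC's options.h (assumption: heavily simplified).
struct fake_options { int x_optimize; };
fake_options global_options = {2};
#define optimize global_options.x_optimize

// Without this #undef, the preprocessor would rewrite the declaration of
// std::regex_constants::optimize inside <regex> into
// "global_options.x_optimize", which does not parse as a declaration.
#undef optimize
#include <regex>
#include <string>

// After the #undef, the value is read through the struct field directly.
int current_opt_level() { return global_options.x_optimize; }

// The standard identifier is usable again.
bool matches_digits(const std::string &s) {
    static const std::regex re(
        R"(\d+)",
        std::regex_constants::ECMAScript | std::regex_constants::optimize);
    return std::regex_match(s, re);
}
```

The trade-off is that any code after the #undef that relied on the optimize shorthand has to spell out the underlying field instead.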
Regex error from libstdc++-v3
After the macro issue, the build failed again due to this:
error: unused parameter ‘__first’ [-Werror=unused-parameter]
error: unused parameter ‘__last’ [-Werror=unused-parameter]
These came from inside <regex> in libstdc++-v3. The compiler was treating unused parameters as errors due to -Werror.
Rather than reconfiguring the entire GCC build to relax -Werror globally, I tried to only compile tree-skim.o with -Wno-error=unused-parameter:
cd ~/gcc-build-001/gcc
make CXXFLAGS="-Wno-error=unused-parameter" tree-skim.o
This worked!
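An alternative that keeps the fix inside the source file is to silence the warning only around the include, using GCC's diagnostic pragmas. This is a sketch that has not been tried in the GCC tree; it assumes the diagnostics are attributed to locations inside <regex>, and has_clone_suffix is a hypothetical function added only to show that normal code follows the guarded region.

```cpp
// Assumption: the -Wunused-parameter diagnostics originate inside <regex>.
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wunused-parameter"
#include <regex>
#pragma GCC diagnostic pop

#include <string>

// Code after the pop is checked with the normal warning set again.
bool has_clone_suffix(const std::string &name) {
    static const std::regex suffix(
        R"(\.(default|resolver|variant\.\d+)$)");
    return std::regex_search(name, suffix);
}
```

Compared with overriding CXXFLAGS for one object file, this survives a clean rebuild, but it only helps if the warning is reported at a location between the push and the pop.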
Then I resumed the full build with:
cd ~/gcc-build-001
make -j$(nproc)
Verifying That Everything Works
After the build succeeded, I tested my pass like this:
echo 'int foo() { return 42; }' > test.c
./xgcc -B. -O2 -fdump-tree-skim -c test.c
• Initially, running without -c caused a linking error (main not found).
• But with -c, the .skim dump file was successfully generated!
I verified the output file:
ls test.c.*skim
cat test.c.264t.skim
And saw:
Current tree-skim.cc Code
#include "config.h"
#include "system.h"
#include "coretypes.h"
#include "backend.h"
#include "tree-pass.h"
#include "pass_manager.h"
#include "context.h"
#include "diagnostic-core.h"
// options.h defines `optimize` as a macro; drop it so it cannot rewrite later headers.
#undef optimize
#include "tree.h"
#include "tree-core.h"
#include "basic-block.h"
#include "gimple.h"
#include "gimple-iterator.h"
#include "cgraph.h"
#include <map>
#include <string>
#include <vector>
#include <regex>

namespace {

const pass_data pass_data_skim = {
    GIMPLE_PASS,   // type
    "skim",        // name
    OPTGROUP_ALL,  // optinfo_flags
    TV_NONE,       // tv_id
    PROP_cfg,      // properties_required
    0,             // properties_provided
    0,             // properties_destroyed
    0,             // todo_flags_start
    0,             // todo_flags_finish
};

class pass_skim : public gimple_opt_pass {
public:
    pass_skim(gcc::context *ctxt)
        : gimple_opt_pass(pass_data_skim, ctxt) {}

    bool gate(function *fun) final override {
        return fun != nullptr;
    }

    unsigned int execute(function *fun) final override;
};

unsigned int pass_skim::execute(function *fun) {
    const char *fname = function_name(fun);

    // Report on stdout whether -fdump-tree-skim is active.
    if (dump_file == nullptr) {
        printf("dump_file is null: dump output not activated.\n");
    } else {
        printf("dump_file is active.\n");
    }
    printf("Function: %s\n", fname);

    // Count basic blocks and GIMPLE statements in the current function.
    int bb_count = 0;
    int gimple_stmt_count = 0;
    basic_block bb;
    FOR_EACH_BB_FN(bb, fun) {
        if (!bb) continue;
        bb_count++;
        for (gimple_stmt_iterator gsi = gsi_start_bb(bb); !gsi_end_p(gsi); gsi_next(&gsi)) {
            gimple_stmt_count++;
        }
    }
    printf("Basic blocks: %d\n", bb_count);
    printf("GIMPLE statements: %d\n\n", gimple_stmt_count);

    // List every function known to the call graph.
    if (dump_file) {
        fprintf(dump_file, "=== Scanning all functions ===\n");
        struct cgraph_node *node;
        FOR_EACH_FUNCTION(node) {
            const char *fn_name = IDENTIFIER_POINTER(DECL_NAME(node->decl));
            fprintf(dump_file, "Function found: %s\n", fn_name);
        }
        fprintf(dump_file, "=== End of function scan ===\n\n");
    }

    // Group clone candidates by base name and print each group.
    if (dump_file) {
        std::map<std::string, std::vector<std::string>> clone_groups;
        // Build the regex once instead of once per function.
        static const std::regex fmv_regex(R"(^(.*)\.(default|resolver|variant\.\d+)$)");
        struct cgraph_node *node;
        FOR_EACH_FUNCTION(node) {
            std::string full_name(IDENTIFIER_POINTER(DECL_NAME(node->decl)));
            std::smatch match;
            if (std::regex_match(full_name, match, fmv_regex)) {
                std::string base_name = match[1];
                clone_groups[base_name].push_back(full_name);
            }
        }
        for (const auto &group : clone_groups) {
            fprintf(dump_file, "Clone group: %s\n", group.first.c_str());
            for (const std::string &fn : group.second) {
                fprintf(dump_file, " -> %s\n", fn.c_str());
            }
            fprintf(dump_file, "\n");
        }
    }
    return 0;
}

} // namespace

gimple_opt_pass *make_pass_skim(gcc::context *ctxt) {
    return new pass_skim(ctxt);
}
Summary
What worked:
• GCC Pass registration
• Basic block and GIMPLE analysis
• Clone group filtering with regex
• Dump output generation via -fdump-tree-skim -c
What I learned:
• optimize macro conflicts are subtle and painful — #undef is your friend
• The libstdc++ <regex> header triggers unused-parameter warnings, which -Werror turns into build failures
• You can isolate problematic files and bypass Werror for them
• -fdump-tree-skim is what produces the dump; add -c so a test file without main does not fail at the link step
What’s next?
Next step is to compare GIMPLE IR across the clone group members to detect potential equivalence. This will likely involve structural matching and may require IR normalization.
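As a rough idea of what normalization followed by structural matching could look like, here is a standalone sketch on a toy statement model. Everything in it is hypothetical: Stmt is a simplified stand-in for a GIMPLE statement, and normalize and structurally_equal are illustrative names, not GCC APIs. The real comparison will have to walk actual GIMPLE tuples and handle types, control flow, and much more.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a GIMPLE statement:
// an opcode plus operand names. Real GIMPLE carries far more detail.
struct Stmt {
    std::string op;
    std::vector<std::string> operands;
};

// Rename every non-constant operand to v0, v1, ... in order of first
// appearance, so that two bodies differing only in temporary names
// normalize to the same sequence. Operands starting with a digit are
// treated as constants and kept as they are.
std::vector<Stmt> normalize(const std::vector<Stmt> &body) {
    std::map<std::string, std::string> renamed;
    std::vector<Stmt> out;
    for (const Stmt &s : body) {
        Stmt n{s.op, {}};
        for (const std::string &operand : s.operands) {
            bool is_constant = !operand.empty() &&
                std::isdigit(static_cast<unsigned char>(operand[0]));
            if (is_constant) {
                n.operands.push_back(operand);
                continue;
            }
            auto it = renamed.find(operand);
            if (it == renamed.end()) {
                it = renamed.emplace(
                    operand, "v" + std::to_string(renamed.size())).first;
            }
            n.operands.push_back(it->second);
        }
        out.push_back(n);
    }
    return out;
}

// Two bodies match when their normalized statements agree one by one.
bool structurally_equal(const std::vector<Stmt> &a,
                        const std::vector<Stmt> &b) {
    std::vector<Stmt> na = normalize(a), nb = normalize(b);
    if (na.size() != nb.size()) return false;
    for (std::size_t i = 0; i < na.size(); ++i) {
        if (na[i].op != nb[i].op || na[i].operands != nb[i].operands)
            return false;
    }
    return true;
}
```

With this model, a body that computes _1 = x + 1 and returns _1 compares equal to one that computes _7 = y + 1 and returns _7, while changing the constant to 2 makes the comparison fail.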