SPO600 2025 Winter Project - Stage 2: Clone Group Detection (part3)

Continuing from my previous post, I started working on the next step of my custom GIMPLE pass in GCC — identifying clone function groups via function name patterns like .default, .resolver, and .variant.N.

 

Implementation Plan

1. Iterate over all functions using FOR_EACH_FUNCTION(node) 

2. Retrieve each function’s name via IDENTIFIER_POINTER(DECL_NAME(node->decl)) 

3. Use regex to filter names like .default, .resolver, .variant.* 

4. Group clone candidates using std::map<std::string, std::vector<std::string>> 

5. Output clone groups to dump_file 


Editing tree-skim.cc

To support this logic, I updated tree-skim.cc to include:

#include <map>

#include <string>

#include <vector>

#include <regex>

These headers are essential for:

<map> – to group function clones by base name

<vector> – to store function names within a group

<string> – for safe C++ string manipulation

<regex> – to match clone naming patterns like foo.default or foo.variant.1


I defined the regex as:

std::regex fmv_regex(R"(^(.*)\.(default|resolver|variant\.\d+)$)");

This allows me to extract the base name (e.g., foo) and group all related function clones under it.
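To make the capture groups concrete, here is a minimal standalone sketch of how that pattern splits a name into base and suffix. It runs outside GCC; split_clone_name and check_split_examples are hypothetical helper names used only for illustration, not part of the pass.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: splits "foo.variant.1" into base "foo" and
// suffix "variant.1". Returns false for names that are not clone-like.
bool split_clone_name(const std::string &name,
                      std::string &base, std::string &suffix) {
    // Same pattern as in the post, compiled once and reused.
    static const std::regex fmv_regex(
        R"(^(.*)\.(default|resolver|variant\.\d+)$)");
    std::smatch match;
    if (!std::regex_match(name, match, fmv_regex))
        return false;
    base = match[1].str();    // everything before the clone suffix
    suffix = match[2].str();  // default, resolver, or variant.N
    return true;
}

// Small self-check showing the expected splits.
void check_split_examples() {
    std::string base, suffix;
    assert(split_clone_name("foo.variant.1", base, suffix));
    assert(base == "foo" && suffix == "variant.1");
    assert(split_clone_name("foo.default", base, suffix));
    assert(base == "foo" && suffix == "default");
    assert(!split_clone_name("main", base, suffix));
}
```

Because the first group is greedy, a name such as a.b.resolver yields the base a.b, so dotted base names stay intact.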


Here’s an example of the desired output:

Clone group: foo

  -> foo.default

  -> foo.variant.1
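The grouping step that produces this output can be sketched outside GCC as follows. This is only an illustration under one assumption: the function names arrive as a plain vector of strings, whereas the pass reads them from FOR_EACH_FUNCTION. The names clone_map, group_clones and print_groups are hypothetical.

```cpp
#include <cassert>
#include <cstdio>
#include <map>
#include <regex>
#include <string>
#include <vector>

using clone_map = std::map<std::string, std::vector<std::string>>;

// Group clone-like names under their base name. Names that do not
// match the pattern (e.g. "main") are skipped.
clone_map group_clones(const std::vector<std::string> &names) {
    static const std::regex fmv_regex(
        R"(^(.*)\.(default|resolver|variant\.\d+)$)");
    clone_map groups;
    std::smatch match;
    for (const std::string &name : names) {
        if (std::regex_match(name, match, fmv_regex))
            groups[match[1].str()].push_back(name);
    }
    return groups;
}

// Print groups in the same layout as the desired dump output.
void print_groups(const clone_map &groups, FILE *out) {
    assert(out != nullptr);
    for (const auto &entry : groups) {
        fprintf(out, "Clone group: %s\n", entry.first.c_str());
        for (const std::string &fn : entry.second)
            fprintf(out, "  -> %s\n", fn.c_str());
        fprintf(out, "\n");
    }
}
```

Calling print_groups(group_clones({"foo.default", "main", "foo.variant.1"}), stdout) prints a single group for foo with two members. std::map keeps the groups sorted by base name, which makes the dump order stable between runs.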


Why is this important?

This setup is the foundation for Stage 2 — I will later compare these functions’ GIMPLE representations to determine whether they are functionally identical (i.e., real clones worth pruning). Grouping first helps scope the comparison efficiently.


Build Problems Encountered

After modifying tree-skim.cc, I ran into a classic GCC macro conflict:

./options.h:311:32: error: expected initializer before ‘.’ token

#define optimize global_options.x_optimize

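I wasn’t using optimize in my code. However, GCC’s options.h defines optimize as a macro, so any header included afterwards that uses optimize as an ordinary identifier has it rewritten to global_options.x_optimize. Most likely that is what happened inside one of the C++ standard headers. I tried: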

Ensuring C++17 standard was used (for structured bindings)

Adding #undef optimize before #include "tree.h"
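The mechanism can be reproduced in a few lines without GCC's sources. This is a simplified sketch: gcc_options_sketch, current_opt_level and the third_party namespace are made-up stand-ins, and only the shape of the #define mirrors the line from options.h shown above.

```cpp
#include <cassert>

// Made-up stand-in for GCC's option storage.
struct gcc_options_sketch { int x_optimize; };
static gcc_options_sketch global_options = {2};

// Same shape as the macro from options.h.
#define optimize global_options.x_optimize

// While the macro is active, `optimize` reads the option value.
inline int current_opt_level() { return optimize; }

// Declaring a function named `optimize` at this point would expand to
//   int global_options.x_optimize();
// which g++ rejects with an error like
//   "expected initializer before '.' token".
#undef optimize

namespace third_party {
// After #undef, the identifier can be declared normally again.
inline int optimize() { return 7; }
}
```

The trade-off is that after #undef optimize, later code in the same file can no longer use the short name to read the optimization level and would have to spell out global_options.x_optimize.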


Regex error from libstdc++-v3

After the macro issue, the build failed again due to this:

error: unused parameter ‘__first’ [-Werror=unused-parameter]

error: unused parameter ‘__last’ [-Werror=unused-parameter]

These came from inside <regex> in libstdc++-v3. The compiler was treating unused parameters as errors due to -Werror.


Rather than reconfiguring the entire GCC build to relax -Werror globally, I tried to only compile tree-skim.o with -Wno-error=unused-parameter:

cd ~/gcc-build-001/gcc

make CXXFLAGS="-Wno-error=unused-parameter" tree-skim.o

This worked!
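An in-source alternative, which was not tried in this build and is offered only as a sketch, is to silence the warning around the offending include with GCC's diagnostic pragmas. In the example below, range_is_empty_stub is a hypothetical stand-in for header code with unused parameters; in tree-skim.cc the region between push and pop would instead contain #include <regex>.

```cpp
#include <cassert>

// Save the current diagnostic state, then ignore unused parameters
// only inside this region.
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wunused-parameter"

// Hypothetical stand-in for header code whose parameters are unused.
inline int range_is_empty_stub(const char *first, const char *last) {
    return 1;
}

// Restore the previous state so the rest of the file is still checked.
#pragma GCC diagnostic pop
```

This keeps the exception next to the include that needs it, so no special make invocation is required, at the cost of adding compiler-specific pragmas to the source file.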


Then I resumed the full build using:

cd ~/gcc-build-001

make -j$(nproc)


Verifying That Everything Works

After the build succeeded, I tested my pass like this:

echo 'int foo() { return 42; }' > test.c

./xgcc -B. -O2 -fdump-tree-skim -c test.c

Initially, running without -c caused a linking error (main not found).

But with -c, the .skim dump file was successfully generated!


I verified the output file:

ls test.c.*skim

cat test.c.264t.skim


And saw:

Current tree-skim.cc Code

#include "config.h"


#include "system.h"

#include "coretypes.h"

#include "backend.h"

#include "tree-pass.h"

#include "pass_manager.h"

#include "context.h"

#include "diagnostic-core.h"

#undef optimize

#include "tree.h"

#include "tree-core.h"

#include "basic-block.h"

#include "gimple.h"

#include "gimple-iterator.h"

#include "cgraph.h"


#include <map>

#include <string>

#include <vector>

#include <regex>



namespace {


const pass_data pass_data_skim = {

    GIMPLE_PASS,

    "skim",

    OPTGROUP_ALL,

    TV_NONE,

    PROP_cfg,

    0,

    0,

    0,

    0,

};


class pass_skim : public gimple_opt_pass {

public:

    pass_skim(gcc::context *ctxt)

        : gimple_opt_pass(pass_data_skim, ctxt) {}


    bool gate(function *fun) final override {

        return fun != nullptr;

    }


    unsigned int execute(function *fun) final override;

};


unsigned int pass_skim::execute(function *fun) {

    const char* fname = function_name(fun);


    if (dump_file == nullptr) {

        printf("dump_file is null — dump output not activated.\n");

    } else {

        printf("dump_file is active.\n");

    }


    printf("Function: %s\n", fname);


    int bb_count = 0;

    int gimple_stmt_count = 0;


    basic_block bb;

    FOR_EACH_BB_FN(bb, fun) {

        if (!bb) continue;


        bb_count++;


        for (gimple_stmt_iterator gsi = gsi_start_bb(bb); !gsi_end_p(gsi); gsi_next(&gsi)) {

            gimple_stmt_count++;

        }

    }


    printf("Basic blocks: %d\n", bb_count);

    printf("GIMPLE statements: %d\n\n", gimple_stmt_count);


    if (dump_file) {

        fprintf(dump_file, "=== Scanning all functions ===\n");


        struct cgraph_node *node;

        FOR_EACH_FUNCTION(node) {

            const char *fn_name = IDENTIFIER_POINTER(DECL_NAME(node->decl));

            fprintf(dump_file, "Function found: %s\n", fn_name);

        }


        fprintf(dump_file, "=== End of function scan ===\n\n");

    }


    if (dump_file) {

        std::map<std::string, std::vector<std::string>> clone_groups;


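        // Compile the pattern once, outside the loop; constructing a

        // std::regex for every function is needlessly expensive.

        const std::regex fmv_regex(R"(^(.*)\.(default|resolver|variant\.\d+)$)");


        struct cgraph_node *node;

        FOR_EACH_FUNCTION(node) {

            const char* full_name_cstr = IDENTIFIER_POINTER(DECL_NAME(node->decl));

            std::string full_name(full_name_cstr);


            std::smatch match;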


            if (std::regex_match(full_name, match, fmv_regex)) {

                std::string base_name = match[1];

                clone_groups[base_name].push_back(full_name);

            }

        }


        for (std::map<std::string, std::vector<std::string>>::const_iterator it = clone_groups.begin();

             it != clone_groups.end(); ++it) {

            const std::string& base = it->first;

            const std::vector<std::string>& clones = it->second;


            fprintf(dump_file, "Clone group: %s\n", base.c_str());

            for (const std::string& fn : clones) {

                fprintf(dump_file, "  -> %s\n", fn.c_str());

            }

            fprintf(dump_file, "\n");

        }

    }


    return 0;

}


} // namespace


gimple_opt_pass *make_pass_skim(gcc::context *ctxt) {

    return new pass_skim(ctxt);

}


Summary

What worked:

GCC Pass registration

Basic block and GIMPLE analysis

Clone group filtering with regex

Dump output generation via -fdump-tree-skim -c


What I learned:

optimize macro conflicts are subtle and painful — #undef is your friend

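libstdc++’s <regex> header trips -Werror=unused-parameter when it is compiled as part of the GCC build

You can isolate problematic files and relax -Werror for just those files

Use -c to skip the link step when the test file has no main; the dump itself comes from -fdump-tree-skim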


What’s next?

Next step is to compare GIMPLE IR across the clone group members to detect potential equivalence. This will likely involve structural matching and may require IR normalization.
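As a rough idea of what normalization followed by structural matching could look like, here is a toy sketch. It does not use GCC's data structures: toy_stmt, normalize and same_shape are hypothetical, and each statement is modelled as an opcode string plus operand names. Variable-like operands are renamed in order of first appearance, and constants (operands starting with a digit) are kept as they are.

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a GIMPLE statement.
struct toy_stmt {
    std::string opcode;                 // e.g. "plus", "return"
    std::vector<std::string> operands;  // e.g. {"_1", "x_2", "1"}
};

// Rename variable-like operands to v0, v1, ... in order of first
// appearance, so two bodies that differ only in names compare equal.
std::vector<toy_stmt> normalize(const std::vector<toy_stmt> &body) {
    std::map<std::string, std::string> renamed;
    std::vector<toy_stmt> out;
    for (const toy_stmt &s : body) {
        toy_stmt n{s.opcode, {}};
        for (const std::string &op : s.operands) {
            bool is_constant =
                !op.empty() && std::isdigit(static_cast<unsigned char>(op[0]));
            if (is_constant) {
                n.operands.push_back(op);  // constants must match exactly
                continue;
            }
            auto it = renamed.find(op);
            if (it == renamed.end())
                it = renamed.emplace(
                    op, "v" + std::to_string(renamed.size())).first;
            n.operands.push_back(it->second);
        }
        out.push_back(n);
    }
    return out;
}

// Two bodies have the same shape if their normalized forms are identical.
bool same_shape(const std::vector<toy_stmt> &a,
                const std::vector<toy_stmt> &b) {
    std::vector<toy_stmt> na = normalize(a), nb = normalize(b);
    if (na.size() != nb.size())
        return false;
    for (size_t i = 0; i < na.size(); ++i) {
        if (na[i].opcode != nb[i].opcode || na[i].operands != nb[i].operands)
            return false;
    }
    return true;
}
```

A real comparison would also have to deal with basic block structure, types, and statement ordering, none of which this toy model attempts.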


