SPO600 2025 Winter Project - Stage 2: Plan and Approach (part1)

Introduction

In Stage 2 of the SPO600 project, the goal is to implement a GCC compiler pass that detects cloned functions generated through Function Multi-Versioning (FMV) and determines whether they can be pruned. This process involves analyzing GIMPLE Intermediate Representation (IR) to compare the structural and semantic equivalence of these functions.


This blog post provides an overview of the project scope, technical strategy, tools used, and implementation phases, based on extensive lecture notes, in-class guidance, and project documentation.



Project Objective

Create a GIMPLE-level analysis pass that:

1. Identifies cloned functions by their FMV naming conventions.

2. Extracts and compares GIMPLE representations of these functions.

3. Outputs a pruning recommendation based on structural equivalence.



High-Level Workflow

The project can be broken into three major parts:


A. Find Cloned Functions

- Functions created by FMV often have suffixes like `.resolver`, `.default`, or `.variant` (e.g., `foo.resolver`, `foo.default`).

- We will scan all functions using `FOR_EACH_FUNCTION` to detect and group these clones.

- The comparison target will typically be `.default` vs `.resolver` versions.


B. Extract GIMPLE IR and Compare

- For each identified pair:

  - Extract GIMPLE statements using `FOR_EACH_BB_FN` and GIMPLE iterators.

  - Use `gimple_code(stmt)` to get numeric IDs for statement types.

  - Use `gimple_code_name[...]` for human-readable statement names.

- Compare the two functions' GIMPLE code sequences (e.g., `8, 8, 6, 4, 10`).

- Normalize variable names (like `_5` vs `_17`) based on their type and usage order to prevent false mismatches.


C. Output Pruning Recommendation

- If two functions are determined to be functionally equivalent:

  - Output a message like `PRUNE: foo`

- If they are not equivalent:

  - Output `NOPRUNE: foo`

- These messages should be written into a diagnostic dump file for the pass.



Implementation Details

- Use a custom GIMPLE pass registered near the end of the GIMPLE pass pipeline (after most optimizations).

- Use macros like:

- `FOR_EACH_FUNCTION` to iterate all known functions.

- `FOR_EACH_BB_FN` to traverse basic blocks in each function.

- `gimple_code`, `gimple_code_name`, and `print_gimple_stmt` to analyze GIMPLE structure.

- Normalize variable names using a simple type-based renaming strategy (e.g., `_int0`, `_int1`).



Tools and Files

- Main implementation in `tree-skim.cc` (or similarly named pass file).

- Register the pass in `passes.def`.

- Compile using `make -j` in the GCC build directory.

- Test using `.tgz` test cases provided on the SPO600 server.

- Use diagnostic dumps and `-fdump-tree-<passname>` to verify output.



Blog Series Plan

This post is part of a multi-part documentation series:

1. Stage 2 Overview and Planning (this post)

2. Function Clone Identification (in progress)

3. GIMPLE Extraction and Normalization

4. Statement Comparison Logic

5. Prune Recommendation Output and Test Results

6. Reflection and Next Steps



Summary

This stage is not just about writing a working pass—it's about understanding GCC's internal IR (GIMPLE), navigating FMV behavior, and developing automated reasoning logic to determine code equivalence. The upcoming posts will dive deeper into each phase of the implementation process.


Let’s get started!


Comments

Popular posts from this blog

SPO600 2025 Winter Project - Stage 1: Create a Basic GCC Pass (part1)

SPO600 2025 Winter Project - Stage 2: GIMPLE Level Clone Analysis and Pruning (part4)

Lab 1 - 6502 Assembly Language