Heap-item Analysis (Type Analysis)

What is heap item?

Here we first give several definitions for further usage in this section. These definitions can been referred in the paper of rCanary.

  • Heap Item: A heap item is an object containing the heap allocation with a fixed type (layout).
  • Heap-item Unit: A heap-item unit has a PhantomData<T> field with a typed generic parameter and a pointer field to store the value T.
  • Isolated Parameter: An isolated parameter is a stand-alone and typed generic parameter (e.g., T but not &T).

In the type analysis, we need to generate an abstraction for all rust types used in the local crate. Each type has a layout summary to illustrate the data abstraction in the fully initialized state.

The summary comprises two parts:

  1. a boolean value indicating if it contains a heap item;
  2. a boolean vector indicating if each generic parameter is an isolated parameter.

For example, the summary of the vector Vec<T,A> is (1) [0, 1].

Example of type analysis.

Overview

We give the overall framework of type analysis as below.

Framework of type analysis.

Then we will discuss the workflow of each component in this figure.

The visitor method is main pass of the constructor part in type analysis, it will perform several important procedural to determine whether an adt definition (adt-def) will occupy at least one heap allocation, reflecting holding heap-ownership in RAP system.

From the top-down method of our approach, this visitor is the set of several sub-phases which means it contains multiple sub-visitors to make whole method self.visitor() work.

For example, given an AdtDef (like Vec<T>), the result of visitor contains two parts:

  • pt1 Enum: {Owned / UnOwned} indicates whether it will directly have a heap data
  • pt2 Array: [bool;N] indicates whether each generic parameter will have a raw param

Those 2 parts can accelerate heap-ownership inference in the data-flow analysis.

Type Collector

We use an inter-procedural visitor to collect all the types for local crate and the types are after monomorphism. Our goal is to extract all the AdtDefs.

The struct TypeAnalysis implements mir::Visitor to simulate as the type collector.

#![allow(unused)]
fn main() {
pub struct TypeAnalysis<'tcx, 'a> {
    rcx: &'a mut RapGlobalCtxt<'tcx>,
    fn_set: Unique,
    ty_map: TyMap<'tcx>,
    adt_recorder: Unique,
}
}

Note: the type in this phase is Ty::ty rather of Hir::ty.

As we can see, the function visit_ty will flatten all the fields and generics appeared. This procedural can cover almost all the rust types.

#![allow(unused)]
fn main() {
impl<'tcx, 'a> TypeAnalysis<'tcx, 'a> {
    fn visit_ty(&mut self, ty: Ty<'tcx>, ty_context: TyContext) {
        match ty.kind() {
            TyKind::Adt(adtdef, substs) => {
                // ...
                for field in adtdef.all_fields() { self.visit_ty(field.ty(self.tcx(), substs) ,copy_ty_context(&ty_context)) }
                for ty in substs.types() { self.visit_ty(ty, copy_ty_context(&ty_context)); } },
            TyKind::Array(ty, ..) => { self.visit_ty(*ty, ty_context); },
            TyKind::Slice(ty) => { self.visit_ty(*ty, ty_context); },
            TyKind::RawPtr(typeandmut) => { let ty = typeandmut.ty; self.visit_ty(ty, ty_context); },
            TyKind::Ref(_, ty, ..) => { self.visit_ty(*ty, ty_context); },
            TyKind::Tuple(tuple_fields) => {
                for field in tuple_fields.iter() {
                    self.visit_ty(field, copy_ty_context(&ty_context));
                } },
            _ => return,
        }
    }
}
}

The AdtDef are finally stored in the field adt_recorder. After the collecting all AdtDef, we have 4 phases to generate the summary for each type.

#![allow(unused)]
fn main() {
start_channel(|did| self.extract_isolaoted_parameter(did), &dids);
start_channel(|did| self.extract_isolaoted_parameter_prop(did), &dids);
start_channel(|did| self.extract_heap_item_unit(did), &dids);
start_channel(|did| self.extract_owner_prop(did), &dids);
}

Extract Isolated Parameters

Extract params in adt types, the param means one generic parameter acting like T, A, etc...

In the sub-visitor IsolatedParameter, it will visit the given type recursively, and extract all params to construct a mapping.

#![allow(unused)]
fn main() {
#[derive(Clone)]
struct IsolatedParameter<'tcx> {
    tcx: TyCtxt<'tcx>,
    parameters: Parameters,
}
}

Note we only interest in isolated params (T not like *mut T). It lies in one-entire field | recursive in tuple | recursive in array | mixed before.

The implementation can be found in function extract_raw_generic and trait impl impl TypeVisitor for IsolatedParameter.

Here, we give some examples of extracting isolated parameters. Given a struct Example<A, B, T, S>:

#![allow(unused)]
fn main() {
 struct Example<A, B, T, S> {
        a: A,
        b: (i32, (f64, B)),
        c: [[(S) ; 1] ; 2],
        d: Vec<T>,
}
}

The final result for <A, B, T, S> is <true, true, false, true>.

Propagate Isolated Parameter

Extract all params in the adt types like param T and then propagate from the bottom to top.

This procedural is the successor of extracting isolated parameters, and the main idea of IsolatedParameterPropagation is to propagate params from bottom AdtDef to the top as well as updating TypeAnalysis Context.

Note that it will thorough consider mono-morphization existed in the AdtDef. That means the type Vec<T>, Vec<Vec<T>> and Vec<i32> are totally different!!!!

Given a struct Example<A, B, T, S>:

#![allow(unused)]
fn main() {
struct X<A> {
        a: A,
}
struct Y<B> {
    a: (i32, (f64, B)),
    b: X<i32>,
}
struct Example<A, B, T, S> {
    a: X<A>,
    b: (i32, (f64, B)),
    c: [[(S) ; 1] ; 2],
    d: Vec<T>,
}
}

The final result for X<A> is <true>, for Y<B> is <true>, and for Example<A, B, T, S> is <true, true, false, true>.

Extract Heap-item Unit

Next, we need to extract all types that has PhantomData<T> inside which T must be a isolated parameter. Consider these types as a unit to guide the traversal over adt types.

As for one heap-item unit, only struct will contains the information that we want like

#![allow(unused)]
fn main() {
// Example:
struct Foo<T> {
    NonNull<T>,      // this indicates a pointer
    PhantomData<T>,  // this indicates a ownership
}
}

Propagate Heap Item Result