Add a New Query

This chapter shows how to define your own query. Adding a new query typically involves the following steps:

  1. Checking the database and determining what information needs to be loaded.
  2. Implementing the function that takes the information and computes the desired result.
  3. Writing the result into a CSV file.
  4. Registering the query in the queries list.

The following sections discuss these steps in more detail. As a running example, we use a query that finds the definitions of types that have raw pointers as fields and are not annotated as #[repr(C)] (manager/src/queries/non_tree_types.rs).

Database Structure

Before we can define a new query, we need to understand the database structure. The database schema is defined in two files:

  1. database/src/schema.dl – core data generated by the extractor. Modifying this file requires rerunning the extractor.
  2. database/src/derived.dl – derived data, that is, data computed by queries and stored in the database so that other queries can reuse it.

From these schemas, procedural macros generate various data structures and functions. For writing queries, the most important of these is Loader, which loads the relations stored in the database as Rust vectors. Each query receives a &Loader as an argument.
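As a rough mental model of how a query uses the loader, the sketch below replaces the generated Loader with a hand-written MockLoader (hypothetical, not the generated type) that hands out a relation as a vector of tuples, plus a query that only borrows it. The method name load_types_raw_ptr and the u32 Type alias are assumptions made for illustration.

```rust
/// Hypothetical id type; the real one is generated from the schema.
pub type Type = u32;

/// Toy stand-in for the generated `Loader`: the relation is stored
/// as a vector of tuples, one tuple per fact.
pub struct MockLoader {
    pub types_raw_ptr: Vec<(Type,)>, // all types that are raw pointers
}

impl MockLoader {
    /// Returns the relation as a Rust vector (method name is made up).
    pub fn load_types_raw_ptr(&self) -> Vec<(Type,)> {
        self.types_raw_ptr.clone()
    }
}

/// A query only borrows the loader, so many queries can share one.
pub fn count_raw_ptr_types(loader: &MockLoader) -> usize {
    loader.load_types_raw_ptr().len()
}

fn main() {
    let loader = MockLoader { types_raw_ptr: vec![(10,), (11,)] };
    println!("raw pointer types: {}", count_raw_ptr_types(&loader));
}
```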

One very important derived relation is selected_builds, which the prepare-builds query creates from CrateList.json. The same crate can have more than one build, for example, when the dependencies include several versions of a crate, or the same crate compiled with different configuration flags. To avoid analysing duplicates, selected_builds records which builds the queries should analyse.
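The effect of selected_builds on other relations can be sketched as a filter: a fact tied to a build is kept only if that build is selected. The Build type and the restrict_to_selected function below are hypothetical, and the sketch does not show how prepare-builds actually picks one build per crate.

```rust
use std::collections::HashSet;

/// Hypothetical build identifier (the real type is generated from the schema).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct Build(pub u32);

/// Keeps only the facts whose build appears in `selected_builds`,
/// so a crate that was compiled several times is analysed once.
pub fn restrict_to_selected<T: Clone>(
    facts: &[(Build, T)],
    selected_builds: &[Build],
) -> Vec<(Build, T)> {
    let selected: HashSet<Build> = selected_builds.iter().copied().collect();
    facts
        .iter()
        .filter(|(build, _)| selected.contains(build))
        .cloned()
        .collect()
}

fn main() {
    // Builds 1 and 2 are two builds of the same crate; only build 1 is selected.
    let adts = vec![(Build(1), "Node"), (Build(2), "Node"), (Build(3), "Vec3")];
    let selected = vec![Build(1), Build(3)];
    println!("{:?}", restrict_to_selected(&adts, &selected));
}
```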

For our query, we are interested in three relations:

  1. types_adt_field – the relation between fields and their types.
  2. types_raw_ptr – the relation that contains all types that are raw pointers.
  3. selected_adts – the derived relation that contains the algebraic data types (ADTs), such as enums and structs, defined in selected_builds.

Computing the Relation

For your query, create a new module in manager/src/queries/ and declare it in manager/src/queries/mod.rs. For example:

mod non_tree_types;

The module should define a public function query with the following signature:

pub fn query(loader: &Loader, report_path: &Path) {
    // Query implementation.
}

Here, loader is the Loader object mentioned in the previous section and report_path is the folder in which we should store the CSV files.
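The sketch below puts the module declaration and the query signature together, using an empty placeholder Loader. It also shows one possible shape for the queries list from step 4: a table mapping query names to entry points. The QUERIES table, the run_query helper, and the name "non-tree-types" are hypothetical and may differ from how the manager actually registers queries.

```rust
use std::path::Path;

/// Hypothetical placeholder for the generated `Loader`.
pub struct Loader;

mod non_tree_types {
    use super::Loader;
    use std::path::Path;

    /// Every query exposes the same entry point.
    pub fn query(_loader: &Loader, report_path: &Path) {
        // Query implementation: load relations, compute, write CSV files.
        println!("would write reports to {}", report_path.display());
    }
}

/// Hypothetical queries list: query name -> entry point.
const QUERIES: &[(&str, fn(&Loader, &Path))] = &[("non-tree-types", non_tree_types::query)];

/// Looks up a query by name and runs it; returns false if it is unknown.
pub fn run_query(name: &str, loader: &Loader, report_path: &Path) -> bool {
    match QUERIES.iter().find(|(n, _)| *n == name) {
        Some((_, query)) => {
            query(loader, report_path);
            true
        }
        None => false,
    }
}

fn main() {
    let found = run_query("non-tree-types", &Loader, Path::new("reports"));
    println!("query found: {}", found);
}
```

A table of function pointers keeps every query behind the same signature, which is what lets a driver call any query uniformly.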

Before writing anything to a CSV file, we compute a vector of the types that have raw pointer fields. A simple Datalog query, written with the Datapond library, is enough:

// Declare the output variable.
let non_tree_types;
datapond_query! {
    // Load the relations by using “loader”.
    load loader {
        relations(types_adt_field, types_raw_ptr),
    }
    // Specify that “non_tree_types” is the output variable.
    output non_tree_types(typ: Type)
    // Define the relation by using a Datalog rule:
    non_tree_types(adt) :-
        types_adt_field(.adt=adt, .typ=typ),
        types_raw_ptr(.typ=typ).
}
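The rule above is a join on typ: an ADT is selected if one of its fields has a type that also appears in types_raw_ptr. The following hand-written function (not code generated by Datapond) computes the same relation over plain slices, assuming both columns are plain u32 ids; like a Datalog relation, its result contains each ADT only once.

```rust
use std::collections::HashSet;

/// Hypothetical id type standing in for the generated `Type`.
type Type = u32;

/// Plain-Rust reading of
///   non_tree_types(adt) :- types_adt_field(.adt=adt, .typ=typ), types_raw_ptr(.typ=typ).
/// `types_adt_field` holds (adt, field type) pairs.
pub fn non_tree_types(types_adt_field: &[(Type, Type)], types_raw_ptr: &[(Type,)]) -> Vec<(Type,)> {
    // Index the raw-pointer types so each field check is a hash lookup.
    let raw_ptrs: HashSet<Type> = types_raw_ptr.iter().map(|&(typ,)| typ).collect();
    let mut result: Vec<(Type,)> = types_adt_field
        .iter()
        .filter(|&&(_, typ)| raw_ptrs.contains(&typ))
        .map(|&(adt, _)| (adt,))
        .collect();
    // Relations are sets: sort and drop duplicate ADTs.
    result.sort_unstable();
    result.dedup();
    result
}

fn main() {
    let types_adt_field: Vec<(Type, Type)> = vec![(1, 10), (1, 11), (2, 12)];
    let types_raw_ptr: Vec<(Type,)> = vec![(10,)];
    println!("{:?}", non_tree_types(&types_adt_field, &types_raw_ptr));
}
```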

To generate a readable CSV file, we traverse all relevant ADTs (selected_adts), check whether each of them is one of the types in non_tree_types, and, if so, desugar it into a human-readable format. To make the membership check efficient, we first convert non_tree_types from a vector to a hash set. The code relies on selected_adts and on several helpers (strings, type_kinds, build_resolver, def_path_resolver) that must be set up earlier in the query function:

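use std::collections::HashSet;
// Convert the Datapond output into a set for fast membership checks.
let non_tree_types: HashSet<_> = non_tree_types.elements.iter().map(|&(typ,)| typ).collect();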
// Keep only non-tree ADTs and resolve their interned ids to readable values.
let non_tree_adts = selected_adts.iter().filter_map(
    |&(
        build,
        item,
        typ,
        def_path,
        resolved_def_path,
        name,
        visibility,
        type_kind,
        def_kind,
        kind,
        c_repr,
        is_phantom,
    )| {
        if non_tree_types.contains(&typ) {
            Some((
                build,
                build_resolver.resolve(build),
                item,
                typ,
                def_path_resolver.resolve(def_path),
                def_path_resolver.resolve(resolved_def_path),
                &strings[name],
                visibility.to_string(),
                &strings[type_kinds[type_kind]],
                def_kind.to_string(),
                kind.to_string(),
                c_repr,
                is_phantom,
            ))
        } else {
            None
        }
    },
);
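Most of the desugaring replaces interned ids with readable values. The sketch below reduces a selected_adts row to three hypothetical columns (typ, name, c_repr) and resolves the name through a string table, mirroring &strings[name] above. The AdtRow struct and the desugar function are illustrative assumptions, not the real generated types.

```rust
use std::collections::HashSet;

// Hypothetical ids: a name is an index into a table of interned strings.
type Type = u32;
type InternedString = usize;

/// Reduced stand-in for one `selected_adts` row.
pub struct AdtRow {
    pub typ: Type,
    pub name: InternedString,
    pub c_repr: bool,
}

/// Keeps rows whose type is in `non_tree_types` and replaces the
/// interned name with the actual string.
pub fn desugar<'a>(
    adts: &[AdtRow],
    non_tree_types: &HashSet<Type>,
    strings: &'a [String],
) -> Vec<(Type, &'a str, bool)> {
    adts.iter()
        .filter_map(move |row| {
            if non_tree_types.contains(&row.typ) {
                Some((row.typ, strings[row.name].as_str(), row.c_repr))
            } else {
                None
            }
        })
        .collect()
}

fn main() {
    let strings = vec!["Node".to_string(), "Point".to_string()];
    let adts = vec![
        AdtRow { typ: 1, name: 0, c_repr: false },
        AdtRow { typ: 2, name: 1, c_repr: true },
    ];
    let non_tree: HashSet<Type> = HashSet::from([1]);
    for row in desugar(&adts, &non_tree, &strings) {
        println!("{:?}", row);
    }
}
```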

Finally, we can write the results to the CSV file:

write_csv!(report_path, non_tree_adts);

The results are written to the file ../workspace/reports/<query-name>/<iterator-variable>.csv. In this example, the iterator variable is non_tree_adts, so the file is named non_tree_adts.csv.
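The write_csv! macro hides the file handling. The helper below is a hypothetical, std-only approximation of that behaviour, not the macro's actual implementation: it creates the report directory and writes a header plus one line per row to <name>.csv. It does not quote or escape values, so real code should rely on a proper CSV writer (for example, the csv crate).

```rust
use std::fs;
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Hypothetical helper: writes `rows` to `<report_path>/<name>.csv`.
/// Naive on purpose: values are joined with commas, without quoting.
pub fn write_rows(
    report_path: &Path,
    name: &str,
    header: &[&str],
    rows: &[Vec<String>],
) -> io::Result<PathBuf> {
    fs::create_dir_all(report_path)?;
    let file_path = report_path.join(format!("{}.csv", name));
    let mut file = io::BufWriter::new(fs::File::create(&file_path)?);
    writeln!(file, "{}", header.join(","))?;
    for row in rows {
        writeln!(file, "{}", row.join(","))?;
    }
    file.flush()?;
    Ok(file_path)
}

fn main() -> io::Result<()> {
    let report_path = std::env::temp_dir().join("non-tree-types-example");
    let rows = vec![vec!["1".to_string(), "Node".to_string()]];
    let path = write_rows(&report_path, "non_tree_adts", &["typ", "name"], &rows)?;
    println!("wrote {}", path.display());
    Ok(())
}
```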