How To Build a Smart Contract Compiler Using Python Solc-x Library

Introduction

A couple of months ago, I decided to automate the repetitive and less rewarding parts of my audit process. I started building an auditing bot for Code4rena that catches smart contract bugs and vulnerabilities across severities (from high to non-critical, plus gas issues), and that could be repurposed at will for other work.

Since I was building the bot with Python, the project felt like a good way to solidify my knowledge of the language. I had no idea of the magnitude of what I was taking on. I struggled, mainly because the documentation for Python Solc-x, a library I needed to complete the project, wasn't well written, and there were few tutorials or guides that explained how to use the library properly.

For the most part, I had to rely on ChatGPT and the poorly written documentation to figure out how to use the library. There was one problem with this approach: ChatGPT kept giving me information about a previous version of the library, so I had to depend on the documentation for the current syntax. Rather than working on other parts of the bot, I spent my time figuring out how to get the compiler to work and how to get it to generate the information I needed.

This guide exists, therefore, to help other developers use the Python Solc-x library to build a Solidity compiler and, most importantly, to spare them the time I spent getting mine to work.

To get started with this tutorial, you only need a basic grasp of Python and file path manipulation using the os module. So, without further ado, let's commence by exploring the Python Solc-x Library.

What is Python Solc-x?

Python Solc-x (installed as py-solc-x and imported as solcx) is a Python wrapper around solc, the Solidity compiler. Its functions let you compile both Solidity source code and contract files, choose which outputs the compiler returns, and see any compilation errors encountered during the process.

By integrating Python Solc-x into your projects, you can streamline the compilation of Solidity code and leverage the insights it provides. Whether you are a developer working on blockchain applications or just curious about the inner workings of smart contracts, Python Solc-x is an invaluable resource to have at your disposal. Its user-friendly functions and error-handling mechanisms make it an excellent choice for anyone looking to engage with Solidity and smart contract development.

Python Solc-x Installation

Before you embark on your journey with the Python Solc-x Library, it's essential that you have Python installed on your machine. If you don't already have Python, don't worry; there are plenty of tutorials available to guide you through the installation process.

Once you have Python up and running, you'll likely have access to pip, the package installer for Python. pip ships with Python 3.4 and later, so any recent installation should already include it.

You can rely on pip to swiftly install the Python Solc-x Library into your Python environment. Just input the command provided below into your command line interface or Visual Studio Code terminal.

pip install py-solc-x

Once the installation is complete, you'll see an indication in your terminal or interface. With this step behind you, you'll be fully equipped to harness the library's capabilities and embark on your journey to build a Solidity compiler.

Installing the Appropriate Solidity Compiler Version

Any developer who has worked with Solidity knows how much depends on the compiler: it turns a contract into bytecode that the Ethereum Virtual Machine (EVM) can execute. Remix, the online integrated development environment, illustrates this well. If a developer fails to specify the correct compiler or selects an incompatible one, Remix promptly raises an error. The same principle applies to the Python Solc-x library.

As capable as the library is, its compile functions have one notable limitation: they do not read a contract's pragma and pick a matching compiler for you. The burden falls on the developer to work out which compiler version a contract needs and to install it.

We'll look at the two compilation modes shortly, but it's worth noting that this limitation bites hardest with compile_files and is rarely a problem with compile_source. The reason is straightforward. With compile_source, the contract's source code is passed in as a string rather than as a list of file paths, so you usually already know which pragma it declares and can pass the matching compiler version directly to the function.

In contrast, compile_files works on contract files on disk, possibly several of them, so the required Solidity version may differ from one compilation to the next. You therefore have to read the version out of each contract before compiling it. Here's how you can determine the pragma version for your contract.

def pragma_finder(file):
    # Read the contract and split it into lines; give up early if it cannot be read
    try:
        with open(file, 'r', encoding='utf-8') as f:
            lines = f.read().split("\n")
    except OSError as e:
        print(f"ERR: {e}")
        return None

    # Find the Solidity compiler version declared by the contract
    for line in lines:
        line = line.strip()
        if not line.startswith("pragma solidity "):
            continue

        # Drop the semicolon and anything after it. For a range such as
        # ">=0.8.0 <0.9.0", only the first constraint is considered
        constraint = line[len("pragma solidity "):].split(";")[0].strip()
        pragma_version = constraint.split()[0]

        # ">=", "<=" and "=" all allow the stated version itself
        if pragma_version.startswith((">=", "<=", "=")):
            return pragma_version.split("=")[-1]

        # A strict "<" needs a lower version than the one stated
        # (find_lower_version is one of the helper functions defined elsewhere)
        if pragma_version.startswith("<"):
            return find_lower_version(pragma_version.split("<")[-1])

        # A strict ">" needs a higher version than the one stated
        # (find_higher_version is one of the helper functions defined elsewhere)
        if pragma_version.startswith(">"):
            return find_higher_version(pragma_version.split(">")[-1])

        # A caret, a tilde, or no operator at all means the stated version itself can be used
        return pragma_version.lstrip("^~")

    # No pragma declaration was found
    return None

The pragma_finder() function is responsible for determining the appropriate Solidity version required to compile the contract.

To begin, the function splits the contract's content into lines and stores them in a variable called lines. It then iterates through the lines until it finds the Solidity version declaration (usually of the form pragma solidity X.X.XX, optionally prefixed with an operator such as ^ or >=). Once the declaration is found, the version is extracted for use in compilation. If the declaration requires a version strictly higher or lower than the one stated, the function delegates to a helper to pick one. Finally, pragma_finder() returns the version it settled on.

Additionally, pragma_finder() relies on a few other utility functions that help determine the appropriate compiler version. For more detailed insights into these functions, check here.
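Those helper functions aren't reproduced in this guide. To give a rough idea of the kind of work they do, here is a minimal sketch that picks a version from a list of candidates for a simple pragma constraint. The names parse_version, satisfies and pick_version, as well as the candidate list, are hypothetical: they are neither the helpers referenced above nor part of Solc-x, and the caret rule assumes a major version of 0, which holds for the Solidity releases used in this guide.

```python
def parse_version(text):
    """Turn '0.8.17' into the tuple (0, 8, 17) so versions compare numerically."""
    return tuple(int(part) for part in text.strip().split("."))


def satisfies(version, constraint):
    """Check one constraint such as '>=0.8.0', '<0.9.0', '^0.8.0' or '0.8.17'."""
    for op in (">=", "<=", ">", "<", "^", "="):
        if constraint.startswith(op):
            target = parse_version(constraint[len(op):])
            break
    else:
        # No operator: the constraint is an exact version
        op, target = "=", parse_version(constraint)

    if op == ">=":
        return version >= target
    if op == "<=":
        return version <= target
    if op == ">":
        return version > target
    if op == "<":
        return version < target
    if op == "^":
        # Caret on a 0.x.y version: same minor series, at least the stated patch
        return version[:2] == target[:2] and version >= target
    return version == target


def pick_version(pragma, candidates):
    """Return the highest candidate meeting every space-separated constraint, or None."""
    constraints = pragma.split()
    matches = [
        candidate for candidate in candidates
        if all(satisfies(parse_version(candidate), rule) for rule in constraints)
    ]
    return max(matches, key=parse_version) if matches else None


# Hypothetical list of versions to choose from
candidates = ["0.6.12", "0.7.6", "0.8.0", "0.8.17", "0.8.19"]

print(pick_version("^0.8.0", candidates))          # 0.8.19
print(pick_version(">=0.7.0 <0.8.0", candidates))  # 0.7.6
print(pick_version(">0.8.17", candidates))         # 0.8.19
```

Comparing versions as tuples of integers avoids the trap of string comparison, where "0.8.9" would sort above "0.8.17".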

compile_source vs. compile_files

Python Solc-x has two compilation modes, differentiated mainly by the input they receive. The first is compile_source(), which compiles a smart contract directly from its source code. The second is compile_files(), which accepts a list of contract file paths as input.

import solcx

# Compiles raw Solidity source code that is passed in as a string
def compile_contract_source(solidity_code):
    # Pick a compiler version and install it. Since the contract accepts any 0.8.x release from 0.8.0 upward, we'll use 0.8.17
    pragma_version = "0.8.17"

    found = False

    # Check whether the compiler version you specified is already installed
    # by looking for it among the compilers installed on your machine
    list_of_installed_compilers = solcx.get_installed_solc_versions()
    for compiler in list_of_installed_compilers:
        if str(compiler) == pragma_version:
            found = True
            break

    if not found:
        solcx.install_solc(pragma_version)

    # Actual compilations happen here
    # It is good practice to catch possible errors in a try/except block
    try:
        ast = solcx.compile_source(solidity_code, output_values=["ast"], solc_version=pragma_version)
        if ast:
            print(f"AST of solidity code is printed below\n\n {ast}")
    except solcx.exceptions.SolcError as error:
        print(f"ERR: AST COMPILATION MODE ERROR {error}")

    try:
        abi = solcx.compile_source(solidity_code, output_values=["abi"], solc_version=pragma_version)
        if abi:
            print(f"\nABI of solidity code is printed below\n\n {abi}")
    except solcx.exceptions.SolcError as error:
        print(f"ERR: ABI COMPILATION MODE ERROR {error}")


solidity_code = """
//SPDX-License-Identifier: UNLICENSED
pragma solidity ^0.8.0;

contract arrayContract {

    uint[] public nums = [1, 2, 3];
    uint[3] public numsFixed = [4, 5, 6];

    function examples() external {
        nums.push(4);
        nums[2] = 777;
        delete nums[1];
        nums.pop();
    }

    function returnArray() external view returns(uint[] memory) {
      return nums;
    }

    function memoryArray() external pure returns (uint) {
      uint[] memory a = new uint[](5);
      a[1] = 123;
      return a[1];
    }

}
"""

compile_contract_source(solidity_code)

To be able to use Solc-x in your file, you need to import it: import solcx. This grants you access to everything you need to build the compiler.

The compile_contract_source function above compiles raw Solidity code. As previously stated, when using solcx.compile_source(), there's no need to deduce the pragma version. The pragma version was manually specified because we were already aware of the version required by the solidity_code.

Furthermore, the function gathers a list of the compilers that have been previously installed. This step is crucial as it helps avoid redundant installations of compilers. If the required compiler isn't already installed, the function takes care of that.
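The check-then-install step can also be factored into a small helper. The sketch below is one possible shape for it, not part of Solc-x: ensure_compiler and its parameter names are hypothetical, and the two Solc-x calls are passed in as arguments so the example can run offline with stand-ins.

```python
def ensure_compiler(version, list_installed, install):
    """Install `version` only if it is missing. Returns True if an install was triggered."""
    # Installed versions may be version objects, so compare their string forms
    installed = {str(item) for item in list_installed()}
    if version in installed:
        return False
    install(version)
    return True


# Stand-ins for the real Solc-x calls, so the sketch runs without a network
fake_installed = ["0.8.17"]
install_log = []

print(ensure_compiler("0.8.17", lambda: fake_installed, install_log.append))  # False
print(ensure_compiler("0.8.19", lambda: fake_installed, install_log.append))  # True
print(install_log)  # ['0.8.19']
```

In real use, you would pass solcx.get_installed_solc_versions and solcx.install_solc as the last two arguments.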

Finally, the function invokes the compile_source function to compile the solidity_code while incorporating various significant flags, some of which will be elaborated on later, including the output_values flag.

The solc_version flag lets you specify the compiler version to use during compilation. If the contract compiles successfully, the requested output is displayed. If an error occurs during the process, a message detailing the issue is printed instead.

# Takes in the file path of the Solidity contract you are
# trying to compile
def compile_contract_file(solidity_contract: str):
    pragma_version = pragma_finder(solidity_contract)

    # Without a compiler version, there is nothing to install or compile with
    if pragma_version is None:
        print("ERR: NO PRAGMA VERSION FOUND")
        return

    found = False

    # Check whether the required compiler version is already installed
    # by looking for it among the compilers installed on your machine
    list_of_installed_compilers = solcx.get_installed_solc_versions()
    for compiler in list_of_installed_compilers:
        if str(compiler) == pragma_version:
            found = True
            break

    if not found:
        solcx.install_solc(pragma_version)

    # Actual compilations happen here
    # It is good practice to catch possible errors in a try/except block
    try:
        bin_runtime = solcx.compile_files([solidity_contract], output_values=["bin-runtime"], solc_version=pragma_version)
        if bin_runtime:
            print(f"Bin runtime of solidity code is printed below\n\n {bin_runtime}")
    except solcx.exceptions.SolcError as error:
        print(f"ERR: BIN RUNTIME COMPILATION MODE ERROR {error}")

    try:
        abi = solcx.compile_files([solidity_contract], output_values=["abi"], solc_version=pragma_version)
        if abi:
            print(f"\nABI of solidity code is printed below\n\n {abi}")
    except solcx.exceptions.SolcError as error:
        print(f"ERR: ABI COMPILATION MODE ERROR {error}")

solcx.compile_files(), as shown in the code snippet above, has a similar syntax to that of solcx.compile_source. The major difference is that the former takes file paths while the latter takes the contract source code.
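Both functions return a dictionary. As far as I can tell from the library's documented examples, its keys combine the source unit and the contract name (for instance "<stdin>:arrayContract" for compile_source, or "path/to/File.sol:ContractName" for compile_files). Assuming that layout, a small lookup helper might look like the sketch below. find_contract is a hypothetical name, and the sample dictionary is a hand-written stand-in rather than real compiler output.

```python
def find_contract(compiled, contract_name):
    """Return the output dict of the first entry whose key ends in ':<contract_name>'."""
    for key, output in compiled.items():
        # Keys are assumed to look like '<source unit>:<contract name>';
        # splitting on the last colon keeps Windows drive letters intact
        _, _, name = key.rpartition(":")
        if name == contract_name:
            return output
    return None


# Hand-written stand-in for a compile result; real values are much longer
compiled = {
    "<stdin>:arrayContract": {"abi": [], "bin-runtime": "6080"},
}

print(find_contract(compiled, "arrayContract"))  # {'abi': [], 'bin-runtime': '6080'}
print(find_contract(compiled, "Missing"))        # None
```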

A complex contract has imports, and with imports comes the issue of remapping. You need to point the compiler to the right place to look for the imports, hence the need to properly handle remappings.

How to Handle Remappings

Just like in Foundry or many other solidity frameworks, when you have imports that aren't in the form of relative paths, you have to point the contracts in the right direction using remappings.txt or remappings in foundry.toml.

This guide won't cover every detail or every possible scenario; it focuses on the essentials. You may encounter more complex remappings in your work, but this section covers the fundamental process you need to understand.

The import_remappings flag for both compile_source and compile_files accepts a list of strings, each in the format shown below, where old is the import prefix as written in the contract and new is the path it should resolve to:

"old=new"

The resolve_imports function below reads remappings.txt or foundry.toml and builds a new list of remappings.

import os
import toml

## Resolves the imports of the contract
def resolve_imports(basePath: str) -> list:
    remap_list = []
    # Stays empty if neither file exists or the file cannot be read
    remappings = []

    ## Checks if remappings.txt exists in the base path
    if os.path.exists(os.path.join(basePath, 'remappings.txt')):
        # Copies the content of the remappings file into 'remappings'; prints an error if the file cannot be read
        try:
            with open(os.path.join(basePath, 'remappings.txt'), 'r', encoding="utf-8") as content:
                remappings = content.read().split("\n")
        except Exception as err:
            print(f"FILE OPEN ERROR: {err}")
    ## Otherwise, falls back to the remappings declared in foundry.toml
    elif os.path.exists(os.path.join(basePath, "foundry.toml")):
        try:
            content = toml.load(os.path.join(basePath, "foundry.toml"))
            # Collects the remappings of every profile; a profile may not declare any
            for each in content["profile"]:
                remappings.extend(content["profile"][each].get('remappings', []))
        except Exception as err:
            print(f"FOUNDRY TOML OPERATION ERROR: {err}")

    # For each remapping, split into from_path and to_path, verify that the path exists, then reconfigure the remapping
    for remapping in remappings:
        # Skips blank lines, such as the one left by a trailing newline
        if not remapping.strip():
            continue

        if len(remapping.split("=")) == 2:
            from_path, to_path = remapping.split("=")
            from_path = from_path.strip()
            to_path = to_path.strip()
            # Path to the import target, relative to the current working directory
            full_to_path = os.path.join(os.path.relpath(basePath), to_path)

            if os.path.exists(full_to_path):
                new_map = f"{from_path}={full_to_path}"
                remap_list.append(new_map)
            else:
                print(f"Warning: Path does not exist: {full_to_path}")

        else:
            print(f"Warning: Malformed remapping line: {remapping}")

    # Returns remap list
    return remap_list

The resolve_imports() function checks the directory for a remappings.txt file first and a foundry.toml file second. If one of them exists, its remappings are read into the remappings variable.

The foundry.toml file exhibits a distinct structure compared to the .txt file. This is where the Python toml package proves to be quite useful. It seamlessly transforms the contents of a toml file into a dictionary. The process is simple – you load the file's content using toml.load() and then easily navigate through the dictionary to extract the necessary remappings information.

Once the remappings have been found, you need to point the compiler to the right file path for each import.

forge-std/=lib/forge-std/

Assuming the above is one of the remappings, Solc-x will not be able to find the lib folder on its own unless the script runs from the project root. You need to turn the target into a path that Solc-x can resolve from wherever the script runs.

Since remappings contains all we need, we loop through its elements one after the other, joining each target to basePath (expressed relative to the current working directory) to build the full path for the import. This way, Solc-x knows to look in the new location whenever it comes across the old prefix.
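To make the transformation concrete, here is a minimal sketch of what happens to a single remapping line. rebase_remapping and the base path "contracts/my-project" are hypothetical, and unlike resolve_imports() the sketch does not check that the target exists on disk.

```python
import posixpath


def rebase_remapping(remapping, base_path):
    """Rewrite 'prefix=target' so that target is prefixed with base_path."""
    prefix, sep, target = remapping.partition("=")
    if not sep or not prefix.strip() or not target.strip():
        raise ValueError(f"Malformed remapping: {remapping!r}")
    # posixpath keeps forward slashes on every platform, so the output is predictable
    return f"{prefix.strip()}={posixpath.join(base_path, target.strip())}"


print(rebase_remapping("forge-std/=lib/forge-std/", "contracts/my-project"))
# forge-std/=contracts/my-project/lib/forge-std/
```

A list of strings built this way is what would be passed to the import_remappings flag.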

Output Values

Output values determine what the compiler returns. The compiler supports more of them than this guide covers; we focus on three: AST, ABI, and bin-runtime.

Which one you need depends on your goal. Since I was building an auditing bot, I was drawn to the Abstract Syntax Tree (AST). The AST is tree-like data that contains information such as the contract type, the Solidity version, function declaration details, and information about parameters and return variables.

Alternatively, the compilation results can be in the form of an Application Binary Interface (ABI). ABI defines how you call functions in a smart contract and how the data should be formatted when communicating with the contract.
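As a small illustration of what the ABI contains, the sketch below builds readable function signatures from a list of ABI entries. function_signatures is a hypothetical helper, and sample_abi is a hand-written, trimmed approximation of the entries the compiler would emit for arrayContract, not real output. Struct parameters would simply show up as "tuple" in this simplified version.

```python
def function_signatures(abi):
    """Build 'name(type1,type2)' strings for every function entry in an ABI."""
    signatures = []
    for entry in abi:
        # The ABI also describes constructors, events and errors; skip those
        if entry.get("type") != "function":
            continue
        arg_types = ",".join(arg["type"] for arg in entry.get("inputs", []))
        signatures.append(f"{entry['name']}({arg_types})")
    return signatures


# Hand-written, trimmed entries in the general shape of a contract ABI
sample_abi = [
    {"type": "function", "name": "examples", "inputs": [],
     "outputs": [], "stateMutability": "nonpayable"},
    {"type": "function", "name": "nums",
     "inputs": [{"name": "", "type": "uint256"}],
     "outputs": [{"name": "", "type": "uint256"}], "stateMutability": "view"},
    {"type": "function", "name": "memoryArray", "inputs": [],
     "outputs": [{"name": "", "type": "uint256"}], "stateMutability": "pure"},
]

print(function_signatures(sample_abi))
# ['examples()', 'nums(uint256)', 'memoryArray()']
```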

There is a third option: bin-runtime. Bin here means binary, and bin-runtime refers to the contract's runtime bytecode, that is, the compiled code that lives on the blockchain and executes whenever the contract is called. It differs from the bin output, which is the creation bytecode that runs once at deployment and leaves the runtime code behind.
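One simple thing you can do with bin-runtime is measure the contract's size. The compiler returns the bytecode as a hex string, and Ethereum mainnet caps deployed code at 24576 bytes (EIP-170). The sketch below assumes fully linked bytecode, since unlinked library placeholders are not valid hex. runtime_size is a hypothetical helper, and the sample string is made up and far shorter than real bytecode.

```python
# Maximum size of deployed (runtime) code on Ethereum mainnet, set by EIP-170
MAX_RUNTIME_SIZE = 24576


def runtime_size(bin_runtime):
    """Return the size in bytes of a hex-encoded bytecode string."""
    # Tolerate an optional '0x' prefix
    hex_text = bin_runtime[2:] if bin_runtime.startswith("0x") else bin_runtime
    return len(bytes.fromhex(hex_text))


# Made-up, very short hex string standing in for real bytecode
sample = "6080604052"
size = runtime_size(sample)
print(size, size <= MAX_RUNTIME_SIZE)  # 5 True
```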

How you use the data or output values generated depends on what you hope to achieve. In our case, we requested all 3 in the sample code above for learning purposes. Although we didn't do anything with the data returned, it was useful in checking if our contract(s) compiled successfully.

Converting Python Solc-x Generated AST to proper data

Working with ABI and bin-runtime is generally uncomplicated. In contrast, dealing with the AST can be a bit challenging. The Abstract Syntax Tree (AST) is essentially a composite structure made up of nested lists and dictionaries. While the specific data generated for each contract differs, it adheres to a consistent pattern. Developers usually write a function that traverses the elements of the AST and organizes the information into a list of dictionaries, which is easier to work with.
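A traversal of that kind can be sketched in a few lines. The example below assumes the JSON AST layout in which every node is a dictionary with a "nodeType" key and children are nested under keys such as "nodes". collect_nodes is a hypothetical helper, and sample_ast is a hand-written, heavily trimmed stand-in for real compiler output.

```python
def collect_nodes(node, node_type, found=None):
    """Recursively collect every dict in the AST whose 'nodeType' equals node_type."""
    if found is None:
        found = []
    if isinstance(node, dict):
        if node.get("nodeType") == node_type:
            found.append(node)
        # Children can sit under any key, so descend into every value
        for value in node.values():
            collect_nodes(value, node_type, found)
    elif isinstance(node, list):
        for item in node:
            collect_nodes(item, node_type, found)
    return found


# Hand-written, heavily trimmed AST for illustration only
sample_ast = {
    "nodeType": "SourceUnit",
    "nodes": [
        {"nodeType": "PragmaDirective"},
        {
            "nodeType": "ContractDefinition",
            "name": "arrayContract",
            "nodes": [
                {"nodeType": "FunctionDefinition", "name": "examples", "visibility": "external"},
                {"nodeType": "FunctionDefinition", "name": "returnArray", "visibility": "external"},
            ],
        },
    ],
}

# Organize the matches into a list of dictionaries
functions = [
    {"name": fn["name"], "visibility": fn["visibility"]}
    for fn in collect_nodes(sample_ast, "FunctionDefinition")
]
print(functions)
# [{'name': 'examples', 'visibility': 'external'}, {'name': 'returnArray', 'visibility': 'external'}]
```

Searching by nodeType rather than by a fixed path keeps the helper working even when the nesting differs from one contract to the next.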

A valuable tool in this context is a Python Formatter. It takes the AST data produced by the library and transforms it into a format that is easier to comprehend. This enables you to identify the path to extract the necessary information. For a practical example of how AST data from smart contract source code was converted into a more readable format, click here.

Conclusion

The Python Solc-x library is a versatile tool for developers venturing into the world of Ethereum smart contracts. Its ability to compile, analyze, and handle Solidity code is invaluable in the development of secure and efficient contracts.

This guide explored the library's core functionalities, the extraction of Solidity compiler versions, resolving complex import structures, etc., with the aim of eliminating the time developers spend trying to figure the library out.

Hopefully, this guide helps someone trying to use Python Solc-x out there ✅

Stay connected and informed by following me on both Twitter and LinkedIn. Who knows, you might also find something that interests you on my GitHub page, so check it out😊.

I regularly share insightful tips and tricks to help you become a more proficient Solidity developer and smart contract auditor.

Continue putting in the work. WAGMI.