23.2 Passing C++ memory to Python

To highlight the technique for passing C++ memory natively into Python’s ndarray, we first consider the case of the native Nektar++ matrix structure. In many situations, matrices created by Nektar++ (usually a shared_ptr of NekMatrix<D, StandardMatrixTag> type) need to be passed to Python; for instance, performing differentiation with Gauss quadrature rules requires the differentiation matrix to be obtained. To keep the program memory efficient, the data should not be copied into a NumPy array but rather referenced by the Python interface. This, however, complicates memory management.

Consider a situation where the C++ program no longer needs the generated array and the memory dedicated to it is deallocated. If this memory has already been shared with Python, the Python interface may still require the data contained within the array. However, since the memory has already been deallocated on the C++ side, any further access from Python results in undefined behaviour, typically an invalid memory access that crashes the interpreter. To prevent such situations, a solution employing reference counting must be used.
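The hazard and its remedy can be illustrated with std::shared_ptr alone. The following is a minimal sketch, not NekPy code: cppOwner stands in for the Nektar++ side, pythonView for a hypothetical reference held by the Python interface, and a std::weak_ptr observes when the storage is actually destroyed.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Results observed while running the scenario below.
struct LifetimeReport
{
    long countWhileShared;        // owners while both sides hold the data
    bool aliveAfterCppRelease;    // data still valid once C++ lets go?
    double valueSeenByPython;     // read through the Python-side reference
    bool aliveAfterPythonRelease; // data still valid once Python lets go?
};

// cppOwner plays the Nektar++ side; pythonView plays a (hypothetical)
// reference held by the Python interface. Neither copies the data.
inline LifetimeReport runLifetimeScenario()
{
    auto cppOwner = std::make_shared<std::vector<double>>(
        std::vector<double>{1.0, 2.0, 3.0});
    std::shared_ptr<std::vector<double>> pythonView = cppOwner; // share
    std::weak_ptr<std::vector<double>> observer = cppOwner;     // non-owning

    LifetimeReport r{};
    r.countWhileShared = cppOwner.use_count();

    cppOwner.reset(); // C++ no longer needs the data
    r.aliveAfterCppRelease = !observer.expired();
    r.valueSeenByPython = (*pythonView)[2];

    pythonView.reset(); // last owner gone: the vector is destroyed now
    r.aliveAfterPythonRelease = !observer.expired();
    return r;
}
```

Here the count is 2 while both sides hold the data, the vector survives the C++ release, and it is destroyed only when the Python-side reference is dropped.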

Converter method

pybind11 provides the machinery to convert a C++ object to a type recognised by Python, as well as to maintain the appropriate reference counts. Listing 23.1 shows an abridged version of the converter, with comments describing the individual methods. The object requiring conversion is a NekMatrix<T, StandardMatrixTag> (named src in the listings).


Listing 23.1: Converter method for converting the C++ arrays into Python NumPy arrays.
namespace pybind11::detail
{

template <typename Type, typename T> struct standard_nekmatrix_caster
{
public:
    PYBIND11_TYPE_CASTER(Type, const_name("NekMatrix<T>"));

    // Conversion part 1 (Python -> C++)
    bool load(handle src, bool)
    {
        // Perform Python to C++ conversion: returns true if successful, false
        // otherwise.
    }

    // Conversion part 2 (C++ -> Python)
    static handle cast(
        const Nektar::NekMatrix<T, Nektar::StandardMatrixTag> &src,
        return_value_policy, handle)
    {
        // Perform C++ to Python conversion.
    }
};

template <typename Type>
struct type_caster<Nektar::NekMatrix<Type, Nektar::StandardMatrixTag>>
    : standard_nekmatrix_caster<
          Nektar::NekMatrix<Type, Nektar::StandardMatrixTag>, Type>
{
};

} // namespace pybind11::detail

We first give a brief overview of the general process undertaken during type conversion. pybind11 provides built-in type casters for fundamental data types such as double and float (as well as for many standard library types), so many C++ functions can be exposed without extra work, for example when they are used in .def calls when registering a Python class. Types unknown to pybind11 require a custom caster, and for these the load and cast methods contain most of the functionality. In order to perform automatic conversion between NekMatrix and a 2D ndarray, we register the conversion inside the pybind11::detail namespace via a partial template specialisation of type_caster. We also note that throughout the conversion code (and elsewhere in NekPy), we make use of the pybind11 NumPy bindings. These are a lightweight wrapper around the NumPy C API, which simplifies the syntax somewhat and avoids direct use of the API.
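The way a partial specialisation is picked up can be illustrated in plain C++ without pybind11. In this minimal sketch, caster, matrix_caster, Matrix and the tag types are hypothetical stand-ins for type_caster, standard_nekmatrix_caster, NekMatrix and StandardMatrixTag; the point is only that the compiler selects the specialisation for any element type, while every other type falls back to the primary template.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-ins for Nektar::StandardMatrixTag and Nektar::NekMatrix.
struct StandardMatrixTag {};
struct OtherTag {};
template <typename T, typename Tag> struct Matrix {};

// Primary template: used for any type without a dedicated specialisation.
template <typename Type> struct caster
{
    static std::string name() { return "generic"; }
};

// Shared implementation parameterised on the full type and element type,
// in the spirit of standard_nekmatrix_caster<Type, T>.
template <typename Type, typename T> struct matrix_caster
{
    static std::string name() { return "matrix"; }
    static constexpr std::size_t elementSize = sizeof(T);
};

// Partial specialisation: chosen at compile time for Matrix<T, StandardMatrixTag>
// with any T, mirroring type_caster<NekMatrix<Type, StandardMatrixTag>>.
template <typename T>
struct caster<Matrix<T, StandardMatrixTag>>
    : matrix_caster<Matrix<T, StandardMatrixTag>, T>
{
};
```

No runtime registration is involved: the choice is made entirely by template argument matching.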

The custom caster has two methods, load and cast, which perform conversion from Python to C++ and from C++ to Python, respectively. We now go through each in turn.


Listing 23.2: Python to C++ converter for NekMatrix
bool load(handle src, bool)
{
    // Perform some checks: the source should be an ndarray.
    if (!array_t<T>::check_(src))
    {
        return false;
    }

    // Request a contiguous array of type T in Fortran (column-major) order,
    // which matches the storage layout used by NekMatrix.
    auto buf = array_t<T, array::f_style | array::forcecast>::ensure(src);
    if (!buf)
    {
        return false;
    }

    // There should be two dimensions.
    if (buf.ndim() != 2)
    {
        return false;
    }

    // Copy data across from the Python array to C++ (NekMatrix copies the
    // buffer it is given).
    auto rows = static_cast<unsigned int>(buf.shape(0));
    auto cols = static_cast<unsigned int>(buf.shape(1));
    value = Nektar::NekMatrix<T, Nektar::StandardMatrixTag>(
        rows, cols, buf.data(), Nektar::eFULL);

    return true;
}

For NekMatrix we do not yet have custom C++-side code to handle Python as a data source, so the load routine simply copies the data from the Python array. Note that the resulting C++ object is stored in the member variable value, of type Type, which is declared by the PYBIND11_TYPE_CASTER macro.
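Because a C-contiguous NumPy buffer is row-major while NekMatrix stores its entries column-major, copying raw memory between the two layouts without remapping would transpose the matrix. The index mapping involved can be sketched in plain C++; the function name is hypothetical and not part of NekPy.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Copy a C-contiguous (row-major) rows x cols buffer into column-major
// storage, the layout used by Nektar++ matrices.
inline std::vector<double> rowMajorToColumnMajor(const double *src,
                                                 std::size_t rows,
                                                 std::size_t cols)
{
    std::vector<double> dst(rows * cols);
    for (std::size_t i = 0; i < rows; ++i)
    {
        for (std::size_t j = 0; j < cols; ++j)
        {
            // Row-major element (i, j) lives at i * cols + j;
            // column-major element (i, j) lives at i + j * rows.
            dst[i + j * rows] = src[i * cols + j];
        }
    }
    return dst;
}
```

For the 2 x 3 matrix with rows (1, 2, 3) and (4, 5, 6), the column-major buffer is 1, 4, 2, 5, 3, 6.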

For conversion from C++ to Python, we use the following cast function.


Listing 23.3: C++ to Python converter for NekMatrix
static handle cast(
    const Nektar::NekMatrix<T, Nektar::StandardMatrixTag> &src,
    return_value_policy, handle)
{
    using NMat = Nektar::NekMatrix<T, Nektar::StandardMatrixTag>;

    // Construct a new wrapper matrix to hold onto the data. Assign a
    // destructor so that the wrapper is cleaned up when the Python object
    // is deallocated.
    capsule c(new NMat(src.GetRows(), src.GetColumns(), src.GetPtr(),
                       Nektar::eWrapper),
              [](void *ptr) {
                  delete static_cast<NMat *>(ptr);
              });

    // Create the NumPy array, passing the capsule as its base. When we go
    // out of scope, c's reference count will have been reduced by 1, but the
    // array increases it when it assigns the capsule as its base. The
    // strides describe column-major storage.
    array a({src.GetRows(), src.GetColumns()},
            {sizeof(T), src.GetRows() * sizeof(T)}, src.GetRawPtr(), c);

    // Release the array without decreasing its reference count. If we just
    // returned a, the array would be deallocated as soon as a goes out of
    // scope at the end of this function.
    return a.release();
}

In this direction no copy is needed: we construct a new NekMatrix that shares the reference-counted storage of the original matrix, which is accessed as an Array<OneD, T> via the GetPtr method and wrapped using eWrapper. To ensure that this storage lives for the entire lifetime of the NumPy array, we place the new matrix in a Python capsule object. A capsule holds a C pointer together with a callback function that is called when the Python object is deallocated.

A new matrix is therefore constructed and passed to the capsule. Because this matrix holds its own reference to the underlying Array<OneD, T>, the data cannot be destroyed while it is in use in Python, even if the original matrix on the C++ side is deallocated. The callback function simply deletes the wrapper matrix when it is no longer required, releasing its reference and cleaning up the memory appropriately. This process is shown in Figure 23.3. Steps (c) and (d) can also occur in the reverse order, with the reference created by the Python binding removed first; in this case the memory is deallocated only when the underlying Array<OneD, T> created by C++ is deallocated.


Figure 23.3 Memory management of data created in C++ using shared_ptr and passed to Python. Reference counter shown in green.
(a) Nektar++ creates a data array referenced by a shared_ptr.
(b) The array is passed to the Python binding, which creates a new shared_ptr to the data.
(c) Nektar++ no longer needs the data: its shared_ptr is removed but the memory is not deallocated.
(d) When the data is no longer needed in the Python interface, the destructor is called and the shared_ptr is removed.
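The ownership chain of Figure 23.3 can be mimicked without Python. In this illustrative sketch, a std::shared_ptr stands in for the reference-counted Array<OneD, T>, WrapperMatrix for the eWrapper matrix allocated in the cast routine, and MiniCapsule for a hypothetical, much simplified analogue of the pybind11 capsule; runCapsuleScenario runs steps (c) and (d) in either order.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Reference-counted storage, standing in for Array<OneD, T>.
using Storage = std::shared_ptr<std::vector<double>>;

// Stand-in for the NekMatrix built with eWrapper in the cast routine: it
// shares the storage rather than copying it.
struct WrapperMatrix
{
    Storage data;
};

// Much simplified capsule: an opaque pointer plus a destructor callback that
// runs when the capsule itself is destroyed.
class MiniCapsule
{
public:
    MiniCapsule(void *ptr, void (*destructor)(void *))
        : m_ptr(ptr), m_destructor(destructor)
    {
    }
    MiniCapsule(const MiniCapsule &) = delete;
    MiniCapsule &operator=(const MiniCapsule &) = delete;
    ~MiniCapsule() { m_destructor(m_ptr); }

private:
    void *m_ptr;
    void (*m_destructor)(void *);
};

struct CapsuleReport
{
    bool aliveAfterFirstRelease;
    bool aliveAfterSecondRelease;
};

// cppFirst == true follows Figure 23.3 (c) then (d); false swaps them.
inline CapsuleReport runCapsuleScenario(bool cppFirst)
{
    Storage original = std::make_shared<std::vector<double>>(4, 1.0);
    std::weak_ptr<std::vector<double>> observer = original;

    // (b) The binding wraps the shared storage in a new matrix owned by the
    // capsule; the callback deletes that matrix.
    std::unique_ptr<MiniCapsule> capsule(new MiniCapsule(
        new WrapperMatrix{original},
        [](void *p) { delete static_cast<WrapperMatrix *>(p); }));

    CapsuleReport r{};
    if (cppFirst)
    {
        original.reset(); // C++ drops its reference
        r.aliveAfterFirstRelease = !observer.expired();
        capsule.reset(); // Python releases the capsule
    }
    else
    {
        capsule.reset();
        r.aliveAfterFirstRelease = !observer.expired();
        original.reset();
    }
    r.aliveAfterSecondRelease = !observer.expired();
    return r;
}
```

In both orders the storage survives the first release and is freed by the second, which is the property the capsule is there to guarantee.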


Next, the converter method returns a NumPy array constructed using pybind11's array class. Note that the capsule object is passed as the base argument of the ndarray, i.e. the object that owns the data. Consequently, when all ndarray objects that refer to the data have been deallocated, NumPy does not deallocate the data directly but instead releases its reference to the capsule. The capsule is then deallocated, which deletes the wrapper NekMatrix and thus decrements the reference counter of the Array<OneD, T>. We also note that data ordering is important for matrix storage: Nektar++ stores data in column-major order, whereas NumPy arrays are row-major by default. The strides are therefore passed to the array constructor as a pair (a, b), where a is the number of bytes to skip to reach the same column in the next row and b is the number of bytes to skip to reach the same row in the next column. For a column-major matrix with r rows, a = sizeof(T) and b = r * sizeof(T), as in Listing 23.3.
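The stride arithmetic can be checked with a few lines of C++. This sketch (function names hypothetical) computes the pair (a, b) for a column-major matrix and uses it to address elements of a raw buffer in the way NumPy interprets byte strides.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Byte strides (a, b) for a column-major matrix with `rows` rows:
// a moves one row down, b moves one column across.
inline std::array<std::size_t, 2> columnMajorStrides(std::size_t rows,
                                                     std::size_t elemSize)
{
    return {{elemSize, rows * elemSize}};
}

// Byte offset of element (i, j) for the given strides.
inline std::size_t byteOffset(const std::array<std::size_t, 2> &strides,
                              std::size_t i, std::size_t j)
{
    return i * strides[0] + j * strides[1];
}

// Read element (i, j) from a raw buffer using byte strides.
inline double elementAt(const double *data,
                        const std::array<std::size_t, 2> &strides,
                        std::size_t i, std::size_t j)
{
    const char *base = reinterpret_cast<const char *>(data);
    return *reinterpret_cast<const double *>(base + byteOffset(strides, i, j));
}
```

For a 3 x 2 matrix stored column by column as 1, 2, 3, 4, 5, 6, element (1, 1) sits at byte offset sizeof(double) + 3 * sizeof(double), i.e. the fifth entry, 5.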

Finally, to stop Python from immediately destroying the resulting ndarray, we call release on the array. This hands the reference held by a over to the caller instead of decrementing the reference counter when a goes out of scope at the end of the function.
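The effect of release can be illustrated with a toy ownership model. pybind11's object::release() resets the internal pointer without decreasing the reference count and returns a handle; ToyObject, ToyHandle and ToyOwner below are hypothetical stand-ins, not pybind11 types.

```cpp
#include <cassert>

// Toy reference-counted object standing in for a PyObject.
struct ToyObject
{
    int refcount = 1; // a newly created object holds one reference
};

// Non-owning handle (loosely analogous to pybind11's handle).
struct ToyHandle
{
    ToyObject *ptr;
};

// Owning wrapper that drops its reference on destruction unless release()
// has been called (loosely analogous to pybind11's object).
class ToyOwner
{
public:
    explicit ToyOwner(ToyObject *p) : m_ptr(p) {}
    ToyOwner(const ToyOwner &) = delete;
    ToyOwner &operator=(const ToyOwner &) = delete;
    ~ToyOwner()
    {
        if (m_ptr != nullptr)
        {
            --m_ptr->refcount; // "decref"
        }
    }

    ToyHandle handle() const { return ToyHandle{m_ptr}; }

    // Give up ownership without touching the reference count.
    ToyHandle release()
    {
        ToyHandle h{m_ptr};
        m_ptr = nullptr;
        return h;
    }

private:
    ToyObject *m_ptr;
};

// Returning a borrowed handle: a's destructor then drops the only reference.
inline ToyHandle makeBorrowed(ToyObject &obj)
{
    ToyOwner a(&obj);
    return a.handle();
}

// Returning via release(): the reference passes to the caller intact.
inline ToyHandle makeReleased(ToyObject &obj)
{
    ToyOwner a(&obj);
    return a.release();
}

// Reference count left on a fresh object after calling `make` on it.
inline int refcountAfter(ToyHandle (*make)(ToyObject &))
{
    ToyObject obj;
    make(obj);
    return obj.refcount;
}
```

With the borrowed variant the count drops to zero before the caller ever sees the handle, which in Python would mean returning a deallocated array.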

Testing

The process outlined above requires little manual intervention from the programmer: there are no explicit calls to the Python C API, as all operations are carried out by pybind11. Testing therefore focused mostly on the correctness of the returned data, in particular the ordering of the array. To this end, the Differentiation tutorials were used as tests. To run the tutorials correctly, the Python wrapper needs to retrieve the differentiation matrix which, as mentioned before, has to be converted to a datatype Python recognises. The test runs the differentiation tutorials and compares the final result to a fixed expected value. It is run automatically as part of the ctest command if both the Python wrapper and the tutorials have been built.

23.2.1 Passing Python data to C++

Conversely, a similar problem arises when data is created in Python and has to be passed to the C++ program. In this case, as the data is managed by Python, the main reference counter should be maintained by the Python object and incremented or decremented as appropriate using the incref and decref methods, respectively. Although we do not support this process for NekMatrix as described above, we do use it for the Array<OneD, T> structure. When the array is no longer needed by the C++ program, the reference counter on the Python side should be decremented so that Python garbage collection works correctly; however, this should only happen if the array was created by Python in the first place.

The files implementing the below procedure are:

Modifications to Array<OneD, const DataType> class template

In order to perform the operations described above, the C++ array structure should contain information on whether or not it was created from data managed by Python. To this end, two new attributes were added to the C++ Array<OneD, const DataType> class template in the form of a struct:

struct PythonInfo {
    void *m_pyObject; // Underlying PyObject pointer
    void (*m_callback)(void *); // Callback
};

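where m_pyObject points to the underlying Python (NumPy array) object and m_callback is the function that is called with m_pyObject as its argument when the C++ array is destroyed, in order to decrement the Python reference counter.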

Inside Array<OneD, T>, this struct is held as a double pointer, i.e.:

PythonInfo **m_pythonInfo;

This is done because if the ndarray was originally created by a C++ to Python conversion (as outlined in the previous section), we need to convert the Array, and any other shared arrays that hold the same C++ memory, to reference the Python array. If we did not do this, the C++ array could be destroyed whilst it is still in use on the Python side, leading to an invalid memory access. By storing this as a double pointer, in a similar fashion to the underlying reference counter m_count, we ensure that all C++ arrays sharing the same memory are updated when necessary. We can identify Python-backed arrays by checking *m_pythonInfo: if it is not nullptr, then the array has been constructed through the Python to C++ converter.
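Why the double pointer matters can be shown with a cut-down array class. In this minimal sketch, MiniArray, SetPythonInfo and IsPythonManaged are hypothetical, and only the sharing mechanism of Array<OneD, T> is modelled: because copies share the PythonInfo* slot itself, setting it through one array is immediately visible through every copy.

```cpp
#include <cassert>

// Same layout as the PythonInfo struct above.
struct PythonInfo
{
    void *m_pyObject;
    void (*m_callback)(void *);
};

// Hypothetical cut-down array: copies share m_count and m_pythonInfo, just
// as Array<OneD, T> copies share their reference counter.
class MiniArray
{
public:
    MiniArray()
        : m_count(new unsigned int(1)), m_pythonInfo(new PythonInfo *(nullptr))
    {
    }
    MiniArray(const MiniArray &rhs)
        : m_count(rhs.m_count), m_pythonInfo(rhs.m_pythonInfo)
    {
        ++*m_count;
    }
    MiniArray &operator=(const MiniArray &) = delete;
    ~MiniArray()
    {
        if (--*m_count == 0)
        {
            if (*m_pythonInfo != nullptr)
            {
                (*m_pythonInfo)->m_callback((*m_pythonInfo)->m_pyObject);
                delete *m_pythonInfo;
            }
            delete m_pythonInfo;
            delete m_count;
        }
    }

    bool IsPythonManaged() const { return *m_pythonInfo != nullptr; }

    // Writing through the shared slot updates every copy at once.
    void SetPythonInfo(void *pyObject, void (*callback)(void *))
    {
        delete *m_pythonInfo;
        *m_pythonInfo = new PythonInfo{pyObject, callback};
    }

private:
    unsigned int *m_count;
    PythonInfo **m_pythonInfo;
};

struct PythonInfoReport
{
    bool copyManagedBefore; // copy sees no Python object initially
    bool copyManagedAfter;  // copy sees the object set through the original
    int callbacksRun;       // callbacks run once every copy is destroyed
};

inline PythonInfoReport runPythonInfoScenario()
{
    static int callbacks = 0;
    callbacks = 0;
    int dummyPyObject = 0; // stands in for a NumPy array object
    PythonInfoReport r{};
    {
        MiniArray a;
        MiniArray b(a); // b shares a's reference count and PythonInfo slot
        r.copyManagedBefore = b.IsPythonManaged();
        a.SetPythonInfo(&dummyPyObject, [](void *) { ++callbacks; });
        r.copyManagedAfter = b.IsPythonManaged();
    }
    r.callbacksRun = callbacks;
    return r;
}
```

Had each copy held its own PythonInfo pointer, b would still report that it is not Python-managed after a was updated.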

Adding new attributes to the arrays might significantly increase memory usage or introduce unforeseen overheads, although this was not observed in benchmarking. Nevertheless, to rule this out entirely, a preprocessor directive ensures that the additional attributes are only included if NekPy is built (using the option NEKTAR_BUILD_PYTHON).

A new constructor has been added to the class template, as seen in Listing 23.4. In the pre-existing constructors, *m_pythonInfo is set to nullptr, marking the array as not managed by Python. A similar constructor was added for const arrays to ensure that these can also be passed between the languages. Note that, unlike the pre-existing constructors, this constructor makes no calls to the Nektar++ array initialisation policies, as there is no need for the new array to copy the data.


Listing 23.4: New constructor for initialising arrays created through the Python to C++ converter method.
Array(unsigned int dim1Size, DataType* data, void* memory_pointer,
      void (*python_decrement)(void *)) :
    m_size( dim1Size ),
    m_capacity( dim1Size ),
    m_data( data ),
    m_count( nullptr ),
    m_offset( 0 )
{
    m_count = new unsigned int();
    *m_count = 1;

    m_pythonInfo = new PythonInfo *();
    *m_pythonInfo = new PythonInfo();
    (*m_pythonInfo)->m_callback = python_decrement;
    (*m_pythonInfo)->m_pyObject = memory_pointer;
}

Changes have also been made to the destructor, as shown in Listing 23.5, to ensure that if the data was originally created in Python, the callback function decrements the reference counter of the NumPy array object instead of freeing the memory. The detailed procedure for deleting arrays is described later in this section.


Listing 23.5: The modified destructor for C++ arrays.
~Array()
{
    if( m_count == nullptr )
    {
        return;
    }

    *m_count -= 1;
    if( *m_count == 0 )
    {
        if (*m_pythonInfo == nullptr)
        {
            ArrayDestructionPolicy<DataType>::Destroy( m_data, m_capacity );
            MemoryManager<DataType>::RawDeallocate( m_data, m_capacity );
        }
        else
        {
            (*m_pythonInfo)->m_callback((*m_pythonInfo)->m_pyObject);
            delete *m_pythonInfo;
        }

        delete m_pythonInfo;
        delete m_count; // Clean up the memory used for the reference count.
    }
}
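The combined effect of Listings 23.4 and 23.5 can be sketched in self-contained C++. This is a hypothetical cut-down model rather than the Nektar++ implementation: ToyNumPyArray stands in for a NumPy array object with its own reference counter, MiniArray follows the constructor and destructor logic above, and toyDecrement plays the role of the callback stored in m_callback.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for a NumPy array object: it owns its data and a refcount.
struct ToyNumPyArray
{
    std::vector<double> data;
    int refcount;
};

struct PythonInfo
{
    void *m_pyObject;
    void (*m_callback)(void *);
};

// Hypothetical cut-down Array<OneD, double> following Listings 23.4 and 23.5.
class MiniArray
{
public:
    // Native constructor: C++ allocates (zero-initialised) and owns the data.
    explicit MiniArray(std::size_t n)
        : m_data(new double[n]()), m_count(new unsigned int(1)),
          m_pythonInfo(new PythonInfo *(nullptr))
    {
    }

    // Python constructor: wrap existing data without copying it.
    MiniArray(double *data, void *pyObject, void (*decrement)(void *))
        : m_data(data), m_count(new unsigned int(1)),
          m_pythonInfo(new PythonInfo *(new PythonInfo{pyObject, decrement}))
    {
    }

    // Copies share the data, the reference count and the PythonInfo slot.
    MiniArray(const MiniArray &rhs)
        : m_data(rhs.m_data), m_count(rhs.m_count),
          m_pythonInfo(rhs.m_pythonInfo)
    {
        ++*m_count;
    }
    MiniArray &operator=(const MiniArray &) = delete;

    ~MiniArray()
    {
        if (--*m_count != 0)
        {
            return;
        }
        if (*m_pythonInfo == nullptr)
        {
            delete[] m_data; // C++ owns the memory: free it
        }
        else
        {
            // Python owns the memory: only give back our Python reference.
            (*m_pythonInfo)->m_callback((*m_pythonInfo)->m_pyObject);
            delete *m_pythonInfo;
        }
        delete m_pythonInfo;
        delete m_count;
    }

    double *data() const { return m_data; }

private:
    double *m_data;
    unsigned int *m_count;
    PythonInfo **m_pythonInfo;
};

// A toy "decref" of the kind stored in m_callback.
inline void toyDecrement(void *obj)
{
    --static_cast<ToyNumPyArray *>(obj)->refcount;
}

struct DestructorReport
{
    bool sameBuffer;       // C++ references (does not copy) Python's data
    int refcountWhileUsed; // Python refcount while C++ copies are alive
    int refcountAfter;     // Python refcount once all C++ copies are gone
};

inline DestructorReport runDestructorScenario()
{
    ToyNumPyArray py{std::vector<double>(3, 2.0), 1}; // one Python reference
    DestructorReport r{};
    {
        ++py.refcount; // the converter takes a reference on behalf of C++
        MiniArray a(py.data.data(), &py, toyDecrement);
        MiniArray b(a); // second C++ reference: m_count is now 2
        r.sameBuffer = (b.data() == py.data.data());
        r.refcountWhileUsed = py.refcount;
    } // b, then a, destroyed: the callback runs once, when m_count hits 0
    r.refcountAfter = py.refcount;
    return r;
}
```

The callback fires once, when the last C++ copy disappears, regardless of how many C++ copies existed; the Python buffer itself is never freed by C++.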

Creation of new arrays

The following algorithm has been proposed to create new arrays in Python and allow the C++ code to access their contents:

  1. The NumPy array object to be converted is passed as an argument to a C++ method available in the Python wrapper.

  2. The converter method is called to convert a Python NumPy array into C++ Array<OneD, const DataType> object.

  3. If the NumPy array was created through the C++-to-Python process (which can be determined by checking the base of the NumPy array), then:

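  4. Otherwise, the converter creates a new Array<OneD, const DataType> object with the following attribute values (set by the constructor in Listing 23.4): m_data points to the data buffer of the NumPy array, m_size and m_capacity equal its length, m_offset is 0, m_count is 1, and *m_pythonInfo holds the NumPy array object (m_pyObject) and the callback that decrements its reference counter (m_callback).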

  5. The Python reference counter of the NumPy array object is increased.

  6. If any new references to the array are created in C++, the m_count attribute is increased accordingly. Likewise, if new references to the NumPy array object are made in Python, its reference counter increases.

The process is shown schematically in Figures 23.4a and 23.4b.

Array deletion

The array deletion process relies on decrementing two reference counters: one on the Python side of the program (the Python reference counter) and one on the C++ side. The former records how many Python references to the data exist, plus one if the array is in use in C++. The latter (the m_count attribute) counts only the references on the C++ side; as soon as it reaches zero, the callback function is triggered to decrement the Python reference counter, so that Python registers that the data is no longer referred to in C++. Figure 23.5 presents an overview of the procedure used to delete the data.

In short, the fact that C++ uses the array is represented to Python as a single increment of the object's reference counter. Even if the Python object goes out of scope or is explicitly deleted, its reference counter remains non-zero until the callback function decrements it, as shown in Figure 23.4c. Similarly, if the C++ array is deleted first, the Python object still exists because its reference counter is non-zero (see Figure 23.4d).
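The order independence described above can be checked with a counter-only model. In this minimal sketch the class TwoCounterModel and its methods are hypothetical: the Python counter is an integer that includes one reference standing for the C++ side, and the data counts as freed when that counter reaches zero, as for a Python object.

```cpp
#include <cassert>

// Hypothetical model of the two reference counters in Figure 23.5.
class TwoCounterModel
{
public:
    // One Python variable refers to the array, plus one Python reference
    // representing "C++ is using it"; one C++ Array wraps it.
    TwoCounterModel() : m_pythonRefs(1 + 1), m_cppRefs(1) {}

    void AddCppReference() { ++m_cppRefs; }

    // A C++ Array is destroyed; the last one triggers the callback.
    void DropCppReference()
    {
        if (--m_cppRefs == 0)
        {
            --m_pythonRefs; // what the m_callback call does
        }
    }

    // The Python variable goes out of scope or is deleted.
    void DropPythonVariable() { --m_pythonRefs; }

    bool DataFreed() const { return m_pythonRefs == 0; }

private:
    int m_pythonRefs;
    unsigned int m_cppRefs;
};

// True if the data survives every intermediate step and is freed at the end.
inline bool deletionIsSafe(bool pythonFirst)
{
    TwoCounterModel m;
    m.AddCppReference(); // a second C++ copy of the array
    if (pythonFirst)
    {
        m.DropPythonVariable(); // Figure 23.4c
        bool alive = !m.DataFreed();
        m.DropCppReference();
        alive = alive && !m.DataFreed(); // one C++ copy still alive
        m.DropCppReference();            // callback fires here
        return alive && m.DataFreed();
    }
    m.DropCppReference();
    m.DropCppReference(); // Figure 23.4d: callback fires here
    bool alive = !m.DataFreed();
    m.DropPythonVariable();
    return alive && m.DataFreed();
}
```

Either deletion order leaves the data alive until both sides have let go, and frees it exactly once afterwards.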

Converter method

As with conversion from C++ to Python, a converter method was registered with pybind11 to make Python NumPy arrays available in C++; it can be found in the SharedArray.hpp bindings file.


Figure 23.4 Diagram showing the process of creation and deletion of arrays. Reference counters shown in green.
(a) The NumPy array is created in Python. Note that the NumPy object and the data it contains are represented by two separate memory addresses.
(b) The C++ array is created through the converter method: its attributes point to the appropriate memory addresses and the reference counter of the NumPy array object is incremented.
(c) If the NumPy object is deleted first, the reference counter is decremented, but the data still exists in memory.
(d) If the C++ array is deleted first, the callback function decrements the reference counter of the NumPy object, but the data still exists in memory.



Figure 23.5 Flowchart describing the procedure for array deletion from either side of the code.


Testing

As converting arrays from Python to C++ required direct calls to the Python C API and relied on custom-written methods, more detailed testing was deemed necessary. To check thoroughly that the conversion works as expected, three tests were conducted to determine whether the array is:

  1. referenced (not copied) between the C++ program and the Python wrapper,

  2. accessible to C++ program after being deleted in Python code,

  3. accessible in Python script after being deleted from C++ objects.

Python files containing the test scripts are currently located in library\Demos\Python\tests. They should be converted into unit tests that are run when the Python components are built.