To highlight the technique for passing C++ memory natively into Python’s ndarray, we
consider first the case of the native Nektar++ matrix structure. In many situations, matrices
created by Nektar++ (usually a shared_ptr of NekMatrix<D, StandardMatrixTag> type)
need to be passed to Python; for instance, when performing differentiation using e.g. Gauss
quadrature rules, a differentiation matrix must be obtained. In order to keep the program
memory efficient, the data should not be copied into a NumPy array but rather be
referenced by the Python interface. This, however, complicates the issue of memory
management.
Consider a situation where the C++ program no longer needs to work with the generated array and the memory dedicated to it is deallocated. If this memory has already been shared with Python, the Python interface may still require the data contained within the array. However, since the memory has already been deallocated from the C++ side, any such access reads freed memory and will typically cause an invalid memory access error. To prevent such situations, a solution employing reference counting must be used.
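The idea of keeping memory alive through reference counting can be illustrated without any Python at all. The following is a minimal sketch using only the C++ standard library; `Buffer` and `ReadAfterFirstOwnerReleases` are hypothetical names introduced here for illustration, and `std::shared_ptr` merely stands in for the counting performed by Nektar++ and Python.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for a matrix buffer created on the C++ side.
using Buffer = std::vector<double>;

// Returns a value read through the second owner after the first owner has
// released its reference.
inline double ReadAfterFirstOwnerReleases()
{
    auto cppOwner = std::make_shared<Buffer>(Buffer{1.0, 2.0, 3.0});

    // A second owner (playing the role of the Python interface) shares the
    // same buffer: no data is copied, only the reference count goes up.
    std::shared_ptr<Buffer> pythonOwner = cppOwner;
    assert(cppOwner.use_count() == 2);

    // The C++ side is finished with the buffer.
    cppOwner.reset();

    // The buffer is still alive because one reference remains.
    assert(pythonOwner.use_count() == 1);
    return (*pythonOwner)[2];
}
```

The buffer is freed only when the last owner releases it, which is the property the converter described below has to preserve across the language boundary.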
pybind11 provides the methods to convert a C++ type element to one recognised by Python
as well as to maintain appropriate reference counting. Listing 23.1 shows an abridged version
of the converter method with comments on individual parameters. The object requiring
conversion is a shared_ptr of NekMatrix<D, StandardMatrixTag> type (named mat).
    namespace pybind11::detail
    {

    template <typename Type, typename T> struct standard_nekmatrix_caster
    {
    public:
        PYBIND11_TYPE_CASTER(Type, const_name("NekMatrix<T>"));

        bool load(handle src, bool)
        {
            // Perform Python to C++ conversion: returns true if successful, false
            // otherwise.
        }

        // Conversion part 2 (C++ -> Python)
        static handle cast(
            const Nektar::NekMatrix<T, Nektar::StandardMatrixTag> &src,
            return_value_policy, handle)
        {
            // Perform C++ to Python conversion.
        }
    };

    template <typename Type>
    struct type_caster<Nektar::NekMatrix<Type, Nektar::StandardMatrixTag>>
        : standard_nekmatrix_caster<
              Nektar::NekMatrix<Type, Nektar::StandardMatrixTag>, Type>
    {
    };

    } // namespace pybind11::detail
Firstly we give a brief overview of the general process undertaken during type conversion.
pybind11 maintains a registry of known C++ to Python conversion types, which by default
allows for fundamental data type conversions such as double and float. In this
manner, many C++ functions can be automatically converted, for example when
they are used in .def calls when registering a Python class. Clearly the load and cast
functions here contain much of the functionality. In order to perform automatic
conversion between the NekMatrix and a 2D ndarray, we register the conversion functions
inside the pybind11::detail namespace via a template specialisation of type_caster. We
also note that throughout the conversion code (and elsewhere in NekPy), we make
use of the pybind11 NumPy bindings. These are a lightweight wrapper around the
NumPy C API, which simplifies the syntax somewhat and avoids direct use of the
API.
The custom caster has two methods, load and cast, that perform conversion from Python to
C++ and from C++ to Python respectively. We now go through each listing.
    bool load(handle src, bool)
    {
        // Perform some checks: the source should be an ndarray.
        if (!array_t<T>::check_(src))
        {
            return false;
        }

        // The array should be a c-style, contiguous array of type T.
        auto buf = array_t<T, array::c_style | array::forcecast>::ensure(src);
        if (!buf)
        {
            return false;
        }

        // There should be two dimensions.
        auto dims = buf.ndim();
        if (dims != 2)
        {
            return false;
        }

        // Copy data across from the Python array to C++.
        std::vector<size_t> shape = {buf.shape()[0], buf.shape()[1]};
        value = Nektar::NekMatrix<T, Nektar::StandardMatrixTag>(
            shape[0], shape[1], buf.data(), Nektar::eFULL, buf.data());

        return true;
    }
For NekMatrix we do not yet have custom C++-side code to handle Python as a data source.
The load routine therefore simply copies the data from the Python handle. The thing to note
in this instance is that the class member variable value stores the resulting C++ value as a
type Type.
In terms of conversion from C++ to Python, we use the following cast function.
    static handle cast(
        const Nektar::NekMatrix<T, Nektar::StandardMatrixTag> &src,
        return_value_policy, handle)
    {
        using NMat = Nektar::NekMatrix<T, Nektar::StandardMatrixTag>;

        // Construct a new wrapper matrix to hold onto the data. Assign a
        // destructor so that the wrapper is cleaned up when the Python object
        // is deallocated.
        capsule c(new NMat(src.GetRows(), src.GetColumns(), src.GetPtr(),
                           Nektar::eWrapper),
                  [](void *ptr) {
                      NMat *mat = (NMat *)ptr;
                      delete mat;
                  });

        // Create the NumPy array, passing the capsule. When we go out of scope,
        // c’s reference count will have been reduced by 1, but array increases
        // the reference count when it assigns the base to the array.
        array a({src.GetRows(), src.GetColumns()},
                {sizeof(T), src.GetRows() * sizeof(T)}, src.GetRawPtr(), c);

        // This causes the array to be released without decreasing its reference
        // count, which we do since if we just returned a, then the array would
        // be deallocated immediately when this function returns.
        return a.release();
    }
This conversion can be done more efficiently, since we may construct a NekMatrix
that shares the reference count of the original matrix, by accessing its Array<OneD, T>
object via the GetPtr method. To ensure this array lives for the entire lifetime of the
NumPy array, we wrap this into a Python capsule object. The capsule is designed to
hold a C pointer and a callback function that is called when the Python object is
deallocated.
A new matrix is therefore constructed and passed to the capsule. This will prevent it
being destroyed if it is in use in Python, even if on the C++ side the memory is
deallocated. The callback function simply deletes the allocated pointer when it is no
longer required, cleaning up the memory appropriately. This process is shown in
Figure 23.3. It is worth noting that the steps (c) and (d) can be reversed and the
pointer created by the Python binding can be removed first. In this case the memory
will be deallocated only when the underlying Array<OneD, T> created by C++ is
deallocated.
[Figure: memory management when a NekMatrix shared_ptr is created in C++ and passed to
Python, and when the shared_ptr is subsequently removed. Reference counter shown in green.]
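The capsule mechanism described above can be mimicked in plain C++. The sketch below is not the pybind11 capsule: `Capsule`, `Storage`, `MakeCapsule` and `OwnersWhileCapsuleAlive` are hypothetical names, and `std::shared_ptr` stands in for the reference-counted Array<OneD, T>. It only aims to show how an opaque pointer plus a destructor callback can hold an extra reference until the owning object goes away.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for a Python capsule: an opaque pointer plus a
// destructor callback that runs when the capsule itself is destroyed.
class Capsule
{
public:
    Capsule(void *ptr, void (*destructor)(void *))
        : m_ptr(ptr), m_destructor(destructor)
    {
    }
    Capsule(const Capsule &)            = delete;
    Capsule &operator=(const Capsule &) = delete;
    ~Capsule()
    {
        m_destructor(m_ptr);
    }

private:
    void *m_ptr;
    void (*m_destructor)(void *);
};

// Stand-in for reference-counted storage such as Array<OneD, T>.
using Storage = std::shared_ptr<std::vector<double>>;

// Heap-allocate a new reference to the storage and hand it to a capsule,
// in the same spirit as the wrapper matrix created in the cast routine.
inline std::unique_ptr<Capsule> MakeCapsule(const Storage &data)
{
    return std::unique_ptr<Capsule>(new Capsule(
        new Storage(data),
        [](void *ptr) { delete static_cast<Storage *>(ptr); }));
}

// Returns the number of owners left once the C++ side has dropped its
// reference but the capsule is still alive.
inline long OwnersWhileCapsuleAlive()
{
    Storage data = std::make_shared<std::vector<double>>(4, 1.0);
    std::weak_ptr<std::vector<double>> watch = data;

    auto capsule = MakeCapsule(data);
    data.reset(); // the C++ side no longer needs the data

    long owners = watch.use_count(); // only the capsule's reference remains

    capsule.reset();         // the "Python object" is deallocated
    assert(watch.expired()); // and the storage is freed with it
    return owners;
}
```

If the capsule were destroyed first instead, the storage would survive until the remaining C++ reference is released, which corresponds to reversing steps (c) and (d).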
Next, the converter method returns a NumPy array using the array object from pybind11.
Note that the capsule object is passed as the ndarray base argument, i.e. the object that
owns the data. In this case, when all ndarray objects that refer to the data become
deallocated, NumPy will not directly deallocate the data but instead release its reference to
the capsule. The capsule will then be deallocated, which will deallocate the NekMatrix
and thus decrement the counter in the Array<OneD, T>. We also note that data
ordering is important for matrix storage; Nektar++ stores data in column-major order,
whereas NumPy arrays are traditionally row-major. The strides of the array therefore have
to be passed into the array constructor in the form of a tuple (a, b), where a is the
number of bytes needed to skip to get to the same position in the next row and
b is the number of bytes needed to skip to get to the same position in the next
column.
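As a worked illustration of the stride tuple (a, b), the sketch below computes the byte strides for column-major storage and uses them to locate an entry in raw memory. The helper names `ColumnMajorStrides`, `At` and `ExampleEntry` are hypothetical and use only the standard library; they are not part of NekPy or pybind11.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Byte strides (a, b) for a column-major matrix with the given number of
// rows: a is the step to the next row, b is the step to the next column.
template <typename T>
std::array<std::size_t, 2> ColumnMajorStrides(std::size_t rows)
{
    return {sizeof(T), rows * sizeof(T)};
}

// Read entry (i, j) from raw storage using byte strides, which is how a
// strided array view locates its elements.
template <typename T>
T At(const T *data, const std::array<std::size_t, 2> &strides, std::size_t i,
     std::size_t j)
{
    const char *base = reinterpret_cast<const char *>(data);
    return *reinterpret_cast<const T *>(base + i * strides[0] +
                                        j * strides[1]);
}

// The 2 x 3 matrix [[1, 2, 3], [4, 5, 6]] stored column by column.
inline double ExampleEntry()
{
    const std::vector<double> storage = {1.0, 4.0, 2.0, 5.0, 3.0, 6.0};
    const auto strides                = ColumnMajorStrides<double>(2);

    // Row 1, column 2: offset 1 * 8 + 2 * 16 = 40 bytes, i.e. storage[5].
    return At(storage.data(), strides, 1, 2);
}
```

For a matrix of doubles with two rows this gives (a, b) = (8, 16), matching the {sizeof(T), src.GetRows() * sizeof(T)} argument in the cast routine.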
Finally, in order to stop Python from immediately destroying the resulting ndarray, we call
the release function on the array, which prevents the reference counter from being
decremented when the array is returned.
The process outlined above requires little manual intervention from the programmer.
There are no explicit calls to the Python C API as all operations are carried out by
pybind11. Therefore, the testing focused mostly on correctness of returned data, in
particular the order of the array. To this end, the Differentiation tutorials were
used as tests. In order to correctly run the tutorials the Python wrapper needs to
retrieve the differentiation matrix which, as mentioned before, has to be converted
to a datatype Python recognises. The test runs the differentiation tutorials and
compares the final result to the fixed expected value. The test is automatically run as
part of the ctest command if both the Python wrapper and the tutorials have been
built.
Conversely, a similar problem exists when data is created in Python and has to be passed to
the C++ program. In this case, as the data is managed by Python, the main reference counter
should be maintained by the Python object and incremented or decremented as appropriate
using the incref and decref methods respectively. Although we do not support this process for
the NekMatrix as described above, we do use this process for the Array<OneD, T>
structure.
When the array is no longer needed by the C++ program, the reference counter on the Python
side should be decremented in order for Python garbage collection to work appropriately.
However, this should only happen when the array was created by Python in the first
place.
The files implementing the procedure below are LibUtilities/Python/BasicUtils/SharedArray.cpp
and LibUtilities/BasicUtils/SharedArray.hpp.
Array<OneD, const DataType> class template

In order to perform the operations described above, the C++ array structure should contain
information on whether or not it was created from data managed by Python. To this end, two
new attributes were added to the C++ Array<OneD, const DataType> class template in the
form of a struct:
    struct PythonInfo {
        void *m_pyObject;           // Underlying PyObject pointer
        void (*m_callback)(void *); // Callback
    };
where:
m_pyObject is a pointer to the PyObject containing the data, which should be an ndarray;
m_callback is a function pointer to the callback function which will decrement
the reference counter of the PyObject.
Inside Array<OneD, T>, this struct is held as a double pointer, i.e.:

    PythonInfo **m_pythonInfo;
This is done because if the ndarray was created originally from a C++ to Python
conversion (as outlined in the previous section), we need to convert the Array, and
any other shared arrays that hold the same C++ memory, to reference the
Python array. If we did not do this, then it is possible that the C++ array could be
destroyed whilst it is still used on the Python side, leading to an invalid memory
access. By storing this as a double pointer, in a similar fashion to the underlying
reference counter m_count, we can ensure that all C++ arrays are updated when
necessary. We can keep track of Python arrays by checking *m_pythonInfo; if this is not
set to nullptr then the array has been constructed through the Python to C++
converter.
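The effect of the double pointer can be seen in isolation with a small sketch. `Handle` and `AllHandlesSeeUpdate` are hypothetical, stripped-down names used only for illustration: the handle carries just the shared slot, without the data pointer or m_count that the real Array<OneD, T> also holds.

```cpp
#include <cassert>

// Mirrors the PythonInfo struct described in the text.
struct PythonInfo
{
    void *m_pyObject;
    void (*m_callback)(void *);
};

// Hypothetical, stripped-down array handle holding only the shared slot.
struct Handle
{
    PythonInfo **m_pythonInfo;

    bool IsPythonManaged() const
    {
        return *m_pythonInfo != nullptr;
    }
};

inline bool AllHandlesSeeUpdate()
{
    // One heap-allocated slot shared by every handle to the same memory.
    // It starts as nullptr, meaning the memory is managed by C++.
    PythonInfo **slot = new PythonInfo *(nullptr);
    Handle a{slot};
    Handle b{slot}; // a second handle to the same memory shares the slot

    bool before = a.IsPythonManaged() || b.IsPythonManaged();

    // Converting through one handle fills in the shared slot...
    int fakeNdarray  = 0; // placeholder for the PyObject
    *a.m_pythonInfo = new PythonInfo{&fakeNdarray, [](void *) {}};

    // ...and every other handle observes the change.
    bool after = a.IsPythonManaged() && b.IsPythonManaged();

    delete *slot;
    delete slot;
    return !before && after;
}
```

With a single pointer, each handle would keep its own copy and only the handle that was converted would know about the Python object.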
Adding new attributes to the arrays might cause significantly increased memory usage
or additional unforeseen overheads, although this was not seen in benchmarking.
However, to avoid all possibility of this, a preprocessor directive has been added to
only include the additional attributes if NekPy has been built (using the option
NEKTAR_BUILD_PYTHON).
A new constructor has been added to the class template, as seen in Listing 23.4.
In the pre-existing constructors, *m_pythonInfo is set to nullptr. A similar constructor
was added for const arrays to ensure that these can also
be passed between the languages. Note that no calls to Nektar++ array initialisation policies
are made in this constructor, unlike in the pre-existing ones, as there is no need for the new
array to copy the data.
    Array(unsigned int dim1Size, DataType* data, void* memory_pointer,
          void (*python_decrement)(void *)) :
        m_size( dim1Size ),
        m_capacity( dim1Size ),
        m_data( data ),
        m_count( nullptr ),
        m_offset( 0 )
    {
        m_count = new unsigned int();
        *m_count = 1;

        m_pythonInfo = new PythonInfo *();
        *m_pythonInfo = new PythonInfo();
        (*m_pythonInfo)->m_callback = python_decrement;
        (*m_pythonInfo)->m_pyObject = memory_pointer;
    }
Changes have also been made to the destructor, as shown in Listing 23.5, in order to ensure that if the data was initially created in Python the callback function would decrement the reference counter of the NekPy array object. The detailed procedure for deleting arrays is described further in this section.
    ~Array()
    {
        if( m_count == nullptr )
        {
            return;
        }

        *m_count -= 1;
        if( *m_count == 0 )
        {
            if (*m_pythonInfo == nullptr)
            {
                ArrayDestructionPolicy<DataType>::Destroy( m_data, m_capacity );
                MemoryManager<DataType>::RawDeallocate( m_data, m_capacity );
            }
            else
            {
                (*m_pythonInfo)->m_callback((*m_pythonInfo)->m_pyObject);
                delete *m_pythonInfo;
            }

            delete m_pythonInfo;
            delete m_count; // Clean up the memory used for the reference count.
        }
    }
The following algorithm has been proposed to create new arrays in Python and allow the C++ code to access their contents:
1. The NumPy array object to be converted is passed as an argument to a C++ method available in the Python wrapper.
2. The converter method is called to convert a Python NumPy array into a C++ Array<OneD, const DataType> object.
3. If the NumPy array was created through the C++-to-Python process (which can be determined by checking the base of the NumPy array), then:
   (a) extract the Nektar++ Array from the capsule;
   (b) convert the Array (and all of its other references) to a Python array so that any C++ arrays that share this memory also know to call the appropriate decrement function;
   (c) set the NumPy array’s base to an empty object to destroy the original capsule.
4. Otherwise, the converter creates a new Array<OneD, const DataType> object with the following attribute values:
   data points to the data contained by the NumPy array,
   memory_pointer points to the NumPy array object,
   python_decrement points to the function decrementing the reference counter of the PyObject,
   subsequently, m_count is equal to 1.
5. The Python reference counter of the NumPy array object is increased.
6. If any new references to the array are created in C++, the m_count attribute is increased accordingly. Likewise, if new references to the NumPy array object are made in Python, the reference counter increases.
The process is schematically shown in Figure 23.3a and 23.3b.
The array deletion process relies on decrementing two reference counters: one on the Python
side of the program (the Python reference counter) and the other on the C++ side of the
program. The former registers how many Python references to the data there are and whether
there is a C++ reference to the array. The latter (represented by the m_count attribute)
counts only the number of references on the C++ side; as soon as it reaches zero, the callback
function is triggered to decrement the Python reference counter, so that Python registers that
the data is no longer referred to in C++. Figure 23.5 presents the overview of the procedure
used to delete the data.
In short, the fact that C++ uses the array is represented to Python as just an increment to the object reference counter. Even if the Python object goes out of scope or is explicitly deleted, the reference counter will always be non-zero until the callback function to decrement it is executed, as shown in Figure 23.3c. Similarly, if the C++ array is deleted first, the Python object will still exist as the reference counter will be non-zero (see Figure 23.3d).
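The interaction of the two counters can be simulated with a short sketch. `FakePyObject`, `DecRef`, `ArrayHandle` and `PythonCountAfterCppRelease` are hypothetical stand-ins (no Python C API is involved), and the handle is heavily simplified compared with Array<OneD, T>; the intent is only to show that all C++ references together cost Python a single reference, released through the callback.

```cpp
#include <cassert>

// Hypothetical stand-in for a Python object carrying a reference count.
struct FakePyObject
{
    int refCount;
};

// Stand-in for the callback that decrements the Python reference counter.
inline void DecRef(void *obj)
{
    static_cast<FakePyObject *>(obj)->refCount -= 1;
}

// Simplified C++ handle: m_count tracks C++ references only, and the
// callback fires once, when the last C++ reference disappears.
class ArrayHandle
{
public:
    ArrayHandle(void *pyObject, void (*callback)(void *))
        : m_count(new unsigned int(1)), m_pyObject(pyObject),
          m_callback(callback)
    {
    }
    ArrayHandle(const ArrayHandle &rhs)
        : m_count(rhs.m_count), m_pyObject(rhs.m_pyObject),
          m_callback(rhs.m_callback)
    {
        *m_count += 1;
    }
    ArrayHandle &operator=(const ArrayHandle &) = delete;
    ~ArrayHandle()
    {
        *m_count -= 1;
        if (*m_count == 0)
        {
            m_callback(m_pyObject);
            delete m_count;
        }
    }

private:
    unsigned int *m_count;
    void *m_pyObject;
    void (*m_callback)(void *);
};

// Returns the Python-side count after every C++ reference has gone.
inline int PythonCountAfterCppRelease()
{
    FakePyObject ndarray{1}; // one reference held by Python code
    {
        ndarray.refCount += 1; // the converter registers the C++ side once
        ArrayHandle first(&ndarray, &DecRef);
        ArrayHandle second(first); // a C++ copy changes only m_count
        assert(ndarray.refCount == 2);
    } // both handles destroyed here; the callback runs exactly once
    return ndarray.refCount;
}
```

Only the Python-side reference is left afterwards, so the object would be freed as soon as the Python code drops it.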
As with conversion from C++ to Python, a converter method was registered to make Python
NumPy arrays available in C++ with pybind11, which can be found in the SharedArray.hpp
bindings file.
As the process of converting arrays from Python to C++ required making direct calls to the C API and relying on custom-written methods, more detailed testing was deemed necessary. In order to thoroughly check that the conversion works as expected, three tests were conducted to determine whether the array is:
referenced (not copied) between the C++ program and the Python wrapper,
accessible to the C++ program after being deleted in Python code,
accessible in the Python script after being deleted from C++ objects.
Python files containing test scripts are currently located in library\Demos\Python\tests. They should be converted into unit tests that should be run when Python components are built.