Efficient copying of vertex coords to and from numpy arrays

varkenvarken · February 4, 2016, 8:30am

I looked into the overhead of copying data between Blender meshes and numpy arrays. Python itself is quite fast when used properly, but when you work with millions of vertices some operations can be done far more efficiently with Numpy (which comes bundled with Blender). However, you will need to copy a lot of data, so in this article I measured those costs and tried to find the optimal code to perform those copies. I have also provided sample code and the full benchmark code so you can repeat these measurements and see whether this might save you time in your specific situation. It is quite a long and technical article, I am afraid (but it does have some pictures).

regards,

– Michel.

eppo · February 4, 2016, 9:15am

Thank you!
Interesting read, as always ;).

ambi · February 4, 2016, 9:45am

I also recently bumped into this problem and here’s what I did. Hope you find it useful.


    def read_verts(self, mesh):
        # foreach_get fills a flat buffer: x0, y0, z0, x1, y1, z1, ...
        mverts_co = np.zeros(len(mesh.vertices) * 3, dtype=float)
        mesh.vertices.foreach_get("co", mverts_co)
        return np.reshape(mverts_co, (len(mesh.vertices), 3))

    def read_edges(self, mesh):
        # two vertex indices per edge
        fastedges = np.zeros(len(mesh.edges) * 2, dtype=int)
        mesh.edges.foreach_get("vertices", fastedges)
        return np.reshape(fastedges, (len(mesh.edges), 2))

    def read_norms(self, mesh):
        # one normal (3 floats) per vertex
        mverts_no = np.zeros(len(mesh.vertices) * 3, dtype=float)
        mesh.vertices.foreach_get("normal", mverts_no)
        return np.reshape(mverts_no, (len(mesh.vertices), 3))

What I’m still trying to figure out, though, is how to set vertex colors per vertex and not spend half of the entire script run time to do it.
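A note on the read_* helpers above: foreach_get fills a flat, one-dimensional buffer, and the reshape afterwards should cost next to nothing because NumPy returns a view on the same memory rather than a copy. A minimal sketch, using a made-up stand-in array instead of a real mesh (nothing here is read from Blender):

```python
import numpy as np

# Stand-in for the flat buffer that mesh.vertices.foreach_get("co", buf)
# would fill: x0, y0, z0, x1, y1, z1, ... for 4 hypothetical vertices.
flat = np.arange(12, dtype=np.float32)

# One row per vertex; -1 lets NumPy work out the vertex count.
coords = flat.reshape(-1, 3)

# The reshaped array is a view on the same memory, so no data is copied.
shares = np.shares_memory(flat, coords)

# foreach_set wants a flat sequence again; ravel() on a contiguous
# array is also a view, whereas flatten() always makes a copy.
flat_again = coords.ravel()

print(coords.shape, shares, np.shares_memory(coords, flat_again))
```

So the only real copying in those helpers is the foreach_get call itself.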

varkenvarken · February 5, 2016, 6:12am

@ambi: Thanks! Linus Yng also pointed me to the foreach_get and foreach_set methods and they are a huge improvement on the ‘classic’ approach. I updated the article (and the benchmark) to reflect it.

As for vertex color layers: the data in each layer is a bpy_prop_collection too so it should be possible to access it using foreach_get/ foreach_set, or is that not what you mean?

– Michel.

ambi · February 5, 2016, 6:17am

@varkenvarken: The problem is that vertex colors are determined by face loop indices, and my vertex colors are calculated by vertex, so something like this is required:


mloops = mesh.loops
colors = np.zeros((len(color_layer.data),3))
for poly in mesh.polygons:
    for idx in poly.loop_indices:
        colors[idx] = retvalues[mloops[idx].vertex_index]
colors = colors.flatten()
color_layer.data.foreach_set("color", colors)

varkenvarken · February 5, 2016, 6:27am

@ambi: not behind a machine right now but I would guess most of the time is spent in


colors[idx] = retvalues[mloops[idx].vertex_index]

With mloops being a Python list, the [idx] indexing is probably quite slow. But could you not first retrieve all the vertex indices for the loops with foreach_get (i.e. make mloops an ndarray instead of a list)?

ambi · February 5, 2016, 6:28am

I’ll give it a shot and post later how it goes.

ambi · February 5, 2016, 6:46am

Yeah, it seems to be a bit faster. Not by a lot, since Python lists are dynamic arrays and a get is O(1), iirc. What actually looks costly is the get_attrib on the Blender data structure.

set_colors() went from 0.6 to 0.55 seconds. As this is the optimized version, at this point the inner calculation loop is where most of the time is spent. The unoptimized one spent most of its time reading and setting the colors.

         1513869 function calls in 4.550 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    4.550    4.550 ops.py:175(__call__)
        1    0.016    0.016    4.542    4.542 {built-in method call}
        1    3.102    3.102    4.527    4.527 mesh_curves.py:185(execute)
        1    0.388    0.388    0.560    0.560 mesh_curves.py:105(set_colors)
   504480    0.554    0.000    0.554    0.000 {method 'dot' of 'numpy.ndarray' objects}
   126290    0.193    0.000    0.193    0.000 {method 'reduce' of 'numpy.ufunc' objects}
   125952    0.077    0.000    0.077    0.000 bpy_types.py:493(loop_indices)
        4    0.073    0.018    0.073    0.018 {method 'foreach_get' of 'bpy_prop_collection' objects}
        1    0.067    0.067    0.067    0.067 {method 'foreach_set' of 'bpy_prop_collection' objects}
   126290    0.040    0.000    0.040    0.000 {method 'fill' of 'numpy.ndarray' objects}
   630770    0.027    0.000    0.027    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.022    0.022 mesh_curves.py:175(read_edges)
        1    0.000    0.000    0.015    0.015 mesh_curves.py:180(read_norms)
        1    0.000    0.000    0.013    0.013 mesh_curves.py:170(read_verts)
        2    0.008    0.004    0.008    0.004 ops.py:147(_scene_update)
        1    0.003    0.003    0.003    0.003 {method 'flatten' of 'numpy.ndarray' objects}
        9    0.000    0.000    0.000    0.000 {built-in method zeros}
       12    0.000    0.000    0.000    0.000 bpy_types.py:589(__getattribute__)
        3    0.000    0.000    0.000    0.000 fromnumeric.py:125(reshape)
        3    0.000    0.000    0.000    0.000 {method 'reshape' of 'numpy.ndarray' objects}
        2    0.000    0.000    0.000    0.000 BoolTool.py:307(HandleScene)
       20    0.000    0.000    0.000    0.000 {built-in method getattr}
       13    0.000    0.000    0.000    0.000 {built-in method len}
        1    0.000    0.000    0.000    0.000 ops.py:41(__getattr__)
        2    0.000    0.000    0.000    0.000 mesh_curves.py:100(poll)
        1    0.000    0.000    0.000    0.000 ops.py:83(__getattr__)
        1    0.000    0.000    0.000    0.000 ops.py:171(idname_py)
        1    0.000    0.000    0.000    0.000 ops.py:80(__init__)
        2    0.000    0.000    0.000    0.000 {method 'startswith' of 'str' objects}
        1    0.000    0.000    0.000    0.000 ops.py:159(__init__)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
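For anyone who wants a listing like this for their own script: it looks like standard cProfile output sorted by cumulative time. A minimal sketch of how such a report can be produced, where `work()` is a made-up function standing in for the operator's execute() and is not taken from the add-on:

```python
import cProfile
import io
import pstats

import numpy as np


def work():
    # Hypothetical workload standing in for the operator's execute().
    a = np.random.random_sample((200, 3))
    return sum(row.dot(row) for row in a)


profiler = cProfile.Profile()
profiler.enable()
work()
profiler.disable()

# Sort by cumulative time and keep the top 10 rows.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(10)
report = stream.getvalue()
print(report)
```

Inside Blender the same enable()/disable() pair can be wrapped around the operator call from the Python console.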

ambi · February 5, 2016, 7:31am

Here’s the end result for anyone interested. By making a lot of assumptions about how the data is ordered, I was able to shave off 75% of the time. retvalues holds the color per vertex (vertex->color).


        # write vertex colors
        # vertex index of every loop, read in a single call
        mloops = np.zeros(len(mesh.loops), dtype=int)
        mesh.loops.foreach_get("vertex_index", mloops)

        # FIXME: Making a lot of completely horrific assumptions on how
        #        the data is ordered on the Blender side of things
        colors = retvalues[mloops]

        colors = colors.flatten()
        color_layer.data.foreach_set("color", colors)
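To make the indexing trick concrete, here is a small sketch that compares the nested Python loop with the single fancy-indexing step. The tiny mesh (two triangles sharing an edge) and all values are made up, and plain arrays and ranges stand in for mesh.loops and poly.loop_indices:

```python
import numpy as np

# Hypothetical mesh: 4 vertices, two triangles sharing an edge.
# Stand-in for what mesh.loops.foreach_get("vertex_index", ...) would fill.
loop_vertex_index = np.array([0, 1, 2, 0, 2, 3])
# Stand-in for what poly.loop_indices would yield per polygon.
poly_loop_indices = [range(0, 3), range(3, 6)]

# One color per vertex (vertex -> color).
retvalues = np.array([[1.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0],
                      [0.0, 0.0, 1.0],
                      [1.0, 1.0, 0.0]])

# Explicit version: visit every loop of every polygon.
colors_loop = np.zeros((len(loop_vertex_index), 3))
for loop_indices in poly_loop_indices:
    for idx in loop_indices:
        colors_loop[idx] = retvalues[loop_vertex_index[idx]]

# Vectorized version: index the per-vertex colors with the whole array.
colors_fancy = retvalues[loop_vertex_index]

same = np.array_equal(colors_loop, colors_fancy)
print(same)
```

The two agree as long as every loop belongs to some polygon, which is the assumption the FIXME comment is worried about.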

Oyster · February 5, 2016, 8:17am

Do ‘polygons[…]’ elements have no attribute ‘polygons’?
I am writing an exporter and tried to mimic this post with


    def read_polygons(self, mesh):
        fastpolygons = np.zeros((len(mesh.polygons)), dtype=bpy.types.bpy_prop_collection)
        mesh.polygons.foreach_get("polygon", fastpolygons)
        return fastpolygons

but I get

does ‘polygons[…]’ elements have no attribute ‘polygons’

So is the classic method of looping over the polygons the only option, as in


for polygon in mesh.polygons:
    pass  # do something with polygon

thanks

ambi · February 5, 2016, 8:31am

Paste your code to some pastebin or Github and let us have a look at it. What are you trying to do? Which line does the error show? Look at C.active_object.data.polygons[0] in the Blender Python console to see what is available for polygons.

Mesh doesn’t seem to have a foreach_get, so I would assume that using this method to read entire polygon structures is impossible.

The documentation says about foreach_get: “This is a function to give fast access to attributes within a collection.” So I would think that getting an actual bpy_prop_collection with it is not something you’re supposed to do.

Oyster · February 5, 2016, 8:32pm

for the default cube scene, run


import bpy
import numpy as np
def read_polygons(mesh):
    fastpolygons = np.zeros((len(mesh.polygons)), dtype=bpy.types.MeshPolygon)
    mesh.polygons.foreach_get("polygon", fastpolygons)   # Blender will err and say "polygons[...]' elements have no attribute 'polygon'"
    return fastpolygons
    
mesh = bpy.data.meshes[0]

#this classic method is ok
for var in mesh.polygons:
    print(var)
    print(dir(var))

# I expected this to work like the classic loop above, only faster
for var in read_polygons(mesh):
    print(var)
    print(dir(var))

varkenvarken · February 6, 2016, 2:22am

extra warning: the docs also state that foreach_xxx will only work for attribs that are bool, int or float (or arrays of those) so getting any other type of attribute will probably not work
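In practice that means you cannot fetch whole polygon objects, but you can fetch their integer attributes and rebuild the face lists yourself. A rough sketch, assuming the loop_start, loop_total and vertex_index arrays have already been filled with foreach_get; the values below are invented stand-ins for a mesh with one quad and one triangle:

```python
import numpy as np

# Stand-ins for polygons.foreach_get("loop_start", ...) and
# polygons.foreach_get("loop_total", ...) on a hypothetical mesh.
loop_start = np.array([0, 4], dtype=np.int32)
loop_total = np.array([4, 3], dtype=np.int32)
# Stand-in for loops.foreach_get("vertex_index", ...).
loop_vertex_index = np.array([0, 1, 2, 3, 2, 1, 4], dtype=np.int32)

# Each polygon owns the loops [loop_start, loop_start + loop_total),
# so slicing the loop array gives that polygon's vertex indices.
faces = [loop_vertex_index[s:s + n].tolist()
         for s, n in zip(loop_start, loop_total)]

print(faces)  # [[0, 1, 2, 3], [2, 1, 4]]
```

For an exporter this gives plain Python lists of vertex indices per face without touching the individual polygon objects.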

varkenvarken · February 6, 2016, 3:15am

I rewrote my randomvertexcolors addon to use numpy and I get a speedup of about 1.5x (from 3.1 seconds to 2.0 seconds on a mesh with almost 1M polygons) with the following code:


    def execute(self, context):
        bpy.ops.object.mode_set(mode='OBJECT')
        mesh = context.scene.objects.active.data
        vertex_colors = mesh.vertex_colors.active.data
        polygons = mesh.polygons
        verts = mesh.vertices
        npolygons = len(polygons)
        nverts = len(verts)
        nloops = len(vertex_colors)


        if self.usenumpy:
            start = time()


            startloop = np.empty(npolygons, dtype=int)
            numloops = np.empty(npolygons, dtype=int)
            polygon_indices = np.empty(npolygons, dtype=int)


            polygons.foreach_get('index', polygon_indices)
            polygons.foreach_get('loop_start', startloop)
            polygons.foreach_get('loop_total', numloops)


            colors = np.random.random_sample((npolygons,3))
            loopcolors = np.empty((nloops,3))


            for s,n,pi in np.nditer([startloop, numloops, polygon_indices]):
                loopcolors[slice(s,s+n)] = colors[pi]


            loopcolors = loopcolors.flatten()
            vertex_colors.foreach_set("color", loopcolors)
        else:
            start = time()
            for poly in polygons:
                color = [random(), random(), random()]
                for loop_index in range(poly.loop_start, poly.loop_start + poly.loop_total):
                    vertex_colors[loop_index].color = color
        if self.timeit:
            print("%s: %d/%d (verts/polys) in %.1f seconds"%("numpy" if self.usenumpy else "plain", nverts, npolygons, time()-start))
        bpy.ops.object.mode_set(mode='VERTEX_PAINT')
        bpy.ops.object.mode_set(mode='EDIT')
        bpy.ops.object.mode_set(mode='VERTEX_PAINT')
        context.scene.update()
        return {'FINISHED'}

As you can see I retrieved all the indices from both loops and polys first and then did the assignment of the random colors by using numpy’s nditer(). Now I am not a numpy expert so I guess that instead of creating all those slice objects even better results might be possible by creating index arrays.

varkenvarken · February 6, 2016, 5:50am

After some more tinkering, I can reduce the timing even more, to 0.8s, by doing away with the innermost Python loop that creates the slice objects and doing all the indexing in Numpy:


            loopcolors[startloop] = colors[polygon_indices]
            numloops -= 1
            nz = np.flatnonzero(numloops)
            while len(nz):
                startloop[nz] += 1
                loopcolors[startloop[nz]] = colors[polygon_indices[nz]]
                numloops[nz] -= 1
                nz = np.flatnonzero(numloops)

In effect we now have as many parallel loops as we have polygons. So now we can assign vertex colors (on my machine) to over 1 million faces per second (and that includes generating 3M random floats), which is not too bad I guess.
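A possible alternative that avoids the while loop altogether is to expand the per-polygon data to per-loop data with np.repeat. This is only a sketch under the same assumption the code above makes, namely that each polygon owns the loops from loop_start up to loop_start + loop_total; the three polygons below are invented stand-ins and I have not timed this inside Blender:

```python
import numpy as np

# Stand-ins for the arrays filled by foreach_get: a quad, a triangle
# and a pentagon on a hypothetical mesh.
startloop = np.array([0, 4, 7])
numloops = np.array([4, 3, 5])
polygon_indices = np.arange(3)
nloops = int(numloops.sum())

colors = np.random.random_sample((len(polygon_indices), 3))

# Polygon index of every loop: each polygon repeated loop_total times.
poly_of_loop = np.repeat(polygon_indices, numloops)
# Offset of every loop within its polygon: 0, 1, 2, ... per polygon.
first_slot = np.cumsum(numloops) - numloops
offsets = np.arange(nloops) - np.repeat(first_slot, numloops)
# Absolute loop index of every entry.
loop_ids = np.repeat(startloop, numloops) + offsets

loopcolors = np.empty((nloops, 3))
loopcolors[loop_ids] = colors[poly_of_loop]

# Reference result using the slice-per-polygon approach.
expected = np.empty((nloops, 3))
for s, n, pi in zip(startloop, numloops, polygon_indices):
    expected[s:s + n] = colors[pi]

matches = np.array_equal(loopcolors, expected)
print(matches)
```

If the loops are known to be stored in polygon order, the whole thing reduces to np.repeat(colors, numloops, axis=0).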

Details including download here.

ambi · February 6, 2016, 6:43am

You could also try something like this. It makes assumptions on how Blender works and could get broken in later releases but seems to work right now.

(entire script)


import bpy
import numpy as np


mesh = bpy.context.active_object.data


vcolname = "Testing"
if vcolname not in mesh.vertex_colors:
    mesh.vertex_colors.new(name=vcolname)
    
color_layer = mesh.vertex_colors[vcolname]
mesh.vertex_colors[vcolname].active = True


retvalues = np.random.random((len(mesh.vertices), 3))
mloops = np.zeros(len(mesh.loops), dtype=int)
mesh.loops.foreach_get("vertex_index", mloops)
color_layer.data.foreach_set("color", retvalues[mloops].flatten())

ambi · February 6, 2016, 6:46am

@Oyster: The idea with foreach_get and foreach_set is optimization. I suggest writing the algorithm in pure Python first, because that is a lot more readable and manageable, and only after that, if you need the performance, moving to Numpy and foreach_get & _set.

varkenvarken · February 6, 2016, 7:10am

@ambi: If I read it correctly, your script assigns a different color to each vertex (but the same color to all loops that share this vertex), which is not what I am aiming for. I want each polygon to have a uniform (but random) color, so I have to assign the same color to each loop of a given polygon (but different colors to loops that share a vertex).

BTW, I don’t see anything that could break although I guess there is no need to initialize mloops to zeros as it gets overwritten immediately.

varkenvarken · February 7, 2016, 2:13am

A small change suggested by Linus Yng: changing everything to 32 bits gives another 2x speed increase for my random vertex colors example. (Just using foreach_get / foreach_set gives a 14x speed increase.)

Does anybody know how to generate an array of 32 bit random floats in Numpy without producing an intermediate 64 bit array first?

The code now uses:


colors = np.random.random_sample((npolygons,3)).astype(np.float32)

because random_sample (and related functions) do not take a dtype argument. I am not sure generating 32 bit numbers would be that much faster (most 64bit operations in numpy result in only a 40% penalty on my machine compared to 32 bit operations) but saving on a potentially very large temporary array would still be interesting.
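For readers on a newer NumPy than the one this thread was written against: the Generator API (np.random.default_rng, available from NumPy 1.17 as far as I know) does accept a dtype argument on its random() method, so the 32 bit array can be requested directly. A minimal sketch, with npolygons set to an arbitrary made-up value:

```python
import numpy as np

npolygons = 1000  # hypothetical polygon count

# Generator.random() takes a dtype, unlike the legacy random_sample().
rng = np.random.default_rng()
colors = rng.random((npolygons, 3), dtype=np.float32)

print(colors.dtype, colors.shape)  # float32 (1000, 3)
```

Whether this is available depends on the NumPy version bundled with your Blender build.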

ambi · February 9, 2016, 5:28am

You could use ctypes to allocate the memory and then numpy.ctypeslib.as_array to use that allocated memory as a numpy array. If you really wanted to, that is.
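A rough sketch of what that could look like, assuming a buffer of C floats is what you want to share. The polygon count is a made-up value, and this only shows the shared memory, not the random number generation itself:

```python
import ctypes

import numpy as np

npolygons = 4  # hypothetical polygon count

# Allocate room for 3 C floats (32 bit) per polygon with ctypes.
buf = (ctypes.c_float * (npolygons * 3))()

# View that memory as a NumPy array; no copy is made.
colors = np.ctypeslib.as_array(buf).reshape(npolygons, 3)

# Writing through the array changes the ctypes buffer as well.
colors[0] = (0.25, 0.5, 0.75)

print(colors.dtype, buf[0], buf[1], buf[2])  # float32 0.25 0.5 0.75
```

The values still have to be written into the array somehow, so on its own this does not remove the 64 bit temporary.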