A Video-based Motion Tracking System

On October 2nd, 2007, a modified version of the motion tracking system I originally developed for my Lightroom project will be used in an interactive sound installation by Werner Urban at the Kinderboekenbal 2007 at Muziekgebouw aan 't IJ in Amsterdam. For those interested, I will share a little of the technical background of this system here.

Hardware

The system consists of a FireWire camera (from UniBrain) with a wide-angle lens (Marshall Electronics) and an omnidirectional infrared illuminator (homemade, for very dark lighting conditions), connected via an 11-metre FireWire cable (also from UniBrain) to a computer. This computer runs a software patch written in the Max/Jitter programming environment. I usually place the camera directly above the scene being tracked, so that the barrel distortion of the wide-angle lens isn't too annoying. The system can track and report up to 8 different positions simultaneously (thanks to 2up.jit.centroids, the excellent external object for Jitter written by Randy Jones). These coordinates can then be used to trigger all sorts of actions in Max. In the original project they triggered a matrix of 64 lights and the spatialization of sound in a quadraphonic speaker setup.
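
To give an idea of what can be done with those coordinates, here is a minimal Python sketch (not part of the actual Max patch) of how a single normalized position might be mapped to one cell of an 8 x 8 light matrix and to gains for four speakers. The grid layout and the equal-power panning law are my own assumptions, purely for illustration.

```python
import math

def light_index(x, y, grid=8):
    """Map a normalized position (0..1) to one of grid*grid lights, row-major."""
    col = min(int(x * grid), grid - 1)
    row = min(int(y * grid), grid - 1)
    return row * grid + col            # 0..63 for an 8 x 8 matrix

def quad_gains(x, y):
    """Equal-power gains for four corner speakers: FL, FR, RL, RR (assumed layout)."""
    gl, gr = math.cos(x * math.pi / 2), math.sin(x * math.pi / 2)
    gf, gb = math.cos(y * math.pi / 2), math.sin(y * math.pi / 2)
    return (gl * gf, gr * gf, gl * gb, gr * gb)

print(light_index(0.7, 0.2))   # -> 13: which of the 64 lights to switch on
print(quad_gains(0.5, 0.5))    # centre position: equal energy in all four speakers
```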

Software

The tracking part basically consists of three separate Max patches connected to each other: one for setting camera parameters and dimensions, one for video pre-processing, and one for the tracking itself.

Setting camera parameters and dimensions. This means getting the live video image into the patch and setting camera parameters, which is some pretty basic tuning of the jit.grab object (dimensions, frame rate and camera-specific settings like compression, etc.). For good tracking (and a speedy patch) the image dimensions don't have to be big at all. I normally grab a video stream of at most 160 x 120 pixels at a frame rate of 10 fps.

Video pre-processing. Some pre-processing of the video image is needed to get a good, stable image for tracking. What I did was record (in advance) a short movie of the empty space being tracked, load it into the patch and use it as a reference movie. This (looping) movie is subtracted from the live camera input, resulting in a video stream that shows only the pixels that differ from the reference movie (i.e. moving objects, people, etc.). Finally, I normally make the image monochrome so that only greyscale values are tracked (although you can track colors) and apply some jit.fastblur to get rid of most of the noise before sending it on to the tracking part.

Tracking. Here the magic of the 2up.jit.centroids object comes in. I borrowed a lot from the help patch included with the object, and it is all pretty much self-explanatory: you pick a value you want to track and 2up.jit.centroids reports up to 8 coordinates wherever it senses that value in the image. From there on, you can use these coordinates for anything one can think of.
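
For readers who don't use Max/Jitter, here is a rough Python/OpenCV sketch of the same chain (reference subtraction, greyscale, blur, then centroid tracking capped at 8 blobs). It is an analogy rather than the actual patch: jit.grab, jit.fastblur and 2up.jit.centroids have no direct Python counterparts, and the reference file name, blur size and threshold value below are assumptions.

```python
import cv2

MAX_BLOBS = 8                       # 2up.jit.centroids reports up to 8 positions
W, H, FPS = 160, 120, 10            # small frames are enough for tracking

cap = cv2.VideoCapture(0)           # live camera, standing in for jit.grab
cap.set(cv2.CAP_PROP_FRAME_WIDTH, W)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, H)
cap.set(cv2.CAP_PROP_FPS, FPS)

ref = cv2.VideoCapture("empty_scene.mov")   # pre-recorded movie of the empty space (hypothetical file)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    ok_ref, ref_frame = ref.read()
    if not ok_ref:                           # loop the reference movie
        ref.set(cv2.CAP_PROP_POS_FRAMES, 0)
        ok_ref, ref_frame = ref.read()

    frame = cv2.resize(frame, (W, H))
    ref_frame = cv2.resize(ref_frame, (W, H))

    # subtract the reference so that only moving objects/people remain
    diff = cv2.absdiff(frame, ref_frame)
    grey = cv2.cvtColor(diff, cv2.COLOR_BGR2GRAY)     # monochrome: track greyscale only
    grey = cv2.GaussianBlur(grey, (5, 5), 0)          # rough stand-in for jit.fastblur

    # keep only pixels that differ clearly from the reference (assumed threshold)
    _, mask = cv2.threshold(grey, 40, 255, cv2.THRESH_BINARY)

    # centroids of the largest blobs, at most 8, analogous to 2up.jit.centroids
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    contours = sorted(contours, key=cv2.contourArea, reverse=True)[:MAX_BLOBS]
    centroids = []
    for c in contours:
        m = cv2.moments(c)
        if m["m00"] > 0:
            centroids.append((m["m10"] / m["m00"], m["m01"] / m["m00"]))
    # 'centroids' now holds up to 8 (x, y) positions that can trigger actions
```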

Things to watch out for

Light. A lot depends on lighting conditions. A small change in brightness (for instance, clouds moving in front of the sun) can render the system totally unusable. That's why I made an infrared illuminator from a stack of blue and red Lee filters attached around a 150-watt light bulb, to light the scene as evenly as possible even when the scene itself has to stay dark.

Steadycam. The camera has to be fixed tightly in position, because even a small movement of the camera will make all pixels differ from the reference movie, resulting in a Jitter patch freaking out.

Background (floor) and different materials (clothes, hair) affect tracking. This is the most unreliable part of the system. Sometimes a person wears clothes of a color or material that blends too much with the background color or material, resulting in an image that doesn't have enough contrast, which makes it hard for the tracking object to track a specific color. Every situation demands a little fine-tuning: you can set a tolerance and a threshold for 2up.jit.centroids to define the range of values to be tracked and a minimum value, as in the sketch below.
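
As a rough analogy for that tolerance/threshold tuning (again in Python/OpenCV rather than Jitter), the sketch below keeps only the pixels whose value lies within a chosen range around a target; the target, tolerance and threshold numbers are made up for illustration.

```python
import cv2
import numpy as np

target, tolerance, threshold = 200, 30, 120   # example values, not taken from the patch

# stand-in for a pre-processed greyscale frame (reference-subtracted and blurred)
grey = np.random.randint(0, 256, (120, 160), dtype=np.uint8)

lo = max(target - tolerance, threshold)       # never track values below the minimum
hi = min(target + tolerance, 255)
mask = cv2.inRange(grey, lo, hi)              # only pixels in [lo, hi] are considered
print(int(cv2.countNonZero(mask)), "candidate pixels")
```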