I am a PhD student in computer vision at the Weizmann Institute of Science,
advised by Prof. Tali Dekel.
Previously, I was a Research Scientist Intern at Meta Reality Labs, advised by Jonathon Luiten.
My research interests are in generative AI, 3D/4D reconstruction and tracking, and the interpretability of vision foundation models as a means of unveiling novel applications.
I completed my Master's at the Weizmann Institute in Tali Dekel's lab, and received my Bachelor's degree in Computer Science from the American University of Armenia.
We present DRoPS, a method for dynamic scene reconstruction from monocular videos that leverages a static pre-scan of the dynamic object as an explicit geometric and appearance prior.
By organizing Gaussian primitives into surface-aligned pixel grids and modeling motion with a CNN-based Deep Motion Prior,
DRoPS achieves SOTA performance in novel-view synthesis and long-range 3D tracking.
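For intuition, here is a minimal PyTorch sketch of the deep-motion-prior idea: a small CNN maps a surface-aligned pixel grid of per-Gaussian features, together with a time embedding, to per-Gaussian 3D offsets. All module names, shapes, and dimensions below are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class DeepMotionPrior(nn.Module):
    """Maps a pixel grid of Gaussian features + a time embedding to 3D offsets."""
    def __init__(self, feat_dim=32, time_dim=8, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(feat_dim + time_dim, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, 3, 3, padding=1),  # per-Gaussian XYZ offset
        )

    def forward(self, grid_feats, t_embed):
        # grid_feats: (B, feat_dim, H, W) -- one Gaussian per pixel
        # t_embed:    (B, time_dim)      -- broadcast over the grid
        B, _, H, W = grid_feats.shape
        t = t_embed[:, :, None, None].expand(B, -1, H, W)
        return self.net(torch.cat([grid_feats, t], dim=1))

# Displace the canonical (pre-scanned) Gaussian centers at time t.
prior = DeepMotionPrior()
feats = torch.randn(1, 32, 128, 128)  # surface-aligned pixel grid
t = torch.randn(1, 8)                 # time embedding for frame t
offsets = prior(feats, t)             # (1, 3, 128, 128)
```

One appeal of a convolutional prior over a surface-aligned grid is that weight sharing across neighboring Gaussians naturally encourages spatially coherent motion.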
We introduce a color-encoded illumination setup for high-speed volumetric scene reconstruction that temporally encodes dynamics via sequential colored strobes.
This enables reconstructing dynamic scenes captured at conventional camera frame rates while recovering detailed geometry and motion.
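The temporal-encoding idea can be illustrated with a toy decoding step: if red, green, and blue strobes fire sequentially within a single exposure, each color channel of the captured frame samples the scene at a different sub-frame instant. The timings and channel ordering below are illustrative assumptions.

```python
import numpy as np

def split_subframe_samples(frame_rgb, strobe_times=(0.0, 1/3, 2/3)):
    """frame_rgb: (H, W, 3) image from a single exposure.
    Returns (time, grayscale sample) pairs, one per colored strobe."""
    samples = []
    for ch, t in enumerate(strobe_times):  # R, G, B fired at t0 < t1 < t2
        samples.append((t, frame_rgb[..., ch]))
    return samples

frame = np.random.rand(480, 640, 3)  # stand-in for a captured frame
for t, img in split_subframe_samples(frame):
    print(f"sub-frame sample at t={t:.2f}, shape={img.shape}")
```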
We present a novel self-supervised method for long-range dense tracking in video, which harnesses the powerful visual prior of DINO.
By combining test-time training on a single input video with the semantic representations of DINO,
DINO-Tracker achieves SOTA performance in tracking through long-term occlusions.
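As a rough sketch of the matching step, the snippet below correlates a query point's descriptor with dense target-frame features and localizes it with a soft-argmax; in the actual method the features are additionally refined by training on the input video at test time, and the descriptor dimension (384, as in DINO ViT-S) is only an assumed example.

```python
import torch
import torch.nn.functional as F

def track_point(query_feat, target_feats, temp=0.05):
    """query_feat: (C,) descriptor of the tracked point.
    target_feats: (C, H, W) dense features of the target frame.
    Returns (x, y) in feature-grid coordinates via soft-argmax."""
    C, H, W = target_feats.shape
    sim = torch.einsum('c,chw->hw', F.normalize(query_feat, dim=0),
                       F.normalize(target_feats, dim=0))
    prob = F.softmax(sim.flatten() / temp, dim=0).view(H, W)
    ys = torch.arange(H, dtype=prob.dtype)
    xs = torch.arange(W, dtype=prob.dtype)
    y = (prob.sum(dim=1) * ys).sum()  # marginal over rows
    x = (prob.sum(dim=0) * xs).sum()  # marginal over columns
    return x, y

q = torch.randn(384)             # e.g. a DINO ViT-S descriptor
tgt = torch.randn(384, 64, 64)   # dense features of a target frame
print(track_point(q, tgt))
```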
We present a new framework for text-driven image-to-image translation that harnesses the power of a pre-trained text-to-image diffusion model.
We observe and empirically demonstrate that fine-grained control over the generated structure can be achieved by manipulating spatial features and their self-attention inside the model.
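Mechanically, this kind of manipulation can be implemented with forward hooks: cache a block's activations during the source image's denoising pass, then override that block's output during the guided pass. The snippet below illustrates the pattern on a stand-in layer; the real method targets specific U-Net blocks and diffusion timesteps, which are abstracted away here.

```python
import torch
import torch.nn as nn

cache = {}

def save_hook(name):
    def hook(module, inputs, output):
        cache[name] = output.detach()  # record the source pass's features
    return hook

def inject_hook(name):
    def hook(module, inputs, output):
        return cache[name]  # returning a value replaces the block's output
    return hook

block = nn.Conv2d(64, 64, 3, padding=1)  # stand-in for a U-Net block

# Pass 1 (source image): record the block's features.
h = block.register_forward_hook(save_hook('up_block_1'))
_ = block(torch.randn(1, 64, 32, 32))
h.remove()

# Pass 2 (guided generation): inject the cached features.
h = block.register_forward_hook(inject_hook('up_block_1'))
out = block(torch.randn(1, 64, 32, 32))
h.remove()
assert torch.allclose(out, cache['up_block_1'])
```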
We design a feed-forward model for real-time semantic appearance transfer that is directly conditioned on ViT features,
allowing the model to leverage the rich semantic information these features encode.
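A minimal sketch of such a conditioned generator is shown below: the structure image is encoded by a small convolutional trunk, a ViT appearance descriptor (e.g. a DINO [CLS] token) is broadcast over the spatial grid, and the two are fused by convolutions. The architecture and dimensions are illustrative assumptions, not the actual model.

```python
import torch
import torch.nn as nn

class FeedForwardTransfer(nn.Module):
    def __init__(self, vit_dim=384, hidden=64):
        super().__init__()
        self.encode = nn.Sequential(
            nn.Conv2d(3, hidden, 3, padding=1), nn.ReLU())
        self.fuse = nn.Sequential(
            nn.Conv2d(hidden + vit_dim, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, 3, 3, padding=1), nn.Sigmoid())

    def forward(self, structure_img, appearance_token):
        # appearance_token: (B, vit_dim), e.g. a DINO [CLS] token
        h = self.encode(structure_img)
        B, _, H, W = h.shape
        a = appearance_token[:, :, None, None].expand(B, -1, H, W)
        return self.fuse(torch.cat([h, a], dim=1))

model = FeedForwardTransfer()
out = model(torch.rand(1, 3, 128, 128), torch.randn(1, 384))
print(out.shape)  # torch.Size([1, 3, 128, 128])
```

Running a single feed-forward pass, rather than optimizing per image pair, is what makes real-time transfer feasible.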
We present a method for semantically transferring the appearance of one natural image to another.
We train a generator given only a single structure/appearance image pair as input.
Our key idea is to leverage a pre-trained Vision Transformer as a semantic prior by
deriving novel representations of structure and appearance from its feature space.
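Concretely, the objective can be sketched as two losses over DINO-ViT features: an appearance loss matching the global [CLS] token of the generated image to that of the appearance image, and a structure loss matching the self-similarity of patch keys to that of the structure image. The feature extractors below are crude stand-ins with DINO-like shapes, purely for a runnable illustration.

```python
import torch
import torch.nn.functional as F

def self_similarity(keys):
    """keys: (N, D) patch keys -> (N, N) cosine self-similarity matrix."""
    k = F.normalize(keys, dim=-1)
    return k @ k.t()

def splice_losses(cls_fn, keys_fn, output, structure_img, appearance_img):
    # Appearance: match global [CLS] tokens; structure: match key self-similarity.
    l_app = F.mse_loss(cls_fn(output), cls_fn(appearance_img))
    l_struct = F.mse_loss(self_similarity(keys_fn(output)),
                          self_similarity(keys_fn(structure_img)))
    return l_app, l_struct

# Stand-in extractors with DINO-like shapes (swap in a real DINO-ViT).
proj = torch.nn.Conv2d(3, 384, kernel_size=8, stride=8)
cls_fn = lambda img: proj(img).mean(dim=(2, 3))            # (B, 384)
keys_fn = lambda img: proj(img).flatten(2).squeeze(0).t()  # (N, 384)

out = torch.rand(1, 3, 224, 224)
struct, app = torch.rand_like(out), torch.rand_like(out)
print(splice_losses(cls_fn, keys_fn, out, struct, app))
```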