I am a PhD student in computer vision at the Weizmann Institute of Science, advised by Prof. Tali Dekel. My research interests are image and video generation, self-supervised learning, and the interpretability of vision foundation models as a way to unlock novel applications.

I completed my Master's at the Weizmann Institute in Tali Dekel's lab, and received my Bachelor's degree in Computer Science from the American University of Armenia.

Email / Google Scholar / Twitter / GitHub / LinkedIn

DINO-Tracker: Taming DINO for Self-Supervised Point Tracking in a Single Video
Narek Tumanyan*, Assaf Singer*, Shai Bagon, Tali Dekel
ECCV 2024
project page / arXiv / code

We present a novel self-supervised method for long-range dense point tracking in video that harnesses the powerful visual prior of DINO. By combining test-time training on a single input video with DINO's semantic representations, DINO-Tracker achieves state-of-the-art performance, particularly when tracking points through long occlusions.

Plug-and-Play Diffusion Features for Text-Driven Image-to-Image Translation
Narek Tumanyan*, Michal Geyer*, Shai Bagon, Tali Dekel
CVPR 2023
project page / arXiv / code / video

We present a new framework for text-driven image-to-image translation that harnesses the power of a pre-trained text-to-image diffusion model.

We observe and empirically demonstrate that fine-grained control over the generated structure can be achieved by manipulating spatial features and their self-attention inside the model.

Disentangling Structure and Appearance in ViT Feature Space
Narek Tumanyan, Omer Bar-Tal, Shir Amir, Shai Bagon, Tali Dekel
ACM TOG 2023
project page / arXiv

We design a feed-forward model for real-time semantic appearance transfer that is conditioned directly on ViT features, allowing it to exploit the rich information these features encode.

Splicing ViT Features for Semantic Appearance Transfer
Narek Tumanyan*, Omer Bar-Tal*, Shai Bagon, Tali Dekel
CVPR 2022 (Oral)
project page / arXiv / code / video

We present a method for semantically transferring the appearance of one natural image to another. We train a generator given only a single structure/appearance image pair as input. Our key idea is to leverage a pre-trained Vision Transformer as a semantic prior by deriving novel representations of structure and appearance from its feature space.


