DINOv2: Learning Robust Visual Features without Supervision (presented by 김대영)

Page information

Author: Administrator | Posted: 24-01-08 13:03

Body

The recent breakthroughs in natural language processing for model pretraining on large quantities of data have opened the way for similar foundation models in computer vision. These models could greatly simplify the use of images in any system by producing all-purpose visual features, i.e., features that work across image distributions and tasks without finetuning. This work shows that existing pretraining methods, especially self-supervised methods, can produce such features if trained on enough curated data from diverse sources. We revisit existing approaches and combine different techniques to scale our pretraining in terms of data and model size. Most of the technical contributions aim at accelerating and stabilizing the training at scale. In terms of data, we propose an automatic pipeline to build a dedicated, diverse, and curated image dataset instead of uncurated data, as typically done in the self-supervised literature. In terms of models, we train a ViT model (Dosovitskiy et al., 2020) with 1B parameters and distill it into a series of smaller models that surpass the best available all-purpose features, OpenCLIP (Ilharco et al., 2021), on most of the benchmarks at image and pixel levels.
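To make "features that work across tasks without finetuning" concrete, here is a minimal sketch of one common way to use frozen features downstream: a nearest-neighbour classifier over embeddings that were computed once by an encoder that is never updated. This is an illustration only, not the evaluation code from the paper. The function names, the tiny 3-dimensional vectors, and the labels are all hypothetical stand-ins for real encoder outputs.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def knn_predict(query, bank, k=3):
    """Label a query feature by majority vote among its k most similar bank entries.

    `bank` is a list of (feature_vector, label) pairs. In a real setting these
    would be produced once by a frozen encoder; here they are made up.
    """
    ranked = sorted(
        bank,
        key=lambda item: cosine_similarity(query, item[0]),
        reverse=True,
    )
    votes = {}
    for _, label in ranked[:k]:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)


# Hypothetical features (assumption: real embeddings have hundreds of dimensions).
feature_bank = [
    ([0.9, 0.1, 0.0], "cat"),
    ([0.8, 0.2, 0.1], "cat"),
    ([0.1, 0.9, 0.2], "dog"),
    ([0.0, 0.8, 0.3], "dog"),
    ([0.2, 0.7, 0.1], "dog"),
]

prediction = knn_predict([0.85, 0.15, 0.05], feature_bank, k=3)
print(prediction)
```

The point of the design is that only the small feature bank and the vote depend on the task; the encoder that produced the features is left untouched, which is what "without finetuning" means in practice.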

Attachments

[20230830]_seminar_최종_.pptx (79.5M), downloaded 2 times | DATE: 2024-01-08 13:03:28