| Repository | Stars | Last commit | Releases | Latest release | Open issues | License | Language | Description |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| OpenGVLab/InternGPT | 2,976 | over 2 years ago | 0 | | 18 | apache-2.0 | Python | InternGPT (iGPT) is an open-source demo platform where you can easily showcase your AI models. It currently supports DragGAN, ChatGPT, ImageBind, GPT-4-style multimodal chat, SAM, interactive image editing, and more. Try it at igpt.opengvlab.com (an online demo system supporting DragGAN, ChatGPT, ImageBind, and SAM). |
| NVlabs/prismer | 1,245 | about 2 years ago | 0 | | 0 | other | Python | The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts". |
| microsoft/Oscar | 995 | over 2 years ago | 0 | | 137 | mit | Python | Oscar and VinVL. |
| peteanderson80/bottom-up-attention | 979 | about 5 years ago | 0 | | 56 | mit | Jupyter Notebook | Bottom-up attention model for image captioning and VQA, based on Faster R-CNN and Visual Genome. |
| subho406/OmniNet | 426 | over 5 years ago | 0 | | 1 | apache-2.0 | Python | Official PyTorch implementation of "OmniNet: A unified architecture for multi-modal multi-task learning" (authors: Subhojeet Pramanik, Priyanka Agrawal, Aman Hussain). |
| TheoCoombes/ClipCap | 64 | about 3 years ago | 1 | May 29, 2022 | 4 | | Python | Uses pretrained encoder and language models to generate captions from multimedia inputs. |
| X-PLUG/mPLUG | 15 | almost 3 years ago | 0 | | 0 | | Python | mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections (EMNLP 2022). |
| YangLiu9208/CausalVLR | 11 | over 2 years ago | 0 | | 0 | | | CausalVLR: a toolbox and benchmark for visual-linguistic causal reasoning. |
| anujanegi/VQA | 6 | over 6 years ago | 0 | | 0 | mit | Python | Visual Question Answering system. |