Object detection

Object detection is a computer technology related to computer vision and image processing that deals with detecting instances of semantic objects of a certain class (such as humans, buildings, or cars) in digital images and videos.^[1] Well-researched domains of object detection include face detection and pedestrian detection. Object detection has applications in many areas of computer vision, including image retrieval and video surveillance.

Applications

Object detection is widely used in computer vision tasks such as image annotation,^[2] vehicle counting,^[3] activity recognition,^[4] face detection, face recognition, and video object co-segmentation. It is also used in object tracking, for example tracking a ball during a football match, tracking the movement of a cricket bat, or tracking a person in a video.

Test images are often sampled from a different data distribution than the training images, which makes the object detection task significantly more difficult.^[5] To address the challenges caused by this domain gap between training and test data, many unsupervised domain adaptation approaches have been proposed.^[5]^[6]^[7]^[8]^[9] A simple and straightforward way to reduce the domain gap is to apply an image-to-image translation approach, such as CycleGAN.^[10] Among other uses, cross-domain object detection is applied in autonomous driving, where models can be trained on a vast number of video game scenes, since the labels for such scenes can be generated without manual labor.
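CycleGAN, cited above, learns two mappings without paired training examples: $G: X \to Y$ (in the driving illustration, for instance, from rendered game frames to real photographs) and $F: Y \to X$, each trained against a discriminator ($D_Y$ and $D_X$ respectively). A cycle-consistency term requires that translating an image and then translating it back reproduces the original. The objective as formulated by Zhu et al. is shown below; how a detector is subsequently trained on the translated images differs between the domain adaptation methods cited above.

```latex
\mathcal{L}_{\text{cyc}}(G, F) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\lVert F(G(x)) - x \rVert_1\big]
  + \mathbb{E}_{y \sim p_{\text{data}}(y)}\big[\lVert G(F(y)) - y \rVert_1\big]

\mathcal{L}(G, F, D_X, D_Y) =
  \mathcal{L}_{\text{GAN}}(G, D_Y, X, Y)
  + \mathcal{L}_{\text{GAN}}(F, D_X, Y, X)
  + \lambda \, \mathcal{L}_{\text{cyc}}(G, F)
```

Here $\lambda$ weights the cycle-consistency term against the two adversarial losses.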

Concept

Every object class has its own special features that help in classifying it; for example, all circles are round. Object class detection uses these special features. For example, when looking for circles, the detector looks for points that lie at a particular distance from a given point (the center). Similarly, when looking for squares, it looks for shapes whose corners are right angles and whose sides are all of equal length. A similar approach is used for face detection, where the eyes, nose, and lips can be located, along with features such as skin color and the distance between the eyes.
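The circle and square criteria described above can be written down directly. The sketch below is a toy illustration only: it assumes the candidate points (or the square's corners, listed in order around the shape) have already been extracted from the image and are noise-free up to a small tolerance. The function names and the tolerance are hypothetical.

```python
import math


def lies_on_circle(points, center, tol=1e-6):
    """Circle criterion: every point is the same (non-zero) distance from the center."""
    cx, cy = center
    dists = [math.hypot(x - cx, y - cy) for x, y in points]
    r = dists[0]
    return r > tol and all(abs(d - r) <= tol for d in dists)


def is_square(corners, tol=1e-6):
    """Square criterion: four equal sides meeting at right angles.
    Assumes the corners are listed in order around the shape."""
    if len(corners) != 4:
        return False
    # Edge vectors from each corner to the next (wrapping around)
    edges = []
    for i in range(4):
        (x1, y1), (x2, y2) = corners[i], corners[(i + 1) % 4]
        edges.append((x2 - x1, y2 - y1))
    lengths = [math.hypot(dx, dy) for dx, dy in edges]
    equal_sides = lengths[0] > tol and all(abs(l - lengths[0]) <= tol for l in lengths)
    # Adjacent edges are perpendicular when their dot product is zero
    right_angles = all(
        abs(edges[i][0] * edges[(i + 1) % 4][0] + edges[i][1] * edges[(i + 1) % 4][1]) <= tol
        for i in range(4)
    )
    return equal_sides and right_angles


print(lies_on_circle([(1, 0), (0, 1), (-1, 0), (0, -1)], center=(0, 0)))  # → True
print(is_square([(0, 0), (2, 0), (2, 2), (0, 2)]))  # → True
print(is_square([(0, 0), (3, 0), (3, 2), (0, 2)]))  # → False (rectangle: unequal sides)
```

Real systems work on noisy edge pixels rather than clean coordinates, so they rely on voting or fitting procedures instead of exact equality tests like these.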

Methods

Methods for object detection generally fall into either neural network-based or non-neural approaches. Non-neural approaches first define features using one of the methods below and then classify them with a technique such as a support vector machine (SVM). Neural techniques, on the other hand, can perform end-to-end object detection without explicitly defined features, and are typically based on convolutional neural networks (CNNs).
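The two-stage non-neural pipeline described above (hand-designed features, then an SVM) can be sketched in plain Python. This is a hypothetical toy, not any cited method: the feature is a heavily simplified, HOG-inspired histogram of gradient orientations computed over a whole window (real HOG uses many cells and block normalization), the linear SVM is trained by plain subgradient descent on the L2-regularized hinge loss, and detection is a sliding window over a small synthetic image. All names and parameter values are assumptions for illustration.

```python
import math


def orientation_histogram(patch, bins=4):
    """Simplified HOG-like feature: one magnitude-weighted histogram of unsigned
    gradient orientations over the whole patch, L2-normalized."""
    hist = [0.0] * bins
    for y in range(1, len(patch) - 1):
        for x in range(1, len(patch[0]) - 1):
            gx = patch[y][x + 1] - patch[y][x - 1]  # central differences
            gy = patch[y + 1][x] - patch[y - 1][x]
            mag = math.hypot(gx, gy)
            if mag == 0:
                continue
            angle = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned orientation
            hist[int(angle // (180.0 / bins)) % bins] += mag
    norm = math.sqrt(sum(v * v for v in hist))
    return [v / norm for v in hist] if norm > 0 else hist


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train_linear_svm(features, labels, lam=0.01, lr=0.1, epochs=100):
    """Linear SVM via subgradient descent on the regularized hinge loss.
    Labels must be +1 (object) or -1 (background)."""
    w, b = [0.0] * len(features[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            if y * (dot(w, x) + b) < 1:
                # Inside the margin: the hinge term pushes w toward y * x
                w = [wi - lr * (lam * wi - y * xi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Outside the margin: only the regularizer shrinks w
                w = [wi - lr * lam * wi for wi in w]
    return w, b


def detect(image, w, b, win=4, stride=2):
    """Slide a win x win window over the image; return (x, y, score) for windows
    the SVM scores as positive. Windows with no gradients are skipped."""
    hits = []
    for y in range(0, len(image) - win + 1, stride):
        for x in range(0, len(image[0]) - win + 1, stride):
            feat = orientation_histogram([row[x:x + win] for row in image[y:y + win]])
            if not any(feat):
                continue
            score = dot(w, feat) + b
            if score > 0:
                hits.append((x, y, score))
    return hits


vertical = [[0, 0, 1, 1]] * 4                # toy "object": a vertical edge
horizontal = [[0] * 4] * 2 + [[1] * 4] * 2   # toy "background": a horizontal edge
w, b = train_linear_svm(
    [orientation_histogram(vertical), orientation_histogram(horizontal)], [1, -1]
)
scene = [[0] * 4 + [1] * 4 for _ in range(8)]  # 8x8 image, vertical edge in the middle
print([(x, y) for x, y, _ in detect(scene, w, b)])  # → [(2, 0), (2, 2), (2, 4)]
```

Only the windows straddling the edge are reported; the flat windows on either side carry no gradient information and are skipped before classification.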

Non-neural approaches:
- Histogram of oriented gradients (HOG) features^[12]
Neural network approaches:
- Region proposals (R-CNN,^[13] Fast R-CNN,^[14] Faster R-CNN,^[15]^[16] Cascade R-CNN)
- Single Shot MultiBox Detector (SSD)^[17]
- You Only Look Once (YOLO)^[18]^[19]^[20]^[11]^[21]
- Single-Shot Refinement Neural Network for Object Detection (RefineDet)^[22]
- RetinaNet^[23]^[16]
- Deformable convolutional networks^[24]^[25]
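Detectors such as the R-CNN family, SSD, YOLO, and RetinaNet typically output many overlapping candidate boxes for the same object and, in their standard formulations, prune them with non-maximum suppression (NMS) based on intersection over union (IoU). The following is a minimal, class-agnostic greedy NMS sketch; the function names, the (x1, y1, x2, y2) box format, and the 0.5 threshold are illustrative assumptions, not taken from any of the cited implementations.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: visit boxes from highest to lowest score and keep a box only
    if it does not overlap an already-kept box by more than the threshold.
    Returns the indices of the kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(non_max_suppression(boxes, scores))  # → [0, 2]
```

The second box overlaps the first with an IoU of 0.81 and is suppressed, while the distant third box survives. In practice NMS is usually run separately for each object class.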

References

[1] Dasiopoulou, S.; Mezaris, V.; Kompatsiaris, I.; Papastathis, V.-K.; Strintzis, M. G. (October 2005). "Knowledge-assisted semantic video object detection". IEEE Transactions on Circuits and Systems for Video Technology. 15 (10): 1210–1224. doi:10.1109/tcsvt.2005.854238. ISSN 1051-8215.
[2] Guan, Ling; He, Yifeng; Kung, Sun-Yuan (1 March 2012). Multimedia Image and Video Processing. CRC Press. pp. 331–. ISBN 978-1-4398-3087-1.
[3] Alsanabani, Ala; Ahmed, Mohammed; AL Smadi, Ahmad (2020). "Vehicle Counting Using Detecting-Tracking Combinations: A Comparative Analysis". 2020 the 4th International Conference on Video and Image Processing. pp. 48–54. doi:10.1145/3447450.3447458. ISBN 9781450389075. S2CID 233194604.
[4] Wu, Jianxin; et al. (2007). "A scalable approach to activity recognition based on object use". 2007 IEEE 11th International Conference on Computer Vision. IEEE.
[5] Oza, Poojan; Sindagi, Vishwanath A.; VS, Vibashan; Patel, Vishal M. (2021-07-04). "Unsupervised Domain Adaptation of Object Detectors: A Survey". arXiv:2105.13502 [cs.CV].
[6] Khodabandeh, Mehran; Vahdat, Arash; Ranjbar, Mani; Macready, William G. (2019-11-18). "A Robust Learning Approach to Domain Adaptive Object Detection". arXiv:1904.02361 [cs.LG].
[7] Soviany, Petru; Ionescu, Radu Tudor; Rota, Paolo; Sebe, Nicu (2021-03-01). "Curriculum self-paced learning for cross-domain object detection". Computer Vision and Image Understanding. 204: 103166. arXiv:1911.06849. doi:10.1016/j.cviu.2021.103166. ISSN 1077-3142. S2CID 208138033.
[8] Menke, Maximilian; Wenzel, Thomas; Schwung, Andreas (October 2022). "Improving GAN-based Domain Adaptation for Object Detection". 2022 IEEE 25th International Conference on Intelligent Transportation Systems (ITSC): 3880–3885. doi:10.1109/ITSC55140.2022.9922138. ISBN 978-1-6654-6880-0. S2CID 253251380.
[9] Menke, Maximilian; Wenzel, Thomas; Schwung, Andreas (2022-08-31). "AWADA: Attention-Weighted Adversarial Domain Adaptation for Object Detection". arXiv:2208.14662 [cs.CV].
[10] Zhu, Jun-Yan; Park, Taesung; Isola, Phillip; Efros, Alexei A. (2020-08-24). "Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks". arXiv:1703.10593 [cs.CV].
[11] Bochkovskiy, Alexey (2020). "YOLOv4: Optimal Speed and Accuracy of Object Detection". arXiv:2004.10934 [cs.CV].
[12] Dalal, Navneet (2005). "Histograms of oriented gradients for human detection" (PDF). Computer Vision and Pattern Recognition. 1.
[13] Girshick, Ross (2014). "Rich feature hierarchies for accurate object detection and semantic segmentation" (PDF). Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE. pp. 580–587. arXiv:1311.2524. doi:10.1109/CVPR.2014.81. ISBN 978-1-4799-5118-5. S2CID 215827080.
[14] Girshick, Ross (2015). "Fast R-CNN" (PDF). Proceedings of the IEEE International Conference on Computer Vision. pp. 1440–1448. arXiv:1504.08083. Bibcode:2015arXiv150408083G.
[15] Ren, Shaoqing (2015). "Faster R-CNN". Advances in Neural Information Processing Systems. arXiv:1506.01497.
[16] Pang, Jiangmiao; Chen, Kai; Shi, Jianping; Feng, Huajun; Ouyang, Wanli; Lin, Dahua (2019-04-04). "Libra R-CNN: Towards Balanced Learning for Object Detection". arXiv:1904.02701v1 [cs.CV].
[17] Liu, Wei (October 2016). "SSD: Single shot multibox detector". Computer Vision – ECCV 2016. European Conference on Computer Vision. Lecture Notes in Computer Science. Vol. 9905. pp. 21–37. arXiv:1512.02325. doi:10.1007/978-3-319-46448-0_2. ISBN 978-3-319-46447-3. S2CID 2141740.
[18] Redmon, Joseph (2016). "You only look once: Unified, real-time object detection". Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. arXiv:1506.02640. Bibcode:2015arXiv150602640R.
[19] Redmon, Joseph (2017). "YOLO9000: better, faster, stronger". arXiv:1612.08242 [cs.CV].
[20] Redmon, Joseph (2018). "YOLOv3: An incremental improvement". arXiv:1804.02767 [cs.CV].
[21] Wang, Chien-Yao (2021). "Scaled-YOLOv4: Scaling Cross Stage Partial Network". Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). arXiv:2011.08036. Bibcode:2020arXiv201108036W.
[22] Zhang, Shifeng (2018). "Single-Shot Refinement Neural Network for Object Detection". Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 4203–4212. arXiv:1711.06897. Bibcode:2017arXiv171106897Z.
[23] Lin, Tsung-Yi (2020). "Focal Loss for Dense Object Detection". IEEE Transactions on Pattern Analysis and Machine Intelligence. 42 (2): 318–327. arXiv:1708.02002. Bibcode:2017arXiv170802002L. doi:10.1109/TPAMI.2018.2858826. PMID 30040631. S2CID 47252984.
[24] Zhu, Xizhou (2018). "Deformable ConvNets v2: More Deformable, Better Results". arXiv:1811.11168 [cs.CV].
[25] Dai, Jifeng (2017). "Deformable Convolutional Networks". arXiv:1703.06211 [cs.CV].

External links

Multiple object class detection

Spatio-temporal action localization

Video object detection and co-segmentation
