COCO - Common Objects in Context

https://cocodataset.org/

COCO (Common Objects in Context) is a large-scale object detection, segmentation, and captioning dataset.

Features:

  • Object segmentation
  • Recognition in context
  • Superpixel stuff segmentation
  • 330K images (> 200K labeled)
  • 1.5 million object instances
  • 80 object categories
  • 91 stuff categories
  • 5 captions per image
  • 250,000 people with keypoints

 

Data format

All annotations share the same basic data structure below:

{
    "info": info,
    "images": [image],
    "annotations": [annotation],
    "licenses": [license],
}

info: {
    "year": int,
    "version": str,
    "description": str,
    "contributor": str,
    "url": str,
    "date_created": datetime,
}

image: {
    "id": int,
    "width": int,
    "height": int,
    "file_name": str,
    "license": int,
    "flickr_url": str,
    "coco_url": str,
    "date_captured": datetime,
}

license: {
    "id": int,
    "name": str,
    "url": str,
}
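Since every annotation file follows this layout, it can be loaded with a plain JSON parser and the annotations joined to their images via image_id (the per-object annotations in the sections below all carry one). A minimal sketch, where the file path is a placeholder, not a file referenced by this post:

# A minimal sketch using only the standard library; the file path is
# a placeholder.
import json
from collections import defaultdict

with open("annotations.json") as f:
    data = json.load(f)

# Group annotations by the image they belong to.
anns_by_image = defaultdict(list)
for ann in data["annotations"]:
    anns_by_image[ann["image_id"]].append(ann)

for img in data["images"][:3]:
    print(img["file_name"], len(anns_by_image[img["id"]]), "annotations")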

 

1. Object Detection

Each object instance annotation contains a series of fields,
including the category id and segmentation mask of the object.

annotation: {
    "id": int,
    "image_id": int,
    "category_id": int,
    "segmentation": RLE or [polygon],
    "area": float,
    "bbox": [x, y, width, height],
    "iscrowd": 0 or 1,
}

categories: [
    {
        "id": int,
        "name": str,
        "supercategory": str,
    }
]
  • iscrowd: marks large groups of objects (e.g. a crowd of people)
    • 0 = single object, polygons used
    • 1 = collection of objects, RLE used
  • segmentation
    • RLE (Run-Length Encoding)
      • counts: [ ... ]
      • size: [ height, width ]
    • [ polygon ]
      • polygon: [ x1, y1, x2, y2, ... ]
  • bbox: enclosing bounding box, [x, y, width, height]
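The same structure can be navigated more conveniently with the pycocotools COCO API. A minimal sketch, assuming pycocotools is installed; the annotation file path is a hypothetical example, not one defined in this post:

# A minimal sketch; assumes pycocotools is installed, and the file
# path below is a hypothetical example.
from pycocotools.coco import COCO

coco = COCO("annotations/instances_val2017.json")

img_id = coco.getImgIds()[0]              # pick an arbitrary image
ann_ids = coco.getAnnIds(imgIds=img_id)   # annotation ids for that image

for ann in coco.loadAnns(ann_ids):
    cat = coco.loadCats(ann["category_id"])[0]["name"]
    # bbox is [x, y, width, height]; iscrowd selects polygon vs. RLE
    print(cat, ann["bbox"], ann["iscrowd"])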

 

2. Keypoint Detection

annotation: {
    ... object detection annotation fields ...
    "keypoints": [x1, y1, v1, ...],
    "num_keypoints": int,
}

categories: [
    {
        ... object detection category fields ...
        "keypoints": [str],
        "skeleton": [edge]
    }
]
  • annotation
    • keypoints: a length-3k array, where k is the total number of keypoints defined for the category
      • each keypoint has a 0-indexed location x, y and a visibility flag v
      • v=0: not labeled (in which case x=y=0)
      • v=1: labeled but not visible
      • v=2: labeled and visible
    • num_keypoints: the number of labeled keypoints (v > 0) for a given object
      (many objects, e.g. crowds and small objects, have num_keypoints=0)
  • categories
    • keypoints: a length-k array of keypoint names
    • skeleton: defines connectivity via a list of keypoint edge pairs
      • edge: [index1, index2] (1-indexed, as in the person example below)
"categories": [
{
"supercategory": "person",
"id": 1,
"name": "person",
"keypoints": [
"nose",
"left_eye", "right_eye",
"left_ear", "right_ear",
"left_shoulder", "right_shoulder",
"left_elbow", "right_elbow",
"left_wrist", "right_wrist",
"left_hip", "right_hip",
"left_knee", "right_knee",
"left_ankle", "right_ankle"
],
"skeleton": [
[16, 14], [14, 12], [17, 15], [15, 13],
[12, 13], [6, 12], [7, 13], [6, 7], [6, 8],
[7, 9], [8, 10], [9, 11], [2, 3], [1, 2],
[1, 3], [2, 4], [3, 5], [4, 6], [5, 7]
]
}
]
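To make the triple layout concrete, the sketch below walks a keypoints array three values at a time and pairs each (x, y, v) with its name from the category above; the keypoint values are invented sample data, not taken from a real annotation:

# A minimal sketch; the keypoint values are invented sample data.
KEYPOINT_NAMES = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]

keypoints = [0, 0, 0] * 17        # 17 (x, y, v) triples, all unlabeled
keypoints[0:3] = [230, 120, 2]    # pretend the nose is labeled and visible

for i, name in enumerate(KEYPOINT_NAMES):
    x, y, v = keypoints[3 * i : 3 * i + 3]
    if v > 0:                     # v=0 means not labeled (x=y=0)
        print(f"{name}: ({x}, {y}), visible={v == 2}")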

 

3. Stuff Segmentation

The stuff annotation format is idential and fully compatible to the object detection format above

(except iscrowd is unnecessary and set to 0 by default).

...

 

4. Panoptic Segmentation

For the panotic task, each annotation struct is a per-image annotation rather than a per-object annotation.

Each per-image annotation has two parts:

(1) a PNG that stores the class-agnostic image segmentation and

(2) a JSON struct that stores the semantic information for each image segmentation.

...

  1. annotation.image_id == image.id
  2. For each annotation, per-pixel segment ids are stored as a single PNG at annotation.file_name.
  3. For each annotation, per-segment info is stored in annotation.segments_info.
    • segment_info.id stores the unique id of the segment and is used to retrieve the corresponding mask from the PNG.
    • category_id gives the semantic category.
    • iscrowd indicates the segment encompasses a group of objects (relevant for thing categories only).
    • The bbox and area fields provide additional info about the segment.
  4. The COCO panoptic task has the same thing categories as the detection task,
    whereas the stuff categories differ from those in the stuff task.
    Finally, each category struct has two additional fields:
    • isthing: distinguishes stuff and thing categories
    • color: useful for consistent visualization
annotation: {
    "image_id": int,
    "file_name": str,
    "segments_info": [segment_info],
}

segment_info: {
    "id": int,
    "category_id": int,
    "area": int,
    "bbox": [x, y, width, height],
    "iscrowd": 0 or 1,
}

categories: [
    {
        "id": int,
        "name": str,
        "supercategory": str,
        "isthing": 0 or 1,
        "color": [R, G, B],
    }
]
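The segment ids in the PNG are packed into the RGB channels; the official panopticapi decodes them as id = R + 256*G + 256^2*B. A minimal sketch of that decoding, assuming numpy and Pillow are installed; the PNG path is a placeholder:

# A minimal sketch; assumes numpy and Pillow, and the PNG path is a
# placeholder. The id encoding follows the COCO panopticapi
# convention: id = R + 256*G + 256**2*B.
import numpy as np
from PIL import Image

rgb = np.array(Image.open("panoptic/000000000139.png"), dtype=np.uint32)
seg_ids = rgb[..., 0] + 256 * rgb[..., 1] + 256 ** 2 * rgb[..., 2]

# Each unique id should match a segment_info.id from the JSON annotation.
print(np.unique(seg_ids))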

 

5. Image Captioning

These annotations are used to store image captions.

Each caption describes the specified image and each image has at least 5 captions (some images have more).

annotation: {
    "id": int,
    "image_id": int,
    "caption": str,
}
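Captions load with the same COCO API as the detection annotations. A minimal sketch, assuming pycocotools; the captions file path is a hypothetical example:

# A minimal sketch; the captions file path is a hypothetical example.
from pycocotools.coco import COCO

caps = COCO("annotations/captions_val2017.json")

img_id = caps.getImgIds()[0]
for ann in caps.loadAnns(caps.getAnnIds(imgIds=img_id)):
    print(ann["caption"])   # at least 5 captions per image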

 

6. DensePose

Each annotation contains a series of fields, including category id, bounding box, body part masks and parametrization data for selected points.

DensePose annotations are stored in dp_* fields:

 

Annotated masks:

  • dp_masks: RLE-encoded dense masks.
    • All part masks are of size 256x256.
    • They correspond to 14 semantically meaningful parts of the body:
      Torso, Left/Right Hand, Left/Right Foot, Upper Leg Left/Right, Lower Leg Left/Right, Upper Arm Left/Right, Lower Arm Left/Right, Head.

Annotated points:

  • dp_x, dp_y: spatial coordinates of collected points on the image.
    • The coordinates are scaled such that the bounding box size is 256x256.
  • dp_I: The patch index that indicates which of the 24 surface patches the point is on.
    • Patches correspond to the body parts described above; some body parts are split into two patches:
      • 1, 2: Torso
      • 3: Right Hand
      • 4: Left Hand
      • 5: Left Foot
      • 6: Right Foot
      • 7, 9: Upper Leg Right
      • 8, 10: Upper Leg Left
      • 11, 13: Lower Leg Right
      • 12, 14: Lower Leg Left
      • 15, 17: Upper Arm Left
      • 16, 18: Upper Arm Right
      • 19, 21: Lower Arm Left
      • 20, 22: Lower Arm Right
      • 23, 24: Head
  • dp_U, dp_V: Coordinates in the UV space.
    Each surface patch has a separate 2D parameterization.
annotation: {
    "id": int,
    "image_id": int,
    "category_id": int,
    "is_crowd": 0 or 1,
    "area": int,
    "bbox": [x, y, width, height],
    "dp_I": [float],
    "dp_U": [float],
    "dp_V": [float],
    "dp_x": [float],
    "dp_y": [float],
    "dp_masks": [RLE],
}
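Since dp_x and dp_y live in a 256x256 frame spanning the bounding box, mapping a collected point back to image coordinates is a linear rescale by the box size. A minimal sketch under that assumption; all numbers are invented sample data:

# A minimal sketch, assuming dp_x/dp_y are in a 256x256 frame spanning
# the bounding box; all numbers are invented sample data.
def densepose_point_to_image(dp_x, dp_y, bbox):
    x0, y0, w, h = bbox               # COCO bbox: [x, y, width, height]
    return x0 + dp_x / 256.0 * w, y0 + dp_y / 256.0 * h

print(densepose_point_to_image(128.0, 64.0, [100, 50, 200, 400]))
# -> (200.0, 150.0): halfway across, a quarter of the way down the box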

The PASCAL Visual Object Classes Homepage

http://host.robots.ox.ac.uk/pascal/VOC/

PASCAL (Pattern Analysis, Statistical Modeling and Computational Learning)

VOC (Visual Object Classes)

 

Pascal VOC Challenges 2005-2012

 

Classification/Detection Competitions

  1. Classification
    For each of the twenty classes, predicting presence/absence of an example of that class in the test image.
  2. Detection
    Predicting the bounding box and label of each object from the twenty target classes in the test image.

20 classes

Segmentation Competition

  • Segmentation
    Generating pixel-wise segmentations giving the class of the object visible at each pixel, or "background" otherwise.

Action Classification Competition

  • Action Classification
    Predicting the action(s) being performed by a person in a still image.

10 action classes + "other"

 

ImageNet Large Scale Visual Recognition Competition

To estimate the content of photographs for the purpose of retrieval and automatic annotation using a subset of the large hand-labeled ImageNet dataset (10,000,000 labeled images depicting 10,000+ object categories) as training.

 

 

Person Layout Tester Competition

  • Person Layout
    Predicting the bounding box and label of each part of a person (head, hands, feet).

 

Data

Folder hierarchy

VOC20XX
├─ Annotations
├─ ImageSets
├─ JPEGImages
├─ SegmentationClass
└─ SegmentationObject
  • Annotations: XML files with the same names as the source images in the JPEGImages folder; the ground-truth data
  • ImageSets: lists of images grouped by purpose (test, train, trainval, val), plus information such as which images contain a particular class
  • JPEGImages: image files with the *.jpg extension; the input data
  • SegmentationClass: label images for semantic segmentation training
  • SegmentationObject: label images for instance segmentation training

XML file structure

<annotation>
    <folder>VOC2012</folder>
    <filename>2007_000027.jpg</filename>
    <source>
        <database>The VOC2007 Database</database>
        <annotation>PASCAL VOC2007</annotation>
        <image>flickr</image>
    </source>
    <size>
        <width>486</width>
        <height>500</height>
        <depth>3</depth>
    </size>
    <segmented>0</segmented>
    <object>
        <name>person</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>174</xmin>
            <ymin>101</ymin>
            <xmax>349</xmax>
            <ymax>351</ymax>
        </bndbox>
        <part>
            <name>head</name>
            <bndbox>
                <xmin>169</xmin>
                <ymin>104</ymin>
                <xmax>209</xmax>
                <ymax>146</ymax>
            </bndbox>
        </part>
        <part>
            <name>hand</name>
            <bndbox>
                <xmin>278</xmin>
                <ymin>210</ymin>
                <xmax>297</xmax>
                <ymax>233</ymax>
            </bndbox>
        </part>
        <part>
            <name>foot</name>
            <bndbox>
                <xmin>273</xmin>
                <ymin>333</ymin>
                <xmax>297</xmax>
                <ymax>354</ymax>
            </bndbox>
        </part>
        <part>
            <name>foot</name>
            <bndbox>
                <xmin>319</xmin>
                <ymin>307</ymin>
                <xmax>340</xmax>
                <ymax>326</ymax>
            </bndbox>
        </part>
    </object>
</annotation>
  • size: the width, height, and depth (channel count) of the image corresponding to the XML file
    • width
    • height
    • depth
  • segmented: whether segmentation annotations exist for the image (0 = no, 1 = yes)
  • object
    • name: class name
    • pose: used only for the person class
    • truncated: 0 = object fully contained, 1 = partially contained
    • difficult: 0 = easy to recognize, 1 = difficult to recognize
    • bndbox
      • xmin: top-left x coordinate
      • ymin: top-left y coordinate
      • xmax: bottom-right x coordinate
      • ymax: bottom-right y coordinate
    • part: used only for the person class (see the parsing sketch below)
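The sketch below parses one of these XML files with Python's standard xml.etree.ElementTree; the file path is a placeholder for any file in the Annotations folder:

# A minimal sketch using only the standard library; the file path is
# a placeholder.
import xml.etree.ElementTree as ET

root = ET.parse("Annotations/2007_000027.xml").getroot()

size = root.find("size")
print(size.findtext("width"), size.findtext("height"), size.findtext("depth"))

for obj in root.findall("object"):
    name = obj.findtext("name")
    box = obj.find("bndbox")
    xmin, ymin = int(box.findtext("xmin")), int(box.findtext("ymin"))
    xmax, ymax = int(box.findtext("xmax")), int(box.findtext("ymax"))
    print(name, (xmin, ymin, xmax, ymax))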

Performance Evaluation Metrics for Classification Models: Accuracy, Recall, Precision, F1-score

Confusion Matrix

Prediction \ Ground-Truth | True                                           | False
--------------------------|------------------------------------------------|-------------------------------------------
Positive                  | True Positive (Correct Hit)                    | False Positive (False Alarm, Type I Error)
Negative                  | False Negative (Missing Result, Type II Error) | True Negative (Correct Rejection)

Type I Error: False Alarm

e.g., predicting that a used car is good and buying it, when the car is actually bad.

Type II Error: Missing Result

e.g., classifying a cancer patient as healthy.

 

1. Accuracy

The proportion of all predictions that are correct, i.e., of all classifications, the fraction where both True and False cases are classified correctly.

Accuracy = (TP + TN) / (TP + TN + FP + FN)
         = (TP + TN) / (All Data)

Accuracy Paradox:

When predicting rare outcomes (the real data is overwhelmingly Negative),

Accuracy comes out high even with a low TP, as long as TN is high. => evaluate with Recall instead.

e.g., heavy-snowfall prediction: a model that only ever produces TN still gets high Accuracy.

 

2. Recall

How well are objects found?

Of the cases that are actually True (GT = True), the proportion the classifier predicts as TP

(used when True occurs rarely):

of the cases that should be classified as positive, the fraction classified correctly.

Recall = TP / (TP + FN) = TP / (All Ground-Truths)

Drawback: the opposite of Accuracy; Recall comes out high as long as TP is caught, even if TN is low.

 

3. Precision

Are the found objects correct?

Of the cases predicted as Positive (TP + FP), the proportion that are actually TP:

of the cases the classifier predicted as positive, the fraction that is actually correct.

Precision = TP / (TP + FP) = TP / (All Detections)

Drawback: the flip side of Recall's strength; Precision ignores FN, so it can be high even when many true objects are missed.

 

Reference: https://darkpgmr.tistory.com/162 (Understanding precision and recall)

Accuracy vs. precision in the measurement sense:

Accuracy                                   | Precision
Accuracy = (TP + TN) / (TP + TN + FP + FN) | Precision = TP / (TP + FP)
How close the result is to the true value  | How consistent the outputs are
The system's bias                          | Repeatability

e.g., measuring the weight of a person who actually weighs 50 kg: if the scale reads 60, 60.12, 59.99, ... (values around 60 kg),

the scale's accuracy is very low but its precision is high.

 

4. F1-score

The harmonic mean of Precision and Recall:

F1-score = 2 x (Precision x Recall) / (Precision + Recall)
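The four metrics above reduce to a few lines given raw confusion-matrix counts. A minimal sketch with invented counts, which also illustrates the Accuracy Paradox on overwhelmingly Negative data:

# A minimal sketch; tp/fp/fn/tn are invented sample counts.
def classification_metrics(tp, fp, fn, tn):
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# A model that finds only 1 of 10 positives among 1000 samples still
# scores 99.1% accuracy, while recall exposes the failure (0.1).
print(classification_metrics(tp=1, fp=0, fn=9, tn=990))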

 

5. IoU (Intersection over Union)

 

IoU = TP / (TP + FP + FN)

       = Area(GT ∩ Prediction) / Area(GT ∪ Prediction)

A threshold of 0.5 is commonly used.
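For axis-aligned boxes in [x, y, width, height] form (as in the COCO bbox field), IoU reduces to a few max/min operations. A minimal sketch with made-up boxes:

# A minimal sketch for axis-aligned boxes given as [x, y, width, height].
def iou(box_a, box_b):
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Intersection rectangle (empty if the boxes do not overlap).
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

print(iou([0, 0, 100, 100], [50, 50, 100, 100]))  # 2500/17500 = 0.142857...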

 

6. AP (Average Precision)

The approximate area under the precision-recall (PR) curve.

PR curve: evaluates a detector's performance as the threshold on the confidence level is varied.
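One common way to approximate that area is all-point interpolation: sort detections by confidence, sweep out the PR curve, make precision non-increasing from right to left, and sum the rectangles between consecutive recall values. A minimal sketch with invented detections, each a (confidence, is_true_positive) pair:

# A minimal sketch of all-point-interpolated AP; the detections are
# invented sample data.
def average_precision(detections, num_ground_truths):
    detections = sorted(detections, key=lambda d: d[0], reverse=True)
    tp = fp = 0
    recalls, precisions = [0.0], [1.0]
    for _, is_tp in detections:
        tp += int(is_tp)
        fp += int(not is_tp)
        recalls.append(tp / num_ground_truths)
        precisions.append(tp / (tp + fp))
    # Make precision monotonically non-increasing from right to left.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum the rectangles between consecutive recall points.
    return sum((recalls[i + 1] - recalls[i]) * precisions[i + 1]
               for i in range(len(recalls) - 1))

dets = [(0.9, True), (0.8, False), (0.7, True), (0.6, True), (0.5, False)]
print(average_precision(dets, num_ground_truths=4))  # 0.625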

Reference: https://bskyvision.com/465 (Understanding AP, a performance evaluation method for object detection algorithms)

 

7. mAP (mean Average Precision)

The mean of the per-class APs when there are multiple classes.

Reference: https://github.com/Cartucho/mAP (mean Average Precision; evaluates the object-detection performance of a neural net)

 

 

 

 

 

 

 

 

Installing YOLO v3 on Windows

Requirements:
  • Visual Studio 2019
  • NVIDIA CUDA (cuda-toolkit-archive)
  • NVIDIA cuDNN (cudnn-archive)
  • OpenCV 2.4 or later (4.1)

Darknet project

darknet GitHub: https://github.com/AlexeyAB/darknet (YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection, Windows and Linux version of Darknet)

Visual Studio setup

Edit the darknet-master\build\darknet\darknet.vcxproj file. The key settings:

:
<WindowsTargetPlatformVersion>10.0</WindowsTargetPlatformVersion>
:
<PlatformToolset>v142</PlatformToolset>
:
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" />
<ImportGroup Label="ExtensionSettings">
<Import Project="$(VCTargetsPath)\BuildCustomizations\CUDA 10.1.props" />
</ImportGroup>
:
<AdditionalIncludeDirectories>$(OPENCV_DIR)\include;$(SolutionDir)..\..\include;$(SolutionDir)..\..\3rdparty\stb\include;$(SolutionDir)..\..\3rdparty\pthreads\include;$(CUDA_PATH_V10_1)\include;%(AdditionalIncludeDirectories)</AdditionalIncludeDirectories>
:
<AdditionalLibraryDirectories>$(OPENCV_DIR)\x64\vc15\lib;$(CUDA_PATH)\lib\$(PlatformName);$(CUDNN)\lib\x64;$(cudnn)\lib\x64;..\..\3rdparty\pthreads\lib;%(AdditionalLibraryDirectories)</AdditionalLibraryDirectories>
:
<PostBuildEvent>
<Command>xcopy /d /y "$(OpenCV_DIR)\x64\vc15\opencv_ffmpeg???_64.dll" "$(OutDir)"
xcopy /d /y "$(OpenCV_DIR)\x64\vc15\opencv_world???.dll" "$(OutDir)"
xcopy /d /y "$(CUDA_PATH_V10_1)\bin\cublas64_*.dll" "$(OutDir)"
xcopy /d /y "$(CUDA_PATH_V10_1)\bin\cudart64_*.dll" "$(OutDir)"
xcopy /d /y "$(CUDA_PATH_V10_1)\bin\cusolver64_*.dll" "$(OutDir)"
xcopy /d /y "$(CUDA_PATH_V10_1)\bin\curand64_*.dll" "$(OutDir)"
</Command>
</PostBuildEvent>
:
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" />
<ImportGroup Label="ExtensionTargets">
<Import Project="$(VCTargetsPath)\BuildCustomizations\CUDA 10.1.targets" />
</ImportGroup>
The complete darknet.vcxproj used here:
<?xml version="1.0" encoding="utf-8"?>
<Project DefaultTargets="Build" ToolsVersion="14.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
<ItemGroup Label="ProjectConfigurations">
<ProjectConfiguration Include="Debug|x64">
<Configuration>Debug</Configuration>
<Platform>x64</Platform>
</ProjectConfiguration>
<ProjectConfiguration Include="Release|x64">
<Configuration>Release</Configuration>
<Platform>x64</Platform>
</ProjectConfiguration>
</ItemGroup>
<PropertyGroup Label="Globals">
<ProjectGuid>{4CF5694F-12A5-4012-8D94-9A0915E9FEB5}</ProjectGuid>
<RootNamespace>darknet</RootNamespace>
<WindowsTargetPlatformVersion>10.0</WindowsTargetPlatformVersion>
</PropertyGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.Default.props" />
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'" Label="Configuration">
<ConfigurationType>Application</ConfigurationType>
<UseDebugLibraries>true</UseDebugLibraries>
<PlatformToolset>v142</PlatformToolset>
<CharacterSet>MultiByte</CharacterSet>
</PropertyGroup>
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'" Label="Configuration">
<ConfigurationType>Application</ConfigurationType>
<UseDebugLibraries>false</UseDebugLibraries>
<PlatformToolset>v142</PlatformToolset>
<WholeProgramOptimization>true</WholeProgramOptimization>
<CharacterSet>MultiByte</CharacterSet>
</PropertyGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" />
<ImportGroup Label="ExtensionSettings">
<Import Project="$(VCTargetsPath)\BuildCustomizations\CUDA 10.1.props" />
</ImportGroup>
<ImportGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'" Label="PropertySheets">
<Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
</ImportGroup>
<ImportGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'" Label="PropertySheets">
<Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
</ImportGroup>
<PropertyGroup Label="UserMacros" />
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">
<OutDir>$(SolutionDir)$(Platform)\</OutDir>
</PropertyGroup>
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'">
<OutDir>$(SolutionDir)$(Platform)\</OutDir>
</PropertyGroup>
<ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">
<ClCompile>
<WarningLevel>Level3</WarningLevel>
<Optimization>Disabled</Optimization>
<SDLCheck>true</SDLCheck>
<AdditionalIncludeDirectories>$(OPENCV_DIR)\include;$(SolutionDir)..\..\include;$(SolutionDir)..\..\3rdparty\stb\include;$(SolutionDir)..\..\3rdparty\pthreads\include;$(CUDA_PATH_V10_1)\include;%(AdditionalIncludeDirectories)</AdditionalIncludeDirectories>
<PreprocessorDefinitions>CUDNN_HALF;CUDNN;_CRTDBG_MAP_ALLOC;_MBCS;_TIMESPEC_DEFINED;_SCL_SECURE_NO_WARNINGS;_CRT_SECURE_NO_WARNINGS;_CRT_RAND_S;GPU;WIN32;DEBUG;_CONSOLE;_LIB;%(PreprocessorDefinitions)</PreprocessorDefinitions>
<UndefinePreprocessorDefinitions>OPENCV;</UndefinePreprocessorDefinitions>
<MultiProcessorCompilation>true</MultiProcessorCompilation>
<ForcedIncludeFiles>stdlib.h;crtdbg.h;%(ForcedIncludeFiles)</ForcedIncludeFiles>
</ClCompile>
<Link>
<GenerateDebugInformation>true</GenerateDebugInformation>
<AdditionalLibraryDirectories>$(OPENCV_DIR)\x64\vc15\lib;$(CUDA_PATH)\lib\$(PlatformName);$(CUDNN)\lib\x64;$(cudnn)\lib\x64;..\..\3rdparty\pthreads\lib;%(AdditionalLibraryDirectories)</AdditionalLibraryDirectories>
<OutputFile>$(OutDir)\$(TargetName)$(TargetExt)</OutputFile>
<AdditionalDependencies>pthreadVC2.lib;cublas.lib;curand.lib;cudart.lib;%(AdditionalDependencies)</AdditionalDependencies>
<AssemblyDebug>true</AssemblyDebug>
</Link>
<CudaCompile>
<CodeGeneration>compute_30,sm_30;compute_75,sm_75</CodeGeneration>
<TargetMachinePlatform>64</TargetMachinePlatform>
</CudaCompile>
<PostBuildEvent>
<Command>xcopy /d /y "$(OpenCV_DIR)\x64\vc15\opencv_ffmpeg???_64.dll" "$(OutDir)"
xcopy /d /y "$(OpenCV_DIR)\x64\vc15\opencv_world???.dll" "$(OutDir)"
xcopy /d /y "$(CUDA_PATH_V10_1)\bin\cublas64_*.dll" "$(OutDir)"
xcopy /d /y "$(CUDA_PATH_V10_1)\bin\cudart64_*.dll" "$(OutDir)"
xcopy /d /y "$(CUDA_PATH_V10_1)\bin\cusolver64_*.dll" "$(OutDir)"
xcopy /d /y "$(CUDA_PATH_V10_1)\bin\curand64_*.dll" "$(OutDir)"
</Command>
</PostBuildEvent>
</ItemDefinitionGroup>
<ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'">
<ClCompile>
<WarningLevel>Level3</WarningLevel>
<Optimization>MaxSpeed</Optimization>
<FunctionLevelLinking>true</FunctionLevelLinking>
<IntrinsicFunctions>true</IntrinsicFunctions>
<SDLCheck>true</SDLCheck>
<AdditionalIncludeDirectories>$(OPENCV_DIR)\include;$(SolutionDir)..\..\include;$(SolutionDir)..\..\3rdparty\stb\include;$(SolutionDir)..\..\3rdparty\pthreads\include;$(CUDA_PATH_V10_1)\include;%(AdditionalIncludeDirectories)</AdditionalIncludeDirectories>
<PreprocessorDefinitions>OPENCV;CUDNN_HALF;CUDNN;_TIMESPEC_DEFINED;_SCL_SECURE_NO_WARNINGS;_CRT_SECURE_NO_WARNINGS;_CRT_RAND_S;GPU;WIN32;_CONSOLE;_LIB;%(PreprocessorDefinitions)</PreprocessorDefinitions>
<CLanguageStandard>c11</CLanguageStandard>
<CppLanguageStandard>c++1y</CppLanguageStandard>
<PrecompiledHeaderCompileAs>CompileAsCpp</PrecompiledHeaderCompileAs>
<CompileAs>Default</CompileAs>
<UndefinePreprocessorDefinitions>NDEBUG</UndefinePreprocessorDefinitions>
</ClCompile>
<Link>
<GenerateDebugInformation>true</GenerateDebugInformation>
<EnableCOMDATFolding>true</EnableCOMDATFolding>
<OptimizeReferences>true</OptimizeReferences>
<AdditionalLibraryDirectories>$(OPENCV_DIR)\x64\vc15\lib;$(CUDA_PATH)\lib\$(PlatformName);$(CUDNN)\lib\x64;$(cudnn)\lib\x64;..\..\3rdparty\pthreads\lib;%(AdditionalLibraryDirectories)</AdditionalLibraryDirectories>
<AdditionalDependencies>pthreadVC2.lib;cublas.lib;curand.lib;cudart.lib;%(AdditionalDependencies)</AdditionalDependencies>
<OutputFile>$(OutDir)\$(TargetName)$(TargetExt)</OutputFile>
</Link>
<CudaCompile>
<TargetMachinePlatform>64</TargetMachinePlatform>
<CodeGeneration>compute_30,sm_30;compute_75,sm_75</CodeGeneration>
</CudaCompile>
<PostBuildEvent>
<Command>xcopy /d /y "$(OpenCV_DIR)\x64\vc15\opencv_ffmpeg???_64.dll" "$(OutDir)"
xcopy /d /y "$(OpenCV_DIR)\x64\vc15\opencv_world???.dll" "$(OutDir)"
xcopy /d /y "$(CUDA_PATH_V10_1)\bin\cublas64_*.dll" "$(OutDir)"
xcopy /d /y "$(CUDA_PATH_V10_1)\bin\cudart64_*.dll" "$(OutDir)"
xcopy /d /y "$(CUDA_PATH_V10_1)\bin\cusolver64_*.dll" "$(OutDir)"
xcopy /d /y "$(CUDA_PATH_V10_1)\bin\curand64_*.dll" "$(OutDir)"
</Command>
</PostBuildEvent>
</ItemDefinitionGroup>
<ItemGroup>
<CudaCompile Include="..\..\src\activation_kernels.cu" />
<CudaCompile Include="..\..\src\avgpool_layer_kernels.cu" />
<CudaCompile Include="..\..\src\blas_kernels.cu" />
<CudaCompile Include="..\..\src\col2im_kernels.cu" />
<ClInclude Include="..\..\src\upsample_layer.h" />
<ClInclude Include="..\..\src\utils.h" />
<ClInclude Include="..\..\src\yolo_layer.h" />
</ItemGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" />
<ImportGroup Label="ExtensionTargets">
<Import Project="$(VCTargetsPath)\BuildCustomizations\CUDA 10.1.targets" />
</ImportGroup>
</Project>

OpenCV, cuDNN, and GPU architecture settings

Include directories:

$(OPENCV_DIR)\include
$(SolutionDir)..\..\include
$(SolutionDir)..\..\3rdparty\stb\include
$(SolutionDir)..\..\3rdparty\pthreads\include
$(CUDA_PATH_V10_1)\include

Library directories:

$(OPENCV_DIR)\x64\vc15\lib
$(CUDA_PATH)\lib\$(PlatformName)
$(CUDNN)\lib\x64;$(cudnn)\lib\x64
$(SolutionDir)..\..\3rdparty\pthreads\lib

Copying the required DLLs (post-build event)

xcopy /d /y "$(OpenCV_DIR)\x64\vc15\opencv_ffmpeg???_64.dll" "$(OutDir)"
xcopy /d /y "$(OpenCV_DIR)\x64\vc15\opencv_world???.dll" "$(OutDir)"
xcopy /d /y "$(CUDA_PATH_V10_1)\bin\cublas64_*.dll" "$(OutDir)"
xcopy /d /y "$(CUDA_PATH_V10_1)\bin\cudart64_*.dll" "$(OutDir)"
xcopy /d /y "$(CUDA_PATH_V10_1)\bin\cusolver64_*.dll" "$(OutDir)"
xcopy /d /y "$(CUDA_PATH_V10_1)\bin\curand64_*.dll" "$(OutDir)"

Running a test

Download the pre-trained weights (yolov3.weights) and run:

...\build\darknet\x64>darknet detector test ./cfg/coco.data ./cfg/yolov3.cfg ./yolov3.weights ./data/dog.jpg
CUDA-version: 10010 (10020), cuDNN: 7.6.1, CUDNN_HALF=1, GPU count: 1
OpenCV version: 4.1.0
compute_capability = 610, cudnn_half = 0
net.optimized_memory = 0
batch = 1, time_steps = 1, train = 0
layer filters size/strd(dil) input output
0 conv 32 3 x 3/ 1 416 x 416 x 3 -> 416 x 416 x 32 0.299 BF
1 conv 64 3 x 3/ 2 416 x 416 x 32 -> 208 x 208 x 64 1.595 BF
2 conv 32 1 x 1/ 1 208 x 208 x 64 -> 208 x 208 x 32 0.177 BF
3 conv 64 3 x 3/ 1 208 x 208 x 32 -> 208 x 208 x 64 1.595 BF
4 Shortcut Layer: 1, wt = 0, wn = 0, outputs: 208 x 208 x 64 0.003 BF
5 conv 128 3 x 3/ 2 208 x 208 x 64 -> 104 x 104 x 128 1.595 BF
6 conv 64 1 x 1/ 1 104 x 104 x 128 -> 104 x 104 x 64 0.177 BF
7 conv 128 3 x 3/ 1 104 x 104 x 64 -> 104 x 104 x 128 1.595 BF
8 Shortcut Layer: 5, wt = 0, wn = 0, outputs: 104 x 104 x 128 0.001 BF
9 conv 64 1 x 1/ 1 104 x 104 x 128 -> 104 x 104 x 64 0.177 BF
10 conv 128 3 x 3/ 1 104 x 104 x 64 -> 104 x 104 x 128 1.595 BF
11 Shortcut Layer: 8, wt = 0, wn = 0, outputs: 104 x 104 x 128 0.001 BF
12 conv 256 3 x 3/ 2 104 x 104 x 128 -> 52 x 52 x 256 1.595 BF
13 conv 128 1 x 1/ 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF
14 conv 256 3 x 3/ 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF
15 Shortcut Layer: 12, wt = 0, wn = 0, outputs: 52 x 52 x 256 0.001 BF
16 conv 128 1 x 1/ 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF
17 conv 256 3 x 3/ 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF
18 Shortcut Layer: 15, wt = 0, wn = 0, outputs: 52 x 52 x 256 0.001 BF
19 conv 128 1 x 1/ 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF
20 conv 256 3 x 3/ 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF
21 Shortcut Layer: 18, wt = 0, wn = 0, outputs: 52 x 52 x 256 0.001 BF
22 conv 128 1 x 1/ 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF
23 conv 256 3 x 3/ 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF
24 Shortcut Layer: 21, wt = 0, wn = 0, outputs: 52 x 52 x 256 0.001 BF
25 conv 128 1 x 1/ 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF
26 conv 256 3 x 3/ 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF
27 Shortcut Layer: 24, wt = 0, wn = 0, outputs: 52 x 52 x 256 0.001 BF
28 conv 128 1 x 1/ 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF
29 conv 256 3 x 3/ 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF
30 Shortcut Layer: 27, wt = 0, wn = 0, outputs: 52 x 52 x 256 0.001 BF
31 conv 128 1 x 1/ 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF
32 conv 256 3 x 3/ 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF
33 Shortcut Layer: 30, wt = 0, wn = 0, outputs: 52 x 52 x 256 0.001 BF
34 conv 128 1 x 1/ 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF
35 conv 256 3 x 3/ 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF
36 Shortcut Layer: 33, wt = 0, wn = 0, outputs: 52 x 52 x 256 0.001 BF
37 conv 512 3 x 3/ 2 52 x 52 x 256 -> 26 x 26 x 512 1.595 BF
38 conv 256 1 x 1/ 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF
39 conv 512 3 x 3/ 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF
40 Shortcut Layer: 37, wt = 0, wn = 0, outputs: 26 x 26 x 512 0.000 BF
41 conv 256 1 x 1/ 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF
42 conv 512 3 x 3/ 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF
43 Shortcut Layer: 40, wt = 0, wn = 0, outputs: 26 x 26 x 512 0.000 BF
44 conv 256 1 x 1/ 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF
45 conv 512 3 x 3/ 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF
46 Shortcut Layer: 43, wt = 0, wn = 0, outputs: 26 x 26 x 512 0.000 BF
47 conv 256 1 x 1/ 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF
48 conv 512 3 x 3/ 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF
49 Shortcut Layer: 46, wt = 0, wn = 0, outputs: 26 x 26 x 512 0.000 BF
50 conv 256 1 x 1/ 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF
51 conv 512 3 x 3/ 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF
52 Shortcut Layer: 49, wt = 0, wn = 0, outputs: 26 x 26 x 512 0.000 BF
53 conv 256 1 x 1/ 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF
54 conv 512 3 x 3/ 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF
55 Shortcut Layer: 52, wt = 0, wn = 0, outputs: 26 x 26 x 512 0.000 BF
56 conv 256 1 x 1/ 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF
57 conv 512 3 x 3/ 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF
58 Shortcut Layer: 55, wt = 0, wn = 0, outputs: 26 x 26 x 512 0.000 BF
59 conv 256 1 x 1/ 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF
60 conv 512 3 x 3/ 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF
61 Shortcut Layer: 58, wt = 0, wn = 0, outputs: 26 x 26 x 512 0.000 BF
62 conv 1024 3 x 3/ 2 26 x 26 x 512 -> 13 x 13 x1024 1.595 BF
63 conv 512 1 x 1/ 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BF
64 conv 1024 3 x 3/ 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BF
65 Shortcut Layer: 62, wt = 0, wn = 0, outputs: 13 x 13 x1024 0.000 BF
66 conv 512 1 x 1/ 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BF
67 conv 1024 3 x 3/ 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BF
68 Shortcut Layer: 65, wt = 0, wn = 0, outputs: 13 x 13 x1024 0.000 BF
69 conv 512 1 x 1/ 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BF
70 conv 1024 3 x 3/ 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BF
71 Shortcut Layer: 68, wt = 0, wn = 0, outputs: 13 x 13 x1024 0.000 BF
72 conv 512 1 x 1/ 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BF
73 conv 1024 3 x 3/ 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BF
74 Shortcut Layer: 71, wt = 0, wn = 0, outputs: 13 x 13 x1024 0.000 BF
75 conv 512 1 x 1/ 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BF
76 conv 1024 3 x 3/ 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BF
77 conv 512 1 x 1/ 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BF
78 conv 1024 3 x 3/ 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BF
79 conv 512 1 x 1/ 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BF
80 conv 1024 3 x 3/ 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BF
81 conv 255 1 x 1/ 1 13 x 13 x1024 -> 13 x 13 x 255 0.088 BF
82 yolo
[yolo] params: iou loss: mse (2), iou_norm: 0.75, cls_norm: 1.00, scale_x_y: 1.00
83 route 79 -> 13 x 13 x 512
84 conv 256 1 x 1/ 1 13 x 13 x 512 -> 13 x 13 x 256 0.044 BF
85 upsample 2x 13 x 13 x 256 -> 26 x 26 x 256
86 route 85 61 -> 26 x 26 x 768
87 conv 256 1 x 1/ 1 26 x 26 x 768 -> 26 x 26 x 256 0.266 BF
88 conv 512 3 x 3/ 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF
89 conv 256 1 x 1/ 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF
90 conv 512 3 x 3/ 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF
91 conv 256 1 x 1/ 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF
92 conv 512 3 x 3/ 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF
93 conv 255 1 x 1/ 1 26 x 26 x 512 -> 26 x 26 x 255 0.177 BF
94 yolo
[yolo] params: iou loss: mse (2), iou_norm: 0.75, cls_norm: 1.00, scale_x_y: 1.00
95 route 91 -> 26 x 26 x 256
96 conv 128 1 x 1/ 1 26 x 26 x 256 -> 26 x 26 x 128 0.044 BF
97 upsample 2x 26 x 26 x 128 -> 52 x 52 x 128
98 route 97 36 -> 52 x 52 x 384
99 conv 128 1 x 1/ 1 52 x 52 x 384 -> 52 x 52 x 128 0.266 BF
100 conv 256 3 x 3/ 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF
101 conv 128 1 x 1/ 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF
102 conv 256 3 x 3/ 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF
103 conv 128 1 x 1/ 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF
104 conv 256 3 x 3/ 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF
105 conv 255 1 x 1/ 1 52 x 52 x 256 -> 52 x 52 x 255 0.353 BF
106 yolo
[yolo] params: iou loss: mse (2), iou_norm: 0.75, cls_norm: 1.00, scale_x_y: 1.00
Total BFLOPS 65.879
avg_outputs = 532444
Allocate additional workspace_size = 52.43 MB
Loading weights from ./yolov3.weights...
seen 64, trained: 32013 K-images (500 Kilo-batches_64)
Done! Loaded 107 layers from weights-file
./data/dog.jpg: Predicted in 57.593000 milli-seconds.
bicycle: 99%
dog: 100%
truck: 94%

 

Using Docker

$ docker pull votiethuy/tensorflow1.12-base
$ docker run -it --rm -v /c/TensorFlow/DW2TF:/dw2tf votiethuy/tensorflow1.12-base
# python main.py --cfg 'data/yolov3.cfg' --weights 'data/yolov3.weights' --output 'data/'

 
