COCO - Common Objects in Context

https://cocodataset.org/

COCO (Common Objects in Context) is a large-scale object detection, segmentation, and captioning dataset.

Features:

  • Object segmentation
  • Recognition in context
  • Superpixel stuff segmentation
  • 330K images (> 200K labeled)
  • 1.5 million object instances
  • 80 object categories
  • 91 stuff categories
  • 5 captions per image
  • 250,000 people with keypoints

 

Data format

All annotations share the same basic data structure below:

{
  "info": info,
  "images": [image],
  "annotations": [annotation],
  "licenses": [license],
}

info: {
  "year": int,
  "version": str,
  "description": str,
  "contributor": str,
  "url": str,
  "date_created": datetime,
}

image: {
  "id": int,
  "width": int,
  "height": int,
  "file_name": str,
  "license": int,
  "flickr_url": str,
  "coco_url": str,
  "date_captured": datetime,
}

license: {
  "id": int,
  "name": str, "url": str,
}
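
Since the annotation files are plain JSON, they can be inspected with nothing more than the standard library. A minimal sketch (the file path is illustrative):

import json

with open("annotations/instances_val2017.json") as f:  # illustrative path
    data = json.load(f)

print(data["info"]["description"])
print(len(data["images"]), "images,", len(data["annotations"]), "annotations")

# Index images by id so annotations can be resolved quickly.
images = {img["id"]: img for img in data["images"]}
ann = data["annotations"][0]
print(ann["image_id"], "->", images[ann["image_id"]]["file_name"])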

 

1. Object Detection

Each object instance annotation contains a series of fields,
including the category id and segmentation mask of the object.

annotation: {
  "id": int,
  "image_id": int,
  "category_id": int,
  "segmentation": RLE or [polygon],
  "area": float,
  "bbox": [x,y,width,height],
  "iscrowd": 0 or 1,
}

categories: [
  {
    "id": int,
    "name": str,
    "supercategory": str,
  }
]
  • iscrowd: marks large groups of objects (e.g. a crowd of people)
    • 0 = single object, polygons used
    • 1 = collection of objects, RLE used
  • segmentation
    • RLE (Run-Length Encoding)
      • counts: [ ... ]
      • size: [ height, width ]
    • [ polygon ]
      • polygon: [ x1, y1, x2, y2, ... ]
  • bbox: the enclosing bounding box, given as [x, y, width, height] with the top-left corner as origin (see the loading sketch below)
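
In practice these fields are easiest to handle through the official pycocotools API, which converts both polygon and RLE segmentations to binary masks. A sketch, assuming a local instances annotation file (path illustrative):

from pycocotools.coco import COCO

coco = COCO("annotations/instances_val2017.json")  # illustrative path

img_id = coco.getImgIds()[0]
anns = coco.loadAnns(coco.getAnnIds(imgIds=img_id))

for ann in anns:
    mask = coco.annToMask(ann)      # polygons or RLE -> binary H x W mask
    x, y, w, h = ann["bbox"]        # top-left corner plus width/height
    name = coco.loadCats(ann["category_id"])[0]["name"]
    print(name, "bbox:", (x, y, w, h), "mask pixels:", int(mask.sum()))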

 

2. Keypoint Detection

annotation: {
  :
  : object detection annotations
  :
  "keypoints": [x1, y1, v1, ...],
  "num_keypoints": int,
  : ,
}

categories: [
  {
    :
    : object detection annotations
    :
    "keypoints": [str],
    "skeleton": [edge]
  }
]
  • annotation
    • keypoints: a length-3k array, where k is the total number of keypoints defined for the category
      • each keypoint has a 0-indexed location x, y and a visibility flag v
      • v=0: not labeled (in which case x=y=0)
      • v=1: labeled but not visible
      • v=2: labeled and visible
    • num_keypoints: the number of labeled keypoints (v > 0) for a given object
      (many objects, e.g. crowds and small objects, will have num_keypoints=0).
  • categories
    • keypoints: a length k array of keypoint names
    • skeleton: defines connectivity via a list of keypoint edge pairs
    • edge: [ index1, index2 ]
"categories": [
  {
    "supercategory": "person",
    "id": 1,
    "name": "person",
    "keypoints": [
      "nose",
      "left_eye",       "right_eye",
      "left_ear",       "right_ear",
      "left_shoulder",  "right_shoulder",
      "left_elbow",     "right_elbow",
      "left_wrist",     "right_wrist",
      "left_hip",       "right_hip",
      "left_knee",      "right_knee",
      "left_ankle",     "right_ankle"
    ],
    "skeleton": [
      [16, 14], [14, 12], [17, 15], [15, 13],
      [12, 13], [6, 12], [7, 13], [6, 7], [6, 8],
      [7, 9], [8, 10], [9, 11], [2, 3], [1, 2],
      [1, 3], [2, 4], [3, 5], [4, 6], [5, 7]
    ]
  }
]
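
Note that the skeleton edges above index into the keypoints array 1-based. A sketch of how one might draw an annotated person with matplotlib (draw_person is an illustrative helper, not part of the COCO API):

import matplotlib.pyplot as plt

def draw_person(keypoints, skeleton):
    # keypoints: flat [x1, y1, v1, x2, y2, v2, ...]; skeleton: 1-indexed pairs
    xs, ys, vs = keypoints[0::3], keypoints[1::3], keypoints[2::3]
    for i, j in skeleton:
        a, b = i - 1, j - 1                      # edges are 1-indexed
        if vs[a] > 0 and vs[b] > 0:              # draw only labeled endpoints
            plt.plot([xs[a], xs[b]], [ys[a], ys[b]], "g-")
    pts = [(x, y) for x, y, v in zip(xs, ys, vs) if v > 0]
    if pts:
        plt.scatter(*zip(*pts), c="r", s=10)
    plt.gca().invert_yaxis()                     # image y axis points down
    plt.show()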

 

3. Stuff Segmentation

The stuff annotation format is identical and fully compatible with the object detection format above

(except iscrowd is unnecessary and set to 0 by default).

...

 

4. Panoptic Segmentation

For the panoptic task, each annotation struct is a per-image annotation rather than a per-object annotation.

Each per-image annotation has two parts:

(1) a PNG that stores the class-agnostic image segmentation and

(2) a JSON struct that stores the semantic information for each image segmentation.

...

  1. annotation.image_id == image.id
  2. For each annotation, per-pixel segment ids are stored as a single PNG at annotation.file_name.
  3. For each annotation, per-segment info is stored in annotation.segments_info.
    • segment_info.id stores the unique id of the segment and is used to retrieve the corresponding mask from the PNG (see the decoding sketch below).
    • category_id gives the semantic category
    • iscrowd indicates the segment encompasses a group of objects (relevant for thing categories only).
    • The bbox and area fields provide additional info about the segment.
  4. The COCO panoptic task has the same thing categories as the detection task,
    whereas the stuff categories differ from those in the stuff task.
    Finally, each category struct has two additional fields:
    • isthing: distinguishes stuff and thing categories
    • color: is useful for consistent visualization
annotation: {
  "image_id": int,
  "file_name": str,
  "segments_info": [segment_info],
}

segment_info: {
  "id": int,
  "category_id": int,
  "area": int,
  "bbox": [x, y, width, height],
  "iscrowd": 0 or 1,
}

categories: [
  {
    "id": int,
    "name": str,
    "supercategory": str,
    "isthing": 0 or 1,
    "color": [R,G,B],
  }
]
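
The per-pixel ids in the panoptic PNG are packed into the RGB channels as id = R + 256·G + 256²·B (the encoding used by the official panopticapi). A decoding sketch, with illustrative path and segment id:

import numpy as np
from PIL import Image

def load_panoptic_ids(png_path):
    # Decode per-pixel segment ids: id = R + 256*G + 256^2*B
    rgb = np.array(Image.open(png_path), dtype=np.uint32)
    return rgb[..., 0] + 256 * rgb[..., 1] + 256 ** 2 * rgb[..., 2]

ids = load_panoptic_ids("panoptic_val2017/000000000139.png")  # illustrative
mask = ids == 3226956  # illustrative segment_info["id"]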

 

5. Image Captioning

These annotations are used to store image captions.

Each caption describes the specified image and each image has at least 5 captions (some images have more).

annotation: {
  "id": int,
  "image_id": int,
  "caption": str,
}

 

6. DensePose

Each annotation contains a series of fields, including category id, bounding box, body part masks and parametrization data for selected points.

DensePose annotations are stored in dp_* fields:

 

Annotated masks:

  • dp_masks: RLE encoded dense masks.
    • All part masks are of size 256x256.
    • They correspond to 14 semantically meaningful parts of the body:
      Torso, Left/Right Hand, Left/Right Foot, Upper Leg Left/Right, Lower Leg Left/Right, Upper Arm Left/Right, Lower Arm Left/Right, Head;

Annotated points:

  • dp_x, dp_y: spatial coordinates of collected points on the image.
    • The coordinates are scaled such that the bounding box size is 256x256.
  • dp_I: The patch index that indicates which of the 24 surface patches the point is on.
    • Patches correspond to the body parts described above.
      Some body parts are split into 2 patches:
      1, 2: Torso
      3: Right Hand
      4: Left Hand
      5: Left Foot
      6: Right Foot
      7, 9: Upper Leg Right
      8, 10: Upper Leg Left
      11, 13: Lower Leg Right
      12, 14: Lower Leg Left
      15, 17: Upper Arm Left
      16, 18: Upper Arm Right
      19, 21: Lower Arm Left
      20, 22: Lower Arm Right
      23, 24: Head
  • dp_U, dp_V: Coordinates in the UV space.
    Each surface patch has a separate 2D parameterization.
annotation: {
  "id": int,
  "image_id": int,
  "category_id": int,
  "is_crowd": 0 or 1,
  "area": int,
  "bbox": [x, y, width, height],
  "dp_I": [float],
  "dp_U": [float],
  "dp_V": [float],
  "dp_x": [float],
  "dp_y": [float],
  "dp_masks": [RLE],
}
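
Assuming the dp_masks entries are standard COCO RLE dicts (this matches the DensePose reference code, but is an assumption here), pycocotools' mask utilities can decode them. An illustrative sketch that merges the 14 part masks into one 256x256 index map:

import numpy as np
from pycocotools import mask as mask_utils

def densepose_part_map(ann):
    # Assumption: ann["dp_masks"] holds 14 RLE dicts (empty for absent parts).
    parts = np.zeros((256, 256), dtype=np.uint8)
    for i, rle in enumerate(ann["dp_masks"]):
        if rle:  # skip empty slots
            parts[mask_utils.decode(rle) > 0] = i + 1  # 1..14 part labels
    return parts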

http://host.robots.ox.ac.uk/pascal/VOC/

 

The PASCAL Visual Object Classes Homepage

PASCAL (Pattern Analysis, Statistical Modeling and Computational Learning)

VOC (Visual Object Classes)

 

Pascal VOC Challenges 2005-2012

 

Classification/Detection Competitions

  1. Classification
    For each of the twenty classes, predicting presence/absence of an example of that class in the test image.
  2. Detection
    Predicting the bounding box and label of each object from the twenty target classes in the test image.

20 classes

Segmentation Competition

  • Segmentation
    Generating pixel-wise segmentations giving the class of the object visible at each pixel, or "background" otherwise.

Action Classification Competition

  • Action Classification
    Predicting the action(s) being performed by a person in a still image.

10 action classes + "other"

 

ImageNet Large Scale Visual Recognition Competition

To estimate the content of photographs for the purpose of retrieval and automatic annotation using a subset of the large hand-labeled ImageNet dataset (10,000,000 labeled images depicting 10,000+ object categories) as training.

 

 

Person Layout Taster Competition

  • Person Layout
    Predicting the bounding box and label of each part of a person (head, hands, feet).

 

Data

Folder hierarchy

VOC20XX
 ├─ Annotations
 ├─ ImageSets
 ├─ JPEGImages
 ├─ SegmentationClass
 └─ SegmentationObject
  • Annotations: XML files with the same names as the source images in the JPEGImages folder; the ground-truth data
  • ImageSets: image lists grouped by purpose (test, train, trainval, val), plus information such as which images contain a given class
  • JPEGImages: the image files (*.jpg); the input data
  • SegmentationClass: label images for training semantic segmentation
  • SegmentationObject: label images for training instance segmentation

XML file structure

<annotation>
  <folder>VOC2012</folder>
  <filename>2007_000027.jpg</filename>
  <source>
    <database>The VOC2007 Database</database>
    <annotation>PASCAL VOC2007</annotation>
    <image>flickr</image>
  </source>
  <size>
    <width>486</width>
    <height>500</height>
    <depth>3</depth>
  </size>
  <segmented>0</segmented>
  <object>
    <name>person</name>
    <pose>Unspecified</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
      <xmin>174</xmin>
      <ymin>101</ymin>
      <xmax>349</xmax>
      <ymax>351</ymax>
    </bndbox>
    <part>
      <name>head</name>
      <bndbox>
        <xmin>169</xmin>
        <ymin>104</ymin>
        <xmax>209</xmax>
        <ymax>146</ymax>
      </bndbox>
    </part>
    <part>
      <name>hand</name>
      <bndbox>
        <xmin>278</xmin>
        <ymin>210</ymin>
        <xmax>297</xmax>
        <ymax>233</ymax>
      </bndbox>
    </part>
    <part>
      <name>foot</name>
      <bndbox>
        <xmin>273</xmin>
        <ymin>333</ymin>
        <xmax>297</xmax>
        <ymax>354</ymax>
      </bndbox>
    </part>
    <part>
      <name>foot</name>
      <bndbox>
        <xmin>319</xmin>
        <ymin>307</ymin>
        <xmax>340</xmax>
        <ymax>326</ymax>
      </bndbox>
    </part>
  </object>
</annotation>
  • size: the width, height, and depth (channels) of the image corresponding to the XML file
    • width
    • height
    • depth
  • segmented: whether segmentation annotations exist for the image (0 = no, 1 = yes)
  • object
    • name: class name
    • pose: pose/viewpoint of the object (used only for the person class)
    • truncated: 0 = object fully contained, 1 = object partially contained (truncated)
    • difficult: 0 = easy to recognize, 1 = difficult to recognize
    • bndbox
      • xmin: top-left x coordinate
      • ymin: top-left y coordinate
      • xmax: bottom-right x coordinate
      • ymax: bottom-right y coordinate
    • part: used only for the person class (a parsing sketch follows this list)
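
A minimal parsing sketch using only the standard library (parse_voc is an illustrative helper; the file path comes from the example above):

import xml.etree.ElementTree as ET

def parse_voc(xml_path):
    # Returns one dict per annotated object in a VOC XML file.
    root = ET.parse(xml_path).getroot()
    objects = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        objects.append({
            "name": obj.findtext("name"),
            "difficult": int(obj.findtext("difficult", "0")),
            "bbox": [int(float(box.findtext(k)))
                     for k in ("xmin", "ymin", "xmax", "ymax")],
        })
    return objects

print(parse_voc("VOC2012/Annotations/2007_000027.xml"))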

Performance Metrics for Classification Models: Accuracy, Recall, Precision, F1-score

Confusion Matrix

Prediction \ Ground-Truth | True                                           | False
--------------------------|------------------------------------------------|--------------------------------------------
Positive                  | True Positive (Correct Hit)                    | False Positive (False Alarm, Type I Error)
Negative                  | False Negative (Missing Result, Type II Error) | True Negative (Correct Rejection)

Type I Error: False Alarm

e.g. predicting that a used car is good and buying it, when it is actually a bad car

 

Type II Error: Missing Result

e.g. classifying a cancer patient as healthy

 

1. Accuracy

The proportion of all predictions that are correct (both True and False classified correctly).

 

Accuracy = (TP + TN) / (TP + TN + FP + FN)
         = (TP + TN) / (All Data)

 

Accuracy Paradox:

When predicting rare outcomes (the real data is overwhelmingly Negative),

Accuracy comes out high even when TP is low and only TN is high => evaluate with Recall instead.

e.g. heavy-snowfall prediction: even if the model only ever produces TN, Accuracy is still high.

 

2. Recall

How well does the model find the objects?

The proportion of actual positives (GT = True) that the classifier predicts as TP

(used when True occurrences are rare):

of all the cases that should be classified as positive, the fraction classified correctly.

 

Recall = TP / (TP + FN) = TP / (All Ground-Truths)

Drawback: the opposite of Accuracy; Recall comes out high as long as the TPs are caught, even when TN is low.

 

3. Precision

Are the objects it finds actually correct?

The proportion of positive predictions (TP + FP) that are actually TP:

of all the cases the classifier predicts as positive, the fraction that are really correct.

 

Precision = TP / (TP + FP) = TP / (All Detections)

Drawback: the mirror image of Recall's; Precision stays high even if many positives are missed, since FNs are ignored.

 

Reference: https://darkpgmr.tistory.com/162 (precision, recall의 이해)

           | Accuracy                                     | Precision
Formula    | (TP + TN) / (TP + TN + FP + FN)              | TP / (TP + FP)
Meaning    | how close the results are to the true value  | how consistent the outputs are
Measures   | the system's bias                            | repeatability

e.g. weighing a person whose true weight is 50 kg: if the scale reads values near 60 kg (60, 60.12, 59.99, ...),

its accuracy is very low, but its precision is high.

 

4. F1-score (harmonic mean)

 

F1-score = 2 x (Precision x Recall) / (Precision + Recall)
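
The four metrics above can be computed directly from the confusion-matrix counts; the toy numbers below also reproduce the accuracy paradox (an illustrative sketch):

def classification_metrics(tp, fp, fn, tn):
    accuracy  = (tp + tn) / (tp + tn + fp + fn)
    recall    = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, recall, precision, f1

# Accuracy paradox: rare positives, a model that almost always says Negative
print(classification_metrics(tp=1, fp=0, fn=9, tn=990))
# -> accuracy 0.991, but recall only 0.1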

 

5. IoU (Intersection over Union)

 

IoU = TP / (TP + FP + FN)
    = Area(GT ∩ Prediction) / Area(GT ∪ Prediction)

A threshold of 0.5 is commonly used.
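
For bounding boxes in COCO's [x, y, width, height] convention, IoU reduces to a few lines (box_iou is an illustrative helper):

def box_iou(a, b):
    # a, b: boxes as [x, y, width, height] (top-left origin)
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(a[0], b[0]))  # intersection width
    ih = max(0.0, min(ay2, by2) - max(a[1], b[1]))  # intersection height
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union else 0.0

print(box_iou([0, 0, 10, 10], [5, 5, 10, 10]))  # 25 / 175 = 0.1428...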

 

6. AP (Average Precision)

The area approximated under the PR curve (precision vs. recall).

PR curve: evaluates a detector's performance as the confidence threshold is varied.
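
A common way to compute AP (the approach used by the Cartucho/mAP code referenced below) is the area under the PR curve after replacing precision with its monotone envelope. An illustrative sketch, assuming recall/precision arrays ordered by descending detection confidence:

import numpy as np

def average_precision(recall, precision):
    # recall/precision: values ordered by descending detection confidence
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([0.0], precision, [0.0]))
    for i in range(len(p) - 2, -1, -1):   # monotone precision envelope
        p[i] = max(p[i], p[i + 1])
    idx = np.where(r[1:] != r[:-1])[0]    # points where recall increases
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))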

Reference: https://bskyvision.com/465 (물체 검출 알고리즘 성능 평가방법 AP(Average Precision)의 이해)

 

7. mAP (mean Average Precision)

The average of the per-class APs when there are multiple classes (see the sketch below).
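
Given per-class APs (for instance from the average_precision sketch above), mAP is just their mean; the values here are illustrative:

import numpy as np

ap_per_class = {"person": 0.72, "car": 0.65, "dog": 0.58}  # illustrative APs
mAP = float(np.mean(list(ap_per_class.values())))
print(f"mAP = {mAP:.3f}")  # 0.650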

Reference: https://github.com/Cartucho/mAP

Installing YOLO v3 on Windows

Prerequisites:

  • Visual Studio 2019
  • NVIDIA CUDA (cuda-toolkit-archive)
  • NVIDIA cuDNN (cudnn-archive)
  • OpenCV 2.4 or later (4.1 used here)

Darknet project

https://github.com/AlexeyAB/darknet (YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection, Windows and Linux version of Darknet)

Visual Studio setup

Edit the file darknet-master\build\darknet\darknet.vcxproj:

    :
    <WindowsTargetPlatformVersion>10.0</WindowsTargetPlatformVersion>
    :
    <PlatformToolset>v142</PlatformToolset>
    :
  <Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" />
    <ImportGroup Label="ExtensionSettings">
    <Import Project="$(VCTargetsPath)\BuildCustomizations\CUDA 10.1.props" />
  </ImportGroup>
    :
    <AdditionalIncludeDirectories>$(OPENCV_DIR)\include;$(SolutionDir)..\..\include;$(SolutionDir)..\..\3rdparty\stb\include;$(SolutionDir)..\..\3rdparty\pthreads\include;$(CUDA_PATH_V10_1)\include;%(AdditionalIncludeDirectories)</AdditionalIncludeDirectories>
    :
    <AdditionalLibraryDirectories>$(OPENCV_DIR)\x64\vc15\lib;$(CUDA_PATH)\lib\$(PlatformName);$(CUDNN)\lib\x64;$(cudnn)\lib\x64;..\..\3rdparty\pthreads\lib;%(AdditionalLibraryDirectories)</AdditionalLibraryDirectories>
    :
    <PostBuildEvent>
      <Command>xcopy /d /y "$(OpenCV_DIR)\x64\vc15\opencv_ffmpeg???_64.dll" "$(OutDir)"
xcopy /d /y "$(OpenCV_DIR)\x64\vc15\opencv_world???.dll" "$(OutDir)"
xcopy /d /y "$(CUDA_PATH_V10_1)\bin\cublas64_*.dll" "$(OutDir)"
xcopy /d /y "$(CUDA_PATH_V10_1)\bin\cudart64_*.dll" "$(OutDir)"
xcopy /d /y "$(CUDA_PATH_V10_1)\bin\cusolver64_*.dll" "$(OutDir)"
xcopy /d /y "$(CUDA_PATH_V10_1)\bin\curand64_*.dll" "$(OutDir)"
</Command>
    </PostBuildEvent>
    :
  <Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" />
  <ImportGroup Label="ExtensionTargets">
    <Import Project="$(VCTargetsPath)\BuildCustomizations\CUDA 10.1.targets" />
  </ImportGroup>
    :
<?xml version="1.0" encoding="utf-8"?>
<Project DefaultTargets="Build" ToolsVersion="14.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
  <ItemGroup Label="ProjectConfigurations">
    <ProjectConfiguration Include="Debug|x64">
      <Configuration>Debug</Configuration>
      <Platform>x64</Platform>
    </ProjectConfiguration>
    <ProjectConfiguration Include="Release|x64">
      <Configuration>Release</Configuration>
      <Platform>x64</Platform>
    </ProjectConfiguration>
  </ItemGroup>
  <PropertyGroup Label="Globals">
    <ProjectGuid>{4CF5694F-12A5-4012-8D94-9A0915E9FEB5}</ProjectGuid>
    <RootNamespace>darknet</RootNamespace>
    <WindowsTargetPlatformVersion>10.0</WindowsTargetPlatformVersion>
  </PropertyGroup>
  <Import Project="$(VCTargetsPath)\Microsoft.Cpp.Default.props" />
  <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'" Label="Configuration">
    <ConfigurationType>Application</ConfigurationType>
    <UseDebugLibraries>true</UseDebugLibraries>
    <PlatformToolset>v142</PlatformToolset>
    <CharacterSet>MultiByte</CharacterSet>
  </PropertyGroup>
  <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'" Label="Configuration">
    <ConfigurationType>Application</ConfigurationType>
    <UseDebugLibraries>false</UseDebugLibraries>
    <PlatformToolset>v142</PlatformToolset>
    <WholeProgramOptimization>true</WholeProgramOptimization>
    <CharacterSet>MultiByte</CharacterSet>
  </PropertyGroup>
  <Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" />
  <ImportGroup Label="ExtensionSettings">
    <Import Project="$(VCTargetsPath)\BuildCustomizations\CUDA 10.1.props" />
  </ImportGroup>
  <ImportGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'" Label="PropertySheets">
    <Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
  </ImportGroup>
  <ImportGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'" Label="PropertySheets">
    <Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
  </ImportGroup>
  <PropertyGroup Label="UserMacros" />
  <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">
    <OutDir>$(SolutionDir)$(Platform)\</OutDir>
  </PropertyGroup>
  <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'">
    <OutDir>$(SolutionDir)$(Platform)\</OutDir>
  </PropertyGroup>
  <ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">
    <ClCompile>
      <WarningLevel>Level3</WarningLevel>
      <Optimization>Disabled</Optimization>
      <SDLCheck>true</SDLCheck>
      <AdditionalIncludeDirectories>$(OPENCV_DIR)\include;$(SolutionDir)..\..\include;$(SolutionDir)..\..\3rdparty\stb\include;$(SolutionDir)..\..\3rdparty\pthreads\include;$(CUDA_PATH_V10_1)\include;%(AdditionalIncludeDirectories)</AdditionalIncludeDirectories>
      <PreprocessorDefinitions>CUDNN_HALF;CUDNN;_CRTDBG_MAP_ALLOC;_MBCS;_TIMESPEC_DEFINED;_SCL_SECURE_NO_WARNINGS;_CRT_SECURE_NO_WARNINGS;_CRT_RAND_S;GPU;WIN32;DEBUG;_CONSOLE;_LIB;%(PreprocessorDefinitions)</PreprocessorDefinitions>
      <UndefinePreprocessorDefinitions>OPENCV;</UndefinePreprocessorDefinitions>
      <MultiProcessorCompilation>true</MultiProcessorCompilation>
      <ForcedIncludeFiles>stdlib.h;crtdbg.h;%(ForcedIncludeFiles)</ForcedIncludeFiles>
    </ClCompile>
    <Link>
      <GenerateDebugInformation>true</GenerateDebugInformation>
      <AdditionalLibraryDirectories>$(OPENCV_DIR)\x64\vc15\lib;$(CUDA_PATH)\lib\$(PlatformName);$(CUDNN)\lib\x64;$(cudnn)\lib\x64;..\..\3rdparty\pthreads\lib;%(AdditionalLibraryDirectories)</AdditionalLibraryDirectories>
      <OutputFile>$(OutDir)\$(TargetName)$(TargetExt)</OutputFile>
      <AdditionalDependencies>pthreadVC2.lib;cublas.lib;curand.lib;cudart.lib;%(AdditionalDependencies)</AdditionalDependencies>
      <AssemblyDebug>true</AssemblyDebug>
    </Link>
    <CudaCompile>
      <CodeGeneration>compute_30,sm_30;compute_75,sm_75</CodeGeneration>
      <TargetMachinePlatform>64</TargetMachinePlatform>
    </CudaCompile>
    <PostBuildEvent>
      <Command>xcopy /d /y "$(OpenCV_DIR)\x64\vc15\opencv_ffmpeg???_64.dll" "$(OutDir)"
xcopy /d /y "$(OpenCV_DIR)\x64\vc15\opencv_world???.dll" "$(OutDir)"
xcopy /d /y "$(CUDA_PATH_V10_1)\bin\cublas64_*.dll" "$(OutDir)"
xcopy /d /y "$(CUDA_PATH_V10_1)\bin\cudart64_*.dll" "$(OutDir)"
xcopy /d /y "$(CUDA_PATH_V10_1)\bin\cusolver64_*.dll" "$(OutDir)"
xcopy /d /y "$(CUDA_PATH_V10_1)\bin\curand64_*.dll" "$(OutDir)"
</Command>
    </PostBuildEvent>
  </ItemDefinitionGroup>
  <ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'">
    <ClCompile>
      <WarningLevel>Level3</WarningLevel>
      <Optimization>MaxSpeed</Optimization>
      <FunctionLevelLinking>true</FunctionLevelLinking>
      <IntrinsicFunctions>true</IntrinsicFunctions>
      <SDLCheck>true</SDLCheck>
      <AdditionalIncludeDirectories>$(OPENCV_DIR)\include;$(SolutionDir)..\..\include;$(SolutionDir)..\..\3rdparty\stb\include;$(SolutionDir)..\..\3rdparty\pthreads\include;$(CUDA_PATH_V10_1)\include;%(AdditionalIncludeDirectories)</AdditionalIncludeDirectories>
      <PreprocessorDefinitions>OPENCV;CUDNN_HALF;CUDNN;_TIMESPEC_DEFINED;_SCL_SECURE_NO_WARNINGS;_CRT_SECURE_NO_WARNINGS;_CRT_RAND_S;GPU;WIN32;_CONSOLE;_LIB;%(PreprocessorDefinitions)</PreprocessorDefinitions>
      <CLanguageStandard>c11</CLanguageStandard>
      <CppLanguageStandard>c++1y</CppLanguageStandard>
      <PrecompiledHeaderCompileAs>CompileAsCpp</PrecompiledHeaderCompileAs>
      <CompileAs>Default</CompileAs>
      <UndefinePreprocessorDefinitions>NDEBUG</UndefinePreprocessorDefinitions>
    </ClCompile>
    <Link>
      <GenerateDebugInformation>true</GenerateDebugInformation>
      <EnableCOMDATFolding>true</EnableCOMDATFolding>
      <OptimizeReferences>true</OptimizeReferences>
      <AdditionalLibraryDirectories>$(OPENCV_DIR)\x64\vc15\lib;$(CUDA_PATH)\lib\$(PlatformName);$(CUDNN)\lib\x64;$(cudnn)\lib\x64;..\..\3rdparty\pthreads\lib;%(AdditionalLibraryDirectories)</AdditionalLibraryDirectories>
      <AdditionalDependencies>pthreadVC2.lib;cublas.lib;curand.lib;cudart.lib;%(AdditionalDependencies)</AdditionalDependencies>
      <OutputFile>$(OutDir)\$(TargetName)$(TargetExt)</OutputFile>
    </Link>
    <CudaCompile>
      <TargetMachinePlatform>64</TargetMachinePlatform>
      <CodeGeneration>compute_30,sm_30;compute_75,sm_75</CodeGeneration>
    </CudaCompile>
    <PostBuildEvent>
      <Command>xcopy /d /y "$(OpenCV_DIR)\x64\vc15\opencv_ffmpeg???_64.dll" "$(OutDir)"
xcopy /d /y "$(OpenCV_DIR)\x64\vc15\opencv_world???.dll" "$(OutDir)"
xcopy /d /y "$(CUDA_PATH_V10_1)\bin\cublas64_*.dll" "$(OutDir)"
xcopy /d /y "$(CUDA_PATH_V10_1)\bin\cudart64_*.dll" "$(OutDir)"
xcopy /d /y "$(CUDA_PATH_V10_1)\bin\cusolver64_*.dll" "$(OutDir)"
xcopy /d /y "$(CUDA_PATH_V10_1)\bin\curand64_*.dll" "$(OutDir)"
</Command>
    </PostBuildEvent>
  </ItemDefinitionGroup>
  <ItemGroup>
    <CudaCompile Include="..\..\src\activation_kernels.cu" />
    <CudaCompile Include="..\..\src\avgpool_layer_kernels.cu" />
    <CudaCompile Include="..\..\src\blas_kernels.cu" />
    <CudaCompile Include="..\..\src\col2im_kernels.cu" />
    <ClInclude Include="..\..\src\upsample_layer.h" />
    <ClInclude Include="..\..\src\utils.h" />
    <ClInclude Include="..\..\src\yolo_layer.h" />
  </ItemGroup>
  <Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" />
  <ImportGroup Label="ExtensionTargets">
    <Import Project="$(VCTargetsPath)\BuildCustomizations\CUDA 10.1.targets" />
  </ImportGroup>
</Project>

OpenCV, cuDNN, and GPU architecture settings

$(OPENCV_DIR)\include
$(SolutionDir)..\..\include
$(SolutionDir)..\..\3rdparty\stb\include
$(SolutionDir)..\..\3rdparty\pthreads\include
$(CUDA_PATH_V10_1)\include
$(OPENCV_DIR)\x64\vc15\lib
$(CUDA_PATH)\lib\$(PlatformName)
$(CUDNN)\lib\x64;$(cudnn)\lib\x64
$(SolutionDir)..\..\3rdparty\pthreads\lib;

Copy the required DLLs

xcopy /d /y "$(OpenCV_DIR)\x64\vc15\opencv_ffmpeg???_64.dll" "$(OutDir)"
xcopy /d /y "$(OpenCV_DIR)\x64\vc15\opencv_world???.dll" "$(OutDir)"
xcopy /d /y "$(CUDA_PATH_V10_1)\bin\cublas64_*.dll" "$(OutDir)"
xcopy /d /y "$(CUDA_PATH_V10_1)\bin\cudart64_*.dll" "$(OutDir)"
xcopy /d /y "$(CUDA_PATH_V10_1)\bin\cusolver64_*.dll" "$(OutDir)"
xcopy /d /y "$(CUDA_PATH_V10_1)\bin\curand64_*.dll" "$(OutDir)"

Run a test

Download the pre-trained weights (e.g. yolov3.weights from https://pjreddie.com/media/files/yolov3.weights).

...\build\darknet\x64>darknet detector test ./cfg/coco.data ./cfg/yolov3.cfg ./yolov3.weights ./data/dog.jpg
 CUDA-version: 10010 (10020), cuDNN: 7.6.1, CUDNN_HALF=1, GPU count: 1
 OpenCV version: 4.1.0
 compute_capability = 610, cudnn_half = 0
net.optimized_memory = 0
batch = 1, time_steps = 1, train = 0
   layer   filters  size/strd(dil)      input                output
   0 conv     32       3 x 3/ 1    416 x 416 x   3 ->  416 x 416 x  32 0.299 BF
   1 conv     64       3 x 3/ 2    416 x 416 x  32 ->  208 x 208 x  64 1.595 BF
   2 conv     32       1 x 1/ 1    208 x 208 x  64 ->  208 x 208 x  32 0.177 BF
   3 conv     64       3 x 3/ 1    208 x 208 x  32 ->  208 x 208 x  64 1.595 BF
   4 Shortcut Layer: 1,  wt = 0, wn = 0, outputs: 208 x 208 x  64 0.003 BF
   5 conv    128       3 x 3/ 2    208 x 208 x  64 ->  104 x 104 x 128 1.595 BF
   6 conv     64       1 x 1/ 1    104 x 104 x 128 ->  104 x 104 x  64 0.177 BF
   7 conv    128       3 x 3/ 1    104 x 104 x  64 ->  104 x 104 x 128 1.595 BF
   8 Shortcut Layer: 5,  wt = 0, wn = 0, outputs: 104 x 104 x 128 0.001 BF
   9 conv     64       1 x 1/ 1    104 x 104 x 128 ->  104 x 104 x  64 0.177 BF
  10 conv    128       3 x 3/ 1    104 x 104 x  64 ->  104 x 104 x 128 1.595 BF
  11 Shortcut Layer: 8,  wt = 0, wn = 0, outputs: 104 x 104 x 128 0.001 BF
  12 conv    256       3 x 3/ 2    104 x 104 x 128 ->   52 x  52 x 256 1.595 BF
  13 conv    128       1 x 1/ 1     52 x  52 x 256 ->   52 x  52 x 128 0.177 BF
  14 conv    256       3 x 3/ 1     52 x  52 x 128 ->   52 x  52 x 256 1.595 BF
  15 Shortcut Layer: 12,  wt = 0, wn = 0, outputs:  52 x  52 x 256 0.001 BF
  16 conv    128       1 x 1/ 1     52 x  52 x 256 ->   52 x  52 x 128 0.177 BF
  17 conv    256       3 x 3/ 1     52 x  52 x 128 ->   52 x  52 x 256 1.595 BF
  18 Shortcut Layer: 15,  wt = 0, wn = 0, outputs:  52 x  52 x 256 0.001 BF
  19 conv    128       1 x 1/ 1     52 x  52 x 256 ->   52 x  52 x 128 0.177 BF
  20 conv    256       3 x 3/ 1     52 x  52 x 128 ->   52 x  52 x 256 1.595 BF
  21 Shortcut Layer: 18,  wt = 0, wn = 0, outputs:  52 x  52 x 256 0.001 BF
  22 conv    128       1 x 1/ 1     52 x  52 x 256 ->   52 x  52 x 128 0.177 BF
  23 conv    256       3 x 3/ 1     52 x  52 x 128 ->   52 x  52 x 256 1.595 BF
  24 Shortcut Layer: 21,  wt = 0, wn = 0, outputs:  52 x  52 x 256 0.001 BF
  25 conv    128       1 x 1/ 1     52 x  52 x 256 ->   52 x  52 x 128 0.177 BF
  26 conv    256       3 x 3/ 1     52 x  52 x 128 ->   52 x  52 x 256 1.595 BF
  27 Shortcut Layer: 24,  wt = 0, wn = 0, outputs:  52 x  52 x 256 0.001 BF
  28 conv    128       1 x 1/ 1     52 x  52 x 256 ->   52 x  52 x 128 0.177 BF
  29 conv    256       3 x 3/ 1     52 x  52 x 128 ->   52 x  52 x 256 1.595 BF
  30 Shortcut Layer: 27,  wt = 0, wn = 0, outputs:  52 x  52 x 256 0.001 BF
  31 conv    128       1 x 1/ 1     52 x  52 x 256 ->   52 x  52 x 128 0.177 BF
  32 conv    256       3 x 3/ 1     52 x  52 x 128 ->   52 x  52 x 256 1.595 BF
  33 Shortcut Layer: 30,  wt = 0, wn = 0, outputs:  52 x  52 x 256 0.001 BF
  34 conv    128       1 x 1/ 1     52 x  52 x 256 ->   52 x  52 x 128 0.177 BF
  35 conv    256       3 x 3/ 1     52 x  52 x 128 ->   52 x  52 x 256 1.595 BF
  36 Shortcut Layer: 33,  wt = 0, wn = 0, outputs:  52 x  52 x 256 0.001 BF
  37 conv    512       3 x 3/ 2     52 x  52 x 256 ->   26 x  26 x 512 1.595 BF
  38 conv    256       1 x 1/ 1     26 x  26 x 512 ->   26 x  26 x 256 0.177 BF
  39 conv    512       3 x 3/ 1     26 x  26 x 256 ->   26 x  26 x 512 1.595 BF
  40 Shortcut Layer: 37,  wt = 0, wn = 0, outputs:  26 x  26 x 512 0.000 BF
  41 conv    256       1 x 1/ 1     26 x  26 x 512 ->   26 x  26 x 256 0.177 BF
  42 conv    512       3 x 3/ 1     26 x  26 x 256 ->   26 x  26 x 512 1.595 BF
  43 Shortcut Layer: 40,  wt = 0, wn = 0, outputs:  26 x  26 x 512 0.000 BF
  44 conv    256       1 x 1/ 1     26 x  26 x 512 ->   26 x  26 x 256 0.177 BF
  45 conv    512       3 x 3/ 1     26 x  26 x 256 ->   26 x  26 x 512 1.595 BF
  46 Shortcut Layer: 43,  wt = 0, wn = 0, outputs:  26 x  26 x 512 0.000 BF
  47 conv    256       1 x 1/ 1     26 x  26 x 512 ->   26 x  26 x 256 0.177 BF
  48 conv    512       3 x 3/ 1     26 x  26 x 256 ->   26 x  26 x 512 1.595 BF
  49 Shortcut Layer: 46,  wt = 0, wn = 0, outputs:  26 x  26 x 512 0.000 BF
  50 conv    256       1 x 1/ 1     26 x  26 x 512 ->   26 x  26 x 256 0.177 BF
  51 conv    512       3 x 3/ 1     26 x  26 x 256 ->   26 x  26 x 512 1.595 BF
  52 Shortcut Layer: 49,  wt = 0, wn = 0, outputs:  26 x  26 x 512 0.000 BF
  53 conv    256       1 x 1/ 1     26 x  26 x 512 ->   26 x  26 x 256 0.177 BF
  54 conv    512       3 x 3/ 1     26 x  26 x 256 ->   26 x  26 x 512 1.595 BF
  55 Shortcut Layer: 52,  wt = 0, wn = 0, outputs:  26 x  26 x 512 0.000 BF
  56 conv    256       1 x 1/ 1     26 x  26 x 512 ->   26 x  26 x 256 0.177 BF
  57 conv    512       3 x 3/ 1     26 x  26 x 256 ->   26 x  26 x 512 1.595 BF
  58 Shortcut Layer: 55,  wt = 0, wn = 0, outputs:  26 x  26 x 512 0.000 BF
  59 conv    256       1 x 1/ 1     26 x  26 x 512 ->   26 x  26 x 256 0.177 BF
  60 conv    512       3 x 3/ 1     26 x  26 x 256 ->   26 x  26 x 512 1.595 BF
  61 Shortcut Layer: 58,  wt = 0, wn = 0, outputs:  26 x  26 x 512 0.000 BF
  62 conv   1024       3 x 3/ 2     26 x  26 x 512 ->   13 x  13 x1024 1.595 BF
  63 conv    512       1 x 1/ 1     13 x  13 x1024 ->   13 x  13 x 512 0.177 BF
  64 conv   1024       3 x 3/ 1     13 x  13 x 512 ->   13 x  13 x1024 1.595 BF
  65 Shortcut Layer: 62,  wt = 0, wn = 0, outputs:  13 x  13 x1024 0.000 BF
  66 conv    512       1 x 1/ 1     13 x  13 x1024 ->   13 x  13 x 512 0.177 BF
  67 conv   1024       3 x 3/ 1     13 x  13 x 512 ->   13 x  13 x1024 1.595 BF
  68 Shortcut Layer: 65,  wt = 0, wn = 0, outputs:  13 x  13 x1024 0.000 BF
  69 conv    512       1 x 1/ 1     13 x  13 x1024 ->   13 x  13 x 512 0.177 BF
  70 conv   1024       3 x 3/ 1     13 x  13 x 512 ->   13 x  13 x1024 1.595 BF
  71 Shortcut Layer: 68,  wt = 0, wn = 0, outputs:  13 x  13 x1024 0.000 BF
  72 conv    512       1 x 1/ 1     13 x  13 x1024 ->   13 x  13 x 512 0.177 BF
  73 conv   1024       3 x 3/ 1     13 x  13 x 512 ->   13 x  13 x1024 1.595 BF
  74 Shortcut Layer: 71,  wt = 0, wn = 0, outputs:  13 x  13 x1024 0.000 BF
  75 conv    512       1 x 1/ 1     13 x  13 x1024 ->   13 x  13 x 512 0.177 BF
  76 conv   1024       3 x 3/ 1     13 x  13 x 512 ->   13 x  13 x1024 1.595 BF
  77 conv    512       1 x 1/ 1     13 x  13 x1024 ->   13 x  13 x 512 0.177 BF
  78 conv   1024       3 x 3/ 1     13 x  13 x 512 ->   13 x  13 x1024 1.595 BF
  79 conv    512       1 x 1/ 1     13 x  13 x1024 ->   13 x  13 x 512 0.177 BF
  80 conv   1024       3 x 3/ 1     13 x  13 x 512 ->   13 x  13 x1024 1.595 BF
  81 conv    255       1 x 1/ 1     13 x  13 x1024 ->   13 x  13 x 255 0.088 BF
  82 yolo
[yolo] params: iou loss: mse (2), iou_norm: 0.75, cls_norm: 1.00, scale_x_y: 1.00
  83 route  79                                     ->   13 x  13 x 512
  84 conv    256       1 x 1/ 1     13 x  13 x 512 ->   13 x  13 x 256 0.044 BF
  85 upsample                 2x    13 x  13 x 256 ->   26 x  26 x 256
  86 route  85 61                                  ->   26 x  26 x 768
  87 conv    256       1 x 1/ 1     26 x  26 x 768 ->   26 x  26 x 256 0.266 BF
  88 conv    512       3 x 3/ 1     26 x  26 x 256 ->   26 x  26 x 512 1.595 BF
  89 conv    256       1 x 1/ 1     26 x  26 x 512 ->   26 x  26 x 256 0.177 BF
  90 conv    512       3 x 3/ 1     26 x  26 x 256 ->   26 x  26 x 512 1.595 BF
  91 conv    256       1 x 1/ 1     26 x  26 x 512 ->   26 x  26 x 256 0.177 BF
  92 conv    512       3 x 3/ 1     26 x  26 x 256 ->   26 x  26 x 512 1.595 BF
  93 conv    255       1 x 1/ 1     26 x  26 x 512 ->   26 x  26 x 255 0.177 BF
  94 yolo
[yolo] params: iou loss: mse (2), iou_norm: 0.75, cls_norm: 1.00, scale_x_y: 1.00
  95 route  91                                     ->   26 x  26 x 256
  96 conv    128       1 x 1/ 1     26 x  26 x 256 ->   26 x  26 x 128 0.044 BF
  97 upsample                 2x    26 x  26 x 128 ->   52 x  52 x 128
  98 route  97 36                                  ->   52 x  52 x 384
  99 conv    128       1 x 1/ 1     52 x  52 x 384 ->   52 x  52 x 128 0.266 BF
 100 conv    256       3 x 3/ 1     52 x  52 x 128 ->   52 x  52 x 256 1.595 BF
 101 conv    128       1 x 1/ 1     52 x  52 x 256 ->   52 x  52 x 128 0.177 BF
 102 conv    256       3 x 3/ 1     52 x  52 x 128 ->   52 x  52 x 256 1.595 BF
 103 conv    128       1 x 1/ 1     52 x  52 x 256 ->   52 x  52 x 128 0.177 BF
 104 conv    256       3 x 3/ 1     52 x  52 x 128 ->   52 x  52 x 256 1.595 BF
 105 conv    255       1 x 1/ 1     52 x  52 x 256 ->   52 x  52 x 255 0.353 BF
 106 yolo
[yolo] params: iou loss: mse (2), iou_norm: 0.75, cls_norm: 1.00, scale_x_y: 1.00
Total BFLOPS 65.879
avg_outputs = 532444
 Allocate additional workspace_size = 52.43 MB
Loading weights from ./yolov3.weights...
 seen 64, trained: 32013 K-images (500 Kilo-batches_64)
Done! Loaded 107 layers from weights-file
./data/dog.jpg: Predicted in 57.593000 milli-seconds.
bicycle: 99%
dog: 100%
truck: 94%

 

Using Docker

$ docker pull votiethuy/tensorflow1.12-base
$ docker run -it --rm -v /c/TensorFlow/DW2TF:/dw2tf votiethuy/tensorflow1.12-base
# python main.py --cfg 'data/yolov3.cfg' --weights 'data/yolov3.weights' --output 'data/'

 
