Reading and Writing JSON Files with Python – 个人点滴积累

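Today I ran into a problem while using the COCO dataset. Its annotations are stored in JSON files, but I had only downloaded part of the images. When the program looks up images based on the annotation file, Python raises a "No such file" error for every image that is missing.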

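This evening I fixed the error by trimming the annotation file: I deleted the annotations for the missing images and kept only those that belong to images I actually have. The steps are recorded below for future reference.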

1. Reading the JSON file

import json

# load the annotation file and print its top-level keys
with open('train_in.json', 'r', encoding='utf-8') as fp:
    jd = json.load(fp)
for name in jd:
    print(name)

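The output shows that the JSON is a dictionary with 5 keys, named:

info
images
licenses
annotations
categories

Here categories stores the id and name of the 80 categories, info stores a description of the dataset, and licenses stores the licenses under which the images are published. images and annotations store the image file information and the labels.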
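To see at a glance what sits under each key, something like the following can help. It is only a minimal sketch: `sample_text` is a made-up, tiny stand-in for a real COCO file, and `summarize` is a helper name I chose for illustration.

```python
import json

# Tiny stand-in for a COCO annotation file; all values are made up.
sample_text = json.dumps({
    "info": {"description": "toy dataset"},
    "images": [{"id": 1, "file_name": "a.jpg"}],
    "licenses": [{"id": 1, "name": "toy license"}],
    "annotations": [{"id": 10, "image_id": 1, "category_id": 3, "bbox": [0, 0, 5, 5]}],
    "categories": [{"id": 3, "name": "car"}],
})


def summarize(jd):
    """Return {key: (type name, number of entries)} for each top-level key."""
    return {key: (type(value).__name__, len(value)) for key, value in jd.items()}


jd_toy = json.loads(sample_text)
summary = summarize(jd_toy)
for key, (kind, size) in summary.items():
    print(f"{key}: {kind} with {size} entries")
```

On the real file, calling `summarize(jd)` on the dictionary loaded in step 1 should report the same five keys.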

2. The images and annotations entries

print(len(jd['images']))
print(len(jd['annotations']))

The former is a list of length 82,783 and the latter is a list of length 604,907.

print(jd['images'][-1])
print(jd['annotations'][-1])

Attributes of an image

Each entry in images has the following 8 attributes:

license
file_name
coco_url
height
width
date_captured
flickr_url
id

The two most important attributes are file_name and id: the former stores the file name of the image, and the latter is needed when reading the annotations.
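Since each annotation refers to its image through an image_id field, a lookup table keyed by the image id is handy. The following is a minimal sketch on made-up data; `build_index` and the toy entries are my own names for illustration, not part of any COCO tooling.

```python
from collections import defaultdict


def build_index(jd):
    """Map image id -> file_name, and image id -> list of its annotations."""
    id_to_file = {image["id"]: image["file_name"] for image in jd["images"]}
    anns_by_image = defaultdict(list)
    for ann in jd["annotations"]:
        anns_by_image[ann["image_id"]].append(ann)
    return id_to_file, anns_by_image


# Made-up entries that only carry the fields used here.
toy = {
    "images": [
        {"id": 1, "file_name": "a.jpg"},
        {"id": 2, "file_name": "b.jpg"},
    ],
    "annotations": [
        {"id": 10, "image_id": 1},
        {"id": 11, "image_id": 1},
        {"id": 12, "image_id": 2},
    ],
}

id_to_file, anns_by_image = build_index(toy)
for image_id, file_name in id_to_file.items():
    print(file_name, "has", len(anns_by_image[image_id]), "annotations")
```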

Attributes of an annotation

Similarly, each entry in annotations has the following 7 attributes:

segmentation
area
iscrowd
image_id
bbox
category_id
id

The first three attributes are annotations for image segmentation; object detection only needs the last 4.
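As an illustration of the four detection fields, the sketch below pulls them out of one annotation. It relies on the COCO convention that bbox is [x, y, width, height] with (x, y) the top-left corner, and converts it to corner form. The function name `to_detection_record` and the sample values are made up.

```python
def to_detection_record(ann):
    """Keep only the detection fields and convert bbox to [x1, y1, x2, y2]."""
    x, y, w, h = ann["bbox"]  # COCO stores [x, y, width, height]
    return {
        "id": ann["id"],
        "image_id": ann["image_id"],
        "category_id": ann["category_id"],
        "box_xyxy": [x, y, x + w, y + h],
    }


# Made-up annotation with all 7 attributes.
sample_ann = {
    "segmentation": [[10.0, 20.0, 40.0, 20.0, 40.0, 60.0, 10.0, 60.0]],
    "area": 1200.0,
    "iscrowd": 0,
    "image_id": 7,
    "bbox": [10.0, 20.0, 30.0, 40.0],
    "category_id": 3,
    "id": 99,
}

record = to_detection_record(sample_ann)
print(record)
```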

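3. Simplifying the JSON

The directory of downloaded images decides what to keep: if an image has not been downloaded, its entries are removed from both images and annotations.

The code is as follows: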

'''
Simplify a COCO json file:
drop the entries in "images" and "annotations"
whose image has not been downloaded.
'''

import os
import json
import time

def simplify(json_in='train_in.json', json_out='train_out.json', image_root='../train2014'):
    ############## 1.  read ##################
    print(time.strftime("%H:%M:%S", time.localtime()), ' simplify', json_in, ', start reading...')
    with open(json_in, 'r', encoding='utf-8') as fp:
        jd = json.load(fp)
    ############## 1. -read ##################
    ############## 2.  analysis ##############
    print(time.strftime("%H:%M:%S", time.localtime()), ' json file loaded, start analyzing...')
    # dict to be saved: info, licenses and categories are copied unchanged
    rv = {}
    rv['info'] = jd['info']
    rv['licenses'] = jd['licenses']
    rv['categories'] = jd['categories']
    rv['images'] = []
    rv['annotations'] = []
    # ids of the images that actually exist under image_root
    vd = set()
    for image in jd['images']:
        img_name = os.path.join(image_root, image['file_name'])
        if os.path.exists(img_name):
            vd.add(image['id'])
            rv['images'].append(image)
    # keep only the annotations that belong to a valid image
    for label in jd['annotations']:
        if label['image_id'] in vd:
            rv['annotations'].append(label)
    ############## 2. -analysis ##############
    ############## 3.  save ##################
    print(time.strftime("%H:%M:%S", time.localtime()), ' analysis finished, start saving...')
    # save file
    with open(json_out, 'w', encoding='utf-8') as fp:
        json.dump(rv, fp)
    print(time.strftime("%H:%M:%S", time.localtime()), json_out, ' saved, done!')
    ############## 3. -save ##################

def main():
    # simplify both the train and the val json files
    simplify(json_in='train_in.json', json_out='train_out.json', image_root='../train2014')
    simplify(json_in='val_in.json', json_out='val_out.json', image_root='../val2014')

def test():
    # quick run with the default arguments (train set only)
    simplify()

if __name__ == "__main__":
    test()
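After saving, it may be worth checking that no annotation in the output points at an image that was dropped. The following is a minimal sketch of such a check on made-up data; `find_orphans` is a hypothetical helper name, not something from the script above.

```python
def find_orphans(jd):
    """Return the ids of annotations whose image_id has no entry in images."""
    image_ids = {image["id"] for image in jd["images"]}
    return [ann["id"] for ann in jd["annotations"] if ann["image_id"] not in image_ids]


# Made-up data: "good" is consistent, "bad" has one annotation for a missing image.
good = {
    "images": [{"id": 1, "file_name": "a.jpg"}],
    "annotations": [{"id": 10, "image_id": 1}],
}
bad = {
    "images": [{"id": 1, "file_name": "a.jpg"}],
    "annotations": [{"id": 10, "image_id": 1}, {"id": 11, "image_id": 2}],
}

print(find_orphans(good))
print(find_orphans(bad))
```

To run the same check on the real output, load train_out.json with json.load and pass the resulting dictionary to `find_orphans`; an empty list means every remaining annotation still has its image entry.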
