Delete objects in S3 using wildcard matching

Question

I have the following working code to delete a single object from Amazon S3:

params := &s3.DeleteObjectInput{
	Bucket: aws.String("Bucketname"),
	Key:    aws.String("ObjectKey"),
}
s3Conn.DeleteObject(params)

What I want to do, however, is delete all files under a folder using a wildcard (*). I know Amazon S3 doesn't treat "x/y/file.jpg" as a file in a folder y inside x, but what I want to achieve is to specify "x/y" and delete all objects that share that prefix. I tried Amazon's multi-object delete:

params := &s3.DeleteObjectsInput{
	Bucket: aws.String("BucketName"),
	Delete: &s3.Delete{
		Objects: []*s3.ObjectIdentifier{
			{
				Key: aws.String("x/y/.*"),
			},
		},
	},
}
result, err := s3Conn.DeleteObjects(params)

I know that in PHP this can be done easily with s3->delete_all_objects, as per this answer. Is the same thing possible in Go?

Answer 1

Score: 3

Unfortunately the goamz package doesn't have a method similar to the PHP library's delete_all_objects.

However, the source code for the PHP delete_all_objects is available here (toggle source view): http://docs.aws.amazon.com/AWSSDKforPHP/latest/#m=AmazonS3/delete_all_objects

Here are the important lines of code:

public function delete_all_objects($bucket, $pcre = self::PCRE_ALL)
{
    // Collect all matches
    $list = $this->get_object_list($bucket, array('pcre' => $pcre));
 
    // As long as we have at least one match...
    if (count($list) > 0)
    {
        $objects = array();
 
        foreach ($list as $object)
        {
            $objects[] = array('key' => $object);
        }
 
        $batch = new CFBatchRequest();
        $batch->use_credentials($this->credentials);
 
        foreach (array_chunk($objects, 1000) as $object_set)
        {
            $this->batch($batch)->delete_objects($bucket, array(
                'objects' => $object_set
            ));
        }
 
        $responses = $this->batch($batch)->send();
    }
}

As you can see, the PHP code first makes an HTTP request against the bucket to list all keys matching PCRE_ALL, which is defined elsewhere as const PCRE_ALL = '/.*/i';.

A single multi-object delete request can remove at most 1000 keys, so delete_all_objects splits the keys into chunks of 1000 and queues one delete request per chunk in a batch.

You have to build the same functionality in your Go program, as the goamz package doesn't support it yet. Luckily it should only take a few lines of code, and you have the PHP library as a guide.

It might be worth submitting a pull request for the goamz package once you're done!
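
The filter-then-chunk steps of the PHP function could be sketched in Go roughly as follows. This is only an illustration using the standard library: filterKeys, chunkKeys, and maxKeysPerDelete are hypothetical names that do not come from goamz or the AWS SDK, and the S3 list and delete calls themselves are left out.

```go
package main

import (
	"fmt"
	"regexp"
)

// maxKeysPerDelete mirrors the 1000-key chunk size used by the PHP code.
// The name is made up for this sketch.
const maxKeysPerDelete = 1000

// filterKeys returns the keys that match pattern (hypothetical helper).
func filterKeys(keys []string, pattern *regexp.Regexp) []string {
	var matched []string
	for _, k := range keys {
		if pattern.MatchString(k) {
			matched = append(matched, k)
		}
	}
	return matched
}

// chunkKeys splits keys into consecutive batches of at most size elements,
// playing the role of PHP's array_chunk (hypothetical helper).
func chunkKeys(keys []string, size int) [][]string {
	if size <= 0 {
		return nil
	}
	var batches [][]string
	for start := 0; start < len(keys); start += size {
		end := start + size
		if end > len(keys) {
			end = len(keys)
		}
		batches = append(batches, keys[start:end])
	}
	return batches
}

func main() {
	keys := []string{"x/y/a.jpg", "x/y/b.jpg", "x/z/c.jpg"}
	matched := filterKeys(keys, regexp.MustCompile(`^x/y/`))
	for i, batch := range chunkKeys(matched, maxKeysPerDelete) {
		// In a real program each batch would become one multi-object delete request.
		fmt.Printf("batch %d: %d keys\n", i, len(batch))
	}
}
```

Each batch returned by chunkKeys would then be turned into the object list of one delete request by whichever S3 client you use.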

Answer 2

Score: 1

Using the mc tool you can run:

mc rm -r --force https://BucketName.s3.amazonaws.com/x/y

This deletes all objects with the prefix "x/y".

You can achieve the same in Go with minio-go, like this:

package main

import (
	"log"

	"github.com/minio/minio-go"
)

func main() {
	// Note: minio.Config and this form of minio.New come from an early
	// minio-go release; later releases changed the client constructor.
	config := minio.Config{
		AccessKeyID:     "YOUR-ACCESS-KEY-HERE",
		SecretAccessKey: "YOUR-PASSWORD-HERE",
		Endpoint:        "https://s3.amazonaws.com",
	}
	// Find your S3 endpoint here: http://docs.aws.amazon.com/general/latest/gr/rande.html

	s3Client, err := minio.New(config)
	if err != nil {
		log.Fatalln(err)
	}

	// List every object under the prefix, including nested "folders".
	isRecursive := true
	for object := range s3Client.ListObjects("BucketName", "x/y", isRecursive) {
		if object.Err != nil {
			log.Fatalln(object.Err)
		}
		err := s3Client.RemoveObject("BucketName", object.Key)
		if err != nil {
			// Log the failure and move on to the next object.
			log.Println(err)
			continue
		}
		log.Println("Removed : " + object.Key)
	}
}

Answer 3

Score: 1

Since this question was asked, the AWS SDK for Go has gained new methods in its S3 Manager (s3manager) package to handle this task (in response to @Itachi's PR).

See the GitHub record: https://github.com/aws/aws-sdk-go/issues/448#issuecomment-309078450

Here is their v1 example: https://github.com/awsdocs/aws-doc-sdk-examples/blob/main/go/s3/DeleteObjects/DeleteObjects.go#L36

To get "wildcard matching" on paths inside the bucket, add the Prefix parameter to the example's ListObjectsInput, as shown here:

    iter := s3manager.NewDeleteListIterator(svc, &s3.ListObjectsInput{
        Bucket: bucket,
        Prefix: aws.String("somePathString"),
    })

Answer 4

Score: 0

A bit late in the game, but since I was having the same problem, I created a small pkg that you can copy to your code base and import as needed.

// ListKeysInPrefix returns the keys under prefix. A single ListObjectsV2
// call returns at most 1000 keys.
func ListKeysInPrefix(s s3iface.S3API, bucket, prefix string) ([]string, error) {
	res, err := s.ListObjectsV2(&s3.ListObjectsV2Input{
		Bucket: aws.String(bucket),
		Prefix: aws.String(prefix),
	})
	if err != nil {
		return nil, err
	}

	var keys []string
	for _, obj := range res.Contents {
		keys = append(keys, *obj.Key)
	}
	return keys, nil
}

// createDeleteObjectsInput wraps the keys in the structure DeleteObjects expects.
func createDeleteObjectsInput(keys []string) *s3.Delete {
	rm := []*s3.ObjectIdentifier{}
	for _, key := range keys {
		rm = append(rm, &s3.ObjectIdentifier{Key: aws.String(key)})
	}
	return &s3.Delete{Objects: rm, Quiet: aws.Bool(false)}
}

// DeletePrefix deletes the objects listed under prefix in a single request.
func DeletePrefix(s s3iface.S3API, bucket, prefix string) error {
	keys, err := ListKeysInPrefix(s, bucket, prefix)
	if err != nil {
		return err
	}
	if len(keys) == 0 {
		// Nothing matched the prefix, so there is nothing to delete.
		return nil
	}

	_, err = s.DeleteObjects(&s3.DeleteObjectsInput{
		Bucket: aws.String(bucket),
		Delete: createDeleteObjectsInput(keys),
	})
	return err
}

So, if you have a bucket called "somebucket" with the structure s3://somebucket/foo/some-prefixed-folder/bar/test.txt and want to delete everything from some-prefixed-folder onwards, usage would be as follows. Note that an S3 prefix is matched from the start of the key, so it has to include the leading "foo/":

func main() {
    // create your s3 client here
    // client := ....
    err := DeletePrefix(client, "somebucket", "foo/some-prefixed-folder/")
    if err != nil {
        panic(err)
    }
}

This implementation deletes at most 1000 entries under the given prefix, because a single ListObjectsV2 call returns at most 1000 keys. The listing is paginated, though, so it's a matter of adding a loop that keeps fetching pages until the response is no longer truncated.
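
The pagination loop mentioned above might look roughly like the sketch below. To keep it runnable without S3, it uses a hypothetical pageLister interface and a fake implementation; none of these names come from the AWS SDK. With the SDK, the role of listPage would be played by ListObjectsV2 and its continuation token, and deleteBatch by a DeleteObjects call.

```go
package main

import "fmt"

// page is a hypothetical stand-in for one listing response: the keys it
// returned and the token for the next page ("" when there are no more pages).
type page struct {
	keys      []string
	nextToken string
}

// pageLister is a hypothetical interface invented for this sketch.
type pageLister interface {
	listPage(prefix, token string) (page, error)
}

// deletePrefix walks the listing page by page and hands every non-empty
// page to deleteBatch. It returns the number of keys passed to deleteBatch.
func deletePrefix(l pageLister, prefix string, deleteBatch func([]string) error) (int, error) {
	deleted := 0
	token := ""
	for {
		p, err := l.listPage(prefix, token)
		if err != nil {
			return deleted, err
		}
		if len(p.keys) > 0 {
			if err := deleteBatch(p.keys); err != nil {
				return deleted, err
			}
			deleted += len(p.keys)
		}
		if p.nextToken == "" {
			// No further pages: the listing is complete.
			return deleted, nil
		}
		token = p.nextToken
	}
}

// fakeLister serves pre-built pages keyed by token so the loop can run offline.
type fakeLister struct{ pages map[string]page }

func (f fakeLister) listPage(prefix, token string) (page, error) {
	return f.pages[token], nil
}

func main() {
	l := fakeLister{pages: map[string]page{
		"":   {keys: []string{"x/y/1", "x/y/2"}, nextToken: "t1"},
		"t1": {keys: []string{"x/y/3"}},
	}}
	n, err := deletePrefix(l, "x/y/", func(keys []string) error {
		fmt.Println("deleting", keys)
		return nil
	})
	fmt.Println(n, err)
}
```

Deleting one page at a time keeps every delete request within the page size, so no separate chunking step is needed.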

Answer 5

Score: 0

I was able to delete objects in an S3 bucket using a wildcard from the CLI:

aws s3 rm s3://<xyz bucket name>/2023/ --recursive --exclude '*' --include 'A*.csv'
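
For a Go program, a similar selection could be approximated client-side by filtering listed keys before deleting them. The sketch below is an assumption-laden illustration, not the CLI's implementation: selectKeys is a hypothetical helper, and it uses path.Match from the standard library, whose * does not match "/", so it only selects keys directly under the prefix.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// selectKeys keeps the keys that start with prefix and whose remainder
// matches the glob pattern (hypothetical helper for this sketch).
func selectKeys(keys []string, prefix, pattern string) ([]string, error) {
	var selected []string
	for _, key := range keys {
		if !strings.HasPrefix(key, prefix) {
			continue
		}
		rel := strings.TrimPrefix(key, prefix)
		ok, err := path.Match(pattern, rel)
		if err != nil {
			// path.Match only fails when the pattern itself is malformed.
			return nil, err
		}
		if ok {
			selected = append(selected, key)
		}
	}
	return selected, nil
}

func main() {
	keys := []string{"2023/A1.csv", "2023/B1.csv", "2023/A2.txt", "2022/A1.csv"}
	selected, err := selectKeys(keys, "2023/", "A*.csv")
	if err != nil {
		panic(err)
	}
	fmt.Println(selected) // prints [2023/A1.csv]
}
```

The selected keys would then be passed to whichever delete call you use, for example in batches as described in the earlier answers.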

huangapple
  • Published on 2015-11-20 02:07:15
  • When reposting, please keep the link to this article: https://go.coder-hub.com/33811270.html