How to process response body of workspace docs export in HTML (mimetype application/zip) from Batch Request of Google Drive API


Question

I am able to handle the export, in plain text, of 100 Google Workspace docs to my server for each Google Drive batch request. I have followed Kanshi Tanaike’s excellent examples in "Efficient File Management using Batch Requests with Google Apps Script".

However, for Google Docs export in HTML (rather than plain text), I do not know how to process the response body. (BTW, the MIME type is application/zip for HTML export.)



I am hoping someone can provide some basic information about how to process the response body. Or perhaps an example in Google Apps Script or Elixir which I could then follow?

I have tried splitting the response body at the batch marker into the 100 responses (as I do with the plain text example). I am left with what might be one or 100 zip files, but everything I have tried to unzip them has given an error. I presume that I am splitting the response.body incorrectly. I am not an experienced programmer and have no experience in working with zip files. I even tried opening the split response.body with unzip utilities, without success.

Note that I can handle the response.body of the export of a single Workspace doc in HTML, outside a batch request, using:

url = "https://www.googleapis.com/drive/v3/files/#{doc_id}/export?mimeType=application/zip"

This is because the response is very clearly a set of tuples of either {item_name, text_string} (for the HTML part of the document, which can be processed directly) or {item_name, byte_sequence} (for, e.g., images within the doc).

Actually, at the moment I am only interested in the HTML part rather than the images, even in the batch export. (Is there a way to export only the HTML in a batch request?)

Answer 1

Score: 3

I believe your goal is as follows.

  • You want to export multiple Google Documents as application/zip using batch requests with Google Apps Script.

Modification points:

  • Currently, the response from the batch request looks like this:

      --batch_###
      Content-Type: application/http
      Content-ID: response-1
    
      HTTP/1.1 200 OK
      Content-Disposition: attachment
      Content-Type: application/zip
      Date: ###
      Expires: ###
      Cache-Control: private, max-age=0
      Content-Length: 1000
    
      ### data ###
      --batch_###--
    
  • I think that ### data ### is the zip file as binary data. When the response is read with res.getContentText(), it is converted to a string, and this corrupts the binary data. I think this is why everything you have tried to unzip has given an error.
  • To decode the retrieved data correctly, the response has to be processed at the binary level, that is, as a byte array.
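The corruption described above can be reproduced outside Apps Script. The following is only an illustration, assuming Node.js (where TextDecoder and TextEncoder are globals); the exact conversion done by res.getContentText() may differ in detail, but the effect is of the same kind: bytes that are not valid text are replaced and cannot be recovered.

```javascript
// Illustration only: a few bytes standing in for the start of a zip payload,
// "PK\x03\x04" followed by three bytes that are not valid UTF-8 on their own.
const original = Uint8Array.from([0x50, 0x4b, 0x03, 0x04, 0xff, 0xfe, 0x80]);

// Decoding as text replaces each invalid byte with U+FFFD (the replacement character).
const asText = new TextDecoder("utf-8").decode(original);

// Encoding that text again does not give back the original bytes:
// each U+FFFD becomes the three bytes EF BF BD.
const roundTripped = new TextEncoder().encode(asText);

console.log(original.length, roundTripped.length); // 7 13
```

Once the data has passed through a string like this, splitting it at the batch marker yields pieces that are no longer valid zip files, whatever unzip tool is used.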

In this answer, I would like to propose a simple sample script for decoding the response data from the batch request (in this case, Google Document files are exported as application/zip).

Sample script:

Please copy and paste the following script into the script editor of your Google Apps Script project, and set your folder ID and document IDs.

Also, please enable the Drive API at Advanced Google services.

/**
 * Ref: https://tanaikech.github.io/2023/03/08/split-binary-data-with-search-data-using-google-apps-script/
 * Split byteArray by a search data.
 * @param {Array} baseData Input byteArray of base data.
 * @param {Array} searchData Input byteArray of search data using split.
 * @return {Array} An array including byteArray.
 */
function splitByteArrayBySearchData_(baseData, searchData) {
  if (!Array.isArray(baseData) || !Array.isArray(searchData)) {
    throw new Error("Please give byte array.");
  }
  const search = searchData.join("");
  const bLen = searchData.length;
  const res = [];
  let idx = 0;
  do {
    idx = baseData.findIndex((_, i, a) => [...Array(bLen)].map((_, j) => a[j + i]).join("") == search);
    if (idx != -1) {
      res.push(baseData.splice(0, idx));
      baseData.splice(0, bLen);
    } else {
      res.push(baseData.splice(0));
    }
  } while (idx != -1);
  return res;
}

/**
 * Ref: https://cloud.google.com/blog/topics/developers-practitioners/efficient-file-management-using-batch-requests-google-apps-script
 * Create a request body of batch requests and request it.
 * 
 * @param {Object} object Object for creating request body of batch requests.
 * @returns {Object} UrlFetchApp.HTTPResponse
 */
function batchRequests_(object) {
  const { batchPath, requests } = object;
  const boundary = "sampleBoundary12345";
  const lb = "\r\n";
  const payload = requests.reduce((r, e, i, a) => {
    r += `Content-Type: application/http${lb}`;
    r += `Content-ID: ${i + 1}${lb}${lb}`;
    r += `${e.method} ${e.endpoint}${lb}`;
    r += e.requestBody ? `Content-Type: application/json; charset=utf-8${lb}${lb}` : lb;
    r += e.requestBody ? `${JSON.stringify(e.requestBody)}${lb}` : "";
    r += `--${boundary}${i == a.length - 1 ? "--" : ""}${lb}`;
    return r;
  }, `--${boundary}${lb}`);
  const params = {
    muteHttpExceptions: true,
    method: "post",
    contentType: `multipart/mixed; boundary=${boundary}`,
    headers: { Authorization: "Bearer " + ScriptApp.getOAuthToken() },
    payload,
  };
  return UrlFetchApp.fetch(`https://www.googleapis.com/${batchPath}`, params);
}

// Please run this function.
function main() {
  const folderId = "###"; // Please set folder ID you want to put the files.
  // Please set your document Ids.
  const documentIds = [
    "### Document ID1 ###",
    "### Document ID2 ###",
    "### Document ID3 ###",
    // Add more document IDs here.
  ];

  // Run batch requests.
  const requests = documentIds.map((id) => ({
    method: "GET",
    endpoint: `https://www.googleapis.com/drive/v3/files/${id}/export?mimeType=application/zip`,
  }));
  const object = { batchPath: "batch/drive/v3", requests };
  const res = batchRequests_(object);
  if (res.getResponseCode() != 200) {
    throw new Error(res.getContentText());
  }

  // Parse data as binary data, and create the data as Blob.
  const check = res.getContentText().match(/--batch.*/);
  if (!check) {
    throw new Error("Valid response value is not returned.");
  }
  const search = check[0];
  const baseData = res.getContent();
  const searchData = Utilities.newBlob(search).getBytes();
  const res1 = splitByteArrayBySearchData_(baseData, searchData);
  res1.shift();
  res1.pop();
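  const blobs = res1.map((e, i) => {
    const rrr = splitByteArrayBySearchData_(e, [13, 10, 13, 10]);
    // The first two chunks are the headers of the part and of the inner HTTP response.
    const metadata = Utilities.newBlob(rrr.splice(0, 2).flat()).getDataAsString();
    // The rest is the body. Rejoin it in case the zip data itself contains CR LF CR LF.
    const data = rrr.flatMap((c, j) => (j == 0 ? c : [13, 10, 13, 10, ...c]));
    const dataSize = Number(metadata.match(/Content-Length:(.*)/)[1]);
    // Content-Length gives the exact size of the zip data, which drops the trailing line break.
    return Utilities.newBlob(data.splice(0, dataSize)).setName(`sampleName${i + 1}.zip`);
  });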

  // Create blobs as the files in Google Drive.
  const folder = DriveApp.getFolderById(folderId);
  blobs.forEach(b => {
    if (b) {
      console.log({ filename: b.getName(), fileSize: b.getBytes().length })
      folder.createFile(b);
    }
  });
}
  • When this script is run, zip files containing the HTML data converted from the Google Documents are created in the folder. The sample filenames are sampleName1.zip, sampleName2.zip, sampleName3.zip, and so on.
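For readers who want to follow the parsing step in isolation, here is a minimal sketch in plain JavaScript of how one part of the batch response (the bytes between two boundary markers) can be separated into headers and zip data. The names indexOfBytes and parseBatchPart are hypothetical and not part of the script above, and the sample part is made up. The sketch assumes each part has exactly two header blocks, as in the response shown earlier, and it masks bytes with 0xff because Apps Script byte arrays are signed.

```javascript
/** Index of the first occurrence of `needle` in `bytes` at or after `from`, or -1. */
function indexOfBytes(bytes, needle, from = 0) {
  outer: for (let i = from; i <= bytes.length - needle.length; i++) {
    for (let j = 0; j < needle.length; j++) {
      if (bytes[i + j] !== needle[j]) continue outer;
    }
    return i;
  }
  return -1;
}

const CRLFCRLF = [13, 10, 13, 10];

/** Split one batch part into its header text and its body bytes. */
function parseBatchPart(part) {
  // First header block: multipart headers. Second: inner HTTP response headers.
  const firstEnd = indexOfBytes(part, CRLFCRLF);
  const secondEnd = firstEnd === -1 ? -1 : indexOfBytes(part, CRLFCRLF, firstEnd + 4);
  if (secondEnd === -1) throw new Error("Could not find both header blocks.");
  const headers = String.fromCharCode(...part.slice(0, secondEnd)); // headers are ASCII
  const m = headers.match(/Content-Length:\s*(\d+)/i);
  const bodyStart = secondEnd + 4;
  const bodyEnd = m ? bodyStart + Number(m[1]) : part.length;
  const body = part.slice(bodyStart, bodyEnd);
  // A non-empty zip archive starts with the bytes "PK\x03\x04".
  const looksLikeZip = [0x50, 0x4b, 0x03, 0x04].every((b, i) => (body[i] & 0xff) === b);
  return { headers, body, looksLikeZip };
}

// Made-up part whose 8-byte body itself contains CR LF CR LF.
const toBytes = (s) => Array.from(s, (c) => c.charCodeAt(0));
const part = toBytes(
  "\r\nContent-Type: application/http\r\nContent-ID: response-1\r\n\r\n" +
  "HTTP/1.1 200 OK\r\nContent-Type: application/zip\r\nContent-Length: 8\r\n\r\n"
).concat([0x50, 0x4b, 0x03, 0x04, 13, 10, 13, 10], toBytes("\r\n"));

const parsed = parseBatchPart(part);
console.log(parsed.body.length, parsed.looksLikeZip); // 8 true
```

Looking only for the first two header terminators means that a CR LF CR LF sequence inside the zip data cannot be mistaken for the end of the headers, and the signature check gives a quick way to see whether a split piece is plausibly a zip file before trying to unzip it.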

Note:

  • IMPORTANT: I'm not sure whether this method can be used for 100 batch requests, because an error might occur when the response size is more than 50 MB. So, when you test this script, please start with a small number of sample Google Documents.
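One way to stay under such a size limit is to send several smaller batches instead of one batch of 100. The following is a sketch of the splitting step only; chunk is a hypothetical helper, the IDs are placeholders, and 10 per batch is an arbitrary starting point rather than a documented limit. Each group would then be passed to batchRequests_ in its own call.

```javascript
/** Split `items` into consecutive groups of at most `size` elements. */
function chunk(items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new Error("size must be a positive integer");
  }
  const groups = [];
  for (let i = 0; i < items.length; i += size) {
    groups.push(items.slice(i, i + size));
  }
  return groups;
}

// Placeholder IDs for illustration.
const documentIds = Array.from({ length: 25 }, (_, i) => `docId${i + 1}`);
const groups = chunk(documentIds, 10);
console.log(groups.map((g) => g.length)); // [ 10, 10, 5 ]
```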

  • I have just noticed your comment "I am at the moment only interested in the HTML part rather than images". As another approach, when mimeType=application/zip is changed to mimeType=text/html, it seems that only the HTML data is included in the response value as a string. In this case, the response data can be parsed as a string.
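With mimeType=text/html, the string-based approach might look like the sketch below. It is written in plain JavaScript with a made-up response, parseTextBatchResponse is a hypothetical name, and it assumes every part of the response is text, so that working with the value of res.getContentText() is safe.

```javascript
/** Split a batch response received as text into the status and body of each part. */
function parseTextBatchResponse(text) {
  const m = text.match(/--batch[^\r\n]*/);
  if (!m) throw new Error("No batch boundary found.");
  return text
    .split(m[0])
    .slice(1, -1) // drop the text before the first boundary and the closing "--"
    .map((part) => {
      // Two header blocks per part; the body starts after the second one.
      const firstEnd = part.indexOf("\r\n\r\n");
      const secondEnd = part.indexOf("\r\n\r\n", firstEnd + 4);
      if (firstEnd === -1 || secondEnd === -1) throw new Error("Malformed part.");
      const headers = part.slice(0, secondEnd);
      const status = Number((headers.match(/HTTP\/[\d.]+ (\d{3})/) || [])[1]);
      const body = part.slice(secondEnd + 4).replace(/\r\n$/, "");
      return { status, body };
    });
}

// Made-up response with a single part.
const sample = [
  "--batch_abc",
  "Content-Type: application/http",
  "Content-ID: response-1",
  "",
  "HTTP/1.1 200 OK",
  "Content-Type: text/html",
  "",
  "<html><body><p>Hello</p></body></html>",
  "--batch_abc--",
  "",
].join("\r\n");

const results = parseTextBatchResponse(sample);
console.log(results);
```

Checking the status of each part before using its body helps separate documents that exported correctly from those that returned an error inside an otherwise successful batch response.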

Posted on 2023-03-07 19:28:03