Bug 254021 - Compression Streams not handling large outputs during the flush stage
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebCore Misc.
Version: Safari Technology Preview
Hardware: Mac (Apple Silicon)
OS: macOS 12
Importance: P2 Normal
Assignee: Brandon
URL:
Keywords: InRadar
Duplicates: 256330
Depends on:
Blocks:
 
Reported: 2023-03-16 07:17 PDT by bzugmeyer@gmail.com
Modified: 2023-05-17 13:35 PDT
CC: 5 users

See Also:



Description bzugmeyer@gmail.com 2023-03-16 07:17:08 PDT
I am testing the new `CompressionStream` API in Safari Technology Preview. When I try to compress a moderately large buffer, the data it produces appears to be invalid.

To reproduce:

  Compress a large enough buffer of non-repeating data. In the following example, I use a buffer of approximately 48 kB. The helper functions come from the WebKit CompressionStream test suite[1]:

  const input = new TextEncoder().encode(
    JSON.stringify(Array.from({ length: 10_000 }, (_, i) => i))
  );
  const output = await compressArrayBuffer(input, "deflate");
  assert_array_equals(input, pako.inflate(output));

Expected:

  `pako.inflate` returns a buffer equal to the input buffer.

Actual:

  The decompression fails with `pako.inflate` returning `undefined`.

Notes:

  With this particular example, I noticed that it works correctly when compressing up to 35_578 bytes. The issue only occurs when I try to compress more bytes (>= 35_579).

  Please see this page[2] for a practical way to reproduce the issue. In Chrome, every test succeeds, but in Safari the tests with larger inputs fail.

[1]: https://github.com/WebKit/WebKit/blob/20329b62061b40d5a423a1d75b67779945b84729/LayoutTests/imported/w3c/web-platform-tests/compression/compression-stream.tentative.any.js
[2]: https://safari-compressionstream-issue.benoitzugmeyer.repl.co
Comment 1 Alexey Proskuryakov 2023-03-16 08:59:08 PDT
Hopefully this is a dupe of bug 252474.
Comment 2 Radar WebKit Bug Importer 2023-03-23 07:18:14 PDT
<rdar://problem/107133345>
Comment 3 Brandon 2023-03-23 11:41:36 PDT
Thanks for the bug report. I can reproduce this in Safari, Safari Technology Preview, and MiniBrowser. From what I can tell at first glance, this is not a dupe of the other bug.


<!DOCTYPE html>
<html>
<script type="module">
import * as pako from "https://cdnjs.cloudflare.com/ajax/libs/pako/2.1.0/pako.esm.mjs";
import * as fflate from "https://unpkg.com/fflate@0.7.4/esm/browser.js";

// JSON-encoded array of ten thousand numbers ("[0,1,2,...]"). This produces 48_891 bytes of data.
const fullData = new TextEncoder().encode(
  JSON.stringify(Array.from({ length: 10_000 }, (_, i) => i))
);

await test(10)
await test(1_000)
await test(10_000)
await test(30_000)
await test(40_000) // fails
await test(35_578) // succeeds
await test(35_579) // fails

async function test(bytesLength) {
  const data = fullData.subarray(0, bytesLength)
  const compressedData = await compressArrayBuffer(data, "deflate");
  
  // Decompress with pako, and check that we got the same result as our original string
  // Similar to https://github.com/WebKit/WebKit/blob/20329b62061b40d5a423a1d75b67779945b84729/LayoutTests/imported/w3c/web-platform-tests/compression/compression-stream.tentative.any.js#L54-L55
  try {
    assert_array_equals(data, pako.inflate(compressedData));
    console.log(`[pako] Succeeded with ${bytesLength} bytes`)
  } catch (error) {
    console.error(`[pako] Failed with ${bytesLength} bytes:`, error)
  }

  // Double check with another library
  try {
    assert_array_equals(data, fflate.unzlibSync(compressedData));
    console.log(`[fflate] Succeeded with ${bytesLength} bytes`)
  } catch (error) {
    console.error(`[fflate] Failed with ${bytesLength} bytes:`, error)
  }

  try {
    assert_array_equals(data, await decompress(compressedData));
    console.log(`[DecompressionStream] Succeeded with ${bytesLength} bytes`);
  } catch (error) {
    console.error(`[DecompressionStream] Failed with ${bytesLength} bytes:`, error);
  }
}

async function compressArrayBuffer(input, format) {
  const cs = new CompressionStream(format);
  const writer = cs.writable.getWriter();
  writer.write(input);
  const closePromise = writer.close();
  const out = [];
  const reader = cs.readable.getReader();
  let totalSize = 0;
  while (true) {
    const { value, done } = await reader.read();
    if (done)
      break;
    out.push(value);
    totalSize += value.byteLength;
  }
  await closePromise;
  const concatenated = new Uint8Array(totalSize);
  let offset = 0;
  for (const array of out) {
    concatenated.set(array, offset);
    offset += array.byteLength;
  }
  return concatenated;
}

function assert_array_equals(a, b) {
  if (!a) {
    throw new Error(`Arrays not equal: a is falsy (${a})`)
  }
  if (!b) {
    throw new Error(`Arrays not equal: b is falsy (${b})`)
  }
  if (a.length !== b.length) {
    throw new Error(`Arrays not equal: a.length !== b.length (${a.length} !== ${b.length})`)
  }
  a.forEach((v, i) => {
    if (a[i] !== b[i]) {
      throw new Error(`Arrays not equal: a[${i}] !== b[${i}] (${a[i]} !== ${b[i]})`)
    }
  })
}

async function concatenateStream(readableStream) {
  const reader = readableStream.getReader();
  let totalSize = 0;
  const buffers = [];
  while (true) {
    const { value, done } = await reader.read();
    if (done) {
      break;
    }
    buffers.push(value);
    totalSize += value.byteLength;
  }
  reader.releaseLock();
  const concatenated = new Uint8Array(totalSize);
  let offset = 0;
  for (const buffer of buffers) {
    concatenated.set(buffer, offset);
    offset += buffer.byteLength;
  }
  return concatenated;
}

async function decompress(view) {
  const ds = new DecompressionStream('deflate');
  const writer = ds.writable.getWriter();
  writer.write(view);
  writer.close();
  return await concatenateStream(ds.readable);
}
</script>
</html>
Comment 4 Brandon 2023-05-10 23:14:49 PDT
So what was happening here is that we missed an edge case: during the flush step, there may be more output data than the allocated output buffer can hold. Since avail_in was 0 at that point, we would just exit. We need to verify that the stream has actually ended before exiting.

The reason the 35_579-byte input failed is that the output buffer allocated for the flush step was 16384 bytes, while the flushed output was just slightly larger. The first compression pass output only 2 bytes; the remaining 13686 were only emitted during the flush step.
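
For context, the usual zlib pattern here is to keep calling deflate(Z_FINISH) with a fresh output buffer until it returns Z_STREAM_END, because avail_in can already be 0 while deflate still holds buffered output. Below is a minimal standalone sketch against the plain zlib C API, for illustration only; it is not WebKit's CompressionStreamEncoder code, and the deflateAll name and 16384 chunk size are just assumptions for the example.

// Minimal sketch of a zlib finish loop (plain zlib C API; names and buffer size
// are illustrative, not WebKit's actual implementation).
#include <zlib.h>
#include <cstdint>
#include <stdexcept>
#include <vector>

std::vector<uint8_t> deflateAll(const uint8_t* data, size_t size)
{
    z_stream stream { };
    // windowBits = 15 selects the zlib wrapper, matching the "deflate" format.
    if (deflateInit2(&stream, Z_DEFAULT_COMPRESSION, Z_DEFLATED, 15, 8, Z_DEFAULT_STRATEGY) != Z_OK)
        throw std::runtime_error("deflateInit2 failed");

    stream.next_in = const_cast<Bytef*>(reinterpret_cast<const Bytef*>(data));
    stream.avail_in = static_cast<uInt>(size);

    std::vector<uint8_t> output;
    std::vector<uint8_t> chunk(16384); // same chunk size as the allocation discussed above

    int result = Z_OK;
    do {
        stream.next_out = chunk.data();
        stream.avail_out = static_cast<uInt>(chunk.size());
        result = deflate(&stream, Z_FINISH);
        if (result != Z_OK && result != Z_STREAM_END && result != Z_BUF_ERROR) {
            deflateEnd(&stream);
            throw std::runtime_error("deflate failed");
        }
        output.insert(output.end(), chunk.begin(), chunk.begin() + (chunk.size() - stream.avail_out));
        // Stopping when avail_in == 0 would be too early: the flush can emit
        // more than one chunk of output after all the input has been consumed.
    } while (result != Z_STREAM_END);

    deflateEnd(&stream);
    return output;
}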


This fixes it.

 @@ -132,8 +131,8 @@ ExceptionOr<RefPtr<JSC::ArrayBuffer>> CompressionStreamEncoder::compress(const u
         result = deflate(&m_zstream, (m_finish) ? Z_FINISH : Z_NO_FLUSH);
         if (result != Z_OK && result != Z_STREAM_END && result != Z_BUF_ERROR)
             return Exception { TypeError, "Failed to compress data."_s };
-
-        if (!m_zstream.avail_in) {
+
+        if (!m_zstream.avail_in && (!m_finish || (m_finish && result == Z_STREAM_END))) {
             shouldCompress = false;
             output.resize(allocateSize - m_zstream.avail_out);
         }

Working on the PR now.
Comment 5 Brandon 2023-05-10 23:18:29 PDT
Pull request: https://github.com/WebKit/WebKit/pull/13741
Comment 6 EWS 2023-05-11 20:07:13 PDT
Committed 263997@main (a30d9284f5b9): <https://commits.webkit.org/263997@main>

Reviewed commits have been landed. Closing PR #13741 and removing active labels.
Comment 7 Brent Fulgham 2023-05-17 13:35:17 PDT
*** Bug 256330 has been marked as a duplicate of this bug. ***