[SYSTEMDS-??] Modality Alignment, Contrastive Learning, PDF and Transcript Loader#2459
Closed
b-enedict wants to merge 2929 commits into apache:main from
Conversation
The current inner class DMLGateWayListener implements methods from the GatewayServerListener interface that are never invoked by the GatewayServer (since the GatewayServer, which also implements GatewayServerListener, does not call these methods). Furthermore, DMLGateWayListener previously called System.exit(), which is not correct, since it breaks the proper shutdown of the GatewayServer. Finally, this commit adds a new unit test that checks the functionality of the DMLGateWayListener. While merging, we verified that the additions did not introduce any regressions in startup and shutdown of the Python API. Closes apache#2243
Currently the .asf.yaml configuration sends emails to everyone on all commits. According to https://issues.apache.org/jira/browse/INFRA-26700 the error is in our project. This commit therefore removes the outputdir, as specified in: https://github.com/apache/infrastructure-asfyaml?tab=readme-ov-file#jekyll-cms
This commit fixes: 0) hard-coded server names / properties, 1) Windows line endings, 2) memory configurations, 3) datagen scripts, and 4) a missing l2svm script.
The perftest runMSVM_10k_1k_dense_k5 reproducibly failed on serializing the parfor body program for a remote spark parfor job, because there were remaining spark instructions (checkpoints). The reason was that the forced CP compilation before such remote jobs was only applied in a subset of cases and special hop properties were not correctly cleaned up. This patch fixes the general issue.
This patch fixes a perftest performance regression of runMSVM_10k_1k_dense_k5, which ran in 36s instead of a few seconds in earlier releases. The reason was unnecessary spark context creation during parfor optimization. We now handle these cluster info requests more carefully, which avoids the unnecessary spark context creation and reduces the total runtime back to 5.9s.
This patch introduces a pruning technique on the cleaning pipeline returned by the top-K cleaning. We identify a smaller yet equally effective subset of primitives for all top-performing pipelines, which optimizes their scoring performance. Closes apache#2251
This patch adds the embedding layer as a built-in operator in our nn/layers library. The functionality is similar to pytorch.nn.Embedding (https://pytorch.org/docs/stable/generated/torch.nn.Embedding.html). The layer receives indices as input, which refer to entries of an embedding dictionary, and returns an embedding matrix where row i contains embedding vector indices[i] of the embedding dictionary. This layer is used in every transformer architecture: the indices usually come from a tokenizer, and the embedding matrix is the input to the actual transformer model. Closes apache#2237
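The lookup semantics described above can be sketched in a few lines of numpy. This is an illustrative model of the layer's forward pass, not the actual nn/layers API (the function name `embedding_forward` is assumed for illustration):

```python
import numpy as np

def embedding_forward(indices, E):
    """Return a matrix whose row i is the embedding vector E[indices[i]]."""
    return E[indices]  # fancy indexing performs the row gather

vocab_size, dim = 10, 4
# toy embedding dictionary: one learnable vector per token id
E = np.arange(vocab_size * dim, dtype=float).reshape(vocab_size, dim)
tokens = np.array([3, 0, 3])       # e.g., token ids from a tokenizer
out = embedding_forward(tokens, E)
# rows 0 and 2 of the output are identical, since both look up token id 3
```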
…DMLOptions Closes apache#2241.
This patch fixes invalid left-hand-side as well as combined left- and right-hand-side broadcasting in the new ampute builtin function. We now have proper error handling in the hop to guide script developers, since broadcasts can only be used from the right-hand side.
This patch fixes various issues where the new error handling was too strict because temporarily invalid hop configurations exist (e.g., in tests as well as while setting the outer config).
Invalid broadcasting and outer operations - to be fixed separately.
Following the stricter error handling for binary broadcast operations, we found a number of issues in existing builtin scripts. In CP, the runtime compensates for such incorrect operations (redirecting to outer operations), but in Spark we can't, because different RDD operations are used before we see the block dimensions. This patch accordingly also fixes other existing issues.
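The distinction between a legitimate right-hand-side broadcast and an outer operation can be illustrated in numpy (note this is only an analogy: numpy silently broadcasts both operands, which is exactly the implicit behavior the stricter DML error handling now rejects in favor of an explicit outer() call):

```python
import numpy as np

M = np.ones((3, 4))           # 3x4 matrix
rowvec = np.ones((1, 4))      # row vector broadcasts along rows

res = M + rowvec              # valid right-hand-side style broadcast
assert res.shape == (3, 4)    # result keeps the matrix shape

# Adding a column vector and a row vector is really an outer operation;
# in DML this should be written explicitly, e.g. outer(u, v, "+"),
# whereas numpy silently expands both sides:
u = np.arange(3).reshape(3, 1)
v = np.arange(4).reshape(1, 4)
outer_sum = u + v
assert outer_sum.shape == (3, 4)
```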
This patch adds the missing hoisting of DML function calls (which always need to bind to variables) from basic if predicates, for convenience and in order to prevent unexpected errors. Furthermore, this patch simplifies the existing DML-bodied ampute() builtin by using this feature, as well as calling the existing sigmoid() instead of a custom one.
This commit changes the memory estimates of many of the HOPs to include the data type, indicating if the output is a frame or a matrix. It is included because I had to make one modification to a unary op in BWARE, and in general, it does not hurt to add it to many of the op types (even if they do not return frame types at the moment). Closes apache#2221
This patch fixes a compilation issue where certain if-branches were lost during hoisting of function calls from if statements. The issue did not show up before, because the test instances of function calls in if statements triggered rewrites of predicate constant folding and branch removal. We also resolved all remaining FIXMEs in the new ampute() builtin function.
Bumps [codecov/codecov-action](https://github.com/codecov/codecov-action) from 5.5.3 to 6.0.0. - [Release notes](https://github.com/codecov/codecov-action/releases) - [Changelog](https://github.com/codecov/codecov-action/blob/main/CHANGELOG.md) - [Commits](codecov/codecov-action@v5.5.3...v6.0.0) --- updated-dependencies: - dependency-name: codecov/codecov-action dependency-version: 6.0.0 dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com>
Optimize dense matrix mult for transposed inputs. This introduces specialized kernels for dense matrix multiplication involving transposed inputs (t(A)%*%B, A%*%t(B), t(A)%*%t(B)). Previously, these operations required an explicit intermediate transpose step, which caused unnecessary runtime overhead. The new kernels perform the operations in-place or using tiled transposition, avoiding the full allocation cost. Performance benchmarks on 100x100 dense matrices show significant speedups, especially for t(A)%*%B and t(A)%*%t(B), and can be reproduced with higher dimensions. Closes apache#2425.
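The core idea of avoiding the materialized transpose can be sketched as follows. For t(A)%*%B, note that (A^T B)[i, j] = sum_k A[k, i] * B[k, j], so one can stream over the common dimension k and accumulate rank-1 updates, touching A and B only in their stored row-major order. This is a minimal reference sketch of the access pattern, not the actual (tiled, cache-blocked) SystemDS kernel:

```python
import numpy as np

def tmm_left(A, B):
    """Compute t(A) %*% B without materializing A's transpose."""
    m, n = A.shape
    m2, p = B.shape
    assert m == m2, "inner dimensions must match"
    C = np.zeros((n, p))
    for k in range(m):
        # rank-1 update: outer product of row k of A and row k of B;
        # both rows are read contiguously in row-major storage
        C += np.outer(A[k], B[k])
    return C
```

The production kernel additionally tiles the loop for cache locality, but the row-streaming access pattern is the reason no intermediate transpose allocation is needed.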
Bumps [docker/build-push-action](https://github.com/docker/build-push-action) from 6 to 7. - [Release notes](https://github.com/docker/build-push-action/releases) - [Commits](docker/build-push-action@v6...v7) --- updated-dependencies: - dependency-name: docker/build-push-action dependency-version: '7' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [docker/metadata-action](https://github.com/docker/metadata-action) from 5 to 6. - [Release notes](https://github.com/docker/metadata-action/releases) - [Commits](docker/metadata-action@v5...v6) --- updated-dependencies: - dependency-name: docker/metadata-action dependency-version: '6' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [docker/setup-buildx-action](https://github.com/docker/setup-buildx-action) from 3 to 4. - [Release notes](https://github.com/docker/setup-buildx-action/releases) - [Commits](docker/setup-buildx-action@v3...v4) --- updated-dependencies: - dependency-name: docker/setup-buildx-action dependency-version: '4' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Closes apache#2412. Co-authored-by: bakiberkay <baki.b.uzel@campus.tu-berlin.de>
This patch introduces a new unimodal optimization strategy. The DAGs are executed in parallel via a memory-aware node scheduler and executor. The scheduler tracks the dependencies between the nodes and records the memory used by the system, in order to only run new nodes that fit within these memory limits. Every representation needs to provide a memory estimate so the scheduler knows which representation nodes can be scheduled. The memory estimates consist of a CPU and a GPU estimate, and the scheduler is aware of both attributes. (The hyperparameter tuner and multimodal optimizer still need to be adapted to this new approach; therefore, the tests are currently disabled.)
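The scheduling policy described above can be sketched as a sequential simulation: a node becomes runnable once all of its dependencies have finished and its CPU and GPU estimates fit the configured budgets. The class and function names below are illustrative assumptions, not Scuro's actual scheduler API:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    cpu_mem: int                      # estimated CPU memory (bytes)
    gpu_mem: int                      # estimated GPU memory (bytes)
    deps: list = field(default_factory=list)

def schedule(nodes, cpu_budget, gpu_budget):
    """Return an execution order in which each node starts only after its
    dependencies finished and its memory estimates fit the budgets.
    Memory is released when a node finishes (sequential simulation)."""
    done, order = set(), []
    pending = list(nodes)
    while pending:
        progressed = False
        for node in list(pending):
            ready = all(d in done for d in node.deps)
            fits = node.cpu_mem <= cpu_budget and node.gpu_mem <= gpu_budget
            if ready and fits:
                order.append(node.name)   # "run" the node
                done.add(node.name)
                pending.remove(node)
                progressed = True
        if not progressed:
            raise RuntimeError("no runnable node fits the memory budget")
    return order
```

The parallel executor in the patch additionally overlaps independent nodes, but the admission test (dependencies satisfied and estimates within remaining budget) is the same.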
This patch adds new loaders: one for loading PDF files by converting all pages of the document into numpy arrays processable by OpenCV, and one for loading and converting an audio file into a transcript using faster-whisper.
This patch introduces a modality alignment operator to match previously unaligned data based on feature similarity. The operator computes similarities (e.g., ORB descriptors or perceptual hashing) between a primary and a secondary modality and determines an optimal matching. The implementation includes an abstract alignment interface and concrete methods for ORB-based and p-hash-based image alignment. Instead of producing reordered modalities, the operator outputs a matching that is applied after representation learning and before fusion. This ensures consistent ordering and equal-length modalities for downstream processing.
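Once pairwise similarities between the primary and secondary modality are computed, the remaining step is the matching itself. A minimal sketch of one possible assignment strategy (greedy best-first on the similarity matrix; the actual operator may use a different optimal-matching algorithm, and `greedy_match` is an assumed name):

```python
import numpy as np

def greedy_match(sim):
    """Match each primary item (row) to a distinct secondary item (column)
    by repeatedly taking the highest remaining similarity."""
    sim = sim.astype(float).copy()
    n = min(sim.shape)
    matching = {}
    for _ in range(n):
        i, j = np.unravel_index(np.argmax(sim), sim.shape)
        matching[int(i)] = int(j)
        sim[i, :] = -np.inf        # row i is now matched
        sim[:, j] = -np.inf        # column j is now matched
    return matching
```

Consistent with the description above, the result is a matching (index mapping), not reordered modalities; the mapping is applied after representation learning and before fusion.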
…pairing This patch introduces a new operator to Scuro for building contrastive learning pipelines with greater flexibility in handling input modalities. Previously, contrastive pairs had to be structurally aligned in a preprocessing step before being used in Scuro. This limited the ability to work with independently transformed or dynamically generated modalities. The new operator constructs contrastive pairs via a Cartesian product of modalities and optionally extends them with additional modalities that are already aligned. The resulting combinations are evaluated using a user-defined function to determine whether a pair represents a positive or negative sample. Based on this evaluation, the operator outputs both the assigned label and the corresponding modality pair. This design enables dynamic label generation and supports scenarios where modalities are windowed, reshuffled, or transformed differently. It also allows flexible fusion of modalities after contrastive pairing, improving the expressiveness of contrastive learning workflows. Limitations: The Cartesian product can introduce significant computational overhead for large modality sets, which may require further optimization.
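The pairing mechanism described above, Cartesian product plus a user-defined labeling function, can be sketched as follows (function name and tuple layout are illustrative, not the actual Scuro operator signature):

```python
from itertools import product

def contrastive_pairs(modality_a, modality_b, label_fn):
    """Build (label, (a, b)) tuples over the Cartesian product of two
    modalities; label_fn decides positive (1) vs negative (0) samples."""
    return [(label_fn(a, b), (a, b)) for a, b in product(modality_a, modality_b)]

# toy samples carrying an instance id; same id => positive pair
audio = [("audio", 1), ("audio", 2)]
video = [("video", 1), ("video", 2)]
pairs = contrastive_pairs(audio, video, lambda a, b: int(a[1] == b[1]))
# the 2x2 product yields 4 pairs: 2 positives (matching ids), 2 negatives
```

The quadratic size of the product is exactly the computational-overhead limitation noted above: for modality sets of sizes m and n, the operator evaluates m*n candidate pairs.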
Summary
This PR introduces new functionality for multimodal learning in Scuro, including a contrastive learning operator, a modality alignment operator, and additional data loaders.
Changes
Contrastive Learning Operator
Modality Alignment Operator
Data Loaders