diff --git a/.github/workflows/syntax-checks.yaml b/.github/workflows/syntax-checks.yaml index 27badf441d3..eb19393268c 100644 --- a/.github/workflows/syntax-checks.yaml +++ b/.github/workflows/syntax-checks.yaml @@ -59,3 +59,17 @@ jobs: rustup toolchain install stable --profile minimal --no-self-update -c clippy -c rustfmt - name: Run `cargo fmt` on top of Rust API project run: cd src/libcprover-rust; cargo fmt --all -- --check + + # This job should take under a minute (est) + check-generated-intrinsic-models: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v6 + - name: Fetch dependencies + env: + DEBIAN_FRONTEND: noninteractive + run: | + sudo apt-get update + sudo apt-get install --no-install-recommends -yq clang-format-15 + - name: Check x86 SIMD intrinsic models are in sync with their generator + run: ./scripts/check_intrinsic_models_sync.sh diff --git a/doc/neon-intrinsic-models.md b/doc/neon-intrinsic-models.md new file mode 100644 index 00000000000..a4c9ae348b2 --- /dev/null +++ b/doc/neon-intrinsic-models.md @@ -0,0 +1,167 @@ +# Generating ARM/AArch64 NEON intrinsic models + +This document describes how CBMC models the ARM/AArch64 NEON SIMD builtins, how +the models are generated, and the design decisions behind the choice of +semantic source. It is the companion to `scripts/generate_neon_models.py`, +which emits `src/ansi-c/library/arm_neon.c`. + +## Background: how NEON reaches CBMC + +Clang's `<arm_neon.h>` implements the public NEON intrinsics (`vabdq_s8`, ...) +on top of a smaller set of *polymorphic* compiler builtins +(`__builtin_neon_vabdq_v`, ...). At each call site the header casts every +operand to a byte-representative lane type (`int8x16_t`, i.e. `__gcc_v16qi`) +and passes a `NeonTypeFlags` integer "type code" that selects the actual lane +interpretation. So one builtin such as `__builtin_neon_vabdq_v` backs +`vabdq_s8`, `vabdq_s16`, `vabdq_u8`, ... and a model must switch on the type +code to reinterpret the representative bytes. 
+ +For CBMC to verify such code it needs three things, each handled by a separate +piece of the front-end: + +1. **Declarations** for the `__builtin_neon_*` builtins + (`src/ansi-c/compiler_headers/gcc_builtin_headers_aarch64.h`, generated by + `clang_builtins.py` from `clang-tblgen -gen-arm-neon-sema`). +2. **The `neon_vector_type` attribute** so the `<arm_neon.h>` typedefs are real + vectors (handled in the scanner/parser and `ansi_c_convert_type`). +3. **Library models** giving the builtins a body — the subject of this + document. + +Intrinsics that Clang *open-codes* (the `*OpInst` records in `arm_neon.td`, +e.g. `vaddq_s8` → `a + b`) lower to native C operators, which CBMC already +handles, so they need **no** model. Only the opaque builtins (the `SInst` / +`IInst` / ... records) need one. + +## Where do the model bodies come from? + +`arm_neon.td` carries **no semantics** for the opaque builtins: their +`Operation` field is `OP_NONE`. It tells us *which* builtins exist, the element +types each supports, and the type codes — i.e. the model's signature and +`switch` skeleton — but not what each one computes. The per-lane computation +must come from elsewhere. Two ARM-published machine-readable sources were +considered. + +### ARM intrinsics JSON (Intrinsics Guide) vs ARM ASL (Architecture Spec) + +**ARM-JSON** — the database behind the online Neon/ACLE Intrinsics Guide. Keyed +by *typed intrinsic* (`vabdq_s8`), with an `Operation` field giving high-level +per-element pseudocode. + +- *Advantages.* Keying matches our pipeline (`arm_neon.td` already gives + intrinsic → builtin + type code), so an entry maps to exactly one `switch` + case. The pseudocode is close to the per-lane C we emit. It is plain JSON, so + trivial to consume, and it describes the net per-element effect (handy for + intrinsics that lower to instruction *sequences*). 
+- *Disadvantages.* The pseudocode is written for humans: notation varies and + detail (FP rounding modes, NaN propagation, saturation-flag side effects) is + often elided. Coverage/quality is uneven, and it is a derived presentation, + not the spec ARM validates against, so corners can be wrong. + +**ARM-ASL** — the Architecture Reference Manual's machine-readable spec, keyed +by *instruction* (`SABD`), with rigorous executable decode+execute pseudocode. + +- *Advantages.* Authoritative and exact (saturation, rounding, flags, edge + cases). Formal/executable, so mechanically translatable in principle (ASLi, + Sail, isla). Parameterised by element size, mapping naturally to "compute for + this lane width". +- *Disadvantages.* Wrong keying for us: it is per *instruction*, so it needs an + intrinsic → instruction mnemonic map that is not in `arm_neon.td`. It is + heavy to translate — it references a large shared-function library and + architectural state (`FPCR`, `FPSR.QC`, `PSTATE`, the register file) — and is + often *more* precise than CBMC can use, so faithfully consuming it would bloat + models the solver then chokes on. It is also large and under specific Arm + license terms. + +**Summary.** JSON wins on integration fit and effort; ASL wins on rigor. Since +CBMC models should be simple and self-contained, JSON-style per-element +semantics are the better fit for the bulk, with ASL reserved as a targeted +correctness backstop for the cases where exactness matters and is tractable. +For the hardest tier (FP estimate/reciprocal, crypto, table lookups) neither +source yields a clean, solver-friendly C model automatically; those need +hand-written models or constrained nondeterminism regardless of source. 
+ +### What is actually available, and the resulting design + +Empirically (June 2026): + +- The Intrinsics Guide **`Operation` pseudocode JSON is access-gated** (the + `developer.arm.com/.../intrinsics/data/intrinsics.json` endpoint returns 403) + and is under Arm license terms, so it cannot be fetched here nor vendored + into CBMC. +- ARM's **ACLE repository is openly licensed and fetchable**: + `ARM-software/acle`'s `neon_intrinsics/advsimd.md` is a 2.1 MB structured + reference listing **4689 intrinsics**. It does **not** contain per-element + pseudocode, but it does contain, for each intrinsic, the **AArch64 + instruction mnemonic** it maps to (`vabdq_s8` → `SABD`, `vqaddq_s8` → + `SQADD`, `vhaddq_s8` → `SHADD`, ...) — **356 distinct mnemonics** in total. + +This reshapes the plan favourably. The instruction mnemonic is the *true* +semantic identity of an intrinsic (it is also the ASL key), and it is a far +smaller, well-understood set than the ~2500 intrinsics or their pseudocode. +So instead of translating per-intrinsic pseudocode, we: + +1. take **structure** from `arm_neon.td` (builtins, type codes — as before); +2. take the authoritative **intrinsic → instruction mnemonic** mapping from + ACLE `advsimd.md`; and +3. supply **semantics** via a compact, auditable *mnemonic → per-lane C* + table in the generator (`SABD`/`UABD` → absolute difference, `SMAX`/`UMAX` + → maximum, `SQADD`/`UQADD` → saturating add, ...). + +This is authoritative about *which* operation each builtin is (no guessing from +names), keeps the hand-written part tiny (one entry per instruction family, not +per intrinsic), and lines up with the ASL key should we later want ASL-grade +rigor for specific instructions. The gated `Operation` JSON would only be +needed to avoid writing the mnemonic→C table at all; given how small that table +is, it is not worth the licensing and translation cost. 
+ +Both inputs are **external and not vendored** (mirroring how the x86 generator +reads Intel's XML and how the declaration generator reads `arm_neon.td` from an +LLVM checkout): `arm_neon.td` comes from an LLVM checkout and `advsimd.md` from +the openly-licensed ACLE repo. Neither is committed to CBMC. + +## Running the generator + +```sh +# structure: clang's arm_neon.td (from an llvm-project checkout) +TD=llvm-project/clang/include/clang/Basic/arm_neon.td +# semantics key: ARM ACLE advsimd.md (openly licensed; do not vendor) +curl -sO https://raw.githubusercontent.com/ARM-software/acle/main/neon_intrinsics/advsimd.md + +python3 scripts/generate_neon_models.py "$TD" --acle advsimd.md \ + -o src/ansi-c/library/arm_neon.c +``` + +Without `--acle` the generator falls back to a small intrinsic-name-keyed +operation table (no ARM data required), which is enough to regenerate the ops +it already knows. With `--acle` it keys semantics on the ARM instruction +mnemonic, annotates each model with that mnemonic for provenance, and prints a +coverage audit: how many opaque builtins map to mnemonics the table covers, and +a histogram of the mnemonics it does not yet cover (the modeling roadmap). + +The output is run through `clang-format-15`, so regeneration is idempotent. + +## Current coverage and roadmap + +The generator models the mechanically-translatable integer families: +absolute difference (`SABD`/`UABD`), min/max (`SMAX`/`UMAX`/`SMIN`/`UMIN`), +saturating add/subtract (`SQADD`/`UQADD`/`SQSUB`/`UQSUB`), halving and +rounding-halving add/subtract (`SHADD`/`UHADD`/`SHSUB`/`UHSUB`/`SRHADD`/`URHADD`), +the pairwise add/min/max reductions (`ADDP`/`SMAXP`/`UMAXP`/`SMINP`/`UMINP`), +test-bits (`CMTST`) and bitwise select (`BSL`). 
+Saturating and halving arithmetic is widened to avoid signed-overflow undefined +behaviour; pairwise add is computed unsigned so its modular wrap is well +defined; bitwise select operates on the raw bits and so is independent of the +lane type code. + +The relational compares (`vceq`/`vcge`/`vcgt`/`vcle`/`vclt`) and the plain +arithmetic/logical ops (`vadd`/`vsub`/`vmul`, `vand`/`vorr`/`veor`/...) are +open-coded by `<arm_neon.h>` into native C operators, which CBMC handles +directly, so they need no model. + +The `--acle` audit classifies the remaining opaque builtins by mnemonic. The +next tractable arithmetic tiers are `EXT` (vector extract by immediate) and the +saturating-shift group (`SQSHL`/`UQSHL`/...). Loads/stores (`LD1`/`ST1`/...), +permutes (`ZIP`/`UZP`/`TRN`/`TBL`), `DUP`, `INS` and the `NOP` (reinterpret) +group are structural rather than arithmetic and are handled separately or +natively. Floating-point (`FABD`/`FMAX`/...) and crypto/estimate instructions +need dedicated modeling and are out of scope for the mechanical generator. diff --git a/regression/ansi-c/gcc_neon_vector_type/main.c b/regression/ansi-c/gcc_neon_vector_type/main.c new file mode 100644 index 00000000000..44d26419b66 --- /dev/null +++ b/regression/ansi-c/gcc_neon_vector_type/main.c @@ -0,0 +1,18 @@ +// The neon_vector_type attribute (used by Clang's <arm_neon.h>) gives the +// vector size as a lane count rather than in bytes, unlike vector_size. 
+typedef __attribute__((neon_vector_type(16))) signed char int8x16_t; +typedef __attribute__((neon_vector_type(8))) short int16x8_t; +typedef __attribute__((neon_vector_type(4))) int int32x4_t; +typedef __attribute__((neon_vector_type(2))) double float64x2_t; + +int main() +{ + int8x16_t a = {0}; + a[3] = 7; + __CPROVER_assert(a[3] == 7, "lane indexing works"); + __CPROVER_assert(sizeof(int8x16_t) == 16, "16 lanes of signed char"); + __CPROVER_assert(sizeof(int16x8_t) == 16, "8 lanes of short"); + __CPROVER_assert(sizeof(int32x4_t) == 16, "4 lanes of int"); + __CPROVER_assert(sizeof(float64x2_t) == 16, "2 lanes of double"); + return 0; +} diff --git a/regression/ansi-c/gcc_neon_vector_type/test.desc b/regression/ansi-c/gcc_neon_vector_type/test.desc new file mode 100644 index 00000000000..75cc69573e8 --- /dev/null +++ b/regression/ansi-c/gcc_neon_vector_type/test.desc @@ -0,0 +1,9 @@ +CORE gcc-only +main.c + +^EXIT=0$ +^SIGNAL=0$ +^VERIFICATION SUCCESSFUL$ +-- +^warning: ignoring +^CONVERSION ERROR$ diff --git a/regression/cbmc-library/__builtin_ia32_lfence/main.c b/regression/cbmc-library/__builtin_ia32/lfence.c similarity index 100% rename from regression/cbmc-library/__builtin_ia32_lfence/main.c rename to regression/cbmc-library/__builtin_ia32/lfence.c diff --git a/regression/cbmc-library/__builtin_ia32_lfence/test.desc b/regression/cbmc-library/__builtin_ia32/lfence.desc similarity index 92% rename from regression/cbmc-library/__builtin_ia32_lfence/test.desc rename to regression/cbmc-library/__builtin_ia32/lfence.desc index 9542d988e8d..0f195a73692 100644 --- a/regression/cbmc-library/__builtin_ia32_lfence/test.desc +++ b/regression/cbmc-library/__builtin_ia32/lfence.desc @@ -1,5 +1,5 @@ KNOWNBUG -main.c +lfence.c --pointer-check --bounds-check ^EXIT=0$ ^SIGNAL=0$ diff --git a/regression/cbmc-library/__builtin_ia32_mfence/main.c b/regression/cbmc-library/__builtin_ia32/mfence.c similarity index 100% rename from 
regression/cbmc-library/__builtin_ia32_mfence/main.c rename to regression/cbmc-library/__builtin_ia32/mfence.c diff --git a/regression/cbmc-library/__builtin_ia32_mfence/test.desc b/regression/cbmc-library/__builtin_ia32/mfence.desc similarity index 92% rename from regression/cbmc-library/__builtin_ia32_mfence/test.desc rename to regression/cbmc-library/__builtin_ia32/mfence.desc index 9542d988e8d..02f1f6d6d1f 100644 --- a/regression/cbmc-library/__builtin_ia32_mfence/test.desc +++ b/regression/cbmc-library/__builtin_ia32/mfence.desc @@ -1,5 +1,5 @@ KNOWNBUG -main.c +mfence.c --pointer-check --bounds-check ^EXIT=0$ ^SIGNAL=0$ diff --git a/regression/cbmc-library/__builtin_ia32/pabsb128.c b/regression/cbmc-library/__builtin_ia32/pabsb128.c new file mode 100644 index 00000000000..4ded3178df6 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/pabsb128.c @@ -0,0 +1,15 @@ +#include <limits.h> + +typedef char __gcc_v16qi __attribute__((__vector_size__(16))); +__gcc_v16qi __builtin_ia32_pabsb128(__gcc_v16qi); + +int main() +{ + // Lane 0 is the interesting hardware case: pabsb leaves SCHAR_MIN unchanged + // (its absolute value is not representable as a signed byte). 
+ __gcc_v16qi a = (__gcc_v16qi){ + SCHAR_MIN, -2, 3, -4, 5, -6, 7, -8, 9, -10, 11, -12, 13, -14, 15, -16}; + __gcc_v16qi r = __builtin_ia32_pabsb128(a); + __CPROVER_assert(r[0] == SCHAR_MIN && r[1] == 2 && r[15] == 16, "abs epi8"); + return 0; +} diff --git a/regression/cbmc-library/__builtin_ia32/pabsb128.desc b/regression/cbmc-library/__builtin_ia32/pabsb128.desc new file mode 100644 index 00000000000..0beb8e2bdc4 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/pabsb128.desc @@ -0,0 +1,8 @@ +CORE gcc-only +pabsb128.c + +^EXIT=0$ +^SIGNAL=0$ +^VERIFICATION SUCCESSFUL$ +-- +^warning: ignoring diff --git a/regression/cbmc-library/__builtin_ia32/pabsd128.c b/regression/cbmc-library/__builtin_ia32/pabsd128.c new file mode 100644 index 00000000000..deb4ccf7bc6 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/pabsd128.c @@ -0,0 +1,15 @@ +#include <limits.h> + +typedef int __gcc_v4si __attribute__((__vector_size__(16))); +__gcc_v4si __builtin_ia32_pabsd128(__gcc_v4si); + +int main() +{ + // Lane 0 is the interesting hardware case: pabsd leaves INT_MIN unchanged + // (its absolute value is not representable), and it is also the input that + // exposed the -INT_MIN signed-overflow UB in the previous model. 
+ __gcc_v4si a = (__gcc_v4si){INT_MIN, -2, 3, -4}; + __gcc_v4si r = __builtin_ia32_pabsd128(a); + __CPROVER_assert(r[0] == INT_MIN && r[1] == 2 && r[3] == 4, "abs epi32"); + return 0; +} diff --git a/regression/cbmc-library/__builtin_ia32/pabsd128.desc b/regression/cbmc-library/__builtin_ia32/pabsd128.desc new file mode 100644 index 00000000000..83873dca7ea --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/pabsd128.desc @@ -0,0 +1,8 @@ +CORE gcc-only +pabsd128.c + +^EXIT=0$ +^SIGNAL=0$ +^VERIFICATION SUCCESSFUL$ +-- +^warning: ignoring diff --git a/regression/cbmc-library/__builtin_ia32/pabsd256.c b/regression/cbmc-library/__builtin_ia32/pabsd256.c new file mode 100644 index 00000000000..f3badb87b0b --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/pabsd256.c @@ -0,0 +1,14 @@ +#include <limits.h> + +typedef int __gcc_v8si __attribute__((__vector_size__(32))); +__gcc_v8si __builtin_ia32_pabsd256(__gcc_v8si); + +int main() +{ + // Lane 0: pabsd leaves INT_MIN unchanged (no UB in the model). 
+ __gcc_v8si a = (__gcc_v8si){INT_MIN, -2, 3, -4, 5, -6, 7, -8}; + __gcc_v8si r = __builtin_ia32_pabsd256(a); + __CPROVER_assert( + r[0] == INT_MIN && r[1] == 2 && r[7] == 8, "abs epi32 (256)"); + return 0; +} diff --git a/regression/cbmc-library/__builtin_ia32/pabsd256.desc b/regression/cbmc-library/__builtin_ia32/pabsd256.desc new file mode 100644 index 00000000000..c6770eb23ce --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/pabsd256.desc @@ -0,0 +1,8 @@ +CORE gcc-only +pabsd256.c + +^EXIT=0$ +^SIGNAL=0$ +^VERIFICATION SUCCESSFUL$ +-- +^warning: ignoring diff --git a/regression/cbmc-library/__builtin_ia32/pabsw128.c b/regression/cbmc-library/__builtin_ia32/pabsw128.c new file mode 100644 index 00000000000..82537fd245d --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/pabsw128.c @@ -0,0 +1,14 @@ +#include <limits.h> + +typedef short __gcc_v8hi __attribute__((__vector_size__(16))); +__gcc_v8hi __builtin_ia32_pabsw128(__gcc_v8hi); + +int main() +{ + // Lane 0 is the interesting hardware case: pabsw leaves SHRT_MIN unchanged + // (its absolute value is not representable as a signed 16-bit value). 
+ __gcc_v8hi a = (__gcc_v8hi){SHRT_MIN, -2, 3, -4, 5, -6, 7, -8}; + __gcc_v8hi r = __builtin_ia32_pabsw128(a); + __CPROVER_assert(r[0] == SHRT_MIN && r[1] == 2 && r[7] == 8, "abs epi16"); + return 0; +} diff --git a/regression/cbmc-library/__builtin_ia32/pabsw128.desc b/regression/cbmc-library/__builtin_ia32/pabsw128.desc new file mode 100644 index 00000000000..03a52862cd5 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/pabsw128.desc @@ -0,0 +1,8 @@ +CORE gcc-only +pabsw128.c + +^EXIT=0$ +^SIGNAL=0$ +^VERIFICATION SUCCESSFUL$ +-- +^warning: ignoring diff --git a/regression/cbmc-library/__builtin_ia32/paddb.c b/regression/cbmc-library/__builtin_ia32/paddb.c new file mode 100644 index 00000000000..37525198d9b --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/paddb.c @@ -0,0 +1,18 @@ +typedef char __gcc_v8qi __attribute__((__vector_size__(8))); +typedef unsigned char __gcc_v8qi_u __attribute__((__vector_size__(8))); +__gcc_v8qi __builtin_ia32_paddb(__gcc_v8qi, __gcc_v8qi); + +int main() +{ + // Exhaustive equivalence: the model must agree with CBMC's own + // vector semantics (native +) for all inputs. 
+ __gcc_v8qi a, b; + __gcc_v8qi r = __builtin_ia32_paddb(a, b); + __gcc_v8qi_u ref = (__gcc_v8qi_u)a + (__gcc_v8qi_u)b; + __CPROVER_assert( + r[0] == (char)ref[0] && r[1] == (char)ref[1] && r[2] == (char)ref[2] && + r[3] == (char)ref[3] && r[4] == (char)ref[4] && r[5] == (char)ref[5] && + r[6] == (char)ref[6] && r[7] == (char)ref[7], + "__builtin_ia32_paddb == native +"); + return 0; +} diff --git a/regression/cbmc-library/__builtin_ia32/paddb.desc b/regression/cbmc-library/__builtin_ia32/paddb.desc new file mode 100644 index 00000000000..70f3ddd3ef9 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/paddb.desc @@ -0,0 +1,8 @@ +CORE gcc-only +paddb.c + +^EXIT=0$ +^SIGNAL=0$ +^VERIFICATION SUCCESSFUL$ +-- +^warning: ignoring diff --git a/regression/cbmc-library/__builtin_ia32/paddb128.c b/regression/cbmc-library/__builtin_ia32/paddb128.c new file mode 100644 index 00000000000..402a19e35b9 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/paddb128.c @@ -0,0 +1,22 @@ +typedef char __gcc_v16qi __attribute__((__vector_size__(16))); +typedef unsigned char __gcc_v16qi_u __attribute__((__vector_size__(16))); +__gcc_v16qi __builtin_ia32_paddb128(__gcc_v16qi, __gcc_v16qi); + +int main() +{ + // Exhaustive equivalence: the model must agree with CBMC's own + // vector semantics (native +) for all inputs. 
+ __gcc_v16qi a, b; + __gcc_v16qi r = __builtin_ia32_paddb128(a, b); + __gcc_v16qi_u ref = (__gcc_v16qi_u)a + (__gcc_v16qi_u)b; + __CPROVER_assert( + r[0] == (char)ref[0] && r[1] == (char)ref[1] && r[2] == (char)ref[2] && + r[3] == (char)ref[3] && r[4] == (char)ref[4] && r[5] == (char)ref[5] && + r[6] == (char)ref[6] && r[7] == (char)ref[7] && r[8] == (char)ref[8] && + r[9] == (char)ref[9] && r[10] == (char)ref[10] && + r[11] == (char)ref[11] && r[12] == (char)ref[12] && + r[13] == (char)ref[13] && r[14] == (char)ref[14] && + r[15] == (char)ref[15], + "__builtin_ia32_paddb128 == native +"); + return 0; +} diff --git a/regression/cbmc-library/__builtin_ia32/paddb128.desc b/regression/cbmc-library/__builtin_ia32/paddb128.desc new file mode 100644 index 00000000000..06c7d429816 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/paddb128.desc @@ -0,0 +1,8 @@ +CORE gcc-only +paddb128.c + +^EXIT=0$ +^SIGNAL=0$ +^VERIFICATION SUCCESSFUL$ +-- +^warning: ignoring diff --git a/regression/cbmc-library/__builtin_ia32/paddb256.c b/regression/cbmc-library/__builtin_ia32/paddb256.c new file mode 100644 index 00000000000..ab11bef4413 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/paddb256.c @@ -0,0 +1,30 @@ +typedef char __gcc_v32qi __attribute__((__vector_size__(32))); +typedef unsigned char __gcc_v32qi_u __attribute__((__vector_size__(32))); +__gcc_v32qi __builtin_ia32_paddb256(__gcc_v32qi, __gcc_v32qi); + +int main() +{ + // Exhaustive equivalence: the model must agree with CBMC's own + // vector semantics (native +) for all inputs. 
+ __gcc_v32qi a, b; + __gcc_v32qi r = __builtin_ia32_paddb256(a, b); + __gcc_v32qi_u ref = (__gcc_v32qi_u)a + (__gcc_v32qi_u)b; + __CPROVER_assert( + r[0] == (char)ref[0] && r[1] == (char)ref[1] && r[2] == (char)ref[2] && + r[3] == (char)ref[3] && r[4] == (char)ref[4] && r[5] == (char)ref[5] && + r[6] == (char)ref[6] && r[7] == (char)ref[7] && r[8] == (char)ref[8] && + r[9] == (char)ref[9] && r[10] == (char)ref[10] && + r[11] == (char)ref[11] && r[12] == (char)ref[12] && + r[13] == (char)ref[13] && r[14] == (char)ref[14] && + r[15] == (char)ref[15] && r[16] == (char)ref[16] && + r[17] == (char)ref[17] && r[18] == (char)ref[18] && + r[19] == (char)ref[19] && r[20] == (char)ref[20] && + r[21] == (char)ref[21] && r[22] == (char)ref[22] && + r[23] == (char)ref[23] && r[24] == (char)ref[24] && + r[25] == (char)ref[25] && r[26] == (char)ref[26] && + r[27] == (char)ref[27] && r[28] == (char)ref[28] && + r[29] == (char)ref[29] && r[30] == (char)ref[30] && + r[31] == (char)ref[31], + "__builtin_ia32_paddb256 == native +"); + return 0; +} diff --git a/regression/cbmc-library/__builtin_ia32/paddb256.desc b/regression/cbmc-library/__builtin_ia32/paddb256.desc new file mode 100644 index 00000000000..ab5d46ddab9 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/paddb256.desc @@ -0,0 +1,8 @@ +CORE gcc-only +paddb256.c + +^EXIT=0$ +^SIGNAL=0$ +^VERIFICATION SUCCESSFUL$ +-- +^warning: ignoring diff --git a/regression/cbmc-library/__builtin_ia32/paddd.c b/regression/cbmc-library/__builtin_ia32/paddd.c new file mode 100644 index 00000000000..92b8f1afe75 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/paddd.c @@ -0,0 +1,16 @@ +typedef int __gcc_v2si __attribute__((__vector_size__(8))); +typedef unsigned int __gcc_v2si_u __attribute__((__vector_size__(8))); +__gcc_v2si __builtin_ia32_paddd(__gcc_v2si, __gcc_v2si); + +int main() +{ + // Exhaustive equivalence: the model must agree with CBMC's own + // vector semantics (native +) for all inputs. 
+ __gcc_v2si a, b; + __gcc_v2si r = __builtin_ia32_paddd(a, b); + __gcc_v2si_u ref = (__gcc_v2si_u)a + (__gcc_v2si_u)b; + __CPROVER_assert( + r[0] == (int)ref[0] && r[1] == (int)ref[1], + "__builtin_ia32_paddd == native +"); + return 0; +} diff --git a/regression/cbmc-library/__builtin_ia32/paddd.desc b/regression/cbmc-library/__builtin_ia32/paddd.desc new file mode 100644 index 00000000000..47a5730bb91 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/paddd.desc @@ -0,0 +1,8 @@ +CORE gcc-only +paddd.c + +^EXIT=0$ +^SIGNAL=0$ +^VERIFICATION SUCCESSFUL$ +-- +^warning: ignoring diff --git a/regression/cbmc-library/__builtin_ia32/paddd128.c b/regression/cbmc-library/__builtin_ia32/paddd128.c new file mode 100644 index 00000000000..ed3ba7ae970 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/paddd128.c @@ -0,0 +1,17 @@ +typedef int __gcc_v4si __attribute__((__vector_size__(16))); +typedef unsigned int __gcc_v4si_u __attribute__((__vector_size__(16))); +__gcc_v4si __builtin_ia32_paddd128(__gcc_v4si, __gcc_v4si); + +int main() +{ + // Exhaustive equivalence: the model must agree with CBMC's own + // vector semantics (native +) for all inputs. 
+ __gcc_v4si a, b; + __gcc_v4si r = __builtin_ia32_paddd128(a, b); + __gcc_v4si_u ref = (__gcc_v4si_u)a + (__gcc_v4si_u)b; + __CPROVER_assert( + r[0] == (int)ref[0] && r[1] == (int)ref[1] && r[2] == (int)ref[2] && + r[3] == (int)ref[3], + "__builtin_ia32_paddd128 == native +"); + return 0; +} diff --git a/regression/cbmc-library/__builtin_ia32/paddd128.desc b/regression/cbmc-library/__builtin_ia32/paddd128.desc new file mode 100644 index 00000000000..9ae4f9754bf --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/paddd128.desc @@ -0,0 +1,8 @@ +CORE gcc-only +paddd128.c + +^EXIT=0$ +^SIGNAL=0$ +^VERIFICATION SUCCESSFUL$ +-- +^warning: ignoring diff --git a/regression/cbmc-library/__builtin_ia32/paddd128_mask.c b/regression/cbmc-library/__builtin_ia32/paddd128_mask.c new file mode 100644 index 00000000000..0481c42ebf5 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/paddd128_mask.c @@ -0,0 +1,15 @@ +typedef int __gcc_v4si __attribute__((__vector_size__(16))); +__gcc_v4si +__builtin_ia32_paddd128_mask(__gcc_v4si, __gcc_v4si, __gcc_v4si, unsigned char); + +int main() +{ + __gcc_v4si a = {1, 1, 1, 1}; + __gcc_v4si b = {2, 2, 2, 2}; + __gcc_v4si src = {9, 9, 9, 9}; + // Mask 0x5: bits 0 and 2 set -> a+b (3); lanes 1 and 3 keep the source (9). 
+ __gcc_v4si r = __builtin_ia32_paddd128_mask(a, b, src, 0x5); + __CPROVER_assert( + r[0] == 3 && r[1] == 9 && r[2] == 3 && r[3] == 9, "paddd128 merge-masked"); + return 0; +} diff --git a/regression/cbmc-library/__builtin_ia32/paddd128_mask.desc b/regression/cbmc-library/__builtin_ia32/paddd128_mask.desc new file mode 100644 index 00000000000..caba91848eb --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/paddd128_mask.desc @@ -0,0 +1,8 @@ +CORE gcc-only +paddd128_mask.c + +^EXIT=0$ +^SIGNAL=0$ +^VERIFICATION SUCCESSFUL$ +-- +^warning: ignoring diff --git a/regression/cbmc-library/__builtin_ia32/paddd256.c b/regression/cbmc-library/__builtin_ia32/paddd256.c new file mode 100644 index 00000000000..38bb6aec7bd --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/paddd256.c @@ -0,0 +1,18 @@ +typedef int __gcc_v8si __attribute__((__vector_size__(32))); +typedef unsigned int __gcc_v8si_u __attribute__((__vector_size__(32))); +__gcc_v8si __builtin_ia32_paddd256(__gcc_v8si, __gcc_v8si); + +int main() +{ + // Exhaustive equivalence: the model must agree with CBMC's own + // vector semantics (native +) for all inputs. 
+ __gcc_v8si a, b; + __gcc_v8si r = __builtin_ia32_paddd256(a, b); + __gcc_v8si_u ref = (__gcc_v8si_u)a + (__gcc_v8si_u)b; + __CPROVER_assert( + r[0] == (int)ref[0] && r[1] == (int)ref[1] && r[2] == (int)ref[2] && + r[3] == (int)ref[3] && r[4] == (int)ref[4] && r[5] == (int)ref[5] && + r[6] == (int)ref[6] && r[7] == (int)ref[7], + "__builtin_ia32_paddd256 == native +"); + return 0; +} diff --git a/regression/cbmc-library/__builtin_ia32/paddd256.desc b/regression/cbmc-library/__builtin_ia32/paddd256.desc new file mode 100644 index 00000000000..a7794dd351d --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/paddd256.desc @@ -0,0 +1,8 @@ +CORE gcc-only +paddd256.c + +^EXIT=0$ +^SIGNAL=0$ +^VERIFICATION SUCCESSFUL$ +-- +^warning: ignoring diff --git a/regression/cbmc-library/__builtin_ia32/paddd512_mask.c b/regression/cbmc-library/__builtin_ia32/paddd512_mask.c new file mode 100644 index 00000000000..99066b2e14f --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/paddd512_mask.c @@ -0,0 +1,20 @@ +typedef int __gcc_v16si __attribute__((__vector_size__(64))); +__gcc_v16si __builtin_ia32_paddd512_mask( + __gcc_v16si, + __gcc_v16si, + __gcc_v16si, + unsigned short); + +int main() +{ + __gcc_v16si a = {1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1}; + __gcc_v16si b = {2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2}; + __gcc_v16si src = {9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9}; + // Mask 0x0005: bits 0 and 2 set -> those lanes get a+b (3), the rest keep + // the merge source (9). 
+ __gcc_v16si r = __builtin_ia32_paddd512_mask(a, b, src, 0x0005); + __CPROVER_assert( + r[0] == 3 && r[1] == 9 && r[2] == 3 && r[3] == 9 && r[15] == 9, + "paddd512 merge-masked add"); + return 0; +} diff --git a/regression/cbmc-library/__builtin_ia32/paddd512_mask.desc b/regression/cbmc-library/__builtin_ia32/paddd512_mask.desc new file mode 100644 index 00000000000..a33660c2449 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/paddd512_mask.desc @@ -0,0 +1,8 @@ +CORE gcc-only +paddd512_mask.c + +^EXIT=0$ +^SIGNAL=0$ +^VERIFICATION SUCCESSFUL$ +-- +^warning: ignoring diff --git a/regression/cbmc-library/__builtin_ia32/paddq128.c b/regression/cbmc-library/__builtin_ia32/paddq128.c new file mode 100644 index 00000000000..1146ad24d6f --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/paddq128.c @@ -0,0 +1,16 @@ +typedef long long __gcc_v2di __attribute__((__vector_size__(16))); +typedef unsigned long long __gcc_v2di_u __attribute__((__vector_size__(16))); +__gcc_v2di __builtin_ia32_paddq128(__gcc_v2di, __gcc_v2di); + +int main() +{ + // Exhaustive equivalence: the model must agree with CBMC's own + // vector semantics (native +) for all inputs. 
+ __gcc_v2di a, b; + __gcc_v2di r = __builtin_ia32_paddq128(a, b); + __gcc_v2di_u ref = (__gcc_v2di_u)a + (__gcc_v2di_u)b; + __CPROVER_assert( + r[0] == (long long)ref[0] && r[1] == (long long)ref[1], + "__builtin_ia32_paddq128 == native +"); + return 0; +} diff --git a/regression/cbmc-library/__builtin_ia32/paddq128.desc b/regression/cbmc-library/__builtin_ia32/paddq128.desc new file mode 100644 index 00000000000..de23d1d1065 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/paddq128.desc @@ -0,0 +1,8 @@ +CORE gcc-only +paddq128.c + +^EXIT=0$ +^SIGNAL=0$ +^VERIFICATION SUCCESSFUL$ +-- +^warning: ignoring diff --git a/regression/cbmc-library/__builtin_ia32/paddq256.c b/regression/cbmc-library/__builtin_ia32/paddq256.c new file mode 100644 index 00000000000..7105833741a --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/paddq256.c @@ -0,0 +1,17 @@ +typedef long long __gcc_v4di __attribute__((__vector_size__(32))); +typedef unsigned long long __gcc_v4di_u __attribute__((__vector_size__(32))); +__gcc_v4di __builtin_ia32_paddq256(__gcc_v4di, __gcc_v4di); + +int main() +{ + // Exhaustive equivalence: the model must agree with CBMC's own + // vector semantics (native +) for all inputs. 
+ __gcc_v4di a, b; + __gcc_v4di r = __builtin_ia32_paddq256(a, b); + __gcc_v4di_u ref = (__gcc_v4di_u)a + (__gcc_v4di_u)b; + __CPROVER_assert( + r[0] == (long long)ref[0] && r[1] == (long long)ref[1] && + r[2] == (long long)ref[2] && r[3] == (long long)ref[3], + "__builtin_ia32_paddq256 == native +"); + return 0; +} diff --git a/regression/cbmc-library/__builtin_ia32/paddq256.desc b/regression/cbmc-library/__builtin_ia32/paddq256.desc new file mode 100644 index 00000000000..98a14966bb6 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/paddq256.desc @@ -0,0 +1,8 @@ +CORE gcc-only +paddq256.c + +^EXIT=0$ +^SIGNAL=0$ +^VERIFICATION SUCCESSFUL$ +-- +^warning: ignoring diff --git a/regression/cbmc-library/__builtin_ia32/paddsb128.c b/regression/cbmc-library/__builtin_ia32/paddsb128.c new file mode 100644 index 00000000000..3fabd00c815 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/paddsb128.c @@ -0,0 +1,14 @@ +typedef char __gcc_v16qi __attribute__((__vector_size__(16))); +__gcc_v16qi __builtin_ia32_paddsb128(__gcc_v16qi, __gcc_v16qi); + +int main() +{ + // Signed saturation: 100+50=150 clamps to 127; -100+-50=-150 clamps to -128. 
+ __gcc_v16qi a = + (__gcc_v16qi){100, -100, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16}; + __gcc_v16qi b = + (__gcc_v16qi){50, -50, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1}; + __gcc_v16qi r = __builtin_ia32_paddsb128(a, b); + __CPROVER_assert(r[0] == 127 && r[1] == -128 && r[2] == 4, "adds epi8"); + return 0; +} diff --git a/regression/cbmc-library/__builtin_ia32/paddsb128.desc b/regression/cbmc-library/__builtin_ia32/paddsb128.desc new file mode 100644 index 00000000000..43e7c7d7f7f --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/paddsb128.desc @@ -0,0 +1,8 @@ +CORE gcc-only +paddsb128.c + +^EXIT=0$ +^SIGNAL=0$ +^VERIFICATION SUCCESSFUL$ +-- +^warning: ignoring diff --git a/regression/cbmc-library/__builtin_ia32/paddsw128.c b/regression/cbmc-library/__builtin_ia32/paddsw128.c new file mode 100644 index 00000000000..34f173e73e3 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/paddsw128.c @@ -0,0 +1,13 @@ +typedef short __gcc_v8hi __attribute__((__vector_size__(16))); +__gcc_v8hi __builtin_ia32_paddsw128(__gcc_v8hi, __gcc_v8hi); + +int main() +{ + // Signed 16-bit saturation: 30000+10000 clamps to 32767; -30000+-10000 to + // -32768. 
+ __gcc_v8hi a = (__gcc_v8hi){30000, -30000, 3, 4, 5, 6, 7, 8}; + __gcc_v8hi b = (__gcc_v8hi){10000, -10000, 1, 1, 1, 1, 1, 1}; + __gcc_v8hi r = __builtin_ia32_paddsw128(a, b); + __CPROVER_assert(r[0] == 32767 && r[1] == -32768 && r[2] == 4, "adds epi16"); + return 0; +} diff --git a/regression/cbmc-library/__builtin_ia32/paddsw128.desc b/regression/cbmc-library/__builtin_ia32/paddsw128.desc new file mode 100644 index 00000000000..394390b5b55 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/paddsw128.desc @@ -0,0 +1,8 @@ +CORE gcc-only +paddsw128.c + +^EXIT=0$ +^SIGNAL=0$ +^VERIFICATION SUCCESSFUL$ +-- +^warning: ignoring diff --git a/regression/cbmc-library/__builtin_ia32/paddusb128.c b/regression/cbmc-library/__builtin_ia32/paddusb128.c new file mode 100644 index 00000000000..782afb36308 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/paddusb128.c @@ -0,0 +1,15 @@ +typedef char __gcc_v16qi __attribute__((__vector_size__(16))); +__gcc_v16qi __builtin_ia32_paddusb128(__gcc_v16qi, __gcc_v16qi); + +int main() +{ + // Unsigned saturation: the bytes 200 and 100 (written as their signed-char + // equivalents) sum to 300, which clamps to 255 == -1 as a signed byte. 
+ __gcc_v16qi a = + (__gcc_v16qi){200, 1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16}; + __gcc_v16qi b = + (__gcc_v16qi){100, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1}; + __gcc_v16qi r = __builtin_ia32_paddusb128(a, b); + __CPROVER_assert(r[0] == -1 && r[1] == 2, "adds epu8"); + return 0; +} diff --git a/regression/cbmc-library/__builtin_ia32/paddusb128.desc b/regression/cbmc-library/__builtin_ia32/paddusb128.desc new file mode 100644 index 00000000000..8b16926d005 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/paddusb128.desc @@ -0,0 +1,8 @@ +CORE gcc-only +paddusb128.c + +^EXIT=0$ +^SIGNAL=0$ +^VERIFICATION SUCCESSFUL$ +-- +^warning: ignoring diff --git a/regression/cbmc-library/__builtin_ia32/paddw.c b/regression/cbmc-library/__builtin_ia32/paddw.c new file mode 100644 index 00000000000..2b33555f7a4 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/paddw.c @@ -0,0 +1,17 @@ +typedef short __gcc_v4hi __attribute__((__vector_size__(8))); +typedef unsigned short __gcc_v4hi_u __attribute__((__vector_size__(8))); +__gcc_v4hi __builtin_ia32_paddw(__gcc_v4hi, __gcc_v4hi); + +int main() +{ + // Exhaustive equivalence: the model must agree with CBMC's own + // vector semantics (native +) for all inputs. 
+ __gcc_v4hi a, b; + __gcc_v4hi r = __builtin_ia32_paddw(a, b); + __gcc_v4hi_u ref = (__gcc_v4hi_u)a + (__gcc_v4hi_u)b; + __CPROVER_assert( + r[0] == (short)ref[0] && r[1] == (short)ref[1] && r[2] == (short)ref[2] && + r[3] == (short)ref[3], + "__builtin_ia32_paddw == native +"); + return 0; +} diff --git a/regression/cbmc-library/__builtin_ia32/paddw.desc b/regression/cbmc-library/__builtin_ia32/paddw.desc new file mode 100644 index 00000000000..4245f6710b8 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/paddw.desc @@ -0,0 +1,8 @@ +CORE gcc-only +paddw.c + +^EXIT=0$ +^SIGNAL=0$ +^VERIFICATION SUCCESSFUL$ +-- +^warning: ignoring diff --git a/regression/cbmc-library/__builtin_ia32/paddw128.c b/regression/cbmc-library/__builtin_ia32/paddw128.c new file mode 100644 index 00000000000..d3a65c4474c --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/paddw128.c @@ -0,0 +1,18 @@ +typedef short __gcc_v8hi __attribute__((__vector_size__(16))); +typedef unsigned short __gcc_v8hi_u __attribute__((__vector_size__(16))); +__gcc_v8hi __builtin_ia32_paddw128(__gcc_v8hi, __gcc_v8hi); + +int main() +{ + // Exhaustive equivalence: the model must agree with CBMC's own + // vector semantics (native +) for all inputs. 
+ __gcc_v8hi a, b; + __gcc_v8hi r = __builtin_ia32_paddw128(a, b); + __gcc_v8hi_u ref = (__gcc_v8hi_u)a + (__gcc_v8hi_u)b; + __CPROVER_assert( + r[0] == (short)ref[0] && r[1] == (short)ref[1] && r[2] == (short)ref[2] && + r[3] == (short)ref[3] && r[4] == (short)ref[4] && r[5] == (short)ref[5] && + r[6] == (short)ref[6] && r[7] == (short)ref[7], + "__builtin_ia32_paddw128 == native +"); + return 0; +} diff --git a/regression/cbmc-library/__builtin_ia32/paddw128.desc b/regression/cbmc-library/__builtin_ia32/paddw128.desc new file mode 100644 index 00000000000..7e83acf4cb3 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/paddw128.desc @@ -0,0 +1,8 @@ +CORE gcc-only +paddw128.c + +^EXIT=0$ +^SIGNAL=0$ +^VERIFICATION SUCCESSFUL$ +-- +^warning: ignoring diff --git a/regression/cbmc-library/__builtin_ia32/paddw256.c b/regression/cbmc-library/__builtin_ia32/paddw256.c new file mode 100644 index 00000000000..e4576b36589 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/paddw256.c @@ -0,0 +1,22 @@ +typedef short __gcc_v16hi __attribute__((__vector_size__(32))); +typedef unsigned short __gcc_v16hi_u __attribute__((__vector_size__(32))); +__gcc_v16hi __builtin_ia32_paddw256(__gcc_v16hi, __gcc_v16hi); + +int main() +{ + // Exhaustive equivalence: the model must agree with CBMC's own + // vector semantics (native +) for all inputs. 
+ __gcc_v16hi a, b; + __gcc_v16hi r = __builtin_ia32_paddw256(a, b); + __gcc_v16hi_u ref = (__gcc_v16hi_u)a + (__gcc_v16hi_u)b; + __CPROVER_assert( + r[0] == (short)ref[0] && r[1] == (short)ref[1] && r[2] == (short)ref[2] && + r[3] == (short)ref[3] && r[4] == (short)ref[4] && r[5] == (short)ref[5] && + r[6] == (short)ref[6] && r[7] == (short)ref[7] && r[8] == (short)ref[8] && + r[9] == (short)ref[9] && r[10] == (short)ref[10] && + r[11] == (short)ref[11] && r[12] == (short)ref[12] && + r[13] == (short)ref[13] && r[14] == (short)ref[14] && + r[15] == (short)ref[15], + "__builtin_ia32_paddw256 == native +"); + return 0; +} diff --git a/regression/cbmc-library/__builtin_ia32/paddw256.desc b/regression/cbmc-library/__builtin_ia32/paddw256.desc new file mode 100644 index 00000000000..9a454eb7f0e --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/paddw256.desc @@ -0,0 +1,8 @@ +CORE gcc-only +paddw256.c + +^EXIT=0$ +^SIGNAL=0$ +^VERIFICATION SUCCESSFUL$ +-- +^warning: ignoring diff --git a/regression/cbmc-library/__builtin_ia32/pand128.c b/regression/cbmc-library/__builtin_ia32/pand128.c new file mode 100644 index 00000000000..e4edad524a0 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/pand128.c @@ -0,0 +1,14 @@ +typedef long long __gcc_v2di __attribute__((__vector_size__(16))); +__gcc_v2di __builtin_ia32_pand128(__gcc_v2di, __gcc_v2di); + +int main() +{ + // Exhaustive equivalence: the model must agree with CBMC's own + // vector semantics (native &) for all inputs. 
+ __gcc_v2di a, b; + __gcc_v2di r = __builtin_ia32_pand128(a, b); + __gcc_v2di ref = a & b; + __CPROVER_assert( + r[0] == ref[0] && r[1] == ref[1], "__builtin_ia32_pand128 == native &"); + return 0; +} diff --git a/regression/cbmc-library/__builtin_ia32/pand128.desc b/regression/cbmc-library/__builtin_ia32/pand128.desc new file mode 100644 index 00000000000..c57840f1078 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/pand128.desc @@ -0,0 +1,8 @@ +CORE gcc-only +pand128.c + +^EXIT=0$ +^SIGNAL=0$ +^VERIFICATION SUCCESSFUL$ +-- +^warning: ignoring diff --git a/regression/cbmc-library/__builtin_ia32/pandn128.c b/regression/cbmc-library/__builtin_ia32/pandn128.c new file mode 100644 index 00000000000..15182358781 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/pandn128.c @@ -0,0 +1,15 @@ +typedef long long __gcc_v2di __attribute__((__vector_size__(16))); +__gcc_v2di __builtin_ia32_pandn128(__gcc_v2di, __gcc_v2di); + +int main() +{ + // Exhaustive equivalence: the model must agree with CBMC's own + // vector semantics (native ~a & b) for all inputs. 
+ __gcc_v2di a, b; + __gcc_v2di r = __builtin_ia32_pandn128(a, b); + __gcc_v2di ref = ~a & b; + __CPROVER_assert( + r[0] == ref[0] && r[1] == ref[1], + "__builtin_ia32_pandn128 == native ~a & b"); + return 0; +} diff --git a/regression/cbmc-library/__builtin_ia32/pandn128.desc b/regression/cbmc-library/__builtin_ia32/pandn128.desc new file mode 100644 index 00000000000..3bbccccbef0 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/pandn128.desc @@ -0,0 +1,8 @@ +CORE gcc-only +pandn128.c + +^EXIT=0$ +^SIGNAL=0$ +^VERIFICATION SUCCESSFUL$ +-- +^warning: ignoring diff --git a/regression/cbmc-library/__builtin_ia32/pavgb128.c b/regression/cbmc-library/__builtin_ia32/pavgb128.c new file mode 100644 index 00000000000..28bdb854dda --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/pavgb128.c @@ -0,0 +1,15 @@ +typedef char __gcc_v16qi __attribute__((__vector_size__(16))); +__gcc_v16qi __builtin_ia32_pavgb128(__gcc_v16qi, __gcc_v16qi); + +int main() +{ + // Lane 0 distinguishes unsigned from signed: -1 is 255 unsigned, so the + // rounded unsigned average of {255, 1} is (255 + 1 + 1) >> 1 == 128, which + // is -128 as a signed byte (a signed average would give 0). 
+ __gcc_v16qi a = + (__gcc_v16qi){-1, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32}; + __gcc_v16qi b = (__gcc_v16qi){1, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4}; + __gcc_v16qi r = __builtin_ia32_pavgb128(a, b); + __CPROVER_assert(r[0] == -128 && r[1] == 4, "avg epu8"); + return 0; +} diff --git a/regression/cbmc-library/__builtin_ia32/pavgb128.desc b/regression/cbmc-library/__builtin_ia32/pavgb128.desc new file mode 100644 index 00000000000..4031b7619c2 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/pavgb128.desc @@ -0,0 +1,8 @@ +CORE gcc-only +pavgb128.c + +^EXIT=0$ +^SIGNAL=0$ +^VERIFICATION SUCCESSFUL$ +-- +^warning: ignoring diff --git a/regression/cbmc-library/__builtin_ia32/pavgw128.c b/regression/cbmc-library/__builtin_ia32/pavgw128.c new file mode 100644 index 00000000000..de49e563f75 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/pavgw128.c @@ -0,0 +1,14 @@ +typedef short __gcc_v8hi __attribute__((__vector_size__(16))); +__gcc_v8hi __builtin_ia32_pavgw128(__gcc_v8hi, __gcc_v8hi); + +int main() +{ + // Lane 0 distinguishes unsigned from signed: -1 is 65535 unsigned, so the + // rounded unsigned average of {65535, 1} is (65535 + 1 + 1) >> 1 == 32768, + // which is -32768 as a signed 16-bit value (a signed average would give 0). 
+ __gcc_v8hi a = (__gcc_v8hi){-1, 4, 6, 8, 10, 12, 14, 16}; + __gcc_v8hi b = (__gcc_v8hi){1, 4, 4, 4, 4, 4, 4, 4}; + __gcc_v8hi r = __builtin_ia32_pavgw128(a, b); + __CPROVER_assert(r[0] == -32768 && r[1] == 4, "avg epu16"); + return 0; +} diff --git a/regression/cbmc-library/__builtin_ia32/pavgw128.desc b/regression/cbmc-library/__builtin_ia32/pavgw128.desc new file mode 100644 index 00000000000..445c4ef3170 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/pavgw128.desc @@ -0,0 +1,8 @@ +CORE gcc-only +pavgw128.c + +^EXIT=0$ +^SIGNAL=0$ +^VERIFICATION SUCCESSFUL$ +-- +^warning: ignoring diff --git a/regression/cbmc-library/__builtin_ia32/pcmpeqb128.c b/regression/cbmc-library/__builtin_ia32/pcmpeqb128.c new file mode 100644 index 00000000000..f309031f310 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/pcmpeqb128.c @@ -0,0 +1,19 @@ +typedef char __gcc_v16qi __attribute__((__vector_size__(16))); +__gcc_v16qi __builtin_ia32_pcmpeqb128(__gcc_v16qi, __gcc_v16qi); + +int main() +{ + // Exhaustive equivalence: the model must agree with CBMC's own + // vector semantics (native ==) for all inputs. 
+ __gcc_v16qi a, b; + __gcc_v16qi r = __builtin_ia32_pcmpeqb128(a, b); + __gcc_v16qi ref = a == b; + __CPROVER_assert( + r[0] == ref[0] && r[1] == ref[1] && r[2] == ref[2] && r[3] == ref[3] && + r[4] == ref[4] && r[5] == ref[5] && r[6] == ref[6] && r[7] == ref[7] && + r[8] == ref[8] && r[9] == ref[9] && r[10] == ref[10] && + r[11] == ref[11] && r[12] == ref[12] && r[13] == ref[13] && + r[14] == ref[14] && r[15] == ref[15], + "__builtin_ia32_pcmpeqb128 == native =="); + return 0; +} diff --git a/regression/cbmc-library/__builtin_ia32/pcmpeqb128.desc b/regression/cbmc-library/__builtin_ia32/pcmpeqb128.desc new file mode 100644 index 00000000000..375804c4553 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/pcmpeqb128.desc @@ -0,0 +1,8 @@ +CORE gcc-only +pcmpeqb128.c + +^EXIT=0$ +^SIGNAL=0$ +^VERIFICATION SUCCESSFUL$ +-- +^warning: ignoring diff --git a/regression/cbmc-library/__builtin_ia32/pcmpeqb256.c b/regression/cbmc-library/__builtin_ia32/pcmpeqb256.c new file mode 100644 index 00000000000..a5a3d10a968 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/pcmpeqb256.c @@ -0,0 +1,24 @@ +typedef char __gcc_v32qi __attribute__((__vector_size__(32))); +__gcc_v32qi __builtin_ia32_pcmpeqb256(__gcc_v32qi, __gcc_v32qi); + +int main() +{ + // Exhaustive equivalence: the model must agree with CBMC's own + // vector semantics (native ==) for all inputs. 
+ __gcc_v32qi a, b; + __gcc_v32qi r = __builtin_ia32_pcmpeqb256(a, b); + __gcc_v32qi ref = a == b; + __CPROVER_assert( + r[0] == ref[0] && r[1] == ref[1] && r[2] == ref[2] && r[3] == ref[3] && + r[4] == ref[4] && r[5] == ref[5] && r[6] == ref[6] && r[7] == ref[7] && + r[8] == ref[8] && r[9] == ref[9] && r[10] == ref[10] && + r[11] == ref[11] && r[12] == ref[12] && r[13] == ref[13] && + r[14] == ref[14] && r[15] == ref[15] && r[16] == ref[16] && + r[17] == ref[17] && r[18] == ref[18] && r[19] == ref[19] && + r[20] == ref[20] && r[21] == ref[21] && r[22] == ref[22] && + r[23] == ref[23] && r[24] == ref[24] && r[25] == ref[25] && + r[26] == ref[26] && r[27] == ref[27] && r[28] == ref[28] && + r[29] == ref[29] && r[30] == ref[30] && r[31] == ref[31], + "__builtin_ia32_pcmpeqb256 == native =="); + return 0; +} diff --git a/regression/cbmc-library/__builtin_ia32/pcmpeqb256.desc b/regression/cbmc-library/__builtin_ia32/pcmpeqb256.desc new file mode 100644 index 00000000000..424832d97a0 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/pcmpeqb256.desc @@ -0,0 +1,8 @@ +CORE gcc-only +pcmpeqb256.c + +^EXIT=0$ +^SIGNAL=0$ +^VERIFICATION SUCCESSFUL$ +-- +^warning: ignoring diff --git a/regression/cbmc-library/__builtin_ia32/pcmpeqd128.c b/regression/cbmc-library/__builtin_ia32/pcmpeqd128.c new file mode 100644 index 00000000000..7c725c15988 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/pcmpeqd128.c @@ -0,0 +1,15 @@ +typedef int __gcc_v4si __attribute__((__vector_size__(16))); +__gcc_v4si __builtin_ia32_pcmpeqd128(__gcc_v4si, __gcc_v4si); + +int main() +{ + // Exhaustive equivalence: the model must agree with CBMC's own + // vector semantics (native ==) for all inputs. 
+ __gcc_v4si a, b; + __gcc_v4si r = __builtin_ia32_pcmpeqd128(a, b); + __gcc_v4si ref = a == b; + __CPROVER_assert( + r[0] == ref[0] && r[1] == ref[1] && r[2] == ref[2] && r[3] == ref[3], + "__builtin_ia32_pcmpeqd128 == native =="); + return 0; +} diff --git a/regression/cbmc-library/__builtin_ia32/pcmpeqd128.desc b/regression/cbmc-library/__builtin_ia32/pcmpeqd128.desc new file mode 100644 index 00000000000..6380ab8ccc9 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/pcmpeqd128.desc @@ -0,0 +1,8 @@ +CORE gcc-only +pcmpeqd128.c + +^EXIT=0$ +^SIGNAL=0$ +^VERIFICATION SUCCESSFUL$ +-- +^warning: ignoring diff --git a/regression/cbmc-library/__builtin_ia32/pcmpeqd256.c b/regression/cbmc-library/__builtin_ia32/pcmpeqd256.c new file mode 100644 index 00000000000..139f9b6c503 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/pcmpeqd256.c @@ -0,0 +1,16 @@ +typedef int __gcc_v8si __attribute__((__vector_size__(32))); +__gcc_v8si __builtin_ia32_pcmpeqd256(__gcc_v8si, __gcc_v8si); + +int main() +{ + // Exhaustive equivalence: the model must agree with CBMC's own + // vector semantics (native ==) for all inputs. 
+ __gcc_v8si a, b; + __gcc_v8si r = __builtin_ia32_pcmpeqd256(a, b); + __gcc_v8si ref = a == b; + __CPROVER_assert( + r[0] == ref[0] && r[1] == ref[1] && r[2] == ref[2] && r[3] == ref[3] && + r[4] == ref[4] && r[5] == ref[5] && r[6] == ref[6] && r[7] == ref[7], + "__builtin_ia32_pcmpeqd256 == native =="); + return 0; +} diff --git a/regression/cbmc-library/__builtin_ia32/pcmpeqd256.desc b/regression/cbmc-library/__builtin_ia32/pcmpeqd256.desc new file mode 100644 index 00000000000..fdc80820996 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/pcmpeqd256.desc @@ -0,0 +1,8 @@ +CORE gcc-only +pcmpeqd256.c + +^EXIT=0$ +^SIGNAL=0$ +^VERIFICATION SUCCESSFUL$ +-- +^warning: ignoring diff --git a/regression/cbmc-library/__builtin_ia32/pcmpeqw128.c b/regression/cbmc-library/__builtin_ia32/pcmpeqw128.c new file mode 100644 index 00000000000..669bb6d0b66 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/pcmpeqw128.c @@ -0,0 +1,16 @@ +typedef short __gcc_v8hi __attribute__((__vector_size__(16))); +__gcc_v8hi __builtin_ia32_pcmpeqw128(__gcc_v8hi, __gcc_v8hi); + +int main() +{ + // Exhaustive equivalence: the model must agree with CBMC's own + // vector semantics (native ==) for all inputs. 
+ __gcc_v8hi a, b; + __gcc_v8hi r = __builtin_ia32_pcmpeqw128(a, b); + __gcc_v8hi ref = a == b; + __CPROVER_assert( + r[0] == ref[0] && r[1] == ref[1] && r[2] == ref[2] && r[3] == ref[3] && + r[4] == ref[4] && r[5] == ref[5] && r[6] == ref[6] && r[7] == ref[7], + "__builtin_ia32_pcmpeqw128 == native =="); + return 0; +} diff --git a/regression/cbmc-library/__builtin_ia32/pcmpeqw128.desc b/regression/cbmc-library/__builtin_ia32/pcmpeqw128.desc new file mode 100644 index 00000000000..c61fe25ed97 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/pcmpeqw128.desc @@ -0,0 +1,8 @@ +CORE gcc-only +pcmpeqw128.c + +^EXIT=0$ +^SIGNAL=0$ +^VERIFICATION SUCCESSFUL$ +-- +^warning: ignoring diff --git a/regression/cbmc-library/__builtin_ia32/pcmpeqw256.c b/regression/cbmc-library/__builtin_ia32/pcmpeqw256.c new file mode 100644 index 00000000000..39d72b8e0f9 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/pcmpeqw256.c @@ -0,0 +1,19 @@ +typedef short __gcc_v16hi __attribute__((__vector_size__(32))); +__gcc_v16hi __builtin_ia32_pcmpeqw256(__gcc_v16hi, __gcc_v16hi); + +int main() +{ + // Exhaustive equivalence: the model must agree with CBMC's own + // vector semantics (native ==) for all inputs. 
+ __gcc_v16hi a, b; + __gcc_v16hi r = __builtin_ia32_pcmpeqw256(a, b); + __gcc_v16hi ref = a == b; + __CPROVER_assert( + r[0] == ref[0] && r[1] == ref[1] && r[2] == ref[2] && r[3] == ref[3] && + r[4] == ref[4] && r[5] == ref[5] && r[6] == ref[6] && r[7] == ref[7] && + r[8] == ref[8] && r[9] == ref[9] && r[10] == ref[10] && + r[11] == ref[11] && r[12] == ref[12] && r[13] == ref[13] && + r[14] == ref[14] && r[15] == ref[15], + "__builtin_ia32_pcmpeqw256 == native =="); + return 0; +} diff --git a/regression/cbmc-library/__builtin_ia32/pcmpeqw256.desc b/regression/cbmc-library/__builtin_ia32/pcmpeqw256.desc new file mode 100644 index 00000000000..15a3db92b38 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/pcmpeqw256.desc @@ -0,0 +1,8 @@ +CORE gcc-only +pcmpeqw256.c + +^EXIT=0$ +^SIGNAL=0$ +^VERIFICATION SUCCESSFUL$ +-- +^warning: ignoring diff --git a/regression/cbmc-library/__builtin_ia32/pcmpgtb128.c b/regression/cbmc-library/__builtin_ia32/pcmpgtb128.c new file mode 100644 index 00000000000..d77ce9868f5 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/pcmpgtb128.c @@ -0,0 +1,19 @@ +typedef char __gcc_v16qi __attribute__((__vector_size__(16))); +__gcc_v16qi __builtin_ia32_pcmpgtb128(__gcc_v16qi, __gcc_v16qi); + +int main() +{ + // Exhaustive equivalence: the model must agree with CBMC's own + // vector semantics (native >) for all inputs. 
+ __gcc_v16qi a, b; + __gcc_v16qi r = __builtin_ia32_pcmpgtb128(a, b); + __gcc_v16qi ref = a > b; + __CPROVER_assert( + r[0] == ref[0] && r[1] == ref[1] && r[2] == ref[2] && r[3] == ref[3] && + r[4] == ref[4] && r[5] == ref[5] && r[6] == ref[6] && r[7] == ref[7] && + r[8] == ref[8] && r[9] == ref[9] && r[10] == ref[10] && + r[11] == ref[11] && r[12] == ref[12] && r[13] == ref[13] && + r[14] == ref[14] && r[15] == ref[15], + "__builtin_ia32_pcmpgtb128 == native >"); + return 0; +} diff --git a/regression/cbmc-library/__builtin_ia32/pcmpgtb128.desc b/regression/cbmc-library/__builtin_ia32/pcmpgtb128.desc new file mode 100644 index 00000000000..68134593e5a --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/pcmpgtb128.desc @@ -0,0 +1,8 @@ +CORE gcc-only +pcmpgtb128.c + +^EXIT=0$ +^SIGNAL=0$ +^VERIFICATION SUCCESSFUL$ +-- +^warning: ignoring diff --git a/regression/cbmc-library/__builtin_ia32/pcmpgtb256.c b/regression/cbmc-library/__builtin_ia32/pcmpgtb256.c new file mode 100644 index 00000000000..e0a35f5f8e4 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/pcmpgtb256.c @@ -0,0 +1,24 @@ +typedef char __gcc_v32qi __attribute__((__vector_size__(32))); +__gcc_v32qi __builtin_ia32_pcmpgtb256(__gcc_v32qi, __gcc_v32qi); + +int main() +{ + // Exhaustive equivalence: the model must agree with CBMC's own + // vector semantics (native >) for all inputs. 
+ __gcc_v32qi a, b; + __gcc_v32qi r = __builtin_ia32_pcmpgtb256(a, b); + __gcc_v32qi ref = a > b; + __CPROVER_assert( + r[0] == ref[0] && r[1] == ref[1] && r[2] == ref[2] && r[3] == ref[3] && + r[4] == ref[4] && r[5] == ref[5] && r[6] == ref[6] && r[7] == ref[7] && + r[8] == ref[8] && r[9] == ref[9] && r[10] == ref[10] && + r[11] == ref[11] && r[12] == ref[12] && r[13] == ref[13] && + r[14] == ref[14] && r[15] == ref[15] && r[16] == ref[16] && + r[17] == ref[17] && r[18] == ref[18] && r[19] == ref[19] && + r[20] == ref[20] && r[21] == ref[21] && r[22] == ref[22] && + r[23] == ref[23] && r[24] == ref[24] && r[25] == ref[25] && + r[26] == ref[26] && r[27] == ref[27] && r[28] == ref[28] && + r[29] == ref[29] && r[30] == ref[30] && r[31] == ref[31], + "__builtin_ia32_pcmpgtb256 == native >"); + return 0; +} diff --git a/regression/cbmc-library/__builtin_ia32/pcmpgtb256.desc b/regression/cbmc-library/__builtin_ia32/pcmpgtb256.desc new file mode 100644 index 00000000000..efe7db4e7a9 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/pcmpgtb256.desc @@ -0,0 +1,8 @@ +CORE gcc-only +pcmpgtb256.c + +^EXIT=0$ +^SIGNAL=0$ +^VERIFICATION SUCCESSFUL$ +-- +^warning: ignoring diff --git a/regression/cbmc-library/__builtin_ia32/pcmpgtd128.c b/regression/cbmc-library/__builtin_ia32/pcmpgtd128.c new file mode 100644 index 00000000000..4f292a46afa --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/pcmpgtd128.c @@ -0,0 +1,15 @@ +typedef int __gcc_v4si __attribute__((__vector_size__(16))); +__gcc_v4si __builtin_ia32_pcmpgtd128(__gcc_v4si, __gcc_v4si); + +int main() +{ + // Exhaustive equivalence: the model must agree with CBMC's own + // vector semantics (native >) for all inputs. 
+ __gcc_v4si a, b; + __gcc_v4si r = __builtin_ia32_pcmpgtd128(a, b); + __gcc_v4si ref = a > b; + __CPROVER_assert( + r[0] == ref[0] && r[1] == ref[1] && r[2] == ref[2] && r[3] == ref[3], + "__builtin_ia32_pcmpgtd128 == native >"); + return 0; +} diff --git a/regression/cbmc-library/__builtin_ia32/pcmpgtd128.desc b/regression/cbmc-library/__builtin_ia32/pcmpgtd128.desc new file mode 100644 index 00000000000..c98acf38b12 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/pcmpgtd128.desc @@ -0,0 +1,8 @@ +CORE gcc-only +pcmpgtd128.c + +^EXIT=0$ +^SIGNAL=0$ +^VERIFICATION SUCCESSFUL$ +-- +^warning: ignoring diff --git a/regression/cbmc-library/__builtin_ia32/pcmpgtd256.c b/regression/cbmc-library/__builtin_ia32/pcmpgtd256.c new file mode 100644 index 00000000000..fdc03173c57 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/pcmpgtd256.c @@ -0,0 +1,16 @@ +typedef int __gcc_v8si __attribute__((__vector_size__(32))); +__gcc_v8si __builtin_ia32_pcmpgtd256(__gcc_v8si, __gcc_v8si); + +int main() +{ + // Exhaustive equivalence: the model must agree with CBMC's own + // vector semantics (native >) for all inputs. 
+ __gcc_v8si a, b; + __gcc_v8si r = __builtin_ia32_pcmpgtd256(a, b); + __gcc_v8si ref = a > b; + __CPROVER_assert( + r[0] == ref[0] && r[1] == ref[1] && r[2] == ref[2] && r[3] == ref[3] && + r[4] == ref[4] && r[5] == ref[5] && r[6] == ref[6] && r[7] == ref[7], + "__builtin_ia32_pcmpgtd256 == native >"); + return 0; +} diff --git a/regression/cbmc-library/__builtin_ia32/pcmpgtd256.desc b/regression/cbmc-library/__builtin_ia32/pcmpgtd256.desc new file mode 100644 index 00000000000..40838c745af --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/pcmpgtd256.desc @@ -0,0 +1,8 @@ +CORE gcc-only +pcmpgtd256.c + +^EXIT=0$ +^SIGNAL=0$ +^VERIFICATION SUCCESSFUL$ +-- +^warning: ignoring diff --git a/regression/cbmc-library/__builtin_ia32/pcmpgtw128.c b/regression/cbmc-library/__builtin_ia32/pcmpgtw128.c new file mode 100644 index 00000000000..a6e1a85a5c1 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/pcmpgtw128.c @@ -0,0 +1,16 @@ +typedef short __gcc_v8hi __attribute__((__vector_size__(16))); +__gcc_v8hi __builtin_ia32_pcmpgtw128(__gcc_v8hi, __gcc_v8hi); + +int main() +{ + // Exhaustive equivalence: the model must agree with CBMC's own + // vector semantics (native >) for all inputs. 
+ __gcc_v8hi a, b; + __gcc_v8hi r = __builtin_ia32_pcmpgtw128(a, b); + __gcc_v8hi ref = a > b; + __CPROVER_assert( + r[0] == ref[0] && r[1] == ref[1] && r[2] == ref[2] && r[3] == ref[3] && + r[4] == ref[4] && r[5] == ref[5] && r[6] == ref[6] && r[7] == ref[7], + "__builtin_ia32_pcmpgtw128 == native >"); + return 0; +} diff --git a/regression/cbmc-library/__builtin_ia32/pcmpgtw128.desc b/regression/cbmc-library/__builtin_ia32/pcmpgtw128.desc new file mode 100644 index 00000000000..3666d2ea036 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/pcmpgtw128.desc @@ -0,0 +1,8 @@ +CORE gcc-only +pcmpgtw128.c + +^EXIT=0$ +^SIGNAL=0$ +^VERIFICATION SUCCESSFUL$ +-- +^warning: ignoring diff --git a/regression/cbmc-library/__builtin_ia32/pcmpgtw256.c b/regression/cbmc-library/__builtin_ia32/pcmpgtw256.c new file mode 100644 index 00000000000..473493039c5 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/pcmpgtw256.c @@ -0,0 +1,19 @@ +typedef short __gcc_v16hi __attribute__((__vector_size__(32))); +__gcc_v16hi __builtin_ia32_pcmpgtw256(__gcc_v16hi, __gcc_v16hi); + +int main() +{ + // Exhaustive equivalence: the model must agree with CBMC's own + // vector semantics (native >) for all inputs. 
+ __gcc_v16hi a, b; + __gcc_v16hi r = __builtin_ia32_pcmpgtw256(a, b); + __gcc_v16hi ref = a > b; + __CPROVER_assert( + r[0] == ref[0] && r[1] == ref[1] && r[2] == ref[2] && r[3] == ref[3] && + r[4] == ref[4] && r[5] == ref[5] && r[6] == ref[6] && r[7] == ref[7] && + r[8] == ref[8] && r[9] == ref[9] && r[10] == ref[10] && + r[11] == ref[11] && r[12] == ref[12] && r[13] == ref[13] && + r[14] == ref[14] && r[15] == ref[15], + "__builtin_ia32_pcmpgtw256 == native >"); + return 0; +} diff --git a/regression/cbmc-library/__builtin_ia32/pcmpgtw256.desc b/regression/cbmc-library/__builtin_ia32/pcmpgtw256.desc new file mode 100644 index 00000000000..18e5aee5ed5 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/pcmpgtw256.desc @@ -0,0 +1,8 @@ +CORE gcc-only +pcmpgtw256.c + +^EXIT=0$ +^SIGNAL=0$ +^VERIFICATION SUCCESSFUL$ +-- +^warning: ignoring diff --git a/regression/cbmc-library/__builtin_ia32/pmaxsb128.c b/regression/cbmc-library/__builtin_ia32/pmaxsb128.c new file mode 100644 index 00000000000..8d9056db084 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/pmaxsb128.c @@ -0,0 +1,13 @@ +typedef char __gcc_v16qi __attribute__((__vector_size__(16))); +__gcc_v16qi __builtin_ia32_pmaxsb128(__gcc_v16qi, __gcc_v16qi); + +int main() +{ + __gcc_v16qi a = (__gcc_v16qi){ + 1, -2, 3, -4, 5, -6, 7, -8, 9, -10, 11, -12, 13, -14, 15, -16}; + __gcc_v16qi b = (__gcc_v16qi){ + -1, 2, -3, 4, -5, 6, -7, 8, -9, 10, -11, 12, -13, 14, -15, 16}; + __gcc_v16qi r = __builtin_ia32_pmaxsb128(a, b); + __CPROVER_assert(r[0] == 1 && r[1] == 2, "max epi8"); + return 0; +} diff --git a/regression/cbmc-library/__builtin_ia32/pmaxsb128.desc b/regression/cbmc-library/__builtin_ia32/pmaxsb128.desc new file mode 100644 index 00000000000..96001578a50 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/pmaxsb128.desc @@ -0,0 +1,8 @@ +CORE gcc-only +pmaxsb128.c + +^EXIT=0$ +^SIGNAL=0$ +^VERIFICATION SUCCESSFUL$ +-- +^warning: ignoring diff --git 
a/regression/cbmc-library/__builtin_ia32/pmaxsd128.c b/regression/cbmc-library/__builtin_ia32/pmaxsd128.c new file mode 100644 index 00000000000..4d9d07fbf93 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/pmaxsd128.c @@ -0,0 +1,11 @@ +typedef int __gcc_v4si __attribute__((__vector_size__(16))); +__gcc_v4si __builtin_ia32_pmaxsd128(__gcc_v4si, __gcc_v4si); + +int main() +{ + __gcc_v4si a = (__gcc_v4si){1, -2, 3, -4}; + __gcc_v4si b = (__gcc_v4si){-1, 2, -3, 4}; + __gcc_v4si r = __builtin_ia32_pmaxsd128(a, b); + __CPROVER_assert(r[0] == 1 && r[1] == 2, "max epi32"); + return 0; +} diff --git a/regression/cbmc-library/__builtin_ia32/pmaxsd128.desc b/regression/cbmc-library/__builtin_ia32/pmaxsd128.desc new file mode 100644 index 00000000000..2a3c7a89a68 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/pmaxsd128.desc @@ -0,0 +1,8 @@ +CORE gcc-only +pmaxsd128.c + +^EXIT=0$ +^SIGNAL=0$ +^VERIFICATION SUCCESSFUL$ +-- +^warning: ignoring diff --git a/regression/cbmc-library/__builtin_ia32/pmaxsw128.c b/regression/cbmc-library/__builtin_ia32/pmaxsw128.c new file mode 100644 index 00000000000..285d0c96bfb --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/pmaxsw128.c @@ -0,0 +1,11 @@ +typedef short __gcc_v8hi __attribute__((__vector_size__(16))); +__gcc_v8hi __builtin_ia32_pmaxsw128(__gcc_v8hi, __gcc_v8hi); + +int main() +{ + __gcc_v8hi a = (__gcc_v8hi){1, -2, 3, -4, 5, -6, 7, -8}; + __gcc_v8hi b = (__gcc_v8hi){-1, 2, -3, 4, -5, 6, -7, 8}; + __gcc_v8hi r = __builtin_ia32_pmaxsw128(a, b); + __CPROVER_assert(r[0] == 1 && r[1] == 2, "max epi16"); + return 0; +} diff --git a/regression/cbmc-library/__builtin_ia32/pmaxsw128.desc b/regression/cbmc-library/__builtin_ia32/pmaxsw128.desc new file mode 100644 index 00000000000..47602b1aa00 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/pmaxsw128.desc @@ -0,0 +1,8 @@ +CORE gcc-only +pmaxsw128.c + +^EXIT=0$ +^SIGNAL=0$ +^VERIFICATION SUCCESSFUL$ +-- +^warning: ignoring diff --git 
a/regression/cbmc-library/__builtin_ia32/pmaxub128.c b/regression/cbmc-library/__builtin_ia32/pmaxub128.c new file mode 100644 index 00000000000..2307506d037 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/pmaxub128.c @@ -0,0 +1,15 @@ +typedef char __gcc_v16qi __attribute__((__vector_size__(16))); +__gcc_v16qi __builtin_ia32_pmaxub128(__gcc_v16qi, __gcc_v16qi); + +int main() +{ + // Lane 0 distinguishes unsigned from signed: -1 is 0xFF, the largest value + // under unsigned comparison, so the unsigned max of {-1, 0} is -1. + __gcc_v16qi a = + (__gcc_v16qi){-1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16}; + __gcc_v16qi b = + (__gcc_v16qi){0, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1}; + __gcc_v16qi r = __builtin_ia32_pmaxub128(a, b); + __CPROVER_assert(r[0] == -1 && r[15] == 16, "max epu8"); + return 0; +} diff --git a/regression/cbmc-library/__builtin_ia32/pmaxub128.desc b/regression/cbmc-library/__builtin_ia32/pmaxub128.desc new file mode 100644 index 00000000000..c0a91eb8daf --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/pmaxub128.desc @@ -0,0 +1,8 @@ +CORE gcc-only +pmaxub128.c + +^EXIT=0$ +^SIGNAL=0$ +^VERIFICATION SUCCESSFUL$ +-- +^warning: ignoring diff --git a/regression/cbmc-library/__builtin_ia32/pmaxud128.c b/regression/cbmc-library/__builtin_ia32/pmaxud128.c new file mode 100644 index 00000000000..e2a59ca4915 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/pmaxud128.c @@ -0,0 +1,14 @@ +typedef int __gcc_v4si __attribute__((__vector_size__(16))); +__gcc_v4si __builtin_ia32_pmaxud128(__gcc_v4si, __gcc_v4si); + +int main() +{ + // Lane 0 distinguishes unsigned from signed: -1 is 0xFFFFFFFF, the largest + // value under unsigned comparison, so the unsigned max of {-1, 0} is -1 + // (a signed max would pick 0). 
+ __gcc_v4si a = (__gcc_v4si){-1, 2, 3, 4}; + __gcc_v4si b = (__gcc_v4si){0, 3, 2, 1}; + __gcc_v4si r = __builtin_ia32_pmaxud128(a, b); + __CPROVER_assert(r[0] == -1 && r[3] == 4, "max epu32"); + return 0; +} diff --git a/regression/cbmc-library/__builtin_ia32/pmaxud128.desc b/regression/cbmc-library/__builtin_ia32/pmaxud128.desc new file mode 100644 index 00000000000..8e734356995 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/pmaxud128.desc @@ -0,0 +1,8 @@ +CORE gcc-only +pmaxud128.c + +^EXIT=0$ +^SIGNAL=0$ +^VERIFICATION SUCCESSFUL$ +-- +^warning: ignoring diff --git a/regression/cbmc-library/__builtin_ia32/pmaxud256.c b/regression/cbmc-library/__builtin_ia32/pmaxud256.c new file mode 100644 index 00000000000..c5bec14a94f --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/pmaxud256.c @@ -0,0 +1,12 @@ +typedef int __gcc_v8si __attribute__((__vector_size__(32))); +__gcc_v8si __builtin_ia32_pmaxud256(__gcc_v8si, __gcc_v8si); + +int main() +{ + // Lane 0: -1 is 0xFFFFFFFF, the unsigned max of {-1, 0}. 
+ __gcc_v8si a = (__gcc_v8si){-1, 2, 3, 4, 5, 6, 7, 8}; + __gcc_v8si b = (__gcc_v8si){0, 3, 2, 1, 0, 0, 0, 0}; + __gcc_v8si r = __builtin_ia32_pmaxud256(a, b); + __CPROVER_assert(r[0] == -1 && r[3] == 4, "max epu32 (256)"); + return 0; +} diff --git a/regression/cbmc-library/__builtin_ia32/pmaxud256.desc b/regression/cbmc-library/__builtin_ia32/pmaxud256.desc new file mode 100644 index 00000000000..93b79986958 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/pmaxud256.desc @@ -0,0 +1,8 @@ +CORE gcc-only +pmaxud256.c + +^EXIT=0$ +^SIGNAL=0$ +^VERIFICATION SUCCESSFUL$ +-- +^warning: ignoring diff --git a/regression/cbmc-library/__builtin_ia32/pmaxuw128.c b/regression/cbmc-library/__builtin_ia32/pmaxuw128.c new file mode 100644 index 00000000000..8b973b207e1 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/pmaxuw128.c @@ -0,0 +1,13 @@ +typedef short __gcc_v8hi __attribute__((__vector_size__(16))); +__gcc_v8hi __builtin_ia32_pmaxuw128(__gcc_v8hi, __gcc_v8hi); + +int main() +{ + // Lane 0 distinguishes unsigned from signed: -1 is 0xFFFF, the largest + // value under unsigned comparison, so the unsigned max of {-1, 0} is -1. 
+ __gcc_v8hi a = (__gcc_v8hi){-1, 2, 3, 4, 5, 6, 7, 8}; + __gcc_v8hi b = (__gcc_v8hi){0, 7, 6, 5, 4, 3, 2, 1}; + __gcc_v8hi r = __builtin_ia32_pmaxuw128(a, b); + __CPROVER_assert(r[0] == -1 && r[7] == 8, "max epu16"); + return 0; +} diff --git a/regression/cbmc-library/__builtin_ia32/pmaxuw128.desc b/regression/cbmc-library/__builtin_ia32/pmaxuw128.desc new file mode 100644 index 00000000000..4dd45fa9dad --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/pmaxuw128.desc @@ -0,0 +1,8 @@ +CORE gcc-only +pmaxuw128.c + +^EXIT=0$ +^SIGNAL=0$ +^VERIFICATION SUCCESSFUL$ +-- +^warning: ignoring diff --git a/regression/cbmc-library/__builtin_ia32/pminsb128.c b/regression/cbmc-library/__builtin_ia32/pminsb128.c new file mode 100644 index 00000000000..bd9bc28963c --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/pminsb128.c @@ -0,0 +1,15 @@ +typedef char __gcc_v16qi __attribute__((__vector_size__(16))); +__gcc_v16qi __builtin_ia32_pminsb128(__gcc_v16qi, __gcc_v16qi); + +int main() +{ + __gcc_v16qi a = (__gcc_v16qi){ + 1, -2, 3, -4, 5, -6, 7, -8, 9, -10, 11, -12, 13, -14, 15, -16}; + __gcc_v16qi b = (__gcc_v16qi){ + -1, 2, -3, 4, -5, 6, -7, 8, -9, 10, -11, 12, -13, 14, -15, 16}; + __gcc_v16qi r = __builtin_ia32_pminsb128(a, b); + // Compare as bytes: -1, -2 cast to char yield 0xFF, 0xFE on either + // signedness. 
+ __CPROVER_assert(r[0] == (char)-1 && r[1] == (char)-2, "min epi8"); + return 0; +} diff --git a/regression/cbmc-library/__builtin_ia32/pminsb128.desc b/regression/cbmc-library/__builtin_ia32/pminsb128.desc new file mode 100644 index 00000000000..0643d0b6179 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/pminsb128.desc @@ -0,0 +1,8 @@ +CORE gcc-only +pminsb128.c + +^EXIT=0$ +^SIGNAL=0$ +^VERIFICATION SUCCESSFUL$ +-- +^warning: ignoring diff --git a/regression/cbmc-library/__builtin_ia32/pminsd128.c b/regression/cbmc-library/__builtin_ia32/pminsd128.c new file mode 100644 index 00000000000..0fda0a35a62 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/pminsd128.c @@ -0,0 +1,11 @@ +typedef int __gcc_v4si __attribute__((__vector_size__(16))); +__gcc_v4si __builtin_ia32_pminsd128(__gcc_v4si, __gcc_v4si); + +int main() +{ + __gcc_v4si a = (__gcc_v4si){1, -2, 3, -4}; + __gcc_v4si b = (__gcc_v4si){-1, 2, -3, 4}; + __gcc_v4si r = __builtin_ia32_pminsd128(a, b); + __CPROVER_assert(r[0] == -1 && r[1] == -2, "min epi32"); + return 0; +} diff --git a/regression/cbmc-library/__builtin_ia32/pminsd128.desc b/regression/cbmc-library/__builtin_ia32/pminsd128.desc new file mode 100644 index 00000000000..9f5975a043f --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/pminsd128.desc @@ -0,0 +1,8 @@ +CORE gcc-only +pminsd128.c + +^EXIT=0$ +^SIGNAL=0$ +^VERIFICATION SUCCESSFUL$ +-- +^warning: ignoring diff --git a/regression/cbmc-library/__builtin_ia32/pminsw128.c b/regression/cbmc-library/__builtin_ia32/pminsw128.c new file mode 100644 index 00000000000..fb474403b6d --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/pminsw128.c @@ -0,0 +1,11 @@ +typedef short __gcc_v8hi __attribute__((__vector_size__(16))); +__gcc_v8hi __builtin_ia32_pminsw128(__gcc_v8hi, __gcc_v8hi); + +int main() +{ + __gcc_v8hi a = (__gcc_v8hi){1, -2, 3, -4, 5, -6, 7, -8}; + __gcc_v8hi b = (__gcc_v8hi){-1, 2, -3, 4, -5, 6, -7, 8}; + __gcc_v8hi r = __builtin_ia32_pminsw128(a, b); 
+ __CPROVER_assert(r[0] == -1 && r[1] == -2, "min epi16"); + return 0; +} diff --git a/regression/cbmc-library/__builtin_ia32/pminsw128.desc b/regression/cbmc-library/__builtin_ia32/pminsw128.desc new file mode 100644 index 00000000000..014e50d7944 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/pminsw128.desc @@ -0,0 +1,8 @@ +CORE gcc-only +pminsw128.c + +^EXIT=0$ +^SIGNAL=0$ +^VERIFICATION SUCCESSFUL$ +-- +^warning: ignoring diff --git a/regression/cbmc-library/__builtin_ia32/pminub128.c b/regression/cbmc-library/__builtin_ia32/pminub128.c new file mode 100644 index 00000000000..c68ee28c946 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/pminub128.c @@ -0,0 +1,15 @@ +typedef char __gcc_v16qi __attribute__((__vector_size__(16))); +__gcc_v16qi __builtin_ia32_pminub128(__gcc_v16qi, __gcc_v16qi); + +int main() +{ + // Lane 0 distinguishes unsigned from signed: -1 is 0xFF, the largest value + // under unsigned comparison, so the unsigned min of {-1, 0} is 0. + __gcc_v16qi a = + (__gcc_v16qi){-1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16}; + __gcc_v16qi b = + (__gcc_v16qi){0, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1}; + __gcc_v16qi r = __builtin_ia32_pminub128(a, b); + __CPROVER_assert(r[0] == 0 && r[15] == 1, "min epu8"); + return 0; +} diff --git a/regression/cbmc-library/__builtin_ia32/pminub128.desc b/regression/cbmc-library/__builtin_ia32/pminub128.desc new file mode 100644 index 00000000000..ed4bebc51b7 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/pminub128.desc @@ -0,0 +1,8 @@ +CORE gcc-only +pminub128.c + +^EXIT=0$ +^SIGNAL=0$ +^VERIFICATION SUCCESSFUL$ +-- +^warning: ignoring diff --git a/regression/cbmc-library/__builtin_ia32/pminud128.c b/regression/cbmc-library/__builtin_ia32/pminud128.c new file mode 100644 index 00000000000..c97bc1a77f8 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/pminud128.c @@ -0,0 +1,14 @@ +typedef int __gcc_v4si __attribute__((__vector_size__(16))); +__gcc_v4si 
__builtin_ia32_pminud128(__gcc_v4si, __gcc_v4si); + +int main() +{ + // Lane 0 distinguishes unsigned from signed: -1 is 0xFFFFFFFF, the largest + // value under unsigned comparison, so the unsigned min of {-1, 0} is 0 + // (a signed min would pick -1). + __gcc_v4si a = (__gcc_v4si){-1, 2, 3, 4}; + __gcc_v4si b = (__gcc_v4si){0, 3, 2, 1}; + __gcc_v4si r = __builtin_ia32_pminud128(a, b); + __CPROVER_assert(r[0] == 0 && r[3] == 1, "min epu32"); + return 0; +} diff --git a/regression/cbmc-library/__builtin_ia32/pminud128.desc b/regression/cbmc-library/__builtin_ia32/pminud128.desc new file mode 100644 index 00000000000..2d44dc3dff0 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/pminud128.desc @@ -0,0 +1,8 @@ +CORE gcc-only +pminud128.c + +^EXIT=0$ +^SIGNAL=0$ +^VERIFICATION SUCCESSFUL$ +-- +^warning: ignoring diff --git a/regression/cbmc-library/__builtin_ia32/pminuw128.c b/regression/cbmc-library/__builtin_ia32/pminuw128.c new file mode 100644 index 00000000000..e8e04258b50 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/pminuw128.c @@ -0,0 +1,13 @@ +typedef short __gcc_v8hi __attribute__((__vector_size__(16))); +__gcc_v8hi __builtin_ia32_pminuw128(__gcc_v8hi, __gcc_v8hi); + +int main() +{ + // Lane 0 distinguishes unsigned from signed: -1 is 0xFFFF, the largest + // value under unsigned comparison, so the unsigned min of {-1, 0} is 0. 
+ __gcc_v8hi a = (__gcc_v8hi){-1, 2, 3, 4, 5, 6, 7, 8}; + __gcc_v8hi b = (__gcc_v8hi){0, 7, 6, 5, 4, 3, 2, 1}; + __gcc_v8hi r = __builtin_ia32_pminuw128(a, b); + __CPROVER_assert(r[0] == 0 && r[7] == 1, "min epu16"); + return 0; +} diff --git a/regression/cbmc-library/__builtin_ia32/pminuw128.desc b/regression/cbmc-library/__builtin_ia32/pminuw128.desc new file mode 100644 index 00000000000..b58b52930e7 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/pminuw128.desc @@ -0,0 +1,8 @@ +CORE gcc-only +pminuw128.c + +^EXIT=0$ +^SIGNAL=0$ +^VERIFICATION SUCCESSFUL$ +-- +^warning: ignoring diff --git a/regression/cbmc-library/__builtin_ia32/pmulld128.c b/regression/cbmc-library/__builtin_ia32/pmulld128.c new file mode 100644 index 00000000000..a582362b705 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/pmulld128.c @@ -0,0 +1,16 @@ +#include <limits.h> + +typedef int __gcc_v4si __attribute__((__vector_size__(16))); +__gcc_v4si __builtin_ia32_pmulld128(__gcc_v4si, __gcc_v4si); + +int main() +{ + // Lane 0 exercises two's-complement wraparound: INT_MAX * 2 keeps only the + // low 32 bits, 0xFFFFFFFE == -2. Run under --signed-overflow-check (see + // pmulld128.desc). 
+ __gcc_v4si a = (__gcc_v4si){INT_MAX, 2, 3, 4}; + __gcc_v4si b = (__gcc_v4si){2, 6, 7, 8}; + __gcc_v4si r = __builtin_ia32_pmulld128(a, b); + __CPROVER_assert(r[0] == -2 && r[3] == 32, "mullo epi32"); + return 0; +} diff --git a/regression/cbmc-library/__builtin_ia32/pmulld128.desc b/regression/cbmc-library/__builtin_ia32/pmulld128.desc new file mode 100644 index 00000000000..f0e3d66f8b1 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/pmulld128.desc @@ -0,0 +1,8 @@ +CORE gcc-only +pmulld128.c +--signed-overflow-check +^EXIT=0$ +^SIGNAL=0$ +^VERIFICATION SUCCESSFUL$ +-- +^warning: ignoring diff --git a/regression/cbmc-library/__builtin_ia32/pmullw128.c b/regression/cbmc-library/__builtin_ia32/pmullw128.c new file mode 100644 index 00000000000..dab334b8aa6 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/pmullw128.c @@ -0,0 +1,11 @@ +typedef short __gcc_v8hi __attribute__((__vector_size__(16))); +__gcc_v8hi __builtin_ia32_pmullw128(__gcc_v8hi, __gcc_v8hi); + +int main() +{ + __gcc_v8hi a = (__gcc_v8hi){1, 2, 3, 4, 5, 6, 7, 8}; + __gcc_v8hi b = (__gcc_v8hi){2, 3, 4, 5, 6, 7, 8, 9}; + __gcc_v8hi r = __builtin_ia32_pmullw128(a, b); + __CPROVER_assert(r[0] == 2 && r[1] == 6 && r[7] == 72, "mullo epi16"); + return 0; +} diff --git a/regression/cbmc-library/__builtin_ia32/pmullw128.desc b/regression/cbmc-library/__builtin_ia32/pmullw128.desc new file mode 100644 index 00000000000..a5c22e6c33b --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/pmullw128.desc @@ -0,0 +1,8 @@ +CORE gcc-only +pmullw128.c + +^EXIT=0$ +^SIGNAL=0$ +^VERIFICATION SUCCESSFUL$ +-- +^warning: ignoring diff --git a/regression/cbmc-library/__builtin_ia32/por128.c b/regression/cbmc-library/__builtin_ia32/por128.c new file mode 100644 index 00000000000..443d917d23e --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/por128.c @@ -0,0 +1,14 @@ +typedef long long __gcc_v2di __attribute__((__vector_size__(16))); +__gcc_v2di __builtin_ia32_por128(__gcc_v2di, __gcc_v2di); + 
+int main() +{ + // Exhaustive equivalence: the model must agree with CBMC's own + // vector semantics (native |) for all inputs. + __gcc_v2di a, b; + __gcc_v2di r = __builtin_ia32_por128(a, b); + __gcc_v2di ref = a | b; + __CPROVER_assert( + r[0] == ref[0] && r[1] == ref[1], "__builtin_ia32_por128 == native |"); + return 0; +} diff --git a/regression/cbmc-library/__builtin_ia32/por128.desc b/regression/cbmc-library/__builtin_ia32/por128.desc new file mode 100644 index 00000000000..49e47ee5685 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/por128.desc @@ -0,0 +1,8 @@ +CORE gcc-only +por128.c + +^EXIT=0$ +^SIGNAL=0$ +^VERIFICATION SUCCESSFUL$ +-- +^warning: ignoring diff --git a/regression/cbmc-library/__builtin_ia32/por256.c b/regression/cbmc-library/__builtin_ia32/por256.c new file mode 100644 index 00000000000..de40cdb55d0 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/por256.c @@ -0,0 +1,15 @@ +typedef long long __gcc_v4di __attribute__((__vector_size__(32))); +__gcc_v4di __builtin_ia32_por256(__gcc_v4di, __gcc_v4di); + +int main() +{ + // Exhaustive equivalence: the model must agree with CBMC's own + // vector semantics (native |) for all inputs. 
+ __gcc_v4di a, b; + __gcc_v4di r = __builtin_ia32_por256(a, b); + __gcc_v4di ref = a | b; + __CPROVER_assert( + r[0] == ref[0] && r[1] == ref[1] && r[2] == ref[2] && r[3] == ref[3], + "__builtin_ia32_por256 == native |"); + return 0; +} diff --git a/regression/cbmc-library/__builtin_ia32/por256.desc b/regression/cbmc-library/__builtin_ia32/por256.desc new file mode 100644 index 00000000000..586bd591757 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/por256.desc @@ -0,0 +1,8 @@ +CORE gcc-only +por256.c + +^EXIT=0$ +^SIGNAL=0$ +^VERIFICATION SUCCESSFUL$ +-- +^warning: ignoring diff --git a/regression/cbmc-library/__builtin_ia32/psllwi128.c b/regression/cbmc-library/__builtin_ia32/psllwi128.c new file mode 100644 index 00000000000..b6e38827275 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/psllwi128.c @@ -0,0 +1,11 @@ +typedef short __gcc_v8hi __attribute__((__vector_size__(16))); +__gcc_v8hi __builtin_ia32_psllwi128(__gcc_v8hi, int); + +int main() +{ + __gcc_v8hi a = (__gcc_v8hi){1, 2, 3, 4, 5, 6, 7, 8}; + __gcc_v8hi r = __builtin_ia32_psllwi128(a, 4); // logical left by 4 + __gcc_v8hi z = __builtin_ia32_psllwi128(a, 20); // count >= 16 -> 0 + __CPROVER_assert(r[0] == 16 && r[1] == 32 && z[0] == 0, "slli epi16"); + return 0; +} diff --git a/regression/cbmc-library/__builtin_ia32/psllwi128.desc b/regression/cbmc-library/__builtin_ia32/psllwi128.desc new file mode 100644 index 00000000000..1db4a839ec1 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/psllwi128.desc @@ -0,0 +1,8 @@ +CORE gcc-only +psllwi128.c + +^EXIT=0$ +^SIGNAL=0$ +^VERIFICATION SUCCESSFUL$ +-- +^warning: ignoring diff --git a/regression/cbmc-library/__builtin_ia32/psradi128.c b/regression/cbmc-library/__builtin_ia32/psradi128.c new file mode 100644 index 00000000000..bcac8fcf821 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/psradi128.c @@ -0,0 +1,13 @@ +typedef int __gcc_v4si __attribute__((__vector_size__(16))); +__gcc_v4si 
__builtin_ia32_psradi128(__gcc_v4si, int); + +int main() +{ + __gcc_v4si a = (__gcc_v4si){-16, 8, -1, 4}; + __gcc_v4si r = __builtin_ia32_psradi128(a, 2); // arithmetic right by 2 + // count >= 32 -> sign fill: -16 -> -1, 8 -> 0 + __gcc_v4si s = __builtin_ia32_psradi128(a, 40); + __CPROVER_assert( + r[0] == -4 && r[1] == 2 && s[0] == -1 && s[1] == 0, "srai epi32"); + return 0; +} diff --git a/regression/cbmc-library/__builtin_ia32/psradi128.desc b/regression/cbmc-library/__builtin_ia32/psradi128.desc new file mode 100644 index 00000000000..55d32b101bd --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/psradi128.desc @@ -0,0 +1,8 @@ +CORE gcc-only +psradi128.c + +^EXIT=0$ +^SIGNAL=0$ +^VERIFICATION SUCCESSFUL$ +-- +^warning: ignoring diff --git a/regression/cbmc-library/__builtin_ia32/psrlwi128.c b/regression/cbmc-library/__builtin_ia32/psrlwi128.c new file mode 100644 index 00000000000..12f4af8e8e3 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/psrlwi128.c @@ -0,0 +1,12 @@ +typedef short __gcc_v8hi __attribute__((__vector_size__(16))); +__gcc_v8hi __builtin_ia32_psrlwi128(__gcc_v8hi, int); + +int main() +{ + // Logical (zero-fill) right shift: 0xFFFF (-1) >> 4 == 0x0FFF == 4095, + // distinguishing it from an arithmetic shift (which would give -1). 
+ __gcc_v8hi a = (__gcc_v8hi){-1, 16, 3, 4, 5, 6, 7, 8}; + __gcc_v8hi r = __builtin_ia32_psrlwi128(a, 4); + __CPROVER_assert(r[0] == 4095 && r[1] == 1, "srli epi16"); + return 0; +} diff --git a/regression/cbmc-library/__builtin_ia32/psrlwi128.desc b/regression/cbmc-library/__builtin_ia32/psrlwi128.desc new file mode 100644 index 00000000000..3e27476f080 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/psrlwi128.desc @@ -0,0 +1,8 @@ +CORE gcc-only +psrlwi128.c + +^EXIT=0$ +^SIGNAL=0$ +^VERIFICATION SUCCESSFUL$ +-- +^warning: ignoring diff --git a/regression/cbmc-library/__builtin_ia32/psubb.c b/regression/cbmc-library/__builtin_ia32/psubb.c new file mode 100644 index 00000000000..b3c09b7806e --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/psubb.c @@ -0,0 +1,18 @@ +typedef char __gcc_v8qi __attribute__((__vector_size__(8))); +typedef unsigned char __gcc_v8qi_u __attribute__((__vector_size__(8))); +__gcc_v8qi __builtin_ia32_psubb(__gcc_v8qi, __gcc_v8qi); + +int main() +{ + // Exhaustive equivalence: the model must agree with CBMC's own + // vector semantics (native -) for all inputs. 
+ __gcc_v8qi a, b; + __gcc_v8qi r = __builtin_ia32_psubb(a, b); + __gcc_v8qi_u ref = (__gcc_v8qi_u)a - (__gcc_v8qi_u)b; + __CPROVER_assert( + r[0] == (char)ref[0] && r[1] == (char)ref[1] && r[2] == (char)ref[2] && + r[3] == (char)ref[3] && r[4] == (char)ref[4] && r[5] == (char)ref[5] && + r[6] == (char)ref[6] && r[7] == (char)ref[7], + "__builtin_ia32_psubb == native -"); + return 0; +} diff --git a/regression/cbmc-library/__builtin_ia32/psubb.desc b/regression/cbmc-library/__builtin_ia32/psubb.desc new file mode 100644 index 00000000000..71728a90042 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/psubb.desc @@ -0,0 +1,8 @@ +CORE gcc-only +psubb.c + +^EXIT=0$ +^SIGNAL=0$ +^VERIFICATION SUCCESSFUL$ +-- +^warning: ignoring diff --git a/regression/cbmc-library/__builtin_ia32/psubb128.c b/regression/cbmc-library/__builtin_ia32/psubb128.c new file mode 100644 index 00000000000..fe94fef8cd1 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/psubb128.c @@ -0,0 +1,22 @@ +typedef char __gcc_v16qi __attribute__((__vector_size__(16))); +typedef unsigned char __gcc_v16qi_u __attribute__((__vector_size__(16))); +__gcc_v16qi __builtin_ia32_psubb128(__gcc_v16qi, __gcc_v16qi); + +int main() +{ + // Exhaustive equivalence: the model must agree with CBMC's own + // vector semantics (native -) for all inputs. 
+ __gcc_v16qi a, b; + __gcc_v16qi r = __builtin_ia32_psubb128(a, b); + __gcc_v16qi_u ref = (__gcc_v16qi_u)a - (__gcc_v16qi_u)b; + __CPROVER_assert( + r[0] == (char)ref[0] && r[1] == (char)ref[1] && r[2] == (char)ref[2] && + r[3] == (char)ref[3] && r[4] == (char)ref[4] && r[5] == (char)ref[5] && + r[6] == (char)ref[6] && r[7] == (char)ref[7] && r[8] == (char)ref[8] && + r[9] == (char)ref[9] && r[10] == (char)ref[10] && + r[11] == (char)ref[11] && r[12] == (char)ref[12] && + r[13] == (char)ref[13] && r[14] == (char)ref[14] && + r[15] == (char)ref[15], + "__builtin_ia32_psubb128 == native -"); + return 0; +} diff --git a/regression/cbmc-library/__builtin_ia32/psubb128.desc b/regression/cbmc-library/__builtin_ia32/psubb128.desc new file mode 100644 index 00000000000..40976fde992 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/psubb128.desc @@ -0,0 +1,8 @@ +CORE gcc-only +psubb128.c + +^EXIT=0$ +^SIGNAL=0$ +^VERIFICATION SUCCESSFUL$ +-- +^warning: ignoring diff --git a/regression/cbmc-library/__builtin_ia32/psubb256.c b/regression/cbmc-library/__builtin_ia32/psubb256.c new file mode 100644 index 00000000000..bbfc355e285 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/psubb256.c @@ -0,0 +1,30 @@ +typedef char __gcc_v32qi __attribute__((__vector_size__(32))); +typedef unsigned char __gcc_v32qi_u __attribute__((__vector_size__(32))); +__gcc_v32qi __builtin_ia32_psubb256(__gcc_v32qi, __gcc_v32qi); + +int main() +{ + // Exhaustive equivalence: the model must agree with CBMC's own + // vector semantics (native -) for all inputs. 
+ __gcc_v32qi a, b; + __gcc_v32qi r = __builtin_ia32_psubb256(a, b); + __gcc_v32qi_u ref = (__gcc_v32qi_u)a - (__gcc_v32qi_u)b; + __CPROVER_assert( + r[0] == (char)ref[0] && r[1] == (char)ref[1] && r[2] == (char)ref[2] && + r[3] == (char)ref[3] && r[4] == (char)ref[4] && r[5] == (char)ref[5] && + r[6] == (char)ref[6] && r[7] == (char)ref[7] && r[8] == (char)ref[8] && + r[9] == (char)ref[9] && r[10] == (char)ref[10] && + r[11] == (char)ref[11] && r[12] == (char)ref[12] && + r[13] == (char)ref[13] && r[14] == (char)ref[14] && + r[15] == (char)ref[15] && r[16] == (char)ref[16] && + r[17] == (char)ref[17] && r[18] == (char)ref[18] && + r[19] == (char)ref[19] && r[20] == (char)ref[20] && + r[21] == (char)ref[21] && r[22] == (char)ref[22] && + r[23] == (char)ref[23] && r[24] == (char)ref[24] && + r[25] == (char)ref[25] && r[26] == (char)ref[26] && + r[27] == (char)ref[27] && r[28] == (char)ref[28] && + r[29] == (char)ref[29] && r[30] == (char)ref[30] && + r[31] == (char)ref[31], + "__builtin_ia32_psubb256 == native -"); + return 0; +} diff --git a/regression/cbmc-library/__builtin_ia32/psubb256.desc b/regression/cbmc-library/__builtin_ia32/psubb256.desc new file mode 100644 index 00000000000..a13661f6b8f --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/psubb256.desc @@ -0,0 +1,8 @@ +CORE gcc-only +psubb256.c + +^EXIT=0$ +^SIGNAL=0$ +^VERIFICATION SUCCESSFUL$ +-- +^warning: ignoring diff --git a/regression/cbmc-library/__builtin_ia32/psubd.c b/regression/cbmc-library/__builtin_ia32/psubd.c new file mode 100644 index 00000000000..3e223f870c9 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/psubd.c @@ -0,0 +1,16 @@ +typedef int __gcc_v2si __attribute__((__vector_size__(8))); +typedef unsigned int __gcc_v2si_u __attribute__((__vector_size__(8))); +__gcc_v2si __builtin_ia32_psubd(__gcc_v2si, __gcc_v2si); + +int main() +{ + // Exhaustive equivalence: the model must agree with CBMC's own + // vector semantics (native -) for all inputs. 
+ __gcc_v2si a, b; + __gcc_v2si r = __builtin_ia32_psubd(a, b); + __gcc_v2si_u ref = (__gcc_v2si_u)a - (__gcc_v2si_u)b; + __CPROVER_assert( + r[0] == (int)ref[0] && r[1] == (int)ref[1], + "__builtin_ia32_psubd == native -"); + return 0; +} diff --git a/regression/cbmc-library/__builtin_ia32/psubd.desc b/regression/cbmc-library/__builtin_ia32/psubd.desc new file mode 100644 index 00000000000..3b628b4af52 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/psubd.desc @@ -0,0 +1,8 @@ +CORE gcc-only +psubd.c + +^EXIT=0$ +^SIGNAL=0$ +^VERIFICATION SUCCESSFUL$ +-- +^warning: ignoring diff --git a/regression/cbmc-library/__builtin_ia32/psubd128.c b/regression/cbmc-library/__builtin_ia32/psubd128.c new file mode 100644 index 00000000000..630d8101ac5 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/psubd128.c @@ -0,0 +1,17 @@ +typedef int __gcc_v4si __attribute__((__vector_size__(16))); +typedef unsigned int __gcc_v4si_u __attribute__((__vector_size__(16))); +__gcc_v4si __builtin_ia32_psubd128(__gcc_v4si, __gcc_v4si); + +int main() +{ + // Exhaustive equivalence: the model must agree with CBMC's own + // vector semantics (native -) for all inputs. 
+ __gcc_v4si a, b; + __gcc_v4si r = __builtin_ia32_psubd128(a, b); + __gcc_v4si_u ref = (__gcc_v4si_u)a - (__gcc_v4si_u)b; + __CPROVER_assert( + r[0] == (int)ref[0] && r[1] == (int)ref[1] && r[2] == (int)ref[2] && + r[3] == (int)ref[3], + "__builtin_ia32_psubd128 == native -"); + return 0; +} diff --git a/regression/cbmc-library/__builtin_ia32/psubd128.desc b/regression/cbmc-library/__builtin_ia32/psubd128.desc new file mode 100644 index 00000000000..1ca34f34565 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/psubd128.desc @@ -0,0 +1,8 @@ +CORE gcc-only +psubd128.c + +^EXIT=0$ +^SIGNAL=0$ +^VERIFICATION SUCCESSFUL$ +-- +^warning: ignoring diff --git a/regression/cbmc-library/__builtin_ia32/psubd256.c b/regression/cbmc-library/__builtin_ia32/psubd256.c new file mode 100644 index 00000000000..001597b2bd2 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/psubd256.c @@ -0,0 +1,18 @@ +typedef int __gcc_v8si __attribute__((__vector_size__(32))); +typedef unsigned int __gcc_v8si_u __attribute__((__vector_size__(32))); +__gcc_v8si __builtin_ia32_psubd256(__gcc_v8si, __gcc_v8si); + +int main() +{ + // Exhaustive equivalence: the model must agree with CBMC's own + // vector semantics (native -) for all inputs. 
+ __gcc_v8si a, b; + __gcc_v8si r = __builtin_ia32_psubd256(a, b); + __gcc_v8si_u ref = (__gcc_v8si_u)a - (__gcc_v8si_u)b; + __CPROVER_assert( + r[0] == (int)ref[0] && r[1] == (int)ref[1] && r[2] == (int)ref[2] && + r[3] == (int)ref[3] && r[4] == (int)ref[4] && r[5] == (int)ref[5] && + r[6] == (int)ref[6] && r[7] == (int)ref[7], + "__builtin_ia32_psubd256 == native -"); + return 0; +} diff --git a/regression/cbmc-library/__builtin_ia32/psubd256.desc b/regression/cbmc-library/__builtin_ia32/psubd256.desc new file mode 100644 index 00000000000..b0a24982bb3 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/psubd256.desc @@ -0,0 +1,8 @@ +CORE gcc-only +psubd256.c + +^EXIT=0$ +^SIGNAL=0$ +^VERIFICATION SUCCESSFUL$ +-- +^warning: ignoring diff --git a/regression/cbmc-library/__builtin_ia32/psubq128.c b/regression/cbmc-library/__builtin_ia32/psubq128.c new file mode 100644 index 00000000000..aec057f6561 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/psubq128.c @@ -0,0 +1,16 @@ +typedef long long __gcc_v2di __attribute__((__vector_size__(16))); +typedef unsigned long long __gcc_v2di_u __attribute__((__vector_size__(16))); +__gcc_v2di __builtin_ia32_psubq128(__gcc_v2di, __gcc_v2di); + +int main() +{ + // Exhaustive equivalence: the model must agree with CBMC's own + // vector semantics (native -) for all inputs. 
+ __gcc_v2di a, b; + __gcc_v2di r = __builtin_ia32_psubq128(a, b); + __gcc_v2di_u ref = (__gcc_v2di_u)a - (__gcc_v2di_u)b; + __CPROVER_assert( + r[0] == (long long)ref[0] && r[1] == (long long)ref[1], + "__builtin_ia32_psubq128 == native -"); + return 0; +} diff --git a/regression/cbmc-library/__builtin_ia32/psubq128.desc b/regression/cbmc-library/__builtin_ia32/psubq128.desc new file mode 100644 index 00000000000..aa61b3b8f3c --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/psubq128.desc @@ -0,0 +1,8 @@ +CORE gcc-only +psubq128.c + +^EXIT=0$ +^SIGNAL=0$ +^VERIFICATION SUCCESSFUL$ +-- +^warning: ignoring diff --git a/regression/cbmc-library/__builtin_ia32/psubq256.c b/regression/cbmc-library/__builtin_ia32/psubq256.c new file mode 100644 index 00000000000..6474480d969 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/psubq256.c @@ -0,0 +1,17 @@ +typedef long long __gcc_v4di __attribute__((__vector_size__(32))); +typedef unsigned long long __gcc_v4di_u __attribute__((__vector_size__(32))); +__gcc_v4di __builtin_ia32_psubq256(__gcc_v4di, __gcc_v4di); + +int main() +{ + // Exhaustive equivalence: the model must agree with CBMC's own + // vector semantics (native -) for all inputs. 
+ __gcc_v4di a, b; + __gcc_v4di r = __builtin_ia32_psubq256(a, b); + __gcc_v4di_u ref = (__gcc_v4di_u)a - (__gcc_v4di_u)b; + __CPROVER_assert( + r[0] == (long long)ref[0] && r[1] == (long long)ref[1] && + r[2] == (long long)ref[2] && r[3] == (long long)ref[3], + "__builtin_ia32_psubq256 == native -"); + return 0; +} diff --git a/regression/cbmc-library/__builtin_ia32/psubq256.desc b/regression/cbmc-library/__builtin_ia32/psubq256.desc new file mode 100644 index 00000000000..230f733b7be --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/psubq256.desc @@ -0,0 +1,8 @@ +CORE gcc-only +psubq256.c + +^EXIT=0$ +^SIGNAL=0$ +^VERIFICATION SUCCESSFUL$ +-- +^warning: ignoring diff --git a/regression/cbmc-library/__builtin_ia32/psubusb128.c b/regression/cbmc-library/__builtin_ia32/psubusb128.c new file mode 100644 index 00000000000..f8458eefb2b --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/psubusb128.c @@ -0,0 +1,14 @@ +typedef char __gcc_v16qi __attribute__((__vector_size__(16))); +__gcc_v16qi __builtin_ia32_psubusb128(__gcc_v16qi, __gcc_v16qi); + +int main() +{ + // Unsigned saturating subtract: 10-20 clamps to 0; the bytes 200-100 == 100. 
+ __gcc_v16qi a = + (__gcc_v16qi){10, 200, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16}; + __gcc_v16qi b = + (__gcc_v16qi){20, 100, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1}; + __gcc_v16qi r = __builtin_ia32_psubusb128(a, b); + __CPROVER_assert(r[0] == 0 && r[1] == 100, "subs epu8"); + return 0; +} diff --git a/regression/cbmc-library/__builtin_ia32/psubusb128.desc b/regression/cbmc-library/__builtin_ia32/psubusb128.desc new file mode 100644 index 00000000000..0f8d81fef34 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/psubusb128.desc @@ -0,0 +1,8 @@ +CORE gcc-only +psubusb128.c + +^EXIT=0$ +^SIGNAL=0$ +^VERIFICATION SUCCESSFUL$ +-- +^warning: ignoring diff --git a/regression/cbmc-library/__builtin_ia32/psubw.c b/regression/cbmc-library/__builtin_ia32/psubw.c new file mode 100644 index 00000000000..a81d07b56ff --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/psubw.c @@ -0,0 +1,17 @@ +typedef short __gcc_v4hi __attribute__((__vector_size__(8))); +typedef unsigned short __gcc_v4hi_u __attribute__((__vector_size__(8))); +__gcc_v4hi __builtin_ia32_psubw(__gcc_v4hi, __gcc_v4hi); + +int main() +{ + // Exhaustive equivalence: the model must agree with CBMC's own + // vector semantics (native -) for all inputs. 
+ __gcc_v4hi a, b; + __gcc_v4hi r = __builtin_ia32_psubw(a, b); + __gcc_v4hi_u ref = (__gcc_v4hi_u)a - (__gcc_v4hi_u)b; + __CPROVER_assert( + r[0] == (short)ref[0] && r[1] == (short)ref[1] && r[2] == (short)ref[2] && + r[3] == (short)ref[3], + "__builtin_ia32_psubw == native -"); + return 0; +} diff --git a/regression/cbmc-library/__builtin_ia32/psubw.desc b/regression/cbmc-library/__builtin_ia32/psubw.desc new file mode 100644 index 00000000000..f6a893ff5e2 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/psubw.desc @@ -0,0 +1,8 @@ +CORE gcc-only +psubw.c + +^EXIT=0$ +^SIGNAL=0$ +^VERIFICATION SUCCESSFUL$ +-- +^warning: ignoring diff --git a/regression/cbmc-library/__builtin_ia32/psubw128.c b/regression/cbmc-library/__builtin_ia32/psubw128.c new file mode 100644 index 00000000000..1a0f193610f --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/psubw128.c @@ -0,0 +1,18 @@ +typedef short __gcc_v8hi __attribute__((__vector_size__(16))); +typedef unsigned short __gcc_v8hi_u __attribute__((__vector_size__(16))); +__gcc_v8hi __builtin_ia32_psubw128(__gcc_v8hi, __gcc_v8hi); + +int main() +{ + // Exhaustive equivalence: the model must agree with CBMC's own + // vector semantics (native -) for all inputs. 
+ __gcc_v8hi a, b; + __gcc_v8hi r = __builtin_ia32_psubw128(a, b); + __gcc_v8hi_u ref = (__gcc_v8hi_u)a - (__gcc_v8hi_u)b; + __CPROVER_assert( + r[0] == (short)ref[0] && r[1] == (short)ref[1] && r[2] == (short)ref[2] && + r[3] == (short)ref[3] && r[4] == (short)ref[4] && r[5] == (short)ref[5] && + r[6] == (short)ref[6] && r[7] == (short)ref[7], + "__builtin_ia32_psubw128 == native -"); + return 0; +} diff --git a/regression/cbmc-library/__builtin_ia32/psubw128.desc b/regression/cbmc-library/__builtin_ia32/psubw128.desc new file mode 100644 index 00000000000..ac9681939ac --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/psubw128.desc @@ -0,0 +1,8 @@ +CORE gcc-only +psubw128.c + +^EXIT=0$ +^SIGNAL=0$ +^VERIFICATION SUCCESSFUL$ +-- +^warning: ignoring diff --git a/regression/cbmc-library/__builtin_ia32/psubw256.c b/regression/cbmc-library/__builtin_ia32/psubw256.c new file mode 100644 index 00000000000..d289f217475 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/psubw256.c @@ -0,0 +1,22 @@ +typedef short __gcc_v16hi __attribute__((__vector_size__(32))); +typedef unsigned short __gcc_v16hi_u __attribute__((__vector_size__(32))); +__gcc_v16hi __builtin_ia32_psubw256(__gcc_v16hi, __gcc_v16hi); + +int main() +{ + // Exhaustive equivalence: the model must agree with CBMC's own + // vector semantics (native -) for all inputs. 
+ __gcc_v16hi a, b; + __gcc_v16hi r = __builtin_ia32_psubw256(a, b); + __gcc_v16hi_u ref = (__gcc_v16hi_u)a - (__gcc_v16hi_u)b; + __CPROVER_assert( + r[0] == (short)ref[0] && r[1] == (short)ref[1] && r[2] == (short)ref[2] && + r[3] == (short)ref[3] && r[4] == (short)ref[4] && r[5] == (short)ref[5] && + r[6] == (short)ref[6] && r[7] == (short)ref[7] && r[8] == (short)ref[8] && + r[9] == (short)ref[9] && r[10] == (short)ref[10] && + r[11] == (short)ref[11] && r[12] == (short)ref[12] && + r[13] == (short)ref[13] && r[14] == (short)ref[14] && + r[15] == (short)ref[15], + "__builtin_ia32_psubw256 == native -"); + return 0; +} diff --git a/regression/cbmc-library/__builtin_ia32/psubw256.desc b/regression/cbmc-library/__builtin_ia32/psubw256.desc new file mode 100644 index 00000000000..e8600412950 --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/psubw256.desc @@ -0,0 +1,8 @@ +CORE gcc-only +psubw256.c + +^EXIT=0$ +^SIGNAL=0$ +^VERIFICATION SUCCESSFUL$ +-- +^warning: ignoring diff --git a/regression/cbmc-library/__builtin_ia32/pxor128.c b/regression/cbmc-library/__builtin_ia32/pxor128.c new file mode 100644 index 00000000000..43fcce31ddc --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/pxor128.c @@ -0,0 +1,14 @@ +typedef long long __gcc_v2di __attribute__((__vector_size__(16))); +__gcc_v2di __builtin_ia32_pxor128(__gcc_v2di, __gcc_v2di); + +int main() +{ + // Exhaustive equivalence: the model must agree with CBMC's own + // vector semantics (native ^) for all inputs. 
+ __gcc_v2di a, b; + __gcc_v2di r = __builtin_ia32_pxor128(a, b); + __gcc_v2di ref = a ^ b; + __CPROVER_assert( + r[0] == ref[0] && r[1] == ref[1], "__builtin_ia32_pxor128 == native ^"); + return 0; +} diff --git a/regression/cbmc-library/__builtin_ia32/pxor128.desc b/regression/cbmc-library/__builtin_ia32/pxor128.desc new file mode 100644 index 00000000000..1fb6c77d18a --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/pxor128.desc @@ -0,0 +1,8 @@ +CORE gcc-only +pxor128.c + +^EXIT=0$ +^SIGNAL=0$ +^VERIFICATION SUCCESSFUL$ +-- +^warning: ignoring diff --git a/regression/cbmc-library/__builtin_ia32/pxor256.c b/regression/cbmc-library/__builtin_ia32/pxor256.c new file mode 100644 index 00000000000..d4171685a1f --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/pxor256.c @@ -0,0 +1,15 @@ +typedef long long __gcc_v4di __attribute__((__vector_size__(32))); +__gcc_v4di __builtin_ia32_pxor256(__gcc_v4di, __gcc_v4di); + +int main() +{ + // Exhaustive equivalence: the model must agree with CBMC's own + // vector semantics (native ^) for all inputs. 
+ __gcc_v4di a, b; + __gcc_v4di r = __builtin_ia32_pxor256(a, b); + __gcc_v4di ref = a ^ b; + __CPROVER_assert( + r[0] == ref[0] && r[1] == ref[1] && r[2] == ref[2] && r[3] == ref[3], + "__builtin_ia32_pxor256 == native ^"); + return 0; +} diff --git a/regression/cbmc-library/__builtin_ia32/pxor256.desc b/regression/cbmc-library/__builtin_ia32/pxor256.desc new file mode 100644 index 00000000000..4144dd8d1da --- /dev/null +++ b/regression/cbmc-library/__builtin_ia32/pxor256.desc @@ -0,0 +1,8 @@ +CORE gcc-only +pxor256.c + +^EXIT=0$ +^SIGNAL=0$ +^VERIFICATION SUCCESSFUL$ +-- +^warning: ignoring diff --git a/regression/cbmc-library/__builtin_ia32_sfence/main.c b/regression/cbmc-library/__builtin_ia32/sfence.c similarity index 100% rename from regression/cbmc-library/__builtin_ia32_sfence/main.c rename to regression/cbmc-library/__builtin_ia32/sfence.c diff --git a/regression/cbmc-library/__builtin_ia32_sfence/test.desc b/regression/cbmc-library/__builtin_ia32/sfence.desc similarity index 92% rename from regression/cbmc-library/__builtin_ia32_sfence/test.desc rename to regression/cbmc-library/__builtin_ia32/sfence.desc index 9542d988e8d..325b8348c21 100644 --- a/regression/cbmc-library/__builtin_ia32_sfence/test.desc +++ b/regression/cbmc-library/__builtin_ia32/sfence.desc @@ -1,5 +1,5 @@ KNOWNBUG -main.c +sfence.c --pointer-check --bounds-check ^EXIT=0$ ^SIGNAL=0$ diff --git a/regression/cbmc-library/__builtin_neon/vabdq_v.c b/regression/cbmc-library/__builtin_neon/vabdq_v.c new file mode 100644 index 00000000000..0daa95aaff0 --- /dev/null +++ b/regression/cbmc-library/__builtin_neon/vabdq_v.c @@ -0,0 +1,14 @@ +// The NEON builtin is declared by the front-end (gcc_builtin_headers_aarch64.h) +// under an AArch64 target, and its body comes from the cprover library model in +// src/ansi-c/library/arm_neon.c. The absolute difference of any vector with +// itself is zero, regardless of the lane interpretation (type code 32 = s8). 
+typedef char v16qi __attribute__((vector_size(16))); + +int main() +{ + v16qi a; + v16qi r = __builtin_neon_vabdq_v(a, a, 32); + for(int i = 0; i < 16; i++) + __CPROVER_assert(r[i] == 0, "vabdq of equal vectors is zero"); + return 0; +} diff --git a/regression/cbmc-library/__builtin_neon/vabdq_v.desc b/regression/cbmc-library/__builtin_neon/vabdq_v.desc new file mode 100644 index 00000000000..a5952cb5d38 --- /dev/null +++ b/regression/cbmc-library/__builtin_neon/vabdq_v.desc @@ -0,0 +1,8 @@ +CORE gcc-only +vabdq_v.c +--arch arm64 +^EXIT=0$ +^SIGNAL=0$ +^VERIFICATION SUCCESSFUL$ +-- +^warning: ignoring diff --git a/regression/cbmc-library/__builtin_neon/vbslq_v.c b/regression/cbmc-library/__builtin_neon/vbslq_v.c new file mode 100644 index 00000000000..4e61e7e18b4 --- /dev/null +++ b/regression/cbmc-library/__builtin_neon/vbslq_v.c @@ -0,0 +1,16 @@ +// Generated model for the bitwise-select builtin (mnemonic BSL). The operation +// is bit-level, so it is independent of the lane type code: each result bit +// comes from a where the mask bit is set, otherwise from b. 
+typedef signed char v16 __attribute__((vector_size(16))); +typedef char v16qi __attribute__((vector_size(16))); + +int main() +{ + v16 mask, a, b; + v16 r = (v16)__builtin_neon_vbslq_v((v16qi)mask, (v16qi)a, (v16qi)b, 32); + for(int i = 0; i < 16; i++) + __CPROVER_assert( + r[i] == (signed char)((mask[i] & a[i]) | (~mask[i] & b[i])), + "vbslq selects bits by mask"); + return 0; +} diff --git a/regression/cbmc-library/__builtin_neon/vbslq_v.desc b/regression/cbmc-library/__builtin_neon/vbslq_v.desc new file mode 100644 index 00000000000..e371f4d9e7e --- /dev/null +++ b/regression/cbmc-library/__builtin_neon/vbslq_v.desc @@ -0,0 +1,8 @@ +CORE gcc-only +vbslq_v.c +--arch arm64 +^EXIT=0$ +^SIGNAL=0$ +^VERIFICATION SUCCESSFUL$ +-- +^warning: ignoring diff --git a/regression/cbmc-library/__builtin_neon/vhaddq_v.c b/regression/cbmc-library/__builtin_neon/vhaddq_v.c new file mode 100644 index 00000000000..7e3a4760864 --- /dev/null +++ b/regression/cbmc-library/__builtin_neon/vhaddq_v.c @@ -0,0 +1,15 @@ +// Generated model for the halving-add builtin (mnemonic UHADD); type code 48 +// selects the uint8x16 interpretation. Check it against floor((a+b)/2). 
+typedef unsigned char v16 __attribute__((vector_size(16))); +typedef char v16qi __attribute__((vector_size(16))); + +int main() +{ + v16 a, b; + v16 r = (v16)__builtin_neon_vhaddq_v((v16qi)a, (v16qi)b, 48); + for(int i = 0; i < 16; i++) + __CPROVER_assert( + r[i] == (unsigned char)(((int)a[i] + (int)b[i]) >> 1), + "vhaddq_u8 == floor((a+b)/2)"); + return 0; +} diff --git a/regression/cbmc-library/__builtin_neon/vhaddq_v.desc b/regression/cbmc-library/__builtin_neon/vhaddq_v.desc new file mode 100644 index 00000000000..3da17f4008f --- /dev/null +++ b/regression/cbmc-library/__builtin_neon/vhaddq_v.desc @@ -0,0 +1,8 @@ +CORE gcc-only +vhaddq_v.c +--arch arm64 +^EXIT=0$ +^SIGNAL=0$ +^VERIFICATION SUCCESSFUL$ +-- +^warning: ignoring diff --git a/regression/cbmc-library/__builtin_neon/vmaxq_v.c b/regression/cbmc-library/__builtin_neon/vmaxq_v.c new file mode 100644 index 00000000000..466efc00b81 --- /dev/null +++ b/regression/cbmc-library/__builtin_neon/vmaxq_v.c @@ -0,0 +1,14 @@ +// The model for __builtin_neon_vmaxq_v is generated by +// scripts/generate_neon_models.py from arm_neon.td. Type code 32 selects the +// int8x16 lane interpretation; verify the model agrees with a per-lane max. +typedef signed char v16 __attribute__((vector_size(16))); +typedef char v16qi __attribute__((vector_size(16))); + +int main() +{ + v16 a, b; + v16 r = (v16)__builtin_neon_vmaxq_v((v16qi)a, (v16qi)b, 32); + for(int i = 0; i < 16; i++) + __CPROVER_assert(r[i] == (a[i] > b[i] ? 
a[i] : b[i]), "vmaxq_s8 == max"); + return 0; +} diff --git a/regression/cbmc-library/__builtin_neon/vmaxq_v.desc b/regression/cbmc-library/__builtin_neon/vmaxq_v.desc new file mode 100644 index 00000000000..dcfd9328239 --- /dev/null +++ b/regression/cbmc-library/__builtin_neon/vmaxq_v.desc @@ -0,0 +1,8 @@ +CORE gcc-only +vmaxq_v.c +--arch arm64 +^EXIT=0$ +^SIGNAL=0$ +^VERIFICATION SUCCESSFUL$ +-- +^warning: ignoring diff --git a/regression/cbmc-library/__builtin_neon/vpmaxq_v.c b/regression/cbmc-library/__builtin_neon/vpmaxq_v.c new file mode 100644 index 00000000000..0f54fa7b3f4 --- /dev/null +++ b/regression/cbmc-library/__builtin_neon/vpmaxq_v.c @@ -0,0 +1,19 @@ +// Generated model for the pairwise-maximum builtin (mnemonic SMAXP); type code +// 32 selects int8x16. The result is the pairwise maxima of a followed by those +// of b -- exercises the reshaping code path. +typedef signed char v16 __attribute__((vector_size(16))); +typedef char v16qi __attribute__((vector_size(16))); + +int main() +{ + v16 a, b; + v16 r = (v16)__builtin_neon_vpmaxq_v((v16qi)a, (v16qi)b, 32); + for(int i = 0; i < 8; i++) + { + signed char ea = a[2 * i] > a[2 * i + 1] ? a[2 * i] : a[2 * i + 1]; + signed char eb = b[2 * i] > b[2 * i + 1] ? 
b[2 * i] : b[2 * i + 1]; + __CPROVER_assert(r[i] == ea, "vpmaxq_s8 lower half from a"); + __CPROVER_assert(r[8 + i] == eb, "vpmaxq_s8 upper half from b"); + } + return 0; +} diff --git a/regression/cbmc-library/__builtin_neon/vpmaxq_v.desc b/regression/cbmc-library/__builtin_neon/vpmaxq_v.desc new file mode 100644 index 00000000000..9d00f8df904 --- /dev/null +++ b/regression/cbmc-library/__builtin_neon/vpmaxq_v.desc @@ -0,0 +1,8 @@ +CORE gcc-only +vpmaxq_v.c +--arch arm64 +^EXIT=0$ +^SIGNAL=0$ +^VERIFICATION SUCCESSFUL$ +-- +^warning: ignoring diff --git a/regression/cbmc-library/__builtin_neon/vqaddq_v.c b/regression/cbmc-library/__builtin_neon/vqaddq_v.c new file mode 100644 index 00000000000..6898e2ac9b7 --- /dev/null +++ b/regression/cbmc-library/__builtin_neon/vqaddq_v.c @@ -0,0 +1,17 @@ +// Generated model for the saturating-add builtin (mnemonic SQADD); type code +// 32 selects the int8x16 interpretation. Check it against a clamped reference. +typedef signed char v16 __attribute__((vector_size(16))); +typedef char v16qi __attribute__((vector_size(16))); + +int main() +{ + v16 a, b; + v16 r = (v16)__builtin_neon_vqaddq_v((v16qi)a, (v16qi)b, 32); + for(int i = 0; i < 16; i++) + { + int s = (int)a[i] + (int)b[i]; + int ref = s < -128 ? -128 : (s > 127 ? 
127 : s); + __CPROVER_assert(r[i] == ref, "vqaddq_s8 saturates"); + } + return 0; +} diff --git a/regression/cbmc-library/__builtin_neon/vqaddq_v.desc b/regression/cbmc-library/__builtin_neon/vqaddq_v.desc new file mode 100644 index 00000000000..61d389d75fb --- /dev/null +++ b/regression/cbmc-library/__builtin_neon/vqaddq_v.desc @@ -0,0 +1,8 @@ +CORE gcc-only +vqaddq_v.c +--arch arm64 +^EXIT=0$ +^SIGNAL=0$ +^VERIFICATION SUCCESSFUL$ +-- +^warning: ignoring diff --git a/regression/cbmc-library/__builtin_neon/vtstq_v.c b/regression/cbmc-library/__builtin_neon/vtstq_v.c new file mode 100644 index 00000000000..4a20457837b --- /dev/null +++ b/regression/cbmc-library/__builtin_neon/vtstq_v.c @@ -0,0 +1,14 @@ +// Generated model for the test-bits builtin (mnemonic CMTST); type code 32 +// selects int8x16. Each lane is all-ones where (a & b) is non-zero. +typedef signed char v16 __attribute__((vector_size(16))); +typedef char v16qi __attribute__((vector_size(16))); + +int main() +{ + v16 a, b; + v16 r = (v16)__builtin_neon_vtstq_v((v16qi)a, (v16qi)b, 32); + for(int i = 0; i < 16; i++) + __CPROVER_assert( + r[i] == ((a[i] & b[i]) != 0 ? -1 : 0), "vtstq_s8 sets lanes on bit test"); + return 0; +} diff --git a/regression/cbmc-library/__builtin_neon/vtstq_v.desc b/regression/cbmc-library/__builtin_neon/vtstq_v.desc new file mode 100644 index 00000000000..9085a3026e3 --- /dev/null +++ b/regression/cbmc-library/__builtin_neon/vtstq_v.desc @@ -0,0 +1,8 @@ +CORE gcc-only +vtstq_v.c +--arch arm64 +^EXIT=0$ +^SIGNAL=0$ +^VERIFICATION SUCCESSFUL$ +-- +^warning: ignoring diff --git a/regression/cbmc/SIMD_ia32_models/main.c b/regression/cbmc/SIMD_ia32_models/main.c new file mode 100644 index 00000000000..e6f304c54f7 --- /dev/null +++ b/regression/cbmc/SIMD_ia32_models/main.c @@ -0,0 +1,1014 @@ +// Auto-generated by scripts/generate_simd_smoke_test.py +// Exercises every modelled SIMD builtin once so the library models are +// type-checked, linked and symex'd. 
See doc/neon-intrinsic-models.md. + +typedef short __gcc_v16hi __attribute__((__vector_size__(32))); +typedef unsigned short __gcc_v16hi_u __attribute__((__vector_size__(32))); +typedef char __gcc_v16qi __attribute__((__vector_size__(16))); +typedef signed char __gcc_v16qi_s __attribute__((__vector_size__(16))); +typedef unsigned char __gcc_v16qi_u __attribute__((__vector_size__(16))); +typedef int __gcc_v16si __attribute__((__vector_size__(64))); +typedef unsigned int __gcc_v16si_u __attribute__((__vector_size__(64))); +typedef long long __gcc_v2di __attribute__((__vector_size__(16))); +typedef unsigned long long __gcc_v2di_u __attribute__((__vector_size__(16))); +typedef short __gcc_v32hi __attribute__((__vector_size__(64))); +typedef unsigned short __gcc_v32hi_u __attribute__((__vector_size__(64))); +typedef char __gcc_v32qi __attribute__((__vector_size__(32))); +typedef signed char __gcc_v32qi_s __attribute__((__vector_size__(32))); +typedef unsigned char __gcc_v32qi_u __attribute__((__vector_size__(32))); +typedef long long __gcc_v4di __attribute__((__vector_size__(32))); +typedef unsigned long long __gcc_v4di_u __attribute__((__vector_size__(32))); +typedef int __gcc_v4si __attribute__((__vector_size__(16))); +typedef unsigned int __gcc_v4si_u __attribute__((__vector_size__(16))); +typedef char __gcc_v64qi __attribute__((__vector_size__(64))); +typedef signed char __gcc_v64qi_s __attribute__((__vector_size__(64))); +typedef unsigned char __gcc_v64qi_u __attribute__((__vector_size__(64))); +typedef long long __gcc_v8di __attribute__((__vector_size__(64))); +typedef unsigned long long __gcc_v8di_u __attribute__((__vector_size__(64))); +typedef short __gcc_v8hi __attribute__((__vector_size__(16))); +typedef unsigned short __gcc_v8hi_u __attribute__((__vector_size__(16))); +typedef int __gcc_v8si __attribute__((__vector_size__(32))); +typedef unsigned int __gcc_v8si_u __attribute__((__vector_size__(32))); + +int main(void) +{ + { + __gcc_v32qi a0 = {0}; + 
volatile __gcc_v32qi r = __builtin_ia32_pabsb256(a0); + (void)r; + } + { + __gcc_v16hi a0 = {0}; + volatile __gcc_v16hi r = __builtin_ia32_pabsw256(a0); + (void)r; + } + { + __gcc_v16qi a0 = {0}; + __gcc_v16qi a1 = {0}; + __gcc_v16qi a2 = {0}; + unsigned short a3 = {0}; + volatile __gcc_v16qi r = __builtin_ia32_paddb128_mask(a0, a1, a2, a3); + (void)r; + } + { + __gcc_v32qi a0 = {0}; + __gcc_v32qi a1 = {0}; + __gcc_v32qi a2 = {0}; + unsigned int a3 = {0}; + volatile __gcc_v32qi r = __builtin_ia32_paddb256_mask(a0, a1, a2, a3); + (void)r; + } + { + __gcc_v64qi a0 = {0}; + __gcc_v64qi a1 = {0}; + __gcc_v64qi a2 = {0}; + unsigned long long a3 = {0}; + volatile __gcc_v64qi r = __builtin_ia32_paddb512_mask(a0, a1, a2, a3); + (void)r; + } + { + __gcc_v8si a0 = {0}; + __gcc_v8si a1 = {0}; + __gcc_v8si a2 = {0}; + unsigned char a3 = {0}; + volatile __gcc_v8si r = __builtin_ia32_paddd256_mask(a0, a1, a2, a3); + (void)r; + } + { + __gcc_v2di a0 = {0}; + __gcc_v2di a1 = {0}; + __gcc_v2di a2 = {0}; + unsigned char a3 = {0}; + volatile __gcc_v2di r = __builtin_ia32_paddq128_mask(a0, a1, a2, a3); + (void)r; + } + { + __gcc_v4di a0 = {0}; + __gcc_v4di a1 = {0}; + __gcc_v4di a2 = {0}; + unsigned char a3 = {0}; + volatile __gcc_v4di r = __builtin_ia32_paddq256_mask(a0, a1, a2, a3); + (void)r; + } + { + __gcc_v8di a0 = {0}; + __gcc_v8di a1 = {0}; + __gcc_v8di a2 = {0}; + unsigned char a3 = {0}; + volatile __gcc_v8di r = __builtin_ia32_paddq512_mask(a0, a1, a2, a3); + (void)r; + } + { + __gcc_v16qi a0 = {0}; + __gcc_v16qi a1 = {0}; + __gcc_v16qi a2 = {0}; + unsigned short a3 = {0}; + volatile __gcc_v16qi r = __builtin_ia32_paddsb128_mask(a0, a1, a2, a3); + (void)r; + } + { + __gcc_v32qi a0 = {0}; + __gcc_v32qi a1 = {0}; + volatile __gcc_v32qi r = __builtin_ia32_paddsb256(a0, a1); + (void)r; + } + { + __gcc_v32qi a0 = {0}; + __gcc_v32qi a1 = {0}; + __gcc_v32qi a2 = {0}; + unsigned int a3 = {0}; + volatile __gcc_v32qi r = __builtin_ia32_paddsb256_mask(a0, a1, a2, a3); + (void)r; + } + 
{ + __gcc_v64qi a0 = {0}; + __gcc_v64qi a1 = {0}; + __gcc_v64qi a2 = {0}; + unsigned long long a3 = {0}; + volatile __gcc_v64qi r = __builtin_ia32_paddsb512_mask(a0, a1, a2, a3); + (void)r; + } + { + __gcc_v8hi a0 = {0}; + __gcc_v8hi a1 = {0}; + __gcc_v8hi a2 = {0}; + unsigned char a3 = {0}; + volatile __gcc_v8hi r = __builtin_ia32_paddsw128_mask(a0, a1, a2, a3); + (void)r; + } + { + __gcc_v16hi a0 = {0}; + __gcc_v16hi a1 = {0}; + volatile __gcc_v16hi r = __builtin_ia32_paddsw256(a0, a1); + (void)r; + } + { + __gcc_v16hi a0 = {0}; + __gcc_v16hi a1 = {0}; + __gcc_v16hi a2 = {0}; + unsigned short a3 = {0}; + volatile __gcc_v16hi r = __builtin_ia32_paddsw256_mask(a0, a1, a2, a3); + (void)r; + } + { + __gcc_v32hi a0 = {0}; + __gcc_v32hi a1 = {0}; + __gcc_v32hi a2 = {0}; + unsigned int a3 = {0}; + volatile __gcc_v32hi r = __builtin_ia32_paddsw512_mask(a0, a1, a2, a3); + (void)r; + } + { + __gcc_v16qi a0 = {0}; + __gcc_v16qi a1 = {0}; + __gcc_v16qi a2 = {0}; + unsigned short a3 = {0}; + volatile __gcc_v16qi r = __builtin_ia32_paddusb128_mask(a0, a1, a2, a3); + (void)r; + } + { + __gcc_v32qi a0 = {0}; + __gcc_v32qi a1 = {0}; + volatile __gcc_v32qi r = __builtin_ia32_paddusb256(a0, a1); + (void)r; + } + { + __gcc_v32qi a0 = {0}; + __gcc_v32qi a1 = {0}; + __gcc_v32qi a2 = {0}; + unsigned int a3 = {0}; + volatile __gcc_v32qi r = __builtin_ia32_paddusb256_mask(a0, a1, a2, a3); + (void)r; + } + { + __gcc_v64qi a0 = {0}; + __gcc_v64qi a1 = {0}; + __gcc_v64qi a2 = {0}; + unsigned long long a3 = {0}; + volatile __gcc_v64qi r = __builtin_ia32_paddusb512_mask(a0, a1, a2, a3); + (void)r; + } + { + __gcc_v8hi a0 = {0}; + __gcc_v8hi a1 = {0}; + volatile __gcc_v8hi r = __builtin_ia32_paddusw128(a0, a1); + (void)r; + } + { + __gcc_v8hi a0 = {0}; + __gcc_v8hi a1 = {0}; + __gcc_v8hi a2 = {0}; + unsigned char a3 = {0}; + volatile __gcc_v8hi r = __builtin_ia32_paddusw128_mask(a0, a1, a2, a3); + (void)r; + } + { + __gcc_v16hi a0 = {0}; + __gcc_v16hi a1 = {0}; + volatile __gcc_v16hi r = 
__builtin_ia32_paddusw256(a0, a1); + (void)r; + } + { + __gcc_v16hi a0 = {0}; + __gcc_v16hi a1 = {0}; + __gcc_v16hi a2 = {0}; + unsigned short a3 = {0}; + volatile __gcc_v16hi r = __builtin_ia32_paddusw256_mask(a0, a1, a2, a3); + (void)r; + } + { + __gcc_v32hi a0 = {0}; + __gcc_v32hi a1 = {0}; + __gcc_v32hi a2 = {0}; + unsigned int a3 = {0}; + volatile __gcc_v32hi r = __builtin_ia32_paddusw512_mask(a0, a1, a2, a3); + (void)r; + } + { + __gcc_v8hi a0 = {0}; + __gcc_v8hi a1 = {0}; + __gcc_v8hi a2 = {0}; + unsigned char a3 = {0}; + volatile __gcc_v8hi r = __builtin_ia32_paddw128_mask(a0, a1, a2, a3); + (void)r; + } + { + __gcc_v16hi a0 = {0}; + __gcc_v16hi a1 = {0}; + __gcc_v16hi a2 = {0}; + unsigned short a3 = {0}; + volatile __gcc_v16hi r = __builtin_ia32_paddw256_mask(a0, a1, a2, a3); + (void)r; + } + { + __gcc_v32hi a0 = {0}; + __gcc_v32hi a1 = {0}; + __gcc_v32hi a2 = {0}; + unsigned int a3 = {0}; + volatile __gcc_v32hi r = __builtin_ia32_paddw512_mask(a0, a1, a2, a3); + (void)r; + } + { + __gcc_v16qi a0 = {0}; + __gcc_v16qi a1 = {0}; + __gcc_v16qi a2 = {0}; + unsigned short a3 = {0}; + volatile __gcc_v16qi r = __builtin_ia32_pavgb128_mask(a0, a1, a2, a3); + (void)r; + } + { + __gcc_v32qi a0 = {0}; + __gcc_v32qi a1 = {0}; + volatile __gcc_v32qi r = __builtin_ia32_pavgb256(a0, a1); + (void)r; + } + { + __gcc_v32qi a0 = {0}; + __gcc_v32qi a1 = {0}; + __gcc_v32qi a2 = {0}; + unsigned int a3 = {0}; + volatile __gcc_v32qi r = __builtin_ia32_pavgb256_mask(a0, a1, a2, a3); + (void)r; + } + { + __gcc_v64qi a0 = {0}; + __gcc_v64qi a1 = {0}; + __gcc_v64qi a2 = {0}; + unsigned long long a3 = {0}; + volatile __gcc_v64qi r = __builtin_ia32_pavgb512_mask(a0, a1, a2, a3); + (void)r; + } + { + __gcc_v8hi a0 = {0}; + __gcc_v8hi a1 = {0}; + __gcc_v8hi a2 = {0}; + unsigned char a3 = {0}; + volatile __gcc_v8hi r = __builtin_ia32_pavgw128_mask(a0, a1, a2, a3); + (void)r; + } + { + __gcc_v16hi a0 = {0}; + __gcc_v16hi a1 = {0}; + volatile __gcc_v16hi r = __builtin_ia32_pavgw256(a0, a1); 
+ (void)r; + } + { + __gcc_v16hi a0 = {0}; + __gcc_v16hi a1 = {0}; + __gcc_v16hi a2 = {0}; + unsigned short a3 = {0}; + volatile __gcc_v16hi r = __builtin_ia32_pavgw256_mask(a0, a1, a2, a3); + (void)r; + } + { + __gcc_v32hi a0 = {0}; + __gcc_v32hi a1 = {0}; + __gcc_v32hi a2 = {0}; + unsigned int a3 = {0}; + volatile __gcc_v32hi r = __builtin_ia32_pavgw512_mask(a0, a1, a2, a3); + (void)r; + } + { + __gcc_v16qi a0 = {0}; + __gcc_v16qi a1 = {0}; + __gcc_v16qi a2 = {0}; + unsigned short a3 = {0}; + volatile __gcc_v16qi r = __builtin_ia32_pmaxsb128_mask(a0, a1, a2, a3); + (void)r; + } + { + __gcc_v32qi a0 = {0}; + __gcc_v32qi a1 = {0}; + volatile __gcc_v32qi r = __builtin_ia32_pmaxsb256(a0, a1); + (void)r; + } + { + __gcc_v32qi a0 = {0}; + __gcc_v32qi a1 = {0}; + __gcc_v32qi a2 = {0}; + unsigned int a3 = {0}; + volatile __gcc_v32qi r = __builtin_ia32_pmaxsb256_mask(a0, a1, a2, a3); + (void)r; + } + { + __gcc_v64qi a0 = {0}; + __gcc_v64qi a1 = {0}; + __gcc_v64qi a2 = {0}; + unsigned long long a3 = {0}; + volatile __gcc_v64qi r = __builtin_ia32_pmaxsb512_mask(a0, a1, a2, a3); + (void)r; + } + { + __gcc_v4si a0 = {0}; + __gcc_v4si a1 = {0}; + __gcc_v4si a2 = {0}; + unsigned char a3 = {0}; + volatile __gcc_v4si r = __builtin_ia32_pmaxsd128_mask(a0, a1, a2, a3); + (void)r; + } + { + __gcc_v8si a0 = {0}; + __gcc_v8si a1 = {0}; + volatile __gcc_v8si r = __builtin_ia32_pmaxsd256(a0, a1); + (void)r; + } + { + __gcc_v8si a0 = {0}; + __gcc_v8si a1 = {0}; + __gcc_v8si a2 = {0}; + unsigned char a3 = {0}; + volatile __gcc_v8si r = __builtin_ia32_pmaxsd256_mask(a0, a1, a2, a3); + (void)r; + } + { + __gcc_v16si a0 = {0}; + __gcc_v16si a1 = {0}; + __gcc_v16si a2 = {0}; + unsigned short a3 = {0}; + volatile __gcc_v16si r = __builtin_ia32_pmaxsd512_mask(a0, a1, a2, a3); + (void)r; + } + { + __gcc_v8hi a0 = {0}; + __gcc_v8hi a1 = {0}; + __gcc_v8hi a2 = {0}; + unsigned char a3 = {0}; + volatile __gcc_v8hi r = __builtin_ia32_pmaxsw128_mask(a0, a1, a2, a3); + (void)r; + } + { + __gcc_v16hi a0 
= {0}; + __gcc_v16hi a1 = {0}; + volatile __gcc_v16hi r = __builtin_ia32_pmaxsw256(a0, a1); + (void)r; + } + { + __gcc_v16hi a0 = {0}; + __gcc_v16hi a1 = {0}; + __gcc_v16hi a2 = {0}; + unsigned short a3 = {0}; + volatile __gcc_v16hi r = __builtin_ia32_pmaxsw256_mask(a0, a1, a2, a3); + (void)r; + } + { + __gcc_v32hi a0 = {0}; + __gcc_v32hi a1 = {0}; + __gcc_v32hi a2 = {0}; + unsigned int a3 = {0}; + volatile __gcc_v32hi r = __builtin_ia32_pmaxsw512_mask(a0, a1, a2, a3); + (void)r; + } + { + __gcc_v16qi a0 = {0}; + __gcc_v16qi a1 = {0}; + __gcc_v16qi a2 = {0}; + unsigned short a3 = {0}; + volatile __gcc_v16qi r = __builtin_ia32_pmaxub128_mask(a0, a1, a2, a3); + (void)r; + } + { + __gcc_v32qi a0 = {0}; + __gcc_v32qi a1 = {0}; + volatile __gcc_v32qi r = __builtin_ia32_pmaxub256(a0, a1); + (void)r; + } + { + __gcc_v32qi a0 = {0}; + __gcc_v32qi a1 = {0}; + __gcc_v32qi a2 = {0}; + unsigned int a3 = {0}; + volatile __gcc_v32qi r = __builtin_ia32_pmaxub256_mask(a0, a1, a2, a3); + (void)r; + } + { + __gcc_v64qi a0 = {0}; + __gcc_v64qi a1 = {0}; + __gcc_v64qi a2 = {0}; + unsigned long long a3 = {0}; + volatile __gcc_v64qi r = __builtin_ia32_pmaxub512_mask(a0, a1, a2, a3); + (void)r; + } + { + __gcc_v4si a0 = {0}; + __gcc_v4si a1 = {0}; + __gcc_v4si a2 = {0}; + unsigned char a3 = {0}; + volatile __gcc_v4si r = __builtin_ia32_pmaxud128_mask(a0, a1, a2, a3); + (void)r; + } + { + __gcc_v8si a0 = {0}; + __gcc_v8si a1 = {0}; + __gcc_v8si a2 = {0}; + unsigned char a3 = {0}; + volatile __gcc_v8si r = __builtin_ia32_pmaxud256_mask(a0, a1, a2, a3); + (void)r; + } + { + __gcc_v16si a0 = {0}; + __gcc_v16si a1 = {0}; + __gcc_v16si a2 = {0}; + unsigned short a3 = {0}; + volatile __gcc_v16si r = __builtin_ia32_pmaxud512_mask(a0, a1, a2, a3); + (void)r; + } + { + __gcc_v8hi a0 = {0}; + __gcc_v8hi a1 = {0}; + __gcc_v8hi a2 = {0}; + unsigned char a3 = {0}; + volatile __gcc_v8hi r = __builtin_ia32_pmaxuw128_mask(a0, a1, a2, a3); + (void)r; + } + { + __gcc_v16hi a0 = {0}; + __gcc_v16hi a1 = {0}; 
+ volatile __gcc_v16hi r = __builtin_ia32_pmaxuw256(a0, a1); + (void)r; + } + { + __gcc_v16hi a0 = {0}; + __gcc_v16hi a1 = {0}; + __gcc_v16hi a2 = {0}; + unsigned short a3 = {0}; + volatile __gcc_v16hi r = __builtin_ia32_pmaxuw256_mask(a0, a1, a2, a3); + (void)r; + } + { + __gcc_v32hi a0 = {0}; + __gcc_v32hi a1 = {0}; + __gcc_v32hi a2 = {0}; + unsigned int a3 = {0}; + volatile __gcc_v32hi r = __builtin_ia32_pmaxuw512_mask(a0, a1, a2, a3); + (void)r; + } + { + __gcc_v16qi a0 = {0}; + __gcc_v16qi a1 = {0}; + __gcc_v16qi a2 = {0}; + unsigned short a3 = {0}; + volatile __gcc_v16qi r = __builtin_ia32_pminsb128_mask(a0, a1, a2, a3); + (void)r; + } + { + __gcc_v32qi a0 = {0}; + __gcc_v32qi a1 = {0}; + volatile __gcc_v32qi r = __builtin_ia32_pminsb256(a0, a1); + (void)r; + } + { + __gcc_v32qi a0 = {0}; + __gcc_v32qi a1 = {0}; + __gcc_v32qi a2 = {0}; + unsigned int a3 = {0}; + volatile __gcc_v32qi r = __builtin_ia32_pminsb256_mask(a0, a1, a2, a3); + (void)r; + } + { + __gcc_v64qi a0 = {0}; + __gcc_v64qi a1 = {0}; + __gcc_v64qi a2 = {0}; + unsigned long long a3 = {0}; + volatile __gcc_v64qi r = __builtin_ia32_pminsb512_mask(a0, a1, a2, a3); + (void)r; + } + { + __gcc_v4si a0 = {0}; + __gcc_v4si a1 = {0}; + __gcc_v4si a2 = {0}; + unsigned char a3 = {0}; + volatile __gcc_v4si r = __builtin_ia32_pminsd128_mask(a0, a1, a2, a3); + (void)r; + } + { + __gcc_v8si a0 = {0}; + __gcc_v8si a1 = {0}; + volatile __gcc_v8si r = __builtin_ia32_pminsd256(a0, a1); + (void)r; + } + { + __gcc_v8si a0 = {0}; + __gcc_v8si a1 = {0}; + __gcc_v8si a2 = {0}; + unsigned char a3 = {0}; + volatile __gcc_v8si r = __builtin_ia32_pminsd256_mask(a0, a1, a2, a3); + (void)r; + } + { + __gcc_v16si a0 = {0}; + __gcc_v16si a1 = {0}; + __gcc_v16si a2 = {0}; + unsigned short a3 = {0}; + volatile __gcc_v16si r = __builtin_ia32_pminsd512_mask(a0, a1, a2, a3); + (void)r; + } + { + __gcc_v8hi a0 = {0}; + __gcc_v8hi a1 = {0}; + __gcc_v8hi a2 = {0}; + unsigned char a3 = {0}; + volatile __gcc_v8hi r = 
__builtin_ia32_pminsw128_mask(a0, a1, a2, a3); + (void)r; + } + { + __gcc_v16hi a0 = {0}; + __gcc_v16hi a1 = {0}; + volatile __gcc_v16hi r = __builtin_ia32_pminsw256(a0, a1); + (void)r; + } + { + __gcc_v16hi a0 = {0}; + __gcc_v16hi a1 = {0}; + __gcc_v16hi a2 = {0}; + unsigned short a3 = {0}; + volatile __gcc_v16hi r = __builtin_ia32_pminsw256_mask(a0, a1, a2, a3); + (void)r; + } + { + __gcc_v32hi a0 = {0}; + __gcc_v32hi a1 = {0}; + __gcc_v32hi a2 = {0}; + unsigned int a3 = {0}; + volatile __gcc_v32hi r = __builtin_ia32_pminsw512_mask(a0, a1, a2, a3); + (void)r; + } + { + __gcc_v16qi a0 = {0}; + __gcc_v16qi a1 = {0}; + __gcc_v16qi a2 = {0}; + unsigned short a3 = {0}; + volatile __gcc_v16qi r = __builtin_ia32_pminub128_mask(a0, a1, a2, a3); + (void)r; + } + { + __gcc_v32qi a0 = {0}; + __gcc_v32qi a1 = {0}; + volatile __gcc_v32qi r = __builtin_ia32_pminub256(a0, a1); + (void)r; + } + { + __gcc_v32qi a0 = {0}; + __gcc_v32qi a1 = {0}; + __gcc_v32qi a2 = {0}; + unsigned int a3 = {0}; + volatile __gcc_v32qi r = __builtin_ia32_pminub256_mask(a0, a1, a2, a3); + (void)r; + } + { + __gcc_v64qi a0 = {0}; + __gcc_v64qi a1 = {0}; + __gcc_v64qi a2 = {0}; + unsigned long long a3 = {0}; + volatile __gcc_v64qi r = __builtin_ia32_pminub512_mask(a0, a1, a2, a3); + (void)r; + } + { + __gcc_v4si a0 = {0}; + __gcc_v4si a1 = {0}; + __gcc_v4si a2 = {0}; + unsigned char a3 = {0}; + volatile __gcc_v4si r = __builtin_ia32_pminud128_mask(a0, a1, a2, a3); + (void)r; + } + { + __gcc_v8si a0 = {0}; + __gcc_v8si a1 = {0}; + volatile __gcc_v8si r = __builtin_ia32_pminud256(a0, a1); + (void)r; + } + { + __gcc_v8si a0 = {0}; + __gcc_v8si a1 = {0}; + __gcc_v8si a2 = {0}; + unsigned char a3 = {0}; + volatile __gcc_v8si r = __builtin_ia32_pminud256_mask(a0, a1, a2, a3); + (void)r; + } + { + __gcc_v16si a0 = {0}; + __gcc_v16si a1 = {0}; + __gcc_v16si a2 = {0}; + unsigned short a3 = {0}; + volatile __gcc_v16si r = __builtin_ia32_pminud512_mask(a0, a1, a2, a3); + (void)r; + } + { + __gcc_v8hi a0 = {0}; + 
__gcc_v8hi a1 = {0}; + __gcc_v8hi a2 = {0}; + unsigned char a3 = {0}; + volatile __gcc_v8hi r = __builtin_ia32_pminuw128_mask(a0, a1, a2, a3); + (void)r; + } + { + __gcc_v16hi a0 = {0}; + __gcc_v16hi a1 = {0}; + volatile __gcc_v16hi r = __builtin_ia32_pminuw256(a0, a1); + (void)r; + } + { + __gcc_v16hi a0 = {0}; + __gcc_v16hi a1 = {0}; + __gcc_v16hi a2 = {0}; + unsigned short a3 = {0}; + volatile __gcc_v16hi r = __builtin_ia32_pminuw256_mask(a0, a1, a2, a3); + (void)r; + } + { + __gcc_v32hi a0 = {0}; + __gcc_v32hi a1 = {0}; + __gcc_v32hi a2 = {0}; + unsigned int a3 = {0}; + volatile __gcc_v32hi r = __builtin_ia32_pminuw512_mask(a0, a1, a2, a3); + (void)r; + } + { + __gcc_v4si a0 = {0}; + __gcc_v4si a1 = {0}; + __gcc_v4si a2 = {0}; + unsigned char a3 = {0}; + volatile __gcc_v4si r = __builtin_ia32_pmulld128_mask(a0, a1, a2, a3); + (void)r; + } + { + __gcc_v8si a0 = {0}; + __gcc_v8si a1 = {0}; + volatile __gcc_v8si r = __builtin_ia32_pmulld256(a0, a1); + (void)r; + } + { + __gcc_v8si a0 = {0}; + __gcc_v8si a1 = {0}; + __gcc_v8si a2 = {0}; + unsigned char a3 = {0}; + volatile __gcc_v8si r = __builtin_ia32_pmulld256_mask(a0, a1, a2, a3); + (void)r; + } + { + __gcc_v16si a0 = {0}; + __gcc_v16si a1 = {0}; + __gcc_v16si a2 = {0}; + unsigned short a3 = {0}; + volatile __gcc_v16si r = __builtin_ia32_pmulld512_mask(a0, a1, a2, a3); + (void)r; + } + { + __gcc_v8hi a0 = {0}; + __gcc_v8hi a1 = {0}; + __gcc_v8hi a2 = {0}; + unsigned char a3 = {0}; + volatile __gcc_v8hi r = __builtin_ia32_pmullw128_mask(a0, a1, a2, a3); + (void)r; + } + { + __gcc_v16hi a0 = {0}; + __gcc_v16hi a1 = {0}; + volatile __gcc_v16hi r = __builtin_ia32_pmullw256(a0, a1); + (void)r; + } + { + __gcc_v16hi a0 = {0}; + __gcc_v16hi a1 = {0}; + __gcc_v16hi a2 = {0}; + unsigned short a3 = {0}; + volatile __gcc_v16hi r = __builtin_ia32_pmullw256_mask(a0, a1, a2, a3); + (void)r; + } + { + __gcc_v32hi a0 = {0}; + __gcc_v32hi a1 = {0}; + __gcc_v32hi a2 = {0}; + unsigned int a3 = {0}; + volatile __gcc_v32hi r = 
__builtin_ia32_pmullw512_mask(a0, a1, a2, a3); + (void)r; + } + { + __gcc_v4si a0 = {0}; + volatile __gcc_v4si r = __builtin_ia32_pslldi128(a0, 1); + (void)r; + } + { + __gcc_v8si a0 = {0}; + volatile __gcc_v8si r = __builtin_ia32_pslldi256(a0, 1); + (void)r; + } + { + __gcc_v2di a0 = {0}; + volatile __gcc_v2di r = __builtin_ia32_psllqi128(a0, 1); + (void)r; + } + { + __gcc_v4di a0 = {0}; + volatile __gcc_v4di r = __builtin_ia32_psllqi256(a0, 1); + (void)r; + } + { + __gcc_v16hi a0 = {0}; + volatile __gcc_v16hi r = __builtin_ia32_psllwi256(a0, 1); + (void)r; + } + { + __gcc_v8si a0 = {0}; + volatile __gcc_v8si r = __builtin_ia32_psradi256(a0, 1); + (void)r; + } + { + __gcc_v8hi a0 = {0}; + volatile __gcc_v8hi r = __builtin_ia32_psrawi128(a0, 1); + (void)r; + } + { + __gcc_v16hi a0 = {0}; + volatile __gcc_v16hi r = __builtin_ia32_psrawi256(a0, 1); + (void)r; + } + { + __gcc_v4si a0 = {0}; + volatile __gcc_v4si r = __builtin_ia32_psrldi128(a0, 1); + (void)r; + } + { + __gcc_v8si a0 = {0}; + volatile __gcc_v8si r = __builtin_ia32_psrldi256(a0, 1); + (void)r; + } + { + __gcc_v2di a0 = {0}; + volatile __gcc_v2di r = __builtin_ia32_psrlqi128(a0, 1); + (void)r; + } + { + __gcc_v4di a0 = {0}; + volatile __gcc_v4di r = __builtin_ia32_psrlqi256(a0, 1); + (void)r; + } + { + __gcc_v16hi a0 = {0}; + volatile __gcc_v16hi r = __builtin_ia32_psrlwi256(a0, 1); + (void)r; + } + { + __gcc_v16qi a0 = {0}; + __gcc_v16qi a1 = {0}; + __gcc_v16qi a2 = {0}; + unsigned short a3 = {0}; + volatile __gcc_v16qi r = __builtin_ia32_psubb128_mask(a0, a1, a2, a3); + (void)r; + } + { + __gcc_v32qi a0 = {0}; + __gcc_v32qi a1 = {0}; + __gcc_v32qi a2 = {0}; + unsigned int a3 = {0}; + volatile __gcc_v32qi r = __builtin_ia32_psubb256_mask(a0, a1, a2, a3); + (void)r; + } + { + __gcc_v64qi a0 = {0}; + __gcc_v64qi a1 = {0}; + __gcc_v64qi a2 = {0}; + unsigned long long a3 = {0}; + volatile __gcc_v64qi r = __builtin_ia32_psubb512_mask(a0, a1, a2, a3); + (void)r; + } + { + __gcc_v4si a0 = {0}; + __gcc_v4si a1 
= {0}; + __gcc_v4si a2 = {0}; + unsigned char a3 = {0}; + volatile __gcc_v4si r = __builtin_ia32_psubd128_mask(a0, a1, a2, a3); + (void)r; + } + { + __gcc_v8si a0 = {0}; + __gcc_v8si a1 = {0}; + __gcc_v8si a2 = {0}; + unsigned char a3 = {0}; + volatile __gcc_v8si r = __builtin_ia32_psubd256_mask(a0, a1, a2, a3); + (void)r; + } + { + __gcc_v16si a0 = {0}; + __gcc_v16si a1 = {0}; + __gcc_v16si a2 = {0}; + unsigned short a3 = {0}; + volatile __gcc_v16si r = __builtin_ia32_psubd512_mask(a0, a1, a2, a3); + (void)r; + } + { + __gcc_v2di a0 = {0}; + __gcc_v2di a1 = {0}; + __gcc_v2di a2 = {0}; + unsigned char a3 = {0}; + volatile __gcc_v2di r = __builtin_ia32_psubq128_mask(a0, a1, a2, a3); + (void)r; + } + { + __gcc_v4di a0 = {0}; + __gcc_v4di a1 = {0}; + __gcc_v4di a2 = {0}; + unsigned char a3 = {0}; + volatile __gcc_v4di r = __builtin_ia32_psubq256_mask(a0, a1, a2, a3); + (void)r; + } + { + __gcc_v8di a0 = {0}; + __gcc_v8di a1 = {0}; + __gcc_v8di a2 = {0}; + unsigned char a3 = {0}; + volatile __gcc_v8di r = __builtin_ia32_psubq512_mask(a0, a1, a2, a3); + (void)r; + } + { + __gcc_v16qi a0 = {0}; + __gcc_v16qi a1 = {0}; + volatile __gcc_v16qi r = __builtin_ia32_psubsb128(a0, a1); + (void)r; + } + { + __gcc_v16qi a0 = {0}; + __gcc_v16qi a1 = {0}; + __gcc_v16qi a2 = {0}; + unsigned short a3 = {0}; + volatile __gcc_v16qi r = __builtin_ia32_psubsb128_mask(a0, a1, a2, a3); + (void)r; + } + { + __gcc_v32qi a0 = {0}; + __gcc_v32qi a1 = {0}; + volatile __gcc_v32qi r = __builtin_ia32_psubsb256(a0, a1); + (void)r; + } + { + __gcc_v32qi a0 = {0}; + __gcc_v32qi a1 = {0}; + __gcc_v32qi a2 = {0}; + unsigned int a3 = {0}; + volatile __gcc_v32qi r = __builtin_ia32_psubsb256_mask(a0, a1, a2, a3); + (void)r; + } + { + __gcc_v64qi a0 = {0}; + __gcc_v64qi a1 = {0}; + __gcc_v64qi a2 = {0}; + unsigned long long a3 = {0}; + volatile __gcc_v64qi r = __builtin_ia32_psubsb512_mask(a0, a1, a2, a3); + (void)r; + } + { + __gcc_v8hi a0 = {0}; + __gcc_v8hi a1 = {0}; + __gcc_v8hi a2 = {0}; + unsigned 
char a3 = {0}; + volatile __gcc_v8hi r = __builtin_ia32_psubsw128_mask(a0, a1, a2, a3); + (void)r; + } + { + __gcc_v16hi a0 = {0}; + __gcc_v16hi a1 = {0}; + volatile __gcc_v16hi r = __builtin_ia32_psubsw256(a0, a1); + (void)r; + } + { + __gcc_v16hi a0 = {0}; + __gcc_v16hi a1 = {0}; + __gcc_v16hi a2 = {0}; + unsigned short a3 = {0}; + volatile __gcc_v16hi r = __builtin_ia32_psubsw256_mask(a0, a1, a2, a3); + (void)r; + } + { + __gcc_v32hi a0 = {0}; + __gcc_v32hi a1 = {0}; + __gcc_v32hi a2 = {0}; + unsigned int a3 = {0}; + volatile __gcc_v32hi r = __builtin_ia32_psubsw512_mask(a0, a1, a2, a3); + (void)r; + } + { + __gcc_v16qi a0 = {0}; + __gcc_v16qi a1 = {0}; + __gcc_v16qi a2 = {0}; + unsigned short a3 = {0}; + volatile __gcc_v16qi r = __builtin_ia32_psubusb128_mask(a0, a1, a2, a3); + (void)r; + } + { + __gcc_v32qi a0 = {0}; + __gcc_v32qi a1 = {0}; + volatile __gcc_v32qi r = __builtin_ia32_psubusb256(a0, a1); + (void)r; + } + { + __gcc_v32qi a0 = {0}; + __gcc_v32qi a1 = {0}; + __gcc_v32qi a2 = {0}; + unsigned int a3 = {0}; + volatile __gcc_v32qi r = __builtin_ia32_psubusb256_mask(a0, a1, a2, a3); + (void)r; + } + { + __gcc_v64qi a0 = {0}; + __gcc_v64qi a1 = {0}; + __gcc_v64qi a2 = {0}; + unsigned long long a3 = {0}; + volatile __gcc_v64qi r = __builtin_ia32_psubusb512_mask(a0, a1, a2, a3); + (void)r; + } + { + __gcc_v8hi a0 = {0}; + __gcc_v8hi a1 = {0}; + __gcc_v8hi a2 = {0}; + unsigned char a3 = {0}; + volatile __gcc_v8hi r = __builtin_ia32_psubusw128_mask(a0, a1, a2, a3); + (void)r; + } + { + __gcc_v16hi a0 = {0}; + __gcc_v16hi a1 = {0}; + volatile __gcc_v16hi r = __builtin_ia32_psubusw256(a0, a1); + (void)r; + } + { + __gcc_v16hi a0 = {0}; + __gcc_v16hi a1 = {0}; + __gcc_v16hi a2 = {0}; + unsigned short a3 = {0}; + volatile __gcc_v16hi r = __builtin_ia32_psubusw256_mask(a0, a1, a2, a3); + (void)r; + } + { + __gcc_v32hi a0 = {0}; + __gcc_v32hi a1 = {0}; + __gcc_v32hi a2 = {0}; + unsigned int a3 = {0}; + volatile __gcc_v32hi r = __builtin_ia32_psubusw512_mask(a0, a1, 
a2, a3); + (void)r; + } + { + __gcc_v8hi a0 = {0}; + __gcc_v8hi a1 = {0}; + __gcc_v8hi a2 = {0}; + unsigned char a3 = {0}; + volatile __gcc_v8hi r = __builtin_ia32_psubw128_mask(a0, a1, a2, a3); + (void)r; + } + { + __gcc_v16hi a0 = {0}; + __gcc_v16hi a1 = {0}; + __gcc_v16hi a2 = {0}; + unsigned short a3 = {0}; + volatile __gcc_v16hi r = __builtin_ia32_psubw256_mask(a0, a1, a2, a3); + (void)r; + } + { + __gcc_v32hi a0 = {0}; + __gcc_v32hi a1 = {0}; + __gcc_v32hi a2 = {0}; + unsigned int a3 = {0}; + volatile __gcc_v32hi r = __builtin_ia32_psubw512_mask(a0, a1, a2, a3); + (void)r; + } + __CPROVER_assert(1, "SIMD model smoke test"); + return 0; +} diff --git a/regression/cbmc/SIMD_ia32_models/test.desc b/regression/cbmc/SIMD_ia32_models/test.desc new file mode 100644 index 00000000000..83b8819429a --- /dev/null +++ b/regression/cbmc/SIMD_ia32_models/test.desc @@ -0,0 +1,8 @@ +CORE gcc-only +main.c + +^EXIT=0$ +^SIGNAL=0$ +^VERIFICATION SUCCESSFUL$ +-- +^warning: ignoring diff --git a/regression/cbmc/SIMD_neon_models/main.c b/regression/cbmc/SIMD_neon_models/main.c new file mode 100644 index 00000000000..a6de2a7d2a2 --- /dev/null +++ b/regression/cbmc/SIMD_neon_models/main.c @@ -0,0 +1,143 @@ +// Auto-generated by scripts/generate_simd_smoke_test.py +// Exercises every modelled SIMD builtin once so the library models are +// type-checked, linked and symex'd. See doc/neon-intrinsic-models.md. 
+ +typedef char __gcc_v16qi __attribute__((__vector_size__(16))); +typedef signed char __gcc_v16qi_s __attribute__((__vector_size__(16))); +typedef unsigned char __gcc_v16qi_u __attribute__((__vector_size__(16))); +typedef long long __gcc_v1di_s __attribute__((__vector_size__(8))); +typedef unsigned long long __gcc_v1di_u __attribute__((__vector_size__(8))); +typedef long long __gcc_v2di_s __attribute__((__vector_size__(16))); +typedef unsigned long long __gcc_v2di_u __attribute__((__vector_size__(16))); +typedef int __gcc_v2si_s __attribute__((__vector_size__(8))); +typedef unsigned int __gcc_v2si_u __attribute__((__vector_size__(8))); +typedef short __gcc_v4hi_s __attribute__((__vector_size__(8))); +typedef unsigned short __gcc_v4hi_u __attribute__((__vector_size__(8))); +typedef int __gcc_v4si_s __attribute__((__vector_size__(16))); +typedef unsigned int __gcc_v4si_u __attribute__((__vector_size__(16))); +typedef short __gcc_v8hi_s __attribute__((__vector_size__(16))); +typedef unsigned short __gcc_v8hi_u __attribute__((__vector_size__(16))); +typedef char __gcc_v8qi __attribute__((__vector_size__(8))); +typedef signed char __gcc_v8qi_s __attribute__((__vector_size__(8))); +typedef unsigned char __gcc_v8qi_u __attribute__((__vector_size__(8))); + +int main(void) +{ + { + __gcc_v8qi a0 = {0}; + __gcc_v8qi a1 = {0}; + volatile __gcc_v8qi r = __builtin_neon_vabd_v(a0, a1, 0); + (void)r; + } + { + __gcc_v8qi a0 = {0}; + __gcc_v8qi a1 = {0}; + __gcc_v8qi a2 = {0}; + volatile __gcc_v8qi r = __builtin_neon_vbsl_v(a0, a1, a2, 1); + (void)r; + } + { + __gcc_v8qi a0 = {0}; + __gcc_v8qi a1 = {0}; + volatile __gcc_v8qi r = __builtin_neon_vhadd_v(a0, a1, 0); + (void)r; + } + { + __gcc_v8qi a0 = {0}; + __gcc_v8qi a1 = {0}; + volatile __gcc_v8qi r = __builtin_neon_vhsub_v(a0, a1, 0); + (void)r; + } + { + __gcc_v16qi a0 = {0}; + __gcc_v16qi a1 = {0}; + volatile __gcc_v16qi r = __builtin_neon_vhsubq_v(a0, a1, 32); + (void)r; + } + { + __gcc_v8qi a0 = {0}; + __gcc_v8qi a1 = {0}; 
+ volatile __gcc_v8qi r = __builtin_neon_vmax_v(a0, a1, 0); + (void)r; + } + { + __gcc_v8qi a0 = {0}; + __gcc_v8qi a1 = {0}; + volatile __gcc_v8qi r = __builtin_neon_vmin_v(a0, a1, 0); + (void)r; + } + { + __gcc_v16qi a0 = {0}; + __gcc_v16qi a1 = {0}; + volatile __gcc_v16qi r = __builtin_neon_vminq_v(a0, a1, 32); + (void)r; + } + { + __gcc_v8qi a0 = {0}; + __gcc_v8qi a1 = {0}; + volatile __gcc_v8qi r = __builtin_neon_vpadd_v(a0, a1, 0); + (void)r; + } + { + __gcc_v16qi a0 = {0}; + __gcc_v16qi a1 = {0}; + volatile __gcc_v16qi r = __builtin_neon_vpaddq_v(a0, a1, 32); + (void)r; + } + { + __gcc_v8qi a0 = {0}; + __gcc_v8qi a1 = {0}; + volatile __gcc_v8qi r = __builtin_neon_vpmax_v(a0, a1, 0); + (void)r; + } + { + __gcc_v8qi a0 = {0}; + __gcc_v8qi a1 = {0}; + volatile __gcc_v8qi r = __builtin_neon_vpmin_v(a0, a1, 0); + (void)r; + } + { + __gcc_v16qi a0 = {0}; + __gcc_v16qi a1 = {0}; + volatile __gcc_v16qi r = __builtin_neon_vpminq_v(a0, a1, 32); + (void)r; + } + { + __gcc_v8qi a0 = {0}; + __gcc_v8qi a1 = {0}; + volatile __gcc_v8qi r = __builtin_neon_vqadd_v(a0, a1, 0); + (void)r; + } + { + __gcc_v8qi a0 = {0}; + __gcc_v8qi a1 = {0}; + volatile __gcc_v8qi r = __builtin_neon_vqsub_v(a0, a1, 0); + (void)r; + } + { + __gcc_v16qi a0 = {0}; + __gcc_v16qi a1 = {0}; + volatile __gcc_v16qi r = __builtin_neon_vqsubq_v(a0, a1, 32); + (void)r; + } + { + __gcc_v8qi a0 = {0}; + __gcc_v8qi a1 = {0}; + volatile __gcc_v8qi r = __builtin_neon_vrhadd_v(a0, a1, 0); + (void)r; + } + { + __gcc_v16qi a0 = {0}; + __gcc_v16qi a1 = {0}; + volatile __gcc_v16qi r = __builtin_neon_vrhaddq_v(a0, a1, 32); + (void)r; + } + { + __gcc_v8qi a0 = {0}; + __gcc_v8qi a1 = {0}; + volatile __gcc_v8qi r = __builtin_neon_vtst_v(a0, a1, 0); + (void)r; + } + __CPROVER_assert(1, "SIMD model smoke test"); + return 0; +} diff --git a/regression/cbmc/SIMD_neon_models/test.desc b/regression/cbmc/SIMD_neon_models/test.desc new file mode 100644 index 00000000000..95d3979cc1f --- /dev/null +++ 
b/regression/cbmc/SIMD_neon_models/test.desc @@ -0,0 +1,8 @@ +CORE gcc-only +main.c +--arch arm64 +^EXIT=0$ +^SIGNAL=0$ +^VERIFICATION SUCCESSFUL$ +-- +^warning: ignoring diff --git a/scripts/check_intrinsic_models_sync.sh b/scripts/check_intrinsic_models_sync.sh new file mode 100755 index 00000000000..c2c2717ebfd --- /dev/null +++ b/scripts/check_intrinsic_models_sync.sh @@ -0,0 +1,29 @@ +#!/usr/bin/env bash +# +# Verify that src/ansi-c/library/x86_intrinsics.c is in sync with its +# generator (scripts/generate_intrinsic_models.py). The generated library +# must never be hand-edited; this check fails if regenerating it would +# produce a different file, so a stale committed copy (or a MODELS change +# without regeneration) is caught in CI. + +set -e + +script_dir=$(cd "$(dirname "$0")" && pwd) +root=$(cd "$script_dir/.." && pwd) +committed="$root/src/ansi-c/library/x86_intrinsics.c" + +tmp=$(mktemp) +trap 'rm -f "$tmp"' EXIT + +python3 "$script_dir/generate_intrinsic_models.py" --cbmc-root "$root" -o "$tmp" + +if ! diff -u "$committed" "$tmp"; then + echo + echo "ERROR: src/ansi-c/library/x86_intrinsics.c is out of sync with" + echo "scripts/generate_intrinsic_models.py. Regenerate it with:" + echo " python3 scripts/generate_intrinsic_models.py \\" + echo " -o src/ansi-c/library/x86_intrinsics.c" + exit 1 +fi + +echo "x86_intrinsics.c is in sync with the generator." diff --git a/scripts/generate_intrinsic_models.py b/scripts/generate_intrinsic_models.py new file mode 100755 index 00000000000..a2d3c5f8c15 --- /dev/null +++ b/scripts/generate_intrinsic_models.py @@ -0,0 +1,809 @@ +#!/usr/bin/env python3 +""" +Generate CBMC library models for x86 SIMD intrinsics. + +Models are described by the curated MODELS table below -- one entry per Intel +_mm_* intrinsic, giving the element type, lane count, per-lane body and (where +applicable) signedness, shift parameter, equivalence oracle and AVX-512 mask +type. 
Wider-vector (AVX2 256-bit, AVX-512 512-bit) and merge-masked variants +are derived automatically from the 128-bit base entries. Each model is emitted +as a CBMC library function keyed by its GCC __builtin_ia32_* name, and the tool +cross-checks against the __builtin_ia32_* declarations shipped in CBMC's +compiler headers (src/ansi-c/compiler_headers/gcc_builtin_headers_ia32*.h) so +that it only emits models for builtins CBMC actually knows about. + +The MODELS table is the authoritative, human-reviewed source of truth. The XML +modes below are *maintainer aids* for extending it; they never feed the +committed library directly. + +Modes +----- + -o FILE + (Re)generate the library models into FILE (normally + src/ansi-c/library/x86_intrinsics.c). Output is piped through + clang-format-15 so regeneration is idempotent. CI re-runs this and + diffs the result via scripts/check_intrinsic_models_sync.sh, so the + committed file must always match the generator. + + --status + Print a coverage report: which declared __builtin_ia32_* builtins are + modeled, grouped by CPUID feature. Use this to see what is left to do. + + --status --xml data-latest.xml + As --status, plus a survey of which not-yet-modeled builtins have an + <operation> in Intel's Intrinsics Guide XML that is simple enough for the + --emit-drafts translator to handle (see below). Helps pick the next + tractable batch to model. + + --emit-drafts data-latest.xml + Maintainer aid for growing the MODELS table. Translates the simple + element-wise pseudocode of not-yet-modeled intrinsics (see + parse_operation() for the exact accepted shape) into *draft* Model() + entries printed to stdout for review, and self-checks the translator by + re-deriving the geometry of the hand-written models and reporting any + mismatch.
The drafts are intentionally incomplete: the translator does + NOT infer signedness or apply the UB-hardening (unsigned wrapping + arithmetic, modular negation) that correct models need, so a human must + finish and move each draft into MODELS. Nothing is written to the + library by this mode. + + --emit-tests DIR + Write exhaustive-equivalence regression tests (model == CBMC's native + vector operator for all inputs) under DIR for every model with an + oracle. Used to (re)generate the per-function cbmc-library tests. + +Typical workflow for adding intrinsics +-------------------------------------- + 1. scripts/generate_intrinsic_models.py --status --xml data-latest.xml + to find declared-but-unmodeled builtins with tractable pseudocode; + 2. --emit-drafts data-latest.xml to get draft Model() entries; + 3. review/finish each draft (signedness, UB-hardening) and add it to MODELS; + 4. -o src/ansi-c/library/x86_intrinsics.c to regenerate the library; + 5. --emit-tests regression/cbmc-library/__builtin_ia32 to refresh tests. 
+ +The Intel Intrinsics Guide XML used by --xml/--emit-drafts can be downloaded +from: + https://www.intel.com/content/dam/develop/public/us/en/include/intrinsics-guide/data-latest.xml +""" + +import argparse +import glob +import os +import re +import shutil +import subprocess +import sys +import xml.etree.ElementTree as ET +from dataclasses import dataclass + +# GCC vector types used in CBMC headers, keyed by (element_c_type, count) +VEC_TYPES = { + ("char", 16): "__gcc_v16qi", + ("short", 8): "__gcc_v8hi", + ("int", 4): "__gcc_v4si", + ("long long", 2): "__gcc_v2di", + ("float", 4): "__gcc_v4sf", + ("double", 2): "__gcc_v2df", + ("char", 8): "__gcc_v8qi", + ("short", 4): "__gcc_v4hi", + ("int", 2): "__gcc_v2si", + # 256-bit (AVX2) + ("char", 32): "__gcc_v32qi", + ("short", 16): "__gcc_v16hi", + ("int", 8): "__gcc_v8si", + ("long long", 4): "__gcc_v4di", + # 512-bit (AVX-512) + ("char", 64): "__gcc_v64qi", + ("short", 32): "__gcc_v32hi", + ("int", 16): "__gcc_v16si", + ("long long", 8): "__gcc_v8di", +} + +# AVX-512 write-mask C type (__mmask8/16/32/64) for a given lane count: the +# smallest mask type with at least one bit per lane. +def mask_type_for(count): + if count <= 8: + return "unsigned char" + if count <= 16: + return "unsigned short" + if count <= 32: + return "unsigned int" + if count <= 64: + return "unsigned long long" + return None + +# Bytes per element C type. +ELEM_SIZE = {"char": 1, "short": 2, "int": 4, "long long": 8} + +# The library file this tool owns and (re)generates. Its models are this +# tool's own output, so they are excluded from the "already modeled elsewhere" +# check that decides what to emit (keeping regeneration idempotent). +GENERATED_LIBRARY = os.path.join("src", "ansi-c", "library", "x86_intrinsics.c") + + +@dataclass +class Model: + """A single per-element SIMD intrinsic model. + + builtin: the GCC __builtin_ia32_* name the model implements. + elem: base element C type ("char", "short", "int", "long long"). 
+ count: number of lanes. + body: per-element body template using {a}, {b} (operands) and {j} + (lane index), assigned to dst[j]. + sign: how the per-element operation is carried out, by aliasing the + operands to a vector of the chosen signedness before the loop and + casting the result back: + "" - use the (signed-by-default) vector type as-is; + "signed" - force signed semantics. Needed where 'char' may be + unsigned (e.g. ARM): without this '< 0' is always + false (-Werror=type-limits in library_check.sh) and + 'a > b' would silently become an unsigned compare, + which is wrong for signed intrinsics like + _mm_max_epi8; + "unsigned" - perform the operation in the matching unsigned type. + Used both for genuinely unsigned intrinsics (min/max + epu*, avg) and for the wrapping signed-arithmetic + intrinsics (add/sub/mullo on 32/64-bit lanes), where + 'int + int' etc. would be signed-overflow UB: + unsigned arithmetic is well-defined modular and the + cast back reproduces the two's-complement result. + scalar2: C type of a scalar second parameter (e.g. "int" for a shift + count) instead of a second vector operand. When set, the body + refers to it as {b} (a scalar, not {b}[{j}]). + oracle: a native C vector operator ("+", "-", "*") for which CBMC's own + vector semantics provide an independent reference; --emit-tests + then generates an exhaustive equivalence proof (model == native + operator for all inputs). + mask_type: when set (to an __mmask C type), this is an AVX-512 merge-masked + variant: the function takes (a, b, merge-source, mask) and each + lane is the base body if its mask bit is set, else the merge + source. body/sign describe the underlying (unmasked) operation. 
+ """ + builtin: str + elem: str + count: int + body: str + sign: str = "" + scalar2: str = None + oracle: str = None + mask_type: str = None + + +# Intel _mm_* name -> Model +MODELS = { + # --- add (32/64-bit done unsigned to avoid signed-overflow UB) --- + "_mm_add_epi8": Model("__builtin_ia32_paddb128", "char", 16, "{a}[{j}] + {b}[{j}]", oracle="+"), + "_mm_add_epi16": Model("__builtin_ia32_paddw128", "short", 8, "{a}[{j}] + {b}[{j}]", oracle="+"), + "_mm_add_epi32": Model("__builtin_ia32_paddd128", "int", 4, "{a}[{j}] + {b}[{j}]", sign="unsigned", oracle="+"), + "_mm_add_epi64": Model("__builtin_ia32_paddq128", "long long", 2, "{a}[{j}] + {b}[{j}]", sign="unsigned", oracle="+"), + # --- sub (ditto) --- + "_mm_sub_epi8": Model("__builtin_ia32_psubb128", "char", 16, "{a}[{j}] - {b}[{j}]", oracle="-"), + "_mm_sub_epi16": Model("__builtin_ia32_psubw128", "short", 8, "{a}[{j}] - {b}[{j}]", oracle="-"), + "_mm_sub_epi32": Model("__builtin_ia32_psubd128", "int", 4, "{a}[{j}] - {b}[{j}]", sign="unsigned", oracle="-"), + "_mm_sub_epi64": Model("__builtin_ia32_psubq128", "long long", 2, "{a}[{j}] - {b}[{j}]", sign="unsigned", oracle="-"), + # --- min/max signed --- + "_mm_min_epi8": Model("__builtin_ia32_pminsb128", "char", 16, "{a}[{j}] < {b}[{j}] ? {a}[{j}] : {b}[{j}]", sign="signed"), + "_mm_min_epi16": Model("__builtin_ia32_pminsw128", "short", 8, "{a}[{j}] < {b}[{j}] ? {a}[{j}] : {b}[{j}]"), + "_mm_min_epi32": Model("__builtin_ia32_pminsd128", "int", 4, "{a}[{j}] < {b}[{j}] ? {a}[{j}] : {b}[{j}]"), + "_mm_max_epi8": Model("__builtin_ia32_pmaxsb128", "char", 16, "{a}[{j}] > {b}[{j}] ? {a}[{j}] : {b}[{j}]", sign="signed"), + "_mm_max_epi16": Model("__builtin_ia32_pmaxsw128", "short", 8, "{a}[{j}] > {b}[{j}] ? {a}[{j}] : {b}[{j}]"), + "_mm_max_epi32": Model("__builtin_ia32_pmaxsd128", "int", 4, "{a}[{j}] > {b}[{j}] ? {a}[{j}] : {b}[{j}]"), + # --- min/max unsigned --- + "_mm_min_epu8": Model("__builtin_ia32_pminub128", "char", 16, "{a}[{j}] < {b}[{j}] ? 
{a}[{j}] : {b}[{j}]", sign="unsigned"), + "_mm_max_epu8": Model("__builtin_ia32_pmaxub128", "char", 16, "{a}[{j}] > {b}[{j}] ? {a}[{j}] : {b}[{j}]", sign="unsigned"), + "_mm_min_epu16": Model("__builtin_ia32_pminuw128", "short", 8, "{a}[{j}] < {b}[{j}] ? {a}[{j}] : {b}[{j}]", sign="unsigned"), + "_mm_max_epu16": Model("__builtin_ia32_pmaxuw128", "short", 8, "{a}[{j}] > {b}[{j}] ? {a}[{j}] : {b}[{j}]", sign="unsigned"), + "_mm_min_epu32": Model("__builtin_ia32_pminud128", "int", 4, "{a}[{j}] < {b}[{j}] ? {a}[{j}] : {b}[{j}]", sign="unsigned"), + "_mm_max_epu32": Model("__builtin_ia32_pmaxud128", "int", 4, "{a}[{j}] > {b}[{j}] ? {a}[{j}] : {b}[{j}]", sign="unsigned"), + # --- abs (32-bit uses unsigned modular negation to avoid -INT_MIN UB) --- + "_mm_abs_epi8": Model("__builtin_ia32_pabsb128", "char", 16, "{a}[{j}] < 0 ? -{a}[{j}] : {a}[{j}]", sign="signed"), + "_mm_abs_epi16": Model("__builtin_ia32_pabsw128", "short", 8, "{a}[{j}] < 0 ? -{a}[{j}] : {a}[{j}]"), + "_mm_abs_epi32": Model("__builtin_ia32_pabsd128", "int", 4, "{a}[{j}] < 0 ? (int)(0u - (unsigned){a}[{j}]) : {a}[{j}]"), + # --- compare (result is all-1s or all-0s per element) --- + "_mm_cmpeq_epi8": Model("__builtin_ia32_pcmpeqb128", "char", 16, "{a}[{j}] == {b}[{j}] ? -1 : 0", oracle="=="), + "_mm_cmpeq_epi16": Model("__builtin_ia32_pcmpeqw128", "short", 8, "{a}[{j}] == {b}[{j}] ? -1 : 0", oracle="=="), + "_mm_cmpeq_epi32": Model("__builtin_ia32_pcmpeqd128", "int", 4, "{a}[{j}] == {b}[{j}] ? -1 : 0", oracle="=="), + "_mm_cmpgt_epi8": Model("__builtin_ia32_pcmpgtb128", "char", 16, "{a}[{j}] > {b}[{j}] ? -1 : 0", sign="signed", oracle=">"), + "_mm_cmpgt_epi16": Model("__builtin_ia32_pcmpgtw128", "short", 8, "{a}[{j}] > {b}[{j}] ? -1 : 0", oracle=">"), + "_mm_cmpgt_epi32": Model("__builtin_ia32_pcmpgtd128", "int", 4, "{a}[{j}] > {b}[{j}] ? 
-1 : 0", oracle=">"), + # --- average unsigned --- + "_mm_avg_epu8": Model("__builtin_ia32_pavgb128", "char", 16, "({a}[{j}] + {b}[{j}] + 1) >> 1", sign="unsigned"), + "_mm_avg_epu16": Model("__builtin_ia32_pavgw128", "short", 8, "({a}[{j}] + {b}[{j}] + 1) >> 1", sign="unsigned"), + # --- mullo (low half of multiply; 32-bit done unsigned to avoid UB) --- + "_mm_mullo_epi16": Model("__builtin_ia32_pmullw128", "short", 8, "{a}[{j}] * {b}[{j}]"), + "_mm_mullo_epi32": Model("__builtin_ia32_pmulld128", "int", 4, "{a}[{j}] * {b}[{j}]", sign="unsigned"), + # --- bitwise (whole-register; modelled on 64-bit lanes) --- + "_mm_and_si128": Model("__builtin_ia32_pand128", "long long", 2, "{a}[{j}] & {b}[{j}]", oracle="&"), + "_mm_or_si128": Model("__builtin_ia32_por128", "long long", 2, "{a}[{j}] | {b}[{j}]", oracle="|"), + "_mm_xor_si128": Model("__builtin_ia32_pxor128", "long long", 2, "{a}[{j}] ^ {b}[{j}]", oracle="^"), + "_mm_andnot_si128": Model("__builtin_ia32_pandn128", "long long", 2, "~{a}[{j}] & {b}[{j}]", oracle="andnot"), + # --- MMX 64-bit add/sub (32-bit lanes done unsigned to avoid UB) --- + "_mm_add_pi8": Model("__builtin_ia32_paddb", "char", 8, "{a}[{j}] + {b}[{j}]", oracle="+"), + "_mm_add_pi16": Model("__builtin_ia32_paddw", "short", 4, "{a}[{j}] + {b}[{j}]", oracle="+"), + "_mm_add_pi32": Model("__builtin_ia32_paddd", "int", 2, "{a}[{j}] + {b}[{j}]", sign="unsigned", oracle="+"), + "_mm_sub_pi8": Model("__builtin_ia32_psubb", "char", 8, "{a}[{j}] - {b}[{j}]", oracle="-"), + "_mm_sub_pi16": Model("__builtin_ia32_psubw", "short", 4, "{a}[{j}] - {b}[{j}]", oracle="-"), + "_mm_sub_pi32": Model("__builtin_ia32_psubd", "int", 2, "{a}[{j}] - {b}[{j}]", sign="unsigned", oracle="-"), + # --- saturating add: clamp to the element type's range --- + "_mm_adds_epi8": Model("__builtin_ia32_paddsb128", "char", 16, "({a}[{j}] + {b}[{j}]) < -128 ? -128 : ({a}[{j}] + {b}[{j}]) > 127 ? 
127 : {a}[{j}] + {b}[{j}]", sign="signed"), + "_mm_adds_epi16": Model("__builtin_ia32_paddsw128", "short", 8, "({a}[{j}] + {b}[{j}]) < -32768 ? -32768 : ({a}[{j}] + {b}[{j}]) > 32767 ? 32767 : {a}[{j}] + {b}[{j}]"), + "_mm_adds_epu8": Model("__builtin_ia32_paddusb128", "char", 16, "({a}[{j}] + {b}[{j}]) > 255 ? 255 : {a}[{j}] + {b}[{j}]", sign="unsigned"), + "_mm_adds_epu16": Model("__builtin_ia32_paddusw128", "short", 8, "({a}[{j}] + {b}[{j}]) > 65535 ? 65535 : {a}[{j}] + {b}[{j}]", sign="unsigned"), + # --- saturating sub --- + "_mm_subs_epi8": Model("__builtin_ia32_psubsb128", "char", 16, "({a}[{j}] - {b}[{j}]) < -128 ? -128 : ({a}[{j}] - {b}[{j}]) > 127 ? 127 : {a}[{j}] - {b}[{j}]", sign="signed"), + "_mm_subs_epi16": Model("__builtin_ia32_psubsw128", "short", 8, "({a}[{j}] - {b}[{j}]) < -32768 ? -32768 : ({a}[{j}] - {b}[{j}]) > 32767 ? 32767 : {a}[{j}] - {b}[{j}]"), + "_mm_subs_epu8": Model("__builtin_ia32_psubusb128", "char", 16, "({a}[{j}] - {b}[{j}]) < 0 ? 0 : {a}[{j}] - {b}[{j}]", sign="unsigned"), + "_mm_subs_epu16": Model("__builtin_ia32_psubusw128", "short", 8, "({a}[{j}] - {b}[{j}]) < 0 ? 0 : {a}[{j}] - {b}[{j}]", sign="unsigned"), + # --- shift by immediate (count in a scalar int) --- + # Logical shifts use unsigned lanes (well-defined modular shift); a count + # of >= element width yields 0. Casting the count to unsigned also makes a + # negative/out-of-range immediate clamp to "too large" rather than UB. + "_mm_slli_epi16": Model("__builtin_ia32_psllwi128", "short", 8, "(unsigned){b} >= 16 ? 0 : {a}[{j}] << {b}", sign="unsigned", scalar2="int"), + "_mm_slli_epi32": Model("__builtin_ia32_pslldi128", "int", 4, "(unsigned){b} >= 32 ? 0 : {a}[{j}] << {b}", sign="unsigned", scalar2="int"), + "_mm_slli_epi64": Model("__builtin_ia32_psllqi128", "long long", 2, "(unsigned){b} >= 64 ? 0 : {a}[{j}] << {b}", sign="unsigned", scalar2="int"), + "_mm_srli_epi16": Model("__builtin_ia32_psrlwi128", "short", 8, "(unsigned){b} >= 16 ? 
0 : {a}[{j}] >> {b}", sign="unsigned", scalar2="int"), + "_mm_srli_epi32": Model("__builtin_ia32_psrldi128", "int", 4, "(unsigned){b} >= 32 ? 0 : {a}[{j}] >> {b}", sign="unsigned", scalar2="int"), + "_mm_srli_epi64": Model("__builtin_ia32_psrlqi128", "long long", 2, "(unsigned){b} >= 64 ? 0 : {a}[{j}] >> {b}", sign="unsigned", scalar2="int"), + # Arithmetic right shift uses signed lanes; a count of >= width yields the + # sign fill (-1 for negative inputs, 0 otherwise). + "_mm_srai_epi16": Model("__builtin_ia32_psrawi128", "short", 8, "(unsigned){b} >= 16 ? ({a}[{j}] < 0 ? -1 : 0) : {a}[{j}] >> {b}", scalar2="int"), + "_mm_srai_epi32": Model("__builtin_ia32_psradi128", "int", 4, "(unsigned){b} >= 32 ? ({a}[{j}] < 0 ? -1 : 0) : {a}[{j}] >> {b}", scalar2="int"), +} + + +def width_variants(declared): + """Derive wider-vector variants of the 128-bit base MODELS entries. + + The per-element body is width-independent, so a 256-bit (AVX2) variant + differs only in the builtin name (...128 -> ...256), the Intel name + (_mm_ -> _mm256_) and the lane count (doubled). A variant is produced only + when its builtin is actually declared in CBMC's headers. (512-bit AVX-512 + forms are mask-only -- e.g. ...512_mask -- and are handled separately.)""" + variants = {} + for intel_name, m in MODELS.items(): + if not m.builtin.endswith("128"): + continue + builtin256 = m.builtin[:-len("128")] + "256" + if builtin256 not in declared: + continue + name256 = intel_name.replace("_mm_", "_mm256_", 1) + variants[name256] = Model( + builtin256, m.elem, m.count * 2, m.body, m.sign, m.scalar2, + m.oracle) + return variants + + +def mask_variants(declared): + """Derive AVX-512 merge-masked variants (128-, 256- and 512-bit) of the + binary pointwise base entries (those with a second vector operand and no + scalar parameter), gated on the ..._mask builtin being declared. + The masking is a uniform wrapper over the base per-element body. 
(There is + no separate _maskz builtin for these ops: zero-masking is the _mask form + with a zero merge source.)""" + variants = {} + for intel_name, m in MODELS.items(): + if m.scalar2 or "{b}" not in m.body or not m.builtin.endswith("128"): + continue + # Masked compares (pcmp*_mask) are not merge-masked vector ops: they + # return an __mmask and take (a, b, k), so the merge-mask wrapper below + # would give them the wrong signature. Skip the compare base ops. + if m.oracle in ("==", ">"): + continue + mnemonic = m.builtin[len("__builtin_ia32_"):-len("128")] + for width, factor in (("128", 1), ("256", 2), ("512", 4)): + builtin = f"__builtin_ia32_{mnemonic}{width}_mask" + count = m.count * factor + mask_type = mask_type_for(count) + if (builtin not in declared or mask_type is None + or (m.elem, count) not in VEC_TYPES): + continue + prefix = "_mm_mask_" if width == "128" else f"_mm{width}_mask_" + name = intel_name.replace("_mm_", prefix, 1) + variants[name] = Model( + builtin, m.elem, count, m.body, m.sign, mask_type=mask_type) + return variants + + +def get_existing_models(cbmc_root, exclude=None): + """Collect __builtin_ia32_* models already present in the library. 
When + regenerating a file, that file is passed as *exclude* so its own models do + not count as "already present" (keeping regeneration idempotent).""" + models = set() + lib_dir = os.path.join(cbmc_root, "src", "ansi-c", "library") + exclude = os.path.abspath(exclude) if exclude else None + for fname in os.listdir(lib_dir): + if not fname.endswith(".c"): + continue + path = os.path.join(lib_dir, fname) + if exclude and os.path.abspath(path) == exclude: + continue + with open(path) as f: + for m in re.finditer(r'/\* FUNCTION: (__builtin_ia32_\w+)', f.read()): + models.add(m.group(1)) + return models + + +def get_declared_builtins(cbmc_root): + builtins = set() + pattern = os.path.join(cbmc_root, "src", "ansi-c", "compiler_headers", + "gcc_builtin_headers_ia32*.h") + for hdr in glob.glob(pattern): + with open(hdr) as f: + for m in re.finditer(r'(__builtin_ia32_\w+)', f.read()): + builtins.add(m.group(1)) + return builtins + + +def emit_model(model): + """Emit a CBMC library model function for a Model, or None if the + (element type, count) combination has no known GCC vector type.""" + vec_type = VEC_TYPES.get((model.elem, model.count)) + if vec_type is None: + return None + + total_bytes = model.count * ELEM_SIZE[model.elem] + vec_typedef = (f"typedef {model.elem} {vec_type} " + f"__attribute__((__vector_size__({total_bytes})));") + + lines = [f"/* FUNCTION: {model.builtin} */", "", vec_typedef, ""] + + # Determine the type the per-element operation runs in (work_type) and, + # if it differs from the public vector type, the alias typedef for it. 
+ work_type = vec_type + if model.sign in ("signed", "unsigned"): + work_type = f"{vec_type}_{'u' if model.sign == 'unsigned' else 's'}" + work_typedef = (f"typedef {model.sign} {model.elem} {work_type} " + f"__attribute__((__vector_size__({total_bytes})));") + lines.insert(3, work_typedef) + + scalar = model.scalar2 is not None + n_params = 2 if (scalar or "{b}" in model.body) else 1 + # A scalar second operand is referred to directly as {b} (not {b}[{j}]). + body = model.body.format(a="a_", b=("b" if scalar else "b_"), j="j") + cast = f"({work_type})" if work_type != vec_type else "" + + if n_params == 1: + lines.append(f"{vec_type} {model.builtin}({vec_type} a)") + elif scalar: + lines.append( + f"{vec_type} {model.builtin}({vec_type} a, {model.scalar2} b)") + else: + lines.append(f"{vec_type} {model.builtin}({vec_type} a, {vec_type} b)") + + lines.append("{") + lines.append(f" {work_type} a_ = {cast}a;") + if n_params > 1 and not scalar: + lines.append(f" {work_type} b_ = {cast}b;") + lines.append(f" {work_type} dst;") + lines.append(f" for(int j = 0; j < {model.count}; j++)") + lines.append(f" dst[j] = {body};") + if work_type != vec_type: + lines.append(f" return ({vec_type})dst;") + else: + lines.append(" return dst;") + lines.append("}") + lines.append("") + return "\n".join(lines) + + +def emit_masked_model(model): + """Emit an AVX-512 merge-masked model: per lane, the base body if the + mask bit is set, otherwise the corresponding lane of the merge source.""" + vec_type = VEC_TYPES.get((model.elem, model.count)) + if vec_type is None: + return None + total_bytes = model.count * ELEM_SIZE[model.elem] + lines = [f"/* FUNCTION: {model.builtin} */", "", + f"typedef {model.elem} {vec_type} " + f"__attribute__((__vector_size__({total_bytes})));"] + work_type = vec_type + if model.sign in ("signed", "unsigned"): + work_type = f"{vec_type}_{'u' if model.sign == 'unsigned' else 's'}" + lines.append(f"typedef {model.sign} {model.elem} {work_type} " + 
f"__attribute__((__vector_size__({total_bytes})));") + lines.append("") + body = model.body.format(a="a_", b="b_", j="j") + cast = f"({work_type})" if work_type != vec_type else "" + lines.append(f"{vec_type} {model.builtin}({vec_type} a, {vec_type} b, " + f"{vec_type} src, {model.mask_type} k)") + lines.append("{") + lines.append(f" {work_type} a_ = {cast}a;") + lines.append(f" {work_type} b_ = {cast}b;") + lines.append(f" {vec_type} dst;") + lines.append(f" for(int j = 0; j < {model.count}; j++)") + lines.append(f" dst[j] = (k >> j) & 1 ? ({model.elem})({body}) : src[j];") + lines.append(" return dst;") + lines.append("}") + lines.append("") + return "\n".join(lines) + + +# --- Intel Intrinsics Guide XML survey (--xml) ----------------------------- + +# Base C element type for an element bit width. +_BITS_TO_ELEM = {8: "char", 16: "short", 32: "int", 64: "long long"} + + +def parse_operation(op_text): + """Translate a simple element-wise Intel <operation> into + (elem, count, body) for the generator, or None if it is not the supported + shape: a single 'FOR j := 0 to N' loop with one + 'dst[i+W-1:i] := <expr>' assignment whose expression uses only the per-lane + operands a/b, the operators + - *, and parentheses. + + This deliberately does NOT infer signedness or apply the UB-hardening + (unsigned wrapping arithmetic etc.) that the hand-written MODELS use, so + its output is a draft for human review rather than a finished model.""" + if not op_text: + return None + # Drop trailing upper-bits-zero lines such as 'dst[MAX:256] := 0'.
+ lines = [ln for ln in op_text.strip().splitlines() + if not re.match(r'\s*dst\[(?:MAX|\d+):\d+\]\s*:=\s*0\s*$', ln)] + text = "\n".join(lines) + m_for = re.search(r'FOR\s+j\s*:=\s*0\s+to\s+(\d+)', text) + if not m_for or len(re.findall(r'\bFOR\b', text)) != 1: + return None + count = int(m_for.group(1)) + 1 + assignments = re.findall(r'dst\[i\+(\d+):i\]\s*:=\s*(.+)', text) + if len(assignments) != 1: + return None + width = int(assignments[0][0]) + 1 + elem = _BITS_TO_ELEM.get(width) + if elem is None: + return None + expr = assignments[0][1].strip() + # Reject widening/narrowing ops: every operand lane slice must have the + # same width as the destination lane (e.g. _mm_mul_epu32 reads 32-bit + # halves into a 64-bit dst and must not be translated element-wise). + operand_widths = {int(w) + 1 + for w in re.findall(r'\b[ab]\[i\+(\d+):i\]', expr)} + if operand_widths and operand_widths != {width}: + return None + # Per-lane slices a[i+W-1:i] / b[i+W-1:i] become {a}[{j}] / {b}[{j}]. + expr = re.sub(r'\ba\[i\+\d+:i\]', '{a}[{j}]', expr) + expr = re.sub(r'\bb\[i\+\d+:i\]', '{b}[{j}]', expr) + # Anything other than the lane placeholders, + - *, parentheses and + # whitespace means we do not fully understand the expression. + residue = re.sub(r'\{a\}\[\{j\}\]|\{b\}\[\{j\}\]|[-+*()\s]', '', expr) + if residue: + return None + return elem, count, expr + + +def xml_emit_drafts(xml_path, declared, existing, all_models): + """Return (drafts, geometry_mismatches). drafts maps a not-yet-modeled + declared builtin to (intel_name, elem, count, body) derived from its + pseudocode. 
geometry_mismatches lists already-modeled builtins where the + translator's (elem, count) disagrees with the hand-written Model -- a + self-check that the translator reads the pseudocode geometry correctly.""" + root = ET.parse(xml_path).getroot() + builtin_to_model = {m.builtin: m for m in all_models.values()} + modeled = set(builtin_to_model) | existing + drafts = {} + mismatches = [] + for intrinsic in root.iter("intrinsic"): + operation = intrinsic.find("operation") + parsed = parse_operation( + operation.text if operation is not None else None) + if not parsed: + continue + elem, count, body = parsed + for builtin in _builtin_candidates(intrinsic): + if builtin not in declared: + continue + model = builtin_to_model.get(builtin) + if model is not None: + if (model.elem, model.count) != (elem, count): + mismatches.append( + (builtin, (elem, count), (model.elem, model.count))) + elif builtin not in modeled: + drafts.setdefault( + builtin, (intrinsic.get("name"), elem, count, body)) + return drafts, mismatches + + +# Width (and hence GCC builtin suffix) implied by an <instruction> form. +def _instruction_width(form): + f = (form or "").lower() + if "zmm" in f: + return "512" + if "ymm" in f: + return "256" + if "xmm" in f: + return "128" + if "mm" in f: + return "" # 64-bit MMX builtins typically carry no width suffix + return None + + +def _builtin_candidates(intrinsic): + """Best-effort set of GCC builtin names an <intrinsic> might correspond to, + derived from its <instruction> mnemonic(s) and register width. 
Heuristic: + AVX-512 masked variants and a few irregular names will not map.""" + out = set() + for instr in intrinsic.findall("instruction"): + mnemonic = (instr.get("name") or "").lower() + width = _instruction_width(instr.get("form")) + if mnemonic and width is not None: + out.add(f"__builtin_ia32_{mnemonic}{width}") + return out + + +def _is_auto_generatable(operation): + """Heuristic: does this <operation> pseudocode have the simple per-element + shape the generator can already emit (a single FOR loop assigning dst[...] + from a/b, with no control flow or helper-function calls)?""" + if not operation: + return False + s = operation.strip() + # exactly one FOR ... ENDFOR (note "ENDFOR" also contains "FOR") + if len(re.findall(r'\bFOR\b', s)) != 1 or "ENDFOR" not in s: + return False + if re.search(r'\b(CASE|IF|ELSE|RETURN|DEFINE|WHILE)\b', s): + return False + body = "\n".join(line for line in s.splitlines() + if not re.search(r'\b(FOR|ENDFOR)\b', line)) + if "dst[" not in body: + return False + # reject helper-function calls such as ABS(), SignExtend(), Saturate*() + if re.search(r'[A-Za-z_]\w*\s*\(', body): + return False + return True + + +def xml_autogen_candidates(xml_path, declared, existing): + """Return a sorted list of (intel_name, builtin) for not-yet-modeled + builtins whose Intel pseudocode looks auto-generatable, plus the total + number of auto-generatable intrinsics seen (regardless of mapping).""" + root = ET.parse(xml_path).getroot() + missing = declared - existing + candidates = {} + total_parseable = 0 + for intrinsic in root.iter("intrinsic"): + operation = intrinsic.find("operation") + op_text = operation.text if operation is not None else None + if not _is_auto_generatable(op_text): + continue + total_parseable += 1 + name = intrinsic.get("name") + for builtin in _builtin_candidates(intrinsic): + if builtin in missing: + candidates.setdefault(builtin, name) + return (sorted((name, b) for b, name in candidates.items()), + total_parseable) + + +def 
xml_cpuid_coverage(xml_path, declared, existing): + """Per CPUID feature, how many mappable-to-declared builtins are modeled. + Returns rows (feature, modeled_count, declared_count) sorted by declared + count descending. Grouped by the intrinsic's first <CPUID> element.""" + from collections import defaultdict + root = ET.parse(xml_path).getroot() + decl = defaultdict(set) + modeled = defaultdict(set) + for intrinsic in root.iter("intrinsic"): + feature = intrinsic.findtext("CPUID") or "(none)" + for builtin in _builtin_candidates(intrinsic): + if builtin in declared: + decl[feature].add(builtin) + if builtin in existing: + modeled[feature].add(builtin) + return [(feat, len(modeled[feat]), len(decl[feat])) + for feat in sorted(decl, key=lambda f: len(decl[f]), reverse=True)] + + +def format_output(text, assume_filename): + """Run generated C through clang-format so bodies of any length come out + matching the project style (and the CI clang-format check). A no-op on + already-clean output; if clang-format is unavailable the text is returned + unchanged (CI's clang-format check would then catch any divergence). + *assume_filename* tells clang-format which .clang-format / language to use.""" + for clang_format in ("clang-format-15", "clang-format"): + if shutil.which(clang_format): + result = subprocess.run( + [clang_format, "--assume-filename", assume_filename], + input=text, capture_output=True, text=True) + if result.returncode == 0: + return result.stdout + break + sys.stderr.write("warning: clang-format not found; output not reformatted\n") + return text + + +def equivalence_test(model): + """C source for an exhaustive equivalence test: model(a, b) must equal a + reference built from CBMC's native vector operators for all inputs. The + reference is independent of the library model (CBMC implements vector + operators directly). Returns None if the model has no oracle / no vector + type. 
+ + Arithmetic references (+, -, *) are computed on unsigned lanes so they are + overflow-clean and wrap like the hardware; bitwise and comparison + references use the signed vector type directly (signedness is irrelevant + to & | ^ and ==, and pcmpgt is a signed compare).""" + if not model.oracle: + return None + vec = VEC_TYPES.get((model.elem, model.count)) + if vec is None: + return None + nbytes = model.count * ELEM_SIZE[model.elem] + decls = [f"typedef {model.elem} {vec} " + f"__attribute__((__vector_size__({nbytes})));"] + if model.oracle in ("+", "-", "*"): + uvec = vec + "_u" + decls.append(f"typedef unsigned {model.elem} {uvec} " + f"__attribute__((__vector_size__({nbytes})));") + ref_type = uvec + ref_expr = f"({uvec})a {model.oracle} ({uvec})b" + lane = lambda k: f"r[{k}] == ({model.elem})ref[{k}]" + desc = f"native {model.oracle}" + elif model.oracle == "andnot": + ref_type = vec + ref_expr = "~a & b" + lane = lambda k: f"r[{k}] == ref[{k}]" + desc = "native ~a & b" + else: # & | ^ == > + ref_type = vec + ref_expr = f"a {model.oracle} b" + lane = lambda k: f"r[{k}] == ref[{k}]" + desc = f"native {model.oracle}" + decls.append(f"{vec} {model.builtin}({vec}, {vec});") + lanes = " && ".join(lane(k) for k in range(model.count)) + return ( + "\n".join(decls) + "\n\n" + "int main()\n" + "{\n" + " // Exhaustive equivalence: the model must agree with CBMC's own\n" + f" // vector semantics ({desc}) for all inputs.\n" + f" {vec} a, b;\n" + f" {vec} r = {model.builtin}(a, b);\n" + f" {ref_type} ref = {ref_expr};\n" + f" __CPROVER_assert(\n {lanes},\n" + f' "{model.builtin} == {desc}");\n' + " return 0;\n" + "}\n") + + +TEST_DESC = ("CORE gcc-only\nmain.c\n\n" + "^EXIT=0$\n^SIGNAL=0$\n^VERIFICATION SUCCESSFUL$\n--\n" + "^warning: ignoring\n") + + +def emit_tests(out_dir, all_models): + """Write an exhaustive-equivalence regression test (main.c + test.desc) + under out_dir/<builtin>/ for every model that has a native-operator + oracle. 
Returns the number of tests written.""" + written = 0 + for model in all_models.values(): + source = equivalence_test(model) + if source is None: + continue + test_dir = os.path.join(out_dir, model.builtin) + os.makedirs(test_dir, exist_ok=True) + main_c = os.path.join(test_dir, "main.c") + with open(main_c, "w") as f: + f.write(format_output(source, main_c)) + with open(os.path.join(test_dir, "test.desc"), "w") as f: + f.write(TEST_DESC) + written += 1 + return written + + +def main(): + p = argparse.ArgumentParser(description=__doc__, + formatter_class=argparse.RawDescriptionHelpFormatter) + p.add_argument("--cbmc-root", default=".") + p.add_argument("-o", "--output") + p.add_argument("--status", action="store_true", + help="Show declared vs modeled intrinsics") + p.add_argument("--xml", + help="Intel Intrinsics Guide data-latest.xml; with --status, " + "survey which not-yet-modeled builtins have " + "auto-generatable pseudocode") + p.add_argument("--emit-drafts", metavar="XML", + help="Translate the simple element-wise pseudocode of " + "not-yet-modeled intrinsics into draft Model() entries " + "for review (signedness and UB-hardening still need a " + "human), and self-check the translator against the " + "hand-written models") + p.add_argument("--emit-tests", metavar="DIR", + help="Write exhaustive-equivalence regression tests " + "(model == CBMC's native vector operator for all " + "inputs) under DIR for every model with an oracle") + args = p.parse_args() + + existing = get_existing_models(args.cbmc_root) + declared = get_declared_builtins(args.cbmc_root) + # The base 128-bit MODELS plus derived wider-vector variants (gated on the + # builtin being declared) form the full set this tool can emit. 
+ all_models = {**MODELS, **width_variants(declared), + **mask_variants(declared)} + + if args.emit_tests: + n = emit_tests(args.emit_tests, all_models) + sys.stderr.write(f"Wrote {n} equivalence test(s) under " + f"{args.emit_tests}\n") + return + + if args.emit_drafts: + drafts, mismatches = xml_emit_drafts( + args.emit_drafts, declared, existing, all_models) + sys.stderr.write( + f"Translator self-check: {len(mismatches)} geometry mismatch(es) " + f"against hand-written models.\n") + for builtin, got, want in mismatches: + sys.stderr.write(f" MISMATCH {builtin}: derived {got} vs {want}\n") + print(f"# {len(drafts)} draft model(s) from element-wise pseudocode.") + print("# Review each: infer signedness, and harden against signed UB") + print("# (unsigned wrapping arithmetic, modular negation) before use.") + for builtin in sorted(drafts): + iname, elem, count, body = drafts[builtin] + print(f' "{iname}": Model("{builtin}", "{elem}", {count}, ' + f'"{body}"),') + return + + if args.status: + print(f"Declared __builtin_ia32_* in CBMC headers: {len(declared)}") + print(f"Already modeled in library: {len(existing)}") + print(f"Missing models: {len(declared) - len(existing)}") + can = [(iname, m.builtin) for iname, m in all_models.items() + if m.builtin in declared and m.builtin not in existing] + print(f"\nCan auto-generate from MODELS ({len(can)}):") + for iname, bname in sorted(can, key=lambda x: x[1]): + print(f" {bname} ({iname})") + not_yet = declared - existing - {m.builtin for m in all_models.values()} + print(f"\nNot yet covered by this tool: {len(not_yet)}") + if args.xml: + candidates, total = xml_autogen_candidates( + args.xml, declared, existing) + print(f"\nIntel intrinsics with auto-generatable pseudocode: " + f"{total}") + print(f"... 
mapping to a not-yet-modeled CBMC builtin " + f"({len(candidates)}):") + for iname, bname in candidates: + print(f" {bname} ({iname})") + rows = xml_cpuid_coverage(args.xml, declared, existing) + print(f"\nCoverage by CPUID feature (modeled / mappable-declared):") + for feat, n_modeled, n_declared in rows: + print(f" {feat:20s} {n_modeled:5d} / {n_declared}") + return + + # Emit a model unless that builtin is already modeled in another library + # file (the owned GENERATED_LIBRARY is excluded so regeneration is + # idempotent rather than emitting nothing). + external = get_existing_models( + args.cbmc_root, exclude=os.path.join(args.cbmc_root, GENERATED_LIBRARY)) + models = [] + for intel_name, model in sorted(all_models.items(), + key=lambda x: x[1].builtin): + if model.builtin in external: + continue + if model.builtin not in declared: + print(f"Skip {model.builtin}: not declared in CBMC headers", + file=sys.stderr) + continue + emitted = (emit_masked_model(model) if model.mask_type + else emit_model(model)) + if emitted: + models.append(emitted) + + header = ( + "// x86 SIMD intrinsic models for CBMC\n" + "// Generated by scripts/generate_intrinsic_models.py\n" + f"// Models: {len(models)}\n\n" + ) + output = header + "\n".join(models) + output = format_output( + output, os.path.join(args.cbmc_root, GENERATED_LIBRARY)) + + if args.output: + with open(args.output, "w") as f: + f.write(output) + print(f"Generated {len(models)} models -> {args.output}", + file=sys.stderr) + else: + print(output) + + +if __name__ == "__main__": + main() diff --git a/scripts/generate_neon_models.py b/scripts/generate_neon_models.py new file mode 100644 index 00000000000..9071d95029e --- /dev/null +++ b/scripts/generate_neon_models.py @@ -0,0 +1,442 @@ +#!/usr/bin/env python3 +# +# Draft generator for ARM/AArch64 NEON CBMC library models. 
+# +# Two-source design: +# +# * Structure comes from Clang's arm_neon.td: which builtins exist, the +# element types each supports, and -- since Clang's NEON builtins are +# polymorphic -- the NeonTypeFlags type code that selects the lane type at +# each call site. An intrinsic defined with an *OpInst class is open-coded +# by <arm_neon.h> into native C operators, so it needs no model and is +# skipped here; only the opaque SInst/IInst/... builtins are modelled. +# +# * Semantics come from OP_TABLE below. arm_neon.td carries no semantics for +# the opaque builtins (the Operation field is OP_NONE), so the per-lane +# computation is supplied here. For the mechanically-translatable ops +# (min/max/absolute-difference/...) the body is obvious from the operation +# and is encoded directly. The non-trivial ops (saturating, rounding, +# narrowing, floating-point estimate, table, crypto, ...) need real +# pseudocode -- ultimately from ARM's machine-readable spec -- and are +# reported as unmodelled rather than guessed at. +# +# The emitted models match the declarations in gcc_builtin_headers_aarch64.h: +# every operand is the byte-representative lane type (__gcc_v16qi for 128-bit, +# __gcc_v8qi for 64-bit) plus an int type code, exactly as <arm_neon.h> calls +# them. + +import argparse +import re +import shutil +import subprocess +import sys + +# NeonTypeFlags element-type enum (clang/Basic/TargetBuiltins.h) for the +# base types we model, plus the lane bit width. The full integer type code is +# EltType | (unsigned ? 0x10 : 0) | (quad ? 0x20 : 0) +INT_BASE = { + 'c': ('Int8', 0, 8), + 's': ('Int16', 1, 16), + 'i': ('Int32', 2, 32), + 'l': ('Int64', 3, 64), + } +UNSIGNED_FLAG = 0x10 +QUAD_FLAG = 0x20 + +# gcc vector typedef stem for a lane width (see gcc_builtin_headers_types). +STEM = {8: 'qi', 16: 'hi', 32: 'si', 64: 'di'} +# scalar C type for a lane. 
+SCALAR = { + (8, False): 'signed char', (8, True): 'unsigned char', + (16, False): 'short', (16, True): 'unsigned short', + (32, False): 'int', (32, True): 'unsigned int', + (64, False): 'long long', (64, True): 'unsigned long long', + } +# next wider *signed* type, used to compute a signed difference without +# overflow. +WIDER = {8: 'int', 16: 'int', 32: 'long long', 64: '__int128'} + + +def sat_bounds(signed, width): + """Return (lo, hi) C integer literals for a lane's saturation range.""" + if signed: + hi = 2 ** (width - 1) - 1 + if width == 64: + return '(-{}LL - 1)'.format(hi), '{}LL'.format(hi) + return str(-2 ** (width - 1)), str(hi) + hi = 2 ** width - 1 + if width == 64: + return '0', '{}ULL'.format(hi) + return '0', str(hi) + + +def lane_body(op, signed, width): + """Return the loop body computing r[i] from x[i], y[i] for one lane, for an + element-wise (non-reshaping) operation. Signed arithmetic is widened to + avoid signed-overflow undefined behaviour.""" + wide = WIDER[width] + if op == 'vmax': + return 'r[i] = x[i] > y[i] ? x[i] : y[i];' + if op == 'vmin': + return 'r[i] = x[i] < y[i] ? x[i] : y[i];' + if op == 'vabd': + if signed: + return ('{{ {w} d = ({w})x[i] - ({w})y[i]; ' + 'r[i] = d < 0 ? -d : d; }}').format(w=wide) + return 'r[i] = x[i] > y[i] ? x[i] - y[i] : y[i] - x[i];' + if op == 'vhadd': # halving add: floor((a + b) / 2) + return 'r[i] = (({w})x[i] + ({w})y[i]) >> 1;'.format(w=wide) + if op == 'vhsub': # halving subtract + return 'r[i] = (({w})x[i] - ({w})y[i]) >> 1;'.format(w=wide) + if op == 'vrhadd': # rounding halving add: floor((a + b + 1) / 2) + return 'r[i] = (({w})x[i] + ({w})y[i] + 1) >> 1;'.format(w=wide) + if op == 'vqadd': # saturating add + lo, hi = sat_bounds(signed, width) + if width == 64 and signed: + # avoid __int128 (rejected by -pedantic): detect overflow on the + # wrapped sum instead of widening. 
+ return ( + '{{ long long s = (long long)(' + '(unsigned long long)x[i] + (unsigned long long)y[i]); ' + 'r[i] = ((x[i] ^ s) & (y[i] ^ s)) < 0 ' + '? (x[i] < 0 ? {lo} : {hi}) : s; }}').format(lo=lo, hi=hi) + if width == 64: + return ('{{ unsigned long long s = x[i] + y[i]; ' + 'r[i] = s < x[i] ? {hi} : s; }}').format(hi=hi) + if signed: + return ('{{ {w} s = ({w})x[i] + ({w})y[i]; ' + 'r[i] = s < {lo} ? {lo} : (s > {hi} ? {hi} : s); }}' + ).format(w=wide, lo=lo, hi=hi) + return ('{{ {w} s = ({w})x[i] + ({w})y[i]; ' + 'r[i] = s > {hi} ? {hi} : s; }}').format(w=wide, hi=hi) + if op == 'vqsub': # saturating subtract + lo, hi = sat_bounds(signed, width) + if not signed: + return 'r[i] = x[i] > y[i] ? x[i] - y[i] : 0;' + if width == 64: + return ('{{ long long d = (long long)((unsigned long long)x[i] ' + '- (unsigned long long)y[i]); ' + 'r[i] = ((x[i] ^ y[i]) & (x[i] ^ d)) < 0 ' + '? (x[i] < 0 ? {lo} : {hi}) : d; }}').format(lo=lo, hi=hi) + return ('{{ {w} s = ({w})x[i] - ({w})y[i]; ' + 'r[i] = s < {lo} ? {lo} : (s > {hi} ? {hi} : s); }}' + ).format(w=wide, lo=lo, hi=hi) + if op == 'vtst': # test bits: all-ones per lane where (a & b) != 0 + return 'r[i] = (x[i] & y[i]) != 0 ? -1 : 0;' + raise KeyError(op) + + +def pair_reduce(op, signed, width, p, q): + """Return an expression combining two adjacent lanes p, q for a pairwise + (reshaping) operation.""" + if op == 'vpmax': + return '{p} > {q} ? {p} : {q}'.format(p=p, q=q) + if op == 'vpmin': + return '{p} < {q} ? {p} : {q}'.format(p=p, q=q) + if op == 'vpadd': # modular add; compute unsigned to avoid overflow UB + u = SCALAR[(width, True)] + return '({u}){p} + ({u}){q}'.format(u=u, p=p, q=q) + raise KeyError(op) + + +# Element-wise opaque builtins we can model directly (one lane in, one out). +OP_TABLE = {'vabd', 'vmax', 'vmin', 'vqadd', 'vqsub', 'vhadd', 'vhsub', + 'vrhadd', 'vtst'} +# Pairwise opaque builtins (reduce adjacent lane pairs, concatenating a, b). 
+PAIRWISE = {'vpadd', 'vpmax', 'vpmin'} +# Bitwise-select: r = (mask & a) | (~mask & b); bit-level, so type-independent. +BITSELECT = {'vbsl'} +MODELLED = OP_TABLE | PAIRWISE | BITSELECT + +# AArch64 instruction mnemonic (from ACLE advsimd.md) -> operation kind. The +# instruction mnemonic is the authoritative semantic identity of an intrinsic; +# this compact table is the hand-written "semantics" source (see +# doc/neon-intrinsic-models.md). Extend it (and MODELLED / lane_body / +# pair_reduce) to cover more instruction families. +INSTR_TABLE = { + 'SABD': 'vabd', 'UABD': 'vabd', + 'SMAX': 'vmax', 'UMAX': 'vmax', + 'SMIN': 'vmin', 'UMIN': 'vmin', + 'SQADD': 'vqadd', 'UQADD': 'vqadd', + 'SQSUB': 'vqsub', 'UQSUB': 'vqsub', + 'SHADD': 'vhadd', 'UHADD': 'vhadd', + 'SHSUB': 'vhsub', 'UHSUB': 'vhsub', + 'SRHADD': 'vrhadd', 'URHADD': 'vrhadd', + 'ADDP': 'vpadd', + 'SMAXP': 'vpmax', 'UMAXP': 'vpmax', + 'SMINP': 'vpmin', 'UMINP': 'vpmin', + 'CMTST': 'vtst', + 'BSL': 'vbsl', + } + + +def typed_intrinsic(base, width, unsigned, quad): + """Reconstruct the ACLE typed-intrinsic name, e.g. ('vabd', 8, False, True) + -> 'vabdq_s8'.""" + suffix = ('u' if unsigned else 's') + str(width) + return base + ('q' if quad else '') + '_' + suffix + + +ACLE_NAME_RE = re.compile(r'intrinsics/(\w+)"') +ACLE_MNEM_RE = re.compile(r'`([A-Z][A-Z0-9]+)\b') + + +def parse_acle(md_text): + """Parse ARM's ACLE neon_intrinsics/advsimd.md into {intrinsic: mnemonic}. + Each intrinsic is a markdown table row carrying a link to its guide page + and the AArch64 instruction in backticks.""" + mapping = {} + for line in md_text.splitlines(): + if '' not in line: + continue + nm = ACLE_NAME_RE.search(line) + if not nm: + continue + mn = ACLE_MNEM_RE.search(line) + mapping[nm.group(1)] = mn.group(1) if mn else None + return mapping + + +def parse_typespec(typespec): + """Yield (base_char, unsigned, quad, other) for each type in a typespec, + e.g. 'csiUcQUs' -> Int8, Int16, Int32, uInt8, quad-uInt16. 
'other' is set + when a modifier we do not model is present (S scalar, P poly, ...), so the + caller can skip those variants -- they belong to different builtins.""" + i = 0 + while i < len(typespec): + unsigned = quad = other = False + while typespec[i].isupper(): + if typespec[i] == 'U': + unsigned = True + elif typespec[i] == 'Q': + quad = True + else: + other = True + i += 1 + yield typespec[i], unsigned, quad, other + i += 1 + + +INST_RE = re.compile( + r'def\s+\w+\s*:\s*([A-Za-z]*Inst)<\s*"([^"]+)"\s*,\s*"[^"]*"\s*,' + r'\s*"([^"]+)"') + + +def collect(td_text): + """Return {builtin_name: [(code, width, unsigned), ...]} for the modelled + ops, plus a sorted list of intrinsic names skipped for want of + semantics.""" + builtins = {} + skipped = set() + for m in INST_RE.finditer(td_text): + cls, name, typespec = m.group(1), m.group(2), m.group(3) + if cls.endswith('OpInst'): + continue # open-coded -> native operators, no model needed + if name not in MODELLED: + skipped.add(name) + continue + for base, unsigned, quad, other in parse_typespec(typespec): + if other or base not in INT_BASE: + continue # scalar/poly/float: not a plain integer vector + _, elt_enum, width = INT_BASE[base] + code = elt_enum | (UNSIGNED_FLAG if unsigned else 0) | \ + (QUAD_FLAG if quad else 0) + builtin = '__builtin_neon_' + name + ('q' if quad else '') + '_v' + # de-duplicate by type code: several .td records may map to the + # same polymorphic builtin (e.g. scalar variants), and a switch + # cannot repeat a case label. + builtins.setdefault(builtin, {})[code] = (width, unsigned) + models = {b: [(c, w, u) for c, (w, u) in sorted(d.items())] + for b, d in builtins.items()} + return models, sorted(skipped) + + +def emit_model(builtin, cases, acle=None): + """Emit one /* FUNCTION */ block. cases is a list of (code, width, + unsigned); all share the same total width (64- or 128-bit). 
If an ACLE + {intrinsic: mnemonic} map is given, annotate the model with the + authoritative instruction mnemonic(s) for provenance.""" + op = builtin[len('__builtin_neon_'):].rstrip('_v').rstrip('q') + quad = builtin.endswith('q_v') + total_bytes = 16 if quad else 8 + rep = '__gcc_v{}qi'.format(total_bytes) + + mnemonics = [] + if acle is not None: + for _, width, unsigned in sorted(cases): + mn = acle.get(typed_intrinsic(op, width, unsigned, quad)) + if mn and mn not in mnemonics: + mnemonics.append(mn) + + if op in BITSELECT: + # Bitwise select operates on the raw bits, so it is independent of the + # lane type code: r = (mask & a) | (~mask & b). + out = ['/* FUNCTION: {} */'.format(builtin), ''] + if mnemonics: + out.append( + '// Arm instruction(s): {} (per ACLE advsimd.md)'.format( + ', '.join(mnemonics))) + out.append('') + out.append( + 'typedef char {} __attribute__((__vector_size__({})));'.format( + rep, total_bytes)) + out.append('') + out.append('{rep} {b}({rep} mask, {rep} a, {rep} b, int type)'.format( + rep=rep, b=builtin)) + out.append('{') + out.append(' (void)type;') + out.append(' return (mask & a) | (~mask & b);') + out.append('}') + return '\n'.join(out) + + # Collect the lane typedefs we need. 
+ typedefs = ['typedef char {} __attribute__((__vector_size__({})));'.format( + rep, total_bytes)] + seen = {rep} + body_cases = [] + for code, width, unsigned in sorted(cases): + lanes = total_bytes * 8 // width + suffix = 'u' if unsigned else 's' + lane_t = '__gcc_v{}{}_{}'.format(lanes, STEM[width], suffix) + if lane_t not in seen: + typedefs.append( + 'typedef {} {} __attribute__((__vector_size__({})));'.format( + SCALAR[(width, unsigned)], lane_t, total_bytes)) + seen.add(lane_t) + if op in PAIRWISE: + rx = pair_reduce(op, not unsigned, width, + 'x[2 * i]', 'x[2 * i + 1]') + ry = pair_reduce(op, not unsigned, width, + 'y[2 * i]', 'y[2 * i + 1]') + body_cases.append( + ' case {code}:\n' + ' {{\n' + ' {t} x = ({t})a, y = ({t})b, r;\n' + ' int h = {n} / 2;\n' + ' for(int i = 0; i < h; i++)\n' + ' r[i] = {rx};\n' + ' for(int i = 0; i < h; i++)\n' + ' r[h + i] = {ry};\n' + ' return ({rep})r;\n' + ' }}'.format(code=code, t=lane_t, n=lanes, rx=rx, ry=ry, + rep=rep)) + else: + body = lane_body(op, not unsigned, width) + body_cases.append( + ' case {}:\n' + ' {{\n' + ' {t} x = ({t})a, y = ({t})b, r;\n' + ' for(int i = 0; i < {n}; i++)\n' + ' {body}\n' + ' return ({rep})r;\n' + ' }}'.format(code, t=lane_t, n=lanes, body=body, rep=rep)) + + out = ['/* FUNCTION: {} */'.format(builtin), ''] + if mnemonics: + out.append( + '// Arm instruction(s): {} (per ACLE advsimd.md)'.format( + ', '.join(mnemonics))) + out.append('') + out += typedefs + out.append('') + out.append( + '{rep} {b}({rep} a, {rep} b, int type)'.format(rep=rep, b=builtin)) + out.append('{') + out.append(' switch(type)') + out.append(' {') + out += body_cases + out.append(' }') + out.append('') + out.append(' {} r = {{0}};'.format(rep)) + out.append(' return r;') + out.append('}') + return '\n'.join(out) + + +def audit(td_text, acle): + """Report, over the opaque (model-needing) builtins, how the ACLE + instruction mnemonics distribute and how far INSTR_TABLE covers them -- the + modeling roadmap.""" + import 
collections + covered = collections.Counter() + todo = collections.Counter() + for m in INST_RE.finditer(td_text): + cls, name, typespec = m.group(1), m.group(2), m.group(3) + if cls.endswith('OpInst'): + continue + for base, unsigned, quad, other in parse_typespec(typespec): + if other or base not in INT_BASE: + continue + _, _, width = INT_BASE[base] + mn = acle.get(typed_intrinsic(name, width, unsigned, quad)) + if mn is None: + continue + (covered if mn in INSTR_TABLE else todo)[mn] += 1 + sys.stderr.write( + 'ACLE audit: {} integer opaque-builtin lane-variants map to mnemonics ' + 'INSTR_TABLE covers; {} do not yet.\n'.format( + sum(covered.values()), sum(todo.values()))) + sys.stderr.write(' covered mnemonics: {}\n'.format( + ', '.join('{}={}'.format(k, v) for k, v in covered.most_common()))) + sys.stderr.write(' top uncovered (modeling roadmap): {}\n'.format( + ', '.join('{}={}'.format(k, v) for k, v in todo.most_common(15)))) + + +def format_output(text): + """Run the generated C through clang-format so it matches the project style + (and the CI clang-format check), keeping regeneration idempotent. 
A no-op + on already-clean output; if clang-format is unavailable the text is left + unchanged.""" + for clang_format in ('clang-format-15', 'clang-format'): + if shutil.which(clang_format): + result = subprocess.run( + [clang_format, '--assume-filename', 'arm_neon.c'], + input=text, capture_output=True, text=True) + if result.returncode == 0: + return result.stdout + break + sys.stderr.write( + 'warning: clang-format not found; output left unformatted\n') + return text + + +def main(): + parser = argparse.ArgumentParser(description=__doc__) + parser.add_argument('arm_neon_td', help='path to clang arm_neon.td') + parser.add_argument( + '--acle', metavar='ADVSIMD_MD', + help='path to ARM ACLE neon_intrinsics/advsimd.md; keys semantics on ' + 'the authoritative instruction mnemonic and annotates provenance') + parser.add_argument( + '-o', '--output', help='output .c file (default: stdout)') + args = parser.parse_args() + + with open(args.arm_neon_td) as f: + td_text = f.read() + builtins, skipped = collect(td_text) + + acle = None + if args.acle: + with open(args.acle) as f: + acle = parse_acle(f.read()) + + blocks = [emit_model(b, cases, acle) + for b, cases in sorted(builtins.items())] + text = format_output('\n\n'.join(blocks) + '\n') + + if args.output: + with open(args.output, 'w') as f: + f.write(text) + else: + sys.stdout.write(text) + + sys.stderr.write( + 'generated {} model(s) for {} op(s); {} other opaque intrinsic(s) ' + 'need ARM-sourced semantics\n'.format( + len(builtins), len(MODELLED), len(skipped))) + if acle is not None: + audit(td_text, acle) + + +if __name__ == '__main__': + main() diff --git a/scripts/generate_simd_smoke_test.py b/scripts/generate_simd_smoke_test.py new file mode 100644 index 00000000000..6417ecb0b04 --- /dev/null +++ b/scripts/generate_simd_smoke_test.py @@ -0,0 +1,136 @@ +#!/usr/bin/env python3 +# +# Generate an aggregate "smoke test" for the SIMD intrinsic library models. 
+# +# Many generated x86 (__builtin_ia32_*) and ARM NEON (__builtin_neon_*) models +# are not worth an individual cbmc-library equivalence test, but every model +# should still be exercised so that it type-checks, links and symexes without +# error. This script parses a generated library file (x86_intrinsics.c or +# arm_neon.c), and emits a single C file that calls every modelled builtin once +# with constant (zero-initialised) arguments. The result is placed under +# regression/cbmc/SIMD*; library_check.sh treats the builtins it references as +# covered. +# +# The builtins are declared by the front-end (for the matching --arch), so the +# test only needs to reproduce the vector typedefs and call each function with +# arguments of the right type. A constant is passed for the trailing NeonType +# code (the first switch case) or an x86 shift immediate. + +import argparse +import re +import sys + +# Split the library into /* FUNCTION: name */ blocks. +BLOCK_RE = re.compile(r'/\* FUNCTION: (\S+) \*/(.*?)(?=/\* FUNCTION:|\Z)', + re.DOTALL) +TYPEDEF_RE = re.compile(r'^typedef .*?;$', re.MULTILINE) +TYPEDEF_NAME_RE = re.compile(r'\b(__gcc_\w+)\b\s*__attribute__') +CASE_RE = re.compile(r'\bcase (\d+):') + + +def parse_signature(name, block): + """Return (return_type, [param_types]) for the function definition of + `name` in `block`, or None if not found.""" + m = re.search( + r'([A-Za-z_][\w ]*?\**)\s*\b' + re.escape(name) + + r'\s*\(([^;{]*?)\)\s*\{', block, re.DOTALL) + if not m: + return None + ret = ' '.join(m.group(1).split()) + params = [] + for p in m.group(2).split(','): + p = ' '.join(p.split()) + if not p or p == 'void': + continue + # the parameter type is everything but the trailing identifier + params.append(p.rsplit(' ', 1)[0]) + return ret, params + + +def main(): + parser = argparse.ArgumentParser(description=__doc__) + parser.add_argument('library', help='generated library .c (x86/neon)') + parser.add_argument( + '--exclude', action='append', default=[], 
metavar='DIR', + help='skip builtins already referenced by a .c under DIR (e.g. the ' + 'consolidated cbmc-library directory of individual tests)') + parser.add_argument('-o', '--output', help='output .c (default: stdout)') + args = parser.parse_args() + + import glob + import os + excluded = set() + for d in args.exclude: + for cfile in glob.glob(os.path.join(d, '*.c')): + excluded.update( + re.findall(r'__builtin_(?:ia32|neon)_\w+', open(cfile).read())) + + text = open(args.library).read() + typedefs = {} # name -> full typedef line (de-duplicated) + calls = [] + skipped = 0 + for name, block in BLOCK_RE.findall(text): + if name in excluded: + continue + for td in TYPEDEF_RE.findall(block): + nm = TYPEDEF_NAME_RE.search(td) + if nm: + typedefs[nm.group(1)] = td + sig = parse_signature(name, block) + if sig is None: + skipped += 1 + continue + ret, params = sig + case = CASE_RE.search(block) + type_code = case.group(1) if case else '1' + args_src = [] + decls = [] + for i, ptype in enumerate(params): + if ptype == 'int': + # NeonType code (first switch case) or x86 shift immediate + args_src.append(type_code) + else: + # zero-initialise the argument: this is a smoke test (every + # model must type-check, link and symex), so constant inputs + # that CBMC can fold keep it fast; the per-function equivalence + # tests cover behaviour with nondeterministic inputs. + decls.append(' {} a{} = {{0}};'.format(ptype, i)) + args_src.append('a{}'.format(i)) + calls.append( + ' {{\n' + '{decls}\n' + ' volatile {ret} r = {name}({args});\n' + ' (void)r;\n' + ' }}'.format( + decls='\n'.join(decls), ret=ret, name=name, + args=', '.join(args_src))) + + out = [] + out.append('// Auto-generated by scripts/generate_simd_smoke_test.py') + out.append('// Exercises every modelled SIMD builtin once so the library ' + 'models are') + out.append('// type-checked, linked and symex\'d. 
See ' + 'doc/neon-intrinsic-models.md.') + out.append('') + for nm in sorted(typedefs): + out.append(typedefs[nm]) + out.append('') + out.append('int main(void)') + out.append('{') + out.extend(calls) + out.append(' __CPROVER_assert(1, "SIMD model smoke test");') + out.append(' return 0;') + out.append('}') + result = '\n'.join(out) + '\n' + + if args.output: + with open(args.output, 'w') as f: + f.write(result) + else: + sys.stdout.write(result) + sys.stderr.write('emitted {} call(s); skipped {} (no parseable ' + 'signature)\n'.format(len(calls), skipped)) + + +if __name__ == '__main__': + main() diff --git a/src/ansi-c/CMakeLists.txt b/src/ansi-c/CMakeLists.txt index 9934bb9d455..1c58b64757e 100644 --- a/src/ansi-c/CMakeLists.txt +++ b/src/ansi-c/CMakeLists.txt @@ -65,6 +65,7 @@ make_inc(compiler_headers/clang_builtin_headers) make_inc(compiler_headers/cw_builtin_headers) make_inc(compiler_headers/gcc_builtin_headers_alpha) make_inc(compiler_headers/gcc_builtin_headers_arm) +make_inc(compiler_headers/gcc_builtin_headers_aarch64) make_inc(compiler_headers/gcc_builtin_headers_generic) make_inc(compiler_headers/gcc_builtin_headers_ia32) make_inc(compiler_headers/gcc_builtin_headers_ia32-2) @@ -92,6 +93,7 @@ set(extra_dependencies ${CMAKE_CURRENT_BINARY_DIR}/compiler_headers/cw_builtin_headers.inc ${CMAKE_CURRENT_BINARY_DIR}/compiler_headers/gcc_builtin_headers_alpha.inc ${CMAKE_CURRENT_BINARY_DIR}/compiler_headers/gcc_builtin_headers_arm.inc + ${CMAKE_CURRENT_BINARY_DIR}/compiler_headers/gcc_builtin_headers_aarch64.inc ${CMAKE_CURRENT_BINARY_DIR}/compiler_headers/gcc_builtin_headers_generic.inc ${CMAKE_CURRENT_BINARY_DIR}/compiler_headers/gcc_builtin_headers_ia32-2.inc ${CMAKE_CURRENT_BINARY_DIR}/compiler_headers/gcc_builtin_headers_ia32-3.inc diff --git a/src/ansi-c/Makefile b/src/ansi-c/Makefile index cdf202904df..5bd86c56027 100644 --- a/src/ansi-c/Makefile +++ b/src/ansi-c/Makefile @@ -71,6 +71,7 @@ BUILTIN_FILES = \ compiler_headers/cw_builtin_headers.inc \ 
compiler_headers/gcc_builtin_headers_alpha.inc \ compiler_headers/gcc_builtin_headers_arm.inc \ + compiler_headers/gcc_builtin_headers_aarch64.inc \ compiler_headers/gcc_builtin_headers_generic.inc \ compiler_headers/gcc_builtin_headers_ia32-2.inc \ compiler_headers/gcc_builtin_headers_ia32-3.inc \ diff --git a/src/ansi-c/ansi_c_convert_type.cpp b/src/ansi-c/ansi_c_convert_type.cpp index c7ed6453c90..321c3724e3e 100644 --- a/src/ansi-c/ansi_c_convert_type.cpp +++ b/src/ansi-c/ansi_c_convert_type.cpp @@ -196,6 +196,8 @@ void ansi_c_convert_typet::read_rec(const typet &type) { // note that this is not yet a vector_typet -- this is a size only vector_size = static_cast<const exprt &>(type.find(ID_size)); + // neon_vector_type gives the size as a lane count rather than in bytes + vector_lanes = type.get_bool(ID_C_vector_lanes); } else if(type.id()==ID_void) { @@ -659,6 +661,8 @@ void ansi_c_convert_typet::build_type_with_subtype(typet &type) const { type_with_subtypet new_type(ID_frontend_vector, type); new_type.set(ID_size, vector_size); + if(vector_lanes) + new_type.set(ID_C_vector_lanes, true); new_type.add_source_location()=vector_size.source_location(); type=new_type; } diff --git a/src/ansi-c/ansi_c_convert_type.h b/src/ansi-c/ansi_c_convert_type.h index 043198d8fb5..a79d494c653 100644 --- a/src/ansi-c/ansi_c_convert_type.h +++ b/src/ansi-c/ansi_c_convert_type.h @@ -40,7 +40,7 @@ class ansi_c_convert_typet typet gcc_attribute_mode; - bool packed, aligned; + bool packed, aligned, vector_lanes; exprt vector_size, alignment, bv_width, fraction_width; exprt msc_based; // this is Visual Studio bool constructor, destructor; @@ -106,6 +106,7 @@ class ansi_c_convert_typet gcc_attribute_mode(static_cast<const typet &>(get_nil_irep())), packed(false), aligned(false), + vector_lanes(false), vector_size(nil_exprt{}), alignment(nil_exprt{}), bv_width(nil_exprt{}), diff --git a/src/ansi-c/ansi_c_internal_additions.cpp b/src/ansi-c/ansi_c_internal_additions.cpp index 14d7cf52a9c..c32a362636e 100644 ---
a/src/ansi-c/ansi_c_internal_additions.cpp +++ b/src/ansi-c/ansi_c_internal_additions.cpp @@ -89,6 +89,11 @@ const char gcc_builtin_headers_arm[] = "#line 1 \"gcc_builtin_headers_arm.h\"\n" #include "compiler_headers/gcc_builtin_headers_arm.inc" // IWYU pragma: keep ; // NOLINT(whitespace/semicolon) +const char gcc_builtin_headers_aarch64[] = + "#line 1 \"gcc_builtin_headers_aarch64.h\"\n" +#include "compiler_headers/gcc_builtin_headers_aarch64.inc" // IWYU pragma: keep + ; // NOLINT(whitespace/semicolon) + const char gcc_builtin_headers_mips[] = "#line 1 \"gcc_builtin_headers_mips.h\"\n" #include "compiler_headers/gcc_builtin_headers_mips.inc" // IWYU pragma: keep diff --git a/src/ansi-c/ansi_c_internal_additions.h b/src/ansi-c/ansi_c_internal_additions.h index 4f55798a36c..c21fcf80c19 100644 --- a/src/ansi-c/ansi_c_internal_additions.h +++ b/src/ansi-c/ansi_c_internal_additions.h @@ -35,6 +35,7 @@ extern const char gcc_builtin_headers_ia32_8[]; extern const char gcc_builtin_headers_ia32_9[]; extern const char gcc_builtin_headers_alpha[]; extern const char gcc_builtin_headers_arm[]; +extern const char gcc_builtin_headers_aarch64[]; extern const char gcc_builtin_headers_mips[]; extern const char gcc_builtin_headers_power[]; extern const char arm_builtin_headers[]; diff --git a/src/ansi-c/builtin_factory.cpp b/src/ansi-c/builtin_factory.cpp index 172cb96b05a..f7a6d6b76f4 100644 --- a/src/ansi-c/builtin_factory.cpp +++ b/src/ansi-c/builtin_factory.cpp @@ -206,6 +206,9 @@ bool builtin_factory( { if(find_pattern(pattern, gcc_builtin_headers_arm, s)) return convert(identifier, s, symbol_table, mh); + + if(find_pattern(pattern, gcc_builtin_headers_aarch64, s)) + return convert(identifier, s, symbol_table, mh); } else if(config.ansi_c.arch=="mips64el" || config.ansi_c.arch=="mipsn32el" || diff --git a/src/ansi-c/c_typecheck_type.cpp b/src/ansi-c/c_typecheck_type.cpp index 2362e3966f6..7b26e7d3390 100644 --- a/src/ansi-c/c_typecheck_type.cpp +++ 
b/src/ansi-c/c_typecheck_type.cpp @@ -708,6 +708,10 @@ void c_typecheck_baset::typecheck_vector_type(typet &type) exprt size = static_cast<const exprt &>(type.find(ID_size)); const source_locationt source_location = size.find_source_location(); + // neon_vector_type gives the size as a lane count, whereas vector_size (and + // hence the default below) gives it in bytes. + const bool size_is_lane_count = type.get_bool(ID_C_vector_lanes); + typecheck_expr(size); typet subtype = to_type_with_subtype(type).subtype(); @@ -770,14 +774,17 @@ void c_typecheck_baset::typecheck_vector_type(typet &type) } // adjust by width of base type - if(s % *sub_size != 0) + if(!size_is_lane_count) { - throw errort().with_location(source_location) - << "vector size (" << s << ") expected to be multiple of base type size (" - << *sub_size << ")"; - } + if(s % *sub_size != 0) + { + throw errort().with_location(source_location) + << "vector size (" << s + << ") expected to be multiple of base type size (" << *sub_size << ")"; + } - s /= *sub_size; + s /= *sub_size; + } // produce the type with ID_vector vector_typet new_type( diff --git a/src/ansi-c/compiler_headers/clang_builtins.py b/src/ansi-c/compiler_headers/clang_builtins.py index 357f60778a5..397f98a0c29 100755 --- a/src/ansi-c/compiler_headers/clang_builtins.py +++ b/src/ansi-c/compiler_headers/clang_builtins.py @@ -2,56 +2,45 @@ # # Download Clang builtin declarations from the llvm-project git repository and # parse them to generate declarations to use from within our C front-end. +# +# Two input formats are supported: +# +# 1. The TableGen ".td" builtin databases (the default).
As of LLVM 20 the +# per-target databases were migrated from the X-macro ".def" files to +# TableGen, where a builtin is a record such as +# +# def paddd128 : X86Builtin<"_Vector<4, int>(_Vector<4, int>, " +# "_Vector<4, int>)">; +# +# The record class determines the name prefix (X86Builtin adds +# "__builtin_ia32_") and the prototype is an almost-C signature whose only +# special constructs are "_Vector" and "_Constant Type". +# These files are fetched straight from the llvm-project repository, so no +# LLVM build is required. +# +# 2. ".inc" files produced by clang-tblgen (--inc PREFIX:PATH). Targets such +# as ARM NEON do not spell their builtins in directly-parseable TableGen; +# they are generated by clang-tblgen, e.g. +# +# clang-tblgen -gen-arm-neon-sema -I clang/include/clang/Basic \ +# -I clang/include clang/include/clang/Basic/arm_neon.td \ +# -o neon_sema.inc +# +# The resulting "..._BUILTIN_INFOS" section lists every builtin with its +# name and the classic compact type encoding (e.g. "V8ScV8ScV8Sci"). This +# mode parses that section, prepending the supplied PREFIX (for NEON that is +# "__builtin_neon_") to each spelling. +# +# In both cases the resulting declarations are diffed against the declarations +# already present in the gcc_builtin_headers_*.h files passed as arguments. 
+import argparse import re import requests import sys - -prefix_map = { - 'I': '', - 'N': '', - 'O': 'long long', - 'S': 'signed', - 'U': 'unsigned', - 'W': 'int64_t', - 'Z': 'int32_t' - } - -# we don't support: -# G -> id (Objective-C) -# H -> SEL (Objective-C) -# M -> struct objc_super (Objective-C) -# q -> Scalable vector, followed by the number of elements and base type -# E -> ext_vector, followed by the number of elements and base type -# A -> "reference" to __builtin_va_list -typespec_map = { - 'F': 'const CFString', - 'J': 'jmp_buf', - 'K': 'ucontext_t', - 'P': 'FILE', - 'Y': 'ptrdiff_t', - 'a': '__builtin_va_list', - 'b': '_Bool', - 'c': 'char', - 'd': 'double', - 'f': 'float', - 'h': '__fp16', - 'i': 'int', - 'p': 'pid_t', - 's': 'short', - 'v': 'void', - 'w': 'wchar_t', - 'x': '_Float16', - 'y': '__bf16', - 'z': '__CPROVER_size_t' - } - -# we don't support: -# & -> reference (optionally followed by an address space number) -modifier_map = {'C': 'const', 'D': 'volatile', 'R': 'restrict'} - -# declarations as found in ansi-c/gcc_builtin_headers_types.h +# Map a (element type, lane count) of a vector type to the corresponding +# typedef from ansi-c/gcc_builtin_headers_types.h. vector_map = { 'char': { 8: '__gcc_v8qi', @@ -68,7 +57,6 @@ 16: '__gcc_v16hi', 32: '__gcc_v32hi' }, - # new 'unsigned short': { 8: '__gcc_v8uhi', 16: '__gcc_v16uhi', @@ -81,7 +69,6 @@ 16: '__gcc_v16si', 256: '__gcc_v256si' }, - # new 'unsigned int': { 4: '__gcc_v4usi', 8: '__gcc_v8usi', @@ -93,13 +80,11 @@ 4: '__gcc_v4di', 8: '__gcc_v8di' }, - # new 'unsigned long long int': { 2: '__gcc_v2udi', 4: '__gcc_v4udi', 8: '__gcc_v8udi', }, - # new '_Float16': { 8: '__gcc_v8hf', 16: '__gcc_v16hf', @@ -123,132 +108,466 @@ } } +# Element type spellings that name the same scalar as a vector_map key. 
+element_aliases = { + 'signed char': 'char', + 'int32_t': 'int', + 'unsigned int32_t': 'unsigned int', + 'int64_t': 'long long int', + 'unsigned int64_t': 'unsigned long long int', + '__fp16': '_Float16', + } + +# Map a TableGen builtin record class to the name prefix it implies (see the +# "RequiredNamePrefix" fields in BuiltinsX86Base.td). X86Builtin uses +# "__builtin_ia32_", the *NoPrefix* and library variants spell names verbatim. +class_prefix_map = { + 'X86Builtin': '__builtin_ia32_', + 'X86NoPrefixBuiltin': '', + 'X86LibBuiltin': '', + } + + +class UnmappableType(Exception): + """Raised for types we cannot express, e.g. vectors of pointers + (gather/scatter) or vector widths/elements absent from vector_map.""" -def parse_prefix(types, i): + +def vector_typedef(element, count): + element = element.strip() + element = element_aliases.get(element, element) + widths = vector_map.get(element) + if not widths or count not in widths: + raise UnmappableType( + 'no typedef for vector of {} x {}'.format(count, element)) + return widths[count] + + +# --- TableGen ".td" parser (format 1) -------------------------------------- + +def strip_comments(text): + """Remove // line comments while leaving string literals intact.""" + out = [] + i = 0 + in_string = False + while i < len(text): + c = text[i] + if in_string: + out.append(c) + if c == '"': + in_string = False + i += 1 + elif c == '"': + in_string = True + out.append(c) + i += 1 + elif c == '/' and i + 1 < len(text) and text[i + 1] == '/': + while i < len(text) and text[i] != '\n': + i += 1 + else: + out.append(c) + i += 1 + return ''.join(out) + + +def skip_ws(s, pos, end): + while pos < end and s[pos].isspace(): + pos += 1 + return pos + + +def is_keyword(s, pos, word): + if not s.startswith(word, pos): + return False + after = pos + len(word) + return after >= len(s) or not (s[after].isalnum() or s[after] == '_') + + +def match_delim(s, pos, open_ch, close_ch): + """Given s[pos] == open_ch, return the index just 
past the matching + close_ch, honouring nesting and string literals.""" + assert s[pos] == open_ch + depth = 0 + in_string = False + while pos < len(s): + c = s[pos] + if in_string: + if c == '"': + in_string = False + elif c == '"': + in_string = True + elif c == open_ch: + depth += 1 + elif c == close_ch: + depth -= 1 + if depth == 0: + return pos + 1 + pos += 1 + raise ValueError('unbalanced ' + open_ch) + + +def find_in_keyword(s, pos, end): + """Find the standalone 'in' keyword that terminates a let/foreach head, + skipping over [], (), <> groups and string literals.""" + depth = 0 + in_string = False + while pos < end: + c = s[pos] + if in_string: + if c == '"': + in_string = False + pos += 1 + elif c == '"': + in_string = True + pos += 1 + elif c in '[(<': + depth += 1 + pos += 1 + elif c in '])>': + depth -= 1 + pos += 1 + elif depth == 0 and is_keyword(s, pos, 'in'): + return pos + else: + pos += 1 + raise ValueError("missing 'in' keyword") + + +def split_top_level(s, sep): + """Split s on sep, but only at <>/() nesting depth 0.""" + parts = [] + depth = 0 + last = 0 + for i, c in enumerate(s): + if c in '<(': + depth += 1 + elif c in '>)': + depth -= 1 + elif c == sep and depth == 0: + parts.append(s[last:i]) + last = i + 1 + parts.append(s[last:]) + return parts + + +def normalize_type(t): + """Translate a single prototype type into C, expanding _Vector<> and + dropping the _Constant marker (which only constrains the argument to be a + compile-time constant).""" + t = t.strip() + t = re.sub(r'\b_Constant\b\s*', '', t).strip() + + def repl(m): + count = int(m.group(1)) + element = m.group(2).strip() + if '*' in element: + raise UnmappableType('vector of pointers: ' + m.group(0)) + return vector_typedef(element, count) + + t = re.sub(r'_Vector<\s*(\d+)\s*,\s*([^>]+)>', repl, t) + return re.sub(r'\s+', ' ', t).strip() + + +def build_declaration(name, prototype): + """Build a C declaration string from a builtin name and its TableGen + prototype 
'ReturnType(ArgType, ...)'.""" + depth = 0 + split = -1 + for i, c in enumerate(prototype): + if c == '<': + depth += 1 + elif c == '>': + depth -= 1 + elif c == '(' and depth == 0: + split = i + break + assert split >= 0, 'no argument list in prototype: ' + prototype + + ret = normalize_type(prototype[:split]) + args_str = prototype[split + 1:prototype.rfind(')')].strip() + if args_str == '' or args_str == 'void': + args = ['void'] + else: + args = [normalize_type(a) for a in split_top_level(args_str, ',')] + + return ret + ' ' + name + '(' + ', '.join(args) + ');' + + +def resolve_name(raw, bindings): + """Resolve a (possibly '#'-pasted) TableGen record name using the current + foreach variable bindings.""" + pieces = [] + for piece in raw.split('#'): + piece = piece.strip() + if piece.startswith('"') and piece.endswith('"'): + pieces.append(piece[1:-1]) + elif piece in bindings: + pieces.append(bindings[piece]) + else: + pieces.append(piece) + return ''.join(pieces) + + +DEF_HEAD_RE = re.compile(r'def\s+(.+?)\s*:\s*(\w+)\s*<', re.DOTALL) +FEATURES_RE = re.compile(r'\bFeatures\s*=\s*"([^"]*)"') +FOREACH_RE = re.compile(r'foreach\s+(\w+)\s*=\s*') + + +def parse_def(s, pos, end, bindings, group, out, stats): + m = DEF_HEAD_RE.match(s, pos) + assert m, 'unparseable def at: ' + s[pos:pos + 60] + raw_name, cls = m.group(1), m.group(2) + # The prototype is the template argument of the record class. Its '<'/'>' + # delimiters nest, and the prototype string itself contains '<'/'>' (inside + # _Vector<>) -- match_delim ignores those as they sit inside string + # literals. TableGen concatenates adjacent string literals, so join them. + lt = m.end() - 1 + gt = match_delim(s, lt, '<', '>') + prototype = ''.join(re.findall(r'"([^"]*)"', s[lt + 1:gt - 1])) + pos = gt + # Consume the trailing ';' or the '{ ... }' body of the record. 
+ pos = skip_ws(s, pos, end) + if pos < end and s[pos] == '{': + pos = match_delim(s, pos, '{', '}') + elif pos < end and s[pos] == ';': + pos += 1 + + prefix = class_prefix_map.get(cls) + if prefix is None: + stats['unknown_class'] += 1 + return pos + + name = prefix + resolve_name(raw_name, bindings) + try: + out.setdefault(group, {})[name] = build_declaration(name, prototype) + except UnmappableType: + stats['skipped'] += 1 + return pos + + +def parse_let(s, pos, end, bindings, group, out, stats): + in_pos = find_in_keyword(s, pos + len('let'), end) + assigns = s[pos + len('let'):in_pos] + fm = FEATURES_RE.search(assigns) + new_group = fm.group(1) if fm else group + body = skip_ws(s, in_pos + len('in'), end) + return parse_body(s, body, end, bindings, new_group, out, stats) + + +def parse_foreach(s, pos, end, bindings, group, out, stats): + m = FOREACH_RE.match(s, pos) + assert m, 'unparseable foreach at: ' + s[pos:pos + 60] + var = m.group(1) + lst_start = skip_ws(s, m.end(), end) + assert s[lst_start] == '[', 'expected list in foreach' + lst_end = match_delim(s, lst_start, '[', ']') + values = re.findall(r'"([^"]*)"', s[lst_start:lst_end]) + in_pos = find_in_keyword(s, lst_end, end) + body = skip_ws(s, in_pos + len('in'), end) + last = body + for value in values: + new_bindings = dict(bindings) + new_bindings[var] = value + last = parse_body(s, body, end, new_bindings, group, out, stats) + return last + + +def parse_body(s, pos, end, bindings, group, out, stats): + """Parse the body of a let/foreach: either a braced block or a single + nested construct.""" + pos = skip_ws(s, pos, end) + if pos < end and s[pos] == '{': + block_end = match_delim(s, pos, '{', '}') + walk(s, pos + 1, block_end - 1, bindings, group, out, stats) + return block_end + return parse_construct(s, pos, end, bindings, group, out, stats) + + +def parse_construct(s, pos, end, bindings, group, out, stats): + if is_keyword(s, pos, 'def'): + return parse_def(s, pos, end, bindings, group, out, 
stats) + if is_keyword(s, pos, 'let'): + return parse_let(s, pos, end, bindings, group, out, stats) + if is_keyword(s, pos, 'foreach'): + return parse_foreach(s, pos, end, bindings, group, out, stats) + return pos + 1 + + +def walk(s, pos, end, bindings, group, out, stats): + while pos < end: + pos = skip_ws(s, pos, end) + if pos >= end: + break + if is_keyword(s, pos, 'def') or is_keyword(s, pos, 'let') or \ + is_keyword(s, pos, 'foreach'): + pos = parse_construct(s, pos, end, bindings, group, out, stats) + elif is_keyword(s, pos, 'include'): + semi = s.find(';', pos) + pos = semi + 1 if semi != -1 else end + else: + pos += 1 + return pos + + +def process_td(text, default_group): + """Parse one TableGen builtin database, returning {group: {name: decl}}.""" + text = strip_comments(text) + out = {} + stats = {'skipped': 0, 'unknown_class': 0} + walk(text, 0, len(text), {}, default_group, out, stats) + if stats['skipped'] or stats['unknown_class']: + sys.stderr.write( + 'note: skipped {} builtin(s) with unmappable types, {} with ' + 'unknown record class\n'.format( + stats['skipped'], stats['unknown_class'])) + return out + + +# --- clang-tblgen ".inc" parser (format 2) --------------------------------- +# +# The compact type encoding used in the "..._BUILTIN_INFOS" sections, e.g. +# "V8ScV8ScV8Sci": a 'V' followed by a lane count introduces a vector, type +# prefixes ('S' signed, 'U' unsigned, ...) precede a base type spec ('c' char, +# 'i' int, 'f' float, ...), 'I' marks a compile-time constant argument, and +# '*'/'C'/'D'/'R' apply pointer and qualifiers. 
+ +encoding_prefix_map = { + 'I': '', # _Constant: only constrains the argument + 'N': '', + 'O': 'long long', + 'S': 'signed', + 'U': 'unsigned', + 'W': 'int64_t', + 'Z': 'int32_t', + } + +encoding_typespec_map = { + 'b': '_Bool', + 'c': 'char', + 'd': 'double', + 'f': 'float', + 'h': '__fp16', + 'i': 'int', + 's': 'short', + 'v': 'void', + 'x': '_Float16', + 'y': '__bf16', + 'z': '__CPROVER_size_t', + } + +encoding_modifier_map = {'C': 'const', 'D': 'volatile', 'R': 'restrict'} + + +def parse_encoding_prefix(types, i): prefix = [] while i < len(types): p = types[i] - if i + 3 < len(types) and types[i:i+4] == 'LLLi': + if types[i:i + 4] == 'LLLi': prefix.append('__int128_t') i += 4 - elif i + 1 < len(types) and types[i:i+2] == 'LL': + elif types[i:i + 2] == 'LL': prefix.extend(['long', 'long']) i += 2 elif p == 'L': prefix.append('long') i += 1 - elif i + 1 < len(types) and types[i:i+2] == 'SJ': - break - elif i + 1 < len(types) and ( - types[i:i+2] == 'Wi' or types[i:i+2] == 'Zi'): - prefix.append(prefix_map[p]) + elif types[i:i + 2] in ('Wi', 'Zi'): + prefix.append(encoding_prefix_map[p]) i += 2 - elif prefix_map.get(p) is not None: - mapped = prefix_map[p] - if len(mapped): - prefix.append(prefix_map[p]) + elif encoding_prefix_map.get(p) is not None: + mapped = encoding_prefix_map[p] + if mapped: + prefix.append(mapped) i += 1 else: break - return prefix, i -def build_type_inner(types, i): - (typespec, i) = parse_prefix(types, i) +def build_encoding_type_inner(types, i): + (typespec, i) = parse_encoding_prefix(types, i) if i < len(types): t = types[i] - if i + 2 < len(types) and t == 'V': - m = re.match(r'(\d+).*', types[i+1:]) - if m and i + 1 + len(m[1]) < len(types): - (elem_type_list, next_i) = build_type_inner( - types, i + 1 + len(m[1])) - elem_type = ' '.join(elem_type_list) - if vector_map.get(elem_type): - typespec.append(vector_map[elem_type][int(m[1])]) - i = next_i - elif i + 1 < len(types) and t == 'X' and ( - typespec_map.get(types[i + 1])): - 
typespec.append(typespec_map[types[i + 1]]) - typespec_map.append('_Complex') - i += 2 - elif i + 1 < len(types) and types[i:i+2] == 'SJ': - typespec.append('sigjmp_buf') - i += 2 + if t == 'V': + m = re.match(r'(\d+)', types[i + 1:]) + count = int(m.group(1)) + (elem_list, next_i) = build_encoding_type_inner( + types, i + 1 + len(m.group(1))) + typespec.append(vector_typedef(' '.join(elem_list), count)) + i = next_i elif t == '.' and i + 1 == len(types): typespec.append('...') i += 1 - elif typespec_map.get(t): - typespec.append(typespec_map[t]) + elif encoding_typespec_map.get(t): + typespec.append(encoding_typespec_map[t]) i += 1 return typespec, i -def build_type(types, i): - (typespec, i) = build_type_inner(types, i) - +def build_encoding_type(types, i): + (typespec, i) = build_encoding_type_inner(types, i) while i < len(types): s = types[i] if s == '*': typespec.append('*') i += 1 - elif modifier_map.get(s): - typespec.insert(0, modifier_map[s]) + elif encoding_modifier_map.get(s): + typespec.insert(0, encoding_modifier_map[s]) i += 1 else: break - return ' '.join(typespec), i -def process_line(name, types, attributes): - """ - Process the macro declaring "name" as specified at the top of - https://github.com/llvm/llvm-project/blob/main/clang/include/clang/Basic/Builtins.def - We don't yet parse attributes. 
- """ - +def decode_encoding(name, encoding): + """Decode a compact type encoding into a C declaration, raising + UnmappableType if it uses a construct we cannot represent.""" type_specs = [] i = 0 - while i < len(types): - (t, i_updated) = build_type(types, i) - assert i_updated > i, ('failed to parse type spec of ' + name + ': ' + - types[i:]) + while i < len(encoding): + (t, i_updated) = build_encoding_type(encoding, i) + if i_updated <= i: + raise UnmappableType( + 'unparseable encoding for ' + name + ': ' + encoding[i:]) i = i_updated type_specs.append(t) - assert len(type_specs), 'missing return type in ' + types + if not type_specs: + raise UnmappableType('empty encoding for ' + name) if len(type_specs) == 1: type_specs.append('void') return type_specs[0] + ' ' + name + '(' + ', '.join(type_specs[1:]) + ');' -def process(input_lines): - declarations = {} - for l in input_lines: - m = re.match(r'BUILTIN\((\w+),\s*"(.+)",\s*"(.*)"\)', l) - if m: - declaration = process_line(m[1], m[2], m[3]) - if not declarations.get('clang'): - declarations['clang'] = {} - declarations['clang'][m[1]] = declaration - continue - m = re.match( - r'TARGET_BUILTIN\((\w+),\s*"(.+)",\s*"(.*)",\s*"(.*)"\)', l) - if m: - declaration = process_line(m[1], m[2], m[3]) - group = m[4] - if len(group) == 0: - group = 'clang' - if not declarations.get(group): - declarations[group] = {} - declarations[group][m[1]] = declaration - - return declarations +INFO_RE = re.compile( + r'StrOffsets\{\s*\d+ /\* (\S+) \*/,\s*\d+ /\* (.+?) \*/,' + r'\s*\d+ /\* .*? \*/,\s*(?:\d+ /\* (.+?) 
\*/|0)') + + +def process_inc(text, prefix): + """Parse the ..._BUILTIN_INFOS section of a clang-tblgen .inc file, + returning {group: {name: decl}}.""" + out = {} + stats = {'skipped': 0} + for m in INFO_RE.finditer(text): + name = prefix + m.group(1) + encoding = m.group(2) + group = m.group(3) if m.group(3) else 'builtins' + try: + out.setdefault(group, {})[name] = decode_encoding(name, encoding) + except UnmappableType: + stats['skipped'] += 1 + if stats['skipped']: + sys.stderr.write( + 'note: skipped {} builtin(s) with unmappable types\n'.format( + stats['skipped'])) + return out +# --- output ---------------------------------------------------------------- + def print_declarations(declaration_map, known_declarations): for k, v in sorted(declaration_map.items()): new_decls = [] @@ -267,33 +586,50 @@ def print_declarations(declaration_map, known_declarations): print(decl) -def read_declarations(): +def read_declarations(headers): known_declarations = {} - for fname in sys.argv[1:]: + for fname in headers: with open(fname) as f: - lines = f.readlines() - for l in lines: + for l in f.readlines(): m = re.match(r'.* (\w+)\(.*\);', l) if m: known_declarations[m[1]] = m[0] - return known_declarations +def merge(declaration_map, additions): + for k, v in additions.items(): + declaration_map.setdefault(k, {}).update(v) + + def main(): - known_declarations = read_declarations() - base_url = ('https://raw.githubusercontent.com/llvm/llvm-project/' + - 'main/clang/include/clang/Basic/') - files = ['BuiltinsX86.def', 'BuiltinsX86_64.def'] + parser = argparse.ArgumentParser(description=__doc__) + parser.add_argument( + 'headers', nargs='*', + help='gcc_builtin_headers_*.h files to diff the result against') + parser.add_argument( + '--inc', action='append', default=[], metavar='PREFIX:PATH', + help='parse a clang-tblgen-generated .inc file instead of the ' + 'TableGen .td databases, prepending PREFIX to each builtin ' + 'name (e.g. 
__builtin_neon_:neon_sema.inc)') + args = parser.parse_args() + + known_declarations = read_declarations(args.headers) declaration_map = {} - for f in files: - url = base_url + f - lines = requests.get(base_url + f).text.split('\n') - for k, v in process(lines).items(): - if not declaration_map.get(k): - declaration_map[k] = v - else: - declaration_map[k].update(v) + + if args.inc: + for spec in args.inc: + prefix, _, path = spec.partition(':') + if not path: + parser.error('--inc expects PREFIX:PATH, got: ' + spec) + with open(path) as f: + merge(declaration_map, process_inc(f.read(), prefix)) + else: + base_url = ('https://raw.githubusercontent.com/llvm/llvm-project/' + + 'main/clang/include/clang/Basic/') + for f in ['BuiltinsX86.td', 'BuiltinsX86_64.td']: + merge(declaration_map, process_td(requests.get(base_url + f).text, + 'x86')) print_declarations(declaration_map, known_declarations) diff --git a/src/ansi-c/compiler_headers/gcc_builtin_headers_aarch64.h b/src/ansi-c/compiler_headers/gcc_builtin_headers_aarch64.h new file mode 100644 index 00000000000..4823334625c --- /dev/null +++ b/src/ansi-c/compiler_headers/gcc_builtin_headers_aarch64.h @@ -0,0 +1,1055 @@ +// clang-format off +__gcc_v8qi __builtin_neon___a32_vcvt_bf16_f32(__gcc_v16qi, int); +__gcc_v8qi __builtin_neon_splat_lane_bf16(__gcc_v8qi, int, int); +__gcc_v8qi __builtin_neon_splat_lane_v(__gcc_v8qi, int, int); +__gcc_v8qi __builtin_neon_splat_laneq_bf16(__gcc_v16qi, int, int); +__gcc_v8qi __builtin_neon_splat_laneq_v(__gcc_v16qi, int, int); +__gcc_v16qi __builtin_neon_splatq_lane_bf16(__gcc_v8qi, int, int); +__gcc_v16qi __builtin_neon_splatq_lane_v(__gcc_v8qi, int, int); +__gcc_v16qi __builtin_neon_splatq_laneq_bf16(__gcc_v16qi, int, int); +__gcc_v16qi __builtin_neon_splatq_laneq_v(__gcc_v16qi, int, int); +__gcc_v8qi __builtin_neon_vabd_f16(__gcc_v8qi, __gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vabd_v(__gcc_v8qi, __gcc_v8qi, int); +double __builtin_neon_vabdd_f64(double, double); +__gcc_v16qi 
__builtin_neon_vabdq_f16(__gcc_v16qi, __gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vabdq_v(__gcc_v16qi, __gcc_v16qi, int); +float __builtin_neon_vabds_f32(float, float); +__gcc_v8qi __builtin_neon_vabs_f16(__gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vabs_v(__gcc_v8qi, int); +int64_t int64_t __builtin_neon_vabsd_s64(void); +__gcc_v16qi __builtin_neon_vabsq_f16(__gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vabsq_v(__gcc_v16qi, int); +__gcc_v8qi __builtin_neon_vadd_v(__gcc_v8qi, __gcc_v8qi, int); +int64_t int64_t int64_t __builtin_neon_vaddd_s64(void); +unsigned int64_t unsigned int64_t unsigned int64_t __builtin_neon_vaddd_u64(void); +__gcc_v8qi __builtin_neon_vaddhn_v(__gcc_v16qi, __gcc_v16qi, int); +int __builtin_neon_vaddlv_s16(__gcc_v4hi); +int64_t __gcc_v2si __builtin_neon_vaddlv_s32(void); +short __builtin_neon_vaddlv_s8(__gcc_v8qi); +int __builtin_neon_vaddlvq_s16(__gcc_v8hi); +int64_t __gcc_v4si __builtin_neon_vaddlvq_s32(void); +short __builtin_neon_vaddlvq_s8(__gcc_v16qi); +unsigned int __builtin_neon_vaddlvq_u16(__gcc_v8uhi); +unsigned int64_t __gcc_v4usi __builtin_neon_vaddlvq_u32(void); +unsigned __int128_t unsigned __int128_t unsigned __int128_t __builtin_neon_vaddq_p128(void); +__gcc_v16qi __builtin_neon_vaddq_v(__gcc_v16qi, __gcc_v16qi, int); +float __builtin_neon_vaddv_f32(__gcc_v2sf); +short __builtin_neon_vaddv_s16(__gcc_v4hi); +int __builtin_neon_vaddv_s32(__gcc_v2si); +signed char __builtin_neon_vaddv_s8(__gcc_v8qi); +float __builtin_neon_vaddvq_f32(__gcc_v4sf); +double __builtin_neon_vaddvq_f64(__gcc_v2df); +short __builtin_neon_vaddvq_s16(__gcc_v8hi); +int __builtin_neon_vaddvq_s32(__gcc_v4si); +int64_t __gcc_v2di __builtin_neon_vaddvq_s64(void); +signed char __builtin_neon_vaddvq_s8(__gcc_v16qi); +unsigned short __builtin_neon_vaddvq_u16(__gcc_v8uhi); +unsigned int __builtin_neon_vaddvq_u32(__gcc_v4usi); +unsigned int64_t __gcc_v2udi __builtin_neon_vaddvq_u64(void); +__gcc_v16qi __builtin_neon_vaesdq_u8(__gcc_v16qi, __gcc_v16qi, int); 
+__gcc_v16qi __builtin_neon_vaeseq_u8(__gcc_v16qi, __gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vaesimcq_u8(__gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vaesmcq_u8(__gcc_v16qi, int); +__gcc_v8qi __builtin_neon_vamax_f16(__gcc_v8qi, __gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vamax_f32(__gcc_v8qi, __gcc_v8qi, int); +__gcc_v16qi __builtin_neon_vamaxq_f16(__gcc_v16qi, __gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vamaxq_f32(__gcc_v16qi, __gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vamaxq_f64(__gcc_v16qi, __gcc_v16qi, int); +__gcc_v8qi __builtin_neon_vamin_f16(__gcc_v8qi, __gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vamin_f32(__gcc_v8qi, __gcc_v8qi, int); +__gcc_v16qi __builtin_neon_vaminq_f16(__gcc_v16qi, __gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vaminq_f32(__gcc_v16qi, __gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vaminq_f64(__gcc_v16qi, __gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vbcaxq_s16(__gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vbcaxq_s32(__gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vbcaxq_s64(__gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vbcaxq_s8(__gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vbcaxq_u16(__gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vbcaxq_u32(__gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vbcaxq_u64(__gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vbcaxq_u8(__gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int); +__gcc_v8qi __builtin_neon_vbfdot_f32(__gcc_v8qi, __gcc_v8qi, __gcc_v8qi, int); +__gcc_v16qi __builtin_neon_vbfdotq_f32(__gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vbfmlalbq_f32(__gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vbfmlaltq_f32(__gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vbfmmlaq_f32(__gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int); +__gcc_v8qi 
__builtin_neon_vbsl_v(__gcc_v8qi, __gcc_v8qi, __gcc_v8qi, int); +__gcc_v16qi __builtin_neon_vbslq_v(__gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int); +__gcc_v8qi __builtin_neon_vcadd_rot270_f16(__gcc_v8qi, __gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vcadd_rot270_f32(__gcc_v8qi, __gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vcadd_rot90_f16(__gcc_v8qi, __gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vcadd_rot90_f32(__gcc_v8qi, __gcc_v8qi, int); +__gcc_v16qi __builtin_neon_vcaddq_rot270_f16(__gcc_v16qi, __gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vcaddq_rot270_f32(__gcc_v16qi, __gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vcaddq_rot270_f64(__gcc_v16qi, __gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vcaddq_rot90_f16(__gcc_v16qi, __gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vcaddq_rot90_f32(__gcc_v16qi, __gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vcaddq_rot90_f64(__gcc_v16qi, __gcc_v16qi, int); +__gcc_v8qi __builtin_neon_vcage_f16(__gcc_v8qi, __gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vcage_v(__gcc_v8qi, __gcc_v8qi, int); +unsigned int64_t double __builtin_neon_vcaged_f64(double); +__gcc_v16qi __builtin_neon_vcageq_f16(__gcc_v16qi, __gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vcageq_v(__gcc_v16qi, __gcc_v16qi, int); +unsigned int __builtin_neon_vcages_f32(float, float); +__gcc_v8qi __builtin_neon_vcagt_f16(__gcc_v8qi, __gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vcagt_v(__gcc_v8qi, __gcc_v8qi, int); +unsigned int64_t double __builtin_neon_vcagtd_f64(double); +__gcc_v16qi __builtin_neon_vcagtq_f16(__gcc_v16qi, __gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vcagtq_v(__gcc_v16qi, __gcc_v16qi, int); +unsigned int __builtin_neon_vcagts_f32(float, float); +__gcc_v8qi __builtin_neon_vcale_f16(__gcc_v8qi, __gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vcale_v(__gcc_v8qi, __gcc_v8qi, int); +unsigned int64_t double __builtin_neon_vcaled_f64(double); +__gcc_v16qi __builtin_neon_vcaleq_f16(__gcc_v16qi, __gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vcaleq_v(__gcc_v16qi, 
__gcc_v16qi, int); +unsigned int __builtin_neon_vcales_f32(float, float); +__gcc_v8qi __builtin_neon_vcalt_f16(__gcc_v8qi, __gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vcalt_v(__gcc_v8qi, __gcc_v8qi, int); +unsigned int64_t double __builtin_neon_vcaltd_f64(double); +__gcc_v16qi __builtin_neon_vcaltq_f16(__gcc_v16qi, __gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vcaltq_v(__gcc_v16qi, __gcc_v16qi, int); +unsigned int __builtin_neon_vcalts_f32(float, float); +unsigned int64_t double __builtin_neon_vceqd_f64(double); +unsigned int64_t int64_t int64_t __builtin_neon_vceqd_s64(void); +unsigned int64_t unsigned int64_t unsigned int64_t __builtin_neon_vceqd_u64(void); +unsigned int __builtin_neon_vceqs_f32(float, float); +__gcc_v8qi __builtin_neon_vceqz_f16(__gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vceqz_v(__gcc_v8qi, int); +unsigned int64_t double __builtin_neon_vceqzd_f64(void); +unsigned int64_t int64_t __builtin_neon_vceqzd_s64(void); +unsigned int64_t unsigned int64_t __builtin_neon_vceqzd_u64(void); +__gcc_v16qi __builtin_neon_vceqzq_f16(__gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vceqzq_v(__gcc_v16qi, int); +unsigned int __builtin_neon_vceqzs_f32(float); +unsigned int64_t double __builtin_neon_vcged_f64(double); +unsigned int64_t int64_t int64_t __builtin_neon_vcged_s64(void); +unsigned int64_t unsigned int64_t unsigned int64_t __builtin_neon_vcged_u64(void); +unsigned int __builtin_neon_vcges_f32(float, float); +__gcc_v8qi __builtin_neon_vcgez_f16(__gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vcgez_v(__gcc_v8qi, int); +unsigned int64_t double __builtin_neon_vcgezd_f64(void); +unsigned int64_t int64_t __builtin_neon_vcgezd_s64(void); +__gcc_v16qi __builtin_neon_vcgezq_f16(__gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vcgezq_v(__gcc_v16qi, int); +unsigned int __builtin_neon_vcgezs_f32(float); +unsigned int64_t double __builtin_neon_vcgtd_f64(double); +unsigned int64_t int64_t int64_t __builtin_neon_vcgtd_s64(void); +unsigned int64_t unsigned int64_t unsigned 
int64_t __builtin_neon_vcgtd_u64(void); +unsigned int __builtin_neon_vcgts_f32(float, float); +__gcc_v8qi __builtin_neon_vcgtz_f16(__gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vcgtz_v(__gcc_v8qi, int); +unsigned int64_t double __builtin_neon_vcgtzd_f64(void); +unsigned int64_t int64_t __builtin_neon_vcgtzd_s64(void); +__gcc_v16qi __builtin_neon_vcgtzq_f16(__gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vcgtzq_v(__gcc_v16qi, int); +unsigned int __builtin_neon_vcgtzs_f32(float); +unsigned int64_t double __builtin_neon_vcled_f64(double); +unsigned int64_t int64_t int64_t __builtin_neon_vcled_s64(void); +unsigned int64_t unsigned int64_t unsigned int64_t __builtin_neon_vcled_u64(void); +unsigned int __builtin_neon_vcles_f32(float, float); +__gcc_v8qi __builtin_neon_vclez_f16(__gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vclez_v(__gcc_v8qi, int); +unsigned int64_t double __builtin_neon_vclezd_f64(void); +unsigned int64_t int64_t __builtin_neon_vclezd_s64(void); +__gcc_v16qi __builtin_neon_vclezq_f16(__gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vclezq_v(__gcc_v16qi, int); +unsigned int __builtin_neon_vclezs_f32(float); +__gcc_v8qi __builtin_neon_vcls_v(__gcc_v8qi, int); +__gcc_v16qi __builtin_neon_vclsq_v(__gcc_v16qi, int); +unsigned int64_t double __builtin_neon_vcltd_f64(double); +unsigned int64_t int64_t int64_t __builtin_neon_vcltd_s64(void); +unsigned int64_t unsigned int64_t unsigned int64_t __builtin_neon_vcltd_u64(void); +unsigned int __builtin_neon_vclts_f32(float, float); +__gcc_v8qi __builtin_neon_vcltz_f16(__gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vcltz_v(__gcc_v8qi, int); +unsigned int64_t double __builtin_neon_vcltzd_f64(void); +unsigned int64_t int64_t __builtin_neon_vcltzd_s64(void); +__gcc_v16qi __builtin_neon_vcltzq_f16(__gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vcltzq_v(__gcc_v16qi, int); +unsigned int __builtin_neon_vcltzs_f32(float); +__gcc_v8qi __builtin_neon_vclz_v(__gcc_v8qi, int); +__gcc_v16qi __builtin_neon_vclzq_v(__gcc_v16qi, int); 
+__gcc_v8qi __builtin_neon_vcmla_f16(__gcc_v8qi, __gcc_v8qi, __gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vcmla_f32(__gcc_v8qi, __gcc_v8qi, __gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vcmla_rot180_f16(__gcc_v8qi, __gcc_v8qi, __gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vcmla_rot180_f32(__gcc_v8qi, __gcc_v8qi, __gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vcmla_rot270_f16(__gcc_v8qi, __gcc_v8qi, __gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vcmla_rot270_f32(__gcc_v8qi, __gcc_v8qi, __gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vcmla_rot90_f16(__gcc_v8qi, __gcc_v8qi, __gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vcmla_rot90_f32(__gcc_v8qi, __gcc_v8qi, __gcc_v8qi, int); +__gcc_v16qi __builtin_neon_vcmlaq_f16(__gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vcmlaq_f32(__gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vcmlaq_f64(__gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vcmlaq_rot180_f16(__gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vcmlaq_rot180_f32(__gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vcmlaq_rot180_f64(__gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vcmlaq_rot270_f16(__gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vcmlaq_rot270_f32(__gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vcmlaq_rot270_f64(__gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vcmlaq_rot90_f16(__gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vcmlaq_rot90_f32(__gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vcmlaq_rot90_f64(__gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int); +__gcc_v8qi __builtin_neon_vcnt_v(__gcc_v8qi, int); +__gcc_v16qi __builtin_neon_vcntq_v(__gcc_v16qi, int); +__gcc_v8qi __builtin_neon_vcvt_bf16_f32(__gcc_v16qi, int); +__gcc_v8qi __builtin_neon_vcvt_f16_f32(__gcc_v16qi, int); +__gcc_v8qi 
__builtin_neon_vcvt_f16_s16(__gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vcvt_f16_u16(__gcc_v8qi, int); +__gcc_v16qi __builtin_neon_vcvt_f32_f16(__gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vcvt_f32_f64(__gcc_v16qi, int); +__gcc_v8qi __builtin_neon_vcvt_f32_v(__gcc_v8qi, int); +__gcc_v16qi __builtin_neon_vcvt_f64_f32(__gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vcvt_f64_v(__gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vcvt_n_f16_s16(__gcc_v8qi, int, int); +__gcc_v8qi __builtin_neon_vcvt_n_f16_u16(__gcc_v8qi, int, int); +__gcc_v8qi __builtin_neon_vcvt_n_f32_v(__gcc_v8qi, int, int); +__gcc_v8qi __builtin_neon_vcvt_n_f64_v(__gcc_v8qi, int, int); +__gcc_v8qi __builtin_neon_vcvt_n_s16_f16(__gcc_v8qi, int, int); +__gcc_v8qi __builtin_neon_vcvt_n_s32_v(__gcc_v8qi, int, int); +__gcc_v8qi __builtin_neon_vcvt_n_s64_v(__gcc_v8qi, int, int); +__gcc_v8qi __builtin_neon_vcvt_n_u16_f16(__gcc_v8qi, int, int); +__gcc_v8qi __builtin_neon_vcvt_n_u32_v(__gcc_v8qi, int, int); +__gcc_v8qi __builtin_neon_vcvt_n_u64_v(__gcc_v8qi, int, int); +__gcc_v8qi __builtin_neon_vcvt_s16_f16(__gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vcvt_s32_v(__gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vcvt_s64_v(__gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vcvt_u16_f16(__gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vcvt_u32_v(__gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vcvt_u64_v(__gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vcvta_s16_f16(__gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vcvta_s32_v(__gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vcvta_s64_v(__gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vcvta_u16_f16(__gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vcvta_u32_v(__gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vcvta_u64_v(__gcc_v8qi, int); +int __builtin_neon_vcvtad_s32_f64(double); +int64_t double __builtin_neon_vcvtad_s64_f64(void); +unsigned int __builtin_neon_vcvtad_u32_f64(double); +unsigned int64_t double __builtin_neon_vcvtad_u64_f64(void); +__gcc_v16qi __builtin_neon_vcvtaq_s16_f16(__gcc_v16qi, int); 
+__gcc_v16qi __builtin_neon_vcvtaq_s32_v(__gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vcvtaq_s64_v(__gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vcvtaq_u16_f16(__gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vcvtaq_u32_v(__gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vcvtaq_u64_v(__gcc_v16qi, int); +int __builtin_neon_vcvtas_s32_f32(float); +int64_t float __builtin_neon_vcvtas_s64_f32(void); +unsigned int __builtin_neon_vcvtas_u32_f32(float); +unsigned int64_t float __builtin_neon_vcvtas_u64_f32(void); +double __builtin_neon_vcvtd_f64_s64(int64_t); +double __builtin_neon_vcvtd_f64_u64(unsigned int64_t); +double __builtin_neon_vcvtd_n_f64_s64(int64_t int); +double __builtin_neon_vcvtd_n_f64_u64(unsigned int64_t int); +int64_t double __builtin_neon_vcvtd_n_s64_f64(int); +unsigned int64_t double __builtin_neon_vcvtd_n_u64_f64(int); +int __builtin_neon_vcvtd_s32_f64(double); +int64_t double __builtin_neon_vcvtd_s64_f64(void); +unsigned int __builtin_neon_vcvtd_u32_f64(double); +unsigned int64_t double __builtin_neon_vcvtd_u64_f64(void); +__bf16 __builtin_neon_vcvth_bf16_f32(float); +__gcc_v8qi __builtin_neon_vcvtm_s16_f16(__gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vcvtm_s32_v(__gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vcvtm_s64_v(__gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vcvtm_u16_f16(__gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vcvtm_u32_v(__gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vcvtm_u64_v(__gcc_v8qi, int); +int __builtin_neon_vcvtmd_s32_f64(double); +int64_t double __builtin_neon_vcvtmd_s64_f64(void); +unsigned int __builtin_neon_vcvtmd_u32_f64(double); +unsigned int64_t double __builtin_neon_vcvtmd_u64_f64(void); +__gcc_v16qi __builtin_neon_vcvtmq_s16_f16(__gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vcvtmq_s32_v(__gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vcvtmq_s64_v(__gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vcvtmq_u16_f16(__gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vcvtmq_u32_v(__gcc_v16qi, int); +__gcc_v16qi 
__builtin_neon_vcvtmq_u64_v(__gcc_v16qi, int); +int __builtin_neon_vcvtms_s32_f32(float); +int64_t float __builtin_neon_vcvtms_s64_f32(void); +unsigned int __builtin_neon_vcvtms_u32_f32(float); +unsigned int64_t float __builtin_neon_vcvtms_u64_f32(void); +__gcc_v8qi __builtin_neon_vcvtn_s16_f16(__gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vcvtn_s32_v(__gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vcvtn_s64_v(__gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vcvtn_u16_f16(__gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vcvtn_u32_v(__gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vcvtn_u64_v(__gcc_v8qi, int); +int __builtin_neon_vcvtnd_s32_f64(double); +int64_t double __builtin_neon_vcvtnd_s64_f64(void); +unsigned int __builtin_neon_vcvtnd_u32_f64(double); +unsigned int64_t double __builtin_neon_vcvtnd_u64_f64(void); +__gcc_v16qi __builtin_neon_vcvtnq_s16_f16(__gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vcvtnq_s32_v(__gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vcvtnq_s64_v(__gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vcvtnq_u16_f16(__gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vcvtnq_u32_v(__gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vcvtnq_u64_v(__gcc_v16qi, int); +int __builtin_neon_vcvtns_s32_f32(float); +int64_t float __builtin_neon_vcvtns_s64_f32(void); +unsigned int __builtin_neon_vcvtns_u32_f32(float); +unsigned int64_t float __builtin_neon_vcvtns_u64_f32(void); +__gcc_v8qi __builtin_neon_vcvtp_s16_f16(__gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vcvtp_s32_v(__gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vcvtp_s64_v(__gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vcvtp_u16_f16(__gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vcvtp_u32_v(__gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vcvtp_u64_v(__gcc_v8qi, int); +int __builtin_neon_vcvtpd_s32_f64(double); +int64_t double __builtin_neon_vcvtpd_s64_f64(void); +unsigned int __builtin_neon_vcvtpd_u32_f64(double); +unsigned int64_t double __builtin_neon_vcvtpd_u64_f64(void); +__gcc_v16qi __builtin_neon_vcvtpq_s16_f16(__gcc_v16qi, 
int); +__gcc_v16qi __builtin_neon_vcvtpq_s32_v(__gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vcvtpq_s64_v(__gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vcvtpq_u16_f16(__gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vcvtpq_u32_v(__gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vcvtpq_u64_v(__gcc_v16qi, int); +int __builtin_neon_vcvtps_s32_f32(float); +int64_t float __builtin_neon_vcvtps_s64_f32(void); +unsigned int __builtin_neon_vcvtps_u32_f32(float); +unsigned int64_t float __builtin_neon_vcvtps_u64_f32(void); +__gcc_v16qi __builtin_neon_vcvtq_f16_s16(__gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vcvtq_f16_u16(__gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vcvtq_f32_v(__gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vcvtq_f64_v(__gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vcvtq_high_bf16_f32(__gcc_v16qi, __gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vcvtq_low_bf16_f32(__gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vcvtq_n_f16_s16(__gcc_v16qi, int, int); +__gcc_v16qi __builtin_neon_vcvtq_n_f16_u16(__gcc_v16qi, int, int); +__gcc_v16qi __builtin_neon_vcvtq_n_f32_v(__gcc_v16qi, int, int); +__gcc_v16qi __builtin_neon_vcvtq_n_f64_v(__gcc_v16qi, int, int); +__gcc_v16qi __builtin_neon_vcvtq_n_s16_f16(__gcc_v16qi, int, int); +__gcc_v16qi __builtin_neon_vcvtq_n_s32_v(__gcc_v16qi, int, int); +__gcc_v16qi __builtin_neon_vcvtq_n_s64_v(__gcc_v16qi, int, int); +__gcc_v16qi __builtin_neon_vcvtq_n_u16_f16(__gcc_v16qi, int, int); +__gcc_v16qi __builtin_neon_vcvtq_n_u32_v(__gcc_v16qi, int, int); +__gcc_v16qi __builtin_neon_vcvtq_n_u64_v(__gcc_v16qi, int, int); +__gcc_v16qi __builtin_neon_vcvtq_s16_f16(__gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vcvtq_s32_v(__gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vcvtq_s64_v(__gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vcvtq_u16_f16(__gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vcvtq_u32_v(__gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vcvtq_u64_v(__gcc_v16qi, int); +float __builtin_neon_vcvts_f32_s32(int); +float 
__builtin_neon_vcvts_f32_u32(unsigned int); +float __builtin_neon_vcvts_n_f32_s32(int, int); +float __builtin_neon_vcvts_n_f32_u32(unsigned int, int); +int __builtin_neon_vcvts_n_s32_f32(float, int); +unsigned int __builtin_neon_vcvts_n_u32_f32(float, int); +int __builtin_neon_vcvts_s32_f32(float); +int64_t float __builtin_neon_vcvts_s64_f32(void); +unsigned int __builtin_neon_vcvts_u32_f32(float); +unsigned int64_t float __builtin_neon_vcvts_u64_f32(void); +__gcc_v8qi __builtin_neon_vcvtx_f32_v(__gcc_v16qi, int); +float __builtin_neon_vcvtxd_f32_f64(double); +__gcc_v8qi __builtin_neon_vdot_f32_f16(__gcc_v8qi, __gcc_v8qi, __gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vdot_lane_f32_f16(__gcc_v8qi, __gcc_v8qi, __gcc_v8qi, int, int); +__gcc_v8qi __builtin_neon_vdot_laneq_f32_f16(__gcc_v8qi, __gcc_v8qi, __gcc_v16qi, int, int); +__gcc_v8qi __builtin_neon_vdot_s32(__gcc_v8qi, __gcc_v8qi, __gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vdot_u32(__gcc_v8qi, __gcc_v8qi, __gcc_v8qi, int); +__gcc_v16qi __builtin_neon_vdotq_f32_f16(__gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vdotq_lane_f32_f16(__gcc_v16qi, __gcc_v16qi, __gcc_v8qi, int, int); +__gcc_v16qi __builtin_neon_vdotq_laneq_f32_f16(__gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int, int); +__gcc_v16qi __builtin_neon_vdotq_s32(__gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vdotq_u32(__gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int); +unsigned char __builtin_neon_vdupb_lane_i8(__gcc_v8qi, int); +unsigned char __builtin_neon_vdupb_laneq_i8(__gcc_v16qi, int); +double __builtin_neon_vdupd_laneq_f64(__gcc_v2df, int); +unsigned short __builtin_neon_vduph_lane_i16(__gcc_v4hi, int); +__bf16 __builtin_neon_vduph_laneq_bf16(__gcc_v8hf, int); +__fp16 __builtin_neon_vduph_laneq_f16(__gcc_v8hf, int); +unsigned short __builtin_neon_vduph_laneq_i16(__gcc_v8hi, int); +float __builtin_neon_vdups_lane_f32(__gcc_v2sf, int); +unsigned int __builtin_neon_vdups_lane_i32(__gcc_v2si, int); +float 
__builtin_neon_vdups_laneq_f32(__gcc_v4sf, int); +unsigned int __builtin_neon_vdups_laneq_i32(__gcc_v4si, int); +__gcc_v16qi __builtin_neon_veor3q_s16(__gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int); +__gcc_v16qi __builtin_neon_veor3q_s32(__gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int); +__gcc_v16qi __builtin_neon_veor3q_s64(__gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int); +__gcc_v16qi __builtin_neon_veor3q_s8(__gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int); +__gcc_v16qi __builtin_neon_veor3q_u16(__gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int); +__gcc_v16qi __builtin_neon_veor3q_u32(__gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int); +__gcc_v16qi __builtin_neon_veor3q_u64(__gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int); +__gcc_v16qi __builtin_neon_veor3q_u8(__gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int); +__gcc_v8qi __builtin_neon_vext_v(__gcc_v8qi, __gcc_v8qi, int, int); +__gcc_v16qi __builtin_neon_vextq_v(__gcc_v16qi, __gcc_v16qi, int, int); +__gcc_v8qi __builtin_neon_vfma_f16(__gcc_v8qi, __gcc_v8qi, __gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vfma_lane_f16(__gcc_v8qi, __gcc_v8qi, __gcc_v8qi, int, int); +__gcc_v8qi __builtin_neon_vfma_lane_v(__gcc_v8qi, __gcc_v8qi, __gcc_v8qi, int, int); +__gcc_v8qi __builtin_neon_vfma_laneq_f16(__gcc_v8qi, __gcc_v8qi, __gcc_v16qi, int, int); +__gcc_v8qi __builtin_neon_vfma_laneq_v(__gcc_v8qi, __gcc_v8qi, __gcc_v16qi, int, int); +__gcc_v8qi __builtin_neon_vfma_v(__gcc_v8qi, __gcc_v8qi, __gcc_v8qi, int); +double __builtin_neon_vfmad_laneq_f64(double, double, __gcc_v2df, int); +__fp16 __builtin_neon_vfmah_laneq_f16(__fp16, __fp16, __gcc_v8hf, int); +__gcc_v16qi __builtin_neon_vfmaq_f16(__gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vfmaq_lane_f16(__gcc_v16qi, __gcc_v16qi, __gcc_v8qi, int, int); +__gcc_v16qi __builtin_neon_vfmaq_lane_v(__gcc_v16qi, __gcc_v16qi, __gcc_v8qi, int, int); +__gcc_v16qi __builtin_neon_vfmaq_laneq_f16(__gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int, int); +__gcc_v16qi __builtin_neon_vfmaq_laneq_v(__gcc_v16qi, 
__gcc_v16qi, __gcc_v16qi, int, int); +__gcc_v16qi __builtin_neon_vfmaq_v(__gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int); +float __builtin_neon_vfmas_lane_f32(float, float, __gcc_v2sf, int); +float __builtin_neon_vfmas_laneq_f32(float, float, __gcc_v4sf, int); +__gcc_v8qi __builtin_neon_vfmlal_high_f16(__gcc_v8qi, __gcc_v8qi, __gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vfmlal_low_f16(__gcc_v8qi, __gcc_v8qi, __gcc_v8qi, int); +__gcc_v16qi __builtin_neon_vfmlalq_high_f16(__gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vfmlalq_low_f16(__gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int); +__gcc_v8qi __builtin_neon_vfmlsl_high_f16(__gcc_v8qi, __gcc_v8qi, __gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vfmlsl_low_f16(__gcc_v8qi, __gcc_v8qi, __gcc_v8qi, int); +__gcc_v16qi __builtin_neon_vfmlslq_high_f16(__gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vfmlslq_low_f16(__gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int); +float __builtin_neon_vget_lane_f32(__gcc_v2sf, int); +unsigned short __builtin_neon_vget_lane_i16(__gcc_v4hi, int); +unsigned int __builtin_neon_vget_lane_i32(__gcc_v2si, int); +unsigned char __builtin_neon_vget_lane_i8(__gcc_v8qi, int); +__bf16 __builtin_neon_vgetq_lane_bf16(__gcc_v8hf, int); +float __builtin_neon_vgetq_lane_f32(__gcc_v4sf, int); +double __builtin_neon_vgetq_lane_f64(__gcc_v2df, int); +unsigned short __builtin_neon_vgetq_lane_i16(__gcc_v8hi, int); +unsigned int __builtin_neon_vgetq_lane_i32(__gcc_v4si, int); +unsigned char __builtin_neon_vgetq_lane_i8(__gcc_v16qi, int); +__gcc_v8qi __builtin_neon_vhadd_v(__gcc_v8qi, __gcc_v8qi, int); +__gcc_v16qi __builtin_neon_vhaddq_v(__gcc_v16qi, __gcc_v16qi, int); +__gcc_v8qi __builtin_neon_vhsub_v(__gcc_v8qi, __gcc_v8qi, int); +__gcc_v16qi __builtin_neon_vhsubq_v(__gcc_v16qi, __gcc_v16qi, int); +__gcc_v8qi __builtin_neon_vld1_bf16(const void *, int); +void __builtin_neon_vld1_bf16_x2(void *, const void *, int); +void __builtin_neon_vld1_bf16_x3(void *, const void *, int); 
+void __builtin_neon_vld1_bf16_x4(void *, const void *, int); +__gcc_v8qi __builtin_neon_vld1_dup_bf16(const void *, int); +__gcc_v8qi __builtin_neon_vld1_dup_v(const void *, int); +__gcc_v8qi __builtin_neon_vld1_lane_bf16(const void *, __gcc_v8qi, int, int); +__gcc_v8qi __builtin_neon_vld1_lane_v(const void *, __gcc_v8qi, int, int); +__gcc_v8qi __builtin_neon_vld1_v(const void *, int); +void __builtin_neon_vld1_x2_v(void *, const void *, int); +void __builtin_neon_vld1_x3_v(void *, const void *, int); +void __builtin_neon_vld1_x4_v(void *, const void *, int); +__gcc_v16qi __builtin_neon_vld1q_bf16(const void *, int); +void __builtin_neon_vld1q_bf16_x2(void *, const void *, int); +void __builtin_neon_vld1q_bf16_x3(void *, const void *, int); +void __builtin_neon_vld1q_bf16_x4(void *, const void *, int); +__gcc_v16qi __builtin_neon_vld1q_dup_bf16(const void *, int); +__gcc_v16qi __builtin_neon_vld1q_dup_v(const void *, int); +__gcc_v16qi __builtin_neon_vld1q_lane_bf16(const void *, __gcc_v16qi, int, int); +__gcc_v16qi __builtin_neon_vld1q_lane_v(const void *, __gcc_v16qi, int, int); +__gcc_v16qi __builtin_neon_vld1q_v(const void *, int); +void __builtin_neon_vld1q_x2_v(void *, const void *, int); +void __builtin_neon_vld1q_x3_v(void *, const void *, int); +void __builtin_neon_vld1q_x4_v(void *, const void *, int); +void __builtin_neon_vld2_bf16(void *, const void *, int); +void __builtin_neon_vld2_dup_bf16(void *, const void *, int); +void __builtin_neon_vld2_dup_v(void *, const void *, int); +void __builtin_neon_vld2_lane_bf16(void *, const void *, __gcc_v8qi, __gcc_v8qi, int, int); +void __builtin_neon_vld2_lane_v(void *, const void *, __gcc_v8qi, __gcc_v8qi, int, int); +void __builtin_neon_vld2_v(void *, const void *, int); +void __builtin_neon_vld2q_bf16(void *, const void *, int); +void __builtin_neon_vld2q_dup_bf16(void *, const void *, int); +void __builtin_neon_vld2q_dup_v(void *, const void *, int); +void __builtin_neon_vld2q_lane_bf16(void *, const void *, 
__gcc_v16qi, __gcc_v16qi, int, int); +void __builtin_neon_vld2q_lane_v(void *, const void *, __gcc_v16qi, __gcc_v16qi, int, int); +void __builtin_neon_vld2q_v(void *, const void *, int); +void __builtin_neon_vld3_bf16(void *, const void *, int); +void __builtin_neon_vld3_dup_bf16(void *, const void *, int); +void __builtin_neon_vld3_dup_v(void *, const void *, int); +void __builtin_neon_vld3_lane_bf16(void *, const void *, __gcc_v8qi, __gcc_v8qi, __gcc_v8qi, int, int); +void __builtin_neon_vld3_lane_v(void *, const void *, __gcc_v8qi, __gcc_v8qi, __gcc_v8qi, int, int); +void __builtin_neon_vld3_v(void *, const void *, int); +void __builtin_neon_vld3q_bf16(void *, const void *, int); +void __builtin_neon_vld3q_dup_bf16(void *, const void *, int); +void __builtin_neon_vld3q_dup_v(void *, const void *, int); +void __builtin_neon_vld3q_lane_bf16(void *, const void *, __gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int, int); +void __builtin_neon_vld3q_lane_v(void *, const void *, __gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int, int); +void __builtin_neon_vld3q_v(void *, const void *, int); +void __builtin_neon_vld4_bf16(void *, const void *, int); +void __builtin_neon_vld4_dup_bf16(void *, const void *, int); +void __builtin_neon_vld4_dup_v(void *, const void *, int); +void __builtin_neon_vld4_lane_bf16(void *, const void *, __gcc_v8qi, __gcc_v8qi, __gcc_v8qi, __gcc_v8qi, int, int); +void __builtin_neon_vld4_lane_v(void *, const void *, __gcc_v8qi, __gcc_v8qi, __gcc_v8qi, __gcc_v8qi, int, int); +void __builtin_neon_vld4_v(void *, const void *, int); +void __builtin_neon_vld4q_bf16(void *, const void *, int); +void __builtin_neon_vld4q_dup_bf16(void *, const void *, int); +void __builtin_neon_vld4q_dup_v(void *, const void *, int); +void __builtin_neon_vld4q_lane_bf16(void *, const void *, __gcc_v16qi, __gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int, int); +void __builtin_neon_vld4q_lane_v(void *, const void *, __gcc_v16qi, __gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int, int); +void 
__builtin_neon_vld4q_v(void *, const void *, int); +__gcc_v8qi __builtin_neon_vldap1_lane_f64(const void *, __gcc_v8qi, int, int); +__gcc_v8qi __builtin_neon_vldap1_lane_p64(const void *, __gcc_v8qi, int, int); +__gcc_v8qi __builtin_neon_vldap1_lane_s64(const void *, __gcc_v8qi, int, int); +__gcc_v8qi __builtin_neon_vldap1_lane_u64(const void *, __gcc_v8qi, int, int); +__gcc_v16qi __builtin_neon_vldap1q_lane_f64(const void *, __gcc_v16qi, int, int); +__gcc_v16qi __builtin_neon_vldap1q_lane_p64(const void *, __gcc_v16qi, int, int); +__gcc_v16qi __builtin_neon_vldap1q_lane_s64(const void *, __gcc_v16qi, int, int); +__gcc_v16qi __builtin_neon_vldap1q_lane_u64(const void *, __gcc_v16qi, int, int); +const unsigned __int128_t void * __builtin_neon_vldrq_p128(void); +__gcc_v16qi __builtin_neon_vluti2_lane_bf16(__gcc_v8qi, __gcc_v8qi, int, int); +__gcc_v16qi __builtin_neon_vluti2_lane_f16(__gcc_v8qi, __gcc_v8qi, int, int); +__gcc_v16qi __builtin_neon_vluti2_lane_mf8(__gcc_v8qi, __gcc_v8qi, int, int); +__gcc_v16qi __builtin_neon_vluti2_lane_p16(__gcc_v8qi, __gcc_v8qi, int, int); +__gcc_v16qi __builtin_neon_vluti2_lane_p8(__gcc_v8qi, __gcc_v8qi, int, int); +__gcc_v16qi __builtin_neon_vluti2_lane_s16(__gcc_v8qi, __gcc_v8qi, int, int); +__gcc_v16qi __builtin_neon_vluti2_lane_s8(__gcc_v8qi, __gcc_v8qi, int, int); +__gcc_v16qi __builtin_neon_vluti2_lane_u16(__gcc_v8qi, __gcc_v8qi, int, int); +__gcc_v16qi __builtin_neon_vluti2_lane_u8(__gcc_v8qi, __gcc_v8qi, int, int); +__gcc_v16qi __builtin_neon_vluti2_laneq_bf16(__gcc_v8qi, __gcc_v16qi, int, int); +__gcc_v16qi __builtin_neon_vluti2_laneq_f16(__gcc_v8qi, __gcc_v16qi, int, int); +__gcc_v16qi __builtin_neon_vluti2_laneq_mf8(__gcc_v8qi, __gcc_v16qi, int, int); +__gcc_v16qi __builtin_neon_vluti2_laneq_p16(__gcc_v8qi, __gcc_v16qi, int, int); +__gcc_v16qi __builtin_neon_vluti2_laneq_p8(__gcc_v8qi, __gcc_v16qi, int, int); +__gcc_v16qi __builtin_neon_vluti2_laneq_s16(__gcc_v8qi, __gcc_v16qi, int, int); +__gcc_v16qi 
__builtin_neon_vluti2_laneq_s8(__gcc_v8qi, __gcc_v16qi, int, int); +__gcc_v16qi __builtin_neon_vluti2_laneq_u16(__gcc_v8qi, __gcc_v16qi, int, int); +__gcc_v16qi __builtin_neon_vluti2_laneq_u8(__gcc_v8qi, __gcc_v16qi, int, int); +__gcc_v16qi __builtin_neon_vluti2q_lane_bf16(__gcc_v16qi, __gcc_v8qi, int, int); +__gcc_v16qi __builtin_neon_vluti2q_lane_f16(__gcc_v16qi, __gcc_v8qi, int, int); +__gcc_v16qi __builtin_neon_vluti2q_lane_mf8(__gcc_v16qi, __gcc_v8qi, int, int); +__gcc_v16qi __builtin_neon_vluti2q_lane_p16(__gcc_v16qi, __gcc_v8qi, int, int); +__gcc_v16qi __builtin_neon_vluti2q_lane_p8(__gcc_v16qi, __gcc_v8qi, int, int); +__gcc_v16qi __builtin_neon_vluti2q_lane_s16(__gcc_v16qi, __gcc_v8qi, int, int); +__gcc_v16qi __builtin_neon_vluti2q_lane_s8(__gcc_v16qi, __gcc_v8qi, int, int); +__gcc_v16qi __builtin_neon_vluti2q_lane_u16(__gcc_v16qi, __gcc_v8qi, int, int); +__gcc_v16qi __builtin_neon_vluti2q_lane_u8(__gcc_v16qi, __gcc_v8qi, int, int); +__gcc_v16qi __builtin_neon_vluti2q_laneq_bf16(__gcc_v16qi, __gcc_v16qi, int, int); +__gcc_v16qi __builtin_neon_vluti2q_laneq_f16(__gcc_v16qi, __gcc_v16qi, int, int); +__gcc_v16qi __builtin_neon_vluti2q_laneq_mf8(__gcc_v16qi, __gcc_v16qi, int, int); +__gcc_v16qi __builtin_neon_vluti2q_laneq_p16(__gcc_v16qi, __gcc_v16qi, int, int); +__gcc_v16qi __builtin_neon_vluti2q_laneq_p8(__gcc_v16qi, __gcc_v16qi, int, int); +__gcc_v16qi __builtin_neon_vluti2q_laneq_s16(__gcc_v16qi, __gcc_v16qi, int, int); +__gcc_v16qi __builtin_neon_vluti2q_laneq_s8(__gcc_v16qi, __gcc_v16qi, int, int); +__gcc_v16qi __builtin_neon_vluti2q_laneq_u16(__gcc_v16qi, __gcc_v16qi, int, int); +__gcc_v16qi __builtin_neon_vluti2q_laneq_u8(__gcc_v16qi, __gcc_v16qi, int, int); +__gcc_v16qi __builtin_neon_vluti4q_lane_bf16_x2(__gcc_v16qi, __gcc_v16qi, __gcc_v8qi, int, int); +__gcc_v16qi __builtin_neon_vluti4q_lane_f16_x2(__gcc_v16qi, __gcc_v16qi, __gcc_v8qi, int, int); +__gcc_v16qi __builtin_neon_vluti4q_lane_mf8(__gcc_v16qi, __gcc_v8qi, int, int); +__gcc_v16qi 
__builtin_neon_vluti4q_lane_p16_x2(__gcc_v16qi, __gcc_v16qi, __gcc_v8qi, int, int); +__gcc_v16qi __builtin_neon_vluti4q_lane_p8(__gcc_v16qi, __gcc_v8qi, int, int); +__gcc_v16qi __builtin_neon_vluti4q_lane_s16_x2(__gcc_v16qi, __gcc_v16qi, __gcc_v8qi, int, int); +__gcc_v16qi __builtin_neon_vluti4q_lane_s8(__gcc_v16qi, __gcc_v8qi, int, int); +__gcc_v16qi __builtin_neon_vluti4q_lane_u16_x2(__gcc_v16qi, __gcc_v16qi, __gcc_v8qi, int, int); +__gcc_v16qi __builtin_neon_vluti4q_lane_u8(__gcc_v16qi, __gcc_v8qi, int, int); +__gcc_v16qi __builtin_neon_vluti4q_laneq_bf16_x2(__gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int, int); +__gcc_v16qi __builtin_neon_vluti4q_laneq_f16_x2(__gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int, int); +__gcc_v16qi __builtin_neon_vluti4q_laneq_mf8(__gcc_v16qi, __gcc_v16qi, int, int); +__gcc_v16qi __builtin_neon_vluti4q_laneq_p16_x2(__gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int, int); +__gcc_v16qi __builtin_neon_vluti4q_laneq_p8(__gcc_v16qi, __gcc_v16qi, int, int); +__gcc_v16qi __builtin_neon_vluti4q_laneq_s16_x2(__gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int, int); +__gcc_v16qi __builtin_neon_vluti4q_laneq_s8(__gcc_v16qi, __gcc_v16qi, int, int); +__gcc_v16qi __builtin_neon_vluti4q_laneq_u16_x2(__gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int, int); +__gcc_v16qi __builtin_neon_vluti4q_laneq_u8(__gcc_v16qi, __gcc_v16qi, int, int); +__gcc_v8qi __builtin_neon_vmax_f16(__gcc_v8qi, __gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vmax_v(__gcc_v8qi, __gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vmaxnm_f16(__gcc_v8qi, __gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vmaxnm_v(__gcc_v8qi, __gcc_v8qi, int); +__gcc_v16qi __builtin_neon_vmaxnmq_f16(__gcc_v16qi, __gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vmaxnmq_v(__gcc_v16qi, __gcc_v16qi, int); +__fp16 __builtin_neon_vmaxnmv_f16(__gcc_v8qi); +float __builtin_neon_vmaxnmv_f32(__gcc_v2sf); +__fp16 __builtin_neon_vmaxnmvq_f16(__gcc_v16qi); +float __builtin_neon_vmaxnmvq_f32(__gcc_v4sf); +double __builtin_neon_vmaxnmvq_f64(__gcc_v2df); 
+__gcc_v16qi __builtin_neon_vmaxq_f16(__gcc_v16qi, __gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vmaxq_v(__gcc_v16qi, __gcc_v16qi, int); +__fp16 __builtin_neon_vmaxv_f16(__gcc_v8qi); +float __builtin_neon_vmaxv_f32(__gcc_v2sf); +short __builtin_neon_vmaxv_s16(__gcc_v4hi); +int __builtin_neon_vmaxv_s32(__gcc_v2si); +signed char __builtin_neon_vmaxv_s8(__gcc_v8qi); +__fp16 __builtin_neon_vmaxvq_f16(__gcc_v16qi); +float __builtin_neon_vmaxvq_f32(__gcc_v4sf); +double __builtin_neon_vmaxvq_f64(__gcc_v2df); +short __builtin_neon_vmaxvq_s16(__gcc_v8hi); +int __builtin_neon_vmaxvq_s32(__gcc_v4si); +signed char __builtin_neon_vmaxvq_s8(__gcc_v16qi); +unsigned short __builtin_neon_vmaxvq_u16(__gcc_v8uhi); +unsigned int __builtin_neon_vmaxvq_u32(__gcc_v4usi); +__gcc_v8qi __builtin_neon_vmin_f16(__gcc_v8qi, __gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vmin_v(__gcc_v8qi, __gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vminnm_f16(__gcc_v8qi, __gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vminnm_v(__gcc_v8qi, __gcc_v8qi, int); +__gcc_v16qi __builtin_neon_vminnmq_f16(__gcc_v16qi, __gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vminnmq_v(__gcc_v16qi, __gcc_v16qi, int); +__fp16 __builtin_neon_vminnmv_f16(__gcc_v8qi); +float __builtin_neon_vminnmv_f32(__gcc_v2sf); +__fp16 __builtin_neon_vminnmvq_f16(__gcc_v16qi); +float __builtin_neon_vminnmvq_f32(__gcc_v4sf); +double __builtin_neon_vminnmvq_f64(__gcc_v2df); +__gcc_v16qi __builtin_neon_vminq_f16(__gcc_v16qi, __gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vminq_v(__gcc_v16qi, __gcc_v16qi, int); +__fp16 __builtin_neon_vminv_f16(__gcc_v8qi); +float __builtin_neon_vminv_f32(__gcc_v2sf); +short __builtin_neon_vminv_s16(__gcc_v4hi); +int __builtin_neon_vminv_s32(__gcc_v2si); +signed char __builtin_neon_vminv_s8(__gcc_v8qi); +__fp16 __builtin_neon_vminvq_f16(__gcc_v16qi); +float __builtin_neon_vminvq_f32(__gcc_v4sf); +double __builtin_neon_vminvq_f64(__gcc_v2df); +short __builtin_neon_vminvq_s16(__gcc_v8hi); +int 
__builtin_neon_vminvq_s32(__gcc_v4si); +signed char __builtin_neon_vminvq_s8(__gcc_v16qi); +unsigned short __builtin_neon_vminvq_u16(__gcc_v8uhi); +unsigned int __builtin_neon_vminvq_u32(__gcc_v4usi); +__gcc_v16qi __builtin_neon_vmmlaq_f16_f16(__gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vmmlaq_f32_f16(__gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vmmlaq_s32(__gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vmmlaq_u32(__gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vmovl_v(__gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vmovn_v(__gcc_v16qi, int); +__gcc_v8qi __builtin_neon_vmul_lane_v(__gcc_v8qi, __gcc_v8qi, int, int); +__gcc_v8qi __builtin_neon_vmul_laneq_v(__gcc_v8qi, __gcc_v16qi, int, int); +__gcc_v8qi __builtin_neon_vmul_v(__gcc_v8qi, __gcc_v8qi, int); +unsigned __int128_t unsigned int64_t unsigned int64_t __builtin_neon_vmull_p64(void); +__gcc_v16qi __builtin_neon_vmull_v(__gcc_v8qi, __gcc_v8qi, int); +__gcc_v16qi __builtin_neon_vmulq_v(__gcc_v16qi, __gcc_v16qi, int); +__gcc_v8qi __builtin_neon_vmulx_f16(__gcc_v8qi, __gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vmulx_v(__gcc_v8qi, __gcc_v8qi, int); +double __builtin_neon_vmulxd_f64(double, double); +__fp16 __builtin_neon_vmulxh_laneq_f16(__fp16, __gcc_v8hf, int); +__gcc_v16qi __builtin_neon_vmulxq_f16(__gcc_v16qi, __gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vmulxq_v(__gcc_v16qi, __gcc_v16qi, int); +float __builtin_neon_vmulxs_f32(float, float); +int64_t int64_t __builtin_neon_vnegd_s64(void); +__gcc_v8qi __builtin_neon_vpadal_v(__gcc_v8qi, __gcc_v8qi, int); +__gcc_v16qi __builtin_neon_vpadalq_v(__gcc_v16qi, __gcc_v16qi, int); +__gcc_v8qi __builtin_neon_vpadd_f16(__gcc_v8qi, __gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vpadd_v(__gcc_v8qi, __gcc_v8qi, int); +double __builtin_neon_vpaddd_f64(__gcc_v2df); +int64_t __gcc_v2di __builtin_neon_vpaddd_s64(void); +unsigned int64_t __gcc_v2udi 
__builtin_neon_vpaddd_u64(void); +__gcc_v8qi __builtin_neon_vpaddl_v(__gcc_v8qi, int); +__gcc_v16qi __builtin_neon_vpaddlq_v(__gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vpaddq_f16(__gcc_v16qi, __gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vpaddq_v(__gcc_v16qi, __gcc_v16qi, int); +float __builtin_neon_vpadds_f32(__gcc_v2sf); +__gcc_v8qi __builtin_neon_vpmax_f16(__gcc_v8qi, __gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vpmax_v(__gcc_v8qi, __gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vpmaxnm_f16(__gcc_v8qi, __gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vpmaxnm_v(__gcc_v8qi, __gcc_v8qi, int); +__gcc_v16qi __builtin_neon_vpmaxnmq_f16(__gcc_v16qi, __gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vpmaxnmq_v(__gcc_v16qi, __gcc_v16qi, int); +double __builtin_neon_vpmaxnmqd_f64(__gcc_v2df); +float __builtin_neon_vpmaxnms_f32(__gcc_v2sf); +__gcc_v16qi __builtin_neon_vpmaxq_f16(__gcc_v16qi, __gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vpmaxq_v(__gcc_v16qi, __gcc_v16qi, int); +double __builtin_neon_vpmaxqd_f64(__gcc_v2df); +float __builtin_neon_vpmaxs_f32(__gcc_v2sf); +__gcc_v8qi __builtin_neon_vpmin_f16(__gcc_v8qi, __gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vpmin_v(__gcc_v8qi, __gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vpminnm_f16(__gcc_v8qi, __gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vpminnm_v(__gcc_v8qi, __gcc_v8qi, int); +__gcc_v16qi __builtin_neon_vpminnmq_f16(__gcc_v16qi, __gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vpminnmq_v(__gcc_v16qi, __gcc_v16qi, int); +double __builtin_neon_vpminnmqd_f64(__gcc_v2df); +float __builtin_neon_vpminnms_f32(__gcc_v2sf); +__gcc_v16qi __builtin_neon_vpminq_f16(__gcc_v16qi, __gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vpminq_v(__gcc_v16qi, __gcc_v16qi, int); +double __builtin_neon_vpminqd_f64(__gcc_v2df); +float __builtin_neon_vpmins_f32(__gcc_v2sf); +__gcc_v8qi __builtin_neon_vqabs_v(__gcc_v8qi, int); +signed char __builtin_neon_vqabsb_s8(signed char); +int64_t int64_t __builtin_neon_vqabsd_s64(void); +short 
__builtin_neon_vqabsh_s16(short); +__gcc_v16qi __builtin_neon_vqabsq_v(__gcc_v16qi, int); +int __builtin_neon_vqabss_s32(int); +__gcc_v8qi __builtin_neon_vqadd_v(__gcc_v8qi, __gcc_v8qi, int); +signed char __builtin_neon_vqaddb_s8(signed char, signed char); +unsigned char __builtin_neon_vqaddb_u8(unsigned char, unsigned char); +int64_t int64_t int64_t __builtin_neon_vqaddd_s64(void); +unsigned int64_t unsigned int64_t unsigned int64_t __builtin_neon_vqaddd_u64(void); +short __builtin_neon_vqaddh_s16(short, short); +unsigned short __builtin_neon_vqaddh_u16(unsigned short, unsigned short); +__gcc_v16qi __builtin_neon_vqaddq_v(__gcc_v16qi, __gcc_v16qi, int); +int __builtin_neon_vqadds_s32(int, int); +unsigned int __builtin_neon_vqadds_u32(unsigned int, unsigned int); +__gcc_v16qi __builtin_neon_vqdmlal_v(__gcc_v16qi, __gcc_v8qi, __gcc_v8qi, int); +int __builtin_neon_vqdmlalh_lane_s16(int, short, __gcc_v4hi, int); +int __builtin_neon_vqdmlalh_laneq_s16(int, short, __gcc_v8hi, int); +int __builtin_neon_vqdmlalh_s16(int, short, short); +int64_t int64_t int __builtin_neon_vqdmlals_lane_s32(__gcc_v2si, int); +int64_t int64_t int __builtin_neon_vqdmlals_laneq_s32(__gcc_v4si, int); +int64_t int64_t int __builtin_neon_vqdmlals_s32(int); +__gcc_v16qi __builtin_neon_vqdmlsl_v(__gcc_v16qi, __gcc_v8qi, __gcc_v8qi, int); +int __builtin_neon_vqdmlslh_lane_s16(int, short, __gcc_v4hi, int); +int __builtin_neon_vqdmlslh_laneq_s16(int, short, __gcc_v8hi, int); +int __builtin_neon_vqdmlslh_s16(int, short, short); +int64_t int64_t int __builtin_neon_vqdmlsls_lane_s32(__gcc_v2si, int); +int64_t int64_t int __builtin_neon_vqdmlsls_laneq_s32(__gcc_v4si, int); +int64_t int64_t int __builtin_neon_vqdmlsls_s32(int); +__gcc_v8qi __builtin_neon_vqdmulh_lane_v(__gcc_v8qi, __gcc_v8qi, int, int); +__gcc_v8qi __builtin_neon_vqdmulh_laneq_v(__gcc_v8qi, __gcc_v16qi, int, int); +__gcc_v8qi __builtin_neon_vqdmulh_v(__gcc_v8qi, __gcc_v8qi, int); +short __builtin_neon_vqdmulhh_s16(short, short); 
+__gcc_v16qi __builtin_neon_vqdmulhq_lane_v(__gcc_v16qi, __gcc_v8qi, int, int); +__gcc_v16qi __builtin_neon_vqdmulhq_laneq_v(__gcc_v16qi, __gcc_v16qi, int, int); +__gcc_v16qi __builtin_neon_vqdmulhq_v(__gcc_v16qi, __gcc_v16qi, int); +int __builtin_neon_vqdmulhs_s32(int, int); +__gcc_v16qi __builtin_neon_vqdmull_v(__gcc_v8qi, __gcc_v8qi, int); +int __builtin_neon_vqdmullh_s16(short, short); +int64_t int __builtin_neon_vqdmulls_s32(int); +__gcc_v8qi __builtin_neon_vqmovn_v(__gcc_v16qi, int); +int __builtin_neon_vqmovnd_s64(int64_t); +unsigned int __builtin_neon_vqmovnd_u64(unsigned int64_t); +signed char __builtin_neon_vqmovnh_s16(short); +unsigned char __builtin_neon_vqmovnh_u16(unsigned short); +short __builtin_neon_vqmovns_s32(int); +unsigned short __builtin_neon_vqmovns_u32(unsigned int); +__gcc_v8qi __builtin_neon_vqmovun_v(__gcc_v16qi, int); +unsigned int __builtin_neon_vqmovund_s64(int64_t); +unsigned char __builtin_neon_vqmovunh_s16(short); +unsigned short __builtin_neon_vqmovuns_s32(int); +__gcc_v8qi __builtin_neon_vqneg_v(__gcc_v8qi, int); +signed char __builtin_neon_vqnegb_s8(signed char); +int64_t int64_t __builtin_neon_vqnegd_s64(void); +short __builtin_neon_vqnegh_s16(short); +__gcc_v16qi __builtin_neon_vqnegq_v(__gcc_v16qi, int); +int __builtin_neon_vqnegs_s32(int); +__gcc_v8qi __builtin_neon_vqrdmlah_s16(__gcc_v8qi, __gcc_v8qi, __gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vqrdmlah_s32(__gcc_v8qi, __gcc_v8qi, __gcc_v8qi, int); +short __builtin_neon_vqrdmlahh_s16(short, short, short); +__gcc_v16qi __builtin_neon_vqrdmlahq_s16(__gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vqrdmlahq_s32(__gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int); +int __builtin_neon_vqrdmlahs_s32(int, int, int); +__gcc_v8qi __builtin_neon_vqrdmlsh_s16(__gcc_v8qi, __gcc_v8qi, __gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vqrdmlsh_s32(__gcc_v8qi, __gcc_v8qi, __gcc_v8qi, int); +short __builtin_neon_vqrdmlshh_s16(short, short, short); +__gcc_v16qi 
__builtin_neon_vqrdmlshq_s16(__gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vqrdmlshq_s32(__gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int); +int __builtin_neon_vqrdmlshs_s32(int, int, int); +__gcc_v8qi __builtin_neon_vqrdmulh_lane_v(__gcc_v8qi, __gcc_v8qi, int, int); +__gcc_v8qi __builtin_neon_vqrdmulh_laneq_v(__gcc_v8qi, __gcc_v16qi, int, int); +__gcc_v8qi __builtin_neon_vqrdmulh_v(__gcc_v8qi, __gcc_v8qi, int); +short __builtin_neon_vqrdmulhh_s16(short, short); +__gcc_v16qi __builtin_neon_vqrdmulhq_lane_v(__gcc_v16qi, __gcc_v8qi, int, int); +__gcc_v16qi __builtin_neon_vqrdmulhq_laneq_v(__gcc_v16qi, __gcc_v16qi, int, int); +__gcc_v16qi __builtin_neon_vqrdmulhq_v(__gcc_v16qi, __gcc_v16qi, int); +int __builtin_neon_vqrdmulhs_s32(int, int); +__gcc_v8qi __builtin_neon_vqrshl_v(__gcc_v8qi, __gcc_v8qi, int); +signed char __builtin_neon_vqrshlb_s8(signed char, signed char); +unsigned char __builtin_neon_vqrshlb_u8(unsigned char, signed char); +int64_t int64_t int64_t __builtin_neon_vqrshld_s64(void); +unsigned int64_t unsigned int64_t int64_t __builtin_neon_vqrshld_u64(void); +short __builtin_neon_vqrshlh_s16(short, short); +unsigned short __builtin_neon_vqrshlh_u16(unsigned short, short); +__gcc_v16qi __builtin_neon_vqrshlq_v(__gcc_v16qi, __gcc_v16qi, int); +int __builtin_neon_vqrshls_s32(int, int); +unsigned int __builtin_neon_vqrshls_u32(unsigned int, int); +__gcc_v8qi __builtin_neon_vqrshrn_n_v(__gcc_v16qi, int, int); +int __builtin_neon_vqrshrnd_n_s64(int64_t int); +unsigned int __builtin_neon_vqrshrnd_n_u64(unsigned int64_t int); +signed char __builtin_neon_vqrshrnh_n_s16(short, int); +unsigned char __builtin_neon_vqrshrnh_n_u16(unsigned short, int); +short __builtin_neon_vqrshrns_n_s32(int, int); +unsigned short __builtin_neon_vqrshrns_n_u32(unsigned int, int); +__gcc_v8qi __builtin_neon_vqrshrun_n_v(__gcc_v16qi, int, int); +unsigned int __builtin_neon_vqrshrund_n_s64(int64_t int); +unsigned char __builtin_neon_vqrshrunh_n_s16(short, int); 
+unsigned short __builtin_neon_vqrshruns_n_s32(int, int); +__gcc_v8qi __builtin_neon_vqshl_n_v(__gcc_v8qi, int, int); +__gcc_v8qi __builtin_neon_vqshl_v(__gcc_v8qi, __gcc_v8qi, int); +signed char __builtin_neon_vqshlb_n_s8(signed char, int); +unsigned char __builtin_neon_vqshlb_n_u8(unsigned char, int); +signed char __builtin_neon_vqshlb_s8(signed char, signed char); +unsigned char __builtin_neon_vqshlb_u8(unsigned char, signed char); +int64_t int64_t int __builtin_neon_vqshld_n_s64(void); +unsigned int64_t unsigned int64_t int __builtin_neon_vqshld_n_u64(void); +int64_t int64_t int64_t __builtin_neon_vqshld_s64(void); +unsigned int64_t unsigned int64_t int64_t __builtin_neon_vqshld_u64(void); +short __builtin_neon_vqshlh_n_s16(short, int); +unsigned short __builtin_neon_vqshlh_n_u16(unsigned short, int); +short __builtin_neon_vqshlh_s16(short, short); +unsigned short __builtin_neon_vqshlh_u16(unsigned short, short); +__gcc_v16qi __builtin_neon_vqshlq_n_v(__gcc_v16qi, int, int); +__gcc_v16qi __builtin_neon_vqshlq_v(__gcc_v16qi, __gcc_v16qi, int); +int __builtin_neon_vqshls_n_s32(int, int); +unsigned int __builtin_neon_vqshls_n_u32(unsigned int, int); +int __builtin_neon_vqshls_s32(int, int); +unsigned int __builtin_neon_vqshls_u32(unsigned int, int); +__gcc_v8qi __builtin_neon_vqshlu_n_v(__gcc_v8qi, int, int); +signed char __builtin_neon_vqshlub_n_s8(signed char, int); +int64_t int64_t int __builtin_neon_vqshlud_n_s64(void); +short __builtin_neon_vqshluh_n_s16(short, int); +__gcc_v16qi __builtin_neon_vqshluq_n_v(__gcc_v16qi, int, int); +int __builtin_neon_vqshlus_n_s32(int, int); +__gcc_v8qi __builtin_neon_vqshrn_n_v(__gcc_v16qi, int, int); +int __builtin_neon_vqshrnd_n_s64(int64_t int); +unsigned int __builtin_neon_vqshrnd_n_u64(unsigned int64_t int); +signed char __builtin_neon_vqshrnh_n_s16(short, int); +unsigned char __builtin_neon_vqshrnh_n_u16(unsigned short, int); +short __builtin_neon_vqshrns_n_s32(int, int); +unsigned short 
__builtin_neon_vqshrns_n_u32(unsigned int, int); +__gcc_v8qi __builtin_neon_vqshrun_n_v(__gcc_v16qi, int, int); +unsigned int __builtin_neon_vqshrund_n_s64(int64_t int); +unsigned char __builtin_neon_vqshrunh_n_s16(short, int); +unsigned short __builtin_neon_vqshruns_n_s32(int, int); +__gcc_v8qi __builtin_neon_vqsub_v(__gcc_v8qi, __gcc_v8qi, int); +signed char __builtin_neon_vqsubb_s8(signed char, signed char); +unsigned char __builtin_neon_vqsubb_u8(unsigned char, unsigned char); +int64_t int64_t int64_t __builtin_neon_vqsubd_s64(void); +unsigned int64_t unsigned int64_t unsigned int64_t __builtin_neon_vqsubd_u64(void); +short __builtin_neon_vqsubh_s16(short, short); +unsigned short __builtin_neon_vqsubh_u16(unsigned short, unsigned short); +__gcc_v16qi __builtin_neon_vqsubq_v(__gcc_v16qi, __gcc_v16qi, int); +int __builtin_neon_vqsubs_s32(int, int); +unsigned int __builtin_neon_vqsubs_u32(unsigned int, unsigned int); +__gcc_v8qi __builtin_neon_vqtbl1_v(__gcc_v16qi, __gcc_v8qi, int); +__gcc_v16qi __builtin_neon_vqtbl1q_v(__gcc_v16qi, __gcc_v16qi, int); +__gcc_v8qi __builtin_neon_vqtbl2_v(__gcc_v16qi, __gcc_v16qi, __gcc_v8qi, int); +__gcc_v16qi __builtin_neon_vqtbl2q_v(__gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int); +__gcc_v8qi __builtin_neon_vqtbl3_v(__gcc_v16qi, __gcc_v16qi, __gcc_v16qi, __gcc_v8qi, int); +__gcc_v16qi __builtin_neon_vqtbl3q_v(__gcc_v16qi, __gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int); +__gcc_v8qi __builtin_neon_vqtbl4_v(__gcc_v16qi, __gcc_v16qi, __gcc_v16qi, __gcc_v16qi, __gcc_v8qi, int); +__gcc_v16qi __builtin_neon_vqtbl4q_v(__gcc_v16qi, __gcc_v16qi, __gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int); +__gcc_v8qi __builtin_neon_vqtbx1_v(__gcc_v8qi, __gcc_v16qi, __gcc_v8qi, int); +__gcc_v16qi __builtin_neon_vqtbx1q_v(__gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int); +__gcc_v8qi __builtin_neon_vqtbx2_v(__gcc_v8qi, __gcc_v16qi, __gcc_v16qi, __gcc_v8qi, int); +__gcc_v16qi __builtin_neon_vqtbx2q_v(__gcc_v16qi, __gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int); +__gcc_v8qi 
__builtin_neon_vqtbx3_v(__gcc_v8qi, __gcc_v16qi, __gcc_v16qi, __gcc_v16qi, __gcc_v8qi, int); +__gcc_v16qi __builtin_neon_vqtbx3q_v(__gcc_v16qi, __gcc_v16qi, __gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int); +__gcc_v8qi __builtin_neon_vqtbx4_v(__gcc_v8qi, __gcc_v16qi, __gcc_v16qi, __gcc_v16qi, __gcc_v16qi, __gcc_v8qi, int); +__gcc_v16qi __builtin_neon_vqtbx4q_v(__gcc_v16qi, __gcc_v16qi, __gcc_v16qi, __gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int); +__gcc_v8qi __builtin_neon_vraddhn_v(__gcc_v16qi, __gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vrax1q_u64(__gcc_v16qi, __gcc_v16qi, int); +__gcc_v8qi __builtin_neon_vrbit_v(__gcc_v8qi, int); +__gcc_v16qi __builtin_neon_vrbitq_v(__gcc_v16qi, int); +__gcc_v8qi __builtin_neon_vrecpe_f16(__gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vrecpe_v(__gcc_v8qi, int); +double __builtin_neon_vrecped_f64(double); +__gcc_v16qi __builtin_neon_vrecpeq_f16(__gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vrecpeq_v(__gcc_v16qi, int); +float __builtin_neon_vrecpes_f32(float); +__gcc_v8qi __builtin_neon_vrecps_f16(__gcc_v8qi, __gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vrecps_v(__gcc_v8qi, __gcc_v8qi, int); +double __builtin_neon_vrecpsd_f64(double, double); +__gcc_v16qi __builtin_neon_vrecpsq_f16(__gcc_v16qi, __gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vrecpsq_v(__gcc_v16qi, __gcc_v16qi, int); +float __builtin_neon_vrecpss_f32(float, float); +double __builtin_neon_vrecpxd_f64(double); +float __builtin_neon_vrecpxs_f32(float); +__gcc_v8qi __builtin_neon_vrhadd_v(__gcc_v8qi, __gcc_v8qi, int); +__gcc_v16qi __builtin_neon_vrhaddq_v(__gcc_v16qi, __gcc_v16qi, int); +__gcc_v8qi __builtin_neon_vrnd32x_f32(__gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vrnd32x_f64(__gcc_v8qi, int); +__gcc_v16qi __builtin_neon_vrnd32xq_f32(__gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vrnd32xq_f64(__gcc_v16qi, int); +__gcc_v8qi __builtin_neon_vrnd32z_f32(__gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vrnd32z_f64(__gcc_v8qi, int); +__gcc_v16qi 
__builtin_neon_vrnd32zq_f32(__gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vrnd32zq_f64(__gcc_v16qi, int); +__gcc_v8qi __builtin_neon_vrnd64x_f32(__gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vrnd64x_f64(__gcc_v8qi, int); +__gcc_v16qi __builtin_neon_vrnd64xq_f32(__gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vrnd64xq_f64(__gcc_v16qi, int); +__gcc_v8qi __builtin_neon_vrnd64z_f32(__gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vrnd64z_f64(__gcc_v8qi, int); +__gcc_v16qi __builtin_neon_vrnd64zq_f32(__gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vrnd64zq_f64(__gcc_v16qi, int); +__gcc_v8qi __builtin_neon_vrnd_f16(__gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vrnd_v(__gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vrnda_f16(__gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vrnda_v(__gcc_v8qi, int); +__gcc_v16qi __builtin_neon_vrndaq_f16(__gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vrndaq_v(__gcc_v16qi, int); +__gcc_v8qi __builtin_neon_vrndi_f16(__gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vrndi_v(__gcc_v8qi, int); +__gcc_v16qi __builtin_neon_vrndiq_f16(__gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vrndiq_v(__gcc_v16qi, int); +__gcc_v8qi __builtin_neon_vrndm_f16(__gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vrndm_v(__gcc_v8qi, int); +__gcc_v16qi __builtin_neon_vrndmq_f16(__gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vrndmq_v(__gcc_v16qi, int); +__gcc_v8qi __builtin_neon_vrndn_f16(__gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vrndn_v(__gcc_v8qi, int); +__gcc_v16qi __builtin_neon_vrndnq_f16(__gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vrndnq_v(__gcc_v16qi, int); +float __builtin_neon_vrndns_f32(float); +__gcc_v8qi __builtin_neon_vrndp_f16(__gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vrndp_v(__gcc_v8qi, int); +__gcc_v16qi __builtin_neon_vrndpq_f16(__gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vrndpq_v(__gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vrndq_f16(__gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vrndq_v(__gcc_v16qi, int); +__gcc_v8qi __builtin_neon_vrndx_f16(__gcc_v8qi, int); 
+__gcc_v8qi __builtin_neon_vrndx_v(__gcc_v8qi, int); +__gcc_v16qi __builtin_neon_vrndxq_f16(__gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vrndxq_v(__gcc_v16qi, int); +__gcc_v8qi __builtin_neon_vrshl_v(__gcc_v8qi, __gcc_v8qi, int); +int64_t int64_t int64_t __builtin_neon_vrshld_s64(void); +unsigned int64_t unsigned int64_t int64_t __builtin_neon_vrshld_u64(void); +__gcc_v16qi __builtin_neon_vrshlq_v(__gcc_v16qi, __gcc_v16qi, int); +__gcc_v8qi __builtin_neon_vrshr_n_v(__gcc_v8qi, int, int); +int64_t int64_t int __builtin_neon_vrshrd_n_s64(void); +unsigned int64_t unsigned int64_t int __builtin_neon_vrshrd_n_u64(void); +__gcc_v8qi __builtin_neon_vrshrn_n_v(__gcc_v16qi, int, int); +__gcc_v16qi __builtin_neon_vrshrq_n_v(__gcc_v16qi, int, int); +__gcc_v8qi __builtin_neon_vrsqrte_f16(__gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vrsqrte_v(__gcc_v8qi, int); +double __builtin_neon_vrsqrted_f64(double); +__gcc_v16qi __builtin_neon_vrsqrteq_f16(__gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vrsqrteq_v(__gcc_v16qi, int); +float __builtin_neon_vrsqrtes_f32(float); +__gcc_v8qi __builtin_neon_vrsqrts_f16(__gcc_v8qi, __gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vrsqrts_v(__gcc_v8qi, __gcc_v8qi, int); +double __builtin_neon_vrsqrtsd_f64(double, double); +__gcc_v16qi __builtin_neon_vrsqrtsq_f16(__gcc_v16qi, __gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vrsqrtsq_v(__gcc_v16qi, __gcc_v16qi, int); +float __builtin_neon_vrsqrtss_f32(float, float); +__gcc_v8qi __builtin_neon_vrsra_n_v(__gcc_v8qi, __gcc_v8qi, int, int); +int64_t int64_t int64_t int __builtin_neon_vrsrad_n_s64(void); +unsigned int64_t unsigned int64_t unsigned int64_t int __builtin_neon_vrsrad_n_u64(void); +__gcc_v16qi __builtin_neon_vrsraq_n_v(__gcc_v16qi, __gcc_v16qi, int, int); +__gcc_v8qi __builtin_neon_vrsubhn_v(__gcc_v16qi, __gcc_v16qi, int); +__gcc_v8qi __builtin_neon_vscale_f16(__gcc_v8qi, __gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vscale_f32(__gcc_v8qi, __gcc_v8qi, int); +__gcc_v16qi 
__builtin_neon_vscaleq_f16(__gcc_v16qi, __gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vscaleq_f32(__gcc_v16qi, __gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vscaleq_f64(__gcc_v16qi, __gcc_v16qi, int); +__gcc_v2sf __builtin_neon_vset_lane_f32(float, __gcc_v2sf, int); +__gcc_v4hi __builtin_neon_vset_lane_i16(short, __gcc_v4hi, int); +__gcc_v2si __builtin_neon_vset_lane_i32(int, __gcc_v2si, int); +__gcc_v8qi __builtin_neon_vset_lane_i8(signed char, __gcc_v8qi, int); +__gcc_v8hf __builtin_neon_vsetq_lane_bf16(__bf16, __gcc_v8hf, int); +__gcc_v4sf __builtin_neon_vsetq_lane_f32(float, __gcc_v4sf, int); +__gcc_v2df __builtin_neon_vsetq_lane_f64(double, __gcc_v2df, int); +__gcc_v8hi __builtin_neon_vsetq_lane_i16(short, __gcc_v8hi, int); +__gcc_v4si __builtin_neon_vsetq_lane_i32(int, __gcc_v4si, int); +__gcc_v16qi __builtin_neon_vsetq_lane_i8(signed char, __gcc_v16qi, int); +__gcc_v4si __builtin_neon_vsha1cq_u32(__gcc_v4usi, unsigned int, __gcc_v4usi); +unsigned int __builtin_neon_vsha1h_u32(unsigned int); +__gcc_v4si __builtin_neon_vsha1mq_u32(__gcc_v4usi, unsigned int, __gcc_v4usi); +__gcc_v4si __builtin_neon_vsha1pq_u32(__gcc_v4usi, unsigned int, __gcc_v4usi); +__gcc_v16qi __builtin_neon_vsha1su0q_u32(__gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vsha1su1q_u32(__gcc_v16qi, __gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vsha256h2q_u32(__gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vsha256hq_u32(__gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vsha256su0q_u32(__gcc_v16qi, __gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vsha256su1q_u32(__gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vsha512h2q_u64(__gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vsha512hq_u64(__gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vsha512su0q_u64(__gcc_v16qi, __gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vsha512su1q_u64(__gcc_v16qi, __gcc_v16qi, 
__gcc_v16qi, int); +__gcc_v8qi __builtin_neon_vshl_n_v(__gcc_v8qi, int, int); +__gcc_v8qi __builtin_neon_vshl_v(__gcc_v8qi, __gcc_v8qi, int); +int64_t int64_t int __builtin_neon_vshld_n_s64(void); +unsigned int64_t unsigned int64_t int __builtin_neon_vshld_n_u64(void); +int64_t int64_t int64_t __builtin_neon_vshld_s64(void); +unsigned int64_t unsigned int64_t int64_t __builtin_neon_vshld_u64(void); +__gcc_v16qi __builtin_neon_vshll_n_v(__gcc_v8qi, int, int); +__gcc_v16qi __builtin_neon_vshlq_n_v(__gcc_v16qi, int, int); +__gcc_v16qi __builtin_neon_vshlq_v(__gcc_v16qi, __gcc_v16qi, int); +__gcc_v8qi __builtin_neon_vshr_n_v(__gcc_v8qi, int, int); +int64_t int64_t int __builtin_neon_vshrd_n_s64(void); +unsigned int64_t unsigned int64_t int __builtin_neon_vshrd_n_u64(void); +__gcc_v8qi __builtin_neon_vshrn_n_v(__gcc_v16qi, int, int); +__gcc_v16qi __builtin_neon_vshrq_n_v(__gcc_v16qi, int, int); +__gcc_v8qi __builtin_neon_vsli_n_v(__gcc_v8qi, __gcc_v8qi, int, int); +int64_t int64_t int64_t int __builtin_neon_vslid_n_s64(void); +unsigned int64_t unsigned int64_t unsigned int64_t int __builtin_neon_vslid_n_u64(void); +__gcc_v16qi __builtin_neon_vsliq_n_v(__gcc_v16qi, __gcc_v16qi, int, int); +__gcc_v16qi __builtin_neon_vsm3partw1q_u32(__gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vsm3partw2q_u32(__gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vsm3ss1q_u32(__gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vsm3tt1aq_u32(__gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int, int); +__gcc_v16qi __builtin_neon_vsm3tt1bq_u32(__gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int, int); +__gcc_v16qi __builtin_neon_vsm3tt2aq_u32(__gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int, int); +__gcc_v16qi __builtin_neon_vsm3tt2bq_u32(__gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int, int); +__gcc_v16qi __builtin_neon_vsm4ekeyq_u32(__gcc_v16qi, __gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vsm4eq_u32(__gcc_v16qi, __gcc_v16qi, int); +__gcc_v8qi 
__builtin_neon_vsqadd_v(__gcc_v8qi, __gcc_v8qi, int); +unsigned char __builtin_neon_vsqaddb_u8(unsigned char, signed char); +unsigned int64_t unsigned int64_t int64_t __builtin_neon_vsqaddd_u64(void); +unsigned short __builtin_neon_vsqaddh_u16(unsigned short, short); +__gcc_v16qi __builtin_neon_vsqaddq_v(__gcc_v16qi, __gcc_v16qi, int); +unsigned int __builtin_neon_vsqadds_u32(unsigned int, int); +__gcc_v8qi __builtin_neon_vsqrt_f16(__gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vsqrt_v(__gcc_v8qi, int); +__gcc_v16qi __builtin_neon_vsqrtq_f16(__gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vsqrtq_v(__gcc_v16qi, int); +__gcc_v8qi __builtin_neon_vsra_n_v(__gcc_v8qi, __gcc_v8qi, int, int); +int64_t int64_t int64_t int __builtin_neon_vsrad_n_s64(void); +unsigned int64_t unsigned int64_t unsigned int64_t int __builtin_neon_vsrad_n_u64(void); +__gcc_v16qi __builtin_neon_vsraq_n_v(__gcc_v16qi, __gcc_v16qi, int, int); +__gcc_v8qi __builtin_neon_vsri_n_v(__gcc_v8qi, __gcc_v8qi, int, int); +int64_t int64_t int64_t int __builtin_neon_vsrid_n_s64(void); +unsigned int64_t unsigned int64_t unsigned int64_t int __builtin_neon_vsrid_n_u64(void); +__gcc_v16qi __builtin_neon_vsriq_n_v(__gcc_v16qi, __gcc_v16qi, int, int); +void __builtin_neon_vst1_bf16(void *, __gcc_v8qi, int); +void __builtin_neon_vst1_bf16_x2(void *, __gcc_v8qi, __gcc_v8qi, int); +void __builtin_neon_vst1_bf16_x3(void *, __gcc_v8qi, __gcc_v8qi, __gcc_v8qi, int); +void __builtin_neon_vst1_bf16_x4(void *, __gcc_v8qi, __gcc_v8qi, __gcc_v8qi, __gcc_v8qi, int); +void __builtin_neon_vst1_lane_bf16(void *, __gcc_v8qi, int, int); +void __builtin_neon_vst1_lane_v(void *, __gcc_v8qi, int, int); +void __builtin_neon_vst1_v(void *, __gcc_v8qi, int); +void __builtin_neon_vst1_x2_v(void *, __gcc_v8qi, __gcc_v8qi, int); +void __builtin_neon_vst1_x3_v(void *, __gcc_v8qi, __gcc_v8qi, __gcc_v8qi, int); +void __builtin_neon_vst1_x4_v(void *, __gcc_v8qi, __gcc_v8qi, __gcc_v8qi, __gcc_v8qi, int); +void __builtin_neon_vst1q_bf16(void *, 
__gcc_v16qi, int); +void __builtin_neon_vst1q_bf16_x2(void *, __gcc_v16qi, __gcc_v16qi, int); +void __builtin_neon_vst1q_bf16_x3(void *, __gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int); +void __builtin_neon_vst1q_bf16_x4(void *, __gcc_v16qi, __gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int); +void __builtin_neon_vst1q_lane_bf16(void *, __gcc_v16qi, int, int); +void __builtin_neon_vst1q_lane_v(void *, __gcc_v16qi, int, int); +void __builtin_neon_vst1q_v(void *, __gcc_v16qi, int); +void __builtin_neon_vst1q_x2_v(void *, __gcc_v16qi, __gcc_v16qi, int); +void __builtin_neon_vst1q_x3_v(void *, __gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int); +void __builtin_neon_vst1q_x4_v(void *, __gcc_v16qi, __gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int); +void __builtin_neon_vst2_bf16(void *, __gcc_v8qi, __gcc_v8qi, int); +void __builtin_neon_vst2_lane_bf16(void *, __gcc_v8qi, __gcc_v8qi, int, int); +void __builtin_neon_vst2_lane_v(void *, __gcc_v8qi, __gcc_v8qi, int, int); +void __builtin_neon_vst2_v(void *, __gcc_v8qi, __gcc_v8qi, int); +void __builtin_neon_vst2q_bf16(void *, __gcc_v16qi, __gcc_v16qi, int); +void __builtin_neon_vst2q_lane_bf16(void *, __gcc_v16qi, __gcc_v16qi, int, int); +void __builtin_neon_vst2q_lane_v(void *, __gcc_v16qi, __gcc_v16qi, int, int); +void __builtin_neon_vst2q_v(void *, __gcc_v16qi, __gcc_v16qi, int); +void __builtin_neon_vst3_bf16(void *, __gcc_v8qi, __gcc_v8qi, __gcc_v8qi, int); +void __builtin_neon_vst3_lane_bf16(void *, __gcc_v8qi, __gcc_v8qi, __gcc_v8qi, int, int); +void __builtin_neon_vst3_lane_v(void *, __gcc_v8qi, __gcc_v8qi, __gcc_v8qi, int, int); +void __builtin_neon_vst3_v(void *, __gcc_v8qi, __gcc_v8qi, __gcc_v8qi, int); +void __builtin_neon_vst3q_bf16(void *, __gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int); +void __builtin_neon_vst3q_lane_bf16(void *, __gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int, int); +void __builtin_neon_vst3q_lane_v(void *, __gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int, int); +void __builtin_neon_vst3q_v(void *, __gcc_v16qi, __gcc_v16qi, 
__gcc_v16qi, int); +void __builtin_neon_vst4_bf16(void *, __gcc_v8qi, __gcc_v8qi, __gcc_v8qi, __gcc_v8qi, int); +void __builtin_neon_vst4_lane_bf16(void *, __gcc_v8qi, __gcc_v8qi, __gcc_v8qi, __gcc_v8qi, int, int); +void __builtin_neon_vst4_lane_v(void *, __gcc_v8qi, __gcc_v8qi, __gcc_v8qi, __gcc_v8qi, int, int); +void __builtin_neon_vst4_v(void *, __gcc_v8qi, __gcc_v8qi, __gcc_v8qi, __gcc_v8qi, int); +void __builtin_neon_vst4q_bf16(void *, __gcc_v16qi, __gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int); +void __builtin_neon_vst4q_lane_bf16(void *, __gcc_v16qi, __gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int, int); +void __builtin_neon_vst4q_lane_v(void *, __gcc_v16qi, __gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int, int); +void __builtin_neon_vst4q_v(void *, __gcc_v16qi, __gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int); +void __builtin_neon_vstl1_lane_f64(void *, __gcc_v8qi, int, int); +void __builtin_neon_vstl1_lane_p64(void *, __gcc_v8qi, int, int); +void __builtin_neon_vstl1_lane_s64(void *, __gcc_v8qi, int, int); +void __builtin_neon_vstl1_lane_u64(void *, __gcc_v8qi, int, int); +void __builtin_neon_vstl1q_lane_f64(void *, __gcc_v16qi, int, int); +void __builtin_neon_vstl1q_lane_p64(void *, __gcc_v16qi, int, int); +void __builtin_neon_vstl1q_lane_s64(void *, __gcc_v16qi, int, int); +void __builtin_neon_vstl1q_lane_u64(void *, __gcc_v16qi, int, int); +void __builtin_neon_vstrq_p128(void *, unsigned __int128_t); +int64_t int64_t int64_t __builtin_neon_vsubd_s64(void); +unsigned int64_t unsigned int64_t unsigned int64_t __builtin_neon_vsubd_u64(void); +__gcc_v8qi __builtin_neon_vsubhn_v(__gcc_v16qi, __gcc_v16qi, int); +__gcc_v8qi __builtin_neon_vtbl1_v(__gcc_v8qi, __gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vtbl2_v(__gcc_v8qi, __gcc_v8qi, __gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vtbl3_v(__gcc_v8qi, __gcc_v8qi, __gcc_v8qi, __gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vtbl4_v(__gcc_v8qi, __gcc_v8qi, __gcc_v8qi, __gcc_v8qi, __gcc_v8qi, int); +__gcc_v8qi 
__builtin_neon_vtbx1_v(__gcc_v8qi, __gcc_v8qi, __gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vtbx2_v(__gcc_v8qi, __gcc_v8qi, __gcc_v8qi, __gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vtbx3_v(__gcc_v8qi, __gcc_v8qi, __gcc_v8qi, __gcc_v8qi, __gcc_v8qi, int); +__gcc_v8qi __builtin_neon_vtbx4_v(__gcc_v8qi, __gcc_v8qi, __gcc_v8qi, __gcc_v8qi, __gcc_v8qi, __gcc_v8qi, int); +void __builtin_neon_vtrn_v(void *, __gcc_v8qi, __gcc_v8qi, int); +void __builtin_neon_vtrnq_v(void *, __gcc_v16qi, __gcc_v16qi, int); +__gcc_v8qi __builtin_neon_vtst_v(__gcc_v8qi, __gcc_v8qi, int); +unsigned int64_t int64_t int64_t __builtin_neon_vtstd_s64(void); +unsigned int64_t unsigned int64_t unsigned int64_t __builtin_neon_vtstd_u64(void); +__gcc_v16qi __builtin_neon_vtstq_v(__gcc_v16qi, __gcc_v16qi, int); +__gcc_v8qi __builtin_neon_vuqadd_v(__gcc_v8qi, __gcc_v8qi, int); +signed char __builtin_neon_vuqaddb_s8(signed char, unsigned char); +int64_t int64_t unsigned int64_t __builtin_neon_vuqaddd_s64(void); +short __builtin_neon_vuqaddh_s16(short, unsigned short); +__gcc_v16qi __builtin_neon_vuqaddq_v(__gcc_v16qi, __gcc_v16qi, int); +int __builtin_neon_vuqadds_s32(int, unsigned int); +__gcc_v8qi __builtin_neon_vusdot_s32(__gcc_v8qi, __gcc_v8qi, __gcc_v8qi, int); +__gcc_v16qi __builtin_neon_vusdotq_s32(__gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vusmmlaq_s32(__gcc_v16qi, __gcc_v16qi, __gcc_v16qi, int); +void __builtin_neon_vuzp_v(void *, __gcc_v8qi, __gcc_v8qi, int); +void __builtin_neon_vuzpq_v(void *, __gcc_v16qi, __gcc_v16qi, int); +__gcc_v16qi __builtin_neon_vxarq_u64(__gcc_v16qi, __gcc_v16qi, int, int); +void __builtin_neon_vzip_v(void *, __gcc_v8qi, __gcc_v8qi, int); +void __builtin_neon_vzipq_v(void *, __gcc_v16qi, __gcc_v16qi, int); +// clang-format on diff --git a/src/ansi-c/library/arm_neon.c b/src/ansi-c/library/arm_neon.c new file mode 100644 index 00000000000..2cab4c2ad10 --- /dev/null +++ b/src/ansi-c/library/arm_neon.c @@ -0,0 +1,1895 @@ +/* FUNCTION: 
__builtin_neon_vabd_v */ + +// Arm instruction(s): SABD, UABD (per ACLE advsimd.md) + +typedef char __gcc_v8qi __attribute__((__vector_size__(8))); +typedef signed char __gcc_v8qi_s __attribute__((__vector_size__(8))); +typedef short __gcc_v4hi_s __attribute__((__vector_size__(8))); +typedef int __gcc_v2si_s __attribute__((__vector_size__(8))); +typedef unsigned char __gcc_v8qi_u __attribute__((__vector_size__(8))); +typedef unsigned short __gcc_v4hi_u __attribute__((__vector_size__(8))); +typedef unsigned int __gcc_v2si_u __attribute__((__vector_size__(8))); + +__gcc_v8qi __builtin_neon_vabd_v(__gcc_v8qi a, __gcc_v8qi b, int type) +{ + switch(type) + { + case 0: + { + __gcc_v8qi_s x = (__gcc_v8qi_s)a, y = (__gcc_v8qi_s)b, r; + for(int i = 0; i < 8; i++) + { + int d = (int)x[i] - (int)y[i]; + r[i] = d < 0 ? -d : d; + } + return (__gcc_v8qi)r; + } + case 1: + { + __gcc_v4hi_s x = (__gcc_v4hi_s)a, y = (__gcc_v4hi_s)b, r; + for(int i = 0; i < 4; i++) + { + int d = (int)x[i] - (int)y[i]; + r[i] = d < 0 ? -d : d; + } + return (__gcc_v8qi)r; + } + case 2: + { + __gcc_v2si_s x = (__gcc_v2si_s)a, y = (__gcc_v2si_s)b, r; + for(int i = 0; i < 2; i++) + { + long long d = (long long)x[i] - (long long)y[i]; + r[i] = d < 0 ? -d : d; + } + return (__gcc_v8qi)r; + } + case 16: + { + __gcc_v8qi_u x = (__gcc_v8qi_u)a, y = (__gcc_v8qi_u)b, r; + for(int i = 0; i < 8; i++) + r[i] = x[i] > y[i] ? x[i] - y[i] : y[i] - x[i]; + return (__gcc_v8qi)r; + } + case 17: + { + __gcc_v4hi_u x = (__gcc_v4hi_u)a, y = (__gcc_v4hi_u)b, r; + for(int i = 0; i < 4; i++) + r[i] = x[i] > y[i] ? x[i] - y[i] : y[i] - x[i]; + return (__gcc_v8qi)r; + } + case 18: + { + __gcc_v2si_u x = (__gcc_v2si_u)a, y = (__gcc_v2si_u)b, r; + for(int i = 0; i < 2; i++) + r[i] = x[i] > y[i] ? 
x[i] - y[i] : y[i] - x[i]; + return (__gcc_v8qi)r; + } + } + + __gcc_v8qi r = {0}; + return r; +} + +/* FUNCTION: __builtin_neon_vabdq_v */ + +// Arm instruction(s): SABD, UABD (per ACLE advsimd.md) + +typedef char __gcc_v16qi __attribute__((__vector_size__(16))); +typedef signed char __gcc_v16qi_s __attribute__((__vector_size__(16))); +typedef short __gcc_v8hi_s __attribute__((__vector_size__(16))); +typedef int __gcc_v4si_s __attribute__((__vector_size__(16))); +typedef unsigned char __gcc_v16qi_u __attribute__((__vector_size__(16))); +typedef unsigned short __gcc_v8hi_u __attribute__((__vector_size__(16))); +typedef unsigned int __gcc_v4si_u __attribute__((__vector_size__(16))); + +__gcc_v16qi __builtin_neon_vabdq_v(__gcc_v16qi a, __gcc_v16qi b, int type) +{ + switch(type) + { + case 32: + { + __gcc_v16qi_s x = (__gcc_v16qi_s)a, y = (__gcc_v16qi_s)b, r; + for(int i = 0; i < 16; i++) + { + int d = (int)x[i] - (int)y[i]; + r[i] = d < 0 ? -d : d; + } + return (__gcc_v16qi)r; + } + case 33: + { + __gcc_v8hi_s x = (__gcc_v8hi_s)a, y = (__gcc_v8hi_s)b, r; + for(int i = 0; i < 8; i++) + { + int d = (int)x[i] - (int)y[i]; + r[i] = d < 0 ? -d : d; + } + return (__gcc_v16qi)r; + } + case 34: + { + __gcc_v4si_s x = (__gcc_v4si_s)a, y = (__gcc_v4si_s)b, r; + for(int i = 0; i < 4; i++) + { + long long d = (long long)x[i] - (long long)y[i]; + r[i] = d < 0 ? -d : d; + } + return (__gcc_v16qi)r; + } + case 48: + { + __gcc_v16qi_u x = (__gcc_v16qi_u)a, y = (__gcc_v16qi_u)b, r; + for(int i = 0; i < 16; i++) + r[i] = x[i] > y[i] ? x[i] - y[i] : y[i] - x[i]; + return (__gcc_v16qi)r; + } + case 49: + { + __gcc_v8hi_u x = (__gcc_v8hi_u)a, y = (__gcc_v8hi_u)b, r; + for(int i = 0; i < 8; i++) + r[i] = x[i] > y[i] ? x[i] - y[i] : y[i] - x[i]; + return (__gcc_v16qi)r; + } + case 50: + { + __gcc_v4si_u x = (__gcc_v4si_u)a, y = (__gcc_v4si_u)b, r; + for(int i = 0; i < 4; i++) + r[i] = x[i] > y[i] ? 
x[i] - y[i] : y[i] - x[i]; + return (__gcc_v16qi)r; + } + } + + __gcc_v16qi r = {0}; + return r; +} + +/* FUNCTION: __builtin_neon_vbsl_v */ + +// Arm instruction(s): BSL (per ACLE advsimd.md) + +typedef char __gcc_v8qi __attribute__((__vector_size__(8))); + +__gcc_v8qi +__builtin_neon_vbsl_v(__gcc_v8qi mask, __gcc_v8qi a, __gcc_v8qi b, int type) +{ + (void)type; + return (mask & a) | (~mask & b); +} + +/* FUNCTION: __builtin_neon_vbslq_v */ + +// Arm instruction(s): BSL (per ACLE advsimd.md) + +typedef char __gcc_v16qi __attribute__((__vector_size__(16))); + +__gcc_v16qi +__builtin_neon_vbslq_v(__gcc_v16qi mask, __gcc_v16qi a, __gcc_v16qi b, int type) +{ + (void)type; + return (mask & a) | (~mask & b); +} + +/* FUNCTION: __builtin_neon_vhadd_v */ + +// Arm instruction(s): SHADD, UHADD (per ACLE advsimd.md) + +typedef char __gcc_v8qi __attribute__((__vector_size__(8))); +typedef signed char __gcc_v8qi_s __attribute__((__vector_size__(8))); +typedef short __gcc_v4hi_s __attribute__((__vector_size__(8))); +typedef int __gcc_v2si_s __attribute__((__vector_size__(8))); +typedef unsigned char __gcc_v8qi_u __attribute__((__vector_size__(8))); +typedef unsigned short __gcc_v4hi_u __attribute__((__vector_size__(8))); +typedef unsigned int __gcc_v2si_u __attribute__((__vector_size__(8))); + +__gcc_v8qi __builtin_neon_vhadd_v(__gcc_v8qi a, __gcc_v8qi b, int type) +{ + switch(type) + { + case 0: + { + __gcc_v8qi_s x = (__gcc_v8qi_s)a, y = (__gcc_v8qi_s)b, r; + for(int i = 0; i < 8; i++) + r[i] = ((int)x[i] + (int)y[i]) >> 1; + return (__gcc_v8qi)r; + } + case 1: + { + __gcc_v4hi_s x = (__gcc_v4hi_s)a, y = (__gcc_v4hi_s)b, r; + for(int i = 0; i < 4; i++) + r[i] = ((int)x[i] + (int)y[i]) >> 1; + return (__gcc_v8qi)r; + } + case 2: + { + __gcc_v2si_s x = (__gcc_v2si_s)a, y = (__gcc_v2si_s)b, r; + for(int i = 0; i < 2; i++) + r[i] = ((long long)x[i] + (long long)y[i]) >> 1; + return (__gcc_v8qi)r; + } + case 16: + { + __gcc_v8qi_u x = (__gcc_v8qi_u)a, y = (__gcc_v8qi_u)b, r; + 
for(int i = 0; i < 8; i++) + r[i] = ((int)x[i] + (int)y[i]) >> 1; + return (__gcc_v8qi)r; + } + case 17: + { + __gcc_v4hi_u x = (__gcc_v4hi_u)a, y = (__gcc_v4hi_u)b, r; + for(int i = 0; i < 4; i++) + r[i] = ((int)x[i] + (int)y[i]) >> 1; + return (__gcc_v8qi)r; + } + case 18: + { + __gcc_v2si_u x = (__gcc_v2si_u)a, y = (__gcc_v2si_u)b, r; + for(int i = 0; i < 2; i++) + r[i] = ((long long)x[i] + (long long)y[i]) >> 1; + return (__gcc_v8qi)r; + } + } + + __gcc_v8qi r = {0}; + return r; +} + +/* FUNCTION: __builtin_neon_vhaddq_v */ + +// Arm instruction(s): SHADD, UHADD (per ACLE advsimd.md) + +typedef char __gcc_v16qi __attribute__((__vector_size__(16))); +typedef signed char __gcc_v16qi_s __attribute__((__vector_size__(16))); +typedef short __gcc_v8hi_s __attribute__((__vector_size__(16))); +typedef int __gcc_v4si_s __attribute__((__vector_size__(16))); +typedef unsigned char __gcc_v16qi_u __attribute__((__vector_size__(16))); +typedef unsigned short __gcc_v8hi_u __attribute__((__vector_size__(16))); +typedef unsigned int __gcc_v4si_u __attribute__((__vector_size__(16))); + +__gcc_v16qi __builtin_neon_vhaddq_v(__gcc_v16qi a, __gcc_v16qi b, int type) +{ + switch(type) + { + case 32: + { + __gcc_v16qi_s x = (__gcc_v16qi_s)a, y = (__gcc_v16qi_s)b, r; + for(int i = 0; i < 16; i++) + r[i] = ((int)x[i] + (int)y[i]) >> 1; + return (__gcc_v16qi)r; + } + case 33: + { + __gcc_v8hi_s x = (__gcc_v8hi_s)a, y = (__gcc_v8hi_s)b, r; + for(int i = 0; i < 8; i++) + r[i] = ((int)x[i] + (int)y[i]) >> 1; + return (__gcc_v16qi)r; + } + case 34: + { + __gcc_v4si_s x = (__gcc_v4si_s)a, y = (__gcc_v4si_s)b, r; + for(int i = 0; i < 4; i++) + r[i] = ((long long)x[i] + (long long)y[i]) >> 1; + return (__gcc_v16qi)r; + } + case 48: + { + __gcc_v16qi_u x = (__gcc_v16qi_u)a, y = (__gcc_v16qi_u)b, r; + for(int i = 0; i < 16; i++) + r[i] = ((int)x[i] + (int)y[i]) >> 1; + return (__gcc_v16qi)r; + } + case 49: + { + __gcc_v8hi_u x = (__gcc_v8hi_u)a, y = (__gcc_v8hi_u)b, r; + for(int i = 0; i < 8; i++) 
+ r[i] = ((int)x[i] + (int)y[i]) >> 1; + return (__gcc_v16qi)r; + } + case 50: + { + __gcc_v4si_u x = (__gcc_v4si_u)a, y = (__gcc_v4si_u)b, r; + for(int i = 0; i < 4; i++) + r[i] = ((long long)x[i] + (long long)y[i]) >> 1; + return (__gcc_v16qi)r; + } + } + + __gcc_v16qi r = {0}; + return r; +} + +/* FUNCTION: __builtin_neon_vhsub_v */ + +// Arm instruction(s): SHSUB, UHSUB (per ACLE advsimd.md) + +typedef char __gcc_v8qi __attribute__((__vector_size__(8))); +typedef signed char __gcc_v8qi_s __attribute__((__vector_size__(8))); +typedef short __gcc_v4hi_s __attribute__((__vector_size__(8))); +typedef int __gcc_v2si_s __attribute__((__vector_size__(8))); +typedef unsigned char __gcc_v8qi_u __attribute__((__vector_size__(8))); +typedef unsigned short __gcc_v4hi_u __attribute__((__vector_size__(8))); +typedef unsigned int __gcc_v2si_u __attribute__((__vector_size__(8))); + +__gcc_v8qi __builtin_neon_vhsub_v(__gcc_v8qi a, __gcc_v8qi b, int type) +{ + switch(type) + { + case 0: + { + __gcc_v8qi_s x = (__gcc_v8qi_s)a, y = (__gcc_v8qi_s)b, r; + for(int i = 0; i < 8; i++) + r[i] = ((int)x[i] - (int)y[i]) >> 1; + return (__gcc_v8qi)r; + } + case 1: + { + __gcc_v4hi_s x = (__gcc_v4hi_s)a, y = (__gcc_v4hi_s)b, r; + for(int i = 0; i < 4; i++) + r[i] = ((int)x[i] - (int)y[i]) >> 1; + return (__gcc_v8qi)r; + } + case 2: + { + __gcc_v2si_s x = (__gcc_v2si_s)a, y = (__gcc_v2si_s)b, r; + for(int i = 0; i < 2; i++) + r[i] = ((long long)x[i] - (long long)y[i]) >> 1; + return (__gcc_v8qi)r; + } + case 16: + { + __gcc_v8qi_u x = (__gcc_v8qi_u)a, y = (__gcc_v8qi_u)b, r; + for(int i = 0; i < 8; i++) + r[i] = ((int)x[i] - (int)y[i]) >> 1; + return (__gcc_v8qi)r; + } + case 17: + { + __gcc_v4hi_u x = (__gcc_v4hi_u)a, y = (__gcc_v4hi_u)b, r; + for(int i = 0; i < 4; i++) + r[i] = ((int)x[i] - (int)y[i]) >> 1; + return (__gcc_v8qi)r; + } + case 18: + { + __gcc_v2si_u x = (__gcc_v2si_u)a, y = (__gcc_v2si_u)b, r; + for(int i = 0; i < 2; i++) + r[i] = ((long long)x[i] - (long long)y[i]) >> 1; + 
return (__gcc_v8qi)r; + } + } + + __gcc_v8qi r = {0}; + return r; +} + +/* FUNCTION: __builtin_neon_vhsubq_v */ + +// Arm instruction(s): SHSUB, UHSUB (per ACLE advsimd.md) + +typedef char __gcc_v16qi __attribute__((__vector_size__(16))); +typedef signed char __gcc_v16qi_s __attribute__((__vector_size__(16))); +typedef short __gcc_v8hi_s __attribute__((__vector_size__(16))); +typedef int __gcc_v4si_s __attribute__((__vector_size__(16))); +typedef unsigned char __gcc_v16qi_u __attribute__((__vector_size__(16))); +typedef unsigned short __gcc_v8hi_u __attribute__((__vector_size__(16))); +typedef unsigned int __gcc_v4si_u __attribute__((__vector_size__(16))); + +__gcc_v16qi __builtin_neon_vhsubq_v(__gcc_v16qi a, __gcc_v16qi b, int type) +{ + switch(type) + { + case 32: + { + __gcc_v16qi_s x = (__gcc_v16qi_s)a, y = (__gcc_v16qi_s)b, r; + for(int i = 0; i < 16; i++) + r[i] = ((int)x[i] - (int)y[i]) >> 1; + return (__gcc_v16qi)r; + } + case 33: + { + __gcc_v8hi_s x = (__gcc_v8hi_s)a, y = (__gcc_v8hi_s)b, r; + for(int i = 0; i < 8; i++) + r[i] = ((int)x[i] - (int)y[i]) >> 1; + return (__gcc_v16qi)r; + } + case 34: + { + __gcc_v4si_s x = (__gcc_v4si_s)a, y = (__gcc_v4si_s)b, r; + for(int i = 0; i < 4; i++) + r[i] = ((long long)x[i] - (long long)y[i]) >> 1; + return (__gcc_v16qi)r; + } + case 48: + { + __gcc_v16qi_u x = (__gcc_v16qi_u)a, y = (__gcc_v16qi_u)b, r; + for(int i = 0; i < 16; i++) + r[i] = ((int)x[i] - (int)y[i]) >> 1; + return (__gcc_v16qi)r; + } + case 49: + { + __gcc_v8hi_u x = (__gcc_v8hi_u)a, y = (__gcc_v8hi_u)b, r; + for(int i = 0; i < 8; i++) + r[i] = ((int)x[i] - (int)y[i]) >> 1; + return (__gcc_v16qi)r; + } + case 50: + { + __gcc_v4si_u x = (__gcc_v4si_u)a, y = (__gcc_v4si_u)b, r; + for(int i = 0; i < 4; i++) + r[i] = ((long long)x[i] - (long long)y[i]) >> 1; + return (__gcc_v16qi)r; + } + } + + __gcc_v16qi r = {0}; + return r; +} + +/* FUNCTION: __builtin_neon_vmax_v */ + +// Arm instruction(s): SMAX, UMAX (per ACLE advsimd.md) + +typedef char 
__gcc_v8qi __attribute__((__vector_size__(8))); +typedef signed char __gcc_v8qi_s __attribute__((__vector_size__(8))); +typedef short __gcc_v4hi_s __attribute__((__vector_size__(8))); +typedef int __gcc_v2si_s __attribute__((__vector_size__(8))); +typedef unsigned char __gcc_v8qi_u __attribute__((__vector_size__(8))); +typedef unsigned short __gcc_v4hi_u __attribute__((__vector_size__(8))); +typedef unsigned int __gcc_v2si_u __attribute__((__vector_size__(8))); + +__gcc_v8qi __builtin_neon_vmax_v(__gcc_v8qi a, __gcc_v8qi b, int type) +{ + switch(type) + { + case 0: + { + __gcc_v8qi_s x = (__gcc_v8qi_s)a, y = (__gcc_v8qi_s)b, r; + for(int i = 0; i < 8; i++) + r[i] = x[i] > y[i] ? x[i] : y[i]; + return (__gcc_v8qi)r; + } + case 1: + { + __gcc_v4hi_s x = (__gcc_v4hi_s)a, y = (__gcc_v4hi_s)b, r; + for(int i = 0; i < 4; i++) + r[i] = x[i] > y[i] ? x[i] : y[i]; + return (__gcc_v8qi)r; + } + case 2: + { + __gcc_v2si_s x = (__gcc_v2si_s)a, y = (__gcc_v2si_s)b, r; + for(int i = 0; i < 2; i++) + r[i] = x[i] > y[i] ? x[i] : y[i]; + return (__gcc_v8qi)r; + } + case 16: + { + __gcc_v8qi_u x = (__gcc_v8qi_u)a, y = (__gcc_v8qi_u)b, r; + for(int i = 0; i < 8; i++) + r[i] = x[i] > y[i] ? x[i] : y[i]; + return (__gcc_v8qi)r; + } + case 17: + { + __gcc_v4hi_u x = (__gcc_v4hi_u)a, y = (__gcc_v4hi_u)b, r; + for(int i = 0; i < 4; i++) + r[i] = x[i] > y[i] ? x[i] : y[i]; + return (__gcc_v8qi)r; + } + case 18: + { + __gcc_v2si_u x = (__gcc_v2si_u)a, y = (__gcc_v2si_u)b, r; + for(int i = 0; i < 2; i++) + r[i] = x[i] > y[i] ? 
x[i] : y[i]; + return (__gcc_v8qi)r; + } + } + + __gcc_v8qi r = {0}; + return r; +} + +/* FUNCTION: __builtin_neon_vmaxq_v */ + +// Arm instruction(s): SMAX, UMAX (per ACLE advsimd.md) + +typedef char __gcc_v16qi __attribute__((__vector_size__(16))); +typedef signed char __gcc_v16qi_s __attribute__((__vector_size__(16))); +typedef short __gcc_v8hi_s __attribute__((__vector_size__(16))); +typedef int __gcc_v4si_s __attribute__((__vector_size__(16))); +typedef unsigned char __gcc_v16qi_u __attribute__((__vector_size__(16))); +typedef unsigned short __gcc_v8hi_u __attribute__((__vector_size__(16))); +typedef unsigned int __gcc_v4si_u __attribute__((__vector_size__(16))); + +__gcc_v16qi __builtin_neon_vmaxq_v(__gcc_v16qi a, __gcc_v16qi b, int type) +{ + switch(type) + { + case 32: + { + __gcc_v16qi_s x = (__gcc_v16qi_s)a, y = (__gcc_v16qi_s)b, r; + for(int i = 0; i < 16; i++) + r[i] = x[i] > y[i] ? x[i] : y[i]; + return (__gcc_v16qi)r; + } + case 33: + { + __gcc_v8hi_s x = (__gcc_v8hi_s)a, y = (__gcc_v8hi_s)b, r; + for(int i = 0; i < 8; i++) + r[i] = x[i] > y[i] ? x[i] : y[i]; + return (__gcc_v16qi)r; + } + case 34: + { + __gcc_v4si_s x = (__gcc_v4si_s)a, y = (__gcc_v4si_s)b, r; + for(int i = 0; i < 4; i++) + r[i] = x[i] > y[i] ? x[i] : y[i]; + return (__gcc_v16qi)r; + } + case 48: + { + __gcc_v16qi_u x = (__gcc_v16qi_u)a, y = (__gcc_v16qi_u)b, r; + for(int i = 0; i < 16; i++) + r[i] = x[i] > y[i] ? x[i] : y[i]; + return (__gcc_v16qi)r; + } + case 49: + { + __gcc_v8hi_u x = (__gcc_v8hi_u)a, y = (__gcc_v8hi_u)b, r; + for(int i = 0; i < 8; i++) + r[i] = x[i] > y[i] ? x[i] : y[i]; + return (__gcc_v16qi)r; + } + case 50: + { + __gcc_v4si_u x = (__gcc_v4si_u)a, y = (__gcc_v4si_u)b, r; + for(int i = 0; i < 4; i++) + r[i] = x[i] > y[i] ? 
x[i] : y[i]; + return (__gcc_v16qi)r; + } + } + + __gcc_v16qi r = {0}; + return r; +} + +/* FUNCTION: __builtin_neon_vmin_v */ + +// Arm instruction(s): SMIN, UMIN (per ACLE advsimd.md) + +typedef char __gcc_v8qi __attribute__((__vector_size__(8))); +typedef signed char __gcc_v8qi_s __attribute__((__vector_size__(8))); +typedef short __gcc_v4hi_s __attribute__((__vector_size__(8))); +typedef int __gcc_v2si_s __attribute__((__vector_size__(8))); +typedef unsigned char __gcc_v8qi_u __attribute__((__vector_size__(8))); +typedef unsigned short __gcc_v4hi_u __attribute__((__vector_size__(8))); +typedef unsigned int __gcc_v2si_u __attribute__((__vector_size__(8))); + +__gcc_v8qi __builtin_neon_vmin_v(__gcc_v8qi a, __gcc_v8qi b, int type) +{ + switch(type) + { + case 0: + { + __gcc_v8qi_s x = (__gcc_v8qi_s)a, y = (__gcc_v8qi_s)b, r; + for(int i = 0; i < 8; i++) + r[i] = x[i] < y[i] ? x[i] : y[i]; + return (__gcc_v8qi)r; + } + case 1: + { + __gcc_v4hi_s x = (__gcc_v4hi_s)a, y = (__gcc_v4hi_s)b, r; + for(int i = 0; i < 4; i++) + r[i] = x[i] < y[i] ? x[i] : y[i]; + return (__gcc_v8qi)r; + } + case 2: + { + __gcc_v2si_s x = (__gcc_v2si_s)a, y = (__gcc_v2si_s)b, r; + for(int i = 0; i < 2; i++) + r[i] = x[i] < y[i] ? x[i] : y[i]; + return (__gcc_v8qi)r; + } + case 16: + { + __gcc_v8qi_u x = (__gcc_v8qi_u)a, y = (__gcc_v8qi_u)b, r; + for(int i = 0; i < 8; i++) + r[i] = x[i] < y[i] ? x[i] : y[i]; + return (__gcc_v8qi)r; + } + case 17: + { + __gcc_v4hi_u x = (__gcc_v4hi_u)a, y = (__gcc_v4hi_u)b, r; + for(int i = 0; i < 4; i++) + r[i] = x[i] < y[i] ? x[i] : y[i]; + return (__gcc_v8qi)r; + } + case 18: + { + __gcc_v2si_u x = (__gcc_v2si_u)a, y = (__gcc_v2si_u)b, r; + for(int i = 0; i < 2; i++) + r[i] = x[i] < y[i] ? 
x[i] : y[i]; + return (__gcc_v8qi)r; + } + } + + __gcc_v8qi r = {0}; + return r; +} + +/* FUNCTION: __builtin_neon_vminq_v */ + +// Arm instruction(s): SMIN, UMIN (per ACLE advsimd.md) + +typedef char __gcc_v16qi __attribute__((__vector_size__(16))); +typedef signed char __gcc_v16qi_s __attribute__((__vector_size__(16))); +typedef short __gcc_v8hi_s __attribute__((__vector_size__(16))); +typedef int __gcc_v4si_s __attribute__((__vector_size__(16))); +typedef unsigned char __gcc_v16qi_u __attribute__((__vector_size__(16))); +typedef unsigned short __gcc_v8hi_u __attribute__((__vector_size__(16))); +typedef unsigned int __gcc_v4si_u __attribute__((__vector_size__(16))); + +__gcc_v16qi __builtin_neon_vminq_v(__gcc_v16qi a, __gcc_v16qi b, int type) +{ + switch(type) + { + case 32: + { + __gcc_v16qi_s x = (__gcc_v16qi_s)a, y = (__gcc_v16qi_s)b, r; + for(int i = 0; i < 16; i++) + r[i] = x[i] < y[i] ? x[i] : y[i]; + return (__gcc_v16qi)r; + } + case 33: + { + __gcc_v8hi_s x = (__gcc_v8hi_s)a, y = (__gcc_v8hi_s)b, r; + for(int i = 0; i < 8; i++) + r[i] = x[i] < y[i] ? x[i] : y[i]; + return (__gcc_v16qi)r; + } + case 34: + { + __gcc_v4si_s x = (__gcc_v4si_s)a, y = (__gcc_v4si_s)b, r; + for(int i = 0; i < 4; i++) + r[i] = x[i] < y[i] ? x[i] : y[i]; + return (__gcc_v16qi)r; + } + case 48: + { + __gcc_v16qi_u x = (__gcc_v16qi_u)a, y = (__gcc_v16qi_u)b, r; + for(int i = 0; i < 16; i++) + r[i] = x[i] < y[i] ? x[i] : y[i]; + return (__gcc_v16qi)r; + } + case 49: + { + __gcc_v8hi_u x = (__gcc_v8hi_u)a, y = (__gcc_v8hi_u)b, r; + for(int i = 0; i < 8; i++) + r[i] = x[i] < y[i] ? x[i] : y[i]; + return (__gcc_v16qi)r; + } + case 50: + { + __gcc_v4si_u x = (__gcc_v4si_u)a, y = (__gcc_v4si_u)b, r; + for(int i = 0; i < 4; i++) + r[i] = x[i] < y[i] ? 
x[i] : y[i]; + return (__gcc_v16qi)r; + } + } + + __gcc_v16qi r = {0}; + return r; +} + +/* FUNCTION: __builtin_neon_vpadd_v */ + +// Arm instruction(s): ADDP (per ACLE advsimd.md) + +typedef char __gcc_v8qi __attribute__((__vector_size__(8))); +typedef signed char __gcc_v8qi_s __attribute__((__vector_size__(8))); +typedef short __gcc_v4hi_s __attribute__((__vector_size__(8))); +typedef int __gcc_v2si_s __attribute__((__vector_size__(8))); +typedef unsigned char __gcc_v8qi_u __attribute__((__vector_size__(8))); +typedef unsigned short __gcc_v4hi_u __attribute__((__vector_size__(8))); +typedef unsigned int __gcc_v2si_u __attribute__((__vector_size__(8))); + +__gcc_v8qi __builtin_neon_vpadd_v(__gcc_v8qi a, __gcc_v8qi b, int type) +{ + switch(type) + { + case 0: + { + __gcc_v8qi_s x = (__gcc_v8qi_s)a, y = (__gcc_v8qi_s)b, r; + int h = 8 / 2; + for(int i = 0; i < h; i++) + r[i] = (unsigned char)x[2 * i] + (unsigned char)x[2 * i + 1]; + for(int i = 0; i < h; i++) + r[h + i] = (unsigned char)y[2 * i] + (unsigned char)y[2 * i + 1]; + return (__gcc_v8qi)r; + } + case 1: + { + __gcc_v4hi_s x = (__gcc_v4hi_s)a, y = (__gcc_v4hi_s)b, r; + int h = 4 / 2; + for(int i = 0; i < h; i++) + r[i] = (unsigned short)x[2 * i] + (unsigned short)x[2 * i + 1]; + for(int i = 0; i < h; i++) + r[h + i] = (unsigned short)y[2 * i] + (unsigned short)y[2 * i + 1]; + return (__gcc_v8qi)r; + } + case 2: + { + __gcc_v2si_s x = (__gcc_v2si_s)a, y = (__gcc_v2si_s)b, r; + int h = 2 / 2; + for(int i = 0; i < h; i++) + r[i] = (unsigned int)x[2 * i] + (unsigned int)x[2 * i + 1]; + for(int i = 0; i < h; i++) + r[h + i] = (unsigned int)y[2 * i] + (unsigned int)y[2 * i + 1]; + return (__gcc_v8qi)r; + } + case 16: + { + __gcc_v8qi_u x = (__gcc_v8qi_u)a, y = (__gcc_v8qi_u)b, r; + int h = 8 / 2; + for(int i = 0; i < h; i++) + r[i] = (unsigned char)x[2 * i] + (unsigned char)x[2 * i + 1]; + for(int i = 0; i < h; i++) + r[h + i] = (unsigned char)y[2 * i] + (unsigned char)y[2 * i + 1]; + return (__gcc_v8qi)r; + } + 
case 17: + { + __gcc_v4hi_u x = (__gcc_v4hi_u)a, y = (__gcc_v4hi_u)b, r; + int h = 4 / 2; + for(int i = 0; i < h; i++) + r[i] = (unsigned short)x[2 * i] + (unsigned short)x[2 * i + 1]; + for(int i = 0; i < h; i++) + r[h + i] = (unsigned short)y[2 * i] + (unsigned short)y[2 * i + 1]; + return (__gcc_v8qi)r; + } + case 18: + { + __gcc_v2si_u x = (__gcc_v2si_u)a, y = (__gcc_v2si_u)b, r; + int h = 2 / 2; + for(int i = 0; i < h; i++) + r[i] = (unsigned int)x[2 * i] + (unsigned int)x[2 * i + 1]; + for(int i = 0; i < h; i++) + r[h + i] = (unsigned int)y[2 * i] + (unsigned int)y[2 * i + 1]; + return (__gcc_v8qi)r; + } + } + + __gcc_v8qi r = {0}; + return r; +} + +/* FUNCTION: __builtin_neon_vpaddq_v */ + +// Arm instruction(s): ADDP (per ACLE advsimd.md) + +typedef char __gcc_v16qi __attribute__((__vector_size__(16))); +typedef signed char __gcc_v16qi_s __attribute__((__vector_size__(16))); +typedef short __gcc_v8hi_s __attribute__((__vector_size__(16))); +typedef int __gcc_v4si_s __attribute__((__vector_size__(16))); +typedef long long __gcc_v2di_s __attribute__((__vector_size__(16))); +typedef unsigned char __gcc_v16qi_u __attribute__((__vector_size__(16))); +typedef unsigned short __gcc_v8hi_u __attribute__((__vector_size__(16))); +typedef unsigned int __gcc_v4si_u __attribute__((__vector_size__(16))); +typedef unsigned long long __gcc_v2di_u __attribute__((__vector_size__(16))); + +__gcc_v16qi __builtin_neon_vpaddq_v(__gcc_v16qi a, __gcc_v16qi b, int type) +{ + switch(type) + { + case 32: + { + __gcc_v16qi_s x = (__gcc_v16qi_s)a, y = (__gcc_v16qi_s)b, r; + int h = 16 / 2; + for(int i = 0; i < h; i++) + r[i] = (unsigned char)x[2 * i] + (unsigned char)x[2 * i + 1]; + for(int i = 0; i < h; i++) + r[h + i] = (unsigned char)y[2 * i] + (unsigned char)y[2 * i + 1]; + return (__gcc_v16qi)r; + } + case 33: + { + __gcc_v8hi_s x = (__gcc_v8hi_s)a, y = (__gcc_v8hi_s)b, r; + int h = 8 / 2; + for(int i = 0; i < h; i++) + r[i] = (unsigned short)x[2 * i] + (unsigned short)x[2 * i + 
1]; + for(int i = 0; i < h; i++) + r[h + i] = (unsigned short)y[2 * i] + (unsigned short)y[2 * i + 1]; + return (__gcc_v16qi)r; + } + case 34: + { + __gcc_v4si_s x = (__gcc_v4si_s)a, y = (__gcc_v4si_s)b, r; + int h = 4 / 2; + for(int i = 0; i < h; i++) + r[i] = (unsigned int)x[2 * i] + (unsigned int)x[2 * i + 1]; + for(int i = 0; i < h; i++) + r[h + i] = (unsigned int)y[2 * i] + (unsigned int)y[2 * i + 1]; + return (__gcc_v16qi)r; + } + case 35: + { + __gcc_v2di_s x = (__gcc_v2di_s)a, y = (__gcc_v2di_s)b, r; + int h = 2 / 2; + for(int i = 0; i < h; i++) + r[i] = (unsigned long long)x[2 * i] + (unsigned long long)x[2 * i + 1]; + for(int i = 0; i < h; i++) + r[h + i] = + (unsigned long long)y[2 * i] + (unsigned long long)y[2 * i + 1]; + return (__gcc_v16qi)r; + } + case 48: + { + __gcc_v16qi_u x = (__gcc_v16qi_u)a, y = (__gcc_v16qi_u)b, r; + int h = 16 / 2; + for(int i = 0; i < h; i++) + r[i] = (unsigned char)x[2 * i] + (unsigned char)x[2 * i + 1]; + for(int i = 0; i < h; i++) + r[h + i] = (unsigned char)y[2 * i] + (unsigned char)y[2 * i + 1]; + return (__gcc_v16qi)r; + } + case 49: + { + __gcc_v8hi_u x = (__gcc_v8hi_u)a, y = (__gcc_v8hi_u)b, r; + int h = 8 / 2; + for(int i = 0; i < h; i++) + r[i] = (unsigned short)x[2 * i] + (unsigned short)x[2 * i + 1]; + for(int i = 0; i < h; i++) + r[h + i] = (unsigned short)y[2 * i] + (unsigned short)y[2 * i + 1]; + return (__gcc_v16qi)r; + } + case 50: + { + __gcc_v4si_u x = (__gcc_v4si_u)a, y = (__gcc_v4si_u)b, r; + int h = 4 / 2; + for(int i = 0; i < h; i++) + r[i] = (unsigned int)x[2 * i] + (unsigned int)x[2 * i + 1]; + for(int i = 0; i < h; i++) + r[h + i] = (unsigned int)y[2 * i] + (unsigned int)y[2 * i + 1]; + return (__gcc_v16qi)r; + } + case 51: + { + __gcc_v2di_u x = (__gcc_v2di_u)a, y = (__gcc_v2di_u)b, r; + int h = 2 / 2; + for(int i = 0; i < h; i++) + r[i] = (unsigned long long)x[2 * i] + (unsigned long long)x[2 * i + 1]; + for(int i = 0; i < h; i++) + r[h + i] = + (unsigned long long)y[2 * i] + (unsigned long 
long)y[2 * i + 1]; + return (__gcc_v16qi)r; + } + } + + __gcc_v16qi r = {0}; + return r; +} + +/* FUNCTION: __builtin_neon_vpmax_v */ + +// Arm instruction(s): SMAXP, UMAXP (per ACLE advsimd.md) + +typedef char __gcc_v8qi __attribute__((__vector_size__(8))); +typedef signed char __gcc_v8qi_s __attribute__((__vector_size__(8))); +typedef short __gcc_v4hi_s __attribute__((__vector_size__(8))); +typedef int __gcc_v2si_s __attribute__((__vector_size__(8))); +typedef unsigned char __gcc_v8qi_u __attribute__((__vector_size__(8))); +typedef unsigned short __gcc_v4hi_u __attribute__((__vector_size__(8))); +typedef unsigned int __gcc_v2si_u __attribute__((__vector_size__(8))); + +__gcc_v8qi __builtin_neon_vpmax_v(__gcc_v8qi a, __gcc_v8qi b, int type) +{ + switch(type) + { + case 0: + { + __gcc_v8qi_s x = (__gcc_v8qi_s)a, y = (__gcc_v8qi_s)b, r; + int h = 8 / 2; + for(int i = 0; i < h; i++) + r[i] = x[2 * i] > x[2 * i + 1] ? x[2 * i] : x[2 * i + 1]; + for(int i = 0; i < h; i++) + r[h + i] = y[2 * i] > y[2 * i + 1] ? y[2 * i] : y[2 * i + 1]; + return (__gcc_v8qi)r; + } + case 1: + { + __gcc_v4hi_s x = (__gcc_v4hi_s)a, y = (__gcc_v4hi_s)b, r; + int h = 4 / 2; + for(int i = 0; i < h; i++) + r[i] = x[2 * i] > x[2 * i + 1] ? x[2 * i] : x[2 * i + 1]; + for(int i = 0; i < h; i++) + r[h + i] = y[2 * i] > y[2 * i + 1] ? y[2 * i] : y[2 * i + 1]; + return (__gcc_v8qi)r; + } + case 2: + { + __gcc_v2si_s x = (__gcc_v2si_s)a, y = (__gcc_v2si_s)b, r; + int h = 2 / 2; + for(int i = 0; i < h; i++) + r[i] = x[2 * i] > x[2 * i + 1] ? x[2 * i] : x[2 * i + 1]; + for(int i = 0; i < h; i++) + r[h + i] = y[2 * i] > y[2 * i + 1] ? y[2 * i] : y[2 * i + 1]; + return (__gcc_v8qi)r; + } + case 16: + { + __gcc_v8qi_u x = (__gcc_v8qi_u)a, y = (__gcc_v8qi_u)b, r; + int h = 8 / 2; + for(int i = 0; i < h; i++) + r[i] = x[2 * i] > x[2 * i + 1] ? x[2 * i] : x[2 * i + 1]; + for(int i = 0; i < h; i++) + r[h + i] = y[2 * i] > y[2 * i + 1] ? 
y[2 * i] : y[2 * i + 1]; + return (__gcc_v8qi)r; + } + case 17: + { + __gcc_v4hi_u x = (__gcc_v4hi_u)a, y = (__gcc_v4hi_u)b, r; + int h = 4 / 2; + for(int i = 0; i < h; i++) + r[i] = x[2 * i] > x[2 * i + 1] ? x[2 * i] : x[2 * i + 1]; + for(int i = 0; i < h; i++) + r[h + i] = y[2 * i] > y[2 * i + 1] ? y[2 * i] : y[2 * i + 1]; + return (__gcc_v8qi)r; + } + case 18: + { + __gcc_v2si_u x = (__gcc_v2si_u)a, y = (__gcc_v2si_u)b, r; + int h = 2 / 2; + for(int i = 0; i < h; i++) + r[i] = x[2 * i] > x[2 * i + 1] ? x[2 * i] : x[2 * i + 1]; + for(int i = 0; i < h; i++) + r[h + i] = y[2 * i] > y[2 * i + 1] ? y[2 * i] : y[2 * i + 1]; + return (__gcc_v8qi)r; + } + } + + __gcc_v8qi r = {0}; + return r; +} + +/* FUNCTION: __builtin_neon_vpmaxq_v */ + +// Arm instruction(s): SMAXP, UMAXP (per ACLE advsimd.md) + +typedef char __gcc_v16qi __attribute__((__vector_size__(16))); +typedef signed char __gcc_v16qi_s __attribute__((__vector_size__(16))); +typedef short __gcc_v8hi_s __attribute__((__vector_size__(16))); +typedef int __gcc_v4si_s __attribute__((__vector_size__(16))); +typedef unsigned char __gcc_v16qi_u __attribute__((__vector_size__(16))); +typedef unsigned short __gcc_v8hi_u __attribute__((__vector_size__(16))); +typedef unsigned int __gcc_v4si_u __attribute__((__vector_size__(16))); + +__gcc_v16qi __builtin_neon_vpmaxq_v(__gcc_v16qi a, __gcc_v16qi b, int type) +{ + switch(type) + { + case 32: + { + __gcc_v16qi_s x = (__gcc_v16qi_s)a, y = (__gcc_v16qi_s)b, r; + int h = 16 / 2; + for(int i = 0; i < h; i++) + r[i] = x[2 * i] > x[2 * i + 1] ? x[2 * i] : x[2 * i + 1]; + for(int i = 0; i < h; i++) + r[h + i] = y[2 * i] > y[2 * i + 1] ? y[2 * i] : y[2 * i + 1]; + return (__gcc_v16qi)r; + } + case 33: + { + __gcc_v8hi_s x = (__gcc_v8hi_s)a, y = (__gcc_v8hi_s)b, r; + int h = 8 / 2; + for(int i = 0; i < h; i++) + r[i] = x[2 * i] > x[2 * i + 1] ? x[2 * i] : x[2 * i + 1]; + for(int i = 0; i < h; i++) + r[h + i] = y[2 * i] > y[2 * i + 1] ? 
y[2 * i] : y[2 * i + 1]; + return (__gcc_v16qi)r; + } + case 34: + { + __gcc_v4si_s x = (__gcc_v4si_s)a, y = (__gcc_v4si_s)b, r; + int h = 4 / 2; + for(int i = 0; i < h; i++) + r[i] = x[2 * i] > x[2 * i + 1] ? x[2 * i] : x[2 * i + 1]; + for(int i = 0; i < h; i++) + r[h + i] = y[2 * i] > y[2 * i + 1] ? y[2 * i] : y[2 * i + 1]; + return (__gcc_v16qi)r; + } + case 48: + { + __gcc_v16qi_u x = (__gcc_v16qi_u)a, y = (__gcc_v16qi_u)b, r; + int h = 16 / 2; + for(int i = 0; i < h; i++) + r[i] = x[2 * i] > x[2 * i + 1] ? x[2 * i] : x[2 * i + 1]; + for(int i = 0; i < h; i++) + r[h + i] = y[2 * i] > y[2 * i + 1] ? y[2 * i] : y[2 * i + 1]; + return (__gcc_v16qi)r; + } + case 49: + { + __gcc_v8hi_u x = (__gcc_v8hi_u)a, y = (__gcc_v8hi_u)b, r; + int h = 8 / 2; + for(int i = 0; i < h; i++) + r[i] = x[2 * i] > x[2 * i + 1] ? x[2 * i] : x[2 * i + 1]; + for(int i = 0; i < h; i++) + r[h + i] = y[2 * i] > y[2 * i + 1] ? y[2 * i] : y[2 * i + 1]; + return (__gcc_v16qi)r; + } + case 50: + { + __gcc_v4si_u x = (__gcc_v4si_u)a, y = (__gcc_v4si_u)b, r; + int h = 4 / 2; + for(int i = 0; i < h; i++) + r[i] = x[2 * i] > x[2 * i + 1] ? x[2 * i] : x[2 * i + 1]; + for(int i = 0; i < h; i++) + r[h + i] = y[2 * i] > y[2 * i + 1] ? 
y[2 * i] : y[2 * i + 1]; + return (__gcc_v16qi)r; + } + } + + __gcc_v16qi r = {0}; + return r; +} + +/* FUNCTION: __builtin_neon_vpmin_v */ + +// Arm instruction(s): SMINP, UMINP (per ACLE advsimd.md) + +typedef char __gcc_v8qi __attribute__((__vector_size__(8))); +typedef signed char __gcc_v8qi_s __attribute__((__vector_size__(8))); +typedef short __gcc_v4hi_s __attribute__((__vector_size__(8))); +typedef int __gcc_v2si_s __attribute__((__vector_size__(8))); +typedef unsigned char __gcc_v8qi_u __attribute__((__vector_size__(8))); +typedef unsigned short __gcc_v4hi_u __attribute__((__vector_size__(8))); +typedef unsigned int __gcc_v2si_u __attribute__((__vector_size__(8))); + +__gcc_v8qi __builtin_neon_vpmin_v(__gcc_v8qi a, __gcc_v8qi b, int type) +{ + switch(type) + { + case 0: + { + __gcc_v8qi_s x = (__gcc_v8qi_s)a, y = (__gcc_v8qi_s)b, r; + int h = 8 / 2; + for(int i = 0; i < h; i++) + r[i] = x[2 * i] < x[2 * i + 1] ? x[2 * i] : x[2 * i + 1]; + for(int i = 0; i < h; i++) + r[h + i] = y[2 * i] < y[2 * i + 1] ? y[2 * i] : y[2 * i + 1]; + return (__gcc_v8qi)r; + } + case 1: + { + __gcc_v4hi_s x = (__gcc_v4hi_s)a, y = (__gcc_v4hi_s)b, r; + int h = 4 / 2; + for(int i = 0; i < h; i++) + r[i] = x[2 * i] < x[2 * i + 1] ? x[2 * i] : x[2 * i + 1]; + for(int i = 0; i < h; i++) + r[h + i] = y[2 * i] < y[2 * i + 1] ? y[2 * i] : y[2 * i + 1]; + return (__gcc_v8qi)r; + } + case 2: + { + __gcc_v2si_s x = (__gcc_v2si_s)a, y = (__gcc_v2si_s)b, r; + int h = 2 / 2; + for(int i = 0; i < h; i++) + r[i] = x[2 * i] < x[2 * i + 1] ? x[2 * i] : x[2 * i + 1]; + for(int i = 0; i < h; i++) + r[h + i] = y[2 * i] < y[2 * i + 1] ? y[2 * i] : y[2 * i + 1]; + return (__gcc_v8qi)r; + } + case 16: + { + __gcc_v8qi_u x = (__gcc_v8qi_u)a, y = (__gcc_v8qi_u)b, r; + int h = 8 / 2; + for(int i = 0; i < h; i++) + r[i] = x[2 * i] < x[2 * i + 1] ? x[2 * i] : x[2 * i + 1]; + for(int i = 0; i < h; i++) + r[h + i] = y[2 * i] < y[2 * i + 1] ? 
y[2 * i] : y[2 * i + 1]; + return (__gcc_v8qi)r; + } + case 17: + { + __gcc_v4hi_u x = (__gcc_v4hi_u)a, y = (__gcc_v4hi_u)b, r; + int h = 4 / 2; + for(int i = 0; i < h; i++) + r[i] = x[2 * i] < x[2 * i + 1] ? x[2 * i] : x[2 * i + 1]; + for(int i = 0; i < h; i++) + r[h + i] = y[2 * i] < y[2 * i + 1] ? y[2 * i] : y[2 * i + 1]; + return (__gcc_v8qi)r; + } + case 18: + { + __gcc_v2si_u x = (__gcc_v2si_u)a, y = (__gcc_v2si_u)b, r; + int h = 2 / 2; + for(int i = 0; i < h; i++) + r[i] = x[2 * i] < x[2 * i + 1] ? x[2 * i] : x[2 * i + 1]; + for(int i = 0; i < h; i++) + r[h + i] = y[2 * i] < y[2 * i + 1] ? y[2 * i] : y[2 * i + 1]; + return (__gcc_v8qi)r; + } + } + + __gcc_v8qi r = {0}; + return r; +} + +/* FUNCTION: __builtin_neon_vpminq_v */ + +// Arm instruction(s): SMINP, UMINP (per ACLE advsimd.md) + +typedef char __gcc_v16qi __attribute__((__vector_size__(16))); +typedef signed char __gcc_v16qi_s __attribute__((__vector_size__(16))); +typedef short __gcc_v8hi_s __attribute__((__vector_size__(16))); +typedef int __gcc_v4si_s __attribute__((__vector_size__(16))); +typedef unsigned char __gcc_v16qi_u __attribute__((__vector_size__(16))); +typedef unsigned short __gcc_v8hi_u __attribute__((__vector_size__(16))); +typedef unsigned int __gcc_v4si_u __attribute__((__vector_size__(16))); + +__gcc_v16qi __builtin_neon_vpminq_v(__gcc_v16qi a, __gcc_v16qi b, int type) +{ + switch(type) + { + case 32: + { + __gcc_v16qi_s x = (__gcc_v16qi_s)a, y = (__gcc_v16qi_s)b, r; + int h = 16 / 2; + for(int i = 0; i < h; i++) + r[i] = x[2 * i] < x[2 * i + 1] ? x[2 * i] : x[2 * i + 1]; + for(int i = 0; i < h; i++) + r[h + i] = y[2 * i] < y[2 * i + 1] ? y[2 * i] : y[2 * i + 1]; + return (__gcc_v16qi)r; + } + case 33: + { + __gcc_v8hi_s x = (__gcc_v8hi_s)a, y = (__gcc_v8hi_s)b, r; + int h = 8 / 2; + for(int i = 0; i < h; i++) + r[i] = x[2 * i] < x[2 * i + 1] ? x[2 * i] : x[2 * i + 1]; + for(int i = 0; i < h; i++) + r[h + i] = y[2 * i] < y[2 * i + 1] ? 
y[2 * i] : y[2 * i + 1]; + return (__gcc_v16qi)r; + } + case 34: + { + __gcc_v4si_s x = (__gcc_v4si_s)a, y = (__gcc_v4si_s)b, r; + int h = 4 / 2; + for(int i = 0; i < h; i++) + r[i] = x[2 * i] < x[2 * i + 1] ? x[2 * i] : x[2 * i + 1]; + for(int i = 0; i < h; i++) + r[h + i] = y[2 * i] < y[2 * i + 1] ? y[2 * i] : y[2 * i + 1]; + return (__gcc_v16qi)r; + } + case 48: + { + __gcc_v16qi_u x = (__gcc_v16qi_u)a, y = (__gcc_v16qi_u)b, r; + int h = 16 / 2; + for(int i = 0; i < h; i++) + r[i] = x[2 * i] < x[2 * i + 1] ? x[2 * i] : x[2 * i + 1]; + for(int i = 0; i < h; i++) + r[h + i] = y[2 * i] < y[2 * i + 1] ? y[2 * i] : y[2 * i + 1]; + return (__gcc_v16qi)r; + } + case 49: + { + __gcc_v8hi_u x = (__gcc_v8hi_u)a, y = (__gcc_v8hi_u)b, r; + int h = 8 / 2; + for(int i = 0; i < h; i++) + r[i] = x[2 * i] < x[2 * i + 1] ? x[2 * i] : x[2 * i + 1]; + for(int i = 0; i < h; i++) + r[h + i] = y[2 * i] < y[2 * i + 1] ? y[2 * i] : y[2 * i + 1]; + return (__gcc_v16qi)r; + } + case 50: + { + __gcc_v4si_u x = (__gcc_v4si_u)a, y = (__gcc_v4si_u)b, r; + int h = 4 / 2; + for(int i = 0; i < h; i++) + r[i] = x[2 * i] < x[2 * i + 1] ? x[2 * i] : x[2 * i + 1]; + for(int i = 0; i < h; i++) + r[h + i] = y[2 * i] < y[2 * i + 1] ? 
y[2 * i] : y[2 * i + 1]; + return (__gcc_v16qi)r; + } + } + + __gcc_v16qi r = {0}; + return r; +} + +/* FUNCTION: __builtin_neon_vqadd_v */ + +// Arm instruction(s): SQADD, UQADD (per ACLE advsimd.md) + +typedef char __gcc_v8qi __attribute__((__vector_size__(8))); +typedef signed char __gcc_v8qi_s __attribute__((__vector_size__(8))); +typedef short __gcc_v4hi_s __attribute__((__vector_size__(8))); +typedef int __gcc_v2si_s __attribute__((__vector_size__(8))); +typedef long long __gcc_v1di_s __attribute__((__vector_size__(8))); +typedef unsigned char __gcc_v8qi_u __attribute__((__vector_size__(8))); +typedef unsigned short __gcc_v4hi_u __attribute__((__vector_size__(8))); +typedef unsigned int __gcc_v2si_u __attribute__((__vector_size__(8))); +typedef unsigned long long __gcc_v1di_u __attribute__((__vector_size__(8))); + +__gcc_v8qi __builtin_neon_vqadd_v(__gcc_v8qi a, __gcc_v8qi b, int type) +{ + switch(type) + { + case 0: + { + __gcc_v8qi_s x = (__gcc_v8qi_s)a, y = (__gcc_v8qi_s)b, r; + for(int i = 0; i < 8; i++) + { + int s = (int)x[i] + (int)y[i]; + r[i] = s < -128 ? -128 : (s > 127 ? 127 : s); + } + return (__gcc_v8qi)r; + } + case 1: + { + __gcc_v4hi_s x = (__gcc_v4hi_s)a, y = (__gcc_v4hi_s)b, r; + for(int i = 0; i < 4; i++) + { + int s = (int)x[i] + (int)y[i]; + r[i] = s < -32768 ? -32768 : (s > 32767 ? 32767 : s); + } + return (__gcc_v8qi)r; + } + case 2: + { + __gcc_v2si_s x = (__gcc_v2si_s)a, y = (__gcc_v2si_s)b, r; + for(int i = 0; i < 2; i++) + { + long long s = (long long)x[i] + (long long)y[i]; + r[i] = s < -2147483648 ? -2147483648 : (s > 2147483647 ? 2147483647 : s); + } + return (__gcc_v8qi)r; + } + case 3: + { + __gcc_v1di_s x = (__gcc_v1di_s)a, y = (__gcc_v1di_s)b, r; + for(int i = 0; i < 1; i++) + { + long long s = + (long long)((unsigned long long)x[i] + (unsigned long long)y[i]); + r[i] = + ((x[i] ^ s) & (y[i] ^ s)) < 0 + ? (x[i] < 0 ? 
(-9223372036854775807LL - 1) : 9223372036854775807LL) + : s; + } + return (__gcc_v8qi)r; + } + case 16: + { + __gcc_v8qi_u x = (__gcc_v8qi_u)a, y = (__gcc_v8qi_u)b, r; + for(int i = 0; i < 8; i++) + { + int s = (int)x[i] + (int)y[i]; + r[i] = s > 255 ? 255 : s; + } + return (__gcc_v8qi)r; + } + case 17: + { + __gcc_v4hi_u x = (__gcc_v4hi_u)a, y = (__gcc_v4hi_u)b, r; + for(int i = 0; i < 4; i++) + { + int s = (int)x[i] + (int)y[i]; + r[i] = s > 65535 ? 65535 : s; + } + return (__gcc_v8qi)r; + } + case 18: + { + __gcc_v2si_u x = (__gcc_v2si_u)a, y = (__gcc_v2si_u)b, r; + for(int i = 0; i < 2; i++) + { + long long s = (long long)x[i] + (long long)y[i]; + r[i] = s > 4294967295 ? 4294967295 : s; + } + return (__gcc_v8qi)r; + } + case 19: + { + __gcc_v1di_u x = (__gcc_v1di_u)a, y = (__gcc_v1di_u)b, r; + for(int i = 0; i < 1; i++) + { + unsigned long long s = x[i] + y[i]; + r[i] = s < x[i] ? 18446744073709551615ULL : s; + } + return (__gcc_v8qi)r; + } + } + + __gcc_v8qi r = {0}; + return r; +} + +/* FUNCTION: __builtin_neon_vqaddq_v */ + +// Arm instruction(s): SQADD, UQADD (per ACLE advsimd.md) + +typedef char __gcc_v16qi __attribute__((__vector_size__(16))); +typedef signed char __gcc_v16qi_s __attribute__((__vector_size__(16))); +typedef short __gcc_v8hi_s __attribute__((__vector_size__(16))); +typedef int __gcc_v4si_s __attribute__((__vector_size__(16))); +typedef long long __gcc_v2di_s __attribute__((__vector_size__(16))); +typedef unsigned char __gcc_v16qi_u __attribute__((__vector_size__(16))); +typedef unsigned short __gcc_v8hi_u __attribute__((__vector_size__(16))); +typedef unsigned int __gcc_v4si_u __attribute__((__vector_size__(16))); +typedef unsigned long long __gcc_v2di_u __attribute__((__vector_size__(16))); + +__gcc_v16qi __builtin_neon_vqaddq_v(__gcc_v16qi a, __gcc_v16qi b, int type) +{ + switch(type) + { + case 32: + { + __gcc_v16qi_s x = (__gcc_v16qi_s)a, y = (__gcc_v16qi_s)b, r; + for(int i = 0; i < 16; i++) + { + int s = (int)x[i] + (int)y[i]; + r[i] 
= s < -128 ? -128 : (s > 127 ? 127 : s); + } + return (__gcc_v16qi)r; + } + case 33: + { + __gcc_v8hi_s x = (__gcc_v8hi_s)a, y = (__gcc_v8hi_s)b, r; + for(int i = 0; i < 8; i++) + { + int s = (int)x[i] + (int)y[i]; + r[i] = s < -32768 ? -32768 : (s > 32767 ? 32767 : s); + } + return (__gcc_v16qi)r; + } + case 34: + { + __gcc_v4si_s x = (__gcc_v4si_s)a, y = (__gcc_v4si_s)b, r; + for(int i = 0; i < 4; i++) + { + long long s = (long long)x[i] + (long long)y[i]; + r[i] = s < -2147483648 ? -2147483648 : (s > 2147483647 ? 2147483647 : s); + } + return (__gcc_v16qi)r; + } + case 35: + { + __gcc_v2di_s x = (__gcc_v2di_s)a, y = (__gcc_v2di_s)b, r; + for(int i = 0; i < 2; i++) + { + long long s = + (long long)((unsigned long long)x[i] + (unsigned long long)y[i]); + r[i] = + ((x[i] ^ s) & (y[i] ^ s)) < 0 + ? (x[i] < 0 ? (-9223372036854775807LL - 1) : 9223372036854775807LL) + : s; + } + return (__gcc_v16qi)r; + } + case 48: + { + __gcc_v16qi_u x = (__gcc_v16qi_u)a, y = (__gcc_v16qi_u)b, r; + for(int i = 0; i < 16; i++) + { + int s = (int)x[i] + (int)y[i]; + r[i] = s > 255 ? 255 : s; + } + return (__gcc_v16qi)r; + } + case 49: + { + __gcc_v8hi_u x = (__gcc_v8hi_u)a, y = (__gcc_v8hi_u)b, r; + for(int i = 0; i < 8; i++) + { + int s = (int)x[i] + (int)y[i]; + r[i] = s > 65535 ? 65535 : s; + } + return (__gcc_v16qi)r; + } + case 50: + { + __gcc_v4si_u x = (__gcc_v4si_u)a, y = (__gcc_v4si_u)b, r; + for(int i = 0; i < 4; i++) + { + long long s = (long long)x[i] + (long long)y[i]; + r[i] = s > 4294967295 ? 4294967295 : s; + } + return (__gcc_v16qi)r; + } + case 51: + { + __gcc_v2di_u x = (__gcc_v2di_u)a, y = (__gcc_v2di_u)b, r; + for(int i = 0; i < 2; i++) + { + unsigned long long s = x[i] + y[i]; + r[i] = s < x[i] ? 
18446744073709551615ULL : s; + } + return (__gcc_v16qi)r; + } + } + + __gcc_v16qi r = {0}; + return r; +} + +/* FUNCTION: __builtin_neon_vqsub_v */ + +// Arm instruction(s): SQSUB, UQSUB (per ACLE advsimd.md) + +typedef char __gcc_v8qi __attribute__((__vector_size__(8))); +typedef signed char __gcc_v8qi_s __attribute__((__vector_size__(8))); +typedef short __gcc_v4hi_s __attribute__((__vector_size__(8))); +typedef int __gcc_v2si_s __attribute__((__vector_size__(8))); +typedef long long __gcc_v1di_s __attribute__((__vector_size__(8))); +typedef unsigned char __gcc_v8qi_u __attribute__((__vector_size__(8))); +typedef unsigned short __gcc_v4hi_u __attribute__((__vector_size__(8))); +typedef unsigned int __gcc_v2si_u __attribute__((__vector_size__(8))); +typedef unsigned long long __gcc_v1di_u __attribute__((__vector_size__(8))); + +__gcc_v8qi __builtin_neon_vqsub_v(__gcc_v8qi a, __gcc_v8qi b, int type) +{ + switch(type) + { + case 0: + { + __gcc_v8qi_s x = (__gcc_v8qi_s)a, y = (__gcc_v8qi_s)b, r; + for(int i = 0; i < 8; i++) + { + int s = (int)x[i] - (int)y[i]; + r[i] = s < -128 ? -128 : (s > 127 ? 127 : s); + } + return (__gcc_v8qi)r; + } + case 1: + { + __gcc_v4hi_s x = (__gcc_v4hi_s)a, y = (__gcc_v4hi_s)b, r; + for(int i = 0; i < 4; i++) + { + int s = (int)x[i] - (int)y[i]; + r[i] = s < -32768 ? -32768 : (s > 32767 ? 32767 : s); + } + return (__gcc_v8qi)r; + } + case 2: + { + __gcc_v2si_s x = (__gcc_v2si_s)a, y = (__gcc_v2si_s)b, r; + for(int i = 0; i < 2; i++) + { + long long s = (long long)x[i] - (long long)y[i]; + r[i] = s < -2147483648 ? -2147483648 : (s > 2147483647 ? 2147483647 : s); + } + return (__gcc_v8qi)r; + } + case 3: + { + __gcc_v1di_s x = (__gcc_v1di_s)a, y = (__gcc_v1di_s)b, r; + for(int i = 0; i < 1; i++) + { + long long d = + (long long)((unsigned long long)x[i] - (unsigned long long)y[i]); + r[i] = + ((x[i] ^ y[i]) & (x[i] ^ d)) < 0 + ? (x[i] < 0 ? 
(-9223372036854775807LL - 1) : 9223372036854775807LL) + : d; + } + return (__gcc_v8qi)r; + } + case 16: + { + __gcc_v8qi_u x = (__gcc_v8qi_u)a, y = (__gcc_v8qi_u)b, r; + for(int i = 0; i < 8; i++) + r[i] = x[i] > y[i] ? x[i] - y[i] : 0; + return (__gcc_v8qi)r; + } + case 17: + { + __gcc_v4hi_u x = (__gcc_v4hi_u)a, y = (__gcc_v4hi_u)b, r; + for(int i = 0; i < 4; i++) + r[i] = x[i] > y[i] ? x[i] - y[i] : 0; + return (__gcc_v8qi)r; + } + case 18: + { + __gcc_v2si_u x = (__gcc_v2si_u)a, y = (__gcc_v2si_u)b, r; + for(int i = 0; i < 2; i++) + r[i] = x[i] > y[i] ? x[i] - y[i] : 0; + return (__gcc_v8qi)r; + } + case 19: + { + __gcc_v1di_u x = (__gcc_v1di_u)a, y = (__gcc_v1di_u)b, r; + for(int i = 0; i < 1; i++) + r[i] = x[i] > y[i] ? x[i] - y[i] : 0; + return (__gcc_v8qi)r; + } + } + + __gcc_v8qi r = {0}; + return r; +} + +/* FUNCTION: __builtin_neon_vqsubq_v */ + +// Arm instruction(s): SQSUB, UQSUB (per ACLE advsimd.md) + +typedef char __gcc_v16qi __attribute__((__vector_size__(16))); +typedef signed char __gcc_v16qi_s __attribute__((__vector_size__(16))); +typedef short __gcc_v8hi_s __attribute__((__vector_size__(16))); +typedef int __gcc_v4si_s __attribute__((__vector_size__(16))); +typedef long long __gcc_v2di_s __attribute__((__vector_size__(16))); +typedef unsigned char __gcc_v16qi_u __attribute__((__vector_size__(16))); +typedef unsigned short __gcc_v8hi_u __attribute__((__vector_size__(16))); +typedef unsigned int __gcc_v4si_u __attribute__((__vector_size__(16))); +typedef unsigned long long __gcc_v2di_u __attribute__((__vector_size__(16))); + +__gcc_v16qi __builtin_neon_vqsubq_v(__gcc_v16qi a, __gcc_v16qi b, int type) +{ + switch(type) + { + case 32: + { + __gcc_v16qi_s x = (__gcc_v16qi_s)a, y = (__gcc_v16qi_s)b, r; + for(int i = 0; i < 16; i++) + { + int s = (int)x[i] - (int)y[i]; + r[i] = s < -128 ? -128 : (s > 127 ? 
127 : s); + } + return (__gcc_v16qi)r; + } + case 33: + { + __gcc_v8hi_s x = (__gcc_v8hi_s)a, y = (__gcc_v8hi_s)b, r; + for(int i = 0; i < 8; i++) + { + int s = (int)x[i] - (int)y[i]; + r[i] = s < -32768 ? -32768 : (s > 32767 ? 32767 : s); + } + return (__gcc_v16qi)r; + } + case 34: + { + __gcc_v4si_s x = (__gcc_v4si_s)a, y = (__gcc_v4si_s)b, r; + for(int i = 0; i < 4; i++) + { + long long s = (long long)x[i] - (long long)y[i]; + r[i] = s < -2147483648 ? -2147483648 : (s > 2147483647 ? 2147483647 : s); + } + return (__gcc_v16qi)r; + } + case 35: + { + __gcc_v2di_s x = (__gcc_v2di_s)a, y = (__gcc_v2di_s)b, r; + for(int i = 0; i < 2; i++) + { + long long d = + (long long)((unsigned long long)x[i] - (unsigned long long)y[i]); + r[i] = + ((x[i] ^ y[i]) & (x[i] ^ d)) < 0 + ? (x[i] < 0 ? (-9223372036854775807LL - 1) : 9223372036854775807LL) + : d; + } + return (__gcc_v16qi)r; + } + case 48: + { + __gcc_v16qi_u x = (__gcc_v16qi_u)a, y = (__gcc_v16qi_u)b, r; + for(int i = 0; i < 16; i++) + r[i] = x[i] > y[i] ? x[i] - y[i] : 0; + return (__gcc_v16qi)r; + } + case 49: + { + __gcc_v8hi_u x = (__gcc_v8hi_u)a, y = (__gcc_v8hi_u)b, r; + for(int i = 0; i < 8; i++) + r[i] = x[i] > y[i] ? x[i] - y[i] : 0; + return (__gcc_v16qi)r; + } + case 50: + { + __gcc_v4si_u x = (__gcc_v4si_u)a, y = (__gcc_v4si_u)b, r; + for(int i = 0; i < 4; i++) + r[i] = x[i] > y[i] ? x[i] - y[i] : 0; + return (__gcc_v16qi)r; + } + case 51: + { + __gcc_v2di_u x = (__gcc_v2di_u)a, y = (__gcc_v2di_u)b, r; + for(int i = 0; i < 2; i++) + r[i] = x[i] > y[i] ? 
x[i] - y[i] : 0; + return (__gcc_v16qi)r; + } + } + + __gcc_v16qi r = {0}; + return r; +} + +/* FUNCTION: __builtin_neon_vrhadd_v */ + +// Arm instruction(s): SRHADD, URHADD (per ACLE advsimd.md) + +typedef char __gcc_v8qi __attribute__((__vector_size__(8))); +typedef signed char __gcc_v8qi_s __attribute__((__vector_size__(8))); +typedef short __gcc_v4hi_s __attribute__((__vector_size__(8))); +typedef int __gcc_v2si_s __attribute__((__vector_size__(8))); +typedef unsigned char __gcc_v8qi_u __attribute__((__vector_size__(8))); +typedef unsigned short __gcc_v4hi_u __attribute__((__vector_size__(8))); +typedef unsigned int __gcc_v2si_u __attribute__((__vector_size__(8))); + +__gcc_v8qi __builtin_neon_vrhadd_v(__gcc_v8qi a, __gcc_v8qi b, int type) +{ + switch(type) + { + case 0: + { + __gcc_v8qi_s x = (__gcc_v8qi_s)a, y = (__gcc_v8qi_s)b, r; + for(int i = 0; i < 8; i++) + r[i] = ((int)x[i] + (int)y[i] + 1) >> 1; + return (__gcc_v8qi)r; + } + case 1: + { + __gcc_v4hi_s x = (__gcc_v4hi_s)a, y = (__gcc_v4hi_s)b, r; + for(int i = 0; i < 4; i++) + r[i] = ((int)x[i] + (int)y[i] + 1) >> 1; + return (__gcc_v8qi)r; + } + case 2: + { + __gcc_v2si_s x = (__gcc_v2si_s)a, y = (__gcc_v2si_s)b, r; + for(int i = 0; i < 2; i++) + r[i] = ((long long)x[i] + (long long)y[i] + 1) >> 1; + return (__gcc_v8qi)r; + } + case 16: + { + __gcc_v8qi_u x = (__gcc_v8qi_u)a, y = (__gcc_v8qi_u)b, r; + for(int i = 0; i < 8; i++) + r[i] = ((int)x[i] + (int)y[i] + 1) >> 1; + return (__gcc_v8qi)r; + } + case 17: + { + __gcc_v4hi_u x = (__gcc_v4hi_u)a, y = (__gcc_v4hi_u)b, r; + for(int i = 0; i < 4; i++) + r[i] = ((int)x[i] + (int)y[i] + 1) >> 1; + return (__gcc_v8qi)r; + } + case 18: + { + __gcc_v2si_u x = (__gcc_v2si_u)a, y = (__gcc_v2si_u)b, r; + for(int i = 0; i < 2; i++) + r[i] = ((long long)x[i] + (long long)y[i] + 1) >> 1; + return (__gcc_v8qi)r; + } + } + + __gcc_v8qi r = {0}; + return r; +} + +/* FUNCTION: __builtin_neon_vrhaddq_v */ + +// Arm instruction(s): SRHADD, URHADD (per ACLE advsimd.md) + 
+typedef char __gcc_v16qi __attribute__((__vector_size__(16))); +typedef signed char __gcc_v16qi_s __attribute__((__vector_size__(16))); +typedef short __gcc_v8hi_s __attribute__((__vector_size__(16))); +typedef int __gcc_v4si_s __attribute__((__vector_size__(16))); +typedef unsigned char __gcc_v16qi_u __attribute__((__vector_size__(16))); +typedef unsigned short __gcc_v8hi_u __attribute__((__vector_size__(16))); +typedef unsigned int __gcc_v4si_u __attribute__((__vector_size__(16))); + +__gcc_v16qi __builtin_neon_vrhaddq_v(__gcc_v16qi a, __gcc_v16qi b, int type) +{ + switch(type) + { + case 32: + { + __gcc_v16qi_s x = (__gcc_v16qi_s)a, y = (__gcc_v16qi_s)b, r; + for(int i = 0; i < 16; i++) + r[i] = ((int)x[i] + (int)y[i] + 1) >> 1; + return (__gcc_v16qi)r; + } + case 33: + { + __gcc_v8hi_s x = (__gcc_v8hi_s)a, y = (__gcc_v8hi_s)b, r; + for(int i = 0; i < 8; i++) + r[i] = ((int)x[i] + (int)y[i] + 1) >> 1; + return (__gcc_v16qi)r; + } + case 34: + { + __gcc_v4si_s x = (__gcc_v4si_s)a, y = (__gcc_v4si_s)b, r; + for(int i = 0; i < 4; i++) + r[i] = ((long long)x[i] + (long long)y[i] + 1) >> 1; + return (__gcc_v16qi)r; + } + case 48: + { + __gcc_v16qi_u x = (__gcc_v16qi_u)a, y = (__gcc_v16qi_u)b, r; + for(int i = 0; i < 16; i++) + r[i] = ((int)x[i] + (int)y[i] + 1) >> 1; + return (__gcc_v16qi)r; + } + case 49: + { + __gcc_v8hi_u x = (__gcc_v8hi_u)a, y = (__gcc_v8hi_u)b, r; + for(int i = 0; i < 8; i++) + r[i] = ((int)x[i] + (int)y[i] + 1) >> 1; + return (__gcc_v16qi)r; + } + case 50: + { + __gcc_v4si_u x = (__gcc_v4si_u)a, y = (__gcc_v4si_u)b, r; + for(int i = 0; i < 4; i++) + r[i] = ((long long)x[i] + (long long)y[i] + 1) >> 1; + return (__gcc_v16qi)r; + } + } + + __gcc_v16qi r = {0}; + return r; +} + +/* FUNCTION: __builtin_neon_vtst_v */ + +// Arm instruction(s): CMTST (per ACLE advsimd.md) + +typedef char __gcc_v8qi __attribute__((__vector_size__(8))); +typedef signed char __gcc_v8qi_s __attribute__((__vector_size__(8))); +typedef short __gcc_v4hi_s 
__attribute__((__vector_size__(8))); +typedef int __gcc_v2si_s __attribute__((__vector_size__(8))); +typedef long long __gcc_v1di_s __attribute__((__vector_size__(8))); +typedef unsigned char __gcc_v8qi_u __attribute__((__vector_size__(8))); +typedef unsigned short __gcc_v4hi_u __attribute__((__vector_size__(8))); +typedef unsigned int __gcc_v2si_u __attribute__((__vector_size__(8))); +typedef unsigned long long __gcc_v1di_u __attribute__((__vector_size__(8))); + +__gcc_v8qi __builtin_neon_vtst_v(__gcc_v8qi a, __gcc_v8qi b, int type) +{ + switch(type) + { + case 0: + { + __gcc_v8qi_s x = (__gcc_v8qi_s)a, y = (__gcc_v8qi_s)b, r; + for(int i = 0; i < 8; i++) + r[i] = (x[i] & y[i]) != 0 ? -1 : 0; + return (__gcc_v8qi)r; + } + case 1: + { + __gcc_v4hi_s x = (__gcc_v4hi_s)a, y = (__gcc_v4hi_s)b, r; + for(int i = 0; i < 4; i++) + r[i] = (x[i] & y[i]) != 0 ? -1 : 0; + return (__gcc_v8qi)r; + } + case 2: + { + __gcc_v2si_s x = (__gcc_v2si_s)a, y = (__gcc_v2si_s)b, r; + for(int i = 0; i < 2; i++) + r[i] = (x[i] & y[i]) != 0 ? -1 : 0; + return (__gcc_v8qi)r; + } + case 3: + { + __gcc_v1di_s x = (__gcc_v1di_s)a, y = (__gcc_v1di_s)b, r; + for(int i = 0; i < 1; i++) + r[i] = (x[i] & y[i]) != 0 ? -1 : 0; + return (__gcc_v8qi)r; + } + case 16: + { + __gcc_v8qi_u x = (__gcc_v8qi_u)a, y = (__gcc_v8qi_u)b, r; + for(int i = 0; i < 8; i++) + r[i] = (x[i] & y[i]) != 0 ? -1 : 0; + return (__gcc_v8qi)r; + } + case 17: + { + __gcc_v4hi_u x = (__gcc_v4hi_u)a, y = (__gcc_v4hi_u)b, r; + for(int i = 0; i < 4; i++) + r[i] = (x[i] & y[i]) != 0 ? -1 : 0; + return (__gcc_v8qi)r; + } + case 18: + { + __gcc_v2si_u x = (__gcc_v2si_u)a, y = (__gcc_v2si_u)b, r; + for(int i = 0; i < 2; i++) + r[i] = (x[i] & y[i]) != 0 ? -1 : 0; + return (__gcc_v8qi)r; + } + case 19: + { + __gcc_v1di_u x = (__gcc_v1di_u)a, y = (__gcc_v1di_u)b, r; + for(int i = 0; i < 1; i++) + r[i] = (x[i] & y[i]) != 0 ? 
-1 : 0; + return (__gcc_v8qi)r; + } + } + + __gcc_v8qi r = {0}; + return r; +} + +/* FUNCTION: __builtin_neon_vtstq_v */ + +// Arm instruction(s): CMTST (per ACLE advsimd.md) + +typedef char __gcc_v16qi __attribute__((__vector_size__(16))); +typedef signed char __gcc_v16qi_s __attribute__((__vector_size__(16))); +typedef short __gcc_v8hi_s __attribute__((__vector_size__(16))); +typedef int __gcc_v4si_s __attribute__((__vector_size__(16))); +typedef long long __gcc_v2di_s __attribute__((__vector_size__(16))); +typedef unsigned char __gcc_v16qi_u __attribute__((__vector_size__(16))); +typedef unsigned short __gcc_v8hi_u __attribute__((__vector_size__(16))); +typedef unsigned int __gcc_v4si_u __attribute__((__vector_size__(16))); +typedef unsigned long long __gcc_v2di_u __attribute__((__vector_size__(16))); + +__gcc_v16qi __builtin_neon_vtstq_v(__gcc_v16qi a, __gcc_v16qi b, int type) +{ + switch(type) + { + case 32: + { + __gcc_v16qi_s x = (__gcc_v16qi_s)a, y = (__gcc_v16qi_s)b, r; + for(int i = 0; i < 16; i++) + r[i] = (x[i] & y[i]) != 0 ? -1 : 0; + return (__gcc_v16qi)r; + } + case 33: + { + __gcc_v8hi_s x = (__gcc_v8hi_s)a, y = (__gcc_v8hi_s)b, r; + for(int i = 0; i < 8; i++) + r[i] = (x[i] & y[i]) != 0 ? -1 : 0; + return (__gcc_v16qi)r; + } + case 34: + { + __gcc_v4si_s x = (__gcc_v4si_s)a, y = (__gcc_v4si_s)b, r; + for(int i = 0; i < 4; i++) + r[i] = (x[i] & y[i]) != 0 ? -1 : 0; + return (__gcc_v16qi)r; + } + case 35: + { + __gcc_v2di_s x = (__gcc_v2di_s)a, y = (__gcc_v2di_s)b, r; + for(int i = 0; i < 2; i++) + r[i] = (x[i] & y[i]) != 0 ? -1 : 0; + return (__gcc_v16qi)r; + } + case 48: + { + __gcc_v16qi_u x = (__gcc_v16qi_u)a, y = (__gcc_v16qi_u)b, r; + for(int i = 0; i < 16; i++) + r[i] = (x[i] & y[i]) != 0 ? -1 : 0; + return (__gcc_v16qi)r; + } + case 49: + { + __gcc_v8hi_u x = (__gcc_v8hi_u)a, y = (__gcc_v8hi_u)b, r; + for(int i = 0; i < 8; i++) + r[i] = (x[i] & y[i]) != 0 ? 
-1 : 0; + return (__gcc_v16qi)r; + } + case 50: + { + __gcc_v4si_u x = (__gcc_v4si_u)a, y = (__gcc_v4si_u)b, r; + for(int i = 0; i < 4; i++) + r[i] = (x[i] & y[i]) != 0 ? -1 : 0; + return (__gcc_v16qi)r; + } + case 51: + { + __gcc_v2di_u x = (__gcc_v2di_u)a, y = (__gcc_v2di_u)b, r; + for(int i = 0; i < 2; i++) + r[i] = (x[i] & y[i]) != 0 ? -1 : 0; + return (__gcc_v16qi)r; + } + } + + __gcc_v16qi r = {0}; + return r; +} diff --git a/src/ansi-c/library/x86_intrinsics.c b/src/ansi-c/library/x86_intrinsics.c new file mode 100644 index 00000000000..bbffaa598b5 --- /dev/null +++ b/src/ansi-c/library/x86_intrinsics.c @@ -0,0 +1,3372 @@ +// x86 SIMD intrinsic models for CBMC +// Generated by scripts/generate_intrinsic_models.py +// Models: 204 + +/* FUNCTION: __builtin_ia32_pabsb128 */ + +typedef char __gcc_v16qi __attribute__((__vector_size__(16))); +typedef signed char __gcc_v16qi_s __attribute__((__vector_size__(16))); + +__gcc_v16qi __builtin_ia32_pabsb128(__gcc_v16qi a) +{ + __gcc_v16qi_s a_ = (__gcc_v16qi_s)a; + __gcc_v16qi_s dst; + for(int j = 0; j < 16; j++) + dst[j] = a_[j] < 0 ? -a_[j] : a_[j]; + return (__gcc_v16qi)dst; +} + +/* FUNCTION: __builtin_ia32_pabsb256 */ + +typedef char __gcc_v32qi __attribute__((__vector_size__(32))); +typedef signed char __gcc_v32qi_s __attribute__((__vector_size__(32))); + +__gcc_v32qi __builtin_ia32_pabsb256(__gcc_v32qi a) +{ + __gcc_v32qi_s a_ = (__gcc_v32qi_s)a; + __gcc_v32qi_s dst; + for(int j = 0; j < 32; j++) + dst[j] = a_[j] < 0 ? -a_[j] : a_[j]; + return (__gcc_v32qi)dst; +} + +/* FUNCTION: __builtin_ia32_pabsd128 */ + +typedef int __gcc_v4si __attribute__((__vector_size__(16))); + +__gcc_v4si __builtin_ia32_pabsd128(__gcc_v4si a) +{ + __gcc_v4si a_ = a; + __gcc_v4si dst; + for(int j = 0; j < 4; j++) + dst[j] = a_[j] < 0 ? 
(int)(0u - (unsigned)a_[j]) : a_[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_pabsd256 */ + +typedef int __gcc_v8si __attribute__((__vector_size__(32))); + +__gcc_v8si __builtin_ia32_pabsd256(__gcc_v8si a) +{ + __gcc_v8si a_ = a; + __gcc_v8si dst; + for(int j = 0; j < 8; j++) + dst[j] = a_[j] < 0 ? (int)(0u - (unsigned)a_[j]) : a_[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_pabsw128 */ + +typedef short __gcc_v8hi __attribute__((__vector_size__(16))); + +__gcc_v8hi __builtin_ia32_pabsw128(__gcc_v8hi a) +{ + __gcc_v8hi a_ = a; + __gcc_v8hi dst; + for(int j = 0; j < 8; j++) + dst[j] = a_[j] < 0 ? -a_[j] : a_[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_pabsw256 */ + +typedef short __gcc_v16hi __attribute__((__vector_size__(32))); + +__gcc_v16hi __builtin_ia32_pabsw256(__gcc_v16hi a) +{ + __gcc_v16hi a_ = a; + __gcc_v16hi dst; + for(int j = 0; j < 16; j++) + dst[j] = a_[j] < 0 ? -a_[j] : a_[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_paddb */ + +typedef char __gcc_v8qi __attribute__((__vector_size__(8))); + +__gcc_v8qi __builtin_ia32_paddb(__gcc_v8qi a, __gcc_v8qi b) +{ + __gcc_v8qi a_ = a; + __gcc_v8qi b_ = b; + __gcc_v8qi dst; + for(int j = 0; j < 8; j++) + dst[j] = a_[j] + b_[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_paddb128 */ + +typedef char __gcc_v16qi __attribute__((__vector_size__(16))); + +__gcc_v16qi __builtin_ia32_paddb128(__gcc_v16qi a, __gcc_v16qi b) +{ + __gcc_v16qi a_ = a; + __gcc_v16qi b_ = b; + __gcc_v16qi dst; + for(int j = 0; j < 16; j++) + dst[j] = a_[j] + b_[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_paddb128_mask */ + +typedef char __gcc_v16qi __attribute__((__vector_size__(16))); + +__gcc_v16qi __builtin_ia32_paddb128_mask( + __gcc_v16qi a, + __gcc_v16qi b, + __gcc_v16qi src, + unsigned short k) +{ + __gcc_v16qi a_ = a; + __gcc_v16qi b_ = b; + __gcc_v16qi dst; + for(int j = 0; j < 16; j++) + dst[j] = (k >> j) & 1 ? 
(char)(a_[j] + b_[j]) : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_paddb256 */ + +typedef char __gcc_v32qi __attribute__((__vector_size__(32))); + +__gcc_v32qi __builtin_ia32_paddb256(__gcc_v32qi a, __gcc_v32qi b) +{ + __gcc_v32qi a_ = a; + __gcc_v32qi b_ = b; + __gcc_v32qi dst; + for(int j = 0; j < 32; j++) + dst[j] = a_[j] + b_[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_paddb256_mask */ + +typedef char __gcc_v32qi __attribute__((__vector_size__(32))); + +__gcc_v32qi __builtin_ia32_paddb256_mask( + __gcc_v32qi a, + __gcc_v32qi b, + __gcc_v32qi src, + unsigned int k) +{ + __gcc_v32qi a_ = a; + __gcc_v32qi b_ = b; + __gcc_v32qi dst; + for(int j = 0; j < 32; j++) + dst[j] = (k >> j) & 1 ? (char)(a_[j] + b_[j]) : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_paddb512_mask */ + +typedef char __gcc_v64qi __attribute__((__vector_size__(64))); + +__gcc_v64qi __builtin_ia32_paddb512_mask( + __gcc_v64qi a, + __gcc_v64qi b, + __gcc_v64qi src, + unsigned long long k) +{ + __gcc_v64qi a_ = a; + __gcc_v64qi b_ = b; + __gcc_v64qi dst; + for(int j = 0; j < 64; j++) + dst[j] = (k >> j) & 1 ? 
(char)(a_[j] + b_[j]) : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_paddd */ + +typedef int __gcc_v2si __attribute__((__vector_size__(8))); +typedef unsigned int __gcc_v2si_u __attribute__((__vector_size__(8))); + +__gcc_v2si __builtin_ia32_paddd(__gcc_v2si a, __gcc_v2si b) +{ + __gcc_v2si_u a_ = (__gcc_v2si_u)a; + __gcc_v2si_u b_ = (__gcc_v2si_u)b; + __gcc_v2si_u dst; + for(int j = 0; j < 2; j++) + dst[j] = a_[j] + b_[j]; + return (__gcc_v2si)dst; +} + +/* FUNCTION: __builtin_ia32_paddd128 */ + +typedef int __gcc_v4si __attribute__((__vector_size__(16))); +typedef unsigned int __gcc_v4si_u __attribute__((__vector_size__(16))); + +__gcc_v4si __builtin_ia32_paddd128(__gcc_v4si a, __gcc_v4si b) +{ + __gcc_v4si_u a_ = (__gcc_v4si_u)a; + __gcc_v4si_u b_ = (__gcc_v4si_u)b; + __gcc_v4si_u dst; + for(int j = 0; j < 4; j++) + dst[j] = a_[j] + b_[j]; + return (__gcc_v4si)dst; +} + +/* FUNCTION: __builtin_ia32_paddd128_mask */ + +typedef int __gcc_v4si __attribute__((__vector_size__(16))); +typedef unsigned int __gcc_v4si_u __attribute__((__vector_size__(16))); + +__gcc_v4si __builtin_ia32_paddd128_mask( + __gcc_v4si a, + __gcc_v4si b, + __gcc_v4si src, + unsigned char k) +{ + __gcc_v4si_u a_ = (__gcc_v4si_u)a; + __gcc_v4si_u b_ = (__gcc_v4si_u)b; + __gcc_v4si dst; + for(int j = 0; j < 4; j++) + dst[j] = (k >> j) & 1 ? 
(int)(a_[j] + b_[j]) : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_paddd256 */ + +typedef int __gcc_v8si __attribute__((__vector_size__(32))); +typedef unsigned int __gcc_v8si_u __attribute__((__vector_size__(32))); + +__gcc_v8si __builtin_ia32_paddd256(__gcc_v8si a, __gcc_v8si b) +{ + __gcc_v8si_u a_ = (__gcc_v8si_u)a; + __gcc_v8si_u b_ = (__gcc_v8si_u)b; + __gcc_v8si_u dst; + for(int j = 0; j < 8; j++) + dst[j] = a_[j] + b_[j]; + return (__gcc_v8si)dst; +} + +/* FUNCTION: __builtin_ia32_paddd256_mask */ + +typedef int __gcc_v8si __attribute__((__vector_size__(32))); +typedef unsigned int __gcc_v8si_u __attribute__((__vector_size__(32))); + +__gcc_v8si __builtin_ia32_paddd256_mask( + __gcc_v8si a, + __gcc_v8si b, + __gcc_v8si src, + unsigned char k) +{ + __gcc_v8si_u a_ = (__gcc_v8si_u)a; + __gcc_v8si_u b_ = (__gcc_v8si_u)b; + __gcc_v8si dst; + for(int j = 0; j < 8; j++) + dst[j] = (k >> j) & 1 ? (int)(a_[j] + b_[j]) : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_paddd512_mask */ + +typedef int __gcc_v16si __attribute__((__vector_size__(64))); +typedef unsigned int __gcc_v16si_u __attribute__((__vector_size__(64))); + +__gcc_v16si __builtin_ia32_paddd512_mask( + __gcc_v16si a, + __gcc_v16si b, + __gcc_v16si src, + unsigned short k) +{ + __gcc_v16si_u a_ = (__gcc_v16si_u)a; + __gcc_v16si_u b_ = (__gcc_v16si_u)b; + __gcc_v16si dst; + for(int j = 0; j < 16; j++) + dst[j] = (k >> j) & 1 ? 
(int)(a_[j] + b_[j]) : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_paddq128 */ + +typedef long long __gcc_v2di __attribute__((__vector_size__(16))); +typedef unsigned long long __gcc_v2di_u __attribute__((__vector_size__(16))); + +__gcc_v2di __builtin_ia32_paddq128(__gcc_v2di a, __gcc_v2di b) +{ + __gcc_v2di_u a_ = (__gcc_v2di_u)a; + __gcc_v2di_u b_ = (__gcc_v2di_u)b; + __gcc_v2di_u dst; + for(int j = 0; j < 2; j++) + dst[j] = a_[j] + b_[j]; + return (__gcc_v2di)dst; +} + +/* FUNCTION: __builtin_ia32_paddq128_mask */ + +typedef long long __gcc_v2di __attribute__((__vector_size__(16))); +typedef unsigned long long __gcc_v2di_u __attribute__((__vector_size__(16))); + +__gcc_v2di __builtin_ia32_paddq128_mask( + __gcc_v2di a, + __gcc_v2di b, + __gcc_v2di src, + unsigned char k) +{ + __gcc_v2di_u a_ = (__gcc_v2di_u)a; + __gcc_v2di_u b_ = (__gcc_v2di_u)b; + __gcc_v2di dst; + for(int j = 0; j < 2; j++) + dst[j] = (k >> j) & 1 ? (long long)(a_[j] + b_[j]) : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_paddq256 */ + +typedef long long __gcc_v4di __attribute__((__vector_size__(32))); +typedef unsigned long long __gcc_v4di_u __attribute__((__vector_size__(32))); + +__gcc_v4di __builtin_ia32_paddq256(__gcc_v4di a, __gcc_v4di b) +{ + __gcc_v4di_u a_ = (__gcc_v4di_u)a; + __gcc_v4di_u b_ = (__gcc_v4di_u)b; + __gcc_v4di_u dst; + for(int j = 0; j < 4; j++) + dst[j] = a_[j] + b_[j]; + return (__gcc_v4di)dst; +} + +/* FUNCTION: __builtin_ia32_paddq256_mask */ + +typedef long long __gcc_v4di __attribute__((__vector_size__(32))); +typedef unsigned long long __gcc_v4di_u __attribute__((__vector_size__(32))); + +__gcc_v4di __builtin_ia32_paddq256_mask( + __gcc_v4di a, + __gcc_v4di b, + __gcc_v4di src, + unsigned char k) +{ + __gcc_v4di_u a_ = (__gcc_v4di_u)a; + __gcc_v4di_u b_ = (__gcc_v4di_u)b; + __gcc_v4di dst; + for(int j = 0; j < 4; j++) + dst[j] = (k >> j) & 1 ? 
(long long)(a_[j] + b_[j]) : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_paddq512_mask */ + +typedef long long __gcc_v8di __attribute__((__vector_size__(64))); +typedef unsigned long long __gcc_v8di_u __attribute__((__vector_size__(64))); + +__gcc_v8di __builtin_ia32_paddq512_mask( + __gcc_v8di a, + __gcc_v8di b, + __gcc_v8di src, + unsigned char k) +{ + __gcc_v8di_u a_ = (__gcc_v8di_u)a; + __gcc_v8di_u b_ = (__gcc_v8di_u)b; + __gcc_v8di dst; + for(int j = 0; j < 8; j++) + dst[j] = (k >> j) & 1 ? (long long)(a_[j] + b_[j]) : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_paddsb128 */ + +typedef char __gcc_v16qi __attribute__((__vector_size__(16))); +typedef signed char __gcc_v16qi_s __attribute__((__vector_size__(16))); + +__gcc_v16qi __builtin_ia32_paddsb128(__gcc_v16qi a, __gcc_v16qi b) +{ + __gcc_v16qi_s a_ = (__gcc_v16qi_s)a; + __gcc_v16qi_s b_ = (__gcc_v16qi_s)b; + __gcc_v16qi_s dst; + for(int j = 0; j < 16; j++) + dst[j] = (a_[j] + b_[j]) < -128 ? -128 + : (a_[j] + b_[j]) > 127 ? 127 + : a_[j] + b_[j]; + return (__gcc_v16qi)dst; +} + +/* FUNCTION: __builtin_ia32_paddsb128_mask */ + +typedef char __gcc_v16qi __attribute__((__vector_size__(16))); +typedef signed char __gcc_v16qi_s __attribute__((__vector_size__(16))); + +__gcc_v16qi __builtin_ia32_paddsb128_mask( + __gcc_v16qi a, + __gcc_v16qi b, + __gcc_v16qi src, + unsigned short k) +{ + __gcc_v16qi_s a_ = (__gcc_v16qi_s)a; + __gcc_v16qi_s b_ = (__gcc_v16qi_s)b; + __gcc_v16qi dst; + for(int j = 0; j < 16; j++) + dst[j] = (k >> j) & 1 ? (char)((a_[j] + b_[j]) < -128 ? -128 : (a_[j] + b_[j]) > 127 ? 
127 : a_[j] + b_[j]) : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_paddsb256 */ + +typedef char __gcc_v32qi __attribute__((__vector_size__(32))); +typedef signed char __gcc_v32qi_s __attribute__((__vector_size__(32))); + +__gcc_v32qi __builtin_ia32_paddsb256(__gcc_v32qi a, __gcc_v32qi b) +{ + __gcc_v32qi_s a_ = (__gcc_v32qi_s)a; + __gcc_v32qi_s b_ = (__gcc_v32qi_s)b; + __gcc_v32qi_s dst; + for(int j = 0; j < 32; j++) + dst[j] = (a_[j] + b_[j]) < -128 ? -128 + : (a_[j] + b_[j]) > 127 ? 127 + : a_[j] + b_[j]; + return (__gcc_v32qi)dst; +} + +/* FUNCTION: __builtin_ia32_paddsb256_mask */ + +typedef char __gcc_v32qi __attribute__((__vector_size__(32))); +typedef signed char __gcc_v32qi_s __attribute__((__vector_size__(32))); + +__gcc_v32qi __builtin_ia32_paddsb256_mask( + __gcc_v32qi a, + __gcc_v32qi b, + __gcc_v32qi src, + unsigned int k) +{ + __gcc_v32qi_s a_ = (__gcc_v32qi_s)a; + __gcc_v32qi_s b_ = (__gcc_v32qi_s)b; + __gcc_v32qi dst; + for(int j = 0; j < 32; j++) + dst[j] = (k >> j) & 1 ? (char)((a_[j] + b_[j]) < -128 ? -128 : (a_[j] + b_[j]) > 127 ? 127 : a_[j] + b_[j]) : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_paddsb512_mask */ + +typedef char __gcc_v64qi __attribute__((__vector_size__(64))); +typedef signed char __gcc_v64qi_s __attribute__((__vector_size__(64))); + +__gcc_v64qi __builtin_ia32_paddsb512_mask( + __gcc_v64qi a, + __gcc_v64qi b, + __gcc_v64qi src, + unsigned long long k) +{ + __gcc_v64qi_s a_ = (__gcc_v64qi_s)a; + __gcc_v64qi_s b_ = (__gcc_v64qi_s)b; + __gcc_v64qi dst; + for(int j = 0; j < 64; j++) + dst[j] = (k >> j) & 1 ? (char)((a_[j] + b_[j]) < -128 ? -128 : (a_[j] + b_[j]) > 127 ? 
127 : a_[j] + b_[j]) : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_paddsw128 */ + +typedef short __gcc_v8hi __attribute__((__vector_size__(16))); + +__gcc_v8hi __builtin_ia32_paddsw128(__gcc_v8hi a, __gcc_v8hi b) +{ + __gcc_v8hi a_ = a; + __gcc_v8hi b_ = b; + __gcc_v8hi dst; + for(int j = 0; j < 8; j++) + dst[j] = (a_[j] + b_[j]) < -32768 ? -32768 + : (a_[j] + b_[j]) > 32767 ? 32767 + : a_[j] + b_[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_paddsw128_mask */ + +typedef short __gcc_v8hi __attribute__((__vector_size__(16))); + +__gcc_v8hi __builtin_ia32_paddsw128_mask( + __gcc_v8hi a, + __gcc_v8hi b, + __gcc_v8hi src, + unsigned char k) +{ + __gcc_v8hi a_ = a; + __gcc_v8hi b_ = b; + __gcc_v8hi dst; + for(int j = 0; j < 8; j++) + dst[j] = (k >> j) & 1 ? (short)((a_[j] + b_[j]) < -32768 ? -32768 : (a_[j] + b_[j]) > 32767 ? 32767 : a_[j] + b_[j]) : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_paddsw256 */ + +typedef short __gcc_v16hi __attribute__((__vector_size__(32))); + +__gcc_v16hi __builtin_ia32_paddsw256(__gcc_v16hi a, __gcc_v16hi b) +{ + __gcc_v16hi a_ = a; + __gcc_v16hi b_ = b; + __gcc_v16hi dst; + for(int j = 0; j < 16; j++) + dst[j] = (a_[j] + b_[j]) < -32768 ? -32768 + : (a_[j] + b_[j]) > 32767 ? 32767 + : a_[j] + b_[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_paddsw256_mask */ + +typedef short __gcc_v16hi __attribute__((__vector_size__(32))); + +__gcc_v16hi __builtin_ia32_paddsw256_mask( + __gcc_v16hi a, + __gcc_v16hi b, + __gcc_v16hi src, + unsigned short k) +{ + __gcc_v16hi a_ = a; + __gcc_v16hi b_ = b; + __gcc_v16hi dst; + for(int j = 0; j < 16; j++) + dst[j] = (k >> j) & 1 ? (short)((a_[j] + b_[j]) < -32768 ? -32768 : (a_[j] + b_[j]) > 32767 ? 
32767 : a_[j] + b_[j]) : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_paddsw512_mask */ + +typedef short __gcc_v32hi __attribute__((__vector_size__(64))); + +__gcc_v32hi __builtin_ia32_paddsw512_mask( + __gcc_v32hi a, + __gcc_v32hi b, + __gcc_v32hi src, + unsigned int k) +{ + __gcc_v32hi a_ = a; + __gcc_v32hi b_ = b; + __gcc_v32hi dst; + for(int j = 0; j < 32; j++) + dst[j] = (k >> j) & 1 ? (short)((a_[j] + b_[j]) < -32768 ? -32768 : (a_[j] + b_[j]) > 32767 ? 32767 : a_[j] + b_[j]) : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_paddusb128 */ + +typedef char __gcc_v16qi __attribute__((__vector_size__(16))); +typedef unsigned char __gcc_v16qi_u __attribute__((__vector_size__(16))); + +__gcc_v16qi __builtin_ia32_paddusb128(__gcc_v16qi a, __gcc_v16qi b) +{ + __gcc_v16qi_u a_ = (__gcc_v16qi_u)a; + __gcc_v16qi_u b_ = (__gcc_v16qi_u)b; + __gcc_v16qi_u dst; + for(int j = 0; j < 16; j++) + dst[j] = (a_[j] + b_[j]) > 255 ? 255 : a_[j] + b_[j]; + return (__gcc_v16qi)dst; +} + +/* FUNCTION: __builtin_ia32_paddusb128_mask */ + +typedef char __gcc_v16qi __attribute__((__vector_size__(16))); +typedef unsigned char __gcc_v16qi_u __attribute__((__vector_size__(16))); + +__gcc_v16qi __builtin_ia32_paddusb128_mask( + __gcc_v16qi a, + __gcc_v16qi b, + __gcc_v16qi src, + unsigned short k) +{ + __gcc_v16qi_u a_ = (__gcc_v16qi_u)a; + __gcc_v16qi_u b_ = (__gcc_v16qi_u)b; + __gcc_v16qi dst; + for(int j = 0; j < 16; j++) + dst[j] = (k >> j) & 1 ? (char)((a_[j] + b_[j]) > 255 ? 255 : a_[j] + b_[j]) + : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_paddusb256 */ + +typedef char __gcc_v32qi __attribute__((__vector_size__(32))); +typedef unsigned char __gcc_v32qi_u __attribute__((__vector_size__(32))); + +__gcc_v32qi __builtin_ia32_paddusb256(__gcc_v32qi a, __gcc_v32qi b) +{ + __gcc_v32qi_u a_ = (__gcc_v32qi_u)a; + __gcc_v32qi_u b_ = (__gcc_v32qi_u)b; + __gcc_v32qi_u dst; + for(int j = 0; j < 32; j++) + dst[j] = (a_[j] + b_[j]) > 255 ? 
255 : a_[j] + b_[j]; + return (__gcc_v32qi)dst; +} + +/* FUNCTION: __builtin_ia32_paddusb256_mask */ + +typedef char __gcc_v32qi __attribute__((__vector_size__(32))); +typedef unsigned char __gcc_v32qi_u __attribute__((__vector_size__(32))); + +__gcc_v32qi __builtin_ia32_paddusb256_mask( + __gcc_v32qi a, + __gcc_v32qi b, + __gcc_v32qi src, + unsigned int k) +{ + __gcc_v32qi_u a_ = (__gcc_v32qi_u)a; + __gcc_v32qi_u b_ = (__gcc_v32qi_u)b; + __gcc_v32qi dst; + for(int j = 0; j < 32; j++) + dst[j] = (k >> j) & 1 ? (char)((a_[j] + b_[j]) > 255 ? 255 : a_[j] + b_[j]) + : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_paddusb512_mask */ + +typedef char __gcc_v64qi __attribute__((__vector_size__(64))); +typedef unsigned char __gcc_v64qi_u __attribute__((__vector_size__(64))); + +__gcc_v64qi __builtin_ia32_paddusb512_mask( + __gcc_v64qi a, + __gcc_v64qi b, + __gcc_v64qi src, + unsigned long long k) +{ + __gcc_v64qi_u a_ = (__gcc_v64qi_u)a; + __gcc_v64qi_u b_ = (__gcc_v64qi_u)b; + __gcc_v64qi dst; + for(int j = 0; j < 64; j++) + dst[j] = (k >> j) & 1 ? (char)((a_[j] + b_[j]) > 255 ? 255 : a_[j] + b_[j]) + : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_paddusw128 */ + +typedef short __gcc_v8hi __attribute__((__vector_size__(16))); +typedef unsigned short __gcc_v8hi_u __attribute__((__vector_size__(16))); + +__gcc_v8hi __builtin_ia32_paddusw128(__gcc_v8hi a, __gcc_v8hi b) +{ + __gcc_v8hi_u a_ = (__gcc_v8hi_u)a; + __gcc_v8hi_u b_ = (__gcc_v8hi_u)b; + __gcc_v8hi_u dst; + for(int j = 0; j < 8; j++) + dst[j] = (a_[j] + b_[j]) > 65535 ? 
65535 : a_[j] + b_[j]; + return (__gcc_v8hi)dst; +} + +/* FUNCTION: __builtin_ia32_paddusw128_mask */ + +typedef short __gcc_v8hi __attribute__((__vector_size__(16))); +typedef unsigned short __gcc_v8hi_u __attribute__((__vector_size__(16))); + +__gcc_v8hi __builtin_ia32_paddusw128_mask( + __gcc_v8hi a, + __gcc_v8hi b, + __gcc_v8hi src, + unsigned char k) +{ + __gcc_v8hi_u a_ = (__gcc_v8hi_u)a; + __gcc_v8hi_u b_ = (__gcc_v8hi_u)b; + __gcc_v8hi dst; + for(int j = 0; j < 8; j++) + dst[j] = (k >> j) & 1 + ? (short)((a_[j] + b_[j]) > 65535 ? 65535 : a_[j] + b_[j]) + : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_paddusw256 */ + +typedef short __gcc_v16hi __attribute__((__vector_size__(32))); +typedef unsigned short __gcc_v16hi_u __attribute__((__vector_size__(32))); + +__gcc_v16hi __builtin_ia32_paddusw256(__gcc_v16hi a, __gcc_v16hi b) +{ + __gcc_v16hi_u a_ = (__gcc_v16hi_u)a; + __gcc_v16hi_u b_ = (__gcc_v16hi_u)b; + __gcc_v16hi_u dst; + for(int j = 0; j < 16; j++) + dst[j] = (a_[j] + b_[j]) > 65535 ? 65535 : a_[j] + b_[j]; + return (__gcc_v16hi)dst; +} + +/* FUNCTION: __builtin_ia32_paddusw256_mask */ + +typedef short __gcc_v16hi __attribute__((__vector_size__(32))); +typedef unsigned short __gcc_v16hi_u __attribute__((__vector_size__(32))); + +__gcc_v16hi __builtin_ia32_paddusw256_mask( + __gcc_v16hi a, + __gcc_v16hi b, + __gcc_v16hi src, + unsigned short k) +{ + __gcc_v16hi_u a_ = (__gcc_v16hi_u)a; + __gcc_v16hi_u b_ = (__gcc_v16hi_u)b; + __gcc_v16hi dst; + for(int j = 0; j < 16; j++) + dst[j] = (k >> j) & 1 + ? (short)((a_[j] + b_[j]) > 65535 ? 
65535 : a_[j] + b_[j]) + : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_paddusw512_mask */ + +typedef short __gcc_v32hi __attribute__((__vector_size__(64))); +typedef unsigned short __gcc_v32hi_u __attribute__((__vector_size__(64))); + +__gcc_v32hi __builtin_ia32_paddusw512_mask( + __gcc_v32hi a, + __gcc_v32hi b, + __gcc_v32hi src, + unsigned int k) +{ + __gcc_v32hi_u a_ = (__gcc_v32hi_u)a; + __gcc_v32hi_u b_ = (__gcc_v32hi_u)b; + __gcc_v32hi dst; + for(int j = 0; j < 32; j++) + dst[j] = (k >> j) & 1 + ? (short)((a_[j] + b_[j]) > 65535 ? 65535 : a_[j] + b_[j]) + : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_paddw */ + +typedef short __gcc_v4hi __attribute__((__vector_size__(8))); + +__gcc_v4hi __builtin_ia32_paddw(__gcc_v4hi a, __gcc_v4hi b) +{ + __gcc_v4hi a_ = a; + __gcc_v4hi b_ = b; + __gcc_v4hi dst; + for(int j = 0; j < 4; j++) + dst[j] = a_[j] + b_[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_paddw128 */ + +typedef short __gcc_v8hi __attribute__((__vector_size__(16))); + +__gcc_v8hi __builtin_ia32_paddw128(__gcc_v8hi a, __gcc_v8hi b) +{ + __gcc_v8hi a_ = a; + __gcc_v8hi b_ = b; + __gcc_v8hi dst; + for(int j = 0; j < 8; j++) + dst[j] = a_[j] + b_[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_paddw128_mask */ + +typedef short __gcc_v8hi __attribute__((__vector_size__(16))); + +__gcc_v8hi __builtin_ia32_paddw128_mask( + __gcc_v8hi a, + __gcc_v8hi b, + __gcc_v8hi src, + unsigned char k) +{ + __gcc_v8hi a_ = a; + __gcc_v8hi b_ = b; + __gcc_v8hi dst; + for(int j = 0; j < 8; j++) + dst[j] = (k >> j) & 1 ? 
(short)(a_[j] + b_[j]) : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_paddw256 */ + +typedef short __gcc_v16hi __attribute__((__vector_size__(32))); + +__gcc_v16hi __builtin_ia32_paddw256(__gcc_v16hi a, __gcc_v16hi b) +{ + __gcc_v16hi a_ = a; + __gcc_v16hi b_ = b; + __gcc_v16hi dst; + for(int j = 0; j < 16; j++) + dst[j] = a_[j] + b_[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_paddw256_mask */ + +typedef short __gcc_v16hi __attribute__((__vector_size__(32))); + +__gcc_v16hi __builtin_ia32_paddw256_mask( + __gcc_v16hi a, + __gcc_v16hi b, + __gcc_v16hi src, + unsigned short k) +{ + __gcc_v16hi a_ = a; + __gcc_v16hi b_ = b; + __gcc_v16hi dst; + for(int j = 0; j < 16; j++) + dst[j] = (k >> j) & 1 ? (short)(a_[j] + b_[j]) : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_paddw512_mask */ + +typedef short __gcc_v32hi __attribute__((__vector_size__(64))); + +__gcc_v32hi __builtin_ia32_paddw512_mask( + __gcc_v32hi a, + __gcc_v32hi b, + __gcc_v32hi src, + unsigned int k) +{ + __gcc_v32hi a_ = a; + __gcc_v32hi b_ = b; + __gcc_v32hi dst; + for(int j = 0; j < 32; j++) + dst[j] = (k >> j) & 1 ? 
(short)(a_[j] + b_[j]) : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_pand128 */ + +typedef long long __gcc_v2di __attribute__((__vector_size__(16))); + +__gcc_v2di __builtin_ia32_pand128(__gcc_v2di a, __gcc_v2di b) +{ + __gcc_v2di a_ = a; + __gcc_v2di b_ = b; + __gcc_v2di dst; + for(int j = 0; j < 2; j++) + dst[j] = a_[j] & b_[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_pandn128 */ + +typedef long long __gcc_v2di __attribute__((__vector_size__(16))); + +__gcc_v2di __builtin_ia32_pandn128(__gcc_v2di a, __gcc_v2di b) +{ + __gcc_v2di a_ = a; + __gcc_v2di b_ = b; + __gcc_v2di dst; + for(int j = 0; j < 2; j++) + dst[j] = ~a_[j] & b_[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_pavgb128 */ + +typedef char __gcc_v16qi __attribute__((__vector_size__(16))); +typedef unsigned char __gcc_v16qi_u __attribute__((__vector_size__(16))); + +__gcc_v16qi __builtin_ia32_pavgb128(__gcc_v16qi a, __gcc_v16qi b) +{ + __gcc_v16qi_u a_ = (__gcc_v16qi_u)a; + __gcc_v16qi_u b_ = (__gcc_v16qi_u)b; + __gcc_v16qi_u dst; + for(int j = 0; j < 16; j++) + dst[j] = (a_[j] + b_[j] + 1) >> 1; + return (__gcc_v16qi)dst; +} + +/* FUNCTION: __builtin_ia32_pavgb128_mask */ + +typedef char __gcc_v16qi __attribute__((__vector_size__(16))); +typedef unsigned char __gcc_v16qi_u __attribute__((__vector_size__(16))); + +__gcc_v16qi __builtin_ia32_pavgb128_mask( + __gcc_v16qi a, + __gcc_v16qi b, + __gcc_v16qi src, + unsigned short k) +{ + __gcc_v16qi_u a_ = (__gcc_v16qi_u)a; + __gcc_v16qi_u b_ = (__gcc_v16qi_u)b; + __gcc_v16qi dst; + for(int j = 0; j < 16; j++) + dst[j] = (k >> j) & 1 ? 
(char)((a_[j] + b_[j] + 1) >> 1) : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_pavgb256 */ + +typedef char __gcc_v32qi __attribute__((__vector_size__(32))); +typedef unsigned char __gcc_v32qi_u __attribute__((__vector_size__(32))); + +__gcc_v32qi __builtin_ia32_pavgb256(__gcc_v32qi a, __gcc_v32qi b) +{ + __gcc_v32qi_u a_ = (__gcc_v32qi_u)a; + __gcc_v32qi_u b_ = (__gcc_v32qi_u)b; + __gcc_v32qi_u dst; + for(int j = 0; j < 32; j++) + dst[j] = (a_[j] + b_[j] + 1) >> 1; + return (__gcc_v32qi)dst; +} + +/* FUNCTION: __builtin_ia32_pavgb256_mask */ + +typedef char __gcc_v32qi __attribute__((__vector_size__(32))); +typedef unsigned char __gcc_v32qi_u __attribute__((__vector_size__(32))); + +__gcc_v32qi __builtin_ia32_pavgb256_mask( + __gcc_v32qi a, + __gcc_v32qi b, + __gcc_v32qi src, + unsigned int k) +{ + __gcc_v32qi_u a_ = (__gcc_v32qi_u)a; + __gcc_v32qi_u b_ = (__gcc_v32qi_u)b; + __gcc_v32qi dst; + for(int j = 0; j < 32; j++) + dst[j] = (k >> j) & 1 ? (char)((a_[j] + b_[j] + 1) >> 1) : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_pavgb512_mask */ + +typedef char __gcc_v64qi __attribute__((__vector_size__(64))); +typedef unsigned char __gcc_v64qi_u __attribute__((__vector_size__(64))); + +__gcc_v64qi __builtin_ia32_pavgb512_mask( + __gcc_v64qi a, + __gcc_v64qi b, + __gcc_v64qi src, + unsigned long long k) +{ + __gcc_v64qi_u a_ = (__gcc_v64qi_u)a; + __gcc_v64qi_u b_ = (__gcc_v64qi_u)b; + __gcc_v64qi dst; + for(int j = 0; j < 64; j++) + dst[j] = (k >> j) & 1 ? 
(char)((a_[j] + b_[j] + 1) >> 1) : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_pavgw128 */ + +typedef short __gcc_v8hi __attribute__((__vector_size__(16))); +typedef unsigned short __gcc_v8hi_u __attribute__((__vector_size__(16))); + +__gcc_v8hi __builtin_ia32_pavgw128(__gcc_v8hi a, __gcc_v8hi b) +{ + __gcc_v8hi_u a_ = (__gcc_v8hi_u)a; + __gcc_v8hi_u b_ = (__gcc_v8hi_u)b; + __gcc_v8hi_u dst; + for(int j = 0; j < 8; j++) + dst[j] = (a_[j] + b_[j] + 1) >> 1; + return (__gcc_v8hi)dst; +} + +/* FUNCTION: __builtin_ia32_pavgw128_mask */ + +typedef short __gcc_v8hi __attribute__((__vector_size__(16))); +typedef unsigned short __gcc_v8hi_u __attribute__((__vector_size__(16))); + +__gcc_v8hi __builtin_ia32_pavgw128_mask( + __gcc_v8hi a, + __gcc_v8hi b, + __gcc_v8hi src, + unsigned char k) +{ + __gcc_v8hi_u a_ = (__gcc_v8hi_u)a; + __gcc_v8hi_u b_ = (__gcc_v8hi_u)b; + __gcc_v8hi dst; + for(int j = 0; j < 8; j++) + dst[j] = (k >> j) & 1 ? (short)((a_[j] + b_[j] + 1) >> 1) : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_pavgw256 */ + +typedef short __gcc_v16hi __attribute__((__vector_size__(32))); +typedef unsigned short __gcc_v16hi_u __attribute__((__vector_size__(32))); + +__gcc_v16hi __builtin_ia32_pavgw256(__gcc_v16hi a, __gcc_v16hi b) +{ + __gcc_v16hi_u a_ = (__gcc_v16hi_u)a; + __gcc_v16hi_u b_ = (__gcc_v16hi_u)b; + __gcc_v16hi_u dst; + for(int j = 0; j < 16; j++) + dst[j] = (a_[j] + b_[j] + 1) >> 1; + return (__gcc_v16hi)dst; +} + +/* FUNCTION: __builtin_ia32_pavgw256_mask */ + +typedef short __gcc_v16hi __attribute__((__vector_size__(32))); +typedef unsigned short __gcc_v16hi_u __attribute__((__vector_size__(32))); + +__gcc_v16hi __builtin_ia32_pavgw256_mask( + __gcc_v16hi a, + __gcc_v16hi b, + __gcc_v16hi src, + unsigned short k) +{ + __gcc_v16hi_u a_ = (__gcc_v16hi_u)a; + __gcc_v16hi_u b_ = (__gcc_v16hi_u)b; + __gcc_v16hi dst; + for(int j = 0; j < 16; j++) + dst[j] = (k >> j) & 1 ? 
(short)((a_[j] + b_[j] + 1) >> 1) : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_pavgw512_mask */ + +typedef short __gcc_v32hi __attribute__((__vector_size__(64))); +typedef unsigned short __gcc_v32hi_u __attribute__((__vector_size__(64))); + +__gcc_v32hi __builtin_ia32_pavgw512_mask( + __gcc_v32hi a, + __gcc_v32hi b, + __gcc_v32hi src, + unsigned int k) +{ + __gcc_v32hi_u a_ = (__gcc_v32hi_u)a; + __gcc_v32hi_u b_ = (__gcc_v32hi_u)b; + __gcc_v32hi dst; + for(int j = 0; j < 32; j++) + dst[j] = (k >> j) & 1 ? (short)((a_[j] + b_[j] + 1) >> 1) : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_pcmpeqb128 */ + +typedef char __gcc_v16qi __attribute__((__vector_size__(16))); + +__gcc_v16qi __builtin_ia32_pcmpeqb128(__gcc_v16qi a, __gcc_v16qi b) +{ + __gcc_v16qi a_ = a; + __gcc_v16qi b_ = b; + __gcc_v16qi dst; + for(int j = 0; j < 16; j++) + dst[j] = a_[j] == b_[j] ? -1 : 0; + return dst; +} + +/* FUNCTION: __builtin_ia32_pcmpeqb256 */ + +typedef char __gcc_v32qi __attribute__((__vector_size__(32))); + +__gcc_v32qi __builtin_ia32_pcmpeqb256(__gcc_v32qi a, __gcc_v32qi b) +{ + __gcc_v32qi a_ = a; + __gcc_v32qi b_ = b; + __gcc_v32qi dst; + for(int j = 0; j < 32; j++) + dst[j] = a_[j] == b_[j] ? -1 : 0; + return dst; +} + +/* FUNCTION: __builtin_ia32_pcmpeqd128 */ + +typedef int __gcc_v4si __attribute__((__vector_size__(16))); + +__gcc_v4si __builtin_ia32_pcmpeqd128(__gcc_v4si a, __gcc_v4si b) +{ + __gcc_v4si a_ = a; + __gcc_v4si b_ = b; + __gcc_v4si dst; + for(int j = 0; j < 4; j++) + dst[j] = a_[j] == b_[j] ? -1 : 0; + return dst; +} + +/* FUNCTION: __builtin_ia32_pcmpeqd256 */ + +typedef int __gcc_v8si __attribute__((__vector_size__(32))); + +__gcc_v8si __builtin_ia32_pcmpeqd256(__gcc_v8si a, __gcc_v8si b) +{ + __gcc_v8si a_ = a; + __gcc_v8si b_ = b; + __gcc_v8si dst; + for(int j = 0; j < 8; j++) + dst[j] = a_[j] == b_[j] ? 
-1 : 0; + return dst; +} + +/* FUNCTION: __builtin_ia32_pcmpeqw128 */ + +typedef short __gcc_v8hi __attribute__((__vector_size__(16))); + +__gcc_v8hi __builtin_ia32_pcmpeqw128(__gcc_v8hi a, __gcc_v8hi b) +{ + __gcc_v8hi a_ = a; + __gcc_v8hi b_ = b; + __gcc_v8hi dst; + for(int j = 0; j < 8; j++) + dst[j] = a_[j] == b_[j] ? -1 : 0; + return dst; +} + +/* FUNCTION: __builtin_ia32_pcmpeqw256 */ + +typedef short __gcc_v16hi __attribute__((__vector_size__(32))); + +__gcc_v16hi __builtin_ia32_pcmpeqw256(__gcc_v16hi a, __gcc_v16hi b) +{ + __gcc_v16hi a_ = a; + __gcc_v16hi b_ = b; + __gcc_v16hi dst; + for(int j = 0; j < 16; j++) + dst[j] = a_[j] == b_[j] ? -1 : 0; + return dst; +} + +/* FUNCTION: __builtin_ia32_pcmpgtb128 */ + +typedef char __gcc_v16qi __attribute__((__vector_size__(16))); +typedef signed char __gcc_v16qi_s __attribute__((__vector_size__(16))); + +__gcc_v16qi __builtin_ia32_pcmpgtb128(__gcc_v16qi a, __gcc_v16qi b) +{ + __gcc_v16qi_s a_ = (__gcc_v16qi_s)a; + __gcc_v16qi_s b_ = (__gcc_v16qi_s)b; + __gcc_v16qi_s dst; + for(int j = 0; j < 16; j++) + dst[j] = a_[j] > b_[j] ? -1 : 0; + return (__gcc_v16qi)dst; +} + +/* FUNCTION: __builtin_ia32_pcmpgtb256 */ + +typedef char __gcc_v32qi __attribute__((__vector_size__(32))); +typedef signed char __gcc_v32qi_s __attribute__((__vector_size__(32))); + +__gcc_v32qi __builtin_ia32_pcmpgtb256(__gcc_v32qi a, __gcc_v32qi b) +{ + __gcc_v32qi_s a_ = (__gcc_v32qi_s)a; + __gcc_v32qi_s b_ = (__gcc_v32qi_s)b; + __gcc_v32qi_s dst; + for(int j = 0; j < 32; j++) + dst[j] = a_[j] > b_[j] ? -1 : 0; + return (__gcc_v32qi)dst; +} + +/* FUNCTION: __builtin_ia32_pcmpgtd128 */ + +typedef int __gcc_v4si __attribute__((__vector_size__(16))); + +__gcc_v4si __builtin_ia32_pcmpgtd128(__gcc_v4si a, __gcc_v4si b) +{ + __gcc_v4si a_ = a; + __gcc_v4si b_ = b; + __gcc_v4si dst; + for(int j = 0; j < 4; j++) + dst[j] = a_[j] > b_[j] ? 
-1 : 0; + return dst; +} + +/* FUNCTION: __builtin_ia32_pcmpgtd256 */ + +typedef int __gcc_v8si __attribute__((__vector_size__(32))); + +__gcc_v8si __builtin_ia32_pcmpgtd256(__gcc_v8si a, __gcc_v8si b) +{ + __gcc_v8si a_ = a; + __gcc_v8si b_ = b; + __gcc_v8si dst; + for(int j = 0; j < 8; j++) + dst[j] = a_[j] > b_[j] ? -1 : 0; + return dst; +} + +/* FUNCTION: __builtin_ia32_pcmpgtw128 */ + +typedef short __gcc_v8hi __attribute__((__vector_size__(16))); + +__gcc_v8hi __builtin_ia32_pcmpgtw128(__gcc_v8hi a, __gcc_v8hi b) +{ + __gcc_v8hi a_ = a; + __gcc_v8hi b_ = b; + __gcc_v8hi dst; + for(int j = 0; j < 8; j++) + dst[j] = a_[j] > b_[j] ? -1 : 0; + return dst; +} + +/* FUNCTION: __builtin_ia32_pcmpgtw256 */ + +typedef short __gcc_v16hi __attribute__((__vector_size__(32))); + +__gcc_v16hi __builtin_ia32_pcmpgtw256(__gcc_v16hi a, __gcc_v16hi b) +{ + __gcc_v16hi a_ = a; + __gcc_v16hi b_ = b; + __gcc_v16hi dst; + for(int j = 0; j < 16; j++) + dst[j] = a_[j] > b_[j] ? -1 : 0; + return dst; +} + +/* FUNCTION: __builtin_ia32_pmaxsb128 */ + +typedef char __gcc_v16qi __attribute__((__vector_size__(16))); +typedef signed char __gcc_v16qi_s __attribute__((__vector_size__(16))); + +__gcc_v16qi __builtin_ia32_pmaxsb128(__gcc_v16qi a, __gcc_v16qi b) +{ + __gcc_v16qi_s a_ = (__gcc_v16qi_s)a; + __gcc_v16qi_s b_ = (__gcc_v16qi_s)b; + __gcc_v16qi_s dst; + for(int j = 0; j < 16; j++) + dst[j] = a_[j] > b_[j] ? a_[j] : b_[j]; + return (__gcc_v16qi)dst; +} + +/* FUNCTION: __builtin_ia32_pmaxsb128_mask */ + +typedef char __gcc_v16qi __attribute__((__vector_size__(16))); +typedef signed char __gcc_v16qi_s __attribute__((__vector_size__(16))); + +__gcc_v16qi __builtin_ia32_pmaxsb128_mask( + __gcc_v16qi a, + __gcc_v16qi b, + __gcc_v16qi src, + unsigned short k) +{ + __gcc_v16qi_s a_ = (__gcc_v16qi_s)a; + __gcc_v16qi_s b_ = (__gcc_v16qi_s)b; + __gcc_v16qi dst; + for(int j = 0; j < 16; j++) + dst[j] = (k >> j) & 1 ? (char)(a_[j] > b_[j] ? 
a_[j] : b_[j]) : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_pmaxsb256 */ + +typedef char __gcc_v32qi __attribute__((__vector_size__(32))); +typedef signed char __gcc_v32qi_s __attribute__((__vector_size__(32))); + +__gcc_v32qi __builtin_ia32_pmaxsb256(__gcc_v32qi a, __gcc_v32qi b) +{ + __gcc_v32qi_s a_ = (__gcc_v32qi_s)a; + __gcc_v32qi_s b_ = (__gcc_v32qi_s)b; + __gcc_v32qi_s dst; + for(int j = 0; j < 32; j++) + dst[j] = a_[j] > b_[j] ? a_[j] : b_[j]; + return (__gcc_v32qi)dst; +} + +/* FUNCTION: __builtin_ia32_pmaxsb256_mask */ + +typedef char __gcc_v32qi __attribute__((__vector_size__(32))); +typedef signed char __gcc_v32qi_s __attribute__((__vector_size__(32))); + +__gcc_v32qi __builtin_ia32_pmaxsb256_mask( + __gcc_v32qi a, + __gcc_v32qi b, + __gcc_v32qi src, + unsigned int k) +{ + __gcc_v32qi_s a_ = (__gcc_v32qi_s)a; + __gcc_v32qi_s b_ = (__gcc_v32qi_s)b; + __gcc_v32qi dst; + for(int j = 0; j < 32; j++) + dst[j] = (k >> j) & 1 ? (char)(a_[j] > b_[j] ? a_[j] : b_[j]) : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_pmaxsb512_mask */ + +typedef char __gcc_v64qi __attribute__((__vector_size__(64))); +typedef signed char __gcc_v64qi_s __attribute__((__vector_size__(64))); + +__gcc_v64qi __builtin_ia32_pmaxsb512_mask( + __gcc_v64qi a, + __gcc_v64qi b, + __gcc_v64qi src, + unsigned long long k) +{ + __gcc_v64qi_s a_ = (__gcc_v64qi_s)a; + __gcc_v64qi_s b_ = (__gcc_v64qi_s)b; + __gcc_v64qi dst; + for(int j = 0; j < 64; j++) + dst[j] = (k >> j) & 1 ? (char)(a_[j] > b_[j] ? a_[j] : b_[j]) : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_pmaxsd128 */ + +typedef int __gcc_v4si __attribute__((__vector_size__(16))); + +__gcc_v4si __builtin_ia32_pmaxsd128(__gcc_v4si a, __gcc_v4si b) +{ + __gcc_v4si a_ = a; + __gcc_v4si b_ = b; + __gcc_v4si dst; + for(int j = 0; j < 4; j++) + dst[j] = a_[j] > b_[j] ? 
a_[j] : b_[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_pmaxsd128_mask */ + +typedef int __gcc_v4si __attribute__((__vector_size__(16))); + +__gcc_v4si __builtin_ia32_pmaxsd128_mask( + __gcc_v4si a, + __gcc_v4si b, + __gcc_v4si src, + unsigned char k) +{ + __gcc_v4si a_ = a; + __gcc_v4si b_ = b; + __gcc_v4si dst; + for(int j = 0; j < 4; j++) + dst[j] = (k >> j) & 1 ? (int)(a_[j] > b_[j] ? a_[j] : b_[j]) : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_pmaxsd256 */ + +typedef int __gcc_v8si __attribute__((__vector_size__(32))); + +__gcc_v8si __builtin_ia32_pmaxsd256(__gcc_v8si a, __gcc_v8si b) +{ + __gcc_v8si a_ = a; + __gcc_v8si b_ = b; + __gcc_v8si dst; + for(int j = 0; j < 8; j++) + dst[j] = a_[j] > b_[j] ? a_[j] : b_[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_pmaxsd256_mask */ + +typedef int __gcc_v8si __attribute__((__vector_size__(32))); + +__gcc_v8si __builtin_ia32_pmaxsd256_mask( + __gcc_v8si a, + __gcc_v8si b, + __gcc_v8si src, + unsigned char k) +{ + __gcc_v8si a_ = a; + __gcc_v8si b_ = b; + __gcc_v8si dst; + for(int j = 0; j < 8; j++) + dst[j] = (k >> j) & 1 ? (int)(a_[j] > b_[j] ? a_[j] : b_[j]) : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_pmaxsd512_mask */ + +typedef int __gcc_v16si __attribute__((__vector_size__(64))); + +__gcc_v16si __builtin_ia32_pmaxsd512_mask( + __gcc_v16si a, + __gcc_v16si b, + __gcc_v16si src, + unsigned short k) +{ + __gcc_v16si a_ = a; + __gcc_v16si b_ = b; + __gcc_v16si dst; + for(int j = 0; j < 16; j++) + dst[j] = (k >> j) & 1 ? (int)(a_[j] > b_[j] ? a_[j] : b_[j]) : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_pmaxsw128 */ + +typedef short __gcc_v8hi __attribute__((__vector_size__(16))); + +__gcc_v8hi __builtin_ia32_pmaxsw128(__gcc_v8hi a, __gcc_v8hi b) +{ + __gcc_v8hi a_ = a; + __gcc_v8hi b_ = b; + __gcc_v8hi dst; + for(int j = 0; j < 8; j++) + dst[j] = a_[j] > b_[j] ? 
a_[j] : b_[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_pmaxsw128_mask */ + +typedef short __gcc_v8hi __attribute__((__vector_size__(16))); + +__gcc_v8hi __builtin_ia32_pmaxsw128_mask( + __gcc_v8hi a, + __gcc_v8hi b, + __gcc_v8hi src, + unsigned char k) +{ + __gcc_v8hi a_ = a; + __gcc_v8hi b_ = b; + __gcc_v8hi dst; + for(int j = 0; j < 8; j++) + dst[j] = (k >> j) & 1 ? (short)(a_[j] > b_[j] ? a_[j] : b_[j]) : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_pmaxsw256 */ + +typedef short __gcc_v16hi __attribute__((__vector_size__(32))); + +__gcc_v16hi __builtin_ia32_pmaxsw256(__gcc_v16hi a, __gcc_v16hi b) +{ + __gcc_v16hi a_ = a; + __gcc_v16hi b_ = b; + __gcc_v16hi dst; + for(int j = 0; j < 16; j++) + dst[j] = a_[j] > b_[j] ? a_[j] : b_[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_pmaxsw256_mask */ + +typedef short __gcc_v16hi __attribute__((__vector_size__(32))); + +__gcc_v16hi __builtin_ia32_pmaxsw256_mask( + __gcc_v16hi a, + __gcc_v16hi b, + __gcc_v16hi src, + unsigned short k) +{ + __gcc_v16hi a_ = a; + __gcc_v16hi b_ = b; + __gcc_v16hi dst; + for(int j = 0; j < 16; j++) + dst[j] = (k >> j) & 1 ? (short)(a_[j] > b_[j] ? a_[j] : b_[j]) : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_pmaxsw512_mask */ + +typedef short __gcc_v32hi __attribute__((__vector_size__(64))); + +__gcc_v32hi __builtin_ia32_pmaxsw512_mask( + __gcc_v32hi a, + __gcc_v32hi b, + __gcc_v32hi src, + unsigned int k) +{ + __gcc_v32hi a_ = a; + __gcc_v32hi b_ = b; + __gcc_v32hi dst; + for(int j = 0; j < 32; j++) + dst[j] = (k >> j) & 1 ? (short)(a_[j] > b_[j] ? 
a_[j] : b_[j]) : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_pmaxub128 */ + +typedef char __gcc_v16qi __attribute__((__vector_size__(16))); +typedef unsigned char __gcc_v16qi_u __attribute__((__vector_size__(16))); + +__gcc_v16qi __builtin_ia32_pmaxub128(__gcc_v16qi a, __gcc_v16qi b) +{ + __gcc_v16qi_u a_ = (__gcc_v16qi_u)a; + __gcc_v16qi_u b_ = (__gcc_v16qi_u)b; + __gcc_v16qi_u dst; + for(int j = 0; j < 16; j++) + dst[j] = a_[j] > b_[j] ? a_[j] : b_[j]; + return (__gcc_v16qi)dst; +} + +/* FUNCTION: __builtin_ia32_pmaxub128_mask */ + +typedef char __gcc_v16qi __attribute__((__vector_size__(16))); +typedef unsigned char __gcc_v16qi_u __attribute__((__vector_size__(16))); + +__gcc_v16qi __builtin_ia32_pmaxub128_mask( + __gcc_v16qi a, + __gcc_v16qi b, + __gcc_v16qi src, + unsigned short k) +{ + __gcc_v16qi_u a_ = (__gcc_v16qi_u)a; + __gcc_v16qi_u b_ = (__gcc_v16qi_u)b; + __gcc_v16qi dst; + for(int j = 0; j < 16; j++) + dst[j] = (k >> j) & 1 ? (char)(a_[j] > b_[j] ? a_[j] : b_[j]) : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_pmaxub256 */ + +typedef char __gcc_v32qi __attribute__((__vector_size__(32))); +typedef unsigned char __gcc_v32qi_u __attribute__((__vector_size__(32))); + +__gcc_v32qi __builtin_ia32_pmaxub256(__gcc_v32qi a, __gcc_v32qi b) +{ + __gcc_v32qi_u a_ = (__gcc_v32qi_u)a; + __gcc_v32qi_u b_ = (__gcc_v32qi_u)b; + __gcc_v32qi_u dst; + for(int j = 0; j < 32; j++) + dst[j] = a_[j] > b_[j] ? a_[j] : b_[j]; + return (__gcc_v32qi)dst; +} + +/* FUNCTION: __builtin_ia32_pmaxub256_mask */ + +typedef char __gcc_v32qi __attribute__((__vector_size__(32))); +typedef unsigned char __gcc_v32qi_u __attribute__((__vector_size__(32))); + +__gcc_v32qi __builtin_ia32_pmaxub256_mask( + __gcc_v32qi a, + __gcc_v32qi b, + __gcc_v32qi src, + unsigned int k) +{ + __gcc_v32qi_u a_ = (__gcc_v32qi_u)a; + __gcc_v32qi_u b_ = (__gcc_v32qi_u)b; + __gcc_v32qi dst; + for(int j = 0; j < 32; j++) + dst[j] = (k >> j) & 1 ? (char)(a_[j] > b_[j] ? 
a_[j] : b_[j]) : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_pmaxub512_mask */ + +typedef char __gcc_v64qi __attribute__((__vector_size__(64))); +typedef unsigned char __gcc_v64qi_u __attribute__((__vector_size__(64))); + +__gcc_v64qi __builtin_ia32_pmaxub512_mask( + __gcc_v64qi a, + __gcc_v64qi b, + __gcc_v64qi src, + unsigned long long k) +{ + __gcc_v64qi_u a_ = (__gcc_v64qi_u)a; + __gcc_v64qi_u b_ = (__gcc_v64qi_u)b; + __gcc_v64qi dst; + for(int j = 0; j < 64; j++) + dst[j] = (k >> j) & 1 ? (char)(a_[j] > b_[j] ? a_[j] : b_[j]) : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_pmaxud128 */ + +typedef int __gcc_v4si __attribute__((__vector_size__(16))); +typedef unsigned int __gcc_v4si_u __attribute__((__vector_size__(16))); + +__gcc_v4si __builtin_ia32_pmaxud128(__gcc_v4si a, __gcc_v4si b) +{ + __gcc_v4si_u a_ = (__gcc_v4si_u)a; + __gcc_v4si_u b_ = (__gcc_v4si_u)b; + __gcc_v4si_u dst; + for(int j = 0; j < 4; j++) + dst[j] = a_[j] > b_[j] ? a_[j] : b_[j]; + return (__gcc_v4si)dst; +} + +/* FUNCTION: __builtin_ia32_pmaxud128_mask */ + +typedef int __gcc_v4si __attribute__((__vector_size__(16))); +typedef unsigned int __gcc_v4si_u __attribute__((__vector_size__(16))); + +__gcc_v4si __builtin_ia32_pmaxud128_mask( + __gcc_v4si a, + __gcc_v4si b, + __gcc_v4si src, + unsigned char k) +{ + __gcc_v4si_u a_ = (__gcc_v4si_u)a; + __gcc_v4si_u b_ = (__gcc_v4si_u)b; + __gcc_v4si dst; + for(int j = 0; j < 4; j++) + dst[j] = (k >> j) & 1 ? (int)(a_[j] > b_[j] ? a_[j] : b_[j]) : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_pmaxud256 */ + +typedef int __gcc_v8si __attribute__((__vector_size__(32))); +typedef unsigned int __gcc_v8si_u __attribute__((__vector_size__(32))); + +__gcc_v8si __builtin_ia32_pmaxud256(__gcc_v8si a, __gcc_v8si b) +{ + __gcc_v8si_u a_ = (__gcc_v8si_u)a; + __gcc_v8si_u b_ = (__gcc_v8si_u)b; + __gcc_v8si_u dst; + for(int j = 0; j < 8; j++) + dst[j] = a_[j] > b_[j] ? 
a_[j] : b_[j]; + return (__gcc_v8si)dst; +} + +/* FUNCTION: __builtin_ia32_pmaxud256_mask */ + +typedef int __gcc_v8si __attribute__((__vector_size__(32))); +typedef unsigned int __gcc_v8si_u __attribute__((__vector_size__(32))); + +__gcc_v8si __builtin_ia32_pmaxud256_mask( + __gcc_v8si a, + __gcc_v8si b, + __gcc_v8si src, + unsigned char k) +{ + __gcc_v8si_u a_ = (__gcc_v8si_u)a; + __gcc_v8si_u b_ = (__gcc_v8si_u)b; + __gcc_v8si dst; + for(int j = 0; j < 8; j++) + dst[j] = (k >> j) & 1 ? (int)(a_[j] > b_[j] ? a_[j] : b_[j]) : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_pmaxud512_mask */ + +typedef int __gcc_v16si __attribute__((__vector_size__(64))); +typedef unsigned int __gcc_v16si_u __attribute__((__vector_size__(64))); + +__gcc_v16si __builtin_ia32_pmaxud512_mask( + __gcc_v16si a, + __gcc_v16si b, + __gcc_v16si src, + unsigned short k) +{ + __gcc_v16si_u a_ = (__gcc_v16si_u)a; + __gcc_v16si_u b_ = (__gcc_v16si_u)b; + __gcc_v16si dst; + for(int j = 0; j < 16; j++) + dst[j] = (k >> j) & 1 ? (int)(a_[j] > b_[j] ? a_[j] : b_[j]) : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_pmaxuw128 */ + +typedef short __gcc_v8hi __attribute__((__vector_size__(16))); +typedef unsigned short __gcc_v8hi_u __attribute__((__vector_size__(16))); + +__gcc_v8hi __builtin_ia32_pmaxuw128(__gcc_v8hi a, __gcc_v8hi b) +{ + __gcc_v8hi_u a_ = (__gcc_v8hi_u)a; + __gcc_v8hi_u b_ = (__gcc_v8hi_u)b; + __gcc_v8hi_u dst; + for(int j = 0; j < 8; j++) + dst[j] = a_[j] > b_[j] ? a_[j] : b_[j]; + return (__gcc_v8hi)dst; +} + +/* FUNCTION: __builtin_ia32_pmaxuw128_mask */ + +typedef short __gcc_v8hi __attribute__((__vector_size__(16))); +typedef unsigned short __gcc_v8hi_u __attribute__((__vector_size__(16))); + +__gcc_v8hi __builtin_ia32_pmaxuw128_mask( + __gcc_v8hi a, + __gcc_v8hi b, + __gcc_v8hi src, + unsigned char k) +{ + __gcc_v8hi_u a_ = (__gcc_v8hi_u)a; + __gcc_v8hi_u b_ = (__gcc_v8hi_u)b; + __gcc_v8hi dst; + for(int j = 0; j < 8; j++) + dst[j] = (k >> j) & 1 ? 
(short)(a_[j] > b_[j] ? a_[j] : b_[j]) : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_pmaxuw256 */ + +typedef short __gcc_v16hi __attribute__((__vector_size__(32))); +typedef unsigned short __gcc_v16hi_u __attribute__((__vector_size__(32))); + +__gcc_v16hi __builtin_ia32_pmaxuw256(__gcc_v16hi a, __gcc_v16hi b) +{ + __gcc_v16hi_u a_ = (__gcc_v16hi_u)a; + __gcc_v16hi_u b_ = (__gcc_v16hi_u)b; + __gcc_v16hi_u dst; + for(int j = 0; j < 16; j++) + dst[j] = a_[j] > b_[j] ? a_[j] : b_[j]; + return (__gcc_v16hi)dst; +} + +/* FUNCTION: __builtin_ia32_pmaxuw256_mask */ + +typedef short __gcc_v16hi __attribute__((__vector_size__(32))); +typedef unsigned short __gcc_v16hi_u __attribute__((__vector_size__(32))); + +__gcc_v16hi __builtin_ia32_pmaxuw256_mask( + __gcc_v16hi a, + __gcc_v16hi b, + __gcc_v16hi src, + unsigned short k) +{ + __gcc_v16hi_u a_ = (__gcc_v16hi_u)a; + __gcc_v16hi_u b_ = (__gcc_v16hi_u)b; + __gcc_v16hi dst; + for(int j = 0; j < 16; j++) + dst[j] = (k >> j) & 1 ? (short)(a_[j] > b_[j] ? a_[j] : b_[j]) : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_pmaxuw512_mask */ + +typedef short __gcc_v32hi __attribute__((__vector_size__(64))); +typedef unsigned short __gcc_v32hi_u __attribute__((__vector_size__(64))); + +__gcc_v32hi __builtin_ia32_pmaxuw512_mask( + __gcc_v32hi a, + __gcc_v32hi b, + __gcc_v32hi src, + unsigned int k) +{ + __gcc_v32hi_u a_ = (__gcc_v32hi_u)a; + __gcc_v32hi_u b_ = (__gcc_v32hi_u)b; + __gcc_v32hi dst; + for(int j = 0; j < 32; j++) + dst[j] = (k >> j) & 1 ? (short)(a_[j] > b_[j] ? 
a_[j] : b_[j]) : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_pminsb128 */ + +typedef char __gcc_v16qi __attribute__((__vector_size__(16))); +typedef signed char __gcc_v16qi_s __attribute__((__vector_size__(16))); + +__gcc_v16qi __builtin_ia32_pminsb128(__gcc_v16qi a, __gcc_v16qi b) +{ + __gcc_v16qi_s a_ = (__gcc_v16qi_s)a; + __gcc_v16qi_s b_ = (__gcc_v16qi_s)b; + __gcc_v16qi_s dst; + for(int j = 0; j < 16; j++) + dst[j] = a_[j] < b_[j] ? a_[j] : b_[j]; + return (__gcc_v16qi)dst; +} + +/* FUNCTION: __builtin_ia32_pminsb128_mask */ + +typedef char __gcc_v16qi __attribute__((__vector_size__(16))); +typedef signed char __gcc_v16qi_s __attribute__((__vector_size__(16))); + +__gcc_v16qi __builtin_ia32_pminsb128_mask( + __gcc_v16qi a, + __gcc_v16qi b, + __gcc_v16qi src, + unsigned short k) +{ + __gcc_v16qi_s a_ = (__gcc_v16qi_s)a; + __gcc_v16qi_s b_ = (__gcc_v16qi_s)b; + __gcc_v16qi dst; + for(int j = 0; j < 16; j++) + dst[j] = (k >> j) & 1 ? (char)(a_[j] < b_[j] ? a_[j] : b_[j]) : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_pminsb256 */ + +typedef char __gcc_v32qi __attribute__((__vector_size__(32))); +typedef signed char __gcc_v32qi_s __attribute__((__vector_size__(32))); + +__gcc_v32qi __builtin_ia32_pminsb256(__gcc_v32qi a, __gcc_v32qi b) +{ + __gcc_v32qi_s a_ = (__gcc_v32qi_s)a; + __gcc_v32qi_s b_ = (__gcc_v32qi_s)b; + __gcc_v32qi_s dst; + for(int j = 0; j < 32; j++) + dst[j] = a_[j] < b_[j] ? a_[j] : b_[j]; + return (__gcc_v32qi)dst; +} + +/* FUNCTION: __builtin_ia32_pminsb256_mask */ + +typedef char __gcc_v32qi __attribute__((__vector_size__(32))); +typedef signed char __gcc_v32qi_s __attribute__((__vector_size__(32))); + +__gcc_v32qi __builtin_ia32_pminsb256_mask( + __gcc_v32qi a, + __gcc_v32qi b, + __gcc_v32qi src, + unsigned int k) +{ + __gcc_v32qi_s a_ = (__gcc_v32qi_s)a; + __gcc_v32qi_s b_ = (__gcc_v32qi_s)b; + __gcc_v32qi dst; + for(int j = 0; j < 32; j++) + dst[j] = (k >> j) & 1 ? (char)(a_[j] < b_[j] ? 
a_[j] : b_[j]) : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_pminsb512_mask */ + +typedef char __gcc_v64qi __attribute__((__vector_size__(64))); +typedef signed char __gcc_v64qi_s __attribute__((__vector_size__(64))); + +__gcc_v64qi __builtin_ia32_pminsb512_mask( + __gcc_v64qi a, + __gcc_v64qi b, + __gcc_v64qi src, + unsigned long long k) +{ + __gcc_v64qi_s a_ = (__gcc_v64qi_s)a; + __gcc_v64qi_s b_ = (__gcc_v64qi_s)b; + __gcc_v64qi dst; + for(int j = 0; j < 64; j++) + dst[j] = (k >> j) & 1 ? (char)(a_[j] < b_[j] ? a_[j] : b_[j]) : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_pminsd128 */ + +typedef int __gcc_v4si __attribute__((__vector_size__(16))); + +__gcc_v4si __builtin_ia32_pminsd128(__gcc_v4si a, __gcc_v4si b) +{ + __gcc_v4si a_ = a; + __gcc_v4si b_ = b; + __gcc_v4si dst; + for(int j = 0; j < 4; j++) + dst[j] = a_[j] < b_[j] ? a_[j] : b_[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_pminsd128_mask */ + +typedef int __gcc_v4si __attribute__((__vector_size__(16))); + +__gcc_v4si __builtin_ia32_pminsd128_mask( + __gcc_v4si a, + __gcc_v4si b, + __gcc_v4si src, + unsigned char k) +{ + __gcc_v4si a_ = a; + __gcc_v4si b_ = b; + __gcc_v4si dst; + for(int j = 0; j < 4; j++) + dst[j] = (k >> j) & 1 ? (int)(a_[j] < b_[j] ? a_[j] : b_[j]) : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_pminsd256 */ + +typedef int __gcc_v8si __attribute__((__vector_size__(32))); + +__gcc_v8si __builtin_ia32_pminsd256(__gcc_v8si a, __gcc_v8si b) +{ + __gcc_v8si a_ = a; + __gcc_v8si b_ = b; + __gcc_v8si dst; + for(int j = 0; j < 8; j++) + dst[j] = a_[j] < b_[j] ? a_[j] : b_[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_pminsd256_mask */ + +typedef int __gcc_v8si __attribute__((__vector_size__(32))); + +__gcc_v8si __builtin_ia32_pminsd256_mask( + __gcc_v8si a, + __gcc_v8si b, + __gcc_v8si src, + unsigned char k) +{ + __gcc_v8si a_ = a; + __gcc_v8si b_ = b; + __gcc_v8si dst; + for(int j = 0; j < 8; j++) + dst[j] = (k >> j) & 1 ? 
(int)(a_[j] < b_[j] ? a_[j] : b_[j]) : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_pminsd512_mask */ + +typedef int __gcc_v16si __attribute__((__vector_size__(64))); + +__gcc_v16si __builtin_ia32_pminsd512_mask( + __gcc_v16si a, + __gcc_v16si b, + __gcc_v16si src, + unsigned short k) +{ + __gcc_v16si a_ = a; + __gcc_v16si b_ = b; + __gcc_v16si dst; + for(int j = 0; j < 16; j++) + dst[j] = (k >> j) & 1 ? (int)(a_[j] < b_[j] ? a_[j] : b_[j]) : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_pminsw128 */ + +typedef short __gcc_v8hi __attribute__((__vector_size__(16))); + +__gcc_v8hi __builtin_ia32_pminsw128(__gcc_v8hi a, __gcc_v8hi b) +{ + __gcc_v8hi a_ = a; + __gcc_v8hi b_ = b; + __gcc_v8hi dst; + for(int j = 0; j < 8; j++) + dst[j] = a_[j] < b_[j] ? a_[j] : b_[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_pminsw128_mask */ + +typedef short __gcc_v8hi __attribute__((__vector_size__(16))); + +__gcc_v8hi __builtin_ia32_pminsw128_mask( + __gcc_v8hi a, + __gcc_v8hi b, + __gcc_v8hi src, + unsigned char k) +{ + __gcc_v8hi a_ = a; + __gcc_v8hi b_ = b; + __gcc_v8hi dst; + for(int j = 0; j < 8; j++) + dst[j] = (k >> j) & 1 ? (short)(a_[j] < b_[j] ? a_[j] : b_[j]) : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_pminsw256 */ + +typedef short __gcc_v16hi __attribute__((__vector_size__(32))); + +__gcc_v16hi __builtin_ia32_pminsw256(__gcc_v16hi a, __gcc_v16hi b) +{ + __gcc_v16hi a_ = a; + __gcc_v16hi b_ = b; + __gcc_v16hi dst; + for(int j = 0; j < 16; j++) + dst[j] = a_[j] < b_[j] ? a_[j] : b_[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_pminsw256_mask */ + +typedef short __gcc_v16hi __attribute__((__vector_size__(32))); + +__gcc_v16hi __builtin_ia32_pminsw256_mask( + __gcc_v16hi a, + __gcc_v16hi b, + __gcc_v16hi src, + unsigned short k) +{ + __gcc_v16hi a_ = a; + __gcc_v16hi b_ = b; + __gcc_v16hi dst; + for(int j = 0; j < 16; j++) + dst[j] = (k >> j) & 1 ? (short)(a_[j] < b_[j] ? 
a_[j] : b_[j]) : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_pminsw512_mask */ + +typedef short __gcc_v32hi __attribute__((__vector_size__(64))); + +__gcc_v32hi __builtin_ia32_pminsw512_mask( + __gcc_v32hi a, + __gcc_v32hi b, + __gcc_v32hi src, + unsigned int k) +{ + __gcc_v32hi a_ = a; + __gcc_v32hi b_ = b; + __gcc_v32hi dst; + for(int j = 0; j < 32; j++) + dst[j] = (k >> j) & 1 ? (short)(a_[j] < b_[j] ? a_[j] : b_[j]) : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_pminub128 */ + +typedef char __gcc_v16qi __attribute__((__vector_size__(16))); +typedef unsigned char __gcc_v16qi_u __attribute__((__vector_size__(16))); + +__gcc_v16qi __builtin_ia32_pminub128(__gcc_v16qi a, __gcc_v16qi b) +{ + __gcc_v16qi_u a_ = (__gcc_v16qi_u)a; + __gcc_v16qi_u b_ = (__gcc_v16qi_u)b; + __gcc_v16qi_u dst; + for(int j = 0; j < 16; j++) + dst[j] = a_[j] < b_[j] ? a_[j] : b_[j]; + return (__gcc_v16qi)dst; +} + +/* FUNCTION: __builtin_ia32_pminub128_mask */ + +typedef char __gcc_v16qi __attribute__((__vector_size__(16))); +typedef unsigned char __gcc_v16qi_u __attribute__((__vector_size__(16))); + +__gcc_v16qi __builtin_ia32_pminub128_mask( + __gcc_v16qi a, + __gcc_v16qi b, + __gcc_v16qi src, + unsigned short k) +{ + __gcc_v16qi_u a_ = (__gcc_v16qi_u)a; + __gcc_v16qi_u b_ = (__gcc_v16qi_u)b; + __gcc_v16qi dst; + for(int j = 0; j < 16; j++) + dst[j] = (k >> j) & 1 ? (char)(a_[j] < b_[j] ? a_[j] : b_[j]) : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_pminub256 */ + +typedef char __gcc_v32qi __attribute__((__vector_size__(32))); +typedef unsigned char __gcc_v32qi_u __attribute__((__vector_size__(32))); + +__gcc_v32qi __builtin_ia32_pminub256(__gcc_v32qi a, __gcc_v32qi b) +{ + __gcc_v32qi_u a_ = (__gcc_v32qi_u)a; + __gcc_v32qi_u b_ = (__gcc_v32qi_u)b; + __gcc_v32qi_u dst; + for(int j = 0; j < 32; j++) + dst[j] = a_[j] < b_[j] ? 
a_[j] : b_[j]; + return (__gcc_v32qi)dst; +} + +/* FUNCTION: __builtin_ia32_pminub256_mask */ + +typedef char __gcc_v32qi __attribute__((__vector_size__(32))); +typedef unsigned char __gcc_v32qi_u __attribute__((__vector_size__(32))); + +__gcc_v32qi __builtin_ia32_pminub256_mask( + __gcc_v32qi a, + __gcc_v32qi b, + __gcc_v32qi src, + unsigned int k) +{ + __gcc_v32qi_u a_ = (__gcc_v32qi_u)a; + __gcc_v32qi_u b_ = (__gcc_v32qi_u)b; + __gcc_v32qi dst; + for(int j = 0; j < 32; j++) + dst[j] = (k >> j) & 1 ? (char)(a_[j] < b_[j] ? a_[j] : b_[j]) : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_pminub512_mask */ + +typedef char __gcc_v64qi __attribute__((__vector_size__(64))); +typedef unsigned char __gcc_v64qi_u __attribute__((__vector_size__(64))); + +__gcc_v64qi __builtin_ia32_pminub512_mask( + __gcc_v64qi a, + __gcc_v64qi b, + __gcc_v64qi src, + unsigned long long k) +{ + __gcc_v64qi_u a_ = (__gcc_v64qi_u)a; + __gcc_v64qi_u b_ = (__gcc_v64qi_u)b; + __gcc_v64qi dst; + for(int j = 0; j < 64; j++) + dst[j] = (k >> j) & 1 ? (char)(a_[j] < b_[j] ? a_[j] : b_[j]) : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_pminud128 */ + +typedef int __gcc_v4si __attribute__((__vector_size__(16))); +typedef unsigned int __gcc_v4si_u __attribute__((__vector_size__(16))); + +__gcc_v4si __builtin_ia32_pminud128(__gcc_v4si a, __gcc_v4si b) +{ + __gcc_v4si_u a_ = (__gcc_v4si_u)a; + __gcc_v4si_u b_ = (__gcc_v4si_u)b; + __gcc_v4si_u dst; + for(int j = 0; j < 4; j++) + dst[j] = a_[j] < b_[j] ? 
a_[j] : b_[j]; + return (__gcc_v4si)dst; +} + +/* FUNCTION: __builtin_ia32_pminud128_mask */ + +typedef int __gcc_v4si __attribute__((__vector_size__(16))); +typedef unsigned int __gcc_v4si_u __attribute__((__vector_size__(16))); + +__gcc_v4si __builtin_ia32_pminud128_mask( + __gcc_v4si a, + __gcc_v4si b, + __gcc_v4si src, + unsigned char k) +{ + __gcc_v4si_u a_ = (__gcc_v4si_u)a; + __gcc_v4si_u b_ = (__gcc_v4si_u)b; + __gcc_v4si dst; + for(int j = 0; j < 4; j++) + dst[j] = (k >> j) & 1 ? (int)(a_[j] < b_[j] ? a_[j] : b_[j]) : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_pminud256 */ + +typedef int __gcc_v8si __attribute__((__vector_size__(32))); +typedef unsigned int __gcc_v8si_u __attribute__((__vector_size__(32))); + +__gcc_v8si __builtin_ia32_pminud256(__gcc_v8si a, __gcc_v8si b) +{ + __gcc_v8si_u a_ = (__gcc_v8si_u)a; + __gcc_v8si_u b_ = (__gcc_v8si_u)b; + __gcc_v8si_u dst; + for(int j = 0; j < 8; j++) + dst[j] = a_[j] < b_[j] ? a_[j] : b_[j]; + return (__gcc_v8si)dst; +} + +/* FUNCTION: __builtin_ia32_pminud256_mask */ + +typedef int __gcc_v8si __attribute__((__vector_size__(32))); +typedef unsigned int __gcc_v8si_u __attribute__((__vector_size__(32))); + +__gcc_v8si __builtin_ia32_pminud256_mask( + __gcc_v8si a, + __gcc_v8si b, + __gcc_v8si src, + unsigned char k) +{ + __gcc_v8si_u a_ = (__gcc_v8si_u)a; + __gcc_v8si_u b_ = (__gcc_v8si_u)b; + __gcc_v8si dst; + for(int j = 0; j < 8; j++) + dst[j] = (k >> j) & 1 ? (int)(a_[j] < b_[j] ? a_[j] : b_[j]) : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_pminud512_mask */ + +typedef int __gcc_v16si __attribute__((__vector_size__(64))); +typedef unsigned int __gcc_v16si_u __attribute__((__vector_size__(64))); + +__gcc_v16si __builtin_ia32_pminud512_mask( + __gcc_v16si a, + __gcc_v16si b, + __gcc_v16si src, + unsigned short k) +{ + __gcc_v16si_u a_ = (__gcc_v16si_u)a; + __gcc_v16si_u b_ = (__gcc_v16si_u)b; + __gcc_v16si dst; + for(int j = 0; j < 16; j++) + dst[j] = (k >> j) & 1 ? 
(int)(a_[j] < b_[j] ? a_[j] : b_[j]) : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_pminuw128 */ + +typedef short __gcc_v8hi __attribute__((__vector_size__(16))); +typedef unsigned short __gcc_v8hi_u __attribute__((__vector_size__(16))); + +__gcc_v8hi __builtin_ia32_pminuw128(__gcc_v8hi a, __gcc_v8hi b) +{ + __gcc_v8hi_u a_ = (__gcc_v8hi_u)a; + __gcc_v8hi_u b_ = (__gcc_v8hi_u)b; + __gcc_v8hi_u dst; + for(int j = 0; j < 8; j++) + dst[j] = a_[j] < b_[j] ? a_[j] : b_[j]; + return (__gcc_v8hi)dst; +} + +/* FUNCTION: __builtin_ia32_pminuw128_mask */ + +typedef short __gcc_v8hi __attribute__((__vector_size__(16))); +typedef unsigned short __gcc_v8hi_u __attribute__((__vector_size__(16))); + +__gcc_v8hi __builtin_ia32_pminuw128_mask( + __gcc_v8hi a, + __gcc_v8hi b, + __gcc_v8hi src, + unsigned char k) +{ + __gcc_v8hi_u a_ = (__gcc_v8hi_u)a; + __gcc_v8hi_u b_ = (__gcc_v8hi_u)b; + __gcc_v8hi dst; + for(int j = 0; j < 8; j++) + dst[j] = (k >> j) & 1 ? (short)(a_[j] < b_[j] ? a_[j] : b_[j]) : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_pminuw256 */ + +typedef short __gcc_v16hi __attribute__((__vector_size__(32))); +typedef unsigned short __gcc_v16hi_u __attribute__((__vector_size__(32))); + +__gcc_v16hi __builtin_ia32_pminuw256(__gcc_v16hi a, __gcc_v16hi b) +{ + __gcc_v16hi_u a_ = (__gcc_v16hi_u)a; + __gcc_v16hi_u b_ = (__gcc_v16hi_u)b; + __gcc_v16hi_u dst; + for(int j = 0; j < 16; j++) + dst[j] = a_[j] < b_[j] ? a_[j] : b_[j]; + return (__gcc_v16hi)dst; +} + +/* FUNCTION: __builtin_ia32_pminuw256_mask */ + +typedef short __gcc_v16hi __attribute__((__vector_size__(32))); +typedef unsigned short __gcc_v16hi_u __attribute__((__vector_size__(32))); + +__gcc_v16hi __builtin_ia32_pminuw256_mask( + __gcc_v16hi a, + __gcc_v16hi b, + __gcc_v16hi src, + unsigned short k) +{ + __gcc_v16hi_u a_ = (__gcc_v16hi_u)a; + __gcc_v16hi_u b_ = (__gcc_v16hi_u)b; + __gcc_v16hi dst; + for(int j = 0; j < 16; j++) + dst[j] = (k >> j) & 1 ? (short)(a_[j] < b_[j] ? 
a_[j] : b_[j]) : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_pminuw512_mask */ + +typedef short __gcc_v32hi __attribute__((__vector_size__(64))); +typedef unsigned short __gcc_v32hi_u __attribute__((__vector_size__(64))); + +__gcc_v32hi __builtin_ia32_pminuw512_mask( + __gcc_v32hi a, + __gcc_v32hi b, + __gcc_v32hi src, + unsigned int k) +{ + __gcc_v32hi_u a_ = (__gcc_v32hi_u)a; + __gcc_v32hi_u b_ = (__gcc_v32hi_u)b; + __gcc_v32hi dst; + for(int j = 0; j < 32; j++) + dst[j] = (k >> j) & 1 ? (short)(a_[j] < b_[j] ? a_[j] : b_[j]) : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_pmulld128 */ + +typedef int __gcc_v4si __attribute__((__vector_size__(16))); +typedef unsigned int __gcc_v4si_u __attribute__((__vector_size__(16))); + +__gcc_v4si __builtin_ia32_pmulld128(__gcc_v4si a, __gcc_v4si b) +{ + __gcc_v4si_u a_ = (__gcc_v4si_u)a; + __gcc_v4si_u b_ = (__gcc_v4si_u)b; + __gcc_v4si_u dst; + for(int j = 0; j < 4; j++) + dst[j] = a_[j] * b_[j]; + return (__gcc_v4si)dst; +} + +/* FUNCTION: __builtin_ia32_pmulld128_mask */ + +typedef int __gcc_v4si __attribute__((__vector_size__(16))); +typedef unsigned int __gcc_v4si_u __attribute__((__vector_size__(16))); + +__gcc_v4si __builtin_ia32_pmulld128_mask( + __gcc_v4si a, + __gcc_v4si b, + __gcc_v4si src, + unsigned char k) +{ + __gcc_v4si_u a_ = (__gcc_v4si_u)a; + __gcc_v4si_u b_ = (__gcc_v4si_u)b; + __gcc_v4si dst; + for(int j = 0; j < 4; j++) + dst[j] = (k >> j) & 1 ? 
(int)(a_[j] * b_[j]) : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_pmulld256 */ + +typedef int __gcc_v8si __attribute__((__vector_size__(32))); +typedef unsigned int __gcc_v8si_u __attribute__((__vector_size__(32))); + +__gcc_v8si __builtin_ia32_pmulld256(__gcc_v8si a, __gcc_v8si b) +{ + __gcc_v8si_u a_ = (__gcc_v8si_u)a; + __gcc_v8si_u b_ = (__gcc_v8si_u)b; + __gcc_v8si_u dst; + for(int j = 0; j < 8; j++) + dst[j] = a_[j] * b_[j]; + return (__gcc_v8si)dst; +} + +/* FUNCTION: __builtin_ia32_pmulld256_mask */ + +typedef int __gcc_v8si __attribute__((__vector_size__(32))); +typedef unsigned int __gcc_v8si_u __attribute__((__vector_size__(32))); + +__gcc_v8si __builtin_ia32_pmulld256_mask( + __gcc_v8si a, + __gcc_v8si b, + __gcc_v8si src, + unsigned char k) +{ + __gcc_v8si_u a_ = (__gcc_v8si_u)a; + __gcc_v8si_u b_ = (__gcc_v8si_u)b; + __gcc_v8si dst; + for(int j = 0; j < 8; j++) + dst[j] = (k >> j) & 1 ? (int)(a_[j] * b_[j]) : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_pmulld512_mask */ + +typedef int __gcc_v16si __attribute__((__vector_size__(64))); +typedef unsigned int __gcc_v16si_u __attribute__((__vector_size__(64))); + +__gcc_v16si __builtin_ia32_pmulld512_mask( + __gcc_v16si a, + __gcc_v16si b, + __gcc_v16si src, + unsigned short k) +{ + __gcc_v16si_u a_ = (__gcc_v16si_u)a; + __gcc_v16si_u b_ = (__gcc_v16si_u)b; + __gcc_v16si dst; + for(int j = 0; j < 16; j++) + dst[j] = (k >> j) & 1 ? 
(int)(a_[j] * b_[j]) : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_pmullw128 */ + +typedef short __gcc_v8hi __attribute__((__vector_size__(16))); + +__gcc_v8hi __builtin_ia32_pmullw128(__gcc_v8hi a, __gcc_v8hi b) +{ + __gcc_v8hi a_ = a; + __gcc_v8hi b_ = b; + __gcc_v8hi dst; + for(int j = 0; j < 8; j++) + dst[j] = a_[j] * b_[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_pmullw128_mask */ + +typedef short __gcc_v8hi __attribute__((__vector_size__(16))); + +__gcc_v8hi __builtin_ia32_pmullw128_mask( + __gcc_v8hi a, + __gcc_v8hi b, + __gcc_v8hi src, + unsigned char k) +{ + __gcc_v8hi a_ = a; + __gcc_v8hi b_ = b; + __gcc_v8hi dst; + for(int j = 0; j < 8; j++) + dst[j] = (k >> j) & 1 ? (short)(a_[j] * b_[j]) : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_pmullw256 */ + +typedef short __gcc_v16hi __attribute__((__vector_size__(32))); + +__gcc_v16hi __builtin_ia32_pmullw256(__gcc_v16hi a, __gcc_v16hi b) +{ + __gcc_v16hi a_ = a; + __gcc_v16hi b_ = b; + __gcc_v16hi dst; + for(int j = 0; j < 16; j++) + dst[j] = a_[j] * b_[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_pmullw256_mask */ + +typedef short __gcc_v16hi __attribute__((__vector_size__(32))); + +__gcc_v16hi __builtin_ia32_pmullw256_mask( + __gcc_v16hi a, + __gcc_v16hi b, + __gcc_v16hi src, + unsigned short k) +{ + __gcc_v16hi a_ = a; + __gcc_v16hi b_ = b; + __gcc_v16hi dst; + for(int j = 0; j < 16; j++) + dst[j] = (k >> j) & 1 ? (short)(a_[j] * b_[j]) : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_pmullw512_mask */ + +typedef short __gcc_v32hi __attribute__((__vector_size__(64))); + +__gcc_v32hi __builtin_ia32_pmullw512_mask( + __gcc_v32hi a, + __gcc_v32hi b, + __gcc_v32hi src, + unsigned int k) +{ + __gcc_v32hi a_ = a; + __gcc_v32hi b_ = b; + __gcc_v32hi dst; + for(int j = 0; j < 32; j++) + dst[j] = (k >> j) & 1 ? 
(short)(a_[j] * b_[j]) : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_por128 */ + +typedef long long __gcc_v2di __attribute__((__vector_size__(16))); + +__gcc_v2di __builtin_ia32_por128(__gcc_v2di a, __gcc_v2di b) +{ + __gcc_v2di a_ = a; + __gcc_v2di b_ = b; + __gcc_v2di dst; + for(int j = 0; j < 2; j++) + dst[j] = a_[j] | b_[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_por256 */ + +typedef long long __gcc_v4di __attribute__((__vector_size__(32))); + +__gcc_v4di __builtin_ia32_por256(__gcc_v4di a, __gcc_v4di b) +{ + __gcc_v4di a_ = a; + __gcc_v4di b_ = b; + __gcc_v4di dst; + for(int j = 0; j < 4; j++) + dst[j] = a_[j] | b_[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_pslldi128 */ + +typedef int __gcc_v4si __attribute__((__vector_size__(16))); +typedef unsigned int __gcc_v4si_u __attribute__((__vector_size__(16))); + +__gcc_v4si __builtin_ia32_pslldi128(__gcc_v4si a, int b) +{ + __gcc_v4si_u a_ = (__gcc_v4si_u)a; + __gcc_v4si_u dst; + for(int j = 0; j < 4; j++) + dst[j] = (unsigned)b >= 32 ? 0 : a_[j] << b; + return (__gcc_v4si)dst; +} + +/* FUNCTION: __builtin_ia32_pslldi256 */ + +typedef int __gcc_v8si __attribute__((__vector_size__(32))); +typedef unsigned int __gcc_v8si_u __attribute__((__vector_size__(32))); + +__gcc_v8si __builtin_ia32_pslldi256(__gcc_v8si a, int b) +{ + __gcc_v8si_u a_ = (__gcc_v8si_u)a; + __gcc_v8si_u dst; + for(int j = 0; j < 8; j++) + dst[j] = (unsigned)b >= 32 ? 0 : a_[j] << b; + return (__gcc_v8si)dst; +} + +/* FUNCTION: __builtin_ia32_psllqi128 */ + +typedef long long __gcc_v2di __attribute__((__vector_size__(16))); +typedef unsigned long long __gcc_v2di_u __attribute__((__vector_size__(16))); + +__gcc_v2di __builtin_ia32_psllqi128(__gcc_v2di a, int b) +{ + __gcc_v2di_u a_ = (__gcc_v2di_u)a; + __gcc_v2di_u dst; + for(int j = 0; j < 2; j++) + dst[j] = (unsigned)b >= 64 ? 
0 : a_[j] << b; + return (__gcc_v2di)dst; +} + +/* FUNCTION: __builtin_ia32_psllqi256 */ + +typedef long long __gcc_v4di __attribute__((__vector_size__(32))); +typedef unsigned long long __gcc_v4di_u __attribute__((__vector_size__(32))); + +__gcc_v4di __builtin_ia32_psllqi256(__gcc_v4di a, int b) +{ + __gcc_v4di_u a_ = (__gcc_v4di_u)a; + __gcc_v4di_u dst; + for(int j = 0; j < 4; j++) + dst[j] = (unsigned)b >= 64 ? 0 : a_[j] << b; + return (__gcc_v4di)dst; +} + +/* FUNCTION: __builtin_ia32_psllwi128 */ + +typedef short __gcc_v8hi __attribute__((__vector_size__(16))); +typedef unsigned short __gcc_v8hi_u __attribute__((__vector_size__(16))); + +__gcc_v8hi __builtin_ia32_psllwi128(__gcc_v8hi a, int b) +{ + __gcc_v8hi_u a_ = (__gcc_v8hi_u)a; + __gcc_v8hi_u dst; + for(int j = 0; j < 8; j++) + dst[j] = (unsigned)b >= 16 ? 0 : a_[j] << b; + return (__gcc_v8hi)dst; +} + +/* FUNCTION: __builtin_ia32_psllwi256 */ + +typedef short __gcc_v16hi __attribute__((__vector_size__(32))); +typedef unsigned short __gcc_v16hi_u __attribute__((__vector_size__(32))); + +__gcc_v16hi __builtin_ia32_psllwi256(__gcc_v16hi a, int b) +{ + __gcc_v16hi_u a_ = (__gcc_v16hi_u)a; + __gcc_v16hi_u dst; + for(int j = 0; j < 16; j++) + dst[j] = (unsigned)b >= 16 ? 0 : a_[j] << b; + return (__gcc_v16hi)dst; +} + +/* FUNCTION: __builtin_ia32_psradi128 */ + +typedef int __gcc_v4si __attribute__((__vector_size__(16))); + +__gcc_v4si __builtin_ia32_psradi128(__gcc_v4si a, int b) +{ + __gcc_v4si a_ = a; + __gcc_v4si dst; + for(int j = 0; j < 4; j++) + dst[j] = (unsigned)b >= 32 ? (a_[j] < 0 ? -1 : 0) : a_[j] >> b; + return dst; +} + +/* FUNCTION: __builtin_ia32_psradi256 */ + +typedef int __gcc_v8si __attribute__((__vector_size__(32))); + +__gcc_v8si __builtin_ia32_psradi256(__gcc_v8si a, int b) +{ + __gcc_v8si a_ = a; + __gcc_v8si dst; + for(int j = 0; j < 8; j++) + dst[j] = (unsigned)b >= 32 ? (a_[j] < 0 ? 
-1 : 0) : a_[j] >> b; + return dst; +} + +/* FUNCTION: __builtin_ia32_psrawi128 */ + +typedef short __gcc_v8hi __attribute__((__vector_size__(16))); + +__gcc_v8hi __builtin_ia32_psrawi128(__gcc_v8hi a, int b) +{ + __gcc_v8hi a_ = a; + __gcc_v8hi dst; + for(int j = 0; j < 8; j++) + dst[j] = (unsigned)b >= 16 ? (a_[j] < 0 ? -1 : 0) : a_[j] >> b; + return dst; +} + +/* FUNCTION: __builtin_ia32_psrawi256 */ + +typedef short __gcc_v16hi __attribute__((__vector_size__(32))); + +__gcc_v16hi __builtin_ia32_psrawi256(__gcc_v16hi a, int b) +{ + __gcc_v16hi a_ = a; + __gcc_v16hi dst; + for(int j = 0; j < 16; j++) + dst[j] = (unsigned)b >= 16 ? (a_[j] < 0 ? -1 : 0) : a_[j] >> b; + return dst; +} + +/* FUNCTION: __builtin_ia32_psrldi128 */ + +typedef int __gcc_v4si __attribute__((__vector_size__(16))); +typedef unsigned int __gcc_v4si_u __attribute__((__vector_size__(16))); + +__gcc_v4si __builtin_ia32_psrldi128(__gcc_v4si a, int b) +{ + __gcc_v4si_u a_ = (__gcc_v4si_u)a; + __gcc_v4si_u dst; + for(int j = 0; j < 4; j++) + dst[j] = (unsigned)b >= 32 ? 0 : a_[j] >> b; + return (__gcc_v4si)dst; +} + +/* FUNCTION: __builtin_ia32_psrldi256 */ + +typedef int __gcc_v8si __attribute__((__vector_size__(32))); +typedef unsigned int __gcc_v8si_u __attribute__((__vector_size__(32))); + +__gcc_v8si __builtin_ia32_psrldi256(__gcc_v8si a, int b) +{ + __gcc_v8si_u a_ = (__gcc_v8si_u)a; + __gcc_v8si_u dst; + for(int j = 0; j < 8; j++) + dst[j] = (unsigned)b >= 32 ? 0 : a_[j] >> b; + return (__gcc_v8si)dst; +} + +/* FUNCTION: __builtin_ia32_psrlqi128 */ + +typedef long long __gcc_v2di __attribute__((__vector_size__(16))); +typedef unsigned long long __gcc_v2di_u __attribute__((__vector_size__(16))); + +__gcc_v2di __builtin_ia32_psrlqi128(__gcc_v2di a, int b) +{ + __gcc_v2di_u a_ = (__gcc_v2di_u)a; + __gcc_v2di_u dst; + for(int j = 0; j < 2; j++) + dst[j] = (unsigned)b >= 64 ? 
0 : a_[j] >> b; + return (__gcc_v2di)dst; +} + +/* FUNCTION: __builtin_ia32_psrlqi256 */ + +typedef long long __gcc_v4di __attribute__((__vector_size__(32))); +typedef unsigned long long __gcc_v4di_u __attribute__((__vector_size__(32))); + +__gcc_v4di __builtin_ia32_psrlqi256(__gcc_v4di a, int b) +{ + __gcc_v4di_u a_ = (__gcc_v4di_u)a; + __gcc_v4di_u dst; + for(int j = 0; j < 4; j++) + dst[j] = (unsigned)b >= 64 ? 0 : a_[j] >> b; + return (__gcc_v4di)dst; +} + +/* FUNCTION: __builtin_ia32_psrlwi128 */ + +typedef short __gcc_v8hi __attribute__((__vector_size__(16))); +typedef unsigned short __gcc_v8hi_u __attribute__((__vector_size__(16))); + +__gcc_v8hi __builtin_ia32_psrlwi128(__gcc_v8hi a, int b) +{ + __gcc_v8hi_u a_ = (__gcc_v8hi_u)a; + __gcc_v8hi_u dst; + for(int j = 0; j < 8; j++) + dst[j] = (unsigned)b >= 16 ? 0 : a_[j] >> b; + return (__gcc_v8hi)dst; +} + +/* FUNCTION: __builtin_ia32_psrlwi256 */ + +typedef short __gcc_v16hi __attribute__((__vector_size__(32))); +typedef unsigned short __gcc_v16hi_u __attribute__((__vector_size__(32))); + +__gcc_v16hi __builtin_ia32_psrlwi256(__gcc_v16hi a, int b) +{ + __gcc_v16hi_u a_ = (__gcc_v16hi_u)a; + __gcc_v16hi_u dst; + for(int j = 0; j < 16; j++) + dst[j] = (unsigned)b >= 16 ? 
0 : a_[j] >> b; + return (__gcc_v16hi)dst; +} + +/* FUNCTION: __builtin_ia32_psubb */ + +typedef char __gcc_v8qi __attribute__((__vector_size__(8))); + +__gcc_v8qi __builtin_ia32_psubb(__gcc_v8qi a, __gcc_v8qi b) +{ + __gcc_v8qi a_ = a; + __gcc_v8qi b_ = b; + __gcc_v8qi dst; + for(int j = 0; j < 8; j++) + dst[j] = a_[j] - b_[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_psubb128 */ + +typedef char __gcc_v16qi __attribute__((__vector_size__(16))); + +__gcc_v16qi __builtin_ia32_psubb128(__gcc_v16qi a, __gcc_v16qi b) +{ + __gcc_v16qi a_ = a; + __gcc_v16qi b_ = b; + __gcc_v16qi dst; + for(int j = 0; j < 16; j++) + dst[j] = a_[j] - b_[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_psubb128_mask */ + +typedef char __gcc_v16qi __attribute__((__vector_size__(16))); + +__gcc_v16qi __builtin_ia32_psubb128_mask( + __gcc_v16qi a, + __gcc_v16qi b, + __gcc_v16qi src, + unsigned short k) +{ + __gcc_v16qi a_ = a; + __gcc_v16qi b_ = b; + __gcc_v16qi dst; + for(int j = 0; j < 16; j++) + dst[j] = (k >> j) & 1 ? (char)(a_[j] - b_[j]) : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_psubb256 */ + +typedef char __gcc_v32qi __attribute__((__vector_size__(32))); + +__gcc_v32qi __builtin_ia32_psubb256(__gcc_v32qi a, __gcc_v32qi b) +{ + __gcc_v32qi a_ = a; + __gcc_v32qi b_ = b; + __gcc_v32qi dst; + for(int j = 0; j < 32; j++) + dst[j] = a_[j] - b_[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_psubb256_mask */ + +typedef char __gcc_v32qi __attribute__((__vector_size__(32))); + +__gcc_v32qi __builtin_ia32_psubb256_mask( + __gcc_v32qi a, + __gcc_v32qi b, + __gcc_v32qi src, + unsigned int k) +{ + __gcc_v32qi a_ = a; + __gcc_v32qi b_ = b; + __gcc_v32qi dst; + for(int j = 0; j < 32; j++) + dst[j] = (k >> j) & 1 ? 
(char)(a_[j] - b_[j]) : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_psubb512_mask */ + +typedef char __gcc_v64qi __attribute__((__vector_size__(64))); + +__gcc_v64qi __builtin_ia32_psubb512_mask( + __gcc_v64qi a, + __gcc_v64qi b, + __gcc_v64qi src, + unsigned long long k) +{ + __gcc_v64qi a_ = a; + __gcc_v64qi b_ = b; + __gcc_v64qi dst; + for(int j = 0; j < 64; j++) + dst[j] = (k >> j) & 1 ? (char)(a_[j] - b_[j]) : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_psubd */ + +typedef int __gcc_v2si __attribute__((__vector_size__(8))); +typedef unsigned int __gcc_v2si_u __attribute__((__vector_size__(8))); + +__gcc_v2si __builtin_ia32_psubd(__gcc_v2si a, __gcc_v2si b) +{ + __gcc_v2si_u a_ = (__gcc_v2si_u)a; + __gcc_v2si_u b_ = (__gcc_v2si_u)b; + __gcc_v2si_u dst; + for(int j = 0; j < 2; j++) + dst[j] = a_[j] - b_[j]; + return (__gcc_v2si)dst; +} + +/* FUNCTION: __builtin_ia32_psubd128 */ + +typedef int __gcc_v4si __attribute__((__vector_size__(16))); +typedef unsigned int __gcc_v4si_u __attribute__((__vector_size__(16))); + +__gcc_v4si __builtin_ia32_psubd128(__gcc_v4si a, __gcc_v4si b) +{ + __gcc_v4si_u a_ = (__gcc_v4si_u)a; + __gcc_v4si_u b_ = (__gcc_v4si_u)b; + __gcc_v4si_u dst; + for(int j = 0; j < 4; j++) + dst[j] = a_[j] - b_[j]; + return (__gcc_v4si)dst; +} + +/* FUNCTION: __builtin_ia32_psubd128_mask */ + +typedef int __gcc_v4si __attribute__((__vector_size__(16))); +typedef unsigned int __gcc_v4si_u __attribute__((__vector_size__(16))); + +__gcc_v4si __builtin_ia32_psubd128_mask( + __gcc_v4si a, + __gcc_v4si b, + __gcc_v4si src, + unsigned char k) +{ + __gcc_v4si_u a_ = (__gcc_v4si_u)a; + __gcc_v4si_u b_ = (__gcc_v4si_u)b; + __gcc_v4si dst; + for(int j = 0; j < 4; j++) + dst[j] = (k >> j) & 1 ? 
(int)(a_[j] - b_[j]) : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_psubd256 */ + +typedef int __gcc_v8si __attribute__((__vector_size__(32))); +typedef unsigned int __gcc_v8si_u __attribute__((__vector_size__(32))); + +__gcc_v8si __builtin_ia32_psubd256(__gcc_v8si a, __gcc_v8si b) +{ + __gcc_v8si_u a_ = (__gcc_v8si_u)a; + __gcc_v8si_u b_ = (__gcc_v8si_u)b; + __gcc_v8si_u dst; + for(int j = 0; j < 8; j++) + dst[j] = a_[j] - b_[j]; + return (__gcc_v8si)dst; +} + +/* FUNCTION: __builtin_ia32_psubd256_mask */ + +typedef int __gcc_v8si __attribute__((__vector_size__(32))); +typedef unsigned int __gcc_v8si_u __attribute__((__vector_size__(32))); + +__gcc_v8si __builtin_ia32_psubd256_mask( + __gcc_v8si a, + __gcc_v8si b, + __gcc_v8si src, + unsigned char k) +{ + __gcc_v8si_u a_ = (__gcc_v8si_u)a; + __gcc_v8si_u b_ = (__gcc_v8si_u)b; + __gcc_v8si dst; + for(int j = 0; j < 8; j++) + dst[j] = (k >> j) & 1 ? (int)(a_[j] - b_[j]) : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_psubd512_mask */ + +typedef int __gcc_v16si __attribute__((__vector_size__(64))); +typedef unsigned int __gcc_v16si_u __attribute__((__vector_size__(64))); + +__gcc_v16si __builtin_ia32_psubd512_mask( + __gcc_v16si a, + __gcc_v16si b, + __gcc_v16si src, + unsigned short k) +{ + __gcc_v16si_u a_ = (__gcc_v16si_u)a; + __gcc_v16si_u b_ = (__gcc_v16si_u)b; + __gcc_v16si dst; + for(int j = 0; j < 16; j++) + dst[j] = (k >> j) & 1 ? 
(int)(a_[j] - b_[j]) : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_psubq128 */ + +typedef long long __gcc_v2di __attribute__((__vector_size__(16))); +typedef unsigned long long __gcc_v2di_u __attribute__((__vector_size__(16))); + +__gcc_v2di __builtin_ia32_psubq128(__gcc_v2di a, __gcc_v2di b) +{ + __gcc_v2di_u a_ = (__gcc_v2di_u)a; + __gcc_v2di_u b_ = (__gcc_v2di_u)b; + __gcc_v2di_u dst; + for(int j = 0; j < 2; j++) + dst[j] = a_[j] - b_[j]; + return (__gcc_v2di)dst; +} + +/* FUNCTION: __builtin_ia32_psubq128_mask */ + +typedef long long __gcc_v2di __attribute__((__vector_size__(16))); +typedef unsigned long long __gcc_v2di_u __attribute__((__vector_size__(16))); + +__gcc_v2di __builtin_ia32_psubq128_mask( + __gcc_v2di a, + __gcc_v2di b, + __gcc_v2di src, + unsigned char k) +{ + __gcc_v2di_u a_ = (__gcc_v2di_u)a; + __gcc_v2di_u b_ = (__gcc_v2di_u)b; + __gcc_v2di dst; + for(int j = 0; j < 2; j++) + dst[j] = (k >> j) & 1 ? (long long)(a_[j] - b_[j]) : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_psubq256 */ + +typedef long long __gcc_v4di __attribute__((__vector_size__(32))); +typedef unsigned long long __gcc_v4di_u __attribute__((__vector_size__(32))); + +__gcc_v4di __builtin_ia32_psubq256(__gcc_v4di a, __gcc_v4di b) +{ + __gcc_v4di_u a_ = (__gcc_v4di_u)a; + __gcc_v4di_u b_ = (__gcc_v4di_u)b; + __gcc_v4di_u dst; + for(int j = 0; j < 4; j++) + dst[j] = a_[j] - b_[j]; + return (__gcc_v4di)dst; +} + +/* FUNCTION: __builtin_ia32_psubq256_mask */ + +typedef long long __gcc_v4di __attribute__((__vector_size__(32))); +typedef unsigned long long __gcc_v4di_u __attribute__((__vector_size__(32))); + +__gcc_v4di __builtin_ia32_psubq256_mask( + __gcc_v4di a, + __gcc_v4di b, + __gcc_v4di src, + unsigned char k) +{ + __gcc_v4di_u a_ = (__gcc_v4di_u)a; + __gcc_v4di_u b_ = (__gcc_v4di_u)b; + __gcc_v4di dst; + for(int j = 0; j < 4; j++) + dst[j] = (k >> j) & 1 ? 
(long long)(a_[j] - b_[j]) : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_psubq512_mask */ + +typedef long long __gcc_v8di __attribute__((__vector_size__(64))); +typedef unsigned long long __gcc_v8di_u __attribute__((__vector_size__(64))); + +__gcc_v8di __builtin_ia32_psubq512_mask( + __gcc_v8di a, + __gcc_v8di b, + __gcc_v8di src, + unsigned char k) +{ + __gcc_v8di_u a_ = (__gcc_v8di_u)a; + __gcc_v8di_u b_ = (__gcc_v8di_u)b; + __gcc_v8di dst; + for(int j = 0; j < 8; j++) + dst[j] = (k >> j) & 1 ? (long long)(a_[j] - b_[j]) : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_psubsb128 */ + +typedef char __gcc_v16qi __attribute__((__vector_size__(16))); +typedef signed char __gcc_v16qi_s __attribute__((__vector_size__(16))); + +__gcc_v16qi __builtin_ia32_psubsb128(__gcc_v16qi a, __gcc_v16qi b) +{ + __gcc_v16qi_s a_ = (__gcc_v16qi_s)a; + __gcc_v16qi_s b_ = (__gcc_v16qi_s)b; + __gcc_v16qi_s dst; + for(int j = 0; j < 16; j++) + dst[j] = (a_[j] - b_[j]) < -128 ? -128 + : (a_[j] - b_[j]) > 127 ? 127 + : a_[j] - b_[j]; + return (__gcc_v16qi)dst; +} + +/* FUNCTION: __builtin_ia32_psubsb128_mask */ + +typedef char __gcc_v16qi __attribute__((__vector_size__(16))); +typedef signed char __gcc_v16qi_s __attribute__((__vector_size__(16))); + +__gcc_v16qi __builtin_ia32_psubsb128_mask( + __gcc_v16qi a, + __gcc_v16qi b, + __gcc_v16qi src, + unsigned short k) +{ + __gcc_v16qi_s a_ = (__gcc_v16qi_s)a; + __gcc_v16qi_s b_ = (__gcc_v16qi_s)b; + __gcc_v16qi dst; + for(int j = 0; j < 16; j++) + dst[j] = (k >> j) & 1 ? (char)((a_[j] - b_[j]) < -128 ? -128 : (a_[j] - b_[j]) > 127 ? 
127 : a_[j] - b_[j]) : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_psubsb256 */ + +typedef char __gcc_v32qi __attribute__((__vector_size__(32))); +typedef signed char __gcc_v32qi_s __attribute__((__vector_size__(32))); + +__gcc_v32qi __builtin_ia32_psubsb256(__gcc_v32qi a, __gcc_v32qi b) +{ + __gcc_v32qi_s a_ = (__gcc_v32qi_s)a; + __gcc_v32qi_s b_ = (__gcc_v32qi_s)b; + __gcc_v32qi_s dst; + for(int j = 0; j < 32; j++) + dst[j] = (a_[j] - b_[j]) < -128 ? -128 + : (a_[j] - b_[j]) > 127 ? 127 + : a_[j] - b_[j]; + return (__gcc_v32qi)dst; +} + +/* FUNCTION: __builtin_ia32_psubsb256_mask */ + +typedef char __gcc_v32qi __attribute__((__vector_size__(32))); +typedef signed char __gcc_v32qi_s __attribute__((__vector_size__(32))); + +__gcc_v32qi __builtin_ia32_psubsb256_mask( + __gcc_v32qi a, + __gcc_v32qi b, + __gcc_v32qi src, + unsigned int k) +{ + __gcc_v32qi_s a_ = (__gcc_v32qi_s)a; + __gcc_v32qi_s b_ = (__gcc_v32qi_s)b; + __gcc_v32qi dst; + for(int j = 0; j < 32; j++) + dst[j] = (k >> j) & 1 ? (char)((a_[j] - b_[j]) < -128 ? -128 : (a_[j] - b_[j]) > 127 ? 127 : a_[j] - b_[j]) : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_psubsb512_mask */ + +typedef char __gcc_v64qi __attribute__((__vector_size__(64))); +typedef signed char __gcc_v64qi_s __attribute__((__vector_size__(64))); + +__gcc_v64qi __builtin_ia32_psubsb512_mask( + __gcc_v64qi a, + __gcc_v64qi b, + __gcc_v64qi src, + unsigned long long k) +{ + __gcc_v64qi_s a_ = (__gcc_v64qi_s)a; + __gcc_v64qi_s b_ = (__gcc_v64qi_s)b; + __gcc_v64qi dst; + for(int j = 0; j < 64; j++) + dst[j] = (k >> j) & 1 ? (char)((a_[j] - b_[j]) < -128 ? -128 : (a_[j] - b_[j]) > 127 ? 
127 : a_[j] - b_[j]) : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_psubsw128_mask */ + +typedef short __gcc_v8hi __attribute__((__vector_size__(16))); + +__gcc_v8hi __builtin_ia32_psubsw128_mask( + __gcc_v8hi a, + __gcc_v8hi b, + __gcc_v8hi src, + unsigned char k) +{ + __gcc_v8hi a_ = a; + __gcc_v8hi b_ = b; + __gcc_v8hi dst; + for(int j = 0; j < 8; j++) + dst[j] = (k >> j) & 1 ? (short)((a_[j] - b_[j]) < -32768 ? -32768 : (a_[j] - b_[j]) > 32767 ? 32767 : a_[j] - b_[j]) : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_psubsw256 */ + +typedef short __gcc_v16hi __attribute__((__vector_size__(32))); + +__gcc_v16hi __builtin_ia32_psubsw256(__gcc_v16hi a, __gcc_v16hi b) +{ + __gcc_v16hi a_ = a; + __gcc_v16hi b_ = b; + __gcc_v16hi dst; + for(int j = 0; j < 16; j++) + dst[j] = (a_[j] - b_[j]) < -32768 ? -32768 + : (a_[j] - b_[j]) > 32767 ? 32767 + : a_[j] - b_[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_psubsw256_mask */ + +typedef short __gcc_v16hi __attribute__((__vector_size__(32))); + +__gcc_v16hi __builtin_ia32_psubsw256_mask( + __gcc_v16hi a, + __gcc_v16hi b, + __gcc_v16hi src, + unsigned short k) +{ + __gcc_v16hi a_ = a; + __gcc_v16hi b_ = b; + __gcc_v16hi dst; + for(int j = 0; j < 16; j++) + dst[j] = (k >> j) & 1 ? (short)((a_[j] - b_[j]) < -32768 ? -32768 : (a_[j] - b_[j]) > 32767 ? 32767 : a_[j] - b_[j]) : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_psubsw512_mask */ + +typedef short __gcc_v32hi __attribute__((__vector_size__(64))); + +__gcc_v32hi __builtin_ia32_psubsw512_mask( + __gcc_v32hi a, + __gcc_v32hi b, + __gcc_v32hi src, + unsigned int k) +{ + __gcc_v32hi a_ = a; + __gcc_v32hi b_ = b; + __gcc_v32hi dst; + for(int j = 0; j < 32; j++) + dst[j] = (k >> j) & 1 ? (short)((a_[j] - b_[j]) < -32768 ? -32768 : (a_[j] - b_[j]) > 32767 ? 
32767 : a_[j] - b_[j]) : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_psubusb128 */ + +typedef char __gcc_v16qi __attribute__((__vector_size__(16))); +typedef unsigned char __gcc_v16qi_u __attribute__((__vector_size__(16))); + +__gcc_v16qi __builtin_ia32_psubusb128(__gcc_v16qi a, __gcc_v16qi b) +{ + __gcc_v16qi_u a_ = (__gcc_v16qi_u)a; + __gcc_v16qi_u b_ = (__gcc_v16qi_u)b; + __gcc_v16qi_u dst; + for(int j = 0; j < 16; j++) + dst[j] = (a_[j] - b_[j]) < 0 ? 0 : a_[j] - b_[j]; + return (__gcc_v16qi)dst; +} + +/* FUNCTION: __builtin_ia32_psubusb128_mask */ + +typedef char __gcc_v16qi __attribute__((__vector_size__(16))); +typedef unsigned char __gcc_v16qi_u __attribute__((__vector_size__(16))); + +__gcc_v16qi __builtin_ia32_psubusb128_mask( + __gcc_v16qi a, + __gcc_v16qi b, + __gcc_v16qi src, + unsigned short k) +{ + __gcc_v16qi_u a_ = (__gcc_v16qi_u)a; + __gcc_v16qi_u b_ = (__gcc_v16qi_u)b; + __gcc_v16qi dst; + for(int j = 0; j < 16; j++) + dst[j] = + (k >> j) & 1 ? (char)((a_[j] - b_[j]) < 0 ? 0 : a_[j] - b_[j]) : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_psubusb256 */ + +typedef char __gcc_v32qi __attribute__((__vector_size__(32))); +typedef unsigned char __gcc_v32qi_u __attribute__((__vector_size__(32))); + +__gcc_v32qi __builtin_ia32_psubusb256(__gcc_v32qi a, __gcc_v32qi b) +{ + __gcc_v32qi_u a_ = (__gcc_v32qi_u)a; + __gcc_v32qi_u b_ = (__gcc_v32qi_u)b; + __gcc_v32qi_u dst; + for(int j = 0; j < 32; j++) + dst[j] = (a_[j] - b_[j]) < 0 ? 
0 : a_[j] - b_[j]; + return (__gcc_v32qi)dst; +} + +/* FUNCTION: __builtin_ia32_psubusb256_mask */ + +typedef char __gcc_v32qi __attribute__((__vector_size__(32))); +typedef unsigned char __gcc_v32qi_u __attribute__((__vector_size__(32))); + +__gcc_v32qi __builtin_ia32_psubusb256_mask( + __gcc_v32qi a, + __gcc_v32qi b, + __gcc_v32qi src, + unsigned int k) +{ + __gcc_v32qi_u a_ = (__gcc_v32qi_u)a; + __gcc_v32qi_u b_ = (__gcc_v32qi_u)b; + __gcc_v32qi dst; + for(int j = 0; j < 32; j++) + dst[j] = + (k >> j) & 1 ? (char)((a_[j] - b_[j]) < 0 ? 0 : a_[j] - b_[j]) : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_psubusb512_mask */ + +typedef char __gcc_v64qi __attribute__((__vector_size__(64))); +typedef unsigned char __gcc_v64qi_u __attribute__((__vector_size__(64))); + +__gcc_v64qi __builtin_ia32_psubusb512_mask( + __gcc_v64qi a, + __gcc_v64qi b, + __gcc_v64qi src, + unsigned long long k) +{ + __gcc_v64qi_u a_ = (__gcc_v64qi_u)a; + __gcc_v64qi_u b_ = (__gcc_v64qi_u)b; + __gcc_v64qi dst; + for(int j = 0; j < 64; j++) + dst[j] = + (k >> j) & 1 ? (char)((a_[j] - b_[j]) < 0 ? 0 : a_[j] - b_[j]) : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_psubusw128_mask */ + +typedef short __gcc_v8hi __attribute__((__vector_size__(16))); +typedef unsigned short __gcc_v8hi_u __attribute__((__vector_size__(16))); + +__gcc_v8hi __builtin_ia32_psubusw128_mask( + __gcc_v8hi a, + __gcc_v8hi b, + __gcc_v8hi src, + unsigned char k) +{ + __gcc_v8hi_u a_ = (__gcc_v8hi_u)a; + __gcc_v8hi_u b_ = (__gcc_v8hi_u)b; + __gcc_v8hi dst; + for(int j = 0; j < 8; j++) + dst[j] = + (k >> j) & 1 ? (short)((a_[j] - b_[j]) < 0 ? 
0 : a_[j] - b_[j]) : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_psubusw256 */ + +typedef short __gcc_v16hi __attribute__((__vector_size__(32))); +typedef unsigned short __gcc_v16hi_u __attribute__((__vector_size__(32))); + +__gcc_v16hi __builtin_ia32_psubusw256(__gcc_v16hi a, __gcc_v16hi b) +{ + __gcc_v16hi_u a_ = (__gcc_v16hi_u)a; + __gcc_v16hi_u b_ = (__gcc_v16hi_u)b; + __gcc_v16hi_u dst; + for(int j = 0; j < 16; j++) + dst[j] = (a_[j] - b_[j]) < 0 ? 0 : a_[j] - b_[j]; + return (__gcc_v16hi)dst; +} + +/* FUNCTION: __builtin_ia32_psubusw256_mask */ + +typedef short __gcc_v16hi __attribute__((__vector_size__(32))); +typedef unsigned short __gcc_v16hi_u __attribute__((__vector_size__(32))); + +__gcc_v16hi __builtin_ia32_psubusw256_mask( + __gcc_v16hi a, + __gcc_v16hi b, + __gcc_v16hi src, + unsigned short k) +{ + __gcc_v16hi_u a_ = (__gcc_v16hi_u)a; + __gcc_v16hi_u b_ = (__gcc_v16hi_u)b; + __gcc_v16hi dst; + for(int j = 0; j < 16; j++) + dst[j] = + (k >> j) & 1 ? (short)((a_[j] - b_[j]) < 0 ? 0 : a_[j] - b_[j]) : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_psubusw512_mask */ + +typedef short __gcc_v32hi __attribute__((__vector_size__(64))); +typedef unsigned short __gcc_v32hi_u __attribute__((__vector_size__(64))); + +__gcc_v32hi __builtin_ia32_psubusw512_mask( + __gcc_v32hi a, + __gcc_v32hi b, + __gcc_v32hi src, + unsigned int k) +{ + __gcc_v32hi_u a_ = (__gcc_v32hi_u)a; + __gcc_v32hi_u b_ = (__gcc_v32hi_u)b; + __gcc_v32hi dst; + for(int j = 0; j < 32; j++) + dst[j] = + (k >> j) & 1 ? (short)((a_[j] - b_[j]) < 0 ? 
0 : a_[j] - b_[j]) : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_psubw */ + +typedef short __gcc_v4hi __attribute__((__vector_size__(8))); + +__gcc_v4hi __builtin_ia32_psubw(__gcc_v4hi a, __gcc_v4hi b) +{ + __gcc_v4hi a_ = a; + __gcc_v4hi b_ = b; + __gcc_v4hi dst; + for(int j = 0; j < 4; j++) + dst[j] = a_[j] - b_[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_psubw128 */ + +typedef short __gcc_v8hi __attribute__((__vector_size__(16))); + +__gcc_v8hi __builtin_ia32_psubw128(__gcc_v8hi a, __gcc_v8hi b) +{ + __gcc_v8hi a_ = a; + __gcc_v8hi b_ = b; + __gcc_v8hi dst; + for(int j = 0; j < 8; j++) + dst[j] = a_[j] - b_[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_psubw128_mask */ + +typedef short __gcc_v8hi __attribute__((__vector_size__(16))); + +__gcc_v8hi __builtin_ia32_psubw128_mask( + __gcc_v8hi a, + __gcc_v8hi b, + __gcc_v8hi src, + unsigned char k) +{ + __gcc_v8hi a_ = a; + __gcc_v8hi b_ = b; + __gcc_v8hi dst; + for(int j = 0; j < 8; j++) + dst[j] = (k >> j) & 1 ? (short)(a_[j] - b_[j]) : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_psubw256 */ + +typedef short __gcc_v16hi __attribute__((__vector_size__(32))); + +__gcc_v16hi __builtin_ia32_psubw256(__gcc_v16hi a, __gcc_v16hi b) +{ + __gcc_v16hi a_ = a; + __gcc_v16hi b_ = b; + __gcc_v16hi dst; + for(int j = 0; j < 16; j++) + dst[j] = a_[j] - b_[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_psubw256_mask */ + +typedef short __gcc_v16hi __attribute__((__vector_size__(32))); + +__gcc_v16hi __builtin_ia32_psubw256_mask( + __gcc_v16hi a, + __gcc_v16hi b, + __gcc_v16hi src, + unsigned short k) +{ + __gcc_v16hi a_ = a; + __gcc_v16hi b_ = b; + __gcc_v16hi dst; + for(int j = 0; j < 16; j++) + dst[j] = (k >> j) & 1 ? 
(short)(a_[j] - b_[j]) : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_psubw512_mask */ + +typedef short __gcc_v32hi __attribute__((__vector_size__(64))); + +__gcc_v32hi __builtin_ia32_psubw512_mask( + __gcc_v32hi a, + __gcc_v32hi b, + __gcc_v32hi src, + unsigned int k) +{ + __gcc_v32hi a_ = a; + __gcc_v32hi b_ = b; + __gcc_v32hi dst; + for(int j = 0; j < 32; j++) + dst[j] = (k >> j) & 1 ? (short)(a_[j] - b_[j]) : src[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_pxor128 */ + +typedef long long __gcc_v2di __attribute__((__vector_size__(16))); + +__gcc_v2di __builtin_ia32_pxor128(__gcc_v2di a, __gcc_v2di b) +{ + __gcc_v2di a_ = a; + __gcc_v2di b_ = b; + __gcc_v2di dst; + for(int j = 0; j < 2; j++) + dst[j] = a_[j] ^ b_[j]; + return dst; +} + +/* FUNCTION: __builtin_ia32_pxor256 */ + +typedef long long __gcc_v4di __attribute__((__vector_size__(32))); + +__gcc_v4di __builtin_ia32_pxor256(__gcc_v4di a, __gcc_v4di b) +{ + __gcc_v4di a_ = a; + __gcc_v4di b_ = b; + __gcc_v4di dst; + for(int j = 0; j < 4; j++) + dst[j] = a_[j] ^ b_[j]; + return dst; +} diff --git a/src/ansi-c/library_check.sh b/src/ansi-c/library_check.sh index 6883984ae26..0c9202b2194 100755 --- a/src/ansi-c/library_check.sh +++ b/src/ansi-c/library_check.sh @@ -101,7 +101,25 @@ perl -p -i -e 's/^_mm_setr_epi(16|32)\n//' __functions # cbmc/SIMD1 perl -p -i -e 's/^_mm_setr_pi16\n//' __functions # cbmc/SIMD1 perl -p -i -e 's/^_mm_subs_ep[iu]16\n//' __functions # cbmc/SIMD1 -ls ../../regression/cbmc-library/ | egrep -v '(Makefile|CMakeLists.txt)' | sort -u > __tests +# Functions exercised by the aggregate regression/cbmc/SIMD* smoke tests are +# covered there rather than by an individual cbmc-library test; treat them as +# exempt. 
+grep -rhoE '__builtin_(ia32|neon)_[A-Za-z0-9_]+' ../../regression/cbmc/SIMD* \ + 2>/dev/null | sort -u > __simd_covered +comm -23 __functions __simd_covered > __functions.new +mv __functions.new __functions +rm __simd_covered + +# The __builtin_ia32_* and __builtin_neon_* tests are consolidated into a single +# directory per family; a function is covered when a .c file underneath +# references it (rather than by having a directory of its own). +{ + ls ../../regression/cbmc-library/ | \ + egrep -v '(Makefile|CMakeLists.txt|tests.log|^__builtin_ia32$|^__builtin_neon$)' + grep -rhoE '__builtin_(ia32|neon)_[A-Za-z0-9_]+' --include='*.c' \ + ../../regression/cbmc-library/__builtin_ia32 \ + ../../regression/cbmc-library/__builtin_neon 2>/dev/null +} | sort -u > __tests diff -u __tests __functions ec="${?}" rm __functions __tests diff --git a/src/ansi-c/parser.y b/src/ansi-c/parser.y index 91526b2b8e6..07c0c8bc55a 100644 --- a/src/ansi-c/parser.y +++ b/src/ansi-c/parser.y @@ -168,6 +168,7 @@ int yyansi_cerror(const std::string &error); %token TOK_GCC_ATTRIBUTE_TRANSPARENT_UNION "transparent_union" %token TOK_GCC_ATTRIBUTE_PACKED "packed" %token TOK_GCC_ATTRIBUTE_VECTOR_SIZE "vector_size" +%token TOK_GCC_ATTRIBUTE_NEON_VECTOR_TYPE "neon_vector_type" %token TOK_GCC_ATTRIBUTE_MODE "mode" %token TOK_GCC_ATTRIBUTE_GNU_INLINE "__gnu_inline__" %token TOK_GCC_ATTRIBUTE_WEAK "weak" @@ -1681,6 +1682,8 @@ gcc_type_attribute: { $$=$1; set($$, ID_transparent_union); } | TOK_GCC_ATTRIBUTE_VECTOR_SIZE '(' comma_expression ')' { $$=$1; set($$, ID_frontend_vector); parser_stack($$).add(ID_size)=parser_stack($3); } + | TOK_GCC_ATTRIBUTE_NEON_VECTOR_TYPE '(' comma_expression ')' + { $$=$1; set($$, ID_frontend_vector); parser_stack($$).add(ID_size)=parser_stack($3); parser_stack($$).set(ID_C_vector_lanes, true); } | TOK_GCC_ATTRIBUTE_ALIGNED { $$=$1; set($$, ID_aligned); } | TOK_GCC_ATTRIBUTE_ALIGNED '(' comma_expression ')' diff --git a/src/ansi-c/scanner.l b/src/ansi-c/scanner.l index 
f9c7b8674ce..aa8d4f95e45 100644 --- a/src/ansi-c/scanner.l +++ b/src/ansi-c/scanner.l @@ -1672,6 +1672,9 @@ enable_or_disable ("enable"|"disable") "vector_size" | "__vector_size__" { BEGIN(GCC_ATTRIBUTE3); loc(); return TOK_GCC_ATTRIBUTE_VECTOR_SIZE; } +"neon_vector_type" | +"__neon_vector_type__" { BEGIN(GCC_ATTRIBUTE3); loc(); return TOK_GCC_ATTRIBUTE_NEON_VECTOR_TYPE; } + "mode" | "__mode__" { BEGIN(GCC_ATTRIBUTE3); loc(); return TOK_GCC_ATTRIBUTE_MODE; } diff --git a/src/util/irep_ids.def b/src/util/irep_ids.def index 8f1cfe001ab..cecfd010763 100644 --- a/src/util/irep_ids.def +++ b/src/util/irep_ids.def @@ -370,6 +370,7 @@ IREP_ID_ONE(designator) IREP_ID_ONE(member_designator) IREP_ID_ONE(index_designator) IREP_ID_TWO(C_constant, #constant) +IREP_ID_TWO(C_vector_lanes, #vector_lanes) IREP_ID_TWO(C_volatile, #volatile) IREP_ID_TWO(C_restricted, #restricted) IREP_ID_TWO(C_identifier, #identifier)